1/ Genomics data processing often involves multiple steps. For RNA-seq, we start with raw FASTQ files and progress through quality control, trimming, alignment, and quantification.
2/ First, we run FastQC for quality control. Next, fastp trims adapters. The trimmed reads are then aligned to the reference genome with a splice-aware aligner such as STAR.
3/ After alignment, we quantify gene expression with tools like featureCounts or htseq-count. Alternatively, Salmon or kallisto skip the full genome alignment and quantify reads directly against the transcriptome.
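A rough sketch of how the steps above chain together, written as a Python function that only builds the command lines and runs nothing. It assumes single-end, gzipped reads and an existing STAR index. The function name, sample name, and all paths are hypothetical, and the flags shown are a minimal set rather than a recommended configuration.

```python
def build_commands(sample, fastq, genome_dir, gtf, outdir, threads=4):
    """Return one argv list per pipeline step; nothing is executed here."""
    # Hypothetical file layout: everything for a sample lands in `outdir`.
    trimmed = f"{outdir}/{sample}.trimmed.fastq.gz"
    star_prefix = f"{outdir}/{sample}."
    # STAR appends this suffix to the prefix for coordinate-sorted BAM output.
    bam = f"{star_prefix}Aligned.sortedByCoord.out.bam"
    counts = f"{outdir}/{sample}.counts.txt"

    return [
        # 1. Quality control on the raw reads.
        ["fastqc", "-o", outdir, fastq],
        # 2. Adapter trimming (single-end: one input, one output).
        ["fastp", "-i", fastq, "-o", trimmed],
        # 3. Splice-aware alignment of the trimmed reads to the genome index.
        ["STAR", "--runThreadN", str(threads),
         "--genomeDir", genome_dir,
         "--readFilesIn", trimmed,
         "--readFilesCommand", "zcat",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outFileNamePrefix", star_prefix],
        # 4. Gene-level counting against the annotation.
        ["featureCounts", "-T", str(threads), "-a", gtf, "-o", counts, bam],
    ]


if __name__ == "__main__":
    for cmd in build_commands("s1", "s1.fastq.gz", "star_index", "genes.gtf", "results"):
        print(" ".join(cmd))
```

Each list could be handed to `subprocess.run(cmd, check=True)` in order. Paired-end data would need the second-read options of fastp and two files after `--readFilesIn`.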
4/ Each step produces a specific data format that must be fed into the next. Managing this flow manually can become chaotic and error-prone, especially with large datasets.
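To make the hand-off problem concrete, here is a small sketch that records what each step consumes and produces, then checks that neighbouring steps agree. The format labels and the `check_chain` helper are illustrative assumptions, and the model is deliberately simplified (for example, featureCounts also needs an annotation file, which is left out here).

```python
# (tool, input format, output format); labels are illustrative only.
STEPS = [
    ("fastp", "fastq", "fastq"),
    ("STAR", "fastq", "bam"),
    ("featureCounts", "bam", "counts"),
]


def check_chain(steps):
    """Return a message for every pair of adjacent steps whose formats do not match."""
    problems = []
    for (prev, _, prev_out), (name, needs, _) in zip(steps, steps[1:]):
        if prev_out != needs:
            problems.append(f"{prev} produces {prev_out} but {name} expects {needs}")
    return problems


if __name__ == "__main__":
    print(check_chain(STEPS))                  # consistent chain
    print(check_chain([STEPS[0], STEPS[2]]))   # alignment step left out
```

Workflow managers automate this kind of bookkeeping, which is where the manual approach starts to break down as the number of samples grows.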