Posts by author jlaros
Proposed Quality Control.
Pre-alignment:
- Duplicate input checking.
- Quality scores histogram from the BGI.
  - Possibly other graphs/data provided by the BGI.
- Quality scores with the FastQC toolkit.
  - GC content.
  - Quality scores per base.
  - Quality scores per read.
  - N-content.
  - Length distribution.
  - Overrepresentation of reads.
  - A summary of this data provided by a script.
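
The duplicate input check could be sketched roughly as follows. This is only an illustration, assuming that "duplicate" means byte-identical files; the function names `file_digest` and `find_duplicate_inputs` are made up for this example. Note that it would not catch the same reads stored with different compression or in a different order.

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_inputs(paths):
    """Group files by content digest and return the groups with more than one member.

    Assumption: two inputs count as duplicates only if they are byte-identical.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[file_digest(path)].append(str(path))
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Any non-empty result would then be flagged for manual inspection before alignment starts.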
 
 
Alignment:
- Percentage aligned.
- Insert size distribution.
- Coverage.
  - Distribution.
  - Visualisation as a wiggle track.
    - Intra-sample distance calculation of the wiggle tracks.
- Mapping quality distribution.
- Look into Picard Tools.
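
As a rough illustration of the "percentage aligned" and "mapping quality distribution" items, the sketch below reads uncompressed SAM text lines. It relies only on the SAM column layout (FLAG in column 2, MAPQ in column 5) and the flag bits 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary). The function name `alignment_stats` is hypothetical, and counting only primary records is an assumption about what should be measured; existing tools may well report this differently.

```python
from collections import Counter


def alignment_stats(sam_lines):
    """Return (percentage of primary records that are mapped, MAPQ histogram).

    Assumption: secondary and supplementary records are skipped so that each
    read (or mate) is counted once.
    """
    total = 0
    mapped = 0
    mapq_counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        total += 1
        if flag & 0x4:
            continue  # unmapped read
        mapped += 1
        mapq_counts[int(fields[4])] += 1
    percentage = 100.0 * mapped / total if total else 0.0
    return percentage, dict(mapq_counts)
```

For BAM input, the lines would first have to be converted to SAM text by whichever tool is already part of the pipeline.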
 
Recalibration:
- Any available statistics from GATK?
 
Variant calling:
- Transition / transversion rate.
- X, Y coverage (check encoding of sample tags).
- Mutation rate.
  - Autosomal.
  - Y.
  - M.
- Distribution of SNPs found in dbSNP.
- Indel / substitution rate.
- Cross-check with immuno-chip.
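
The transition / transversion and indel / substitution rates could be computed along these lines. This is a minimal sketch that assumes uncompressed VCF text lines (REF in column 4, ALT in column 5); the names `variant_counts` and `ratio` are made up for the example. Symbolic alleles, breakends and equal-length multi-base substitutions are simply ignored here, which is an assumption rather than a recommendation.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def variant_counts(vcf_lines):
    """Count transitions, transversions and indels over all ALT alleles."""
    counts = {"transitions": 0, "transversions": 0, "indels": 0}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref = fields[3].upper()
        for alt in fields[4].upper().split(","):
            # Skip missing, spanning-deletion, symbolic and breakend alleles.
            if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
                continue
            if len(ref) == 1 and len(alt) == 1:
                if ref not in "ACGT" or alt not in "ACGT" or ref == alt:
                    continue
                if (ref, alt) in TRANSITIONS:
                    counts["transitions"] += 1
                else:
                    counts["transversions"] += 1
            elif len(ref) != len(alt):
                counts["indels"] += 1
    return counts


def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")
```

The same counts could be collected per chromosome group (autosomal, Y, M) by keying them on the first column.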
 
In all steps, cross-check with the data provided by the BGI.
