Proposed Quality Control.
Pre-alignment:
- Duplicate input checking.
- Quality scores histogram from the BGI.
- Maybe other graphs/data provided by the BGI.
- Quality scores with the FastQC toolkit.
  - GC content.
  - Quality scores per base.
  - Quality scores per read.
  - N content.
  - Length distribution.
  - Overrepresented sequences.
- A summary of this data provided by a script.
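The summary script could start from something like the sketch below. It is not a replacement for FastQC; it only shows how a few of the metrics above (GC content, N content, length distribution, mean quality per read, exact duplicate reads) can be collected in one pass. It assumes four-line FASTQ records and Phred+33 quality encoding; the function name `summarise_fastq` and the layout of the returned dictionary are made up for illustration.

```python
from collections import Counter


def summarise_fastq(lines, phred_offset=33):
    """Collect simple per-file statistics from an iterable of FASTQ lines.

    Assumes four lines per record and Phred+33 qualities (both assumptions).
    """
    n_reads = n_bases = gc = n_count = 0
    lengths = Counter()      # read length -> number of reads
    seen = Counter()         # sequence -> number of occurrences
    mean_quals = []          # mean Phred score of each read

    it = iter(lines)
    for header in it:
        if not header.strip():
            continue         # tolerate blank lines between records
        try:
            seq = next(it).strip().upper()
            next(it)         # '+' separator line
            qual = next(it).strip()
        except StopIteration:
            break            # truncated final record, ignore it

        n_reads += 1
        n_bases += len(seq)
        gc += seq.count("G") + seq.count("C")
        n_count += seq.count("N")
        lengths[len(seq)] += 1
        seen[seq] += 1
        if qual:
            scores = [ord(c) - phred_offset for c in qual]
            mean_quals.append(sum(scores) / len(scores))

    # Reads whose exact sequence was already seen earlier in the file.
    duplicates = sum(count - 1 for count in seen.values())
    return {
        "reads": n_reads,
        # Fractions are taken over all bases, including N.
        "gc_fraction": gc / n_bases if n_bases else 0.0,
        "n_fraction": n_count / n_bases if n_bases else 0.0,
        "length_distribution": dict(lengths),
        "mean_quality_per_read": mean_quals,
        "duplicate_fraction": duplicates / n_reads if n_reads else 0.0,
    }
```

Usage would be along the lines of `summarise_fastq(open("sample.fastq"))`, where the file name is a placeholder. Keeping every sequence in memory for the duplicate count is only workable for small files; a real run would need subsampling or hashing.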
Alignment:
- Percentage aligned.
- Insert size distribution.
- Coverage.
  - Distribution.
  - Visualisation as a wiggle track.
  - Intra-sample distance calculation of the wiggle tracks.
- Mapping quality distribution.
- Look into Picard Tools.
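Until the Picard Tools option has been looked into, the first few alignment metrics can be approximated directly from SAM text. The sketch below computes the percentage aligned, the mapping quality distribution and an insert size distribution. It relies only on the SAM column layout (FLAG in column 2, MAPQ in column 5, TLEN in column 9) and the standard flag bits; the function name `summarise_sam` and the choice to skip secondary and supplementary records are assumptions made for this example.

```python
from collections import Counter

# Standard SAM flag bits.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarise_sam(lines):
    """Summarise alignment records from an iterable of SAM text lines."""
    total = mapped = 0
    mapq = Counter()          # mapping quality -> number of mapped reads
    insert_sizes = Counter()  # template length -> number of pairs

    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue          # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue          # count each read once (assumption)
        total += 1
        if flag & UNMAPPED:
            continue
        mapped += 1
        mapq[int(fields[4])] += 1
        tlen = int(fields[8])
        if tlen > 0:
            # Only the leftmost mate has a positive TLEN,
            # so each pair is counted once.
            insert_sizes[tlen] += 1

    return {
        "total_reads": total,
        "percent_aligned": 100.0 * mapped / total if total else 0.0,
        "mapq_distribution": dict(mapq),
        "insert_size_distribution": dict(insert_sizes),
    }
```

For BAM input the lines could be piped in from `samtools view -h`. Coverage is left out here on purpose, since it needs position-level bookkeeping that a dedicated tool handles better.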
Recalibration:
- Any available statistics from GATK?
Variant calling:
- Transition / transversion rate.
- X, Y coverage (check encoding of sample tags).
- Mutation rate.
  - Autosomal.
  - Y.
  - M (mitochondrial).
- Distribution of SNPs found in dbSNP.
- Indel / substitution rate.
- Cross-check with the Immunochip.
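Two of the variant calling metrics above, the transition / transversion rate and the indel / substitution rate, reduce to simple counting over (REF, ALT) allele pairs. A minimal sketch, assuming the pairs have already been extracted from the variant calls (for example from the REF and ALT columns of a VCF file) and that multi-allelic sites have been split beforehand; the function name `variant_summary` is made up for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def variant_summary(variants):
    """Count transitions, transversions and indels in (ref, alt) pairs."""
    ts = tv = indels = 0
    for ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != len(alt):
            indels += 1      # alleles of different length: insertion or deletion
            continue
        if len(ref) != 1 or ref == alt or ref not in BASES or alt not in BASES:
            continue         # skip MNPs, non-variants and ambiguous bases
        if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
            ts += 1          # A<->G or C<->T
        else:
            tv += 1          # purine <-> pyrimidine
    substitutions = ts + tv
    return {
        "transitions": ts,
        "transversions": tv,
        "indels": indels,
        "ts_tv": ts / tv if tv else float("inf"),
        "indel_substitution": indels / substitutions if substitutions else float("inf"),
    }
```

Running this separately on the autosomal, Y and M calls would give the per-chromosome breakdown listed under the mutation rate.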
In all steps, cross-check with the data provided by the BGI.