| | 108 | |
| | 109 | == Quality Control == |
| | 110 | The key quality control metrics currently under discussion, along with their thresholds, are: |
| | 111 | * RawData |
| | 112 | ** FastQC report (per mate of the pair) |
| | 113 | *** Manual look at files and check: |
| | 114 | **** Average quality per read > 30 |
| | 115 | **** Number of sequences ~60 million |
| | 116 | **** Sequence quality should look OK |
| | 117 | * Alignment (per lane) |
| | 118 | ** Picard Alignment Summary Metrics |
| | 119 | *** % PF (pass-filter) reads aligned > 90% |
| | 120 | *** PF high-quality error rate < 1% |
| | 121 | *** PF reads aligned > 150 million |
| | 122 | ** Picard GC Bias Metrics |
| | 123 | *** GC Curve should look OK |
| | 124 | *** Median GC% windows between 30 and 40 |
| | 125 | *** Avg Mean Base Quality should be OK |
| | 126 | ** Picard Insert Size Metrics |
| | 127 | *** Peak should be at ~500 bp |
| | 128 | *** Peak should be narrow |
| | 129 | *** Should have few outliers |
| | 130 | ** Picard BAM Index Stats |
| | 131 | *** Should be uniform across chromosomes |
| | 132 | ** GATK or Picard (currently testing) Coverage Metrics |
| | 133 | *** Should correspond to a Poisson curve with peak at 12x |
| | 134 | ** Picard Mark Duplicates |
| | 135 | *** %duplicates between 5% and 8% |
| | 136 | * Recalibration |
| | 137 | ** GATK AnalyzeCovariates |
| | 138 | *** No output currently; revisit once the tool is working |
| | 139 | ** Picard Quality by Cycle |
| | 140 | *** To be determined once data is produced |
| | 141 | ** Picard Quality Distribution |
| | 142 | *** To be determined once data is produced |
| | 143 | * Initial SNP Calling |
| | 144 | ** To be determined once data has been produced and analyzed. An initial basis should be derived from the difference between chip data and sequence data, and from the % of SNPs found in dbSNP. |
| | 145 | |
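
The numeric thresholds listed above could be checked automatically once the reports exist. The following is a minimal sketch in Python, assuming the values have already been extracted from the FastQC and Picard reports into a dictionary. The key names, the `check_metrics` helper, and the inclusive treatment of the "between" ranges are assumptions made for illustration; they do not reflect the output format of any tool. Criteria such as "should look OK" still need manual inspection and are left out.

```python
# Thresholds taken from the list above. The key names are hypothetical;
# percentages are expressed as numbers between 0 and 100.
THRESHOLDS = {
    "fastqc_avg_read_quality": lambda v: v > 30,
    "pct_reads_aligned": lambda v: v > 90.0,
    "hq_error_rate_pct": lambda v: v < 1.0,
    "reads_aligned": lambda v: v > 150_000_000,
    # Assumption: "between" is treated as inclusive on both ends.
    "median_gc_pct": lambda v: 30.0 <= v <= 40.0,
    "pct_duplicates": lambda v: 5.0 <= v <= 8.0,
}


def check_metrics(metrics):
    """Return {name: 'pass' | 'fail' | 'missing'} for every known threshold."""
    results = {}
    for name, passes in THRESHOLDS.items():
        if name not in metrics:
            # Metric was not supplied, e.g. the report has not been produced yet.
            results[name] = "missing"
        else:
            results[name] = "pass" if passes(metrics[name]) else "fail"
    return results


if __name__ == "__main__":
    # Made-up example values for one lane, for illustration only.
    example_lane = {
        "fastqc_avg_read_quality": 34.2,
        "pct_reads_aligned": 93.5,
        "hq_error_rate_pct": 0.4,
        "reads_aligned": 162_000_000,
        "median_gc_pct": 37.0,
    }
    for name, status in check_metrics(example_lane).items():
        print(f"{name}: {status}")
```

Reporting a missing metric separately from a failed one keeps the "to be determined" items visible instead of silently counting them as passed.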