[[TOC()]]
= Work plan for quality control, call improvement and phasing (VCF to haplotypes) =

Compiled: Yurii Aulchenko, September 12, 2010

As noted by Morris, we have to run the project in two phases: phase 1 (from now until the end of 2010), run with the minimal personnel available at present, and phase 2 (starting around January 2011), based on a proper resource plan. We assume that the pilot VCF data will be available at the end of September, and we expect all data to be available by January 2011.

This document gives an overview of the “VCF to haplotypes” work package.

== Plan for phase 1 ==

Starting at the end of September, when the pilot data become available, we need to build a pipeline for basic post-VCF quality control. This will include two independent sub-projects, which may be run in parallel:

'''Chip QC project:''' crosscheck of the results obtained from BGI against the already available GWA scan data.

This work package ''aims to'':

* Establish a custom pipeline for chip-based QC.
* Check the quality of the sequence data.
* Identify factors affecting the quality of sequencing (e.g. batch effects).
* Establish (preliminary) thresholds for quality metrics that maximize sensitivity and specificity.
* Using the above thresholds, estimate the false-positive and false-negative rates for variants discovered in our study (without taking the trio structure into account).
* Check whether these rates agree with the theoretically expected ones (to make sure we do not miss any important experimental factor).
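The threshold and error-rate aims above could be prototyped roughly as follows. This is a minimal sketch under stated assumptions, not the planned pipeline: it assumes each genotype overlapping the chip has already been reduced to a hypothetical `SiteCall` record (the chip genotype, treated as the truth; whether the VCF reports a non-reference genotype; and one quality value such as the VCF QUAL field). It picks the threshold maximizing Youden's J = sensitivity + specificity - 1, which is only one of several reasonable criteria. All names and the toy data are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SiteCall:
    """Hypothetical record: one genotype compared between sequencing and the chip."""
    chip_variant: bool  # chip genotype is non-reference (treated as the truth)
    seq_variant: bool   # the VCF reports a non-reference genotype
    qual: float         # quality metric of the sequence call, e.g. the VCF QUAL field


def confusion(calls: Iterable[SiteCall], threshold: float) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN), accepting a sequence call only if qual >= threshold."""
    tp = fp = tn = fn = 0
    for c in calls:
        called = c.seq_variant and c.qual >= threshold
        if c.chip_variant and called:
            tp += 1
        elif c.chip_variant:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def choose_threshold(calls: List[SiteCall]) -> Tuple[float, float, float]:
    """Scan the observed quality values; return (threshold, sensitivity, specificity)
    for the threshold maximizing Youden's J = sensitivity + specificity - 1."""
    best = None
    for t in sorted({c.qual for c in calls}):
        tp, fp, tn, fn = confusion(calls, t)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        j = sens + spec - 1.0
        if best is None or j > best[0]:  # ties keep the lowest threshold
            best = (j, t, sens, spec)
    if best is None:
        raise ValueError("no calls to evaluate")
    _, t, sens, spec = best
    return t, sens, spec


# Toy data, made up for illustration only.
calls = [
    SiteCall(True, True, 50.0),
    SiteCall(True, True, 30.0),
    SiteCall(True, False, 5.0),
    SiteCall(False, True, 10.0),
    SiteCall(False, True, 25.0),
    SiteCall(False, False, 40.0),
]
t, sens, spec = choose_threshold(calls)
print(f"threshold={t}, FP rate={1 - spec:.2f}, FN rate={1 - sens:.2f}")
# prints: threshold=30.0, FP rate=0.00, FN rate=0.33
```

Here the false-positive rate is 1 - specificity and the false-negative rate is 1 - sensitivity; if the quantity of interest is instead the fraction of discovered variants that are false, FP/(TP+FP) would be reported. The estimates only cover sites present on the chip.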

''Detailed workflow'' is summarized in a separate document.
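For the last aim in the list above, one simple check is an exact binomial test of the observed number of discordant genotypes against an expected error rate. The sketch below assumes that such an expected per-genotype rate `p0` is available and that errors are independent across genotypes; a significant deviation may point to the kind of unmodelled experimental factor (such as a batch effect) the aim is meant to catch. The function names and numbers are hypothetical; if SciPy is available, `scipy.stats.binomtest` provides a well-tested implementation of the same kind of test.

```python
from math import exp, lgamma, log, log1p


def binom_logpmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), 0 < p < 1 (log space avoids overflow)."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log1p(-p))


def exact_binom_test(k: int, n: int, p0: float) -> float:
    """Two-sided exact binomial p-value: the total probability, under the expected
    rate p0, of all outcomes that are no more likely than the observed count k."""
    if not 0 < p0 < 1 or not 0 <= k <= n:
        raise ValueError("need 0 < p0 < 1 and 0 <= k <= n")
    cutoff = binom_logpmf(k, n, p0) + log(1 + 1e-7)  # small relative tolerance for rounding
    total = sum(exp(lp) for lp in (binom_logpmf(i, n, p0) for i in range(n + 1))
                if lp <= cutoff)
    return min(1.0, total)


# Hypothetical numbers: 10,000 genotypes compared, expected error rate 0.1%.
print(exact_binom_test(12, 10_000, 0.001))  # 12 discordant: consistent with expectation
print(exact_binom_test(30, 10_000, 0.001))  # 30 discordant: far more than expected
```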

* ''Estimated costs'' for checking the pilot data and establishing the pipeline:
  * 3 months of a BI/data manager/programmer at 1.0 FTE + an experienced supervisor at 0.1 FTE.
* ''Suggested timeline:'' end of September – end of December.
* ''Depends on:'' availability of the VCF pilot data.
* ''Other projects depending on this:'' MendelianQcPipeline (soft), QC’ed data (hard).