[[TOC()]]

= Work plan for quality control, call improvement and phasing (VCF to haplotypes) =

Compiled: Yurii Aulchenko, September 12, 2010

As noted by Morris, we have to run our project in two phases: phase 1 (from now until the end of 2010), running with the minimal personnel available at present, and phase 2 (starting ~Jan 2011), based on a proper resource plan. It is assumed that pilot VCF data will be available at the end of September; we expect all data to be available by January 2011. This document aims to provide an overview of the “VCF to haplotypes” work package.

== Plan for phase 1 ==

Starting at the end of September, when pilot data become available, we need to build a pipeline for basic post-VCF quality control. This will include two independent sub-projects, which may be run in parallel:

'''Chip QC project:''' Cross-check of the results obtained from BGI against the GWA scan data that are already available.

This work package ''aims to'':

 * Establish a custom pipeline for chip-based QC.
 * Check the quality of the sequence data.
 * Identify factors affecting the quality of sequencing (e.g. batch effects).
 * Establish (preliminary) thresholds of quality metrics that maximize sensitivity and specificity.
 * Using the above thresholds, establish the false-positive and false-negative rates for variants discovered in our study (if we do not take trio structure into account).
 * Check whether these rates agree with those theoretically expected (to confirm that we are not missing any important experimental factor).

''Detailed workflow'' is summarized in a separate document.

 * ''Estimated costs'' for pilot data check and establishing the pipeline:
   * 3 months of BI/data manager/programmer at 1.0 fte + experienced supervisor at 0.1 fte.
 * ''Suggested timeline:'' end of September to end of December
 * ''Depends on:'' availability of VCF pilot data
 * ''Other projects depending on this:'' MendelianQcPipeline (soft), QC’ed data (hard)
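
The threshold-setting aim of the Chip QC project could be prototyped along the following lines. This is only a minimal sketch under stated assumptions, not the workflow of the separate document: it assumes each site has already been reduced to a quality score, a flag for whether the sequence data call a variant, and a flag for whether the chip shows a variant. It also assumes that "maximizing sensitivity and specificity" is interpreted as maximizing their sum (Youden's J), which is one possible choice among several. The names `rates` and `best_threshold` and the toy records are hypothetical and serve only as illustration.

```python
from typing import Iterable, List, Tuple

# Assumed record layout (hypothetical): one tuple per site
# (quality score, variant called from sequence data?, variant present on chip?)
Record = Tuple[float, bool, bool]


def rates(records: Iterable[Record], threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity), treating the chip as the reference.

    Sequence calls with quality below `threshold` are treated as non-variant.
    """
    tp = fp = tn = fn = 0
    for qual, seq_variant, chip_variant in records:
        called = seq_variant and qual >= threshold
        if called and chip_variant:
            tp += 1
        elif called and not chip_variant:
            fp += 1
        elif chip_variant:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def best_threshold(records: List[Record], candidates: Iterable[float]) -> float:
    """Pick the candidate threshold with the largest sensitivity + specificity - 1."""
    return max(candidates, key=lambda t: sum(rates(records, t)) - 1.0)


# Toy records, made up purely for illustration (not project data).
toy = [
    (50.0, True, True),
    (40.0, True, True),
    (30.0, True, True),
    (20.0, True, False),
    (10.0, True, False),
    (0.0, False, False),
]
chosen = best_threshold(toy, [0.0, 25.0, 45.0])
sens, spec = rates(toy, chosen)
print(chosen, 1.0 - spec, 1.0 - sens)  # threshold, false-positive rate, false-negative rate
```

The false-positive and false-negative rates at the chosen threshold follow directly as one minus specificity and one minus sensitivity. In a real pipeline the records would be derived from the VCF calls and the GWA scan genotypes, and batch could be added as an extra field to look for batch effects.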