Table of Contents

  1. Plan for phase 1.

Work plan for quality control, call improvement and phasing (VCF to haplotypes)

Compiled: Yurii Aulchenko, September 12, 2010

As noted by Morris, we have to run our project in two phases: phase 1 (from now until the end of 2010), run with the minimal personnel available at present, and phase 2 (starting ~January 2011), based on a proper resource plan. We assume that pilot VCF data will be available at the end of September 2010 and expect all data to be available by January 2011.

This document aims to provide an overview of the “VCF to haplotypes” work package.

Starting at the end of September, when the pilot data become available, we need to build a pipeline for basic post-VCF quality control. This will include two independent sub-projects, which may be run in parallel:

Chip QC project: cross-check of the results obtained from BGI against the already available GWA scan data.

This work package aims to:

  • Establish a custom pipeline for chip-based QC.
  • Check the quality of the sequence data.
  • Identify factors affecting sequencing quality (e.g. batch effects).
  • Establish (preliminary) quality-metric thresholds that maximize sensitivity and specificity.
  • Using the above thresholds, estimate the false-positive and false-negative rates for variants discovered in our study (without taking the trio structure into account).
  • Check whether these rates agree with the theoretically expected ones (to make sure we do not miss any important experimental factor).
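As an illustration of the threshold and error-rate aims above, here is a minimal Python sketch. It assumes that each sequencing genotype at a chip-genotyped site has already been paired with the chip genotype of the same individual, that the chip genotype is treated as the truth, and that a single quality metric is used for filtering. The record layout (`SiteCall`), the function names, and the use of Youden's J (sensitivity + specificity - 1) as the criterion for "maximizing sensitivity and specificity" are hypothetical choices for illustration, not part of the agreed workflow.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class SiteCall:
    """One individual at one chip-genotyped site (hypothetical record layout)."""
    chip_nonref: bool  # chip genotype carries the alternative allele; treated as truth
    seq_nonref: bool   # sequencing (VCF) genotype carries the alternative allele
    quality: float     # assumed single quality metric for the sequencing call


def rates_at_threshold(calls: Iterable[SiteCall], threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when non-reference sequencing calls
    with quality below `threshold` are filtered out (counted as negatives)."""
    tp = fp = tn = fn = 0
    for c in calls:
        called = c.seq_nonref and c.quality >= threshold
        if c.chip_nonref:
            if called:
                tp += 1
            else:
                fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def best_threshold(calls: Sequence[SiteCall],
                   candidates: Iterable[float]) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) for the candidate that
    maximises Youden's J = sensitivity + specificity - 1.
    On ties, the earliest candidate is kept."""
    best = None
    for t in candidates:
        sens, spec = rates_at_threshold(calls, t)
        j = sens + spec - 1.0
        if math.isnan(j):
            continue  # rates undefined when one truth class is empty
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    if best is None:
        raise ValueError("no candidate threshold gave defined rates")
    _, t, sens, spec = best
    return t, sens, spec
```

With the selected threshold, the false-negative rate is 1 - sensitivity and the false-positive rate (among genotypes that are reference-homozygous on the chip) is 1 - specificity. These are the figures one would then compare against the theoretically expected rates.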

The detailed workflow is summarized in a separate document.

  • Estimated costs for checking the pilot data and establishing the pipeline: 3 months of a BI/data manager/programmer at 1.0 fte plus an experienced supervisor at 0.1 fte.
  • Suggested timeline: end of September – end of December 2010.
  • Depends on: availability of the pilot VCF data.
  • Other projects depending on this: MendelianQcPipeline (soft), QC’ed data (hard).