
Version 40 (modified by a.kanterakis, 13 years ago)

--

This page describes the imputation pipelines developed by the GoNL - Impute team. Please contribute. For help with Trac wiki formatting: http://trac.edgewall.org/wiki/WikiFormatting
All scripts presented here are located in our SVN repository: http://www.bbmriwiki.nl/svn/imputation
Minutes of our Team Calls: http://www.bbmriwiki.nl/wiki/Imputations/Minutes

Contributors and Teams

  • UMC Groningen: Alexandros Kanterakis alexandros.kanterakis@…

Study data

Reference data

Pre processing

Normalize beagle datasets

  • Location: http://www.bbmriwiki.nl/svn/Imputation/alex/scripts/Normalize_beagle_datasets.ftl
    Takes a list of beagle and marker files and applies the following checks:
  • Checks whether the alleles of each SNP are compatible between study and reference. If an incompatibility cannot be corrected by inverting (flipping) the SNP, the SNP is discarded.
  • Checks whether a SNP has null alleles; if so, the SNP is removed from the study data.
  • Checks whether two SNPs with the same reference code (rs) are in the same position.
  • Checks whether two SNPs in the same position have the same reference code (rs).
  • Checks whether a SNP in the study has MAF < MAF_minimum, HWE < HWE_minimum or CR < CR_minimum. If any of these criteria is met, the SNP is discarded. (MAF = Minor Allele Frequency, HWE = Hardy-Weinberg Equilibrium, CR = Call Rate)
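
The allele compatibility check above can be illustrated with a minimal Python sketch. It is not the code of Normalize_beagle_datasets.ftl: the function name classify_snp, the returned labels, and the use of "0" as the null allele code are assumptions made for this example only.

```python
# Illustrative sketch only; names and the "0" null-allele code are assumptions.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify_snp(study_alleles, reference_alleles):
    """Compare the allele pair of one SNP in the study and in the reference.

    Returns "ok" if the allele sets match, "inverted" if they match after
    taking the complement of the study alleles, "null_allele" if the study
    contains an allele that is not A, C, G or T, and "discard" otherwise.
    """
    study = frozenset(study_alleles)
    reference = frozenset(reference_alleles)
    if any(allele not in COMPLEMENT for allele in study):
        return "null_allele"
    if study == reference:
        return "ok"
    flipped = frozenset(COMPLEMENT[allele] for allele in study)
    if flipped == reference:
        return "inverted"
    return "discard"


if __name__ == "__main__":
    # The example from the log summary: A/G in reference, T/C in study.
    print(classify_snp(("T", "C"), ("A", "G")))
```

Note that A/T and C/G SNPs are their own complements, so a comparison of allele sets alone cannot detect an inversion for them.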

It generates a log file with all inconsistencies found. At the end of this file there is a summary of the problems found:

  • SNPs inverted: For example, A/G SNPs in the reference that are T/C SNPs in the study.
  • Allele problems: Number of SNPs with inconsistent alleles in study and reference that could not be fixed by flipping.
  • Position problems (different references, same loci): SNPs at the same locus with different rs numbers in study and reference. These SNPs are NOT removed; the rs number of the reference panel is kept.
  • Unresolved single alleles problems: SNPs in the study that have only one allele. These SNPs are filtered out.
  • Double rs codes problems: The same rs code appears for more than one SNP. These SNPs are filtered out.
  • SNPs in study with MAF < MAF_minimum: SNPs with MAF < MAF_minimum set.
  • SNPs in study with HWE < HWE_minimum: SNPs with HWE < HWE_minimum set.
  • SNPs in study with CR < CR_minimum: SNPs with Call Rate < CR_minimum set
  • SNPs that differ in Allele Frequencies: SNPs with difference in AF between reference and study over CR_minimum set.
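
As a rough illustration of the MAF, HWE and CR criteria, the following Python sketch computes the three statistics from genotype counts of one SNP and applies the thresholds. It rests on assumptions: the function names and the threshold values in the usage example are hypothetical, HWE is treated as a p-value, and a one-degree-of-freedom chi-square test is used for it. The test used by the actual script is not documented on this page.

```python
import math


def snp_qc_stats(n_aa, n_ab, n_bb, n_missing):
    """Return MAF, HWE p-value and call rate from genotype counts of one SNP."""
    called = n_aa + n_ab + n_bb
    total = called + n_missing
    call_rate = called / total if total else 0.0
    if called == 0:
        return {"maf": 0.0, "hwe_p": 1.0, "cr": call_rate}
    p = (2 * n_aa + n_ab) / (2 * called)  # frequency of allele A
    maf = min(p, 1 - p)
    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (called * p * p, 2 * called * p * (1 - p), called * (1 - p) ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2))
    return {"maf": maf, "hwe_p": hwe_p, "cr": call_rate}


def passes_qc(stats, maf_minimum, hwe_minimum, cr_minimum):
    """A SNP is kept only if none of the three statistics is below its minimum."""
    return (stats["maf"] >= maf_minimum
            and stats["hwe_p"] >= hwe_minimum
            and stats["cr"] >= cr_minimum)


if __name__ == "__main__":
    stats = snp_qc_stats(25, 50, 25, 0)
    # Threshold values below are placeholders, not the pipeline defaults.
    print(stats, passes_qc(stats, 0.01, 1e-4, 0.95))
```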


Options:

  • input_beagle_study : The study in beagle format
  • input_beagle_reference : The reference in beagle format
  • input_markers_study : The study's markers in beagle format
  • input_markers_reference : The reference's markers in beagle format
  • output_beagle_study : The Normalized output of the study (Use this as "study" for imputation)
  • output_beagle_reference : The Normalized output of the reference (Normally you will not use this file)
  • output_markers_study : The Markers of the normalized study
  • output_markers_reference : The Markers of the normalized reference
  • output_log_filename : the log filename

Imputation software

  • Impute2
  • Beagle
  • MaCH / minimac

Quality metrics

Convert impute2 gprobs to TPED

This method converts the results of an impute2 imputation to TPED. You can define an R2 threshold. The R2 is the allelic R2 described in http://www.sciencedirect.com/science/article/pii/S0002929709000123#sec2.7.2. To obtain a complete TPED / TFAM dataset, copy the TFAM from the original study.

Options:

  • input_impute2_gprobs_filename : The gprobs file generated from impute2
  • output_TPED_filename : The output TPED filename
  • output_stats_filename : The file where the R2 estimates will be written. It contains ALL the R2 values, not only those exceeding the threshold
  • chromosome : The chromosome of this study
  • r2_threshold : The R2 threshold
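
The conversion can be sketched in Python as follows. This is an illustration under stated assumptions, not the script from the repository. It assumes the usual impute2 genotype layout (five leading columns followed by three probabilities per individual), estimates the allelic R2 following one reading of the estimator in the Browning and Browning reference, makes hard calls from the most likely genotype, and keeps SNPs whose R2 is greater than or equal to the threshold. The function names are hypothetical.

```python
def allelic_r2(probs):
    """Estimate allelic R2 from a list of (p_AA, p_AB, p_BB) tuples.

    z is the B-allele dose of the most likely genotype, u the expected dose
    and w the expected squared dose (assumed form of the estimator).
    """
    n = len(probs)
    sum_z = sum_zz = sum_u = sum_w = sum_zu = 0.0
    for p in probs:
        z = max(range(3), key=lambda g: p[g])
        u = p[1] + 2 * p[2]
        w = p[1] + 4 * p[2]
        sum_z += z
        sum_zz += z * z
        sum_u += u
        sum_w += w
        sum_zu += z * u
    if n == 0:
        return 0.0
    numerator = (sum_zu - sum_u * sum_z / n) ** 2
    denominator = (sum_w - sum_u ** 2 / n) * (sum_zz - sum_z ** 2 / n)
    return numerator / denominator if denominator > 0 else 0.0


def gprobs_line_to_tped(line, chromosome, r2_threshold):
    """Return (r2, tped_line); tped_line is None if the SNP fails the threshold."""
    fields = line.split()
    snp_id, rs_id, position, allele_a, allele_b = fields[:5]
    values = [float(x) for x in fields[5:]]
    probs = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    r2 = allelic_r2(probs)
    if r2 < r2_threshold:
        return r2, None
    genotypes = (f"{allele_a} {allele_a}", f"{allele_a} {allele_b}",
                 f"{allele_b} {allele_b}")
    calls = [genotypes[max(range(3), key=lambda g: p[g])] for p in probs]
    # TPED columns: chromosome, SNP id, genetic distance, position, genotypes.
    tped = " ".join([str(chromosome), rs_id, "0", position] + calls)
    return r2, tped


if __name__ == "__main__":
    example = "--- rs1 1000 A G 1 0 0 0 1 0 0 0 1 0 1 0"
    print(gprobs_line_to_tped(example, chromosome=1, r2_threshold=0.8))
```

The R2 value is returned for every SNP, including those below the threshold, which mirrors the description of output_stats_filename above.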

Complete pipelines

Results

References

  • Brian L. Browning, Sharon R. Browning. A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals. AJHG, Volume 84, Issue 2, 13 February 2009, Pages 210-223. doi:10.1016/j.ajhg.2009.01.005

See also