This page describes the Imputation pipelines developed by the GoNL - Impute team. Please contribute. For help with trac wiki formatting: http://trac.edgewall.org/wiki/WikiFormatting
All scripts presented here are located in our SVN repository: http://www.bbmriwiki.nl/svn/imputation
Minutes of our Team Calls: http://www.bbmriwiki.nl/wiki/Imputations/Minutes
Contributors and Teams
- UMC Groningen: Alexandros Kanterakis alexandros.kanterakis@…
Study data
Reference data
Pre processing
Normalize beagle datasets
- Location: http://www.bbmriwiki.nl/svn/Imputation/alex/scripts/Normalize_beagle_datasets.ftl
Takes a list of beagle and marker files and applies the following checks:
- Checks if the alleles of each SNP are compatible between study and reference. If the incompatibility cannot be corrected by SNP inversion, the SNP is discarded.
- Checks if a SNP has null alleles; if so, the SNP is removed from the study data.
- Checks if two SNPs with same reference code (rs) are in the same position.
- Checks if two SNPs in the same position have the same reference code (rs).
- Checks if a SNP in the study has MAF < MAF_minimum, HWE < HWE_minimum or CR < CR_minimum. If any of these criteria is met, the SNP is discarded. (MAF = Minor Allele Frequency, HWE = Hardy Weinberg Equilibrium, CR = Call Rate)
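The allele compatibility check can be illustrated with a minimal Python sketch. This is not the code of Normalize_beagle_datasets.ftl: the function name `compare_alleles` and the three returned labels are hypothetical, and the sketch assumes that "SNP inversion" means replacing each study allele by its complementary base.

```python
# Hypothetical helper, not taken from the SVN scripts.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def compare_alleles(study, reference):
    """Classify a SNP by comparing its study alleles with the reference alleles.

    Returns "compatible" if both carry the same pair of alleles,
    "inverted" if the pairs match after complementing the study alleles,
    and "incompatible" otherwise (including null or non-ACGT alleles).
    """
    study_set, ref_set = set(study), set(reference)
    if not (study_set | ref_set) <= set(COMPLEMENT):
        return "incompatible"  # e.g. a null allele coded as "0"
    if study_set == ref_set:
        # Note: A/T and C/G SNPs look identical on both strands,
        # so this check cannot detect an inversion for them.
        return "compatible"
    if {COMPLEMENT[a] for a in study_set} == ref_set:
        return "inverted"
    return "incompatible"


if __name__ == "__main__":
    print(compare_alleles(("T", "C"), ("A", "G")))  # study T/C, reference A/G
    print(compare_alleles(("A", "C"), ("A", "G")))
```

In this sketch an "inverted" SNP would be kept after complementing its study alleles, while an "incompatible" SNP would be discarded.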
It generates a log file with all inconsistencies found. At the end of this file there is a summary of the problems found:
- SNPs inverted: For example, A/G SNPs in the reference that are T/C SNPs in the study.
- Allele problems: Number of SNPs with inconsistent alleles in study and reference that could not be fixed by flipping.
- Position problems (different references, same loci): SNPs at the same position with different reference codes (rs). These SNPs are NOT removed. We keep the reference code (rs number) of the reference panel.
- Unresolved single alleles problems: SNPs in the study that have only one allele. These SNPs are filtered out.
- Double rs codes problems: SNPs with a duplicated rs code. These SNPs are filtered out.
- SNPs in study with MAF < MAF_minimum: SNPs with MAF below the MAF_minimum that was set.
- SNPs in study with HWE < HWE_minimum: SNPs with HWE below the HWE_minimum that was set.
- SNPs in study with CR < CR_minimum: SNPs with Call Rate below the CR_minimum that was set.
- SNPs that differ in Allele Frequencies: SNPs whose allele frequency (AF) differs between reference and study by more than the threshold that was set.
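The MAF, HWE and call-rate filters can be sketched in Python from per-SNP genotype counts. The function names and the threshold values in the usage line are hypothetical. The HWE p-value here comes from a one-degree-of-freedom chi-square test, which is an assumption; the actual script may use a different test.

```python
import math


def snp_qc(n_aa, n_ab, n_bb, n_missing):
    """Return MAF, HWE p-value and call rate for one SNP from genotype counts."""
    called = n_aa + n_ab + n_bb
    total = called + n_missing
    call_rate = called / total if total else 0.0
    if called == 0:
        return {"maf": 0.0, "hwe_p": 1.0, "call_rate": call_rate}

    # Frequency of allele A, and the minor allele frequency.
    p = (2 * n_aa + n_ab) / (2 * called)
    maf = min(p, 1 - p)

    # Chi-square goodness of fit against Hardy-Weinberg proportions (1 df).
    expected = (called * p * p, 2 * called * p * (1 - p), called * (1 - p) ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    hwe_p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df

    return {"maf": maf, "hwe_p": hwe_p, "call_rate": call_rate}


def passes_qc(stats, maf_minimum, hwe_minimum, cr_minimum):
    """A SNP is kept only if none of the three criteria falls below its minimum."""
    return (stats["maf"] >= maf_minimum
            and stats["hwe_p"] >= hwe_minimum
            and stats["call_rate"] >= cr_minimum)


if __name__ == "__main__":
    stats = snp_qc(n_aa=25, n_ab=50, n_bb=25, n_missing=0)
    # Threshold values below are illustrative only.
    print(stats, passes_qc(stats, maf_minimum=0.01, hwe_minimum=1e-4, cr_minimum=0.95))
```

A SNP failing any single criterion is rejected, which matches the rule that a SNP is discarded if any of the criteria is met.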
Options:
- input_beagle_study : The study in beagle format
- input_beagle_reference : The reference in beagle format
- input_markers_study : The study's markers in beagle format
- input_markers_reference : The reference's markers in beagle format
- output_beagle_study : The Normalized output of the study (Use this as "study" for imputation)
- output_beagle_reference : The Normalized output of the reference (Normally you will not use this file)
- output_markers_study : The Markers of the normalized study
- output_markers_reference : The Markers of the normalized reference
- output_log_filename : the log filename
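As an illustration of the position and rs-code checks described above, here is a minimal Python sketch that works on the two marker files. It rests on several assumptions: that each marker line starts with the marker identifier followed by its position, that a study SNP whose rs code sits at a different position in the reference is dropped, and that a study SNP sharing a position with a differently named reference SNP is renamed to the reference rs code. The function names and the report keys are hypothetical.

```python
def parse_markers(lines):
    """Read (rs, position) pairs from marker lines; assumed layout: id, position, alleles."""
    markers = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        markers.append((fields[0], int(fields[1])))
    return markers


def check_markers(study, reference):
    """Compare study markers with reference markers.

    Returns the cleaned study markers and a report of the problems found.
    """
    ref_pos_by_rs = {rs: pos for rs, pos in reference}
    ref_rs_by_pos = {pos: rs for rs, pos in reference}
    report = {"position_mismatch": [], "renamed": []}
    cleaned = []
    for rs, pos in study:
        # Same rs code, different position: drop the study SNP (assumption).
        if rs in ref_pos_by_rs and ref_pos_by_rs[rs] != pos:
            report["position_mismatch"].append(rs)
            continue
        # Same position, different rs code: keep the SNP under the reference rs.
        ref_rs = ref_rs_by_pos.get(pos)
        if ref_rs is not None and ref_rs != rs:
            report["renamed"].append((rs, ref_rs))
            rs = ref_rs
        cleaned.append((rs, pos))
    return cleaned, report


if __name__ == "__main__":
    study = parse_markers(["rs1 100 A G", "rs2 200 C T", "rs9 300 A C"])
    reference = parse_markers(["rs1 100 A G", "rs2 250 C T", "rs3 300 A C"])
    print(check_markers(study, reference))
```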
Imputation software
- Impute2
- Beagle
- MACH / minimac
Quality metrics
Convert impute2 gprobs to TPED
- Location: http://www.bbmriwiki.nl/svn/Imputation/alex/scripts/Convert_impute2_gprobs_to_PEDMAP_beagle.ftl
This script converts the results of an impute2 imputation to TPED. You can define an R2 threshold. The R2 is the allelic R2 as defined in http://www.sciencedirect.com/science/article/pii/S0002929709000123#sec2.7.2 . You can copy the TFAM from the original study in order to have a complete TPED / TFAM dataset.
Options:
- input_impute2_gprobs_filename : The gprobs file generated from impute2
- output_TPED_filename : The output TPED filename
- output_stats_filename : The file where the R2 estimates will be printed. It will contain ALL the R2 values, not only those surpassing the threshold
- chromosome : The chromosome of this study
- r2_threshold : The R2 threshold
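The conversion can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of the .ftl script: it assumes the usual impute2 layout of five leading columns (SNP id, rs id, position, allele A, allele B) followed by three genotype probabilities per sample, it takes the most likely genotype as the hard call, it writes only SNPs whose R2 reaches the threshold, and it estimates the allelic R2 from the posterior probabilities as the squared correlation between the most likely allele dosage and the expected true dosage. All function names are hypothetical.

```python
def parse_gprobs_line(line):
    """Split one impute2 gprobs line into marker info and per-sample probability triples."""
    fields = line.split()
    _, rs_id, pos, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("expected three probabilities per sample")
    triples = [tuple(probs[i:i + 3]) for i in range(0, len(probs), 3)]
    return rs_id, int(pos), allele_a, allele_b, triples


def best_guess(triple):
    """Index (0, 1 or 2 copies of allele B) of the most likely genotype."""
    return max(range(3), key=lambda g: triple[g])


def allelic_r2(triples):
    """Estimate allelic R2 from posterior genotype probabilities."""
    n = len(triples)
    sum_z = sum_u = sum_w = sum_zu = sum_zz = 0.0
    for triple in triples:
        z = best_guess(triple)            # most likely dosage
        u = triple[1] + 2 * triple[2]     # expected dosage
        w = triple[1] + 4 * triple[2]     # expected squared dosage
        sum_z += z
        sum_u += u
        sum_w += w
        sum_zu += z * u
        sum_zz += z * z
    numerator = (sum_zu - sum_u * sum_z / n) ** 2
    denominator = (sum_w - sum_u ** 2 / n) * (sum_zz - sum_z ** 2 / n)
    return numerator / denominator if denominator > 0 else 0.0


def convert(lines, chromosome, r2_threshold):
    """Return TPED lines for SNPs with R2 >= r2_threshold, plus R2 for ALL SNPs."""
    tped_lines, stats = [], []
    for line in lines:
        rs_id, pos, a, b, triples = parse_gprobs_line(line)
        r2 = allelic_r2(triples)
        stats.append((rs_id, r2))
        if r2 < r2_threshold:
            continue
        genotypes = (f"{a} {a}", f"{a} {b}", f"{b} {b}")
        calls = [genotypes[best_guess(t)] for t in triples]
        # TPED columns: chromosome, SNP id, genetic distance (0), position, calls.
        tped_lines.append(" ".join([str(chromosome), rs_id, "0", str(pos)] + calls))
    return tped_lines, stats


if __name__ == "__main__":
    example = ["--- rs1 1000 A G 1 0 0 0 1 0 0 0 1 0 1 0"]
    print(convert(example, chromosome=1, r2_threshold=0.3))
```

The `stats` list keeps the R2 of every SNP, mirroring the description of output_stats_filename, while `tped_lines` holds only the SNPs that reach the threshold.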
Complete pipelines
Results
References
- Brian L. Browning, Sharon R. Browning. A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals. AJHG, Volume 84, Issue 2, 13 February 2009, Pages 210-223. doi:10.1016/j.ajhg.2009.01.005
See also
- An older version of the imputation pipeline, developed mainly by Harm-Jan and Lude Franke: ImputationPipeline_old. It uses the ImputationTool for study / reference normalization.
- SVN repository: http://www.bbmriwiki.nl/svn/imputation
- ProbABEL - R package for GWAS data imputation: http://gettinggeneticsdone.blogspot.com/2010/04/probabel-r-package-for-gwas-data.html and http://www.biomedcentral.com/1471-2105/11/134
- GenABEL, an R library for Genome-wide association analysis: http://mga.bionet.nsc.ru/~yurii/ABEL/GenABEL/
- MACH: http://www.sph.umich.edu/csg/abecasis/MACH/
- Impute2: http://mathgen.stats.ox.ac.uk/impute/impute_v2.html