=== Normalize beagle datasets ===
* Location: http://www.bbmriwiki.nl/svn/Imputation/alex/scripts/Normalize_beagle_datasets.ftl [[BR]]
Takes a list of Beagle and marker files and applies the following checks:
* Checks whether the study SNPs are allele-compatible with the reference panel. If an incompatibility cannot be corrected by SNP inversion (strand flipping), the SNP is discarded.
* Checks whether a SNP has null alleles; if so, the SNP is removed from the study data.
* Checks whether two SNPs with the same reference code (rs number) are at the same position.
* Checks whether two SNPs at the same position have the same reference code (rs number).
* Checks whether a study SNP has MAF < MAF_minimum, HWE < HWE_minimum or CR < CR_minimum; if any of these criteria is met, the SNP is discarded. (MAF = Minor Allele Frequency, HWE = Hardy-Weinberg Equilibrium, CR = Call Rate)
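The allele-compatibility and null-allele checks above can be sketched in Python as follows. This is a minimal illustration, not the logic of Normalize_beagle_datasets.ftl itself; it assumes biallelic SNPs with single-letter alleles and that "0" marks a null allele, and the names `check_alleles` and `flip` are hypothetical.

```python
# Minimal sketch (assumption): biallelic SNPs with single-letter alleles,
# where "0" marks a null (missing) allele. Non-ACGT alleles raise KeyError in flip().
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
NULL_ALLELE = "0"


def flip(alleles):
    """Return the alleles as read on the opposite DNA strand."""
    return {COMPLEMENT[a] for a in alleles}


def check_alleles(ref_alleles, study_alleles):
    """Classify a study SNP against the reference panel alleles (hypothetical helper).

    Returns one of:
      "null"         - study SNP has a null allele or only one allele (removed)
      "ok"           - alleles already match the reference
      "inverted"     - alleles match after strand flipping (fixable)
      "incompatible" - no match even after flipping (discarded)
    """
    study = set(study_alleles)
    if NULL_ALLELE in study or len(study) < 2:
        return "null"
    ref = set(ref_alleles)
    if study == ref:
        return "ok"
    if flip(study) == ref:
        return "inverted"
    return "incompatible"
```

For example, `check_alleles(("A", "G"), ("T", "C"))` classifies the SNP as `"inverted"`, the A/G versus T/C case. Note that A/T and C/G SNPs are their own complement, so an allele comparison alone cannot tell whether they are on the same strand; this sketch reports them as `"ok"`.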
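The MAF, HWE and CR filters can be sketched from per-SNP genotype counts. Assumptions: HWE is taken to be a Hardy-Weinberg p-value, computed here with a simple 1-degree-of-freedom chi-square test (many QC pipelines use an exact test instead, and the page does not say which one the script uses), and `GenotypeCounts`, `failed_filters` and the other names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class GenotypeCounts:
    """Genotype counts for one biallelic study SNP (hypothetical structure)."""
    hom_ref: int  # AA calls
    het: int      # AB calls
    hom_alt: int  # BB calls
    missing: int  # no call


def called(c):
    return c.hom_ref + c.het + c.hom_alt


def call_rate(c):
    """Fraction of samples with a genotype call."""
    total = called(c) + c.missing
    return called(c) / total if total else 0.0


def allele_freq(c):
    """Frequency of allele A among called genotypes."""
    n = called(c)
    return (2 * c.hom_ref + c.het) / (2 * n) if n else 0.0


def maf(c):
    p = allele_freq(c)
    return min(p, 1 - p)


def hwe_pvalue(c):
    """1-df chi-square goodness-of-fit p-value for Hardy-Weinberg equilibrium."""
    n = called(c)
    p = allele_freq(c)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if n == 0 or min(expected) == 0:
        return 1.0  # no data or monomorphic SNP: test undefined, do not filter on HWE
    observed = (c.hom_ref, c.het, c.hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def failed_filters(c, maf_minimum, hwe_minimum, cr_minimum):
    """Return the names of the filters the SNP fails; any failure discards it."""
    failed = []
    if maf(c) < maf_minimum:
        failed.append("MAF")
    if hwe_pvalue(c) < hwe_minimum:
        failed.append("HWE")
    if call_rate(c) < cr_minimum:
        failed.append("CR")
    return failed
```

A SNP in perfect equilibrium, such as `GenotypeCounts(25, 50, 25, 0)`, passes all three filters with thresholds `0.01`, `1e-6` and `0.95`, so `failed_filters` returns an empty list. Returning the list of failed filters, rather than a single boolean, makes it easy to count each category separately, as the log summary does.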

It generates a log file listing every inconsistency found. At the end of this file there is a summary of the problems found:
* '''SNPs inverted''': SNPs whose alleles are given on opposite strands, for example A/G in the reference and T/C in the study. These are corrected by flipping.
* '''Allele problems''': Number of SNPs whose alleles are inconsistent between study and reference and could not be fixed by flipping. These SNPs are discarded.
* '''Position problems (different references, same loci)''': SNPs at the same position but with different rs codes in the study and in the reference. These SNPs are NOT removed; the rs number from the reference panel is kept.
* '''Unresolved single alleles problems''': SNPs in the study that have only one allele. These SNPs are filtered out.
* '''Double rs codes problems''': SNPs whose rs code occurs more than once. These SNPs are filtered out.
* '''SNPs in study with MAF < MAF_minimum''': SNPs whose MAF is below the MAF_minimum threshold.
* '''SNPs in study with HWE < HWE_minimum''': SNPs whose HWE value is below the HWE_minimum threshold.
* '''SNPs in study with CR < CR_minimum''': SNPs whose call rate is below the CR_minimum threshold.
* '''SNPs that differ in Allele Frequencies''': SNPs whose allele-frequency (AF) difference between reference and study exceeds the CR_minimum threshold.
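The rs-code and position checks behind the '''Position problems''' and '''Double rs codes problems''' entries can be sketched as below. This assumes markers are (rs code, chromosome, position) tuples and treats a "double rs code" as an rs code that occurs more than once in the study; both are assumptions, and `reconcile_markers` is a hypothetical name, not part of the script.

```python
from collections import Counter


def reconcile_markers(study, reference):
    """Apply the rs-code/position checks to study markers (hypothetical helper).

    study, reference: lists of (rs_code, chromosome, position) tuples (assumed layout).
    Returns (cleaned study markers, Counter of problems keyed by log label).
    """
    # Reference panel rs number for each locus.
    ref_rs_at = {(chrom, pos): rs for rs, chrom, pos in reference}
    # How often each rs code occurs in the study.
    rs_counts = Counter(rs for rs, _, _ in study)

    cleaned = []
    summary = Counter()
    for rs, chrom, pos in study:
        if rs_counts[rs] > 1:
            # Same rs code used more than once: filtered out.
            summary["Double rs codes problems"] += 1
            continue
        ref_rs = ref_rs_at.get((chrom, pos))
        if ref_rs is not None and ref_rs != rs:
            # Same locus, different rs code: kept, renamed to the reference rs number.
            summary["Position problems (different references, same loci)"] += 1
            rs = ref_rs
        cleaned.append((rs, chrom, pos))
    return cleaned, summary
```

Keying the summary by the log labels keeps the counters aligned with the summary list above; a `Counter` returns 0 for categories that never occurred.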