Changes between Version 6 and Version 7 of ImputationPipeline


Timestamp:
Oct 7, 2010 2:50:06 PM (12 years ago)
Author:
a.kanterakis
Comment:

--

  • ImputationPipeline

    v6 v7  
    1717
    1818'' Commands to run locally: ''
    19  1. if the dataset is in binary plink format, use plink --recode to convert back to ped+map
    20  2. convert dataset to trityper format, if it is in ped+map format.
    21 {{{ 
    22 java -Xmx4g -jar ImputationTool.jar --mode pmtt --in $plinkLocation --out $trityperOutputLocation
     19=== Step 1 ===
     20 * According to Harm-Jan: if the dataset is in binary plink format, use plink --recode to convert back to ped+map
     21 * Transform the .bed, .bim and .fam files into ASCII (ped+map) format
     22{{{
     23/Users/alexandroskanterakis/Tools/plink/plink-1.07-mac-intel/plink --bfile /Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05 --ped /Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05.ped --map CD_Finnuncorr.maf05.map --recode
     24
    2325}}}
    24  3. compare the dataset to be imputed to the reference dataset (for example HapMap2 release 24, also in TriTyper format), and remove any snps for which the haplotypes are different, or do not correlate to the reference dataset. Also remove any SNP that is not in the reference. Save the output as Ped+Map
     26 * Produces the files:
    2527{{{
    26 java -Xmx4g -jar ImputationTool.jar ttpmh $trityperOutputLocation $referenceLocation $pedAndMapOutputLocation [$famFile] # supply a famfile, if you have any... it is not required
     28-rw-r--r--   1 alexandroskanterakis  staff    11761120 Oct  5 12:11 plink.map
     29-rw-r--r--   1 alexandroskanterakis  staff  4898208052 Oct  5 12:11 plink.ped
    2730}}}
     31* Real execution time: 26m20.723s
     32* Creates rather large files: plink.ped is 4.6G
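
The Step 1 conversion can be sketched with variables in place of the hard-coded paths. This is a minimal sketch, not the command that was actually run: the variable names are hypothetical, and `--out` is added on the assumption that you want the output named after the dataset instead of the default `plink.ped` and `plink.map` listed above.

```shell
#!/bin/sh
# Hypothetical variable names; adjust the paths to your own setup.
plink_bin="plink"                       # path to the plink 1.07 executable
bfile_prefix="CD_Finnuncorr.maf05"      # prefix shared by the .bed/.bim/.fam files
out_prefix="CD_Finnuncorr.maf05"        # prefix for the .ped/.map files to write

# --bfile reads the binary fileset, --recode writes ped+map, --out names the output.
recode_cmd="$plink_bin --bfile $bfile_prefix --recode --out $out_prefix"

# Dry run: print the command for inspection.
# To execute and time it, run: time $recode_cmd
echo "$recode_cmd"
```

Printing the command first makes it easy to check the prefixes before starting a conversion that took about 26 minutes on this dataset.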
     33
     34
     35=== Step 2 ===
     36
     37 * Convert the dataset to TriTyper format if it is in ped+map format.
     38{{{
     39java -Xmx4g -jar /Users/alexandroskanterakis/Tools/imputation/ImputationTool/dist/ImputationTool.jar --mode pmtt --in /Users/alexandroskanterakis/Data/Finnish_cohort/ --out /Users/alexandroskanterakis/Data/Finnish_cohort/
     40
     41real    74m52.547s
     42user    38m14.646s
     43sys     5m25.472s
     44
     45}}}
     46 * Files Created:
     47{{{
     48-rw-r--r--   1 alexandroskanterakis  staff  2449071024 Oct  5 15:05 GenotypeMatrix.dat
     49-rw-r--r--   1 alexandroskanterakis  staff       46196 Oct  5 13:54 Individuals.txt
     50-rw-r--r--   1 alexandroskanterakis  staff       98791 Oct  5 13:54 PhenotypeInformation.txt
     51-rw-r--r--   1 alexandroskanterakis  staff    10771996 Oct  5 13:54 SNPMappings.txt
     52-rw-r--r--   1 alexandroskanterakis  staff     5006368 Oct  5 13:54 SNPs.txt
     53}}}
     54
     55=== Step 3 ===
     56 * Compare the dataset to be imputed with the reference dataset (for example HapMap2 release 24, also in TriTyper format), and remove any SNPs for which the haplotypes are different or do not correlate with the reference dataset. Also remove any SNP that is not in the reference. Save the output as ped+map.
     57{{{
     58java -Xmx4g -jar /Users/alexandroskanterakis/Tools/imputation/ImputationTool/dist/ImputationTool.jar --mode ttpmh --in /Users/alexandroskanterakis/Data/Finnish_cohort/ --hap /Users/alexandroskanterakis/Data/HapMap2-r24-CEU/ --out /Users/alexandroskanterakis/Data/Finnish_cohort/referenceOutput/
     59
     60real    60m53.623s
     61user    30m35.172s
     62sys     2m34.325s
     63}}}
     64
     65 * Created Files (for each chromosome):
     66{{{
     67-rw-r--r--    1 alexandroskanterakis  staff    221184 Oct  5 16:07 chr1.dat
     68-rw-r--r--    1 alexandroskanterakis  staff   1004358 Oct  5 16:06 chr1.excludedsnps.txt
     69-rw-r--r--    1 alexandroskanterakis  staff    442368 Oct  5 16:07 chr1.map
     70-rw-r--r--    1 alexandroskanterakis  staff    441350 Oct  5 16:07 chr1.markersBeagleFormat
     71-rw-r--r--    1 alexandroskanterakis  staff   5802372 Oct  5 16:07 chr1.ped
     72-rw-r--r--    1 alexandroskanterakis  staff    117708 Oct  5 16:06 chr1.warningsnps.txt
     73}}}
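
Steps 2 and 3 can be sketched as one small script with the paths factored out. This is only an illustration: the variable names and the relative directory names are hypothetical, while the `--mode`, `--in`, `--hap` and `--out` flags are the ones used in the commands above. Trailing slashes are kept as in those commands, since it is not stated whether the tool requires them.

```shell
#!/bin/sh
# Hypothetical variable names; adjust the paths to your own setup.
tool_jar="ImputationTool.jar"            # path to ImputationTool.jar
data_dir="Finnish_cohort/"               # holds the ped+map files, receives the TriTyper files
ref_dir="HapMap2-r24-CEU/"               # reference dataset in TriTyper format
out_dir="${data_dir}referenceOutput/"    # receives the per-chromosome ped+map files

# Step 2: convert ped+map to TriTyper format.
step2_cmd="java -Xmx4g -jar $tool_jar --mode pmtt --in $data_dir --out $data_dir"

# Step 3: compare with the reference and write the filtered ped+map files.
step3_cmd="java -Xmx4g -jar $tool_jar --mode ttpmh --in $data_dir --hap $ref_dir --out $out_dir"

# Dry run: print both commands. Prefix each with `time` when running them
# to obtain real/user/sys figures like those recorded above.
printf '%s\n' "$step2_cmd" "$step3_cmd"
```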
     74
     75=== Steps 4-9 ===
     76
    2877 4. split the ped files in batches of 300 samples
    2978{{{