Changes between Version 2 and Version 3 of ImputationPipeline


Timestamp:
Sep 13, 2010 4:37:29 PM
Author:
a.kanterakis

TODO: describe the protocols here;

== Description from Harm-Jan ==

The imputation pipeline has been reduced to only a few steps. To facilitate the QC and conversion steps, I've bundled our conversion tools into a single program called ImputationTool.jar.

Here, I briefly describe the steps that need to be in the new pipeline. The command blocks, which use placeholder variables, show what the commands could look like if you implemented this as a shell script (or Java program). Together, these examples can form the complete execution steps of the pipeline.

'' Commands to run locally: ''
 1. If the dataset is in binary PLINK format, use plink --recode to convert it back to ped+map.
 2. Convert the dataset to TriTyper format if it is in ped+map format.
{{{
java -Xmx4g -jar ImputationTool.jar pmtt $plinkLocation $trityperOutputLocation
}}}
 3. Compare the dataset to be imputed with the reference dataset (for example HapMap2 release 24, also in TriTyper format), and remove any SNPs whose haplotypes differ from, or do not correlate with, the reference dataset. Also remove any SNP that is not in the reference. Save the output as ped+map.
{{{
# The fam file is optional; supply one if you have it.
java -Xmx4g -jar ImputationTool.jar ttpmh $trityperOutputLocation $referenceLocation $pedAndMapOutputLocation [$famFile]
}}}
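 4. Split the ped files into batches of 300 samples.
{{{
mkdir -p "$datasetLocation/batches/"
# Assumption: one ped file per chromosome, named to match the pedigree= argument in step 5.
# Batch files get the suffixes aa, ab, ...; $batchSize is 300 here.
split -a 2 -l $batchSize $pedAndMapOutputLocation/chr$chromosome.ped $batchOutputLocation/chr$chromosome.ped.
}}}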
 5. Run linkage2beagle to convert the ped and map files to Beagle format.
{{{
# Loop over every ped batch file; the batch ID is the suffix after the last dot.
for pedBatch in $batchOutputLocation/chr$chromosome.ped.*
do
      batch=${pedBatch##*.}
      java -Xmx7g -jar linkage2beagle.jar data=$batchOutputLocation/chr$chromosome.dat pedigree=$pedBatch beagle=$beagleLocation/chr$chromosome.bgl.$batch
done
}}}
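
As a rough sketch, the batch splitting in step 4 and the batch loop in step 5 could be wrapped in two small shell functions. The function names `split_ped` and `list_batches` are hypothetical, and the sketch assumes one ped file per chromosome (for example chr1.ped); neither is confirmed by the description above.

```shell
#!/bin/sh
# Hypothetical helpers for steps 4 and 5 (not part of ImputationTool.jar).

# split_ped PED_FILE BATCH_DIR BATCH_SIZE
# Writes BATCH_DIR/<ped name>.aa, .ab, ... with at most BATCH_SIZE lines each.
# A ped file holds one sample per line, so splitting by lines splits by samples.
split_ped() {
    ped_file=$1
    batch_dir=$2
    batch_size=$3
    mkdir -p "$batch_dir" || return 1
    split -a 2 -l "$batch_size" "$ped_file" "$batch_dir/$(basename "$ped_file")."
}

# list_batches PED_FILE BATCH_DIR
# Prints the two-letter suffix of every batch file, one per line.
list_batches() {
    prefix="$2/$(basename "$1")."
    for f in "$prefix"??; do
        [ -e "$f" ] || continue        # glob matched nothing: print nothing
        printf '%s\n' "${f#"$prefix"}" # strip the prefix, keep e.g. "aa"
    done
}

# Example use (placeholders as in the steps above):
#   split_ped "$pedAndMapOutputLocation/chr1.ped" "$batchOutputLocation" 300
#   for batch in $(list_batches "$pedAndMapOutputLocation/chr1.ped" "$batchOutputLocation"); do
#       echo "would convert batch $batch"
#   done
```

Keeping the suffix listing separate from the splitting means the same loop can later drive the linkage2beagle and Beagle steps without re-deriving the batch names.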

'' Commands to run on the server: ''
 6. Run the actual imputation on the batches on the cluster. This requires HapMap to be recoded to Beagle format as well, but I have these files for you.
{{{
# Loop over every Beagle batch file produced in step 5.
# $TMPDIR is written without a backslash so the shell expands it when the loop runs directly.
for bglBatch in $beagleLocation/chr$chromosome.bgl.*
do
        batch=${bglBatch##*.}
        java -Xmx11g -Djava.io.tmpdir=$TMPDIR -jar beagle.jar unphased=$bglBatch phased=$referenceLocation/HM2_Chr$chromosome-BEAGLE markers=$referenceLocation/markers_Chr$chromosome.txt missing=0 out=$outputLocation/Chr$chromosome/chr$chromosome-$batch
done
}}}
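
If the Beagle runs are submitted as cluster jobs rather than executed in a loop, one option is to print the command for each batch first and paste the result into a job script. The sketch below does only that. The function name `beagle_commands` is hypothetical, the batch naming (chr<N>.bgl.<suffix>) is taken from the step 5 example, and the escaped `\$TMPDIR` is an assumption that the variable should be expanded on the cluster node rather than locally.

```shell
#!/bin/sh
# Hypothetical dry-run helper for step 6: prints commands, executes nothing.

# beagle_commands CHROMOSOME BEAGLE_DIR REFERENCE_DIR OUTPUT_DIR
# Prints one Beagle command line per batch file found in BEAGLE_DIR.
beagle_commands() {
    chromosome=$1
    beagle_dir=$2
    reference_dir=$3
    output_dir=$4
    for bgl in "$beagle_dir/chr$chromosome.bgl."*; do
        [ -e "$bgl" ] || continue   # no batch files for this chromosome
        batch=${bgl##*.}            # suffix after the last dot, e.g. "aa"
        # \$TMPDIR stays literal in the output so the job expands it at run time.
        printf '%s\n' "java -Xmx11g -Djava.io.tmpdir=\$TMPDIR -jar beagle.jar unphased=$bgl phased=$reference_dir/HM2_Chr$chromosome-BEAGLE markers=$reference_dir/markers_Chr$chromosome.txt missing=0 out=$output_dir/Chr$chromosome/chr$chromosome-$batch"
    done
}

# Example use (placeholders as in the steps above):
#   beagle_commands 1 "$beagleLocation" "$referenceLocation" "$outputLocation" > chr1_jobs.sh
```

How the printed commands are submitted depends on the cluster's scheduler, which the description above does not specify.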

'' Commands to run locally: ''
 7. Convert the Beagle-imputed files into TriTyper format.
{{{
java -Xmx4g -jar ImputationTool.jar bttb $outputLocation Chr/ChrCHROMOSOME-BATCH $imputedTriTyperLocation $numSamples
}}}
 8. Correlate the imputed SNPs with the SNPs in the original dataset.
{{{
java -Xmx4g -jar ImputationTool.jar corr $trityperOutputLocation $datasetName $imputedTriTyperLocation $imputedDatasetName
}}}
 9. (If needed) convert to other formats (PLINK dosage / ped+map).
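
Step 7 takes a `$numSamples` argument. Assuming this is the total number of samples in the dataset (the description does not say so explicitly), it can be read off a ped file, which holds one sample per line. The sketch below also checks that the batch files together still contain every sample. Both function names are hypothetical, and the batch naming (<ped name>.aa, .ab, ...) is an assumption based on `split -a2`.

```shell
#!/bin/sh
# Hypothetical helpers for deriving $numSamples and sanity-checking the batches.

# count_samples PED_FILE
# Prints the number of non-empty lines, i.e. the number of samples.
count_samples() {
    grep -c . "$1"
}

# batches_are_complete PED_FILE BATCH_DIR
# Succeeds when the batch files together hold exactly as many samples as PED_FILE.
batches_are_complete() {
    total=$(count_samples "$1")
    in_batches=$(cat "$2/$(basename "$1")."?? | grep -c .)
    [ "$total" -eq "$in_batches" ]
}

# Example use (placeholders as in the steps above):
#   numSamples=$(count_samples "$pedAndMapOutputLocation/chr1.ped")
#   batches_are_complete "$pedAndMapOutputLocation/chr1.ped" "$batchOutputLocation" \
#       || echo "batch files do not add up to $numSamples samples" >&2
```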

That's basically it. A lot simpler than the previous version, don't you think? The required tool is attached to this e-mail, but might still be a bit buggy. Any recommendations are therefore more than welcome.

== IMPUTE pipeline ==
