| | 1 | = Workflow 2: Alignment per Lane, per Chr = |
| | 2 | [[TOC()]] |
| | 3 | |
| | 4 | This workflow aligns reads per lane and chromosome, including: |
| | 5 | * re-alignment to prevent false SNP calls caused by indels (using known indels) |
| | 6 | * markduplicates to prevent inflated coverage caused by PCR duplicates (per library = lane) |
| | 7 | * base quality recalibration to correct for false low scores caused by true variation |
| | 8 | |
| | 9 | Workflow Inputs: |
| | 10 | * lane.1.fq.gz - raw reads for lane, pair end 1 |
| | 11 | * lane.2.fq.gz - raw reads for lane, pair end 2 |
| | 12 | * genome.chr.fasta - reference genome split on chromosome |
| | 13 | * genome.chr.realign.intervals - targets for realignment per chromosome |
| | 14 | * genome.chr.dbsnpXYZ.rod - known SNP variants, here from dbSNP |
| | 15 | * genome.chr.indelsXYZ.vcf - known indels, here from 1KG |
| | 16 | |
| | 17 | Workflow outputs: |
| | 18 | * lane.chr.1.sai - alignment index for first pair |
| | 19 | * lane.chr.2.sai - alignment index for second pair |
| | 20 | * lane.chr.sam - alignment map in text (SAM) format |
| | 21 | * lane.chr.bam - alignment map in binary format |
| | 22 | * lane.chr.sorted.bam - sorted alignment map |
| | 23 | * lane.chr.sorted.bai - sorted alignment index |
| | 24 | * lane.chr.dedup.bam - marked duplicate PCR elements |
| | 25 | * lane.chr.dedup.metrics - metrics describing deduplication |
| | 26 | * lane.chr.realigned.bam - realigned based on known indels |
| | 27 | * lane.chr.matefixed.bam - fixed the mate pair ends |
| | 28 | * lane.chr.covariate_table.csv - table of countcovariates output for recalibration |
| | 29 | * lane.chr.recal.bam - alignment map with recalibrated quality scores |
| | 30 | |
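The naming scheme above can be sketched as a small helper. This is only an illustration: the function name `workflow_outputs` is hypothetical, and it assumes every output is named `<lane>.<chr>.<suffix>` exactly as in the list.

```python
from typing import List

# Suffixes copied from the "Workflow outputs" list, in workflow order.
OUTPUT_SUFFIXES = [
    "1.sai", "2.sai", "sam", "bam", "sorted.bam", "sorted.bai",
    "dedup.bam", "dedup.metrics", "realigned.bam", "matefixed.bam",
    "covariate_table.csv", "recal.bam",
]


def workflow_outputs(lane: str, chrom: str) -> List[str]:
    """Return the expected output file names for one lane and chromosome."""
    prefix = f"{lane}.{chrom}"
    return [f"{prefix}.{suffix}" for suffix in OUTPUT_SUFFIXES]


if __name__ == "__main__":
    for name in workflow_outputs("lane", "chr"):
        print(name)
```

Calling it with a real lane and chromosome label (for example `workflow_outputs("L1", "chr20")`) gives the file names a scheduler could check for before marking a step as done.
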
| | 31 | == align == |
| | 32 | Align each end of the read pair separately. |
| | 33 | |
| | 34 | ||tool: ||bwa aln || |
| | 35 | ||inputs: ||chr.fasta, lane.1.fq.gz, lane.2.fq.gz || |
| | 36 | ||outputs: ||lane.chr.1.sai, lane.chr.2.sai || |
| | 37 | ||docs: ||http://bio-bwa.sourceforge.net/bwa.shtml || |
| | 38 | |
| | 39 | == align-pe == |
| | 40 | Combine the two single-end alignments into one paired-end alignment. |
| | 41 | |
| | 42 | ||tool: ||bwa sampe || |
| | 43 | ||inputs: ||chr.fasta [[BR]] lane.1.fq.gz [[BR]] lane.2.fq.gz [[BR]] lane.chr.1.sai [[BR]] lane.chr.2.sai || |
| | 44 | ||outputs: ||lane.chr.sam || |
| | 45 | ||docs: ||http://bio-bwa.sourceforge.net/bwa.shtml || |
| | 46 | |
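As a rough sketch, the two alignment steps could be turned into shell command lines as follows. The argument order follows the synopsis in the bwa manual linked above; the helper names are hypothetical, and the sketch assumes the reference has already been indexed with `bwa index`.

```python
import shlex


def bwa_aln_cmd(ref: str, fastq: str, sai: str) -> str:
    """Shell command aligning one end of the pair (bwa writes .sai to stdout)."""
    return f"bwa aln {shlex.quote(ref)} {shlex.quote(fastq)} > {shlex.quote(sai)}"


def bwa_sampe_cmd(ref: str, sai1: str, sai2: str,
                  fastq1: str, fastq2: str, sam: str) -> str:
    """Shell command combining both .sai files into one paired-end SAM file."""
    args = " ".join(shlex.quote(a) for a in (ref, sai1, sai2, fastq1, fastq2))
    return f"bwa sampe {args} > {shlex.quote(sam)}"


if __name__ == "__main__":
    ref = "chr.fasta"
    print(bwa_aln_cmd(ref, "lane.1.fq.gz", "lane.chr.1.sai"))
    print(bwa_aln_cmd(ref, "lane.2.fq.gz", "lane.chr.2.sai"))
    print(bwa_sampe_cmd(ref, "lane.chr.1.sai", "lane.chr.2.sai",
                        "lane.1.fq.gz", "lane.2.fq.gz", "lane.chr.sam"))
```

The commands are returned as strings because they rely on shell redirection; they are meant to be written into a job script rather than executed directly from Python.
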
| | 47 | == sam-to-bam == |
| | 48 | Convert SAM to BAM. |
| | 49 | |
| | 50 | ||tool: ||samtools view || |
| | 51 | ||inputs: ||lane.chr.sam || |
| | 52 | ||outputs: ||lane.chr.bam || |
| | 53 | ||docs: ||http://samtools.sourceforge.net/samtools.shtml || |
| | 54 | |
| | 55 | (Question: can this step not also sort and index?) |
| | 56 | |
| | 57 | == sam-sort == |
| | 58 | Sort the BAM file by coordinate. |
| | 59 | |
| | 60 | ||tool: ||samtools sort || |
| | 61 | ||inputs: ||lane.chr.bam || |
| | 62 | ||outputs: ||lane.chr.sorted.bam || |
| | 63 | ||docs: ||http://samtools.sourceforge.net/samtools.shtml || |
| | 64 | |
| | 65 | == sam-index == |
| | 66 | Index bam file for quicker access |
| | 67 | |
| | 68 | ||tool: ||samtools index || |
| | 69 | ||inputs: ||lane.chr.sorted.bam || |
| | 70 | ||outputs: ||lane.chr.sorted.bai || |
| | 71 | ||docs: ||http://samtools.sourceforge.net/samtools.shtml || |
| | 72 | |
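The three samtools steps above (sam-to-bam, sam-sort, sam-index) might be chained as in the sketch below. It assumes samtools 1.x option syntax (`-o` for the output file); older 0.1.x releases use a different `sort` invocation, so check the installed version first. The function name is hypothetical.

```python
from typing import List


def samtools_cmds(lane: str, chrom: str) -> List[List[str]]:
    """Argument lists for convert, sort and index, in execution order."""
    p = f"{lane}.{chrom}"
    return [
        # sam-to-bam: write BAM (-b) to the file given with -o
        ["samtools", "view", "-b", "-o", f"{p}.bam", f"{p}.sam"],
        # sam-sort: coordinate sort is the default sort order
        ["samtools", "sort", "-o", f"{p}.sorted.bam", f"{p}.bam"],
        # sam-index: second argument names the index file explicitly
        ["samtools", "index", f"{p}.sorted.bam", f"{p}.sorted.bai"],
    ]


if __name__ == "__main__":
    for cmd in samtools_cmds("lane", "chr"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` so that a failing step stops the chain.
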
| | 73 | == !MarkDuplicates == |
| | 74 | Mark duplicate PCR fragments to be filtered in analysis |
| | 75 | |
| | 76 | ||tool: ||MarkDuplicates.jar || |
| | 77 | ||inputs: ||lane.chr.sorted.bam || |
| | 78 | ||outputs: ||lane.chr.dedup.bam [[BR]] lane.chr.dedup.metrics || |
| | 79 | ||docs: ||http://picard.sourceforge.net/command-line-overview.shtml#MarkDuplicates || |
| | 80 | |
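A possible way to build the MarkDuplicates call is sketched here. It assumes the standalone `MarkDuplicates.jar` with Picard's `KEY=VALUE` argument style; the jar location and the function name are placeholders, not part of this workflow's configuration.

```python
from typing import List


def mark_duplicates_cmd(lane: str, chrom: str,
                        jar: str = "MarkDuplicates.jar") -> List[str]:
    """Argument list for Picard MarkDuplicates on the sorted BAM."""
    p = f"{lane}.{chrom}"
    return [
        "java", "-jar", jar,            # jar path is an assumption
        f"INPUT={p}.sorted.bam",
        f"OUTPUT={p}.dedup.bam",
        f"METRICS_FILE={p}.dedup.metrics",
    ]


if __name__ == "__main__":
    print(" ".join(mark_duplicates_cmd("lane", "chr")))
```
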
| | 81 | == !IndelRealigner-!KnownsOnly == |
| | 82 | Improve the alignment using known indel information (will reduce false SNP calls) |
| | 83 | |
| | 84 | ||tool: ||GenomeAnalysisTK.jar -T IndelRealigner || |
| | 85 | ||inputs: ||lane.chr.dedup.bam [[BR]] genome.chr.realign.intervals [[BR]] genome.chr.dbsnpXYZ.rod [[BR]] genome.chr.indelsXYZ.vcf || |
| | 86 | ||outputs: ||lane.chr.realigned.bam || |
| | 87 | ||docs: ||http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels#Running_the_Indel_Realigner_only_at_known_sites || |
| | 88 | |
| | 89 | == !FixMateInformation == |
| | 90 | Fix the paired end information as consequence of the realignment. |
| | 91 | |
| | 92 | ||tool: ||FixMateInformation.jar || |
| | 93 | ||inputs: ||lane.chr.realigned.bam || |
| | 94 | ||outputs: ||lane.chr.matefixed.bam || |
| | 95 | ||docs: ||http://picard.sourceforge.net/command-line-overview.shtml#FixMateInformation [[BR]] http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels#Fixing_Mate_Pairs || |
| | 98 | |
| | 99 | == !CountCovariates == |
| | 100 | Count covariates, such as machine cycle and bp position, to be used as the basis for quality recalibration. |
| | 101 | Optionally, plot the results to PDF using AnalyzeCovariates. |
| | 102 | |
| | 103 | ||tool: ||GenomeAnalysisTK.jar -T CountCovariates, AnalyzeCovariates.jar || |
| | 104 | ||inputs: ||lane.chr.matefixed.bam [[BR]] genome.chr.dbsnpXYZ.rod || |
| | 105 | ||outputs: ||lane.chr.covariate_table.csv || |
| | 106 | ||docs: ||http://www.broadinstitute.org/gsa/wiki/index.php/Base_quality_score_recalibration#CountCovariates [[BR]] http://www.broadinstitute.org/gsa/wiki/index.php/Base_quality_score_recalibration#AnalyzeCovariates.jar || |
| | 109 | |
| | 110 | == !TableRecalibration == |
| | 111 | Recalibrate quality scores based on the covariate table |
| | 112 | ||tool: ||GenomeAnalysisTK.jar -T TableRecalibration || |
| | 113 | ||inputs: ||lane.chr.matefixed.bam [[BR]] lane.chr.covariate_table.csv [[BR]] chr.fasta || |
| | 114 | ||outputs: ||lane.chr.recal.bam || |
| | 115 | ||docs: ||http://www.broadinstitute.org/gsa/wiki/index.php/Base_quality_score_recalibration#TableRecalibration || |
| | 116 | |
| | 117 | == Repeat: sam-sort, sam-index, countcovariates == |
| | 118 | See steps above for commands and docs. |
| | 119 | |
| | 120 | ||inputs: ||lane.chr.recal.bam || |
| | 121 | ||outputs: ||lane.chr.recal.sorted.bam, lane.chr.recal.sorted.bam.bai, lane.chr.recal.covariate_table.csv || |
| | 122 | |
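The step order described on this page can be checked mechanically. The sketch below lists each step with its inputs and outputs and verifies that every input is either a workflow input or produced by an earlier step. File names are normalised for the check (the reference is written as `genome.chr.fasta` throughout, and the recalibration table as `lane.chr.covariate_table.csv`); the names `STEPS` and `unmet_inputs` are hypothetical.

```python
WORKFLOW_INPUTS = {
    "lane.1.fq.gz", "lane.2.fq.gz", "genome.chr.fasta",
    "genome.chr.realign.intervals", "genome.chr.dbsnpXYZ.rod",
    "genome.chr.indelsXYZ.vcf",
}

# (step name, inputs, outputs), in the order used on this page.
STEPS = [
    ("align", ["genome.chr.fasta", "lane.1.fq.gz", "lane.2.fq.gz"],
     ["lane.chr.1.sai", "lane.chr.2.sai"]),
    ("align-pe", ["genome.chr.fasta", "lane.1.fq.gz", "lane.2.fq.gz",
                  "lane.chr.1.sai", "lane.chr.2.sai"], ["lane.chr.sam"]),
    ("sam-to-bam", ["lane.chr.sam"], ["lane.chr.bam"]),
    ("sam-sort", ["lane.chr.bam"], ["lane.chr.sorted.bam"]),
    ("sam-index", ["lane.chr.sorted.bam"], ["lane.chr.sorted.bai"]),
    ("MarkDuplicates", ["lane.chr.sorted.bam"],
     ["lane.chr.dedup.bam", "lane.chr.dedup.metrics"]),
    ("IndelRealigner", ["lane.chr.dedup.bam", "genome.chr.realign.intervals",
                        "genome.chr.dbsnpXYZ.rod", "genome.chr.indelsXYZ.vcf"],
     ["lane.chr.realigned.bam"]),
    ("FixMateInformation", ["lane.chr.realigned.bam"],
     ["lane.chr.matefixed.bam"]),
    ("CountCovariates", ["lane.chr.matefixed.bam", "genome.chr.dbsnpXYZ.rod"],
     ["lane.chr.covariate_table.csv"]),
    ("TableRecalibration", ["lane.chr.matefixed.bam",
                            "lane.chr.covariate_table.csv",
                            "genome.chr.fasta"], ["lane.chr.recal.bam"]),
]


def unmet_inputs(steps, workflow_inputs):
    """Return (step, file) pairs whose file is not available when the step runs."""
    available = set(workflow_inputs)
    missing = []
    for name, inputs, outputs in steps:
        missing.extend((name, f) for f in inputs if f not in available)
        available.update(outputs)
    return missing


if __name__ == "__main__":
    print(unmet_inputs(STEPS, WORKFLOW_INPUTS))
```

An empty result means the order is consistent; moving a step ahead of the one that produces its input makes the function report the missing file.
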
| | 123 | Discussion: |
| | 124 | > Why do we need to sort and index after recalibration? Does it mess up the order of things? |