= SNP calling pipeline =

Status: Alpha

Authors: Freerk van Dijk, Morris Swertz

This is the documentation of the BBMRI-NL SNP calling pipeline, which is based on the [http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit Broad GATK]. It consists of the following three workflows:

 * Workflow 1: SnpCallingPipeline/ReferencePreparation
 * Workflow 2: SnpCallingPipeline/AlignmentAndCleaning
 * Workflow 3: SnpCallingPipeline/VariantCalling

== Schematic Overview ==

This simplified schema hides intermediate sort and indexing steps, and shows each data input/output only the first time it occurs.

{{{#!graphviz
digraph g {
  size="10,10"
  node [shape=box,style=filled,color=white]
  "dbsnp"
  "reference.fasta"
  "realign.intervals"
  "indelcalls.vcf"
  "chr[1-24].fasta"
  "flowcell_lane.1.fq.gz"
  "flowcell_lane.2.fq.gz"
  "flowcell_lane.aligned.bam"
  "flowcell_lane2.aligned.bam"
  "flowcell_lane3.aligned.bam"
  "sample.aligned.bam"
  "sample QC reports"
  "sample_chr[1-24].vcf"
  node [shape=ellipse,color=yellow]
  subgraph cluster_0 {
    style=filled;
    color=lightgrey;
    "reference.fasta" -> RealignerTargetCreator -> "realign.intervals"
    "indelcalls.vcf"-> RealignerTargetCreator
    "reference.fasta"->Split->"chr[1-24].fasta"
    dbsnp -> RealignerTargetCreator
    label = "Per genome (1)";
  }
  subgraph cluster_1 {
    style=filled;
    color=lightgrey;
    "flowcell_lane.1.fq.gz" -> align1 -> alignPE
    "chr[1-24].fasta" -> align1
    "chr[1-24].fasta" -> align2
    "chr[1-24].fasta" -> alignPE
    "flowcell_lane.2.fq.gz" -> align2 -> alignPE -> MarkDuplicates -> "IndelRealigner & \n FixMateInformation (knownsOnly)" ->"Quality Recalibration"->"flowcell_lane.aligned.bam"
    "realign.intervals" -> "IndelRealigner & \n FixMateInformation (knownsOnly)"
    label = "Per Lane*Chromosome (750*3*24=54k) ";
  }
  subgraph cluster_2 {
    style=filled;
    color=lightgrey;
    "flowcell_lane.aligned.bam" -> Merge -> "sample.aligned.bam" -> "IndelRealigner & FixMateInformation"
    "flowcell_lane2.aligned.bam" -> Merge
    "flowcell_lane3.aligned.bam" -> Merge
    "IndelRealigner & FixMateInformation" -> IndelGenotyperV2 -> FilterSingleCalls -> UnifiedGenotyper -> Filtration -> VariantEval -> "sample QC reports"
    Filtration -> "sample_chr[1-24].vcf"
    label = "Per Sample or Trio*Chromosome (750*24=18k)";
  }
}
}}}

== List of steps ==

[[TOC(SnpCallingPipeline/ReferencePreparation,SnpCallingPipeline/AlignmentAndCleaning,SnpCallingPipeline/VariantCalling,inline,noheading)]]
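
The job counts quoted in the cluster labels of the schematic (750*3*24=54k and 750*24=18k) can be reproduced with a short calculation. This is only a sketch: it assumes that 750 is the number of samples (or trios), 3 the number of lanes per sample, and 24 the number of per-chromosome reference files (chr[1-24]). The function and variable names are illustrative and are not part of the pipeline.

```python
# Assumed reading of the numbers in the schematic labels; the page itself
# does not spell out what each factor stands for.
N_SAMPLES = 750        # samples (or trios)
LANES_PER_SAMPLE = 3   # lanes merged into one sample BAM
N_CHROMOSOMES = 24     # chr[1-24].fasta


def jobs_per_workflow(n_samples=N_SAMPLES,
                      lanes_per_sample=LANES_PER_SAMPLE,
                      n_chromosomes=N_CHROMOSOMES):
    """Return the number of independent jobs per workflow (hypothetical helper)."""
    return {
        # "Per genome (1)": done once for the reference.
        "ReferencePreparation": 1,
        # "Per Lane*Chromosome": one job per lane per chromosome.
        "AlignmentAndCleaning": n_samples * lanes_per_sample * n_chromosomes,
        # "Per Sample or Trio*Chromosome": one job per sample per chromosome.
        "VariantCalling": n_samples * n_chromosomes,
    }


if __name__ == "__main__":
    for workflow, n_jobs in jobs_per_workflow().items():
        print(f"{workflow}: {n_jobs} jobs")
```

Under these assumptions the alignment workflow dominates the job count, because it multiplies by the number of lanes before the lanes are merged into one BAM per sample.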