Version 42 (modified by Morris Swertz, 14 years ago)

SNP calling pipeline

Status: Alpha

Authors: Freerk van Dijk, Morris Swertz

Based on Broad GATK pipeline.

To run the analysis as quickly and as well as possible, the pipeline has been divided into several small processes. These processes are numbered and listed below with their commands, input files and output files, starting with pre-alignment and ending with variant calling and filtering.

Simplified Overview

This simplified schema hides intermediate sort and indexing steps, and shows each data input/output only the first time it occurs.

[Overview diagram not rendered]

Workflow 1: genome reference file creation

This workflow creates reference files per chromosome including:

  • genome, dbsnp and indel vcfs per chromosome
  • realignment targets at known sites, precomputed so realignment runs faster
  • index files for samtools and bwa

Workflow inputs:

  • genome.chr.fa - downloaded from genome supplier (now hg19)
  • dbsnpXYZ.rod - downloaded reference SNPs from dbsnp (now 129)
  • indelsXYZ.vcf - downloaded reference indels from 1KG

Workflow outputs:

  • genome.chr.fa - cleaned headers
  • genome.chr.fa.fai - index for samtools
  • genome.chr.fa.<format> - multiple index files for bwa
  • dbsnpXYZ.chr.rod - split per chromosome
  • indelsXYZ.chr.vcf - split per chromosome
  • genome.chr.realign.intervals - targets for realignment

clean-fasta-headers

Clean the FASTA headers so they contain only the chromosome name, e.g. '1' instead of 'Chr1'.

tool:
inputs: genome.chr.fa
outputs: genome.chr.fa
doc: internally developed
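
Since the internal tool is not documented here, the following is only a minimal sketch of what such a header cleaner could look like. It assumes headers of the form ">Chr1", ">chr1" or ">chrX some description"; the function names clean_header and clean_fasta are hypothetical and not those of the actual tool.

```python
import re

# Assumption: a header is ">" followed by an optional "chr" prefix (any case)
# and the chromosome name; anything after the first whitespace is dropped.
_HEADER = re.compile(r"^>\s*(?:chr)?(\S+)", re.IGNORECASE)


def clean_header(line):
    """Return a cleaned header line; sequence lines pass through unchanged."""
    if not line.startswith(">"):
        return line
    match = _HEADER.match(line)
    if match is None:
        # Empty header: leave it alone rather than guess a name.
        return line
    return ">" + match.group(1) + "\n"


def clean_fasta(lines):
    """Yield the lines of a FASTA file with cleaned headers."""
    for line in lines:
        yield clean_header(line)
```

Because clean_fasta works on any iterable of lines, it can be fed an open file handle and written out line by line without loading the whole genome into memory.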

split-vcf-chr for dbsnp and indels

Split vcf per chromosome

tool:
inputs: dbsnpXYZ.rod, indelsXYZ.vcf
outputs: dbsnpXYZ.chr.rod, indelsXYZ.chr.vcf
doc:

Discussion:

Can we use http://vcftools.sourceforge.net/options.html ?

vcftools --vcf indelsXYZ.vcf --chr <i> --recode --out indelsXYZ.chr
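
If no external tool is chosen, the split itself is simple enough to sketch directly. The example below is an illustration only, not the pipeline's method: it groups VCF records by their first tab-separated column (CHROM) and copies the header lines into every per-chromosome group. The function name split_vcf_lines is hypothetical, and the sketch covers VCF input only; the dbsnp .rod file has a different column layout and would need its own handling.

```python
from collections import OrderedDict


def split_vcf_lines(lines):
    """Group VCF lines per chromosome, repeating the header in each group.

    Returns an ordered mapping: chromosome name -> list of lines
    (header lines first, then that chromosome's records).
    """
    header = []
    per_chrom = OrderedDict()
    for line in lines:
        if line.startswith("#"):
            # Meta-information and column header lines precede all records.
            header.append(line)
            continue
        if not line.strip():
            continue  # skip blank lines
        chrom = line.split("\t", 1)[0]
        if chrom not in per_chrom:
            per_chrom[chrom] = list(header)
        per_chrom[chrom].append(line)
    return per_chrom
```

This keeps everything in memory, which is fine for a sketch; for a full-size file one would write each record straight to its per-chromosome output file instead.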

index-chromosomes

Index reference sequence for each chromosome in the FASTA format

tool: samtools faidx
input: genome.chr.fa
output: genome.chr.fa.fai
doc: http://samtools.sourceforge.net/samtools.shtml#3

bwa-index-chromosomes

Index reference sequence for each chromosome for bwa alignment

tool: bwa index -a is
input: genome.chr.fa
output: genome.chr.fa.xyz
doc: http://bio-bwa.sourceforge.net/bwa.shtml#3
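
The two indexing steps (samtools faidx and bwa index) can be driven per chromosome from a small wrapper. The sketch below is an assumption about how that might be scripted, not part of the pipeline: the function names and the example file name genome.1.fa are hypothetical, and the algorithm name is passed to bwa in lowercase ("is"). By default it only builds the command lists, so it can be inspected without samtools or bwa installed.

```python
import subprocess


def index_commands(fasta_path):
    """Return the two indexing commands for one per-chromosome FASTA file."""
    return [
        ["samtools", "faidx", fasta_path],         # writes <fasta_path>.fai
        ["bwa", "index", "-a", "is", fasta_path],  # writes the bwa index files
    ]


def run_indexing(fasta_paths, dry_run=True):
    """Build (and, if dry_run is False, execute) the commands for all files."""
    commands = []
    for path in fasta_paths:
        for cmd in index_commands(path):
            commands.append(cmd)
            if not dry_run:
                # check=True stops the run on the first failing command.
                subprocess.run(cmd, check=True)
    return commands
```

Calling run_indexing(["genome.1.fa", "genome.2.fa"]) returns four command lists; passing dry_run=False would execute them in order.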

RealignerTargetCreator

Generate realignment targets for known sites for each chromosome

tool: GenomeAnalysisTK.jar -T RealignerTargetCreator
input: genome.chr.fa, dbsnpXYZ.chr.rod, indelsXYZ.chr.vcf
output: genome.chr.realign.intervals
doc: http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels#Running_the_Indel_Realigner_only_at_known_sites
