[[TOC()]]

= Coverage Analysis Pipeline =

TODO. Suggested parties to take this up: Antoine van Kampen, Barbera van Schaik, Silvia D Olabarriaga, Mark Santcroos, AMC

= Workflows =

== Create grid directory and change permissions ==

[[Image(CreateGridDirectory.png)]]

 * Creates a directory on the LFC
 * Changes the permissions so that the directory is inaccessible to the group and others

== Create a BWA index on database ==

[[Image(bwaIndexDatabase.png, 50%)]]

 * Gunzip the fasta file
 * Build the BWA index
 * Tar-gzip the results

== Split fastq file ==

[[Image(splitFastq.png, 50%)]]

Splits a large gzipped fastq file into several smaller files with the Unix command 'split'. The results are uploaded to the directory specified in 'gridOutputDir'.

== Alignment with BWA on each split file ==

[[Image(BWAparam.png, 50%)]]

Runs BWA with adjustable parameter settings.

 * Match sequence reads to a reference database
 * Convert sai to sam
 * Convert sam to bam
 * Sort the bam file
 * Index the sorted bam file
 * Tar-gzip all results, including the intermediate files

== Merge bam files ==

[[Image(MergeIndexSNPcall.png, 50%)]]

 * Download all bai, bam, sam and tar.gz files from the gridInputDirectory
 * Extract the tar.gz files if they are present
 * Gunzip the reference file (fasta format)
 * Merge all _sorted.bam files
 * Build an index on the merged file
 * Call SNPs and make a selection; the output is in pileup format
 * Convert pileup format to bed format

== SNP calling with varscan, determine coverage ==

[[Image(Coverage_Varscan_BaseCoverage.png)]]

 * Creates a pileup file (with samtools pileup -f) and sends the output to Varscan, which calls SNPs, indels and copy number variations
 * Calculates coverage per 50 kbp
 * Calculates coverage per base
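
The coverage-per-window step above could be sketched as follows. This is only an illustration, not the workflow's actual implementation: it assumes the per-base coverage is available as tab-separated lines of chromosome, 1-based position and depth (the layout that `samtools depth` prints), and the function name `window_coverage` and its `window` parameter are hypothetical. Positions missing from the input are treated as depth 0, and the last window of a chromosome is not adjusted for its shorter length.

```python
from collections import defaultdict


def window_coverage(depth_lines, window=50_000):
    """Return {(chrom, window_start): mean depth} for fixed-size windows.

    depth_lines: iterable of "chrom<TAB>pos<TAB>depth" strings (assumed format).
    window_start is the 1-based first position of each window.
    """
    totals = defaultdict(int)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chrom, pos, depth = line.split("\t")[:3]
        # Map the 1-based position to the 1-based start of its window.
        start = (int(pos) - 1) // window * window + 1
        totals[(chrom, start)] += int(depth)
    # Dividing by the full window size counts unreported positions as depth 0.
    return {key: total / window for key, total in totals.items()}
```

Passing an open file handle works as well as a list of strings, since the function only iterates over lines. Windows that contain no reported positions do not appear in the result at all.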