21 | | |
22 | | === Implemented workflow components === |
23 | | |
24 | | The workflow components listed below are already available. The list can be extended with Pindel and (parts of) the GATK pipeline. |
25 | | |
26 | | * Splitting of fastq files |
27 | | * Building a BWA index on the genome sequence (base space and color space) |
28 | | * BWA for shotgun reads (base space and color space). Parameter sweeps are possible. Output is in BAM format |
29 | | * Merge bam results |
30 | | * Samtools pileup |
31 | | * VarScan (pileup to SNP, indel and consensus calls) |
32 | | * Bam2coverage creates a UCSC wiggle file to display the genome coverage (per 50 kbp) |
33 | | * Coverage-per-base determines the coverage of every base in the genome and summarizes the results (coverage versus frequency) |
34 | | * ANNOVAR (works for hg18; support for other assemblies is in progress). This is a pipeline to annotate variants (gene, dbSNP, HapMap, 1000 Genomes, conservation, etc.) |
35 | | * FastQC |
36 | | |
37 | | === Implemented components of the Groningen pipeline === |
38 | | |
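39 | | Format of each entry: template (grid component, location of the grid workflow). Templates without a grid component yet are marked '''to do'''. |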
40 | | |
41 | | ==== Alignment, realignment, recalibration, stats ==== |
42 | | * pe0--fastqc.ftl (FastqToFastQC, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/quality/Workflow/FastqToFastQC.gwendia) |
43 | | * pe00-bwa-align-pair1.ftl (!BwaIllumina, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/BwaIllumina.gwendia) |
44 | | * pe01-bwa-align-pair2.ftl (!BwaIllumina, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/BwaIllumina.gwendia) |
45 | | * pe02-bwa-sampe.ftl (!BwaIllumina, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/BwaIllumina.gwendia) |
46 | | * pe03-sam-to-bam.ftl (!BwaIllumina, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/BwaIllumina.gwendia) |
47 | | * pe04a-!HsMetrics.ftl (!CalculateHsMetrics, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/CalculateHsMetrics.gwendia) |
48 | | * pe04b-picardQC.ftl (PicardQC, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/PicardQC.gwendia) |
49 | | * pe04-sam-sort.ftl (!SamSort, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/SamSort.gwendia) |
50 | | * pe05-mark-duplicates.ftl (!MarkDuplicates, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/MarkDuplicates.gwendia) |
51 | | * pe06-realign.ftl (!ReAlign, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/ReAlign.gwendia) |
52 | | * pe07-fixmates.ftl (!FixMates, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/FixMates.gwendia) |
53 | | * pe08-covariates-before.ftl (!GatkCovariates, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/GatkCovariates.gwendia) |
54 | | * pe09-recalibrate.ftl (!GatkRecalibrate, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/GatkRecalibrate.gwendia) |
55 | | * pe10-sam-sort.ftl (!SamSort, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/SamSort.gwendia) |
56 | | * pe11-covariates-after.ftl (!GatkCovariates, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/GatkCovariates.gwendia) |
57 | | * pe12-analyze-covariates.ftl (!GatkAnalyzeCovariates, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/GatkAnalyzeCovariates.gwendia) |
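To make the first four templates (pe00 to pe03) more concrete, here is a minimal sketch of the command lines that a BWA paired-end alignment of one lane typically consists of. It is an assumption about what these steps do, not a copy of the templates or of the !BwaIllumina grid component: the function name, file names and output naming scheme are hypothetical, and the real pe03 step may use a different tool for the SAM to BAM conversion. The sketch only builds the argument lists and does not run anything.

```python
def paired_end_alignment_commands(index_prefix, fastq1, fastq2, out_prefix):
    """Return argv lists for a BWA paired-end alignment of one lane.

    Hypothetical helper: the naming scheme for intermediate files is an
    assumption made for this example only.
    """
    sai1 = out_prefix + "_1.sai"
    sai2 = out_prefix + "_2.sai"
    sam = out_prefix + ".sam"
    bam = out_prefix + ".bam"
    return [
        # pe00 / pe01: align each read file of the pair separately
        ["bwa", "aln", "-f", sai1, index_prefix, fastq1],
        ["bwa", "aln", "-f", sai2, index_prefix, fastq2],
        # pe02: combine both alignments into paired-end SAM output
        ["bwa", "sampe", "-f", sam, index_prefix, sai1, sai2, fastq1, fastq2],
        # pe03: convert SAM to BAM (assumed here to be done with samtools)
        ["samtools", "view", "-b", "-S", "-o", bam, sam],
    ]


if __name__ == "__main__":
    for argv in paired_end_alignment_commands(
        "genome", "lane1_1.fq", "lane1_2.fq", "lane1"
    ):
        print(" ".join(argv))
```

Returning argument lists instead of shell strings keeps the sketch easy to pass to `subprocess.run` later and avoids quoting problems with file names.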
58 | | |
59 | | ==== Merge bam per sample and perform SNP and indel calling ==== |
60 | | * vc00a-unified-genotyper.ftl '''to do''' |
61 | | * vc00b-variant-filtration.ftl '''to do''' |
62 | | * vc00c-variant-eval.ftl '''to do''' |
63 | | * vc00d-picardMetrics.ftl (PicardQC, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/PicardQC.gwendia) |
64 | | * vc00-merge.ftl '''to do''' |
65 | | * vc00.merge.ftl '''to do''' |
66 | | * vc01-coverage.ftl '''to do''' |
67 | | * vc01.unified_genotyper.ftl '''to do''' |
68 | | * vc02.picardQC.ftl (PicardQC, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/PicardQC.gwendia) |
69 | | * vc02-realigner-target-creator.ftl '''to do''' |
70 | | * vc03.coverage.ftl '''to do''' |
71 | | * vc03-realign.ftl (!ReAlign, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/ReAlign.gwendia) |
72 | | * vc04-fixmates.ftl (!FixMates, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/FixMates.gwendia) |
73 | | * vc05-indel-genotyper-v2.ftl '''to do''' |
74 | | * vc06-filter-indels.ftl '''to do''' |
75 | | * vc07-unified-genotyper.ftl '''to do''' |
76 | | * vc08-make-indel-mask.ftl '''to do''' |
77 | | * vc09-variant-filtration.ftl '''to do''' |
78 | | * vc10-variant-eval.ftl '''to do''' |
79 | | * vc11-name-sort-bam.ftl (!SamSort, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/gvnl/Workflows/SamSort.gwendia) |
80 | | * Pindel (Pindel, lfn://lfc.grid.sara.nl:5010/grid/vlemed/AMC-e-BioScience/Sequence_WF/pindel/Workflows/Pindel.gwendia) |
124 | | == Workflow execution / progress data analysis == |
125 | | |
126 | | '''First alignment step - running''' |
127 | | || '''#''' || '''Sample''' || '''WF''' || '''Status''' || '''Start''' || |
128 | | || 1 || A4a || [http://orange.ebioscience.amc.nl/workflows/workflow-693426f3/html/workflow-693426f3.html F] [http://orange.ebioscience.amc.nl/workflows/workflow-35a9b777/html/workflow-35a9b777.html F] [http://orange.ebioscience.amc.nl/workflows/workflow-4ba3f651/html/workflow-4ba3f651.html F] [http://orange.ebioscience.amc.nl/workflows/workflow-425d9ceb/html/workflow-425d9ceb.html workflow-425d9ceb] || done || || |
129 | | || 2 || Vartest || [http://orange.ebioscience.amc.nl/workflows/workflow-490b15f8/html/workflow-490b15f8.html workflow-490b15f8] || done || || |
130 | | || 3 || Iteration test || [http://orange.ebioscience.amc.nl/workflows/workflow-bf48aff1/html/workflow-bf48aff1.html workflow-bf48aff1] || failed || || |
131 | | || 4 || Iteration test || [http://orange.ebioscience.amc.nl/workflows/workflow-923c6588/html/workflow-923c6588.html workflow-923c6588] || done || || |
132 | | || 5 || 60-samples-batch (15 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-d80b5767/html/workflow-d80b5767.html workflow-d80b5767] || 10 / 15 done || 11-02-2011 19:30 || |
133 | | || 6 || 60-samples-batch A (55 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-cbaca6e5/html/workflow-cbaca6e5.html workflow-cbaca6e5] || 15 / 55 done || 12-02-2011 13:55 || |
134 | | || 7 || 60-samples-batch A remaining 1 (17 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-835250b1/html/workflow-835250b1.html workflow-835250b1] || failed (grid very busy) || 07-03-2011 17:45 || |
135 | | || 8 || 60-samples-batch A remaining 1 (17 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-31eb952d/html/workflow-31eb952d.html workflow-31eb952d] || 1/17 done || 08-03-2011 14:15 || |
136 | | || 9 || 60-samples-batch A remaining (27 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-fd98c7c8/html/workflow-fd98c7c8.html workflow-fd98c7c8] || 11/27 done || 15-03-2011 10:45 || |
137 | | || 10 || 60-samples-batch G (27/54 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-a781209c/html/workflow-a781209c.html workflow-a781209c] || 3/27 done || 15-03-2011 20:37 || |
138 | | || 11 || second-batch R10-11-12 (27 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-6fe1cb10/html/workflow-6fe1cb10.html workflow-6fe1cb10] || 6/27 done || 16-03-2011 10:57 || |
139 | | || 12 || second-batch R13-14-15 (27 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-77b82882/html/workflow-77b82882.html workflow-77b82882] || 3/27 done || 18-03-2011 19:07 || |
140 | | || 13 || second-batch R16-17 (24 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-18ec0160/html/workflow-18ec0160.html workflow-18ec0160] || 7/24 done || 19-03-2011 11:00 || |
141 | | || 14 || second-batch R18 (11 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-7596a676/html/workflow-7596a676.html workflow-7596a676] || 2/11 done || 19-03-2011 18:33 || |
142 | | || 15 || second-batch R19-20-21 (31 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-58ea16f1/html/workflow-58ea16f1.html workflow-58ea16f1] || 4 done, scheduled downtime, rest will fail || 20-03-2011 11:50 || |
143 | | || 16 || second-batch R22-8-9 (altered submission scheme: submits 1 job/5 min) || [http://orange.ebioscience.amc.nl/workflows/workflow-ea6b36a6/html/workflow-ea6b36a6.html workflow-ea6b36a6] || 4 done, scheduled downtime, rest will fail || 20-03-2011 12:37 || |
144 | | || 17 || second-batch R10-R17 A9-23 (206 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-7d2b2b5b/html/workflow-7d2b2b5b.html workflow-7d2b2b5b] || 5 done, scheduled downtime, rest will fail || 21-03-2011 11:59 || |
145 | | || 18 || second-batch (244 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-c902ebf3/html/workflow-c902ebf3.html workflow-c902ebf3] || cancelled (these jobs should only run on Gina and HTC) || 23-03-2011 09:48 || |
146 | | || 19 || second-batch (237 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-62e0f254/html/workflow-62e0f254.html workflow-62e0f254] || 135 done || 25-03-2011 12:20 || |
147 | | || 20 || second-batch (102 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-b00af623/html/workflow-b00af623.html workflow-b00af623] || 33 done || 31-03-2011 08:21 || |
148 | | || 21 || second-batch (69 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-d5709592/html/workflow-d5709592.html workflow-d5709592] || 4 done || 04-04-2011 18:17 || |
149 | | || 22 || second-batch (65 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-7d8743c1/html/workflow-7d8743c1.html workflow-7d8743c1] || 25 done || 06-04-2011 23:24 || |
150 | | || 23 || second-batch (40 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-d8cdf017/html/workflow-d8cdf017.html workflow-d8cdf017] || 16 done || 15-04-2011 18:07 || |
151 | | || 24 || second-batch (24 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-177917d4/html/workflow-177917d4.html workflow-177917d4] || 12 done || 18-04-2011 11:25 || |
152 | | || 25 || second-batch (12 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-289c831f/html/workflow-289c831f.html workflow-289c831f] || 5 done || 20-04-2011 21:15 || |
153 | | || 26 || second-batch (7 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-96f414e3/html/workflow-96f414e3.html workflow-96f414e3] || running || 22-04-2011 || |
154 | | || 27 || second-batch (3 lanes) || [http://orange.ebioscience.amc.nl/workflows/workflow-63ecec34/html/workflow-63ecec34.html workflow-63ecec34] || running || 27-04-2011 15:22 || |
155 | | |
156 | | '''Fastqc analysis - done''' |
157 | | || '''#''' || '''Sample''' || '''WF''' || '''Status''' || '''Start''' || |
158 | | || 1 || second-batch (2x295 lanes = 590 fastq files) || [http://orange.ebioscience.amc.nl/workflows/workflow-417415cc/html/workflow-417415cc.html workflow-417415cc] || 396 done || 23-04-2011 02:44 || |
159 | | || 2 || second-batch (194 fastq files) || [http://orange.ebioscience.amc.nl/workflows/workflow-6ded80d3/html/workflow-6ded80d3.html workflow-6ded80d3] || 189 done || 23-04-2011 11:40 || |
160 | | || 3 || second-batch (5 fastq files) || [http://orange.ebioscience.amc.nl/workflows/workflow-2bb76072/html/workflow-2bb76072.html workflow-2bb76072] || 5 done || 23-04-2011 14:40 || |
161 | | |
162 | | RESULTS: [http://www.bbmriwiki.nl/attachment/wiki/BigCompute/log-fastqc20110423.xls log-fastqc20110423.xls] contains the run time and disk usage on the compute nodes and the number of sequences per lane |
163 | | |
164 | | THROUGHPUT: workflow run time was 12 hrs, total CPU run time was 12 days (speedup of ~24x for this component) |
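The speedup figure above follows from dividing the total CPU time by the wall-clock run time of the workflow. A minimal sketch of that arithmetic, using the rounded numbers from the line above (the function name is only illustrative):

```python
def speedup(cpu_hours, wall_hours):
    """Ratio of accumulated CPU time to wall-clock workflow run time."""
    return cpu_hours / wall_hours


cpu_hours = 12 * 24   # total CPU run time: 12 days, expressed in hours
wall_hours = 12       # workflow run time: 12 hours

print(speedup(cpu_hours, wall_hours))  # 24.0
```

The ratio can also be read as the average number of jobs running in parallel over the lifetime of the workflow.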
165 | | |
166 | | |
167 | | '''Mark-duplicates analysis on all files that have been aligned so far''' |
168 | | || '''#''' || '''Sample''' || '''WF''' || '''Status''' || '''Start''' || |
169 | | || 1 || 351 lanes || [http://orange.ebioscience.amc.nl/workflows/workflow-a5a7a078/html/workflow-a5a7a078.html workflow-a5a7a078] || 212 done || 04-05-2011 12:54 || |
170 | | || 2 || 139 lanes || [http://orange.ebioscience.amc.nl/workflows/workflow-87b3994d/html/workflow-87b3994d.html workflow-87b3994d] || 115 done || 05-05-2011 15:29 || |
171 | | || 3 || 24 lanes || [http://orange.ebioscience.amc.nl/workflows/workflow-7e395331/html/workflow-7e395331.html workflow-7e395331] || failed || 06-05-2011 08:22 || |
172 | | || 4 || 24 lanes || [http://orange.ebioscience.amc.nl/workflows/workflow-6207c67d/html/workflow-6207c67d.html workflow-6207c67d] || 19 done || 06-05-2011 12:55 || |
173 | | || 5 || 5 lanes || [http://orange.ebioscience.amc.nl/workflows/workflow-62e2653e/html/workflow-62e2653e.html workflow-62e2653e] || running || 07-05-2011 14:08 || |
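The tables on this page follow a resubmission scheme: each new workflow is started on the lanes that the previous workflow did not finish. A small sketch of that bookkeeping, using the lane counts from the mark-duplicates table above (the function name is hypothetical, and the failed run is assumed to have completed 0 lanes):

```python
def remaining_after(total, runs):
    """Replay (submitted, done) pairs and return the lanes still to do.

    Raises ValueError when a run was submitted with a lane count that
    does not match what was left over from the previous runs.
    """
    remaining = total
    for submitted, done in runs:
        if submitted != remaining:
            raise ValueError(
                "expected %d lanes, but %d were submitted" % (remaining, submitted)
            )
        remaining -= done
    return remaining


# (lanes submitted, lanes done) for runs 1 to 4 of the table above;
# run 3 failed, which is assumed here to mean that no lane completed.
mark_duplicates_runs = [(351, 212), (139, 115), (24, 0), (24, 19)]

print(remaining_after(351, mark_duplicates_runs))  # 5
```

The result matches the 5 lanes submitted in run 5, so the lane counts in the table are consistent with each other.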
174 | | |
175 | | '''Monitor clusters''' |
176 | | * [http://ganglia.sara.nl/?m=load_one&r=week&s=descending&c=LifeScience+Grid&h=&sh=1&hc=4&z=small Ganglia - LifeScience grid] |
177 | | * [http://ganglia.sara.nl/?m=load_one&r=week&s=descending&c=GINA+Cluster&h=&sh=1&hc=4&z=small Ganglia - Gina cluster] |
178 | | |