Version 15 (modified 14 years ago)
Groningen cluster
People
- UMCG: Morris, Freerk, more?
Description
A description of the code templates, automatic PBS script generation, and job submission/monitoring will be added here.
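Until that description is written, the idea of generating a PBS script from a template can be illustrated with a minimal sketch. It is not the Groningen implementation (the template file names further down this page end in .ftl, which suggests a different template engine); it uses Python's standard `string.Template`, and the template text, the placeholder names, and the `render_pbs_script` helper are all assumptions made for illustration.

```python
from string import Template

# Hypothetical template: the placeholder names are illustrative only.
# "$$" is an escaped dollar sign, so the shell variable survives substitution.
PBS_TEMPLATE = Template("""\
#!/bin/bash
#PBS -N ${job_name}
#PBS -l nodes=1:ppn=${cores}
#PBS -l walltime=${walltime}
#PBS -l mem=${mem}

cd "$$PBS_O_WORKDIR"
${command}
""")


def render_pbs_script(job_name, command, cores=1, walltime="01:00:00", mem="4gb"):
    """Fill the template with the settings of one job and return the script text."""
    return PBS_TEMPLATE.substitute(
        job_name=job_name,
        command=command,
        cores=cores,
        walltime=walltime,
        mem=mem,
    )


# Example job; the command is a stand-in for a real analysis step.
script = render_pbs_script("bwa_align_s1", "echo aligning sample 1", cores=4)
print(script)
```

On a PBS cluster, a script generated this way would typically be written to a file, submitted with `qsub`, and monitored with `qstat`.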
Port applications to Dutch Life Science Grid
People
- AMC: Antoine van Kampen, Barbera van Schaik, Silvia D Olabarriaga, Mark Santcroos
- Sara/BiGGrid: Tom Visser, more?
- UMCG: Morris, Freerk
Description
The software will be implemented as workflow components. The workflows will run on the Dutch Life Science Grid.
- Information about the infrastructure: http://www.bioinformaticslaboratory.nl/twiki/bin/view/EBioScience/
- Getting started: http://www.bioinformaticslaboratory.nl/twiki/bin/view/EBioScience/GettingStarted
Implemented workflow components at AMC
The workflow components listed below are already available. The list can be expanded with Pindel and (parts of) the GATK pipeline.
- Splitting of fastq files
- Building a BWA index on the genome sequence (base space and color space)
- BWA for shotgun reads (base space and color space). Parameter sweeps are possible. Output is in BAM format
- Merge bam results
- Samtools pileup
- VarScan (pileup to SNP, indel and cns)
- Bam2coverage creates a UCSC wiggle file to display the genome coverage (per 50 kbp)
- Coverage-per-base determines the coverage for every base in the genome and summarizes the results (coverage versus frequency)
- ANNOVAR (implementation in progress). This is a pipeline to annotate variants (gene, dbSNP, HapMap, 1000 Genomes, conservation, etc.)
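The two coverage components in the list above can be illustrated with a small sketch. It is not the AMC implementation: it assumes the per-base depths are already available as a list of integers, and it assumes that the per-window value is a mean, which this page does not state. The function names are hypothetical.

```python
from collections import Counter


def coverage_histogram(depths):
    """Map each coverage value to the number of bases that have that coverage."""
    return dict(sorted(Counter(depths).items()))


def windowed_mean_coverage(depths, window=50_000):
    """Mean coverage per fixed-size window; the last window may be shorter."""
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


# Toy per-base depths for a 10-base "chromosome" (made-up numbers).
depths = [0, 0, 3, 3, 3, 5, 5, 8, 8, 8]
histogram = coverage_histogram(depths)
window_means = windowed_mean_coverage(depths, window=5)
print(histogram)     # coverage versus frequency
print(window_means)  # one value per window
```

The histogram corresponds to the "coverage versus frequency" summary, and the windowed values are the kind of numbers a wiggle track would display, with the window set to 50,000 instead of the toy value of 5.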
Implemented components of the Groningen pipeline
A more detailed description will follow later.
- BwaIllumina (implementation phase) - pe00-bwa-align-pair1.ftl, pe01-bwa-align-pair2.ftl, pe02-bwa-sampe.ftl, pe03-sam-to-bam.ftl, pe04-sam-sort.ftl
- MarkDuplicates (implementation phase) - pe05-mark-duplicates.ftl
- PicardQC (implementation phase) - pe04b-picardQC.ftl
- progress WF
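As a rough illustration of how the paired-end steps named above might chain together, the sketch below builds one command line per step. The contents of the .ftl templates are not shown on this page, so the step order is inferred from the template file names only. The file naming scheme, the `paired_end_commands` helper, and the `picard.jar` path are assumptions; the PicardQC step is left out because its details are not given here.

```python
def paired_end_commands(ref, fq1, fq2, prefix):
    """Return shell commands for one paired-end sample, in execution order.

    All file names are derived from `prefix`; this naming scheme is hypothetical.
    """
    sai1, sai2 = f"{prefix}_1.sai", f"{prefix}_2.sai"
    sam, bam = f"{prefix}.sam", f"{prefix}.bam"
    sorted_bam, dedup_bam = f"{prefix}.sorted.bam", f"{prefix}.dedup.bam"
    return [
        # Align each read file separately.
        f"bwa aln -f {sai1} {ref} {fq1}",
        f"bwa aln -f {sai2} {ref} {fq2}",
        # Combine both alignments into paired-end SAM output.
        f"bwa sampe -f {sam} {ref} {sai1} {sai2} {fq1} {fq2}",
        # Convert SAM to BAM.
        f"samtools view -b -S -o {bam} {sam}",
        # Sort; releases before samtools 1.0 took an output prefix instead of -o.
        f"samtools sort -o {sorted_bam} {bam}",
        # Mark duplicate reads with Picard.
        "java -jar picard.jar MarkDuplicates "
        f"INPUT={sorted_bam} OUTPUT={dedup_bam} METRICS_FILE={prefix}.metrics.txt",
    ]


commands = paired_end_commands("ref.fa", "s1_1.fq", "s1_2.fq", "s1")
for command in commands:
    print(command)
```

Each command could then be wrapped in its own job script, with later jobs waiting for earlier ones to finish.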
To be implemented
- The components of the Groningen pipeline that are not yet implemented as workflow components
- Pindel
Data access rights
To restrict data access to the smallest possible group of people, we have created a subgroup "gvnl" within the "vlemed" Virtual Organisation (VO). To join this subgroup, a person must have a Grid certificate and be a member of the "vlemed" VO. The following page explains how to get a certificate and how to join the "vlemed" VO: http://www.bioinformaticslaboratory.nl/twiki/bin/view/EBioScience/EBioInfra#Access
For more information about data access see http://www.bioinformaticslaboratory.nl/twiki/bin/view/EBioScience/DataManagement
Things to address
- Available disk space on the grid storage elements / worker nodes
Alternatives
Clusters
- Groningen
- Leiden
- Huygens
- Lisa
- Philips
- DAS
Grid
Attachments (3)
- r-environment.txt (134 bytes), added 14 years ago: info about R packages
- log-picardqc-20101220.ods (17.1 KB), added 14 years ago
- log-fastqc20110423.xls (118.0 KB), added 14 years ago