October 2012 Evaluation GoNL
People present: Morris Swertz, David van Enckevort, Paul de Bakker, Lennart Karssen, Kai Ye, Tom Visser
E-mail contributions: Hailiang Mei, Jan Bot
Introduction
The goal of the evaluation was to learn from our experiences in this project, both from 'What Went Well' (www) and from what we have to 'Take A Look At' (tala). We tried to identify which items we have to address immediately in the project and which lessons we have learned for the next project. The evaluation was done as a brainstorming session in which each person could first write down five items that went well and five items to take a look at. We then categorised these items and discussed them in the group.
Evaluation
In total we had 29 www and 37 tala items. These items could be divided into two major groups, organisational and technical, which left only a few uncategorised items. We discussed these items and tried to distill actions and lessons from them.
In general the project was evaluated quite positively: if we had to rate ourselves, we would give the project a 7. We identified several things that we feel we should keep and implement again in follow-up projects:
- Weekly Skype calls;
- Mailing list;
- Open communication and a low threshold for reaching each other;
- Sharing of best practices nationally and internationally;
- Forming the group;
- Access to international collaboration;
- Sharing knowledge and code via wiki+svn;
- Self-organization in working groups along sensible lines;
- Using pragmatic solutions and getting started;
- The involvement of NBIC BioAssist was instrumental in establishing the group.
Technical
File management & replication
- General backup strategy and restore?
- What is where (ToC of files)?
- Is the file in hand the same as in ToC (checksum)?
- What version is this file (e.g. multiple alignment runs)?
- Does the researcher have the file available on site?
- Data freeze: can we mark data sets?
- Data librarian: who is responsible for keeping the lists?
Action items
- Create a series of user stories describing the practical issues we encountered during the project, to share with SARA and BigGrid?
- Version individual files, not the whole set, because the set is too big (+index, etc.)
- Have overview of who wants what
- Create small files we can release as a whole, e.g. SNP releases
- Sort out backup strategy, what to keep, how to distribute over the resources and make it automated.
- Make people responsible for data management.
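The questions about a table of contents and checksums could be addressed with a simple manifest. The following is a minimal sketch, not the tooling used in the project: it assumes the data sits in an ordinary directory tree, uses SHA-256 as the checksum, and the function names `sha256_of`, `build_manifest` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Build the ToC: map each file path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the manifest entries that are missing or changed under root."""
    root = Path(root)
    problems = {}
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            problems[rel_path] = "missing"
        elif sha256_of(target) != expected:
            problems[rel_path] = "checksum mismatch"
    return problems
```

A manifest built at one site could be shipped along with a data freeze, and `verify` run at every other site to check that the file in hand is the same as the one listed.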
Distribution of the analysis
- Where do you compute what? There was no clear planning of resource usage; the simple queueing and per scheduling of resource usage caused some projects to get into trouble.
- Can we really distribute analyses over multiple sites?
- Currently most work was done on the LISA (Imputation/GWAS) and UMCG (SV/Alignment) clusters; only some indel calling was done on the Grid. We could have done more on the Grid.
- What pipelines do we want to distribute and why, and what are the barriers?
Action items
- Reduce dependency on single resources:
  - Make pipelines distributed: deploy pipelines on multiple clusters
  - Make dependent executables available on other clusters
  - Make data available on other clusters
- This is taken up within RP2 and the eBioGrid project.
QC and tracing of errors
- Robustness of the analysis
- How do we make certain that data analyses are used?
Action items
- Clear but pragmatic QC steps, e.g. compare uniquely aligned reads.
- Verification of pipelines across sites using overlap samples.
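A cross-site check on overlap samples could look like the sketch below. It is an illustration under assumptions, not the project's QC procedure: it assumes each site reports a count of uniquely aligned reads per sample, and the function name `compare_sites`, the sample identifiers and the 1% tolerance are all hypothetical.

```python
def compare_sites(counts_a, counts_b, tolerance=0.01):
    """Flag overlap samples whose uniquely aligned read counts disagree.

    counts_a and counts_b map sample id -> uniquely aligned read count for
    the same pipeline run at two sites. Only samples present at both sites
    are compared. A sample is flagged when the relative difference between
    the two counts exceeds `tolerance`.
    """
    flagged = {}
    for sample in sorted(counts_a.keys() & counts_b.keys()):
        a, b = counts_a[sample], counts_b[sample]
        largest = max(a, b)
        if largest and abs(a - b) / largest > tolerance:
            flagged[sample] = (a, b)
    return flagged


# Hypothetical counts for three samples at one site and three at another.
site_a = {"sample1": 1000, "sample2": 2000, "sample3": 500}
site_b = {"sample1": 1000, "sample2": 1900, "sample4": 10}
print(compare_sites(site_a, site_b))
```

Here only `sample1` and `sample2` overlap, and only `sample2` is flagged, since its counts differ by 5%.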
Organizational
- Coordination: communication problems
  - Overview of external GoNL projects
  - Very good that we have an SC member (Cisca) on the call all the time.
  - Having foreign contributors is nice, but it seems like they take nice projects away. Need better communication.
  - Who is responsible for what?
  - Decentralized management (we cannot boss other locations)
  - It's not always clear who is paid by the project and who is a volunteer; you can only kindly ask the volunteers to do tasks.
  - It was approached as a scientific project, which meant there was not always a clear direction from above.
- Organization:
  - It's not always clear which human resources are actually available
  - The SV team has too little manpower to do the work (largely volunteers, hard to stimulate people)
  - Some groups could use some strengthening from one or more experienced people (Pheno, Imputation)
  - Not clear what should go into which paper, or who is responsible for the papers.
Action items
- Communication:
  - At every Steering Committee meeting, have one of the subprojects report results to the Steering Committee
- Organisation:
  - Ask the Steering Committee about available human resources (do the GoNL members get the time they need?)
  - Group responsible for a rolling roadmap for one year (get from the Steering Committee)
  - Have more bioinformaticians in the Steering Committee and recognition of that
  - The technical people should get appreciation for their scientific contribution!
  - Need an experienced person for each working group (SV is okay, imputation and pheno are a bit light because Yurii left)
- Science / Roadmap:
  - Paper plan
  - Get general, very broad directions from the Steering Committee on what we can / should do next with the data (GoNL flag, or just using)