
October 2012 Evaluation GoNL

People present: Morris Swertz, David van Enckevort, Paul de Bakker, Lennart Karssen, Kai Ye, Tom Visser

E-mail contributions: Hailiang Mei, Jan Bot

Introduction

The goal of the evaluation was to learn from our experiences in this project, both from 'What Went Well' (www) and what we have to 'Take A Look At' (tala). We tried to identify which items we have to address immediately in the project and which lessons we have learned for the next project. The evaluation was done as a brainstorm session in which each person first wrote down five items that went well and five items to take a look at. We then categorised these items and discussed them in the group.

In total we had 29 www and 37 tala items. These items could be divided into two major groups, organisational and technical, leaving only a few uncategorised items.

Action items


Technical: File management & replication

  • General backup strategy and restore?
  • What is where (ToC of files)?
  • Is the file in hand the same as in the ToC (checksum)? (See the sketch below.)
  • What version is this file (e.g. with multiple alignment runs)?
  • Does the researcher have the file available on site?
  • Data freeze: can we mark data sets?
  • Data librarian: who is responsible for keeping the lists?

(dCache instance writes)

Action items:

  • Create a series of user stories describing these needs.
  • Version individual files, not the whole set, because the set is too big (+ index, etc.).
  • Have an overview of who wants what.
  • Small files we can release as a whole, e.g. SNP releases.
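As a starting point for the checksum question above, here is a minimal sketch of verifying the files on a site against a ToC manifest. This is not project code: the manifest format (one "<md5>  <path>" pair per line, as produced by md5sum) and the choice of MD5 are assumptions.

{{{
#!python
# Sketch: check files on disk against a checksum manifest ("ToC").
# Assumed manifest format: "<md5>  <relative/path>" per line (md5sum output).
import hashlib
import os
import sys

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 of a file in chunks, so large BAMs fit in memory."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def verify(manifest, root='.'):
    """Report files missing or differing from the ToC; return True if all match."""
    all_ok = True
    with open(manifest) as toc:
        for line in toc:
            if not line.strip():
                continue
            expected, relpath = line.strip().split(None, 1)
            path = os.path.join(root, relpath)
            if not os.path.exists(path):
                print('MISSING  %s' % relpath)
                all_ok = False
            elif md5sum(path) != expected:
                print('CHANGED  %s' % relpath)
                all_ok = False
    return all_ok

if __name__ == '__main__':
    # Usage: verify_toc.py <manifest> [root]
    root = sys.argv[2] if len(sys.argv) > 2 else '.'
    sys.exit(0 if verify(sys.argv[1], root) else 1)
}}}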

Technical: Distribution of the analysis

  • Where do you compute what?
  • Can we really distribute analyses over multiple sites?
  • Currently we depend on the LISA and UMCG clusters.

Action items:

  • Make pipelines distributed: deploy pipelines on multiple clusters.
  • Make the dependent executables available on other clusters.
  • Make the data available on other clusters.

Open question: which pipelines do we want to distribute and why, and what are the barriers? (A pre-flight sketch follows below.)
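To make the barriers concrete, here is a minimal sketch of a pre-flight check that could run on a cluster before a pipeline is deployed there. The pipeline description, tool names, and reference path below are assumptions, not the actual GoNL pipeline definition.

{{{
#!python
# Sketch: pre-flight dependency check before deploying a pipeline to a cluster.
# The PIPELINE description below is hypothetical.
import os
import shutil

PIPELINE = {
    'executables': ['bwa', 'samtools', 'java'],            # tools the jobs call
    'data': ['/data/gonl/reference/human_g1k_v37.fasta'],  # reference data the jobs read
}

def preflight(pipeline):
    """Return a list of dependencies missing on the current cluster."""
    missing = []
    for exe in pipeline['executables']:
        if shutil.which(exe) is None:
            missing.append('executable: %s' % exe)
    for path in pipeline['data']:
        if not os.path.exists(path):
            missing.append('data: %s' % path)
    return missing

if __name__ == '__main__':
    problems = preflight(PIPELINE)
    if problems:
        print('Cannot deploy here; missing:')
        for item in problems:
            print('  ' + item)
    else:
        print('All dependencies present; the pipeline can run on this cluster.')
}}}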

Technical: QC and tracing of errors

  • Robustness of the analysis
  • How do we make certain that the data analyses are used?

Action items:

  • Define clear but pragmatic QC steps, e.g. compare uniquely aligned reads (see the sketch below).
  • Verify pipelines across sites using overlap samples.
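As an illustration of the first action item, here is a minimal sketch of comparing per-sample counts of uniquely aligned reads for overlap samples run at two sites. The 1% tolerance and the input format are assumptions.

{{{
#!python
# Sketch: flag overlap samples whose uniquely-aligned-read counts diverge
# between two sites by more than a relative tolerance (assumed 1% here).
def compare_sites(counts_a, counts_b, tolerance=0.01):
    """counts_a/counts_b map sample id -> uniquely aligned read count per site."""
    discordant = []
    for sample in sorted(set(counts_a) & set(counts_b)):
        a, b = counts_a[sample], counts_b[sample]
        if abs(a - b) > tolerance * max(a, b):
            discordant.append((sample, a, b))
    return discordant

if __name__ == '__main__':
    # Hypothetical counts for two overlap samples at two sites.
    site1 = {'gonl-1': 95000000, 'gonl-2': 88100000}
    site2 = {'gonl-1': 95050000, 'gonl-2': 80000000}
    for sample, a, b in compare_sites(site1, site2):
        print('%s differs: %d vs %d uniquely aligned reads' % (sample, a, b))
}}}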

Coordination: Communication problems

Organization: which resources are actually available

Science / Roadmap:

  • Paper plan
  • Get general, very broad directions from the steering committee on what we can / should do next with the data (under the GoNL flag, or just using it).
  • A group responsible for a rolling one-year roadmap (to get this from the steering committee).
  • Have more bioinformaticians in the steering committee, and recognition of that.
  • At every SC meeting, have one of the subprojects report results to the SC.
  • Overview of external GoNL projects.
  • Very good that we have an SC member (Cisca) on the call all the time.
  • The technical people should get appreciation for their scientific contribution!
  • We need an experienced person for each working group (SV is okay; imputation and pheno are a bit light because Yurii left).
  • Having foreign contributors is nice, but it seems like they take nice projects away. We need better communication.

Organization: Roadmap and planning

  • Who is responsible for what?
  • Decentralized management (we cannot boss other locations around)

Actions:

  • Ask the SC whether the needed people and resources are actually available (do the GoNL members get the time they need?).
  • The SV team has too little manpower to do the work (largely volunteers; it is hard to stimulate people).

Keep:

  • Weekly Skype calls
  • Mailing list
  • Open communication and a low threshold to find each other
  • Sharing of best practices nationally and internationally
  • Forming the group
  • Access to international collaboration
  • Sharing knowledge and code via wiki+svn
  • Self-organization in working groups along sensible lines
  • Using pragmatic solutions and getting started