wiki:Evaluations/20121002

Version 4 (modified by David van Enckevort, 12 years ago)

--

October 2012 Evaluation GoNL

People present: Morris Swertz, David van Enckevort, Paul de Bakker, Lennart Karssen, Kai Ye, Tom Visser

E-mail contributions: Hailiang Mei, Jan Bot

Introduction

The goal of the evaluation was to learn from our experiences in this project, both from 'What Went Well' (www) and from what we have to 'Take A Look At' (tala). We tried to identify which items we have to address immediately in the project and which lessons we can take to the next project. The evaluation was done as a brainstorm session in which each person first wrote down five items that went well and five items to take a look at. We then categorised these items and discussed them in the group.

In total we had 29 www and 37 tala items. These items could be divided into two major groups, organisational and technical, which left only a few uncategorised items. We discussed these items and tried to distill actions and lessons from them.

Technical

File management & replication

  • What is our general backup and restore strategy?
  • What is where (ToC of files)?
  • Is the file in hand the same as in the ToC (checksum)?
  • What version is this file (e.g. after multiple alignment runs)?
  • Does the researcher have the file available on site?
  • Data freeze: can we mark data sets?
  • Data librarian: who is responsible for keeping the lists?
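The checksum question above can be made concrete with a small script. This is a minimal sketch, not a tool used in the project: it assumes the ToC is a plain-text file in `md5sum` output format (one `<hexdigest>  <relative path>` per line) and that the listed files live under a single local directory. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that large BAM/VCF files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_toc(toc_path):
    """Parse a ToC assumed to be in md5sum format: '<hexdigest>  <path>'.
    A leading '*' (md5sum binary-mode marker) is stripped from the path."""
    toc = {}
    for line in Path(toc_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        checksum, name = line.split(maxsplit=1)
        toc[name.lstrip("*")] = checksum.lower()
    return toc


def verify_against_toc(toc_path, data_dir):
    """Check every file listed in the ToC against the local copy.
    Returns (missing, mismatched): files absent on site, and files whose
    checksum differs from the one recorded in the ToC."""
    missing, mismatched = [], []
    for name, expected in read_toc(toc_path).items():
        local = Path(data_dir) / name
        if not local.is_file():
            missing.append(name)
        elif md5_of(local) != expected:
            mismatched.append(name)
    return missing, mismatched
```

For example, `verify_against_toc("release.md5", "/data/release")` (hypothetical paths) answers both "Is the file in hand the same as in the ToC?" and "Does the researcher have the file available on site?" in one pass. MD5 is used only because `md5sum` lists are common; a stronger hash works the same way.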

Action items

  • Create a series of user stories describing the practical issues we encountered during the project, to share with SARA and BigGrid.
  • Version individual files rather than the whole set, because the set is too big (including indices, etc.).
  • Keep an overview of who wants which data.
  • Create small files we can release as a whole, e.g. SNP releases.
  • Sort out the backup strategy: what to keep, how to distribute it over the resources, and how to automate it.
  • Make people responsible for data management.

Distribution of the analysis

  • Where do we compute what? There was no clear plan for the use of the resources.
  • Can we really distribute analyses over multiple sites?
  • Currently we depend on the LISA and UMCG clusters.
  • Which pipelines do we want to distribute and why, and what are the barriers?

Action items

  • Reduce dependency on single resources:
    • Make pipelines distributed: deploy pipelines on multiple clusters
    • Make the required executables available on other clusters
    • Make the data available on other clusters

QC and tracing of errors

  • Robustness of the analysis
  • How do we make certain that the data analyses are used?

Action items

  • Define clear but pragmatic QC steps, e.g. compare the number of uniquely aligned reads.
  • Verify pipelines across sites using overlapping samples.

Organizational

  • Coordination: communication problems
    • Overview of external GoNL projects
    • Very good that we have a SC member (Cisca) on the call all the time.
    • Foreign contributors are welcome, but it seems they take the interesting projects away. We need better communication.
    • Who is responsible for what?
    • Decentralized management (we cannot boss other locations)
  • Organization:
    • It is not always clear which resources are actually available.
    • The SV team has too little manpower to do the work (largely volunteers; it is hard to motivate people).
    • Some groups could use strengthening from one or more experienced people (Pheno, Imputation).
    • It is not clear what should go into which paper, or who is responsible for the papers.

Action items

  • Communication:
    • At every Steering Committee meeting, have one of the subprojects report its results to the Steering Committee
  • Organisation:
    • Ask the Steering Committee about available human resources (do the GoNL members get the time they need?)
    • Make a group responsible for a rolling one-year roadmap (with input from the Steering Committee)
    • Have more bioinformaticians on the Steering Committee, and recognise their role
    • The technical people should get appreciation for their scientific contribution!
    • Have an experienced person in each working group (SV is okay; imputation and pheno are a bit light because Yurii left)
  • Science / Roadmap:
    • Paper plan
    • Get broad, general directions from the Steering Committee on what we can or should do next with the data (under the GoNL flag, or just using it)

Things to Keep

  • Weekly Skype calls
  • Mailing list
  • Open communication and low-threshold to find each other
  • Sharing of best practices nationally and internationally
  • Forming the group
  • Access to international collaboration
  • Sharing knowledge and code via wiki+svn
  • Self-organization in working groups along sensible lines
  • Using pragmatic solutions and getting started