| 6 |  |  | 
                        | 7 |  | === GCC-level Directory Structure === | 
                        | 8 |  | The root for all subsequent directories is /data/gcc/ | 
                        | 9 |  |  | 
                        | 10 |  | * /tools | 
                        | 11 |  | * Contains all GCC tools, including GoNL tools | 
                        | 12 |  | * All tools should be put in a folder using the naming convention: ''toolname-version'' | 
                        | 13 |  | * Ex: Picard v1.32 should be found in ''/data/gcc/tools/picard-tools-1.32/'' | 
                        | 14 |  | * /resources | 
                        | 15 |  | * Contains all GCC resources, including GoNL resources | 
                        | 16 |  | * All resources should be put in a folder whose name specifies their version, normally following the naming convention: ''resource-version'' | 
                        | 17 |  | * Ex: Human Genome build 19 should be found in ''/data/gcc/resources/hg-19/'' | 
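As a minimal sketch of the two directory conventions above, the following Python helpers build the expected paths. The function names `tool_dir` and `resource_dir` are hypothetical and not part of any GCC tooling; the root is assumed to be `/data/gcc`, as in the examples.

```python
from pathlib import PurePosixPath

# Root for all GCC-level directories (taken from the examples above).
GCC_ROOT = PurePosixPath("/data/gcc")


def tool_dir(toolname: str, version: str) -> PurePosixPath:
    """Return the folder for a tool, named toolname-version (hypothetical helper)."""
    return GCC_ROOT / "tools" / f"{toolname}-{version}"


def resource_dir(resource: str, version: str) -> PurePosixPath:
    """Return the folder for a resource, named resource-version (hypothetical helper)."""
    return GCC_ROOT / "resources" / f"{resource}-{version}"


if __name__ == "__main__":
    print(tool_dir("picard-tools", "1.32"))  # /data/gcc/tools/picard-tools-1.32
    print(resource_dir("hg", "19"))          # /data/gcc/resources/hg-19
```

Deriving paths from a name and a version, rather than typing them by hand, is one way to keep scripts consistent with the convention.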
                        | 18 |  |  | 
                        | 19 |  | === Pipeline Result Files Naming Convention === | 
                        | 20 |  | The following convention applies to all files that are generated by the pipeline. For containing folders, see sections above. | 
                        | 21 |  |  | 
                        | 22 |  | * General convention | 
                        | 23 |  | * Filenames are composed of tokens identifying their content. The tokens are separated by '.' and, if necessary, the words within a token can be separated by '_' for readability. | 
                        | 24 |  | * Except where they reference specific names that use another convention (ex: sample name), file names should be all lowercase. | 
                        | 25 |  | * Sample-level files should be named using: ''sample_name.step_id.step_name.genome_build.time_stamp.extension'' | 
                        | 26 |  | * Ex: A vcf file for the sample A2a produced by the step vc02 (step 2 of variant calling) with the tool !UnifiedGenotyper using genome build human_g1k_v37 on a run that began on February 1st 2011 at 12:00 should be named: ''A2a.vc02.unified_genotyper.human_g1k_v37.2011_02_01_12_00.snp'' | 
                        | 27 |  | * Lane-level files should be named using: ''sample_name.lane_name.step_id.step_name.genome_build.time_stamp.extension'' | 
                        | 28 |  | * Ex: A bam file for the lane FC20005_L1 of the sample A2a produced by the step pe03 (step 3 of paired-end alignment) with the tool BWA sampe using genome build human_g1k_v37 on a run that began on February 1st 2011 at 12:00 should be named: ''A2a.FC20005_L1.pe03.bwa_sampe.human_g1k_v37.2011_02_01_12_00.bam'' | 
                        | 29 |  | * Log file names should correspond to their output counterparts and have the .log extension. | 
                        | 30 |  | * Ex: log file for the vcf sample-level step above should be: ''A2a.vc02.unified_genotyper.human_g1k_v37.2011_02_01_12_00.log'' | 
                        | 31 |  | * Ex: log file for the bam lane-level step above should be: ''A2a.FC20005_L1.pe03.bwa_sampe.human_g1k_v37.2011_02_01_12_00.log'' | 
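The naming convention above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the time stamp layout (year_month_day_hour_minute) is inferred from the sample-level example, and the extension is assumed to be a single token.

```python
from datetime import datetime
from typing import Optional


def time_stamp(started: datetime) -> str:
    """Format the run start time as year_month_day_hour_minute (assumed layout)."""
    return started.strftime("%Y_%m_%d_%H_%M")


def result_filename(sample_name: str, step_id: str, step_name: str,
                    genome_build: str, started: datetime, extension: str,
                    lane_name: Optional[str] = None) -> str:
    """Join the tokens with '.'; lane_name is only present for lane-level files."""
    tokens = [sample_name]  # sample names keep their own convention (case preserved)
    if lane_name is not None:
        tokens.append(lane_name)  # lane names also keep their own convention
    tokens += [step_id.lower(), step_name.lower(), genome_build,
               time_stamp(started), extension.lower()]
    return ".".join(tokens)


def log_filename(result_name: str) -> str:
    """Swap the last token (the extension) of a result file name for 'log'."""
    stem, _, _ = result_name.rpartition(".")
    return stem + ".log"


if __name__ == "__main__":
    run_start = datetime(2011, 2, 1, 12, 0)
    vcf_name = result_filename("A2a", "vc02", "unified_genotyper",
                               "human_g1k_v37", run_start, "snp")
    print(vcf_name)                # A2a.vc02.unified_genotyper.human_g1k_v37.2011_02_01_12_00.snp
    print(log_filename(vcf_name))  # A2a.vc02.unified_genotyper.human_g1k_v37.2011_02_01_12_00.log
    # Lane-level file: same tokens, with the lane name after the sample name.
    print(result_filename("A2a", "pe03", "bwa_sampe", "human_g1k_v37",
                          run_start, "bam", lane_name="FC20005_L1"))
```

Because the log name is derived from the result name, the two cannot drift apart when a token changes.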
                        | 32 |  | == Logging == | 
                        | 33 |  | The logging strategy is currently under development but will be composed of both file logs and database entries in a Molgenis platform. The status is described below. | 
                        | 34 |  |  | 
                        | 35 |  | === Log Files === | 
                        | 36 |  | * At each step of the pipeline a single log is produced and contains: | 
                        | 37 |  | * PBS out and err | 
                        | 38 |  | * Tool out and err | 
                        | 39 |  | * Other tool-produced logs where applicable | 
                        | 40 |  | * For log file naming, see section above. | 
                        | 41 |  |  | 
                        | 42 |  | === Molgenis === | 
                        | 43 |  | The Molgenis platform will be used to provide a more advanced and general view of the status of the pipeline runs (including different views, sorting, etc.). The current status is: | 
                        | 44 |  |  | 
                        | 45 |  | * Molgenis instance created with proposed model | 
                        | 46 |  | * Scripts for insertion under development | 
                      
                        |  | 6 | * DataManagement/ProjectResources - where the resources and tools used by the pipelines are stored | 
                        |  | 7 | * DataManagement/FileNameConventions - how files are named so that we understand each other |