wiki:DataManagement/ProjectData

Version 5 (modified by laurent, 13 years ago)

--

~/gcc/groups/gonl

Directory structure for data management. On the Groningen cluster this is target/gpfs2/gcc/groups/gonl. The gonl group has read permission on everything; some folders are also writable by the group.

  • /tools
    • All software, scripts and tools used to process the data.
    • Note that all users can install tools of common interest in this shared directory; each tool should specify its version, since multiple versions of the same tool typically coexist.
  • /resources
    • All resources needed for data processing, including genome references, dbSNP releases, etc.
    • Note that all users can put resources of common interest here.
  • /home
    • one private folder per member of this group
  • /general
    • presentations, publications, other stuff
  • /projects
    • /batchX
      • /rawdata
        • here is a list of fq.gz files
      • /results
        • /alignment
          • here is a list of bam files
        • /stats
          • here is one file per QC tool
        • /snp
          • here is one vcf file per analysis run
      • /logs
      • /intermediate_results
        • whatever is needed, will be empty at end of project
    • /gwas_data
      • /rawdata
        • the originally provided PLINK or GenomeStudio files
      • /results
        • here are the cleaned-up genotypes in the agreed-upon format
      • /logs
    • /groningen_immunochip
      • /rawdata
      • /results
    • /pilot
      • /rawdata
        • /alignment
          • symlinks to the raw alignments used -> /first_batch/rawdata/some.aligned.cleaned.bam
      • /result
        • /snp
        • /indel
        • /cnv
      • /logs
    • /bgi
      • /batchX
        • A set of compressed files containing the plain text data, plus md5 files, for download. These are named as follows: timestamp.BGI.batchX.data_type.hg1X.data_format.tar.bz2. All plain text data should be available as a compressed file, including but not limited to: CNV, InDel, InDel annotations, SNP, SNP annotation. Some of these are available in multiple formats; see the BGI data page for more explanation about the BGI data and its formats. md5 checksum files are also available for all files.
      • batchX/bam OR batchX/alignment
        • The BAM files aligned by BGI
      • batchX/CNV
        • CNVs in CNV Detector format. To download the data for all samples, use the compressed archive from batchX/
      • batchX/indel
        • InDels in samtools pileup format. To download the data for all samples, use the compressed archive from batchX/
      • batchX/indel_annotation
        • InDel annotations in GFF format. To download the data for all samples, use the compressed archive from batchX/
      • batchX/SNP
        • SNPs in SOAPsnp format. To download the data for all samples, use the compressed archive from batchX/
      • batchX/SNP_annotation
        • SNP annotations in GFF format. To download the data for all samples, use the compressed archive from batchX/
      • batchX/vcf_format/CNV
        • CNVs in VCF format. To download the data for all samples, use the compressed archive from batchX/
      • batchX/vcf_format/indel
        • InDels in VCF format. To download the data for all samples, use the compressed archive from batchX/
      • batchX/vcf_format/SNP
        • SNPs in VCF format. To download the data for all samples, use the compressed archive from batchX/
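
The archive naming scheme under /bgi/batchX (timestamp.BGI.batchX.data_type.hg1X.data_format.tar.bz2) and the accompanying md5 files lend themselves to a small check after download. The following is only a sketch, not a tool that exists in /tools: the function names parse_archive_name and md5_of_file are hypothetical, the regular expression assumes that no field contains a dot, and the file name in the usage note is made up for illustration. The layout of the md5 files is not described on this page, so the sketch only computes the checksum and leaves the comparison to the reader.

```python
import hashlib
import re
from typing import NamedTuple, Optional

# Assumed pattern, derived from the naming scheme on this page:
#   timestamp.BGI.batchX.data_type.hg1X.data_format.tar.bz2
# Assumption: individual fields never contain a dot.
ARCHIVE_RE = re.compile(
    r"^(?P<timestamp>[^.]+)\.BGI\.(?P<batch>batch[^.]+)\."
    r"(?P<data_type>[^.]+)\.(?P<genome_build>hg1\d)\."
    r"(?P<data_format>[^.]+)\.tar\.bz2$"
)


class ArchiveName(NamedTuple):
    """Fields encoded in a BGI archive file name."""
    timestamp: str
    batch: str
    data_type: str
    genome_build: str
    data_format: str


def parse_archive_name(filename: str) -> Optional[ArchiveName]:
    """Split an archive file name into its fields; None if it does not match."""
    match = ARCHIVE_RE.match(filename)
    if match is None:
        return None
    return ArchiveName(**match.groupdict())


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, parse_archive_name("20110301.BGI.batch1.SNP.hg18.vcf.tar.bz2") would yield batch "batch1", data type "SNP", genome build "hg18" and data format "vcf", while a name that does not follow the scheme returns None. The digest from md5_of_file can then be compared with the value in the corresponding md5 file.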