Version 8 (modified by laurent, 13 years ago)

UMCG SFTP (application20.target.rug.nl)

The SFTP server can be used to access most of the data on the UMCG cluster. Because bandwidth is limited, please download only the files you need, and download the compressed version of a file when one is available (this is usually the case for all plain text files).
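As a rough illustration of downloading only what you need, the sketch below builds a batch file for the OpenSSH sftp client (its -b option), with one get command per requested file. The host name comes from this page; the user name, the example file name and the helper names are hypothetical, and the sketch assumes paths without whitespace so that no quoting is needed.

```python
SFTP_HOST = "application20.target.rug.nl"  # host name given on this page


def build_sftp_batch(remote_paths, local_dir="."):
    """Return sftp batch-file text with one `get` per remote path.

    Assumption: paths contain no whitespace, so no quoting is needed.
    """
    for path in [local_dir, *remote_paths]:
        if any(ch.isspace() for ch in path):
            raise ValueError(f"whitespace in path not supported: {path!r}")
    lines = [f"lcd {local_dir}"]  # change the local download directory
    lines += [f"get {path}" for path in remote_paths]
    return "\n".join(lines) + "\n"


def sftp_command(user, batch_file):
    """Return the argument list for a non-interactive sftp run."""
    return ["sftp", "-b", batch_file, f"{user}@{SFTP_HOST}"]


if __name__ == "__main__":
    # Hypothetical file name under a directory listed on this page.
    wanted = ["/target/gpfs2/gcc/groups/gonl/sftp/resources/example.tar.bz2"]
    print(build_sftp_batch(wanted, local_dir="downloads"), end="")
    print(" ".join(sftp_command("your_user", "batch.txt")))
```

Batch mode is non-interactive, so it is normally combined with key-based authentication; whether that is enabled for a given account on this server is not stated here.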

Access:

There are two kinds of access to the gonl SFTP server: personal access and gonlsv access. Personal access gives access to almost all data, while gonlsv access gives access to only a limited subset of the data. gonlsv users who need specific data that is not available with their privileges should send an email to someone at UMCG to request it; upon approval, access to the data will be provided.

gonlsv users

/target/gpfs2/gcc/groups/gonl/sftp/

Root of the SFTP server for people using the limited gonlsv account.

/target/gpfs2/gcc/groups/gonl/sftp/A4

Contains all the information about the A4 test trio, including all the raw and aligned data.

/target/gpfs2/gcc/groups/gonl/sftp/pilot

Data from the pilot, including aligned BAMs and SNPs.

/target/gpfs2/gcc/groups/gonl/sftp/resources

GoNL resources tarball (Thanks Freerk!)

/target/gpfs2/gcc/groups/gonl/sftp/upload

This is where everyone has write permissions. This directory should be used for data exchange.

personal users

/target/gpfs2/gcc/groups/gonl/

Root of the gonl data.

/target/gpfs2/gcc/groups/gonl/projects

Root of the gonl projects. This is where all the raw data and results live. They are organized into projects; please have a look at the full data structure here: DataManagement/ProjectData

/target/gpfs2/gcc/groups/gonl/projects/bgi

Contains all the data coming from BGI, including their variant calls. The data is organized by batch in the batchX subfolders. Each of the subfolders typically contains the following:

  • batchX/
    • A set of compressed files containing the plain text data, provided for download. These are named as follows: timestamp.BGI.batchX.data_type.hg1X.data_format.tar.bz2. All plain text data should be available as a compressed file, including but not limited to: CNV, InDel, InDel annotations, SNP, SNP annotations. Some of these are available in multiple formats; see the BGI data page for more explanation of the BGI data and its formats.
    • md5 checksum files for all files.
  • batchX/bam OR batchX/alignment
    • The BAM files aligned by BGI
  • batchX/CNV
    • CNVs in CNV Detector format. To download them for all samples, please download the compressed archive from batchX/ instead.
  • batchX/indel
    • InDels in samtools pileup format. To download them for all samples, please download the compressed archive from batchX/ instead.
  • batchX/indel_annotation
    • InDel annotations in GFF format. To download them for all samples, please download the compressed archive from batchX/ instead.
  • batchX/SNP
    • SNPs in SOAPsnp format. To download them for all samples, please download the compressed archive from batchX/ instead.
  • batchX/SNP_annotation
    • SNP annotations in GFF format. To download them for all samples, please download the compressed archive from batchX/ instead.
  • batchX/vcf_format/CNV
    • CNVs in VCF format. To download them for all samples, please download the compressed archive from batchX/ instead.
  • batchX/vcf_format/indel
    • InDels in VCF format. To download them for all samples, please download the compressed archive from batchX/ instead.
  • batchX/vcf_format/SNP
    • SNPs in VCF format. To download them for all samples, please download the compressed archive from batchX/ instead.
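
The archive naming scheme and the md5 files described above lend themselves to a small check after download. The following is a minimal sketch, not a tool that exists on the server: it splits an archive name into its fields and compares a file's md5 digest with the value stored in an md5 file. It assumes that no field of the name contains a dot and that the md5 files are in md5sum style (hex digest as the first token); the function names and the example file name are hypothetical.

```python
import hashlib
import re

# Pattern for: timestamp.BGI.batchX.data_type.hg1X.data_format.tar.bz2
# Assumption: no individual field contains a dot.
ARCHIVE_RE = re.compile(
    r"^(?P<timestamp>[^.]+)\.BGI\.(?P<batch>batch[^.]+)\."
    r"(?P<data_type>[^.]+)\.(?P<genome>hg1[^.]+)\."
    r"(?P<data_format>[^.]+)\.tar\.bz2$"
)


def parse_archive_name(name):
    """Return the fields of an archive name as a dict, or None if no match."""
    match = ARCHIVE_RE.match(name)
    return match.groupdict() if match else None


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_path):
    """Read the digest from an md5 file.

    Assumption: md5sum-style content, i.e. the hex digest is the first
    whitespace-separated token in the file.
    """
    with open(md5_path) as handle:
        return handle.read().split()[0].lower()


def is_intact(archive_path, md5_path):
    """Return True if the file's digest matches the one in the md5 file."""
    return md5_of_file(archive_path) == expected_md5(md5_path)


if __name__ == "__main__":
    # Hypothetical file name that follows the documented scheme.
    print(parse_archive_name("20110101.BGI.batch1.SNP.hg19.vcf.tar.bz2"))
```

Because some names are inconsistent between batches, parse_archive_name returns None for a name that does not follow the scheme instead of raising an error.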

NOTES:

  • Unless specified otherwise, all data is aligned to hg19.
  • Some folder and file names are inconsistent from one batch to the next. This is because the original names as found on the BGI HD have been kept.
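
Since folder names vary between batches (for example batchX/bam versus batchX/alignment), a script that walks several batches may want to try the known alternatives rather than hard-code one name. This is a small sketch under stated assumptions: the helper name is hypothetical, the listing is passed in as plain names, and case-insensitive matching is an assumption meant to absorb naming differences, not something this page guarantees.

```python
# Alternative folder names documented on this page for the BGI-aligned BAMs.
BAM_DIR_CANDIDATES = ("bam", "alignment")


def find_dir(entries, candidates, ignore_case=True):
    """Return the first entry matching one of the candidates, else None.

    `entries` is the directory listing (plain names) of one batchX folder.
    Candidates are tried in order, so earlier names take precedence.
    """
    def norm(name):
        return name.lower() if ignore_case else name

    by_key = {}
    for entry in entries:
        by_key.setdefault(norm(entry), entry)  # keep the first spelling seen
    for candidate in candidates:
        found = by_key.get(norm(candidate))
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    # Hypothetical listing of one batch folder.
    listing = ["CNV", "SNP", "alignment", "indel"]
    print(find_dir(listing, BAM_DIR_CANDIDATES))
```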