Version 4 (modified by laurent, 13 years ago)

UMCG SFTP (application20.target.rug.nl)

The SFTP server can be used to access most of the data on the UMCG cluster. Because bandwidth is limited, please download only the files you actually need, and download the compressed versions of files when they are available (this is usually the case for all plain text files).
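
As a rough illustration of the advice above, the following Python sketch filters a directory listing so that a plain file is skipped whenever a compressed copy of it is also present. The function name prefer_compressed is hypothetical, and the idea that compressed copies are named by appending .gz or .bz2 to the original name is an assumption; the actual naming on the server may differ.

```python
# Hypothetical helper: the suffix convention below is an assumption,
# not something this page or the server guarantees.
COMPRESSED_SUFFIXES = (".gz", ".bz2")


def prefer_compressed(names):
    """Return the names worth downloading from a directory listing.

    A plain file is dropped when the listing also contains a compressed
    copy of it (same name plus one of COMPRESSED_SUFFIXES). The original
    order of the listing is preserved.
    """
    available = set(names)
    selected = []
    for name in names:
        has_compressed_copy = any(
            name + suffix in available for suffix in COMPRESSED_SUFFIXES
        )
        if not has_compressed_copy:
            selected.append(name)
    return selected
```

The function only works on a list of names, so it can be fed the output of whatever SFTP client is used to list a remote directory.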

/target/gpfs2/gcc/groups/gonl/sftp/

Root of the SFTP server.

/target/gpfs2/gcc/groups/gonl/sftp/A4

Contains all the information about the A4 test trio, including all the raw and aligned data.

/target/gpfs2/gcc/groups/gonl/sftp/BGI

Contains all the data coming from BGI, including their variant calls. The data is organized by batch in the batchX subfolders. Each of the subfolders typically contains the following:

  • batchX/
    • A set of compressed files, intended for download, containing the plain text data and md5 files. These are named as follows: timestamp.BGI.batchX.data_type.hg1X.data_format.tar.bz2.
    • All plain text data should be available as a compressed file, including but not limited to: CNV, InDel, InDel annotations, SNP, SNP annotations. Some of these are available in multiple formats; see the BGI data page for more explanation of the BGI data and its formats.
    • md5 checksum files for all files.
  • batchX/bam OR batchX/alignment
    • The BAM files aligned by BGI.
  • batchX/CNV
    • CNVs in CNV Detector format. To download the data for all samples, please download the compressed archive from batchX/
  • batchX/indel
    • InDels in samtools pileup format. To download the data for all samples, please download the compressed archive from batchX/
  • batchX/indel_annotation
    • InDel annotations in GFF format. To download the data for all samples, please download the compressed archive from batchX/
  • batchX/SNP
    • SNPs in SOAPsnp format. To download the data for all samples, please download the compressed archive from batchX/
  • batchX/SNP_annotation
    • SNP annotations in GFF format. To download the data for all samples, please download the compressed archive from batchX/
  • batchX/vcf_format/CNV
    • CNVs in VCF format. To download the data for all samples, please download the compressed archive from batchX/
  • batchX/vcf_format/indel
    • InDels in VCF format. To download the data for all samples, please download the compressed archive from batchX/
  • batchX/vcf_format/SNP
    • SNPs in VCF format. To download the data for all samples, please download the compressed archive from batchX/
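
The archive naming pattern and the md5 files described above could be handled along the lines of the following Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the parser assumes that no field other than the trailing tar.bz2 contains a dot, and the checksum check assumes the md5 files use the usual md5sum layout (hex digest as the first token on the first line). Neither assumption is confirmed by this page.

```python
import hashlib
import re

# Pattern from this page:
#   timestamp.BGI.batchX.data_type.hg1X.data_format.tar.bz2
# Assumption: the individual fields do not themselves contain dots.
ARCHIVE_RE = re.compile(
    r"^(?P<timestamp>[^.]+)\.BGI\.(?P<batch>batch[^.]+)\."
    r"(?P<data_type>[^.]+)\.(?P<build>hg1\d)\.(?P<data_format>[^.]+)\.tar\.bz2$"
)


def parse_archive_name(filename):
    """Split an archive name into its fields, or return None if it does not match."""
    match = ARCHIVE_RE.match(filename)
    return match.groupdict() if match else None


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(path, md5_path):
    """Check a downloaded file against its checksum file.

    Assumption: the checksum file follows the md5sum layout, so the hex
    digest is the first whitespace-separated token on the first line.
    """
    with open(md5_path) as handle:
        expected = handle.readline().split()[0].lower()
    return md5_of_file(path) == expected
```

Reading the file in chunks keeps memory use flat for large archives. A name that does not follow the pattern yields None rather than an exception, so the caller can decide how to treat it.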

NOTES:

  • Unless specified otherwise, all data is aligned to hg19.
  • Some folder and file names are inconsistent from one batch to another. This is because the original names found on the BGI hard drive have been kept.
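
Because names vary between batches, a script that walks the batch folders may need to try several candidate names. The Python sketch below shows one way to do this for the BAM folder, using the two names listed on this page (bam and alignment). The function name find_subfolder is hypothetical, and matching case-insensitively is an assumption about how the names vary.

```python
# Alternative names taken from this page ("batchX/bam OR batchX/alignment").
# Other variants may exist on the server; extend the tuple as needed.
BAM_DIR_CANDIDATES = ("bam", "alignment")


def find_subfolder(listing, candidates=BAM_DIR_CANDIDATES):
    """Return the first entry of `listing` that matches a candidate name.

    Matching ignores case (an assumption). The entry is returned exactly
    as it appears in the listing; None is returned when nothing matches.
    """
    by_lowercase = {entry.lower(): entry for entry in listing}
    for candidate in candidates:
        found = by_lowercase.get(candidate.lower())
        if found is not None:
            return found
    return None
```

For example, find_subfolder(["CNV", "alignment", "SNP"]) returns "alignment", and a listing with neither name returns None so the caller can report the batch as needing manual inspection.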

/target/gpfs2/gcc/groups/gonl/sftp/pilot

Data from the pilot, including aligned BAMs and SNPs.

/target/gpfs2/gcc/groups/gonl/sftp/resources

GoNL resources tarball (Thanks Freerk!)

/target/gpfs2/gcc/groups/gonl/sftp/upload

Everyone has write permission in this directory. It should be used for data exchange.