Version 1 (modified by jamverlouw, 8 years ago)

Replication test

The replication test was run on the Shark cluster in Leiden on 20 LLS samples. The pipeline is attached to this page. Four output files per sample were compared by md5sum: the exon, gene and transcript counts, and the BAM file (sorted and marked for duplicates).
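The md5sum comparison described above can be sketched in Python as follows. This is a minimal illustration, not the script used for the test; the function names and the chunk size are assumptions made for the example.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_checksum(path_a, path_b):
    """True when both files have the same MD5 digest (hypothetical helper)."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Calling `same_checksum(original, replicate)` for each of the four files of a sample gives the same yes/no answer as comparing the output of `md5sum` by eye.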

Exon and gene counts

The initial comparison showed the exon counts to be identical. The gene counts were also identical once the header was removed.
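A header-insensitive comparison of the gene counts could look like the sketch below. It assumes the header is a single leading line; the page does not say how long the header is, so `header_lines` is a hypothetical parameter to adjust.

```python
import hashlib


def md5_without_header(path, header_lines=1):
    """Hex MD5 digest of a file, skipping the first `header_lines` lines.

    header_lines=1 is an assumption; the actual header length is not
    stated on this page.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for index, line in enumerate(handle):
            if index < header_lines:
                continue  # ignore header content when hashing
            digest.update(line)
    return digest.hexdigest()
```

Two gene count files whose `md5_without_header` digests match differ only in their header.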

BAM files

For comparison, the BAMs were converted to SAM files, which are easier to read. These showed that different reads at the same position were ordered differently. The content was the same and the numerical sort on position was still correct, but Samtools appears to order reads arbitrarily when multiple reads are reported at a single position. Re-sorting the SAM with Linux command-line tools confirmed this: the sorted output was identical for the randomly selected test sample BC1KBKACXX-4-22 LLS-346-130804.
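The order-insensitive check can be sketched in Python: sort all lines of each SAM file and compare the sorted results. This is an illustration only, not the command used for the test, and it assumes each file fits in memory; the function names are hypothetical.

```python
def sorted_lines(path):
    """Return all lines of a text file in sorted order."""
    with open(path, "r") as handle:
        return sorted(handle)


def same_records_any_order(path_a, path_b):
    """True when both files contain the same lines, regardless of their order."""
    return sorted_lines(path_a) == sorted_lines(path_b)
```

If `same_records_any_order` returns `True` while the md5 sums of the unsorted files differ, the files differ only in the order of their lines.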

Transcript counts

The files differ in 227 of 193865 lines. The values themselves differ, so this is not due to headers or other static properties of the files. An example of a difference:

1 protein_coding transcript 44435680 44438393 . + . transcript_id "ENST00000412950"; locus_id "1:44435672-44439041W"; gene_id "ENSG00000132768"; reads 0.057163; length 1596; RPKM 0.000932
1 protein_coding transcript 44435680 44438393 . + . transcript_id "ENST00000412950"; locus_id "1:44435672-44439041W"; gene_id "ENSG00000132768"; reads 2.282137; length 1596; RPKM 0.037210
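To list which transcripts differ and by how much, the two transcript count files can be compared line by line, as in the sketch below. It assumes both files have the same number of lines in the same order, and that every differing line is a record with eight whitespace-separated fields followed by `key value;` attributes, as in the example above. The function names are hypothetical.

```python
def parse_attributes(line):
    """Parse the attribute column (ninth field) of a record into a dict."""
    attribute_text = line.split(None, 8)[8]
    attributes = {}
    for item in attribute_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, value = item.split(None, 1)
        attributes[key] = value.strip('"')  # drop quotes around IDs
    return attributes


def differing_reads(path_a, path_b):
    """Return (transcript_id, reads_a, reads_b) for every line that differs.

    Assumes the two files are line-aligned: record N in one file
    describes the same transcript as record N in the other.
    """
    differences = []
    with open(path_a) as file_a, open(path_b) as file_b:
        for line_a, line_b in zip(file_a, file_b):
            if line_a == line_b:
                continue
            attrs_a = parse_attributes(line_a)
            attrs_b = parse_attributes(line_b)
            differences.append(
                (attrs_a["transcript_id"], float(attrs_a["reads"]), float(attrs_b["reads"]))
            )
    return differences
```

The length of the returned list is the number of differing lines, which for the files compared here would be expected to match the 227 reported above if the alignment assumption holds.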

Replication outcome

File               Result
BAM file           20x deviant
SAM from BAM       20x deviant
Linux-sorted SAM   identical for the tested sample
Exon count         20x identical
Transcript count   20x deviant
Gene count         20x identical (apart from header)

Samples used for test:

AD1NAMACXX-1-6 LLS-453-130804, AD1NAMACXX-5-14 LLS-640-130804, AD1NAMACXX-8-4 LLS-786-130804, AD1NE2ACXX-2-27 LLS-815-130804, AD1NE2ACXX-3-11 LLS-64-130804, AD1NE2ACXX-3-8 LLS-81-130804, AD1NE2ACXX-4-15 LLS-128-130804, AD1NE2ACXX-5-1 LLS-368-130804, AD1NE2ACXX-6-11 LLS-113-130804, AD1NE2ACXX-6-7 LLS-187-130804, AD1NFNACXX-4-10 LLS-90-130804, AD1NFNACXX-6-1 LLS-15-130804, AD1NFNACXX-7-12 LLS-731-130804, BC1KBKACXX-2-19 LLS-499-130804, BC1KBKACXX-4-22 LLS-346-130804, BC1KBKACXX-5-5 LLS-345-130804, BC1KBKACXX-8-16 LLS-375-130804, BD1NW4ACXX-7-12 LLS-34-130804, BD1NW4ACXX-8-25 LLS-79-130804, BD1NYRACXX-1-5 LLS-435-130804