| 1 | |
| 2 | = Replication test = |
| 3 | |
| 4 | The replication test was run on the Shark cluster in Leiden on 20 LLS samples. The pipeline is attached to this page. For each sample, four output files were compared by md5sum: the exon, gene and transcript counts, and the BAM file (sorted and marked for duplicates). |
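The md5sum comparison described above can be sketched in Python as follows. This is only an illustration, not the script that was used: the function names and the idea of one directory per run are assumptions.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_runs(original_dir, replicate_dir, file_names):
    """Map each file name to True when both runs produced byte-identical files.

    The directory layout (one directory per run, same file names in both)
    is a hypothetical convention chosen for this sketch.
    """
    return {
        name: md5_of_file(Path(original_dir) / name)
        == md5_of_file(Path(replicate_dir) / name)
        for name in file_names
    }
```

A matching digest means the two files are byte-identical; a mismatch only says that they differ somewhere, not where or by how much.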
| 5 | |
| 6 | '''Exon and gene counts''' |
| 7 | |
| 8 | The exon counts were identical between the two runs. The gene counts were also identical once the header was removed. |
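A header that differs between runs makes a whole-file checksum fail even when the data are the same. A minimal sketch of a checksum that ignores the header, assuming the header is exactly the first line of the gene count file (the page does not state its length, so `header_lines` is a guess to adjust):

```python
import hashlib
from itertools import islice


def md5_without_header(path, header_lines=1):
    """Hex MD5 of a file's content after skipping the first `header_lines` lines.

    header_lines=1 is an assumption; set it to the real header length.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for line in islice(handle, header_lines, None):
            digest.update(line)
    return digest.hexdigest()
```

Two gene count files can then be considered identical "apart from the header" when `md5_without_header` returns the same digest for both.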
| 9 | |
| 10 | '''BAM files''' |
| 11 | |
| 12 | For comparison, the BAM files were converted to SAM files, which are easier to read. This showed that reads mapped to the same position were written in a different order in the two runs. The content was the same and the numerical sort on position was still correct, but Samtools appears to use an arbitrary order when several reads are reported at a single position. Sorting both SAM files with standard Linux command-line tools confirmed this: the sorted output was identical for the randomly selected test sample BC1KBKACXX-4-22 LLS-346-130804. |
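Sorting both files and comparing the results amounts to checking that they contain the same lines with the same multiplicity, regardless of order. A minimal Python sketch of that check (the function name is hypothetical, and this is not the command that was actually run):

```python
from collections import Counter


def same_lines_any_order(path_a, path_b):
    """True when both files hold the same lines with the same multiplicity.

    Equivalent to sorting both files and comparing them line by line.
    """
    with open(path_a) as a, open(path_b) as b:
        return Counter(a) == Counter(b)
```

This keeps every distinct line in memory, so for full-size SAM files an on-disk sort is likely the more practical route; the sketch only makes explicit what "identical after sorting" means.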
| 13 | |
| 14 | '''Transcript counts''' |
| 15 | |
| 16 | The transcript count files differ in 227 of 193865 lines. The values themselves differ, so the differences are not caused by headers or other static content in the files. An example of a difference: |
| 17 | |
| 18 | {{{ |
| 19 | #!div style="font-size: 80%" |
| 20 | 1 protein_coding transcript 44435680 44438393 . + . transcript_id "ENST00000412950"; locus_id "1:44435672-44439041W"; gene_id "ENSG00000132768"; reads 0.057163; length 1596; RPKM 0.000932 |
| 21 | 1 protein_coding transcript 44435680 44438393 . + . transcript_id "ENST00000412950"; locus_id "1:44435672-44439041W"; gene_id "ENSG00000132768"; reads 2.282137; length 1596; RPKM 0.037210 |
| 22 | }}} |
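Differences like the one above can be located and quantified with a short script. The sketch below assumes both files list the same transcripts in the same order, and that the `reads` attribute is formatted as in the example; the function names are hypothetical.

```python
import re

# Matches the "reads <number>" attribute as it appears in the example lines.
READS_RE = re.compile(r"\breads ([0-9.eE+-]+)")


def differing_lines(path_a, path_b):
    """Return (line_number, line_a, line_b) for every pair of lines that differ.

    Assumes both files have the same number of lines in the same order;
    zip() stops silently at the end of the shorter file.
    """
    with open(path_a) as a, open(path_b) as b:
        return [
            (number, line_a.rstrip("\n"), line_b.rstrip("\n"))
            for number, (line_a, line_b) in enumerate(zip(a, b), start=1)
            if line_a != line_b
        ]


def reads_value(line):
    """Extract the numeric `reads` attribute from a line, or None if absent."""
    match = READS_RE.search(line)
    return float(match.group(1)) if match else None
```

Applying `reads_value` to both lines of each differing pair gives the size of the deviation per transcript, which helps to tell rounding noise apart from substantial differences such as the one shown.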
| 23 | |
| 24 | '''Replication outcome''' |
| 25 | |
| 26 | || BAM file || 20x different ||
| 27 | || SAM from BAM || 20x different ||
| 28 | || Linux-sorted SAM || '''Identical for tested sample''' ||
| 29 | || Exon count || '''20x identical''' ||
| 30 | || Transcript count || 20x different ||
| 31 | || Gene count || '''20x identical''' (apart from header) ||
| 32 | |
| 33 | Samples used for the test: |
| 34 | {{{ |
| 35 | #!div style="font-size: 80%" |
| 36 | AD1NAMACXX-1-6 LLS-453-130804, |
| 37 | AD1NAMACXX-5-14 LLS-640-130804, |
| 38 | AD1NAMACXX-8-4 LLS-786-130804, |
| 39 | AD1NE2ACXX-2-27 LLS-815-130804, |
| 40 | AD1NE2ACXX-3-11 LLS-64-130804, |
| 41 | AD1NE2ACXX-3-8 LLS-81-130804, |
| 42 | AD1NE2ACXX-4-15 LLS-128-130804, |
| 43 | AD1NE2ACXX-5-1 LLS-368-130804, |
| 44 | AD1NE2ACXX-6-11 LLS-113-130804, |
| 45 | AD1NE2ACXX-6-7 LLS-187-130804, |
| 46 | AD1NFNACXX-4-10 LLS-90-130804, |
| 47 | AD1NFNACXX-6-1 LLS-15-130804, |
| 48 | AD1NFNACXX-7-12 LLS-731-130804, |
| 49 | BC1KBKACXX-2-19 LLS-499-130804, |
| 50 | BC1KBKACXX-4-22 LLS-346-130804, |
| 51 | BC1KBKACXX-5-5 LLS-345-130804, |
| 52 | BC1KBKACXX-8-16 LLS-375-130804, |
| 53 | BD1NW4ACXX-7-12 LLS-34-130804, |
| 54 | BD1NW4ACXX-8-25 LLS-79-130804, |
| 55 | BD1NYRACXX-1-5 LLS-435-130804 |
| 56 | }}} |