
Version 9 (modified by pacthoen, 10 years ago)

Gene, exon and transcript counts

Counts and Spearman correlations for run 1

Date: 06-November-2013
Analysis by: Peter-Bram 't Hoen

The combined gene counts for the 2330 samples from run 1 are available on the VM at /virdir/Backup/run_1_gene_counts/combined_gene_count_run_1.txt. They were generated with this script: R script for merging gene count tables
Subsequently, pairwise Spearman correlations between all samples were calculated: /virdir/Backup/run_1_gene_counts/Spearman_correlations_complete_gene_data_run_1.txt
From these, the median Spearman correlation of each sample with all other samples was calculated; this median is also called the D-statistic. The D-statistics (ranked from low to high) can be found in this file: Median Spearman correlations
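
As an illustration of the D-statistic, here is a minimal Python sketch (not the script used for this analysis, which was run in R). It assumes the counts are held as one list per sample with genes in the same order, and it excludes each sample's correlation with itself from the median, which is one reading of "each other sample". The sample names and counts are hypothetical toy data.

```python
from statistics import median

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_matrix(counts):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    names = list(counts)
    ranks = {s: average_ranks(counts[s]) for s in names}
    corr = [[1.0 if a == b else pearson(ranks[a], ranks[b]) for b in names]
            for a in names]
    return names, corr

def d_statistics(names, corr):
    """Median correlation of each sample with every other sample."""
    return {
        name: median(corr[i][j] for j in range(len(names)) if j != i)
        for i, name in enumerate(names)
    }

# Hypothetical toy data: four samples, five genes; s4 is a deliberate outlier.
counts = {
    "s1": [10, 20, 30, 40, 50],
    "s2": [12, 18, 33, 41, 60],
    "s3": [11, 25, 29, 45, 52],
    "s4": [50, 40, 30, 20, 10],
}
names, corr = spearman_matrix(counts)
dstat = d_statistics(names, corr)

# Rank from low to high, as in the D-statistics file; s4 comes first.
for name in sorted(dstat, key=dstat.get):
    print(name, round(dstat[name], 3))
```

Samples at the top of the ranked list correlate poorly with the rest of the run and are candidates for closer inspection.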

Boxplot of median Spearman correlations grouped by flowcell (Martijn Vermaat)

Boxplot of median Spearman correlations grouped by biobank (Dstat_biobank_boxplot.pdf)

After removing the two samples with very low Spearman correlations to all other samples, a distance matrix was calculated as 1 minus the correlation matrix, and a two-dimensional MDS plot was created from it using the R function cmdscale. This is the resulting mdsplot. The points in the plot were colored according to the following color scheme:
"LL" - gold
"RS" - blue
"CODAM" - orange
"LLS" - pink
"Amsterdam" - darkred
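
The distance and MDS step can be sketched as follows. This is a NumPy stand-in for classical (Torgerson) MDS, the method behind R's cmdscale, and not the code used for the plot itself. The correlation matrix and group labels are hypothetical toy values, and clipping negative eigenvalues to zero is a choice made for this sketch.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Classical MDS: embed an n x n distance matrix in k dimensions."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain an inner-product matrix.
    b = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    # Scale eigenvectors by sqrt(eigenvalue); negatives are clipped (assumption).
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# Hypothetical Spearman correlation matrix for four samples.
corr = np.array([
    [1.00, 0.90, 0.80, 0.70],
    [0.90, 1.00, 0.85, 0.75],
    [0.80, 0.85, 1.00, 0.95],
    [0.70, 0.75, 0.95, 1.00],
])
dist = 1.0 - corr                    # distance = 1 - correlation
coords = classical_mds(dist, k=2)    # one (x, y) row per sample

# Color scheme from the list above; the per-sample labels are toy values.
group_colors = {"LL": "gold", "RS": "blue", "CODAM": "orange",
                "LLS": "pink", "Amsterdam": "darkred"}
groups = ["LL", "RS", "CODAM", "LLS"]
point_colors = [group_colors[g] for g in groups]
```

The rows of coords, together with point_colors, can be passed to any scatter plot routine. The signs of the axes are arbitrary, so a plot made this way may appear mirrored relative to one made with cmdscale.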

The same MDS plot, now colored according to mean GC percentage: mdsplot GC
