
Version 10 (modified by pacthoen, 10 years ago)

--

Gene, exon and transcript counts

Counts and Spearman correlations for run 1

Date: 6 November 2013
Analysis by: Peter-Bram 't Hoen

The combined gene counts for the 2330 samples from run 1 are available on the VM at /virdir/Backup/run_1_gene_counts/combined_gene_count_run_1.txt. They were generated using this script: R script for merging gene count tables
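As an illustration of the merging step only, here is a minimal Python sketch (the actual merge was done with the R script linked above, which is not reproduced here). It assumes each sample has a two-column, tab-separated table of gene and count, and it fills genes missing from a sample with 0; both the input layout and the zero-fill are assumptions, and the names `read_counts`, `merge_counts`, and the toy samples are hypothetical.

```python
import csv
import io

def read_counts(handle):
    """Read a two-column (gene, count) tab-separated table into a dict.
    The two-column layout is an assumption, not taken from the original script."""
    return {row[0]: int(row[1]) for row in csv.reader(handle, delimiter="\t") if row}

def merge_counts(per_sample):
    """Merge {sample: {gene: count}} into (genes, samples, matrix).
    matrix[g][s] is the count of gene g in sample s; genes absent from a
    sample are filled with 0 (an assumption made for this sketch)."""
    samples = sorted(per_sample)
    genes = sorted(set().union(*(counts.keys() for counts in per_sample.values())))
    matrix = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, matrix

# Toy input standing in for per-sample count files (hypothetical contents).
toy = {
    "sampleA": read_counts(io.StringIO("geneX\t10\ngeneY\t0\n")),
    "sampleB": read_counts(io.StringIO("geneX\t7\ngeneZ\t3\n")),
}
genes, samples, matrix = merge_counts(toy)
```

In practice the `io.StringIO` objects would be replaced by opened count files, one per sample.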
Subsequently, pairwise Spearman correlations between all samples were calculated: /virdir/Backup/run_1_gene_counts/Spearman_correlations_complete_gene_data_run_1.txt
From these, the median Spearman correlation of each sample with all other samples was calculated; this median is also called the D-statistic. The D-statistics, ranked from low to high, can be found in this file: Median Spearman correlations
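The D-statistic computation can be sketched in plain Python as follows. This is not the code used for the analysis (which was done in R); it is a small, standard-library-only illustration in which Spearman correlation is computed as the Pearson correlation of average ranks. The function names and the toy data are hypothetical.

```python
from statistics import median

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; fails with a division by zero for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def d_statistics(columns):
    """columns maps sample -> list of gene counts (same gene order everywhere).
    Returns sample -> median Spearman correlation with all other samples."""
    ranked = {s: average_ranks(v) for s, v in columns.items()}
    return {
        s: median(pearson(ranked[s], ranked[t]) for t in ranked if t != s)
        for s in ranked
    }

# Toy data: s4 is anti-correlated with the rest and gets the lowest D-statistic.
toy = {
    "s1": [1, 2, 3, 4, 5],
    "s2": [2, 4, 6, 8, 10],
    "s3": [1, 3, 2, 5, 4],
    "s4": [5, 4, 3, 2, 1],
}
d = d_statistics(toy)
ranked_low_to_high = sorted(d, key=d.get)
```

Samples at the start of `ranked_low_to_high` are the ones that correlate poorly with everything else, which is how outlier samples show up in the ranked list.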

Boxplot of median Spearman correlations grouped by flowcell (Martijn Vermaat)

Boxplot of median Spearman correlations grouped by biobank

After removing the two samples with very low Spearman correlations to all other samples, a distance matrix was calculated as 1 minus the correlation matrix, and a two-dimensional MDS plot was created using the R function cmdscale. This is the resulting plot: mdsplot. The plot was colored according to the following color scheme:
"LL" - gold
"RS" - blue
"CODAM" - orange
"LLS" - pink
"Amsterdam" - darkred
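For readers who want to see what the MDS step does, here is a rough Python sketch of classical (metric) MDS on a 1 - correlation distance matrix. The analysis itself used R's cmdscale; this sketch swaps in a simple power iteration with deflation instead of a full eigendecomposition, so it picks the eigenvalues of largest magnitude and can differ from cmdscale when the distance matrix is far from Euclidean. All function names and the toy correlation matrix are hypothetical; only the color scheme is taken from the list above.

```python
import math

# Color scheme as listed above.
COLOR_SCHEME = {"LL": "gold", "RS": "blue", "CODAM": "orange",
                "LLS": "pink", "Amsterdam": "darkred"}

def double_center(dist):
    """Centered Gram matrix B = -1/2 * J * D^2 * J for a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

def dominant_eigenpair(mat, iterations=200, tol=1e-12):
    """Power iteration: (eigenvalue, unit eigenvector) of largest magnitude."""
    n = len(mat)
    vec = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < tol:  # matrix is (numerically) zero in this direction
            return 0.0, [0.0] * n
        vec = [x / norm for x in nxt]
    image = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(vec, image)), vec

def classical_mds(dist, k=2):
    """Return n rows of k coordinates approximating the given distances."""
    mat = double_center(dist)
    n = len(mat)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        value, vec = dominant_eigenpair(mat)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i][axis] = vec[i] * scale
        # Deflate: remove the component just found before the next axis.
        mat = [[mat[i][j] - value * vec[i] * vec[j] for j in range(n)]
               for i in range(n)]
    return coords

# Toy correlation matrix for three samples (hypothetical values).
corr = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 1.0]]
dist = [[1.0 - c for c in row] for row in corr]
coords = classical_mds(dist, k=2)
```

Each row of `coords` gives the x and y position of one sample; looking up each sample's group in `COLOR_SCHEME` then gives the point color for plotting.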

The same MDS plot, now colored according to mean GC percentage: mdsplot GC
