= Gene, exon and transcript counts =

== Counts and Spearman correlations for run 1 ==

Date: 06-November-2013[[BR]]
Analysis by: Peter-Bram 't Hoen[[BR]]
The combined gene counts for the 2330 samples from run 1 are available on the VM: /virdir/Backup/run_1_gene_counts/combined_gene_count_run_1.txt and were generated using this script: [raw-attachment:merge_count_script.r R script for merging gene count tables][[BR]]
Subsequently, pairwise Spearman correlations were calculated: /virdir/Backup/run_1_gene_counts/Spearman_correlations_complete_gene_data_run_1.txt[[BR]]
From these, the median Spearman correlation of each sample to all other samples was calculated. This median is also called the D-statistic. The D-statistics (ranked from low to high) can be found in this file: [raw-attachment:Median_pairwise_spearman_correlations_complete_gene_data_run_1.txt Median Spearman correlations][[BR]]
[raw-attachment:Median_pairwise_spearman_correlations_by_flowcell_complete_gene_data_run_1.pdf Boxplot of median Spearman correlations grouped by flowcell] (Martijn Vermaat)[[BR]]
[raw-attachment:Dstat_biobank_boxplot.pdf Boxplot of median Spearman correlations grouped by biobank][[BR]]

After removing the two samples with very low Spearman correlations to all other samples, the distance matrix was calculated (1 - correlation matrix), and a two-dimensional MDS plot was created using the R function cmdscale. [raw-attachment:mdsplot_filt_colored_biobank.pdf This is the resulting MDS plot]. The plot was colored according to the following color scheme:[[BR]]
"LL" - gold[[BR]]
"RS" - blue[[BR]]
"CODAM" - orange[[BR]]
"LLS" - pink[[BR]]
"Amsterdam" - darkred[[BR]]

The same MDS plot, but colored according to mean GC percentage: [raw-attachment:mdsplot_filt_colored_gc.pdf MDS plot GC]
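
The D-statistic and distance matrix steps described above can be illustrated with a minimal sketch. This is not the script used for the analysis (that was done in R): it is a pure-Python illustration on made-up toy counts. The sample names, the toy values and the cutoff `D_STAT_CUTOFF` are assumptions for the example only; the page does not state which cutoff was used to remove the two low-correlation samples.

```python
from statistics import median


def rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_matrix(counts):
    """Pairwise Spearman correlations: Pearson correlation of the ranks.

    counts maps sample name -> list of gene counts, same gene order per sample.
    """
    names = list(counts)
    ranked = {s: rank(counts[s]) for s in names}
    corr = {a: {a: 1.0} for a in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            corr[a][b] = corr[b][a] = pearson(ranked[a], ranked[b])
    return corr


def d_statistics(corr):
    """Median correlation of each sample to all other samples."""
    return {a: median(r for b, r in row.items() if b != a)
            for a, row in corr.items()}


# Toy data (hypothetical): four samples, five genes each.
counts = {
    "s1": [10, 20, 30, 40, 50],
    "s2": [11, 19, 33, 41, 55],
    "s3": [12, 25, 28, 60, 45],
    "bad": [50, 40, 30, 20, 10],
}

D_STAT_CUTOFF = 0.5  # assumed value, for illustration only

corr = spearman_matrix(counts)
d_stat = d_statistics(corr)

# Drop samples with a very low D-statistic, then compute 1 - correlation.
keep = [s for s in counts if d_stat[s] >= D_STAT_CUTOFF]
dist = {a: {b: 1.0 - corr[a][b] for b in keep} for a in keep}

for name in sorted(d_stat, key=d_stat.get):  # ranked from low to high
    print(name, round(d_stat[name], 3))
```

In R, a distance matrix of this kind can be passed to `cmdscale(d, k = 2)` to obtain the two-dimensional coordinates for an MDS plot; that step is not reproduced in the sketch.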