Data QC based on mix-up mapping and concordance of imputed genotypes with genotypes called from RNAseq data
We used three approaches for QC:
- mix-up mapper: matching genotypes with expression for each sample;
- genotype concordance: calculating the concordance of imputed genotypes with genotypes called from RNAseq data;
- heterozygosity rate.
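As a rough illustration of the genotype-concordance check, the sketch below compares hard genotype calls from imputation with calls from RNA-seq over the SNPs genotyped in both. It assumes genotypes are coded as alternative-allele counts (0, 1, 2) keyed by SNP ID, with missing calls as `None`. The function name and encoding are hypothetical and are not taken from the actual pipeline.

```python
from typing import Dict, Optional

# Hypothetical encoding: SNP ID -> alternative-allele count (0, 1 or 2), None if not called.
Genotypes = Dict[str, Optional[int]]


def genotype_concordance(imputed: Genotypes, rnaseq: Genotypes) -> float:
    """Fraction of SNPs called in both sources whose genotypes are identical."""
    shared = [
        snp for snp, g in imputed.items()
        if g is not None and rnaseq.get(snp) is not None
    ]
    if not shared:
        raise ValueError("no SNPs were called in both sources")
    matches = sum(imputed[snp] == rnaseq[snp] for snp in shared)
    return matches / len(shared)


if __name__ == "__main__":
    imputed = {"rs1": 0, "rs2": 1, "rs3": 2, "rs4": 1}
    rnaseq = {"rs1": 0, "rs2": 1, "rs3": 1, "rs5": 2}
    print(genotype_concordance(imputed, rnaseq))  # 2 of 3 shared SNPs agree
```

A sample whose concordance is well below that of the other samples would then be a candidate for the blacklist. The cut-off actually used is not stated on this page.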
The blacklist of samples that fail these quality checks can be found in the attachments.
LLS
MixupMapper detected 5 swaps and 7 samples with a wrong genotype. The swaps will be performed and the 7 genotypes replaced. This leaves 7 samples (pers_id: 2014, 3142, 3144, 2634, 2890, 3126 and 3150) without a genotype; these should be removed.
geno_id | run_id | Best Match (geno_id) | Best Match (run_id) | Action |
---|---|---|---|---|
561 | BD2CPRACXX-1-12 | 563 | BD2CPRACXX-1-21 | swap |
563 | BD2CPRACXX-1-21 | 561 | BD2CPRACXX-1-12 | swap |
974 | BD24PGACXX-8-25 | 978 | BD24PGACXX-7-8 | swap |
978 | BD24PGACXX-7-8 | 974 | BD24PGACXX-8-25 | swap |
1841 | AD2CJPACXX-6-9 | 1842 | AD2CJPACXX-5-1 | swap |
1842 | AD2CJPACXX-5-1 | 1841 | AD2CJPACXX-6-9 | swap |
2585 | AD2DATACXX-3-21 | 3273 | AD2DATACXX-3-22 | swap |
3273 | AD2DATACXX-3-22 | 2585 | AD2DATACXX-3-21 | swap |
3411 | BD2D5MACXX-3-7 | 3413 | BD2D5MACXX-4-15 | swap |
3413 | BD2D5MACXX-4-15 | 3411 | BD2D5MACXX-3-7 | swap |
2928 | AD2DATACXX-8-1 | 2014 | BD2CPRACXX-1-22 | replace genotype |
3126 | AD1NFNACXX-8-25 | 3142 | BD1NYRACXX-2-15 | replace genotype |
3142 | BD1NYRACXX-2-15 | 3144 | AD1NAMACXX-7-19 | replace genotype |
3194 | AD2DATACXX-4-5 | 2634 | AD2DATACXX-4-9 | replace genotype |
311 | BD1NW4ACXX-7-13 | 2890 | BD1NYRACXX-5-23 | replace genotype |
905 | AD1NFNACXX-8-27 | 3126 | AD1NFNACXX-8-25 | replace genotype |
6039 | AD1NE2ACXX-5-22 | 3150 | BD24PGACXX-5-5 | replace genotype |
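The Action column follows a simple pattern. Two samples whose best matches point at each other are marked as a swap, while a one-directional best match is marked as a genotype replacement. The sketch below reproduces that labelling from a hypothetical `best_match` mapping (flagged geno_id to best-matching geno_id). It reflects our reading of the table, not MixupMapper's own logic.

```python
from typing import Dict


def classify_mixups(best_match: Dict[str, str]) -> Dict[str, str]:
    """Label each flagged geno_id as 'swap' or 'replace genotype'.

    best_match maps a flagged geno_id to the geno_id reported as its best
    match. A reciprocal best match (A -> B and B -> A) is treated as a swap;
    anything else is treated as a genotype replacement.
    """
    actions = {}
    for geno_id, match in best_match.items():
        if best_match.get(match) == geno_id:
            actions[geno_id] = "swap"
        else:
            actions[geno_id] = "replace genotype"
    return actions


if __name__ == "__main__":
    # A subset of the rows in the table above.
    rows = {"561": "563", "563": "561", "2928": "2014", "3126": "3142", "3142": "3144"}
    for geno_id, action in classify_mixups(rows).items():
        print(geno_id, action)
```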
Possibly contaminated samples
Outliers with a high heterozygosity rate in the genotypes called from RNA-seq data.
Also flagged in the gender-specific analysis (see below):
BC1KBKACXX-5-6
BD1NW4ACXX-8-5
BD1NYRACXX-2-16
BC1KBKACXX-5-3
BC1KBKACXX-5-1
BD1NYRACXX-2-27
BD1NYRACXX-4-19
BC1KBKACXX-5-4
Possible gender-neutral contaminations (not flagged in the gender-specific analysis):
BC1KBKACXX-3-12
BC1KBKACXX-5-7
BD24PGACXX-7-10
BC1KBKACXX-5-5
BD1NYRACXX-3-1
BC1KBKACXX-5-2
AD1NFNACXX-4-8
BD1NYRACXX-2-18
LifeLines
http://www.molgenis.org/wiki/DeepNoteworthyObservations
LLDeep_0063
The corresponding RNA-seq sample, AC1C40ACXX-4-4 (old id: 103001429206), has only 76% of reads aligned. It was flagged by MixupMapper as a sample mix-up and also shows many discordant genotypes when using SNVMix.
LLDeep_0350
The corresponding RNA-seq sample, AD1GWFACXX-4-15 (old id: 103001383279), was not flagged by MixupMapper, but it shows many discordant genotypes when using SNVMix.
It has both high XIST and high chromosome Y expression levels. The average heterozygosity rate over all samples is 49% (standard deviation 1.9%), whereas sample LLDeep_0350 (103001383279) has a heterozygosity rate of 72%. This is most likely a contaminated sample in which a male and a female sample were mixed in very similar proportions, which explains the high expression of both XIST and chromosome Y genes.
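The heterozygosity argument can be made concrete with a short calculation. The sketch below computes a heterozygosity rate from genotype calls, using the same hypothetical 0/1/2 encoding with `None` for missing calls. It then expresses a sample's rate as a number of standard deviations from the cohort mean, using the figures quoted above (mean 49%, standard deviation 1.9%, LLDeep_0350 at 72%). The 3-SD cut-off is an illustrative assumption, not the threshold used in this analysis.

```python
from typing import List, Optional


def heterozygosity_rate(calls: List[Optional[int]]) -> float:
    """Fraction of non-missing genotype calls that are heterozygous (coded 1)."""
    called = [g for g in calls if g is not None]
    if not called:
        raise ValueError("no genotype calls")
    return sum(g == 1 for g in called) / len(called)


def heterozygosity_z(rate: float, mean: float, sd: float) -> float:
    """Number of standard deviations a sample's rate lies above the cohort mean."""
    return (rate - mean) / sd


if __name__ == "__main__":
    print(heterozygosity_rate([0, 1, 1, 2, None]))  # 0.5
    z = heterozygosity_z(0.72, 0.49, 0.019)
    print(f"z = {z:.1f}, flagged: {z > 3.0}")  # hypothetical 3-SD cut-off
```

With (72 - 49) / 1.9 ≈ 12.1, LLDeep_0350 lies roughly 12 standard deviations above the mean. That is far outside the spread of the other samples and consistent with a mixture of two individuals.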
A link to the file with genotype concordance and heterozygosity rates on the imputed genotypes can be found here.
CODAM
eQTL mapping (gene level) results:
6804 unique cis-regulated genes.
Samples that failed the QC:
2345 (RNA-seq ids: AD10W1ACXX-8-11, CODAM-102-130804): mix-up mapper + genotype concordance;
2495 (RNA-seq ids: AD10W1ACXX-5-18, CODAM-156-130804): mix-up mapper + genotype concordance.
The RNA-seq sample IDs of these two samples appear to have been swapped (see http://www.bbmriwiki.nl/wiki/BIOS_QualityControl/BIOS_QualityControlRun1, 12-December-2013).
A link to the file with genotype concordance and heterozygosity rates on the imputed genotypes can be found here.
RS
eQTL mapping (gene level) results:
7708 unique cis-regulated genes.
Samples that failed the QC:
8190002 (RNA-seq ids: AD1NNNACXX-4-18, RS-287-130804): mix-up mapper + genotype concordance;
9353 (RNA-seq ids: AC1JV9ACXX-1-13, RS-761-130804): mix-up mapper + genotype concordance;
3520 (RNA-seq ids: BC1JTJACXX-6-7, RS-442-130804): genotype concordance;
562 (RNA-seq ids: BC1KAVACXX-8-13, RS-55-130804): genotype concordance + heterozygosity rate;
6734 (RNA-seq id: RS-502-130804): genotype concordance + heterozygosity rate (passed QC in the first-run data).
A link to the file with genotype concordance and heterozygosity rates on the imputed genotypes can be found here.
Data QC based on median correlations of gene counts from each sample to all other samples
Samples with a much lower median correlation to all other samples:
For methods see: http://www.bbmriwiki.nl/wiki/gene_exon_transcript_count
run_id | median correlation |
---|---|
AC1JV9ACXX.1.10 | 0.0471 |
AD1NE2ACXX.5.22 | 0.1174 |
AD2D8RACXX.3.3 | 0.8028 |
AD2D8RACXX.6.3 | 0.8093 |
AD2D8RACXX.1.3 | 0.8257 |
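The median-correlation statistic itself is straightforward to compute. The exact method is described on the linked page. The sketch below assumes Pearson correlation on log2(count + 1) values of a genes x samples count matrix, which may differ from what was actually used. For each sample it takes the median of its correlations with every other sample, so a failed library stands out with a low value.

```python
import numpy as np


def median_correlations(counts: np.ndarray) -> np.ndarray:
    """Median correlation of each sample with all other samples.

    counts: genes x samples matrix of raw counts. Assumes Pearson
    correlation on log2(count + 1); the original method may differ.
    """
    log_counts = np.log2(counts + 1.0)
    corr = np.corrcoef(log_counts, rowvar=False)  # samples x samples
    np.fill_diagonal(corr, np.nan)                # ignore self-correlation
    return np.nanmedian(corr, axis=1)


if __name__ == "__main__":
    base = np.arange(1, 201, dtype=float) * 10
    # Three similar toy libraries and one whose gene ranking is reversed.
    counts = np.column_stack([base, base * 2, base * 3, base[::-1]])
    print(np.round(median_correlations(counts), 3))
```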
Outliers to be removed based on QC stats and PC analysis
Updated: 12-December-2013
Analysis by: Peter-Bram 't Hoen
Too few reads:
AC1JV9ACXX-1-10
AD1NE2ACXX-5-22
BD1NW4ACXX-3-27
Other reasons (see http://www.bbmriwiki.nl/wiki/BIOS_QualityControl/BIOS_QualityControlRun):
BD1NYRACXX-6-10: too low a percentage of mapped reads; outlier on principal components 1, 4, 5 and 6
AD2CJPACXX-8-9: low exon correlation; outlier on principal components 1, 11 and 14
BD1NR9ACXX-7-27: low percentage of mapped reads; outlier on principal component 4; likely degraded
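The principal-component criterion can be sketched as follows, although the criterion actually applied is not stated on this page. The code below assumes a samples x genes matrix of normalized, log-scale expression. It computes principal-component scores by SVD of the centered matrix and, for each sample, reports the PCs on which its score lies more than a hypothetical 3 standard deviations from the mean. `n_pcs` and `n_sd` are illustrative parameters.

```python
import numpy as np


def pc_outliers(expr: np.ndarray, n_pcs: int = 15, n_sd: float = 3.0) -> dict:
    """Map sample index -> list of 1-based PC numbers on which it is an outlier.

    expr: samples x genes matrix (assumed normalized and log-transformed).
    """
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    k = min(n_pcs, len(s))
    scores = u[:, :k] * s[:k]                    # PC scores per sample
    sd = scores.std(axis=0)
    sd[sd == 0] = np.inf                         # avoid dividing by zero on degenerate PCs
    z = (scores - scores.mean(axis=0)) / sd
    flagged = {}
    for i, j in zip(*np.nonzero(np.abs(z) > n_sd)):
        flagged.setdefault(int(i), []).append(int(j) + 1)
    return flagged


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(50, 100))            # toy data, not real expression
    expr[7] += 10.0                              # one strongly shifted sample
    print(pc_outliers(expr))
```

A z-score cut-off on PC scores is only one possible choice. With few samples, a single extreme sample also inflates the standard deviation, so a robust scale estimate such as the MAD may be preferable.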
Outliers to be removed based on gender-specific expression analysis
Updated: 12-December-2013
Analysis by: Peter-Bram 't Hoen
The normalized gene expression values (edgeR TMM method, expressed as counts per million, CPM) for XIST and for the sum of all protein-coding Y-chromosomal genes were used to check for contamination between samples of different gender. The script can be found here. In addition to LifeLines sample AD1GWFACXX-4-15, the following samples (all from LLS) were flagged as likely contaminated:
BC1KBKACXX-5-1
BC1KBKACXX-5-3
BC1KBKACXX-5-4
BC1KBKACXX-5-6
BC1KBKACXX-5-8
BD1NW4ACXX-8-5
BD1NYRACXX-2-16
BD1NYRACXX-2-27
BD1NYRACXX-4-19
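The gender-based check relies on XIST being expressed in female samples and Y-chromosomal genes in male samples, so a sample with high levels of both is suspect. The sketch below illustrates the idea only; it is not a port of the script referenced above. It uses plain library-size CPM instead of edgeR's TMM-normalized CPM, and the thresholds and toy counts are hypothetical.

```python
import numpy as np


def cpm(counts: np.ndarray) -> np.ndarray:
    """Counts per million using raw library sizes (no TMM normalization factors).

    counts: genes x samples matrix.
    """
    return counts / counts.sum(axis=0) * 1e6


def flag_mixed_sex(xist_cpm: np.ndarray, chr_y_cpm: np.ndarray,
                   xist_min: float, chr_y_min: float) -> list:
    """Indices of samples with both high XIST and high summed chrY expression."""
    both_high = (np.asarray(xist_cpm) > xist_min) & (np.asarray(chr_y_cpm) > chr_y_min)
    return np.nonzero(both_high)[0].tolist()


if __name__ == "__main__":
    # Rows: XIST, two hypothetical chrY genes, all other genes. Columns: samples.
    counts = np.array([
        [500, 0, 250],
        [0, 300, 150],
        [0, 200, 100],
        [999_500, 999_500, 999_500],
    ])
    values = cpm(counts)
    xist = values[0]
    chr_y = values[1:3].sum(axis=0)   # sum over the chrY genes
    print(flag_mixed_sex(xist, chr_y, xist_min=100, chr_y_min=100))  # [2]
```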
Attachments (8)
- blacklist_20140117.txt (810 bytes)
- blacklist_genotypeCalling_run1.txt (218 bytes)
- blacklist_QC.txt (169 bytes)
- gender_analysis.2.r (3.6 KB)
- gender_analysis.r (3.6 KB)
- genotype_concordance_heterozygosity_rate_imputed_RS_CODAM_LLS.xlsx (263.0 KB)
- pooling_qc_data_131125.xlsx (1.7 MB)
- pooling_qc_data_131125_edPB.xlsx (2.3 MB)