Changes between Version 9 and Version 10 of GwasQcPipeline
- Timestamp: Feb 10, 2011 5:36:17 PM
GwasQcPipeline
--- v9
+++ v10
  Removed all bad samples (taken from Mathieu).

- {{{p-link --bfile GvNL_250111 --remove GvNL_bad_samples.txt --make-bed --out GvNL_good_samples}}}
+ {{{
+ p-link --bfile GvNL_250111 --remove GvNL_bad_samples.txt --make-bed --out GvNL_good_samples}}}

  Update family information

- {{{p-link --bfile GvNL_samples --update-ids GvNL_family_update.txt --make-bed --out GvNL_good_fam}}}
+ {{{
+ p-link --bfile GvNL_samples --update-ids GvNL_family_update.txt --make-bed --out GvNL_good_fam}}}

  Set trio relationships

- {{{p-link --bfile GvNL_good_fam --update-parents GvNL_parents_update.txt --make-bed --out GvNL_good_fam_par}}}
+ {{{
+ p-link --bfile GvNL_good_fam --update-parents GvNL_parents_update.txt --make-bed --out GvNL_good_fam_par}}}

  Manual check for suspicious individuals (not in trios) and remove them

- {{{<pre>p-link --bfile GvNL_good_fam_par --remove GvNL_suspicious.txt --make-bed --out GvNL_good_fam_par_susp}}}
+ {{{
+ p-link --bfile GvNL_good_fam_par --remove GvNL_suspicious.txt --make-bed --out GvNL_good_fam_par_susp}}}

  NB: After checking the suspicious individuals with Mathieu, one turned out to be a typo and the other had been filtered for a low call rate (93%). Therefore, only the low-call-rate individual was removed after all.
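The ID-handling steps above each read a small whitespace-separated text file, and a row with the wrong number of columns is an easy mistake to make when such files are edited by hand. The following is a minimal sketch of a sanity check to run before PLINK, assuming the column layouts described in the PLINK 1.x documentation (`--remove`: FID IID; `--update-ids`: old FID, old IID, new FID, new IID; `--update-parents`: FID, IID, paternal IID, maternal IID). The function name `check_columns` and the sample rows are hypothetical and not part of the original pipeline.

```python
# Expected column counts for the PLINK 1.x input files used above.
# These layouts are taken from the PLINK documentation, not from this wiki page.
EXPECTED_COLUMNS = {
    "remove": 2,          # FID IID
    "update-ids": 4,      # old FID, old IID, new FID, new IID
    "update-parents": 4,  # FID, IID, new paternal IID, new maternal IID
}


def check_columns(lines, kind):
    """Return (line_number, line) pairs whose column count does not match `kind`."""
    expected = EXPECTED_COLUMNS[kind]
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        if len(line.split()) != expected:
            bad.append((number, line.rstrip("\n")))
    return bad


if __name__ == "__main__":
    # Hypothetical rows; the real IDs live in files such as GvNL_family_update.txt.
    sample = ["F1 I1 A1 I1\n", "F2 I2 A2\n"]
    for number, line in check_columns(sample, "update-ids"):
        print(f"line {number}: unexpected column count: {line}")
```

In practice the list of lines would come from something like `open("GvNL_family_update.txt")`. An empty result only means the column counts look right; it does not check that the IDs exist in the dataset.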
… …
  Check and update sex information

- {{{p-link --bfile GvNL_good_fam_par_susp --check-sex --make-bed}}}
+ {{{
+ p-link --bfile GvNL_good_fam_par_susp --check-sex --make-bed}}}

  Crosscheck with information from Genome Studio provided by Mathieu
… …
  Update sex information for children and swapped individuals

- {{{<pre>p-link --bfile GvNL_good_fam_par_susp --update-sex GvNL_sex_update.txt --make-bed --out GvNL_raw_final}}}
+ {{{
+ p-link --bfile GvNL_good_fam_par_susp --update-sex GvNL_sex_update.txt --make-bed --out GvNL_raw_final}}}

  Identification of individuals with elevated missing data rates or outlying heterozygosity rate (See Anderson, NP2010, p.1569) => How does this compare with the PLINK suggested approach [[http://pngu.mgh.harvard.edu/~purcell/plink/thresh.shtml here]]?
… …
  Get missing data information:

- {{{p-link --bfile GvNL_raw_final --missing --out GvNL_raw_final}}}
+ {{{
+ p-link --bfile GvNL_raw_final --missing --out GvNL_raw_final}}}

  Get heterozygosity information:

- {{{p-link --bfile GvNL_raw_final --het --out GvNL_raw_final}}}
+ {{{
+ p-link --bfile GvNL_raw_final --het --out GvNL_raw_final}}}

  Calculate the observed heterozygosity rate per individual using the formula (N(NM) − O(Hom))/N(NM) and plot the proportion of missing SNPs against the heterozygosity rate for visual inspection.
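The heterozygosity formula above can be applied directly to the PLINK reports. The following is a minimal sketch, assuming the standard PLINK 1.x column headers (`O(HOM)` and `N(NM)` in the `.het` file, `F_MISS` in the `.imiss` file). The function names and the sample numbers are hypothetical, and this is not the script used by the page's author.

```python
def read_plink_table(lines):
    """Parse a whitespace-delimited PLINK report into a list of dicts keyed by header."""
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, fields)) for fields in body]


def het_vs_missing(het_lines, imiss_lines):
    """Return {(FID, IID): (F_MISS, heterozygosity rate)} for each individual.

    The heterozygosity rate is (N(NM) - O(HOM)) / N(NM), as in the text above.
    F_MISS is None when the individual is absent from the .imiss report.
    """
    f_miss = {
        (row["FID"], row["IID"]): float(row["F_MISS"])
        for row in read_plink_table(imiss_lines)
    }
    result = {}
    for row in read_plink_table(het_lines):
        n_nm = float(row["N(NM)"])
        o_hom = float(row["O(HOM)"])
        if n_nm == 0:
            continue  # no non-missing genotypes, rate undefined
        key = (row["FID"], row["IID"])
        result[key] = (f_miss.get(key), (n_nm - o_hom) / n_nm)
    return result


if __name__ == "__main__":
    # Made-up example rows in the layout of GvNL_raw_final.het / .imiss.
    het = [
        "FID IID O(HOM) E(HOM) N(NM) F",
        "FAM1 S1 700 690.5 1000 0.0307",
    ]
    imiss = [
        "FID IID MISS_PHENO N_MISS N_GENO F_MISS",
        "FAM1 S1 N 70 1000 0.07",
    ]
    for (fid, iid), (missing, het_rate) in het_vs_missing(het, imiss).items():
        print(fid, iid, missing, round(het_rate, 4))
```

The resulting (F_MISS, heterozygosity rate) pairs can be passed to any plotting tool to produce the scatter plot for visual inspection.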
… …
  * Prune SNPs for LD

- {{{p-link --bfile GvNL_raw_final --indep-pairwise 50 5 0.2 --out GvNL_raw_final}}}
+ {{{
+ p-link --bfile GvNL_raw_final --indep-pairwise 50 5 0.2 --out GvNL_raw_final}}}

  * Generate pairwise Identity-By-State (IBS) metrics using pruned SNPs only

- {{{<pre>p-link --bfile GvNL_raw_final --extract GvNL_raw_final.prune.in --genome --out GvNL_raw_final}}}
+ {{{
+ p-link --bfile GvNL_raw_final --extract GvNL_raw_final.prune.in --genome --out GvNL_raw_final}}}

  * Check trio inheritance based on IBS: Check the PI_HAT value for each pair of individuals (unrelated individuals ≈ 0, parent <-> child ≈ 0.5, siblings ≈ 0.5, twins ≈ 1.0). Done with a homemade Perl script.
… …
  Check trio inheritance based on Mendelian segregation

- {{{p-link --file GvNL_raw_final --me 0.05 0.1 --make-bed --out GvNL_inheritance}}}
+ {{{
+ p-link --file GvNL_raw_final --me 0.05 0.1 --make-bed --out GvNL_inheritance}}}

  A diff between GvNL_raw_final.fam and GvNL_inheritance.fam confirms the IBS-based result: families G34 and A56 have been removed.
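The homemade Perl script used for the IBS-based trio check is not shown on this page. As an illustration only, the following Python sketch performs a comparable check: it reads PI_HAT per pair from the `.genome` report and flags every parent-child pair declared in the `.fam` file whose PI_HAT is not close to 0.5. The column names follow the standard PLINK 1.x `.genome` and `.fam` layouts. The function names, the `tolerance=0.1` cut-off and the sample data are assumptions, not values from the original pipeline.

```python
def pi_hat_by_pair(genome_lines):
    """Map frozenset({(FID1, IID1), (FID2, IID2)}) -> PI_HAT from a .genome report."""
    rows = [line.split() for line in genome_lines if line.strip()]
    col = {name: index for index, name in enumerate(rows[0])}
    pairs = {}
    for fields in rows[1:]:
        first = (fields[col["FID1"]], fields[col["IID1"]])
        second = (fields[col["FID2"]], fields[col["IID2"]])
        pairs[frozenset((first, second))] = float(fields[col["PI_HAT"]])
    return pairs


def suspicious_trios(fam_lines, pairs, tolerance=0.1):
    """Return (FID, child IID, parent IID, PI_HAT) for declared parent-child
    pairs whose PI_HAT is missing or further than `tolerance` from 0.5.

    `tolerance` is an illustrative cut-off, not a value from the original page.
    """
    flagged = []
    for line in fam_lines:
        if not line.strip():
            continue
        # .fam columns: FID IID paternal-ID maternal-ID sex phenotype
        fid, iid, father, mother = line.split()[:4]
        for parent in (father, mother):
            if parent == "0":
                continue  # parent not specified (founder)
            pi_hat = pairs.get(frozenset(((fid, iid), (fid, parent))))
            if pi_hat is None or abs(pi_hat - 0.5) > tolerance:
                flagged.append((fid, iid, parent, pi_hat))
    return flagged


if __name__ == "__main__":
    # Made-up trio: the declared father shares no alleles IBD with the child.
    genome = [
        "FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO",
        "FAM1 child FAM1 father PO 0.5 1.0 0.0 0.0 0.0000 -1 0.70 0.5 2.0",
        "FAM1 child FAM1 mother PO 0.5 0.0 1.0 0.0 0.5000 -1 0.85 1.0 NA",
    ]
    fam = [
        "FAM1 father 0 0 1 -9",
        "FAM1 mother 0 0 2 -9",
        "FAM1 child father mother 1 -9",
    ]
    for record in suspicious_trios(fam, pi_hat_by_pair(genome)):
        print(record)
```

Families flagged this way can then be compared with the families dropped by the `--me` step, which is the cross-check the final line above describes.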