Changes between Version 8 and Version 9 of GoNL_Immunochip_Data_Preparation
- Timestamp: Jul 1, 2011 4:13:57 PM
GoNL_Immunochip_Data_Preparation
v8 v9
22 22 To be clear, BED here refers to the [http://genome.ucsc.edu/FAQ/FAQformat.html#format1 UCSC BED format] and NOT the PLINK binary file format. To sort the alleles against the Human Genome Reference, we need access to the reference sequence. Because it is a large file, the [http://code.google.com/p/bedtools/ BEDTools] function ''fastaFromBed'' can be used to extract only the loci of interest (in this case those on the chip) and report them in a tab-delimited file.
23 23
24 - ''fastaFromBed'' needs a [http://genome.ucsc.edu/FAQ/FAQformat.html#format1 UCSC BED] file as input. This file is tab-delimited and contains 3 columns: Chrom Start_seq End_seq. As we are only interested in specific loci, Start_seq and End_seq will be 1 base apart so that only the locus of interest is reported in the output file. This file can easily be generated from either the initial VCF file or the PLINK BIM file.
- 24 ''fastaFromBed'' needs a [http://genome.ucsc.edu/FAQ/FAQformat.html#format1 UCSC BED] file as input. This file is tab-delimited and contains 3 columns: Chrom Start_seq End_seq. As we are only interested in specific loci, Start_seq and End_seq will be 1 base apart so that only the locus of interest is reported in the output file. This file can easily be generated from either the initial VCF file or the PLINK BIM file:
- 25  * From VCF (POS is 1-based; BED starts are 0-based and ends are exclusive): grep -v '^#' in.vcf | awk 'BEGIN{OFS="\t"} {print $1,$2-1,$2}' > out.bed
25 26
26 27 Once you have the input file, simply run ''fastaFromBed'' on it, giving the Human Reference corresponding to the chip data as the other input. For more information on ''fastaFromBed'', see the [http://code.google.com/p/bedtools/ BEDTools] Manual.
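
The text mentions that the BED file can also be built from the PLINK BIM file but gives no command for it. The following is a minimal sketch of that route, not the page author's own procedure. It assumes the standard six-column BIM layout (chromosome, SNP id, genetic distance, base-pair position, allele 1, allele 2) and uses hypothetical file names (example.bim, example.bed, reference.fa, example_ref.tab). Because BED starts are 0-based and ends are exclusive, the single base at 1-based position P is written as the interval P-1 to P. The ''fastaFromBed'' call is guarded so the sketch still runs where BEDTools or the reference FASTA is missing.

```shell
#!/bin/sh
# Hypothetical two-variant PLINK BIM file, for illustration only.
# Columns: chrom, SNP id, genetic distance, bp position, allele 1, allele 2.
printf '1\trs0001\t0\t1000\tA\tG\n2\trs0002\t0\t2500\tC\tT\n' > example.bim

# BIM positions are 1-based; BED starts are 0-based and ends are exclusive,
# so the base at position P becomes the interval [P-1, P).
awk 'BEGIN{OFS="\t"} {print $1, $4-1, $4}' example.bim > example.bed

# Show the resulting 3-column BED file.
cat example.bed

# Extract the reference bases only if BEDTools and a reference FASTA are present.
# reference.fa is a placeholder; its chromosome names must match those in the BED file.
if command -v fastaFromBed >/dev/null 2>&1 && [ -f reference.fa ]; then
    # -tab writes tab-delimited output instead of FASTA records.
    fastaFromBed -fi reference.fa -bed example.bed -fo example_ref.tab -tab
fi
```

If the chromosome names in the BIM file (for example 1) differ from those in the reference FASTA (for example chr1), the first column would need to be adjusted in the awk step before running ''fastaFromBed''.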