Changes between Version 8 and Version 9 of GoNL_Immunochip_Data_Preparation


Timestamp:
Jul 1, 2011 4:13:57 PM
Author:
laurent
Comment:

--

  • GoNL_Immunochip_Data_Preparation

    v8 v9  
    22 22 Note first that BED here refers to the [http://genome.ucsc.edu/FAQ/FAQformat.html#format1 UCSC BED format] and NOT the PLINK binary file format. To sort the alleles against the Human Genome Reference, we need access to the reference sequence. Because it is a large file, the [http://code.google.com/p/bedtools/ BEDTools] function ''fastaFromBed'' can be used to extract only the loci of interest (in this case, those on the chip) and report them in a tab-delimited file.
    23 23
    24    ''fastaFromBed'' needs a [http://genome.ucsc.edu/FAQ/FAQformat.html#format1 UCSC BED] file as input. This file is tab-delimited and contains 3 columns: Chrom Start_seq End_seq. As we are only interested in specific loci, Start_seq and End_seq will be 1 base apart so that only the locus of interest is reported in the output file. This file can easily be generated from either the initial VCF file or the PLINK BIM file.
       24 ''fastaFromBed'' needs a [http://genome.ucsc.edu/FAQ/FAQformat.html#format1 UCSC BED] file as input. This file is tab-delimited and contains 3 columns: Chrom Start_seq End_seq. As we are only interested in specific loci, Start_seq and End_seq will be 1 base apart so that only the locus of interest is reported in the output file. This file can easily be generated from either the initial VCF file or the PLINK BIM file:
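       25 * From VCF (BED starts are 0-based and ends are exclusive, so a 1-based VCF position POS becomes the interval POS-1 to POS):  grep -v '^#' in.vcf | awk '{OFS="\t";print $1,$2-1,$2}' > out.bed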
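
The list above covers only the VCF case, although the text also names the PLINK BIM file as a possible source. A minimal sketch for that case follows, assuming a standard six-column BIM file with the 1-based base-pair position in column 4, and chromosome names that already match those in the reference FASTA. The file names `toy.bim` and `toy_from_bim.bed` are hypothetical. The start is written as the position minus 1 because BED start coordinates are 0-based and end coordinates are exclusive.

```shell
# Toy BIM file (hypothetical data): chrom, variant ID, cM, bp position (1-based), allele 1, allele 2
printf '1\trs0001\t0\t1000\tA\tG\n1\trs0002\t0\t2500\tC\tT\n' > toy.bim

# One BED line per variant: chrom, 0-based start (bp - 1), exclusive end (bp)
awk 'BEGIN{OFS="\t"} {print $1, $4-1, $4}' toy.bim > toy_from_bim.bed

cat toy_from_bim.bed
```

For real data, replace `toy.bim` with the chip's BIM file. If the reference uses a different chromosome naming scheme (for example a `chr` prefix), the first column would need to be converted before running ''fastaFromBed''.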
    25 26
    26 27 Once you have the input file, simply run ''fastaFromBed'' on it, giving the Human Reference corresponding to the chip data as the other input. For more information on ''fastaFromBed'', see the [http://code.google.com/p/bedtools/ BEDTools] manual.
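
The run described above might look like the following sketch. It assumes BEDTools is installed and uses a toy reference and BED file (`toy_ref.fa`, `toy_loci.bed`, and the output name `toy_loci.tab` are all hypothetical) in place of the real Human Reference and chip loci. The options used are `-fi` for the reference FASTA, `-bed` for the loci, `-fo` for the output file and `-tab` for tab-delimited output.

```shell
# Toy reference and loci so the sketch is self-contained (hypothetical names and data)
printf '>chr1\nACGTACGTAC\n' > toy_ref.fa
# 0-based start 2, exclusive end 3: selects the third base of chr1
printf 'chr1\t2\t3\n' > toy_loci.bed

# Extract the selected loci from the reference as a tab-delimited file, if BEDTools is available
if command -v fastaFromBed >/dev/null 2>&1; then
    fastaFromBed -fi toy_ref.fa -bed toy_loci.bed -fo toy_loci.tab -tab
else
    echo "fastaFromBed not found on PATH; install BEDTools first" >&2
fi
```

With real data, `toy_ref.fa` would be replaced by the Human Reference build matching the chip annotation and `toy_loci.bed` by the BED file generated in the previous step.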