Changes between Version 3 and Version 4 of ChipBasedQcPipelineIdea


Timestamp:
Sep 26, 2010 7:44:00 PM (12 years ago)
Author:
Yurii Aulchenko
Comment:

--

  • ChipBasedQcPipelineIdea

    v3 v4  
    33Below is a detailed outline of the steps that should be included in ChipBasedQcPipeline. This document also suggests the CipBasedQcPipelineWorkflow sequence and the principal ideas for solutions. It does not specify the exact tools to be used; as most of the operations are data manipulations, it is up to the analysts involved to decide which tools are most convenient for them and to formulate the CipBasedQcPipelineWorkflow. The actual implementation of the workflow should allow automatic reproduction of the results and application of the same workflow to new data. This is important not only as good practice, but also because more data will come to the same pipeline in the future.
    44
    5 The document assumes VcfGtDataFromat (in particular, VCF v.4 format) is used; “+” strand is used for sequencing data. It is assumed that chip data come in ChipGtDataFormat.
     5The document assumes VcfGtDataFormat (in particular, VCF v.4 format) is used; “+” strand is used for sequencing data. It is assumed that chip data come in ChipGtDataFormat.
    66
    77== CHIP-VCF BUILD AND DBSNP MATCHING TABLE ==
     
    4545 * A1VCHIP: first allele of the personal genotype, translated according to “+” strand on VCF build <single character, either “A”, “C”, “G” or “T”>
    4646 * A2VCHIP: second allele of the personal genotype, translated according to “+” strand on VCF build <single character, either “A”, “C”, “G” or “T”>
    47 * GTCHIP: genotype with alleles in alphabetic order, <two characters, each either “A”, “C”, “G” or “T”>
     47 * GTCHIP: genotype with alleles in alphabetic order, <two characters, each either “A”, “C”, “G” or “T”>
    4848
    4949Questions:
     
    7373 * GTVCF: genotype with alleles in alphabetic order, <two characters, each either “A”, “C”, “G” or “T”>. This can be done by mapping the numbers provided in VCF GT field to REF and ALT and then ordering.
    7474 * GQ, DP: directly from VCF file
    75 * …: factors potentially associated with quality of the sequencing data, summarized in FactorsRelatedToSeqDataQuality.
     75 * …: factors potentially associated with quality of the sequencing data, summarized in FactorsRelatedToSeqDataQuality.
    7676
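The GTVCF derivation described above (mapping the numbers in the VCF GT field to REF and ALT, then ordering) could be sketched roughly as follows. This is only an illustration, not a prescribed tool: the function name `gt_to_gtvcf` is hypothetical, and returning “.” for missing, non-diploid or non-SNP calls is an assumption made here.

```python
VALID_BASES = {"A", "C", "G", "T"}


def gt_to_gtvcf(ref, alt, gt):
    """Translate a VCF GT value (e.g. "0/1" or "1|0") into a two-letter
    genotype with alleles in alphabetic order.

    Assumption: "." is returned whenever the call is missing, not diploid,
    or involves an allele that is not a single A/C/G/T base.
    """
    # Allele index 0 is REF; indices 1, 2, ... are the comma-separated ALT alleles.
    alleles = [ref] + alt.split(",")
    # Phased ("|") and unphased ("/") calls are treated the same way here.
    calls = gt.replace("|", "/").split("/")
    if len(calls) != 2 or not all(c.isdigit() for c in calls):
        return "."
    indices = [int(c) for c in calls]
    if any(i >= len(alleles) for i in indices):
        return "."
    bases = [alleles[i] for i in indices]
    if any(b not in VALID_BASES for b in bases):
        return "."
    return "".join(sorted(bases))


print(gt_to_gtvcf("G", "A", "0/1"))    # AG
print(gt_to_gtvcf("A", "C,T", "2|1"))  # CT
print(gt_to_gtvcf("A", "C", "./."))    # .
```

The same ordering step (sorting the two allele characters) would apply to GTCHIP, built from A1VCHIP and A2VCHIP.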
    7777Merge the chip and VCF genotypic tables (“chip_genotypes_yyyy.mm.dd.txt” and “VCF_genotypes_yyyy.mm.dd.txt”) using ID and SNPV as key variables. Keep all chip genotypes, substituting the missing-value code (“.”) when no information is available from VCF. Name the table “merged_chip_and_VCF_genotypes_yyyy.mm.dd.txt”.
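As a minimal sketch of this merge, assuming both tables have already been read into lists of dictionaries keyed by column name: every chip row is kept, and VCF columns are filled with “.” when no VCF record shares the same ID and SNPV. The function name `merge_chip_and_vcf` and the in-memory representation are illustrative assumptions; reading and writing the actual files is left to whatever tool the analysts choose.

```python
def merge_chip_and_vcf(chip_rows, vcf_rows, vcf_columns):
    """Left-join VCF columns onto chip rows using (ID, SNPV) as the key.

    All chip rows are kept; VCF columns are set to "." when the VCF table
    has no record for that key.
    """
    # Index VCF rows by the key pair for constant-time lookup.
    vcf_by_key = {(row["ID"], row["SNPV"]): row for row in vcf_rows}
    merged = []
    for chip in chip_rows:
        vcf = vcf_by_key.get((chip["ID"], chip["SNPV"]), {})
        row = dict(chip)  # copy so the input rows are not modified
        for column in vcf_columns:
            row[column] = vcf.get(column, ".")
        merged.append(row)
    return merged


# Made-up example rows, for illustration only.
chip = [
    {"ID": "sample1", "SNPV": "snp1", "GTCHIP": "AG"},
    {"ID": "sample1", "SNPV": "snp2", "GTCHIP": "CC"},
]
vcf = [
    {"ID": "sample1", "SNPV": "snp1", "GTVCF": "AG", "GQ": "99", "DP": "30"},
]
merged = merge_chip_and_vcf(chip, vcf, ["GTVCF", "GQ", "DP"])
for row in merged:
    print(row)
```

In this example the second chip genotype has no VCF counterpart, so its GTVCF, GQ and DP values are all “.”.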