Changes between Initial Version and Version 1 of BIOS_Pipeline/pipeline_todo

Timestamp: Sep 19, 2016 4:55:41 PM
Author: jamverlouw
= Pipeline todos =

This page tracks the planned modifications to the pipeline for the full run.

== Timeline ==

- '''October 1st is the target date for starting to implement these features in the final pipeline. Issues should be filed before this date.'''

The full run will start when:

- Issues on this page are implemented
- Metadatabase issues are resolved
- All FQ files are merged and available

- '''Aim to start running once the final plans for the second paper are clear.'''

== Full run implementation list ==

- Two alignments, so that the downstream analyses (QTL, ASE) can be used to their full potential:
 1. Unmasked, for QTL (and expression quantification)
 2. Masked, for ASE (check with Dasha for the masked index)
   - Mask with the GoNL, 1KG and UMCG ASE study SNPs.
 - Separate mapping statistics for each alignment in the analysis database
- Switch the STAR settings to the ENCODE settings (below)
- Variant calling on the unmasked BAM/mpileup (ASE)

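The masked alignment needs a reference in which the SNP positions are replaced by N; the usual motivation is to reduce reference-allele mapping bias in ASE. A minimal sketch of that masking step follows. It assumes the GoNL, 1KG and UMCG ASE SNPs have already been merged into a hypothetical tab-separated file of `chrom<TAB>pos` lines (1-based positions), and `mask_fasta` is an illustrative name; it does not claim to reproduce how the existing masked index was built. For a whole genome, `bedtools maskfasta` is the usual and much faster tool, and the masked FASTA would then be indexed with `STAR --runMode genomeGenerate`.

{{{
# Sketch: replace SNP positions in a FASTA reference with N.
# Usage: mask_fasta SNP_FILE FASTA_FILE > masked.fa
#   SNP_FILE (hypothetical format): tab-separated "chrom<TAB>pos", pos 1-based.
# Every base is checked against the SNP set, so this is only practical for
# small references; use bedtools maskfasta for a whole genome.
mask_fasta() {
  awk -F'\t' '
    # First file: remember every SNP position as a "chrom SUBSEP pos" key.
    FILENAME == ARGV[1] { mask[$1 SUBSEP $2] = 1; next }
    # FASTA header: the sequence name is the first word after ">".
    /^>/ {
      chrom = substr($0, 2)
      sub(/[ \t].*/, "", chrom)
      offset = 0          # bases of this sequence already printed
      print
      next
    }
    # Sequence line: copy each base, or N if its position is a SNP.
    {
      out = ""
      n = length($0)
      for (i = 1; i <= n; i++) {
        key = chrom SUBSEP (offset + i)
        out = out ((key in mask) ? "N" : substr($0, i, 1))
      }
      offset += n
      print out
    }
  ' "$1" "$2"
}
}}}

Positions are tracked per sequence across wrapped FASTA lines, so the line width of the reference does not matter.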
== Discussion points ==

- Use STAR 2-pass mapping?

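If 2-pass mapping is adopted, the step between the passes is to collect the splice junctions found in the first pass (STAR writes them to `SJ.out.tab`) and supply them to the second pass. The sketch below covers only that filtering step: it keeps junctions supported by at least a chosen number of uniquely mapping reads (column 7 of `SJ.out.tab`) and rewrites them in the four-column chromosome/start/end/strand format accepted by `--sjdbFileChrStartEnd`. The function name `filter_junctions` and the threshold are assumptions for illustration, not a tested recommendation. With STAR 2.3.0e the junctions would have to be inserted by regenerating the genome index; newer STAR releases can insert them at mapping time and also offer `--twopassMode Basic`.

{{{
# Sketch: filter first-pass junctions for a second STAR pass.
# Usage: filter_junctions MIN_UNIQUE SJ.out.tab > pass1_junctions.tsv
# SJ.out.tab columns used: 1 chromosome, 2 first intron base, 3 last intron
# base (both 1-based), 4 strand (0 = undefined, 1 = +, 2 = -),
# 7 number of uniquely mapping reads crossing the junction.
filter_junctions() {
  awk -F'\t' -v OFS='\t' -v min="$1" '
    $7 >= min + 0 {
      # Translate the numeric strand code into +, - or "." (undefined).
      strand = ($4 == 1) ? "+" : (($4 == 2) ? "-" : ".")
      print $1, $2, $3, strand
    }
  ' "$2"
}
}}}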
=== Suggested STAR ENCODE settings ===

{{{
# ENCODE settings, as sent to me by Alexander Dobin, who did the alignment for some of the ENCODE samples.
# A comment cannot follow a line-continuation backslash, so the options are collected in a bash array.
STAR_OPTS=(
  --runThreadN 8
  --genomeDir /home/dzhernakova/resources/STARindex_GoNL/
  --genomeLoad NoSharedMemory
  --readFilesIn /home/dzhernakova/data/rawData/LL-557-130804_R1.fq.gz ~/data/rawData/LL-557-130804_R2.fq.gz
  --readFilesCommand zcat
  --outFileNamePrefix ~/data/mappedData/LL-557-130804.encode/LL-557-130804.encode.
  --outSAMstrandField intronMotif    # add the XS strand attribute to spliced alignments
  --outSAMunmapped Within            # keep unmapped reads in the output SAM
  --outFilterType BySJout            # reduces the number of "spurious" junctions
  --outFilterMultimapNmax 20         # max multiple alignments per read: if exceeded, the read is considered unmapped
  --outFilterMismatchNmax 999        # max number of mismatches per pair (absolute); 999 effectively disables this limit
  --outFilterMismatchNoverLmax 0.04  # max mismatches per pair relative to mapped length (0.04*(2*50)=4)
  --alignIntronMin 20                # min intron size (default: 21)
  --alignIntronMax 1000000           # max intron size (default: determined by the size of bins)
  --alignMatesGapMax 1000000         # max genomic distance between mates (default: determined by the size of bins)
  --alignSJoverhangMin 8             # min overhang for unannotated junctions (default: 5)
  --alignSJDBoverhangMin 1           # min overhang for annotated junctions (default: 3)
)
mkdir -p ~/data/mappedData/LL-557-130804.encode   # make sure the output directory exists
/home/dzhernakova/tools/STAR_2.3.0e.Linux_x86_64/STAR "${STAR_OPTS[@]}"
}}}
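The comment on `--outFilterMismatchNoverLmax` works out the mismatch limit for a 2x50 bp pair. A small sketch of the same arithmetic for other read lengths follows, assuming the whole pair is mapped (STAR applies the ratio to the mapped length, so partially mapped pairs get a lower limit). Because an alignment must pass both `--outFilterMismatchNmax` and `--outFilterMismatchNoverLmax`, the effective limit is the smaller of the two. `max_mismatches` is a hypothetical helper name.

{{{
# Sketch: effective per-pair mismatch limit under the ENCODE filters.
# Usage: max_mismatches RATIO MAPPED_LENGTH ABSOLUTE_MAX
# A pair passes if mismatches <= ABSOLUTE_MAX and
# mismatches / MAPPED_LENGTH <= RATIO, so the limit is the smaller of
# ABSOLUTE_MAX and floor(RATIO * MAPPED_LENGTH).
max_mismatches() {
  awk -v r="$1" -v len="$2" -v nmax="$3" 'BEGIN {
    m = int(r * len)                  # int() truncates, i.e. floor for positive values
    if (m > nmax + 0) m = nmax + 0    # the absolute limit caps the relative one
    print m
  }'
}
}}}

For example, `max_mismatches 0.04 100 999` gives 4 for a fully mapped 2x50 bp pair, matching the comment above, while 2x100 bp pairs would be allowed 8 mismatches.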