= Pipeline todos =

This page tracks planned modifications to the pipeline for the full run.

== Timeline ==

 - '''October 1st is the target date to start implementing these features in the final pipeline. Issues should be filed before this date.'''

The full run will start when:
 - Issues on this page are implemented
 - Metadatabase issues are resolved
 - All FQ files are merged and available
 - '''Aim to start running after the final plans for the second paper are clear'''

== Full run implementation list ==

 - Two alignments, so that each downstream analysis (QTL, ASE) can be used to its full potential:
   1. Unmasked, for QTL (and expression quantification)
   2. Masked, for ASE (check with Dasha for the masked index)
 - Mask with GoNL, 1KG and UMCG ASE study SNPs.
 - Separate map statistics in the analysis database
 - Switch STAR to the suggested ENCODE settings (below)
 - Variant calling on the unmasked BAM/mpileup (ASE)

== Discussion points ==

 - STAR 2-pass?

=== Suggested STAR ENCODE settings ===

Settings sent to me by Alexander Dobin, who did the alignment for some of the ENCODE samples:

{{{
# Options are collected in a bash array so that each one can carry a comment
# (bash does not allow a comment after a "\" line continuation).
star_opts=(
  --runThreadN 8
  --genomeDir /home/dzhernakova/resources/STARindex_GoNL/
  --genomeLoad NoSharedMemory
  --readFilesIn /home/dzhernakova/data/rawData/LL-557-130804_R1.fq.gz ~/data/rawData/LL-557-130804_R2.fq.gz
  --readFilesCommand zcat
  --outFileNamePrefix ~/data/mappedData/LL-557-130804.encode/LL-557-130804.encode.
  --outSAMstrandField intronMotif   # add the XS strand attribute to spliced alignments
  --outSAMunmapped Within           # keep unmapped reads in the main SAM output
  --outFilterType BySJout           # reduces the number of "spurious" junctions
  --outFilterMultimapNmax 20        # max multiple alignments per read: if exceeded, the read is considered unmapped
  --outFilterMismatchNmax 999       # max number of mismatches per pair (absolute); 999 effectively disables this limit
  --outFilterMismatchNoverLmax 0.04 # max mismatches per pair relative to mapped length (0.04*(2*50) = 4)
  --alignIntronMin 20               # min intron size (default: 21)
  --alignIntronMax 1000000          # max intron size (default: determined by the bin size)
  --alignMatesGapMax 1000000        # max genomic distance between mates (default: determined by the bin size)
  --alignSJoverhangMin 8            # min overhang for unannotated junctions (default: 5)
  --alignSJDBoverhangMin 1          # min overhang for annotated junctions (default: 3)
)
/home/dzhernakova/tools/STAR_2.3.0e.Linux_x86_64/STAR "${star_opts[@]}"
}}}
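The `0.04*(2*50)=4` comment in the ENCODE settings combines two STAR filters: an alignment is kept only if its mismatch count is at most `--outFilterMismatchNmax` and at most `--outFilterMismatchNoverLmax` times the mapped length. A minimal sketch of that arithmetic follows; `max_mismatches` is a hypothetical helper, and the ratio is passed in hundredths (4 = 0.04) only so that bash integer arithmetic can be used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: effective mismatch cap for one read pair.
# STAR keeps an alignment only if
#   mismatches <= --outFilterMismatchNmax           (absolute)
#   mismatches <= --outFilterMismatchNoverLmax * L  (L = mapped length)
# The ratio is given in hundredths (4 = 0.04) so bash integer maths works.
max_mismatches() {
  local mapped_len=$1
  local abs_max=${2:-999}          # --outFilterMismatchNmax
  local ratio_hundredths=${3:-4}   # --outFilterMismatchNoverLmax 0.04
  # Integer division floors: the largest whole n with n <= 0.04 * L.
  local rel_max=$(( ratio_hundredths * mapped_len / 100 ))
  if (( rel_max < abs_max )); then
    echo "$rel_max"
  else
    echo "$abs_max"
  fi
}

max_mismatches 100   # 2 x 50 bp pair, fully mapped: prints 4
max_mismatches 90    # same pair with 10 bases soft-clipped: prints 3
```

With the absolute limit at 999, the relative limit is the binding one for 2 x 50 bp reads, so pairs with fewer mapped bases are allowed fewer mismatches.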
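The implementation list asks for two alignments per sample, unmasked for QTL and masked for ASE. One way to keep them consistent, sketched here under assumptions, is to hold the ENCODE options in one shared array and loop over the two indexes, changing only `--genomeDir` and the output prefix. Both index paths and the function name `align_both` are hypothetical placeholders (the masked index still has to be confirmed with Dasha); setting `DRY_RUN=1` only prints the commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

STAR_BIN=/home/dzhernakova/tools/STAR_2.3.0e.Linux_x86_64/STAR
UNMASKED_INDEX=/path/to/STARindex_unmasked   # hypothetical: QTL + expression
MASKED_INDEX=/path/to/STARindex_masked       # hypothetical: ASE (ask Dasha)

# Shared ENCODE options, as in the suggested settings.
encode_opts=(
  --runThreadN 8 --genomeLoad NoSharedMemory
  --outSAMstrandField intronMotif --outSAMunmapped Within
  --outFilterType BySJout --outFilterMultimapNmax 20
  --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.04
  --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000
  --alignSJoverhangMin 8 --alignSJDBoverhangMin 1
)

# Hypothetical: align_both SAMPLE R1.fq.gz R2.fq.gz OUTDIR
align_both() {
  local sample=$1 r1=$2 r2=$3 outdir=$4 pair kind index
  local -a cmd
  for pair in "unmasked:$UNMASKED_INDEX" "masked:$MASKED_INDEX"; do
    kind=${pair%%:*}    # text before the first ":"
    index=${pair#*:}    # text after the first ":"
    cmd=(
      "$STAR_BIN" "${encode_opts[@]}"
      --genomeDir "$index"
      --readFilesIn "$r1" "$r2" --readFilesCommand zcat
      --outFileNamePrefix "$outdir/$sample.$kind."
    )
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      echo "${cmd[*]}"          # print only
    else
      mkdir -p "$outdir"        # STAR expects the output directory to exist
      "${cmd[@]}"
    fi
  done
}

DRY_RUN=1 align_both LL-557-130804 R1.fq.gz R2.fq.gz mapped
```

Keeping the options in one array means any later change to the ENCODE settings applies to both alignments at once.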
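For the masked index, one common approach (assumed here; the index prepared for this project may have been built differently) is to replace the reference base at every known SNP position with `N`, to reduce reference-allele mapping bias at those sites. The hypothetical `mask_fasta` function below assumes the GoNL, 1KG and UMCG ASE SNPs have already been merged into a tab-separated file of chromosome and 1-based position. It handles multi-line FASTA but walks base by base, so it is meant to illustrate the idea rather than to mask a whole genome.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mask_fasta SNPS.tsv REF.fa > MASKED.fa
# SNPS.tsv (assumed format): chromosome <TAB> 1-based position
mask_fasta() {
  awk '
    FILENAME == ARGV[1] { mask[$1 SUBSEP $2] = 1; next }    # load SNP positions
    /^>/ { chrom = substr($1, 2); offset = 0; print; next }  # new sequence header
    {
      out = ""
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        key = chrom SUBSEP (offset + i)    # 1-based position in the chromosome
        if (key in mask) c = "N"
        out = out c
      }
      offset += length($0)                 # bases seen so far on this chromosome
      print out
    }' "$1" "$2"
}

# Example (file names hypothetical):
# mask_fasta gonl_1kg_umcg_snps.tsv ref.fa > ref.masked.fa
```

For a full genome, `bedtools maskfasta -fi ref.fa -bed snps.bed -fo ref.masked.fa` does the same masking much faster, once the positions are converted to 0-based BED intervals.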
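For separating map statistics in the analysis database, STAR already writes a per-run summary to `<prefix>Log.final.out`, with one `metric | value` pair per line. A minimal sketch, assuming a hypothetical long-format table of (sample, alignment, metric, value) as the import target, that turns this file into tab-separated rows so the masked and unmasked alignments can be stored side by side (`star_stats_to_tsv` is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: star_stats_to_tsv SAMPLE ALIGNMENT LOG_FINAL_OUT
# Prints: sample <TAB> alignment <TAB> metric <TAB> value
star_stats_to_tsv() {
  awk -F'|' -v sample="$1" -v kind="$2" -v OFS='\t' '
    function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
    NF == 2 {                    # only "metric | value" lines; section titles have no "|"
      key = trim($1); val = trim($2)
      if (val != "") print sample, kind, key, val
    }' "$3"
}

# Example (paths hypothetical):
# star_stats_to_tsv LL-557-130804 masked mapped/LL-557-130804.masked.Log.final.out
```

A long format avoids a schema change whenever STAR adds or renames a metric; each metric simply becomes another row.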
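For the ASE side of the variant-calling item, reads supporting each allele can be counted from `samtools mpileup` output. In column 5 of the pileup, `.` and `,` match the reference, letters are mismatches, `^` plus one character marks a read start, `$` a read end, and `+n`/`-n` followed by n bases an indel. The hypothetical `count_alleles` below is a minimal sketch assuming single-sample pileup on stdin; it lumps all non-reference bases together rather than tracking one specific alternative allele.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count_alleles < sample.pileup
# Prints: chrom <TAB> pos <TAB> ref <TAB> ref_count <TAB> nonref_count
count_alleles() {
  awk -F'\t' -v OFS='\t' '{
    bases = $5; n = length(bases); ref = 0; alt = 0; i = 1
    while (i <= n) {
      c = substr(bases, i, 1)
      if (c == "^") { i += 2; continue }      # read start + mapping-quality char
      if (c == "+" || c == "-") {             # indel, e.g. +2AC or -3TTT
        j = i + 1; len = ""
        while (substr(bases, j, 1) ~ /[0-9]/) { len = len substr(bases, j, 1); j++ }
        i = j + len                           # skip length digits and indel bases
        continue
      }
      if (c == "." || c == ",") ref++
      else if (c ~ /[ACGTacgt]/) alt++
      # "$" (read end), "*", ">", "<" and "N" are not counted
      i++
    }
    print $1, $2, $3, ref, alt
  }'
}
```

For example, `samtools mpileup -f ref.fa -l snps.bed sample.bam | count_alleles` would restrict the pileup to the SNP sites before counting (file names hypothetical).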