| 1 | = Reference and annotation = |
| 2 | |
| 3 | == File naming policy == |
| 4 | Include md5sum files, generated by running 'md5sum [filename] > [filename].md5sum'. |
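The naming policy can be tried out on a throwaway file. The sketch below is only an illustration: `example.gtf` and the scratch directory are hypothetical stand-ins, and it assumes GNU coreutils `md5sum` is available.

```shell
# Work in a scratch directory so the sketch does not touch real reference files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for a reference file.
printf '1\t100\t200\n' > example.gtf

# Write the checksum next to the file, following the [filename].md5sum policy.
md5sum example.gtf > example.gtf.md5sum

# Later (for example after copying the files), verify the file against its checksum.
md5sum -c example.gtf.md5sum
```

Running `md5sum -c` from the directory that holds both files should be enough, because the checksum file stores the file name relative to where it was created.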
| 5 | |
| 6 | == Annotation source: Ensembl v.71 == |
| 7 | |
| 8 | We use Ensembl as our primary source of annotation and fix it at v.71 ([http://apr2013.archive.ensembl.org/biomart/martview link to archived v.71 BioMart]). To get this version of Ensembl in the R package biomaRt:\\ |
| 9 | ensembl = useMart(biomart = "ENSEMBL_MART_ENSEMBL", host = "apr2013.archive.ensembl.org", path = "/biomart/martservice", dataset = "hsapiens_gene_ensembl")\\ |
| 10 | G = getBM(c("chromosome_name", "start_position", "end_position", "ensembl_gene_id"), mart = ensembl) |
| 11 | |
| 12 | == Reference and annotation files == |
| 13 | |
| 14 | ||=Name=||=Location=||=Contact=||=Notes=|| |
| 15 | ||Transcript GTF||!srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/bbmri.nl/RP3/dzhernakova/Homo_sapiens.GRCh37.71.cut.sorted.gtf.gz||!dasha.zhernakova@gmail.com||1|| |
| 16 | ||Meta Exon GTF||!srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/bbmri.nl/RP3/dzhernakova/meta-exons_v71_cut_sorted_05-06-13.gtf.gz||!dasha.zhernakova@gmail.com||2|| |
| 17 | ||Masked genome||!srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/bbmri.nl/RP3/dzhernakova/maskedGenome/||!dasha.zhernakova@gmail.com||3|| |
| 18 | ||STAR index||!srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/bbmri.nl/RP3/dzhernakova/maskedGenome/STARindex/||!dasha.zhernakova@gmail.com||4|| |
| 19 | |
| 20 | == Description of the reference and annotation files == |
| 21 | |
| 22 | '''1. Transcript annotation.'''[[BR]] |
| 23 | |
| 24 | To create this transcript annotation, the human GTF annotation was downloaded from Ensembl v.71 (containing Gencode v.16: ftp://ftp.ensembl.org/pub/release-71/gtf/homo_sapiens/Homo_sapiens.GRCh37.71.gtf.gz). Only genes on chromosomes 1-22, X, Y and MT were retained. Then, for each chromosome, genes were sorted by their start position. |
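The filtering and sorting step can be sketched as follows. This is not necessarily the command that produced the file in the table: the function name `filter_and_sort_gtf` and the sample records are hypothetical, the sort works line by line rather than per gene, and chromosomes come out in plain byte order.

```shell
# Keep only records on chromosomes 1-22, X, Y and MT, then sort by
# chromosome name and start position (GTF column 4).
filter_and_sort_gtf() {
  awk -F'\t' '$1 ~ /^([1-9]|1[0-9]|2[0-2]|X|Y|MT)$/' "$1" |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n
}

# Hypothetical sample records, including one on a patch scaffold.
gtf=$(mktemp)
{
  printf '2\tprotein_coding\texon\t500\t600\t.\t+\t.\tgene_id "G3";\n'
  printf '1\tprotein_coding\texon\t900\t950\t.\t-\t.\tgene_id "G2";\n'
  printf 'HG1287_PATCH\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id "G5";\n'
  printf '1\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id "G1";\n'
  printf 'MT\tprotein_coding\texon\t50\t80\t.\t+\t.\tgene_id "G4";\n'
} > "$gtf"

sorted=$(filter_and_sort_gtf "$gtf")

# Show chromosome and start position of the retained records.
printf '%s\n' "$sorted" | cut -f1,4
```

The record on `HG1287_PATCH` is dropped because its sequence name does not match the list of retained chromosomes.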
| 25 | |
| 26 | '''2. Meta-exon annotation.'''[[BR]] |
| 27 | |
| 28 | To create the meta-exon annotation we merged all overlapping exons from Ensembl version 71 (see Transcript annotation section) using the mergeBed tool from the BEDTools suite. Overlapping exons belonging to different genes or different strands were also merged into one meta-exon. See [wiki:FgReferenceFiles/MetaExonAnnotation_documentation Meta-exon annotation documentation] for a detailed description of how the meta-exon annotation was created. |
| 29 | |
| 30 | See [wiki:BIOS_ReferenceFiles/MetaExonAnnotation-05-06-13 Meta-exon annotation 05-06-13] for issues with this file. |
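The merging rule for meta-exons (overlapping exons are merged regardless of gene or strand) can be illustrated with plain `sort` and `awk`. This is a stand-in written for illustration, not the mergeBed invocation that was used, and it may differ from mergeBed in edge cases such as directly adjacent exons. The function name `merge_intervals` and the sample exons are hypothetical.

```shell
# Merge overlapping intervals per chromosome, ignoring gene and strand.
# Input columns: chromosome, start, end, gene, strand (tab-separated).
merge_intervals() {
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n "$1" |
    awk -F'\t' -v OFS='\t' '
      # Start a new meta-exon on a new chromosome or after a gap.
      NR == 1 || $1 != chrom || $2 + 0 > cur_end {
        if (NR > 1) print chrom, cur_start, cur_end
        chrom = $1; cur_start = $2 + 0; cur_end = $3 + 0
        next
      }
      # Otherwise extend the current meta-exon if this exon reaches further.
      $3 + 0 > cur_end { cur_end = $3 + 0 }
      END { if (NR > 0) print chrom, cur_start, cur_end }
    '
}

# Hypothetical exons: the first two overlap but belong to different genes and strands.
exons=$(mktemp)
{
  printf '1\t100\t200\tgeneA\t+\n'
  printf '1\t150\t300\tgeneB\t-\n'
  printf '1\t400\t500\tgeneC\t+\n'
  printf '2\t100\t150\tgeneD\t+\n'
} > "$exons"

merge_intervals "$exons"
```

In this sample the exons of geneA and geneB end up in a single meta-exon even though they lie on opposite strands, which is the behaviour the paragraph above describes.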
| 31 | |
| 32 | '''3. Masked genome.'''[[BR]] |
| 33 | |
| 34 | To mask the genome we took all SNPs called in the GoNL project that had a MAF > 1% and replaced them with "N" in the genome fasta files using the maskFastaFromBed tool from the BEDTools suite. |
| 35 | |
| 36 | '''4. STAR genome index.'''[[BR]] |
| 37 | |
| 38 | To make the masked genome index we ran STAR in genomeGenerate mode on the masked genome fasta files, setting the --sjdbOverhang parameter to 100 and using the transcript annotation from Ensembl v.71. |
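A STAR call matching this description might look like the sketch below. All three paths are hypothetical placeholders, the GTF is assumed to be uncompressed, and any other options of the original run (threads, memory limits) are unknown and left out. The sketch only prints the command; it does not run STAR.

```shell
# Hypothetical paths; replace them with the real locations of the files.
genome_dir="maskedGenome/STARindex"
genome_fasta="maskedGenome/genome.masked.fa"
annotation_gtf="Homo_sapiens.GRCh37.71.cut.sorted.gtf"

# Index generation on the masked genome with the Ensembl v.71 transcript
# annotation and --sjdbOverhang 100, as described in the text.
star_cmd="STAR --runMode genomeGenerate --genomeDir $genome_dir --genomeFastaFiles $genome_fasta --sjdbGTFfile $annotation_gtf --sjdbOverhang 100"

# Dry run: print the command so it can be inspected before it is run.
echo "$star_cmd"
```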
| 39 | |
| 40 | ''' GTF to BED conversion ''' |
| 41 | |
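| 42 | The following paste command converts the meta-exon GTF to the required BED format, with the columns in the correct order for downstream quantification: chromosome, start, end, the second attribute of GTF column 9, and the full attribute column. |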
| 43 | |
| 44 | {{{ |
| 45 | paste <(cut -f1,4,5 meta-exons_v71_cut_sorted_05-06-13.gtf) <(cut -f9 meta-exons_v71_cut_sorted_05-06-13.gtf | cut -d';' -f2) | paste - <(cut -f9 meta-exons_v71_cut_sorted_05-06-13.gtf ) > meta-exons_v71_cut_sorted_05-06-13.bed |
| 46 | }}} |