NM_004006.2
Reference sequence
-Where the variant is in relation to the reference sequence used
c.
Letter prefix
124A>G
Position and type of change
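The three parts above can be pulled out of a full variant description with a short script. This is only a sketch: it assumes the parts are joined as "reference:prefix.change" (the usual HGVS layout), and the pattern and the function name split_hgvs are illustrative, covering simple descriptions only.

```python
import re

# Assumed layout: "<reference>:<prefix>.<change>", e.g. "NM_004006.2:c.124A>G".
# Illustrative pattern only; it does not cover the full HGVS grammar.
HGVS_PATTERN = re.compile(
    r"^(?P<reference>[A-Z]{2}_\d+\.\d+):(?P<prefix>[cgmnrp])\.(?P<change>.+)$"
)


def split_hgvs(description):
    """Split a simple HGVS description into reference, prefix, and change."""
    match = HGVS_PATTERN.match(description)
    if match is None:
        raise ValueError(f"not a recognised simple HGVS description: {description!r}")
    return match.groupdict()


parts = split_hgvs("NM_004006.2:c.124A>G")
print(parts)
# {'reference': 'NM_004006.2', 'prefix': 'c', 'change': '124A>G'}
```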
What do reference genomic sequences start with?
NG
Genome builds
NCBI Build 36.1/hg18, GRCh37/hg19, GRCh38/hg38
What do reference mRNA sequences start with?
NM
What do reference protein sequences start with?
NP
Why is the correct reference sequence important?
-Genes can have multiple isoforms
--Different isoforms of a gene will have different reference sequences
-Alternative splicing gives a different transcript, which results in a different protein
Genomic DNA (g.)
First nucleotide of the genomic reference sequence
Coding DNA (c.)
First nucleotide of the translation start codon on the coding DNA reference sequence
Noncoding DNA (n.)
First nucleotide of the noncoding DNA reference sequence
Mitochondrial DNA (m.)
First nucleotide of the mito DNA reference sequence
RNA (r.)
First nucleotide of the translation start codon of the RNA reference sequence, or first nucleotide of the noncoding RNA ref seq
Protein (p.)
First amino acid of the protein sequence
CNV- Deletion (del) format
"prefix""position(s)_deleted""del"
CNV- Duplication (dup) format
Only used for tandem duplication
"prefix""position(s)_duplicated""dup"
CNV- Insertion (ins) format
"prefix""positions_flanking""ins""inserted_sequence"
CNV- Inversion (inv) format
"prefix""positions_inverted""inv"
Contig
Historical term when genome was first sequenced
"Chunks" of sequence that serve as your 'reference' point for the g.___ position
Duplicate reads
Generated during PCR amp in library prep
-Too many can skew variant calling algorithms by feeding them PCR-based sequencing errors
Depth
The number of sequencing reads at a given locus
-How many unique seq reads at a given locus?
Breadth
The fraction of target loci with sequence reads aligned
-How much of the genome are we covering?
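Depth and breadth can both be computed from per-position read counts. The sketch below uses made-up depths for eight target positions; the min_depth parameter is an assumption added for illustration (a locus counts as covered once it has at least that many reads).

```python
def mean_depth(depths):
    """Average number of reads per position."""
    return sum(depths) / len(depths)


def breadth(depths, min_depth=1):
    """Fraction of positions covered by at least min_depth reads."""
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Made-up read depths at eight target positions.
depths = [30, 28, 0, 0, 45, 50, 12, 35]

print(mean_depth(depths))             # 25.0
print(breadth(depths))                # 0.75
print(breadth(depths, min_depth=20))  # 0.625
```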
Possible issues w/ calling SNVs
-Zygosity
-Mosaicism
-Seq errors that are called as variants
Variant calling algorithms
Predict the likelihood of a variant vs. the likelihood of an artifact
-Quality score of individual reads
-Allele counts at the locus
-Strand bias
-Repeated regions
-Low complexity regions
The more high-quality reads w/ the same allele, the greater the likelihood that the variant is called
Coverage depth
Important to ensure accuracy of variant calling, especially in samples w/ low allelic fractions
-The more reads supporting the allele, the more likely it's a true variant!
What are some circumstances where you may detect a low fraction of reads with an alternative allele?
1. Low level mosaicism
2. Tumor heterogeneity
Allelic fraction cut-off for mosaicism?
20% or lower
Allelic fraction of G:100%
Genotype is G/G
Homozygous for alternative allele
Allelic fraction of G:60% and A:40%
A/G
Heterozygous for alternative allele
Allelic fraction of G:20% and A:80%
A/G
Heterozygous BUT w/ mosaicism
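The three allelic-fraction cards above can be reproduced with a toy classifier. This is a sketch, not how a real variant caller decides: it treats G as the alternative allele, uses the 20% cut-off from these cards, and requires exactly 0% or 100% for a homozygous call, which ignores sequencing error. The function name interpret_allelic_fraction is made up.

```python
def interpret_allelic_fraction(ref_reads, alt_reads, mosaic_cutoff=0.20):
    """Toy interpretation of read counts at one locus."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads at this locus")
    alt_fraction = alt_reads / total
    if alt_fraction == 0:
        return "homozygous for reference allele"
    if alt_fraction == 1:
        return "homozygous for alternative allele"
    if alt_fraction <= mosaic_cutoff:
        return "heterozygous with possible mosaicism"
    return "heterozygous"


# Read counts chosen to match the cards: A = reference, G = alternative.
print(interpret_allelic_fraction(ref_reads=0, alt_reads=100))   # homozygous for alternative allele
print(interpret_allelic_fraction(ref_reads=40, alt_reads=60))   # heterozygous
print(interpret_allelic_fraction(ref_reads=80, alt_reads=20))   # heterozygous with possible mosaicism
```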
Deeper coverage = ?
Higher quality variant calling
Calling structural variants
Measure relative read depth to infer changes in copy number
Less than average depth = ?
Deletion
More than average depth = ?
Duplication
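The read-depth rule above can be sketched as a ratio test. The tolerance value is a hypothetical parameter chosen for illustration; real copy-number callers normalise depth and model noise far more carefully.

```python
def copy_number_call(region_depth, average_depth, tolerance=0.25):
    """Compare a region's depth to the average depth (tolerance is an assumed value)."""
    ratio = region_depth / average_depth
    if ratio < 1 - tolerance:
        return "deletion"
    if ratio > 1 + tolerance:
        return "duplication"
    return "no change"


print(copy_number_call(region_depth=15, average_depth=30))  # deletion
print(copy_number_call(region_depth=45, average_depth=30))  # duplication
print(copy_number_call(region_depth=31, average_depth=30))  # no change
```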
Split reads
A single read whose 5' and 3' ends do not align contiguously in the genome
Mapping paired reads- Above average distance between paired end reads = ?
Deletion
Mapping paired reads- Below average distance between paired end reads = ?
Insertion
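The paired-read rule can be sketched the same way. The expected distance and tolerance below are made-up numbers; in practice they would come from the insert-size distribution of the library.

```python
def classify_read_pair(observed_distance, expected_distance, tolerance):
    """Flag a read pair whose mapped distance differs from the expected distance."""
    if observed_distance > expected_distance + tolerance:
        # Mates map farther apart on the reference: the sample is missing sequence.
        return "deletion"
    if observed_distance < expected_distance - tolerance:
        # Mates map closer together on the reference: the sample has extra sequence.
        return "insertion"
    return "concordant"


# Hypothetical library: mates expected about 400 bp apart, give or take 50 bp.
print(classify_read_pair(900, expected_distance=400, tolerance=50))  # deletion
print(classify_read_pair(150, expected_distance=400, tolerance=50))  # insertion
print(classify_read_pair(420, expected_distance=400, tolerance=50))  # concordant
```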
FASTQ
Raw sequence reads and quality scores directly from the sequencer
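A FASTQ record is four lines: a header starting with "@", the read sequence, a "+" separator, and one quality character per base. The sketch below decodes a made-up record, assuming Phred+33 encoding (the common modern encoding; older files may differ).

```python
def parse_fastq_record(lines):
    """Parse one four-line FASTQ record, assuming Phred+33 quality encoding."""
    header, sequence, separator, quality = lines
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Phred+33: quality score = ASCII code of the character minus 33.
    scores = [ord(ch) - 33 for ch in quality]
    return header[1:], sequence, scores


# Made-up record for illustration.
record = ["@read1", "GATTACA", "+", "IIIIHH#"]
name, sequence, scores = parse_fastq_record(record)
print(name, sequence, scores)
# read1 GATTACA [40, 40, 40, 40, 39, 39, 2]
```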
BAM
Position information of sequence reads aligned to the reference genome, along w/ quality info
VCF
Variant call format
-Every location the sample differs from the reference
Filters to identify disease-causing variants exclude
1. Intronic or intergenic variants (not within exon)
2. Common variants
3. Synonymous variants
4. Variants not predicted to alter function
5. Variants that don't fit inheritance pattern
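The five exclusion filters can be sketched as one pass over a list of annotated variants. Everything here is hypothetical: the field names, the variant names, and the 1% frequency threshold for "common" are assumptions for illustration, not values from any particular pipeline.

```python
def passes_filters(variant, max_frequency=0.01):
    """Return True if a variant survives all five exclusion filters."""
    return bool(
        variant["region"] == "exonic"                         # 1. not intronic/intergenic
        and variant["population_frequency"] <= max_frequency  # 2. not common
        and variant["consequence"] != "synonymous"            # 3. not synonymous
        and variant["predicted_damaging"]                     # 4. predicted to alter function
        and variant["fits_inheritance"]                       # 5. fits inheritance pattern
    )


# Made-up annotated variants.
variants = [
    {"name": "c.124A>G", "region": "exonic", "population_frequency": 0.0001,
     "consequence": "missense", "predicted_damaging": True, "fits_inheritance": True},
    {"name": "c.300C>T", "region": "exonic", "population_frequency": 0.0001,
     "consequence": "synonymous", "predicted_damaging": False, "fits_inheritance": True},
    {"name": "c.94-20T>C", "region": "intronic", "population_frequency": 0.2,
     "consequence": "intronic", "predicted_damaging": False, "fits_inheritance": True},
]

candidates = [v["name"] for v in variants if passes_filters(v)]
print(candidates)  # ['c.124A>G']
```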
Variant filtering
-Sorting through thousands of variants to find those that contribute to phenotype
-Filtering scheme depends on the sequencing method and phenotype/inheritance pattern
How would you filter to identify de novo variants?
Check parents
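"Check parents" amounts to a set difference: keep variants seen in the child that appear in neither parent. A minimal sketch with made-up variant names; it assumes both parents were sequenced with enough coverage that an absent call really means absent.

```python
def find_de_novo(child, mother, father):
    """Variants present in the child but in neither parent."""
    return child - (mother | father)


# Made-up variant sets for a trio.
child = {"c.124A>G", "c.300C>T", "c.512G>A"}
mother = {"c.300C>T"}
father = {"c.512G>A"}

print(find_de_novo(child, mother, father))  # {'c.124A>G'}
```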
How would you filter to identify variants causing an AR condition?
-Are parents affected?
-Do they have 1 or 2 copies?
-Does child have 2 bad copies?
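The questions above can be sketched as a check on how many copies of the variant each family member carries. This covers only the simplest case, where the child is homozygous for one variant and both parents are unaffected carriers; compound heterozygotes would need a per-gene check that is not shown. The function name is made up.

```python
def fits_autosomal_recessive(child_copies, mother_copies, father_copies):
    """True if the child has 2 copies and each unaffected parent carries 1."""
    return child_copies == 2 and mother_copies == 1 and father_copies == 1


print(fits_autosomal_recessive(2, 1, 1))  # True
print(fits_autosomal_recessive(1, 1, 0))  # False
```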
How would you filter to identify variants causing an X-linked recessive condition in a male?
See if mom is a carrier of the condition