going from raw reads to analysis-ready data (GATK)
map sequence reads to the reference genome
mark duplicates to mitigate duplication artifacts
base quality score recalibration (BQSR) corrects systematic errors the sequencing machine makes in base quality scores
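A minimal sketch of the three steps as command lines, assuming bwa, samtools, and GATK4 are installed. The helper `preprocessing_commands`, the file names, and the read-group fields are hypothetical; the function only builds argument lists and does not run anything.

```python
from pathlib import PurePosixPath


def preprocessing_commands(ref: str, fq1: str, fq2: str, known_sites: str,
                           sample: str, outdir: str = "out") -> list[list[str]]:
    """Build (but do not run) the commands for mapping, duplicate marking, and BQSR.

    All file names and read-group fields are hypothetical placeholders.
    """
    out = PurePosixPath(outdir)
    sorted_bam = str(out / f"{sample}.sorted.bam")
    marked_bam = str(out / f"{sample}.marked.bam")
    dup_metrics = str(out / f"{sample}.dup_metrics.txt")
    recal_table = str(out / f"{sample}.recal.table")
    ready_bam = str(out / f"{sample}.analysis_ready.bam")
    # bwa expects a literal backslash-t between read-group fields.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # 1. map reads to the reference; bwa writes SAM to stdout, which is
        #    meant to be piped into samtools sort ("-" = read from stdin).
        ["bwa", "mem", "-R", read_group, ref, fq1, fq2],
        ["samtools", "sort", "-o", sorted_bam, "-"],
        # 2. mark duplicates (flags them in the BAM and writes a metrics file).
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", marked_bam,
         "-M", dup_metrics],
        # 3a. BQSR: model quality-score errors, masking known variant sites.
        ["gatk", "BaseRecalibrator", "-I", marked_bam, "-R", ref,
         "--known-sites", known_sites, "-O", recal_table],
        # 3b. BQSR: write a BAM with recalibrated base quality scores.
        ["gatk", "ApplyBQSR", "-R", ref, "-I", marked_bam,
         "--bqsr-recal-file", recal_table, "-O", ready_bam],
    ]


if __name__ == "__main__":
    import shlex
    for cmd in preprocessing_commands("ref.fasta", "r1.fq.gz", "r2.fq.gz",
                                      "known_sites.vcf.gz", "sample1"):
        print(shlex.join(cmd))
```

Actually running step 1 means connecting the bwa and samtools commands with a pipe (e.g. two `subprocess.Popen` calls), and GATK expects the reference to have its `.fai` index and `.dict` sequence dictionary next to it.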
binary alignment map (BAM)
file format produced by aligning reads to a reference genome
consists of a header section and a records section
bam header
describes metadata that applies to all reads in the file
bam header contents
header line (@HD)
reference sequence dictionary entries (@SQ)
read groups (@RG)
bam records
contain structured information for each read (one line per record)
read name, flags, position, MAPQ (mapping quality), CIGAR, mate information, read sequence, Phred base quality scores, optional tags (metadata)
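BAM is the binary form of the text SAM format, in which each record is one tab-separated line with 11 mandatory fields followed by optional tags. A minimal sketch of reading one SAM line and decoding its Phred+33 quality string, assuming plain SAM text; `SamRecord`, `parse_sam_line`, and the helper functions are hypothetical names, and real BAM files are normally read with a library such as pysam.

```python
from dataclasses import dataclass, field


@dataclass
class SamRecord:
    qname: str        # read name
    flag: int         # bitwise flags (paired, reverse strand, duplicate, ...)
    rname: str        # reference sequence name
    pos: int          # 1-based leftmost mapping position
    mapq: int         # mapping quality
    cigar: str        # alignment structure
    rnext: str        # mate's reference ("=" means same as rname)
    pnext: int        # mate's position
    tlen: int         # observed template (fragment) length
    seq: str          # read sequence
    qual: list[int]   # decoded Phred base qualities
    tags: list[str] = field(default_factory=list)  # optional metadata tags


def parse_sam_line(line: str) -> SamRecord:
    cols = line.rstrip("\n").split("\t")
    # Phred+33 encoding: each ASCII code minus 33 is the quality score.
    qual = [ord(c) - 33 for c in cols[10]] if cols[10] != "*" else []
    return SamRecord(cols[0], int(cols[1]), cols[2], int(cols[3]), int(cols[4]),
                     cols[5], cols[6], int(cols[7]), int(cols[8]), cols[9],
                     qual, cols[11:])


def phred_to_error_prob(q: int) -> float:
    # Q = -10 * log10(P_error), so Q20 means a 1% chance the base call is wrong.
    return 10 ** (-q / 10)


def is_duplicate(rec: SamRecord) -> bool:
    # Flag bit 0x400 marks a PCR or optical duplicate.
    return bool(rec.flag & 0x400)


rec = parse_sam_line("read1\t99\tchr1\t100\t60\t4M\t=\t300\t250\tACGT\tII5+\tRG:Z:S1")
print(rec.qual)  # [40, 40, 20, 10]
```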
bam mapping info
summarizes the position, mapping quality, and alignment structure of each read
bam mate information
points to the mate read, sequenced from the other end of the same DNA fragment (paired-end data)
bam cigar
concise idiosyncratic gapped alignment report
summarizes alignment structure
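S soft clip (bases present in the read but not aligned)
M alignment match (the base may match or mismatch the reference)
D deletion from the reference
I insertion relative to the reference
ex: 1S3M1D2M1I1M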
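A minimal sketch of parsing a CIGAR string and measuring how many reference and read bases it covers, using the operation codes from the SAM specification (M, I, D, N, S, H, P, =, X). The function names are hypothetical. For the example 1S3M1D2M1I1M, the alignment spans 3 + 1 + 2 + 1 = 7 reference bases and consumes 1 + 3 + 2 + 1 + 1 = 8 read bases.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REFERENCE = set("MDN=X")   # operations that advance along the reference
CONSUMES_READ = set("MIS=X")        # operations that advance along the read


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # findall silently skips junk, so check that the pieces rebuild the input.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def reference_span(cigar: str) -> int:
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REFERENCE)


def read_length(cigar: str) -> int:
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_READ)


print(parse_cigar("1S3M1D2M1I1M"))
print(reference_span("1S3M1D2M1I1M"), read_length("1S3M1D2M1I1M"))  # 7 8
```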
duplicates
non-independent measurements of a sequence fragment
must be marked (or removed) so that support for each allele is not overcounted
library duplicates
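caused by PCR amplification during library preparation
optical duplicates
occur during sequencing, when a single cluster is detected as multiple clusters by the instrument's optical sensor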
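A simplified sketch of the idea behind duplicate marking, assuming single-end reads already reduced to their unclipped 5' position and strand: reads sharing chromosome, 5' position, and strand are treated as copies of one fragment, the copy with the highest summed base quality is kept, and the rest are flagged. The `Read` type and `mark_duplicates` are hypothetical; Picard/GATK MarkDuplicates also matches mate positions for paired reads and uses tile coordinates parsed from read names to tell optical from library duplicates, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    chrom: str
    five_prime: int          # unclipped 5' alignment position
    reverse: bool            # True if the read aligns to the minus strand
    quals: tuple[int, ...]   # Phred base qualities


def mark_duplicates(reads: list[Read]) -> set[str]:
    """Return the names of reads to flag as duplicates (single-end sketch)."""
    groups = defaultdict(list)
    for read in reads:
        # Copies of one fragment start at the same 5' position on the same strand.
        groups[(read.chrom, read.five_prime, read.reverse)].append(read)
    duplicates = set()
    for group in groups.values():
        # Keep the copy with the highest summed base quality as the representative.
        best = max(group, key=lambda r: sum(r.quals))
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


reads = [
    Read("a", "chr1", 100, False, (30, 30)),
    Read("b", "chr1", 100, False, (40, 40)),  # same start/strand as "a", better quality
    Read("c", "chr1", 100, True, (20, 20)),   # opposite strand: a different fragment
    Read("d", "chr2", 100, False, (10,)),
]
print(mark_duplicates(reads))  # {'a'}
```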
base quality score errors
sequencers make systematic errors in base quality scores
quality scores reported by the sequencer cannot account for errors introduced by PCR during library preparation
BQSR corrects the quality scores, not the bases
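A deliberately simplified sketch of the BQSR idea, assuming the input is already a list of (reported quality, mismatches the reference?) pairs for bases at positions outside a known-variant database: bases are binned by reported quality, the observed mismatch rate in each bin becomes an empirical quality, and that value replaces the reported one. The function names and the +1/+2 pseudocounts are assumptions; GATK's BaseRecalibrator also models covariates such as read group, machine cycle, and sequence context.

```python
import math
from collections import defaultdict


def empirical_quality_table(observations):
    """Map each reported quality to an empirical quality.

    observations: iterable of (reported_q, is_mismatch) pairs for bases at
    positions NOT in a known-variant database, so mismatches are assumed to be
    sequencing errors rather than real variants.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [bases, mismatches]
    for reported_q, is_mismatch in observations:
        counts[reported_q][0] += 1
        counts[reported_q][1] += int(is_mismatch)
    table = {}
    for reported_q, (bases, mismatches) in counts.items():
        # Pseudocounts (an assumption here) keep the error rate above zero.
        error_rate = (mismatches + 1) / (bases + 2)
        table[reported_q] = round(-10 * math.log10(error_rate))
    return table


def recalibrate(quals, table):
    # Only the quality scores are replaced; the base calls are left alone.
    return [table.get(q, q) for q in quals]


# 98 error-free bases reported at Q30 and 8 bases at Q20, half of them mismatches.
obs = [(30, False)] * 98 + [(20, True)] * 4 + [(20, False)] * 4
table = empirical_quality_table(obs)
print(table)                              # {30: 20, 20: 3}
print(recalibrate([30, 20, 40], table))   # [20, 3, 40]
```

In this toy data the bases reported as Q30 behave like Q20 bases, which is the kind of systematic overstatement BQSR is meant to correct.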
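DeepVariant
converts read pileups into images and classifies them with a deep neural network (data visualization + deep learning)
fewer total errors than GATK
reported as one of the most accurate methods for calling variants in a human genome
a hybrid DeepVariant model can call variants from combined PacBio and Illumina reads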
variant call format (VCF)
file output after the BAM file is analyzed to detect variants (SNPs/indels)
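A minimal sketch of reading one VCF data line, assuming tab-separated columns CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, then an optional FORMAT column and per-sample columns. `parse_vcf_line` and `variant_type` are hypothetical helpers; the SNP/indel rule only compares allele lengths and ignores symbolic alleles such as structural variants. Libraries such as pysam (`VariantFile`) handle the full format.

```python
def parse_vcf_line(line: str) -> dict:
    cols = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = cols[:8]
    info_fields = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            # Flag-type INFO entries (no "=") are stored as True.
            info_fields[key] = value if sep else True
    record = {
        "chrom": chrom,
        "pos": int(pos),                   # 1-based position
        "id": vid,
        "ref": ref,
        "alts": alt.split(","),            # multiallelic sites list several ALTs
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_fields,
        "samples": [],
    }
    if len(cols) > 9:
        # FORMAT keys (e.g. GT:DP) name the colon-separated values per sample.
        keys = cols[8].split(":")
        record["samples"] = [dict(zip(keys, s.split(":"))) for s in cols[9:]]
    return record


def variant_type(ref: str, alt: str) -> str:
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    return "indel" if len(ref) != len(alt) else "MNP"


rec = parse_vcf_line("chr1\t12345\trs1\tA\tG\t50\tPASS\tAF=0.001;DB\tGT:DP\t0/1:30")
print(rec["samples"][0]["GT"], variant_type(rec["ref"], rec["alts"][0]))  # 0/1 SNP
```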
types of variants
protein coding
noncoding
structural variants
protein coding variants
nonsense
frameshift
missense
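A minimal sketch of how the three protein-coding classes can be told apart, assuming the affected codon has already been extracted on the coding strand and indel alleles are written VCF-style with a shared anchor base. The codon table is the standard genetic code; the helper names are hypothetical. An indel whose net length change is not a multiple of 3 shifts the reading frame.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == "*" and ref_aa != "*":
        return "nonsense"      # premature stop codon
    if ref_aa == "*" and alt_aa != "*":
        return "stop-lost"
    if ref_aa == alt_aa:
        return "synonymous"
    return "missense"          # a different amino acid


def classify_coding_indel(ref: str, alt: str) -> str:
    # A net length change that is not a multiple of 3 shifts the reading frame.
    return "frameshift" if (len(alt) - len(ref)) % 3 else "in-frame"


print(classify_codon_change("TGG", "TAG"))   # nonsense (Trp -> stop)
print(classify_coding_indel("AT", "A"))      # frameshift
```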
noncoding variants
splice site disrupters
promoter disrupters
regulatory element disrupters
types of annotations
population frequencies
evolutionary constraint
biochemical consequences
in vitro or in vivo assays (ENCODE, PsychENCODE, GTEx)
phenotypes (mouse knockouts, cellular assays)
annotation
provides context for interpretation
types of inheritance for germline mutations
autosomal dominant
autosomal recessive
mitochondrial
X-linked dominant
X-linked recessive
classes of loss-of-function mutations affecting protein-coding variants
nonsense SNP
frameshift indel
splice-site SNP
exon deletion
whole gene deletion
causal variant
must be identified among the >150 genuine loss-of-function variants present in every human genome
if a disease phenotype is rare, the causal variant for that disease should likewise be rare across all ancestral backgrounds
should be under strong purifying selection (mutation intolerance)
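A minimal sketch of applying these criteria as a filter, assuming each variant carries a consequence term and allele frequencies per ancestry group. The frequency cutoff (1e-4), the `Variant` fields, and the optional set of mutation-intolerant genes are hypothetical; the consequence terms are Sequence Ontology names of the kind emitted by annotators such as Ensembl VEP. Taking the maximum frequency across groups enforces rarity in every ancestral background, not just on average.

```python
from dataclasses import dataclass, field

# Sequence Ontology consequence terms treated here as putative LoF (assumed set).
LOF_CONSEQUENCES = {
    "stop_gained",               # nonsense SNP
    "frameshift_variant",        # frameshift indel
    "splice_donor_variant",      # splice-site SNP
    "splice_acceptor_variant",   # splice-site SNP
    "transcript_ablation",       # e.g. whole-gene deletion
}


@dataclass
class Variant:
    gene: str
    consequence: str
    # allele frequency per ancestry group (hypothetical field layout)
    pop_afs: dict[str, float] = field(default_factory=dict)


def candidate_causal(variants, max_af=1e-4, intolerant_genes=None):
    """Keep rare putative-LoF variants, optionally only in LoF-intolerant genes."""
    kept = []
    for v in variants:
        if v.consequence not in LOF_CONSEQUENCES:
            continue
        # Rare in EVERY ancestral background: use the highest group frequency.
        if max(v.pop_afs.values(), default=0.0) > max_af:
            continue
        # Purifying selection: optionally require a mutation-intolerant gene.
        if intolerant_genes is not None and v.gene not in intolerant_genes:
            continue
        kept.append(v)
    return kept


variants = [
    Variant("GENE1", "stop_gained", {"afr": 0.0, "nfe": 1e-5}),
    Variant("GENE2", "missense_variant", {"afr": 0.0}),
    Variant("GENE3", "frameshift_variant", {"afr": 0.02, "nfe": 0.0}),  # common in one group
    Variant("GENE4", "splice_donor_variant", {"eas": 0.0}),
]
print([v.gene for v in candidate_causal(variants)])  # ['GENE1', 'GENE4']
```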
insights for variant interpretation from past studies
-careful annotation and curation of LoF variants is essential
-not all exons are expressed equally
-not all LoF disease genes are severely depleted of putative LoF variants
-an adjacent variant can change the predicted effect
-we can now assess the frequency of structural variants
-we should explore beyond coding regions
phased assembly long-read sequencing
-allows identification of large, complex structural variations in human genome
-64 human genomes representing 26 different human populations were assembled
-found SVs and SV hotspots not discovered by short-read sequencing
integrating rna-seq data with genome sequencing data
-can improve interpretation of genetic variation and enhance diagnostic yield
-detect transcript-level changes, splice-altering variants, and aberrant transcript isoforms
-assess the extent of nonsense-mediated decay in LoF-containing transcripts
-evaluate the extent of biallelic expression at recessive disease genes harboring heterozygous variants
-feasibility depends on the availability of a relevant tissue and on whether the disorder is developmental or chronic
integration of deep learning and artificial intelligence
can improve genomic analysis and variant interpretation
extreme phenotype study design
hypothesis: individuals with rare variants in the same gene are concentrated in one extreme of the phenotype distribution
this approach can identify genes, pathways, and targets in common diseases
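One way to test the hypothesis, sketched under the assumption that each individual in the two extremes has been classified as carrying or not carrying a rare variant in the gene: a one-sided Fisher's exact test asks how likely it is to see at least the observed number of carriers in the high extreme if carriers were spread between the extremes at random. The function name is hypothetical; `scipy.stats.fisher_exact` with `alternative="greater"` on the 2x2 table [[carriers_high, n_high - carriers_high], [carriers_low, n_low - carriers_low]] computes the same quantity.

```python
from math import comb


def carrier_enrichment_p(carriers_high: int, n_high: int,
                         carriers_low: int, n_low: int) -> float:
    """One-sided Fisher's exact p-value for excess carriers in the high extreme."""
    total_carriers = carriers_high + carriers_low
    n_total = n_high + n_low
    # Hypergeometric tail: add up the probability of every split of carriers
    # that puts at least the observed number into the high-extreme group.
    tail = sum(comb(n_high, k) * comb(n_low, total_carriers - k)
               for k in range(carriers_high, min(total_carriers, n_high) + 1))
    return tail / comb(n_total, total_carriers)


# 5 of 10 high-extreme individuals are carriers vs 0 of 10 in the low extreme.
print(carrier_enrichment_p(5, 10, 0, 10))  # about 0.016
```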
mutations with very large effect
-provide a causal link between genotype and phenotype, allowing studies to determine pathophysiological mechanisms
-can identify genes and pathways that can be manipulated for health benefit
-suggest direction and magnitude of beneficial effects that can be achieved through a target