genomics - methods for variant analysis II

28 Terms

1

going from raw reads to analysis-ready data (GATK)

  1. map sequence reads to reference

  2. mark duplicates to mitigate duplication artifacts

  3. base quality score recalibration (BQSR) corrects systematic errors in the base quality scores reported by the sequencer

2

binary alignment map (bam)

output format produced by aligning reads to a reference genome

consists of a header and records

3

bam header

describes various metadata for all reads

header line (@HD)

reference sequence dictionary entries (@SQ)

read groups (@RG)
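A minimal Python sketch of how the header is organized, assuming it has already been converted to text (for example with `samtools view -H`), since BAM itself is binary. The function name and the example read-group values are illustrative only.

```python
def parse_sam_header(lines):
    """Group text-form header lines by record type (HD, SQ, RG, PG, ...)."""
    header = {}
    for line in lines:
        if not line.startswith("@"):
            break  # header lines always precede the alignment records
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0][1:]  # "@SQ" -> "SQ"
        if record_type == "CO":
            continue  # @CO lines hold free text, not TAG:VALUE pairs
        tags = dict(field.split(":", 1) for field in fields[1:])
        header.setdefault(record_type, []).append(tags)
    return header


example = [
    "@HD\tVN:1.6\tSO:coordinate",               # header line: format version, sort order
    "@SQ\tSN:chr1\tLN:248956422",               # reference sequence dictionary entry
    "@SQ\tSN:chr2\tLN:242193529",
    "@RG\tID:lane1\tSM:sample1\tPL:ILLUMINA",   # read group (hypothetical values)
]
header = parse_sam_header(example)
print([sq["SN"] for sq in header["SQ"]])  # ['chr1', 'chr2']
print(header["RG"][0]["SM"])             # sample1
```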

4

bam records

contain structured information for each read (one line per record)

read name, flags, position, mapq, cigar, mate information, read sequence, phred quality scores, optional metadata tags
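The record layout can be sketched in Python using the tab-separated text form of a record (SAM, which BAM stores as compressed binary). The 11 mandatory fields follow the SAM specification; the class and function names are hypothetical, and the example read is made up.

```python
from dataclasses import dataclass, field


@dataclass
class SamRecord:
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # Phred-scaled mapping quality
    cigar: str   # alignment structure
    rnext: str   # mate's reference name ("=" means same as rname)
    pnext: int   # mate's position
    tlen: int    # observed template (fragment) length
    seq: str     # read sequence
    qual: str    # base qualities, Phred+33 encoded
    tags: dict = field(default_factory=dict)  # optional TAG:TYPE:VALUE metadata


def parse_sam_record(line):
    f = line.rstrip("\n").split("\t")
    tags = {}
    for entry in f[11:]:
        tag, _type, value = entry.split(":", 2)
        tags[tag] = value
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
                     f[6], int(f[7]), int(f[8]), f[9], f[10], tags)


def phred_scores(qual):
    # each ASCII character encodes Q = ord(char) - 33
    return [ord(c) - 33 for c in qual]


rec = parse_sam_record(
    "read1\t99\tchr1\t100\t60\t8M\t=\t300\t208\tACGTACGT\tIIIIIIII\tRG:Z:lane1")
print(rec.pos, rec.cigar, rec.tags["RG"])  # 100 8M lane1
print(phred_scores(rec.qual)[:3])          # [40, 40, 40]
```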

5

bam mapping info

summarizes the position, mapping quality, and alignment structure of each read
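Mapping quality (MAPQ) is Phred-scaled, MAPQ = -10 log10 P(mapping position is wrong), so it converts back to an error probability as below. A minimal sketch; aligners estimate this probability in their own ways, so the same MAPQ may not mean exactly the same thing across tools. The function name is hypothetical.

```python
def mapq_to_error_prob(mapq):
    """Probability that the read is placed at the wrong position."""
    if mapq == 255:
        return None  # per the SAM specification, 255 means "not available"
    return 10 ** (-mapq / 10)


for q in (0, 10, 20, 30, 60):
    print(q, mapq_to_error_prob(q))
```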

6

bam mate information

points to the read from the other end of the molecule
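In a record, the mate is located by the RNEXT and PNEXT fields, while the FLAG field stores pairing state as bits (mate unmapped, mate on the reverse strand, first or second in pair, plus the duplicate bit set during duplicate marking). A small decoder using the bit values from the SAM specification; the function names are hypothetical.

```python
# Bit values from the SAM specification's FLAG field
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the names of all bits set in a FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def mate_summary(flag, rnext, pnext):
    """Describe where the mate is, using FLAG bits plus the RNEXT/PNEXT fields."""
    if not flag & 0x1:
        return "unpaired read"
    if flag & 0x8:
        return "mate unmapped"
    chrom = "same reference" if rnext == "=" else rnext
    strand = "-" if flag & 0x20 else "+"
    return f"mate on {chrom} at {pnext} ({strand} strand)"


print(decode_flag(99))
print(mate_summary(99, "=", 300))  # mate on same reference at 300 (- strand)
```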

7

bam cigar

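compact idiosyncratic gapped alignment report

summarizes alignment structure

S soft clip (bases present in the read but not aligned)

M alignment match (the base may match or mismatch the reference)

D deletion (from the reference)

I insertion (into the reference)

ex: 1S3M1D2M1I1M (1 soft-clipped base, 3 aligned, 1 deleted, 2 aligned, 1 inserted, 1 aligned)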
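A minimal Python sketch that parses a CIGAR string and computes how many read bases and reference bases the alignment spans. Which operations consume the read (query) versus the reference follows the SAM specification; the helper names are illustrative.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_QUERY = set("MIS=X")  # operations that use up read bases
CONSUMES_REF = set("MDN=X")    # operations that use up reference bases


def parse_cigar(cigar):
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # the parsed operations must reproduce the whole string, otherwise it is malformed
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar}")
    return ops


def query_length(ops):
    return sum(n for n, op in ops if op in CONSUMES_QUERY)


def reference_length(ops):
    return sum(n for n, op in ops if op in CONSUMES_REF)


ops = parse_cigar("1S3M1D2M1I1M")
print(ops)
print(query_length(ops), reference_length(ops))  # 8 7
```

For the card's example, the read has 8 bases including the soft-clipped one, while the alignment covers 7 reference bases: the deletion consumes reference but not read, and the insertion consumes read but not reference.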

8

duplicates

non-independent measurements of a sequence fragment

must be marked or removed so that support for each allele is assessed correctly
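Duplicate detection can be sketched as grouping reads that share the same alignment signature and keeping one representative per group. This simplified single-end version assumes each read's unclipped 5' coordinate is already computed and keeps the read with the highest summed base quality. Tools such as Picard/GATK MarkDuplicates follow a similar idea but also use mate positions for paired reads, and by default set the 0x400 flag rather than deleting reads. The `Read` type and function name are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    chrom: str
    five_prime: int  # unclipped 5' coordinate (assumed precomputed)
    reverse: bool    # strand
    quals: tuple     # per-base Phred scores


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates."""
    groups = defaultdict(list)
    for r in reads:
        # reads from the same original fragment share chromosome, 5' end, and strand
        groups[(r.chrom, r.five_prime, r.reverse)].append(r)
    duplicates = set()
    for members in groups.values():
        best = max(members, key=lambda r: sum(r.quals))  # representative read
        duplicates.update(r.name for r in members if r is not best)
    return duplicates


reads = [
    Read("a", "chr1", 1000, False, (30, 30, 30)),
    Read("b", "chr1", 1000, False, (20, 20, 20)),  # same signature as "a", lower quality
    Read("c", "chr1", 1000, True, (30, 30, 30)),   # opposite strand: not a duplicate
    Read("d", "chr1", 1500, False, (30, 30, 30)),
]
print(mark_duplicates(reads))  # {'b'}
```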

9

library duplicates

caused by PCR amplification during library preparation

10

optical duplicates

occur during sequencing, when a single cluster is detected as more than one by the sequencer's optics
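Optical duplicates can be told apart from library duplicates by where they sit on the flow cell: duplicates from the same tile that lie very close together are likely one cluster read twice. A sketch assuming Illumina-style read names ending in lane:tile:x:y; the Euclidean distance and the 100-pixel threshold are assumptions for illustration (a suitable distance depends on the instrument), and the function names are hypothetical.

```python
import math


def flowcell_location(read_name):
    # assumes names like instrument:run:flowcell:lane:tile:x:y
    lane, tile, x, y = read_name.split(":")[-4:]
    return lane, tile, int(x), int(y)


def is_optical_duplicate(name_a, name_b, max_pixel_distance=100):
    """For two reads already known to be duplicates, decide if they are optical."""
    lane_a, tile_a, xa, ya = flowcell_location(name_a)
    lane_b, tile_b, xb, yb = flowcell_location(name_b)
    if (lane_a, tile_a) != (lane_b, tile_b):
        return False  # different tiles: more likely a library (PCR) duplicate
    return math.hypot(xa - xb, ya - yb) <= max_pixel_distance


print(is_optical_duplicate("M00123:45:FC1:1:1101:15589:1331",
                           "M00123:45:FC1:1:1101:15620:1350"))  # True
```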

11

base quality score errors

sequencers make systematic errors in base quality scores

quality scores reported by the sequencer cannot account for PCR-based errors

BQSR corrects the quality scores, not the bases
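The core idea of recalibration can be sketched as: at sites assumed not to be real variants, every mismatch to the reference is treated as an error, and the observed error rate for each reported quality is converted back to a Phred score. This is a simplified sketch; GATK's BaseRecalibrator also models other covariates (read group, machine cycle, sequence context) and masks known variant sites. The pseudocounts and function name here are assumptions.

```python
import math
from collections import defaultdict


def recalibrate(observations):
    """observations: (reported_q, is_mismatch) pairs from sites assumed NOT to be real variants."""
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [mismatches, total bases]
    for reported_q, is_mismatch in observations:
        counts[reported_q][0] += int(is_mismatch)
        counts[reported_q][1] += 1
    table = {}
    for reported_q, (mismatches, total) in counts.items():
        # pseudocounts (an assumption here) keep the rate above 0 when no mismatches occur
        error_rate = (mismatches + 1) / (total + 2)
        table[reported_q] = round(-10 * math.log10(error_rate))
    return table


obs = [(30, i < 10) for i in range(1000)]  # reported Q30 but 1% mismatches
obs += [(20, i < 5) for i in range(1000)]  # reported Q20 but 0.5% mismatches
print(recalibrate(obs))  # {30: 20, 20: 22}
```

Only the quality scores change; the base calls themselves stay the same.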

12

DeepVariant

data visualization + deep learning (read pileups are encoded as images and classified by a convolutional neural network)

fewer total errors than GATK

13

most accurate method for analyzing a human genome

hybrid DeepVariant model for combined PacBio and Illumina reads

14

variant call format (vcf)

file produced when a bam file is analyzed to detect SNPs and indels
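The first five VCF columns (CHROM, POS, ID, REF, ALT) are enough to tell SNPs from indels by comparing allele lengths. A minimal sketch; the example record is made up, the function name is hypothetical, and real pipelines also normalize and left-align indels before comparing.

```python
def classify_vcf_record(line):
    """Classify each ALT allele of a VCF data line."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    calls = []
    for allele in alt.split(","):  # multiple ALT alleles are comma-separated
        if allele.startswith("<"):
            kind = "symbolic"           # e.g. <DEL>, often used for structural variants
        elif allele == "*":
            kind = "spanning deletion"  # allele missing due to an upstream deletion
        elif len(ref) == 1 and len(allele) == 1:
            kind = "SNP"
        elif len(ref) == len(allele):
            kind = "MNP"                # multi-nucleotide substitution
        else:
            kind = "indel"
        calls.append((chrom, int(pos), ref, allele, kind))
    return calls


for call in classify_vcf_record("chr1\t12345\t.\tA\tG,AT\t50\tPASS\tDP=30"):
    print(call)
```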

15

types of variants

protein coding

noncoding

structural variants

16

protein coding variants

nonsense

frameshift

missense
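How a single-base change becomes nonsense or missense, and why an indel causes a frameshift, can be sketched with the standard genetic code. This assumes the affected codon and the position within it are already known (in practice an annotation tool works this out from transcript models). Function names are hypothetical, and the synonymous, stop-lost, and in-frame labels are included only to make the classification complete.

```python
BASES = "TCAG"
# standard genetic code, codons ordered by first, second, third base in TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(codon, position, alt_base):
    """Effect of replacing codon[position] (0-based) with alt_base."""
    mutated = codon[:position] + alt_base + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if ref_aa == alt_aa:
        return "synonymous"
    if ref_aa == "*":
        return "stop-lost"
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    return "missense"      # different amino acid


def classify_indel(ref, alt):
    # a length change that is not a multiple of 3 shifts the reading frame
    return "frameshift" if (len(alt) - len(ref)) % 3 else "in-frame"


print(classify_snv("TGG", 2, "A"))  # TGG (Trp) -> TGA (stop): nonsense
print(classify_snv("GAG", 1, "T"))  # GAG (Glu) -> GTG (Val): missense
print(classify_indel("A", "AT"))    # 1-base insertion: frameshift
```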

17

noncoding variants

splice site disrupters

promoter disrupters

regulatory element disrupters

18

types of annotations

population frequencies

evolutionary constraint

biochemical consequences

in vitro or in vivo assays (ENCODE, psychENCODE, GTEx)

phenotypes (mouse knockouts, cellular assays)

19

annotation

provides context for interpretation

20

types of inheritance for germline mutations

autosomal dominant

autosomal recessive

mitochondrial

x-linked dominant

x-linked recessive

21

classes of loss-of-function mutations affecting protein-coding genes

nonsense snp

frame-shift indel

splice site snp

exon deletion

whole gene deletion

22

causal variant

has to be found among the >150 genuine loss-of-function variants present in every human genome

if a disease phenotype is rare, the causal variant should likewise be rare across all ancestral backgrounds

should be under strong purifying selection (mutation intolerance)
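The first two criteria can be turned into a simple filter: keep loss-of-function candidates whose allele frequency stays below a rarity threshold in every population. A sketch under stated assumptions: the population frequencies would come from a reference panel, and the population labels, the 1e-4 cutoff, the consequence names, and the `Variant` type are hypothetical. Mutation intolerance needs gene-level constraint metrics and is not modeled here.

```python
from typing import NamedTuple

# hypothetical consequence labels mirroring the LoF classes listed above
LOF_CONSEQUENCES = {"nonsense", "frameshift", "splice_site", "exon_deletion", "gene_deletion"}


class Variant(NamedTuple):
    id: str
    consequence: str
    pop_afs: dict  # population label -> allele frequency


def candidate_causal(variants, max_af=1e-4):
    """LoF variants that are rare in every population (the highest AF must pass)."""
    return [
        v.id
        for v in variants
        if v.consequence in LOF_CONSEQUENCES
        and max(v.pop_afs.values(), default=0.0) <= max_af
    ]


variants = [
    Variant("v1", "nonsense", {"pop1": 1e-5, "pop2": 0.0}),
    Variant("v2", "frameshift", {"pop1": 1e-5, "pop2": 0.02}),  # common in pop2
    Variant("v3", "missense", {"pop1": 1e-5, "pop2": 0.0}),     # not LoF
]
print(candidate_causal(variants))  # ['v1']
```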

23

insights for variant interpretation from past studies

-careful annotation and curation of LoF variants is essential

-not all exons are expressed equally

-not all LoF disease genes are severely depleted of putative LoF variants

-an adjacent variant can change the predicted effect

-we can now assess the frequency of structural variants

-we should explore beyond coding regions

24

phased assembly long-read sequencing

-allows identification of large, complex structural variations in human genome

-64 human genomes representing 26 different human populations were assembled

-found SVs and SV hotspots not discovered by short-read sequencing

25

integrating rna-seq data with genome sequencing data

-can improve interpretation of genetic variation and enhance diagnostic yield

-detect transcript-level changes, splice-altering variants, and aberrant transcript isoforms

-assess the extent of nonsense-mediated decay in LoF-containing transcripts

-evaluate the extent of biallelic expression at recessive disease genes harboring heterozygous variants

-feasibility depends on tissue availability and on whether the disorder is developmental or chronic
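Checking whether a heterozygous site is expressed from both alleles can be sketched as an exact binomial test on RNA-seq read counts supporting each allele, assuming balanced expression gives an expected alternate-allele fraction of 0.5. The counts below are made up, the function names are hypothetical, and real analyses also correct for reference mapping bias and sequencing error.

```python
from math import comb


def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: sum outcomes no more likely than observing k of n."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return min(1.0, sum(pr for pr in probs if pr <= observed * (1 + 1e-9)))


def allelic_balance(ref_reads, alt_reads):
    """Alternate-allele fraction and p-value against balanced (biallelic) expression."""
    total = ref_reads + alt_reads
    return alt_reads / total, binomial_two_sided(alt_reads, total)


print(allelic_balance(25, 25))  # balanced: fraction 0.5, large p-value
print(allelic_balance(48, 2))   # skewed toward one allele: very small p-value
```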

26

integration of deep learning and artificial intelligence

can improve genomic analysis and variant interpretation

27

extreme phenotype study design

hypothesis: individuals with rare variants in the same gene are concentrated in one extreme of the distribution

this approach can identify genes, pathways, and targets in common diseases
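One way to test the hypothesis can be sketched as counting carriers of rare variants in a gene within each tail and asking, with a one-sided Fisher's exact test, whether carriers are over-represented in the high tail. Fisher's test, the function name, and the counts are illustrative choices, not the method of any particular study.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(at least `a` carriers in the high tail) under the hypergeometric null.

    a: carriers in high tail     b: non-carriers in high tail
    c: carriers in low tail      d: non-carriers in low tail
    """
    n_high = a + b
    carriers = a + c
    total = a + b + c + d
    tail = sum(
        comb(carriers, x) * comb(total - carriers, n_high - x)
        for x in range(a, min(n_high, carriers) + 1)
    )
    return tail / comb(total, n_high)  # exact integer arithmetic until this division


# hypothetical: 500 individuals per tail, 12 carriers in the high tail vs 1 in the low tail
print(fisher_one_sided(12, 488, 1, 499))
```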

28

mutations with very large effect

-provide a causal link between genotype and phenotype, allowing studies to determine pathophysiological mechanisms

-can identify genes and pathways that can be manipulated for health benefit

-suggest direction and magnitude of beneficial effects that can be achieved through a target