Exam II Review

Last updated 4:25 PM on 3/26/26
100 Terms

1

What are types of mutation?

  • physical

  • chemical

  • spontaneous

2

What are somatic mutations?

  • occur in any cell other than sperm and egg cells

  • not passed on to children

  • accumulate over a lifetime

  • species with shorter lifespans accumulate somatic mutations faster (more mutations per year)

3

What are germline mutations?

  • mutations that occur in the gametes

  • passed to offspring

  • older parents are more likely to pass new germline mutations

4

What factors influence the mutation rate?

  • age

  • generation time (species that reproduce quickly go through more rounds of DNA replication in a shorter time, giving more opportunity for mutation)

5

SNPs

single nucleotide polymorphisms; a one-letter (single-base) difference in the DNA sequence

6

What is an example of a SNP?

sickle cell trait: in the hemoglobin beta gene (HBB), an A is changed to a T, replacing glutamic acid with valine
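Here is a tiny Python sketch that makes the codon-level change concrete. It relies on the standard fact that codon 6 of HBB reads GAG in the normal allele. The two-entry codon table is a hypothetical stub, not a full genetic code.

```python
# Minimal, hypothetical codon table: only the two codons needed here
CODON_TABLE = {"GAG": "Glu", "GTG": "Val"}

def substitute(codon: str, position: int, new_base: str) -> str:
    """Return the codon with a single-base substitution at `position` (0-based)."""
    return codon[:position] + new_base + codon[position + 1:]

normal_codon = "GAG"                              # HBB codon 6, normal allele
sickle_codon = substitute(normal_codon, 1, "T")   # the A -> T SNP
print(normal_codon, CODON_TABLE[normal_codon])    # GAG Glu
print(sickle_codon, CODON_TABLE[sickle_codon])    # GTG Val
```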

7

Structural Variants (SVs)

a large scale change involving 50 bp or more

8

What are types of SVs?

  • deletion

  • insertion

  • inversion

  • duplication

  • translocation

9

What is an example of an SV?

Hemophilia A: an inversion in the F8 gene on the X chromosome; the body cannot produce functional blood-clotting factor VIII

10

What percent of the human genome are SNPs?

0.078%

11

What percent of the human genome are insertion-deletions?

0.069%

12

What percent of the human genome are SVs?

0.19%

13

What percent of the human genome are inversions?

0.397%

14

What percent of the human genome are multi-CNV?

0.232%

15

What is the goal of the International HapMap Project?

Identify genetic variants associated with common diseases.

16

When was the International HapMap Project launched?

2002

17

Was the International HapMap Project publicly or privately funded?

Both public and private (Japan, Canada, China, U.S., U.K.)

18

Haplotype

a specific set of alleles (DNA variations) that are located physically close to each other on a single chromosome and are inherited together as a single unit from one parent.

19

an overview of the International HapMap Project

-blood samples were collected from the Yoruba in Nigeria, Japanese, Han Chinese, and U.S. residents with ancestry from Northern and Western Europe

-map haplotypes, not individual mutations

-find haplotypes that differ between healthy and diseased individuals

-various SNP microarrays were used for genotyping

20

What was the goal of the 1000 Genomes Project?

sequence >1000 genomes to find variants with frequency >1%

21

When did the 1000 Genomes Project sequence its first and last genomes?

first sequence in 2008, last sequence in 2013

22

an overview of the 1000 Genomes Project

-first project to sequence the genomes of a large number of people

-3 phases

-sequence >1000 genomes to find variants with frequency >1%

23

What data came out of the 1000 Genomes Project?

-2,504 individuals from 26 populations (low-coverage sequencing + exome data); 24 individuals sequenced to high coverage

-88 million variants (84.7 million SNPs, 3.6 million short insertions/deletions, and 60,000 SVs)

-first genome map of variants larger than SNPs (revealed roles of SVs in gene expression and disease)

-captured more than 99% of SNP variants with a frequency of >1%

-cell lines and DNA available for further use

24

What was the goal of the “All of Us” project?

sequencing 1 million people by 2026

25

an overview of the “All of Us” project

-funded by NIH

-first dataset (100,000 people) released in 2022 (enrollment began in 2018)

-413,000 individuals enrolled as of 2023; 250,000 genomes sequenced; 46% of participants are from minority racial or ethnic groups

-the database includes some participants’ survey responses, electronic health records and data from wearable devices

-145 new candidate factors discovered for type 2 diabetes.

26

What did Lewontin and Hubby do?

They introduced the summary statistics heterozygosity (H) and proportion of polymorphic sites (P)

27

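alloenzymes (allozymes)

variant forms of an enzyme, encoded by different alleles at the same locus, that can be told apart by electrophoresis; used to “track” genetic variation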

28

heterozygosity

probability that two alleles drawn at random at a locus are different (i.e., that an individual is heterozygous at that locus)

29

proportion of polymorphic sites

proportion of loci (genes) that are polymorphic, i.e., have more than one allele in the population

30

what are the formulas for measuring genetic variation? (h, H, P)

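$h = 1 - \sum_i x_i^2$ (heterozygosity at one locus; $x_i$ = frequency of allele $i$ at that locus)

$H = \frac{1}{n}\sum_{i=1}^{n} h_i$ ($n$ = number of loci)

$P = p/N$ ($p$ = number of polymorphic loci, $N$ = total number of loci)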
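Here is a minimal Python sketch of the three formulas, using made-up allele frequencies. The rule for calling a locus polymorphic (most common allele at or below 0.99) is an assumed convention; some studies use 0.95 instead. The function names are hypothetical.

```python
def locus_heterozygosity(freqs):
    """h = 1 - sum(x_i^2) over the allele frequencies x_i at one locus."""
    return 1.0 - sum(x * x for x in freqs)

def mean_heterozygosity(loci):
    """H = (1/n) * sum(h_i) over n loci."""
    return sum(locus_heterozygosity(f) for f in loci) / len(loci)

def proportion_polymorphic(loci, max_major_freq=0.99):
    """P = p / N; a locus counts as polymorphic if its most common
    allele has frequency <= max_major_freq (assumed cutoff)."""
    p = sum(1 for f in loci if max(f) <= max_major_freq)
    return p / len(loci)

# Hypothetical allele frequencies at three loci
loci = [[0.5, 0.5], [1.0], [0.7, 0.2, 0.1]]
print([round(locus_heterozygosity(f), 2) for f in loci])  # [0.5, 0.0, 0.46]
print(round(mean_heterozygosity(loci), 2))                # 0.32
print(round(proportion_polymorphic(loci), 3))             # 0.667
```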

31

average values found in early studies

-mean H = 0.12 (an individual is heterozygous at about 12% of its loci)

-mean P = 0.3 (about 30% of loci in a population have more than one allele)

32

What is Hardy Weinberg Equilibrium?

If the Hardy-Weinberg assumptions are met, allele and genotype frequencies will not change from generation to generation:

p + q = 1

p² + 2pq + q² = 1

heterozygotes = 2pq

homozygotes = p², q²

where p is the frequency of allele A and q is the frequency of allele a.
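Here is a sketch of how HWE is commonly checked. Estimate p from the observed genotype counts, compute the expected counts p²n, 2pqn and q²n, and compare them with a chi-square test (1 degree of freedom). The function name is hypothetical. For small samples an exact test is often preferred over this chi-square approximation.

```python
import math

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Return expected genotype counts, chi-square statistic and p-value (1 df)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)    # frequency of allele A
    q = 1 - p                           # frequency of allele a
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_AA, n_Aa, n_aa]
    # Skip empty expected classes (only happens when one allele is absent)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square, 1 df
    return expected, chi2, p_value

print(hwe_chi_square(25, 50, 25))  # ([25.0, 50.0, 25.0], 0.0, 1.0): fits HWE
print(hwe_chi_square(50, 0, 50))   # chi2 = 100.0, p far below 0.001: no heterozygotes
```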

33

What are the assumptions of Hardy Weinberg Equilibrium?

-no natural selection

-random mating

-infinite population size

-no mutation

-no gene flow

34

What are Ne and Nc?

Nc → census population size → all individuals

Ne → effective population size → individuals that actually contribute to breeding

35

What does Ne say about genetic drift?

-small Ne means stronger genetic drift (random chance can easily wipe out certain alleles; drift can outweigh selection, so a harmful mutation might spread just by chance)

-large Ne means there is more room for different mutations to coexist, and selection is more effective at weeding out deleterious mutations
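To see the Ne effect, the sketch below simulates pure drift with a simple Wright-Fisher model. It assumes no selection, no mutation and a constant Ne. It then counts how often an allele that starts at 50% is lost or fixed. The replicate count, generation count and seed are arbitrary choices, and the function names are hypothetical.

```python
import random

def wright_fisher(ne, p0, generations, rng):
    """Final allele frequency under pure drift in a diploid population of
    effective size ne (2*ne gene copies are resampled every generation)."""
    copies = 2 * ne
    p = p0
    for _ in range(generations):
        # Binomial sampling: each gene copy is allele A with probability p
        p = sum(rng.random() < p for _ in range(copies)) / copies
        if p in (0.0, 1.0):   # allele lost or fixed; nothing left to drift
            break
    return p

def fraction_lost_or_fixed(ne, reps=20, generations=60, seed=1):
    rng = random.Random(seed)
    finals = [wright_fisher(ne, 0.5, generations, rng) for _ in range(reps)]
    return sum(f in (0.0, 1.0) for f in finals) / reps

print(fraction_lost_or_fixed(10))    # small Ne: most replicates lose or fix the allele
print(fraction_lost_or_fixed(1000))  # large Ne: no replicate is lost or fixed
```

With Ne = 10, most replicates end at 0 or 1 within 60 generations. With Ne = 1000, the allele frequency only wanders slightly away from 0.5.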

36

What are DNA markers?

a specific DNA segment, often located between genes, whose sequence varies between individuals in a population

-markers that are not genes themselves are called DNA markers

-should have >= 2 alleles (must be polymorphic to be useful)

37

What are the three major types of DNA markers?

-Restriction Fragment Length Polymorphism (RFLP)

-Simple Sequence Length Polymorphism (SSLP)

-Single Nucleotide Polymorphism (SNP)

38

Restriction Fragment Length Polymorphism (RFLP)

concept: uses restriction enzymes that cut DNA only at specific recognition sequences

variation: if one person has a mutation at that cut site, the enzyme won’t cut there

result: when the DNA is run on a gel, the fragments are different lengths
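Here is an in-silico version of the RFLP idea, as a sketch. It cuts a sequence at every EcoRI site (GAATTC, cut between the G and the first A) and lists the fragment lengths. Both allele sequences are invented for illustration, and `digest` is a hypothetical helper.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Fragment lengths after cutting `seq` at every occurrence of `site`.
    Default: EcoRI, which cuts G^AATTC (after the first base of the site)."""
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

allele_1 = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"   # two EcoRI sites
allele_2 = "AAAA" + "GAATTC" + "CCCCCCCC" + "GATTTC" + "TT"   # SNP destroys site 2
print(digest(allele_1))  # [5, 14, 7]
print(digest(allele_2))  # [5, 21]
```

The SNP in allele 2 removes one cut site. The 14 bp and 7 bp fragments are therefore replaced by a single 21 bp fragment, which is the band difference seen on the gel.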

39

Simple Sequence Length Polymorphism (SSLP)

concept: focuses on repetitive sequences (tandem repeats such as microsatellites)

variation: individuals differ in the number of repeat units

result: highly variable among people
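Here is a sketch of how an SSLP allele can be scored by counting repeat units. It assumes a CA dinucleotide repeat and two invented alleles, and the function name is hypothetical.

```python
def repeat_count(seq, motif):
    """Longest run of consecutive, adjacent copies of `motif` in `seq`."""
    k, best = len(motif), 0
    for start in range(len(seq)):
        count, pos = 0, start
        while seq[pos:pos + k] == motif:
            count += 1
            pos += k
        best = max(best, count)
    return best

# Hypothetical alleles at one (CA)n microsatellite locus
person_1 = "GG" + "CA" * 4 + "TT"
person_2 = "GG" + "CA" * 7 + "TT"
print(repeat_count(person_1, "CA"), repeat_count(person_2, "CA"))  # 4 7
```

The 3-unit (6 bp) difference between the two alleles is the length polymorphism that would be detected.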

40

KNOW THE ADVANTAGES AND LIMITATIONS OF THE VARIOUS METHODS, INCLUDING LEWONTIN AND HUBBY'S

41

In-situ synthesized arrays

oligos synthesized directly on the chip using photolithography

42

How do synthesized arrays work?

Light is used to activate specific spots on a glass slide. When light hits a spot, a specific DNA base is “glued” there (it forms a covalent bond with the linker molecule on the slide). By repeating this cycle, short DNA strands (oligos) are built directly on the slide.

43

What is the capacity of synthesized arrays?

500,000 SNPs can be tested on one chip

44

Who developed synthesized arrays?

Affymetrix

45

self-assembled arrays

-instead of being grown directly on the slide, the DNA is synthesized on tiny polystyrene beads, which are deposited in wells etched on a glass surface

46

what is the capacity of self-assembled arrays?

2.5 million SNPs

47

who licenses and sells self-assembled arrays?

Illumina

48

What are the advantages of microarrays?

-high throughput (faster)

-standardized

-cost effective

49

what are the disadvantages of microarrays?

-cannot detect SVs

-expensive equipment

-discovery bias (only find what you are looking for)

50

what are some criteria for SNP filtering?

-high missing frequency

-not in Hardy-Weinberg equilibrium

-low minor allele frequency (MAF) (<1% or 5%)

-strand consistency

-exclusion of HapMap SNPs
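Here is a sketch of three of these SNP filters (missing rate, MAF, HWE) applied to a single SNP. Genotypes are coded as alternate-allele counts (0/1/2, with None for missing). The thresholds (5% missing, MAF 1%, HWE p < 1e-6) are assumed example values, not fixed standards. Strand consistency and HapMap-based exclusion are not shown, and the function name is hypothetical.

```python
import math

def snp_filter_reasons(genotypes, max_missing=0.05, min_maf=0.01, min_hwe_p=1e-6):
    """genotypes: one SNP across samples as alt-allele counts (0/1/2) or None.
    Returns the list of failed criteria (an empty list means keep the SNP)."""
    reasons = []
    called = [g for g in genotypes if g is not None]
    if 1 - len(called) / len(genotypes) > max_missing:
        reasons.append("high missing rate")
    if not called:
        return reasons
    n = len(called)
    f = sum(called) / (2 * n)                 # alternate allele frequency
    if min(f, 1 - f) < min_maf:
        reasons.append("low MAF")
    if 0 < f < 1:                             # HWE chi-square test, 1 df
        obs = [called.count(0), called.count(1), called.count(2)]
        exp = [(1 - f) ** 2 * n, 2 * f * (1 - f) * n, f ** 2 * n]
        chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
        if math.erfc(math.sqrt(chi2 / 2)) < min_hwe_p:
            reasons.append("HWE deviation")
    return reasons

print(snp_filter_reasons([0] * 50 + [1] * 40 + [2] * 10))    # []
print(snp_filter_reasons([0] * 99 + [None]))                  # ['low MAF']
print(snp_filter_reasons([0] * 20 + [2] * 20 + [None] * 10))  # ['high missing rate', 'HWE deviation']
```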

51

what are some criteria for sample filtering?

-low call rates (i.e., people with many missing genotypes)

-high heterozygosity levels

-sex and race mismatch (reported vs. genetically inferred)
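Here is a sketch of the first two sample filters under assumed cut-offs. A sample fails if its call rate is below 95% or if its heterozygosity rate lies more than 3 standard deviations from the mean of all samples. The data, sample IDs and thresholds are hypothetical. The sex/ancestry check needs extra information and is left out.

```python
import statistics

def sample_qc(samples, min_call_rate=0.95, het_sd=3.0):
    """samples: sample ID -> genotypes (0/1/2 alt-allele counts, None = missing).
    Returns sample ID -> list of failed checks (passing samples are omitted)."""
    call_rate, het_rate = {}, {}
    for sid, gts in samples.items():
        called = [g for g in gts if g is not None]
        call_rate[sid] = len(called) / len(gts)
        het_rate[sid] = sum(g == 1 for g in called) / len(called) if called else 0.0
    rates = list(het_rate.values())
    mean, sd = statistics.mean(rates), statistics.pstdev(rates)
    flags = {}
    for sid in samples:
        reasons = []
        if call_rate[sid] < min_call_rate:
            reasons.append("low call rate")
        if sd > 0 and abs(het_rate[sid] - mean) > het_sd * sd:
            reasons.append("heterozygosity outlier")
        if reasons:
            flags[sid] = reasons
    return flags

# 19 typical samples (3 of 10 genotypes heterozygous) plus two problem samples
samples = {f"s{i}": [1] * 3 + [0] * 7 for i in range(19)}
samples["s_missing"] = [1] * 3 + [0] * 5 + [None] * 2   # 80% call rate
samples["s_het"] = [1] * 9 + [0]                         # 90% heterozygous
print(sample_qc(samples))
# {'s_missing': ['low call rate'], 's_het': ['heterozygosity outlier']}
```

Unusually high heterozygosity, as in `s_het`, can indicate a contaminated or mixed DNA sample.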

52

what is RAD-seq?

-uses restriction enzymes to cut DNA at specific “anchor” points. you only sequence the DNA right next to those cuts.

-reduced representation sequencing strategy

53

Where is RAD-seq used?

widely used in non-model organisms for ecological, evolutionary, and conservation genomics

54

what are the advantages of RAD-seq?

-cost effective

-higher sequencing coverage per locus (high quality genotype calls)

-does not require a reference genome

55

what are the two types of RAD-seq?

-original RAD-seq

-ddRAD

56

What are the steps of original RAD-seq?

  1. digest (one enzyme)

  2. ligate adapters

  3. multiplex - samples are pooled together

  4. shear - physically broken into smaller chunks

  5. size select

  6. end repair

  7. A-tailing - add an A to the end

  9. ligate Y-adapters

  9. PCR

57

What are the steps of ddRAD-seq?

  1. digest (two enzymes)

  2. ligate adapters

  3. multiplex

  4. size select

  5. PCR
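Here is a rough in-silico ddRAD digest, as a sketch. It assumes the EcoRI (G^AATTC) + MspI (C^CGG) enzyme pair, which is one commonly used combination. It keeps only fragments that have one end from each enzyme and that fall inside the size-selection window. The sequence, window sizes and function names are invented.

```python
def cut_sites(seq, site, offset, label):
    """(position, enzyme) for every occurrence of a recognition site."""
    hits, i = [], seq.find(site)
    while i != -1:
        hits.append((i + offset, label))
        i = seq.find(site, i + 1)
    return hits

def ddrad_fragments(seq, min_len, max_len):
    """Double digest: keep fragments with one EcoRI end and one MspI end
    whose length falls inside the size-selection window."""
    cuts = sorted(cut_sites(seq, "GAATTC", 1, "EcoRI") +   # G^AATTC
                  cut_sites(seq, "CCGG", 1, "MspI"))       # C^CGG
    kept = []
    for (start, e1), (end, e2) in zip(cuts, cuts[1:]):
        if e1 != e2 and min_len <= end - start <= max_len:
            kept.append((start, end))
    return kept

seq = "AAA" + "GAATTC" + "T" * 10 + "CCGG" + "T" * 5 + "GAATTC" + "AA"
print(ddrad_fragments(seq, 10, 20))  # [(4, 20)]
print(ddrad_fragments(seq, 5, 20))   # [(4, 20), (20, 29)]
```

Narrowing or widening the size window changes how many loci (fragments) end up being sequenced.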

58

what are the limitations of RAD-seq?

allele dropout: a mutation (SNP) at a restriction site prevents the enzyme from cutting the DNA, so the allele carrying that mutation is missed during sequencing. This produces a "null allele," which can make a heterozygous individual look homozygous.

59

what does a Phred quality score (Q) represent?

A value assigned to each base call that encodes, on a logarithmic scale, the probability that the base was called incorrectly

60

Why do quality scores typically decrease toward the end of a sequencing read?

Because sequencing becomes asynchronous within a cluster (dephasing/signal decay), making the "images" noisier and more error-prone over time

61

What is the formula for a Phred Score (Q)?

$Q = -10 \log_{10} P_e$, where $P_e$ is the probability of error
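Here is the formula and its inverse as a short Python sketch. The decoder assumes the Phred+33 ASCII encoding used by current Illumina FASTQ files, where the quality character is chr(Q + 33). The function names are hypothetical.

```python
import math

def phred_to_error(q):
    """P_e = 10^(-Q/10)"""
    return 10 ** (-q / 10)

def error_to_phred(p_error):
    """Q = -10 * log10(P_e)"""
    return -10 * math.log10(p_error)

def decode_phred33(quality_line):
    """FASTQ quality string (Phred+33 encoding) -> list of Q values."""
    return [ord(ch) - 33 for ch in quality_line]

print(phred_to_error(20))       # ~0.01 (1 error in 100 calls)
print(error_to_phred(0.001))    # ~30
print(decode_phred33("I5+#"))   # [40, 20, 10, 2]
```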

62

What is a commonly used QV threshold?

20 (Q20 means a 1% chance that the base call is wrong)

63

What is the role of a Basecaller in NGS?

It converts platform-specific raw data (like fluorescent light signals or images) into actual nucleotide sequences (A, C, T, G) and their associated Phred scores.

64

Which tool is the industry standard for checking the overall quality of a raw sequencing run?

FastQC

65

What is the primary "alignment problem" when mapping NGS reads to a reference?

Read aligners must accommodate true genetic variation, which looks similar to sequencing errors.

66

Why are long reads preferred for hypervariable regions of the genome?

to provide enough context to ensure the read is mapped to the correct unique location

67

What are the two main types of algorithms used for read alignment?

-Data compression algorithms (BWT-based)

-Hash-based algorithms

68

What is the Burrows-Wheeler Transformation (BWT) used for in bioinformatics?

It is a data compression algorithm that makes aligners (like BWA or Bowtie) extremely fast and memory-efficient, especially when dealing with repetitive DNA
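Here is a toy illustration of why the BWT is useful, as a sketch. The transform is built from sorted rotations. An FM-index-style backward search then counts pattern matches using only the BWT string. Real aligners precompute the occurrence counts and a suffix array instead of recounting string slices as this sketch does. The function names are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (text must end with '$')."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def count_matches(bwt_str, pattern):
    """FM-index backward search: number of occurrences of pattern in the text."""
    first = {}  # first row of each character in the sorted first column
    for i, ch in enumerate(sorted(bwt_str)):
        first.setdefault(ch, i)
    lo, hi = 0, len(bwt_str)                    # current range of matching rows
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + bwt_str[:lo].count(ch)  # Occ(ch, lo), computed naively
        hi = first[ch] + bwt_str[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo

b = bwt("banana$")
print(b)                        # annb$aa
print(count_matches(b, "ana"))  # 2
print(count_matches(b, "nab"))  # 0
```

Each step of the loop narrows the block of sorted rotations that start with the part of the pattern read so far. Once the occurrence counts are precomputed, the search cost depends on the pattern length, not the genome length.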

69

Compare BWT aligners (e.g., BWA) vs. Hash-based aligners (e.g., Stampy)

BWT

  • faster

  • memory efficient

  • great for large datasets and repeats

Hash-Based

  • more sensitive and accurate

  • slower

  • more memory intensive
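For contrast, here is a sketch of the hash-based idea. Every k-mer of the reference is indexed in a hash table. Seeds from the read are looked up, and each candidate position is verified by counting mismatches. This is a simplified, hypothetical aligner (no indels, no quality weighting), not how MAQ, Novoalign or Stampy are actually implemented.

```python
from collections import defaultdict

def build_index(ref, k):
    """Hash table: every k-mer in the reference -> list of start positions."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index

def align(read, ref, index, k, max_mismatches=2):
    """Seed with non-overlapping k-mers, then score each candidate position by
    counting mismatches (no indels). Returns (position, mismatches) or None."""
    candidates = set()
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if 0 <= start <= len(ref) - len(read):
                candidates.add(start)
    best = None
    for start in sorted(candidates):
        mm = sum(a != b for a, b in zip(read, ref[start:start + len(read)]))
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (start, mm)
    return best

ref = "TTGACCGATTACAGGCTTAACGGATCCA"
index = build_index(ref, k=4)
print(align("CGATTACAGGCT", ref, index, k=4))  # (5, 0): exact match
print(align("CGATTACAGGAT", ref, index, k=4))  # (5, 1): one mismatch
```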

70

Name two popular BWT-based aligner tools

BWA and Bowtie

71

Name three Hash-based alignment tools

MAQ, Novoalign and Stampy

72

Why do Quality Value (QV) scores often need to be recalibrated?

Raw scores may not represent the true base-calling error rate

73

How are raw quality scores recalibrated?

By mapping reads to invariant sites (areas known not to vary) in a reference genome to see how often the machine "calls" a mutation that isn't actually there.
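Here is a sketch of empirical recalibration for a single covariate, the reported score. Bases at known invariant sites are grouped by reported Q. Each group's observed mismatch rate is then converted back into a Phred score. The pseudocount and the data are assumptions. Recalibration tools such as GATK's BQSR also model other covariates, such as machine cycle and sequence context.

```python
import math
from collections import defaultdict

def recalibrate(observations):
    """observations: (reported_Q, is_mismatch) for bases at known invariant sites.
    Returns reported Q -> empirical Q. A +1/+2 pseudocount keeps a bin with
    zero mismatches from getting an infinite score."""
    counts = defaultdict(lambda: [0, 0])        # Q -> [mismatches, total]
    for q, is_mismatch in observations:
        counts[q][0] += int(is_mismatch)
        counts[q][1] += 1
    return {q: -10 * math.log10((mm + 1) / (total + 2))
            for q, (mm, total) in counts.items()}

# Hypothetical: the machine reported Q30 (0.1% error), but 9 of 998 bases mismatch
obs = [(30, True)] * 9 + [(30, False)] * 989
print(recalibrate(obs))  # {30: ~20.0}: the true error rate is about 1%
```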

74

What is SNP calling?

identifying polymorphic sites

75

What is genotype calling?

assigning genotypes to individuals

76

What is the "Old Approach" to genotype calling and its main limitation?

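It simply counts the alleles observed at a site and applies a fixed threshold (e.g., non-reference allele fraction below 20% → homozygous reference, above 80% → homozygous non-reference, in between → heterozygous). Its limitation is that it requires very high coverage (>20X) to be accurate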
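Here is a sketch of the old threshold approach. It assumes the 20%/80% cut-offs and a minimum depth of 20 reads, taken from the >20X coverage requirement. The function name is hypothetical.

```python
def threshold_call(ref_count, alt_count, low=0.2, high=0.8, min_depth=20):
    """Old-style genotype call from allele counts at one site.
    Returns '0/0', '0/1', '1/1', or None if coverage is too low."""
    depth = ref_count + alt_count
    if depth < min_depth:
        return None                  # too little data to trust the call
    alt_fraction = alt_count / depth
    if alt_fraction < low:
        return "0/0"                 # homozygous reference
    if alt_fraction > high:
        return "1/1"                 # homozygous alternate
    return "0/1"                     # heterozygous

print(threshold_call(25, 0), threshold_call(12, 13), threshold_call(2, 23))  # 0/0 0/1 1/1
print(threshold_call(3, 2))  # None: only 5 reads
```

The coverage requirement follows from binomial sampling. At 5X, a true heterozygote shows reads from only one allele with probability 2 × 0.5⁵ ≈ 6%, in which case it would be miscalled as homozygous.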

77

What characterizes the "Modern Approach" to genotype calling?

It uses a probabilistic framework that incorporates uncertainty, allele frequencies, and Linkage Disequilibrium (LD) information

78

In the context of SNP calling, what does the Genotype Likelihood (P(X|G)) represent?

The probability of observing the sequencing data (X) given a specific true genotype (G), calculated from the base quality scores of each read and multiplied over all reads
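Here is a sketch of the genotype likelihood at a diploid, biallelic site. It rests on three common simplifying assumptions. Each read comes from either allele with probability 1/2, reads are independent, and a sequencing error turns the base into each of the other three bases equally often. Real callers also work in log space to avoid numerical underflow. The function names are hypothetical.

```python
def base_prob(base, allele, q):
    """P(observed base | true allele), with error rate e = 10^(-Q/10)
    spread evenly over the three other bases (simplifying assumption)."""
    e = 10 ** (-q / 10)
    return 1 - e if base == allele else e / 3

def genotype_likelihood(reads, genotype):
    """P(X | G) for a diploid genotype (a1, a2): each read comes from either
    allele with probability 1/2; reads are assumed independent."""
    a1, a2 = genotype
    likelihood = 1.0
    for base, q in reads:
        likelihood *= 0.5 * base_prob(base, a1, q) + 0.5 * base_prob(base, a2, q)
    return likelihood

reads = [("A", 30)] * 5 + [("T", 30)] * 5   # 5 A reads and 5 T reads, all Q30
for g in [("A", "A"), ("A", "T"), ("T", "T")]:
    print("".join(g), genotype_likelihood(reads, g))  # AT is by far the largest
```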

79

What does the Genotype Prior (P(G)) represent?

The probability of a genotype existing at a site before looking at the sequencing data, often based on population allele frequencies or Linkage Disequilibrium

80

How do researchers decide which genotype to assign to an individual using the Posterior Probability?

They choose the genotype with the highest posterior probability or use the ratio between the highest and second-highest as a confidence score.

81

What is the formula used to combine priors and likelihoods in modern SNP calling?

Bayes' Formula
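Here is Bayes' formula written out in the notation of the cards on likelihoods and priors. X is the sequencing data at the site, G is a candidate genotype, and G' runs over all possible genotypes:

```latex
P(G \mid X) = \frac{P(X \mid G)\,P(G)}{\sum_{G'} P(X \mid G')\,P(G')}
```

The denominator only normalises the posteriors so that they sum to 1. The ranking of genotypes comes from the numerator, likelihood × prior.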

82

What is a Genotype Prior?

A fancy term for "what is the probability of observing a certain genotype" before even looking at the specific sequencing reads for that individual

83

How is the prior determined for a Single Sample if no database is available?

You assign equal probability to all possible genotypes (e.g., 1/3 for AA, 1/3 for Aa, 1/3 for aa) to avoid biasing the results

84

Allele frequency from multiple samples ___ genotype prior calculation.

improves

85

Which mathematical model is often used to estimate priors when analyzing multiple samples?

Hardy-Weinberg Equilibrium

86

Let’s assume the genotype likelihoods of AT and AA are equally large, but the allele frequency of A is 1%. What would your genotype call be with and without the allele frequency data?

without - the call is uncertain, a tie (50/50) between AT and AA. The reads support both genotypes equally, so there is no reason to prefer either one

with - you would call the genotype AT. Under HWE the prior for AT (2 × 0.01 × 0.99 ≈ 0.0198) is about 200 times larger than the prior for AA (0.01² = 0.0001)
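Here is the same example worked numerically, as a sketch. The likelihood values are invented: AA and AT are equal, and TT is much lower. The priors come from HWE with A at 1%. The function names are hypothetical.

```python
def hwe_priors(freq_a):
    """Genotype priors from the allele frequency under Hardy-Weinberg."""
    freq_t = 1 - freq_a
    return {"AA": freq_a ** 2, "AT": 2 * freq_a * freq_t, "TT": freq_t ** 2}

def posterior(likelihoods, priors):
    """Bayes: P(G|X) = P(X|G) P(G) / sum over G' of P(X|G') P(G')."""
    joint = {g: likelihoods[g] * priors[g] for g in likelihoods}
    total = sum(joint.values())
    return {g: v / total for g, v in joint.items()}

# Hypothetical likelihoods: AA and AT equally supported, TT poorly supported
likelihoods = {"AA": 0.1, "AT": 0.1, "TT": 1e-6}
flat = posterior(likelihoods, {g: 1 / 3 for g in likelihoods})
informed = posterior(likelihoods, hwe_priors(0.01))

print(flat)   # AA and AT tie at ~0.5 each
ranked = sorted(informed, key=informed.get, reverse=True)
print(ranked[0], informed[ranked[0]] / informed[ranked[1]])  # AT, ratio ~198
```

The ratio between the top two posteriors (about 198 here) is the confidence score mentioned in the posterior-probability card.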

87

How does Linkage Disequilibrium (LD) help in genotype calling?

It uses known "haplotype blocks" (neighboring SNPs that are usually inherited together) to predict a missing or low-quality genotype based on the clear genotypes surrounding it

88

What is Imputation in the context of NGS data?

The process of "filling in" missing genotype data by using LD patterns and reference haplotypes to make highly educated guesses.

89

When is data filtering technically unnecessary during SNP calling?

Filtering would be unnecessary if the posterior probabilities at all sites were accurate

90

What are some criteria used for filtering data?

-deviation from HWE

-low quality score

-systematic score difference between minor and major allele

-aberrant LD pattern

-extreme read depth

-strand bias

-in the 1000 Genomes Project, batches of data were discarded if they showed high discrepancy with HapMap data

91

Why is "Deviation from HWE" used as a filtering criterion?

Significant deviation from Hardy-Weinberg Equilibrium can indicate genotyping errors, such as an excess of heterozygotes due to mapping issues or paralogous sequences

92

Why is "Extreme read depth" flagged during data filtering?

Too low: Insufficient data to confidently call a genotype.

Too high: May indicate repetitive regions or duplicated sequences (paralogs) where multiple parts of the genome align to the same spot, causing false SNP calls

93

What is strand bias in the context of NGS filtering?

When a variant is only seen on the forward or reverse strand (it should ideally appear on both)
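Strand bias is commonly tested with Fisher's exact test on a 2×2 table of allele (reference/alternate) by strand (forward/reverse). GATK's FS annotation, for example, is a Phred-scaled p-value from this test. Here is a minimal sketch with invented read counts and hypothetical function names:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum every table that is no more probable than the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

def strand_bias_p(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Rows: reference / alternate allele; columns: forward / reverse strand."""
    return fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)

print(strand_bias_p(10, 10, 5, 5))   # ~1.0: alt allele seen on both strands
print(strand_bias_p(10, 10, 10, 0))  # ~0.011: alt allele only on the forward strand
```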

94

What are systematic score differences in the context of NGS filtering?

When the quality scores for the major allele and minor allele differ significantly, suggesting the "variant" might just be a sequencing error

95

What do "Aberrant LD patterns" suggest during the filtering process?

Aberrant LD (unusually high or low correlation between neighboring SNPs) can signal assembly errors or incorrect mapping of reads to the reference genome

96

what is the typical pipeline for variant calling?

97

Why ancient DNA?

98
New cards

why can’t we sequence a T-rex genome?

99
New cards

What are the challenges of ancient DNA?

-small fragments (<200 bp)

-DNA degrades, nucleotides change

-bacterial DNA contamination

-contamination with modern human DNA

100

How is the contamination problem solved in ancient DNA studies?
