Exam 2 Study guide Bioinformatics

Last updated 3:37 AM on 3/18/26

54 Terms

1
New cards

Transitions vs. Transversions

Transitions are single-base substitutions within the same base class: purine ↔ purine (A ↔ G) or pyrimidine ↔ pyrimidine (C ↔ T)

Transversions are single-base substitutions between a purine and a pyrimidine (A ↔ C, A ↔ T, G ↔ C, or G ↔ T)
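
The purine/pyrimidine rule can be sketched in Python. This is a minimal illustration; the function name `classify_substitution` is hypothetical and it assumes plain DNA bases (A, C, G, T).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base DNA substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("expected DNA bases A, C, G, or T")
    if ref == alt:
        return "no change"
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # purine to purine
print(classify_substitution("A", "C"))  # purine to pyrimidine
```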

2
New cards

What are the 5 basic single base substitutions? (SNSNM)

Synonymous - A nucleotide changes, but the new codon still codes for the same amino acid.

Nonsynonymous - A nucleotide substitution that alters the amino acid sequence of a protein (includes missense and nonsense).

Silent - A point mutation that doesn’t change the amino acid; in coding sequence this is the same outcome as a synonymous substitution.

Nonsense - A single base change creates a premature stop codon (UAA, UAG, or UGA in mRNA).

Missense - A point mutation where a single nucleotide change results in a different amino acid.
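
As a rough sketch of how these categories relate, the snippet below classifies a codon change using the standard genetic code (DNA codons, so T instead of U). The function name `classify_codon_change` is hypothetical; "nonsynonymous" covers both the missense and nonsense results.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous (silent)"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"  # not on the card; listed for completeness
    return "missense"


print(classify_codon_change("GAA", "GAG"))  # Glu -> Glu
print(classify_codon_change("GAA", "TAA"))  # Glu -> stop
print(classify_codon_change("GAG", "GTG"))  # Glu -> Val
```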

3
New cards

INDELS

Genetic variation involving the insertion or deletion of one or more nucleotides in DNA. In coding sequence, an indel whose length is not a multiple of 3 causes a frameshift.
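
A minimal sketch, assuming the indel falls inside a coding region: whether it shifts the reading frame depends on whether the length change is a multiple of 3. The function name `indel_effect` is hypothetical.

```python
def indel_effect(ref_allele: str, alt_allele: str) -> str:
    """Describe an indel by type and by its effect on the reading frame."""
    delta = len(alt_allele) - len(ref_allele)
    if delta == 0:
        return "not an indel"
    kind = "insertion" if delta > 0 else "deletion"
    # A length change that is not a multiple of 3 shifts every downstream codon.
    frame = "frameshift" if delta % 3 != 0 else "in-frame"
    return f"{frame} {kind}"


print(indel_effect("A", "AT"))    # one base inserted
print(indel_effect("ATTT", "A"))  # three bases deleted
```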

4
New cards

Inversion/reversal

Chromosomal structural rearrangement in which a DNA segment breaks at two points, flips, and reinserts in the reverse orientation (same genetic material, reversed order)

5
New cards

Translocation

A chromosome breaks, and the broken portion reattaches to a different (nonhomologous) chromosome.

  • Can contribute to cancer, e.g., through unbalanced rearrangements

6
New cards

Homologs, Orthologs, and Paralogs

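Homologs - Genes/proteins that share a common ancestor

Orthologs - Homologs in different species that diverged through speciation (often keep the same function)

Paralogs - Homologs that arose by gene duplication, in the same or different species (often diverge in function)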

7
New cards

PAM (Point Accepted Mutation)

A way to score amino acid similarity: align closely related homologs and count how often each amino acid substitution occurs. Assumptions:

  • Constant rate (mutations occur at a steady rate)

  • Independence (each amino acid position mutates independently)

  • Natural selection (only "accepted" mutations, i.e., those that survived, are counted)

8
New cards

BLOSUM (Blocks Substitution Matrix)

Another way to score amino acid similarity: use a database of aligned sequence blocks derived from protein domains that have a specific function or structure.

  • Based on observed alignments

  • Functional domains of proteins contain aligned sequences

    • Highly conserved regions that survived natural selection

9
New cards

Other PAM Matrices

PAM matrices form a series:

  • As the number increases, the evolutionary distance increases

PAM1 = 1 accepted mutation per 100 amino acids (less divergent)

PAM250 = 250 accepted mutations per 100 amino acids (more divergent; a position can change more than once)

10
New cards

When to use higher BLOSUM or lower PAM matrices

Use PAM 100 or BLOSUM 90 when comparing closely related sequences

  • Punishes mismatches severely.

11
New cards

When to use BLOSUM or PAM matrices for comparing distant sequences

Use PAM 250 or BLOSUM 45 when comparing distantly related sequences

  • Penalizes mismatches lightly.

12
New cards

BLOSUM matrix number meaning

The number is the percent-identity threshold used to build the matrix: sequences at least that similar are clustered and counted as one.

  • Lower number = distant relatives (BLOSUM45)

  • Higher number = close relatives (BLOSUM80)

13
New cards

PAM & BLOSUM: less divergent vs. more divergent

BLOSUM80 & PAM1 = less divergent

BLOSUM45 & PAM250 = more divergent

14
New cards

How to read matrices (meaning of values)

Positive # —> substitution happens often and is evolutionarily acceptable

Negative # —> this substitution is less likely and more disruptive

Higher # —> More favored

Very negative # —> Strongly unfavorable
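
The sign convention comes from log-odds scoring: a substitution seen more often than chance gets a positive score, and one seen less often gets a negative score. The sketch below uses the half-bit scaling (2 × log2) associated with BLOSUM; the frequencies are made-up illustrative numbers, not values from any real matrix, and `log_odds_score` is a hypothetical name.

```python
import math


def log_odds_score(observed_freq: float, expected_freq: float, scale: float = 2.0) -> int:
    """Rounded log-odds score: positive if observed more often than expected."""
    return round(scale * math.log2(observed_freq / expected_freq))


# Illustrative (invented) frequencies for three amino acid pairs.
print(log_odds_score(0.04, 0.01))   # seen 4x more than chance: positive
print(log_odds_score(0.01, 0.01))   # seen as often as chance: zero
print(log_odds_score(0.005, 0.02))  # seen 4x less than chance: negative
```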

15
New cards

Biological meaning of the matrices

If evolution changed this amino acid into that one, would that be a relatively reasonable substitution?

16
New cards

Maximum Parsimony Strengths and Weaknesses

Looks for the tree that requires the fewest evolutionary changes:

  • Strength - doesn’t require an explicit model of sequence evolution (simpler)

  • Weakness - can be unrealistic and may oversimplify complex patterns

17
New cards

Maximum Likelihood Strengths and Weaknesses

Searches for the tree topology that makes the observed data most probable under a specific model of sequence evolution

  • Strength - high accuracy and stronger evolutionary hypothesis

  • Weakness - computationally complex and slow, and results depend on the chosen model

18
New cards

Distance-Based Methods Strengths and Weaknesses

Calculates a pairwise distance matrix between all sequences and builds a tree from it

  • Strength - Extremely fast; can analyze thousands of sequences and produces a single tree

  • Weakness - Less accurate; reducing sequences to distances loses information and makes the tree more susceptible to errors in the data
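
As a sketch of the first step, the snippet below computes a pairwise distance matrix from aligned sequences using the simple p-distance (proportion of differing sites). The choice of p-distance and the function names are assumptions for illustration; real analyses often use model-corrected distances, and the tree-building step (e.g., neighbor joining) is not shown.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of aligned, non-gap positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(alignment: dict) -> dict:
    """Pairwise p-distances keyed by (name_1, name_2)."""
    names = list(alignment)
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m in names
        for n in names
    }


aln = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGTACGA"}
for pair, dist in distance_matrix(aln).items():
    print(pair, dist)
```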

19
New cards

Node bootstrap value meaning

Percentage of bootstrap replicate trees that recover the same clade.

Higher value = stronger support for grouping

Lower value = weaker support for groupings

20
New cards

How to choose a good molecular marker for phylogenetic study?

Single-copy gene w/ an optimal substitution rate, available primers (to amplify the marker), and marker gene sequences that can be aligned.

  • In addition:

    • sufficient length and quality

    • broadly present across the taxa studied

    • orthologous

21
New cards

How to choose a good molecular marker for phylogenetic analysis

  • Be alignable

  • Enough informative sites

  • Not too conserved or too variable

  • Preferably all orthologous

  • Low risk of duplication

22
New cards

Rooted vs unrooted Tree structure

Rooted - Has a root that represents the common ancestor of all taxa and gives evolution a direction

  • Who diverged from whom over time

Unrooted - Shows which taxa are more closely connected w/o order or direction

  • Relative relationships

23
New cards

What is an outgroup in phylogenetics?

Taxon/species outside the main group that is used to root the tree and give direction to changes within the ingroup

  • Related but different

  • Helps determine which traits are ancestral and which are derived

24
New cards

Ingroup in phylogenetics

Main set of species/taxa being studied for their evolutionary relationships

  • Much more closely related to each other

25
New cards

Node - Phylogenetic tree

A branching point representing the inferred common ancestor from which two groups diverged (bootstrap values are shown at these internal nodes)

  • Terminal node - observed taxa at the tip

26
New cards

OTU in phylogenetics

Operational taxonomic unit - unit being compared in the analysis (species, strain, individual, sequence)

  • Each thing entered into the tree

  • OTU doesn’t have to be from a formal species

27
New cards

What is the difference between a phylogram and a cladogram?

A cladogram shows the branching order of relationships

A phylogram shows branching order and branch lengths proportional to evolutionary change.

  • Longer branches mean more inferred evolutionary change (not more time)

28
New cards

Cladogram doesn’t show what?

No meaningful branch lengths

  • Focuses on topology and branching pattern

  • Branch lengths carry no biological meaning

29
New cards

What are methods for testing phylogenetic tree accuracy?

Jackknife - Removes part of the data and rebuilds the tree to see if the same clades appear

Bootstrapping - Resamples alignment sites with replacement, rebuilds the tree, and counts how many times each clade appears.
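
A minimal sketch of the resampling step in bootstrapping: alignment columns (sites) are drawn with replacement to make a pseudo-replicate of the same length. Tree building on each replicate is left out; `bootstrap_alignment` and `bootstrap_support` are hypothetical helper names, and the list of True/False flags stands in for "did this clade appear in the replicate tree?".

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement, keeping the same length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def bootstrap_support(clade_recovered: list) -> float:
    """Percentage of replicate trees in which the clade was recovered."""
    return 100.0 * sum(clade_recovered) / len(clade_recovered)


aln = {"t1": "ACGT", "t2": "ACGA", "t3": "TCGA"}
replicate = bootstrap_alignment(aln, random.Random(0))
print(replicate)
print(bootstrap_support([True, True, False, True]))
```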

30
New cards

What is a genome?

A genome is the complete set of an organism’s genetic material (all genes plus noncoding sequences)

All genetic material

31
New cards

What is genomics?

Genomics is the study of entire genomes, including:

  • Function

  • Structure

  • Sequencing

  • Evolution

  • Interactions

Study of the whole genome

32
New cards

What is genetics?

Genetics is the study of individual genes, heredity, and the passage of traits between generations

Study of genes and inheritance

33
New cards

What is whole-genome shotgun sequencing (WGS)?

Break the whole genome into random pieces → sequence each piece → assemble the overlaps by computer into a full genome.

It is used for sequencing complete genomes and building genome assemblies

34
New cards

What is hierarchical sequencing?

Hierarchical sequencing = map big fragments first, then sequence them piece by piece

35
New cards

Hierarchical sequencing vs. Whole-genome shotgun sequencing

WGS = random fragments first, assemble later

HS = map/order large fragments first, then sequence

36
New cards

What are contigs?

Overlapping DNA reads joined into one continuous sequence

(Reads → Contigs → Scaffolds → Genome assembly)
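
The reads-to-contigs step can be illustrated with a toy greedy merge: repeatedly join the two sequences with the longest suffix/prefix overlap. This is only a sketch under strong assumptions (error-free reads, one strand, no repeats); real assemblers use overlap or de Bruijn graphs, and the function names here are hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that matches a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def greedy_assemble(reads: list, min_overlap: int = 3) -> list:
    """Merge reads into contigs by always joining the best-overlapping pair."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_length(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [
            c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)
        ] + [merged]


print(greedy_assemble(["ATGCGT", "CGTTAC", "TACGGA"]))
```

Reads with no sufficient overlap stay as separate contigs, which is why a fragmented assembly has many short contigs.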

37
New cards

What is N50?

Genome assembly quality metric

  • 50% of the assembly is contained in contigs/scaffolds of this length or longer

  • Higher N50 = more contiguous & less fragmented

    • Not a guarantee of a correct assembly
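
A minimal sketch of the N50 calculation from a list of contig (or scaffold) lengths: sort from longest to shortest and walk down until the running total reaches half of the assembly size. The function name `n50` is a hypothetical helper.

```python
def n50(lengths: list) -> int:
    """Length L such that contigs of length >= L hold at least 50% of the assembly."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Total = 300, half = 150; 100 + 80 = 180 reaches half, so N50 = 80.
print(n50([100, 80, 60, 40, 20]))
```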

38
New cards

1st vs 2nd vs 3rd generation sequencing

1st gen = Sanger, one fragment at a time, very accurate
2nd gen = massively parallel, short reads, high throughput
3rd gen = single-molecule, long reads, better for complex assemblies

39
New cards

What is first gen sequencing?

Sanger sequencing → Detects chain-terminating nucleotides during synthesis

Pros

  • Highly accurate

Cons

  • Low throughput

  • One DNA fragment at a time

40
New cards

What is second gen sequencing?

NGS → Millions of sequences read in parallel

Pros

  • High throughput

  • Lower cost per base

Cons

  • Produces shorter reads

41
New cards

What is third gen sequencing?

Sequences single DNA molecules directly, which is beneficial for assembly, variant detection, and resolving repetitive regions

Pros

  • Produces much longer reads

Cons

  • Higher raw read error rates

42
New cards

Sanger sequencing?

1st generation

Chain termination

  • ddNTPs stop elongation

  • The resulting DNA fragments of different lengths are separated by size to read the sequence

43
New cards

Illumina sequencing?

2nd generation

Sequencing by synthesis (SBS):

  • DNA fragments attached to flow cell → amplified into clusters → sequenced as fluorescently labeled nucleotides are incorporated

44
New cards

Nanopore & PacBio sequencing?

3rd generation

Long-read sequencing technologies.

  • Nanopore = measures changes in electrical current as DNA passes through a pore

  • PacBio = Single molecules in real time (SMRT tech)

NANO = read length, speed, portability

PACBIO = high read accuracy

45
New cards

What is single-end seq?

DNA is sequenced from only one end of each fragment

  • One read per fragment

46
New cards

What is paired-end seq?

DNA is sequenced from both ends of the same fragment

  • Two reads per fragment

47
New cards

Paired vs single end seq?

Paired-end seq → more info; better alignment, genome assembly, and detection of structural changes

Single-end seq —> simpler and cheaper

48
New cards

What is a FASTA file?

Text-based sequence format that stores a sequence identifier line (starting with >) followed by the DNA/RNA/protein sequence

Purpose:

  • Store and share biological sequences for reference

Good for reference sequences, data submissions, assembly tools

49
New cards

What is a FASTQ file?

Text-based format storing both sequences and per-base quality scores, with 4 lines per read.

Purpose:

  • Store raw sequence reads along with confidence/quality info

Good for filtering, mapping, assembly, downstream analysis

50
New cards

Multi-FASTA file?

Single FASTA-formatted file w/ multiple sequence entries, each w/ its own header (>)

Purpose:

  • Store many related sequences in one file

Good for multiple sequence comparison, alignment, batch analysis
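
A minimal sketch of reading a multi-FASTA file: every line starting with > opens a new record, and the lines that follow are joined into that record's sequence. The function name `parse_fasta` and the example records are made up for illustration; libraries such as Biopython provide full-featured parsers.

```python
def parse_fasta(text: str) -> dict:
    """Parse (multi-)FASTA text into {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()  # header text without the ">"
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[header].append(line)  # sequences may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


example = ">seq1 example\nACGT\nACGT\n>seq2\nGGCC\n"
for name, seq in parse_fasta(example).items():
    print(name, seq)
```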

51
New cards

How do you interpret the lines in a FASTQ file?

Each FASTQ entry has 4 lines:
Line 1: Starts with @ and contains the read identifier/header
Line 2: The nucleotide sequence
Line 3: Starts with + and is a separator (may repeat the identifier)
Line 4: The quality score string, where each character represents the quality of the corresponding base in line 2
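
The four-line layout can be sketched as a small parser. This assumes the simple, common case where each record is exactly four lines (no wrapped sequences); `parse_fastq` and the example reads are hypothetical.

```python
def parse_fastq(text: str) -> list:
    """Parse four-line FASTQ records into dictionaries."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ records must have exactly 4 lines each")
    records = []
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@"):
            raise ValueError("line 1 must start with @")
        if not separator.startswith("+"):
            raise ValueError("line 3 must start with +")
        if len(sequence) != len(quality):
            raise ValueError("one quality character is required per base")
        records.append(
            {"id": header[1:], "sequence": sequence, "quality": quality}
        )
    return records


example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+read2\n!!##\n"
for record in parse_fastq(example):
    print(record)
```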

52
New cards

What is the purpose of line 4 in a FASTQ file?

Line 4 contains the per-base quality scores for the sequence in line 2. Each symbol corresponds to one base and reflects the confidence that the base was called correctly. Higher quality means lower probability of sequencing error. These scores are commonly represented as Phred quality scores.

53
New cards

What is a PHRED score?

A PHRED score is a numerical quality score that indicates the probability that a base was called incorrectly during sequencing. Higher PHRED scores mean higher confidence in the base call.

confidence in each base call
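
The relationship between a Phred score Q and the error probability P is Q = -10 × log10(P), so Q20 means a 1 in 100 chance of a wrong base call and Q30 means 1 in 1,000. The sketch below also decodes a FASTQ quality string, assuming the Phred+33 ASCII encoding (an assumption; older Illumina files used a different offset). Function names are hypothetical.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred score for an error probability p."""
    return -10 * math.log10(p)


def decode_quality_string(quality: str, offset: int = 33) -> list:
    """Convert FASTQ quality characters to Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


print(phred_to_error_probability(20))
print(error_probability_to_phred(0.001))
print(decode_quality_string("!5I"))
```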

54
New cards

What is the purpose of a PHRED score?

The purpose of a PHRED score is to measure sequencing quality so researchers can judge how reliable each base call is and decide which reads or bases to keep, trim, or filter during analysis.
