Multiple Sequence Alignment
Multiple Sequence Alignment
aligns three or more sequences simultaneously; combines both optimal (global/local) and heuristic alignment; cannot compare DNA to protein because they use different scoring matrices
Profile Alignment
a profile is built from a finished alignment by counting the frequency of every letter and gap at each position; progressive alignment aligns the sequences pairwise, starting with the most similar pair, and then aligns further sequences against the growing profile
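The counting step can be sketched in Python as follows. This is a minimal sketch, assuming the alignment is a list of equal-length strings with "-" for gaps; the function name `build_profile` and the example alignment are hypothetical.

```python
from collections import Counter


def build_profile(alignment):
    """Return, for each alignment column, the fraction of rows holding each symbol (gaps included)."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    profile = []
    for i in range(length):
        # count every letter and gap in column i
        counts = Counter(row[i] for row in alignment)
        profile.append({sym: n / len(alignment) for sym, n in counts.items()})
    return profile


aln = ["ACG-T", "ACGAT", "A-GAT", "ACGTT"]
for pos, column in enumerate(build_profile(aln)):
    print(pos, column)
```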
ClustalW
cluster alignment weighted; progressive alignment strategy; neighbor joining guide tree; lower accuracy; medium speed; use for small datasets with similar sequences
T-Coffee
tree-based consistency objective function for alignment evaluation; consistency-based alignment strategy; neighbor-joining and consistency weights guide tree; medium accuracy; lower speed; use for small datasets
MUSCLE
multiple sequence comparison by log-expectation; progressive alignment followed by iterative refinement; UPGMA guide tree; higher accuracy; higher speed; use for medium-large datasets
MAFFT
multiple alignment using Fast Fourier Transform; progressive and iterative refinement alignment strategy; UPGMA/NJ guide tree; highest accuracy; highest speed; use for large datasets
Molecular Evolution
Mutation Types
Single Base Substitutions
AKA point mutations; a single base is replaced by another
Transition
same class of nucleotide; purine to purine or pyrimidine to pyrimidine
Transversion
different class of nucleotide; purine to pyrimidine or pyrimidine to purine
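The purine/pyrimidine rule can be expressed as a short Python check. This is a sketch assuming uppercase or lowercase DNA bases; the function name `classify_substitution` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Label a single-base DNA substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in PURINES | PYRIMIDINES or alt not in PURINES | PYRIMIDINES:
        raise ValueError("expected DNA bases A, C, G or T")
    if ref == alt:
        raise ValueError("not a substitution: bases are identical")
    # same class (both purines or both pyrimidines) means transition
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # transition (purine -> purine)
print(classify_substitution("C", "A"))  # transversion (pyrimidine -> purine)
```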
Synonymous
encodes for the same amino acid
Silent Mutation
the new nucleotide alters the codon but does not alter the amino acid for which it encodes
Nonsynonymous
encodes for a different amino acid
Missense Mutation
the new nucleotide alters the codon to produce an altered amino acid in the protein product (ex
Nonsense Mutation
the new nucleotide changes a codon that specified an amino acid to a stop codon; translation of the mRNA transcribed from this mutant gene will stop prematurely
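The silent/missense/nonsense distinction can be sketched by translating a codon before and after a single-base change. This sketch assumes the standard genetic code (NCBI translation table 1) written with DNA codons, and handles only sense (non-stop) starting codons; `classify_point_mutation` is a hypothetical name.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutation(codon, position, new_base):
    """Classify a single-base change within one sense codon (position is 0, 1 or 2)."""
    codon = codon.upper()
    old_aa = CODON_TABLE[codon]
    if old_aa == "*":
        raise ValueError("this sketch only handles sense (non-stop) codons")
    mutant = codon[:position] + new_base.upper() + codon[position + 1:]
    new_aa = CODON_TABLE[mutant]
    if new_aa == old_aa:
        return "silent (synonymous)"
    if new_aa == "*":
        return "nonsense (nonsynonymous)"
    return "missense (nonsynonymous)"


print(classify_point_mutation("GAA", 2, "G"))  # GAA->GAG, Glu->Glu: silent
print(classify_point_mutation("GAG", 1, "T"))  # GAG->GTG, Glu->Val: missense
print(classify_point_mutation("TGG", 2, "A"))  # TGG->TGA, Trp->stop: nonsense
```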
Indels
insertions or deletions of one or more base pairs; in a coding sequence, an indel whose length is not a multiple of three shifts the reading frame
Frameshift
change in the reading frame
Genome Rearrangements
large-scale changes in chromosome structure; can alter phenotype by 1) destroying gene function, 2) changing expression through the influence of different promoters and enhancers, or 3) creating hybrid genes
Deletion and Duplication
occurs on the same chromosome
Inversion (Reversal)
occurs on the same chromosome
Translocation
occurs between different chromosomes; usually between paternal and maternal
Homolog
a gene related to other genes by evolutionary descent from a common ancestral DNA sequence
Identity
$\dfrac{\text{number of identical residues}}{\text{number of residues and gaps in the alignment}} \times 100$
Similarity
some amino acid substitutions have similar side chains, leading to a smaller effect in the final protein
$\dfrac{\text{number of similar residues}}{\text{number of residues and gaps in the alignment}} \times 100$
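Both percentages can be computed directly from a pairwise alignment. This is a sketch under a few assumptions: the denominator is taken as the alignment length (every column, residues and gaps), identical residues also count as similar, and the `SIMILAR_GROUPS` below are a hypothetical grouping for illustration only (real tools usually derive similarity from a substitution matrix instead).

```python
# Hypothetical "similar" groups for illustration only
SIMILAR_GROUPS = [set("ILVM"), set("DE"), set("KR"), set("FYW"), set("ST"), set("NQ")]


def _check(a, b):
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")


def percent_identity(a, b):
    """Identical residue pairs / alignment length (residues + gaps) x 100."""
    _check(a, b)
    identical = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100 * identical / len(a)


def percent_similarity(a, b):
    """Identical or similar residue pairs / alignment length x 100."""
    _check(a, b)

    def similar(x, y):
        if "-" in (x, y):  # a gap is never similar to anything
            return False
        return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)

    return 100 * sum(similar(x, y) for x, y in zip(a, b)) / len(a)


print(round(percent_identity("KIDLAS", "RIEL-T"), 1))    # 33.3
print(round(percent_similarity("KIDLAS", "RIEL-T"), 1))  # 83.3
```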
Point Accepted Mutation (PAM)
quantifies the rate at which amino acids change over evolutionary time; assumes constant rate of change for amino acids
Constant Rate
mutations occur at a relatively steady pace over time
Independence
each amino acid position mutates independently of its neighbor
Natural Selection
counts only "accepted" mutations, i.e., those that do not break down the protein's function and are passed down
Matrices
PAM matrices form a series: as the number increases, so does the evolutionary distance the matrix models
PAM #
PAM 1
1 accepted point mutation per 100 amino acids; for very conserved (closely related) sequences; mutations are directly observable; small-scale evolution
PAM 250
assumes the same amino acid can mutate repeatedly; not directly observed but extrapolated by multiplying the PAM1 matrix by itself 250 times, so it carries extrapolation error; large-scale evolution
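The extrapolation from PAM1 to PAM250 is repeated matrix multiplication of a mutation-probability matrix. The sketch below uses a toy 3-letter matrix whose values are invented for illustration (a real PAM1 matrix is 20x20 and derived from observed accepted mutations), and it stops at the probability matrix; real PAM scoring matrices then convert these probabilities into log-odds scores.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def extrapolate_pam(pam1, n):
    """Return pam1 raised to the n-th power: n successive PAM1 steps of evolution."""
    result = pam1
    for _ in range(n - 1):
        result = matmul(result, pam1)
    return result


# Toy 3-letter "PAM1": each residue stays the same with probability 0.99 (1 change per 100)
toy_pam1 = [[0.99, 0.005, 0.005],
            [0.005, 0.99, 0.005],
            [0.005, 0.005, 0.99]]
toy_pam250 = extrapolate_pam(toy_pam1, 250)
print(round(toy_pam250[0][0], 3))  # far below the PAM1 value of 0.99
```

The shrinking diagonal shows why PAM250 suits distant relatives: after 250 steps, much of the original identity has been replaced by accumulated (and repeated) substitutions.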
Block Substitution Matrices (BLOSUM)
based on observed alignments; aligned sequences from functional domains (blocks) of proteins; look at domains (blocks) rather than looking at entire sequence
Blocks
represents highly conserved regions that have survived natural selection
Matrices
the number in a BLOSUM matrix's name is the percent-identity threshold used to build it: sequences more similar than the threshold are clustered together (e.g., BLOSUM62 uses 62%)
Lower #
distant relatives; BLOSUM45 used for very divergent sequences
Higher #
close relatives; BLOSUM80 used for very similar sequences
Similarity Score
not all amino acid matches produce the same score; the similarity score of an alignment is the sum of the substitution-matrix scores at every position, and a higher total means greater similarity
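Summing per-position scores can be sketched as follows. Only a handful of real BLOSUM62 entries are included here to keep the example short (a real tool would load the full 20x20 table), and the alignment is assumed to be gapless. The function names are hypothetical.

```python
# A handful of BLOSUM62 entries; the matrix is symmetric, so each pair is stored once
BLOSUM62_SUBSET = {
    ("K", "K"): 5, ("I", "I"): 4, ("D", "D"): 6, ("L", "L"): 4,
    ("K", "R"): 2, ("I", "V"): 3, ("D", "E"): 2,
}


def pair_score(x, y, matrix=BLOSUM62_SUBSET):
    """Look up a substitution score, trying both orders because the matrix is symmetric."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]  # raises KeyError if the pair is not in this subset


def alignment_score(a, b, matrix=BLOSUM62_SUBSET):
    """Sum the per-position substitution scores of two gapless aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(pair_score(x, y, matrix) for x, y in zip(a, b))


print(alignment_score("KIDL", "KIDL"))  # 19: identical sequences
print(alignment_score("KIDL", "RVEL"))  # 11: conservative substitutions still score positively
```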
Ortholog
a gene present in different species that evolved from a common ancestral gene by speciation; retain the same/similar function in the course of evolution; speciation to give two separate species
Paralog
one gene of a set of genes that underwent a duplication event in a common ancestor; evolve new functions (can be related to the original function); gene duplication and divergence
Phylogenetic Trees
Phylogenetics
method of classification of organisms based upon their evolutionary history
Phylogenetic Tree
shows the evolutionary relationships among various species or other entities that likely have a common ancestor; multiple trees possible showing multiple plausible evolutionary scenarios
Gene-Specific Phylogenies
different genes may show different phylogenetic histories; can avoid this by using multiple genes and many single-gene analyses then concatenating them
Neutral Marker
genes under similar positive selection regimes in different taxa can evolve convergently, which can confuse phylogenetic analysis; neutral markers (not under selection) avoid this problem
Connected Graph
graph containing at least one path between any two nodes
Tree
type of connected graph in which there is exactly one path between every two nodes
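One way to test the definition: an undirected graph with n nodes is a tree exactly when it is connected and has n - 1 edges, which guarantees a single path between every pair of nodes. The sketch below applies that standard graph-theory fact; `is_tree` is a hypothetical name and nodes are assumed to be hashable labels.

```python
from collections import defaultdict, deque


def is_tree(nodes, edges):
    """True if the undirected graph is connected and has exactly len(nodes) - 1 edges."""
    nodes = set(nodes)
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    # breadth-first search from any node to test connectivity
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbours[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


print(is_tree(["A", "B", "C", "D"], [("A", "B"), ("B", "C"), ("B", "D")]))  # True
```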
Rooted Tree
shows the evolutionary history of the taxa; has a single unique node (the root) that is the ancestor of all other nodes; a directed tree showing change over time; rooting is best done using an outgroup
Outgroup
a species or molecule that is known to be more distantly related than everything else in the tree
Ingroup
taxa being analyzed to view relationships
Unrooted Tree
shows evolutionary relationships between the taxa; can't make any statement about the direction of evolution, only the closeness of relationships
Nodes
internal nodes represent common ancestors (hypothetical taxonomic units); rotating a tree at a node does not change the relationships between the taxa, only how those relationships are drawn; the taxa at the tips are called operational taxonomic units (OTUs)
Branches
evolutionary lineages
Tips/Leaves
the most recent taxa in the analysis
Cladogram
branch lengths do not represent time; branching is determined by distinguishing characteristics which identify a particular clade
Phylogram
branch lengths explicitly represent the number of character changes, indicating the amount of evolutionary change separating taxa
Distance-Based Methods
calculate the genetic distance between pairs of taxa and construct a tree based on these distances
Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
phylogenetic relationships are inferred non-historically, simply from similarity/dissimilarity; assumes an ultrametric tree (a constant molecular clock), in which the distance from the root to every tip is equal
Steps
(1) select the two most closely related sequences (smallest distance) and insert a node to represent their common ancestor
(2) replace the selected pair with a cluster containing both, and set the distance from this cluster to every other sequence or cluster to the average of the original distances
(3) repeat until all sequences are joined into a single tree
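The UPGMA steps can be sketched in Python, producing a Newick string with branch lengths. This sketch assumes a symmetric distance table given as a dict of dicts; the size-weighted average used in step 2 is equivalent to averaging over all original member pairs. The function name and the example distances are hypothetical.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree from a symmetric distance dict-of-dicts; return it in Newick format."""
    # Each cluster is keyed by its Newick subtree and stores (size, height above the tips)
    clusters = {name: (1, 0.0) for name in names}
    d = {frozenset((a, b)): dist[a][b] for a, b in combinations(names, 2)}
    while len(clusters) > 1:
        # (1) pick the closest pair of clusters and join them under a new node
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        size_a, height_a = clusters.pop(a)
        size_b, height_b = clusters.pop(b)
        height = d[frozenset((a, b))] / 2  # ultrametric: the new node sits halfway
        merged = f"({a}:{height - height_a:g},{b}:{height - height_b:g})"
        # (2) distance from the merged cluster = size-weighted average of the old distances
        for k in clusters:
            d[frozenset((merged, k))] = (
                size_a * d[frozenset((a, k))] + size_b * d[frozenset((b, k))]
            ) / (size_a + size_b)
        clusters[merged] = (size_a + size_b, height)
        # (3) repeat until one cluster remains
    (tree,) = clusters
    return tree + ";"


distances = {
    "A": {"B": 2, "C": 6, "D": 10},
    "B": {"A": 2, "C": 6, "D": 10},
    "C": {"A": 6, "B": 6, "D": 10},
    "D": {"A": 10, "B": 10, "C": 10},
}
print(upgma(["A", "B", "C", "D"], distances))  # (D:5,(C:3,(A:1,B:1):2):2);
```

Because every internal node sits at half the distance between the clusters it joins, each tip ends up the same distance from the root, which is the ultrametric assumption in action.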
Neighbor-Joining
a clustering method that builds an additive, unrooted tree from pairwise distances; unlike UPGMA, it does not assume that all taxa are equidistant from a common ancestor, so sequences may have different substitution rates; fast and often used as a starting point in phylogenetic analyses
Steps
(1) determine the pairwise distances between all the sequences
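(2) identify the pair to join; standard NJ does not simply take the closest pair but the pair that minimizes the Q-criterion, which corrects each distance for how divergent both sequences are from all the others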
(3) combine these two sequences into a single node
(4) update the distances between this new node and the other sequences
(5) repeat until all sequences are joined into a single tree
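The steps above can be sketched in Python using the standard Saitou and Nei formulation, where the pair to join minimizes Q(i, j) = (n - 2) d(i, j) - S_i - S_j, with S_i the total distance from i to all other nodes. This is a sketch assuming a symmetric dict-of-dicts distance table; the function name, the internal node labels `node1`, `node2`, ..., and the example distances are hypothetical. Ties in Q are broken by input order.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Run neighbor joining; return the joins made and the final edge.

    Each join is (i, branch_i, j, branch_j, new_node).
    """
    d = {frozenset((a, b)): dist[a][b] for a, b in combinations(names, 2)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # total distance from each node to all others (net divergence)
        total = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}
        # pick the pair minimising the Q-criterion (n - 2) * d(i, j) - S_i - S_j
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - total[p[0]] - total[p[1]])
        dij = d[frozenset((i, j))]
        branch_i = dij / 2 + (total[i] - total[j]) / (2 * (n - 2))
        branch_j = dij - branch_i
        new = f"node{len(joins) + 1}"
        # distance from the new internal node to every remaining node
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [x for x in nodes if x not in (i, j)] + [new]
        joins.append((i, branch_i, j, branch_j, new))
    a, b = nodes
    return joins, (a, b, d[frozenset((a, b))])


pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8, ("b", "c"): 10,
         ("b", "d"): 10, ("b", "e"): 9, ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
dist = {x: {} for x in "abcde"}
for (x, y), v in pairs.items():
    dist[x][y] = dist[y][x] = v

joins, final_edge = neighbor_joining(list("abcde"), dist)
for join in joins:
    print(join)
print(final_edge)
```

Note that the first join here is a with b even though d and e are the closest pair by raw distance (3); the Q-criterion is what makes NJ robust to unequal rates.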
Strengths
Weaknesses
Cladistic Methods
consider the various possible trees and choose the best one; tree selection criteria vary with the approach; slower than neighbor joining, but usually more accurate
Maximum Parsimony
finds the tree that requires the fewest number of evolutionary changes to explain the observed data
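To compare trees by parsimony, each candidate tree needs a count of the changes it requires. A common way to get that count is Fitch's algorithm, sketched below for rooted binary trees written as nested tuples; a full maximum parsimony search would apply this score to every candidate tree and keep the lowest. The function names, the tree encoding, and the example alignment are hypothetical.

```python
def fitch(tree, states):
    """Fitch's algorithm for one character: return (candidate state set, number of changes).

    tree is a rooted binary tree as nested 2-tuples of leaf names; states maps leaf -> state.
    """
    if isinstance(tree, str):  # a leaf: its observed state, no changes yet
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:  # children can agree: no extra change needed here
        return shared, left_changes + right_changes
    # children disagree: take the union and count one change
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Total Fitch changes summed over every column of an alignment (dict name -> sequence)."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


aln = {"A": "AAG", "B": "AAT", "C": "GCT", "D": "GCG"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree1, aln), parsimony_score(tree2, aln))  # 4 6
```

Here tree1 needs fewer changes, so parsimony would prefer it.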
Strengths
Weaknesses
Maximum Likelihood
finds the tree that has the highest probability of producing the observed data given a specific model of evolution
Strengths
Weaknesses
Newick Format
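field-standard text format for representing trees in computer-readable form; nested parentheses encode the connections between nodes (e.g., in "((A,B),C);" A and B are sister taxa), and branch lengths can follow a colon, as in "A:0.1"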
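A minimal reader for simple Newick strings shows how the parentheses map onto nodes. This sketch assumes unquoted labels, no comments or whitespace inside the string, and a trailing ";"; it is not a full implementation of the format, and the function names are hypothetical. For real work, an established library parser is the safer choice.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts with 'name', 'length', 'children'."""
    s = text.strip()
    if not s.endswith(";"):
        raise ValueError("Newick strings end with ';'")
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while s[pos] not in stops:  # always terminates: ';' is in every stop set
            pos += 1
        return s[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1  # consume '('
            children.append(parse_node())
            while s[pos] == ",":
                pos += 1  # consume ','
                children.append(parse_node())
            pos += 1  # consume ')'
        name = read_until(",():;")
        length = None
        if s[pos] == ":":
            pos += 1  # consume ':'
            length = float(read_until(",();"))
        return {"name": name, "length": length, "children": children}

    return parse_node()


def leaf_names(node):
    """List the tip labels from left to right."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


tree = parse_newick("((A:1,B:2):3,C:4);")
print(leaf_names(tree))  # ['A', 'B', 'C']
```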
Tree Accuracy and Validation
Methods for Testing Accuracy
Purpose of Bootstrapping
DNA Sequencing
First Generation
Sanger-Sequencing
AKA chain termination method; AKA dideoxynucleoside sequencing; method for determining the nucleotide sequence of DNA; DNA template from PCR required