Dayhoff paper

- accepted point mutation: replacement of one amino acid by another, accepted by natural selection

- accepted point mutations are the result of 2 separate processes

- occurrence of mutation in the portion of the gene template producing one amino acid of a protein

- acceptance of the mutation by the species as the new predominant form

- to be accepted, the new AA must function similarly to the old one (similar physicochemical properties)

- must consider the frequency of change of each amino acid to each other and the propensity to remain unchanged

- Where does the data come from in the study: 1572 changes in 71 groups of closely related proteins

- mutation data were collected from the phylogenetic trees and from a few pairs of related sequences

- assumptions: the likelihood of amino acid X replacing Y is same as Y replacing X

- this is reasonable because this likelihood should depend on the product of the frequencies of occurrence of the two amino acids and on their chemical and physical similarity

- this assumption results in no change in amino acid frequencies over evolutionary distance

- How many amino acid exchanges occurred: 1572

- what amino acids had the most exchanges: Asp and Glu, because their codons differ by one nucleotide

- what percent of interchanges involved amino acids whose codons differed by more than one nucleotide: about 20%

- changes at some amino acid positions are rejected by selection whereas some positions are mutable and favor multiple changes

- this is most likely because some mutations have occurred but have been rejected by natural selection

- what is an example of this: no exchanges between Gly and Trp

- how many unique exchanges between amino acids are possible: 190 (20 × 19 / 2 unordered pairs)

- some exchanges never occurred, why is that: some amino acids occur infrequently and are not highly mutable and some exchanges require more than one nucleotide of the codon to change

- What is relative mutability: the probability that each amino acid will change in a given small evolutionary interval

- how to calculate relative mutability: number of times each amino acid has changed in an interval / number of times amino acid has occurred in the sequences and has been subject to mutation

- how to calculate relative mutability for multiple trees: (total number of changes of the amino acid on all branches of all protein trees being considered) / (sum over all branches of (the amino acid's local frequency of occurrence) × (total number of mutations per 100 links for that branch))

- relative mutability of each amino acid is proportional to the ratio of changes to occurrences
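
A minimal Python sketch of the multi-tree calculation above. The function name, the input layout (a dict of change counts plus a list of per-branch tuples), and the toy numbers are assumptions made for illustration; they are not data from the paper.

```python
def relative_mutability(changes, branches):
    """Ratio of observed changes to exposure to mutation, per amino acid.

    changes  : dict, amino acid -> total changes summed over all branches
    branches : list of (local_freq, pams) tuples, where local_freq maps an
               amino acid to its frequency on that branch and pams is the
               number of mutations per 100 links for that branch
    """
    exposure = {aa: 0.0 for aa in changes}
    for local_freq, pams in branches:
        for aa in exposure:
            exposure[aa] += local_freq.get(aa, 0.0) * pams
    # Amino acids with zero exposure are skipped to avoid dividing by zero.
    return {aa: changes[aa] / exposure[aa] for aa in changes if exposure[aa] > 0}


# Toy numbers (hypothetical): two branches, two amino acids.
toy_changes = {"Ala": 30, "Trp": 1}
toy_branches = [
    ({"Ala": 0.10, "Trp": 0.01}, 5.0),
    ({"Ala": 0.08, "Trp": 0.02}, 10.0),
]
mutability = relative_mutability(toy_changes, toy_branches)
```

Only the ratios between the resulting values matter, since the mutabilities are later rescaled by $\lambda$.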

- which amino acids have the highest relative mutability and which have the lowest: Asn, Ser, Asp, and Glu are the most mutable, Trp and Cys are the least mutable

- why would an amino acid have a lower mutability: a unique, indispensable function makes it too important to be substituted, and a large or distinctive shape or chemistry also lowers its mutability (even being very small, like glycine, makes it hard to replace)

- why would an amino acid have a high mutability: having a passive function or one of lower importance that can be easily accomplished by other amino acids of similar physicochemical properties makes it easily replaceable

- What is a mutation probability matrix: distance-dependent matrix combining individual kinds of mutation and relative mutability of amino acids

- What does an element $M_{ij}$ represent: the probability that the amino acid in column $j$ will be replaced by amino acid in row $i$ after a given evolutionary interval

- How do you calculate the nondiagonal elements of the mutation probability matrix: $M_{ij} = \frac{\lambda m_j A_{ij}}{\sum_i A_{ij}}$

- what is $A_{ij}$: element of accepted point mutation matrix

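- what is $\lambda$: a proportionality constant that sets the evolutionary distance the matrix represents; for PAM1 it is chosen so that, on average, 1% of amino acids change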

- what is $m_j$: mutability of the $j^{th}$ amino acid

- How do you calculate the diagonal elements of the mutation probability matrix: $M_{jj} = 1 - \lambda m_j$

- What does the sum of all elements of column $A$ except $M_{AA}$ represent, where $A$ is an amino acid: the probability of observing a change in a site containing the amino acid $A$

- note: this is proportional to the mutability of $A$

- note: the total probability of each column must be 1 (elements may have been multiplied by 10,000)
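
The off-diagonal and diagonal formulas above can be sketched in plain Python as follows. The three-letter alphabet, the counts in `A`, the mutabilities in `m`, and the value of `lam` are all made-up toy values, and the column sum in the denominator is taken over the off-diagonal entries (the diagonal of `A` is assumed to be zero).

```python
def mutation_probability_matrix(A, m, lam):
    """Build M from accepted point mutation counts A and mutabilities m.

    M[i][j] is the probability that the amino acid in column j is replaced
    by the amino acid in row i; M[j][j] is the probability of no change.
    """
    n = len(m)
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_total = sum(A[i][j] for i in range(n) if i != j)
        for i in range(n):
            if i == j:
                M[i][j] = 1.0 - lam * m[j]
            else:
                M[i][j] = lam * m[j] * A[i][j] / col_total
    return M


# Toy inputs (hypothetical): symmetric exchange counts for 3 residue types.
A = [
    [0, 30, 10],
    [30, 0, 5],
    [10, 5, 0],
]
m = [1.0, 0.8, 0.5]
lam = 0.01
M = mutation_probability_matrix(A, m, lam)
column_sums = [sum(M[i][j] for i in range(3)) for j in range(3)]
```

Each column sums to 1 by construction, because the off-diagonal entries of column $j$ add up to exactly $\lambda m_j$.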

- what do the rows of the mutation probability matrix represent: the replacement amino acid; each entry in a row is the probability that the original amino acid (column) changes into it

- what do the columns of the mutation probability matrix represent: the original amino acid; each column gives the relative probability of each possible event that may happen to it (staying unchanged, or changing into each of the other amino acids)

- what does this equation represent, $100 \cdot \sum_i f_i M_{ii}$: the number of amino acids that will remain unchanged when a protein 100 links long of average composition is exposed to the evolutionary change represented by the matrix

- what can we use the mutation probability matrix for: to simulate any amount of evolutionary change in an unlimited number of proteins

- How do we evaluate statistical methods of detecting relationship and determine accuracy of programs to construct evolutionary trees: we need to have examples of proteins at known evolutionary distances

- How can we simulate PAM 1:

1. to determine the fate of an amino acid after a single interval, we generate a random number 0-1

2. take the number and check it against the mutation probability matrix (the amino acid's respective column) to determine what amino acid it changes into

3. do this for each amino acid in the sequence to get the simulated mutant sequence

4. in this case, the average distance from the original sequence is 1 PAM, even though some will have no mutations or 2+

5. you can simulate for a longer period of evolution by applying this multiple times
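
The steps above can be sketched as a small Python function. This is a rough illustration under stated assumptions: the two-letter alphabet and the matrix values are toy inputs, `mutate_once` is a hypothetical name, and the random number is checked against the running (cumulative) sum of the residue's column.

```python
import random


def mutate_once(seq, M, alphabet, rng):
    """Apply one interval of the matrix M to every residue of seq.

    M[i][j] is the probability that alphabet[j] becomes alphabet[i].
    """
    index = {aa: k for k, aa in enumerate(alphabet)}
    out = []
    for aa in seq:
        j = index[aa]
        r = rng.random()  # uniform in [0, 1)
        cumulative = 0.0
        chosen = aa  # fallback if rounding leaves the column sum below r
        for i, candidate in enumerate(alphabet):
            cumulative += M[i][j]
            if r < cumulative:
                chosen = candidate
                break
        out.append(chosen)
    return "".join(out)


# Toy two-letter matrix (hypothetical); each column sums to 1.
toy_alphabet = "AB"
toy_M = [
    [0.99, 0.02],
    [0.01, 0.98],
]
rng = random.Random(0)
mutant = mutate_once("AABBABAB", toy_M, toy_alphabet, rng)

# Longer evolutionary periods: apply the same step repeatedly.
later = mutant
for _ in range(9):
    later = mutate_once(later, toy_M, toy_alphabet, rng)
```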

- How can we do a simulation for a predetermined number of changes:

1. with a given sequence, the first amino acid that will mutate is selected (probability that amino acid will mutate is proportional to its mutability)

2. amino acid that replaces it is chosen (probability for replacement is proportional to elements in the appropriate column)

3. then, starting with the resultant sequence, a second mutation can be simulated

4. continue this until a predetermined number of changes have been made
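
A possible Python sketch of this second procedure, using `random.Random.choices` for the two weighted draws. The alphabet, matrix, and mutabilities are toy values, and the function name is hypothetical. Note that a site can be hit more than once, so the number of observed differences may be smaller than the number of simulated changes.

```python
import random


def simulate_n_changes(seq, M, m, alphabet, n_changes, rng):
    """Introduce exactly n_changes point mutations, one at a time."""
    index = {aa: k for k, aa in enumerate(alphabet)}
    residues = list(seq)
    for _ in range(n_changes):
        # Step 1: choose the site, weighted by the mutability of its residue.
        site_weights = [m[index[aa]] for aa in residues]
        pos = rng.choices(range(len(residues)), weights=site_weights, k=1)[0]
        # Step 2: choose the replacement, weighted by the off-diagonal
        # elements of the original residue's column.
        j = index[residues[pos]]
        repl_weights = [0.0 if i == j else M[i][j] for i in range(len(alphabet))]
        residues[pos] = rng.choices(alphabet, weights=repl_weights, k=1)[0]
    return "".join(residues)


# Toy inputs (hypothetical): columns of toy_M sum to 1.
toy_alphabet = "ABC"
toy_M = [
    [0.98, 0.01, 0.02],
    [0.01, 0.97, 0.01],
    [0.01, 0.02, 0.97],
]
toy_m = [2.0, 3.0, 3.0]  # proportional to 1 - M[j][j]
start = "ABCABCABC"
one_step = simulate_n_changes(start, toy_M, toy_m, toy_alphabet, 1, random.Random(0))
five_steps = simulate_n_changes(start, toy_M, toy_m, toy_alphabet, 5, random.Random(0))
```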

- How can we get a matrix that predicts amino acid replacements found after N PAMs: raise the PAM1 matrix to the Nth power ($M^N$), i.e. take the product of N copies of it

- What happens if we keep multiplying the PAM1 matrix by itself: every column approaches the asymptotic amino acid composition, i.e. the average frequencies of the amino acids

- what's the problem with great distances, i.e. when the PAM number is very high: little information is left

- what's the problem with PAM0: it's just a unit diagonal and no amino acids change
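
To make the matrix-power idea concrete, here is a small pure-Python sketch on a toy 2×2 column-stochastic matrix (the values are invented, and `mat_mul` / `mat_pow` are illustrative helper names). For this toy matrix the stationary composition works out to (2/3, 1/3), so a high power should show both columns approaching those values.

```python
def mat_mul(X, Y):
    """Plain matrix product of two square matrices stored as nested lists."""
    n = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_pow(M, n):
    """M raised to the nth power; n = 0 gives the unit diagonal (PAM0)."""
    size = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, M)
    return result


# Toy "PAM1-like" matrix (hypothetical); each column sums to 1.
toy_M = [
    [0.9, 0.2],
    [0.1, 0.8],
]
pam0 = mat_pow(toy_M, 0)      # unit diagonal: nothing changes
pam2 = mat_pow(toy_M, 2)
far = mat_pow(toy_M, 200)     # both columns are close to (2/3, 1/3)
```

At large N every column looks the same, which is the sense in which little information about the original residue is left.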

- What's the observed percent difference in an evolutionary distance of 1 PAM? What about 250: 1%, 80%

- What is the formula to calculate the percentage of amino acids that will change on average in the interval: $100(1-\sum_i f_i M_{ii})$
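
A short Python sketch of this formula, together with one way to pick $\lambda$ for a 1% change. Substituting $M_{ii} = 1 - \lambda m_i$ and using $\sum_i f_i = 1$ gives $100 \lambda \sum_i f_i m_i$, so $\lambda = 0.01 / \sum_i f_i m_i$. The frequencies, mutabilities, and the way the off-diagonal entries are filled in are toy assumptions.

```python
def percent_changed(M, f):
    """100 * (1 - sum_i f_i * M_ii): average percent of residues that change."""
    return 100.0 * (1.0 - sum(f[i] * M[i][i] for i in range(len(f))))


def lambda_for_one_percent(m, f):
    """Lambda that yields a 1% change, given M_ii = 1 - lambda * m_i."""
    return 0.01 / sum(fi * mi for fi, mi in zip(f, m))


# Toy inputs (hypothetical): three residue types.
f = [0.5, 0.3, 0.2]   # normalized frequencies, sum to 1
m = [1.0, 0.8, 0.5]   # relative mutabilities
lam = lambda_for_one_percent(m, f)

# Only the diagonal matters for this formula; the off-diagonal mass
# lam * m[j] is spread evenly here just to keep each column summing to 1.
n = len(f)
M = [
    [1.0 - lam * m[j] if i == j else lam * m[j] / (n - 1) for j in range(n)]
    for i in range(n)
]
change = percent_changed(M, f)  # approximately 1.0 by construction
```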

- Relatedness odds matrix

- What are the terms of the relatedness odds matrix: $R_{ij} = \frac{M_{ij}}{f_i}$

- what is $f_i$: the normalized frequency which gives the probability that $i$ will occur in the second sequence by chance

- what does $R_{ij}$ tell us: each term gives the probability of replacement per occurrence of $i$ per occurrence of $j$

- amino acid pairs with score > 1 replace each other more often as alternatives in related sequences than in random sequences of the same composition

- amino acid pairs with scores < 1 replace each other less often

- note: the odds matrix is symmetrical

- How to use an odds matrix: it can be used in detecting very distant relationships between sequences; when one protein is compared to another, position by position, we can multiply the odds for each position to calculate an odds for the whole protein

- why do we use log odds instead of odds: log odds allows us to add the logs of the matrix elements instead of multiplying them
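
A minimal Python sketch of the odds and log-odds steps on a toy two-letter example. The matrix and frequencies are invented (chosen so that the odds matrix comes out symmetric), the helper names are hypothetical, and the $10 \log_{10}$ scaling is an assumption about the scoring convention rather than something stated in these notes.

```python
import math


def odds_matrix(M, f):
    """R[i][j] = M[i][j] / f[i]."""
    n = len(f)
    return [[M[i][j] / f[i] for j in range(n)] for i in range(n)]


def log_odds_matrix(R):
    """Scaled log odds; the factor 10 and base 10 are assumed conventions."""
    return [[10.0 * math.log10(r) for r in row] for row in R]


def alignment_score(seq_a, seq_b, S, alphabet):
    """Sum the log-odds scores position by position (ungapped comparison)."""
    index = {aa: k for k, aa in enumerate(alphabet)}
    return sum(S[index[a]][index[b]] for a, b in zip(seq_a, seq_b))


# Toy inputs (hypothetical): f is the stationary composition of toy_M.
toy_alphabet = "AB"
toy_M = [
    [0.9, 0.2],
    [0.1, 0.8],
]
toy_f = [2.0 / 3.0, 1.0 / 3.0]
R = odds_matrix(toy_M, toy_f)
S = log_odds_matrix(R)

# Multiplying odds and adding log odds give the same answer.
odds_product = R[0][0] * R[1][1] * R[0][1]          # pairs A/A, B/B, A/B
score = alignment_score("ABA", "ABB", S, toy_alphabet)
```

Identical pairs have odds above 1 (positive scores) and the mismatched pair has odds below 1 (a negative score) in this toy example.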

- What groups of amino acids tend to replace one another:

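- the hydrophobic group; the aromatic group; the basic group; the acid, acid-amide group; cysteine; and the other hydrophilic residues

- some groups overlap: the basic group and the acid, acid-amide group replace one another to some extent

- phenylalanine also interchanges with the hydrophobic group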

- Why do these groups interchange more often: because of natural selection and the constraints of the genetic code; the groups reflect similarity in the functions of amino acid residues in their weak interactions with one another in the 3D conformation of proteins

- What traits determine these interactions: size, shape, local concentration of charge, conformation of van der Waals surface, ability to form salt bonds, hydrophobic bonds, hydrogen bonds