Proteomics

Last updated 1:39 AM on 12/5/25

30 Terms

New cards

Proteins

A chain of amino acids (also known as a polypeptide).
Proteins are built from 20 standard amino acids.
- Humans can synthesize 11 of these; the other 9 must be supplied by food (essential amino acids).
- Amino acids are classified by the physical and chemical properties of their side chains (e.g., nonpolar, polar, acidic, basic).

Translation: RNA to protein

mRNA carries the genetic code from DNA in sets of three bases called codons.
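Each sense codon specifies one amino acid, which is delivered by the tRNA whose anticodon pairs with that codon during translation.
Ribosomes read the mRNA and link amino acids together to form a protein.
Translation begins at the start codon (AUG).
Translation ends at a stop codon (UAA, UAG, or UGA).
Open reading frame (ORF):
- A continuous run of codons that begins with a start codon and ends with the next in-frame stop codon.
- Because codons are read in non-overlapping triplets, there are three possible reading frames on an mRNA (starting at the first, second, or third base).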
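To make the codon rules above concrete, here is a minimal Python sketch that scans the three forward reading frames of an mRNA for ORFs and translates them with the standard genetic code. The function name `find_orfs` is hypothetical, and the sketch assumes the standard code, forward frames only, and clean A/C/G/U (or T) input.

```python
from itertools import product

# Standard genetic code; codons enumerated in UCAG order (first base varies slowest).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

START = "AUG"
STOP = "*"  # symbol used in the table above for UAA, UAG and UGA


def find_orfs(mrna: str) -> list[tuple[int, int, str]]:
    """Hypothetical helper: (frame, start_index, protein) for each AUG...stop ORF in the 3 forward frames."""
    mrna = mrna.upper().replace("T", "U")  # also accept DNA-style input
    orfs = []
    for frame in range(3):
        start = None
        # step through the non-overlapping codons of this reading frame
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None:
                if codon == START:
                    start = i  # open an ORF at the first AUG seen in this frame
            elif CODON_TABLE[codon] == STOP:
                # translate from the start codon up to (not including) the stop codon
                protein = "".join(CODON_TABLE[mrna[j:j + 3]] for j in range(start, i, 3))
                orfs.append((frame, start, protein))
                start = None
    return orfs


print(find_orfs("GGAUGGCCUAAGC"))  # [(2, 2, 'MA')]
```

Only the third frame (offset 2) contains AUG followed by an in-frame stop (UAA), so the example yields the two-residue peptide "MA". Internal AUGs inside an open ORF are simply read as methionine.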

Protein structure

Primary structure: the linear sequence of amino acids.
Secondary structure: local, regularly repeating substructures (alpha-helices and beta-sheets).
Tertiary structure: 3D structure of a single protein molecule.
Quaternary structure: larger assembly of several protein molecules or polypeptide chains, usually called subunits in this context.
Functional roles: transcription factors, receptors, ligands, signaling proteins, kinases, etc.

Post-translation modifications

Chemical modification of a protein after its translation (e.g., phosphorylation, glycosylation, ubiquitination).
They are a key mechanism for increasing proteomic diversity.

Genomics vs. proteomics

Genomics: the study of an organism's entire DNA (genome), which provides a relatively static blueprint for life.
Proteomics: the study of the entire set of proteins (proteome), which are the highly dynamic functional molecules that determine the cell's actual state and activity.

Transcriptomics vs. proteomics

Transcriptomics: the study of mRNA transcripts, which indicate potential cellular activity and are easier to measure.
Proteomics: the study of all proteins, which reflect actual biological function, disease state, and clinical outcomes.

Gene vs. protein biomarkers

Gene expression (mRNA level):
- Measures which genes are being transcribed into RNA.
- Indicates potential cellular activity.
- Easier, faster, and cheaper to measure (e.g., RNA-seq).
- Good for detecting early molecular changes before symptoms.
- mRNA levels do not always correlate with protein levels or actual function.
Protein expression:
- Measures actual functional molecules carrying out cellular processes.
- Reflects real biological activity and disease state.
- More directly linked to phenotype and clinical outcomes.
- Technically complex to measure; proteins are less stable and harder to quantify.

Alignment of protein sequences

Protein sequence alignment compares amino acid sequences to identify regions of similarity, which are used to:
- Infer evolutionary relationships.
- Predict protein structure.
- Infer function:
  - Based on sequence similarity.
  - By identifying related proteins with known structures and functions in databases.
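As a rough illustration of how an alignment is scored, the sketch below implements textbook global alignment (Needleman-Wunsch) in Python. It assumes a simple scheme (match +1, mismatch -1, linear gap penalty -2) chosen for readability; real protein alignment tools score residue pairs with a substitution matrix such as BLOSUM62 and usually use affine gap penalties. The function name is hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    def score(x: str, y: str) -> int:
        # toy scoring; a substitution-matrix lookup would go here
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                          F[i - 1][j] + gap,                            # gap in b
                          F[i][j - 1] + gap)                            # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("PAWHEAE", "PAWHAE"))  # (4, 'PAWHEAE', 'PAWH-AE')
```

Six matches (+6) and one gap (-2) give the score of 4. Replacing `score` with a lookup into a substitution matrix turns this toy into the kind of protein alignment the substitution-matrix cards describe.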

Substitution matrices

A substitution matrix contains, for every pair of amino acids, a value derived from the probability that amino acid i mutates into amino acid j (usually expressed as a log-odds score).
Substitution matrices are constructed from a large and diverse sample of verified pairwise alignments (or multiple sequence alignments) of protein sequences.
Substitution matrices estimate the probabilities of substitutions occurring over a period of evolution.
The two major types of substitution matrices are PAM and BLOSUM.

PAM matrices

PAM: point accepted mutation.
- A replacement of one amino acid by another that has been accepted by natural selection.
Derived from global alignments of very similar sequences (at least 85% identity).
Assumption: because the sequences are so similar, an observed substitution is unlikely to be the result of several consecutive mutations at the same position.

PAM1

Constructed by analyzing closely related protein sequences and scaling the observed substitutions to a divergence of about 1% accepted mutations per amino acid position.
- Collect homologous proteins with known evolutionary relationships and align them to identify positions where amino acid substitutions occurred.
- Count substitution frequencies between amino acids, normalizing by how often each amino acid appears overall, to get the probability of one amino acid being replaced by another.
- Convert the probabilities to log-odds scores to form the PAM1 scoring matrix, where each value reflects how likely a substitution is compared to random chance.
PAM1 gives the probability that a given amino acid will be replaced by another specific amino acid over an evolutionary interval in which 1 accepted point mutation occurs per 100 amino acids.
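The counting and normalization steps above can be sketched in Python as follows. This is a simplified, Dayhoff-style construction under stated assumptions: the input is a hypothetical list of gap-free alignments of close homologs, substitutions are counted directly between each aligned pair (Dayhoff instead counted them along inferred phylogenetic trees), and the scale factor `lam` is chosen so that, weighted by residue frequency, 1% of residues change (1 PAM). It returns the mutation-probability matrix; converting it to log-odds scores is a separate step.

```python
from collections import Counter


def pam1_from_alignments(pairs):
    """Hypothetical helper returning (M, freq).

    M[(i, j)] = probability that residue j is replaced by residue i over 1 PAM;
    freq[a]   = background frequency of residue a in the input.
    Assumes gap-free, equal-length alignments with at least one substitution.
    """
    subs = Counter()  # subs[(i, j)]: observed i<->j substitutions (counted in both directions)
    occ = Counter()   # occ[a]: occurrences of residue a at aligned positions
    for seq_a, seq_b in pairs:
        for x, y in zip(seq_a, seq_b):
            occ[x] += 1
            occ[y] += 1
            if x != y:
                subs[(x, y)] += 1
                subs[(y, x)] += 1

    alphabet = sorted(occ)
    total = sum(occ.values())
    freq = {a: occ[a] / total for a in alphabet}
    # relative mutability: substitutions a residue took part in, per occurrence of that residue
    mutability = {j: sum(subs[(i, j)] for i in alphabet) / occ[j] for j in alphabet}
    # scale so that the frequency-weighted fraction of changed residues is 1% (= 1 PAM)
    lam = 0.01 / sum(freq[j] * mutability[j] for j in alphabet)

    M = {}
    for j in alphabet:        # j: original residue (column)
        for i in alphabet:    # i: replacement residue (row)
            if i == j:
                M[(i, j)] = 1 - lam * mutability[j]
            else:
                M[(i, j)] = lam * subs[(i, j)] / occ[j]
    return M, freq


M, freq = pam1_from_alignments([("ACDA", "ACEA"), ("ACDE", "ACDE")])
print(round(M[("E", "D")], 4), round(M[("D", "D")], 4))
```

Each column of `M` sums to 1 (a residue is either kept or replaced by something), which is the property the PAM# powering step relies on.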

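PAM#

PAM n: n accepted point mutations per 100 amino acids.
The number after PAM (e.g., PAM40, PAM80) refers to the evolutionary distance.
Larger number → larger evolutionary distance.
To get PAM80, multiply the PAM1 mutation-probability matrix by itself 80 times (PAM1^80), then convert the result to log-odds scores.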
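Matrix powering is the core of PAM#. The Python sketch below raises a toy 3-residue PAM1 matrix (made-up values, not real PAM data) to the 80th power; the helper names `mat_mult` and `pam_n` are hypothetical. Each column holds the replacement probabilities for one original residue, so each column sums to 1, and exponentiation by squaring gives the same product as 80 successive multiplications.

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def pam_n(pam1, n):
    """Return PAM1**n via exponentiation by squaring (same as n repeated multiplications)."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]  # identity (PAM0)
    base = pam1
    while n > 0:
        if n % 2 == 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        n //= 2
    return result


# Toy "PAM1": entry [i][j] = probability that residue j is replaced by residue i (columns sum to 1)
toy_pam1 = [[0.99, 0.005, 0.005],
            [0.005, 0.99, 0.005],
            [0.005, 0.005, 0.99]]
pam80 = pam_n(toy_pam1, 80)
print(round(pam80[0][0], 3))  # probability a residue is unchanged after 80 PAMs
```

As the exponent grows, the diagonal (probability that a residue is unchanged) shrinks toward the background frequencies, which is why high-numbered PAM matrices suit distant relationships.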

Convert probability matrix to scoring matrix - PAM1

The score for a pair of amino acids i, j is the log of how likely it is to observe them aligned (based on how often they are aligned in real alignments) divided by the background probability of finding them aligned by chance. PAM scores are expressed as 10 × log10 of this ratio.
A score of +2 indicates that the substitution occurs 10^0.2 ≈ 1.58 times as frequently as random substitution.
A score of 0 is neutral.
A score of -10 indicates that the substitution occurs 10^-1 = 1/10 as frequently as random substitution.
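The arithmetic above can be checked with a short Python sketch. It assumes the Dayhoff convention that scores are 10 × log10 of the odds ratio (which is what makes +2 correspond to 10^0.2); the function names and the example probabilities are hypothetical.

```python
import math


def pam_style_score(pair_prob: float, p_i: float, p_j: float) -> float:
    """Log-odds score in PAM units: 10 * log10(observed pair probability / chance probability)."""
    return 10 * math.log10(pair_prob / (p_i * p_j))


def odds_from_score(score: float) -> float:
    """Invert a PAM-style score into 'times as frequent as random substitution'."""
    return 10 ** (score / 10)


print(round(odds_from_score(2), 2))                   # 1.58
print(odds_from_score(-10))                           # 0.1
print(round(pam_style_score(0.002, 0.05, 0.02), 2))   # 3.01 (pair seen 2x more often than chance)
```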

BLOSUM matrices

Block substitution matrix (BLOSUM).
BLOSUM matrices are based on local alignments (blocks).
BLOSUM62:
- Sequences that are at least 62% identical are clustered and counted as a single sequence.
  As a result, scoring is mainly influenced by sequence blocks with less than 62% identity.
All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins.
BLOSUM62 is the default matrix in BLAST 2.0.
Though BLOSUM62 is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships.
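A simplified Python sketch of how BLOSUM-style scores come out of a block: every pair of residues in each column is counted, observed pair frequencies q are compared with the frequencies e expected from the individual residue frequencies, and scores are reported in half-bits, 2 × log2(q/e), rounded to integers. The clustering and weighting of sequences at the 62% (or other) identity threshold is deliberately left out, and the function name and toy block are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_scores(block):
    """Hypothetical helper: half-bit log-odds scores from one gap-free block (no clustering step)."""
    pair_counts = Counter()
    for column in zip(*block):                    # each alignment column
        for a, b in combinations(column, 2):      # every unordered pair of sequences
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed pair frequencies

    # residue frequencies: identical pairs count fully, mixed pairs split half/half
    p = Counter()
    for (a, b), q_ab in q.items():
        if a == b:
            p[a] += q_ab
        else:
            p[a] += q_ab / 2
            p[b] += q_ab / 2

    scores = {}
    for (a, b), q_ab in q.items():
        e_ab = p[a] * p[b] if a == b else 2 * p[a] * p[b]  # expected frequency by chance
        scores[(a, b)] = round(2 * math.log2(q_ab / e_ab))
    return scores


print(blosum_scores(["LI", "LI", "LV"]))  # {('L', 'L'): 2, ('I', 'I'): 1, ('I', 'V'): 3}
```

In the toy block the I/V pair is observed three times as often as chance predicts, so it scores +3; in a real BLOSUM construction the 62% clustering would first down-weight near-identical sequences.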

Choosing a good matrix

PAM1: matrix calculated from comparisons of sequences with no more than 1% divergence.
BLOSUM80: matrix calculated from blocks in which sequences that are at least 80% identical are clustered together, so it reflects comparisons of sequences with at most about 80% identity.

BLOSUM matrix example

Techniques to quantify proteins

Step 1: protein separation.
Step 2: protein quantification.

Step 1: protein separation

Performed by 2D gel electrophoresis.
- Proteins are first resolved using isoelectric focusing according to their isoelectric point (pI) on an immobilized pH gradient (IPG) strip.
- The IPG strip with the resolved proteins is then placed on top of a sodium dodecyl sulfate (SDS) polyacrylamide gel, and the proteins are separated in the second dimension by molecular weight.
- Protein spots of interest, such as those with altered staining intensity between samples, may be excised, enzymatically digested into peptides in gel, and subjected to a mass spectrometric analysis.
2D liquid chromatography:
- Proteins to be analyzed by 2D-LC are separated based on two specific physicochemical properties: isoelectric point (pI) and hydrophobicity.
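Isoelectric focusing depends on each protein's pI, the pH at which its net charge is zero. The Python sketch below estimates a peptide's pI by bisection on the Henderson-Hasselbalch net charge. The pKa values are approximate and assumed (published scales differ, so predicted pI values vary between tools), and the function names are hypothetical.

```python
from collections import Counter

# Approximate pKa values (assumed, EMBOSS-like scale; other scales give somewhat different pI values)
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH, summing Henderson-Hasselbalch terms for each ionizable group."""
    counts = Counter(seq.upper())
    positive = 1 / (1 + 10 ** (ph - PKA_N_TERM))
    positive += sum(counts[aa] / (1 + 10 ** (ph - pka)) for aa, pka in PKA_POSITIVE.items())
    negative = 1 / (1 + 10 ** (PKA_C_TERM - ph))
    negative += sum(counts[aa] / (1 + 10 ** (pka - ph)) for aa, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH: net charge falls as pH rises, so find where it crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid    # still positively charged: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(isoelectric_point("KKKK"), 2), round(isoelectric_point("DDDD"), 2))
```

Lysine-rich peptides come out basic and aspartate-rich peptides acidic, which is the ordering isoelectric focusing exploits on the IPG strip.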

Step 2: protein quantification

Targeted: focuses on specific, predefined proteins or peptides (like biomarkers).
Label-free:
- Unbiased, global quantification.
- Detects and measures all detectable proteins in a sample without labels.
- Often used in discovery proteomics → captures novel proteins and isoforms.

Step 2: protein quantification; antibody-based methods

Targeted only.
Detects specific, predefined proteins using antibodies.
Very high specificity.
Traditional antibody methods are generally low-to-medium throughput (typically < 10 proteins).
Examples: ELISA, Western blot, Luminex, MSD, and OLINK.
- OLINK is a next-generation method that uses Proximity Extension Assay (PEA) technology, in which two antibodies must both recognize each target protein.
  - High throughput: up to ~5,000 proteins.

Step 2: protein quantification; mass spectrometry (MS)

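Can be targeted or untargeted (label-free or label-based).
Three components:
- Ion source (ionizer): converts peptides or proteins into charged gas-phase ions (e.g., electrospray ionization or MALDI).
- Mass analyzer: separates ions based on their mass-to-charge ratio (m/z).
  - For a singly protonated peptide (charge = 1+): m/z = mass(peptide) + mass(proton).
  - For a peptide carrying z charges: m/z = (mass(peptide) + z × mass(proton)) / z.
- Detector: measures the intensity of separated ions to produce a mass spectrum.
Examples: label-free MS, DIA-MS, and isobaric labeling methods such as TMT and iTRAQ.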
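The m/z relation above can be illustrated with a short Python sketch that computes a peptide's neutral monoisotopic mass from standard residue masses and then the m/z of its protonated ion at a chosen charge. It assumes unmodified residues (no oxidation, no cysteine alkylation, no TMT/iTRAQ labels) and rounded mass values; the function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (unmodified residues)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once for the free N- and C-termini
PROTON = 1.00728   # mass of each proton that carries a positive charge


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def mz(seq: str, charge: int = 1) -> float:
    """m/z of the [M + zH]z+ ion: (mass(peptide) + z * mass(proton)) / z."""
    return (peptide_mass(seq) + charge * PROTON) / charge


print(round(mz("GA"), 4), round(mz("GA", charge=2), 4))
```

With charge = 1 this reduces to the card's formula, mass(peptide) + mass(proton); higher charge states divide the same protonated mass by z, which is why multiply charged peptides appear at lower m/z.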

Step 2: protein quantification; aptamer-based methods

Targeted only.
High throughput: up to 7,000 proteins.
Example: SomaScan (SomaLogic).
- Widely used in recent studies.

Protein domains

Domains are independently folding, functional units within a protein.
A single protein may contain one or multiple domains.
Each domain often carries a specific function (e.g., DNA-binding, catalytic activity, signaling).
Domains can appear in different proteins, reused like biological “LEGO blocks.”
Examples: SH2 domain (binds phosphorylated tyrosines), kinase domain, and zinc finger domain.

SH2 protein domain

Function: recognizes and binds to proteins that contain phosphotyrosine.
Role in the cell: helps proteins involved in cell signaling find their correct partners, especially in pathways activated by growth factors or immune responses.

Types of protein domains

Modular: can be rearranged in evolution to create new proteins.
- One protein can contain multiple domains, each with its own structure and job → like a Swiss Army knife.
Conserved: similar sequence and structure across species.
Functional: often responsible for the protein’s main biochemical activity.
Identified by:
- Sequence alignment.
- Structural analysis.
- Databases like Pfam, SMART, and InterPro.

Protein motifs

Motifs are short, conserved sequence patterns.
Usually smaller than domains and may not fold independently.
Often indicate a specific function or interaction.
Examples: NLS (nuclear localization signal), Walker A motif (ATP binding), and helix-turn-helix motif (DNA binding).
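Because motifs are short sequence patterns, they are often searched with pattern matching. The Python sketch below expresses two commonly cited consensus patterns as regular expressions: the Walker A motif G-x(4)-G-K-[ST] and the classical monopartite NLS K-[KR]-x-[KR]. The dictionary and function names are hypothetical, and a pattern hit alone does not prove the motif is functional.

```python
import re

# Consensus patterns written as regular expressions ("." = any residue)
MOTIFS = {
    "Walker A": re.compile(r"G.{4}GK[ST]"),     # G-x(4)-G-K-[ST], ATP/GTP binding (P-loop)
    "Monopartite NLS": re.compile(r"K[KR].[KR]"),  # K-[KR]-x-[KR], nuclear import signal
}


def find_motifs(seq: str) -> dict[str, list[tuple[int, str]]]:
    """Return non-overlapping (0-based start, matched text) hits for each motif pattern."""
    seq = seq.upper()
    return {name: [(m.start(), m.group()) for m in pattern.finditer(seq)]
            for name, pattern in MOTIFS.items()}


print(find_motifs("MKKLLVLGPPGSGKSTLAR"))
print(find_motifs("PKKKRKV"))  # the SV40 large T antigen NLS
```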

Domains vs. motifs

Domains: big, functional building blocks.
Motifs: small, recognizable sequence patterns that guide or support function.
- Usually located within domains.

Protein families

A protein family is a group of proteins that share similar sequences, domains/motifs, functions, or evolutionary origins.
Families help classify proteins into meaningful categories.
Example: G protein-coupled receptors (GPCRs) → all share a common architecture of seven alpha-helices that span the cell membrane.

Protein databases

Sequence and structure databases:
- UniProt.
- Swiss-Prot (the manually reviewed section of UniProtKB).
Protein expression databases:
- GEO (Gene Expression Omnibus).
- The Human Protein Atlas → tissue- and cell-specific expression.

UniProt

A freely accessible database of protein sequence and functional information.
UniProtKB: universal protein resource knowledgebase.
- Swiss-Prot: manually annotated and reviewed.
  - One record per gene, per species.
- TrEMBL: automatically annotated and not reviewed.
  - One record for 100% identical full-length sequences in one species.
UniParc: a comprehensive, non-redundant database that contains most of the publicly available protein sequences in the world.
- One record for 100% identical sequences over the entire length, regardless of species.
UniRef: clustered sets of sequences from UniProtKB and selected UniParc records.
- UniRef100: one record for 100% identical sequences, regardless of species; sequence fragments are included. UniRef90 and UniRef50 cluster sequences at 90% and 50% identity.
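The Swiss-Prot/TrEMBL split is visible in every UniProtKB FASTA header, which starts with `sp` for reviewed Swiss-Prot entries and `tr` for unreviewed TrEMBL entries. The Python sketch below parses the documented header layout (`>db|Accession|EntryName ProteinName OS=... OX=... GN=... PE=... SV=...`); the function name is hypothetical, and the example headers are illustrative (the `tr` accession is made up).

```python
import re

# >db|Accession|EntryName ProteinName OS=Organism OX=TaxonID [GN=Gene] PE=... SV=...
HEADER_RE = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s+(.*?)\s+OS=(.*?)\s+OX=(\d+)")
GENE_RE = re.compile(r"\sGN=(\S+)")  # GN= is optional, so it is searched separately


def parse_uniprot_header(header: str) -> dict:
    """Split a UniProtKB FASTA header into its main fields."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"not a UniProtKB header: {header!r}")
    db, accession, entry_name, protein_name, organism, taxon_id = match.groups()
    gene = GENE_RE.search(header)
    return {
        "section": "Swiss-Prot (reviewed)" if db == "sp" else "TrEMBL (unreviewed)",
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": protein_name,
        "organism": organism,
        "taxon_id": int(taxon_id),
        "gene": gene.group(1) if gene else None,
    }


print(parse_uniprot_header(
    ">sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens OX=9606 GN=TP53 PE=1 SV=4"))
print(parse_uniprot_header(
    ">tr|X0X0X0|X0X0X0_HUMAN Uncharacterized protein OS=Homo sapiens OX=9606 PE=4 SV=1"))
```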