The proteome is the overall protein content of a cell, and proteomics is the characterization of the proteome, including protein expression, structure, interactions, and modifications at any stage. Proteins have a primary structure (the amino acid sequence), a secondary structure (local folds such as alpha helices and beta sheets), a tertiary structure (the overall three-dimensional fold of a single chain), and a quaternary structure (the assembly of several chains into a functional complex).
Mass spectrometry (MS) typically does not analyze an entire intact protein (similar to how DNA is fragmented for next-generation sequencing), so the protein is first broken down into smaller parts. MS reveals the quantitative state of a proteome and enables high-throughput proteome analysis. Proteins are digested with proteolytic enzymes that cut peptide bonds, yielding smaller peptides. Trypsin, a commonly used enzyme, cuts after the positively charged amino acids lysine and arginine. The peptides are then ionized (charged) so that they can travel through the machine. The machine measures the mass-to-charge ratio (m/z) of each peptide, for example from the time it takes the ion to travel through the machine, and this tells us which protein the peptide came from. We have databases of all proteins, so we can match the measured values against known values, find how many peptides of a protein were present in the sample, and infer how much of that protein there was. Because MS is so sensitive, we can label different samples with different tags (similar to barcodes in scRNA-seq). Later versions of MS no longer require running only one sample at a time; several labeled samples can be pooled, run together, and separated afterwards by their tags (multiplexing). This is done using iTRAQ (isobaric tags for relative and absolute quantitation) or TMT (tandem mass tags) after cell lysis. Samples that GTEx collected from many tissues were analyzed to create a proteome map of the human body. The data are displayed in a cluster chart; tissues that lie closer together express more similar sets of proteins. This is only possible thanks to more sensitive machinery and multiplexing (TMT-MS).
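The digestion and mass-measurement steps described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search engine works: it assumes the common simplified trypsin rule (cut after K or R unless the next residue is P), uses standard monoisotopic residue masses, and the example sequence is made up.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # mass of the charge carrier


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):          # C-terminal peptide without K/R
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide, charge):
    """Mass-to-charge ratio of the peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


# Hypothetical toy sequence, used only to show the cleavage rule.
peptides = tryptic_digest("AGKPLRMEKTTR")
print(peptides)  # ['AGKPLR', 'MEK', 'TTR']
for p in peptides:
    print(p, round(peptide_mass(p), 4), round(mz(p, 2), 4))
```

A database search does the reverse lookup: it digests every known protein this way and asks which predicted peptides match the measured m/z values.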
Single-cell proteomics measures the proteins expressed by a single cell. It does not show all of the proteins present in the cell, but it can detect 2,000-3,000 proteins, which is enough to establish what a specific cell does (meaning that if you have a cell and see the proteins it expresses, you can infer which tissue it came from). Unlike nucleic acids, which can be amplified by PCR, proteins have no amplification method, so the number of proteins we can see is limited by the small amount of material in one cell.
Post-Translational Modifications (PTM) Proteomics
The further we advance along the central dogma, the greater the number of possible protein species. The most common PTM is phosphorylation (MS has detected about 700,000 per cell), but there is also hydroxylation, acetylation, glycosylation, oxidation, methylation, and ubiquitination. Common PTMs such as phosphorylation can be identified by comparing modified proteins with their unmodified counterparts. For rarer modifications such as acetylation, the modified proteins are first captured (enriched) and then analyzed by mass spectrometry. The capture step exploits a property of the modification. For example, since a phosphate group gives the protein a negative charge, positively charged metal ions can be used to attract phosphorylated proteins (metal-based affinity purification). For glycosylation there are specific proteins that bind sugars (lectins). For acetylation there are antibodies that bind the acetyl group (anti-acetyl-lysine antibodies). In this way we can identify how many proteins there are and which modifications are present. Knowledge of PTMs can also inform our understanding of protein activities.
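MS can tell a modified peptide from its unmodified counterpart because each PTM adds a characteristic mass. The sketch below assumes standard monoisotopic mass shifts and a hypothetical tolerance of 0.01 Da; glycosylation is left out because glycans vary too much in mass for a single value, and ubiquitination is represented by the Gly-Gly remnant left on the lysine after trypsin digestion.

```python
# Monoisotopic mass added to a peptide by each modification (Da).
PTM_MASS_SHIFT = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
    "hydroxylation/oxidation": 15.99491,
    "ubiquitination (Gly-Gly remnant)": 114.04293,
}


def match_ptm(unmodified_mass, observed_mass, tolerance=0.01):
    """Return the PTMs whose mass shift explains the observed difference."""
    delta = observed_mass - unmodified_mass
    return [
        name
        for name, shift in PTM_MASS_SHIFT.items()
        if abs(delta - shift) <= tolerance
    ]


# Hypothetical peptide masses, chosen only to illustrate the lookup.
print(match_ptm(1000.0, 1079.9663))  # ['phosphorylation']
print(match_ptm(1000.0, 1042.0106))  # ['acetylation']
```

Hydroxylation and oxidation both add one oxygen atom, so mass alone cannot separate them; the modified residue and the biological context are needed for that.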
Protein-protein interaction proteomics sheds light on protein function by establishing each protein's relationships with other proteins. There is a database that shows all proteins known to be in contact with any searched protein. These interactions are detected using co-immunoprecipitation (co-IP), phage display, yeast two-hybrid, and protein complementation assays.
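A database of this kind is essentially a lookup from a protein to its interaction partners. A minimal sketch, assuming the data arrive as a list of interacting pairs (the protein names here are placeholders, not real entries):

```python
from collections import defaultdict


def build_interaction_index(pairs):
    """Map each protein to the set of proteins it was found to contact."""
    index = defaultdict(set)
    for a, b in pairs:
        index[a].add(b)   # interactions are symmetric,
        index[b].add(a)   # so record both directions
    return index


# Placeholder pairs standing in for results from co-IP, Y2H, etc.
pairs = [("X", "Y"), ("X", "Z"), ("Y", "W")]
index = build_interaction_index(pairs)
print(sorted(index["X"]))  # ['Y', 'Z']
```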
Co-immunoprecipitation (co-IP) uses an antibody that binds the target protein (X) very strongly, so that all the proteins (Y) in contact with the target are pulled down with it. The antibodies are bound to small beads, the unbound proteins are washed away, and the remaining proteins are analyzed by Western blot (WB) or MS to identify them. This is the simplest method, but it requires a strong antibody. If there is no strong antibody, a tag (around ) can be fused to the target protein so that a well-characterized antibody against the tag can bind it. One experiment tried to make a map of all the proteins that interact with each protein in the cell. They added a tag to every protein, incubated it with a cell, and then sent the precipitate to MS. A disadvantage is that not all protein-protein interactions are strong, and certain interactions may disappear by the time the sample is sent for analysis. Cross-linking is a way of overcoming this: cross-linkers hold the interacting proteins together until we want them to separate. However, cross-linking may link proteins that are located very close to each other but don't actually interact (a false positive).
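Once the precipitate has been through MS, the proteins that came down with the target must be told apart from proteins that stick to the beads regardless. One common way to do this, which is an assumption here and not described in these notes, is to compare against a control pull-down. The sketch below uses made-up peptide counts, a pseudocount to avoid dividing by zero, and an arbitrary fold-change cutoff.

```python
def enriched_interactors(bait_counts, control_counts,
                         min_fold=5.0, pseudocount=1.0):
    """Return proteins enriched in the bait co-IP relative to the control.

    Both inputs map protein name -> number of peptides detected by MS.
    """
    hits = {}
    for protein, bait in bait_counts.items():
        control = control_counts.get(protein, 0)
        fold = (bait + pseudocount) / (control + pseudocount)
        if fold >= min_fold:
            hits[protein] = fold
    return hits


# Hypothetical counts: X is the target itself, Y1 and Y2 are candidates,
# and keratin is a typical contaminant present in both samples.
bait_ip = {"X": 120, "Y1": 40, "Y2": 9, "keratin": 30}
control_ip = {"keratin": 28, "Y2": 1}

hits = enriched_interactors(bait_ip, control_ip)
print(sorted(hits))  # ['X', 'Y1', 'Y2']
```

The target X appears among the hits because it is the most strongly precipitated protein; it is usually excluded when listing its partners.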
BioID (proximity-dependent biotin identification) can be used in place of co-IP. It uses a bacterial enzyme (the E. coli biotin protein ligase BirA) that marks nearby proteins with biotin. The biotin is attached to lysine residues on proteins (there are many lysines on proteins). When we fuse the biotin ligase (BirA) to the target protein, all of the other proteins that interact with the target protein (directly or indirectly, within a specific area around it) are marked with biotin. These proteins may later be released, but they remain tagged even after they leave contact, because the tagging is irreversible. The biotin:avidin bond is one of the strongest non-covalent interactions in nature, so we can use avidin to fish out the tagged proteins, and there is less concern that the interaction will fade before the sample is sent to MS (as there was with co-IP). In summary: BirA marks neighboring proteins with biotin → avidin fishes out the proteins that interact with the target protein. Similar to cross-linking, there is a chance of false positives. There is also a chance that placing BirA next to the target protein changes its interactions.
Phage display engineers phages to display proteins on their surface. Phages have many copies of coat proteins surrounding them. The protein of interest is inserted into the coat, so that the phage displays that protein (with a tag) on its surface. The point of this method is to screen a whole library of proteins at once: every phage carries a single protein from the library on its coat. The target protein is then bound to a bead, and all of the phages whose displayed proteins interact with the target protein bind to the bead. The bound proteins are identified using MS, or by lysing the phages and running PCR on the tags added to the proteins, then using next-generation sequencing to see which proteins were present. This method has high throughput.
Yeast two-hybrid uses a reporter gene activated by a transcription factor in yeast. The transcription factor has a DNA-binding domain (which binds upstream of the reporter gene) and an activation domain (which recruits the machinery needed for transcription). We separate these two domains so that transcription of the reporter gene cannot happen. Each domain is fused to a protein, and the two proteins may or may not interact with each other. If they interact, the DNA-binding domain is brought next to the activation domain and the reporter gene is transcribed. In short, two components must bind to activate the reporter gene. We want to make this systematic (not only for X and Y), so we take yeast and make them express the entire library of human open reading frames (ORFs). Every yeast strain now expresses one protein, and we check how all of the strains interact with the target protein. Only yeast in which the ORF protein interacts with the target protein will transcribe the reporter gene. We then sequence the inserted ORF in those yeast to see which protein was inserted. The disadvantage is that we are forcing two proteins to meet in the nucleus: the target protein fused to one domain of the transcription factor and the second protein fused to the other domain must interact there to activate the reporter gene. Under normal circumstances these proteins might be found in different parts of the cell and never interact. In that case the interaction wouldn't happen in human cells; it happened in the model only because both proteins were artificially placed in the nucleus (where DNA binding occurs). It is important to check the intracellular location of the proteins after the experiment to be sure that these interactions can actually happen in human cells. (This problem also exists in phage display, since we force proteins to interact that might not have met in their original environments.)
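The screen and the follow-up localization check can be written as two small steps. This is a toy model: the `interacts` function stands in for what actually happens inside the yeast nucleus, and all protein names and compartments are invented for illustration.

```python
def y2h_screen(bait, library, interacts):
    """Return the library proteins (prey) that switch on the reporter.

    The reporter is transcribed only when bait and prey bind, bringing
    the DNA-binding domain and the activation domain together.
    """
    return [prey for prey in library if interacts(bait, prey)]


def plausible_in_vivo(bait, hits, compartments):
    """Keep hits that share at least one native compartment with the bait."""
    bait_locations = compartments.get(bait, set())
    return [h for h in hits if bait_locations & compartments.get(h, set())]


# Invented data: pairs that bind when forced together in the nucleus.
binding_pairs = {frozenset(("BAIT", "ORF1")), frozenset(("BAIT", "ORF3"))}


def interacts(a, b):
    return frozenset((a, b)) in binding_pairs


library = ["ORF1", "ORF2", "ORF3"]
hits = y2h_screen("BAIT", library, interacts)
print(hits)  # ['ORF1', 'ORF3']

# Invented native locations in human cells.
compartments = {
    "BAIT": {"cytoplasm"},
    "ORF1": {"cytoplasm", "nucleus"},
    "ORF3": {"mitochondrion"},
}
print(plausible_in_vivo("BAIT", hits, compartments))  # ['ORF1']
```

ORF3 is the kind of false positive described above: it binds the bait in the assay, but the two proteins never share a compartment in their native setting.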
Protein-protein interaction proteomics (protein complementation assay) uses two fragments, A and B, of a reporter protein with an observable activity (fluorescence or antibiotic resistance). The fragments can only produce this activity when the two proteins they are fused to interact and bring them together. This can be done in human cells, and is less artificial. It is similar to yeast two-hybrid in principle (yeast two-hybrid uses a transcription factor to show the connection, whereas here the reporter is fused directly to the proteins). Fuse A to the target protein, and fuse B to a different protein in each cell so that together the cells cover every protein of the human open reading frame library. The cells that glow contain a protein that interacts with the target protein. The glowing cells can then be sequenced to check which protein they express.
The Human Protein Atlas set out to map all human proteins. They ran many MS and RNA-seq experiments on many cells (typically cancer cells). They also wanted to see where proteins are located, which is complicated to do in the cell since it requires a very specific antibody for each protein (they generated many of these and succeeded). They created a huge database that shows the tissue in which every protein is expressed, as well as which cell lines express it.