lec 12 - proteomics & bioinformatic tools for protein studies (glytsou)


53 Terms

1
New cards

what is bioinformatics

  • bioinformatics = scientific subdiscipline that involves using computer technology to collect, store, analyze and disseminate biological data and information such as DNA and AA sequences or annotations about those sequences

  • scientists and clinicians use databases that organize and index such biological information to increase our understanding of health and disease and, in certain cases, as part of medical care

2
New cards

bioinformatics is an interdisciplinary field

  • connected to:

    • biology

    • genetics

    • chemistry

    • medicine

    • pharmacy

    • engineering

    • statistics

    • mathematics

    • CS

3
New cards

applications of bioinformatics

  • identifying new drug targets

  • understanding disease mechanism

  • designing new drugs

  • predicting interactions between a compound and enzyme

  • predicting drug responses/safety

  • streamlining clinical trials

  • shortening timeline and reducing the cost of drug development

  • reducing the risk of side effects

  • fostering the growth of personalized medicine

4
New cards

what is the relevance of bioinformatics to pharmacology?

  • bioinformatics plays a role in

    • target discovery → transcriptomics and homology modeling

    • drug development → in silico modeling

5
New cards

human genetic evidence supports 2/3 of the drugs approved by the FDA in 2021

knowt flashcard image
6
New cards

example of developing a therapeutic hypothesis based on genetic evidence

kerendia (finerenone) - bayer

  • mineralocorticoid receptor antagonist

    • selective

    • nonsteroidal

  • used to treat chronic kidney disease (CKD) associated with type 2 diabetes

  • microalbuminuria = moderately increased urinary albumin-to-creatinine ratio (UACR) → early indication of CKD

  • genome-wide association studies (GWAS) and functional genomic analyses → intronic variant in NR3C2 associated with microalbuminuria

    • NR3C2 = gene that encodes the mineralocorticoid receptor (MR)

    • kerendia = mineralocorticoid receptor antagonist → blocks MR

      • reduces the risk of CKD and slows its progression

        • since NR3C2 (MR) is genetically linked to kidney damage via albuminuria → blocking MR helps

7
New cards

bioinformatics techniques and tools

  • sequence analysis

  • structural bioinformatics

  • high-throughput techniques

  • biomedical image-based analysis

  • network and systems biology

8
New cards

bioinformatics tools

  • databases

    • sequences storage databases

      • archival databases: genbank, protein data bank (PDB)

      • curated databases (knowledge bases)

        • interpro → protein families, motifs and domains

        • uniprot → sequence and functional information on proteins

  • computational models

    • mathematical models to describe biological system

  • software tools

    • sequence alignment tools

      • BLAST

      • clustalW

    • function analysis tools

      • GEO

      • pathway tools

    • image analysis software

    • clinical tools (ORACLE)

  • machine learning applications

    • protein structure predictions

    • drug response predictions

9
New cards

uniprot knowledgebase: hub for protein information

  • freely accessible database of protein sequence and functional information

  • contains description of:

    • function of proteins

    • interactions

    • polymorphisms

    • secondary structure

    • quaternary structure

    • etc

10
New cards

uniprot info: general information of a protein of interest

  • accession number = unique ID for protein

11
New cards

uniprot info: subcellular location

knowt flashcard image
12
New cards

uniprot info: diseases caused by variants/mutations in the gene

knowt flashcard image
13
New cards

uniprot info: pharmaceutical (the use of protein as pharmaceutical drug)

knowt flashcard image
14
New cards

uniprot info: post-translational modifications/processing

knowt flashcard image
15
New cards

uniprot info: protein structure

knowt flashcard image
16
New cards

2024 nobel prize in chemistry: protein structure design and prediction

  • david baker → for computational protein design

  • demis hassabis and john m. jumper → for protein structure prediction

17
New cards

the importance of determining the protein structure

  • medicine

    • drug design

    • predict the function of a mutated protein

  • biochem

    • predict the MOA and develop tools to manipulate it

  • biotech

    • design of new enzymes

18
New cards

protein structure can be determined experimentally

  • by 2024, >227,000 protein structures are known in atomic detail

  • techniques used to determine the protein conformation (3D shape or spatial arrangement)

    • x-ray crystallography

      • use diffracted x-rays from a protein crystal to generate an electron density map which indicates the atomic positions of protein

      • ONLY for proteins that can readily crystallize

    • nuclear magnetic resonance (NMR) spectroscopy

      • reveals the structure and dynamics of proteins in solution by identifying protons in close proximity to one another

    • cryo-electron microscopy

      • rapidly developing method that can elucidate the structure of large multimeric complexes at increasingly higher resolutions

19
New cards

protein structure prediction

  • protein structure = primary, secondary, tertiary, quaternary

  • one of the most important goals of computational biology

  • extremely challenging

  • approaches to predict 3D structures

    • Ab initio predictions

      • without prior knowledge

      • calculations that attempt to minimize the free energy of a structure

    • knowledge based methods

      • an unknown primary structure is examined for compatibility with known protein structures/fragments

20
New cards

prediction and analysis of 3D structures of biomolecules

computational structural prediction methods

  • homology modeling

    • to predict the structure of an unknown protein from existing homologous proteins

  • protein-ligand docking and virtual screening

    • to predict and analyze the binding interactions between small molecules (ligands) and proteins
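One early step in homology modeling is choosing a template: a protein of known structure whose sequence resembles the unknown one. The sketch below is only an illustration of that idea, assuming the sequences are already aligned (gaps written as "-"); the function names and the toy sequences are hypothetical and not from the lecture.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pick_template(target: str, templates: dict[str, str]) -> tuple[str, float]:
    """Return the name and identity of the template most similar to the target."""
    scored = {name: percent_identity(target, seq) for name, seq in templates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Toy, pre-aligned sequences (hypothetical, for illustration only).
target = "MKT-AYIAKQR"
templates = {
    "template_1": "MKTLAYIAKQK",
    "template_2": "MRTLSYLA-QR",
}
best_name, best_identity = pick_template(target, templates)
print(best_name, round(best_identity, 1))
```

Real pipelines build the alignment themselves and weigh more than identity (coverage, structure quality), so treat this as a sketch of the selection logic only.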

21
New cards

rational drug design

relenza (zanamivir)

  • treatment of illness due to influenza A and B virus in adults and peds pts aged 7 years of age and older who have been symptomatic for no more than 2 days

  • structure based design

    • selecting molecules that were likely to bind to the conserved regions of the enzyme neuraminidase

      • neuraminidase = enzyme produced by the flu virus to release newly formed virus from infected cells

22
New cards

alphafold

a significant step forward in protein folding prediction

  • the protein structure prediction challenge “critical assessment of structure prediction” (CASP) has run since 1994

  • alphafold structures = vastly more accurate than competing methods

  • alphafold = freely available, AI program that can predict the shape of a protein, almost instantly, down to atomic accuracy

23
New cards

deepmind software…

that can predict the 3D shape of proteins is already changing biology

alphafold mania = deepmind software

24
New cards

alphafold limitations and future directions

  • alphafold predictions are valuable hypotheses and accelerate but do NOT replace experimental structure determination

  • limitations

    • CANNOT predict the consequences of new mutations in proteins since there are NO evolutionarily-related sequences to examine

    • CANNOT deal with proteins that can adopt different structures in different states/environments

    • CANNOT predict protein structures bound to ligands

  • isomorphic labs = deepmind’s drug discovery spin off

    • predict the structure of proteins when they are bound to drugs and other interacting molecules

25
New cards

protein data bank (PDB)

  • RCSB protein data bank

  • structures are published and can be accessed for visualization and analysis

26
New cards

high-throughput techniques

  • multi-omic approach for proteins = proteomics

    • molecular read-out → protein

    • results → abundance of peptides, peptide modifications, and interactions between peptides

    • technology → mass spectrometry, western blotting, and ELISA

27
New cards

proteomics → large-scale study of proteins

  • proteome = the whole set of proteins expressed in a cell at a particular time

  • proteomics = the investigation of the proteome

    • explore the complete catalogue of proteins expressed in a cell type at a given time point

    • investigate how this inventory changes when the conditions are altered

  • unlike the genome, the proteome is NOT a fixed characteristic of the cell

    • a transcribed gene may be differentially translated or NOT translated and different proteins have different degradation rate so transcriptomic data is often NOT a good predictor of protein abundance

28
New cards

protein methods

  • protein purification

    • isolate one specific protein from a complex mixture

  • polyacrylamide gel electrophoresis and western blotting

    • electrophoresis → separate proteins based on size

    • western blotting → after electrophoresis, proteins are transferred to membrane; antibodies are used to detect specific protein

  • enzyme-linked immunosorbent assay (ELISA)

    • quantifies a specific protein

    • capture antibody fixed on plate with an enzyme that produces a color change or signal when target protein is present

29
New cards

limitations of protein methods discussed before

  • limited in # of proteins studied per condition/assay

  • limited to the detection of proteins for which an antibody is available

30
New cards

mass spectrometry…

is a powerful technique for ID of peptides and proteins

  • used to investigate:

    • when and where proteins are expressed

    • rate of protein production, degradation, and steady-state abundance

    • how proteins are modified (for example, post-translational modifications such as phosphorylation)

    • the movement of proteins between subcellular compartments

    • how proteins interact with one another

  • mass spec allows the highly precise and sensitive detection of the mass of an analyte

  • mass spectrometers:

    • convert the analyte into gas-phase ions

      • matrix-assisted laser desorption/ionization (MALDI) = technique used to produce ions using a laser and an energy-absorbing matrix

      • electrospray ionization (ESI) = technique used to produce ions using an electrospray in which a high voltage is applied to a liquid to create an aerosol

    • apply electrostatic potentials to measure the mass-to-charge ratio (m/z)

    • consists of 3 components

      • ion source

      • mass analyzer

      • detector
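To make the mass-to-charge ratio concrete, the sketch below computes m/z for a peptide ion carrying z extra protons, using m/z = (M + z × proton mass) / z. It assumes positive-mode protonation and uses commonly tabulated monoisotopic residue masses (rounded); the function names are illustrative, not from the lecture.

```python
# Monoisotopic residue masses in daltons (commonly tabulated values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.010565   # one water per peptide (free N- and C-termini)
PROTON_MASS = 1.007276


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of an ion formed by adding `charge` protons to the neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


mass = peptide_mass("GA")
for z in (1, 2):
    print(f"GA, z={z}: m/z = {mz(mass, z):.4f}")
```

The same peptide appears at a lower m/z when it carries more charges, which is why ESI spectra often show several peaks per peptide.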

31
New cards

mass spec can detect…

molecular masses with a high degree of sensitivity and accuracy

32
New cards

basic protocol for proteomics

start with cultured cells → proteins are extracted using acid extraction or column purification (helps isolate protein from other cellular materials), or proteins can be run through SDS-PAGE, which separates them → enzymatic digestion: proteins are digested into peptides using enzymes → digested protein mixture is submitted for MS analysis → MS identifies and quantifies peptides based on m/z ratio

33
New cards

proteins can be specifically cleaved into small peptides to facilitate analysis

  • sequencing of long peptides by MS yields complex spectra that are difficult to interpret

  • to sequence an entire protein

    • protein is chemically or enzymatically cleaved to yield peptides

    • peptides are ionized and their m/z is measured

    • fragment ion spectra are then assigned peptide sequences based on database comparison and protein sequences are predicted
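The enzymatic cleavage step can be sketched in silico. The example assumes trypsin (the enzyme named in the workflow cards) and the commonly stated rule that it cuts after lysine (K) or arginine (R) unless the next residue is proline (P). The function name and the toy sequence are hypothetical.

```python
def trypsin_digest(protein: str) -> list[str]:
    """Split a protein sequence after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


# Toy sequence: the R is followed by P, so no cut happens there.
print(trypsin_digest("MKWVTFISLLRPGAKDE"))
```

Search engines run this kind of digestion over every protein in a database to produce the candidate peptides that measured spectra are compared against; real tools also allow missed cleavages, which this sketch ignores.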

34
New cards

generic sample prep workflow mass spec based proteomics

biological sample → dissociation and/or lysis → get lysate/fluid that contains proteins, lipids, carbs, metabolites, etc. → partial protein purification: lipids, carbs, metabolites all leave → proteins → denaturation; disulfide reduction; thiol alkylation → denatured, chemically inert proteins → enzymatic digestion; desalting/cleanup → peptides → LC-MS/MS

35
New cards

generic MS-based proteomics experiment

sample fractionation: biological sample fractionated to isolate different proteins → run thru SDS-PAGE which sorts them by size → proteins excised from gel → trypsin digestion → peptide mixture → peptides separated using chromatography and then ionized by ESI for MS analysis

36
New cards

co-immunoprecipitation followed by MS to identify protein interactors

  1. cell lysates (total protein) → add antibodies that specifically bind to the target protein (aka antigen) and incubate → antigen-antibody complex → add protein A/G beads, which bind to antibodies, and incubate → antigen-antibody-bead complex (with other proteins) → wash to remove unbound proteins → antigen-antibody-bead complex with only the interacting proteins still bound → elute and purify → digestion → peptides → MS

37
New cards

example of developing a therapeutic hypothesis based on protein interactions: saphnelo (anifrolumab)

saphnelo (anifrolumab) - astrazeneca

  • mAb used for treatment of systemic lupus erythematosus (SLE)

    • blocks the signaling of IFNAR1 → prevents downstream JAK-STAT signaling → reduces immune overactivation in lupus

  • binds the type I interferon receptor (IFNAR1)

  • blocks the activity of type I interferon (IFN)

  • NO direct genetic evidence linking SLE and IFNAR1

  • missense variants in TYK2 associated with SLE

    • TYK2 = kinase that physically interacts with IFNAR1

38
New cards

quantitative proteomics

  • the measurement of the abundance of proteins across multiple conditions

    • healthy vs disease

    • untreated vs drug treated

  • 3 approaches to quantify

    • metabolic stable isotope labeling

      • cells are grown in media containing “heavy” AA → proteins made in these cells incorporate the label during synthesis → labeled and unlabeled samples are mixed, digested, analyzed by MS

    • isotope tagging by chem rxn

      • proteins from different conditions digested first → peptides chemically labeled with isotopic tags

    • stable-isotope incorporation via enzyme rxn

      • stable isotopes are introduced during enzymatic digestion → peptides carry the label after digestion

39
New cards

quantitative proteomics: stable isotope labeling by AA in cell culture (SILAC)

cells are either grown in

  • light isotope-containing media

  • heavy isotope-containing media + treatment

after you harvest + lyse cells → quantitate extracted protein → mix lysates → SDS-PAGE → excise bands → trypsin digestion → LC-MS/MS: calculate ratio of heavy:light peptides; indicates relative protein abundance between conditions
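The final heavy:light calculation might look like the sketch below. It assumes each protein has a list of (heavy, light) peptide intensity pairs and summarizes them as the log2 of the median ratio; the median is a choice made for this illustration, and the data and names are made up.

```python
import math
from statistics import median


def silac_log2_ratios(peptide_pairs: dict[str, list[tuple[float, float]]]) -> dict[str, float]:
    """log2(median heavy/light ratio) per protein; peptides with a zero intensity are skipped."""
    result = {}
    for protein, pairs in peptide_pairs.items():
        ratios = [heavy / light for heavy, light in pairs if heavy > 0 and light > 0]
        if ratios:
            result[protein] = math.log2(median(ratios))
    return result


# Hypothetical intensities: heavy = treated cells, light = untreated cells.
example = {
    "protein_A": [(200.0, 100.0), (420.0, 200.0), (380.0, 200.0)],
    "protein_B": [(50.0, 100.0)],
}
for protein, value in silac_log2_ratios(example).items():
    print(protein, round(value, 2))
```

With heavy media paired with treatment as on this card, a positive value suggests the protein is more abundant after treatment and a negative value suggests it is less abundant.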

40
New cards

applications of proteomics-based tech

  • detection of various diagnostic markers

  • candidates for vaccine production

  • understanding pathogenicity mechanisms

  • alteration of expression patterns in response to different signals/medications

  • interpretation of functional protein pathways in different diseases

41
New cards

proteomics reveals small molecules’ secrets

  • high throughput quantitative proteomics

  • 875 small molecule compounds

  • comprehensive profiling method to characterize the MOA of small molecules

  • each compound altered the expression of 15 proteins

  • revealed potential new targets for commonly used small molecules

  • elucidated compound MOA and drug repurposing

  • revealed off-target effects which can increase efficiency and safety profiling in drug discovery

42
New cards

high-throughput techniques: genomics and transcriptomics

  • genomics

    • molecular read out → genes (DNA)

    • results → genetic variants; gene presence or absence; genome structure

    • technology → sequencing, exome sequencing

  • transcriptomics

    • molecular read out → RNA and/or cDNA

    • results → gene expression; gene presence or absence; splice sites; RNA editing sites

    • technology → RT-PCR (reverse transcription-PCR) and RT-qPCR; gene arrays; RNA-sequencing

43
New cards

genomics - the large-scale study of DNA

  • genome = organism’s complete set of DNA

  • genomics = the study of all of a person’s genes (the genome)

  • sequencing = determine the exact order of bases in a strand of DNA

44
New cards

human genome project

  • the sequence of the human genome has been completed

  • started in 1990 and completed in 2003

  • led at the NIH by the national human genome research institute

  • comprised of approx

    • 3 billion BP of DNA distributed among 24 chromosomes

    • 23,000 genes

45
New cards

methods for sequencing DNA

sanger sequencing or dideoxy sequencing

  • method using DNA polymerase along with special chain-terminating nucleotides called dideoxyribonucleoside triphosphates

    • when incorporated into a growing DNA strand, they block further elongation

  • normal deoxyribonucleoside triphosphate (dNTP) → 3’ OH allows strand extension at 3’ end

  • dideoxyribonucleoside triphosphate (ddNTP) → 3’H prevents strand extension at 3’ end

46
New cards

automated sanger sequencing of DNA process

  • 4 different chain terminating nucleotides have each been chemically tagged with a different colored fluorescent label

  • the rxn is loaded onto thin capillary gels which separate the rxn products into a series of distinct bands

  • a detector records the color of each band

  • a computer translates the info → nucleotide sequence

  • diagram

    • single-strand DNA fragment to be sequenced → add primer → add small amounts of labeled chain terminating ddNTPs and add excess amounts of unlabeled dNTPs → mixture of DNA products each containing a chain-terminating ddNTP labeled with a specific fluorescent marker → products loaded onto capillary gel → size separated products are read in sequence
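The read-out logic of the diagram can be mimicked in a few lines. This is a toy simulation, assuming termination happens once at every position and ignoring the primer; the dye colours assigned to each base are arbitrary placeholders, and all names are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Arbitrary colour assignment for the four labeled ddNTPs (placeholder values).
DYE_COLOUR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_COLOUR = {colour: base for base, colour in DYE_COLOUR.items()}


def sanger_read(template_3_to_5: str, seed: int = 0) -> str:
    """Simulate chain termination and return the sequence of the new strand (5'→3')."""
    # Full-length product the polymerase would synthesize with dNTPs only.
    full_product = "".join(COMPLEMENT[base] for base in template_3_to_5)
    # A ddNTP can terminate synthesis at any position: one fragment per length.
    fragments = [full_product[:n] for n in range(1, len(full_product) + 1)]
    random.Random(seed).shuffle(fragments)   # the reaction tube is an unordered mixture
    fragments.sort(key=len)                  # capillary gel separates fragments by size
    # The detector sees only the colour of the terminal ddNTP of each band.
    colours = [DYE_COLOUR[fragment[-1]] for fragment in fragments]
    return "".join(BASE_FOR_COLOUR[colour] for colour in colours)


print(sanger_read("TACGGT"))
```

The result is the sequence of the newly synthesized strand, which is complementary to the template that was loaded.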

47
New cards

automated sanger sequencing of DNA results

  • each colored peak = nucleotide

  • sequence of overlapping segments

  • longer sequences are assembled from shorter pieces
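Assembly from overlapping segments can be illustrated with a naive merge. The sketch assumes the reads are error-free and already listed in left-to-right order, which real assemblers cannot assume; the function names and reads are invented for the example.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for n in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def assemble(reads: list[str], min_overlap: int = 3) -> str:
    """Merge ordered, overlapping reads into one longer sequence."""
    contig = reads[0]
    for read in reads[1:]:
        n = overlap_length(contig, read, min_overlap)
        if n == 0:
            raise ValueError(f"read {read!r} does not overlap the contig")
        contig += read[n:]  # append only the part not already covered
    return contig


print(assemble(["ATGCGTAC", "GTACCTGA", "CTGATTAG"]))
```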

48
New cards

next generation sequencing (NGS) - illumina

  • allows sequencing of thousands to millions of DNA molecules simultaneously

  • high speed, reduced cost

  • a genome or other large DNA sample is broken into millions of short fragments

  • the sequences are amplified on a solid surface with covalently attached linkers

  • diagram

    • each location on slide or plate contains ~1000 copies of a unique DNA molecule to be sequenced

      1. add DNA polymerase and fluorescent, reversible terminator nucleotides (dNTPs) → A is added and recorded

      2. fluorescent tag and terminator removed from A → now have free 3’ OH

      3. add DNA polymerase and fluorescent, reversible terminator nucleotides (dNTPs) again → added T is recorded
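The cycle in the diagram (add one blocked nucleotide, record it, unblock, repeat) can be sketched as a loop. This is an illustrative Python sketch only: the function names and cluster labels are assumptions, each cluster is treated as a single error-free template, and imaging is reduced to recording the added base.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sequence_cluster(template, n_cycles):
    """Read one cluster, one base per cycle.

    `template` lists the template bases in the order the polymerase meets them.
    """
    read = []
    for base in template[:n_cycles]:
        # Step 1: polymerase adds ONE labeled, 3'-blocked nucleotide.
        # The block prevents a second addition within the same cycle.
        added = COMPLEMENT[base]
        # The color of the incorporated nucleotide is recorded.
        read.append(added)
        # Step 2: tag and terminator are removed, leaving a free 3' OH,
        # so the next loop iteration (step 3) can add the next nucleotide.
    return "".join(read)

def sequence_flow_cell(clusters, n_cycles):
    """Every cluster on the surface is read in the same cycles (in parallel)."""
    return {name: sequence_cluster(t, n_cycles) for name, t in clusters.items()}

print(sequence_flow_cell({"cluster1": "TACG", "cluster2": "GGAT"}, n_cycles=2))
# -> {'cluster1': 'AT', 'cluster2': 'CC'}
```

With the template starting `TA`, the first cycle records A and the second records T, matching the A-then-T example in the steps above.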

49

RNA-sequencing (RNA-seq)/transcriptomics

  • RNA-seq = method to detect and quantify all the RNA molecules in a cell under specific conditions

    1. isolate RNA from cells or tissue of interest

    2. select for mRNA by filtering for sequences containing poly(A) tails

    3. synthesize cDNA using reverse transcriptase

      1. cDNA = complementary DNA copied from processed mRNA → contains just the parts of the genome actually used to make proteins (no introns)

    4. sequence cDNA molecules using an NGS method

    5. use computational algorithms to assemble sequencing data
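Steps 2, 3 and 5 can be sketched in a few lines of Python. This is a toy sketch under stated assumptions: the function names, the 6-base poly(A) cutoff and the gene labels are hypothetical, and counting labeled molecules stands in for the real read alignment and assembly step.

```python
from collections import Counter

RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def has_poly_a_tail(rna, min_tail=6):
    """Step 2: keep only RNAs that end in a run of A (poly(A) tail)."""
    return rna.endswith("A" * min_tail)

def reverse_transcribe(rna):
    """Step 3: first-strand cDNA is the reverse complement of the RNA (5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))

def count_expression(molecules):
    """Step 5 (simplified): count poly(A)-selected molecules per gene."""
    counts = Counter()
    for gene, rna in molecules:
        if has_poly_a_tail(rna):
            counts[gene] += 1
    return dict(counts)

sample = [
    ("geneA", "AUGCAAAAAA"),
    ("geneA", "AUGCAAAAAA"),
    ("geneB", "GGCUAAAAAAA"),
    ("rRNA", "GGCUACGU"),  # no poly(A) tail, filtered out
]
print(reverse_transcribe("AUGCAAAAAA"))  # -> TTTTTTGCAT
print(count_expression(sample))          # -> {'geneA': 2, 'geneB': 1}
```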

50

data analysis and visualization: heatmap/clustering

  • reduce complexity

  • speed up comparisons

  • heatmap = graphical representation of data where values are depicted by color

  • clustering = task of classifying N objects (proteins) into k groups (clusters) in such a way that the objects within a group are similar to each other but the groups are different from each other

    • clustering methods identify similar and distinct expression patterns
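The definition of clustering (N objects into k groups) can be illustrated with k-means, one common clustering method. The card does not name a specific algorithm, so k-means is an assumption here, as are the function names and the simple choice of the first k profiles as starting centroids.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(profiles, k, n_iter=20):
    """Assign each of the N profiles to one of k clusters."""
    centroids = [list(p) for p in profiles[:k]]  # deterministic toy start
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in profiles]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Rows = proteins, columns = abundance in four samples (made-up numbers).
profiles = [[1, 1, 9, 9], [9, 9, 1, 1], [1, 2, 8, 9], [8, 9, 2, 1]]
print(kmeans(profiles, k=2))  # -> [0, 1, 0, 1]
```

In a clustered heatmap, rows would then be reordered so that proteins with the same label sit next to each other, which makes the shared expression patterns visible as blocks of color.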

51

data analysis and visualization: principal component analysis (PCA)

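  • method for combining the many measured properties (variables) of an object into a few new variables (principal components)

  • simplifies high-dimensional data while keeping as much of the variation as possible

  • reduce the data to only 2 or 3 principal components, which can then be plotted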
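Reducing data to 2 principal components can be sketched without external libraries. This is a minimal illustration, assuming PCA via the covariance matrix with power iteration and deflation; the function names and the toy data are hypothetical, and in practice a library routine would normally be used.

```python
def center(data):
    """Subtract each column (variable) mean."""
    means = [sum(col) / len(data) for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]

def covariance(centered):
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, n_iter=200):
    """Power iteration: direction of largest variance and its variance."""
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            return 0.0, v  # no variance left to explain
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                   for i in range(d))
    return variance, v

def pca(data, n_components=2):
    """Return (components, scores) for the first n_components."""
    centered = center(data)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(n_components):
        variance, v = top_component(cov)
        components.append(v)
        # Deflation: remove the variance already explained by v.
        cov = [[cov[i][j] - variance * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centered]
    return components, scores

# Rows = samples, columns = 3 measured variables (made-up numbers).
data = [[2, 0, 1], [4, 1, 1], [6, 0, 1], [8, 1, 1]]
components, scores = pca(data, n_components=2)
for row in scores:
    print([round(s, 2) for s in row])
```

Each sample is now described by 2 scores instead of 3 variables, and those two scores are what a PCA plot shows on its axes.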

52

data analysis and visualization: volcano plot

  • type of scatter plot useful for identifying events that differ significantly between 2 groups of experimental subjects

  • it plots significance (usually -log10 of the p-value, y-axis) vs fold change (usually log2, x-axis)

    • fold change = how much a gene’s expression has changed between 2 groups
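The two axes of a volcano plot can be computed as follows. This is a hedged sketch: the function names are hypothetical, the p-values are assumed to come from a separate statistical test, and the cutoffs (2-fold change, p < 0.05) are common conventions rather than values given in the lecture.

```python
import math

def volcano_point(mean_control, mean_treated, p_value):
    """Return (log2 fold change, -log10 p-value) for one gene."""
    log2_fc = math.log2(mean_treated / mean_control)
    neg_log10_p = -math.log10(p_value)
    return log2_fc, neg_log10_p

def classify(log2_fc, neg_log10_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a point as up, down, or not significant."""
    if neg_log10_p < -math.log10(p_cutoff) or abs(log2_fc) < fc_cutoff:
        return "not significant"
    return "up" if log2_fc > 0 else "down"

# (gene, mean in control, mean in treated, p-value); made-up numbers.
genes = [("gene1", 10, 40, 0.001), ("gene2", 40, 10, 0.001), ("gene3", 10, 40, 0.5)]
for name, control, treated, p in genes:
    x, y = volcano_point(control, treated, p)
    print(name, round(x, 2), round(y, 2), classify(x, y))
```

Points far to the left or right (large fold change) and high up (small p-value) are the events that differ significantly between the 2 groups.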

53

lecture summary

  • bioinformatics = interdisciplinary scientific field that provides sophisticated tools for managing large datasets and computational approaches that accelerate drug discovery and advance medical care

  • uniprot = central access point for extensive curated protein info

  • structural bioinformatics = helps predict 3D structures of biomolecules

  • proteomics = large scale study of proteins

  • mass spec for proteomics allows the detection of peptides and proteins in a sample

  • quantitative proteomics methods = allow accurate measurement of protein abundance across multiple samples

  • methods for sequencing DNA: sanger and next generation sequencing

  • RNA sequencing = used to quantify gene expression in a sample

  • heatmaps, PCA, volcano plots = statistical methods for data analysis and visualization