What is bioinformatics
The science of collecting and analyzing complex biology data.
Central dogma of molecular biology: DNA → RNA → protein.
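Transcriptomics: study of the complete set of RNA transcripts.
Proteomics: study of the complete set of proteins.
Metabolomics: study of the complete set of metabolites.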
Representation/storage/retrieval/analysis of biological data concerning:
Sequences (DNA, RNA, protein).
Structure (RNA, protein).
Function (protein).
Activity levels (mRNA, protein, metabolite).
Networks of interactions of molecules (metabolic pathways, regulatory pathways, signaling pathways).
Etc.
Bioinformatics vs. computational biology
Bioinformatics:
Development and application of software tools, algorithms, and databases for managing and analyzing biological data.
Key aspects:
Data storage, retrieval, and annotation (e.g., genome databases).
Sequence alignment, genome assembly, gene prediction.
Emphasis on informatics, data infrastructure, and tool building.
More about engineering.
Example tasks:
Creating a tool for RNA-seq analysis.
Developing a database to store DNA sequences.
Writing algorithms for DNA sequence alignment.
Computational biology
Using mathematical modeling, theoretical approaches, and computational simulations to understand biological processes.
Key aspects:
Model biological processes (e.g., protein folding, population dynamics).
Systems biology, structural biology, evolutionary biology models.
More about discovery: biology-focused and theory-driven.
Example tasks:
Simulating how a protein folds.
Modeling gene regulatory networks.
Studying the evolution of genes using computational models.
Brief review of biology
Modern molecular biology studies a few types of biologically important molecules: DNA, RNA, proteins, and metabolites.
Most bioinformatics research has studied DNA, RNA, and proteins.
They are “easier.”
Their primary structures are sequences.
The technologies for analyzing them are already well developed.
More work on metabolites is emerging.
DNA: The code of life
The structure and the four-letter genetic code (A, G, T, C) are the same for all living organisms.
Four different nucleotides, distinguished by their four bases: adenine (A), cytosine (C), guanine (G), and thymine (T).
Tissue cells have two sets of chromosomes (one coming from each parent).
Maternal and paternal copy
Regions of DNA sequence along chromosomes encode instructions for the manufacture of proteins.
DNA is a polymer:
Polymer = a large molecule built from repeating subunits; in DNA, the subunits are nucleotides.
Bioinformatics and computational biology
Highly interdisciplinary:
Computer science → tools, algorithms.
Statistics → numbers (quality control, normalization).
Biology → questions.
The emphasis changes over time.
Applied:
From freshman to postdocs.
Useful training for biologists.
Field evolving quickly:
Remove microarray.
Add scRNA-seq, scATAC-seq, Hi-C.
Levels:
Level 0:
Modeling for modeling’s sake.
Level 1 (Entry):
Use published tools to analyze data and generate hypotheses for experimentalists.
Level 2 (Bioinfo):
Develop algorithms and databases for data analyses on new technologies, data integration and reuse.
Level 3 (CompBio):
Make biological discoveries from public data integration and modeling.
Level X:
Integrative studies from big consortia.
The double helix
DNA typically consists of two strands arranged in a double helix structure.
In double-stranded DNA, each base has its own binding partner.
Adenine always bonds to thymine.
Cytosine always bonds to guanine.
Directions of the DNA strands
Each DNA strand has two ends: 5’ and 3’.
5’ (five prime) and 3’:
The 5' carbon has a phosphate group attached to it.
Picture = the one circled red.
The 3' carbon has a hydroxyl (-OH) group attached to it.
Picture = the one in the blue box.
DNA polymerase (the enzyme that carries out DNA replication) works in the 5' → 3' direction.
It recognizes the free -OH group at the 3’ end of a strand and extends the strand from there, so new DNA is always synthesized from 5’ to 3’.
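The pairing rules (A with T, C with G) and the 5' → 3' convention together determine the partner of any strand. A minimal Python sketch, assuming the input strand is written 5' → 3' and contains only A, C, G, and T; the function name `reverse_complement` is chosen here for illustration and is not from the notes:

```python
# Watson-Crick pairing from the notes: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5' -> 3'.

    Assumes `strand` is written 5' -> 3' and uses only A, C, G, T.
    """
    # The two strands run in opposite directions, so we walk the input
    # backwards while swapping each base for its binding partner.
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.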
Chromosomes
DNA is packaged into individual chromosomes.
Histones: small proteins (H1, H2A, H2B, H3, H4) that help package and organize DNA into structural units.
Nucleosomes (beads):
Histone + DNA
Chromatin (necklace of beads):
Made up of nucleosomes linked together.
Genome
The complete DNA for a given species.
Human genome consists of 23 pairs of chromosomes.
Mosquitos have 3 pairs.
Camels have 37 pairs.
Adder's tongue ferns have up to about 720 pairs (roughly 1,440 chromosomes)!
Every cell (except sex cells and mature red blood cells) contains the complete genome of an organism.
A sex cell, also called a gamete, is a reproductive cell that carries half the number of chromosomes of a normal (body) cell and is involved in sexual reproduction.
Mature red blood cells (RBCs) don’t have any DNA because they eject their nucleus during development.
Epigenome
The complete set of chemical modifications to DNA and histone proteins that regulate gene activity without changing the underlying DNA sequence.
It is important because:
Explains why different cells (e.g., skin vs. muscle) behave differently with the same DNA.
Plays a role in development, aging, and diseases like cancer.
Genome vs. epigenome
Genome: the full cookbook (your DNA).
Epigenome: footnotes in the cookbook telling you which recipes to cook, skip, or adjust.
DNA is not identical across individuals
Genetic variations are differences in the DNA sequence among individuals.
They make each person's genome unique.
They can affect everything from physical traits to susceptibility to disease.
Genetic variations are generally permanent within an individual’s genome.
Unless altered, for example by mutations in cancer cells or by gene editing.
Genetic variations range in size from a single DNA building block (single nucleotide) to a large segment of a chromosome.
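The smallest kind of variation, a single-nucleotide difference, can be found by comparing two sequences position by position. The sketch below assumes the two sequences are already aligned and have equal length (real genomes need an alignment step first); the function name and the two example sequences are made up for illustration:

```python
def single_base_differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """List (position, base_in_a, base_in_b) wherever the sequences differ.

    Assumes both sequences are pre-aligned and equally long; positions
    are 0-based.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Hypothetical example sequences, not real genomic data.
person_1 = "ATGCCGTA"
person_2 = "ATGTCGTA"
print(single_base_differences(person_1, person_2))  # [(3, 'C', 'T')]
```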
Genetic mutation vs. genetic variation
Genetic mutation is a change in the DNA sequence, while genetic variation refers to the differences in DNA among individuals.
Genetic variations across individuals partly arise from accumulating genetic mutations over generations.
Genetic mutations are not always inherited
Gene mutations occur in two ways:
Inherited from a parent (hereditary).
Acquired during a person’s lifetime (somatic).
Hereditary mutations (germline mutations)
Passed from parents to children.
Present in the egg and sperm cells, which are also called germ cells.
These variations are present in virtually every cell of a person's body from birth.
Somatic mutations
Occur in the DNA of individual cells at some time during a person’s life.
Caused by errors during DNA replication and sometimes by environmental factors.
In non-reproductive somatic cells (cells other than sperm and egg cells).
Somatic mutations can accumulate over an individual's lifetime within a specific cell lineage, but they are generally not passed to the next generation.
Genes
A gene is a segment of DNA sequence that carries the information required for constructing a particular protein.
A gene encodes a protein.
The DNA is organized into many sections called genes.
Each gene contains the instructions a cell needs to make a specific molecule, usually a protein.
The human genome comprises roughly 20,000 to 25,000 protein-coding genes.
RNA
Four bases: adenine (A), cytosine (C), guanine (G), and uracil (U).
Base pairs: A-U, C-G.
Single-stranded (notated as s.s. or ss).
The single-stranded structure is not stable.
Intramolecular base pairing is common.
RNA folding.
Secondary structure.
RNAfold: prediction of secondary structure.
Transcription
The biological process through which the information in a gene's DNA sequence is copied into RNA.
Transcription is like photocopying a gene from DNA so the cell can use it to make a protein.
Transcription occurs in the 5′ to 3′ direction.
The direction in which the new mRNA strand is synthesized.
The sense strand is the strand of DNA that has the same sequence as the mRNA (with T in the DNA where the mRNA has U).
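The relationship between the two DNA strands and the mRNA can be sketched in Python. This is an illustration only, assuming strands are written 5' → 3' and contain only A, C, G, and T; both function names are hypothetical:

```python
def transcribe_sense(sense_dna: str) -> str:
    """mRNA from the sense strand: same sequence, with U in place of T."""
    return sense_dna.upper().replace("T", "U")


# DNA base on the template strand -> RNA base paired with it.
DNA_TO_RNA_PARTNER = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_template(template_dna: str) -> str:
    """mRNA from the template strand (given 5' -> 3').

    The mRNA is built 5' -> 3' while the template is read 3' -> 5',
    so we walk the template backwards and pair each base.
    """
    return "".join(DNA_TO_RNA_PARTNER[b] for b in reversed(template_dna.upper()))


sense = "ATGGCC"      # sense strand, 5' -> 3'
template = "GGCCAT"   # its partner (template) strand, 5' -> 3'
print(transcribe_sense(sense))        # AUGGCC
print(transcribe_template(template))  # AUGGCC
```

Both routes give the same mRNA, which is why the sense strand is a convenient shorthand for the transcript.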
RNA types
Messenger RNA (mRNA):
Information transfer from genes to proteins.
Ribosomal RNA (rRNA):
Ribosome structure.
Ribosomes are complex structures that physically move along an mRNA molecule and catalyze the assembly of amino acids into protein chains. They also bind tRNAs and various accessory molecules necessary for protein synthesis.
Transfer RNA (tRNA):
Informational adaptor needed for translation.
Folded RNA molecules that transport amino acids from the cytoplasm of a cell to a ribosome.
Regulatory RNAs:
Non-coding RNAs that are not translated into protein. As RNA molecules, they can regulate the expression of other genes.
MicroRNAs (miRNAs): small regulatory RNAs that help silence genes through RNA interference.
RNA splicing
After transcription, the sequence we get is called pre-mRNA.
It must undergo several processing steps before it is ready to be translated.
Introns and exons:
Spliceosomes recognize sequences at the 5′ and 3′ ends of each intron and cut the introns out precisely.
Introns are removed and exons (coding regions) are joined together.
Alternative splicing:
When one gene’s RNA can be cut and rearranged in different ways, so the same gene makes different proteins.
The cell can choose to:
Skip certain exons.
Include extra exons.
Use different splice sites within an exon or intron.
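Splicing and exon skipping can be sketched as joining selected slices of the pre-mRNA. This is a toy illustration, not how a spliceosome locates its sites: it assumes the exon coordinates are already known, and the sequences, the `splice` function, and its `skip` parameter are all made up for the example.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]], skip: frozenset = frozenset()) -> str:
    """Join the exons of `pre_mrna`, dropping introns.

    `exons` holds (start, end) positions (0-based, end exclusive).
    `skip` holds indices of exons to leave out (exon skipping).
    """
    return "".join(
        pre_mrna[start:end]
        for index, (start, end) in enumerate(exons)
        if index not in skip
    )


# Hypothetical toy transcript: three exons separated by two introns.
exon_1, intron_1 = "AUGGCC", "GUAAGUAG"
exon_2, intron_2 = "UUUAAA", "GUCCCAG"
exon_3 = "GGGUAA"
pre_mrna = exon_1 + intron_1 + exon_2 + intron_2 + exon_3
exon_positions = [(0, 6), (14, 20), (27, 33)]

print(splice(pre_mrna, exon_positions))                       # AUGGCCUUUAAAGGGUAA
print(splice(pre_mrna, exon_positions, skip=frozenset({1})))  # AUGGCCGGGUAA
```

The second call skips the middle exon, so the same gene yields a different mature mRNA.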
Proteins
A chain of amino acids (also known as polypeptide).
There are 20 standard amino acids.
Humans can synthesize 11 of these; the other 9 (the essential amino acids) must be supplied by food.
Amino acids are classified by their physical and chemical properties.
These properties play an important role in the function of proteins.
Translation: RNA to protein
mRNA carries the genetic code from DNA in sets of three bases called codons.
Each codon corresponds to one amino acid, matched by tRNA during translation.
Ribosomes read the mRNA and link amino acids together to form a protein.
Translation begins at the start codon (AUG).
Translation ends at a stop codon (UAA, UAG, or UGA).
Open reading frame (ORF):
A continuous run of codons that begins with a start codon and ends at a stop codon.
Picture: there are 3 possible reading frames on an mRNA.
Option 3 is correct.
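Reading frames and ORFs can be made concrete with a short Python sketch. It uses the standard genetic code, packed into a 64-letter string in U, C, A, G order with `*` marking stop codons. The example mRNA and the function names are made up for illustration and do not correspond to the picture in the notes.

```python
BASES = "UCAG"
# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_frame(mrna: str, frame: int) -> str:
    """Translate every complete codon starting at offset `frame` (0, 1, or 2)."""
    return "".join(
        CODON_TABLE[mrna[i:i + 3]] for i in range(frame, len(mrna) - 2, 3)
    )


def first_orf_protein(mrna: str):
    """Protein from the first AUG to the first in-frame stop codon, else None."""
    start = mrna.find("AUG")
    if start == -1:
        return None
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":      # stop codon: UAA, UAG, or UGA
            return "".join(protein)
        protein.append(amino_acid)
    return None                    # ran off the end without a stop codon


mrna = "GCAUGGCCUUUUAAGG"          # hypothetical example sequence
for frame in range(3):
    print(frame, translate_frame(mrna, frame))
print(first_orf_protein(mrna))     # MAF
```

Only one of the three frames in this example contains a start codon followed by an in-frame stop codon, which is what singles it out as the ORF.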
Protein structure
Primary structure: the amino acid sequence of the polypeptide chain.
Secondary structure: regular substructures (alpha-helix/beta sheets).
Tertiary structure: 3-D structure of a single protein molecule.
Quaternary structure: larger assembly of several protein molecules or polypeptide chains, usually called subunits in this context.
Functional classes: transcription factors, receptors, ligands, signaling proteins, kinases, etc.
Post-translational modifications
Chemical changes that occur to proteins after translation.
PTMs occur at distinct amino acid side chains or peptide linkages, and they are most often mediated by enzymatic activity.
Protein functions
Structural support.
Storage of amino acids.
Transport of other substances.
Coordination of an organism’s activities.
Movement.
Response of cells to chemical stimuli.
Protection against diseases.
Etc.
Metabolites
A chemical substance produced when the body breaks down food, drugs, chemicals, or its own tissue.
Examples: glucose, lactate, fatty acids.
This process is called metabolism, and it produces energy and materials for growth, reproduction, and maintaining health.
Types:
Amino acids, lipids, peptides, nucleic acids, carbohydrates, vitamins, and minerals.
Proteins function as enzymes in metabolism, catalyzing and regulating the chemical reactions.
Available data
Nucleotide sequences.
Protein sequences.
Protein structures.
Genome databases.
Gene expression.
Protein expression patterns.
Metabolic pathways.
Interactions and regulatory networks.
Sequence motifs.
Haplotypes and disease associated mutations.
Human genome
The human genome has ~3 billion (3 × 10^9) base pairs (letters).
A person would need more than 30 years to read these letters and more than 50 years to type them.
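The arithmetic behind these figures can be checked with a few lines of Python. The sketch assumes nonstop reading or typing around the clock and a year of 365.25 days; those assumptions, and the storage estimate of 2 bits per base (enough to distinguish four letters), are added here and are not from the notes.

```python
GENOME_LENGTH = 3_000_000_000            # ~3 x 10^9 base pairs
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumes nonstop work, no breaks


def letters_per_second(years: float) -> float:
    """Sustained rate needed to get through the genome in `years`."""
    return GENOME_LENGTH / (years * SECONDS_PER_YEAR)


print(f"read in 30 years: {letters_per_second(30):.2f} letters/second")
print(f"type in 50 years: {letters_per_second(50):.2f} letters/second")

# Four possible letters fit in 2 bits, so one copy of the sequence needs:
megabytes = GENOME_LENGTH * 2 / 8 / 1_000_000
print(f"storage at 2 bits per base: {megabytes:.0f} MB")
```

Under these assumptions the quoted durations correspond to roughly three letters per second for reading and about two per second for typing, kept up continuously for decades.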