Sequence Analysis
The goal of sequence analysis is to collect, compare, and understand biological data. A common task is to identify an unknown sequence.
BLAST (Basic Local Alignment Search Tool)
A program used to compare a given sequence against a database to find similar sequences.
E-value (Expected Value)
Indicates the number of hits one can 'expect' to see by chance when searching a database of a particular size. A low E-value suggests a more significant match.
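The dependence on database size can be made concrete with a minimal sketch. It assumes the standard relation between a normalized bit score S' and the E-value, E = m * n * 2^(-S'), where m is the query length and n is the total database length. The function name and example numbers are illustrative only, and real BLAST uses corrected "effective" lengths.

```python
def expected_hits(bit_score, query_len, db_len):
    """E-value from a bit score: E = m * n * 2**(-S').

    query_len (m) and db_len (n) stand in for the effective lengths
    that BLAST would use; they are plain lengths in this sketch.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Same alignment score, larger database: more hits expected by chance.
small_db = expected_hits(bit_score=40, query_len=100, db_len=1_000_000)
large_db = expected_hits(bit_score=40, query_len=100, db_len=100_000_000)
print(small_db, large_db)
```

A higher bit score lowers E, and a larger database raises it, which is why the same alignment can be significant in one search and not in another.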
Low E-value in BLAST results
Suggests a more significant match when searching for a similar sequence.
What do a low E-value and a high score indicate?
A likely hit.
Example Application of BLAST
An unknown HIV sequence was identified as an HIV-1 N434 retrovirus strain from Venezuela by using BLAST. The result showed a 100% query cover, a 0.0 E-value, and 100% identity.
Primary Structure
The sequence of amino acid residues.
Secondary Structure
Local folding into regular structures such as alpha helices (α-helices) and beta sheets.
Tertiary Structure
The overall three-dimensional shape of a single polypeptide chain.
Quaternary Structure
The arrangement of multiple assembled subunits.
X-ray Crystallography
A method for determining protein structure.
Nuclear Magnetic Resonance (NMR)
A method for determining protein structure.
Cryo-Electron Microscopy (cryoEM)
A method for determining protein structure.
AlphaFold
An AI program that can predict protein structures with high accuracy. It uses an input sequence and searches genetic and structure databases to generate a 3D structure.
Bioinformatic Drug Design
Aims to create therapies by targeting specific proteins.
HIV-1 Protease Inhibition
The drug ritonavir inhibits the HIV-1 protease, which prevents the virus from cleaving its polyproteins and so from producing mature, infectious virus particles.
Types of RNA
Major types include messenger RNA (mRNA), transfer RNA (tRNA), and ribosomal RNA (rRNA). Other types like siRNA, miRNA, and lncRNA are also important.
RNA Structure
RNA is typically single-stranded but can have double-stranded regions. It contains the sugar ribose and the base Uracil (U) instead of Thymine (T).
Functions of RNA
Involved in translation, regulation of gene expression, and can act as enzymes (ribozymes) or regulatory elements (riboswitches).
RNA Life Cycle
Includes transcription, transport, and degradation by nucleases.
Techniques for RNA analysis
Hybridization, Northern blotting, microarrays, RNA-Seq.
Hybridization
The principle that Uracil (U) pairs with Adenine (A) and Cytosine (C) pairs with Guanine (G) is used to detect specific sequences.
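The pairing rule can be sketched in a few lines. This is a minimal illustration that assumes perfect Watson-Crick pairing between two RNA strands written 5' to 3'. The function names are hypothetical, and real probes tolerate mismatches that this sketch ignores.

```python
# Pairing rule from the card: U pairs with A, C pairs with G.
PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def complementary_probe(rna):
    """Return the strand that would hybridize to `rna` (reverse complement)."""
    return "".join(PAIR[base] for base in reversed(rna.upper()))


def binding_sites(probe, target):
    """Start positions in `target` where `probe` can pair perfectly."""
    site = complementary_probe(probe)
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]


print(complementary_probe("AUGC"))
print(binding_sites("GCAU", "AAAUGCAA"))
```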
Northern Blotting
A technique to detect specific RNA sequences in a sample.
Microarrays
Used to measure the expression levels of large numbers of genes simultaneously.
RNA-Seq
A sequencing technique used to reveal the presence and quantity of RNA in a biological sample at a given moment in time.
Bioinformatic tools for RNA
Rfam, RNAanalyzer, RNAfold, Riboswitch Finder
Rfam
A database containing a collection of RNA families, represented by sequence alignments, consensus secondary structures, and covariance models.
RNAanalyzer
A web-based tool for analyzing regulatory RNA elements and secondary structures from an RNA sequence.
RNAfold
A web server that predicts the secondary structure of single-stranded RNA sequences based on minimum free energy.
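Minimum free energy folding needs an energy model, which is beyond a short example. As a simplified stand-in, the sketch below uses Nussinov-style dynamic programming, which maximizes the number of base pairs instead of minimizing free energy. It is not the RNAfold algorithm. The function name, the restriction to A-U and G-C pairs, and the minimum hairpin loop of 3 unpaired bases are all assumptions.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style DP)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with k, leaving >= min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in WATSON_CRICK:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(max_base_pairs("GGGAAAUCCC"))
```

The example sequence can form a hairpin with a three-pair stem (G-C pairs) closing a loop of unpaired bases.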
Riboswitch Finder
A tool to search RNA/DNA sequences for known riboswitches.
Sequencing Methods
Techniques like Sanger and Next-Generation Sequencing (NGS) produce short sequence reads.
Assembly
These short reads must be assembled by finding overlapping parts; this process is difficult for highly repetitive sequences.
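Overlap-based assembly can be sketched with a greedy merge. This is a toy illustration, not how production assemblers work. It assumes error-free reads from one strand, and the function names and the minimum overlap of 3 bases are hypothetical choices.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"]))
```

With repetitive sequence, several reads share the same overlap, so the "longest overlap" choice becomes ambiguous. That ambiguity is the difficulty the card describes.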
Tools for Assembly
BLAST can be used to compare a new sequence to an already determined one to aid in assembly, especially for matches with E-values less than 50.
Genome Annotation
The process of identifying the locations of genes and other biological features on a nucleotide sequence.
Annotation Pipelines
NCBI provides pipelines for prokaryotic (PGAP), eukaryotic (EGAP), and viral (VADR) genomes.
Approaches for genome annotation
Analyzing RNA sequence data to identify transcribed regions.
Finding promoter sequences using databases like Transfac.
Ab initio methods that predict genes based on sequence characteristics.
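The simplest sequence characteristic an ab initio method can use is an open reading frame (ORF): a start codon followed in frame by a stop codon. The sketch below is a minimal illustration that scans only the three forward frames. The function name and the minimum length are assumptions, and real gene finders use much richer statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) of forward-strand ORFs, end exclusive.

    An ORF here runs from ATG through the first in-frame stop codon
    and must contain at least `min_codons` codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return orfs


sequence = "CCATGAAATTTGGGTAACC"
for begin, end in find_orfs(sequence):
    print(begin, end, sequence[begin:end])
```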
Human Genome Project (HGP)
A major international research effort to determine the sequence of the human genome and identify the genes that it contains.
ENCODE Project
The Encyclopedia of DNA Elements project aimed to systematically map regions of transcription, transcription factor association, chromatin structure, and histone modification.
To what percentage of the genome did the ENCODE project assign biochemical functions?
80%
Composition of the Human Genome
Only about 2-3% of the human genome consists of protein-coding genes; the majority is composed of introns (26%), repetitive elements like LINEs (20%) and SINEs (13%), and other non-coding DNA.
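Intergenic
The DNA located in between genes.
LTR (long terminal repeat)
Another class of repetitive DNA: repeated sequences that flank retrotransposons and integrated retroviral sequences.
What percentage of the genome consists of interspersed elements (LINEs plus SINEs)?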
33%
Hidden Markov Models (HMMs)
An HMM is a statistical model used to describe observable events that depend on underlying, unobservable 'hidden' states.
HMMs
Hidden Markov Models are used in bioinformatics for tasks like gene prediction, sequence alignment, and protein secondary structure prediction.
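The idea of hidden states behind observable events can be sketched with the Viterbi algorithm, which finds the most likely sequence of hidden states for an observed sequence. The two-state model below (AT-rich versus GC-rich regions) and all of its probabilities are made up for illustration and are not taken from any real gene finder.

```python
import math

# Hypothetical toy model: every number here is an assumption.
STATES = ("AT_rich", "GC_rich")
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {"AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
         "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9}}
EMIT = {"AT_rich": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
        "GC_rich": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4}}


def viterbi(observed, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Most likely hidden-state path for a non-empty observed sequence."""
    # Log probabilities avoid numerical underflow on long sequences.
    score = {s: math.log(start[s]) + math.log(emit[s][observed[0]])
             for s in states}
    backpointers = []
    for symbol in observed[1:]:
        new_score, pointer = {}, {}
        for s in states:
            best_prev = max(states,
                            key=lambda p: score[p] + math.log(trans[p][s]))
            new_score[s] = (score[best_prev] + math.log(trans[best_prev][s])
                            + math.log(emit[s][symbol]))
            pointer[s] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]


print(viterbi("AATGCG"))
```

The observed bases are the visible events. The region type that emitted each base is the hidden state the model infers.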
HMMER
A tool for biosequence analysis using profile Hidden Markov Models.
Transmembrane proteins
About 30% of proteins are located in the membrane.
How long are transmembrane domains and what are they made of?
About 20 amino acids (AA) long.
Made of an alpha helix of hydrophobic amino acids.
The hydrophobic R groups stick out (e.g., isoleucine or phenylalanine).
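These properties suggest a simple way to look for candidate transmembrane segments: slide a window of about 20 residues along the protein and average a hydrophobicity score. The sketch below uses the Kyte-Doolittle hydropathy scale. The function names and the cutoff of 1.6 are assumptions for illustration, and dedicated predictors are more reliable.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(protein, window=20):
    """Mean hydropathy of every window of `window` residues."""
    protein = protein.upper()
    return [sum(KYTE_DOOLITTLE[aa] for aa in protein[i:i + window]) / window
            for i in range(len(protein) - window + 1)]


def candidate_tm_starts(protein, window=20, threshold=1.6):
    """Start positions of windows hydrophobic enough to span a membrane."""
    return [i for i, score in enumerate(window_hydropathy(protein, window))
            if score >= threshold]


# Made-up sequence: charged ends around an isoleucine/phenylalanine core.
toy_protein = "DDKK" + "I" * 10 + "F" * 10 + "KKDD"
scores = window_hydropathy(toy_protein)
print(scores.index(max(scores)), round(max(scores), 2))
```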
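Transfac
A database of transcription factors and their binding sites; transcription factors are looking for a place to bind.
What percentage of genes encode transcription factors?
10%
Do transcription factors always enhance transcription at their binding sites?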
no sometimes they are
Metabolomics
The large-scale study of small molecules, or metabolites, within cells, biofluids, tissues, or organisms.
KEGG
Kyoto Encyclopedia of Genes and Genomes, a database resource for understanding high-level functions and utilities of biological systems, containing graphical maps of metabolic pathways.
Flux Balance Analysis (FBA)
A method to calculate the flow of metabolites through a metabolic network and predict growth rates using a stoichiometric matrix and linear programming.
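The linear program behind FBA is usually written as follows. The symbols are assumptions introduced here, not taken from the card: S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c a weight vector that selects the objective (typically the biomass or growth reaction), and l and u the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v} \quad & c^{\mathsf{T}} v \\
\text{subject to} \quad & S\,v = 0 \\
& l \le v \le u
\end{aligned}
```

The constraint S v = 0 expresses the steady-state assumption: every internal metabolite is produced as fast as it is consumed. The optimal value of the objective is the predicted growth rate.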
Elementary Mode Analysis (EMA)
A computational method that identifies all minimal, feasible metabolic pathways within a network.
Metatool
A tool for metabolic modeling.
CellNetAnalyzer
A tool for metabolic modeling.
COBRA Toolbox
A tool for metabolic modeling.
Cytoscape
A tool for visualizing molecular interaction networks, including metabolic pathways from databases like KEGG.
Systems biology
The study of complex interactions between components of a biological system to understand the system as a whole.
Boolean models
Logic-based systems that use binary variables and logic operators (AND, OR, NOT) to represent biological processes like gene regulation.
They don't need detailed kinetic data.
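A Boolean model can be sketched in a few lines. The three-gene network below is hypothetical: A is a constant input signal, B is on when A is on and C is off, and C follows B. All nodes are updated at the same time (synchronous updating), which is one common choice among several.

```python
def step(state):
    """One synchronous update of the hypothetical network."""
    return {
        "A": state["A"],                     # external input, held constant
        "B": state["A"] and not state["C"],  # B = A AND NOT C
        "C": state["B"],                     # C = B
    }


def simulate(state, n_steps):
    """Return the list of states visited, starting with `state`."""
    trajectory = [dict(state)]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory


for s in simulate({"A": True, "B": False, "C": False}, 4):
    print(s)
```

No rate constants appear anywhere. The negative feedback from C onto B is enough to make the network oscillate while the input A is on.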
CellDesigner
A program used to add components and their interconnections in biological modeling.
Simulates the network dynamics using Boolean commands.
Compare the model to experimental data for validation.
MAPK/ERK Pathway
A chain of proteins that communicates a signal from a cell surface receptor to the DNA in the nucleus, often involving phosphorylation.
G-protein coupled receptors (GPCR)
A class of receptors that play a role in signaling pathways, including those involved in heart failure.
Receptor Tyrosine Kinases (RTK)
A class of receptors that are involved in signaling pathways, including those related to heart failure.
Ordinary Differential Equations (ODEs)
Mathematical equations used for quantitative modeling when experimental data on concentration changes over time are available.
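A minimal sketch of an ODE model is the concentration m of an mRNA that is synthesized at a constant rate and degraded in proportion to its amount: dm/dt = k_syn - k_deg * m. The model, the parameter names, and the values are assumptions for illustration. The equation is integrated with the simple forward Euler method, and real work would normally use a dedicated ODE solver.

```python
def simulate_mrna(k_syn, k_deg, m0=0.0, dt=0.01, t_end=50.0):
    """Forward Euler integration of dm/dt = k_syn - k_deg * m.

    Returns the concentration at each time step, starting with m0.
    """
    steps = int(round(t_end / dt))
    m = m0
    trajectory = [m]
    for _ in range(steps):
        m += dt * (k_syn - k_deg * m)  # Euler update: m(t+dt) = m + dt * dm/dt
        trajectory.append(m)
    return trajectory


trajectory = simulate_mrna(k_syn=2.0, k_deg=0.5)
print(round(trajectory[-1], 3))  # approaches the steady state k_syn / k_deg
```

Fitting k_syn and k_deg requires measured concentrations over time, which is why this kind of model depends on time-course data.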
Quantitative Modeling
Modeling that uses mathematical equations to represent biological processes, as opposed to simpler, semiquantitative approaches.