Bioinformatics
Bioinformatics involves creating and using computer tools to handle and understand biological and health data.
Scope of Data
The scope of data ranges from millions (10^6) to sextillions (10^21).
Bit
1 piece of information (either a 1 or 0 in binary).
Byte
8 bits, which can encode 2^8 (256) distinct values, enough for one character in a single-byte encoding.
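The bit and byte arithmetic above can be checked with a short Python sketch. The function names are illustrative, not from any library; the sketch only shows how many values a given number of bits can distinguish and how many bits a given alphabet needs.

```python
def distinct_values(n_bits: int) -> int:
    """Number of different values that n_bits can represent."""
    return 2 ** n_bits


def bits_needed(n_symbols: int) -> int:
    """Smallest number of bits b such that 2**b >= n_symbols."""
    return max(1, (n_symbols - 1).bit_length())


print(distinct_values(8))  # one byte
print(bits_needed(4))      # the four DNA bases A, C, G, T
print(bits_needed(20))     # the 20 standard amino acids
```

Under these definitions a DNA base fits in 2 bits, so one byte can hold four bases.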
Medical Research
Analyzing genomic and imaging data to identify disease markers, develop personalized treatments, and improve diagnostic accuracy.
Behavioral Data
Data from social media, surveys, and observational studies to study human behavior, social trends, and psychological patterns.
Central Dogma of Molecular Biology
The central dogma of molecular biology is DNA → RNA → Protein.
Transcription
The synthesis of RNA from a DNA template.
Translation
The synthesis of proteins based on the sequence of the RNA.
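Transcription can be sketched in Python as a string operation. This is a simplified illustration, assuming DNA is given as an uppercase string written 5'→3'; the function names are hypothetical.

```python
# Maps each DNA base on the template strand to its RNA partner.
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def transcribe_coding_strand(coding_dna: str) -> str:
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_dna.replace("T", "U")


def transcribe_template_strand(template_dna: str) -> str:
    """Build mRNA (5'->3') from the template strand (also written 5'->3').

    The template is read in the 3'->5' direction, so it is reversed
    before each base is complemented.
    """
    return template_dna[::-1].translate(DNA_TO_RNA_COMPLEMENT)


print(transcribe_coding_strand("ATGGCC"))
print(transcribe_template_strand("GGCCAT"))
```

Both calls describe the same double-stranded region, so they yield the same mRNA.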
DNA Structure
DNA consists of two strands forming a double helix, with building blocks of a phosphate group, a deoxyribose sugar, and nitrogenous bases (A, T, C, G).
RNA Structure
RNA is single-stranded but can form secondary structures, with building blocks of a phosphate group, a ribose sugar, and nitrogenous bases (A, U, C, G).
Protein Structure
Proteins are made up of amino acids (20 different ones) linked by peptide bonds, with structures including primary, secondary, tertiary, and quaternary.
Genetic Code
The genetic code is a set of rules by which information encoded in DNA or RNA sequences is translated into proteins by living cells.
Start Codons
Start codons (AUG for methionine) signal the beginning of protein synthesis.
Stop Codons
Stop codons (UAA, UAG, UGA) signal the end of protein synthesis.
Codon Table
A codon table is used to determine the amino acid sequence from an mRNA sequence.
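As a rough illustration of how a codon table is used, the Python sketch below builds the standard genetic code from a compact 64-letter string and translates an mRNA from the first AUG to the first stop codon. The function name and the choice to start at the first AUG are simplifying assumptions.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the sequence."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUUUUAAGG"))  # one-letter amino acid codes
```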
Open Reading Frame (ORF)
An open reading frame (ORF) is a part of the sequence that potentially ends up being translated into a protein.
Reading Frames
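Each strand can be read in three reading frames, depending on which nucleotide the first codon starts at, so double-stranded DNA has six possible reading frames.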
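Reading frames and ORFs can be illustrated with a small Python sketch. It assumes an uppercase DNA string, treats ATG as the only start codon, and searches the forward strand only in `find_orfs`; all names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna: str) -> str:
    """Sequence of the opposite strand, written 5'->3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def frames(dna: str) -> list:
    """Codon lists for the three reading frames of one strand."""
    return [
        [dna[i:i + 3] for i in range(offset, len(dna) - 2, 3)]
        for offset in range(3)
    ]


def six_frames(dna: str) -> list:
    """Three frames on the given strand plus three on the opposite strand."""
    return frames(dna) + frames(reverse_complement(dna))


def find_orfs(dna: str) -> list:
    """ORFs (start codon through stop codon) on the forward strand."""
    orfs = []
    for codons in frames(dna):
        start = None
        for idx, codon in enumerate(codons):
            if codon == "ATG" and start is None:
                start = idx
            elif codon in STOP_CODONS and start is not None:
                orfs.append("".join(codons[start:idx + 1]))
                start = None
    return orfs


print(frames("ATGAAATAG"))
print(find_orfs("CCATGAAATAGCC"))
```

To search both strands, the same function could be applied to `reverse_complement(dna)` as well.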
Hydrophobic Amino Acids
Hydrophobic (Nonpolar) amino acids like valine, leucine, and phenylalanine tend to avoid water and stabilize protein structure.
Hydrophilic Amino Acids
Hydrophilic (Polar) amino acids like serine, threonine, and asparagine interact well with water and are often found on the surface of proteins.
Charged Amino Acids
Positive and Negative Charged amino acids can form ionic bonds and are involved in binding oppositely charged molecules.
Protein Folding
Hydrophobic amino acids tend to cluster inside, while hydrophilic amino acids are exposed to the aqueous environment.
Active Sites
The chemical nature and charge of amino acids in the active sites of enzymes are critical for substrate binding and catalysis.
Primary Structure
Amino acid sequence.
Secondary Structure
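Local folding patterns such as α-helices and β-sheets, stabilized by hydrogen bonds between backbone atoms rather than by side-chain interactions. Any polypeptide can form them, although the sequence influences which pattern forms.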
Tertiary Structure
Backbone + side chain interactions, more complex structure.
Quaternary Structure
Multiple polypeptides combined.
InDel Mutation
Addition or removal of one or more nucleotides from the DNA sequence.
Frameshift Mutation
Can alter the reading frame of a gene, causing a protein to be nonfunctional or function differently.
Point Mutation
Single nucleotide base change.
Silent Mutation
Change doesn't affect the amino acid sequence of a protein.
Missense Mutation
Change results in a different amino acid being incorporated into the protein.
Nonsense Mutation
Change creates a stop codon, leading to premature termination of the protein.
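The three kinds of point mutation can be told apart by comparing the amino acid encoded before and after the change. The Python sketch below is a minimal illustration, assuming DNA codons and the standard genetic code; the function name is hypothetical, and a change that removes a stop codon is not handled separately.

```python
from itertools import product

# Standard genetic code for DNA codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(ref_codon: str, alt_codon: str) -> str:
    """Label a single-codon change as silent, missense, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_point_mutation("GAG", "GAA"))  # Glu -> Glu
print(classify_point_mutation("GAG", "GTG"))  # Glu -> Val
print(classify_point_mutation("GAG", "TAG"))  # Glu -> stop
```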
Impact of Silent Mutations
Usually no effect on protein function, because the amino acid sequence remains unchanged.
Impact of Missense Mutations
Can alter the protein's function depending on the properties of the new amino acid and its position in the protein.
Impact of Nonsense Mutations
Usually result in a nonfunctional protein because the translation process is prematurely terminated.
Sickle Cell Anemia
A genetic blood disorder characterized by the production of abnormal hemoglobin (HbS), causing red blood cells to assume a sickle shape.
Glutamic Acid to Valine Substitution
The mutation replaces glutamic acid (Glu) with valine (Val) at position 6 of the β-globin protein chain, caused by a single-nucleotide change in the β-globin gene.
Charge Difference in HbS
This substitution removes a negatively charged residue, reducing the overall negative charge so that HbS migrates more slowly than normal hemoglobin (HbA) in electrophoresis.
Gene Structure
A gene is a section of DNA that contains instructions for making a protein, including regulatory regions, a coding region, and a terminator.
Alternative Splicing
Allows a single gene to produce different protein versions depending on the needs of the cell.
GenBank
The primary database, housing raw sequence data.
RefSeq
Refines genetic records into high-quality reference sequences.
UniProtKB
Specializes in protein functionality, building curated annotations.
Nonredundant Nucleotide Database
Optimizes searches by eliminating duplicate sequences.
RefSeq Information
Provides a curated, standardized set of wild-type sequences, including DNA, RNA, and protein records that have been manually reviewed.
UniProtKB Focus
Focuses on proteins, compiling functional annotations and includes translated nucleotide sequences with supporting literature references.
Nonredundant Nucleotide Database Purpose
Streamlines search efficiency by reducing identical sequences found in GenBank, retaining a single representative entry per unique sequence.
Primary Database
Stores raw DNA and RNA sequence data submitted by researchers, serving as the foundational database for other genetic resources.
Substitution matrix
A substitution matrix is a table used to score alignments between sequences by assigning values to substitutions of one amino acid or nucleotide for another. It helps in quantifying the similarity between sequences.
PAM matrices
PAM (Point Accepted Mutation) matrices are based on evolutionary models and assume knowledge of ancestral sequences.
BLOSUM matrices
BLOSUM (BLOcks SUbstitution Matrix) matrices are derived from observed substitutions in conserved regions of proteins without assuming ancestral sequences.
BLOSUM62 matrix
BLOSUM matrices were developed by analyzing blocks of conserved sequences in protein families. The number in BLOSUM62 is the clustering threshold: sequences that are at least 62% identical were grouped together and counted as one when substitutions were tallied.
Probability ratio in substitution matrices
The probability ratio compares the likelihood of a particular substitution occurring in an alignment to the likelihood of it occurring by chance. It helps in scoring alignments.
Calculating sequence similarity
To calculate sequence similarity, align the sequences and use the substitution matrix to score each aligned pair of residues. Sum the scores to get the overall similarity score.
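The scoring procedure described above can be sketched in Python. The matrix here is a small hand-written excerpt used only for illustration (it covers four amino acids and should not be treated as an authoritative copy of any published matrix); the gap penalty and function names are also assumptions.

```python
# Illustrative BLOSUM-style scores for a few residue pairs (assumed values).
TOY_MATRIX = {
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4, ("K", "K"): 5,
    ("L", "I"): 2, ("A", "L"): -1, ("A", "I"): -1, ("A", "K"): -1,
    ("L", "K"): -2, ("I", "K"): -3,
}


def pair_score(a: str, b: str) -> int:
    """Look up a symmetric score for one aligned pair of residues."""
    if (a, b) in TOY_MATRIX:
        return TOY_MATRIX[(a, b)]
    return TOY_MATRIX[(b, a)]


def alignment_score(seq1: str, seq2: str, gap_penalty: int = -4) -> int:
    """Sum substitution scores over two already-aligned sequences ('-' is a gap)."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            total += gap_penalty
        else:
            total += pair_score(a, b)
    return total


print(alignment_score("ALK", "AIK"))    # 4 + 2 + 5
print(alignment_score("AL-K", "ALIK"))  # 4 + 4 - 4 + 5
```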
Development process of BLOSUM62 matrix
The BLOSUM62 matrix was developed by collecting ungapped blocks of conserved protein sequences, clustering sequences within each block that are at least 62% identical, counting the substitutions observed between clusters, and converting those counts into log-odds scores.
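The log-odds step can be illustrated with a short Python sketch. It assumes the usual form of the score, a scaled base-2 logarithm of the observed pair frequency divided by the frequency expected by chance; the frequencies in the example are made-up numbers, and the function name is hypothetical.

```python
import math


def log_odds_score(observed_pair_freq: float, freq_a: float, freq_b: float,
                   same_residue: bool, scale: float = 2.0) -> int:
    """Rounded log-odds score for one residue pair.

    expected = p_a * p_b for identical residues, 2 * p_a * p_b for an
    unordered pair of different residues. scale=2.0 gives half-bit units.
    """
    expected = freq_a * freq_b if same_residue else 2 * freq_a * freq_b
    return round(scale * math.log2(observed_pair_freq / expected))


# Pair seen twice as often as chance predicts: positive score.
print(log_odds_score(0.02, 0.1, 0.1, same_residue=True))
# Pair seen four times less often than chance predicts: negative score.
print(log_odds_score(0.005, 0.1, 0.1, same_residue=False))
```

A positive score means the substitution is observed more often than chance would predict, and a negative score means it is observed less often.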
Goal of sequence alignment
The goal is to arrange sequences to identify regions of similarity that may indicate functional, structural, or evolutionary relationships.
Global alignment
Global alignment aligns sequences from end to end.
Local alignment
Local alignment finds the most similar regions within sequences.
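A minimal Python sketch of local alignment scoring in the style of the Smith-Waterman algorithm is shown below. The match, mismatch, and gap values are arbitrary assumptions, and only the best score is returned, not the aligned region itself.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Resetting to 0 lets an alignment start anywhere in either sequence.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("TTTACGTTT", "GGGACGGGG"))  # shared region "ACG"
```

The floor at zero is what distinguishes this from end-to-end alignment: poorly matching flanks are simply dropped.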
Gaps in alignments
Gaps are insertions or deletions introduced to optimize alignment. They affect the alignment score by introducing penalties.
Evolutionary interpretation of a good alignment
A good alignment suggests that the sequences share a common ancestor and have conserved regions due to functional or structural constraints.
Optimal alignment approaches
Optimal approaches, like dynamic programming, guarantee the best alignment but are computationally intensive.
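As an example of the dynamic programming approach, the Python sketch below computes a global alignment score in the style of the Needleman-Wunsch algorithm. The scoring values are arbitrary assumptions, and the traceback that recovers the alignment itself is omitted for brevity.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Optimal end-to-end alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "ACT"))  # ACGT / AC-T
```

The table has (len(a) + 1) × (len(b) + 1) cells, which is why this approach becomes expensive for long sequences or large databases.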
Heuristic alignment approaches
Heuristic approaches, like BLAST, are faster but may not always find the optimal alignment.
BLAST
BLAST stands for Basic Local Alignment Search Tool. It is used to compare a query sequence against a database to find similar sequences.
BLAST search process
BLAST uses a sliding window approach to break the query into words, finds neighborhood words, extends alignments to form High-scoring Segment Pairs (HSPs), and ranks results based on scores.
Main fields on NCBI's BLAST tool page
Key fields include the query sequence, database selection, algorithm parameters, and optional filters for refining the search.
Steps BLAST takes to perform a search
Word Generation: Break the query into short words. Word Matching: Find matching words in the database. Extension: Extend matches to form HSPs. Scoring: Score and rank the HSPs. Output: Display the results with scores and alignments.
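The word generation, word matching, and extension steps can be sketched in Python. This is a heavily simplified illustration and not how the real BLAST program is implemented: it uses exact word matches only (no neighborhood words), extends only while characters are identical, and all function names are hypothetical.

```python
def words(seq: str, w: int = 3) -> list:
    """All (position, word) pairs from a sliding window of width w."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]


def find_seeds(query: str, subject: str, w: int = 3) -> list:
    """Positions (query_pos, subject_pos) where a query word occurs in the subject."""
    index = {}
    for pos, word in words(subject, w):
        index.setdefault(word, []).append(pos)
    return [(qpos, spos)
            for qpos, word in words(query, w)
            for spos in index.get(word, [])]


def extend_seed(query: str, subject: str, qpos: int, spos: int, w: int = 3) -> tuple:
    """Grow a seed left and right without gaps while the characters match.

    Returns (query_start, subject_start, length) of the extended segment.
    """
    left = 0
    while (qpos - left - 1 >= 0 and spos - left - 1 >= 0
           and query[qpos - left - 1] == subject[spos - left - 1]):
        left += 1
    right = w
    while (qpos + right < len(query) and spos + right < len(subject)
           and query[qpos + right] == subject[spos + right]):
        right += 1
    return qpos - left, spos - left, left + right


query, subject = "GGACGTAA", "TTACGTCC"
seeds = find_seeds(query, subject)
print(seeds)
print(extend_seed(query, subject, *seeds[0]))
```

Here the seed word "ACG" is extended to the longer shared segment "ACGT".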
Major sections of a BLAST results page
The major sections are the search summary, the list of matching sequences with their scores, a graphic summary of the hits, and the pairwise alignments.
Results Page
The results page includes the query sequence, database matches, alignment scores, E-values (indicating statistical significance), and detailed alignments showing mismatches and gaps.
Cladograms
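Cladograms show only the branching order of relationships; branch lengths carry no meaning.
Chronograms
Chronograms have branch lengths scaled to time.
Phylograms
Phylograms have branch lengths proportional to the amount of evolutionary change.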
Monophyletic Groups
Monophyletic groups include an ancestor and all its descendants.
Polyphyletic Groups
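Polyphyletic groups combine organisms from separate lineages and exclude their most recent common ancestor; they are usually defined by traits that evolved independently.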
Paraphyletic Groups
Paraphyletic groups include an ancestor and some, but not all, descendants.
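Whether a set of species is monophyletic can be checked against a rooted tree. The Python sketch below assumes a tree written as nested tuples with species names as leaves; the example tree and the function names are illustrative only.

```python
def leaves(tree) -> set:
    """All leaf names under a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """Yield the leaf set of every clade (node plus all its descendants)."""
    yield frozenset(leaves(tree))
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group) -> bool:
    """True if the group is exactly the set of leaves under some node."""
    return frozenset(group) in set(clades(tree))


tree = ((("human", "chimp"), "gorilla"), ("mouse", "rat"))
print(is_monophyletic(tree, {"human", "chimp", "gorilla"}))
print(is_monophyletic(tree, {"human", "gorilla"}))  # leaves out chimp
```

A group that fails this check is either paraphyletic or polyphyletic; telling those two apart requires checking whether the group's most recent common ancestor is considered part of the group.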
Phylogenetic Trees as Hypotheses
Phylogenetic trees represent hypotheses about evolutionary relationships based on available data and can be tested and refined with new information.
Parts of a Phylogenetic Tree
Key parts include branches (representing evolutionary paths), nodes (common ancestors), and leaves (current species or sequences).
Evolutionary Relationships
The tree shows how species or sequences are related through common ancestors, with closely related species sharing more recent common ancestors.
PCR (Polymerase Chain Reaction)
PCR amplifies specific DNA sequences using cycles of denaturation (separating DNA strands), annealing (binding primers to target sequences), and extension (synthesizing new DNA strands). The result is a large quantity of the target DNA sequence.
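The growth in copy number can be estimated with a simple model in which each cycle multiplies the target by (1 + efficiency). The Python sketch below is an idealized approximation; the `efficiency` parameter and function name are assumptions, and real reactions plateau as reagents run out.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after a number of PCR cycles.

    efficiency = 1.0 means every target molecule is duplicated in every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1 + efficiency) ** cycles


print(pcr_copies(1, 30))        # perfect doubling for 30 cycles
print(pcr_copies(1, 30, 0.9))   # slightly less efficient reaction
```

With perfect doubling, a single template molecule yields 2^30 (about one billion) copies after 30 cycles.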
Sanger Sequencing
Uses dideoxy nucleotides to terminate DNA synthesis, allowing sequence determination by fragment length.
Illumina Sequencing
Uses sequencing by synthesis with optical detection of incorporated nucleotides.
Ion Semiconductor Sequencing
Detects nucleotide incorporation by measuring changes in pH.
Nanopore Sequencing
Reads DNA sequences by detecting changes in electrical current as DNA passes through a nanopore.
PacBio Sequencing
Uses real-time sequencing by synthesis with long read lengths.
Dideoxy Nucleotides (ddNTPs)
Researchers use dideoxy nucleotides (ddNTPs) to terminate DNA synthesis at specific points.
Gel Electrophoresis
By running the resulting fragments through gel electrophoresis, they can determine the sequence based on fragment length.
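The logic of reading a sequence from terminated fragments can be sketched in Python. This toy model assumes one terminated fragment per position of the newly synthesized strand and ignores the primer length; the function names are hypothetical.

```python
def terminated_fragments(synthesized: str) -> dict:
    """For each ddNTP, the fragment lengths at which synthesis stops.

    A fragment of length n ends with the base at position n of the
    synthesized strand.
    """
    fragments = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        fragments[base].append(length)
    return fragments


def read_gel(fragments: dict) -> str:
    """Read the sequence by ordering fragments from shortest to longest."""
    by_length = sorted(
        (length, base)
        for base, lengths in fragments.items()
        for length in lengths
    )
    return "".join(base for _, base in by_length)


fragments = terminated_fragments("GATC")
print(fragments)
print(read_gel(fragments))
```

Shorter fragments travel farther through the gel, so sorting by length mirrors reading the bands in order.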
Sequencing by Synthesis
Sequencing by synthesis involves synthesizing a complementary DNA strand and detecting the incorporation of nucleotides.
Genome Fragmentation
Technologies like Illumina and Ion Semiconductor fragment genomes before sequencing to create smaller, manageable pieces for analysis.
Short Reads
Illumina (100-300 bp), Ion Semiconductor (200-400 bp).
Long Reads
Nanopore (typically several kb to tens of kb, with much longer reads possible), PacBio (typically 10-20 kb).
Genome Assembly Problems
Problems include repetitive sequences, gaps, and errors in read alignment, which can complicate the assembly process.
Reads
Short DNA sequences generated during sequencing.
Paired-End Reads
Sequences from both ends of a DNA fragment, providing more information for assembly.
Single-End Reads
Sequences from one end of a DNA fragment.
Contigs
Continuous sequences of DNA assembled from overlapping reads.
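How overlapping reads become a contig can be illustrated with a greedy merge in Python. This is a toy sketch that assumes error-free reads from one strand and a minimum overlap of three bases; real assemblers use more robust graph-based methods, and the function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # nothing overlaps any more; remaining reads stay separate
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return reads


print(greedy_assemble(["ATGCGT", "CGTACC", "ACCTTG"]))
```

If some reads never overlap, the function returns more than one sequence, which corresponds to several separate contigs.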
Scaffolds
Groups of contigs linked together using paired-end reads or other information.
De Novo Assembly
Assembling a genome without a reference, using only the sequence data.
Reference-Based Assembly
Aligning reads to a known reference genome to guide assembly.
Gene Expression Analysis
Involves measuring the amount of mRNA to infer gene activity. Techniques include Northern blots, qPCR, microarrays, and RNAseq.
Northern Blots
Detect specific RNA sequences using complementary probes.