Introduction to Bioinformatics and Molecular Biology Concepts

Bioinformatics

Bioinformatics involves creating and using computer tools to handle and understand biological and health data.

Scope of Data

The scope of data ranges from millions (10^6) to sextillions (10^21).

Bit

The smallest unit of information: a single binary digit, either 0 or 1.

Byte

8 bits, which can represent 2^8 = 256 distinct values (for example, 256 different characters).
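
The numbers above can be checked in a few lines of Python. The helper name `distinct_values` is hypothetical. The loop assumes the letters are stored as ASCII text, one byte per character.

```python
def distinct_values(n_bits: int) -> int:
    """Number of distinct values that n_bits binary digits can represent."""
    return 2 ** n_bits

print(distinct_values(1))   # one bit: 0 or 1
print(distinct_values(8))   # one byte: values 0 to 255

# Each DNA letter occupies one byte when stored as ASCII text.
for base in "ACGT":
    code = ord(base)                        # numeric code of the character
    print(base, code, format(code, "08b"))  # the same value written as 8 binary digits
```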

Medical Research

Analyzing genomic and imaging data to identify disease markers, develop personalized treatments, and improve diagnostic accuracy.

Behavioral Data

Data from social media, surveys, and observational studies, used to study human behavior, social trends, and psychological patterns.

Central Dogma of Molecular Biology

The central dogma of molecular biology is DNA → RNA → Protein.

Transcription

The synthesis of RNA from a DNA template.
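
Here is a minimal Python sketch of transcription at the sequence level. It assumes DNA is given 5'→3' as a plain string, and the function names are hypothetical. The template-strand version reflects that the mRNA is complementary and antiparallel to the template.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def transcribe_coding_strand(coding: str) -> str:
    """The mRNA matches the coding (sense) strand, with U in place of T."""
    return coding.upper().replace("T", "U")

def transcribe_template_strand(template: str) -> str:
    """The mRNA is the reverse complement of the template strand, with U in place of T."""
    complement = "".join(COMPLEMENT[base] for base in reversed(template.upper()))
    return complement.replace("T", "U")

print(transcribe_coding_strand("ATGGCCTAA"))
print(transcribe_template_strand("TTAGGCCAT"))
```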

Translation

The synthesis of a protein using the sequence of an mRNA as the template.

DNA Structure

DNA consists of two strands forming a double helix, with building blocks of phosphate group, deoxyribose sugar, and nitrogen bases (A, T, C, G).

RNA Structure

RNA is single-stranded but can form secondary structures, with building blocks of phosphate group, ribose sugar, and nitrogen bases (A, U, C, G).

Protein Structure

Proteins are made up of amino acids (20 standard types) linked by peptide bonds. Their structure is described at four levels: primary, secondary, tertiary, and quaternary.

Genetic Code

The genetic code is a set of rules by which information encoded in DNA or RNA sequences is translated into proteins by living cells.

Start Codons

The start codon (AUG, which codes for methionine) signals the beginning of protein synthesis.

Stop Codons

Stop codons (UAA, UAG, UGA) signal the end of protein synthesis.

Codon Table

A codon table is used to determine the amino acid sequence from an mRNA sequence.
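
This Python sketch reads an mRNA with a codon table. It assumes the standard genetic code (NCBI translation table 1), built compactly from the usual 64-letter amino acid string. `translate` is a hypothetical helper, not a library function.

```python
from itertools import product

# Standard genetic code: codons enumerated in U, C, A, G order at each position.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna: str, to_stop: bool = True) -> str:
    """Translate complete codons into one-letter amino acid codes."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3].upper()]
        if aa == "*" and to_stop:
            break  # a stop codon ends translation
        protein.append(aa)
    return "".join(protein)

print(translate("AUGGCCAAAUAG"))  # Met-Ala-Lys, then stop
```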

Open Reading Frame (ORF)

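An open reading frame (ORF) is a stretch of sequence that begins with a start codon and continues, in the same reading frame, to a stop codon with no stop codon in between. An ORF could potentially be translated into a protein.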
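
Here is a simple ORF scan in Python, under stated assumptions. Only the three forward frames of the given strand are searched, and ATG is the only start codon. The first ATG in a frame opens the ORF. `find_orfs` is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna):
    """Return (start, end, sequence) for ORFs in the three forward frames.
    Each ORF runs from an ATG to the first in-frame stop codon (end is exclusive)."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open an ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                   # close it at the in-frame stop
    return orfs

print(find_orfs("CCATGAAATAGGG"))
```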

Reading Frames

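Each strand of a sequence has three possible reading frames, starting at the first, second, or third nucleotide. Double-stranded DNA therefore has six reading frames in total.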
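
The six frames can be listed with a short Python sketch. `six_frames` and `reverse_complement` are hypothetical helpers, and only complete codons are kept.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """The opposite strand, read 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def codons_in_frame(dna, offset):
    """Split dna into complete codons starting at offset 0, 1, or 2."""
    return [dna[i:i + 3] for i in range(offset, len(dna) - 2, 3)]

def six_frames(dna):
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {f"+{f + 1}": codons_in_frame(dna, f) for f in range(3)}
    frames.update({f"-{f + 1}": codons_in_frame(rc, f) for f in range(3)})
    return frames

for name, codons in six_frames("ATGCCCTAA").items():
    print(name, codons)
```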

Hydrophobic Amino Acids

Hydrophobic (nonpolar) amino acids, such as valine, leucine, and phenylalanine, tend to avoid water and help stabilize protein structure.

Hydrophilic Amino Acids

Hydrophilic (polar) amino acids, such as serine, threonine, and asparagine, interact well with water and are often found on the surface of proteins.

Charged Amino Acids

Positively and negatively charged amino acids can form ionic bonds and are involved in binding oppositely charged molecules.

Protein Folding

During folding, hydrophobic amino acids tend to cluster in the protein's interior, while hydrophilic amino acids are exposed to the surrounding aqueous environment.

Active Sites

The chemical nature and charge of amino acids in the active sites of enzymes are critical for substrate binding and catalysis.

Primary Structure

Amino acid sequence.

Secondary Structure

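Local, regular structures such as α-helices and β-sheets, held together by hydrogen bonds between backbone atoms rather than side chains. Because only the backbone is involved, essentially any polypeptide can form them. The amino acid sequence still influences which structure forms where.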

Tertiary Structure

The overall three-dimensional shape of a single polypeptide, stabilized by interactions involving both the backbone and the side chains.

Quaternary Structure

The arrangement of multiple polypeptide chains (subunits) into one functional protein complex.

InDel Mutation

Addition or removal of one or more nucleotides from the DNA sequence.

Frameshift Mutation

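An insertion or deletion whose length is not a multiple of three shifts the reading frame of a gene. This changes every downstream codon and often makes the protein nonfunctional or changes its function.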
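
To see why length matters, this Python sketch translates a short coding sequence before and after a 1-base insertion and a 3-base insertion. The helper names are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code on the DNA coding strand (T, C, A, G order); "*" = stop.
CODE = dict(zip(("".join(c) for c in product("TCAG", repeat=3)),
                "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))

def translate(dna):
    """Translate complete codons only; leftover bases at the end are ignored."""
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

original = "ATGAAAGGGTTTTAA"
shifted = original[:3] + "A" + original[3:]     # 1-base insertion: frame shifts
in_frame = original[:3] + "AAA" + original[3:]  # 3-base insertion: frame kept

print(translate(original))
print(translate(shifted))
print(translate(in_frame))
```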

Point Mutation

Single nucleotide base change.

Silent Mutation

A nucleotide change that does not alter the amino acid sequence of the protein, because the new codon codes for the same amino acid.

Missense Mutation

Change results in a different amino acid being incorporated into the protein.

Nonsense Mutation

Change creates a stop codon, leading to premature termination of the protein.

Impact of Silent Mutations

Usually no effect on protein function, because the amino acid sequence remains unchanged.

Impact of Missense Mutations

Can alter the protein's function depending on the properties of the new amino acid and its position in the protein.

Impact of Nonsense Mutations

Usually result in a nonfunctional protein because the translation process is prematurely terminated.
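
The three outcomes above can be told apart by translating the codon before and after the change. This Python sketch assumes the standard genetic code on the DNA coding strand, and the helper name is hypothetical. The GAG → GTG example is the codon change commonly cited for the sickle-cell Glu → Val substitution.

```python
from itertools import product

# Standard genetic code on the DNA coding strand (T, C, A, G order); "*" = stop.
CODE = dict(zip(("".join(c) for c in product("TCAG", repeat=3)),
                "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))

def classify_point_mutation(ref_codon, alt_codon):
    """Classify a single-codon change as silent, missense, or nonsense.
    Simplification: a change away from a stop codon is reported as missense."""
    ref_aa, alt_aa = CODE[ref_codon.upper()], CODE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"

for ref, alt in [("GAG", "GAA"), ("GAG", "GTG"), ("GAG", "TAG")]:
    print(ref, "->", alt, classify_point_mutation(ref, alt))
```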

Sickle Cell Anemia

A genetic blood disorder characterized by the production of abnormal hemoglobin (HbS), causing red blood cells to assume a sickle shape.

Glutamic Acid to Valine Substitution

The mutation causes a glutamic acid (Glu) to valine (Val) substitution at position 6 of the β-globin chain, which is encoded by the β-globin gene.

Charge Difference in HbS

This substitution reduces the overall negative charge, making HbS less mobile in electrophoresis.

Gene Structure

A gene is a section of DNA that contains instructions for making a protein, including regulatory regions, a coding region, and a terminator.

Alternative Splicing

Alternative splicing joins different combinations of exons from the same pre-mRNA. This allows a single gene to produce different protein versions (isoforms) depending on the needs of the cell.

GenBank

The primary database, housing raw sequence data.

RefSeq

Refines genetic records into high-quality reference sequences.

UniProtKB

Specializes in protein functionality, building curated annotations.

Nonredundant Nucleotide Database

Optimizes searches by eliminating duplicate sequences.

RefSeq Information

Provides a curated, standardized set of reference (wild-type) sequences, including DNA, RNA, and protein records, many of which have been manually reviewed.

UniProtKB Focus

Focuses on proteins, compiling functional annotations and includes translated nucleotide sequences with supporting literature references.

Nonredundant Nucleotide Database Purpose

Streamlines searches by collapsing identical sequences found in GenBank into a single representative entry per unique sequence.

Primary Database

Stores raw DNA and RNA sequence data submitted by researchers, serving as the foundational database for other genetic resources.

Substitution matrix

A substitution matrix is a table used to score alignments between sequences by assigning values to substitutions of one amino acid or nucleotide for another. It helps in quantifying the similarity between sequences.

PAM matrices

PAM (Point Accepted Mutation) matrices are based on evolutionary models and assume knowledge of ancestral sequences.

BLOSUM matrices

BLOSUM (BLOcks SUbstitution Matrix) matrices are derived from observed substitutions in conserved regions of proteins without assuming ancestral sequences.

BLOSUM62 matrix

BLOSUM matrices were developed by analyzing blocks of conserved sequences in protein families. The number is a clustering threshold. For BLOSUM62, sequences sharing at least 62% identity were grouped and counted as one, so the matrix reflects substitutions between more distantly related sequences.

Probability ratio in substitution matrices

The probability ratio compares the likelihood of a particular substitution occurring in an alignment to the likelihood of it occurring by chance. It helps in scoring alignments.
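
The probability ratio is usually converted into a log-odds score. The form below is the one used for BLOSUM matrices, with these definitions:
- q_ab is the observed frequency of the aligned pair a/b.
- p_a and p_b are the background frequencies of each residue.
- c is a scale factor (c = 2, i.e. half-bit units, for BLOSUM62).

```latex
S_{ab} = \operatorname{round}\!\left( c \cdot \log_2 \frac{q_{ab}}{e_{ab}} \right),
\qquad
e_{ab} =
\begin{cases}
p_a^{2} & \text{if } a = b,\\
2\,p_a\,p_b & \text{if } a \neq b.
\end{cases}
```

The sign of the score has a direct reading:
- Positive: the pair is seen more often than chance predicts.
- Negative: the pair is seen less often than chance predicts.
- Zero: the pair is seen about as often as chance predicts.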

Calculating sequence similarity

To calculate sequence similarity, align the sequences and use the substitution matrix to score each aligned pair of residues. Sum the scores to get the overall similarity score.
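
Here is a Python sketch of summing substitution scores over an ungapped alignment. It includes only a small subset of BLOSUM62: six residues, with entries as in the standard matrix (check against a full copy before reuse). `alignment_score` is a hypothetical helper.

```python
# Subset of BLOSUM62 (symmetric; each unordered pair is stored once).
BLOSUM62_SUBSET = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("K", "K"): 5, ("R", "R"): 5, ("E", "E"): 5,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
    ("K", "R"): 2, ("E", "K"): 1, ("E", "R"): 0,
    ("E", "V"): -2, ("E", "I"): -3, ("E", "L"): -3,
    ("I", "K"): -3, ("K", "L"): -2, ("K", "V"): -2,
    ("I", "R"): -3, ("L", "R"): -2, ("R", "V"): -3,
}

def pair_score(x, y):
    """Look up a matrix entry regardless of argument order."""
    if (x, y) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(x, y)]
    return BLOSUM62_SUBSET[(y, x)]  # KeyError if the pair is outside the subset

def alignment_score(seq1, seq2):
    """Sum the matrix scores of each aligned pair (no gaps)."""
    if len(seq1) != len(seq2):
        raise ValueError("an ungapped alignment needs sequences of equal length")
    return sum(pair_score(x, y) for x, y in zip(seq1, seq2))

print(alignment_score("KLIE", "RIVE"))
```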

Development process of BLOSUM62 matrix

The BLOSUM62 matrix was developed from conserved, ungapped blocks of aligned protein sequences, in three steps:
- Sequences sharing at least 62% identity were clustered so that they counted as one.
- The observed substitutions were counted.
- A log-odds score was calculated for each substitution.
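
The counting and log-odds steps can be sketched in Python on a toy block. This is a simplification: it skips the 62% clustering and sequence-weighting step and uses hypothetical data. Scores are in half-bit units (2·log2 of the ratio, rounded), the scale BLOSUM62 uses.

```python
import math
from collections import Counter
from itertools import combinations

def blosum_like_scores(block):
    """block: equal-length aligned sequences (one conserved, ungapped block).
    Returns rounded half-bit log-odds scores for every observed residue pair."""
    pair_counts = Counter()
    for column in zip(*block):                      # one alignment column at a time
        for a, b in combinations(column, 2):        # every pair of residues in the column
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed pair frequencies
    p = Counter()                                   # background residue frequencies
    for (a, b), freq in q.items():
        p[a] += freq / 2
        p[b] += freq / 2
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores

print(blosum_like_scores(["AVLS", "AILS", "AVLT"]))
```

Pairs never observed in the block get no score here. Real BLOSUM construction relies on many blocks so that nearly every pair is observed.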

Goal of sequence alignment

The goal is to arrange sequences to identify regions of similarity that may indicate functional, structural, or evolutionary relationships.

Global alignment

Global alignment aligns sequences from end to end.
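
Here is a minimal Needleman-Wunsch sketch in Python: global alignment by dynamic programming. The scoring scheme (match +1, mismatch -1, gap -1 per position) is an assumption for illustration. Real protein alignments would use a substitution matrix such as BLOSUM62 instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for an optimal end-to-end alignment."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap                  # a prefix of a aligned to gaps only
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Traceback from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("ACGT", "AGT"))
```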

Local alignment

Local alignment finds the most similar regions within sequences.
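
The local version (Smith-Waterman) differs from the global one in two main ways. Cell scores are floored at 0, and the best cell anywhere in the table is reported. Here is a Python sketch with the same assumed match/mismatch/gap values.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best local alignment score and where it ends (i, j) in a and b."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # 0 lets a new local alignment start anywhere instead of going negative.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos

print(smith_waterman("TTACGTAA", "GGACGTCC"))  # the shared region is ACGT
```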

Gaps in alignments

Gaps represent insertions or deletions introduced to improve an alignment. Each gap lowers the alignment score by a gap penalty.
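
Gap penalties are often affine, meaning that opening a gap costs more than extending one. This is a common convention rather than something stated above. The Python sketch below scores an already-aligned pair under that scheme, using hypothetical default values. It also shows the linear case, where every gap position costs the same.

```python
def score_alignment(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Score two aligned rows ('-' marks a gap) with an affine gap penalty:
    the first position of a gap costs gap_open, each further position gap_extend."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    score, gap_row = 0, None      # gap_row: which row had a gap in the previous column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if x == "-" or y == "-":
            row = "a" if x == "-" else "b"
            score += gap_extend if row == gap_row else gap_open
            gap_row = row
        else:
            score += match if x == y else mismatch
            gap_row = None
    return score

print(score_alignment("AC--GT", "ACTTGT"))                              # affine
print(score_alignment("AC--GT", "ACTTGT", gap_open=-1, gap_extend=-1))  # linear
```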

Evolutionary interpretation of a good alignment

A good alignment suggests that the sequences share a common ancestor and have conserved regions due to functional or structural constraints.

Optimal alignment approaches

Optimal approaches, like dynamic programming, guarantee the best alignment but are computationally intensive.

Heuristic alignment approaches

Heuristic approaches, like BLAST, are faster but may not always find the optimal alignment.

BLAST

BLAST stands for Basic Local Alignment Search Tool. It is used to compare a query sequence against a database to find similar sequences.

BLAST search process

BLAST uses a sliding window approach to break the query into words, finds neighborhood words, extends alignments to form High-scoring Segment Pairs (HSPs), and ranks results based on scores.

Main fields on NCBI's BLAST tool page

Key fields include the query sequence, database selection, algorithm parameters, and optional filters for refining the search.

Steps BLAST takes to perform a search

Word Generation: Break the query into short words.
Word Matching: Find matching words in the database.
Extension: Extend matches to form HSPs.
Scoring: Score and rank the HSPs.
Output: Display the results with scores and alignments.
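
The five steps can be sketched in Python for nucleotide sequences. This is a simplified seed-and-extend illustration, not NCBI BLAST. Seeds are exact word matches with no neighborhood words, extension is ungapped with a simple drop-off rule, and scores are +1/-1. No E-values are computed. All names and parameters are hypothetical.

```python
from collections import defaultdict

def words(seq, k):
    """Sliding window: every length-k word with its start position."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]

def extend(query, subject, qi, sj, step, start_score, match, mismatch, x_drop):
    """Ungapped extension in one direction (step=+1 right, -1 left).
    Returns (best score, number of extra positions included at that best)."""
    best, best_len, score, length = start_score, 0, start_score, 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        score += match if query[qi] == subject[sj] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # score fell too far below the best seen: stop extending
        qi += step
        sj += step
    return best, best_len

def find_hsps(query, subject, k=4, match=1, mismatch=-1, x_drop=2):
    """Return HSPs as (score, query_start, query_end, subject_start), best first."""
    index = defaultdict(list)                 # word matching: word -> subject positions
    for j, w in words(subject, k):
        index[w].append(j)
    hsps = set()
    for i, w in words(query, k):              # word generation on the query
        for j in index.get(w, []):            # each exact hit is a seed
            seed = k * match
            score, right = extend(query, subject, i + k, j + k, 1, seed,
                                  match, mismatch, x_drop)
            score, left = extend(query, subject, i - 1, j - 1, -1, score,
                                 match, mismatch, x_drop)
            hsps.add((score, i - left, i + k + right, j - left))
    return sorted(hsps, reverse=True)         # scoring and ranking

print(find_hsps("ACGTACGGA", "TTTACGTACGGATTT")[0])
```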

Major sections of a BLAST results page

Interpret the major sections of a BLAST results page.

Results Page

The results page includes the query sequence, database matches, alignment scores, E-values (indicating statistical significance), and detailed alignments showing mismatches and gaps.

Cladograms

Cladograms show relationships without indicating time.

Chronograms

Chronograms have branch lengths proportional to time.

Phylograms

Phylograms have branch lengths proportional to the amount of evolutionary change (evolutionary distance).

Monophyletic Groups

Monophyletic groups include an ancestor and all its descendants.

Polyphyletic Groups

Polyphyletic groups include organisms from different lineages but not their most recent common ancestor.

Paraphyletic Groups

Paraphyletic groups include an ancestor and some, but not all, descendants.
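
A group can be tested for monophyly by checking whether it equals the full set of leaves below some node. This Python sketch assumes a rooted tree written as nested tuples with hypothetical leaf labels. It only tests monophyly. Telling paraphyletic from polyphyletic groups needs information about ancestors that a plain set of leaves does not carry.

```python
def leaf_set(node):
    """All leaves below a node; leaves are strings, internal nodes are tuples."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))

def clades(node):
    """Yield the leaf set of every node: one ancestor plus all its descendants."""
    yield leaf_set(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def is_monophyletic(tree, group):
    return frozenset(group) in set(clades(tree))

tree = ((("A", "B"), "C"), "D")
print(is_monophyletic(tree, {"A", "B"}))   # an ancestor and all its descendants
print(is_monophyletic(tree, {"B", "C"}))   # leaves out A, a descendant of their ancestor
```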

Phylogenetic Trees as Hypotheses

Phylogenetic trees represent hypotheses about evolutionary relationships based on available data and can be tested and refined with new information.

Parts of a Phylogenetic Tree

Key parts include branches (representing evolutionary paths), nodes (common ancestors), and leaves (current species or sequences).

Evolutionary Relationships

The tree shows how species or sequences are related through common ancestors, with closely related species sharing more recent common ancestors.

PCR (Polymerase Chain Reaction)

PCR amplifies specific DNA sequences using cycles of denaturation (separating DNA strands), annealing (binding primers to target sequences), and extension (synthesizing new DNA strands). The result is a large quantity of the target DNA sequence.
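
Because each cycle can at most double the target, the expected yield grows exponentially. In this Python sketch, the `efficiency` parameter (the fraction of templates copied per cycle) is an added assumption, with 1.0 meaning perfect doubling.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Expected number of target copies after `cycles` rounds of PCR.
    efficiency = 1.0 models perfect doubling; lower values model incomplete copying."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1 + efficiency) ** cycles

print(pcr_copies(1, 30))        # ideal: 2**30 copies from a single molecule
print(pcr_copies(1, 30, 0.9))   # a 90% efficient reaction
```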

Sanger Sequencing

Uses dideoxy nucleotides to terminate DNA synthesis, allowing sequence determination by fragment length.

Illumina Sequencing

Uses sequencing by synthesis with optical detection of incorporated nucleotides.

Ion Semiconductor Sequencing

Detects nucleotide incorporation by measuring changes in pH.

Nanopore Sequencing

Reads DNA sequences by detecting changes in electrical current as DNA passes through a nanopore.

PacBio Sequencing

Uses real-time sequencing by synthesis with long read lengths.

Dideoxy Nucleotides (ddNTPs)

Researchers use dideoxy nucleotides (ddNTPs) to terminate DNA synthesis at specific points.

Gel Electrophoresis

Gel electrophoresis separates the resulting fragments by length. The sequence can then be read from the terminal nucleotide of each fragment, taken in order of size.
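
This Python sketch shows the logic of reading a sequence from chain-terminated fragments, using hypothetical helpers. Each fragment's length gives a position, and its terminal ddNTP gives the base at that position. The recovered sequence is the newly synthesized strand, which is complementary to the template.

```python
import random

def chain_termination_fragments(synthesized):
    """Simulate one fragment per position: each stops where a ddNTP was added,
    so its length is the 1-based position and its last base is that ddNTP."""
    return [(i + 1, base) for i, base in enumerate(synthesized)]

def read_from_fragments(fragments):
    """Order fragments by length (as on a gel) and read each terminal base."""
    return "".join(base for _, base in sorted(fragments))

fragments = chain_termination_fragments("ACGTTGCA")
random.shuffle(fragments)            # bands are not collected in any particular order
print(read_from_fragments(fragments))
```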

Sequencing by Synthesis

Sequencing by synthesis involves synthesizing a complementary DNA strand and detecting the incorporation of nucleotides.

Genome Fragmentation

Technologies like Illumina and Ion Semiconductor fragment genomes before sequencing to create smaller, manageable pieces for analysis.

Short Reads

Illumina (100-300 bp), Ion Semiconductor (200-400 bp).

Long Reads

Nanopore (commonly tens of kb, sometimes much longer), PacBio (up to about 20 kb).

Genome Assembly Problems

Problems include repetitive sequences, gaps, and errors in read alignment, which can complicate the assembly process.

Reads

The DNA sequences generated during sequencing, each representing part of a DNA fragment from the sample.

Paired-End Reads

Sequences from both ends of a DNA fragment, providing more information for assembly.

Single-End Reads

Sequences from one end of a DNA fragment.

Contigs

Continuous sequences of DNA assembled from overlapping reads.
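
Contigs can be built from overlapping reads with a greedy strategy: repeatedly merge the two sequences with the longest suffix-prefix overlap. The Python sketch below is a toy illustration that uses exact overlaps only, a single strand, and a hypothetical `min_overlap` threshold. Real assemblers must also handle sequencing errors, repeats, and both strands.

```python
def overlap(a, b, min_overlap):
    """Length of the longest suffix of a that equals a prefix of b (0 if too short)."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_overlap=3):
    """Merge the best-overlapping pair until no overlaps remain; return the contigs."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join: the remaining sequences are separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs

reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))
```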

Scaffolds

Groups of contigs linked together using paired-end reads or other information.

De Novo Assembly

Assembling a genome without a reference, using only the sequence data.

Reference-Based Assembly

Aligning reads to a known reference genome to guide assembly.
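
Here is a brute-force Python sketch of placing reads on a reference: slide the read along the reference and keep positions with few mismatches. The helper name and the mismatch threshold are hypothetical. Practical read aligners use an index of the reference rather than a full scan.

```python
def map_read(read, reference, max_mismatches=0):
    """Return every 0-based reference position where the read fits without gaps
    and with at most `max_mismatches` mismatching bases."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for r, g in zip(read, window) if r != g)
        if mismatches <= max_mismatches:
            hits.append(pos)
    return hits

reference = "TTACGTTACCT"
print(map_read("ACGT", reference))                    # exact matches only
print(map_read("ACGT", reference, max_mismatches=1))  # tolerate one mismatch
```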

Gene Expression Analysis

Involves measuring the amount of mRNA to infer gene activity. Techniques include Northern blots, qPCR, microarrays, and RNAseq.

Northern Blots

Detect specific RNA sequences using complementary probes.