Title: Bioinformatics (Data Handling)
Presenter: Dr. Louise Robinson (Email: L.Robinson@derby.ac.uk)
Key Statement: "We finally have it all mapped; we just don't know where it leads us."
Related Organization: Genome CURS Synovite Inc. (Website: www.tonille.com)
Bioinformatics is defined as the use of software to analyze and interpret biological data, particularly sequence data, to gain insights into its biological significance.
A graph shows the marked growth of molecular data over 23 years, with base pairs counted in billions, based on figures from the National Library of Medicine and NCBI. It also highlights user growth and a range of databases, including:
GenBank
NIH Public Access
Genome Reference Consortium
Genetic Testing Registry
1000 Genomes
Users per weekday reached millions by 2012.
Growth of the Sequence Read Archive database is depicted in terabases (10^12 bases) from 2010 to 2020, indicating the increased volume of sequencing data.
Emphasizes the relationship between DNA, RNA, and proteins:
DNA is replicated and transcribed into mRNA, which is then translated into protein.
Terminology: "OMICS" (Ome – Whole; -ics – Study)
Key areas studied include:
Genome (DNA)
Transcriptome (RNA)
Proteome (Protein)
Various databases are continuously updated for molecular biology studies.
Structure of DNA involves:
Nitrogenous bases: Adenine (A), Thymine (T), Guanine (G), Cytosine (C)
Sugar-phosphate backbone
Base pairing rules: A-T and G-C
Important features: major and minor grooves, reading direction (5' to 3').
While the structure of DNA is well understood, the implications of mutations (e.g., single base changes) on DNA folding and accessibility remain less clear.
Advances like CRISPR stem from our sequencing knowledge.
Example DNA sequences demonstrate the complementary nature of DNA strands. Only one strand is typically deposited in databases, so the complementary strand has to be derived when it is needed.
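Deriving the complementary strand from a deposited sequence can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `reverse_complement` and the example sequence are hypothetical and do not come from the slides.

```python
# Watson-Crick pairing rules: A-T and G-C (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    # Complement every base, then reverse so the result also reads 5' to 3'.
    return seq.translate(COMPLEMENT)[::-1]


deposited = "ATGGCCTAA"  # hypothetical deposited strand, 5' to 3'
print(reverse_complement(deposited))  # TTAGGCCAT
```

The reversal matters because the two strands are antiparallel: the complement of a 5'-3' strand runs 3'-5' until it is reversed.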
In transcription, mRNA is synthesized using the antisense strand (3'-5') as the template. The resulting positive-sense RNA therefore carries the same sequence as the sense strand (5'-3'), with U in place of T.
In RNA, uracil (U) replaces thymine (T) and base-pairs with adenine (A).
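The relationship between the two DNA strands and the mRNA can be illustrated with a short Python sketch. The function names and the example sequences are assumptions made for illustration only.

```python
def transcribe_from_sense(sense_5to3: str) -> str:
    """mRNA has the sense-strand sequence with U in place of T."""
    return sense_5to3.upper().replace("T", "U")


def transcribe_from_template(template_3to5: str) -> str:
    """Build mRNA (5' to 3') by pairing bases against the antisense
    (template) strand, which is given here written 3' to 5'."""
    pairing = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in template_3to5.upper())


sense = "ATGGCC"      # hypothetical sense strand, 5' to 3'
template = "TACCGG"   # its antisense partner, written 3' to 5'
print(transcribe_from_sense(sense))        # AUGGCC
print(transcribe_from_template(template))  # AUGGCC
```

Both routes give the same mRNA, which is why a deposited sense strand can be read directly as the transcript once T is swapped for U.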
Splicing removes introns, which accounts for the differences in length and sequence between genomic DNA and mature mRNA; the coding sequence for one protein can be split across several regions of the genome.
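As a rough sketch of what splicing does to a sequence, the Python below joins exon intervals and drops everything between them. The exon and intron sequences, the coordinates, and the function name `splice` are all hypothetical; real splice sites are recognized by the cell's splicing machinery, not supplied as coordinates.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exons given as 0-based, half-open (start, end) intervals
    listed in transcript order; everything outside them is discarded."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA: exon 1, one intron, exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "GAAUAA"
pre_mrna = exon1 + intron + exon2
exon_coords = [(0, len(exon1)), (len(exon1) + len(intron), len(pre_mrna))]

mature = splice(pre_mrna, exon_coords)
print(mature)                      # AUGGCCGAAUAA
print(len(pre_mrna), len(mature))  # 24 12
```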
Reverse transcription converts unstable RNA into complementary DNA (cDNA).
Using reverse transcriptase in lab processes allows for examination and analysis of RNA through its DNA representation.
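In sequence terms, reverse transcription amounts to building the DNA complement of an RNA. A minimal Python sketch follows, assuming a hypothetical mRNA fragment; the function names are illustrative.

```python
def reverse_transcribe(mrna_5to3: str) -> str:
    """Return first-strand cDNA, written 5' to 3'.

    The first strand is complementary to the mRNA, so it is the
    reverse complement of the transcript with T instead of U.
    """
    pairing = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in reversed(mrna_5to3.upper()))


def cdna_sense(mrna_5to3: str) -> str:
    """Return the cDNA strand that reads like the mRNA (U replaced by T)."""
    return mrna_5to3.upper().replace("U", "T")


mrna = "AUGGCC"  # hypothetical mRNA fragment
print(reverse_transcribe(mrna))  # GGCCAT
print(cdna_sense(mrna))          # ATGGCC
```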
Overview of reverse transcription process in retroviruses, showing the lifecycle from viral RNA to DNA integration in the host nucleus.
A description of real-time PCR as a method for amplifying DNA from clinical samples, including RNA extraction and reverse transcription leading to test results for viral presence.
Details on using reverse transcriptase to transcribe mRNA into cDNA, which is then sequenced and stored in databases; the significance of cDNA is that it lacks introns.
Description of the translation process from mRNA to protein, facilitated by ribosomes reading mRNA codons (from 5' to 3') and adding corresponding amino acids consecutively.
An example of coding a DNA sequence into an amino acid chain, illustrating the triplet codon format and resultant amino acids, including a stop signal.
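Reading a coding sequence codon by codon can be sketched as follows. The code builds the standard genetic code table from the conventional T, C, A, G ordering; the function name `translate` and the example sequence are assumptions for illustration, not the example used on the slide.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (third base varies fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(coding_dna: str) -> str:
    """Translate a coding DNA sequence (sense strand, 5' to 3') from its
    first base, one triplet at a time, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3].upper()]
        if amino_acid == "*":  # stop signal: release the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCGAATAA"))  # MAE  (ATG=M, GCC=A, GAA=E, TAA=stop)
```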
Exposition on RNA splicing: explains how non-coding regions (introns) are removed to yield a continuous mRNA sequence that encodes the required protein.
Discusses the ribosomal structure which facilitates translation; an explanation of codon recognition processes in the formation of proteins from amino acids.
Detailed codon table illustrating the relationships between nucleotide triplets and respective amino acids, including stop codons and their designations.
Explains the significance of where the reading frame begins (start codon) and its implications for correct protein synthesis.
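The effect of the reading frame can be seen by translating the same sequence from different starting positions. The sketch below is illustrative only: the sequence is hypothetical, and picking the first ATG as the start is a simplifying assumption rather than a complete gene-finding rule.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate_from(dna: str, start: int) -> str:
    """Translate triplets beginning at index `start` until a stop codon
    or until fewer than three bases remain."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def translate_from_first_start(dna: str) -> tuple[int, str]:
    """Locate the first ATG and translate in the frame it defines."""
    start = dna.upper().find("ATG")
    if start == -1:
        raise ValueError("no start codon found")
    return start, translate_from(dna, start)


seq = "CCATGGCCGAATAAGG"  # hypothetical sequence with ATG at index 2
for frame in range(3):
    print(frame, translate_from(seq, frame))
print(translate_from_first_start(seq))  # (2, 'MAE')
```

Shifting the start by a single base changes every downstream codon, which is why identifying the start codon is needed before a sequence can be read as protein.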
Description of protein sequences using single-letter amino acid notation, focusing on the orientation (N to C terminus).
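The single-letter notation can be expanded to the standard three-letter abbreviations with a simple lookup. A minimal sketch covering the 20 standard amino acids; the function name and the example peptide are hypothetical.

```python
# Standard one-letter to three-letter amino acid abbreviations.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(protein: str) -> str:
    """Expand a single-letter sequence, keeping N-to-C terminus order."""
    return "-".join(THREE_LETTER[residue] for residue in protein.upper())


print(to_three_letter("MAE"))  # Met-Ala-Glu
```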
Overview of the gene structure showing regulatory sequences, promoters, exons, introns, and untranslated regions.
Lists the diverse applications of bioinformatics in:
Proteomics
Gene expression analysis
DNA sequencing
Immunoinformatics
Genome analysis
Evolutionary biology.
Summary of major nucleotide databases which house extensive DNA and protein sequences, including:
GenBank (NCBI)
EMBL (EBI)
DDBJ (Japan)
Daily data exchange among these databases keeps their contents consistent.
Lists protein databases:
Swiss-Prot
TrEMBL
PIR
UniProt
Describes analyses and categorizations available through the ExPASy system.
Introduction to processes involved post-sequencing for analysis and interpretation, utilizing molecular markers for genetic analysis.
BLAST (Basic Local Alignment Search Tool) enables searches for significant sequence matches in nucleic acid and protein databases based on similarity with queried sequences.
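To give a feel for what "local alignment" means, the sketch below computes a Smith-Waterman local alignment score by dynamic programming. This is not how BLAST itself works: BLAST uses a faster heuristic search to approximate this kind of result across whole databases. The scoring values (match 2, mismatch -1, gap -2) and the function name are arbitrary assumptions for illustration.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 2, mismatch: int = -1,
                          gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score of a and b."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical sequences sharing the local region ACGT.
print(local_alignment_score("TTACGTTT", "GGACGTGG"))  # 8
```

The score reflects only the best-matching region, so two otherwise unrelated sequences can still produce a hit when they share a conserved stretch.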
Illustrates challenges faced during DNA sequence assembly, underscoring the importance of computational analysis and sequencing techniques to piece together overlapping sequences effectively.
Details on complexity of algorithmic assembly processes that analyze genetic fragments to find relationships and reconstruct full sequences.
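The idea of piecing overlapping fragments together can be sketched with a toy greedy assembler that repeatedly merges the pair of reads with the longest suffix-prefix overlap. This is a simplified illustration, not the algorithm shown on the slides: it assumes error-free reads from a single strand, and the reads and function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b
    (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlaps remain."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:  # nothing left to merge
            break
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


fragments = ["ATGGCC", "GCCGAA", "GAATAA"]  # hypothetical overlapping reads
print(greedy_assemble(fragments))  # ['ATGGCCGAATAA']
```

Every round compares all pairs of reads, which hints at why assembly of real data sets, with millions of reads, repeats, and sequencing errors, is computationally demanding.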
Description of NCBI BLAST and its capabilities to find homologies between biological sequences, offering access to multiple BLAST tools for diverse applications.
G-Query facilitates the retrieval of various information regarding sequences in nucleotide databases, showcasing its analytical capabilities.
Lists diverse databases and search categories available within the NCBI framework, connecting to various biological data resources related to literature, genes, and proteins.
An outline of practical tasks guiding students on:
Performing nucleotide BLAST
Interpreting GenBank entries
Mapping gene structures through genomic cDNA alignments.
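The last task, mapping gene structure by aligning cDNA to genomic sequence, can be approximated with a toy exact-match search: each stretch of cDNA that matches the genomic sequence is taken as an exon, and the gaps between matches are inferred introns. This sketch assumes perfect matches and hypothetical sequences; real alignments tolerate mismatches and have to resolve ambiguous bases at splice junctions.

```python
def map_exons(genomic: str, cdna: str,
              min_exon: int = 3) -> list[tuple[int, int]]:
    """Return exon positions in `genomic` as 0-based, half-open intervals,
    found by greedily matching the longest possible cDNA prefix each time."""
    exons = []
    g_pos = 0  # genomic position from which to search
    c_pos = 0  # amount of cDNA already placed
    while c_pos < len(cdna):
        found = None
        for length in range(len(cdna) - c_pos, min_exon - 1, -1):
            idx = genomic.find(cdna[c_pos:c_pos + length], g_pos)
            if idx != -1:
                found = (idx, idx + length)
                break
        if found is None:
            raise ValueError("remaining cDNA does not match the genomic sequence")
        exons.append(found)
        g_pos = found[1]
        c_pos += found[1] - found[0]
    return exons


# Hypothetical gene: two exons separated by one intron (GT...AG).
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "CAATAA"
cdna = "ATGGCCCAATAA"
exons = map_exons(genomic, cdna)
introns = [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]
print(exons)    # [(0, 6), (18, 24)]
print(introns)  # [(6, 18)]
```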
A detailed breakdown of an annotated gene sequence illustrating UTRs, exons, introns, and other relevant features impacting gene structure understanding.
Advises students on preparatory steps before practical sessions, emphasizing organization, documentation, and following the sequence analysis instructions.
Discussion on how bioinformatics has numerous applications in biological and forensic fields, aligning educational levels with relevant job expectations.
Citation references from various forensic science international publications focused on the application of real-time PCR and transcript profiling in forensic contexts.