BIOINFORMATICS (Module1).ppt

Page 1

BIOINFORMATICS

  • Module 1 Introduction

Page 2

"Artificial Life" Breakthrough

  • Craig Venter announces potential creation of the first artificial life form

  • Aimed to combat illness and global warming

  • Synthetic chromosome built from laboratory-made chemicals

  • Announcement expected soon

Page 3

Nobel Prize in Physiology or Medicine 2005

  • Awarded for the discovery of Helicobacter pylori’s role in gastritis and peptic ulcer disease

  • Recipients: Barry J. Marshall and J. Robin Warren

Page 4

Nobel Prize in Physiology or Medicine 2006

  • Awarded for the discovery of RNA interference

  • Recipients: Andrew Z. Fire and Craig C. Mello

Page 5

Nobel Prize in Physiology or Medicine 2007

  • Awarded for discoveries related to gene modifications in mice via embryonic stem cells

  • Recipients: Mario R. Capecchi, Oliver Smithies, and Sir Martin J. Evans

Page 6

Molecular Biology Advances

  • Over 500 years' worth of research challenges in biology

  • DNA structure discovery in 1953 paved the way for molecular biology advancements

  • Increasing biological data requires interdisciplinary approaches involving math and computer science

  • Emergence of Computational Molecular Biology and Bioinformatics as fields

Page 7

Commercial Market Overview

  • Current bioinformatics market valued at $300 million/year

  • Predicted to grow to $2 billion/year in 5-6 years

  • Key bioinformatics companies include:

    • Genomatix Software, Genaissance Pharmaceuticals, deCODE genetics, etc.

Page 8

Computational Molecular Biology & Bioinformatics

  • Application of computer science and mathematical techniques to problems in molecular biology

Page 9

Bioinformatics Units

  1. Basic Concepts

  2. Suffix Trees and Applications

  3. Sequence Alignment: Pairwise Alignment, Multiple Alignments

  4. Sequencing

  5. Motif Prediction

Page 10

Unit 1: Basic Concepts of Molecular Biology

  • Focus on Cellular Architecture, Nucleic Acids (RNA & DNA), DNA replication, repair, and recombination

  • Understanding transcription, the genetic code, and protein structure

  • Statistical methods including estimation, hypothesis testing, and Markov models
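
As a minimal sketch of how estimation and Markov models fit together, the Python snippet below estimates first-order transition probabilities from a single sequence by counting adjacent pairs (a maximum-likelihood estimate). The function name `transition_probabilities` and the example sequence are illustrative assumptions, not taken from the course material.

```python
from collections import Counter, defaultdict


def transition_probabilities(seq: str) -> dict:
    """Estimate P(next symbol | current symbol) from adjacent-pair counts."""
    pair_counts = Counter(zip(seq, seq[1:]))
    totals = Counter(seq[:-1])  # how often each symbol has a successor
    probs = defaultdict(dict)
    for (cur, nxt), count in pair_counts.items():
        probs[cur][nxt] = count / totals[cur]
    return dict(probs)


if __name__ == "__main__":
    # Hypothetical example sequence
    print(transition_probabilities("AACGAAT"))
```

Each row of the resulting table sums to 1, which is the defining property of a Markov chain's transition matrix.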

Page 11

Unit 2: Suffix Trees

  • Definition, examples, and algorithms (Ukkonen's linear-time construction)

  • Applications include exact string matching and longest common substrings

  • Understanding pairwise sequence alignment (edit distances, dynamic programming)
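
The edit-distance idea mentioned above can be sketched with the classic dynamic-programming recurrence. This is a minimal illustration assuming unit costs for insertion, deletion, and substitution (Levenshtein distance); the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit cost for insert, delete, substitute."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if ca == cb else 1)
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]
```

Keeping only two rows of the table gives O(len(a) * len(b)) time with O(len(b)) memory; recovering the alignment itself would require the full table or a traceback structure.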

Page 12

Unit 3: Sequence Alignment

  • Local pairwise sequence alignment

  • Need and methodology for multiple sequence alignments

  • Searches for similar sequences in databases (using FASTA, BLAST)

Page 13

Unit 4: Sequencing

  • Techniques including fragment assembly and sequencing by hybridization
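
Two building blocks behind these techniques can be sketched briefly: the k-mer spectrum that an idealized sequencing-by-hybridization experiment would report, and the suffix-prefix overlap that fragment assembly relies on. This is a simplified sketch assuming error-free data; the names `spectrum` and `overlap` are illustrative.

```python
def spectrum(seq: str, k: int) -> list:
    """All length-k substrings of seq, sorted (an idealized SBH spectrum)."""
    return sorted(seq[i:i + k] for i in range(len(seq) - k + 1))


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0
```

Reconstructing the original sequence from a spectrum, or ordering fragments by their overlaps, is the hard part of both problems and is not attempted here.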

Page 14

Unit 5: Motif Prediction

  • Motif prediction processes and methods for protein structure prediction

Page 15

Recommended Books

  • "Algorithms on Strings, Trees and Sequences" by Dan Gusfield

  • "Introduction to Computational Molecular Biology" by J. Setubal & J. Meidanis

  • "Statistical Methods in Bioinformatics" by W.J. Ewens & G.R. Grant

Page 16-18

Continuation of Recommended Literature

  • Works by R. Durbin et al., N.C. Jones & P.A. Pevzner, D.E. Krane et al., and more

Page 19

Class 2 Introduction

Page 20

Unit 1 Focus

  • Key elements: DNA, RNA, Protein, Genetic Code

Page 21

Craig Venter's Breakthrough

  • Overview of Venter’s creation of Mycoplasma laboratorium and its implications for global warming mitigation

Page 22

Basics of Genetics

  • Each cell contains a copy of the genome (the blueprint for an individual's traits)

  • Discussion of chromosomes as chapters in a genomic book containing genes

Page 23

Venter's Language Understanding

  • Insights into how Venter comprehended genetic coding language

Page 24

Venter's Genetic Advancements

  • Development of a synthetic chromosome with 381 genes to produce new life forms

Page 25

Advancements in Genome Creation

  • Historical context of genomic research culminating in Venter's achievements

Page 26

Bioinformatics Need Emergence

  • Clarifying the urgency in bioinformatics due to increased biological data complexity

Page 27

Computational Molecular Biology Explanation

  • Emphasis on CMB combining computer science and biology for problem solving

Page 28

Living vs. Nonliving

  • The distinctions based on movement, reproduction, and environmental interaction

Page 29

Characteristics of Living Organisms

  • The role of chemical reactions in sustaining life and the interaction with surroundings

Page 30

Origins of Life

  • Life began approximately 3.5 billion years ago, evolving into diverse forms compatible with Earth's molecular chemistry

Page 31

Key Biological Molecules

  • Proteins define physical traits while nucleic acids convey genetic information

Page 32

Functions of Proteins

  • Various roles proteins play, such as enzymes, transport molecules, and cellular structure builders

Page 33

Amino Acids Overview

  • Explanation of hydrophobic and hydrophilic amino acid properties in protein construction

Page 34

Polypeptide Chain Structure

  • Description of polypeptide orientation from N-terminal to C-terminal

Page 35

Protein Structure Types

  • Different levels of protein structure: primary, secondary, tertiary, and quaternary

Page 36

Basic Genetic Code

  • Exploration of mRNA codon mapping and relationship to amino acids
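
The codon-to-amino-acid mapping can be sketched as a lookup table. The snippet below builds the standard genetic code from a 64-letter string in UCAG order and translates an mRNA string until the first stop codon. The names `CODON_TABLE` and `translate` and the example sequence are illustrative; the sketch assumes the input starts in frame and contains only A, C, G, and U.

```python
BASES = "UCAG"
# Amino acids for the 64 codons in UCAG order (standard genetic code; * = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical example: AUG UUU GGC UAA
    print(translate("AUGUUUGGCUAA"))
```

Because 64 codons map to 20 amino acids plus stop signals, several codons encode the same amino acid, which is why the table string contains repeated letters.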

Page 37

DNA Fundamentals

  • Definition of DNA structure focusing on nucleotides and base pairing

Page 38

DNA Molecular Structure

  • Description of DNA as a double-stranded helix with sugar-phosphate backbones

Page 39

Nucleotide Components

  • Components of nucleotides: sugars, phosphates, and nitrogenous bases

Page 40

Purines vs Pyrimidines

  • Explanation of base types in nucleotides (A and G are purines; C and T are pyrimidines, with U replacing T in RNA)

Page 41

Complementary Base Pairing

  • Overview of Watson-Crick base pairing rules in DNA structure
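
The pairing rules (A with T, C with G) mean that one strand determines the other. A minimal sketch, assuming an uppercase DNA string over A, C, G, T; the function name `reverse_complement` is illustrative.

```python
# Watson-Crick pairs: A<->T and C<->G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]
```

The reversal is needed because the two strands of the double helix run antiparallel, so the complement read in its own 5' to 3' direction is the reverse of the base-by-base complement.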

Page 42

RNA Overview

  • Discussion on the structure and function of RNA compared to DNA

Page 43

Key Differences: RNA and DNA

  • Comparative analysis of DNA and RNA based on structure and roles in protein synthesis

Page 44

Class 3 Introduction

Page 45

Central Dogma of Molecular Biology

  • The flow of genetic information: DNA -> RNA -> Protein

Page 46

Transcription and Translation Processes

  • Overview of gene transcription to mRNA and subsequent translation to protein
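
The transcription step can be sketched as base pairing against the template strand, with U used in place of T. This is a simplified illustration that ignores promoters, splicing, and other regulation; the function name `transcribe` and the example strand are assumptions made for the example.

```python
def transcribe(template: str) -> str:
    """mRNA (5'->3') made from a DNA template strand written 5'->3'."""
    # Each template base pairs with its RNA partner: A-U, C-G, G-C, T-A
    pairing = str.maketrans("ACGT", "UGCA")
    # The mRNA runs antiparallel to the template, hence the reversal
    return template.translate(pairing)[::-1]


if __name__ == "__main__":
    # Hypothetical template strand
    print(transcribe("TTAGGCCAT"))
```

The resulting mRNA has the same sequence as the DNA coding strand with every T replaced by U.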

Page 47

Intron-Exon Dynamics

  • Description of how introns are spliced out of pre-mRNA before protein synthesis

Page 48

Summary of Central Dogma

  • Visualization of transcription and translation processes from DNA to protein

Page 49

Concept of Junk DNA

  • Genomic regions with no known function, termed "junk DNA"

Page 50

Open Reading Frame (ORF) Definition

  • Description of ORF in DNA sequence and its significance in translation
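
A simple ORF scan can be sketched as follows: in each of the three forward reading frames, look for an ATG start codon followed in frame by a stop codon. This is a minimal sketch that ignores the reverse strand and nested start codons; the names `find_orfs` and `min_codons` are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 1) -> list:
    """Return (start, end) index pairs of ATG...stop ORFs in forward frames."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons before the stop, including the start codon
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs
```

The returned pairs are Python slice bounds, so `dna[start:end]` gives the ORF including its stop codon.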

Page 51

Genome Definition

  • Complete set of chromosomes characterizing species, with examples from humans and mice

Page 52

Genome as a Computer Program Analogy

  • Genome equated to a computer program governing organism functionality

Page 53

Class 4 Introduction

Page 54

Eye Development Gene Studies

  • Case study on the eyeless gene in fruit flies and its human counterpart

Page 55

Gene Function Comparison

  • Exploring functional similarities between eyeless and aniridia genes across species

Page 56

Historical Context of Sequence Analysis

  • Evolution of sequence analysis from manual methods to computer-assisted techniques

Page 57

Bioinformatics Tools Evolution

  • Advancements in software tools significantly impacting molecular biology practices

Page 58

Genome Study Techniques

  • Overview of sequencing and its challenges in studying human genetic materials

Page 59

Cutting and Manipulating DNA

  • Usage of restriction enzymes as tools for DNA manipulation
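
How a restriction enzyme cuts can be sketched as a string search for its recognition site. The example below uses EcoRI, which recognizes GAATTC and cuts after the G; the function name `digest` and its parameters are illustrative, and the sketch models only one strand of a linear molecule.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Cut dna at every occurrence of site, cut_offset bases into the site."""
    fragments = []
    previous_cut = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[previous_cut:cut])
        previous_cut = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[previous_cut:])  # remainder after the last cut
    return fragments
```

Joining the fragments in order reproduces the original sequence, which makes a convenient sanity check.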

Page 60

DNA Cloning Processes

  • Methods of copying DNA using host organisms for amplification

Page 61

DNA Analysis Techniques

  • Gel electrophoresis as a primary method for DNA fragment analysis

Page 62

Overview of the Human Genome Project

Page 63

HGP Components

  • Multi-disciplinary research involving chemistry, biology, engineering, physics, ethics, informatics

Page 64

Objectives of the Human Genome Project

  • Aims to identify human genes, sequence the human genome, and address ethical concerns

Page 65

DOE Involvement in HGP

  • Historical context linking radiation studies to genome research

Page 66

Reference Genome Composition

  • First reference genome made from multiple individual samples across ethnicities

Page 67

Benefits of HGP Research

  • Advancements in medicine, agriculture, forensic science, and evolutionary biology

Page 68

Ethical Implications in HGP

  • Addressing concerns involving genetic data privacy, testing, and social issues

Page 69

Further HGP Information

Page 70

Collaborative Nature of HGP

  • Importance of databases and computational analysis for genome research

Page 71

Genetic Disease Treatment Advances

  • Pioneering results emerging from HGP data application for disease treatment

Page 72

Class 5 Introduction

Page 73

Understanding Databases

  • Definition and significance of databases in biological research

Page 74

History of Biological Databases

  • Timeline of significant developments in biological database systems

Page 75

Functions of Biological Databases

  • Roles of databases in data accessibility and computational research needs

Page 76

Database Types Overview

  • Different classes of biological databases based on data types and entry methods

Page 77

Data Quality Control Mechanisms

  • Importance of data curation and validation in biological databases

Page 78

Database Technical Design

  • Various database architectures employed in managing biological data

Page 79

Accession Codes and Identifiers

  • Explanation of how database entries are uniquely defined and identified

Page 80

Identifier Characteristics

  • Discussion on the nature of identifiers in database entries

Page 81

Accession Code Stability

  • Importance of stable accession codes for consistent entry tracking
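
In the `accession.version` convention used for GenBank-style sequence records, the accession stays fixed while the version number increases when the sequence changes. A minimal sketch of separating the two parts; the function name `split_accession` is illustrative.

```python
def split_accession(identifier: str):
    """Split 'ACCESSION.VERSION' into (accession, version or None)."""
    accession, dot, version = identifier.partition(".")
    return accession, (int(version) if dot else None)
```

Storing the bare accession as the stable key and the version separately lets a pipeline notice when a record's sequence has been revised.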

Page 82

Primary Nucleotide Sequence Databases

  • Key examples (EMBL, GenBank, DDBJ) and their characteristics

Page 83

Detailed Description of Databases

  • Overview of EMBL, GenBank, DDBJ operational roles in sequencing data management

Page 84

Secondary Nucleotide Sequence Databases

  • Explanation of databases that build upon primary data for enhanced features

Page 85

Protein Sequence Databases Overview

  • Distinction of curated databases focusing on protein sequences

Page 86

SWISS-PROT vs PIR

  • Comparison of two notable protein databases, with emphasis on annotation quality

Page 87

PIR Database Insights

  • Overview of the Protein Information Resource’s capabilities and history

Page 88

Other Relevant Databases

  • Examples of databases catering to specific biological or genetic information needs

Page 89

Popular Biological Databases

  • Overview of well-regarded databases for ease of access and information consolidation

Page 90

Bioinformatics Database Resources

  • List of popular bioinformatics database websites for research and analysis

Page 91

Growth of GenBank

  • Visualization of the expansion of the GenBank database over time

Page 92

NCBI Overview

  • History, mission, and role in public databases and computational biology

Page 93

NCBI Database Overview

  • List of various NCBI database offerings for nucleotides and proteins

Page 94

Nucleotide Database Components

  • Comprehensive overview of the sequence databases available at NCBI

Page 95

NCBI Database Types

  • Differentiation between primary and derivative databases in the NCBI framework

Page 96

Entrez Database Search Engine

  • Summary of capabilities provided by NCBI's Entrez search engine

Page 97

Literature and Text Resource

  • Access to biomedical literature and related databases at NCBI

Page 98

Overview of Nucleotide Databases

  • Summary of primary nucleotide database statistics

Page 99

EMBL/GenBank/DDBJ Collaborative Nature

  • Description of how these databases synchronize sequences and data

Page 100

Protein Databases Overview

  • Insight into the features of major protein databases

Page 101

Secondary Protein Database Insights

  • Details on SWISS-PROT and PIR's notable features, advantages, and uses

Page 102

UniProt Description

  • Overview of UniProt as an extensive protein information repository

Page 103

NCBI Derivative Sequence Data

  • Example genetic sequences illustrating NCBI data curation methods

Page 104

High-throughput DNA Sequencing Visualization

  • Images depicting sequences and technological advancements in sequencing

Page 105

Data Growth in Bioinformatics

  • Trends in biotechnology and implications for computational bioinformatics

Page 106

Managing Information Overload

  • The role of bioinformatics in processing large amounts of biological data

Page 107

Bioinformatics Needs and Algorithms

  • Historical context of bioinformatics development and its algorithmic requirements

Page 108

Internet and Bioinformatics

  • Importance of internet access to databases for biological research

Page 109

Bioinformatics Workflow Visualization

  • Overview of bioinformatics data processing workflow and tools

Page 110

Market Overview for Bioinformatics

  • Current market valuation and projection for growth in bioinformatics

Page 111

Scope of Bioinformatics Resources

  • Understanding the resources created for biologists accessing data

Page 112

Critical Database Interactions

  • Discussion of the interaction between major databases in bioinformatics

Page 113

Specialized Bioinformatics Databases

  • Examples of specialized databases with links to various resources

Page 114

High-Level Protein Databases

  • Overview of specific databases focused on protein sequence information

Page 115

Database Homology Searching Techniques

  • Introduction to algorithms and scoring methodologies for sequence analysis

Page 116

Scoring Systems in Sequence Alignments

  • Overview of raw scores and scoring matrices in alignments

Page 117

Creation of Scoring Matrices

  • Methodology for developing scoring matrices to assess sequence similarity
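
Substitution matrices such as PAM and BLOSUM are built from log-odds scores: the observed frequency of an aligned pair is compared with the frequency expected by chance. A minimal sketch, assuming the pair frequency `q_ij` and the background frequencies `p_i` and `p_j` are already estimated; the function name and the example numbers are illustrative, and the default scale of 2 corresponds to half-bit units.

```python
import math


def log_odds_score(q_ij: float, p_i: float, p_j: float, scale: float = 2.0) -> int:
    """Rounded log-odds score: scale * log2(observed / expected by chance)."""
    return round(scale * math.log2(q_ij / (p_i * p_j)))


if __name__ == "__main__":
    # Hypothetical frequencies: pair seen twice as often as chance predicts
    print(log_odds_score(0.02, 0.1, 0.1))
```

Pairs seen more often than chance get positive scores and pairs seen less often get negative scores, which is what lets an alignment score separate related from unrelated sequences.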

Page 118

Influence of Scoring Matrices

  • Importance of scoring matrix choice on analysis outcomes

Page 119

Sequence Alignment Methodologies

  • Differentiation between global and local sequence alignment strategies
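
The difference between the two strategies can be sketched with one dynamic-programming routine. In the global case (Needleman-Wunsch style), end gaps are penalized and the score is read from the final cell; in the local case (Smith-Waterman style), scores are clamped at zero and the best cell anywhere is taken. The scoring values and the function name `align_score` are assumptions for illustration, using a simple linear gap penalty.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score: global by default, local if local=True."""
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i, ca in enumerate(a, start=1):
        curr = [0 if local else i * gap]
        for j, cb in enumerate(b, start=1):
            score = max(
                prev[j - 1] + (match if ca == cb else mismatch),  # align pair
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best if local else prev[-1]
```

For two sequences that share only a short common region, the local score rewards that region while the global score is pulled down by the unrelated flanks.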

Page 120

Algorithm Use in Database Search

  • Comparative analysis of common algorithms used for similarity searches

Page 121

Overview of Genomic Sequencing by 2002

  • Summary of progress in genomic sequencing across numerous organisms

Page 122

Comparison Dilemma: DNA vs Protein

  • Discussion on accuracy in nucleotide vs protein sequence comparisons

Page 123

Implications of Sequence Comparison Approaches

  • Importance of using appropriate comparison methods based on sequence type

Page 124

BLAST and FASTA Variants

  • Summary of different variants of search tools for sequence comparison

Page 125

Practical Example of Sequence Analysis

  • Visualization of NCBI tools for protein analysis and alignment

Page 126

Explanation of E-Value in Sequence Searches

  • Discussion of E-value implications in assessing search significance
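
The E-value is commonly modeled with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where S is the raw score, m and n are the query and database lengths, and K and lambda depend on the scoring system. A minimal sketch under that model; the function names are illustrative, and the parameter values in the usage line are placeholders rather than values taken from any particular BLAST run.

```python
import math


def e_value(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance hits scoring at least raw_score."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score, so that E = m * n * 2 ** (-bit_score)."""
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    # Placeholder parameters for illustration only
    print(e_value(raw_score=50, m=300, n=1_000_000, lam=0.3, k=0.1))
```

Because E grows with the search space m * n, the same raw score is less significant in a larger database; lower E-values indicate matches less likely to arise by chance.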

Page 127

Database Searching Recommendations

  • Guidelines for effective searches in biological databases

Page 128

Popular Bioinformatics Analysis Sites

  • List of widely used alignment and translation tools in bioinformatics