
4P06 final

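Platform comparison (sequencing principle; highest capacity per run and read format; accuracy; direct cost ($/GB) and run time; machine cost; highlights/error type)

Ø  Illumina (2nd Gen)

v Sequencing principle: SBS, fluorescent dNTPs, reversible terminators, 4 Ns at a time

v Highest capacity per run; format: NovaSeq P25B: 150 bp x 2 x 3B x 8 = 7.2 Tbp; SE/PE/MP; PCR

v Accuracy: >99%, fading at the ends

v Direct cost ($/GB): $5 or >$50? (72 hrs)

v Machine cost: >750K

v Highlights/error type: versatile; high capacity & accuracy, PE and MP reads, short reads; substitution errors

Ø  Ion Torrent/Proton (2nd Gen)

v Sequencing principle: SBS, regular dNTPs, pH changes, 1 N at a time

v Highest capacity per run; format: S5/S5 L540: 200 bp x 80M = 16 Gbp; SR; PCR

v Accuracy: >99%? High error rate for long homopolymers

v Direct cost ($/GB): ~$1000 (~4 hrs)

v Machine cost: 80K

v Highlights/error type: cheap reagents, quick sequencing, lower capacity/high on indels

Ø  PacBio (3rd Gen)

v Sequencing principle: SBS, fluorescent dNTPs, all together, continuously, no terminators; single molecule, real time (SMRT)

v Highest capacity per run; format: Sequel II: 20 kb x 1.5M = 30 Gbp; SR

v Accuracy: 87% in a single-round read, 99.9% (HiFi)

v Direct cost ($/GB): $2000 (2-30 hrs)

v Machine cost: 700K

v Highlights/error type: long reads, high error rate, lower capacity, DNA methylation

Ø  Nanopore (3rd Gen)

v Sequencing principle: sequencing by the current profile of the passing Ns as the DNA moves through the pore; single molecule, real time

v Highest capacity per run; format: PromethION 48: 1 Mb x 50 x 200K = ~10 Tbp; SR

v Accuracy: ~85%?

v Direct cost ($/GB): $2-16 (1 min to 72 hrs)

v Machine cost: MinION: $1K; PromethION 48 CapEx: $600K

v Highlights/error type: extremely long reads, excellent mobility, DNA methylation, low capacity/high error on substitutions & indels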
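The capacity figures above are products of read length and read counts (plus mates and lanes for Illumina). A minimal Python check of that arithmetic, using only the factors listed in the comparison; the helper names `run_yield_bp` and `human_readable` are hypothetical.

```python
from math import prod


def run_yield_bp(*factors: int) -> int:
    """Multiply read length (bp) by the per-run multipliers listed in the comparison."""
    return prod(factors)


def human_readable(bp: float) -> str:
    """Format a base-pair count in Tbp/Gbp/Mbp."""
    for unit, size in (("Tbp", 1e12), ("Gbp", 1e9), ("Mbp", 1e6)):
        if bp >= size:
            return f"{bp / size:g} {unit}"
    return f"{bp:g} bp"


# Factors copied from the "highest capacity per run" entries above
capacities = {
    "Illumina NovaSeq P25B": run_yield_bp(150, 2, 3_000_000_000, 8),
    "Ion Torrent S5/S5 L540": run_yield_bp(200, 80_000_000),
    "PacBio Sequel II": run_yield_bp(20_000, 1_500_000),
    "Nanopore PromethION 48": run_yield_bp(1_000_000, 50, 200_000),
}

for platform, bp in capacities.items():
    print(f"{platform}: {human_readable(bp)}")
```

Running it reproduces the totals quoted in the comparison (7.2 Tbp, 16 Gbp, 30 Gbp, 10 Tbp).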

Figure: Comparison of NGS platforms, quality vs. read length (bp)

DNA sequencing overview:

Cost per human genome ($3B to $300 in 30 yrs)

Sequencing is now cheaper than data analysis

Ø  Cost

v Cost of sequencing library/sample: very little room left for decrease

v Cost of sequencing ($/GB): decreases with increasing throughput, but is limited by the maximal capacity of indexing (multiplexing)

v $300 as the wall?

Ø  Read length

v Read length limit for the platforms: 1 Mb for ONT (Oxford Nanopore)

v DNA integrity limit: <50 kb? Up to 1 Mb?

Ø  Accuracy

v Low accuracy for the long-read platforms, especially ONT

Ø  Read length vs. coverage

v Only 2x coverage is required if …..

v The shorter the reads, the higher the coverage required.

v Long reads are not always advantageous

= Illumina is most effective WHEN long reads give no advantage

NOTE: RAW DATA WILL PRODUCE MORE DISCREPANCY

Ø  Opportunities:

§  Cheaper: down to $1K, now $300 per human genome

§  Faster: less than 1 week or even 1 day per genome

Ø  Challenges:

§  Data management:

§  Storage: raw data at hundreds of GB, reaching the TB level in total per genome

§  Data analysis:

§  Difficult to assemble

§  How to identify variations? SAM/BAM/CRAM formats
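Variant identification starts from alignments stored in SAM (or its compressed forms BAM/CRAM). A minimal sketch of reading one SAM alignment line by hand, assuming the 11 mandatory tab-separated fields of the SAM specification. The helper names are hypothetical, and real pipelines use tools such as samtools or pysam instead. Indels seen in a single read are only hints; variant callers combine evidence from many reads.

```python
import re
from typing import List, NamedTuple, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int    # 1-based leftmost reference position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line: str) -> SamRecord:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has 11 mandatory tab-separated fields")
    f = fields[:11]  # optional TAG:TYPE:VALUE fields are ignored in this sketch
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
                     f[6], int(f[7]), int(f[8]), f[9], f[10])


def cigar_ops(cigar: str) -> List[Tuple[int, str]]:
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def reference_span(cigar: str) -> int:
    """Reference bases covered: M, D, N, = and X consume the reference."""
    return sum(n for n, op in cigar_ops(cigar) if op in "MDN=X")


def indel_hints(rec: SamRecord) -> List[Tuple[int, str, int]]:
    """(reference position, 'INS' or 'DEL', length) for each indel in one read."""
    ref_pos = rec.pos
    hints = []
    for n, op in cigar_ops(rec.cigar):
        if op == "I":            # inserted read bases sit just before ref_pos
            hints.append((ref_pos, "INS", n))
        elif op == "D":          # reference bases missing from the read
            hints.append((ref_pos, "DEL", n))
            ref_pos += n
        elif op in "MN=X":
            ref_pos += n
        # S, H and P do not move along the reference
    return hints


line = "read1\t0\tchr1\t100\t60\t5M2I3M1D4M\t*\t0\t0\tACGTAGGACGCGTA\tIIIIIIIIIIIIII"
rec = parse_sam_line(line)
print(reference_span(rec.cigar), indel_hints(rec))  # 13 [(105, 'INS', 2), (108, 'DEL', 1)]
```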

APPLICATIONS OF NGS:

Ø  Genome sequencing of new species:

v More suitable for smaller genomes with less repetitive sequences

Ø  Re-sequencing of model organisms:

v Identifying genome variations based on references

Ø  RNA sequencing for gene expression:

v Replacing microarrays? YES

Ø  Metagenomics:

v Studying the dynamics of environmental biological communities

PAIRED-END, MATE-PAIR SEQUENCING

YOU CAN RECONNECT THE PAIRED ENDS

Paired-end, mate-pair sequencing (applications):

Ø  ENHANCING ASSEMBLY QUALITY FROM SHORT READS

v The paired reads extend the length of single reads

v gDNA fragments can be longer than the sum of the two reads

v Mate pairs help join contigs

Ø  DETECTING STRUCTURAL VARIANTS WITH PE READS

v Deletions

v Insertions

v Translocations

v Inversions

  • two shorter regions can be covered by

  • if you have a test genome in the test genome
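How PE reads reveal structural variants can be sketched by comparing each mapped pair with the expected fragment size. This is a deliberately simplified Python sketch, assuming an Illumina-style forward-reverse (FR) library with a known mean and standard deviation of insert size. `ReadPair` and `classify_pair` are hypothetical names, and using the distance between leftmost positions is only a crude stand-in for the true insert size.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int     # leftmost mapped position of read 1
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(p: ReadPair, mean_insert: float, sd_insert: float,
                  n_sd: float = 3.0) -> str:
    """Very simplified structural-variant signal from one FR read pair."""
    if p.chrom1 != p.chrom2:
        return "translocation"  # mates land on different chromosomes
    if p.strand1 == p.strand2:
        return "inversion"      # the expected FR orientation is broken
    span = abs(p.pos2 - p.pos1)  # crude proxy for the fragment size
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"       # test genome lacks sequence present in the reference
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"      # test genome carries extra sequence
    return "concordant"


pairs = [
    ReadPair("chr1", 1000, "+", "chr1", 1400, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 3000, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 1200, "-"),
    ReadPair("chr1", 1000, "+", "chr5", 8000, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 1450, "+"),
]
for pair in pairs:
    print(classify_pair(pair, mean_insert=500, sd_insert=50))
```

A single discordant pair proves little; real SV callers require several supporting pairs (and often split reads) at the same locus.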

THE USE OF HI-C SEQ:

Method: Hi-C is a genome-wide version of chromosome conformation capture (3C); other versions include 4C and 5C.

Application: It is used for unbiased identification of DNA regions in physical proximity in the nucleus, including loops and topologically associating domains (TADs). The method is also applied to assigning sequences (contigs/scaffolds) to chromosomes.

  • SCAFFOLDS IN THE GENOME TEND TO BE CONNECTED
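Hi-C contacts are far more frequent within a chromosome than between chromosomes, which is why they can place sequences onto chromosomes. A toy Python sketch of that idea only: each unplaced contig is assigned to the chromosome whose already-placed contigs share the most Hi-C read pairs with it. The function name, input layout and counts are hypothetical. Real Hi-C scaffolders also normalize for contig length and order and orient the contigs.

```python
from collections import defaultdict
from typing import Dict, Tuple


def assign_contigs(contacts: Dict[Tuple[str, str], int],
                   anchored: Dict[str, str]) -> Dict[str, str]:
    """
    contacts: (contig_a, contig_b) -> number of Hi-C read pairs linking them
    anchored: contig -> chromosome, for contigs already placed
    Returns a chromosome guess for each unplaced contig.
    """
    scores: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for (a, b), n in contacts.items():
        for x, y in ((a, b), (b, a)):  # contacts are symmetric
            if x not in anchored and y in anchored:
                scores[x][anchored[y]] += n
    # Pick the chromosome with the highest total contact count
    return {contig: max(per_chrom, key=per_chrom.get)
            for contig, per_chrom in scores.items()}


contacts = {("u1", "c1"): 50, ("u1", "c2"): 3, ("c3", "u1"): 10, ("u2", "c2"): 20}
anchored = {"c1": "chr1", "c3": "chr1", "c2": "chr2"}
print(assign_contigs(contacts, anchored))  # {'u1': 'chr1', 'u2': 'chr2'}
```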

Genome Sequencing & assembly

Motivation

Sequencing strategies: (Historical & Current)

Historical: BAC cloning, chromosomal walking

Current: whole genome shotgun sequencing

Steps:

  1. Make DNA libraries

  2. Sequencing (SEQUENCE FIRST, THEN ASSEMBLE)

  3. Assembly

  4. Annotation (HOW DOES THIS GENOME DIFFER IN STRUCTURE FROM ANOTHER GENOME?)

COST OF HUMAN GENOME SEQUENCING:

  • Was $3 billion for the completed human genome project of the 1990s

  • Current reachable cost (goal): $1,000 per genome

HOW TO MAKE A GENOMIC LIBRARY:

  1. USE A VECTOR (SUCH AS A PLASMID)

  2. BREAK UP THE DNA!

  3. THEN ADD PRIMERS AND DUPLICATE (AMPLIFY)

HOW TO SEQUENCE? THE ULTIMATE STRATEGY:

An idealized representation of the hierarchical shotgun sequencing strategy: the genomic DNA fragments represented in the BAC library are organized into a physical map, and individual BAC clones are selected and sequenced by the random shotgun strategy. Finally, the clone sequences are assembled to reconstruct the sequence of the genome (Lander et al., 2001, Nature).

SHOTGUN SEQUENCING:

Hierarchical shotgun sequencing

Shotgun Sequencing - Another definition:

Shotgun sequencing is a sequencing method that involves randomly breaking down the DNA or RNA molecule into smaller fragments. The fragmented DNA or RNA is then sequenced, generating short reads of the nucleotide sequence. These reads are then assembled using computational algorithms to reconstruct the original sequence of the DNA or RNA molecule.

The process of shotgun sequencing involves several steps. First, the DNA or RNA sample is extracted and randomly sheared into small fragments. Then, adapters are added to the ends of the fragments, which enables them to bind to a sequencing platform. Next, the fragments are amplified and sequenced using high-throughput sequencing platforms, such as Illumina or PacBio. Finally, computational algorithms are used to assemble the short reads into longer contiguous sequences, known as contigs, and then into larger genomic scaffolds.
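The random-shearing step and the coverage definition used later (total sequence length / genome size) can be simulated in a few lines. A minimal Python sketch under simplifying assumptions: a random toy genome, error-free reads of fixed length, one strand only, and uniformly random start positions. All function names are hypothetical. The expected uncovered fraction is compared with the classic Poisson (Lander-Waterman) approximation e^(-coverage).

```python
import math
import random


def random_genome(length: int, seed: int = 1) -> str:
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def shotgun_reads(genome: str, read_len: int, n_reads: int, seed: int = 2):
    """Sample error-free reads at uniformly random start positions (one strand only)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [(s, genome[s:s + read_len]) for s in starts]


def mean_coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Coverage = total sequence length / genome size."""
    return n_reads * read_len / genome_len


def per_base_depth(genome_len: int, reads) -> list:
    depth = [0] * genome_len
    for start, seq in reads:
        for i in range(start, start + len(seq)):
            depth[i] += 1
    return depth


genome = random_genome(10_000)
reads = shotgun_reads(genome, read_len=100, n_reads=500)
depth = per_base_depth(len(genome), reads)
c = mean_coverage(500, 100, len(genome))  # 5.0
uncovered = sum(d == 0 for d in depth) / len(genome)
print(c, uncovered, math.exp(-c))
```

Even at 5x some bases stay uncovered, which is why gaps remain between contigs at moderate coverage.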

Traditional methods

Levels of clone and sequence coverage (figure: diagram, timeline)

Note: contigs joined across the gaps in between form scaffolds

Whole Genome Shotgun:

Whole Genome Shotgun Sequencing Method:DNA was cut into small pieces and sequenced completely.These fragments were organized into contigs ( a contiguous stretch of DNA or RNA sequence that has been assembled from overlapping sequencing reads)


Terms to note:

Ø  Sequence contigs: Contigs produced by merging overlapping sequence  reads;

THUS sequence contigs = contiguous sequence

Ø  Sequence scaffolds: Scaffolds produced by joining contigs on the basis of linking information, with gaps ("NNNNNNN") of estimated sizes.

Ø  Sequencing coverage: The number of reads covering any given point of a genome; on average = total sequence length / genome size.

What would be a good coverage?

Ø  N50: A measure of contig/scaffold length in a genome assembly. Specifically, it is the maximum length L such that 50% of all nucleotides lie in contigs (or scaffolds) of size at least L.

N50 is a midpoint-style summary and a standard measure of assembly contiguity (the higher the N50, the more contiguous and usually the better the assembly)

Ø  L50: the number of contigs (or scaffolds), counted from the longest down, needed to cover 50% of the assembly length
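N50 and L50 follow directly from the definitions above: sort contig lengths from longest to shortest and walk down until half of the total assembly length is reached. A minimal Python sketch (the function name and toy lengths are illustrative). It uses the total assembly length. Using an estimated genome size instead gives the related NG50.

```python
from typing import List, Tuple


def n50_l50(lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:  # half of the assembly length reached
            return length, count
    raise ValueError("no contigs given")


contig_lengths = [80, 70, 50, 40, 30, 20]  # toy assembly, 290 bp in total
print(n50_l50(contig_lengths))  # (70, 2)
```

Here 80 + 70 = 150 bp already covers more than half of 290 bp, so N50 = 70 and L50 = 2.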

Large Sequencing Projects:

Genomes:

v 1000 Genomes Project: 2008-2015; The Genome 10K Project (G10KP)

v Personal genome projects (PGP): 2005--, PGP-Canada (2012)

v Non-human: Earth BioGenome Project (EBP): 2017

Exomes:

Exome Aggregation Consortium (ExAC): exome sequencing for >60,000 individuals; published in 2016

1000 Genomes Project

The 1000 Genomes Project aimed to sequence at least 1000 individuals from various populations worldwide and to catalogue human genetic variation down to variants occurring at 1% frequency or less. The project was completed in 2015 with 2504 individuals from 26 populations and identified 88 million genetic variants, with the most variation found in individuals of African ancestry. The project used a combination of whole-genome sequencing, deep exome sequencing, and high-density SNP microarrays to obtain data. The results include thousands of variants associated with complex traits and rare diseases, along with overlapping regulatory regions.

Summarized:

  • Began in 2007 and completed in 2015

  • Predecessor: the HapMap (haplotype map of the human genome) project

  • Millions of SNPs were discovered, and GWAS (genome-wide association studies) used the dataset for research on disease association

  • 2007 GOAL: sequence at least 1000 volunteers from populations worldwide

  • RESULT: the greatest number of variant sites was found in individuals of African ancestry

  • RESULT: 88 million variants

Personal Genome Project

The Personal Genome Project (PGP) aims to publicly share the complete genomes and medical records of thousands of participants. It provides researchers with genomic, environmental, and human trait data to study the relationships between genotype, environment, and phenotype.

The PGP raises ethical, legal, and social issues regarding privacy, informed consent, and data accessibility.

 Initiated by George Church in 2005

The Personal Genome Project Canada launched in 2012 and sequenced DNA from whole blood using the Illumina HiSeq X system.

LECTURE 8: GENE PREDICTION/GENOME ANNOTATION – March 16, 2023

REMEMBER: GENOME SEQUENCING & ASSEMBLY PROCESS

  • Motivation

  • Find the proper sequencing strategy (historical or current)
    Historical: BAC cloning, chromosomal walking
    Current: whole genome shotgun sequencing (WGS)

  • Follow the steps of whole genome sequencing:
  1. Make DNA libraries
  2. Sequencing
  3. Assembly
  4. Annotation (predict the genome? what does it mean? where are the transposable elements?)

Bioinformatics: Steps in genome assembly

  1. Preprocessing (clean up reads: remove low-quality parts)

  2. Contiging (reads to contigs)

  3. Polishing (error correction)

  4. Scaffolding (longer pieces); THEN, GENOME ANNOTATION
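The preprocessing step ("remove low-quality parts") usually works on FASTQ Phred quality scores. A minimal Python sketch, assuming Phred+33 quality encoding and a simple rule that trims trailing 3' bases below a quality threshold. The function names and the threshold of 20 are illustrative assumptions; dedicated trimming tools offer more elaborate rules such as sliding windows.

```python
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 encoding assumed)."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 log10(P), so P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove trailing bases whose quality is below min_q."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


print(trim_3prime("ACGTACGTAC", "IIIIIIII#+"))  # ('ACGTACGT', 'IIIIIIII')
```

With this encoding 'I' is Q40 (1 error in 10,000), while '#' (Q2) and '+' (Q10) fall below the Q20 cutoff (1% error) and are trimmed.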

REMAINING CHALLENGES IN GENOME SEQUENCING:

Ø  Obtaining accurate continuous sequences for individual chromosomes

v Errors in joining contigs (e.g. highly repeated regions)

v Lack of sequences for certain regions (e.g. centromeres)

Ø  Obtaining assembled sequences representing the diploid nature of genomes

v Difficulties in obtaining long DNA molecules

v Lack of diploid phasing (flattened consensus) with long reads

v Short haplotype structure

  • Solutions: hybrid of long-read and short-read NGS, chromosomal imaging…

  • Would the use of HiSeq help here?

Note: HiSeq, a short-read sequencing technology, may not be sufficient on its own to address these issues. Instead, a combination of different sequencing technologies and approaches is often necessary to obtain accurate and complete diploid genome sequences.

Ploidy, Haplotype, Phasing

(top image) At the end of assembly, you can generate 2 sets of sequences, one for each haplotype (unphased vs. phased: phasing gives the complete picture of genetic variation)

(bottom left image)

 A) the first image may be insufficient to carry on to the next generation.

 B) the second pair may bring diversity to the couple and all favorable portions should ( theoretically) be involved

A TERM TO KNOW:

Haplotype: a haplotype is a group of variants in a section of a chromosome that tend to stay together in transmission across generations (Ping Liang)

  • They are important for assessing the functional impact of variants (genetic variations, WGA and population studies)

 date back to the Human HapMap era ( 2002 – 2010)
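Phasing decides which alleles sit together on the same chromosome copy. In VCF genotype notation, "|" marks a phased genotype and "/" an unphased one. A minimal Python sketch, assuming biallelic sites and no missing genotypes, that rebuilds the two haplotypes from phased genotypes and refuses when a heterozygous site is unphased. The function name and example sites are hypothetical.

```python
from typing import List, Optional, Tuple


def haplotypes_from_gt(sites: List[Tuple[str, str]],
                       genotypes: List[str]) -> Optional[Tuple[str, str]]:
    """
    sites: (ref_allele, alt_allele) per biallelic site, in chromosome order
    genotypes: VCF-style GT strings such as "0|1" (phased) or "0/1" (unphased)
    """
    hap1, hap2 = [], []
    for (ref, alt), gt in zip(sites, genotypes):
        phased = "|" in gt
        a1, a2 = (int(x) for x in gt.replace("|", "/").split("/"))
        if a1 != a2 and not phased:
            return None  # heterozygous but unphased: allele-to-copy assignment unknown
        alleles = (ref, alt)
        hap1.append(alleles[a1])
        hap2.append(alleles[a2])
    return "".join(hap1), "".join(hap2)


sites = [("A", "G"), ("C", "T"), ("G", "A")]
print(haplotypes_from_gt(sites, ["0|1", "1|0", "1|1"]))  # ('ATA', 'GCA')
print(haplotypes_from_gt(sites, ["0/1", "1/0", "1/1"]))  # None
```

The same three genotypes give two different haplotype pairs depending on phase, which is why an unphased (flattened) assembly hides part of the variation.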

(TOP) Future Challenges:

  1. Precise, predictive model of transcription initiation and termination: ability to predict where and when transcription will occur in a genome

  2. Precise, predictive model of RNA splicing/ alternative splicing: ability to predict the splicing pattern of any primary transcript in any tissue

  3. Accurate ab initio protein structure prediction

CAN YOU IDENTIFY EXONS VS. INTRONS IN A GENOME SEQUENCE? (figure: introns in BLACK, exons in PINK)
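Given exon coordinates, the introns are the gaps between consecutive exons, and the mature transcript is the exons joined together. Most introns also follow the GT-AG rule: they begin with GT and end with AG. A minimal Python sketch, assuming 0-based half-open exon intervals on the plus strand and a made-up toy gene. The function names are hypothetical. Real gene predictors combine such splice signals with statistical models and evidence from RNA-seq or proteins.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end)


def splice(genomic: str, exons: List[Interval]) -> str:
    """Join exon sequences in order (plus strand)."""
    return "".join(genomic[s:e] for s, e in exons)


def introns(exons: List[Interval]) -> List[Interval]:
    """Introns are the gaps between consecutive exons."""
    return [(e1, s2) for (_, e1), (s2, _) in zip(exons, exons[1:])]


def follows_gt_ag(genomic: str, intron: Interval) -> bool:
    s, e = intron
    seq = genomic[s:e]
    return seq.startswith("GT") and seq.endswith("AG")


exon1, intron1, exon2 = "ATGAAA", "GTAAGTTTTTCAG", "GGCTGA"
genomic = exon1 + intron1 + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron1), len(genomic))]

print(splice(genomic, exons))  # ATGAAAGGCTGA
print([follows_gt_ag(genomic, i) for i in introns(exons)])  # [True]
```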

WHAT IS A GENE? A locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions (promoters), transcribed regions and/or other functional sequence regions.

HOW HAVE SCIENTISTS DEFINED IT THROUGH THE YEARS?

  • A gene is the most basic unit of inheritance

  • Gregor Mendel: traits are determined by discrete units that are passed from one generation to the next (1860s)

  • Wilhelm Johannsen coined the word "gene" for the unit associated with an inherited trait (1909)

  • Thomas Hunt Morgan: "genes as beads on a string" (1910)

  • George Beadle: "one gene, one enzyme" (1941)

  • Avery, MacLeod and McCarty: "genes are made of DNA" (1944)

  • Watson and Crick: "info flows from DNA to RNA" (1953)

  • Richard Roberts and Philip Sharp: gene splicing (1977)

  • Discovery of microRNA & RNA interference (1993)

WHAT ARE THE COMPONENTS OF A GENE? (e.g. introns, exons, promoters, enhancers, silencers)

  • Introns

  • Exons

  • Promoters

  • Enhancers

  • Silencers

  • Operons
