SAQ Prep

RNA-Seq

Explain the procedure of RNA-seq in detail, step by step.

Step 1: RNA Extraction

Isolate RNA from the cells using:
Lysozyme (an enzyme) to break down cell walls.
Proteases to eliminate proteins.
DNases to degrade DNA.
Ensure the sample contains mostly mRNA, free of contaminants.

Step 2: Reverse Transcription to cDNA

Use oligonucleotides that hybridize randomly to RNA sequences.
Employ reverse transcriptase to synthesize the first strand of complementary DNA (cDNA).
If the amount of starting RNA is low, synthesize a second cDNA strand (note that this may lead to loss of strand information).
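
As a rough illustration of what "complementary" means in Step 2, the sketch below builds the first-strand cDNA for a short RNA by base pairing. The function name and the example sequence are made up for illustration; real reverse transcription also depends on where the primer binds.

```python
# Base pairing used when copying RNA into DNA: RNA base -> complementary DNA base.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str) -> str:
    """Return the first-strand cDNA (written 5'->3') for an RNA written 5'->3'."""
    # Walk the RNA from its 3' end and complement each base,
    # so the resulting cDNA also reads 5'->3'.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna.upper()))


rna = "AUGGCCUAA"  # hypothetical transcript fragment
cdna = first_strand_cdna(rna)
print(cdna)  # TTAGGCCAT
```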

Step 3: Tagmentation or Adapter Addition:

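Two adapters are added to the ends of the cDNA fragments being prepared for sequencing, either by ligation or by tagmentation (a transposase fragments the DNA and attaches adapters in one step).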

Step 4: Flow Cell Attachment:

After tagmentation, the cDNA fragments are attached to the surface of a flow cell. Each fragment adheres to a distinct location on the flow cell surface, which allows for parallel sequencing.

Step 5: Bridge Amplification:

Once attached to the flow cell, bridge amplification occurs, resulting in the formation of clusters. Each cluster consists of identical DNA fragments amplified from a single cDNA molecule.

Step 6: Clonal Amplification:

This step ensures that each cluster represents a specific sequence with high enough abundance for accurate sequencing.

Step 7: Sequencing Process:

Sequencing begins with the incorporation of fluorescently labeled bases that identify which nucleotide has been added at each cluster, allowing for the determination of the original RNA sequence.
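
A toy model of Step 7, assuming each cluster gives one intensity value per base per cycle (the numbers below are invented). The base with the strongest signal in each cycle is called. Real base callers are considerably more sophisticated, so this is only a sketch of the idea.

```python
BASES = "ACGT"


def call_bases(intensities):
    """Call one base per cycle from (A, C, G, T) signal intensities."""
    read = []
    for cycle in intensities:
        # Pick the channel (base) with the strongest signal in this cycle.
        strongest = max(range(4), key=lambda i: cycle[i])
        read.append(BASES[strongest])
    return "".join(read)


# Hypothetical signals from one cluster over three sequencing cycles.
cluster_signal = [
    (0.90, 0.10, 0.00, 0.10),
    (0.10, 0.00, 0.80, 0.20),
    (0.00, 0.10, 0.10, 0.95),
]
print(call_bases(cluster_signal))  # AGT
```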

What are the strengths and limitations of using RNA-seq?

Strengths:
- Detects known and novel transcripts, and identifies alternative splicing events and non-coding RNAs.
- High dynamic range, allowing accurate measurement of low-abundance transcripts.

Limitations:
- High sequencing cost and substantial computational resource requirements.
- Results are influenced by RNA quality and sample preparation techniques.

Explain the RNA-seq computational workflow.

For the computational analysis, running RNA-seq on an Illumina instrument produces a FASTQ file of reads, and a reference genome is obtained as a FASTA file. Both files are entered into Galaxy, where the reads are aligned to the reference to generate a BAM file, a compressed record of where each read aligns. Artemis is then used to view the BAM file in detail, and the BAM file is used to produce a GenBank file. Comparing the aligned reads with the GenBank reference reveals mutations, deletions, or gene expression levels.

RNA-seq → FASTQ

Reference genome → FASTA

Enter FASTQ + FASTA into Galaxy (Bowtie 2) → BAM file

Enter BAM into Artemis → GenBank file
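
To make the FASTQ step more concrete, here is a minimal sketch of reading one FASTQ record and decoding its quality string. It assumes the common 4-lines-per-record layout and Phred+33 quality encoding; the function name and the example read are made up.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, qualities) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of file
        sequence = handle.readline().strip()
        handle.readline()  # the '+' separator line is skipped
        quality_line = handle.readline().strip()
        # Phred+33 (assumed): quality score = ASCII code of the character minus 33.
        qualities = [ord(ch) - 33 for ch in quality_line]
        yield header[1:], sequence, qualities  # header[1:] drops the leading '@'


# Hypothetical single-read FASTQ file held in memory.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
print(records)  # [('read1', 'ACGT', [40, 40, 20, 0])]
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q40 means roughly a 1 in 10,000 chance that the base call is wrong.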

Q1. Compare 16S rRNA sequencing and metagenomics as methods for studying microbial communities.

✅ Answer (bullet-point sentences)

16S rRNA sequencing:

Targets a single conserved gene (16S rRNA) found in bacterial genomes.
Primarily used to identify and classify bacteria.
Can also study eukaryotes using 18S rRNA, but cannot detect viruses.
Is simple, cost-effective, and easy to analyse.
Can detect low-abundance organisms due to targeted sequencing.

Limitations of 16S:

Limited to organisms with ribosomes.
PCR amplification introduces bias.
Has low species/strain resolution due to similar sequences.
Cannot distinguish strains within a species.
Provides no functional information.

Metagenomics:

Sequences all genetic material in a sample (DNA ± RNA).
Does not require culturing or PCR.
Detects bacteria, viruses, fungi, and other microbes.
Provides high-resolution identification (species and strain level).
Reveals functional genes and metabolic pathways.

Conclusion:

16S is simple and cheap but limited.
Metagenomics is comprehensive and powerful but expensive and complex.

❓ Q2. Explain the advantages and limitations of 16S rRNA sequencing.

✅ Answer

Advantages:

Simple method targeting a single gene.
Cost-effective compared to metagenomics.
Easy to analyse due to reduced data complexity.
Effective at detecting low-abundance bacteria.

Limitations:

Only detects organisms with ribosomes.
Cannot detect viruses.
PCR bias affects accuracy.
Low resolution at species level.
Cannot distinguish strains.
Does not provide functional/genetic activity information.

🧫 Q3. Compare metagenomics and 16S rRNA sequencing in studying microbial communities.

✅ Model Answer

16S rRNA sequencing:

Targets a single gene (16S rRNA) in bacteria.
Used mainly for bacterial identification.
Simple, cost-effective, and easy to analyse.
Detects low-abundance organisms.

Limitations of 16S:

Limited to organisms with ribosomes.
Cannot detect viruses.
PCR bias affects results.
Low resolution at species/strain level.
No functional information provided.

Metagenomics:

Sequences all genetic material in a sample.
Detects bacteria, viruses, fungi, and other microbes.
Does not require PCR or culturing.
Provides high-resolution identification (species and strains).

Advantages of metagenomics:

Reveals functional genes and metabolic pathways.
Allows discovery of new organisms.
Provides both taxonomic and functional data.

Conclusion:

16S is simple and cheap but limited.
Metagenomics is comprehensive but expensive and complex.

🧪 Q4. Discuss the applications and challenges of metagenomics in clinical microbiology.

✅ Model Answer

Applications:

Rapid identification of pathogens without culturing.
Detects bacteria, viruses, and fungi in a single test.
Identifies antimicrobial resistance genes.
Provides strain-level information for outbreak tracking.
Useful when traditional diagnostics fail.

Clinical examples:

Identification of arenavirus in transplant patients.
Discovery of SARS-CoV-2.
Diagnosis of Leptospira brain infection.
Detection of AAV2 in hepatitis cases.

Advantages:

Faster than culture-based methods.
Does not require prior knowledge of pathogen.
Can detect unculturable organisms.

Challenges:

Expensive (£100–£1000 per sample).
Requires complex bioinformatic analysis.
Large data output.
High levels of host DNA contamination (up to 99–99.9%).
Requires host depletion or enrichment techniques.

🧬 1. Describe the principle of 16S rRNA sequencing and explain why the 16S gene is useful

Model Answer (Bullet Sentences)

16S rRNA sequencing is a method used to identify and classify bacteria based on the 16S rRNA gene.
The 16S rRNA gene is present in all bacteria, and the rRNA it encodes forms part of the 30S ribosomal subunit.
It contains both highly conserved regions and variable regions.
Conserved regions allow universal PCR primers to bind and amplify the gene.
Variable regions accumulate mutations and provide species-specific sequence differences.
This allows comparison of sequences to determine evolutionary relationships.
The gene acts as a molecular clock, where sequence similarity reflects relatedness.
It enabled the discovery of the three domains of life: Bacteria, Archaea, and Eukarya.
Therefore, 16S sequencing is widely used to study bacterial diversity and phylogeny.
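
The idea that sequence similarity reflects relatedness can be sketched as a percent-identity calculation. This assumes the sequences are already aligned to the same length; the short sequences below are invented and far shorter than a real 16S gene.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions that match between two pre-aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


reference = "ACGTACGTAC"  # hypothetical variable-region fragment
close_relative = "ACGTACGTAT"  # 1 mismatch
distant_relative = "ACGAACCTAT"  # 3 mismatches

print(percent_identity(reference, close_relative))    # 90.0
print(percent_identity(reference, distant_relative))  # 70.0
```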

🧪 2. Describe and explain the main steps in generating 16S sequencing data

Model Answer (Bullet Sentences)

A sample is collected (e.g. stool, soil, or skin swab).
DNA is extracted from all microorganisms present in the sample.
PCR is performed using primers targeting conserved regions of the 16S gene.
This amplifies the variable regions between the primers.
Barcodes (index sequences) are added to label different samples.
Amplified DNA is sequenced using high-throughput sequencing (e.g. Illumina).
Short sequence reads are generated from many organisms simultaneously.
Sequences are grouped into operational taxonomic units (OTUs) based on similarity.
Sequences are compared to databases to identify bacterial taxa.
The output provides relative abundance and diversity of bacteria in the sample.
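
The OTU grouping and relative abundance steps can be sketched with a simple greedy clustering. The similarity threshold is an assumption (97% is a commonly used cut-off, but the toy example uses 0.9 because the invented reads are only 10 bases long), and real pipelines use dedicated tools rather than this simplified approach.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length reads."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads, threshold=0.97):
    """Greedy clustering: each read joins the first OTU seed it is similar enough to."""
    seeds = []
    counts = Counter()
    for read in reads:
        for seed in seeds:
            if len(seed) == len(read) and identity(seed, read) >= threshold:
                counts[seed] += 1
                break
        else:
            # No existing OTU is similar enough, so this read starts a new OTU.
            seeds.append(read)
            counts[read] += 1
    total = sum(counts.values())
    # Relative abundance = reads in the OTU / all reads.
    return {seed: counts[seed] / total for seed in seeds}


# Hypothetical reads: three from one taxon (one with a single mismatch), one from another.
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC"]
abundances = cluster_otus(reads, threshold=0.9)
print(abundances)  # {'ACGTACGTAC': 0.75, 'TTTTGGGGCC': 0.25}
```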

⚖ 3. Compare 16S rRNA sequencing and metagenomics

Model Answer (Bullet Sentences)

16S sequencing targets a single gene, whereas metagenomics sequences all genetic material.
16S is limited to bacteria (and some eukaryotes with 18S), while metagenomics detects bacteria, viruses, fungi, and archaea.
16S requires PCR amplification, which introduces bias; metagenomics does not require PCR.
16S has low taxonomic resolution and cannot distinguish strains; metagenomics provides species and strain-level resolution.
16S provides only taxonomic information; metagenomics provides both taxonomic and functional information.
16S is cheaper and easier to analyse; metagenomics is expensive and computationally complex.
16S is useful for community profiling; metagenomics is useful for detailed analysis and discovery.

🧬 4. Explain the advantages and limitations of metagenomics

Model Answer (Bullet Sentences)

Advantages:

Metagenomics sequences all DNA in a sample, providing a comprehensive view of the microbiome.
It can detect all organism types, including bacteria, viruses, fungi, and archaea.
It provides high-resolution identification at species and strain level.
It allows discovery of novel organisms that cannot be cultured.
It provides functional information by identifying genes and metabolic pathways.
RNA metagenomics (metatranscriptomics) shows which genes are actively expressed.

Limitations:

It is expensive compared to 16S sequencing.
It generates large datasets requiring complex bioinformatic analysis.
Samples often contain high amounts of host DNA (up to 99–99.9%).
This requires host depletion or microbial enrichment.
Interpretation of functional data can be challenging.
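
A quick worked calculation shows why host DNA is such a problem. The run size of 10 million reads is a made-up figure for illustration; the host fractions are the 99% and 99.9% quoted above.

```python
def microbial_reads(total_reads: int, host_fraction: float) -> int:
    """Number of reads left for microbes when a given fraction is host DNA."""
    return round(total_reads * (1.0 - host_fraction))


total = 10_000_000  # hypothetical sequencing run
for host_fraction in (0.99, 0.999):
    print(host_fraction, microbial_reads(total, host_fraction))
# 0.99 100000
# 0.999 10000
```

So at 99.9% host DNA, only about 1 read in 1,000 comes from microbes, which is why host depletion or microbial enrichment is needed.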

🏥 5. Discuss the applications of metagenomics in clinical microbiology

Model Answer (Bullet Sentences)

Metagenomics allows direct sequencing of clinical samples without culturing organisms.
It enables rapid identification of pathogens in infections.
It can detect organisms that are difficult or impossible to culture.
It identifies antibiotic resistance genes, guiding treatment decisions.
It allows strain-level typing for outbreak investigation.
It enables discovery of novel pathogens, such as SARS-CoV-2.
It has been used to diagnose unexplained infections:
In a brain infection case, metagenomics detected Leptospira.
In a child with unexplained liver failure, metagenomics detected AAV2.
It can analyse polymicrobial infections where multiple organisms are present.
However, cost and data complexity currently limit routine clinical use.

🧬 1. Describe the structure and DNA-binding properties of transcription factors

Model Answer (Bullet Sentences)

Transcription factors are proteins that regulate gene expression by binding to DNA.
Many contain a helix-turn-helix (HTH) motif for DNA binding.
The recognition helix fits into the major groove of DNA.
Amino acid side chains interact with specific base sequences.
Binding is sequence-specific, allowing recognition of target genes.
Many transcription factors function as dimers.
Dimerisation allows binding to palindromic DNA sequences.
Palindromic sequences have symmetry, matching the two subunits of the protein.
Some transcription factors contain sensor domains to detect environmental signals.
These signals regulate whether the transcription factor activates or represses genes.
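
A palindromic DNA sequence reads the same 5'->3' on both strands, i.e. it equals its own reverse complement. A minimal check, with made-up function names, might look like this (GAATTC is the well-known EcoRI site, used here only as a familiar palindrome):

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(dna: str) -> bool:
    """True if the sequence is identical to its reverse complement."""
    return dna.upper() == reverse_complement(dna)


print(is_palindromic("GAATTC"))  # True
print(is_palindromic("GATTAC"))  # False
```

Each half of such a site presents the same sequence to one subunit of a dimer, which is why dimeric transcription factors often recognise palindromes.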

🧪 2. Describe Chromatin Immunoprecipitation (ChIP) and how it is used to identify transcription factor binding sites

Model Answer (Bullet Sentences)

ChIP is a method used to identify DNA regions bound by transcription factors in vivo.
Cells are grown under conditions where the transcription factor is active.
Formaldehyde is added to cross-link proteins to DNA.
The DNA-protein complexes are sheared into smaller fragments.
An antibody specific to the transcription factor is used for immunoprecipitation.
The antibody pulls down DNA fragments bound to the protein.
Cross-links are reversed to release DNA fragments.
DNA is sequenced (ChIP-seq) to identify binding sites across the genome.
The result shows regions where transcription factors bind and regulate genes.
Computer analysis: use enterseq on Galaxy to extract 400 bp around each peak, enter the extracted DNA sequences into MEME to identify palindromic sequences, and use Artemis for further analysis.
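
The "extract 400 bp around each peak" step can be sketched as below. This assumes the window is centred on the peak summit (200 bp each side) and is clipped at the ends of the sequence; the function name, the toy genome, and the peak positions are all hypothetical.

```python
def extract_window(genome: str, summit: int, width: int = 400) -> str:
    """Return up to `width` bases centred on a 0-based peak summit position."""
    half = width // 2
    start = max(0, summit - half)          # do not run off the start of the genome
    end = min(len(genome), summit + half)  # do not run off the end of the genome
    return genome[start:end]


genome = "ACGT" * 500  # toy 2,000 bp sequence standing in for a reference genome
peak_summits = [1000, 50]  # hypothetical ChIP-seq peak positions

windows = [extract_window(genome, summit) for summit in peak_summits]
print([len(w) for w in windows])  # [400, 250]
```

The second window is shorter because the peak lies within 200 bp of the start of the sequence. The extracted windows are what would then be passed to a motif finder.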

🧬 3. Evaluate the strengths and limitations of using ChIP-seq

Strengths:

ChIP allows identification of transcription factor binding sites in vivo.
It captures protein–DNA interactions under physiological conditions.
ChIP-seq enables genome-wide analysis of binding sites.
It does not require prior knowledge of DNA target sequences.
It can identify regulatory regions such as promoters and enhancers.
It can be used to study histone modifications as well as transcription factors.
High-throughput sequencing provides high-resolution binding maps.

Limitations:

Requires high-quality, specific antibodies for the protein of interest.
Cross-linking efficiency can vary, affecting results.
Resolution is limited by DNA fragment size after shearing.
Background noise and non-specific binding can occur.
Data analysis is computationally complex.
It does not directly show whether binding leads to gene expression changes.
It provides a snapshot in time, not dynamic binding over time.

📊 5. Explain the applications and limitations of RNA-seq

Model Answer (Bullet Sentences)

Applications:

RNA-seq measures the transcriptome (all RNA molecules in a cell).
It quantifies gene expression levels across the genome.
It can identify novel transcripts and alternative splicing events.
It allows comparison of gene expression between conditions (e.g. disease vs healthy).
It is useful for studying global transcriptional responses.

Limitations:

RNA-seq does not directly identify transcription factor binding sites.
It cannot distinguish direct vs indirect regulatory effects.
Strand information may be lost during second-strand synthesis.
It requires a reference genome for accurate mapping.
Data analysis is complex and computationally intensive.
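
Comparing expression between conditions is often summarised as a log2 fold change. The sketch below is one simple way to compute it, assuming already-normalised expression values and a pseudocount of 1 to avoid dividing by zero; neither choice comes from these notes, and the numbers are invented.

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of expression in treated vs control, with a pseudocount added to both."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical normalised expression values for two genes.
print(log2_fold_change(treated=7, control=1))  # 2.0  (up-regulated about 4-fold)
print(log2_fold_change(treated=0, control=3))  # -2.0 (down-regulated about 4-fold)
```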

🧬 Q1. Describe the workflow of RNA sequencing (RNA-seq) from sample preparation to data analysis.

✅ Model Answer (bullet sentences)

1. RNA extraction

RNA is isolated from cells.
Lysozyme breaks cell walls, proteases remove proteins, DNases degrade DNA.
Goal is a pure RNA sample.

2. Reverse transcription to cDNA

RNA is converted into complementary DNA (cDNA).
Uses reverse transcriptase and random oligonucleotide primers.
First-strand cDNA retains strand information (this may be lost if second-strand synthesis occurs).

3. Library preparation

Short DNA adapters are added to cDNA fragments.
Adapters allow binding to sequencing flow cell.
Barcodes may be added for multiplexing samples.
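
Multiplexing means reads from several samples are sequenced together and later separated by barcode. The sketch below assumes, for simplicity, that the barcode sits at the start of each read; on real instruments the index is often read separately. The barcodes, sample names, and reads are made up.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using a fixed-length barcode at the start of each read."""
    barcode_length = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        # Reads with an unrecognised barcode are kept separately.
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


barcodes = {"ACGT": "sampleA", "TTAG": "sampleB"}  # hypothetical barcodes
pooled_reads = ["ACGTGGGCCC", "TTAGAAATTT", "ACGTCCCGGG", "GGGGAAAAAA"]
demuxed = demultiplex(pooled_reads, barcodes)
print(demuxed)
# {'sampleA': ['GGGCCC', 'CCCGGG'], 'sampleB': ['AAATTT'], 'unassigned': ['AAAAAA']}
```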

4. Clonal amplification

cDNA fragments are amplified using bridge amplification.
Creates clusters of identical DNA sequences.
Ensures detectable signal during sequencing.

5. Sequencing

Nucleotides are incorporated into DNA strands.
Each base emits a signal (light) that is recorded.
Produces sequence reads.

6. Data generation and analysis

Output stored as FASTQ files (reads + quality scores).
Reads aligned to reference genome (FASTA) using tools (e.g. Bowtie2).
Generates BAM files (mapped reads).
Data analysed to determine gene expression levels.
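
One common way to turn mapped read counts into expression levels is TPM (transcripts per million). It is not named in these notes, so treat the sketch below as one possible normalisation rather than the required method; the gene names, counts, and lengths are invented.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts per gene into transcripts per million (TPM)."""
    # Step 1: divide each count by gene length in kilobases (reads per kilobase).
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: scale so that the values across all genes sum to one million.
    scale = sum(rates.values())
    return {gene: rate / scale * 1_000_000 for gene, rate in rates.items()}


# Hypothetical genes with the same read count but different lengths.
counts = {"geneA": 100, "geneB": 100}
lengths_bp = {"geneA": 1000, "geneB": 2000}

expression = tpm(counts, lengths_bp)
for gene, value in expression.items():
    print(gene, round(value))
# geneA 666667
# geneB 333333
```

Both genes have 100 reads, but geneB is twice as long, so its length-corrected expression is half that of geneA.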

🧬 3. Compare ChIP with traditional methods for studying protein-DNA interactions

Model Answer (Bullet Sentences)

Gel shift assay (EMSA) detects protein-DNA binding by reduced mobility in a gel.
DNA bound to protein moves slower than free DNA.
It is simple but low throughput and requires known DNA targets.
DNA footprinting identifies exact binding sites using DNase digestion.
Bound regions are protected from cleavage, revealing precise binding locations.
It provides high resolution but requires purified protein and known DNA.
ChIP differs by studying binding in living cells (in vivo).
ChIP does not require prior knowledge of DNA targets.
ChIP combined with sequencing (ChIP-seq) allows genome-wide analysis.
Therefore, ChIP is more powerful and high-throughput than traditional methods.