Transcriptome
- all the RNA molecules transcribed from a genome.
- It also varies according to the function and structure of a cell
Transcriptomics
- the study of the complete transcriptome expressed by a specific cell (or population of cells) at a specific time point and/or under specific conditions
- most transcriptomic studies are focused on differential gene expression and are more interested in mRNA.
- in that case you use the polyA enrichment method.
Why do you use a polyA enrichment method when looking at mRNA in transcriptomics?
It allows you to selectively isolate messenger RNA from the total RNA pool.
- Only mature mRNA in eukaryotes has a polyA tail
Types of RNA
1. Protein synthesis:
- mRNA, tRNA, rRNA
2. Co/post transcriptional modification/DNA replication
- small nuclear RNA (snRNA) and small nucleolar RNA (snoRNA)
3. Regulatory
- miRNA, piRNA, siRNA and lncRNA
4. Parasitic
- viral RNA
Transcription
- In eukaryotes, the coding region is often split into segments (exons) by one or more non-coding introns.
- The entire gene sequence is transcribed into pre-mRNA.
- The introns are excised and the exons are spliced together to make the mature mRNA molecule.
- Exon splicing is carried out by snRNAs and proteins in the spliceosome.
Transcriptome methods
1. Microarray
2. RNAseq (cDNAseq)
Microarray
a grid of DNA segments of known sequence is used to test and map DNA fragments, antibodies, or proteins.
- The probes target known transcripts, and changes in expression level are detected by fluorescence
Microarray advantages
- high precision within its dynamic range.
- higher throughput
- lower cost
Microarray disadvantages
- No alternative splicing information
- Balanced chromosomal rearrangements are not detected
- Imbalances in regions not included on the array platform are not detected
- low dynamic range (about 3 orders of magnitude between a lowly expressed gene and a highly expressed gene)
RNA seq
The sequencing technique used to determine directly the nucleotide sequence of a collection of RNAs.
- It is a more transcriptome-wide analysis
RNA-seq advantages
- Don't need to know the sequence before the assay
- Learn about what isoforms are expressed in a cell
- higher reproducibility
- Higher dynamic range
- higher precision
RNA-seq workflow
1. In vivo
Pre-mRNA is transcribed, intron splicing occurs, and you get a mature mRNA
2. In vitro
RNA is fragmented and reverse transcribed into ds-cDNA fragments, which then go through high-throughput sequencing.
3. In silico
Library prep, Mapping and alignment, Gene expression estimates
RNAseq
- Usually you use Illumina (a short-read technology)
Illumina TruSeq RNA protocol: what is it and how does it work?
- a commonly used kit for RNA-seq library preparation.
- it is focused on mRNA.
How it works:
- It uses oligo(dT) (polyT) beads that are complementary to the polyA tail of messenger RNA.
- This selects for mRNA.
- Random primers are added to make the first cDNA strand.
- The RNA strand is removed and a second cDNA strand is made (this is the ds-cDNA from the messenger RNA transcript).
- End repair, phosphorylation, A-tailing (a single A overhang on the 3' ends) and adapter ligation follow.
- Then PCR amplification and sequencing.
What do the universal p5 and p7 adapters do?
- they hybridise to complementary oligos on the Illumina flow cell, which anchors your library fragments to it
- without them, your fragments would wash away during sequencing
What is the goal of an index in an adapter?
- allows for multiplexing of samples
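A minimal sketch of what multiplexing means in practice: reads from a pooled run are sorted back into samples by their index sequence. The index sequences and sample names below are made up for illustration, and real demultiplexing software also tolerates mismatches in the index, which this sketch does not.

```python
from collections import defaultdict

def demultiplex(reads, index_to_sample):
    """Group reads by sample using the index read that came with each one.

    reads: iterable of (index_sequence, read_sequence) pairs.
    index_to_sample: dict mapping an index sequence to a sample name.
    Reads with an unknown index go into the "undetermined" bin.
    """
    bins = defaultdict(list)
    for index_seq, read_seq in reads:
        sample = index_to_sample.get(index_seq, "undetermined")
        bins[sample].append(read_seq)
    return dict(bins)

# Hypothetical 6-base indexes for two pooled samples.
index_to_sample = {"ATCACG": "control", "CGATGT": "treated"}
pooled_reads = [
    ("ATCACG", "GGCTAAGT"),
    ("CGATGT", "TTAGCCGA"),
    ("ATCACG", "ACGTTGCA"),
    ("NNNNNN", "CCCCAAAA"),  # index could not be read
]
by_sample = demultiplex(pooled_reads, index_to_sample)
for sample, seqs in by_sample.items():
    print(sample, len(seqs))
```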
Why do paired-end sequencing?
- if you want more information in highly repetitive regions or across exon junctions, because sequencing both ends of each fragment makes reads easier to place in those regions
- Improves accuracy for the detection of differential expression for low-expressed genes.
Why do single-end sequencing?
- it is cheaper than paired-end (PE)
- faster turn-around
- smaller data footprint
RNA quality check
- prior to creating a library, you must look at the quality of your RNA.
- you can do so by using a gel (you can see contamination and noise in the gel)
- you can also run a fluorometric assay called Qubit to look at your RNA quality
- you can also use a bioanalyser, which gives you an RNA integrity number (RIN); you want a number above 8
What's an important thing to look for when creating an RNA library?
- Presence of your ribosomal RNA.
- Humans have 18S/28S rRNA; a good sample will show two bands corresponding to the two RNAs.
Qubit
- will give you an RNA integrity and quality score.
- it does so by utilising two unique dyes: one binds to large, intact and/or highly structured RNA (mRNA, tRNA, rRNA) and the other selectively binds to small and/or degraded RNA.
- Together they enable you to quickly assess the quality and integrity of the RNA sample.
- a score is given from 1-10, where a small number indicates the sample is compromised and a large number indicates that the sample consists mainly of large RNA
Bioanalyser (electropherogram)
- A bioanalyser gives you an electropherogram, a graph showing the fluorescence intensity versus fragment size.
- the main peaks show the 18S and 28S rRNA respectively; other peaks indicate degraded RNA etc.
Bioanalyser (electropherogram) for plants
- Plants have more ribosomal RNA, which affects the score, so you cannot have much confidence in the number when you do this for plants
- no single metric is sufficient; rely on judgment and experience
Library QC
- you need to have a uniform fragment size of around 260 bp.
- if you see multiple peaks, you do not have a uniform fragment size, which is not ideal for sequencing
Why do you have to do mRNA enrichment to look at gene expression?
- mRNA enriched library gives roughly 40% more sequence against exons (i.e. message).
Sequence QC
1. Filter for quality.
2. Trim adapters (with longer reads there is a lot of adapter contamination, because the read can run past the end of a short insert into the adapter)
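The two QC steps above can be sketched as follows. This is only an illustration under simplifying assumptions: the mean-quality filter, the Phred+33 quality encoding, and the exact-match adapter search are choices made for this sketch, and the adapter sequence is used purely as an example. Dedicated trimming tools also handle partial and mismatched adapter matches.

```python
def passes_quality(quality_line, min_mean_q=30, offset=33):
    """Step 1: keep a read only if its mean Phred score is high enough.

    quality_line is the quality string of a FASTQ record; each character
    encodes one score as chr(score + offset) (Phred+33 assumed here).
    """
    scores = [ord(ch) - offset for ch in quality_line]
    return bool(scores) and sum(scores) / len(scores) >= min_mean_q

def trim_adapter(read, adapter):
    """Step 2: cut the read at the first exact occurrence of the adapter."""
    position = read.find(adapter)
    return read if position == -1 else read[:position]

# Illustrative adapter sequence and reads (assumptions for this example).
ADAPTER = "AGATCGGAAGAGC"
read = "ACGTACGT" + ADAPTER + "TTTT"
print(trim_adapter(read, ADAPTER))
print(passes_quality("IIII"), passes_quality("++++"))
```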
Phred score/FastQ
- a measure of the quality of a base call in DNA sequencing: it encodes the probability that the base call is incorrect.
- for most sequencing experiments you want at least 30
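As a worked example, the Phred score Q relates to the error probability P by Q = -10 * log10(P), so a score of 30 means a 1 in 1000 chance that the base call is wrong. The sketch below converts between the two and decodes a FASTQ quality string, assuming the common Phred+33 encoding; the function names are just illustrative.

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score for a base call with error probability p."""
    return -10 * math.log10(p)

def decode_fastq_quality(quality_line, offset=33):
    """Turn a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_line]

for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_prob(q):g}")

print(decode_fastq_quality("I?5+"))
```

Under this encoding the character `?` corresponds to Q30, the threshold mentioned above.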
What percentage of your reads won't be mapped to the reference genome?
- roughly 20%; this is normal and could be due to contamination, sequencing error etc.
What percentage of reads are PCR duplicates on average, and what do we use to detect them?
- roughly 15%; we correct for this by using a unique molecular identifier (UMI).
- some adapters have a UMI, which allows you to identify and remove any PCR duplicates present
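A rough sketch of how a UMI is used to remove PCR duplicates: reads that map to the same position and carry the same UMI are assumed to be copies of one original molecule, so only one is kept. The tuple layout and example values are assumptions for illustration, and real deduplication tools also allow for sequencing errors within the UMI.

```python
def deduplicate_by_umi(reads):
    """Keep the first read seen for each (chromosome, position, UMI).

    reads: iterable of (chrom, position, umi, sequence) tuples.
    """
    seen = set()
    unique_reads = []
    for chrom, position, umi, sequence in reads:
        key = (chrom, position, umi)
        if key not in seen:
            seen.add(key)
            unique_reads.append((chrom, position, umi, sequence))
    return unique_reads

# Hypothetical mapped reads: the first two are PCR duplicates
# (same position, same UMI); the third shares the position but has a
# different UMI, so it came from a different original molecule.
mapped = [
    ("chr1", 1000, "ACGT", "GGCTAAGT"),
    ("chr1", 1000, "ACGT", "GGCTAAGT"),
    ("chr1", 1000, "TTAG", "GGCTAAGT"),
    ("chr2", 500, "ACGT", "CCATGGAA"),
]
deduplicated = deduplicate_by_umi(mapped)
print(len(mapped), "reads before,", len(deduplicated), "after")
```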
How do I determine quantity and compare gene expression (after QC and mapping)?
- Normalise for sequencing depth, i.e. the number of reads per sample (between samples/libraries), and for gene length.
- This is critical for both within-sample and between-sample comparisons. Even though gene X is the same length in every sample, the library sizes of the samples will vary, so expression must be normalised.
What is RPKM?
- Reads per kilobase of exon model per million reads
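- it is a within-sample normalisation method that removes the transcript length and library size effects.
- it basically tells you how active a gene is while adjusting for gene size and how deeply you sequenced the sample.
How do you calculate it?
1. Count the total reads in the sample and divide by 1 million (this is the "per million" scaling factor).
2. Divide each gene's read count by this scaling factor to normalise for sequencing depth (this gives reads per million, RPM).
3. Divide the RPM by the gene length in kilobases.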
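The RPKM calculation can be sketched in a few lines. The gene names, read counts and lengths are invented for illustration, and the function assumes one count and one length (in bp) per gene.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads for each gene.

    counts: dict of gene -> read count in one sample.
    lengths_bp: dict of gene -> gene (exon model) length in base pairs.
    """
    per_million = sum(counts.values()) / 1_000_000  # depth scaling factor
    result = {}
    for gene, count in counts.items():
        rpm = count / per_million                   # depth-normalised
        result[gene] = rpm / (lengths_bp[gene] / 1000)  # then per kb
    return result

# Hypothetical sample with 1 million reads in total.
counts = {"geneA": 100_000, "geneB": 300_000, "geneC": 600_000}
lengths_bp = {"geneA": 2000, "geneB": 1000, "geneC": 4000}
rpkm_values = rpkm(counts, lengths_bp)
for gene, value in rpkm_values.items():
    print(gene, value)
```

In this example geneC has the most reads, but geneB has the highest RPKM because it is four times shorter.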
What is FPKM?
- Fragments per kilobase of transcript per million mapped reads
- similar to RPKM but used for paired-end reads
What is TPM?
- Transcripts per million.
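- Here you do the gene length correction first, then the sequencing depth correction.
- This means the TPM values sum to the same total (1 million) in every sample, so the denominator is the same between samples.
1. Divide the read counts by the length of each gene in kb (this gives reads per kilobase, RPK).
2. Sum all the RPK values in a sample and divide by 1 million (this is the "per million" scaling factor).
3. Divide each RPK value by this scaling factor.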
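A short sketch of the TPM calculation, using invented gene names, read counts and lengths. The point to notice is that the values always add up to 1 million, whatever the sequencing depth.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for each gene in one sample.

    counts: dict of gene -> read count.
    lengths_bp: dict of gene -> gene length in base pairs.
    """
    # Length correction first: reads per kilobase (RPK).
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Depth correction second: scale so the values sum to 1 million.
    per_million = sum(rpk.values()) / 1_000_000
    return {gene: value / per_million for gene, value in rpk.items()}

# Hypothetical sample.
counts = {"geneA": 100_000, "geneB": 300_000, "geneC": 600_000}
lengths_bp = {"geneA": 2000, "geneB": 1000, "geneC": 4000}
tpm_values = tpm(counts, lengths_bp)
for gene, value in tpm_values.items():
    print(gene, value)
print("total:", sum(tpm_values.values()))
```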
What FDR (false discovery rate) do we go for?
- around 0.05 or less
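To show how an FDR threshold is applied, here is a sketch that assumes the Benjamini-Hochberg procedure. The notes do not say which FDR method is used, so this choice is an assumption, as are the example p-values. Each p-value is adjusted, and genes with an adjusted value at or below 0.05 would be called differentially expressed.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
significant = [p <= 0.05 for p in adjusted_p]
print(adjusted_p)
print(significant)
```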
What type of analysis can you do with your data?
- heatmaps
- Principal component analysis (PCA)
- Functional annotation of differentially expressed genes (DEGs)
CLIP-seq
Stands for Crosslinking and Immunoprecipitation sequencing
Used to identify binding sites of RNA-binding proteins (RBPs) on RNAs
Combines UV crosslinking, immunoprecipitation (IP), and high-throughput sequencing
Reveals where on the RNA a protein binds — useful for studying gene regulation, splicing, stability, etc.
CLIP-seq workflow
1. UV Crosslinking
Irradiate living cells with UV light to covalently link RBPs to bound RNA.
2. Cell Lysis
Break open cells to extract RNA-protein complexes.
3. Immunoprecipitation (IP)
Use an antibody to pull down the specific RBP (and its bound RNAs).
4. RNase Digestion
Partially digest RNA to trim unbound regions, leaving the protected fragment.
5. Gel Purification
Run complexes on a gel, cut out the desired band (RBP-RNA complex).
6. Proteinase Treatment
Remove protein, leaving a short RNA fragment that was bound.
7. cDNA Library Preparation
Convert RNA fragment to cDNA, add sequencing adapters.
8. High-throughput Sequencing
Sequence the cDNA to identify RNA fragments.
9. Data Analysis
Map reads to the genome to find binding sites and target RNAs.
Single cell sequencing
- useful when you are interested in a rare cell type
- You have to have a single cell suspension.
- then form your libraries
What is the difference between bulk RNA seq and single cell RNA seq?
🔬 1. Sample Resolution
Bulk RNA-seq: Measures average gene expression across a large population of cells. It blends all the RNA from the sample together.
Single-cell RNA-seq: Measures gene expression at the level of individual cells, capturing cell-to-cell variability.
📊 2. Biological Insight
Bulk RNA-seq: Useful for understanding overall gene expression patterns in tissues or large cell populations.
Single-cell RNA-seq: Allows identification of rare cell types, cell states, and heterogeneous responses that bulk methods would average out.
Single cell sequencing workflow
- To make a library, we use 10x Chromium.
- Then you would do your adapter ligation and put it on an Illumina sequencer
- Three lanes:
1. Cells and RT master mix
2. Gel bead (could be a specific colour bead for each sample)
3. Oil + recovery well
- The beads and the cell/enzyme mix enter from different entry points and then mix with the water and oil components.
- Each droplet in the water-in-oil emulsion encapsulates 1 bead.
- about 65% of these beads will have a single cell attached to them.
- After that you break your emulsion, amplify the cDNA and construct the library.
- 25k cells per lane, which is approx. 200k cells in a day