Central Dogma
DNA → RNA → Protein
______ is all genes that are transcribed
Transcriptome
____ are transcribed but spliced out
Introns
_______ was the old way of measuring gene expression
Western/Northern blotting
qPCR
Extract RNA (the product of DNA transcription) from the sample
Reverse transcribe to cDNA
Obtain primers
Conduct qPCR
RNA sequencing
Unbiased, no preconceived hypothesis
Can quantify 20,000 genes in a sample
Single cell RNA sequencing
Provides the amount of each transcript present in each individual cell (spatial methods additionally provide xy coordinates)
Can gain information on the _____ and ______ state of a gene
physiological; pathological
Steps of RNA to cDNA (5 steps)
Extract RNA from a sample and ensure it is high quality
Fragment into smaller pieces for sequencing
Reverse transcribe to cDNA
Add adapters to help bind to sequencing flow cell in the machine
Amplify cDNA with PCR
Steps of cDNA to raw files
cDNA library is loaded onto Illumina flow cell
DNA fragments are amplified on the flow cell (bridge PCR) to form clusters
Sequencing by synthesis, adding fluorescently labeled nucleotides one at a time
Convert raw fluorescence data into FASTQ files
What are FASTQ files?
Raw sequence data with quality scores
Quality score = confidence that the sequence is correct
Identifier = name of the read (the line starting with @), often with instrument and run information
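As a rough illustration of the four-line FASTQ record, the sketch below parses one made-up read and decodes its quality string. It assumes the common Phred+33 encoding; the read name, sequence, and function names are invented for the example.

```python
def parse_fastq_record(lines):
    """Split one 4-line FASTQ record into identifier, sequence, and Phred scores."""
    header, sequence, separator, quality = [line.strip() for line in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    # Assumed Phred+33 encoding: score = ASCII code of the character minus 33
    scores = [ord(ch) - 33 for ch in quality]
    return header[1:], sequence, scores


def error_probability(q):
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Hypothetical record: identifier, bases, separator, one quality character per base
record = ["@read_1", "GATTACA", "+", "IIIII5!"]
read_id, seq, scores = parse_fastq_record(record)
print(read_id, seq, scores)
```

A score of 40 means roughly a 1 in 10,000 chance that the base call is wrong; a score of 20 means 1 in 100.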
What occurs after files are converted to FASTQ files?
Alignment of sequences with a reference genome
Normalization of raw counts
Why must data be normalized?
Data may be collected under different conditions
Ranges and means may vary
Ensures samples are comparable
What is counts per million?
Normalizes raw counts, scaling for different library sizes
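A minimal sketch of counts per million, assuming each sample is a simple gene-to-count mapping. The gene names and counts are made up; the second sample was sequenced twice as deeply as the first.

```python
def counts_per_million(counts):
    """counts: dict mapping gene -> raw read count for one sample."""
    library_size = sum(counts.values())
    # Scale each count so that the sample's total would be one million reads
    return {gene: c * 1_000_000 / library_size for gene, c in counts.items()}


sample_a = {"geneA": 100, "geneB": 300, "geneC": 600}    # library size 1,000
sample_b = {"geneA": 200, "geneB": 600, "geneC": 1200}   # library size 2,000

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
print(cpm_a["geneA"], cpm_b["geneA"])
```

The raw counts differ by a factor of two, but the CPM values match, which is the point of scaling by library size.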
Differential gene expression
Identifies genes with statistically significant differences in expression between conditions
Determines which genes are up or down regulated
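To make "up or down regulated" concrete, here is a small sketch that computes a log2 fold change between two conditions. The pseudocount of 1 and the threshold of 1 (a two-fold change) are assumptions for illustration, and the expression values are invented. This covers only the effect size; a real analysis also needs a statistical test.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of mean expression; the pseudocount avoids dividing by zero."""
    mean_t = sum(treated) / len(treated)
    mean_c = sum(control) / len(control)
    return math.log2((mean_t + pseudocount) / (mean_c + pseudocount))


def direction(lfc, threshold=1.0):
    """Label a gene by the sign and size of its log2 fold change."""
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical normalized expression values: (control replicates, treated replicates)
genes = {
    "geneA": ([7, 7, 7], [31, 31, 31]),
    "geneB": ([15, 15, 15], [3, 3, 3]),
    "geneC": ([10, 10, 10], [10, 10, 10]),
}
results = {g: direction(log2_fold_change(t, c)) for g, (c, t) in genes.items()}
print(results)
```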
What is Principal Component Analysis?
Converts high-dimensional data to low-dimensional data while retaining as much variance as possible
Identifies dominant trends in gene expression data
Identifies the major sources of variance
PCA uses a ____ model to find relationships in data
linear
PCA workflow (3 steps)
Scores: table of how each sample fits on each PC
Loadings: weights of the original variables on each PC
Graph PCs and observe clustering and variance
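The three steps above can be sketched for the first principal component only. This is a simplified illustration using power iteration on the covariance matrix, not how statistical packages typically compute PCA; the function name and the four two-gene samples are made up.

```python
import math


def first_principal_component(rows, iterations=100):
    """rows: list of samples, each a list of (log) expression values per gene.
    Returns (scores, loadings, fraction of variance explained) for PC1."""
    n, p = len(rows), len(rows[0])
    # Center each gene (column) on its mean
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix between genes
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration: repeatedly multiply by cov and renormalize
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    # Scores: projection of each centered sample onto the loadings
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return scores, v, eigenvalue / total_variance


# Hypothetical data: two control samples, then two treated samples
samples = [[2.0, 1.0], [3.0, 2.0], [8.0, 7.0], [9.0, 8.0]]
scores, loadings, explained = first_principal_component(samples)
print(scores, loadings, explained)
```

In this toy data the two genes move together, so PC1 weights them equally and separates the control samples (negative scores) from the treated ones (positive scores).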
Housekeeping genes
Genes that are needed for basic cell maintenance; expressed at relatively constant levels in all cells
Why is log transformation important?
Makes the data closer to normally distributed and compresses the range of values, so samples are statistically comparable
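A short sketch of the transformation, assuming log base 2 and a prior count of 1 added to avoid taking the log of zero. Both choices are assumptions for the example, as are the input values.

```python
import math


def log2_transform(values, prior=1.0):
    """log2(x + prior) for each value; the prior keeps zeros defined."""
    return [math.log2(x + prior) for x in values]


cpm = [0.0, 1.0, 3.0, 1023.0]   # hypothetical CPM values spanning a wide range
log_cpm = log2_transform(cpm)
print(log_cpm)
```

Values that span three orders of magnitude on the raw scale end up between 0 and 10 on the log scale.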
Heteroskedasticity
Variance depends on the mean
On the log scale, lowly expressed genes have ___ variance, and highly expressed genes have ____ variance
high; low
The ____ method estimates _______ relationships in count data
voom; mean-variance
Batch effect
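Technical variation between groups of samples that were processed at different times or under different conditions; samples analyzed together share the same batch effect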
How can we account for the batch effect?
Each batch should have all phenotypes
Can also account for batch effect in R code
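One way to check the first point before sequencing is to confirm that no batch is missing a phenotype. The sketch below is a hypothetical helper, not part of any package, and the sample sheet is invented.

```python
from collections import defaultdict


def batches_missing_phenotypes(samples):
    """samples: list of (sample_id, batch, phenotype) tuples.
    Returns {batch: set of phenotypes absent from that batch}."""
    all_phenotypes = {ph for _, _, ph in samples}
    seen = defaultdict(set)
    for _, batch, ph in samples:
        seen[batch].add(ph)
    return {b: all_phenotypes - phs
            for b, phs in seen.items() if all_phenotypes - phs}


# Balanced design: every batch contains both phenotypes
balanced = [("s1", "batch1", "control"), ("s2", "batch1", "treated"),
            ("s3", "batch2", "control"), ("s4", "batch2", "treated")]

# Confounded design: batch and phenotype cannot be separated
confounded = [("s1", "batch1", "control"), ("s2", "batch1", "control"),
              ("s3", "batch2", "treated"), ("s4", "batch2", "treated")]

print(batches_missing_phenotypes(balanced))
print(batches_missing_phenotypes(confounded))
```

In the confounded layout any difference between control and treated could equally be a difference between batches, and no later correction can untangle the two.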
DGEList 3 elements
Expression (counts), phenotype (samples), feature (genes)
What is the issue with limma, and why is it used with voom?
limma assumes normally distributed data, which raw counts are not; voom transforms the counts and estimates precision weights so that limma's linear models can be applied
What is limma?
Linear models
Models gene expression as a function of experimental conditions
Estimates log fold change
Computes statistical significance
Design matrix
Mathematical representation of the experimental conditions
Tells statistical models which samples belong to which group
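A minimal sketch of a design matrix with one indicator column per group, similar in spirit to what `model.matrix(~0 + group)` produces in R. The helper name and group labels are assumptions for the example.

```python
def design_matrix(groups):
    """One row per sample, one 0/1 column per group (no intercept column)."""
    levels = sorted(set(groups))
    matrix = [[1 if g == level else 0 for level in levels] for g in groups]
    return levels, matrix


groups = ["control", "control", "treated", "treated"]   # hypothetical samples
levels, design = design_matrix(groups)
print(levels)
for row in design:
    print(row)
```

Each row has a single 1 marking the group that sample belongs to.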
Contrast Matrix
Specifies the groups to compare in differential expression analysis
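A contrast can be thought of as a set of weights applied to the fitted group coefficients. The sketch below assumes the coefficients are mean log2 expression values per group for one gene; the group names and numbers are invented.

```python
def apply_contrast(coefficients, contrast):
    """Weighted sum of group coefficients; groups not named get weight 0."""
    return sum(contrast.get(name, 0) * value
               for name, value in coefficients.items())


# Hypothetical mean log2 expression of one gene in each group
coefficients = {"control": 5.0, "drugA": 7.5, "drugB": 6.0}

drugA_vs_control = apply_contrast(coefficients, {"drugA": 1, "control": -1})
drugA_vs_drugB = apply_contrast(coefficients, {"drugA": 1, "drugB": -1})
print(drugA_vs_control, drugA_vs_drugB)
```

Because the coefficients are on the log2 scale, each result is a log fold change for that comparison.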
What is a p-value?
The probability of observing results at least as extreme as those seen, assuming the null hypothesis is true; it is not the probability that the null hypothesis is true
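Why is it important to adjust the p-value?
Thousands of genes are tested at once, so some will look significant by chance; adjustment controls the proportion of false positives among the significant results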
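The card does not name an adjustment method. As an illustration, the sketch below assumes the Benjamini-Hochberg procedure, a common choice for controlling the false discovery rate; the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]   # hypothetical p-values for four genes
adj = benjamini_hochberg(raw)
significant_raw = sum(p < 0.05 for p in raw)
significant_adj = sum(p < 0.05 for p in adj)
print(adj, significant_raw, significant_adj)
```

With these numbers, three genes pass a 0.05 cutoff on the raw p-values but only one passes after adjustment.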
Type I error = _______, Type II error = _____
false positives; false negatives
Reducing _______ increases ________
type I error, type II error