Bioinformatics Lec1

34 Terms

1

Central Dogma

DNA → RNA → Protein

2

The ______ is the full set of RNA transcripts produced from the genes that are transcribed in a cell or sample

Transcriptome

3

____ are transcribed but then spliced out of the pre-mRNA

Introns

4

_______ was the older way of measuring gene expression, one target at a time

Northern blotting (RNA) / Western blotting (protein)

5

qPCR

  1. Isolate RNA from the sample (already transcribed from DNA in the cell)

  2. Reverse transcribe to cDNA

  3. Obtain primers

  4. Conduct qPCR

6

RNA sequencing

  • Unbiased, no preconceived hypothesis

  • Can quantify 20,000 genes in a sample

7

Single cell RNA sequencing

  • Measures the amount of each transcript present in individual cells; spatial methods additionally record xy coordinates

8

Can gain information on the _____ and ______ state of a gene

physiological; pathological

9

Steps of RNA to cDNA (5 steps)

  1. Extract RNA from a sample and ensure it is high quality

  2. Fragment into smaller pieces for sequencing

  3. Reverse transcribe to cDNA

  4. Add adapters so the fragments can bind to the sequencing flow cell

  5. Amplify cDNA with PCR

10

Steps of cDNA to raw files

  1. cDNA library is loaded onto Illumina flow cell

  2. Bound fragments are amplified on the flow cell (bridge amplification) to form clusters

  3. Sequencing by synthesis, adding fluorescently labeled nucleotides one at a time

  4. Convert raw fluorescence data into FASTQ files

11

What are FASTQ files?

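  • Raw sequence data with quality scores; each read takes four lines (identifier, sequence, separator, quality)

Quality score = Phred-scaled confidence that each base call is correct

Identifier = the line starting with @ that names the read, usually with instrument and flow cell details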
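
A minimal Python sketch of reading one FASTQ record, assuming the common Phred+33 quality encoding. The read name `read_1` and the function names are made up for illustration.

```python
def parse_fastq_record(lines):
    """Split one 4-line FASTQ record into identifier, bases, and Phred scores."""
    header, sequence, separator, quality = [line.strip() for line in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    # Assumption: Phred+33 encoding, so ASCII code minus 33 is the score Q
    phred = [ord(ch) - 33 for ch in quality]
    return header[1:], sequence, phred


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Hypothetical record: 7 bases, one quality character per base
record = ["@read_1", "GATTACA", "+", "IIII5I!"]
read_id, bases, phred = parse_fastq_record(record)
print(read_id, bases, phred)
print(error_probability(20))  # Q20 means a 1 in 100 chance the base is wrong
```

Higher Q means more confidence: Q40 corresponds to a 1 in 10,000 chance of a wrong base call.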

12

What occurs after files are converted to FASTQ files?

  • Alignment of reads to a reference genome, then counting the reads assigned to each gene

  • Normalization of raw counts

13

Why must data be normalized?

  • Data may be collected under different conditions

  • Ranges and means may vary

  • Ensures samples are comparable

14

What is counts per million?

Normalizes raw counts by scaling for library size: CPM = (reads for a gene / total reads in the sample) × 1,000,000
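
A small Python sketch of the calculation, using made-up counts for two samples where the second was simply sequenced twice as deeply. Gene names and numbers are hypothetical.

```python
def counts_per_million(counts):
    """Scale each gene's raw count by the sample's library size."""
    library_size = sum(counts.values())
    return {gene: c * 1_000_000 / library_size for gene, c in counts.items()}


# Hypothetical raw counts: same proportions, different sequencing depth
sample_a = {"geneA": 100, "geneB": 300, "geneC": 600}    # library size 1,000
sample_b = {"geneA": 200, "geneB": 600, "geneC": 1200}   # library size 2,000

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
print(cpm_a)
print(cpm_b)
```

The raw counts differ twofold, but the CPM values match, which is the point of scaling by library size.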

15

Differential gene expression

  • Identifies genes with statistically significant differences in expression between conditions

  • Determines which genes are up- or downregulated

16

What is Principal Component Analysis?

  • Converts high-dimensional data to low-dimensional data while retaining as much of the variance as possible

  • Identifies dominant trends in gene expression data

  • Identifies major source of variance

17

PCA uses a ____ model to find relationships in data

linear

18

PCA workflow (3 steps)

  1. Scores: table of where each sample falls on each PC

  2. Loadings: weights of each original variable (gene) on each PC

  3. Plot the PCs and look at the clustering and the variance each PC explains
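
A rough Python sketch of how scores and loadings come out of PCA for the first PC only. It uses power iteration on the covariance matrix as a stand-in for the eigen-decomposition or SVD a real library would use. The data are hypothetical, and the function name is made up.

```python
import math


def first_principal_component(samples, iterations=200):
    """Return (loadings, scores, explained variance ratio) for PC1."""
    n, d = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    # Sample covariance matrix of the variables
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeated multiplication converges to the top eigenvector
    # (assumes the data vary and the start vector is not orthogonal to PC1)
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Scores: projection of each centered sample onto the loadings
    scores = [sum(r[j] * vec[j] for j in range(d)) for r in centered]
    pc_variance = sum(s * s for s in scores) / (n - 1)
    total_variance = sum(cov[j][j] for j in range(d))
    return vec, scores, pc_variance / total_variance


# Hypothetical data: 4 samples x 2 genes, gene 2 is always twice gene 1
samples = [[1, 2], [2, 4], [3, 6], [4, 8]]
loadings, scores, ratio = first_principal_component(samples)
print(loadings)  # weights of each gene on PC1
print(scores)    # position of each sample on PC1
print(ratio)     # share of total variance captured by PC1
```

Because the two genes move together perfectly here, a single PC captures essentially all of the variance, and gene 2 carries twice the weight of gene 1.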

19

Housekeeping genes

Genes that are needed for basic cell maintenance; expressed at fairly constant levels in essentially every cell

20

Why is log transformation important?

  • Makes the data closer to normally distributed and compresses the range, so values are easier to compare statistically
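
A short Python sketch of the transformation, assuming a pseudocount of 1 so that zero values do not break the logarithm. The pseudocount choice and the CPM values are illustrative only.

```python
import math


def log2_transform(values, pseudocount=1.0):
    """Return log2(value + pseudocount) for each value."""
    return [math.log2(v + pseudocount) for v in values]


# Hypothetical CPM values spanning six orders of magnitude
cpm_values = [0, 1, 1023, 1048575]
log_values = log2_transform(cpm_values)
print(log_values)  # [0.0, 1.0, 10.0, 20.0]
```

Values that ranged from 0 to over a million now sit between 0 and 20, so a few very highly expressed genes no longer dominate the analysis.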

21

Heteroskedasticity

Variance is not constant across the data; in count data it depends on the mean

22

On the log scale, lowly expressed genes have ___ variance, and highly expressed genes have ____ variance

High, low

23

The ____ method estimates _______ relationships in count data

voom; mean-variance

24

Batch effect

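Technical variation between groups of samples that were processed at different times or under different conditions (batches), unrelated to the biology being studied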

25

How can we account for the batch effect?

  • Each batch should include samples from every phenotype

  • Can also account for batch effect in R code, for example by including batch as a term in the model

26

DGEList: 3 elements

Expression (counts), phenotype (samples), feature (genes)

27

What is the issue with limma, and why is it used with voom?

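limma assumes roughly normal data, which raw counts are not; voom converts counts to log-CPM and estimates the mean-variance trend to assign a weight to each observation, so limma's linear models can then be applied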

28

What is limma?

  • Linear models

  • Models gene expression as a function of experimental conditions

  • Estimates log fold change

  • Computes statistical significance

29

Design matrix

  • Mathematical representation of the experimental conditions

  • Tells statistical models which samples belong to which group
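
A plain Python sketch of what such a matrix looks like, with one indicator column per group and no intercept (comparable in spirit to an R formula like `~0 + group`). The sample labels and the function name are hypothetical.

```python
def design_matrix(groups):
    """Build one 0/1 indicator column per group (no intercept column)."""
    levels = sorted(set(groups))
    rows = [[1 if g == level else 0 for level in levels] for g in groups]
    return levels, rows


# Hypothetical experiment: two control samples, two treated samples
groups = ["control", "control", "treated", "treated"]
levels, design = design_matrix(groups)

print(levels)
for sample_group, row in zip(groups, design):
    print(sample_group, row)
```

Each row is a sample and each column is a group; a 1 marks the group the sample belongs to.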

30

Contrast Matrix

  • Specifies the groups to compare in differential expression analysis
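
A tiny Python sketch of what a contrast does, assuming a model with one coefficient per group (the group's mean log2 expression for a gene). The coefficient values are invented for illustration.

```python
def apply_contrast(coefficients, contrast):
    """Weighted sum of group coefficients, e.g. treated minus control."""
    return sum(contrast[name] * coefficients[name] for name in contrast)


# Hypothetical fitted coefficients for one gene (mean log2 expression per group)
coefficients = {"control": 5.0, "treated": 7.0}

# Contrast "treated - control": +1 for treated, -1 for control
contrast = {"control": -1, "treated": 1}

log_fold_change = apply_contrast(coefficients, contrast)
print(log_fold_change)  # 2.0, i.e. a fourfold increase on the original scale
```

With more than two groups, each comparison of interest gets its own set of weights, and those sets together form the columns of the contrast matrix.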

31

What is a p-value?

The probability of seeing results at least as extreme as those observed, assuming the null hypothesis is true (it is not the probability that the null hypothesis is true)

32

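Why is it important to adjust the p-value?

Thousands of genes are tested at once, so some pass the cutoff by chance alone; adjustment (e.g. the Benjamini-Hochberg false discovery rate) limits the expected share of false positives among the results called significant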
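
One common adjustment is the Benjamini-Hochberg procedure. The Python sketch below is a plain implementation of that method, not necessarily the one used in the lecture, and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-values get smaller
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes
p_values = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(p_values)
print([round(p, 4) for p in adjusted])  # [0.04, 0.0533, 0.0533, 0.2]
```

At a 0.05 cutoff, three genes look significant on raw p-values but only one remains after adjustment.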

33

Type I error = _______, Type II error = _____

false positives; false negatives

34

For a fixed sample size, reducing _______ increases ________

type I error; type II error