genomics - single cell genomics


Last updated 5:42 AM on 3/5/25

53 Terms

1

advantages of sc sequencing over bulk sequencing

-obtain genomic information for every cell

-understand cell heterogeneity within the tissue

-distinguish cell population changes vs gene expression changes

2

scRNA seq applications

-resolve cellular heterogeneity

-identify rare cell populations

-trace lineage and developmental relationships between heterogeneous, yet related, cellular states

-mechanisms of heterogeneous drug response

3

sc technology challenges

-the amount of RNA in a single cell is lower than the amount needed for successful signal detection

-applying sequencing-based expression profiling to single cells therefore requires either increased sensitivity or amplification of the input RNA

4

single cell rna seq (Tang, 2009)

-modified from a single-cell microarray protocol, with longer RT incubation and PCR extension times

-full-length first-strand cDNAs

-used the SOLiD sequencing system from Applied Biosystems

-detected thousands more genes and hundreds of new splice junctions than a standard microarray experiment

5

Limitations of Tang 2009 method

-pronounced 3’ bias with the majority of reads mapping to the 3’ terminal portion of the transcripts

-severe limitation for the study of transcriptional start sites (TSS) as well as in the analysis of the different splice variants

-inefficiencies in the enzymatic reactions resulted in decreased sensitivity, with consequent loss of lowly expressed transcripts

-throughput: only 6 cells were sequenced

6

technologies developed to increase throughput

multiplexing → integrated fluidic circuits → liquid handling robotics → nanodroplets → picowells → in situ barcoding

7

GEMs

Gel beads in Emulsion

8

10x genomics

-uses microfluidic partitioning to capture single cells and prepare barcoded cDNA libraries for NGS

-cells are loaded at a limiting dilution so that most cell-containing GEMs hold a single cell, which keeps doublet formation low
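
The limiting-dilution idea can be illustrated with a Poisson model. This is a simplifying assumption that is not stated in the cards: cells are assumed to land in GEMs independently, with a mean of `lam` cells per GEM. The function names and the example values of `lam` are hypothetical.

```python
import math

def gem_occupancy(lam):
    """Poisson probabilities that a GEM holds 0, 1, or 2+ cells.

    lam is the assumed mean number of cells per GEM (illustrative parameter).
    """
    p0 = math.exp(-lam)
    p1 = lam * math.exp(-lam)
    p_multi = 1.0 - p0 - p1
    return p0, p1, p_multi

def multiplet_fraction(lam):
    """Fraction of cell-containing GEMs that hold more than one cell."""
    p0, _, p_multi = gem_occupancy(lam)
    return p_multi / (1.0 - p0)

# Lower loading leaves more GEMs empty but makes multiplets rarer.
for lam in (0.05, 0.1, 0.5):
    p0, p1, _ = gem_occupancy(lam)
    print(f"lam={lam}: empty={p0:.3f} single={p1:.3f} "
          f"multiplets among occupied={multiplet_fraction(lam):.3f}")
```

Under this model the trade-off is explicit: diluting the cells wastes GEMs but lowers the share of occupied GEMs that are doublets or larger multiplets.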

9

UMIs

Unique molecular identifiers

-random short nucleotide sequences with a very low likelihood of a duplicate UMI within a single bead

-each captured molecule is tagged with its own UMI before amplification

-used to distinguish PCR duplicates from true biological diversity
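
A minimal sketch of how UMIs separate PCR duplicates from distinct molecules: reads that share the same cell barcode, gene, and UMI are collapsed into one molecule. The barcodes, UMI sequences, and the function name are made up for illustration, and the sketch assumes error-free UMIs (production pipelines also correct sequencing errors in the UMI).

```python
from collections import defaultdict

def count_molecules(reads):
    """Collapse reads to unique (cell barcode, gene, UMI) combinations.

    reads: iterable of (cell_barcode, gene, umi) tuples, one per sequenced read.
    Returns {cell_barcode: {gene: number of distinct UMIs}}.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)  # a repeated UMI adds nothing new
    counts = defaultdict(dict)
    for (cell, gene), umis in seen.items():
        counts[cell][gene] = len(umis)
    return dict(counts)

reads = [
    ("AAAC", "GAPDH", "TTGCA"),
    ("AAAC", "GAPDH", "TTGCA"),  # PCR duplicate of the read above
    ("AAAC", "GAPDH", "CGATA"),  # a second GAPDH molecule in the same cell
    ("AAAC", "ACTB", "GGTAC"),
    ("TTTG", "GAPDH", "TTGCA"),  # same UMI, but a different cell
]
umi_counts = count_molecules(reads)
print(umi_counts)
```

Three GAPDH reads in cell `AAAC` become a count of two molecules, because two of them carry the same UMI.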

10

Smart-seq2 technology

-low-throughput (96- or 384-well plates)

-more expensive per cell

-full-length cDNA sequencing

-physical separation of cells into well plates, typically by laser-based sorting (FACS)

11

single cell limitations

-single cell datasets are sparse and suffer dropout

-low sensitivity: lowly expressed transcripts more likely to escape detection

-very easy to produce poor-quality datasets through poor cell handling

-easy to create artifacts/cell multiplets

-loss of spatial, temporal and lineage information

12

dropout

a gene is observed at a low or moderate expression level in one cell but is not detected in another cell of the same cell type

13

Cell Ranger

-analysis software for 10x Genomics data

-input: raw reads in FASTQ format and a reference transcriptome

14

sources of technical noise

-bias of transcript coverage

-low capture efficiency

-sequencing depth differences

-dropout events

15

low quality cells

-cells that are broken or dead

-doublets and multiplets (two or more cells captured together)

16

what do technical noise and low-quality cells lead to?

-forming distinct, misleading clusters

-inflating variance estimates in dimensionality reduction

17

library size

total RNA counts per cell

-low counts indicate RNA loss or prep inefficiencies

-thresholds vary by protocol and dataset (cells with >1000 counts are commonly kept for 10x Genomics)

18

expressed genes

number of genes with non-zero expression in a cell

-low values suggest incomplete transcript capture

-thresholds vary by protocol and dataset (cells with >500 expressed genes are commonly kept for 10x Genomics)

19

mitochondrial percentage

proportion of reads from mitochondrial genes

-high values point to cytoplasmic RNA loss during cell damage

-common cutoffs: <10% mitochondrial reads for human, <5% for mouse

20

spike-in percentage

proportion of reads mapping to spike-ins (control probes)

-elevated levels indicate endogenous rna loss
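
The four QC metrics above (library size, expressed genes, mitochondrial percentage, spike-in percentage) can be sketched for a single cell as follows. The `MT-` and `ERCC-` name prefixes are assumptions about how features are named (human gene symbols and ERCC spike-ins); the default cutoffs are the commonly used values quoted in the cards and are not universal. All function and variable names are illustrative.

```python
def qc_metrics(counts, mito_prefix="MT-", spike_prefix="ERCC-"):
    """Per-cell QC metrics from a {feature_name: count} dictionary.

    Assumption: percentages are taken relative to all counts in the cell,
    and every feature with a non-zero count is tallied as expressed.
    """
    library_size = sum(counts.values())
    expressed_genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for g, c in counts.items() if g.startswith(mito_prefix))
    spike = sum(c for g, c in counts.items() if g.startswith(spike_prefix))

    def pct(part):
        return 100.0 * part / library_size if library_size else 0.0

    return {
        "library_size": library_size,
        "expressed_genes": expressed_genes,
        "mito_pct": pct(mito),
        "spike_pct": pct(spike),
    }

def passes_qc(m, min_counts=1000, min_genes=500, max_mito_pct=10.0):
    """Apply the example cutoffs from the cards (human, 10x-style data)."""
    return (m["library_size"] > min_counts
            and m["expressed_genes"] > min_genes
            and m["mito_pct"] < max_mito_pct)

# Two synthetic cells: one healthy, one with few genes and high mito content.
healthy = {f"GENE{i}": 2 for i in range(600)}
healthy["MT-CO1"] = 50
damaged = {f"GENE{i}": 1 for i in range(200)}
damaged["MT-CO1"] = 100

for name, cell in (("healthy", healthy), ("damaged", damaged)):
    m = qc_metrics(cell)
    print(name, m, "keep" if passes_qc(m) else "filter out")
```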

21

primary goal of single-cell normalization

remove the influence of technical effects in the underlying molecular counts, while preserving true biological variation
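
One widely used way to do this is library-size normalization followed by a log transform. The sketch below is one option among several, not the method prescribed by the card; the scale factor of 10,000 and the function name are assumptions. It shows that two cells with the same relative expression but different sequencing depth end up with the same normalized values.

```python
import math

def normalize_cell(counts, scale=10_000):
    """Scale counts to a common library size, then apply log(1 + x)."""
    total = sum(counts.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    return {g: math.log1p(c / total * scale) for g, c in counts.items()}

# Same composition, 10x difference in depth (a technical effect).
shallow = {"GAPDH": 10, "CD3E": 5, "MS4A1": 0}
deep = {"GAPDH": 100, "CD3E": 50, "MS4A1": 0}

norm_shallow = normalize_cell(shallow)
norm_deep = normalize_cell(deep)
for gene in shallow:
    print(gene, round(norm_shallow[gene], 3), round(norm_deep[gene], 3))
```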

22

dimensionality reduction

the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains as many meaningful properties of the original data as possible

-to learn the underlying manifold of the data in order to place similar cells together in low-dimensional space

-reduce noise, computational complexity, and enable visualization

23

PCA

a linear dimensionality reduction algorithm

-creates a new set of uncorrelated variables (principal components), via an orthogonal transformation of the original dataset
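
As a rough illustration, the first principal component can be found by power iteration on the covariance matrix. This is a teaching sketch in plain Python under the assumption of a small, non-constant dataset; real analyses use an optimized SVD routine. The data values and function name are made up.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return (direction, scores) of the first principal component.

    rows: list of equal-length lists (cells x genes).
    Assumes the data are not constant, so the covariance is non-zero.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # arbitrary start vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # keep unit length
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Four cells, two correlated genes.
rows = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8]]
pc1, scores = first_principal_component(rows)
print("PC1 direction:", [round(x, 3) for x in pc1])
print("PC1 scores:", [round(s, 3) for s in scores])
```

Each cell's score is its coordinate along the direction of greatest variance, which is the one-dimensional summary that PCA would keep first.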

24

t-SNE

t-distributed stochastic neighbor embedding

-graph based, non-linear dimensionality reduction technique

25

UMAP

uniform manifold approximation and projection

-graph based, non-linear dimensionality reduction technique

-reported to give faster run times, higher reproducibility, and more meaningful organization of cell clusters than other dimensionality reduction approaches

26

cell clustering

-to group cells based on similarity in gene expression profiles

-to identify biologically meaningful groups in high-dimensional scRNA-seq data

27

supervised clustering methods

use a set of known markers in clustering

28

unsupervised clustering methods

for de novo identification of cell populations

29

cell clustering methods

k-means

hierarchical clustering

density-based clustering

graph-based clustering
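
Of the methods listed, k-means is the simplest to sketch. The version below is a plain-Python illustration with a deterministic farthest-point initialization (an assumption made so the example is reproducible); the coordinates stand in for cells in a reduced space such as the first two principal components.

```python
def kmeans(points, k, n_iter=20):
    """Assign each point to one of k clusters; returns (labels, centroids)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Farthest-point initialization: start from the first point, then
    # repeatedly add the point farthest from all chosen centroids.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid.
        labels = [min(range(k), key=lambda i: dist2(p, centroids[i]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

cells = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
         (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(cells, k=2)
print(labels)
```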

30

cell annotation

the process of assigning biological meaning to clusters of cells, usually based on their gene expression profiles

goal is to map clusters to known cell types, states, or lineages based on expression patterns and marker genes
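
A minimal sketch of marker-based annotation: each cluster is given the cell type whose marker genes have the highest average expression in that cluster. The expression values are invented, the marker lists are a small illustrative subset, and the scoring rule (mean of marker means) is an assumption rather than a standard.

```python
def annotate_clusters(cluster_means, marker_sets):
    """Label each cluster with the best-scoring cell type.

    cluster_means: {cluster: {gene: mean expression in that cluster}}
    marker_sets:   {cell_type: [marker genes]}
    """
    annotations = {}
    for cluster, means in cluster_means.items():
        scores = {
            cell_type: sum(means.get(g, 0.0) for g in markers) / len(markers)
            for cell_type, markers in marker_sets.items()
        }
        annotations[cluster] = max(scores, key=scores.get)
    return annotations

marker_sets = {
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["MS4A1", "CD79A"],
}
cluster_means = {
    "cluster_0": {"CD3D": 2.1, "CD3E": 1.8, "MS4A1": 0.0, "CD79A": 0.1},
    "cluster_1": {"CD3D": 0.1, "CD3E": 0.0, "MS4A1": 2.5, "CD79A": 1.9},
}
annotations = annotate_clusters(cluster_means, marker_sets)
print(annotations)
```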

31

why is cell annotation important?

-provides biological context for identified clusters

-helps interpret cellular diversity in tissues or developmental stages

-facilitates comparative analysis across datasets, conditions, or diseases

32

differential expression analysis

-to distinguish different cell types or subpopulations

-helps to understand developmental processes

-reveal biological insights into cell-type-specific responses to treatment or disease conditions

-essential for identifying biomarkers or therapeutic targets
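
As a simplified illustration, genes can be ranked by log2 fold change between two groups of cells. This sketch covers only the effect-size part; an actual differential expression analysis would add a statistical test and multiple-testing correction. The pseudocount of 1, the expression values, and the function names are assumptions.

```python
import math

def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression in group A over group B."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))

def rank_genes(expr_a, expr_b):
    """Sort genes by absolute log2 fold change, largest first."""
    lfc = {g: log2_fold_change(expr_a[g], expr_b[g]) for g in expr_a}
    return sorted(lfc.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Per-gene expression across three cells in each group (invented values).
expr_a = {"CD3E": [7, 7, 7], "GAPDH": [3, 3, 3]}
expr_b = {"CD3E": [0, 0, 0], "GAPDH": [3, 3, 3]}
ranked = rank_genes(expr_a, expr_b)
print(ranked)
```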

33

snRNA-seq input

nuclei

34

snRNA-seq tissue

fresh, lightly fixed, or frozen tissues; also hard-to-dissociate tissues (e.g. brain, heart)

35

snRNA-seq cells

suited to cells that are difficult to isolate intact

36

snRNA-seq dissociation protocol

quick and mild

37

snRNA-seq measurement

nuclear transcripts

38

snRNA-seq cons

cannot capture RNA in the cytoplasm (e.g. some gene isoforms, and RNA in mitochondria and chloroplasts)

39

scRNA-seq input

whole cell

40

scRNA-seq tissue

fresh tissue

41

scRNA-seq cells

suited to cells that are easy to isolate

42

scRNA-seq dissociation protocol

extended incubations and processing

43

scRNA-seq measurement

both cytoplasmic and nuclear transcripts

44

scRNA-seq cons

technical artifacts from heating and protease digestion during dissociation

45

trajectory/pseudotime analysis

-cells in many biological systems occupy a continuous spectrum of states and transition between different cellular states

-such dynamic processes within a subset of cells can be modeled computationally by reconstructing the cell trajectory/pseudotime from scRNA-seq data

46

pseudotime

an ordering of cells along the trajectory of a continuous process in a system, which allows the identification of the cell types at the beginning, intermediate, and end states of the trajectory
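
A toy sketch of the idea, under strong assumptions: the trajectory is linear (no branches), each cell already has a position along a single axis (for example a first principal component score), and the chosen root cell sits at one end of that axis. Pseudotime is then the distance from the root, rescaled to the range 0 to 1. Cell names and coordinates are hypothetical.

```python
def pseudotime_from_coordinate(coords, root):
    """Scale distance from the root cell along one axis to [0, 1].

    coords: {cell: position along the assumed trajectory axis}
    root:   id of the cell taken as the start of the process
    """
    start = coords[root]
    dist = {cell: abs(x - start) for cell, x in coords.items()}
    longest = max(dist.values())
    if longest == 0:
        return {cell: 0.0 for cell in coords}
    return {cell: d / longest for cell, d in dist.items()}

coords = {"stem_1": -2.0, "progenitor_1": 0.0,
          "mature_1": 2.0, "progenitor_2": 0.5}
pseudotime = pseudotime_from_coordinate(coords, root="stem_1")
ordered = sorted(pseudotime, key=pseudotime.get)
print(ordered)
```

Sorting cells by this value gives the beginning, intermediate, and end states that the definition refers to; dedicated trajectory tools handle branching and curved paths, which this sketch does not.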

47

trajectory topology

-reveal gene expression dynamics across cell states

-identification of the factors triggering state transitions

48

gene co-expression network inference

-gene co-expression network (GCN): undirected graph where each node corresponds to a gene, and a pair of nodes is connected with an edge if there is a significant co-expression relationship between them

-can be constructed by looking for pairs of genes that show a similar expression pattern across samples, since the transcript levels of two co-expressed genes rise and fall together across samples

-co-expressed genes could be controlled by the same transcriptional regulatory program, functionally related, or members of the same pathway or protein complex
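
A small sketch of building such a network: compute the Pearson correlation for every gene pair and keep an edge when it exceeds a cutoff. Using a fixed correlation threshold in place of a significance test is a simplification, and the gene names, expression values, and the 0.9 cutoff are all hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def coexpression_edges(expr, threshold=0.9):
    """Undirected edges (gene1, gene2, r) for pairs with r >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges

# Expression of three genes across five samples.
expr = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10.5],  # rises and falls with geneA
    "geneC": [5, 1, 4, 2, 3],     # unrelated pattern
}
edges = coexpression_edges(expr)
print([(g1, g2, round(r, 3)) for g1, g2, r in edges])
```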

49

gene regulatory network inference

based on the assumption that genes whose expression is highly correlated could be co-regulated

50

ligand-receptor network analysis

-to identify the protein messages passed between cells and their associated pathways

-to understand the directionality, magnitude and biological relevance of cell-cell communication
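
One simple way to score cell-cell communication is the product of the ligand's mean expression in the sending cell type and the receptor's mean expression in the receiving cell type. This is only one of several scoring schemes and is shown as an assumption-laden sketch; `LIG1` and `REC1` are placeholder gene names, and the expression values are invented.

```python
def lr_scores(mean_expr, lr_pairs):
    """Score directed sender -> receiver interactions.

    mean_expr: {cell_type: {gene: mean expression}}
    lr_pairs:  list of (ligand, receptor) gene pairs
    Returns {(sender, receiver, ligand, receptor): score} for scores > 0.
    """
    scores = {}
    for sender, s_expr in mean_expr.items():
        for receiver, r_expr in mean_expr.items():
            for ligand, receptor in lr_pairs:
                score = s_expr.get(ligand, 0.0) * r_expr.get(receptor, 0.0)
                if score > 0:
                    scores[(sender, receiver, ligand, receptor)] = score
    return scores

mean_expr = {
    "fibroblast": {"LIG1": 4.0, "REC1": 0.0},
    "T cell": {"LIG1": 0.0, "REC1": 2.5},
}
interactions = lr_scores(mean_expr, [("LIG1", "REC1")])
print(interactions)
```

The key order (sender first, receiver second) encodes the directionality, and the score gives a crude magnitude.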

51

LIANA+

re-implements and adapts eight ligand-receptor methods to infer interactions from single-cell data, along with a flexible consensus that can integrate any combination of these methods

52

single cell atac-seq

-uses a hyperactive Tn5 transposase to insert sequencing adapters into accessible chromatin regions

-measures chromatin accessibility, which marks potential regulatory sequences

53

single cell proteomics

-proteomic differences correlate poorly with the corresponding transcriptomic differences between biological states

-post-translational modifications

-mass spec based single cell proteomics

-not that accessible right now