genomics - single cell genomics

53 Terms

1

advantages of sc sequencing over bulk sequencing

-obtain genomic information for every cell

-understand cell heterogeneity within the tissue

-distinguish cell population changes vs gene expression changes

2

scRNA seq applications

-resolve cellular heterogeneity

-identify rare cell populations

-trace lineage and developmental relationships between heterogeneous, yet related, cellular states

-uncover mechanisms of heterogeneous drug response

3

sc technology challenges

-amount of rna present in a single cell is lower than the amount needed for successful signal detection

-application of sequencing based expression profiling to single cells required either increased sensitivity or amplification of input RNA

4

single cell rna seq (Tang, 2009)

-modified from a single-cell microarray protocol, with increased RT incubation and PCR extension times

-full length first strand cdnas

-used SOLiD sequencing system from applied biosystems

-allowed the detection of thousands more genes and hundreds more new splice junctions than a standard microarray experiment

5

Limitations of Tang 2009 method

-pronounced 3’ bias with the majority of reads mapping to the 3’ terminal portion of the transcripts

-severe limitation for the study of transcriptional start sites (TSS) as well as in the analysis of the different splice variants

-inefficiencies in the enzymatic reactions resulted in decreased sensitivity, with consequent loss of lowly expressed transcripts

-throughput: only 6 cells were sequenced

6

technologies developed to increase throughput

multiplexing → integrated fluidic circuits → liquid handling robotics → nanodroplets → picowells → in situ barcoding

7

GEMs

Gel beads in Emulsion

8

10x genomics

-use microfluidic partitioning to capture single cells and prepare barcoded, NGS cDNA libraries

-cells are loaded at a limiting dilution in order to maximize the number of GEMs containing a single cell (i.e., to minimize doublet formation)
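Why limiting dilution reduces doublets can be sketched with a Poisson loading model. This is an assumption for illustration, not something stated on the card: the number of cells per GEM is treated as Poisson-distributed with mean `lam`, and the `lam` values below are hypothetical.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a partition receives exactly k cells (Poisson assumption)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def multiplet_fraction(lam: float) -> float:
    """Fraction of cell-containing partitions that hold two or more cells."""
    p_empty = poisson_pmf(0, lam)
    p_single = poisson_pmf(1, lam)
    return (1.0 - p_empty - p_single) / (1.0 - p_empty)

# Hypothetical mean numbers of cells per GEM
for lam in (0.05, 0.1, 0.5):
    print(f"lambda={lam}: multiplet fraction = {multiplet_fraction(lam):.3f}")
```

Under this model, lowering the loading concentration lowers the multiplet fraction, at the cost of leaving more GEMs empty.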

9

UMIs

Unique molecular identifiers

-random short nucleotide sequences with a very low likelihood of a duplicate UMI within a single bead

-each molecule has a unique UMI

-distinguish PCR duplicates from true biological diversity
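A minimal sketch of how UMIs separate PCR duplicates from distinct molecules: reads sharing the same cell barcode, gene, and UMI are collapsed to one molecule. The barcodes, gene names, and UMI sequences are made up, and real pipelines also correct sequencing errors in UMIs, which this sketch ignores.

```python
from collections import defaultdict

# Hypothetical aligned reads: (cell barcode, gene, UMI)
reads = [
    ("AAAC", "GeneA", "TTGCA"),
    ("AAAC", "GeneA", "TTGCA"),  # PCR duplicate of the read above
    ("AAAC", "GeneA", "CCGAT"),  # a second original molecule
    ("AAAC", "GeneB", "TTGCA"),
    ("GGGT", "GeneA", "TTGCA"),  # same UMI, different cell: counted separately
]

def count_molecules(reads):
    """Count distinct UMIs per (cell, gene); duplicate reads collapse to one."""
    umis = defaultdict(set)
    for cell, gene, umi in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}

counts = count_molecules(reads)
print(counts[("AAAC", "GeneA")])  # 3 reads, but only 2 distinct UMIs
```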

10

Smart-seq2 technology

-low-throughput (96 or 384 well plate)

-more expensive

-full length cDNA sequencing

-physical separation of cells into well plates using a laser

11

single cell limitations

-single cell datasets are sparse and suffer dropout

-low sensitivity: lowly expressed transcripts more likely to escape detection

-very easy to produce poor-quality datasets from poor cell handling

-easy to create artifacts/cell multiplets

-loss of spatial, temporal and lineage information

12

dropout

a gene is observed at a low or moderate expression level in one cell but is not detected in another cell of the same cell type

13

Cell Ranger

-software for 10x genomics

-input: raw data in FASTQ format and a reference transcriptome

14

source of technical noise

-bias of transcript coverage

-low capture efficiency

-sequencing depth differences

-dropout events

15

low quality cells

-cells that are broken or dead

-doublets and multiple cells

16

technical noise and low quality cells lead to what

-forming distinct, misleading clusters

-inflating variance estimates in dimensionality reduction

17

library size

total rna counts per cell

-low counts indicate rna loss or prep inefficiencies

-vary by protocol and dataset (commonly >1000 for 10x genomics)

18

expressed genes

number of genes with non-zero expression

-low values suggest incomplete transcript capture

-vary by protocol and dataset (commonly >500 for 10x genomics)

19

mitochondrial percentage

proportion of reads from mitochondrial genes

-high values point to cytoplasmic rna loss during cell damage

-common cutoffs: <10% mitochondrial reads for human, <5% for mouse
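The three QC metrics (library size, expressed genes, mitochondrial percentage) can be computed per cell as in the sketch below. The count values are hypothetical, the `MT-` prefix assumes human gene symbols, and the thresholds passed to `passes_qc` are toy values chosen for this tiny example rather than recommended cutoffs.

```python
# Hypothetical raw counts per cell: gene symbol -> count
cells = {
    "cell_1": {"ACTB": 900, "GAPDH": 600, "CD3E": 40, "MT-CO1": 60, "MT-ND1": 0},
    "cell_2": {"ACTB": 50, "GAPDH": 0, "CD3E": 0, "MT-CO1": 120, "MT-ND1": 30},
}

def qc_metrics(counts, mito_prefix="MT-"):
    """Compute library size, number of expressed genes, and mitochondrial percentage."""
    library_size = sum(counts.values())
    expressed_genes = sum(1 for c in counts.values() if c > 0)
    mito_counts = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    mito_pct = 100.0 * mito_counts / library_size if library_size else 0.0
    return {
        "library_size": library_size,
        "expressed_genes": expressed_genes,
        "mito_pct": mito_pct,
    }

def passes_qc(m, min_counts, min_genes, max_mito_pct):
    """Keep a cell only if it clears all three thresholds."""
    return (
        m["library_size"] > min_counts
        and m["expressed_genes"] > min_genes
        and m["mito_pct"] < max_mito_pct
    )

metrics = {name: qc_metrics(c) for name, c in cells.items()}
kept = [
    name for name, m in metrics.items()
    if passes_qc(m, min_counts=1000, min_genes=3, max_mito_pct=10.0)
]
print(kept)
```

Here `cell_2` is dropped: it has a small library and most of its counts are mitochondrial, the pattern the cards associate with damaged cells.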

20

spike-in percentage

proportion of reads mapping to spike-ins (control probes)

-elevated levels indicate endogenous rna loss

21

primary goal of single-cell normalization

remove the influence of technical effects in the underlying molecular counts, while preserving true biological variation
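One simple way to remove a technical effect (sequencing depth) is library-size scaling followed by a log transform. This is only one of several normalization strategies and is shown as an illustrative sketch; the scale factor of 10,000 and the counts are assumptions.

```python
import math

def normalize_cell(counts, scale=10_000):
    """Scale counts to a common library size, then apply log(1 + x)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: math.log1p(c / total * scale) for gene, c in counts.items()}

# Two hypothetical cells with the same composition but 10x different depth
deep = {"GeneA": 200, "GeneB": 800}
shallow = {"GeneA": 20, "GeneB": 80}

norm_deep = normalize_cell(deep)
norm_shallow = normalize_cell(shallow)
print(norm_deep["GeneA"], norm_shallow["GeneA"])
```

After normalization the two cells have matching values, so the depth difference no longer looks like an expression difference.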

22

dimensionality reduction

the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains as many meaningful properties of the original data as possible

-to learn the underlying manifold of the data in order to place similar cells together in low-dimensional space

-reduce noise, computational complexity, and enable visualization

23

PCA

a linear dimensionality reduction algorithm

-creates a new set of uncorrelated variables (principal components), via an orthogonal transformation of the original dataset
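A minimal sketch of the idea, assuming a tiny made-up cells x genes matrix: center the data, build the covariance matrix, and find the first principal component by power iteration. Only the first component is computed here; real analyses would use a linear algebra library and keep many components.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return (unit direction, per-cell scores) for the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration converges to the eigenvector with the largest eigenvalue
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered cell onto the component
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical data: gene 1 is roughly twice gene 0 in every cell
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
pc1, scores = first_principal_component(data)
print(pc1, scores)
```

The recovered direction points along the shared trend of the two genes, so one score per cell captures most of the variation.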

24

t-SNE

t-distributed stochastic neighbor embedding

-graph based, non-linear dimensionality reduction technique

25

UMAP

uniform manifold approximation and projection

-graph based, non-linear dimensionality reduction technique

-reported to give faster run times, higher reproducibility, and more meaningful organization of cell clusters than other dimensionality reduction approaches

26

cell clustering

-to group cells based on similarity in gene expression profiles

-to identify biologically meaningful groups in high-dimensional scRNA-seq data

27

supervised clustering methods

use a set of known markers in clustering

28

unsupervised clustering methods

for de novo identification of cell populations

29

cell clustering methods

k-means

hierarchical clustering

density-based clustering

graph-based clustering
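As an illustration of the first method in the list, here is a minimal k-means sketch (Lloyd's algorithm) on made-up 2-D coordinates, such as cells in a reduced-dimension space. The points and starting centroids are hypothetical.

```python
import math

def kmeans(points, initial_centroids, n_iter=20):
    """Alternate between assigning points to the nearest centroid and updating centroids."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical cells forming two well-separated groups
points = [(0.1, 0.2), (0.0, 0.4), (0.3, 0.1), (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(points, initial_centroids=[(0.0, 0.0), (1.0, 1.0)])
print(labels)
```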

30

cell annotation

the process of assigning biological meaning to clusters of cells, usually based on their gene expression profiles

goal is to map clusters to known cell types, states, or lineages based on expression patterns and marker genes
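A minimal sketch of marker-based annotation: each cluster gets the cell type whose marker genes have the highest average expression in that cluster. The expression values are made up, and the short marker lists are illustrative only; real annotation uses curated marker sets or reference datasets.

```python
# Hypothetical mean expression of a few genes in each cluster
cluster_means = {
    "cluster_0": {"CD3E": 4.2, "CD19": 0.1, "LYZ": 0.3},
    "cluster_1": {"CD3E": 0.2, "CD19": 3.8, "LYZ": 0.2},
    "cluster_2": {"CD3E": 0.1, "CD19": 0.0, "LYZ": 5.1},
}

# Illustrative marker lists (assumed for this example)
markers = {"T cell": ["CD3E"], "B cell": ["CD19"], "Monocyte": ["LYZ"]}

def annotate(cluster_means, markers):
    """Assign each cluster the cell type with the highest mean marker expression."""
    labels = {}
    for cluster, expr in cluster_means.items():
        scores = {
            cell_type: sum(expr.get(g, 0.0) for g in genes) / len(genes)
            for cell_type, genes in markers.items()
        }
        labels[cluster] = max(scores, key=scores.get)
    return labels

labels_by_cluster = annotate(cluster_means, markers)
print(labels_by_cluster)
```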

31

why is cell annotation important?

-provides biological context for identified clusters

-helps interpret cellular diversity in tissues or developmental stages

-facilitates comparative analysis across datasets, conditions, or diseases

32

differential expression analysis

-to distinguish different cell types or subpopulations

-helps to understand developmental processes

-reveal biological insights into cell-type-specific responses to treatment or disease conditions

-essential for identifying biomarkers or therapeutic targets
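For a single gene, comparing two clusters can be sketched with a log2 fold change and a Welch t statistic. The expression values and the pseudocount are assumptions; dedicated tools use more robust tests and correct for multiple testing across genes, which this sketch does not.

```python
import math
from statistics import mean, variance

def log2_fold_change(a, b, pseudocount=1.0):
    """log2 ratio of mean expression; the pseudocount avoids division by zero."""
    return math.log2((mean(a) + pseudocount) / (mean(b) + pseudocount))

def welch_t(a, b):
    """Welch t statistic for two groups with possibly unequal variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

# Hypothetical expression of one gene in cells from two clusters
cluster_a = [8, 10, 9, 11, 12]
cluster_b = [1, 0, 2, 1, 1]

lfc = log2_fold_change(cluster_a, cluster_b)
t_stat = welch_t(cluster_a, cluster_b)
print(f"log2FC={lfc:.2f}, t={t_stat:.2f}")
```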

33

snRNA-seq input

nuclei

34

snRNA-seq tissue

fresh, lightly fixed, or frozen tissues, hard-to-dissociate tissues (brain, heart)

35

snRNA-seq cells

difficult to isolate cells

36

snRNA-seq dissociation protocol

quick and mild

37

snRNA-seq measurement

nuclear transcripts

38

snRNA-seq cons

cannot capture RNA in the cytoplasm (gene isoforms, RNA in mitochondria and chloroplasts)

39

scRNA-seq input

whole cell

40

scRNA-seq tissue

fresh tissue

41

scRNA-seq cells

easy to isolate cells

42

scRNA-seq dissociation protocol

extended incubations and processing

43

scRNA-seq measurement

both cytoplasmic and nuclear transcripts

44

scRNA-seq cons

technical artifacts from heating, protease digestion

45

trajectory/pseudotime analysis

-the cells in many biological systems exhibit a continuous spectrum of states and involve transitions between different cellular states

-such dynamic processes within a portion of cells can be computationally modeled by reconstructing the cell trajectory/pseudotime based on scRNA-seq data

46

pseudotime

an ordering of cells along the trajectory of a continuous process in a system, which allows the identification of the cell types at the beginning, intermediate, and end states of the trajectory
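One way to picture pseudotime is as distance from a chosen root cell, measured through a nearest-neighbor graph and rescaled to the range 0 to 1. The sketch below assumes made-up 2-D coordinates and a known root cell; it illustrates the ordering idea and is not any specific published trajectory method.

```python
import heapq
import math

def pseudotime(coords, root, k=2):
    """Order cells by shortest-path distance from the root through a kNN graph."""
    n = len(coords)
    # Build a symmetric k-nearest-neighbor graph
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(coords[i], coords[j]))[:k]
        for j in nearest:
            neighbors[i].add(j)
            neighbors[j].add(i)
    # Dijkstra's algorithm from the root cell
    dist = [math.inf] * n
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue
        for j in neighbors[i]:
            nd = d + math.dist(coords[i], coords[j])
            if nd < dist[j]:
                dist[j] = nd
                heapq.heappush(heap, (nd, j))
    # Rescale reachable cells to [0, 1]; unreachable cells stay at infinity
    longest = max(d for d in dist if d < math.inf) or 1.0
    return [d / longest if d < math.inf else math.inf for d in dist]

# Hypothetical cells along a curved path, listed in shuffled order
coords = [(2.0, 1.5), (0.0, 0.0), (4.0, 5.0), (1.0, 0.5), (3.0, 3.0)]
pt = pseudotime(coords, root=1)
order = sorted(range(len(coords)), key=lambda i: pt[i])
print(order)
```

Cells early in `order` correspond to the beginning state, and cells late in `order` to the end state of the trajectory.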

47

trajectory topology

-reveal gene expression dynamics across cell states

-identification of the factors triggering state transitions

48

gene co-expression network inference

-gene co-expression network (GCN): undirected graph where each node corresponds to a gene, and a pair of nodes is connected with an edge if there is a significant co-expression relationship between them

-can be constructed by looking for pairs of genes which show a similar expression pattern across samples, since the transcript levels of 2 co-expressed genes rise and fall together across samples

-co-expressed genes could be controlled by the same transcriptional regulatory program, functionally related, or members of the same pathway or protein complex
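A minimal sketch of building such a network: compute the Pearson correlation for every gene pair across samples and keep an edge when the correlation is high. The expression values are made up, and the fixed threshold of 0.9 is an assumed stand-in for a proper significance test.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical expression of three genes across five samples
expression = {
    "GeneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GeneB": [2.1, 3.9, 6.2, 8.1, 9.8],  # rises and falls with GeneA
    "GeneC": [5.0, 1.0, 4.0, 2.0, 3.0],  # unrelated pattern
}

def coexpression_edges(expression, threshold=0.9):
    """Return undirected edges (gene1, gene2, r) for strongly co-expressed pairs."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges

edges = coexpression_edges(expression)
print(edges)
```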

49

gene regulatory network inference

assumes that genes highly correlated in expression could be co-regulated

50

ligand-receptor network analysis

-to identify the protein messages passed between cells and their associated pathways

-to understand the directionality, magnitude and biological relevance of cell-cell communication
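Directionality and magnitude can be illustrated with a very simple score: the mean ligand expression in the sender cell type multiplied by the mean receptor expression in the receiver cell type. The cell types, gene names, and values are hypothetical, and this product score is an assumption for illustration, not the scoring used by any particular tool.

```python
# Hypothetical mean expression per cell type
mean_expr = {
    "Fibroblast": {"LigandX": 3.0, "ReceptorX": 0.1},
    "T cell": {"LigandX": 0.2, "ReceptorX": 2.5},
}

# Hypothetical ligand-receptor pair list
pairs = [("LigandX", "ReceptorX")]

def score_interactions(mean_expr, pairs):
    """Score every sender -> receiver direction for each ligand-receptor pair."""
    scores = {}
    for sender, sender_expr in mean_expr.items():
        for receiver, receiver_expr in mean_expr.items():
            for ligand, receptor in pairs:
                key = (sender, receiver, ligand, receptor)
                scores[key] = sender_expr.get(ligand, 0.0) * receiver_expr.get(receptor, 0.0)
    return scores

scores = score_interactions(mean_expr, pairs)
best = max(scores, key=scores.get)
print(best, scores[best])
```

The score is asymmetric: Fibroblast to T cell scores far higher than T cell to Fibroblast, which is how a sender and a receiver are told apart.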

51

LIANA+

re-implements and adapts eight ligand-receptor methods to infer interactions from single-cell data, along with a flexible consensus that can integrate any combination of these methods

52

single cell atac-seq

-uses a hyperactive Tn5 transposase to insert sequencing adapters into accessible chromatin regions

-measures chromatin accessibility, which marks potential regulatory sequences

53

single cell proteomics

-proteomic differences correlate poorly with corresponding transcriptomic differences between biological states

-post translational modifications

-mass spec based single cell proteomics

-not that accessible right now
