MCB 182 Midterm 3

115 Terms

Difference between genetics and genomics

volume and complexity of data (genomics works with whole genomes rather than individual genes)

C0t analysis

where you heat up dsDNA to separate the two strands, then let it cool and let the ssDNA find complementary sequences and become dsDNA again.

-repeated sequences will find complementary sequences much faster than single copy (rare/unique) sequences
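
The reassociation behind a C0t curve is usually modeled with ideal second-order kinetics, where the fraction still single-stranded is C/C0 = 1/(1 + k·C0t). The sketch below assumes that model; the rate constant `k_single` and the example genome makeup are hypothetical. A sequence present in r copies is r times more concentrated, so it behaves as if its rate constant were r·k, which is why repeats reassociate at much lower C0t.

```python
def fraction_single_stranded(c0t, k):
    """Ideal second-order reassociation: C/C0 = 1 / (1 + k * C0t)."""
    return 1.0 / (1.0 + k * c0t)


def genome_curve(c0t, components, k_single):
    """Fraction of the whole genome still single-stranded at a given C0t.

    components: list of (fraction_of_genome, copy_number) pairs (hypothetical).
    A component present in copy_number copies reaches half-reassociation
    at a copy_number-fold lower C0t than single-copy DNA.
    """
    return sum(
        frac * fraction_single_stranded(c0t, k_single * copies)
        for frac, copies in components
    )


if __name__ == "__main__":
    k_single = 0.01  # hypothetical rate constant for single-copy DNA
    # hypothetical genome: 30% highly repeated (10,000 copies), 70% single copy
    genome = [(0.3, 10_000), (0.7, 1)]
    for log10_c0t in range(-4, 5):
        c0t = 10.0 ** log10_c0t
        pct = 100 * genome_curve(c0t, genome, k_single)
        print(f"log10 C0t = {log10_c0t:>2}: {pct:5.1f}% ssDNA")
```

Reading the output top to bottom, the repeated fraction drops out around log10 C0t = -2, while the single-copy fraction only reaches its halfway point around log10 C0t = 2.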

C0t graph

The Y axis is % ssDNA, and the X axis is log10(C0t), where C0 is the initial DNA concentration and t is the reassociation time.

If a C0t graph has more repeats

the steeper the curve

genetic map

the order and distance between genetic markers

genetic map units

recombination frequency (in centimorgans; 1 cM = 1% recombination)

Why use recombination frequency?

Recombination occurs more often in some parts of a chromosome than in others, so markers that are equally spaced physically can appear to be different distances apart on a genetic map.

genetic map marker

visible phenotype resulting from mutation

physical genome map

ordered collection of clones from a genomic library

most commonly used cloning vector and how large an insert it carries

BAC (bacterial artificial chromosome), ~200 kb

physical genome marker

a sequence tagged site (STS)

What is an STS?

An STS is a defining/unique part of a genome: any fragment/sequence that hybridizes to only one location in the genome.

Contig

A contiguous set of clones (overlapping sequences that make continuous sense when kept together)

Fingerprinted contigs (FPC)

Fingerprint = unique pattern of restriction fragments from a clone. Clones that overlap in sequence share some of the same fragments, so clones whose fingerprints have fragments in common probably overlap each other. FPC maps can be full of gaps, which need long-range PCR or additional genomic libraries to fix.
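
The overlap logic can be sketched as a simple shared-band count. This is a simplified illustration only: the fragment sizes are hypothetical, the `tolerance` and `min_shared` thresholds are assumptions, and real fingerprint-mapping software such as FPC uses a more involved statistical score rather than a raw count.

```python
def shared_bands(fp_a, fp_b, tolerance=0):
    """Count fragments in fp_a that pair with a distinct fragment in fp_b (sizes within tolerance)."""
    remaining = sorted(fp_b)
    shared = 0
    for size in sorted(fp_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance:
                shared += 1
                del remaining[i]  # each band in fp_b can be matched only once
                break
    return shared


def likely_overlap(fp_a, fp_b, min_shared=3, tolerance=0):
    """Call two clones overlapping if they share at least `min_shared` bands."""
    return shared_bands(fp_a, fp_b, tolerance) >= min_shared


# Hypothetical restriction-fragment sizes (bp) for three BAC clones
clone_a = [500, 1200, 2300, 4100, 6000]
clone_b = [1200, 2300, 4100, 7500]
clone_c = [800, 3000, 9000]

print(shared_bands(clone_a, clone_b), likely_overlap(clone_a, clone_b))  # 3 True
print(shared_bands(clone_a, clone_c), likely_overlap(clone_a, clone_c))  # 0 False
```

A nonzero `tolerance` lets bands match when gel sizing is slightly off.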

X value

Denotes the completeness of a map: the fold coverage, i.e. how many times, on average, each part of the sequence is represented in different clones (total length of all clones ÷ genome size).

Formula for coverage?

Coverage = 1 - e ^ (-X)
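
A quick way to see what the formula implies. The sketch assumes clones land randomly across the genome (the assumption behind 1 - e^(-X)) and simply evaluates it for a few X values, alongside the fraction left unsampled, e^(-X).

```python
import math


def coverage(x):
    """Expected fraction of the genome present in at least one clone at X-fold coverage."""
    return 1.0 - math.exp(-x)


def unsampled(x):
    """Expected fraction of the genome missed by every clone: e^(-X)."""
    return math.exp(-x)


for x in (1, 2, 3, 4, 5):
    print(f"{x}X: covered {coverage(x):.1%}, missed {unsampled(x):.1%}")
```

At 3X this comes out to about 95.0% covered, which is where the "about 95%" rule of thumb comes from.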

How does coverage work?

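If a physical map has 3X coverage, about 95% of the genome should be represented (1 - e^(-3) ≈ 0.95). It seems weird that this doesn't depend on the type of genome, but X is already scaled to genome size, and the formula assumes clones are spread randomly.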

Given 4x BAC coverage, draw a representative contig and compute how much of the genome will NOT be sampled. You may use 2 in place of e.

ASK
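
Working the numbers with the coverage formula, using 2 in place of e as the question allows (this covers only the calculation, not the contig drawing):

```latex
\text{fraction not sampled} = e^{-X} \approx 2^{-4} = \tfrac{1}{16} \approx 6.25\%
\qquad\Rightarrow\qquad
\text{coverage} \approx 1 - \tfrac{1}{16} = \tfrac{15}{16} \approx 93.75\%
```

With the true value of e, e^{-4} ≈ 0.018, so only about 1.8% would be missed; using 2 for e overestimates the gap.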

reference genome

the complete sequence of an organism’s chromosomes.

Why is a reference genome an oversimplification of a real genome?

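Genomes vary within a species (each lil guy is special), but a reference genome is a single sequence. Also, the organism might be diploid while the assembly is haploid, so differences between the two copies of each chromosome get collapsed into one sequence.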

Maxam-Gilbert sequencing tech

200-600 nucl.;

0.01 MB/h (speed);

10^-4 error;

useful for footprinting

Sanger sequencing

500-1000 nucleotides;

0.1-0.2 MB/h;

10^-4 error;

useful for verification

pyrosequencing

200-500 nucleotides;

20-30 MB/h;

10^-3 error

SOLiD

25-35 nucleotides;

5-15 MB/h;

10^-2 error

Early Illumina

25-50 nucl.;

20 MB/h (speed);

10^-2 error

Late Illumina

100-150 nucl.;

50,000 MB/h (speed);

3 × 10^-3 error;

useful for RNA-seq, ChIP-seq, etc.

PacBio

30 kb;

1,300 MB/h (speed);

10-20% error;

useful for genome assembly

Oxford Nanopore

15 kb;

700 MB/h (speed);

10-20% error;

useful for genome assembly
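
To see why throughput mattered, here is back-of-the-envelope arithmetic using the speeds on the cards above: hours of instrument time to read one genome's worth (1X) of a ~3.1 Gb human genome. Assumptions: where a card gives a range, the low end is used, and the bare "speed" values (early Illumina, PacBio, Oxford Nanopore) are assumed to be MB/h like the others. It ignores coverage depth, run setup, and running instruments in parallel.

```python
# Throughput in MB/h from the cards above (low end of any range).
# Early Illumina, PacBio, and Oxford Nanopore "speed" values are ASSUMED to be MB/h.
THROUGHPUT_MB_PER_H = {
    "Maxam-Gilbert": 0.01,
    "Sanger": 0.1,
    "pyrosequencing": 20,
    "SOLiD": 5,
    "early Illumina": 20,
    "late Illumina": 50_000,
    "PacBio": 1_300,
    "Oxford Nanopore": 700,
}

HUMAN_GENOME_MB = 3_100  # ~3.1 Gb


def hours_to_sequence(genome_mb, throughput_mb_per_h, fold=1.0):
    """Instrument hours needed to read `fold` times the genome at a fixed throughput."""
    return genome_mb * fold / throughput_mb_per_h


for tech, rate in THROUGHPUT_MB_PER_H.items():
    print(f"{tech:>16}: {hours_to_sequence(HUMAN_GENOME_MB, rate):>12,.2f} h")
```

At these rates, one genome's worth of Sanger reads takes about 31,000 hours, while late Illumina needs only a few minutes.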

What are the 3 eras of sequencing and how do they solve the genome assembly problem?

Sanger era →

Short-read era →

Long-read era →

Sanger era of sequencing

-relied on cut-and-compare tactics

-pick a minimal tiling path across the clones (the golden path)

-you did shotgun sequencing of each clone (randomly chosen subclones); the reads that you found overlapping each other were assembled into a contig

-altogether you make a consensus sequence (OLC approach: overlap, layout, consensus)

-not easy to connect contigs (lots of experimentation with PCR)

-once the clone sequences were complete, you could keep combining them and go on to build whole chromosomes
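
The overlap-then-merge idea behind OLC can be illustrated with a toy greedy assembler. This is a deliberately simplified stand-in, not the graph-based layout step real OLC assemblers use: it assumes error-free reads, no read contained inside another, and a hypothetical minimum overlap of 3 bases, and it just keeps merging the pair of sequences with the longest suffix-prefix overlap. With error-free reads, the merged string doubles as the consensus.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b` (at least min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two sequences with the longest overlap until none remain."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no more overlaps: whatever is left are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGT", "GCGTACC", "TACCTTA"]  # hypothetical error-free reads
print(greedy_assemble(reads))  # ['ATGGCGTACCTTA']
```

Greedy merging like this is also why repeats cause misassemblies: a repeat longer than the reads offers several equally good overlaps.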

scaffolds

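contigs that have been ordered and oriented relative to each other, usually with gaps between them (a natural/logical connection between contigs)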

FPC-OLC approach produced which genomes?

S. cerevisiae, C. elegans, D. melanogaster, H. sapiens, A. thaliana, M. musculus

Issues of the Sanger era

1) need to create a physical map

2) need to subclone the BACs to sequence them

Short-read era / NGS era

Instead of cloning DNA, they used PCR amplification, and sequencing could be done as WHOLE GENOME SHOTGUN (WGS) sequencing without BAC clones/subclones; involved SOLiD, pyrosequencing, and Illumina

-assemblies described by their N50 value

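Problems of NGS/short-read stuff (pyro, Illumina, SOLiD)?

They were…short (at most 150 bp), and genomes are full of repeats longer than that, so reads couldn't span them. Some regions were also hard to read at all (ex. Illumina did NOT like GC-rich sequences and didn't sequence them well).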

-tons of contigs (not great for whole chromosome sequencing)

How to calculate N50 value?

-sort all contigs by size, largest first

(10, 9, 8, 7, 6, 5, 4, 3, 2, 1; total is 55)

-add up contigs in that order until the running total reaches at least half the total assembly size

half of 55 is 27.5.
10+9+8 = 27 (not there yet), but 10+9+8+7 = 34, which passes 27.5 *U MUST GO IN ORDER OF SIZE
The contig that gets you past the halfway mark is the N50 contig.
(Adding from the smallest up, 1+2+...+7 = 28, also lands on 7 here, since N50 is sort of a length-weighted median, but the standard definition goes largest first.)
So 7 is your N50.
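
The same procedure as a short function, assuming contig lengths are given as plain numbers (same example: contigs of length 1 through 10).

```python
def n50(contig_lengths):
    """Length of the contig at which the running total (largest first) reaches half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


print(n50(range(1, 11)))  # 7, matching the worked example
```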

long-read era

Current era; PacBio, Oxford Nanopore.

single-molecule sequencing technologies that give long reads with high error rates, but are overall better for genome assembly.

Reads are long enough to span repetitive structures.

difficult to assemble since you’re assembling whole chromosomes rather than just one BAC.

E. coli genome size

4.6M

D. radiodurans genome size

3.1 M

S. cerevisiae genome size

12.1M

C. elegans genome size

100M

D. melanogaster genome size

140M

A. thaliana genome size

160 M

O. sativa genome size

430M

Z. mays genome size

2.5G

P. taeda genome size

22G

Human genome size

3.1G

M. musculus genome size

2.5G

D. rerio genome size

1.7G

T. rubripes genome size

390 M

We tend to reserve the word repeats for…

…something that doesn't have an obvious function. For example, different members/branches of a gene family would not necessarily be called repeats.

tandem repeats

occur right next to each other.

includes microsatellites or minisatellites

can grow/shrink based on replication slippage

microsatellites / SSR / STR

really really short, like only a few nucleotides

most common in humans is AC repeat
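
A microsatellite like an AC repeat is easy to spot computationally. The sketch below scans a sequence with a regular expression for runs of one repeat unit; the function name, the `min_copies` cutoff, and the example sequences are hypothetical, and real repeat finders also handle imperfect repeats and search every possible unit.

```python
import re


def find_tandem_repeats(seq, unit="AC", min_copies=5):
    """Return (start, copies) for each run of `unit` repeated at least `min_copies` times in a row."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [
        (m.start(), len(m.group()) // len(unit))
        for m in pattern.finditer(seq.upper())
    ]


print(find_tandem_repeats("GGACACACACACACTT"))  # [(2, 6)]
```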

minisatellites

when the repeat unit is longer (10-60 bp)

long tandem repeats can also be called duplications/inversions depending on…

…whether the copies are in a row (same orientation) or in an antiparallel setup.

Interspersed repeats

distributed across the entire genome, though more concentrated in some regions than others

can copy themselves to new locations → can interrupt genes

can help combat gene conversion and homologous recombination (identical repeats copying over each other multiple times keep sequences looking similar; repeats combat this by maintaining variation in sequences)

RNA transposons

Also called retrotransposons. About 42% of the genome is this. They move through an RNA intermediate, which is reverse-transcribed backwards from RNA to DNA.

Two types: LTR and non-LTR

LTR transposons

long terminal repeats (LTRs) at the ends (~100 nucleotides)

but the bulk of it encodes enzymes that help them replicate and integrate into the genome (5 kb or more)

Non-LTR retrotransposons are also called LINEs

LINE stands for long interspersed nuclear elements

21% of the genome is composed of LINEs

recall the LINE1 and SINE elements

LINE

this is a non-LTR retrotransposon that is up to 21% of ur genome.

6 kb long and encodes an endonuclease and a reverse transcriptase. Only a few copies are still active, because most are messed up (truncated): the reverse transcriptase often gives up before copying the whole element. Still, these can insert into the genome (at AT-rich sequences)

SINE

short interspersed nuclear elements / Alu elements.

100-700 nucl.

the genome has ~1 million copies of the 300-nt Alu (10%)

originate from small RNAs like 7SL RNA (part of the signal recognition particle) or from tRNA variants

transcribed by RNAP III

reverse transcribed by LINE RT (DONT NEED THEIR OWN ENZYMES to copy themselves to new locations)

DNA transposons

do NOT go through an RNA intermediate. Both proks and euks have’em

structure is a transposase (enzyme) gene flanked by terminal inverted repeats (TIRs)

recall helitrons and polintons

helitrons

eukaryotic DNA transposon that makes copies of itself using rolling circle replication (present in most eukaryotes but not common)

polintons

newest member of DNA transposons

15-20 kb in length

widely distributed in eukaryotes

Pseudogenes

“broken” protein coding genes

broken = truncation, frameshift, nonsense mutations

consider the 4 classes

Processed Pseudogenes

most common; retro-pseudogene.

mRNA is reverse-transcribed to DNA and put into the genome

have poly-A tails and all introns removed

derived from highly expressed genes

commonly truncated because the RT falls off before it reaches the 5' end

Non processed DNA pseudogenes

result of incomplete duplication; look really similar to original sequence; sometimes the copy can pick up mutations and become useless

unitary pseudogenes

a gene that became nonfunctional in place, without any duplication (so no working copy is left)

GULO

example of a unitary pseudogene. Broken version of L-gulonolactone oxidase in the human genome, which is why we gotta eat vitamin C and most other mammals don't

pseudo-pseudogenes

a gene with nonsense mutations that can actually function normally; or, the protein isn't translated but the nucleotide sequence acts as a decoy so the actual gene is unaffected. A secret agent gene, if you will

where does variation come from?

replication errors occur at about 1 per 100 kb

external mutation sources (UV, radiation, etc.)

every cell division there are about 30,000 errors per haploid genome

no two cells are alike
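
The 30,000 figure follows from the error rate and genome size, assuming a ~3 × 10^9 bp haploid genome:

```latex
3 \times 10^{9}\ \text{bp} \times \frac{1\ \text{error}}{10^{5}\ \text{bp}}
= 3 \times 10^{4}\ \text{errors per haploid genome per division}
```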

SNPs

single nucleotide polymorphisms are the most common form of variation

mutations that segregate in a population that may or may not lead to a phenotype

can be markers in GWAS (genome-wide association studies)

classified as noncoding, coding, indel

noncoding SNP

occurs in intergenic regions (between genes)

or in intragenic regions (within a gene)

which can be a splice site, within an intron, or a UTR

coding synonymous snp

this SNP doesn’t change the amino acid

nonsynonymous coding SNP

this coding snp can be a missense (changes the amino acid) or nonsense (changes amino acid to a stop)
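
How a single-base change gets sorted into synonymous, missense, or nonsense can be sketched with the standard genetic code. This is a toy illustration: the function name is hypothetical, it looks at one codon in isolation (0-based `position` within the codon), and changes that turn a stop codon into an amino acid are simply lumped in with missense.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_coding_snp(codon, position, new_base):
    """Classify a single-base change at `position` (0, 1, or 2) of a DNA codon."""
    codon = codon.upper()
    mutant = codon[:position] + new_base.upper() + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    return "missense"  # also covers stop-to-amino-acid changes in this toy version


print(classify_coding_snp("GAA", 2, "G"))  # synonymous (Glu -> Glu)
print(classify_coding_snp("GAA", 1, "T"))  # missense (Glu -> Val)
print(classify_coding_snp("GAA", 0, "T"))  # nonsense (Glu -> stop)
```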

indel frameshift SNP

a small insertion or deletion in a protein-coding region whose length isn't a multiple of 3, so it shifts the reading frame

indel noncoding snp

an SNP inserted into something and rendering it noncoding CHECK LECTURE

structural variants

structural variants (SVs) are genetic differences that change larger stretches of chromosomes than single nucleotide variants (SNVs) do. They are harder to assess than SNPs, so how common they are in the population isn't well known.

includes insertions, deletions, inversions, translocations, copy number variations (CNVs)

CNV

a structural variant where a region of a chromosome differs in copy number (number of repeats) between individuals.

Ex. fragile X syndrome and dup15q syndrome

size of human genome

around 3 billion bp

longest chromosomes are what and how long

1 (249 Mbp)

2 (242 Mbp)

3 (198 Mbp)

smallest chromosomes are what and how long

21 (47 Mbp)

22 (51 Mbp)

Y (57 Mbp)

Tell me ur gene trivia

we got 20k protein coding genes

1% of the genome has to do with protein coding DNA

we got no clue about how many RNA genes we have

How much of the genome is repetitive?

More than 50 percent is repetitive

13% SINE (11 percent Alu)

20% LINE (17 percent LINE1)

8 percent LTR transposons

3 percent DNA elements

3 percent SSR

3 percent duplications

people differ from each other by about

1 SNP per 1000 bp
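
Scaled up to a whole genome, assuming ~3 × 10^9 bp:

```latex
3 \times 10^{9}\ \text{bp} \times \frac{1\ \text{SNP}}{10^{3}\ \text{bp}}
\approx 3 \times 10^{6}\ \text{SNPs between two people's (haploid) genomes}
```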

free living bacteria

are bacteria that can kinda live anywhere since they aint dependent on other bacteria; only make up about 1 percent or less of all bacteria (no idea really)

metagenomics / environmental sequencing

sequencing literally everything in a particular environment to see how it all relates to each other

the simplest form of metagenomic analysis is sequencing 16S rDNA. what’s up with 16S that makes it cool?

it has highly conserved sequences, so somebody could PCR them with very universal/highly applicable primers and then look at the variable regions between them. not a very high res technique tho

a rarefaction curve

tells you how many different species (OTUs) you detect in a metagenomic analysis as you sequence more reads; when it levels off, you've sampled most of the species present

What do operational taxonomic units (OTUs) refer to?

A cluster of similar sequences used as a working stand-in for a species; kind of a standard for differentiating species. The number of OTUs indicates how complex your environment is.

More OTUs = higher complexity (fewer duplicates)

Fewer OTUs = less complex (more duplicates)

X and Y axis for rarefaction curve?

x axis - number of sequences

y axis - unique OTU
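
A rarefaction curve can be sketched by drawing sequences one at a time and counting how many distinct OTUs have appeared so far. Assumptions: each sequence has already been assigned an OTU label, the example community is hypothetical, and only one random ordering is used (published curves usually average many random subsamples).

```python
import random


def rarefaction_curve(otu_labels, seed=0):
    """Unique OTUs seen after each sequence, for one random ordering of the reads."""
    reads = list(otu_labels)
    random.Random(seed).shuffle(reads)
    seen = set()
    curve = []
    for label in reads:
        seen.add(label)
        curve.append(len(seen))  # x = number of sequences so far, y = unique OTUs
    return curve


# Hypothetical sample: 100 sequences from 3 OTUs
sample = ["OTU1"] * 50 + ["OTU2"] * 30 + ["OTU3"] * 20
curve = rarefaction_curve(sample)
print(curve[0], curve[9], curve[-1])
```

A curve that flattens out means more sequencing is unlikely to turn up new OTUs.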

Do people still target 16S?

No. Most people are doing shotgun sequencing (randomly sequencing whole genomes). They compare the reads to known proteins to figure out what their functions might be. Sometimes you wanna know who's in your sample, sometimes you wanna know what's even happening in there

When studying RNA, why do we use cDNA? What is cDNA?

RNA is not stable, so we use reverse transcriptase to convert the RNA sequence into a DNA copy; that copy is cDNA (complementary DNA).
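
The base-pairing step reverse transcriptase carries out can be sketched as string manipulation: the first cDNA strand is the reverse complement of the mRNA, written in DNA letters (T instead of U). This ignores priming and second-strand synthesis, and the function name is hypothetical.

```python
# RNA base -> complementary DNA base (A pairs with T in DNA; U pairs with A)
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """First-strand cDNA, written 5'->3', for an mRNA given 5'->3'."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(mrna.upper()))


print(first_strand_cdna("AUGGCCAAA"))  # TTTGGCCAT
```

The poly(A) end of an mRNA becomes a run of T's at the start of the cDNA, which is why oligo(dT) primers work.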

reverse transcriptase - all that you know
