Difference between genomics and genetics
Genomics
larger scale and complexity of data/datasets
closely related to computer/technology and bioinformatics
C0t analysis / reassociation kinetics
Estimates the complexity and repetitive content of DNA by how quickly the strands re-anneal
extract DNA → heat and fragment DNA to single strands → let cool and measure reassociation → plot % ssDNA vs log C0t
x-axis: log10 C0t, y-axis: % single-stranded DNA
more “bumps” and smaller dropoffs = less repetitive DNA; bigger dropoff w/ fewer bumps = more repetitive DNA
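A minimal sketch of how such a curve can be generated, assuming ideal second-order kinetics where the single-stranded fraction of one sequence class is 1 / (1 + k × C0t). The function name, the two-component genome, and both rate constants are hypothetical values chosen only for illustration.

```python
def fraction_single_stranded(c0t, components):
    """Fraction of DNA still single-stranded at a given C0t.

    components: list of (genome_fraction, k) pairs; fractions should sum to 1.
    Each class is assumed to follow ideal second-order kinetics.
    """
    return sum(frac / (1.0 + k * c0t) for frac, k in components)


# Hypothetical genome: 30% highly repetitive (fast k), 70% single-copy (slow k).
components = [(0.3, 100.0), (0.7, 0.01)]

if __name__ == "__main__":
    for exponent in range(-4, 5):
        c0t = 10.0 ** exponent
        pct = 100 * fraction_single_stranded(c0t, components)
        print(f"log10(C0t) = {exponent:+d}  ->  {pct:5.1f}% ssDNA")
```

Under this model each sequence class is half reassociated at C0t = 1/k, so repetitive classes (large k) drop out at low C0t and single-copy DNA at high C0t.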
Genetic Map
based on recombination frequency
measures relative positions of genes; affected by recombination hotspots
made by looking at crossing over
“How often trains switch tracks between stations”
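As a rough illustration of mapping by recombination frequency, the sketch below computes the frequency from offspring counts and picks the middle gene of three from pairwise frequencies. The function names and all numbers are hypothetical, and it assumes short distances where frequencies are roughly additive.

```python
def recombination_frequency(recombinants, total):
    """Fraction of offspring that are recombinant for a pair of genes."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recombinants / total


def middle_gene(pairwise_rf):
    """Given recombination frequencies for the three pairs of three genes,
    return the gene that lies between the other two.

    The pair with the largest frequency is assumed to be the two outer genes.
    """
    outer = max(pairwise_rf, key=pairwise_rf.get)
    genes = set().union(*pairwise_rf)
    (middle,) = genes - set(outer)
    return middle


# Hypothetical cross data: recombinant offspring out of 1000 for each pair.
rf = {
    ("A", "B"): recombination_frequency(100, 1000),
    ("B", "C"): recombination_frequency(50, 1000),
    ("A", "C"): recombination_frequency(140, 1000),
}

if __name__ == "__main__":
    print("middle gene:", middle_gene(rf))
```

The A-C frequency is slightly less than the sum of A-B and B-C here, which is the usual effect of undetected double crossovers.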
Physical Mapping (STS, contig, BAC, fingerprinting, FPC, YAC, two ways to close gaps (library and PCR))
ordered collection of clones from a genomic library → find the order of, and physical distance in base pairs between, DNA markers
Involves:
BAC - Bacterial Artificial Chromosome, cloning vector used for physical maps
STS - sequence-tagged site, a marker that hybridizes to only one location in the genome
contig - set of overlapping DNA clones
Fingerprint - unique pattern of restriction fragments used to orient and order clones
FPC - fingerprinted contigs, contigs made with fingerprinting
YAC - Yeast Artificial Chromosome, used to potentially fill in gaps, since some sequences are poisonous in BACs (and vice versa)
Filling in gaps for physical map
1) long-range PCR
2) find clones in genomic libraries that hybridize to the ends of contigs
Physical Map - X-value and coverage
X-value denotes how redundantly the sequence is represented (average number of times each base is covered)
ex: 3X coverage means each base is covered about 3 times
Coverage = 1 - e^(-x)
e^(-x) = fraction not covered
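The coverage formula can be checked numerically with a short sketch like the one below. The function name is hypothetical, and the formula assumes clones or reads land at random positions.

```python
import math


def fraction_covered(x):
    """Expected fraction of the genome covered at x-fold redundancy: 1 - e^(-x)."""
    return 1.0 - math.exp(-x)


if __name__ == "__main__":
    for x in (1, 3, 5, 10):
        covered = fraction_covered(x)
        print(f"{x}X: {covered:.4%} covered, {1 - covered:.4%} not covered")
```

At 3X about 5% of the sequence is still expected to be missing, which is why physical maps leave gaps that have to be closed separately.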
Reference genome and why it isn’t representative
complete sequence of an organism’s chromosomes
artificial because
it is a composite of many different people’s genomes
it is haploid
Sanger era (golden path, finishing, scaffold, shotgun seq) + issues of Sanger
Human genome mainly used Sanger sequencing
picked the “golden path” using the given physical map
BAC clones sequenced by shotgun sequencing (randomly chosen sub-clones of the BAC)
aligned sequences were grouped into contigs → single consensus sequence
finishing - filling in the gaps between contigs, often with PCR
scaffolds - contigs connected in a logical order, but not by contiguous sequence
Issues
1) needed to create a physical map first
2) needed to subclone BACs to sequence them
Short-read era (advantages + disadvantages of short reads, N50)
used PCR instead of cloning DNA
SOLiD, Illumina, pyrosequencing
Advantage → cheap, high coverage
Disadvantage → short reads (100-150 bp) and the genome has many repeats, making assembly difficult
N50 → measure of quality and contiguity of a genome assembly
add up total contig sizes
sort contigs from largest to smallest and add them up until the running sum hits half of the total size
the size of the last contig added to reach the 50% threshold is the N50
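The N50 steps above can be sketched as follows. The function name and the example contig lengths are made up for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)  # largest contigs first
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            # This contig pushed the running sum past 50% of the assembly.
            return length
    raise ValueError("no contigs given")


if __name__ == "__main__":
    contigs = [80, 70, 50, 40, 30, 20]  # hypothetical contig sizes
    print("N50 =", n50(contigs))
```

With these sizes the total is 290, half is 145, and the running sum passes 145 when the 70 contig is added, so the N50 is 70.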
Long-read era
PacBio and Oxford Nanopore
very long reads but with high error rates → long reads are better for genome assembly
Genome facts
largest genome → amoeba, 670 Gb
pufferfish have more genes than humans but 1/10th the genome size
S. cerevisiae has only a few hundred introns in 6.3k genes; another fungus, C. neoformans, has 5.3 introns per gene
C. elegans has trans- and cis-splicing
most common repeat in pine is 5% of the genome; most common repeat in corn is 75% of the genome
introns of animal genomes are huge, but not those of plants or fungi
some highly conserved sequences in mammals don’t code for protein
Tandem Repeats - micro/minisatellite, replication slippage
occur next to each other
microsatellite - repeat unit a couple of nt long
minisatellite - repeat unit 10-60 bp long
long tandem repeats - duplications or inversions
satellites grow and shrink via replication slippage - DNA pol slips, causing an indel
Interspersed repeats - gene conversion
distributed all over the genome
can copy to new locations in the genome and disrupt normal genes
“junk DNA”
combat gene conversion
interspersed repeats find their way into introns and make genes look dissimilar, preventing homologous recombination and gene conversion
RNA transposons (LTR, non-LTR (LINE and SINE))
also known as retrotransposons; common in eukaryotic genomes, 42% of the human genome
get their name from going through an RNA intermediate: RNA → DNA
RNA transposon types
1) LTR (long terminal repeats)
have long terminal repeats and encode enzymes allowing self-replication and integration into the genome
LTRs are a couple hundred nucleotides and the enzymes are 5 kb; look similar to retroviruses
2) Non-LTR
include LINEs (long interspersed nuclear elements), 21% of genome
LINE1 - 6 kb long, RT activity
many LINEs are truncated because RT falls off before completing the transcript → explains nonfunctionality
another type → SINE, 100-700 nt long
in the human genome, SINE = Alu elements → 300 nt, 1 million copies present, about 10% of the genome
SINEs originate from small RNAs transcribed by RNA pol III
DNA transposons - Helitrons and Polintons
don’t use an RNA intermediate
two types
Helitrons - eukaryotic DNA transposons, make copies via rolling circle
Polintons - 15-20 kb long and related to viruses
Pseudogene - retropseudogenes/processed, non-processed, unitary, pseudo-pseudogene
look like broken protein-coding genes, 4 major classes
1) Processed pseudogenes (retropseudogenes)
mRNA reverse transcribed to DNA and inserted into the genome
contain a poly-A tail and have no introns
often form from highly expressed genes
can create a higher BLASTX score than regular DNA
2) Non-processed
DNA pseudogenes
result of incomplete duplication
retain the original sequence and look like real genes
3) Unitary
gene becomes nonfunctional without duplication
ex: GULO, why we need vitamin C
4) Pseudo-pseudogenes
genes seem broken by a nonsense mutation but may actually function
stop codons are read through
Variation - SNP, SNV, SV, CNV
humans vary 1bp/1000bp (human/chimp = 1/100)
replication errors 1/100,000 bp (30,000 errors per haploid genome)
SNPs (single nucleotide polymorphisms)
present in 1% or more of the population
often used as markers for GWAS
SNV → similar but can be rare
classified by where they occur in the genome (ex: non-coding, coding, indel)
SV (structural variant)
genetic differences that make larger changes to the genome
ex: indel, inversion, translocation
CNV (copy number variant) → region of a chromosome where there is a difference in the number of repeats
some diseases include Fragile X, Dup15q
SVs are harder to assess than SNPs
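The per-base rates on this card turn into whole-genome counts with simple arithmetic. The sketch below assumes a haploid genome of 3 billion bp (the size given in these notes); the function name is hypothetical.

```python
HAPLOID_GENOME_BP = 3_000_000_000  # 3 billion bp, as stated in these notes


def expected_events(bp_per_event, genome_bp=HAPLOID_GENOME_BP):
    """Expected number of events genome-wide if one occurs every bp_per_event bases."""
    return genome_bp / bp_per_event


if __name__ == "__main__":
    print("human vs human differences:", expected_events(1_000))
    print("human vs chimp differences:", expected_events(100))
    print("replication errors:", expected_events(100_000))
```

The last line reproduces the 30,000 errors per haploid genome quoted above; the first gives about 3 million differences between two human genomes.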
Human genome trivia
Size - 3 billion bp
Longest chromosomes: 1 - 249 Mbp, 2 - 242 Mbp, 3 - 198 Mbp
Shortest chromosomes: 21 - 47 Mbp, 22 - 51 Mbp, Y - 57 Mbp
Genes
20K protein-coding genes
1% of the genome corresponds to protein-coding DNA
unknown number of RNA genes
Repeats - >50% repetitive genome
13% SINE (11% Alu)
20% LINE (17% LINE1)
8% LTR transposons
3% DNA elements
3% SSR
3% Duplications
variation - 1bp/1000bp
Metagenomics - 16S rDNA, rarefaction curve, OTU
Environmental sequencing
look at 16S rDNA, part of the small ribosomal subunit
rarefaction curve → number of unique OTUs (operational taxonomic units/unique sequences) on the y-axis vs. number of sequences sampled on the x-axis
tells us how many different species are present
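A rough sketch of how the points of such a curve can be computed from a list of OTU labels, one per sequenced read. It uses a single random ordering of the reads rather than averaging over many subsamples, and the function name, the seed, and the example community are all hypothetical.

```python
import random


def rarefaction_curve(otu_labels, step=1, seed=0):
    """Return (reads_sampled, unique_otus_seen) points for one random read order."""
    rng = random.Random(seed)
    reads = list(otu_labels)
    rng.shuffle(reads)  # sample reads in random order
    seen = set()
    curve = []
    for count, label in enumerate(reads, start=1):
        seen.add(label)
        if count % step == 0 or count == len(reads):
            curve.append((count, len(seen)))
    return curve


if __name__ == "__main__":
    # Made-up community: three common OTUs and two rare ones.
    community = ["otu1"] * 50 + ["otu2"] * 30 + ["otu3"] * 15 + ["otu4"] * 4 + ["otu5"]
    for reads_sampled, otus in rarefaction_curve(community, step=20):
        print(reads_sampled, otus)
```

When the curve flattens out, additional sequencing is finding few new OTUs, which suggests most of the species present have been sampled.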