Functional components of DNA primary structure
coding for protein/RNA, gene regulation
bacterial genome components
Open reading frame (ORF): ATG start codon to stop codon; Promoters: RNA pol binding sites; Operators and regulators: protein binding sites that regulate transcription/translation
Prokaryote genome features
simple gene structure, small (0.5-10 million bp), no introns (easy to identify genes), high coding density (>90% codes for something), gene overlap (nested), some short genes (hard to identify)
Prokaryote ORF/gene finding approaches
simple rule based, content based, similarity based
Simple rule based gene finding (prokaryote)
look for a start codon, find a stop codon in the same reading frame; if the ORF is >50 codons (150 bp) it's called a gene; if there are <50 codons (150 bp) before the stop codon, shift by 1 and start again
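The rule above can be sketched in Python. This is a minimal illustration, not a real gene finder: the function name `find_orfs` is made up here, only the forward strand is scanned, ATG is the only start codon considered, and "shift by 1" is assumed to mean moving one base past the rejected start codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=50):
    """Return (start, end) pairs (0-based, end exclusive) for forward-strand ORFs."""
    seq = seq.upper()
    orfs = []
    i = 0
    while i + 3 <= len(seq):
        if seq[i:i + 3] != "ATG":
            i += 1
            continue
        # Walk codon by codon in the same reading frame until a stop codon.
        end = None
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                end = j + 3
                break
        # Codons counted from the start codon up to (not including) the stop.
        if end is not None and (end - 3 - i) // 3 > min_codons:
            orfs.append((i, end))
            i = end        # accepted: continue scanning after the stop codon
        else:
            i += 1         # too short or no stop: shift by 1 and start again
    return orfs
```

Lowering `min_codons` finds more short genes but also more false positives, which is the trade-off the next card describes.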
ORF finding program flaws
overlook small genes, over predict long genes
Content based gene finding (prokaryote)
RNA polymerase promoter sites (-10 Pribnow box/TATA box and -35 site), Shine-Dalgarno sequence/ribosome binding site (RBS), stem-loop (rho-independent) terminators, G/C content (higher in genes)
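Of the content signals listed, G/C content is the simplest to compute. A rough sketch follows; the function names and the window/step sizes are arbitrary choices for illustration, not values from the card.

```python
def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def sliding_gc(seq, window=100, step=50):
    """GC fraction in overlapping windows, as (window start, GC fraction) pairs."""
    return [
        (i, gc_content(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]
```

Windows with GC content well above the genome average would be candidate gene regions under this heuristic.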
Prokaryotic promoters
2 short sequences for RNA pol binding (-10 and -35)
-10 sequence in promoter
Pribnow or TATA box, 6 nt, usually TATAAT; the closer the sequence is to the consensus, the higher the promoter activity
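One simple way to look for a -10 box is to slide the TATAAT consensus along a candidate region and count matching positions. The sketch below assumes that approach; `best_minus10_match` is a hypothetical helper, and real tools typically use position weight matrices rather than a plain match count.

```python
CONSENSUS = "TATAAT"

def best_minus10_match(region):
    """Return (matching positions, offset) of the hexamer closest to TATAAT."""
    region = region.upper()
    best = (0, -1)  # offset -1 means no hexamer matched at any position
    for i in range(len(region) - len(CONSENSUS) + 1):
        hexamer = region[i:i + len(CONSENSUS)]
        score = sum(a == b for a, b in zip(hexamer, CONSENSUS))
        if score > best[0]:
            best = (score, i)
    return best
```

A score of 6 is a perfect consensus match; lower scores correspond to weaker predicted promoters.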
Shine-Dalgarno motif
Ribosome binding site, located within about 13 bp upstream of the AUG start codon; the closer the sequence is to the consensus, the higher the translation activity
Stem loop terminators
mechanism to terminate transcription via release/dissociation of RNA pol
Similarity based gene finding (prokaryote)
take known gene from related genome and compare via BLAST
Disadvantages of similarity based gene finding (prokaryote)
orthologs/paralogs sometimes lose function (pseudogenes), not all genes are known in the comparison genome (completely novel genes are rare; usually there are similar domains), best species for comparison isn’t always obvious
Eukaryote genome features
complex gene structure, large genomes, exons and introns (hard to find similarity), low coding density (<30% are actual genes), alternative splicing, pseudogenes
Eukaryote gene finding approaches
Content based, Feature based, similarity based, pattern based
Content based method of gene finding (Eukaryotes)
CpG islands, GC content, Hexamer repeats, composition statistics, codon frequencies (codon bias in species)
Feature based methods of gene finding (Eukaryotes)
donor sites, acceptor sites, promoter sites, start/stop codons, polyA signals, feature lengths
Similarity based methods of gene finding (Eukaryotes)
sequence homology, EST searches (ESTs are sequenced from cDNA, which reverse transcriptase makes from spliced mRNA, so they show exons without introns)
Pattern based method of gene finding (Eukaryotes)
AI recognizes patterns better, hidden Markov models (HMMs), artificial neural networks
BLAST
find similar sequences, measures organismal relatedness, search DNA against databases
sequence evolution
point mutations over time, single nucleotide polymorphisms (may have no effect), insertions/deletions, inversions
Homologs
related genes, have common ancestor, orthologs and paralogs
Orthologs
homologs in different species, arising from speciation
Paralogs
homologs within species, from duplication
NCBI BLAST (basic local alignment search tool)
fragments query into short “words” (short sequences), searches database for exact matches, performs local alignments, extends each alignment along the query (up to the whole query)
BLAST overview
our sequence (query) compared to library (database), looks for short matches called words (typically about 3 residues for protein, longer for nucleotide searches) then extends, ranks hits based on how well they align and how likely they are to match by chance (E)
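The word-then-extend idea can be illustrated with a toy version. This is only a sketch under strong simplifications: the function names are invented, extension is ungapped and stops at the first mismatch, and real BLAST instead uses scoring matrices, neighborhood words and a score drop-off threshold.

```python
def seed_hits(query, subject, k=4):
    """Find (query position, subject position) pairs where a k-letter word matches exactly."""
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits

def extend_exact(query, subject, i, j, k):
    """Extend a seed in both directions while letters match; return (q_start, s_start, length)."""
    left = 0
    while i - left - 1 >= 0 and j - left - 1 >= 0 and query[i - left - 1] == subject[j - left - 1]:
        left += 1
    right = k
    while i + right < len(query) and j + right < len(subject) and query[i + right] == subject[j + right]:
        right += 1
    return (i - left, j - left, left + right)
```

Several seeds on the same diagonal extend to the same match, which is why seeding with short words is enough to recover longer alignments.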
How BLAST works
heuristic algorithm, locates common words between query and sequences in database, any sequence similar enough/above threshold is retrieved
What BLAST compares
amino acid, DNA, RNA
blastn
nucleotide-nucleotide, DNA query, returns most similar DNA sequence, from DNA database
blastp
protein-protein, protein query, returns most similar protein sequence, from protein database
blastx
nucleotide 6-frame translation vs protein, conceptual translation of the query in all 6 reading frames, protein-protein comparison for each of the 6, returns most similar protein sequence, from protein database
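The conceptual 6-frame translation can be sketched as follows, assuming the standard genetic code and a DNA sequence containing only A, C, G and T (anything else is translated as X). The function names are illustrative, not part of BLAST.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

def six_frame_translation(seq):
    """Translate 3 forward frames (+1..+3) and 3 reverse-complement frames (-1..-3)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

Each of the six protein strings would then be compared against the protein database, which is why translated searches cost more than blastn or blastp.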
tblastx
nucleotide 6-frame translation vs nucleotide 6-frame translation, conceptual translation of the query in all 6 reading frames, returns most similar DNA sequence, from DNA database, very slow (translates whole database), finds distant relationships between nt sequences (proteins are more conserved than nucleotide sequences)
BLAST uses
comparison, identifying species, locating protein domains, identifying phylogenetic relationships, identifying putative (likely real) ORFs
using BLAST for comparison
identify similar genes from related organism, helpful in genome annotation
using BLAST for Identifying species
working with environmental isolates, sequence data from unidentified organism, use to potentially ID the unknown
using BLAST for Locating protein domains
locate known domains within your query sequence (conserved function)
using BLAST for Phylogenetic relationship
create phylogenetic tree, more similar = more related, generate a data set of related sequences for external phylogeny programs
Protein and DNA database
some government funded and some privately funded, most open to public, some for nt and others for proteins (integrated together)
Inputting query
FASTA is the universal standard in bioinformatics, text based format, single letter codes for AA/nt, sequence name precedes the sequence (on a first line starting with >)
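A small reader shows how little structure the FASTA format has. This is a sketch with a made-up function name; libraries such as Biopython provide more robust parsers.

```python
def parse_fasta(text):
    """Parse FASTA text into {header: sequence}; headers are the text after '>'."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()    # sequence name on the '>' line
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}
```

For example, a record with the header line `>seq1 example` followed by two sequence lines comes back as one entry keyed `seq1 example` with the lines joined.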
Query coverage
percentage of your query length that is covered by the alignment to the match
E value
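how likely the match is by chance alone: the expected number of hits with a score this good or better in a database of that size (lower = more significant)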
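The usual relationship between a bit score and the E value is E = m × n × 2^(−S'), where S' is the bit score, m the query length and n the database length. The sketch below applies it with raw lengths; BLAST itself uses corrected "effective" lengths, so reported values will differ somewhat. The function name is hypothetical.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate E value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)
```

The same bit score gives a larger E value against a larger database, so E values from different databases are not directly comparable.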
Percent identical
percentage of aligned positions that are identical; gaps count as non-identical positions
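Percent identity can be computed directly from two aligned strings of equal length, with `-` marking gaps. A sketch, assuming identity is taken over the full alignment length (gap columns included); the function name is illustrative.

```python
def percent_identity(aln_a, aln_b):
    """Return (percent identical columns, number of gap columns) for two aligned strings."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    if not aln_a:
        return 0.0, 0
    pairs = list(zip(aln_a, aln_b))
    matches = sum(a == b and a != "-" for a, b in pairs)
    gaps = sum(a == "-" or b == "-" for a, b in pairs)
    return 100.0 * matches / len(pairs), gaps
```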
Average nucleotide identity
fragment 2 genomes and run reciprocal BLASTn, calculate identity for all reciprocal hits and average, used to identify novelty, same species if >95%, <90% means different species
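Once the reciprocal BLASTn hits are in hand, the final averaging and interpretation steps are simple. The sketch below assumes the per-fragment percent identities have already been extracted from the BLAST output; both function names are hypothetical, and the 90-95% range is labeled borderline because the card gives no call for it.

```python
def average_nucleotide_identity(identities):
    """Mean percent identity over reciprocal best hits between genome fragments."""
    if not identities:
        raise ValueError("no reciprocal hits to average")
    return sum(identities) / len(identities)

def interpret_ani(ani):
    """Apply the thresholds from the card: >95% same species, <90% different species."""
    if ani > 95:
        return "same species"
    if ani < 90:
        return "different species"
    return "borderline"
```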