Information in Primary structure

43 Terms

1

Functional components of DNA primary structure

coding for protein/RNA, gene regulation

2

bacterial genome components

Open reading frames (ORFs): ATG start codon through a stop codon, promoters: RNA polymerase binding sites, operators and regulators: protein binding sites that regulate transcription/translation

3

Prokaryote genome features

simple gene structure, small genomes (0.5-10 million bp), no introns (genes are easy to identify), high coding density (>90% of the sequence codes for something), overlapping (nested) genes, some short genes (hard to identify)

4

Prokaryote ORF/gene finding approaches

simple rule based, content based, similarity based

5

Simple rule based gene finding (prokaryote)

look for a start codon, find the next stop codon in the same reading frame, if the ORF is >50 codons (150 bp) it's a gene, if it is <50 codons (150 bp) up to the stop codon, increment by 1 nt and start again
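
The rule above can be sketched in Python as below. This is a minimal illustration, not a real gene finder: it scans the forward strand only, uses the card's >50-codon cutoff as the `min_codons` parameter, reads "increment by 1" as "try the next nucleotide as a possible start", and keeps only the longest ORF (most upstream ATG) for each stop codon. `find_orfs` is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 50) -> list[tuple[int, int]]:
    """Simple rule-based ORF finder (forward strand only).

    Returns 0-based, half-open (start, end) coordinates; end includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    used_stops = set()  # stop positions already assigned to a reported ORF
    for i in range(len(seq) - 2):  # every position is tried: "increment by 1"
        if seq[i:i + 3] != "ATG":
            continue  # not a start codon
        j = i
        # Step codon by codon in the same reading frame until a stop codon.
        while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
            j += 3
        if j + 3 > len(seq):
            continue  # no in-frame stop codon before the sequence ends
        n_codons = (j - i) // 3  # start codon + internal codons, stop excluded
        if n_codons > min_codons and j not in used_stops:
            used_stops.add(j)  # longest ORF (most upstream ATG) wins for this stop
            orfs.append((i, j + 3))
    return orfs
```

For example, `find_orfs("ATG" + "GCT" * 60 + "TAA")` reports one 186-bp ORF, while a 9-bp ORF such as `"ATGAAATAG"` falls below the cutoff and is ignored, which is exactly the "overlook small genes" flaw listed on the next card.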

6

ORF finding program flaws

overlook small genes, over-predict long genes

7

Content based gene finding (prokaryote)

RNA polymerase promoter sites (-10 Pribnow box/TATA box and -35 site), Shine-Dalgarno sequence/ribosome binding site (RBS), stem-loop (rho-independent) terminators, G/C content (higher in genes)
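
A minimal sketch of the G/C content signal: compute the GC fraction in sliding windows and flag windows above the whole-sequence average as gene-rich candidates. The window and step sizes and the function names are arbitrary choices for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int = 100, step: int = 50):
    """Yield (start, GC fraction) for each full-length window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_content(seq[start:start + window])


def gc_rich_windows(seq: str, window: int = 100, step: int = 50) -> list[int]:
    """Start positions of windows whose GC fraction exceeds the sequence average."""
    average = gc_content(seq)
    return [start for start, gc in gc_windows(seq, window, step) if gc > average]
```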

8

Prokaryotic promoters

two short sequences for RNA polymerase binding (-10 and -35)

9

-10 sequence in promoter

Pribnow or TATA box, 6 nt, usually TATAAT, the closer the sequence is to the consensus, the higher the promoter activity
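
One simple way to express "the closer to the consensus, the higher the activity" is to count matches to TATAAT in every 6-nt window and keep the best one. The function names are hypothetical, and this is only a sketch: real promoter predictors usually score position weight matrices and also check the -35 site and its spacing.

```python
CONSENSUS_MINUS10 = "TATAAT"


def match_score(site: str, consensus: str = CONSENSUS_MINUS10) -> int:
    """Number of positions where site agrees with the consensus."""
    return sum(a == b for a, b in zip(site.upper(), consensus))


def best_minus10(seq: str, consensus: str = CONSENSUS_MINUS10):
    """Return (position, site, score) of the window closest to the consensus, or None."""
    seq = seq.upper()
    k = len(consensus)
    best = None
    for i in range(len(seq) - k + 1):
        site = seq[i:i + k]
        score = match_score(site, consensus)
        if best is None or score > best[2]:  # keep the first best-scoring window
            best = (i, site, score)
    return best
```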

10

Shine-Dalgarno Motif

ribosome binding site (RBS), 13 bp upstream of the AUG start codon, the closer the sequence is to the consensus, the higher the translation initiation activity
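
A sketch of a Shine-Dalgarno search: scan a window upstream of a known start codon for 6-mers resembling AGGAGG, the commonly cited SD consensus. The 20-nt search window, the 4-of-6 match threshold, and the function name are assumptions chosen for illustration.

```python
SD_CONSENSUS = "AGGAGG"


def find_sd(mrna: str, start_codon_pos: int, max_upstream: int = 20, min_matches: int = 4):
    """Return (position, spacer, matches) for SD-like 6-mers upstream of a start codon.

    spacer = number of nucleotides between the end of the 6-mer and the start codon.
    """
    seq = mrna.upper().replace("U", "T")  # accept RNA or DNA letters
    k = len(SD_CONSENSUS)
    region_start = max(0, start_codon_pos - max_upstream)
    hits = []
    for i in range(region_start, start_codon_pos - k + 1):
        site = seq[i:i + k]
        matches = sum(a == b for a, b in zip(site, SD_CONSENSUS))
        if matches >= min_matches:
            hits.append((i, start_codon_pos - (i + k), matches))
    return hits
```

For `"AAAGGAGGUUUUUUUAUG"` with the AUG at position 15, the only hit is the perfect AGGAGG at position 2, seven nucleotides before the start codon.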

11

Stem-loop terminators

mechanism to terminate transcription via release/dissociation of RNA polymerase
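
A rough sketch of how a content-based gene finder might spot rho-independent terminators: look for a perfect inverted repeat (the hairpin stem) separated by a short loop and followed by a run of T's (U's in the RNA). Stem length, loop range, T-run length, and the function names are hypothetical; real terminator predictors allow mismatches and score the hairpin's stability.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def find_terminators(seq: str, stem: int = 6, loop_min: int = 3, loop_max: int = 8,
                     min_t: int = 4) -> list[tuple[int, int]]:
    """Return (start, end) spans of stem-loop-T-run candidates, 0-based half-open."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - loop_min + 1):
        left = seq[i:i + stem]
        for loop in range(loop_min, loop_max + 1):
            j = i + stem + loop  # start of the right arm of the stem
            right = seq[j:j + stem]
            if len(right) < stem:
                break  # ran off the end of the sequence
            tail = seq[j + stem:j + stem + min_t]
            # The right arm must pair with the left arm, followed by a T run.
            if right == revcomp(left) and tail == "T" * min_t:
                hits.append((i, j + stem + min_t))
    return hits
```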

12

Similarity based gene finding (prokaryote)

take a known gene from a related genome and compare it to the query genome via BLAST

13

Disadvantages of similarity based gene finding (prokaryote)

orthologs/paralogs sometimes lose function (pseudogenes), not all genes are known in the comparison genome (completely novel genes are rare; they usually share similar domains), the best species for comparison isn’t always obvious

14

Eukaryote genome features

complex gene structure, large genomes, exons and introns (make similarity hard to find), low coding density (<30% of the sequence is actual genes), alternative splicing, pseudogenes

15

Eukaryote gene finding approaches

Content based, Feature based, similarity based, pattern based

16

Content based method of gene finding (Eukaryotes)

CpG islands, GC content, Hexamer repeats, composition statistics, codon frequencies (codon bias in species)
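
A sketch of the CpG island signal: compute GC fraction and the observed/expected CpG ratio (CpG count × length / (C count × G count)), then apply commonly cited thresholds (length ≥ 200 bp, GC > 50%, observed/expected > 0.6). Function names are hypothetical and the thresholds are parameters, not fixed rules.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for seq."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")  # overlapping count
    gc = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200, min_gc: float = 0.5,
                          min_obs_exp: float = 0.6) -> bool:
    """Apply length, GC, and CpG observed/expected cutoffs to a candidate region."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp
```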

17

Feature based methods of gene finding (Eukaryotes)

splice donor sites, splice acceptor sites, promoter sites, start/stop codons, polyA signals, feature lengths
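
Splice donor and acceptor sites usually follow the canonical GT-AG rule (introns begin with GT and end with AG). The sketch below lists every GT...AG segment above a minimum length as a candidate intron; it illustrates why feature-based methods also need site scoring and feature-length limits, since many candidates are found. The 60-nt minimum and the function name are assumptions.

```python
import re


def candidate_introns(seq: str, min_len: int = 60) -> list[tuple[int, int]]:
    """Return (start, end) 0-based half-open spans that start with GT and end with AG."""
    seq = seq.upper()
    # Lookahead patterns find overlapping occurrences.
    donors = [m.start() for m in re.finditer("(?=GT)", seq)]
    acceptors = [m.start() + 2 for m in re.finditer("(?=AG)", seq)]  # end after the AG
    return [(d, a) for d in donors for a in acceptors if a - d >= min_len]
```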

18

Similarity based methods of gene finding (Eukaryotes)

sequence homology, EST (expressed sequence tag) searches (ESTs are made by reverse transcribing mRNA into cDNA, so they reflect the spliced gene structure)

19

Pattern based method of gene finding (Eukaryotes)

AI recognizes patterns better: hidden Markov models (HMMs), artificial neural networks
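
To make the HMM idea concrete, here is a toy two-state (coding vs. noncoding) HMM decoded with the Viterbi algorithm. All probabilities are made-up illustration values (coding modeled as GC-rich, in line with the G/C content card), only A/C/G/T input is handled, and real HMM gene finders use many more states (exons, introns, splice sites).

```python
import math

STATES = ("coding", "noncoding")
# Toy parameters (hypothetical, for illustration only).
START = {"coding": 0.5, "noncoding": 0.5}
TRANS = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.1, "noncoding": 0.9},
}
EMIT = {
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},     # GC-rich
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # AT-rich
}


def viterbi(seq: str) -> list[str]:
    """Most likely state path for seq, computed in log space to avoid underflow."""
    seq = seq.upper()
    if not seq:
        return []
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[t][s] = best previous state for state s at position t + 1
    for ch in seq[1:]:
        ptr, new = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            ptr[s] = prev
            new[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][ch])
        back.append(ptr)
        score = new
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With these toy values, `viterbi("GCGCGCGCGCATATATATAT")` labels the GC-rich half "coding" and the AT-rich half "noncoding": one state switch costs less than emitting ten AT bases from the coding state.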

20

BLAST

finds similar sequences, measures relatedness between organisms, searches DNA against databases

21

sequence evolution

point mutations over time, single nucleotide polymorphisms (may have no effect), insertions/deletions (indels), inversions

22

Homologs

related genes that share a common ancestor, include orthologs and paralogs

23

Orthologs

homologs in different species that arose through speciation

24

Paralogs

homologs within a species that arose through gene duplication

25

NCBI BLAST (Basic Local Alignment Search Tool)

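fragments the query into short “words” (short subsequences), searches the database for exact word matches, performs local alignments, and extends each alignment outward from the word match until the score stops improving (ideally across the whole query)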
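
The word-then-extend idea can be sketched as follows. This is a much-simplified illustration, not NCBI's implementation: exact word matches only, a hypothetical +1/-2 match/mismatch score, ungapped extension with an X-drop cutoff, and no E value. All names and parameters are assumptions.

```python
def word_index(db: str, w: int) -> dict[str, list[int]]:
    """Map every length-w word in the database sequence to its start positions."""
    index: dict[str, list[int]] = {}
    for i in range(len(db) - w + 1):
        index.setdefault(db[i:i + w], []).append(i)
    return index


def extend_hit(query, db, qi, di, w, match=1, mismatch=-2, x_drop=3):
    """Ungapped extension of the seed query[qi:qi+w] == db[di:di+w] in both directions.

    Returns (score, q_start, q_end, d_start) for the best-scoring extension.
    """
    # Extend to the right, remembering the offset with the best running score.
    run = best_r = best_k = 0
    k = 0
    while qi + w + k < len(query) and di + w + k < len(db):
        run += match if query[qi + w + k] == db[di + w + k] else mismatch
        k += 1
        if run > best_r:
            best_r, best_k = run, k
        if best_r - run > x_drop:  # X-drop: stop once the score falls too far
            break
    # Extend to the left in the same way.
    run = best_l = best_m = 0
    m = 0
    while qi - m - 1 >= 0 and di - m - 1 >= 0:
        run += match if query[qi - m - 1] == db[di - m - 1] else mismatch
        m += 1
        if run > best_l:
            best_l, best_m = run, m
        if best_l - run > x_drop:
            break
    score = w * match + best_r + best_l
    return score, qi - best_m, qi + w + best_k, di - best_m


def best_hsp(query, db, w=4):
    """Seed with exact word matches, extend each seed, return the top-scoring HSP."""
    index = word_index(db, w)
    hsps = {
        extend_hit(query, db, qi, di, w)
        for qi in range(len(query) - w + 1)
        for di in index.get(query[qi:qi + w], [])
    }
    return max(hsps) if hsps else None
```

Several seeds on the same diagonal extend to the same alignment, so collecting HSPs in a set removes the duplicates.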

26

BLAST overview

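our sequence (query) is compared to a library (database), BLAST looks for short matches called words (e.g., 3 amino acids for protein searches, 11 nt by default for blastn) and then extends them, hits are ranked by how well they align and how likely the match is to occur by chance (E value)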

27

How BLAST works

heuristic algorithm, finds words shared between the query and database sequences, any sequence similar enough (scoring above a threshold) is retrieved

28

What BLAST compares

amino acid, DNA, RNA

29

blastn

nucleotide-nucleotide, DNA query, returns most similar DNA sequence, from DNA database

30

blastp

protein-protein, protein query, returns most similar protein sequence, from protein database

31

blastx

translated nucleotide query (6 frames) vs. protein, conceptual translation of the query in all 6 reading frames, each translation compared protein-to-protein against the database, returns the most similar protein sequences from a protein database
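
The conceptual six-frame translation step can be sketched with the standard genetic code: translate three frames on the forward strand and three on the reverse complement. blastx would then compare each translation against the protein database. The frame labels (+1 to -3) follow one common convention; `six_frames` and the other names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Translations of the three forward and three reverse-complement frames."""
    seq = seq.upper()
    rc = revcomp(seq)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(seq[f:])
        frames[f"-{f + 1}"] = translate(rc[f:])
    return frames
```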

32

tblastx

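translated nucleotide (6 frames) vs. translated nucleotide (6 frames), conceptual translation of both the query and every database sequence in all 6 reading frames, returns the most similar DNA sequences from a DNA database, very slow (the whole database must be translated), finds distant relationships between nucleotide sequences (protein sequences are more conserved than DNA because of the degenerate genetic code)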

33

BLAST uses

comparison, identifying species, locating protein domains, identifying phylogenetic relationships, identifying putative (true) ORFs

34

using BLAST for comparison

identify similar genes in related organisms, helpful in genome annotation

35

using BLAST for Identifying species

useful when working with environmental isolates or sequence data from an unidentified organism, used to potentially identify the unknown organism

36

using BLAST for Locating protein domains

locate known domains within your query sequence (conserved function)

37

using BLAST for Phylogenetic relationships

create phylogenetic trees (more similar = more related), generate a data set of related sequences for external phylogeny programs

38

Protein and DNA databases

some government-funded and some privately funded, most open to the public, some hold nucleotide sequences and others proteins (integrated together)

39

Inputting query

FASTA is the universal standard in bioinformatics, a text-based format using single-letter codes for amino acids/nucleotides, the sequence name precedes the sequence on a header line starting with >
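
A minimal FASTA reader, assuming one record per `>` header line followed by sequence lines that may wrap. Blank lines are skipped, text before the first header is ignored, and a repeated name overwrites the earlier record. `parse_fasta` is a hypothetical name; libraries such as Biopython offer full-featured parsers.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Return {header (without '>'): joined sequence} for each FASTA record."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # finish the previous record
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            chunks.append(line)  # wrapped sequence line
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

For example, `parse_fasta(">seq1 example\nATGC\nGGTA\n>seq2\nMKV\n")` gives `{"seq1 example": "ATGCGGTA", "seq2": "MKV"}`.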

40

Query coverage

the percentage of your query sequence covered by the alignment (match)
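
As a sketch, query coverage can be computed as the percentage of query positions covered by at least one aligned segment (HSP), merging overlaps so no position is counted twice. Coordinates are assumed 1-based and inclusive, and the function name is hypothetical; BLAST's reported value may involve details this sketch ignores.

```python
def query_coverage(query_len: int, hsps: list[tuple[int, int]]) -> float:
    """Percent of the query covered by the union of (start, end) HSP intervals."""
    covered = 0
    last_end = 0  # rightmost query position counted so far
    for start, end in sorted(hsps):
        start = max(start, last_end + 1)  # skip positions already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return 100 * covered / query_len
```

For a 100-residue query with HSPs at 1-50, 41-60, and 90-100, the union covers 71 positions, so coverage is 71%.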

41

E value

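the number of hits with this score (or better) expected by chance alone in a database of this size, the lower the E value, the more significant the match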
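
The E value comes from Karlin-Altschul statistics, which the card does not spell out: for a raw alignment score S, a query of length m, and a database of total length n, with K and λ set by the scoring system, the expected number of chance hits and its bit-score form S' are:

```latex
E = K\,m\,n\,e^{-\lambda S},
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2},
\qquad
E = m\,n\,2^{-S'}
```

So E falls exponentially as the score rises and grows in proportion to the database size, which is why the same alignment gets a larger E value when searched against a bigger database.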

42

Percent identity

the percentage of aligned positions that are identical, the number of gaps in the alignment is also reported
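
A sketch of how identity and gaps can be counted from two aligned sequences of equal length, with `-` marking a gap. Using the full alignment length (gaps included) as the denominator is an assumption of this sketch; the function name is hypothetical.

```python
def alignment_stats(a: str, b: str) -> dict[str, float]:
    """Identity and gap statistics for two gapped, equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identities = sum(x == y and x != "-" for x, y in zip(a, b))
    gaps = sum(x == "-" or y == "-" for x, y in zip(a, b))
    length = len(a)
    return {
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100 * identities / length,
        "percent_gaps": 100 * gaps / length,
    }
```

For `"ACGT-A"` aligned with `"ACCTGA"`, 4 of 6 columns are identical (about 66.7%) and 1 column is a gap.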

43

Average nucleotide identity

fragment two genomes and run reciprocal BLASTn, calculate the mean identity over all reciprocal hits, used to identify novelty: same species if >95%, <90% means different species
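
A sketch of the ANI bookkeeping around the BLASTn runs, assuming the reciprocal searches have already been done and produced the percent identity of each fragment's best hit in each direction. The 1000-nt fragment size, pooling both directions into one mean, and the label for the 90-95% zone (which the card leaves open) are assumptions; all names are hypothetical.

```python
def fragment(genome: str, size: int = 1000) -> list[str]:
    """Cut a genome into consecutive fragments (the last one may be shorter)."""
    return [genome[i:i + size] for i in range(0, len(genome), size)]


def average_nucleotide_identity(ids_ab: list[float], ids_ba: list[float]) -> float:
    """Mean percent identity over best hits from both BLASTn directions (A->B, B->A)."""
    pooled = ids_ab + ids_ba
    if not pooled:
        raise ValueError("no reciprocal hits")
    return sum(pooled) / len(pooled)


def classify(ani: float) -> str:
    """Apply the card's species cutoffs to an ANI value (in percent)."""
    if ani > 95:
        return "same species"
    if ani < 90:
        return "different species"
    return "ambiguous (90-95%)"
```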