BIOL 266 Final Exam Review


265 Terms

1
New cards

structural bioinformatics

field that predicts and models function from 3D structure

2
New cards

genomics

field that predicts phenotype from genotype

3
New cards

molecular novelty

field that discovers novel genes and gene functions from newly sequenced genomes and metagenomes

4
New cards

evolutionary adaptation

field that analyzes how changes in genotype lead to changes in phenotype and evolutionary adaptation

5
New cards

computational biology

the application of computational tools to solve biological problems (many disciplines, e.g. genomics, biophysics, ecology, molecular bio, etc.)

6
New cards

bioinformatics

more emphasis on analysis of high-throughput data, notably genome-sequence data. A subset of computational biology.

7
New cards

pattern discovery

subtask of comp bio - learn patterns from biological data

8
New cards

prediction

subtask of comp bio - use patterns to predict biological function

9
New cards

integration

subtask of comp bio - develop models that connect levels of info

10
New cards

simulation

subtask of comp bio - model behavior of biological systems on a computer

11
New cards

engineering

subtask of comp bio - design novel biological systems for specific purposes

12
New cards

therapy

subtask of comp bio - design molecular therapeutics to combat disease

13
New cards

wet lab

data generation, testing

14
New cards

dry lab

interpretation, prediction, model-building, hypothesis-generation

15
New cards

algorithm

a set of rules or instructions specifying how to solve a problem

16
New cards

tool

implementation of an algorithm (software)

17
New cards

sequence analysis

determining the optimal alignment between sequences, searching databases for homologs, organization and interpretation of data

18
New cards

phylogenetic analysis

organization of sequences according to their evolutionary relationships

19
New cards

genome analysis

finding and analyzing genes within the context of an entire genome

20
New cards

transcriptomics and proteomics

examines levels of gene or protein expression

21
New cards

network and systems biology

analyzing a biological system as a network of interacting components

22
New cards

synthetic biology

the use of computers to design new biological systems

23
New cards

dna polymers

specific sequences of nucleotides

24
New cards

nitrogenous base

nucleotides differ by which _________ they contain

25
New cards

genome

organism's DNA-based genetic instructions; composed of genes

26
New cards

genes

dna instructions for making proteins

27
New cards

central dogma

DNA --> RNA --> protein
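
A minimal sketch of the two steps in code, assuming the input DNA is the coding (sense) strand so that transcription reduces to swapping T for U. The function names `transcribe` and `translate` are illustrative, and translation here simply starts at the first AUG and stops at the first in-frame stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G (third base fastest).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(coding_dna):
    """DNA -> mRNA, assuming the coding strand is given (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein, from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")   # "AUGGCCUAA"
protein = translate(mrna)        # "MA" (Met, Ala, then stop)
```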

28
New cards

transcription

assisted by RNA polymerase

29
New cards

translation

assisted by ribosomes

30
New cards

mRNA, tRNA, rRNA

three main types of RNA; messenger, transfer, ribosomal

31
New cards

RNA

-primary structure similar to DNA

-can be single or double stranded

-can exhibit different conformations (unlike DNA)

-contains U instead of T

32
New cards

gene expression

process of using DNA info to make mRNA and proteins

33
New cards

promoter sequence

what RNA polymerases look for to recognize beginning of genes

34
New cards

enhancers

Modulate gene expression and can be far from the gene. Needed in addition to promoters in eukaryotes.

35
New cards

prokaryotes

use +ve and -ve regulation for transcription

36
New cards

genetic code

mapping from RNA codons (triplets) to amino acids

37
New cards

ORF

long stretches of DNA that are uninterrupted by stop codons and therefore may encode protein
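
A rough sketch of an ORF scan, assuming only the three forward reading frames are searched and an ORF is taken to run from an ATG to the next in-frame stop codon. The name `find_orfs` and the `min_codons` cutoff are illustrative; real gene finders also scan the reverse strand and use much longer length cutoffs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the forward strand.

    start/end are 0-based, end-exclusive, and include the stop codon.
    min_codons counts the codons before the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # close it and keep scanning
    return orfs


orfs = find_orfs("CCATGAAACCCGGGTAGTT")  # [(2, 17, 2)]
```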

38
New cards

genes

ORFs + additional regulatory info

39
New cards

start codon

Met, AUG; signals the start of translation (mRNA to protein)

40
New cards

stop codons

UAA, UAG, UGA; expected by chance about once every ~20 codons in random sequence (3 of the 64 codons are stops)
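
The "once every ~20 codons" figure can be checked with a little arithmetic, assuming random sequence in which all 64 codons are equally likely (an idealization; real genomes have biased base composition). The variable names are illustrative.

```python
STOP_CODONS = 3
TOTAL_CODONS = 4 ** 3                    # 64 possible triplets

p_stop = STOP_CODONS / TOTAL_CODONS      # chance that a random codon is a stop
expected_spacing = 1 / p_stop            # average codons between stops, about 21


def p_open_by_chance(n_codons):
    """Probability that n random codons in a row contain no stop codon."""
    return (1 - p_stop) ** n_codons


# A 100-codon stretch with no stop is already unlikely by chance (under 1%),
# which is why long ORFs are treated as candidate protein-coding regions.
p_100 = p_open_by_chance(100)
```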

41
New cards

hydrophobic amino acids

amino acids with long alkyl side chains. More likely to be found in the interior of proteins. A I L M P V F W

42
New cards

hydrophilic (polar) amino acids

C N Q S T Y G

43
New cards

Charged amino acids

(-) D E, (+) K R H
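
The three amino acid groupings from these cards can be written as a lookup table. This is a small sketch that follows this deck's grouping (textbooks differ on where residues such as G, C, Y and H belong); `classify_residue` and `composition` are illustrative names.

```python
from collections import Counter

# One-letter codes, grouped as in the cards above.
CLASSES = {
    "hydrophobic": set("AILMPVFW"),
    "polar": set("CNQSTYG"),
    "negative": set("DE"),
    "positive": set("KRH"),
}


def classify_residue(amino_acid):
    """Return the class name for a one-letter amino acid code."""
    for name, members in CLASSES.items():
        if amino_acid.upper() in members:
            return name
    return "unknown"


def composition(protein):
    """Count how many residues of each class a protein sequence contains."""
    return Counter(classify_residue(aa) for aa in protein)


counts = composition("ADKG")  # one residue in each of the four classes
```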

44
New cards

sequencing

determining the exact nucleotide sequence of DNA

45
New cards

sequencing methods

Maxam-Gilbert: chemical degradation

Dideoxy (Sanger): chain termination

next gen (high throughput): many types

46
New cards

next gen methods

illumina (solexa)

-up to 1 Tb/run (>5 human genomes)

-125 base pairs (paired end)

454 pyrosequencing

-up to 600 Mb/run

-400-500 base pairs (single direction)

47
New cards

evolution

changes in inherited characteristics of biological populations over successive generations

48
New cards

genotype

organism's genetic information

49
New cards

phenotype

observable features of an organism, encoded by genotype

50
New cards

mutations

-point, duplication, insertion, deletion

-drive differences between species (variation)

51
New cards

homology

similarity due to common ancestry

52
New cards

homologous

evolutionarily related

53
New cards

NCBI GenBank

part of the National Center for Biotechnology Information (NCBI), which is part of the US National Institutes of Health (NIH)

54
New cards

protein sequencing timeline

1955: first complete sequence (insulin, Ryle et al.)

1965: ~20

1980: ~1500

today: ~200 million

55
New cards

nucleotide sequencing timeline

1953: structure, watson and crick

60s-70s: small RNAs, cloning, then PCRs

1982: creation of genbank, simpler, democratization of data

56
New cards

sequence revolution

1980s/90s

-development of more efficient computer hardware and software

-birth of bioinformatics (term was coined in the 70s as the study of info processes in biotic systems)

57
New cards

DNA databases

NCBI GenBank: WGS, CoreNucleotide, dbGSS

58
New cards

RNA databases

NCBI: GEO, dbEST, UniGene

59
New cards

protein databases

NCBI and others: NCBI protein, UniProt, Protein Data Bank

60
New cards

flow of info

curation --> annotation --> release

61
New cards

core data

-key info in the db entry and minimal info req'd to identify it

-included data derived from experimental results (ex. sequence, structural data)

62
New cards

annotations

-all additional info, 2ndary info, may change over time

-ex. known or predicted functional info

63
New cards

purines

guanine and adenine

64
New cards

pyrimidines

thymine and cytosine

65
New cards

flatfile db

-data (ex. sequences) are stored as a text file or a collection of text files

-flat, as in sheet of paper

-easy to input, distribute, search, and retrieve data

66
New cards

relational databases

-data stored within a number of tables linked together by a shared field, the key (which must be unique to each record)

-handles huge amounts of data, reducing data in memory, faster search and retrieval
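
A small sketch of the "tables linked by a shared key" idea, using Python's built-in `sqlite3` module with an in-memory database. The table names, column names, and the `DEMO...` accessions are all made up for illustration and do not reflect how GenBank is actually organized.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each table has a primary key that is unique to each record.
    CREATE TABLE organism (
        organism_id INTEGER PRIMARY KEY,
        name        TEXT
    );
    CREATE TABLE seq_record (
        accession   TEXT PRIMARY KEY,
        organism_id INTEGER REFERENCES organism(organism_id),
        residues    TEXT
    );
    INSERT INTO organism VALUES (1, 'Escherichia coli'), (2, 'Homo sapiens');
    INSERT INTO seq_record VALUES
        ('DEMO0001', 1, 'ATGGCC'),
        ('DEMO0002', 2, 'ATGTTT'),
        ('DEMO0003', 2, 'ATGAAA');
""")

# The shared field organism_id links the two tables in a join.
rows = conn.execute(
    """
    SELECT s.accession, o.name
    FROM seq_record AS s
    JOIN organism AS o ON s.organism_id = o.organism_id
    WHERE o.name = ?
    ORDER BY s.accession
    """,
    ("Homo sapiens",),
).fetchall()
# rows == [('DEMO0002', 'Homo sapiens'), ('DEMO0003', 'Homo sapiens')]
```

The organism name is stored once and referenced by key, rather than repeated in every sequence record as it would be in a flat file.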

67
New cards

fasta file type

-.fa, .faa, .fna, .fasta

-header followed by raw data
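
A minimal FASTA reader, as a sketch of the "header followed by raw data" layout: a header line starts with `>`, and every following line up to the next header is sequence. The function name `parse_fasta` and the example records are illustrative; for real work a library parser is the safer choice.

```python
def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:]             # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


example = ">seq1 demo\nATGC\nGGTA\n>seq2\nTTAA\n"
sequences = parse_fasta(example)
# {'seq1 demo': 'ATGCGGTA', 'seq2': 'TTAA'}
```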

68
New cards

ncbi genbank file type

-header, features, sequence

-each sequence filed with an accession number

-any revisions made, version number changes
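
GenBank identifiers are commonly written as `accession.version`, where the accession stays fixed and the number after the dot increases with each sequence revision. A tiny sketch of splitting the two parts; the identifier `AB123456.2` is a made-up example and `split_accession` is an illustrative name.

```python
def split_accession(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version).

    version is None when the identifier carries no version suffix.
    """
    accession, _, version = identifier.partition(".")
    return accession, int(version) if version else None


split_accession("AB123456.2")   # ('AB123456', 2)
split_accession("AB123456")     # ('AB123456', None)
```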

69
New cards

accession number

~4-10 numbers/letters to identify specific DNA and protein sequence records

70
New cards

feature key

keyword indicating functional group

71
New cards

location

instructions for finding the feature

72
New cards

qualifiers

auxiliary info about a feature

73
New cards

protein dbs

NCBI protein, UniProtKB, Protein Information Resource (PIR), SWISS-PROT, TrEMBL

74
New cards

entries

databases are composed of

75
New cards

common queries

gene/protein name/function, db identifiers, species names, raw sequences

76
New cards

logic

these are operators that indicate relationships among searches. Ex. AND, OR, NOT, NOR, NAND, XOR, XNOR
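
The common operators behave like set operations on the records that each search term returns. A sketch with a made-up three-record database; `records` and `matching` are illustrative names.

```python
# Hypothetical records, each tagged with a few search terms.
records = {
    "rec1": {"kinase", "human"},
    "rec2": {"kinase", "mouse"},
    "rec3": {"protease", "human"},
}


def matching(term):
    """Return the set of record ids whose terms include `term`."""
    return {rid for rid, terms in records.items() if term in terms}


kinase_and_human = matching("kinase") & matching("human")   # {'rec1'}
kinase_or_human = matching("kinase") | matching("human")    # all three records
kinase_not_human = matching("kinase") - matching("human")   # {'rec2'}
kinase_xor_human = matching("kinase") ^ matching("human")   # {'rec2', 'rec3'}
```

AND narrows a search, OR widens it, and NOT excludes records.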

77
New cards

homology searches

best way to find genes/proteins related to yours of interest

78
New cards

data quality and info content

redundancy, efficiency, automatic and manual quality control

79
New cards

computer error

incorrect annotations, missed relationships (insufficient info extraction)

80
New cards

human error

multiple contributions, vector sequence left in, PCR chimeras, taxonomic misidentification, trivial data entries

81
New cards

quality control

manual approach to deal with errors, ex. 20% of fungi sequences were misidentified

82
New cards

sequence alignment

identification of character matches preserving character order

83
New cards

true alignment

reflects evolutionary relationship between 2+ sequences that share a common ancestor (homology)

84
New cards

global alignment

attempt to align entire sequence, ex. NW
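
A compact sketch of Needleman-Wunsch (NW) global alignment, assuming a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1) chosen for illustration. Real tools use substitution matrices and usually affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill: each cell is the best of diagonal (match/mismatch), up, or left (gap).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


needleman_wunsch("ACGT", "AGT")   # (2, 'ACGT', 'A-GT')
```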

85
New cards

local alignment

stretches of sequences with highest density of matches are aligned, ex. SW
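
A sketch of the Smith-Waterman (SW) scoring step, assuming illustrative scores (match +2, mismatch -1, gap -1). The difference from global alignment is that a cell is never allowed to go below zero, so an alignment can start and end anywhere; the best local score is the largest cell in the table. Traceback is omitted to keep the example short.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between a and b."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(0,                          # restart the alignment here
                              table[i - 1][j - 1] + sub,  # match or mismatch
                              table[i - 1][j] + gap,      # gap in b
                              table[i][j - 1] + gap)      # gap in a
            best = max(best, table[i][j])
    return best


# Only the shared "ACG" core aligns well, so the score is 3 matches x 2 = 6.
smith_waterman_score("TTTACGTTT", "GGACGGG")   # 6
```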

86
New cards

function, structure, evolutionary information

aligning sequences useful to discover

87
New cards

similarity, patterns, relationships

alignments reveal

88
New cards

score and compute

to understand alignments we need to:

89
New cards

a good alignment has

many matches, few mismatches, few gaps
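
One way to turn "many matches, few mismatches, few gaps" into a number is to reward matches and penalize mismatches and gaps. A sketch with illustrative scores (+1, -1, -1); `score_alignment` is a hypothetical helper that takes two already-aligned strings of equal length with `-` for gaps.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-1):
    """Score two aligned strings and count matches, mismatches and gaps."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    counts = {"matches": 0, "mismatches": 0, "gaps": 0}
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            counts["gaps"] += 1
        elif x == y:
            counts["matches"] += 1
        else:
            counts["mismatches"] += 1
    score = (counts["matches"] * match
             + counts["mismatches"] * mismatch
             + counts["gaps"] * gap)
    return score, counts


score_alignment("ACGT", "A-GT")
# (2, {'matches': 3, 'mismatches': 0, 'gaps': 1})
```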

90
New cards

dynamic programming

Used by both NW and SW, solves the problem by breaking it down into subproblems

91
New cards

db search

needed to find closest homolog of a given sequence

-important for predicting sequence function, genome annotation, phylogenetics, determining taxonomic identity of a sequence

92
New cards

SSEARCH

-extension of pairwise alignment

-instead of 1 v 1, 1 v many

-problem: speed

-use for comparing local dbs

93
New cards

BLAST

-basic local alignment search tool

-faster than SW

-word-based (k-tuples), ungapped, locally optimal

-larger word length permits inexact matches between words

-heuristic procedure

-minimum word length: 3 for proteins, 16 for nt
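
A sketch of the word-based idea: split both sequences into overlapping k-letter words, index the subject's words, and look up the query's words to find seed hits. This simplified version only finds exact word matches (protein BLAST also accepts similar "neighborhood" words that score above a threshold); the function names are illustrative.

```python
from collections import defaultdict


def split_into_words(seq, k):
    """Return (position, word) for every overlapping k-letter word in seq."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def find_seed_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    index = defaultdict(list)
    for pos, word in split_into_words(subject, k):
        index[word].append(pos)           # word -> positions in the subject
    hits = []
    for qpos, word in split_into_words(query, k):
        for spos in index.get(word, []):
            hits.append((qpos, spos))
    return hits


split_into_words("MKVL", 3)            # [(0, 'MKV'), (1, 'KVL')]
find_seed_hits("MKVL", "AAKVLA", 3)    # [(1, 2)]
```

Looking words up in an index avoids filling a full alignment table for every database sequence, which is where the speed-up over SW comes from.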

94
New cards

E value

expectation; the number of matches with scores equivalent to or better than S that are expected to occur in a db search by chance.

-borderline significant <0.01

-highly significant <1e-10
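
For ungapped local alignments the expectation is usually modeled with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where m is the query length, n is the database length, S is the raw score, and K and lambda depend on the scoring system. A sketch of that calculation; the K and lambda values used in the usage lines are illustrative assumptions, and the significance cutoffs are the ones from the card above.

```python
import math


def expected_hits(score, query_len, db_len, k_param, lam):
    """E value: chance hits expected with a score at least as good as `score`."""
    return k_param * query_len * db_len * math.exp(-lam * score)


def classify_e_value(e_value):
    """Apply this deck's rule-of-thumb cutoffs."""
    if e_value < 1e-10:
        return "highly significant"
    if e_value < 0.01:
        return "borderline significant"
    return "not significant"


# Illustrative parameters only (assumed, not taken from a real BLAST run).
e_low_score = expected_hits(50, 100, 1_000_000, k_param=0.134, lam=0.318)
e_high_score = expected_hits(100, 100, 1_000_000, k_param=0.134, lam=0.318)
```

E falls exponentially as the score rises and grows linearly with database size, so the same hit looks less significant in a larger database.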

95
New cards

BLAST programs

blastp: protein query v protein db

blastn: nt query v nt db

blastx: translated nt v protein db

tblastn: protein v translated nt db

tblastx: translated nt v translated nt db

PSI-BLAST: detection of remote protein homology using profiles

96
New cards

BLAST process

1. break query into words

2. search for matches

3. extend matches in both directions until the score falls below the threshold

4. merge HSPs into a longer alignment (further extend and allow gaps)

5. report statistical significance
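
Step 3 can be sketched as an ungapped extension that walks outward from a seed and gives up once the running score has dropped a set amount below the best score seen. This is a simplified illustration with assumed scores (+1 match, -1 mismatch) and an assumed drop-off of 2, not BLAST's actual implementation; merging HSPs and the statistics in steps 4 and 5 are not shown.

```python
def extend_seed(query, subject, qpos, spos, k, match=1, mismatch=-1, drop=2):
    """Extend an exact k-letter seed without gaps in both directions.

    Returns (score, query_start, query_end) of the resulting HSP
    (end-exclusive coordinates on the query).
    """
    def extend(qi, si, step):
        score = best = 0
        length = best_len = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length   # remember the best extension
            elif best - score >= drop:
                break                            # fallen too far below the best
            qi += step
            si += step
        return best, best_len

    right_score, right_len = extend(qpos + k, spos + k, +1)
    left_score, left_len = extend(qpos - 1, spos - 1, -1)
    total = k * match + left_score + right_score
    return total, qpos - left_len, qpos + k + right_len


# Seed "GTA" sits at position 4 in both sequences.
extend_seed("TTACGTACGGG", "CCACGTACGAA", 4, 4, 3)   # (7, 2, 9), i.e. "ACGTACG"
```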

97
New cards

BLAST artifacts

-longer the sequence higher the score (this is natural)

-query sequence w/ repeats artificially inflates score

-low complexity regions

-conservative with short query sequences

98
New cards

multiple sequence alignment

>2 sequences, basis of phylogenetic reconstruction, indicates patterns of conservation and variation (for finding functional residues, motifs), greater accuracy of overall alignment

99
New cards

order

can affect end result of MSA

100
New cards

MSA challenges

-finding best alignment that takes mutations/gaps for ALL sequences into account

-scoring entire alignment

-placement and scoring of gaps

-cannot easily extend dynamic programming algorithms like SW or NW
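
One common way to score an entire multiple alignment is the sum-of-pairs score: for each column, score every pair of sequences and add everything up. The sketch below assumes that scheme with illustrative scores (+1 match, -1 mismatch, -1 residue against a gap, 0 for gap against gap); the deck itself does not say which scoring method the course uses.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-1):
    """Score an MSA given as a list of equal-length aligned strings."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):             # walk the alignment column by column
        for x, y in combinations(column, 2):   # every pair of sequences
            if x == "-" and y == "-":
                continue                       # gap against gap scores nothing
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


sum_of_pairs(["ACG", "ACG", "A-G"])   # 3 + (-1) + 3 = 5
```

The number of pairs grows quadratically with the number of sequences, and finding the alignment that maximizes this score exactly becomes impractical quickly, which is why MSA tools rely on heuristics.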