Bioinformatics Midterm


159 Terms

1

Natural selection

Process in which selective pressure favors the survival and reproduction of individuals with higher fitness, so that adaptive traits become more common in a population

2

Genotype

The set of genetic variants (alleles) that an organism carries in its genome

3

Phenotype

Phenotype is the trait that results from the organism’s life history, environment, epigenetics, and genotype

4

Life history

Sum total of past and present environments

5

Epigenetic factors

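Chemical modifications to DNA or histones (for example, methylation) that change gene expression without altering the DNA sequence; most are not inherited, although some can be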

6

Allele

Alternate form of a gene

7

Population

Group of individuals of the same species that live in the same area

8

Mutations

Heritable change to the genome

9

Recombination

Process by which chromosomes exchange segments during meiosis, so each receives a unique set of genetic information. Genes that are close together on a chromosome are more likely to be inherited as a set, and genes that are far apart are more likely to be inherited independently

10

Gene duplication

Extra copies of the gene are created

11

Gene loss

Copy of a gene is removed

12

Gene flow

Transfer of genetic material between populations

13

Microevolution

The evolution that occurs within a population over a relatively short time scale, often involving small genetic changes that can lead to variations in traits

14

Macroevolution

The evolution that occurs over long periods of time and typically results in the formation of new species

15

Central Dogma

The flow of information from DNA (the genome) to RNA (the transcriptome) and finally to protein (the proteome)

16

Exon

Part of a gene whose sequence is retained in the mature mRNA

17

Intron

Part of a gene that is transcribed but spliced out before the mRNA is mature

18

Codons

A set of three nucleotides that codes for a particular amino acid (or a stop signal)
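A minimal sketch of reading codons, assuming a DNA coding sequence that is already in frame. The table below is deliberately partial (the full genetic code has 64 codons), and `CODON_TABLE` and `translate` are illustrative names, not part of any library.

```python
# Partial codon table for illustration only; "*" marks a stop codon.
CODON_TABLE = {
    "ATG": "M", "TTT": "F", "GGC": "G", "AAA": "K",
    "TAA": "*", "TAG": "*", "TGA": "*",
}

def translate(dna):
    """Translate an in-frame DNA sequence three bases at a time.

    Codons missing from the partial table are reported as "X".
    Translation stops at the first stop codon.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3].upper(), "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGTTTGGCAAATAA"))  # MFGK
```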

19

Side chain

The variable part (R group) of an amino acid; it determines folding and bond formation, and affects interactions with other residues

20

Folding pattern

The way a sequence of amino acids arranges itself in 3D space; many factors contribute to this process

21

Primary structure

The sequence of amino acids in a peptide

22

Secondary structure

Local folding of the peptide backbone into regular structures such as alpha helices and beta sheets, stabilized by hydrogen bonds

23

Tertiary structure

The overall three-dimensional fold of a single polypeptide chain, formed by the packing of its helices, sheets, and loops

24

Supersecondary structure

A recurring, compact arrangement of a few adjacent helices and sheets (for example, a beta hairpin) that is smaller than a domain

25

Domains

Compact units within the folding pattern of a single chain that appear independently stable; each has specific function(s)

26

Modular proteins

Multidomain proteins, often containing several copies of the same or closely related domains

27

Databases

Organized sets of data that are typically sorted by data type (e.g., a DNA sequence database)

28

Homologous

Describes a feature that derives from a related structure in a common ancestor

29

Common ancestor

The most recent ancestor from which two species diverged

30

Similarity

The extent to which nucleotide or protein sequences are related. Based on identity and conservation

31

Homolog

A sequence that is similar due to shared common ancestry

32

Ortholog

Homologous sequences in different species that arose from a common ancestral gene during speciation. May or may not be responsible for a similar function

33

Paralog

Homologous sequences within a single species that arose by gene duplication

34

Convergent evolution

Independent origin of a similar trait or phenotype in lineages that did not inherit it from a common ancestor

35

Pairwise alignment

The process of lining up two sequences in order to achieve maximal levels of identity for the purpose of assessing the degree of similarity

36

Phylogeny

Inference of evolutionary relationships reconstructed as a tree diagram

37

LINEs

Long interspersed nuclear elements: a class of repetitive elements (retrotransposons), several kilobases long, scattered throughout the genome

38

SINEs

Short interspersed nuclear elements: a class of repetitive elements (retrotransposons), a few hundred base pairs long, scattered throughout the genome

39

Genome assembly

Sequence of an entire genome ordered as the continuous DNA molecules that make up each chromosome

40

Scaffold

A portion of a genome assembly made up of ordered contigs separated by gaps

41

Contig

A continuous sequence, with no gaps, assembled from overlapping reads

42

NGS

Next-generation sequencing: high-throughput technologies that rapidly sequence many DNA or RNA fragments in parallel

43

Illumina

Short-read sequencing method that shears the entire genome into 400-600 base pair fragments and reads them in parallel, up to about 600 million fragments at once

44

Nanopore

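Sequencing technology that reads long DNA or RNA molecules as they pass through a nanoscale pore; available as a small portable device capable of rapidly sequencing entire genomes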

45

Karyotype

The complete set of chromosomes for an organism

46

G-banding

Staining technique in which Giemsa stain colors the A-T rich regions darker than the rest of the chromosome. Each chromosome has a unique G-banding pattern that is used to number chromosome regions

47

Linkage map

Shows the order of genes on a chromosome and the relative distance between them, based on recombination frequency. The closer two genes are, the more likely they are to be inherited together

48

Recombination frequency

The fraction of offspring that are recombinant for two loci; used to group and order genes. 1% recombination is equal to 1 centimorgan (cM)
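A small worked example of the 1% = 1 cM rule, assuming a cross in which recombinant and total offspring have been counted. The function name and the counts are hypothetical, and the simple conversion is only reasonable for closely linked loci.

```python
def map_distance_cm(recombinant_offspring, total_offspring):
    """Convert offspring counts into a map distance in centimorgans.

    Recombination frequency = recombinant / total, and 1% recombination
    is treated as 1 cM (an approximation that holds for short distances).
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinant_offspring <= total_offspring:
        raise ValueError("recombinant_offspring must be between 0 and total")
    return 100.0 * recombinant_offspring / total_offspring

# Hypothetical cross: 18 recombinant offspring out of 200 scored.
print(map_distance_cm(18, 200))  # 9.0
```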

49

FISH

Fluorescence in situ hybridization: fluorescently labeled DNA probes are hybridized to chromosomes and viewed under a microscope; used to verify that a genome assembly is correct

50

BAC map

A method to create scaffolds by using chromosome sections that are inserted into plasmids and replicated in bacteria. The bacterial artificial chromosomes are ordered into a map using genetic markers

51

Marker

A known location on a chromosome that is useful for identification

52

Locus

A specific location on a chromosome

53

Human genome hg38

The GRCh38 reference assembly of the human genome, released in December 2013

54

UCSC Genome Browser

An interactive web tool for viewing genome sequences and their annotations for many species

55

Gene model

The most basic annotation, defining the start and stop of a sequence that is transcribed to RNA and performs a function

56

Open reading frame

Sequence of DNA that can be translated into a protein: it begins with a start codon and continues in the same reading frame, without interruption, to a stop codon
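A minimal sketch of an ORF scan, assuming ATG as the only start codon and searching the three forward-strand frames only (a fuller search would also scan the reverse complement). `find_orfs` and `min_codons` are illustrative names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each forward-strand ORF.

    An ORF here runs from the first ATG in a frame to the next
    in-frame stop codon, inclusive. Coordinates are 0-based, end-exclusive.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next start codon
    return orfs

print(find_orfs("CCATGAAATGACC"))  # [(2, 11, 'ATGAAATGA')]
```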

57

Promoter

A DNA site upstream of the protein-coding region where RNA polymerase and transcription proteins bind to begin transcription

58

Splice junction

The boundary between an exon and an intron, where the RNA is cut and rejoined when introns are removed during splicing

59

5’UTR

Untranslated region of the mRNA upstream of the initiation codon

60

3’UTR

Untranslated region of the mRNA downstream of the stop codon

61

TF binding sites

The locations on DNA where transcription factors bind to start or regulate transcription

62

cDNA

Complementary DNA, synthesized from an RNA template by reverse transcriptase; used to measure the expression level of different genes

63

RNAseq

A technique that determines what RNA is present in a cell and how much of it. Gives a snapshot of the transcriptome of the cell

64

GWAS

Genome-wide association studies compare genetic variants across many individuals to determine whether a variant is associated with a specific trait

65

SNP

Single nucleotide polymorphisms are variations at a single position in the DNA sequence

66

Tag SNP

A representative SNP in a region of the genome with high linkage disequilibrium; it stands in for a group of SNPs called a haplotype

67

Haplotype

A group of alleles that are inherited together from a single parent

68

Haplotype block

Region of an organism’s genome where there is little evidence of genetic recombination

69

International HapMap Project

Goal was to develop a haplotype map for the human genome

70

1,000 Genomes Project

The successor to the International HapMap Project; it sequenced the genomes of more than a thousand individuals to catalog human genetic variation

71

Aliphatic

Compounds (or side chains) that consist of carbon and hydrogen in non-aromatic chains, which may be straight or branched

72

Aromatic

Containing at least one aromatic ring with alternating double bonds

73

Polar/nonpolar

Specifies whether or not the compound contains a dipole

74

Homology

Trait or sequence inherited from a common ancestor

75

Identity

The extent to which two sequences are invariant

76

Conservation

Changes at a specific position of an amino acid or sequence that preserve the physicochemical properties of the original residue

77

Score

Determines how well the sequences align

78

Identities

Gives the number of aligned positions at which the two residues are an exact match
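As a rough illustration, identities can be counted directly from two aligned strings, assuming both have the same length and use "-" for gaps. `count_identities` is an illustrative name, not a function from BLAST or any library.

```python
def count_identities(aligned_a, aligned_b):
    """Count alignment columns where both sequences have the same residue.

    Columns containing a gap ("-") never count as identities.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")

matches = count_identities("ACGT", "A-GT")
print(matches, "of", 4, "columns")  # 3 of 4 columns
```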

79

Positives

Gives the number of aligned positions at which the two residues are identical or similar (a conservative substitution)

80

Log-odds score

Scores an aligned pair of residues as the logarithm of the ratio between how often that pair is observed in real alignments and how often it would be expected by chance
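A short sketch of a log-odds calculation, written as log(q_ij / (p_i * p_j)). The symbols are assumptions introduced here: `q_ij` is the observed frequency of residues i and j being aligned, and `p_i`, `p_j` are their background frequencies. The numbers are made up for illustration.

```python
import math

def log_odds(q_ij, p_i, p_j, base=2.0):
    """Log of observed pair frequency over the frequency expected by chance.

    Positive scores mean the pair is seen more often than chance predicts;
    negative scores mean it is seen less often.
    """
    return math.log(q_ij / (p_i * p_j), base)

# Hypothetical frequencies: the pair is observed 8 times more often than chance.
print(round(log_odds(0.02, 0.05, 0.05), 3))   # 3.0

# A pair observed less often than chance gets a negative score.
print(log_odds(0.001, 0.05, 0.05) < 0)        # True
```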

81

Substitution Matrix

Contains values proportional to the probability that amino acid i mutates into amino acid j for all pairs of amino acids. Constructed by assembling a large and diverse sample of pairwise alignments

82

PAM

Point accepted mutation matrix; based on global alignments of closely related proteins, with matrices for more distant relationships extrapolated from them

83

BLOSUM

Blocks substitution matrix; based on observed alignments of conserved regions (blocks) that are not extrapolated

84

BLAST

Basic Local Alignment Search Tool; allows rapid sequence comparison of a query sequence against a database

85

E-value

The expect value is the number of different alignments expected to occur by chance that have a score of S or better given the database (lower = a better match)
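To make the definition concrete, here is a sketch using the standard relation between a bit score S' and the expect value, E = m * n * 2^(-S'), where m is the query length and n is the total database length. The function name and the lengths below are hypothetical, and real BLAST reports use adjusted (effective) lengths, so reported values will differ somewhat.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate E-value from a bit score and the search space size."""
    return query_len * db_len * 2.0 ** (-bit_score)

# Hypothetical search: 250-residue query against 1e9 residues of database.
weak_hit = expect_value(30, 250, 1_000_000_000)
strong_hit = expect_value(60, 250, 1_000_000_000)

# A higher score gives a lower E-value, meaning fewer chance hits are expected.
print(strong_hit < weak_hit)  # True
```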

86

NR database

NCBI's non-redundant collection, the default BLAST search database; "nr" is the protein set and "nr/nt" is the nucleotide set

87

RefSeq database

NCBI's publicly available, curated collection of reference sequences for genomes, transcripts, and proteins

88

DELTA-BLAST

Domain Enhanced Lookup Time Accelerated BLAST; uses conserved domain information to improve sensitivity, and works best for more distantly related proteins

89

Conserved domain database

NCBI database of annotated alignment models for conserved protein domains

90

Homologous sites

Refer to features that share a common evolutionary origin

91

Ancestral sites

Refer to the common ancestors from which homologous sites evolved

92

Structural sites

Features that have a similar function but do not have a common ancestor

93

Disulfide bridges

Occur between two cysteine residues. The bond forms between the two sulfur atoms

94

Transmembrane regions

The nonpolar region of the peptide chain that interacts with the hydrophobic membrane core

95

Progressive alignment

First aligns the most similar sequences, then progressively aligns additional sequences to the existing alignment

96

Needleman & Wunsch dynamic programming

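Dynamic programming algorithm that finds the best global alignment of two sequences by filling a matrix with the best score for every pair of prefixes, which accounts for every possible alignment without listing each one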
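A compact sketch of the approach, assuming a simple scoring scheme (match +1, mismatch -1, linear gap penalty -2) rather than a substitution matrix such as BLOSUM. The function name and default scores are illustrative choices.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Each cell holds the best score for aligning a[:i] with b[:j].
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))

best, top, bottom = needleman_wunsch("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

The matrix has (len(a) + 1) x (len(b) + 1) cells, so the work grows with the product of the two sequence lengths.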

97

Guide tree

Tree calculated from the distance matrix; it sets the order in which sequences are added during progressive alignment

98

Gaps

Positions in an alignment where one sequence has no residue, representing an insertion or deletion

99

CLUSTAL

Uses progressive alignment, ideal for smaller alignments

100

MUSCLE

Uses progressive alignment, ideal for larger alignments, more accurate than CLUSTAL