Margaret O. Dayhoff
compiled and edited the Atlas of Protein Sequence and Structure, a collection of amino acid sequences, and developed computer software for comparing sequences and detecting distantly related ones
EMBL
established data library in 1980
NCBI
established in the USA; became a primary databank and provider of molecular biology information
Bioinformatics
combination of biology and informatics
In silico
analysis by computer
Human Genome Project
spurred the rapid rise of bioinformatics as a formal discipline
NCBI
creates automated systems for storing and analyzing knowledge about molecular biology, biochemistry and genetics
Identity
extent to which two sequences are the same
Alignment
lining up two or more sequences to search for maximal regions of identity or similarity
Local alignment
alignment of some portion of two sequences
Multiple sequence alignment
alignment of three or more sequences arranged with gaps so common residues align together
Optimal alignment
alignment of two sequences with the best degree of identity
Conservation
sequence changes that maintain the properties of the original sequence
Similarity
relatedness of sequences, percent identity or conservation
Algorithm
fixed set of commands in a computer program
Domain
discrete portion of a protein or DNA sequence
Motif
highly conserved short region in protein domains
Gap
space introduced in alignment to compensate for insertions or deletions
Homology
similarity attributed to descent from a common ancestor
Orthology
homology in different species due to a common ancestral gene
Paralogy
homology within the same species resulting from gene duplication
Query
sequence presented for comparison with all other sequences in a selected database
Annotation
description of functional structures such as introns or exons in DNA
Interface
point of meeting between a computer and an external entity
GenBank
genetic sequence database sponsored by the National Institutes of Health
PubMed
search service sponsored by the National Library of Medicine providing access to literature citations in Medline and related databases
SwissProt
manually curated protein sequence database maintained by the Swiss Institute of Bioinformatics and EMBL-EBI, now part of UniProt
International Union of Pure and Applied Chemistry and International Union of Biochemistry and Molecular Biology
organizations that made the IUB universal nomenclature for mixed bases
R
A or G (purine)
Y
C or T (pyrimidine)
M
A or C
K
G or T
S
C or G
W
A or T
H
A, C or T (not G)
B
C, G or T (not A)
V
A, C or G (not T)
D
A, G or T (not C)
N
A, C, G or T (any base)
X or ?
unknown base (A, C, G or T)
O or -
deletion
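The mixed-base codes above can be written as a lookup table. The sketch below is only an illustration: the helper names `matches` and `expand` are hypothetical, and the X, ?, O and - symbols are left out for brevity.

```python
from itertools import product

# IUPAC/IUB mixed-base codes from the cards above: each code maps to the
# set of plain bases it can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT",
    "N": "ACGT",
}


def matches(code: str, base: str) -> bool:
    """Return True if the plain base is allowed by the mixed-base code."""
    return base.upper() in IUPAC[code.upper()]


def expand(seq: str) -> list[str]:
    """List every plain sequence that a degenerate sequence can represent."""
    choices = (IUPAC[c] for c in seq.upper())
    return ["".join(p) for p in product(*choices)]
```

For example, `expand("ARY")` lists the four plain sequences covered by a degenerate primer that starts with A, then a purine, then a pyrimidine.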
BLAST
Basic Local Alignment Search Tool
BLAST
used for homology searches
BLAST
searches GenBank maintained by NCBI
BLAST
searches for regions of local similarity between protein and nucleotide sequences
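E-value
expected number of matches with an equal or better score that would occur by chance in a database of that size
Very low E-values (e.g., 10^-12)
associated with highly significant, near-perfect matches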
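To see why a high score gives a very low E-value, the sketch below uses the commonly cited relation E = m × n × 2^(-S'), where S' is the bit score, m the query length and n the database length. It is a simplification: real BLAST uses effective lengths, and the function name `expect_value` is hypothetical.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score (E = m * n * 2 ** -S').

    Assumption: raw lengths are used in place of the effective lengths
    that BLAST actually computes.
    """
    return query_len * db_len * 2.0 ** (-bit_score)
```

Each extra bit of score halves the E-value, while a larger database raises it in proportion.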
Mis-primes
caused by multiple potential binding sites
Off-target products
caused by multiple potential binding sites
GenBank
international nucleotide sequence database and repository of NCBI
ENA
international nucleotide sequence database and repository of EMBL-EBI
DDBJ
nucleotide sequence database in Japan
UniProt
protein database with sequence and functional annotation
Ensembl
vertebrate and eukaryotic genomes database
Ensembl genomes
genome-scale data for bacteria, protists, fungi, plants and invertebrate metazoa
InterPro
functional analysis database for protein sequences
Pfam
manually curated collection of protein domain families
FASTA
one of the most widely used sequence file formats in bioinformatics
.fasta or .fa
file extensions for FASTA
FASTA
format in which each record begins with a header line starting with the greater-than symbol >, followed by the sequence lines
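A minimal sketch of reading FASTA text, assuming each record is a header line starting with > followed by one or more sequence lines. The function name `parse_fasta` is hypothetical, and only the first word of the header is kept as the record name.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record name to its sequence, joined across lines."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records
```

For real work, an established parser such as the one in Biopython is usually a safer choice than a hand-written one.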
GenBank file
starts with a LOCUS line; the sequence itself follows the ORIGIN line
ORIGIN
beginning of sequence in GenBank format
//
ending of GenBank sequence
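The ORIGIN and // markers are enough to pull the sequence out of a GenBank record. The sketch below assumes a single record and drops the position numbers and spaces on each sequence line. The function name `genbank_sequence` is hypothetical.

```python
def genbank_sequence(text: str) -> str:
    """Return the sequence found between ORIGIN and // in one record."""
    parts = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True  # sequence lines start on the next line
            continue
        if line.startswith("//"):
            break  # end of the record
        if in_origin:
            # keep letters only, dropping position numbers and spaces
            parts.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(parts).upper()
```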
EMBL file
used by European Molecular Biology Laboratory
ID
identifier marking beginning of EMBL file
SQ
start of sequence in EMBL format
.aln
typical file extension for CLUSTAL
CLUSTAL
multiple sequence alignment format used for phylogenetic algorithms
Dashes -
indicate gaps (insertions or deletions) in CLUSTAL
.nex or .nxs
typical file extensions for NEXUS
NEXUS
begins with the header “#NEXUS” followed by blocks containing commands
.phy or .ph
typical file extensions for PHYLIP
Sequence alignment
arranging DNA, RNA or protein sequences to identify regions of similarity
Reference sequence
known sequence
Query sequence
unknown sequence
Global alignment
uses Needleman-Wunsch algorithm
Global alignment
assumes sequences are similar over entire length
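A compact sketch of global alignment in the Needleman-Wunsch style. The scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, not values from the cards, and the function name is hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Leading gaps are penalized because the whole length must align.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to the top-left corner.
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

Aligning `"ACGT"` with `"AGT"` places one gap opposite the C, since every position of both sequences has to be accounted for.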
Local alignment
based on Smith-Waterman
Local alignment
finds local regions with highest similarity
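A sketch of local alignment in the Smith-Waterman style. It differs from the global case in two ways: cell scores never drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The scoring values (match +2, mismatch -1, gap -2) are assumptions for illustration, and the function name is hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, local_a, local_b) for the best local alignment."""
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + s,
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

Only the best-matching region is reported, and the unrelated flanking sequence is ignored.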
Pairwise sequence alignment
used by BLAST
Dot matrix
old method of producing pairwise alignments
Dynamic programming algorithm
advanced method of producing pairwise alignments
Word or K-tuple method
advanced method used in FASTA and BLAST
Dot plots
another term for dot matrix
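A dot matrix can be sketched as a grid with a mark wherever a residue in one sequence equals a residue in the other. Runs of marks along a diagonal indicate regions of similarity. The function names below are hypothetical, and no window or threshold filtering is applied.

```python
def dot_matrix(a: str, b: str) -> list[list[bool]]:
    """Cell [i][j] is True when a[i] equals b[j]."""
    return [[x == y for y in b] for x in a]


def render(a: str, b: str) -> str:
    """Draw the matrix as text, with * for a match and . otherwise."""
    rows = ["  " + " ".join(b)]
    for ch, row in zip(a, dot_matrix(a, b)):
        rows.append(ch + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(rows)
```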
Richard Bellman
introduced the dynamic programming method in the 1950s
Word or K-tuple method
identifies short non-overlapping subsequences of query sequence
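The word idea can be sketched as follows: cut the query into short non-overlapping words of length k, then look each word up in an index of the subject sequence. This is a simplified illustration, not the actual FASTA or BLAST procedure. The function names and the default k = 3 are assumptions.

```python
def query_words(query: str, k: int) -> list[tuple[int, str]]:
    """Split the query into non-overlapping words of length k."""
    return [(i, query[i:i + k]) for i in range(0, len(query) - k + 1, k)]


def word_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int, str]]:
    """Return (query_pos, subject_pos, word) for each exact word match."""
    # Index every k-length window of the subject by its position.
    index: dict[str, list[int]] = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    hits = []
    for i, word in query_words(query, k):
        for j in index.get(word, []):
            hits.append((i, j, word))
    return hits
```

Word hits act as seeds: a heuristic search extends alignments only around them, which is far faster than filling a full dynamic programming matrix for every database sequence.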
BLASTp
compares amino acid query sequence against protein database
BLASTn
compares nucleotide query sequence against nucleotide database
BLASTx
searches six frame translation products of nucleotide sequence against protein database
tBLASTn
searches protein sequence against translated nucleotide sequence database
tBLASTx
compares six frame translations of nucleotide query against six frame translations of database
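The six-frame translation used by BLASTx, tBLASTn and tBLASTx can be sketched as three reading frames on the given strand plus three on its reverse complement. The sketch assumes the standard genetic code and plain A, C, G, T input. Incomplete or unrecognized codons become X, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the result."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the first base; * marks a stop."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Return translations for frames +1, +2, +3, -1, -2 and -3."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames
```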
Mega BLAST
optimized for aligning long, highly similar DNA sequences
PSI BLAST
position specific iterated BLAST
PHI BLAST
pattern hit initiated BLAST
Nucleotide BLAST
option clicked first when performing BLAST procedure
CCAGAGTCCAGCTGCTGCTCATACTACTGATACTGCTGGG
example sequence used for BLAST practice
BLASTn
program selected under program selection category during procedure
E-value less than 1.0
indicates significant alignments
Percent identity
percentage of aligned positions that are identical between the query and subject sequences
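Percent identity can be computed from two already aligned sequences. The sketch below assumes gaps are written as - and divides identical columns by the alignment length, which is one common convention. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)
```

Other conventions divide by the length of the shorter sequence or by the number of ungapped columns, so it helps to state which one is in use when reporting a value.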
Accession number
unique identifier assigned to a sequence record when it is entered into a database