Margaret O. Dayhoff (1965-1978)
Edited the collection of amino acid sequences compiled in the Atlas of Protein Sequence and Structure, and developed computer software for detecting distantly related sequences by comparing amino acid sequences
NCBI
Established in USA
The primary information databank and provider of information
Bioinformatics
Blends computer science and biostatistics w/ biomedical sciences such as: epidemiology, genetics, genomics, and proteomics
a combination of biology and informatics
In silico
Bioinformatics is analysis in?
HUMAN GENOME PROJECT
spurred the rapid rise of bioinformatics as a formal discipline
Goal: facilitate the management, analysis and interpretation of data from biological experiments and observational studies.
Alignment
Lining up two or more sequences to search for the maximal regions of identity (or similarity) in order to assess the extent of biological relatedness or homology
Identity
The extent to which two sequences are the same
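As a rough illustration of identity, the sketch below computes percent identity over two already-aligned sequences. The function name `percent_identity` and the choice to count every non-empty column in the denominator are assumptions for illustration; tools differ in how they treat gap columns.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns where both sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap; they carry no information.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    # A column counts as identical only if the residues match and are not gaps.
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT", "ACGA"))
```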
Local alignment
Alignment of some portion of two sequences
Multiple sequence alignment
Alignment of three or more sequences arranged with gaps so that common residues are aligned together
Optimal alignment
The alignment of two sequences with the best degree of identity
Conservation
Specific sequence changes (usually protein sequence) that maintain the properties of the original sequence
Similarity
The relatedness of sequences, the percent identity or conservation
Algorithm
A fixed set of commands in a computer program (stores the bioinformatics)
Domain
A discrete portion of a protein or DNA sequence
Motif
A highly conserved short region in protein domains
Gap
A space introduced in alignment to compensate for insertions or deletions in one of the sequences being compared
Homology
Similarity attributed to descent from a common ancestor
Orthology
Homology in different species due to a common ancestral gene
Paralogy
Homology within the same species resulting from gene duplication
Query
The sequence presented for comparison with all other sequences in a selected database (e.g., NCBI)
Annotation
Description of functional structures, such as introns or exons in DNA sequences, or secondary structure or functional regions in protein sequences
Interface
The point of meeting between a computer and an external entity, such as an operator, a peripheral device, or a communications medium
PubMed
Search service sponsored by the National Library of Medicine that provides access to literature citations in Medline and related databases
SwissProt
Curated protein sequence database, now part of UniProt, maintained by the SIB Swiss Institute of Bioinformatics together with the European Bioinformatics Institute (EMBL-EBI)
NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION (NCBI)
Create automated systems for storing and analyzing knowledge about molecular biology, biochemistry and genetics
Research into advanced methods of computer-based information processing for analyzing the structure and function of biologically important molecules and compounds
Facilitates the use of databases and software by biotechnology researchers and medical care personnel
Coordinate efforts to gather biotech information worldwide
Heterozygous mutations
Important in bioinformatics because they produce more than one base (mixed bases) at the same position in the sequence
Consensus sequences
A family of sequences with proportional representation of the polymorphic bases
IUB Universal Nomenclature for Mixed Bases
Made by the International Union of Pure and Applied Chemistry and International Union of Biochemistry and Molecular Biology
Their base designations in the IUB code are used to communicate consensus sequences and for computer input of polymorphic sequence data
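To make the mixed-base codes concrete, here is a minimal sketch that maps a set of bases observed at one position to its single-letter IUPAC/IUB code. The table holds the standard nucleotide ambiguity codes; the helper name `mixed_base_code` is hypothetical.

```python
# Standard IUPAC/IUB nucleotide codes; each value lists the bases the code covers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def mixed_base_code(bases) -> str:
    """Return the one-letter code for the set of bases seen at a position."""
    wanted = "".join(sorted({b.upper() for b in bases}))
    for code, members in IUPAC.items():
        if members == wanted:
            return code
    raise ValueError(f"no IUPAC code for bases {wanted!r}")


# A heterozygous A/G position is written as R (purine).
print(mixed_base_code("AG"))
```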
BLAST - Basic Local Alignment Search Tool
for homology searches
Searches GenBank (large database maintained by NCBI)
Nucleic acid or amino acid sequences
searches for regions of local similarity between protein and nucleotide sequences
E-values
Expected number of matches to the query that would occur by chance
Decreases exponentially as the quality (score) of the match increases
Very low E-values (e.g., 10^-12) are associated with a highly significant, near-perfect match
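The exponential relationship can be sketched with the commonly cited formula E = m × n × 2^(-S'), where S' is the bit score, m the query length and n the database length. This is only an approximation under those assumptions: real BLAST uses effective lengths, and the function name `expect_value` is hypothetical.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Each extra bit of score halves the expected number of chance matches.
for bits in (20, 40, 60):
    print(bits, expect_value(bits, query_len=300, db_len=1_000_000))
```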
FASTA
File extensions: ‘.fasta’ or ‘.fa’
One of the most widely used formats in bioinformatics
Can be a single-sequence or multiple-sequence format
Each record, as seen when you search in BLAST or other websites, begins with a header line starting with the ‘greater than’ (‘>’) symbol
FASTA contains:
Organism origin
Sequence accession number
Sequence description, features and comments
Separated by dash (-), underscore (_) and pipe symbols (|)
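A minimal sketch of reading FASTA text, assuming each record is a ‘>’ header line followed by one or more sequence lines. The function name `parse_fasta` and the sample headers are made up for illustration.

```python
def parse_fasta(text: str):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record file; the header fields are separated by pipes.
sample = ">seq1|example organism|first record\nACGT\nACGT\n>seq2|example organism|second record\nGGCC\n"
for header, seq in parse_fasta(sample):
    print(header.split("|")[0], len(seq))
```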
GenBank File
Starts with LOCUS
The sequence itself begins with ORIGIN and ends with a double slash (//)
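The ORIGIN and // markers are enough to pull the sequence out of a GenBank flat file. The sketch below assumes a single-record file; `genbank_sequence` and the sample record are hypothetical.

```python
def genbank_sequence(text: str) -> str:
    """Return the sequence found between the ORIGIN line and the closing //."""
    in_origin = False
    parts = []
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if line.startswith("//"):
            break
        if in_origin:
            # Sequence lines carry position numbers and spaces; keep letters only.
            parts.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(parts).upper()


sample = "LOCUS       EXAMPLE   12 bp    DNA\nORIGIN\n        1 acgtacgtac gt\n//\n"
print(genbank_sequence(sample))
```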
EMBL File
Used by European Molecular Biology Laboratory (EMBL)
Begins with an identifier (marked with “ID”)
The start of the sequence is marked by ‘SQ’ and the end by ‘//’, as in the GenBank file
CLUSTAL
Typical file extension: ‘.aln’
Includes multiple sequences in one file. It is used as an input format for phylogenetic algorithms. The format starts with the word “CLUSTAL”
Each sequence is identified by a name followed by a space and a string of characters. Dashes (-) indicate gaps (insertions or deletions)
NEXUS
Typical file extension: ‘.nex’ or ‘.nxs’
Begins with the word “#NEXUS” followed by blocks containing commands. Each block begins with “begin <block name>;” and ends with “end;”
PHYLIP
Typical file extension: ‘.phy’ or ‘.ph’
It begins with the number of sequences in the file followed by the length of the alignment in base pairs. The alignment block follows on the next line, with one entry for each species
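A small sketch of reading that layout, assuming the relaxed sequential variant in which each line holds a name, whitespace, and then the aligned sequence. Strict PHYLIP uses fixed 10-character names, and interleaved files need more handling. The function name `parse_phylip` and the sample are hypothetical.

```python
def parse_phylip(text: str):
    """Return (n_sequences, alignment_length, {name: sequence})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # First line: number of sequences, then alignment length.
    n_seqs, length = (int(x) for x in lines[0].split()[:2])
    records = {}
    for ln in lines[1:1 + n_seqs]:
        name, seq = ln.split(None, 1)
        records[name] = seq.replace(" ", "")
    for name, seq in records.items():
        if len(seq) != length:
            raise ValueError(f"{name}: expected {length} columns, got {len(seq)}")
    return n_seqs, length, records


sample = " 2 8\nhuman  ACGTACGT\nmouse  ACG-ACGT\n"
print(parse_phylip(sample))
```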
Known
reference sequence
Unknown
query sequence
Global Alignment
Two sequences to be aligned are assumed to be generally similar over their entire length
Uses the Needleman-Wunsch algorithm
Carried out from beginning to end of both sequences
Applications of global alignment:
Compare two genes with the same function (e.g., human vs. mouse)
Compare two proteins with similar functions
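A compact sketch of Needleman-Wunsch scoring follows. It fills the dynamic programming table from beginning to end of both sequences and returns only the optimal global score, without the traceback. The scoring values (match +1, mismatch -1, linear gap -1) are illustrative assumptions.

```python
def needleman_wunsch_score(a: str, b: str, match=1, mismatch=-1, gap=-1) -> int:
    """Optimal global alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # The global score sits in the bottom-right cell: both sequences fully consumed.
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "ACT"))
```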
Local Alignment
Based on Smith-Waterman
Does not assume that the two sequences in question have similarity over the entire length
Finds local regions with highest level of similarity between two sequences and aligns these regions without regard for alignment of the rest of the sequence regions. It aligns locally and not the entire sequence
Input: two sequences may or may not be related
Goal: see whether a substring in one sequence aligns well with a substring in the other
Applications of local alignment:
Searching for local similarities in large sequences (e.g., newly sequenced genomes)
Looking for conserved domains or motifs in two proteins
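A minimal Smith-Waterman sketch shows how local alignment differs from global alignment: cell scores are floored at zero, and the answer is the best cell anywhere in the table rather than the corner. The scoring values (match +2, mismatch -1, gap -1) are assumptions for illustration, and only the score is returned.

```python
def smith_waterman_score(a: str, b: str, match=2, mismatch=-1, gap=-1) -> int:
    """Best local alignment score between any substring of a and any substring of b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart wherever similarity begins.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Only the shared ACGT core contributes to the score.
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```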
Pairwise sequence alignment
Used to find the best-matching piecewise (local or global) alignments of two query sequences.
Three Primary Methods of Producing Pairwise Alignments:
Dot Matrix
The Dynamic Programming (DP) Algorithm
Word or K-Tuple Method
Dot matrix (Dot plots)
Similar nucleotides of 2 DNA sequences are represented as dots
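A dot plot is easy to sketch: place one sequence along the columns, the other along the rows, and mark every cell where the nucleotides match. The function names are hypothetical, and real tools usually add a sliding window to filter noise.

```python
def dot_matrix(seq_x: str, seq_y: str):
    """Matrix with 1 where seq_y[row] equals seq_x[col], else 0."""
    return [[1 if x == y else 0 for x in seq_x] for y in seq_y]


def render(matrix) -> str:
    """Draw matches as '*' and mismatches as '.'."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


# Similar regions show up as diagonal runs of dots.
print(render(dot_matrix("ACGTACGT", "ACGTACGT")))
```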
Dynamic Programming (DP) method
Introduced by Richard Bellman (1950s)
Useful in aligning nucleotide sequences of DNA and amino acid sequences of proteins coded by that DNA
Word method/K-Tuple method
Useful in large-scale database searches to find whether there is significant match available with the query sequence
Used in FASTA and BLAST family
Identify a series of short, non-overlapping subsequences (words) of the query sequence
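The word idea can be sketched as follows: cut the query into short words of length k, then look each word up in an index of the subject sequence. Following the description above, the query words here are non-overlapping. The function names are hypothetical, and actual FASTA and BLAST implementations differ in detail.

```python
def words(seq: str, k: int, step=None):
    """Split seq into words of length k; step defaults to k (non-overlapping)."""
    step = k if step is None else step
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def word_hits(query: str, subject: str, k: int):
    """Return (word, query_pos, subject_pos) for every query word found in subject."""
    # Index every k-length word of the subject by its start position.
    index = {}
    for pos in range(len(subject) - k + 1):
        index.setdefault(subject[pos:pos + k], []).append(pos)
    hits = []
    for qpos in range(0, len(query) - k + 1, k):
        word = query[qpos:qpos + k]
        for spos in index.get(word, []):
            hits.append((word, qpos, spos))
    return hits


print(word_hits("ACGTAC", "TTACGTT", 3))
```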
Standard BLAST programs are of 5 types:
BLASTp
BLASTn
BLASTx
tBLASTn
tBLASTx
BLASTp
This program compares an amino acid query sequence against a protein sequence database
BLASTn
It compares a nucleotide query sequence against a nucleotide sequence database
BLASTx
It searches the six frame translation products of a nucleotide sequence against a protein database
tBLASTn
It searches a protein sequence against translated nucleotide sequence in the database
tBLASTx
It compares the six frame translations of a nucleotide query sequence against six frame translations of database
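The "six frames" are three reading-frame offsets on the forward strand plus three on the reverse complement. The sketch below only splits a nucleotide sequence into codons for each frame and skips the translation step, which would need a codon table. The function names and frame labels (+1 to -3) are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def six_frames(seq: str):
    """Return {frame_label: [codons]} for the three forward and three reverse frames."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            usable = len(s) - offset
            usable -= usable % 3  # drop a trailing partial codon
            frames[f"{strand}{offset + 1}"] = [
                s[i:i + 3] for i in range(offset, offset + usable, 3)
            ]
    return frames


for label, codons in six_frames("ATGGCCTAA").items():
    print(label, codons)
```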
Mega BLAST
It is a program optimized for aligning long sequences. It can only work with DNA sequences
PSI BLAST
It stands for position specific iterated BLAST. It is useful for protein similarity search
PHI BLAST
It stands for pattern-hit initiated BLAST. It can be used to search for a specific pattern or motif