Bioinformatics

Last updated 4:01 PM on 5/18/26

57 Terms

New cards

Margaret O. Dayhoff (1965-1978)

Edited the Atlas of Protein Sequence and Structure, a compiled collection of amino acid sequences, and developed computer software for comparing amino acid sequences and detecting distantly related sequences

New cards

NCBI

Established in USA
The primary information databank and provider of information

New cards

Bioinformatics

Blends computer science and biostatistics with biomedical sciences such as epidemiology, genetics, genomics, and proteomics
a combination of biology and informatics

New cards

In silico

Bioinformatics is analysis in?

New cards

HUMAN GENOME PROJECT

spurred the rapid rise of bioinformatics as a formal discipline
Goal: facilitate the management, analysis and interpretation of data from biological experiments and observational studies.

New cards

Alignment

Lining up two or more sequences to search for the maximal regions of identity (or similarity) in order to assess the extent of biological relatedness or homology

New cards

Identity

The extent to which two sequences are the same
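
Identity is often reported as a percentage over an existing alignment. The sketch below is one way to compute it, assuming the two sequences are already aligned to equal length with "-" for gaps. The function name and the choice of denominator (all columns except gap-gap columns) are assumptions; other conventions exist.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    # A column counts as identical only if the residues match and are not gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT", "ACGA"))  # 3 of 4 columns match
```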

New cards

Local alignment

Alignment of some portion of two sequences

New cards

Multiple sequence alignment

Alignment of three or more sequences arranged with gaps so that common residues are aligned together

New cards

Optimal alignment

The alignment of two sequences with the best degree of identity

New cards

Conservation

Specific sequence changes (usually protein sequence) that maintain the properties of the original sequence

New cards

Similarity

The relatedness of sequences, the percent identity or conservation

New cards

Algorithm

A fixed set of commands in a computer program (stores the bioinformatics)

New cards

Domain

A discrete portion of a protein or DNA sequence

New cards

Motif

A highly conserved short region in protein domains

New cards

Gap

A space introduced in alignment to compensate for insertions or deletions in one of the sequences being compared

New cards

Homology

Similarity attributed to descent from a common ancestor

New cards

Orthology

Homology in different species due to a common ancestral gene

New cards

Paralogy

Homology within the same species resulting from gene duplication

New cards

Query

The sequence presented for comparison with all other sequences in a selected database (e.g., NCBI)

New cards

Annotation

Description of functional structures, such as introns or exons in DNA sequences, or secondary structure or functional regions in protein sequences

New cards

Interface

The point of meeting between a computer and an external entity, such as an operator, a peripheral device, or a communications medium

New cards

PubMed

Search service sponsored by the National Library of Medicine that provides access to literature citations in Medline and related databases

New cards

SwissProt

Curated protein sequence database maintained by the Swiss Institute of Bioinformatics and the European Bioinformatics Institute (now part of UniProt)

New cards

NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION (NCBI)

Create automated systems for storing and analyzing knowledge about molecular biology, biochemistry and genetics
Research into advanced methods of computer-based information processing for analyzing the structure and function of biologically important molecules and compounds
Facilitates the use of databases and software by biotechnology researchers and medical care personnel
Coordinate efforts to gather biotech information worldwide

New cards

Heterozygous mutations

Important in bioinformatics because they produce more than one base (mixed bases) at the same position in the sequence

New cards

Consensus sequences

A family of sequences with proportional representation of the polymorphic bases

New cards

IUB Universal Nomenclature for Mixed Bases

Made by the International Union of Pure and Applied Chemistry and International Union of Biochemistry and Molecular Biology
Their base designations in the IUB code are used to communicate consensus sequences and for computer input of polymorphic sequence data
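
As an illustration of how mixed-base codes are entered into a computer, the sketch below lists the standard single-letter nucleotide ambiguity codes and converts between a code and the bases it represents. The names IUPAC_CODES, expand and code_for are hypothetical helpers, not part of any library.

```python
# Standard single-letter nucleotide codes; each maps to the bases it can stand for.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Reverse lookup: sorted set of bases -> code letter.
BASES_TO_CODE = {bases: code for code, bases in IUPAC_CODES.items()}


def expand(code: str) -> str:
    """Return the bases represented by a single code letter."""
    return IUPAC_CODES[code.upper()]


def code_for(bases: str) -> str:
    """Return the code letter for a set of bases observed at one position."""
    key = "".join(sorted(set(bases.upper())))
    return BASES_TO_CODE[key]


# A heterozygous A/G position would be written as R.
print(code_for("AG"), expand("Y"))
```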

New cards

BLAST - Basic Local Alignment Search Tool

for homology searches
Searches GenBank (large database maintained by NCBI)
Nucleic acid or amino acid sequences
searches for regions of local similarity between protein and nucleotide sequences

New cards

E-values

Expected number of matches to the query that would occur by chance
Decreases exponentially as the quality (score) of the match increases
Very low E-values (e.g., 10^-12) are associated with a perfect match
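
The exponential relationship between match quality and E-value can be illustrated with the commonly cited relation E = m × n × 2^(-S'), where m is the query length, n is the database length and S' is the bit score. This is a minimal sketch with illustrative names; actual BLAST output uses effective (edge-corrected) lengths, so reported values will differ.

```python
def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E-value from a bit score and the size of the search space."""
    search_space = query_length * database_length
    # Each additional bit of score halves the expected number of chance matches.
    return search_space * 2.0 ** (-bit_score)


# Same search space, increasing scores: the E-value shrinks rapidly.
for score in (20, 40, 80):
    print(score, expect_value(score, query_length=100, database_length=1_000_000))
```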

New cards

FASTA

File extensions: ‘.fasta’ or ‘.fa’
One of the most widely used formats in bioinformatics
Can be a single-sequence or multiple-sequence format
Each record, as retrieved from BLAST or other websites, begins with a header line that starts with the ‘greater than’ (‘>’) symbol
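
To make the layout concrete, here is a minimal sketch of reading FASTA text: a header line starting with ">" followed by one or more sequence lines. The function name and the example records (including their identifiers) are made up for illustration.

```python
def parse_fasta(text: str) -> dict:
    """Return {header: sequence} for single- or multi-sequence FASTA text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: drop the '>' and start a new record.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


example = """>example1|hypothetical record one
ACGTAC
GTAC
>example2|hypothetical record two
TTGACA
"""
print(parse_fasta(example))
```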

New cards

Organism origin
Sequence accession number
Sequence description, features and comments
Separated by dash (-), underscore (_) and pipe symbols (|)

FASTA contains:

New cards

GenBank File

Starts with a LOCUS line
The sequence itself begins after ORIGIN and ends with a double slash (//)
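
The markers described above are enough to pull the sequence out of a record. The sketch below assumes a simplified record in which sequence lines sit between ORIGIN and //, with position numbers and spaces that must be stripped; the function name and the sample record are hypothetical.

```python
def genbank_sequence(text: str) -> str:
    """Extract the sequence found between the ORIGIN line and the closing //."""
    in_sequence = False
    parts = []
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            in_sequence = True
        elif line.startswith("//"):
            break
        elif in_sequence:
            # Keep letters only; this drops position numbers and spacing.
            parts.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(parts).upper()


record = """LOCUS       EXAMPLE1      20 bp    DNA     linear
DEFINITION  Made-up record for illustration.
ORIGIN
        1 acgtacgtac gtacgtacgt
//
"""
print(genbank_sequence(record))
```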

New cards

EMBL File

Used by European Molecular Biology Laboratory (EMBL)
Begins with an identifier (marked with “ID”)
The start of the sequence is marked by ‘SQ’ and the end by ‘//’, the same terminator used in the GenBank file

New cards

CLUSTAL

Typical file extension: ‘.aln’
Includes multiple sequences in one file. It is used as an input format for phylogenetic algorithms. The format starts with the word “CLUSTAL”
Each sequence is identified by a name followed by a space and a string of characters. Dashes (-) indicate gaps (insertions or deletions)

New cards

NEXUS

Typical file extension: ‘.nex’ or ‘.nxs’
Begins with the word “#NEXUS”, followed by blocks containing commands. Each block begins with “begin <block name>;” and ends with “end;”

New cards

PHYLIP

Typical file extension: ‘.phy’ or ‘.ph’
Begins with the number of sequences in the file, followed by the length of the alignment in base pairs. The alignment block follows on the next line, with an entry for each species

New cards

Known

reference sequence

New cards

Unknown

query sequence

New cards

Global Alignment

Two sequences to be aligned are assumed to be generally similar over their entire length
Uses the Needleman-Wunsch algorithm
Carried out from beginning to end of both sequences
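
A minimal sketch of a Needleman-Wunsch style global alignment follows. The scoring values (match +1, mismatch -1, linear gap penalty -1) and the function name are assumptions chosen for illustration; real tools typically use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for an end-to-end alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Leading gaps are penalised because the alignment must span both sequences.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to the top-left corner.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

When several alignments share the best score, the traceback order above (diagonal first, then gap in the second sequence) decides which one is returned.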

New cards

Compare two genes with same functions (humans vs mouse)
Compare two proteins with similar functions

Applications of global alignment:

New cards

Local Alignment

Based on Smith-Waterman
Does not assume that the two sequences in question have similarity over the entire length
Finds local regions with highest level of similarity between two sequences and aligns these regions without regard for alignment of the rest of the sequence regions. It aligns locally and not the entire sequence
Input: two sequences that may or may not be related
Goal: see whether a substring in one sequence aligns well with a substring in the other
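
A minimal sketch of a Smith-Waterman style local alignment follows. Scores are floored at zero so that an alignment can start and end anywhere, and traceback starts from the highest-scoring cell. The scoring values (match +2, mismatch -1, gap -1) and the function name are illustrative assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for the best-scoring local region."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                          # start a new local alignment here
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Only the shared ACG region is aligned; the flanking residues are ignored.
print(smith_waterman("TTTACGTTT", "GGACGGG"))
```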

New cards

Searching for local similarities in large sequences (e.g. newly sequenced genomes)
Looking for conserved domains or motifs in two proteins

Applications of local alignment:

New cards

Pairwise sequence alignment

Used to find the best-matching piecewise (local or global) alignments of two query sequences.

New cards

Dot Matrix
The Dynamic Programming (DP) Algorithm
Word or K-Tuple Method

3 Primary Methods of Producing Pairwise Alignments:

New cards

Dot matrix (Dot plots)

Similar nucleotides of 2 DNA sequences are represented as dots
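
A dot plot can be sketched as a grid with one sequence along the rows and the other along the columns, marking each cell where the two nucleotides are identical. In this minimal sketch the function name and the characters used for marks ("*" for a match, "." otherwise) are arbitrary choices.

```python
def dot_matrix(seq_a: str, seq_b: str) -> list:
    """Return one text row per base of seq_a; '*' marks identity with seq_b."""
    return [
        "".join("*" if a == b else "." for b in seq_b.upper())
        for a in seq_a.upper()
    ]


# Similar regions show up as diagonal runs of marks.
for row in dot_matrix("ACGTAC", "ACGGAC"):
    print(row)
```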

New cards

Dynamic Programming (DP) method

Introduced by Richard Bellman (1950s)
Useful in aligning nucleotide sequences of DNA and amino acid sequences of proteins coded by that DNA

New cards

Word method/K-Tuple method

Useful in large-scale database searches to find whether there is significant match available with the query sequence
Used in FASTA and BLAST family
Identify a series of short, non-overlapping subsequences (words) of the query sequence
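
The word-lookup step can be sketched as follows: cut the query into words of length k, index every word of the database sequence, and report exact word matches as candidate regions. The names, k = 3 and the `step` parameter are assumptions for illustration; by default words are non-overlapping as described above, and `step=1` would give overlapping words.

```python
def query_words(query: str, k: int = 3, step=None) -> list:
    """Return (position, word) pairs; non-overlapping when step equals k."""
    step = k if step is None else step
    return [(i, query[i:i + k]) for i in range(0, len(query) - k + 1, step)]


def word_hits(query: str, subject: str, k: int = 3, step=None) -> list:
    """Return (query_pos, subject_pos, word) for every exact word match."""
    # Index every k-letter word of the subject by its start positions.
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    return [
        (i, j, word)
        for i, word in query_words(query, k, step)
        for j in index.get(word, [])
    ]


print(word_hits("ACGTAC", "TTACGTT"))
```

In a full search, each hit would then be extended in both directions to see whether it grows into a significant local alignment; that step is not shown here.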

New cards

BLASTp
BLASTn
BLASTx
tBLASTn
tBLASTx

Standard BLAST programs are of 5 types:

New cards

BLASTp

This program compares an amino acid query sequence against a protein sequence database

New cards

BLASTn

It compares a nucleotide query sequence against a nucleotide sequence database

New cards

BLASTx

It searches the six frame translation products of a nucleotide sequence against a protein database
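
The six-frame translation itself can be sketched as follows: three reading frames on the given strand and three on its reverse complement, each translated with the standard genetic code. All names are illustrative, incomplete trailing codons are dropped, and codons containing non-ACGT characters are shown as "X".

```python
BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> dict:
    """Return translations keyed by frame: +1, +2, +3, -1, -2, -3."""
    frames = {}
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


print(six_frame_translations("ATGGCC"))
```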

New cards

tBLASTn

It searches a protein query sequence against a nucleotide sequence database translated in all six frames

New cards

tBLASTx

It compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database

New cards

Mega BLAST

It is a program optimized for aligning long sequences. It can only work with DNA sequences

New cards

PSI BLAST

It stands for position specific iterated BLAST. It is useful for protein similarity search

New cards

PHI BLAST

Pattern-hit initiated BLAST. It can be used to search for a specific pattern or motif