Bioinformatics

Last updated 4:01 PM on 5/18/26

57 Terms

1
New cards

Margaret O. Dayhoff (1965-1978)

Compiled and edited the Atlas of Protein Sequence and Structure, a collection of amino acid sequences, and developed computer software for comparing amino acid sequences and detecting distantly related ones

2
New cards

NCBI

  • Established in the USA

  • A primary databank and provider of molecular biology information

3
New cards

Bioinformatics

  • Blends computer science and biostatistics with biomedical sciences such as epidemiology, genetics, genomics, and proteomics

  • A combination of biology and informatics

4
New cards

In silico

Bioinformatics is analysis in?

5
New cards

HUMAN GENOME PROJECT

  • Spurred the rapid rise of bioinformatics as a formal discipline

  • Goal of bioinformatics as a discipline: facilitate the management, analysis, and interpretation of data from biological experiments and observational studies

6
New cards

Alignment

Lining up two or more sequences to search for the maximal regions of identity (or similarity) in order to assess the extent of biological relatedness or homology

7
New cards

Identity

The extent to which two sequences are the same
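
Identity is usually reported as a percentage. A minimal sketch, assuming the two sequences are already aligned, have equal length, and use "-" for gaps; the function name `percent_identity` and the choice of denominator (all aligned columns) are assumptions, and real tools may divide by a different length:

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    # A column counts as identical only if the residues match and are not gaps.
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


print(percent_identity("AC-T", "ACGT"))
```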

8
New cards

Local alignment

Alignment of some portion of two sequences

9
New cards

Multiple sequence alignment

Alignment of three or more sequences arranged with gaps so that common residues are aligned together

10
New cards

Optimal alignment

The alignment of two sequences with the highest degree of identity

11
New cards

Conservation

Specific sequence changes (usually protein sequence) that maintain the properties of the original sequence

12
New cards

Similarity

The relatedness of sequences, expressed as percent identity or conservation

13
New cards

Algorithm

A fixed set of commands in a computer program (the step-by-step procedure that a bioinformatics tool carries out)

14
New cards

Domain

A discrete portion of a protein or DNA sequence

15
New cards

Motif

A highly conserved short region in protein domains

16
New cards

Gap

A space introduced in alignment to compensate for insertions or deletions in one of the sequences being compared

17
New cards

Homology

Similarity attributed to descent from a common ancestor

18
New cards

Orthology

Homology in different species due to a common ancestral gene

19
New cards

Paralogy

Homology within the same species resulting from gene duplication

20
New cards

Query

The sequence presented for comparison with all other sequences in a selected database (e.g., NCBI)

21
New cards

Annotation

Description of functional structures, such as introns or exons in DNA, or secondary structure or functional regions in protein sequences

22
New cards

Interface

The point of meeting between a computer and an external entity, such as an operator, a peripheral device, or a communications medium

23
New cards

PubMed

Search service sponsored by the National Library of Medicine that provides access to literature citations in Medline and related databases

24
New cards

SwissProt

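Manually curated protein sequence database maintained by the Swiss Institute of Bioinformatics (SIB) together with the European Bioinformatics Institute (EBI); now part of UniProt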

25
New cards

NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION (NCBI)

  • Creates automated systems for storing and analyzing knowledge about molecular biology, biochemistry, and genetics

  • Conducts research into advanced methods of computer-based information processing for analyzing the structure and function of biologically important molecules and compounds

  • Facilitates the use of databases and software by biotechnology researchers and medical care personnel

  • Coordinates efforts to gather biotechnology information worldwide

26
New cards

Heterozygous mutations

Important in bioinformatics because they produce more than one base (mixed bases) at the same position in the sequence

27
New cards

Consensus sequences

A family of sequences with proportional representation of the polymorphic bases

28
New cards

IUB Universal Nomenclature for Mixed Bases

  • Established by the International Union of Pure and Applied Chemistry (IUPAC) and the International Union of Biochemistry and Molecular Biology (IUBMB)

  • Their base designations in the IUB code are used to communicate consensus sequences and for computer input of polymorphic sequence data
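
To illustrate how mixed-base codes communicate a consensus, here is a minimal sketch that collapses each column of equal-length, uppercase DNA sequences into a single nucleotide ambiguity code (R = A/G, Y = C/T, and so on). The function name `consensus` is an assumption, and the sketch ignores how often each base occurs, so it does not capture proportional representation:

```python
# Standard nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus(seqs):
    """Return one ambiguity code per column of the aligned sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    # zip(*seqs) walks the alignment column by column.
    return "".join(IUPAC[frozenset(column)] for column in zip(*seqs))


print(consensus(["ACGT", "ATGT", "ACGC"]))
```

A heterozygous position with A on one allele and G on the other would be written as R under this scheme.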

29
New cards

BLAST - Basic Local Alignment Search Tool

  • for homology searches

  • Searches GenBank (large database maintained by NCBI)

  • Nucleic acid or amino acid sequences

  • searches for regions of local similarity between protein and nucleotide sequences

30
New cards

E-values

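  • Expected number of matches to the query that would occur by chance in a database of that size

  • Decreases exponentially as the quality (score) of the match increases

  • Very low E-values (e.g., 10^-12) are associated with highly significant (near-perfect) matches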
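
The exponential drop can be seen from the usual bit-score relation E = m × n × 2^(-S'), where m is the query length, n is the database length, and S' is the bit score. The sketch below is a simplification under that assumption: the function name `expect_value` is hypothetical, and actual BLAST reports use effective (edge-corrected) lengths rather than raw ones.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score and raw search-space size."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Each extra bit of score halves the expected number of chance matches.
for bits in (20, 40, 60):
    print(bits, expect_value(bits, query_len=300, db_len=1_000_000))
```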

31
New cards

FASTA

  • File extensions: ‘.fasta’ or ‘.fa’

  • One of the most widely used formats in bioinformatics

  • Can be a single-sequence format or a multiple-sequence format

  • When you retrieve a sequence from BLAST or other websites in this format, each record begins with a ‘greater than’ (‘>’) symbol

32
New cards

FASTA header line contains:

  • Organism origin

  • Sequence accession number

  • Sequence description, features and comments

  • Fields separated by dash (-), underscore (_) and pipe symbols (|)
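
A minimal sketch of reading FASTA text, assuming every record starts with a ">" header line followed by one or more sequence lines. The function name `parse_fasta` and the example records are made up for illustration; they are not real database entries:

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical records; the pipe-separated fields mimic a typical header.
example = """>demo|ACC0001|Homo_sapiens hypothetical record
MKTAYIAKQR
QISFVKSHFS
>seq2 second hypothetical record
ACGTACGT
"""

for header, sequence in parse_fasta(example):
    print(header.split("|"), len(sequence))
```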

33
New cards

GenBank File

  • The record starts with a LOCUS line

  • The sequence itself begins at ORIGIN and ends with a double slash (//)

34
New cards

EMBL File

  • Used by European Molecular Biology Laboratory (EMBL)

  • Begins with an identifier (marked with “ID”)

  • Start of sequence: marked by ‘SQ’; the end is marked by ‘//’, as in the GenBank file

35
New cards

CLUSTAL

  • Typical file extension: ‘.aln’

  • Includes multiple sequences in one file. It is used as an input format for phylogenetic algorithms. The file starts with the word “CLUSTAL”

  • Each sequence is identified by a name followed by a space and a string of characters. Dashes (-) indicate gaps (insertions or deletions)

36
New cards

NEXUS

  • Typical file extension: ‘.nex’ or ‘.nxs’

  • Begins with the keyword “#NEXUS” followed by blocks containing commands. Each block begins with “begin <block name>;” and ends with “end;”

37
New cards

PHYLIP

  • Typical file extension: ‘.phy’ or ‘.ph’

  • Begins with the number of sequences in the file followed by the length of the alignment in base pairs. The alignment block follows on the next line, with an entry for each species
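
The opening tokens described on the format cards (">", LOCUS, ID, CLUSTAL, #NEXUS, and a pair of counts for PHYLIP) are enough for a rough format guess. This is only a sketch: the function name `guess_format` is an assumption, it inspects just the first non-blank line, and it is no substitute for a real parser.

```python
def guess_format(text):
    """Guess a sequence file format from its first non-blank line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    first = lines[0]
    if first.startswith(">"):
        return "fasta"
    if first.startswith("LOCUS"):
        return "genbank"
    if first.startswith("ID "):
        return "embl"
    if first.upper().startswith("CLUSTAL"):
        return "clustal"
    if first.upper().startswith("#NEXUS"):
        return "nexus"
    # PHYLIP: two integers (number of sequences, alignment length).
    parts = first.split()
    if len(parts) == 2 and all(part.isdigit() for part in parts):
        return "phylip"
    return "unknown"


print(guess_format(" 3 10\nseqA ACGTACGTAC"))
```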

38
New cards

Known

reference sequence

39
New cards

Unknown

query sequence

40
New cards

Global Alignment

  • Two sequences to be aligned are assumed to be generally similar over their entire length

  • Uses the Needleman-Wunsch algorithm

  • Carried out from beginning to end of both sequences
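
A minimal sketch of Needleman-Wunsch global alignment, assuming a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1). These scores and the function name `needleman_wunsch` are illustrative choices; real tools typically use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for an end-to-end alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to the top-left corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

Because the traceback always runs from the last cell to the first, every residue of both sequences appears in the result, which is what makes the alignment global.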

41
New cards

Applications of global alignment:

  • Compare two genes with the same function (human vs. mouse)

  • Compare two proteins with similar functions

42
New cards

Local Alignment

  • Based on the Smith-Waterman algorithm

  • Does not assume that the two sequences in question have similarity over the entire length

  • Finds local regions with highest level of similarity between two sequences and aligns these regions without regard for alignment of the rest of the sequence regions. It aligns locally and not the entire sequence

  • Input: two sequences that may or may not be related

  • Goal: see whether a substring in one sequence aligns well with a substring in the other
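
A minimal sketch of Smith-Waterman local alignment, assuming match +2, mismatch -1, and a linear gap penalty of -1 (illustrative values; the function name `smith_waterman` is also an assumption). The two differences from the global case are that cell scores never drop below zero and that the traceback starts at the highest-scoring cell and stops at the first zero.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The floor at 0 lets an alignment restart anywhere.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score falls to zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTTT", "GGGACGGGG"))
```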

43
New cards

Applications of local alignment:

  • Searching for local similarities in large sequences (e.g., newly sequenced genomes)

  • Looking for conserved domains or motifs in two proteins

44
New cards

Pairwise sequence alignment

Used to find the best-matching piecewise (local or global) alignments of two query sequences.

45
New cards

3 Primary Methods of Producing Pairwise Alignments:

  1. Dot Matrix

  2. The Dynamic Programming (DP) Algorithm

  3. Word or K-Tuple Method

46
New cards

Dot matrix (Dot plots)

Similar nucleotides of 2 DNA sequences are represented as dots
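
A dot plot can be sketched as a text grid: one sequence runs along the columns, the other along the rows, and a "*" marks every position where the two nucleotides are identical. The function name `dot_matrix` is an assumption, and this version applies no window or threshold filtering, which real dot-plot tools usually add to reduce noise.

```python
def dot_matrix(a, b):
    """Rows follow sequence b, columns follow sequence a; '*' marks a match."""
    return ["".join("*" if x == y else "." for x in a) for y in b]


for row in dot_matrix("GATTACA", "GATTACA"):
    print(row)
```

Regions of similarity show up as diagonal runs of dots; comparing a sequence with itself gives an unbroken main diagonal.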

47
New cards

Dynamic Programming (DP) method

  • Introduced by Richard Bellman (1950s)

  • Useful in aligning nucleotide sequences of DNA and amino acid sequences of proteins coded by that DNA

48
New cards

Word method/K-Tuple method

  • Useful in large-scale database searches to find whether there is significant match available with the query sequence

  • Used in FASTA and BLAST family

  • Identify a series of short, non-overlapping subsequences (words) of the query sequence
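
The word-matching step can be sketched as follows, assuming words of length k = 3 taken from the query without overlap, as the card describes, and looked up at every offset of the subject sequence. The function name `word_hits` is hypothetical; FASTA and BLAST add scoring and hit extension on top of this idea.

```python
def word_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a query word occurs in the subject."""
    # Index the non-overlapping words of the query by their start position.
    words = {}
    for start in range(0, len(query) - k + 1, k):
        words.setdefault(query[start:start + k], []).append(start)
    hits = []
    # Slide along the subject one position at a time and look each word up.
    for s_pos in range(len(subject) - k + 1):
        for q_pos in words.get(subject[s_pos:s_pos + k], []):
            hits.append((q_pos, s_pos))
    return hits


print(word_hits("ACGTTT", "GGACGTT"))
```

Only the regions around these hits need a full alignment, which is why the approach scales to large database searches.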

49
New cards

The 5 standard BLAST programs:

  1. BLASTp

  2. BLASTn

  3. BLASTx

  4. tBLASTn

  5. tBLASTx

50
New cards

BLASTp

This program compares an amino acid query sequence against a protein sequence database

51
New cards

BLASTn

It compares a nucleotide query sequence against a nucleotide sequence database

52
New cards

BLASTx

It searches the six-frame translation products of a nucleotide query sequence against a protein database
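
A six-frame translation reads the nucleotide sequence in three reading frames on the forward strand and three on the reverse complement. The sketch below assumes the standard genetic code and plain A/C/G/T input; the function names and the "+1" to "-3" frame labels are illustrative, and unrecognized codons are rendered as "X".

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_complement[offset:])
    return frames


for label, protein in six_frame_translations("ATGGCCTAA").items():
    print(label, protein)
```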

53
New cards

tBLASTn

It searches a protein sequence against translated nucleotide sequence in the database

54
New cards

tBLASTx

It compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database

55
New cards

Mega BLAST

It is a program optimized for rapidly aligning long, highly similar sequences. It works only with DNA (nucleotide) sequences

56
New cards

PSI BLAST

It stands for position-specific iterated BLAST. It is useful for protein similarity searches

57
New cards

PHI BLAST

Pattern-hit initiated BLAST; it can be used to search for a specific pattern or motif