Bioinformatics Databases

27 Terms

1

Bioinformatics

The acquisition, archiving and interpretation (analysis) of molecular biology information.

2

Biological Databases

A large, persistent collection of systematically organised data, managed by software that can retrieve and update records.

3

Types of biological data

  • Genomic, transcriptomic, and protein sequences

  • Genomic annotation, e.g. genes, transcription factor binding sites, gene function, pathways

  • Phenotypes and more

4

Key concepts of biological databases

  • Unique identifier, also known as an accession number

  • Fixed format and controlled vocabulary

5

In molecular biology, which two sequence formats are frequently used?

GenBank and FASTA

6

FASTA

De facto standard for raw sequence data. FASTA is a machine-readable plain-text format.

  • Every entry starts with a ‘>’ symbol followed by a unique identifier (name); the sequence begins on the following line.
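As an illustration of the layout described on this card, here is a minimal sketch of a FASTA reader in Python. The function name `parse_fasta` and the two example records are made up for illustration; the sketch assumes the identifier is the first word after ‘>’ and that a sequence may be wrapped over several lines.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record identifier to its full sequence."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: take the first word after '>' as the identifier.
            header = line[1:].split()
            current_id = header[0] if header else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: append, since sequences may span several lines.
            records[current_id] += line
    return records


# Hypothetical records, invented for this example.
example = """>seq1 hypothetical nucleotide record
ATGGCGTACGTT
GACCTGA
>seq2 hypothetical protein record
MKTAYIAKQR
"""

records = parse_fasta(example.splitlines())
for identifier, sequence in records.items():
    print(identifier, len(sequence))
```

For real work an established parser (for example the one in Biopython) is usually a safer choice than a hand-written reader.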

7

Biological System

  • Cell lines

  • Animal model

  • Human

8

Organisation

  • Organelles

  • Single cells

  • Tissues

9

Scope and Coverage

  1. Biased or partial (candidate gene)

  2. Comprehensive (omics data)

10

Genesis

  1. Computational predictions

  2. Experimental data

11

Curation

  1. Raw/archival data (SRA)

  2. Curated data (RefSeq)

12

Types of Curation

  1. Computationally curated (UniProt, i.e. its TrEMBL section)

  2. Community curated (GO)

  3. Expert reviewed (RefSeq)

13

Primary Sources

Data derived from experiments (GenBank, DDBJ, PDB)

14

Secondary sources

Curated data, or data derived from analysis of primary data or the literature (Pfam, InterPro, PROSITE, OMIM)

15

Composite or hybrid sources

Sources with characteristics of both primary and secondary databases (UniProt, Entrez)

16

PubMed

Most popular free literature database, with good support for structured queries.
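To make "structured queries" concrete, here is a minimal sketch that combines field-tagged terms into a PubMed query string and builds a search URL for NCBI's E-utilities ESearch endpoint. The helper names `build_pubmed_query` and `build_esearch_url` and the search terms are hypothetical, chosen for illustration; the sketch only builds the URL and does not send a request.

```python
from typing import List, Tuple
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(terms_by_field: List[Tuple[str, str]]) -> str:
    """Join (field, term) pairs into a query such as 'x[Title] AND y[Author]'."""
    parts = [f"{term}[{field}]" for field, term in terms_by_field]
    return " AND ".join(parts)


def build_esearch_url(query: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch URL for the PubMed database."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical search: restrict each term to a specific record field.
query = build_pubmed_query(
    [("Title", "bioinformatics"), ("Title/Abstract", "database")]
)
url = build_esearch_url(query)
print(query)
print(url)
```

The same field-tagged query string can also be typed directly into the PubMed search box.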

17

Web of Science

Oldest paid-access database; provides a reputable collection of journals, with impact factors and citation counts.

18

MEDLINE

Updated frequently; focuses on the life and medical sciences; freely accessible through PubMed.

19

Scopus

Paid-access database owned by Elsevier, an academic journal publisher.

20

GenBank

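  • Archival in nature

  • Subjective (annotation reflects the submitter's interpretation)

  • Multiple copies (redundant; the same sequence can appear in several records)

  • Human-readable format

  • Part of NCBI Nucleotide

  • Flat-file format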

21

Entrez Gene (NCBI Gene)

  • Information about genes: unique identifiers and associated information

  • While GenBank is a primary database containing sequences, Entrez Gene is a separate database that collects many kinds of information about each gene.

22

UniProt

  • Protein-focused database that combines the Swiss-Prot and TrEMBL databases

  • Swiss-Prot entries are manually annotated and reviewed

  • TrEMBL entries are automatically annotated and not reviewed

23

Gene Ontology

Species-independent vocabulary that describes gene function in a systematic way using fixed terms (GO terms).

24

How are GO terms assigned to genes?

Based on the literature about the function of the gene.

25

KEGG Pathway Database

Repository of species-independent metabolic pathways.

26

KEGG Orthology Database

Repository of species-independent molecular functions extracted from KEGG Pathways.

27

How are genes assigned molecular functions?

On the basis of their involvement in a particular KEGG pathway.