Bioinformatics
The acquisition, archiving and interpretation (analysis) of molecular biology information.
Biological Databases
A large, persistent collection of systematically organised data, managed by software that can retrieve and update records.
Types of biological data
Genomic, transcriptomic, and protein sequences.
Genomic annotation, e.g. genes, transcription factor binding sites, gene function, pathways
Phenotypes and more.
Key concepts of biological databases
Unique identifier aka accession number
Fixed format/vocab
In molecular biology what are the two sequence formats frequently used?
GENBANK and FASTA
FASTA
De facto standard for raw sequence data; a machine-interpretable format.
Each entry starts with a ‘>’ symbol followed by a unique identifier (name); the sequence begins on the following line.
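A minimal sketch of reading this format in Python, to make the layout concrete. The function name `parse_fasta` and the two example entries are made up for illustration; the sketch assumes the identifier is the first whitespace-delimited token after ‘>’ and that a sequence may wrap across several lines.

```python
def parse_fasta(text):
    """Return a dict mapping each entry's identifier to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: take the first token after '>' as the identifier.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            records[name] = []
        elif name is not None:
            # Sequence line: belongs to the most recent header.
            records[name].append(line)
    # Join wrapped sequence lines into one string per entry.
    return {key: "".join(parts) for key, parts in records.items()}


# Hypothetical two-entry FASTA text (not real database records).
example = """>seq1 made-up example entry
ATGCGT
ACGT
>seq2
GGCC
"""

records = parse_fasta(example)
for identifier, sequence in records.items():
    print(identifier, len(sequence), sequence)
```

For real work an established parser such as Biopython's `SeqIO` is usually the safer choice; the sketch above only shows how little structure the format has.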
Biological System
Cell lines
Animal model
Human
Organisation
Organelles
Single cells
Tissues
Scope and Coverage
biased or partial (candidate gene)
comprehensive (omics data)
Genesis
Computational predictions
experimental data
Curation
Raw/archival data (SRA)
Curated data (RefSeq)
Types of Curation
computationally curated (UniProt)
community curated (GO)
Expert reviewed (RefSeq)
Primary Sources
Data derived from experiments (GenBank, DDBJ, PDB)
Secondary sources
Curated or derived from analysis of primary data or literature (Pfam, InterPro, PROSITE, OMIM)
Composite or hybrid sources
Characteristics of both primary and secondary data (UniProt, Entrez)
PubMed
Most popular free database, good support for structured queries.
Web of Science
Oldest paid-access database; provides a reputable collection of journals with impact factors and citation counts.
Medline
Updated frequently, focused on life and medical sciences, freely accessible through PubMed.
Scopus
Paid access database owned by Elsevier, an academic journals publisher.
GenBank
Archival in nature
Subjective
Multiple copies
Human readable format
part of NCBI nucleotide
Flat file format
Entrez Gene (NCBI gene)
Information about genes, unique identifiers and associated info
While GenBank is a primary database containing sequences, Entrez Gene is a separate database containing many kinds of information about genes.
UniProt
Protein-focused database combining two databases, SwissProt and TrEMBL
SwissProt is manually annotated and reviewed
TrEMBL is automatically annotated and not reviewed
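The reviewed/unreviewed split is visible in UniProt's FASTA downloads: headers conventionally begin with `sp|` for SwissProt entries and `tr|` for TrEMBL entries. A small sketch that reads this prefix, assuming that header convention; the function name `uniprot_review_status` is hypothetical, and the `tr|` header below is a made-up placeholder rather than a real record.

```python
def uniprot_review_status(header):
    """Classify a UniProt-style FASTA header by its database prefix."""
    # Assumed layout: ">db|ACCESSION|ENTRY_NAME description"
    prefix = header.lstrip(">").split("|")[0]
    labels = {
        "sp": "SwissProt (manually annotated, reviewed)",
        "tr": "TrEMBL (automatically annotated, not reviewed)",
    }
    return labels.get(prefix, "unknown")


reviewed = uniprot_review_status(">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha")
# Placeholder accession and entry name, for illustration only.
unreviewed = uniprot_review_status(">tr|X00000|X00000_EXAMPLE Uncharacterized protein")
print(reviewed)
print(unreviewed)
```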
Gene Ontology
Species-independent vocab that describes gene function in a systematic way using fixed terms (GO terms).
How are GO terms assigned to genes?
Based on the literature about the function of the gene.
KEGG Pathway Database
Repository of species-independent metabolic pathways.
KEGG Orthology Database
Repository of species-independent molecular functions extracted from KEGG Pathways.
How are genes assigned molecular functions?
On the basis of their involvement in a particular KEGG pathway.