Bioinformatics: basic tools

Description and Tags

Bioinformatics

29 Terms

Bioinformatics

is the science of storing, retrieving and analysing large amounts of biological information

Where can I find nucleotide sequence datasets?

International Sequence Database Collaboration

Where can I find protein sequence datasets?

UniProt Consortium

Where can I find macromolecular structure datasets?

Worldwide Protein Data Bank

Where can I find molecular interaction datasets?

The International Molecular Exchange Consortium

Where can I find protein identification datasets?

The ProteomeXchange Consortium

Where can I find genomic and clinical datasets?

Global Alliance for Genomics and Health

What are primary datasets?

Datasets populated with experimentally derived data

What are secondary datasets?

Datasets comprising data derived from the results of analysing primary data

What are some examples of secondary datasets?

InterPro (protein families, motifs and domains)

UniProt Knowledgebase (sequence and functional information on proteins)

Ensembl (variation, function, regulation and more layered onto whole genome sequences)

What are some examples of primary datasets?

ENA, GenBank and DDBJ (nucleotide sequence)

ArrayExpress and GEO (functional genomics data)

Protein Data Bank (PDB; coordinates of three-dimensional macromolecular structures)

What is metadata and what is an example of it?

Metadata is essentially data about the data. For example, if you are sequencing samples from the environment, perhaps to understand biodiversity in different conditions or to investigate associations between crop yield and differences in soil flora, it is useful to know when and where your samples were collected.
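
As a minimal sketch of the idea, the snippet below attaches collection date and location to each sample and then filters on that metadata. The `SampleMetadata` class, its field names, and the sample values are hypothetical and are not taken from the BioSamples database or any other real schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SampleMetadata:
    """Hypothetical metadata record: describes the sample, not the sequence."""
    sample_id: str
    collected_on: date   # when the sample was collected
    latitude: float      # where the sample was collected
    longitude: float
    environment: str     # e.g. "soil"


def collected_between(records, start, end):
    """Return the records whose collection date lies in [start, end]."""
    return [r for r in records if start <= r.collected_on <= end]


# Made-up example samples.
samples = [
    SampleMetadata("S1", date(2023, 4, 2), 52.08, 0.18, "soil"),
    SampleMetadata("S2", date(2023, 9, 15), 52.10, 0.20, "soil"),
]

spring = collected_between(samples, date(2023, 3, 1), date(2023, 5, 31))
print([s.sample_id for s in spring])
```

Without the date and location fields, the two sequencing runs could not be compared by season or site, which is the point the card makes.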

What is an example of a data library that has metadata in bioinformatics?

BioSamples database

Minimum information standards

Their purpose is to ensure the data generated by these methods can be easily verified, analysed and interpreted by the wider scientific community. Ultimately, they facilitate the transfer of data from journal articles (unstructured data) into databases (structured data) in a form that enables data to be mined across multiple data sets.

Where can I make sure that my experimental data follows data protocols?

FAIRsharing.org

What is the simplest type of controlled vocabulary?

The simplest controlled vocabularies are non-hierarchical lists of terms, such as a list of countries. Annotating data with these lists makes it easier to filter or search for related records in a database. For example, if you use Europe PMC’s advanced literature search and filter the results by language, you are choosing from items in a list determined by a controlled vocabulary.
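
A rough illustration of why a flat term list helps with filtering: annotations are only accepted if they come from the list, so a later filter never misses records because of spelling variants. The `LANGUAGES` list and the record layout are assumptions made for this sketch, not Europe PMC's actual vocabulary or API.

```python
# Hypothetical controlled vocabulary: a flat, non-hierarchical list of terms.
LANGUAGES = frozenset({"English", "French", "German", "Spanish"})


def annotate(record, language):
    """Return a copy of the record annotated with a term from the vocabulary."""
    if language not in LANGUAGES:
        raise ValueError(f"{language!r} is not in the controlled vocabulary")
    return {**record, "language": language}


def filter_by_language(records, language):
    """Select the records annotated with exactly this vocabulary term."""
    return [r for r in records if r.get("language") == language]


# Made-up records for illustration.
papers = [
    annotate({"title": "Paper A"}, "English"),
    annotate({"title": "Paper B"}, "French"),
    annotate({"title": "Paper C"}, "English"),
]

print([p["title"] for p in filter_by_language(papers, "English")])
```

A free-text value such as "english" or "Eng." is rejected at annotation time rather than silently dropped from search results later.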

thesaurus (in IT)

is defined as a controlled and structured vocabulary in which concepts are represented by terms.

Where can I learn more about searching PubMed with MeSH?

The NLM provides a series of webinars and tutorials about MeSH; one that may be of particular interest is Searching Drugs or Chemicals in PubMed

What is an ontology in IT?

A representation of the shared background knowledge for a community (7). An ontology describes the categories of objects described in a body of data, the relationships between those objects, and the relationships between those categories.
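
To make "relationships between categories" concrete, here is a small sketch that stores `is_a` links between categories and walks them upward. The terms are invented for illustration and are not drawn from any real ontology; real ontologies also carry other relationship types and allow multiple parents, which this sketch omits.

```python
# Hypothetical is_a relationships: child category -> parent category.
IS_A = {
    "liver": "organ",
    "kidney": "organ",
    "organ": "anatomical entity",
}


def ancestors(term):
    """Follow is_a links upward and return every broader category in order."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_kind_of(term, category):
    """True if the term is the category itself or one of its descendants."""
    return term == category or category in ancestors(term)


print(ancestors("liver"))
print(is_kind_of("kidney", "organ"))
```

Because the hierarchy is explicit, a search for "organ" can also return data annotated with "liver" or "kidney", which a flat list of terms cannot do.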

What are some tips for managing and collecting accurate data?

– Start early: begin collecting data and metadata at the beginning of your experiment.
– Consider creating a data management plan, using tools such as DMPonline and the Data Stewardship Wizard.
– Identify the correct database (see ‘Where do I submit my data?’ on the next page).
– Speak to the curators who work with that database and check what you need to submit.
– Learn about the metadata requirements and data standards used in your field. You can look these up on FAIRsharing.org.
– Use an ontology to annotate the data, for example the Experimental Factor Ontology.

How do I submit data to EMBL-EBI?

Through the EMBL-EBI submission portal

What does InterPro do?

Find protein families

Which website can build a protein model from a provided sequence?

SWISS-MODEL

What does a QMEAN z-score mean in SWISS-MODEL?

The QMEAN z-score represents an estimate of how comparable the model is to experimentally derived structures of similar size. QMEAN z-scores around zero indicate good agreement between the model structure and experimental structures of similar size. Models of low quality typically have scores of -4.0 or lower. The “thumbs-up” and “thumbs-down” symbols next to the score indicate whether or not the model is of good quality (9).

Another approach is to factor in observations of the quality of the alignment and template search method; this is represented in the GMQE (Global Model Quality Estimation) score. The GMQE score reflects the expected accuracy of that alignment and is expressed as a number between 0 and 1, where higher numbers indicate higher reliability (9). For more information see the SWISS-MODEL documentation pages.
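
The two rules of thumb on this card can be written down as a short sketch. The function names and the example scores are hypothetical, and the code only encodes what the card states (a QMEAN z-score of -4.0 or lower suggests low quality; GMQE lies between 0 and 1 and higher is more reliable). It does not call SWISS-MODEL or reproduce its scoring.

```python
LOW_QUALITY_QMEAN = -4.0  # threshold quoted on the card


def is_likely_low_quality(qmean_z):
    """True when the QMEAN z-score is at or below the low-quality threshold."""
    return qmean_z <= LOW_QUALITY_QMEAN


def more_reliable_by_gmqe(models):
    """Pick the (name, gmqe) pair with the highest GMQE score.

    Raises ValueError if any score falls outside the 0 to 1 range.
    """
    for name, gmqe in models:
        if not 0.0 <= gmqe <= 1.0:
            raise ValueError(f"GMQE for {name!r} must be between 0 and 1")
    return max(models, key=lambda pair: pair[1])


# Made-up scores for two candidate models.
print(is_likely_low_quality(-0.3))
print(is_likely_low_quality(-4.7))
print(more_reliable_by_gmqe([("model_1", 0.62), ("model_2", 0.81)]))
```

Scores between the two extremes are not classified here because the card gives no cutoff for them.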

What are some tools to help integrate data?

Tools such as UniProt ID Mapping and Ensembl BioMart allow you to convert a set of identifiers from one format to another. There are also mappings between different controlled vocabularies, but care needs to be taken that you don’t lose data. For example, a term in one ontology might be mapped to a term that is less granular, so you might lose specificity. At EMBL-EBI we use application ontologies, the archetypal example of which is the Experimental Factor Ontology, to solve this problem.
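
The loss of specificity described here can be detected mechanically: if several source terms map to the same target term, the distinction between them disappears after mapping. The sketch below uses an invented mapping between two imaginary vocabularies; it does not use UniProt ID Mapping, Ensembl BioMart, or any real ontology mapping.

```python
from collections import defaultdict

# Hypothetical mapping: term in vocabulary A -> term in vocabulary B.
MAPPING = {
    "motor neuron": "neuron",
    "sensory neuron": "neuron",
    "hepatocyte": "hepatocyte",
}


def lossy_targets(mapping):
    """Return target terms that absorb more than one source term."""
    grouped = defaultdict(set)
    for source, target in mapping.items():
        grouped[target].add(source)
    return {t: sorted(s) for t, s in grouped.items() if len(s) > 1}


print(lossy_targets(MAPPING))
```

Any target term reported by `lossy_targets` is a place where mapped data can no longer be separated back into its original, more specific categories.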

What does EMBL-EBI’s Embassy Cloud do?

EMBL-EBI’s Embassy Cloud provides EMBL-EBI’s collaborators with direct access to their datasets hosted at EMBL-EBI, and to the institute’s powerful computing resources. This shared, high-performance workspace allows project partners in many locations to analyse their data alongside public offerings, using their own approaches.

What is a good guideline for data standardization?

Toni Kazic’s guide for data provenance

Where can I see drug targets and disease data?

Open Targets

What are the four steps of a bioinformatics experiment?

– Search
– Compare
– Model
– Integrate
