Bioinformatics: Accessing biological data

24 Terms

1

Which journal maintains an authoritative, yearly updated list of molecular biology databases?

The Nucleic Acids Research (NAR) journal.

2

What is the key characteristic of a primary database?

It contains raw experimental results and their initial interpretation, e.g. GenBank for nucleotide sequences.

3

What is the key characteristic of a secondary database?

It contains data derived from primary resources, such as collections of protein families or conserved domains, e.g. Pfam and SCOP.

4
  • Nucleotide Sequence: UniGene

  • Protein Sequence: RefSeq, CCDS

Primary or secondary?

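Secondary. All three are derived from primary archive records rather than holding raw submissions: UniGene clusters ESTs automatically, while RefSeq and CCDS are curated.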

5
  • Collections of conserved protein sequence motifs: Pfam, CDD

  • Protein families: GPCRdb, CAZy

  • Conserved structural domains: SCOP, CATH, SUPERFAMILY

Primary or secondary?

Secondary

6

What are the three major international nucleotide sequence databases coordinated by the INSDC?

  • GenBank (NCBI, US)

  • EMBL-EBI European Nucleotide Archive (Europe)

  • DDBJ (DNA Data Bank of Japan)

7

What types of nucleotide data are stored in different databases?

  • Raw genomic sequences (chromosomal DNA)

  • cDNAs

  • Expressed Sequence Tag (EST) libraries

  • Sequence-Tagged Sites (STSs)

  • Genome Survey Sequences (GSSs)

  • High-Throughput Genomic Sequences (HTGS)

  • Whole Genome Shotgun projects

8

What is the Reference Sequence (RefSeq) collection?

It provides the best representative sequence of each transcript or protein produced by a gene. There may be hundreds of GenBank entries corresponding to a gene, but only one RefSeq gene entry.
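
As one way to make "one representative entry" concrete, the sketch below builds a request URL for a single RefSeq record by accession. It assumes the NCBI E-utilities `efetch` endpoint, which these cards do not mention; the function name `build_efetch_url` and the accession `NM_000546` are illustrative choices only. No network request is made.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities efetch endpoint (not named in the cards).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Return an efetch URL for one record; hypothetical helper for illustration."""
    params = {
        "db": db,              # nucleotide records, including RefSeq transcripts
        "id": accession,       # accession of the record to retrieve
        "rettype": rettype,    # e.g. "fasta" or "gb" (GenBank flat file)
        "retmode": "text",     # plain-text response
    }
    return EFETCH_URL + "?" + urlencode(params)


url = build_efetch_url("NM_000546")
print(url)
```

Opening the printed URL in a browser, or passing it to `urllib.request.urlopen`, should return the record as FASTA text, subject to NCBI's usage limits.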

9

The RefSeq database aims to provide a comprehensive set of sequences. Which of the following is not a characteristic of RefSeq?

a. Non-redundant

b. Well annotated

c. Contains raw, unprocessed data

d. Provides representative sequences

c. Contains raw, unprocessed data. (RefSeq is curated and processed, unlike primary databases like GenBank).

10

What type of record has this starting accession code?

NM_

Transcript products (mature mRNA)

11

What type of record has this starting accession code?

NP_

Protein products

12

What type of record has this starting accession code?

NR_

Non-coding transcripts (structural RNAs, pseudogenes…)

13

What type of record has this starting accession code?

NC_

Complete genomic molecules (genomes, chromosomes, organelles, plasmids)

14

What type of record has this starting accession code?

NW_, NT_

Incomplete genomic assemblies (contigs or scaffolds)

15

What type of record has this starting accession code?

NZ_###

Collection of whole genome shotgun sequence data for a project (###). Unfinished.

16

What type of record has this starting accession code?

XM_

Model mRNA predicted by automated genome annotation

17

What type of record has this starting accession code?

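XP_, YP_, ZP_

Protein products. XP_ records are model proteins predicted by automated annotation (the protein counterpart of XM_).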

18

What type of record has this starting accession code?

XR_

Model non-coding transcripts predicted by automated genome annotation
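
The accession prefixes in cards 10 to 18 can be collected into a small lookup table. This is a minimal sketch: the table only covers the prefixes listed in these cards, not every RefSeq prefix, and the names `REFSEQ_PREFIXES` and `describe_accession` are made up for illustration.

```python
# Prefix -> record type, taken from the flashcards above (not exhaustive).
REFSEQ_PREFIXES = {
    "NM_": "transcript product (mature mRNA)",
    "NP_": "protein product",
    "NR_": "non-coding transcript",
    "NC_": "complete genomic molecule",
    "NW_": "incomplete genomic assembly",
    "NT_": "incomplete genomic assembly",
    "NZ_": "whole genome shotgun project data (unfinished)",
    "XM_": "model mRNA from automated genome annotation",
    "XP_": "protein",
    "YP_": "protein",
    "ZP_": "protein",
    "XR_": "non-coding transcript",
}


def describe_accession(accession):
    """Classify an accession by its first three characters (two letters plus underscore)."""
    prefix = accession.strip().upper()[:3]
    return REFSEQ_PREFIXES.get(prefix, "unknown prefix (possibly not a RefSeq accession)")


for acc in ["NM_000546.6", "NC_000001.11", "XR_123456.1", "U12345"]:
    print(acc, "->", describe_accession(acc))
```

The underscore is what marks these as RefSeq-style accessions, so an INSDC-style accession such as `U12345` falls through to the "unknown prefix" branch.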

19

Which NCBI resource provides integrated access to genes and genomes, connecting sequence, mapping, expression, and homology data from worldwide databases?

The NCBI Gene database

20

What is the primary purpose of the UniGene project?

To provide an organized, gene-oriented view of the transcriptome by automatically clustering Expressed Sequence Tags (ESTs) into non-redundant sets.

21

In the UniGene database, what does a cluster containing tens of thousands of ESTs most likely represent?

A highly expressed gene.
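
The idea behind cards 20 and 21 can be illustrated with a toy clustering sketch. It assumes the pairwise overlaps between ESTs have already been computed (the `overlaps` list below is invented data) and groups ESTs with a simple union-find. This is not the actual UniGene pipeline, only an illustration of why a large cluster suggests a highly expressed gene.

```python
from collections import defaultdict


def cluster_ests(est_ids, overlaps):
    """Group ESTs so that any chain of pairwise overlaps ends up in one cluster."""
    parent = {est: est for est in est_ids}

    def find(x):
        # Follow parent links to the cluster representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(list)
    for est in est_ids:
        clusters[find(est)].append(est)
    # Largest clusters first: more ESTs hints at higher expression.
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical EST identifiers and overlap pairs, for illustration only.
ests = ["est1", "est2", "est3", "est4", "est5", "est6"]
overlaps = [("est1", "est2"), ("est2", "est3"), ("est3", "est4"), ("est5", "est6")]

clusters = cluster_ests(ests, overlaps)
for members in clusters:
    print(len(members), members)
```

Here the four-member cluster stands in for the more highly expressed gene, in the same way that a UniGene cluster with tens of thousands of ESTs would.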

22

What does the Consensus Coding Sequence (CCDS) project try to identify?

A core set of human and mouse protein-coding regions that are consistently annotated and of high quality.

23

Which four groups collaborate in the CCDS project?

  • EBI

  • NCBI

  • Wellcome Trust Sanger Institute

  • University of California Santa Cruz (UCSC)

24

What does the NCBI Genome explorer do?

  • Organizes information on genomes including sequences, maps, chromosomes, assemblies and annotations

  • Summarizes the available sequencing projects of any given organism

  • Allows a simple visualization of the genome content