Genomics & Bioinformatics: Computers and Sequencing

Last updated 9:42 PM on 4/27/26

81 Terms

1
New cards

What is RAM?

Random access memory: the computer's fast, temporary working storage, connected directly to the central processor. Random = any location can be accessed in any order

2
New cards

PERL

Practical Extraction and Report Language (a programming language). Developed by Larry Wall in 1987. Well suited to processing text; runs on Windows, Mac, and Unix/Linux. You write your own code to build programs

3
New cards

What are the 3 major nucleotide sequence databases?

EMBL - housed at EBI

GenBank - housed at NCBI

DDBJ - housed in Japan

4
New cards

What was the first fully sequenced DNA genome?

That of bacteriophage phiX174 (about 5,400 nucleotides), sequenced by Sanger's group in 1977. Bacteriophage MS2 was the first RNA genome, not a DNA genome.

5
New cards

Whose genome was the first human genome to be fully sequenced?

James D. Watson

6
New cards

HGS

Hierarchical genome sequencing. More organized but slower. Requires a pre-existing physical map (clone library) to order large clones into contigs. Each mapped clone is still shotgun-fragmented and sequenced separately.

7
New cards

Sanger sequencing

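Older chain-termination method (no longer used for whole genomes, but still used for small-scale sequencing and confirming results). Dideoxynucleotides (ddNTPs) stop DNA synthesis at each base; fragments are labeled with a dye on the primer or on the terminators.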

8
New cards

Examples of NGS

Pyrosequencing, Paired-end, Bridge amplification, Ion Torrent

9
New cards

Examples of platforms that do NGS

454 pyrosequencing, Illumina/Solexa, Helicos, PacBio

10
New cards

What does cloning require?

A vector with an origin of replication, a selectable marker, and a polylinker (multiple cloning site). Restriction enzymes cut at specific sequences in the polylinker so the DNA fragment can be inserted.

11
New cards

Sequential vs Binary search

Sequential = checks each sequence one by one; works on unsorted data

Binary = repeatedly splits a sorted dataset in half to find the target. Much faster for bigger datasets
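
The two strategies can be sketched in a few lines of Python. This is a minimal illustration, not a database implementation; the function names and the accession-like IDs are made up for the example. Note that binary search only works when the list is already sorted.

```python
def sequential_search(items, target):
    """Check each item in turn; works on unsorted data (up to n comparisons)."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Halve the search range each step; requires sorted data (about log2 n comparisons)."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1


# Hypothetical IDs, sorted so that binary search is valid.
ids = sorted(["NM_0003", "NM_0001", "NM_0004", "NM_0002"])
print(sequential_search(ids, "NM_0003"), binary_search(ids, "NM_0003"))
```

Both calls return the same index; the difference is how many comparisons each needs as the list grows.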

12
New cards

Paralog vs Ortholog homology

Paralog = homologous genes within a genome that arose from a duplication event

Ortholog = same gene in different species, separated by speciation from a common ancestor

13
New cards

Why are paralogs often more diverse than other homologous sequences?

Because they come from gene duplications: one copy keeps the original function, so the other is free to vary and diverge more from the original copy

14
New cards

Out-paralogs

Paralogs from a duplication that happened before the speciation event (duplication first, then speciation)

15
New cards

Xeno-paralogs

homologous genes that arose from horizontal gene transfer

16
New cards

Is there a difference between the E value and p value from stats?

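Yes. The E value is the expected number of chance hits with a score at least that high; the p value is the probability of at least one such hit. They are related by p = 1 - e^(-E), so they are nearly equal when E is small (below about 0.01), and both measure significance.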
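
The relationship between the two values is p = 1 - e^(-E), which follows if chance hits are assumed to be Poisson-distributed. A minimal sketch (the function name is illustrative):

```python
import math


def p_value_from_e_value(e_value):
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-e_value)


for e in (1e-5, 0.01, 1.0, 10.0):
    print(e, p_value_from_e_value(e))
```

For small E the two numbers are almost identical, but E can exceed 1 while p never can.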

17
New cards

Accession numbers and corresponding item.

NG_ = genomic region (gene)

NM_ = mRNA transcript

NP_ = protein

NC_ = complete genome or chromosome

NT_ = genomic contig
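
A lookup like this can be sketched as a small dictionary keyed on the RefSeq prefix. The function name and the example accessions are hypothetical placeholders, not real records.

```python
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule (genome or chromosome)",
    "NG_": "genomic region (gene)",
    "NT_": "genomic contig",
    "NM_": "mRNA transcript",
    "NP_": "protein",
}


def describe_accession(accession):
    """Return the record type implied by the first three characters."""
    prefix = accession[:3].upper()
    return REFSEQ_PREFIXES.get(prefix, "unknown prefix")


# Placeholder accessions, made up for illustration only.
for example in ("NM_000000.0", "NP_000000.0", "XX_000000.0"):
    print(example, "->", describe_accession(example))
```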

18
New cards

Different types of tools/resources on NCBI?

BLAST and OMIM

19
New cards

Types of BLAST searches

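blastn = nucleotide query -> nucleotide database

blastp = protein query -> protein database

blastx = translated nucleotide query -> protein database

tblastn = protein query -> translated nucleotide database

tblastx = translated nucleotide query -> translated nucleotide database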
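
One way to remember the five programs is as a table of query type versus database type. The sketch below encodes that table and picks the programs that fit a given raw query and database; the helper name `programs_for` is an assumption made for this example and is not part of BLAST itself.

```python
# program name -> (how the query is compared, how the database is compared)
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("translated nucleotide", "protein"),
    "tblastn": ("protein", "translated nucleotide"),
    "tblastx": ("translated nucleotide", "translated nucleotide"),
}


def programs_for(query_is_protein, database_is_protein):
    """List the BLAST programs that accept this kind of query and database."""
    matches = []
    for name, (query, database) in BLAST_PROGRAMS.items():
        query_ok = (query == "protein") == query_is_protein
        database_ok = (database == "protein") == database_is_protein
        if query_ok and database_ok:
            matches.append(name)
    return matches


print(programs_for(query_is_protein=False, database_is_protein=True))
```

A nucleotide query against a nucleotide database has two options: blastn compares the bases directly, while tblastx translates both sides first.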

20
New cards

Different genome sequencing approaches

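1) Sanger's method (chain termination; now used mainly for small-scale work)

2) Maxam-Gilbert (chemical cleavage)

3) Next gen (items 4-7 are next gen techniques)

4) Paired-end

5) Bridge amplification

6) Pyrosequencing

7) Ion Torrent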

21
New cards

Why do RNA sequencing?

Shows which genes are expressed and how strongly. Better at mapping transcript (splice) variants

22
New cards

BLOSUM

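Blocks Substitution Matrix. Based on local alignments. Derived from substitutions observed in ungapped aligned blocks of related proteins. Higher # = built for more conserved (more similar) sequences. Threshold: BLOSUM-L uses blocks clustered at L% identity (e.g., BLOSUM62 = 62%)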

23
New cards

How many base pairs does the human genome have?

2.8-3.4 billion

24
New cards

Are coding genes located in GC or AT rich areas?

GC rich

25
New cards

How much of the human genome is repeated sequences?

About 50% (almost all noncoding)

26
New cards

Unsupervised computing

Finds patterns in raw data without labels. No human labeling of the data

27
New cards

What is the most conserved amino acid?

Tryptophan (W). It also has only 1 codon (UGG); methionine (AUG) is the only other amino acid with a single codon.
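
The codon counts can be checked against the standard genetic code. The sketch below builds the table from the usual compact 64-letter string (bases in TCAG order, DNA letters, so T stands in for U) and lists the amino acids encoded by exactly one codon. Variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons encode each amino acid.
codon_counts = Counter(CODON_TABLE.values())
single_codon = sorted(aa for aa, n in codon_counts.items() if n == 1 and aa != "*")

print(single_codon)           # amino acids with exactly one codon
print(CODON_TABLE["TGG"])     # the tryptophan codon (UGG in RNA)
```

Counting this way shows that tryptophan (W) and methionine (M) are the only amino acids with a single codon, while leucine, serine, and arginine have six each.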

28
New cards

STUDY TEXTBOOK QUESTIONS

29
New cards

ENIAC

Electronic Numerical Integrator and Computer

- the first general-purpose electronic digital computer

30
New cards

Integrated Circuit

Transistors, resistors, and capacitors built on the same piece of semiconductor. Few external connections are needed between components

31
New cards

Personal/microcomputer

small, low power, only one person can be on at once

32
New cards

Minicomputer

medium, can be used by 10-60 people at once

33
New cards

Mainframe computer

Large, 100+ people can be on at once

34
New cards

Super computer

largest, fastest, most expensive, most capable. Used for very compute-heavy work such as space research

35
New cards

What categories of devices does each computer have?

Storage, input, output, communication, Processing (most important)

36
New cards

Central Processing Unit (CPU)

The processor that carries out the instructions of a computer program. Performs the basic math, logic, control, and input/output operations. Made by companies such as Intel and AMD

37
New cards

2 Types of programs

Application = word processors, games, spreadsheets, graphics, web browsers

System = keeps software and hardware working together (operating system, networking, data backup, web server)

38
New cards

How did bioinformatics parallel computer innovations?

Computers allowed us to sequence proteins, DNA, and RNA

39
New cards

Who was the first to sequence a protein and DNA?

Frederick Sanger. Sequenced insulin

40
New cards

What was the first fully sequenced RNA genome?

That of bacteriophage MS2 (about 3,600 nucleotides). Sequenced by Walter Fiers in 1976, using ribonucleases (RNases) to fragment the RNA; restriction enzymes cut DNA, not RNA.

41
New cards

What's the difference between genomics, transcriptomics, proteomics, and metabolomics?

gen = DNA sequences of the whole genome

transcripto = RNA transcripts/gene expression (measured by microarray or RNA sequencing)

proteo = protein sequence/structures

Metabolo = metabolites & interacting systems

42
New cards

Epigenome

The chemical modifications to DNA and histones (e.g., methylation) across the genome that regulate gene expression without changing the base sequence

43
New cards

2 approaches of whole genome sequencing

1) Whole genome shotgun (WGS)

2) Hierarchical genome sequencing (HGS)

44
New cards

WGS

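Whole genome shotgun. Simpler and faster: the whole genome is broken into random fragments, the fragments are cloned into a library and sequenced, and the reads are assembled by overlap. Includes NGS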

45
New cards

NGS (general idea)

Based on fragmenting, cloning (amplifying), and then assembling. A form of WGS that uses more modern techniques (but generally the same process)

46
New cards

Motif v domain

motif = short conserved sequence pattern within a protein

domain = larger units of protein having to do with structure/function

47
New cards

Protein homologs: definition and characteristics

Proteins derived from a common ancestor, usually similar in sequence and structure. Characteristics: typically >30% identity in a sequence alignment, common ancestor.

48
New cards

bits

Smallest unit of data. Either 1 or 0. Data transfer speed is usually given in bits (e.g., megabits per second)

49
New cards

Bytes

Bigger unit (8 bits). Storage capacity is usually given in bytes
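
Because speeds are usually quoted in bits per second and file sizes in bytes, estimating a transfer time needs the factor of 8. A small worked sketch, with illustrative function names and made-up file and link sizes:

```python
BITS_PER_BYTE = 8


def bytes_to_bits(n_bytes):
    """Convert a size in bytes to bits."""
    return n_bytes * BITS_PER_BYTE


def transfer_seconds(file_size_bytes, link_speed_bits_per_second):
    """Ideal transfer time, ignoring protocol overhead."""
    return bytes_to_bits(file_size_bytes) / link_speed_bits_per_second


# Hypothetical example: a 1 GB (10**9 byte) sequence file over a 100 Mbps link.
print(transfer_seconds(10**9, 100 * 10**6), "seconds")
```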

50
New cards

In-paralogs

Paralogs from a duplication that happened after the speciation event, within a single lineage

51
New cards

How can you tell whether a gene is a xenolog rather than a paralog/ortholog?

A xenolog came from horizontal gene transfer, so its GC/AT content often differs from the rest of the genome it sits in, and its function may differ. Orthologs usually keep the same function; paralogs may diverge after duplication

52
New cards

How can you tell it's a local vs global alignment?

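A global alignment covers both sequences end to end, so it starts at position 1. A local alignment covers only the best-matching region, so it can start and end anywhere in either sequence.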
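
The difference shows up clearly in the scores. Below is a minimal score-only sketch of the two classic dynamic programming methods, Needleman-Wunsch (global) and Smith-Waterman (local). The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions for illustration; real tools use substitution matrices and more elaborate gap penalties.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score aligning all of a against all of b."""
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current.append(max(diagonal, previous[j] + gap, current[j - 1] + gap))
        previous = current
    return previous[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over any pair of substrings (never below 0)."""
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


# A short query that matches the middle of a longer sequence.
print(global_score("ACGT", "TTACGTTT"), local_score("ACGT", "TTACGTTT"))
```

The global score is dragged down by the gaps needed to cover the unmatched ends, while the local score reflects only the matching region.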

53
New cards

What does the E value mean?

The number of alignments with a score this good or better that you would expect to find by chance in a database of that size. The lower it is, the better the match.
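
The dependence on database size can be seen from the relationship E = m × n × 2^(-S'), where S' is the bit score, m the query length, and n the database length. The sketch below is only an approximation under that formula: the function name is illustrative, the lengths are made up, and real BLAST uses corrected "effective" lengths.

```python
def expected_hits(bit_score, query_length, database_length):
    """Approximate E value: chance hits expected at this bit score or better."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search: 100-residue query against a database of one million residues.
for score in (20, 40, 60):
    print(score, expected_hits(score, 100, 1_000_000))
```

Each extra bit of score halves E, and doubling the database size doubles it.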

54
New cards

What is PSI-BLAST used for?

To get a deeper search for distant homologs, using a customized (position-specific) scoring matrix built from the hits of earlier rounds

55
New cards

Is a high score a good match?

Yes. The higher the better

56
New cards

What are the 2 human genome browsers?

Ensembl and UCSC (more common)

57
New cards

OMIM

Online Mendelian Inheritance in Man. Used for medical genetics, focusing more on single gene traits. Catalogs human genes and genetic disorders

58
New cards

What type of alignment does BLAST do?

Local alignment (BLAST = Basic Local Alignment Search Tool)

59
New cards

What do we use to align protein structures (protein version of BLAST)?

VAST (Vector Alignment Search Tool)

60
New cards

Maxam-Gilbert sequencing

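Chemical method: label one end of the DNA, then cut it chemically at specific bases to create fragments that differ in size by one base, which are separated on a gel. No primer or polymerase is needed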

61
New cards

Next gen sequencing

Fragment the DNA and amplify the small fragments to create a sequencing library, then sequence many fragments in parallel. A computer detects the bases in real time

62
New cards

Paired-end sequencing

type of next gen sequencing. Similar to shotgun, but sequences both ends of each fragment for more accurate assembly.

63
New cards

Bridge amplification sequencing

type of next gen sequencing that anchors DNA fragments to a surface; each fragment bends over to a nearby primer (forming a bridge) and is amplified. Results in clusters of identical clones

64
New cards

Pyro sequencing

type of next gen sequencing. Detects the pyrophosphate released each time a nucleotide is incorporated. Nucleotides are added one type at a time, so the signal tells exactly which nucleotide was added. More expensive

65
New cards

Ion torrent sequencing

type of next gen sequencing that measures the H+ released (pH change) when a nucleotide is incorporated during DNA synthesis

66
New cards

What platforms do next gen sequencing?

454 pyrosequencing, Illumina/Solexa (most used), Helicos, PacBio, Ion Torrent

67
New cards

RNA sequencing process

1) fragment mRNA into short pieces

2) reverse transcribe to make cDNA

3) use cDNA and DNA polymerase to make second strand

4) sequence the cDNA and map the reads
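
The fragmenting and cDNA steps can be sketched on strings. This is a toy model with illustrative function names: real fragmentation is random rather than fixed-size, and all sequences are written 5' to 3'.

```python
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def fragment(sequence, size):
    """Cut a sequence into consecutive pieces of at most `size` bases."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def first_strand_cdna(mrna):
    """Reverse transcription: the DNA strand complementary to the mRNA."""
    return mrna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def second_strand_cdna(first_strand):
    """DNA polymerase step: the strand complementary to the first cDNA strand."""
    return first_strand.translate(DNA_COMPLEMENT)[::-1]


# Made-up mRNA, used only to show the steps.
for piece in fragment("AUGGCCAUUGUA", 6):
    first = first_strand_cdna(piece)
    print(piece, first, second_strand_cdna(first))
```

The second strand matches the original mRNA fragment with T in place of U, which is why the reads can be mapped back to the gene.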

68
New cards

What's different between the DNA created from RNA sequencing and regular DNA?

The DNA made for RNA sequencing (cDNA) has no introns, because it is copied from spliced mRNA.

69
New cards

How do you sequence a protein?

Use a mass spectrometer. A protease cuts the protein into peptide fragments (easier to sequence). Each peptide is ionized for analysis

70
New cards

Dayhoff's model (PAM)

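Point Accepted Mutation matrix. Based on global alignments. Derived from a database of closely related proteins. Lower # = less divergence, for more similar sequences (opposite of BLOSUM numbering). Threshold: >85% identity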

71
New cards

What's the twilight zone for the matrices?

<20% sequence identity. Can't reliably tell true homology from chance similarity

72
New cards

Microsatellite vs mini satellite

Tandemly repeated sequences whose repeat unit is 1-10 bp (micro) or 10-60 bp (mini) in length
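
Short tandem repeats can be found with a regular expression that captures a unit and requires it to repeat. This is a rough sketch: the function names and the minimum of 3 copies are assumptions for the example, and since the two size ranges on this card overlap at 10, a unit of exactly 10 is treated as a microsatellite here.

```python
import re


def find_tandem_repeats(sequence, min_unit=1, max_unit=10, min_copies=3):
    """Yield (start, unit, copies) for each tandem repeat, preferring short units."""
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        yield match.start(), unit, len(match.group(0)) // len(unit)


def classify_repeat(unit_length):
    """Label a repeat by unit length, using the size ranges on this card."""
    if unit_length <= 10:
        return "microsatellite"
    if unit_length <= 60:
        return "minisatellite"
    return "other"


# Made-up sequence containing a CA repeat.
for start, unit, copies in find_tandem_repeats("GGCACACACACATT"):
    print(start, unit, copies, classify_repeat(len(unit)))
```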

73
New cards

GC content of Humans, E. coli, and Rhodobacter

40%, 50%, 68%
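
GC content is simply the fraction of bases that are G or C. A minimal sketch, with an illustrative function name, that ignores ambiguous bases such as N:

```python
def gc_content(sequence):
    """Fraction of A/C/G/T bases that are G or C (0.0 for an empty sequence)."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    total = gc + sequence.count("A") + sequence.count("T")
    return gc / total if total else 0.0


# Made-up sequence for illustration.
print(f"{gc_content('ATGCGCGCTA'):.0%}")
```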

74
New cards

Supervised computing

Uses labeled inputs and outputs to train models and predict outcomes. Humans provide the labels

75
New cards

Gene

a union of genomic sequences encoding a coherent set of potentially overlapping functional products

76
New cards

What does a cluster size mean in UniGene?

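The number of transcript sequences (ESTs and mRNAs) grouped in the cluster for that gene. Roughly reflects the amount of transcription/expression the gene has (e.g., 1 = only one sequence observed)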

77
New cards

BLASTP

protein against protein database

78
New cards

BLASTN

Nucleotide against the nucleotide database

79
New cards

BLASTx

Translated nucleotide against protein database

80
New cards

tBLASTn

Protein against a translated nucleotide database

81
New cards

tBLASTx

translated nucleotide against translated nucleotide