What is RAM?
Random access memory: the computer's fast, temporary working memory, connected directly to the central processor. Random = any location can be accessed in any order
PERL
Practical Extraction and Report Language (a programming language). Developed by Larry Wall, 1987. Strong at processing text (runs on Windows, Mac, and Unix/Linux). Used to write your own programs
What are the 3 major databases?
EMBL - housed at EBI
GenBank - housed at NCBI
DDBJ - housed in Japan
What was the first fully sequenced DNA genome?
That of Bacteriophage phiX174 (about 5,400 bp). Bacteriophage MS2 was sequenced earlier, but it is an RNA genome
Whose genome was the first human genome to be fully sequenced?
James D. Watson
HGS
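Hierarchical genome shotgun. More organized, but takes longer. Requires a pre-existing physical map (a library of large clones, e.g. BACs) to order the large clone contigs. Each large clone is still shotgun-cloned, then sequenced and assembled separately.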
Sanger sequencing
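Older chain-termination method (rarely used for whole genomes now, still used for small-scale work). DNA polymerase extends a primer until a dideoxynucleotide (ddNTP) stops the chain. Fragments are labeled with a dye on the primer or on the ddNTPs and separated by size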
Examples of NGS
Pyrosequencing, Paired-end, Bridge amplification, Ion Torrent
Examples of platforms that do NGS
454 pyrosequencing, Illumina/Solexa, Helicos, PacBio
What does cloning require?
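A vector with an origin of replication, a selectable marker, and a polylinker (multiple cloning site). Restriction enzymes cut at specific recognition sequences in the polylinker so the fragment can be inserted.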
Sequential vs Binary search
Sequential = checks each entry (e.g. each genomic sequence) one by one
Binary = repeatedly splits a sorted dataset in half and keeps only the half that can contain the target. Much faster for bigger datasets
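The two search strategies can be sketched as below. This is a minimal illustration, not a database implementation: the function names and the `seq000`-style identifiers are made up for the example, and binary search assumes the list is already sorted.

```python
def sequential_search(items, target):
    """Check every item in order; works on unsorted data. Returns index or -1."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Halve the search range at each step; requires sorted input. Returns index or -1."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1


# Hypothetical, zero-padded identifiers so that string order matches numeric order.
ids = ["seq%03d" % number for number in range(100)]
print(sequential_search(ids, "seq042"), binary_search(ids, "seq042"))
```

For 100 entries the sequential version may need up to 100 comparisons, while the binary version needs at most 7.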
Paralog vs Ortholog homology
Paralog = homologous genes in the same genome that arose from a duplication event (source of genetic diversity)
Ortholog = same gene in different species derived from common ancestor
Why are paralogs often more diverse than other homologous sequences?
Because they come from gene duplications which allow them to vary and diverge more from the original copy
Out-paralogs
paralogs from a duplication that happened before the speciation event (duplication first, then speciation)
Xeno-paralogs
homologous genes that arose from horizontal gene transfer
Is there a difference between the E value and p value from stats?
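Yes. The E value is the number of hits expected by chance with a score at least that good; the p value is the probability of getting at least one such hit. They are related by P = 1 - e^(-E), so they are nearly equal when small (below about 0.01), and both measure significance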
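The relationship between the two values can be checked numerically. A small sketch, assuming the standard Poisson relation P = 1 - e^(-E); the function name `e_value_to_p_value` is made up for this example.

```python
import math


def e_value_to_p_value(e_value):
    """Convert an E value to a p value with P = 1 - exp(-E)."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-e_value)


for e_value in (1e-5, 0.01, 1.0, 10.0):
    print(e_value, e_value_to_p_value(e_value))
```

For small E the two numbers are practically identical; for large E the p value levels off near 1 while E keeps growing.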
Accession numbers and corresponding item.
NG_ = gene
NM_ = mRNA (transcript)
NP_ = protein
NC_ = complete genome or chromosome
NT_ = genomic contig
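A lookup like the one below can help memorize the prefixes. It is only a sketch covering the five prefixes on this card (with NM_ taken as mRNA); the helper name `describe_accession` is made up, and the example accession strings are purely illustrative.

```python
# Only the prefixes listed on this card; RefSeq defines more.
REFSEQ_PREFIXES = {
    "NG_": "gene (genomic region)",
    "NM_": "mRNA",
    "NP_": "protein",
    "NC_": "complete genome or chromosome",
    "NT_": "genomic contig",
}


def describe_accession(accession):
    """Return the item type for an accession based on its first three characters."""
    prefix = accession[:3].upper()
    return REFSEQ_PREFIXES.get(prefix, "unknown prefix")


print(describe_accession("NM_000546"))
print(describe_accession("NP_000537"))
```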
Different types of search tools on NCBI?
BLAST (sequence similarity search) and OMIM (human genes and genetic disorders)
Types of BLAST searches
n = DNA query -> DNA database
p = protein query -> protein database
x = translated DNA query -> protein database
t-n = protein query -> translated DNA database
t-x = translated DNA query -> translated DNA database
Different genome sequencing approaches
1) Sanger's method (hardly used anymore)
2) Maxam-Gilbert
3) Next Gen
4) Paired-end
5) Bridge amplification
6) Pyro
7) Ion torrent
Why do RNA sequencing?
Shows which genes are actually expressed and at what level. Better at mapping transcript (splice) variants
BLOSUM
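Blocks Substitution Matrix. Derived from observed substitutions in ungapped local alignments (blocks) of conserved protein regions. Higher # = more conserved (for more closely related sequences). Threshold: L% identity, i.e. sequences at least L% identical are clustered when building BLOSUM-L (BLOSUM62 = 62%)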
How many base pairs does the human genome have?
2.8-3.4 billion
Are coding genes located in GC or AT rich areas?
GC rich
How much of the human genome is repeated sequences?
50% (all noncoding)
Unsupervised computing
reads and uses the raw data, without labels. No human intervention
What is the most conserved amino acid?
Tryptophan (W). It also has only 1 codon (UGG). Methionine (AUG) is the only other amino acid with a single codon.
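The codon counts can be verified by building the standard genetic code and counting codons per amino acid. A minimal sketch, assuming the standard code written in the DNA alphabet (so UGG appears as TGG and AUG as ATG); the variable names are made up for the example.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order (* = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each of the 64 codons to its amino acid.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many codons encode each amino acid, ignoring stop codons.
codons_per_amino_acid = Counter(
    amino_acid for amino_acid in CODON_TABLE.values() if amino_acid != "*"
)
single_codon = sorted(
    amino_acid for amino_acid, count in codons_per_amino_acid.items() if count == 1
)
print(single_codon)  # ['M', 'W']
```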
STUDY TEXTBOOK QUESTIONS
ENIAC
Electronic Numerical Integrator and Computer
- the first general-purpose electronic computer
Integrated Circuit
Transistors, resistors, and capacitors on the same piece of semiconductor. Low connectivity between components
Personal/microcomputer
small, low power, only one person can be on at once
Minicomputer
medium, can be used by 10-60 people at once
Mainframe computer
Large, 100+ people can be on at once
Super computer
largest, fastest, most expensive, most capable. Used mainly for heavy scientific computing (e.g. space, weather, and genomics simulations)
What categories of devices does each computer have?
Storage, input, output, communication, Processing (most important)
Central Processing Unit (CPU)
The processor that carries out the instructions of a computer program. Performs the basic math, logic, control, and input/output operations. Made by companies such as Intel and AMD
2 Types of programs
Application = word processors, games, spreadsheets, graphics, web browsers
System = keeps software and hardware working together (operating system, networking, data backup, web server)
How did bioinformatics develop in parallel with computer innovations?
Advances in computing made it possible to store, assemble, and analyze protein, DNA, and RNA sequences
Who was the first to sequence a protein and DNA?
Frederick Sanger. Sequenced insulin
What was the first fully sequenced RNA genome?
That of Bacteriophage MS2 (about 3,600 nt). Sequenced by Walter Fiers's group by using ribonucleases to fragment the RNA (restriction enzymes cut DNA, not RNA).
What's the difference between genomics, transcriptomics, proteomics, and metabolomics?
gen = DNA sequences (the whole genome)
transcripto = RNA transcripts and their expression levels (e.g. microarray, RNA-seq)
proteo = protein sequences/structures
metabolo = metabolites & interacting systems
Epigenome
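The set of chemical modifications to DNA and histones (e.g. methylation) across the genome that regulate gene expression without changing the base sequence. Varies between cell types, unlike the genome itself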
2 approaches of whole genome sequencing
1) Whole genome shotgun (WGS)
2) Hierarchical genome (HGS)
WGS
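Whole genome shotgun. Simpler and faster: the whole genome is randomly fragmented, a library is constructed by cloning the random fragments, and the reads are assembled by their overlaps. Includes NGS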
NGS (general idea)
Based on fragmenting, cloning, and then assembling. A form of WGS that uses more modern techniques (but generally the same process)
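The "assemble" step of shotgun sequencing can be illustrated with a toy greedy assembler that repeatedly merges the two reads with the longest overlap. This is only a sketch under simplifying assumptions (error-free reads, a single strand, no repeats); real assemblers are far more sophisticated, and the function names and reads are made up for the example.

```python
def overlap(left, right, min_length=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for length in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads, min_length=3):
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(reads) > 1:
        best_length, best_i, best_j = 0, None, None
        for i, left in enumerate(reads):
            for j, right in enumerate(reads):
                if i != j:
                    length = overlap(left, right, min_length)
                    if length > best_length:
                        best_length, best_i, best_j = length, i, j
        if best_length == 0:
            break  # remaining reads do not overlap; they stay separate contigs
        merged = reads[best_i] + reads[best_j][best_length:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads sampled from the sequence ATGGCGTACGTTAGC.
print(greedy_assemble(["ACGTTAGC", "ATGGCGTA", "GCGTACGTT"]))
```

With these three reads the assembler rebuilds the original sequence as a single contig.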
Motif v domain
motif = short conserved sequence pattern within a protein
domain = larger unit of a protein that folds and functions on its own (structure/function)
Protein homologs: definition and characteristics
Proteins derived from a common ancestor, usually with similar sequence and structure. Characteristics: typically >30% sequence identity in an alignment, common ancestor.
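Percent identity is computed from an existing pairwise alignment. A minimal sketch, assuming both aligned strings have the same length and that columns containing a gap are left out of the denominator (other conventions exist); the function name and the two sequences are made up for the example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns where both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (residue_a, residue_b)
        for residue_a, residue_b in zip(aligned_a, aligned_b)
        if residue_a != gap and residue_b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(residue_a == residue_b for residue_a, residue_b in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned protein fragments: 6 identical residues in 8 gap-free columns.
print(percent_identity("MKT-AYIAK", "MKTWAYLAR"))  # 75.0
```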
Bits
smallest unit. Either 1 or 0. Used to express speed (e.g. transfer rates in megabits per second)
Bytes
bigger unit (8 bits). Used to express storage (e.g. file and memory sizes)
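Because speed is quoted in bits and storage in bytes, estimating a transfer time needs the factor of 8. A small worked example; the function name, file size, and link speed are made up for illustration.

```python
BITS_PER_BYTE = 8


def transfer_seconds(size_bytes, speed_bits_per_second):
    """Time to move a file of `size_bytes` over a link of the given bit rate."""
    return size_bytes * BITS_PER_BYTE / speed_bits_per_second


# A 1 GB (1,000,000,000 byte) file over a 100 Mbps (100,000,000 bit/s) link.
print(transfer_seconds(1_000_000_000, 100_000_000))  # 80.0
```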
In-paralogs
Paralogs from a duplication that happened after the speciation event (within one lineage, no later speciation)
How can you tell whether a gene is a xenolog rather than a paralog/ortholog?
Xenologs often have a different function, and their GC/AT content differs from the rest of the genome they were transferred into. Orthologs/paralogs usually keep the same or a related function
How can you tell it's a local vs global alignment?
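A global alignment covers both sequences end to end, so it starts at position 1. A local alignment covers only the best-matching region, so it can start (and end) anywhere in the sequences.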
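The difference shows up clearly when a short sequence is compared with a longer one that contains it. The sketch below computes only the best score (no traceback) with simple dynamic programming, in the style of Needleman-Wunsch for global and Smith-Waterman for local alignment. The scoring values (+1 match, -1 mismatch, -1 gap), the function name, and the sequences are assumptions for the example.

```python
def alignment_score(seq_a, seq_b, local, match=1, mismatch=-1, gap=-1):
    """Best global (local=False) or local (local=True) alignment score."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment must pay for gaps at the very start of either sequence.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
            if local:
                cell = max(cell, 0)  # local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)
    # Local: best cell anywhere. Global: bottom-right cell (both ends included).
    return best if local else score[-1][-1]


print(alignment_score("ACGT", "TTACGTTT", local=True))   # 4
print(alignment_score("ACGT", "TTACGTTT", local=False))  # 0
```

The local score counts only the matching ACGT region, while the global score is pulled down by the four gaps needed to span the whole longer sequence.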
What does the E value mean?
The number of alignments with a score at least this good that would be expected by chance in a database of this size. The lower it is, the better (more significant) the match.
What is PSI-BLAST used for?
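To get a deeper search for distant homologs by repeating the search with a customized, position-specific scoring matrix (PSSM) built from the previous round's hits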
Is a high score a good match?
Yes. The higher the better
What are the 2 human genome browsers?
Ensembl and UCSC (more common)
OMIM
Online Mendelian Inheritance in Man. Used for medical genetics, focusing more on single gene traits. Catalogs human genes and genetic disorders
What type of alignment does BLAST do?
Local alignment (BLAST = Basic Local Alignment Search Tool)
What do we use to align protein structures (protein version of BLAST)?
VAST (Vector Alignment Search Tool)
Maxam-Gilbert sequencing
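Chemical cleavage method (no primer or polymerase). End-label the DNA, then cleave it with base-specific chemicals to create fragments that differ in size by one base, and separate them on a gel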
Next gen sequencing
Fragment the DNA and amplify the small fragments to create a sequencing library, then sequence many fragments in parallel. The computer detects bases in real time
Paired-end sequencing
type of next gen sequencing. Similar to shotgun, but sequences both ends of each fragment for more accurate assembly.
Bridge amplification sequencing
type of next gen sequencing that anchors DNA fragments to a surface, where they bend over (bridge) to nearby primers and are amplified. Results in clusters of clones
Pyro sequencing
type of next gen sequencing. Detects the pyrophosphate released each time a nucleotide is incorporated. More expensive, and tells the exact nucleotide that was added
Ion torrent sequencing
type of next gen sequencing that measures the H+ released (pH change) when a nucleotide is incorporated to sense DNA synthesis
What platforms do next gen sequencing?
454 pyrosequencing, Illumina/Solexa (most used), Helicos, PacBio, Ion Torrent
RNA sequencing process
1) isolate and fragment mRNA (already spliced, so exons only)
2) reverse transcribe to make cDNA
3) use cDNA and DNA polymerase to make second strand
4) sequence the cDNA and map the reads
What's different between the DNA created from RNA sequencing and regular DNA?
The cDNA made for RNA sequencing has no introns, because it is copied from spliced mRNA.
How do you sequence a protein?
Use a mass spectrometer. A protease first cuts the protein into peptide fragments (easier to sequence). Each peptide is then ionized for analysis
Dayhoff's model (PAM)
Point Accepted Mutation matrix. Global. Derived from a database of closely related proteins. Lower # = more conserved (more closely related sequences), the opposite of BLOSUM. Threshold: >85% identity
What's the twilight zone for the matrices?
<20% sequence identity. Homology can't be reliably told apart from chance similarity (can't detect specific changes/probability)
Microsatellite vs mini satellite
Tandemly repeated sequences whose repeat unit is 1-10 bp (micro) or 10-60 bp (mini) in length
GC content of Humans, E. coli, and Rhodobacter
40%, 50%, 68%
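GC content is simply the share of G and C among the bases of a sequence. A minimal sketch; the function name and the test sequences are made up, and ambiguous characters such as N are ignored by assumption.

```python
def gc_content(sequence):
    """Percent of G and C among the A, C, G, T bases of a DNA sequence."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        return 0.0
    gc_count = sum(base in "GC" for base in bases)
    return 100.0 * gc_count / len(bases)


print(gc_content("ATGC"))        # 50.0
print(gc_content("GGCCGGCCAT"))  # 80.0
```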
Supervised computing
uses labeled inputs and outputs to train models and predict outcomes. Humans put the labels
Gene
a union of genomic sequences encoding a coherent set of potentially overlapping functional products
What does a cluster size mean in UniGene?
The number of transcript sequences (mRNAs/ESTs) grouped in the cluster. A rough indicator of how much the gene is transcribed/expressed (e.g. 1 = only one sequence observed)
BLASTP
protein against protein database
BLASTN
Nucleotide against the nucleotide database
BLASTx
Translated nucleotide against protein database
tBLASTn
Protein against a translated nucleotide database
tBLASTx
translated nucleotide against translated nucleotide
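The five programs differ only in the type of query, the type of database, and what gets translated. The helper below is a memory aid written for this card, not part of BLAST itself; the function name and its arguments are hypothetical.

```python
def choose_blast_program(query_type, database_type, translate_both=False):
    """Pick a BLAST program name from the query and database types.

    query_type and database_type are "nucleotide" or "protein".
    translate_both only matters for nucleotide vs nucleotide (tblastx).
    """
    pair = (query_type, database_type)
    if pair == ("nucleotide", "nucleotide"):
        return "tblastx" if translate_both else "blastn"
    programs = {
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",   # query is translated
        ("protein", "nucleotide"): "tblastn",  # database is translated
    }
    if pair not in programs:
        raise ValueError("types must be 'nucleotide' or 'protein'")
    return programs[pair]


print(choose_blast_program("nucleotide", "protein"))  # blastx
print(choose_blast_program("protein", "nucleotide"))  # tblastn
```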