bioinformatics quiz 1

18 Terms


origins of bioinformatics

earliest foundations (1950-1970) focused primarily on protein sequence analysis


COMPROTEIN

first known bioinformatics software (early 1960s)

developed by Margaret Dayhoff

designed to assemble whole protein sequences (de novo) from small Edman peptide fragments


paradigm shift (1970-1980)

when bioinformatics began shifting its focus from protein analysis to DNA analysis after Sanger sequencing was invented


Needleman-Wunsch (1970)

developed the first dynamic programming algorithm for global pairwise protein sequence alignment
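A minimal Python sketch of the Needleman-Wunsch idea, assuming a simple hypothetical scoring scheme (match +1, mismatch -1, gap -1) rather than the substitution-matrix scores a real protein alignment would use. It fills a table of best scores for every pair of prefixes, then traces back from the bottom-right cell to recover one optimal global alignment. The function and parameter names are illustrative.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b by dynamic programming (hypothetical linear scoring)."""
    n, m = len(a), len(b)

    def sub(x, y):
        # Score for aligning two residues against each other
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + sub(a[i - 1], b[j - 1])
            up = score[i - 1][j] + gap    # a[i-1] opposite a gap
            left = score[i][j - 1] + gap  # b[j-1] opposite a gap
            score[i][j] = max(diag, up, left)

    # Traceback: walk back to (0, 0), re-deriving which move produced each cell
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("AC", "A")` returns `(0, "AC", "A-")`: one match (+1) plus one gap (-1).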


homology: orthology

homology resulting from a speciation event

defined by Walter M. Fitch (1970)


Dayhoff/PAM matrix

developed the first probabilistic model of amino acid substitutions (point accepted mutations) in 1978, using probability to measure evolutionary change
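A sketch of how a 1-PAM mutation probability matrix is extended to larger evolutionary distances: multiply it by itself n times (PAMn = PAM1^n). The 2-letter alphabet and 1% change probability below are hypothetical toy values, not Dayhoff's data, and the convention that row i, column j holds the probability of residue i being replaced by residue j is also an assumption.

```python
def mat_mult(x, y):
    """Multiply two square matrices given as lists of rows."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def pam_n(pam1, n):
    """Extrapolate a 1-PAM matrix to n PAMs by repeated multiplication (PAM1 ** n)."""
    size = len(pam1)
    # Identity matrix = 0 PAMs of evolutionary change
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mult(result, pam1)
    return result


# Hypothetical 2-letter alphabet: 1% chance of substitution per 1 PAM
pam1 = [[0.99, 0.01],
        [0.01, 0.99]]
pam250 = pam_n(pam1, 250)
```

At large n the rows drift toward the background frequencies, which is why high-numbered PAM matrices suit distantly related sequences.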


de novo sequencing

the determination of a full-genome sequence without using a known template or reference sequence
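To illustrate reconstructing a sequence from overlapping fragments without a reference, here is a toy greedy-overlap assembler in Python. It repeatedly merges the pair of reads with the longest suffix-prefix overlap. This is a swapped-in teaching technique, not COMPROTEIN's method or what modern assemblers do (they typically build overlap or de Bruijn graphs). It assumes error-free reads from a single strand and a hypothetical minimum overlap of 3.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by largest overlap until no overlaps remain; returns contigs."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads
```

For example, `greedy_assemble(["ATGGCG", "GCGTAC", "TACCTT"])` returns `["ATGGCGTACCTT"]`.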


massively parallel

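many operations carried out simultaneously (in computing, multiple processors working at once; in sequencing, millions of DNA fragments read in parallel)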


multiplexing

combining multiple samples into a single sequencing run; each sample is tagged with a unique barcode (index) so its reads can be sorted back out afterward
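A minimal demultiplexing sketch in Python: each read is assigned to a sample by exact match of a barcode at the start of the read, and the barcode is trimmed off. The barcodes and sample names are hypothetical, and the in-read, exact-match layout is a simplifying assumption; real pipelines typically read the index separately and tolerate a mismatch or two.

```python
def demultiplex(reads, barcodes):
    """Sort reads into per-sample bins using an exact barcode prefix.

    barcodes maps barcode sequence -> sample name (hypothetical layout).
    Reads matching no barcode go to "undetermined".
    """
    bins = {sample: [] for sample in barcodes.values()}
    bins["undetermined"] = []
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])  # trim the barcode
                break
        else:
            bins["undetermined"].append(read)
    return bins


barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
result = demultiplex(["ACGTGGGG", "TGCACCCC", "NNNNAAAA"], barcodes)
```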


overfitting

when a model shows high accuracy on its training data but markedly lower accuracy on separate validation data, indicating it has fit features specific to the initial dataset rather than general patterns


Sanger (dideoxy)

long reads (~600-1000 bp)

low throughput, typically single samples

quality loss at the beginning and end of each read

based on chain-terminating dideoxynucleotides (ddNTPs)
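An idealized Python simulation of chain termination, assuming every position is terminated at least once and ignoring the primer. The polymerase builds the complementary strand; a ddNTP stops extension, so the reaction yields fragments ending at every position, each labeled by its terminal base. Sorting fragments by length (as capillary electrophoresis does) and reading the terminal labels recovers the synthesized strand. Function names are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def chain_termination_fragments(template, seed=0):
    """Template written 5'->3'. Returns every prefix of the synthesized
    (reverse-complement) strand, shuffled to mimic a mixed reaction."""
    synthesized = reverse_complement(template)
    # Each prefix ends with the ddNTP that terminated extension
    fragments = [synthesized[: i + 1] for i in range(len(synthesized))]
    random.Random(seed).shuffle(fragments)
    return fragments


def read_ladder(fragments):
    """Order fragments by length and read the terminal base of each."""
    return "".join(frag[-1] for frag in sorted(fragments, key=len))
```

For the template `"ATGCGTAC"`, `read_ladder(chain_termination_fragments("ATGCGTAC"))` returns `"GTACGCAT"`, the reverse complement of the template.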


Illumina (MiSeq)

short reads (100-300 bp)

high throughput (massively parallel)

high accuracy

bridge amplification and sequencing by synthesis: fragments attached to a flow cell are amplified into clusters, then read base by base as fluorescently labeled nucleotides are incorporated


Oxford Nanopore (MinION)

ultra-long reads

high throughput, portable

moderate error rate

DNA passes through a nanopore, and the resulting changes in electrical current are decoded into the DNA sequence (basecalling)


FASTQ file

file format that stores each read's nucleotide sequence together with a per-base quality score
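A minimal FASTQ reader in Python, assuming the common four-line record layout: an `@` header, the sequence, a `+` separator line, and a quality string of the same length as the sequence. Line-wrapped records are not handled, and `parse_fastq` is a hypothetical helper, not a library function.

```python
def parse_fastq(text):
    """Return a list of (identifier, sequence, quality) tuples from FASTQ text."""
    lines = [line for line in text.splitlines() if line]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        # Records are parsed by position, so a quality line starting with '@' is fine
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        records.append((header[1:], seq, qual))
    return records


example = "@read1\nACGT\n+\nIIII\n@read2\nGGC\n+\n#5I\n"
records = parse_fastq(example)
```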


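Phred score (Q)

measure of base-calling accuracy: Q = -10 × log10(P), where P is the estimated probability that a base call is incorrect

Q20 = 1% chance of an incorrect base call (99% accuracy)

Q30 = 0.1% chance of an incorrect base call (99.9% accuracy)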
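The Phred relationship can be checked in a few lines of Python. The sketch also decodes a FASTQ quality string, assuming the widely used Phred+33 encoding (score = ASCII code minus 33); older files may use a different offset. Function names are illustrative.

```python
import math


def phred_to_error(q):
    """Probability that a base call is wrong: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score from error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_quality(qual, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]
```

For example, `decode_quality("I5#")` gives `[40, 20, 2]`, and `phred_to_error(20)` is 0.01, matching the Q20 line above.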


coverage

the average number of reads that align to, or “cover,” known reference bases

50x genome coverage is recommended
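Average coverage is commonly estimated as C = N × L / G: number of reads times read length, divided by genome size. The Python sketch below applies that formula both ways; the helper names and the example of a hypothetical 5 Mb genome with 150 bp reads are illustrative assumptions.

```python
import math


def expected_coverage(num_reads, read_length, genome_size):
    """Average depth C = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads N giving at least the target average depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical: 50x of a 5 Mb genome with 150 bp reads
n = reads_needed(50, 150, 5_000_000)
```

Here `n` is 1,666,667 reads. Actual per-base depth varies around this average, so some bases will be covered less than 50x.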


single-end reads

sequence a DNA fragment from one end only, in one direction


paired-end reads

sequence a DNA fragment from both ends, producing a pair of reads per fragment
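A small Python sketch contrasting the two read types, assuming the standard forward-reverse orientation of paired-end libraries: read 1 comes from the start of the fragment's forward strand, and read 2 comes from the opposite end, so it is the reverse complement of the fragment's last bases. The fragment and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def single_end_read(fragment, read_length):
    """Single-end: one read from one end of the fragment."""
    return fragment[:read_length]


def paired_end_reads(fragment, read_length):
    """Paired-end: read 1 from the forward 5' end, read 2 from the other end
    (reverse complement of the fragment's 3' end)."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2


fragment = "ACGTACCCGGGTTTAG"
pair = paired_end_reads(fragment, 4)
```

Here `pair` is `("ACGT", "CTAA")`. Knowing that both reads come from the same fragment helps place them during alignment and assembly.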