bioinformatics quiz 1

18 Terms

1

origins of bioinformatics

earliest foundations (1950-1970) focused primarily on protein sequence analysis

2

COMPROTEIN

first known bioinformatics software (early 1960s)

developed by Margaret Dayhoff

designed to assemble whole protein sequences (de novo) from small Edman peptide fragments

3

paradigm shift (1970-1980)

period when bioinformatics began shifting its focus from protein analysis to DNA analysis, after Sanger sequencing was invented

4

needleman-wunsch (1970)

developed the first dynamic programming algorithm for performing pairwise protein sequence alignments
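The dynamic programming idea on this card can be sketched as follows. This is a minimal illustration of global pairwise alignment in the Needleman-Wunsch style, not a reproduction of the 1970 paper's scoring: the `match`, `mismatch`, and `gap` values are assumptions chosen for readability, and real protein alignments would normally use a substitution matrix instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not from the source.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


result = needleman_wunsch("ACGT", "AGT")
```

With these assumed scores, aligning `ACGT` against `AGT` places a single gap opposite the `C`. The table has one cell per pair of prefixes, so time and memory both grow with the product of the two sequence lengths.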

5

homology: orthology

homology resulting from a speciation event

defined by Walter M. Fitch (1970)

6

dayhoff/pam matrix

first probabilistic model of amino acid substitutions (PAM = point accepted mutation), developed by Margaret Dayhoff in 1978; uses substitution probabilities to measure evolutionary change

7

de novo sequencing

the determination of a full-genome sequence without using a known template or reference sequence

8

massively parallel

many operations running simultaneously (multiple processors in computing; in sequencing, many fragments read at the same time)

9

multiplexing

combining multiple inputs/samples into a single sequencing run

10

overfitting

when a model built on training data shows high accuracy but significantly decreased accuracy when applied to separate validation data, indicating the model is too specific to the initial dataset features

11

sanger (dideoxy)

long reads (~600-1000 bp)

low throughput, typically single samples

quality loss at the beginning and end

based on chain-terminating dideoxynucleotides (ddNTPs)

12

illumina (MiSeq)

short reads (100-300 bp)

high/massively parallel throughput

high accuracy

bridge amplification/sequencing by synthesis where fragments attached to a flow cell are amplified into clusters

13

oxford nanopore (MinION)

ultra long reads

high throughput, portable

moderate error rate

DNA passes through a nanopore; changes in electrical current are decoded into the DNA sequence (basecalling)

14

fastq file

file format that incorporates both the nucleotide sequence and associated quality scores
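To make the format concrete, here is a minimal reader sketch. It assumes the common four-line layout (an `@` header, the sequence, a `+` separator, and a quality string of the same length as the sequence) with no line wrapping. The function name `read_fastq` and the example read are hypothetical.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (read_id, sequence, quality_string) from an iterable of FASTQ lines.

    Assumes unwrapped four-line records; this is a sketch, not a full parser.
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return  # end of input (or a trailing partial record)
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


# Hypothetical single-read example; one quality character per base.
example = ["@read1\n", "GATTACA\n", "+\n", "IIIIHH#\n"]
records = list(read_fastq(example))
```

The same function accepts an open file handle, since a text file iterates line by line.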

15

phred score (q)

measure of base-call quality (confidence that each base was called correctly)

Q20 = 1% probability of error per base (1 in 100), meaning 99% accuracy

Q30 = 99.9% accuracy
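The Q20 and Q30 values follow from the standard Phred definition, Q = -10 * log10(P), where P is the probability that the base call is wrong. The sketch below converts in both directions. The `decode_phred33` helper assumes the Phred+33 ASCII encoding (each quality character's code minus 33), which is common in FASTQ files but is not stated on the card; all function names are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Error probability P for a Phred score Q: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score Q for an error probability P: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_phred33(quality_string):
    """Decode a quality string, assuming Phred+33 encoding (an assumption)."""
    return [ord(ch) - 33 for ch in quality_string]


p_q20 = phred_to_error_prob(20)   # 1 error in 100 bases
p_q30 = phred_to_error_prob(30)   # 1 error in 1000 bases
scores = decode_phred33("I#5")
```

Each increase of 10 in Q cuts the error probability by a factor of ten, which is why Q30 corresponds to 99.9% accuracy and Q20 to 99%.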

16

coverage

the average number of reads that align to, or “cover,” known reference bases

50x genome coverage is recommended
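Average coverage is commonly estimated as (number of reads × read length) / genome size. The sketch below applies that estimate and inverts it to ask how many reads a target depth would need. The genome size, read length, and read counts are hypothetical numbers for illustration, and the estimate assumes every read maps to the reference.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Estimated average depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest whole number of reads that reaches the target average depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example: a 5 Mb genome sequenced with 150 bp reads.
depth = mean_coverage(num_reads=2_000_000, read_length=150, genome_size=5_000_000)
needed = reads_needed(target_coverage=50, read_length=150, genome_size=5_000_000)
```

In this example two million 150 bp reads give 60x average depth, and about 1.67 million reads would reach 50x. Real depth varies along the genome, so the average says nothing about regions that are covered poorly.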

17

single-end reads

the fragment is sequenced from one end only, in a single direction

18

paired-end reads

the fragment is sequenced from both ends, giving two reads per DNA fragment