Bioinformatics

14 Terms

1
What is meant by contamination in sequence databases?
A contaminated sequence is one that contains one or more segments of foreign origin and does not faithfully represent the genetic information from the source organism or organelle.
2
What are the consequences of sequence contamination in databases?
Time and resources are wasted on invalid analyses, misinterpretations of biological significance occur, sequence assemblies and clusters may be erroneous, database submission is delayed, and public databases risk being polluted with inaccurate data.
3
What are the sources of contamination and how would you remove them?
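Sources include vectors, adapters, linkers, primers, transposons, and impurities in the sample, such as nucleic acid from another organism. Removal involves screening with VecScreen (a BLAST search against the UniVec database) and with BLAST against other contaminant databases, reviewing the cloning history, trimming contaminated and poor-quality regions, and re-screening the cleaned sequence before submission.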
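The trimming step can be sketched in a few lines of Python. This is only an illustration: it uses exact substring matching as a stand-in for the BLAST search that VecScreen performs, and the function names and all three sequences are made up for the example.

```python
def find_contaminant_spans(seq, contaminants):
    """Return sorted (start, end, name) spans where a known fragment occurs exactly."""
    spans = []
    for name, fragment in contaminants.items():
        start = seq.find(fragment)
        while start != -1:
            spans.append((start, start + len(fragment), name))
            start = seq.find(fragment, start + 1)
    return sorted(spans)


def trim_terminal_spans(seq, spans):
    """Cut away contaminant spans that touch the 5' or 3' end of the read."""
    left, right = 0, len(seq)
    for start, end, _ in sorted(spans):
        if start <= left:
            left = max(left, end)
    for start, end, _ in sorted(spans, key=lambda s: s[1], reverse=True):
        if end >= right:
            right = min(right, start)
    return seq[left:max(left, right)]


# Hypothetical fragments and insert, chosen only for the example.
contaminants = {"vector_end": "GGATCCGAATTC", "adapter": "AGATCGGAAGAGC"}
insert = "ATGCATGCATGCAAATTTGGG"
read = contaminants["vector_end"] + insert + contaminants["adapter"]

spans = find_contaminant_spans(read, contaminants)
cleaned = trim_terminal_spans(read, spans)
print(spans)
print(cleaned)
```

A real screen would use a similarity search rather than exact matching, because vector fragments in a read often carry sequencing errors.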
4
What is BLAST commonly used for?
BLAST is a tool used to compare a query sequence against a database to find regions of local similarity, helping infer functional, evolutionary, and structural relationships between sequences.
5
What statistical parameters are used by BLAST to indicate significance?
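BLAST reports the E-value (expect value), bit score, raw score, percent identity, alignment length, and query coverage. The E-value and bit score matter most for judging significance: the E-value is the number of hits with an equal or better score expected by chance in a database of that size, so lower values are more significant, and the bit score is a normalized score that can be compared across searches.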
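As a worked illustration of how the two key statistics relate: under Karlin-Altschul statistics the E-value is approximately E = m × n × 2^(−S'), where S' is the bit score, m the query length, and n the database length (BLAST itself uses effective lengths). The function name and the numbers below are assumptions chosen for the example.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Illustrative numbers: a 300-residue query against a database of 1e9 residues.
e_50 = expect_value(bit_score=50.0, query_len=300, db_len=1_000_000_000)
e_51 = expect_value(bit_score=51.0, query_len=300, db_len=1_000_000_000)
print(f"{e_50:.2e}")
print(f"{e_51:.2e}")
```

Each extra bit halves the E-value, and doubling the database size doubles it, which is why the same alignment looks less significant in a larger database.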
6
What are the two major methods used to investigate sequence similarity and their major difference?
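Global alignment (for example, Needleman-Wunsch) aligns sequences end-to-end and is best for sequences of similar length that are related along their whole length. Local alignment (for example, Smith-Waterman) finds the best-matching sub-regions and suits divergent sequences that share only a domain or motif. The major difference is that global alignment covers the entire sequences, whereas local alignment covers only the regions of high similarity.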
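A minimal sketch of the difference, assuming a simple scheme of +1 match, −1 mismatch, and −1 per gap position (real tools use substitution matrices and affine gap penalties). The same dynamic-programming table gives a global score when the borders carry gap penalties and the answer is read from the last cell, and a local score when cells are floored at zero and the answer is the table maximum. The function name and sequences are made up for the example.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Best alignment score: Needleman-Wunsch style (global) or Smith-Waterman style (local)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalized along the table borders.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # Local: a poor prefix can be dropped.
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


# The two sequences share only the segment ACGTACGT.
seq_a = "TTTTACGTACGT"
seq_b = "ACGTACGTCCCC"
local_score = align_score(seq_a, seq_b, local=True)   # 8, from the shared segment
global_score = align_score(seq_a, seq_b, local=False)
print(local_score, global_score)
```

The local score rewards only the shared segment, while the global score is pulled down because the unrelated flanks must also be aligned or gapped.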
7
How are alignment scores computed?
Each aligned column receives a value: matches and mismatches are scored from a scoring (substitution) matrix, and gaps are scored with gap penalties. The total score is the sum of these values across the alignment.
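The column-by-column sum can be shown with a short sketch. The scoring values (+2 match, −1 mismatch, −2 per gap position) and the aligned strings are assumptions picked for the example, not a standard scheme.

```python
def score_alignment(row_a, row_b, match=2, mismatch=-1, gap=-2):
    """Sum per-column scores for two already-aligned rows ('-' marks a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


# 6 matches (+12), 1 mismatch (-1), 1 gap (-2)
total = score_alignment("GATT-ACA", "GACTTACA")
print(total)  # 9
```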
8
Why would you expect a better alignment using protein sequences rather than DNA?
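Protein sequences evolve more slowly than the DNA that encodes them. The 20-letter amino acid alphabet produces fewer chance matches than the 4-letter nucleotide alphabet, substitution matrices capture functional and evolutionary conservation, and silent (synonymous) nucleotide mutations do not affect the protein alignment.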
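The effect of silent mutations can be seen in a small sketch: two made-up coding sequences that differ at every third codon position still translate to the same peptide. The codon table below is deliberately partial (only the codons used here), and the names are hypothetical.

```python
# Partial standard codon table: only the codons needed for this example.
CODONS = {
    "GCT": "A", "GCC": "A",
    "CTT": "L", "CTG": "L",
    "AAA": "K", "AAG": "K",
    "GGT": "G", "GGA": "G",
}


def translate(dna):
    """Translate a coding sequence codon by codon using the partial table."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


dna_1 = "GCTCTTAAAGGT"
dna_2 = "GCCCTGAAGGGA"  # differs from dna_1 at every third codon position

dna_identity = identity(dna_1, dna_2)
protein_identity = identity(translate(dna_1), translate(dna_2))
print(translate(dna_1), translate(dna_2))
print(round(dna_identity, 2), protein_identity)
```

The DNA sequences are only about two-thirds identical, yet the proteins are identical, so the relationship is easier to detect at the protein level.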
9
What does it mean when two amino acids are 'similar' in protein alignments and how are these represented?
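Similar amino acids share physicochemical properties such as size, charge, or hydrophobicity. Replacing one with another is a conservative substitution, which receives a positive score in substitution matrices such as BLOSUM or PAM. In alignment output these positions are marked separately from identities, for example with a "+" in the BLAST midline.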
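A rough sketch of how such a representation can be produced, in the style of a BLAST midline: the residue letter for an identity, "+" for a conservative substitution, and a space otherwise. The set of pairs below is a small hand-picked list assumed for illustration; a real tool would instead check whether the pair's score in the chosen matrix is positive.

```python
# Hand-picked, incomplete set of conservative pairs (an assumption for this example).
CONSERVATIVE_PAIRS = {frozenset(p) for p in ("IV", "IL", "LM", "KR", "DE", "FY", "ST")}


def midline(row_a, row_b):
    """Build a BLAST-style midline for two aligned protein rows."""
    marks = []
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            marks.append(" ")          # gap column
        elif x == y:
            marks.append(x)            # identity
        elif frozenset((x, y)) in CONSERVATIVE_PAIRS:
            marks.append("+")          # similar residues
        else:
            marks.append(" ")          # dissimilar residues
    return "".join(marks)


query = "MKVLD"
subject = "MRILG"
mid = midline(query, subject)
print(query)
print(mid)
print(subject)
```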
10
Why would you prefer one substitution matrix over another in protein alignments?
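Different matrices are tailored to different degrees of sequence divergence. BLOSUM62 suits moderately similar sequences, while PAM250 is better for distantly related ones. In general, higher BLOSUM numbers suit more similar sequences and higher PAM numbers suit more divergent ones. The choice affects the sensitivity and biological accuracy of the alignment.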
11
Explain what a dotplot is and how it is constructed.
A dotplot is a graphical comparison of two sequences in which one is plotted on the X-axis and the other on the Y-axis. A dot is placed wherever the residues at the two positions match. A sliding window with a match threshold can be applied to reduce noise and make conserved regions stand out.
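The construction can be sketched as follows, assuming a text rendering instead of a plotted image. A cell is marked when at least `threshold` of the `window` residues starting at that position match; with `window=1` and `threshold=1` this reduces to the plain residue-by-residue dotplot. The function names and the example sequence are illustrative.

```python
def dotplot(seq_x, seq_y, window=1, threshold=1):
    """Return grid[row][col]: rows follow seq_y, columns follow seq_x."""
    n_rows = len(seq_y) - window + 1
    n_cols = len(seq_x) - window + 1
    grid = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            # Count matching residues inside the window starting at (i, j).
            matches = sum(seq_y[i + k] == seq_x[j + k] for k in range(window))
            row.append(matches >= threshold)
        grid.append(row)
    return grid


def render(grid):
    """Draw the grid as text: '*' for a dot, '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)


seq = "ACGTACGT"
print(render(dotplot(seq, seq)))                        # noisy: every single match
print()
print(render(dotplot(seq, seq, window=3, threshold=3)))  # filtered: runs of 3 only
```

Comparing a sequence with itself gives the main diagonal, and the repeated ACGT unit appears as shorter diagonals parallel to it.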
12
What type of information can be gained from a dotplot?
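Dotplots reveal exact matches, repeats, insertions, deletions, inversions, and conserved regions. Diagonal lines show regions of similarity, breaks or shifts in a diagonal indicate indels, diagonals parallel to the main one suggest repeats, and lines running perpendicular to the main diagonal suggest inversions or palindromes.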
13
What is a sequence alignment?
A sequence alignment arranges two or more sequences to identify regions of similarity, revealing potential functional, structural, or evolutionary relationships.
14
Why are gaps often needed in a sequence alignment?
Gaps model the insertions and deletions that occurred during evolution. Placing them lets homologous residues line up in the same columns, which keeps the alignment biologically meaningful, raises the alignment score, and improves functional and structural interpretation.