What is bioinformatics?
The use of computational tools to store, analyse, and interpret biological data.
Why is biological data considered discrete while reality is continuous?
Biological systems are continuous in reality, but are represented as discrete units (DNA bases, RNA reads, protein sequences) so computers can process them.
Why is biological data computationally challenging?
It is highly variable, it is built from very few building blocks (DNA: A/T/C/G; proteins: 20 amino acids), and 3D biology has to be represented as 2D data, all of which add unpredictability.
List the main branches of bioinformatics.
Applied omics (genomics, proteomics), data analytics and visualisation, machine learning, computational biology, and data integration.
What is the central dogma of molecular biology?
The flow of genetic information from DNA to RNA to protein.
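A minimal sketch of the DNA to RNA to protein flow, assuming a toy coding-strand sequence. The function names and the partial codon table are illustrative only; the table holds just the five standard-code codons this example needs.

```python
# Partial codon table (standard genetic code); only the codons used below.
CODON_TABLE = {"AUG": "M", "GCU": "A", "AAA": "K", "UGG": "W", "UAA": "*"}


def transcribe(coding_dna: str) -> str:
    """Coding-strand DNA -> mRNA: same sequence with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA codon by codon until a stop codon ('*') is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCTAAATGGTAA"        # hypothetical toy gene
mrna = transcribe(dna)         # DNA -> RNA
protein = translate(mrna)      # RNA -> protein
print(mrna, protein)
```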
What are the exceptions to the central dogma?
RNA can be reverse-transcribed into DNA, and RNA can be replicated directly in RNA viruses.
What type of data does genomics analyse and for what purpose?
DNA data, used to identify genes and mutations.
What type of data does transcriptomics analyse and for what purpose?
RNA data, used to measure gene expression.
What type of data does proteomics analyse and for what purpose?
Protein data, used to study protein function and interactions.
What does the phenome describe?
Observable traits and biological outcomes.
Give examples of biological data used in bioinformatics.
DNA sequences, RNA expression levels, protein sequences and structures, pathways, and phenotypic data.
List real-world applications of bioinformatics mentioned in the lecture.
Disease gene discovery, cancer genomics, and drug target identification.
In which direction does DNA synthesis occur and why does this matter for sequencing?
DNA synthesis occurs 5′ to 3′, which sequencing methods rely on to build and read DNA strands.
What is the core principle of Sanger (dideoxy) sequencing?
Incorporation of a ddNTP terminates DNA synthesis because ddNTPs lack the 3′-OH group needed to attach the next nucleotide, creating fragments of different lengths.
What components are required for Sanger sequencing?
DNA template, primer, DNA polymerase, dNTPs, and ddNTPs.
What happens when a ddNTP is incorporated during Sanger sequencing?
DNA chain elongation stops, producing a terminated fragment.
How is the DNA sequence read in Sanger sequencing?
Fragments are separated by size and read from smallest to largest (bottom to top).
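The size-ordered read-out can be sketched as below. This is an idealised model, not a simulation of real chemistry: it assumes exactly one terminated fragment per position, ignores the primer, and uses hypothetical function names.

```python
import random


def terminated_fragments(new_strand: str) -> list[str]:
    """One fragment per possible ddNTP stop: every prefix of the new strand."""
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]


def read_sequence(fragments: list[str]) -> str:
    """Order fragments smallest to largest and read each one's terminal base."""
    return "".join(frag[-1] for frag in sorted(fragments, key=len))


new_strand = "GATTACA"                 # newly synthesised strand, 5' to 3'
fragments = terminated_fragments(new_strand)
random.shuffle(fragments)              # fragments start out mixed in one tube
recovered = read_sequence(fragments)   # separation by size restores the order
print(recovered)
```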
What technological improvements modernised Sanger sequencing?
Fluorescent dyes, single-reaction sequencing, automated detection, and capillary electrophoresis.
What are the advantages of Sanger sequencing?
Long read length (~1000 bp) and high accuracy.
What are the limitations of Sanger sequencing?
It is slow, expensive, and low throughput.
What was the goal of the Human Genome Project?
To create a reference human genome.
What are the two competing sequencing strategies of the Human Genome Project?
Top-down (hierarchical, public) and whole-genome shotgun (Celera, private).
What is the top-down (hierarchical) sequencing strategy?
A physical genome map is built first using BACs, then each BAC is shotgun sequenced.
What is whole-genome shotgun sequencing?
The entire genome is randomly fragmented, sequenced at once, and assembled computationally.
What are BACs and why are they used?
Bacterial Artificial Chromosomes with large inserts (100–300 kb) used for stable cloning and genome mapping.
How are BACs mapped using restriction enzymes?
Restriction enzymes cut DNA into fragments, producing patterns used to create a restriction map.
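A rough sketch of how a digest produces a fragment-length pattern. It assumes a linear molecule and tracks one strand only. EcoRI (site GAATTC, cutting after the G) is used as the example enzyme; the `digest` function and the toy sequence are hypothetical.

```python
def digest(dna: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting linear DNA at every occurrence of site.

    cut_offset is the number of bases of the site left on the upstream
    fragment (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    start = 0
    while True:
        idx = dna.find(site, start)
        if idx == -1:
            break
        cuts.append(idx + cut_offset)
        start = idx + 1
    bounds = [0] + cuts + [len(dna)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Toy 21 bp insert containing two EcoRI sites.
dna = "AAA" + "GAATTC" + "CCCC" + "GAATTC" + "TT"
pattern = digest(dna, "GAATTC", 1)
print(pattern)
```

Clones that share part of their fragment pattern are likely to overlap, which is what lets the patterns be combined into a map.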
What is a Sequence-Tagged Site (STS)?
A short DNA sequence that occurs only once in the genome, whose sequence is known so it can be amplified by PCR and used as a landmark to map genome locations.
What is the typical size and spacing of STS markers?
200–500 bp in length, approximately every 100 kb.
What is the Golden Path in genome sequencing?
A minimal overlapping set of BACs that efficiently covers the entire genome.
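Choosing a minimal set of mapped clones is an interval-cover problem. The sketch below uses a simple greedy strategy; it is not the method actually used by the Human Genome Project, and the clone names and coordinates (in kb) are hypothetical.

```python
def minimal_tiling_path(clones, genome_length):
    """Greedy cover: from the current end, pick the clone reaching farthest.

    clones is a list of (name, start, end) tuples on a shared physical map.
    """
    clones = sorted(clones, key=lambda c: c[1])
    path, covered, i = [], 0, 0
    while covered < genome_length:
        best = None
        # Consider every clone that starts inside the region already covered.
        while i < len(clones) and clones[i][1] <= covered:
            if best is None or clones[i][2] > best[2]:
                best = clones[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"no clone extends coverage past {covered}")
        path.append(best[0])
        covered = best[2]
    return path


clones = [
    ("BAC1", 0, 150), ("BAC2", 50, 180), ("BAC3", 120, 300),
    ("BAC4", 170, 320), ("BAC5", 290, 450), ("BAC6", 310, 400),
]
path = minimal_tiling_path(clones, 450)
print(path)
```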
What is a sequencing read?
The string of bases determined from a single DNA fragment in one sequencing reaction.
What is a contig?
A continuous DNA sequence formed by overlapping reads.
What is a scaffold?
An ordered collection of contigs separated by gaps.
What is sequencing coverage?
The average number of times each base in the genome is sequenced.
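As a worked example, average coverage is commonly computed as (number of reads × read length) / genome size. The numbers below are made up for illustration.

```python
def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_size


# Hypothetical run: 60 million reads of 500 bp over a 3 Gb genome.
c = coverage(60_000_000, 500, 3_000_000_000)
print(f"{c:.1f}x")
```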
What is a gap in genome assembly?
A region of the genome with no sequencing reads.
What are the main steps in genome assembly?
Sequence reads, identify overlaps, build contigs, link contigs using paired-end data, and form scaffolds.
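The overlap and contig-building steps can be sketched with a toy greedy assembler. This assumes error-free, same-strand reads with no repeats, and it leaves out the paired-end scaffolding step. Function names and reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


reads = ["ATGGCGT", "CGTCAA", "CAATTC"]
contigs = greedy_assemble(reads)
print(contigs)
```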
How does sequencing coverage affect genome assembly?
Higher coverage increases confidence, while low coverage leads to gaps.
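One way to see why low coverage leaves gaps is the idealised Lander-Waterman model, which is an assumption added here rather than lecture content. If reads land uniformly at random, the fraction of bases covered by no read is roughly e^(-c) at coverage c.

```python
import math


def expected_uncovered_fraction(c: float) -> float:
    """Poisson approximation: probability that a base is covered by zero reads."""
    return math.exp(-c)


for c in (1, 5, 10):
    print(f"{c}x coverage: about {expected_uncovered_fraction(c):.5f} of bases uncovered")
```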
What is paired-end sequencing?
Sequencing both ends of a DNA fragment with a known distance between them.
Why is paired-end sequencing useful for assembly?
It helps link contigs and span gaps to form scaffolds.
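A simplified sketch of how the known distance helps span a gap. It assumes each read pair has one mate on contig A and the other on contig B, and that the fragment length is known exactly. All names and numbers are hypothetical.

```python
from statistics import mean


def estimate_gap(insert_size: int, pairs: list[tuple[int, int]]) -> float:
    """Estimate the gap between two contigs from read pairs that link them.

    Each pair is (a, b): a is the distance from the fragment start to the
    end of contig A, and b is the distance from the start of contig B to
    the fragment end, so insert_size = a + gap + b.
    """
    return mean(insert_size - a - b for a, b in pairs)


# Three linking pairs from a 3000 bp fragment library (made-up values).
pairs = [(1200, 1000), (900, 1300), (1500, 700)]
gap = estimate_gap(3000, pairs)
print(gap)
```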
How does top-down sequencing compare to shotgun sequencing in physical mapping?
Top-down uses a physical map; shotgun does not.
How does top-down sequencing compare to shotgun sequencing in assembly difficulty?
Top-down is easier to assemble; shotgun is more difficult.
How does top-down sequencing compare to shotgun sequencing in speed?
Top-down is slower; shotgun is faster.
How do repetitive DNA regions affect shotgun sequencing?
They make assembly more difficult because repeats are hard to place correctly.
What is the basic idea behind De Bruijn graph assembly?
Reads are broken into k-mers, which are linked through their (k−1)-base overlaps to form a graph; the genome is reconstructed by following a path through that graph.
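A minimal sketch of the idea, assuming error-free reads and a genome with no repeated (k−1)-mers, so the graph is a single unbranched path. Real assemblers handle branches, sequencing errors, and both strands; the function names here are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, list[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def walk_unbranched(graph: dict[str, list[str]]) -> str:
    """Follow the path from the node with no incoming edge until it ends or branches."""
    targets = {t for successors in graph.values() for t in successors}
    node = next(n for n in graph if n not in targets)
    sequence, visited = node, {node}
    while node in graph and len(graph[node]) == 1:
        node = graph[node][0]
        if node in visited:  # guard against cycles
            break
        visited.add(node)
        sequence += node[-1]
    return sequence


reads = ["ATGGCGT", "CGTCAA", "CAATTC"]
graph = de_bruijn(reads, k=4)
assembled = walk_unbranched(graph)
print(assembled)
```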
What is the trade-off when choosing k-mer size?
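Shorter k-mers make overlaps between reads easier to find, but if they are too short they match in many places (especially in repeats) and create ambiguity; longer k-mers are more specific but need higher coverage to overlap.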
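The ambiguity side of this trade-off can be illustrated by counting graph nodes with more than one possible successor. The toy sequence below, with "ATG" repeated three times, and the function name are hypothetical.

```python
from collections import defaultdict


def branching_nodes(seq: str, k: int) -> int:
    """Count (k-1)-mer nodes that are followed by more than one distinct node."""
    successors = defaultdict(set)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        successors[kmer[:-1]].add(kmer[1:])
    return sum(1 for nxt in successors.values() if len(nxt) > 1)


seq = "ATGCATGGATGT"               # contains the repeat ATG three times
short_k = branching_nodes(seq, 3)  # nodes are 2-mers: the repeat causes a branch
long_k = branching_nodes(seq, 5)   # nodes are 4-mers: each has a unique successor
print(short_k, long_k)
```

With k = 3 the node "TG" has three possible successors, so the path through the repeat is ambiguous; with k = 5 every node has a single successor.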