Query
A sequence read presented to an aligner; typically a subsequence of an entire read (a subread), i.e. the basecalls from a single pass over the insert DNA molecule
Telomere
Region of repetitive DNA sequences and proteins at the end of a chromosome that acts like a protective cap, preventing the chromosome from fraying, fusing with other chromosomes, or being mistakenly recognized as damaged DNA
Binary alignment and map = BAM file
Binary format, not human-readable.
More efficient: smaller size, faster computation, reduced storage costs.
Almost all tools expect BAM input.
Must be sorted (by read name or genomic coordinate) before use; indexing (BAI file) requires coordinate sorting.
Sequence alignment and map = SAM files
Contains alignment info of mapped (against the reference genome) and unmapped sequences.
Text-based, human-readable.
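Because SAM is plain text, a single alignment line can be split by hand. The sketch below is a minimal illustration and not a replacement for a real parser: it relies only on the 11 mandatory tab-separated SAM columns and on FLAG bit 0x4 marking an unmapped read. The record class, the function name, and the two example lines are hypothetical.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """A few of the 11 mandatory SAM columns (illustrative subset)."""
    qname: str   # read (query) name
    flag: int    # bitwise flag
    rname: str   # reference sequence name, '*' if unmapped
    pos: int     # 1-based leftmost mapping position, 0 if unmapped
    mapq: int    # mapping quality
    cigar: str   # CIGAR string, '*' if unavailable
    seq: str     # read sequence

    @property
    def is_unmapped(self) -> bool:
        # Bit 0x4 of FLAG means "segment unmapped".
        return bool(self.flag & 0x4)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one SAM alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 tab-separated fields")
    return SamRecord(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        seq=fields[9],
    )


# Hypothetical example lines: one mapped read and one unmapped read.
mapped = parse_sam_line("read1\t0\tchr1\t100\t60\t6M\t*\t0\t0\tTTAGGG\tIIIIII")
unmapped = parse_sam_line("read2\t4\t*\t0\t0\t*\t*\t0\t0\tCCCTAA\tIIIIII")

print(mapped.rname, mapped.pos, mapped.is_unmapped)
print(unmapped.rname, unmapped.is_unmapped)
```

A BAM file stores the same records in compressed binary form, so it cannot be read this way without a dedicated library or tool.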
BAM Index = BAI
Companion file, much smaller.
Acts like a “table of contents” for BAM.
Must be regenerated after sorting.
Required by most analysis software.
Contig
= a set of overlapping DNA segments that together represent a consensus region of DNA
How are you going to investigate the methylation status?
IGV: Integrative Genomics Viewer
What is the average telomere length in humans?
Highly variable
Primarily depends on age and type of cell tissue
Blood cells:
At birth (newborns): 5-15 kb
Young adults (24): 12 kb
Older adults (72): 7.2 kb
Shorten with (chronological) age
Shorter telomeres are associated with advanced age and an increased risk of age-related diseases
Cell-type variation
Rate of attrition
around 23 bp per year in cross-sectional studies
about 38 bp per year in longitudinal studies
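The attrition rates can be turned into a quick back-of-the-envelope calculation. The sketch below assumes a constant, linear loss per year (a simplification) and reuses only numbers from the card above: 23 and 38 bp per year, and the blood-cell values of 12 kb at age 24 and 7.2 kb at age 72. All variable names are illustrative.

```python
# Attrition rates from the card above, in bp per year.
rates_bp_per_year = {"cross-sectional": 23, "longitudinal": 38}

# Age span between the two adult data points listed for blood cells.
age_young, age_old = 24, 72
years = age_old - age_young

# Expected loss over that span if attrition were constant (simplifying assumption).
loss_bp = {study: rate * years for study, rate in rates_bp_per_year.items()}

# Average slope implied by the two listed lengths (12 kb and 7.2 kb).
length_young_bp, length_old_bp = 12_000, 7_200
implied_rate = (length_young_bp - length_old_bp) / years

for study, loss in loss_bp.items():
    print(f"{study}: {loss} bp lost over {years} years")
print(f"slope implied by the listed lengths: {implied_rate:.0f} bp per year")
```

The slope implied by the two listed lengths is steeper than either reported attrition rate, which suggests the figures come from different studies or cohorts and should not be combined uncritically.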
G-rich overhang
= single-stranded, G-rich 3' end of the telomere (TTAGGG repeats) left after DNA replication (typically 20-500 nucleotides) → necessary substrate for telomerase and the binding site for the shelterin protein POT1 (protection of telomeres 1) → POT1 binding helps prevent the overhang from activating the DNA damage checkpoint
T-loop (telomere loop)
= large, protective lariat structure formed when the G-rich overhang folds back and invades the double-stranded region, effectively sealing the end of the chromosome to prevent it from being mistaken for a DNA break (inaccessible to nucleases) (= end protection)
TRF2 (telomeric repeat-binding factor 2), part of the shelterin complex, promotes and stabilizes the formation of this loop.
D-Loop (Displacement-Loop)
A three-stranded DNA junction formed where the G-rich overhang invades the double helix, displacing the original C-rich strand and stabilizing the crucial T-loop structure.
Subtelomere
Segments of DNA in transition zone between the highly repetitive, protective telomere and the chromosome’s unique, gene-rich sequences (euchromatin)
Directly adjacent to telomere repeats
Mosaic patchwork of DNA sequences: multiple blocks of highly homologous (similar) DNA repeats and large, evolutionarily recent segmental duplications
Highly variable in size and sequence between different chromosome ends and even between the two copies (alleles) of the same chromosome in an individual => most structurally unstable and dynamic regions of the entire human genome
Genome stability (buffer zone: TPE), gene regulation, and evolutionary adaptation (hotspot for recombination)
Can you explain the principle of mapping? Why do you need subtelomeres?
Mapping = determining the relative locations of DNA sequences along a chromosome → Where are shorter telomeres, for example?
Subtelomeres = contain unique, chromosome-specific blocks of DNA that are distinct for each arm
Long-range haplotypes = high level of variation between the two alleles of subtelomeres
Why are you still determining the sequence of the telomere if you know it is the same repeat and you can just count the number of repeats?
Determining telomeric variants, which are deviations from the canonical human telomere sequence → leads to problems (instability, potential cell death)
Disease risk determination: maybe genetically predetermined short or long telomeres (telomeropathies, cancers…)
Biomarkers: biological age, predisposition to age-related diseases
Link telomere length to methylation of the adjacent subtelomere?
Determine the length of the telomere (a highly variable trait) and link it to the specific genetic variations (the haplotype) found in the adjacent subtelomere of the maternal or paternal chromosome. This is crucial for understanding the cis-acting elements (DNA sequences on the same chromosome) that regulate telomere length.
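Simply counting repeats would hide the telomeric variants mentioned above. The following is a minimal sketch of how a read could be scanned for non-canonical hexamers. It assumes the sequence is already in G-strand orientation, starts in frame with the repeat, and contains no insertions or deletions; real reads need a more tolerant approach. The function names and the toy sequence are hypothetical.

```python
from collections import Counter

CANONICAL = "TTAGGG"  # canonical human telomere repeat


def hexamer_profile(seq: str, unit: str = CANONICAL) -> Counter:
    """Count non-overlapping, in-frame k-mers (k = len(unit)) along the sequence."""
    seq = seq.upper()
    k = len(unit)
    return Counter(seq[i:i + k] for i in range(0, len(seq) - k + 1, k))


def variant_repeats(seq: str, unit: str = CANONICAL) -> dict:
    """Return every in-frame k-mer that differs from the canonical repeat."""
    profile = hexamer_profile(seq, unit)
    return {kmer: n for kmer, n in profile.items() if kmer != unit}


# Toy sequence: five canonical repeats with two variant repeats mixed in.
toy = "TTAGGG" * 3 + "TGAGGG" + "TTAGGG" * 2 + "TTGGGG"

profile = hexamer_profile(toy)
print("canonical repeats:", profile[CANONICAL])
print("variant repeats:", variant_repeats(toy))
```

A repeat count alone would report seven units here; the profile shows that two of them deviate from TTAGGG.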
Terminal Restriction Fragment Analysis
DNA isolation
DNA digestion (enzymes that cut outside of telomeric repeats)
Southern blot
Hybridization of telomeric probe
Telomere length analysis
Telomere maintenance mechanisms (TMM)
Cancers: enabling replicative immortality
Mainly: activating the reverse transcriptase telomerase (adds repeats); stem cells also use this
Cellular recombination machinery: alternative lengthening of telomeres (ALT)
Why is/was it challenging to sequence/map/assemble entire human telomeres by Sanger sequencing/NGS?
repetitive
0.015% of total human genome
length: from short to several tens of kb
92 chromosome ends in diploid human cell
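The numbers above can be checked against each other with a little arithmetic. This sketch assumes a haploid genome size of roughly 3.1 Gb, which is an approximation not given in the notes; the 0.015% fraction and the 92 chromosome ends come from the card.

```python
# Assumption (not from the notes): haploid human genome of about 3.1 Gb.
HAPLOID_GENOME_BP = 3.1e9

# From the card: telomeres make up about 0.015% of the genome.
TELOMERE_FRACTION = 0.015 / 100

# A diploid human cell has 46 chromosomes, each with two ends.
CHROMOSOMES_PER_DIPLOID_CELL = 46
chromosome_ends = CHROMOSOMES_PER_DIPLOID_CELL * 2

diploid_genome_bp = 2 * HAPLOID_GENOME_BP
total_telomeric_bp = diploid_genome_bp * TELOMERE_FRACTION
mean_bp_per_end = total_telomeric_bp / chromosome_ends

print(f"chromosome ends: {chromosome_ends}")
print(f"total telomeric DNA: about {total_telomeric_bp / 1e3:.0f} kb")
print(f"mean per chromosome end: about {mean_bp_per_end / 1e3:.1f} kb")
```

Under these assumptions the mean comes out at roughly 10 kb per chromosome end: a tiny, highly repetitive target spread over 92 locations, which is what makes it hard for short reads to place.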
Where do the donor-derived fibroblasts that you have used come from? What is the influence of Alzheimer’s?
Collected at University of California, San Diego (UCSD)
Part of Salk AHA-Allen aging cohort.
Alzheimer’s Disease Research Center participants at UCSD
Multiple studies and meta-analyses confirm shorter telomeres in AD patients, particularly shorter leukocyte telomere length (LTL) in white blood cells. → That is not the question we are studying at the moment, but we have to take it into account.
Why also interested in chromosome-specific telomere length?
suggestions of chromosome-arm specific factors that influence telomere length
some telomere arms are significantly longer: telomere 18q is the longest in 9 individuals
some telomere arms are significantly shorter
Why do we want to visualize chromosome- and allele-specifically?
For decades, telomere research relied on measuring Average Telomere Length (ATL) across all 92 chromosome ends in a person's cells.
Limitation: This is like measuring the average shoe size in a city; the average is uninformative if the city's problems are caused by one person wearing critically small shoes. Similarly, cell senescence (aging) is not triggered by the average length, but by the shortest telomere (the "first critical telomere") becoming dysfunctional.
Need for Specificity: To understand aging and disease, we need to know: Which chromosome end is the shortest, and why does it shorten faster than the others?
The Solution: Allele-specific mapping connects the physical telomere length to the unique genetic fingerprint of the immediately adjacent subtelomere.
Telomere length: the highly variable, measured length of the terminal repeats on a single chromosome end (e.g., 5 kb on the paternal copy of chromosome 12p).
Subtelomere haplotype (the regulator): the unique combination of structural variants (segmental duplications, non-coding transcripts like TERRA) in the adjacent subtelomere of the same physical chromosome (the allele). It is the determining factor that carries the regulatory signals.
Cis-acting element: any DNA sequence (e.g., a specific gene variant or non-coding RNA promoter) located on the same DNA molecule (in cis) that regulates the telomere length of that molecule. It is the molecular switch that is physically linked to, and governs, the length of its own telomere.
Cis-acting elements regulate telomere length: the key insight from this mapping is the identification of cis-acting elements in the subtelomere that influence how long a telomere is maintained, independently of the overall cellular environment.
TERRA regulation: the subtelomere contains promoters (start sites) for TERRA (Telomeric Repeat-containing RNA), a long non-coding RNA reported to be involved in recruiting telomerase. A specific subtelomere haplotype (e.g., one with a particular variation in the promoter region) might cause higher or lower TERRA production.
Effect: a lower-producing TERRA haplotype could result in less telomerase being recruited to that specific chromosome end, causing that telomere to shorten more quickly than its partner allele.
Chromatin state: subtelomeric variations influence the spreading of heterochromatin (compacted, silent DNA) from the subtelomere into the telomere. This chromatin state regulates telomerase access.
Effect: a haplotype that promotes a more open chromatin state might increase telomerase access, leading to a longer telomere on that specific allele.
Importance for diagnostics and research: allele-specific mapping moves telomere biology from a general measure of aging to a precise genetic tool.
1. Pinpointing risk: it allows researchers to pinpoint the specific chromosome end (e.g., the maternal copy of 17p) that is genetically predisposed to being the shortest, i.e. the one most likely to trigger disease (such as bone marrow failure).
2. Therapeutic targets: by identifying the exact cis-acting sequence in the subtelomere responsible for a telomere's unique length, scientists could develop targeted therapies (e.g., gene therapy to restore TERRA production) for specific chromosome ends, rather than broadly targeting the entire cell's telomerase.
In short, mapping length to haplotype lets us read the genetic instructions embedded in each subtelomere that determine the length and stability of its adjacent telomere cap.
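The difference between the average telomere length and the shortest telomere can be made concrete with a few numbers. The values below are invented purely for illustration and are not measurements; the chromosome-arm and parental labels are hypothetical keys.

```python
from statistics import mean

# Invented telomere lengths in bp, keyed by (chromosome arm, parental allele).
lengths_bp = {
    ("12p", "paternal"): 5_000,
    ("12p", "maternal"): 9_000,
    ("17p", "maternal"): 1_500,
    ("17p", "paternal"): 8_500,
    ("18q", "maternal"): 12_000,
    ("18q", "paternal"): 11_000,
}

# What a bulk measurement reports: one average over all ends.
average_bp = mean(lengths_bp.values())

# What allele-specific mapping adds: which end is the shortest.
shortest_end, shortest_bp = min(lengths_bp.items(), key=lambda item: item[1])

print(f"average telomere length: {average_bp:.0f} bp")
print(f"shortest telomere: {shortest_end[0]} ({shortest_end[1]}), {shortest_bp} bp")
```

In this toy data the average looks unremarkable, while one allele is already critically short; only the per-allele view reveals it.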

Telomere Position Effect (TPE)
phenomenon where the structure of the telomere influences the transcriptional activity (gene expression) of genes located in the adjacent subtelomeric region of the chromosome. This influence typically results in the silencing or reduced expression of those nearby genes.
Telomere Position Effect-Over Long Distances (TPE-OLD)
extension of TPE, describing the observation that telomeres can influence the expression of genes located much further away along the chromosome, potentially impacting genes that are megabases away from the telomere itself.
Basecalling
Computational process: converting raw, analog or digital signal data produced by a sequencing instrument (like changes in current, light intensity, or voltage) into the corresponding sequence of nucleotide bases (A, C, G, T) that make up a DNA or RNA strand.
Bonito basecalling model
Open-source, deep-learning framework
Oxford Nanopore Technologies (ONT)
Convert the raw electrical signals from their sequencers into A, C, G, T nucleotide sequences.
Architecture: Convolutional Neural Network (CNN) layers process the noisy current signal, a Recurrent Neural Network (RNN) core interprets the time-series context, and a Connectionist Temporal Classification (CTC) decoder outputs the base sequence.
Serves primarily as a platform for research and development of new high-accuracy basecalling algorithms.
Why? analog electrical current fluctuation → noisy and continuous + electrical current influenced by 4 to 6 bases simultaneously sitting within the pore (k-mer)
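The final CTC step can be illustrated in isolation. In greedy CTC decoding, the most likely label is taken for every signal frame, consecutive identical labels are merged, and blank symbols are dropped. The sketch below shows only this collapse rule on a made-up label string; it is not Bonito's code, and the blank symbol and function name are assumptions chosen for the example.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol for this illustration


def ctc_greedy_collapse(frame_labels: str, blank: str = BLANK) -> str:
    """Collapse per-frame labels: merge consecutive repeats, then drop blanks."""
    merged = (label for label, _ in groupby(frame_labels))
    return "".join(label for label in merged if label != blank)


# Made-up per-frame labels: a base usually spans several frames, and a blank
# between two identical labels keeps them as two separate bases.
frames = "TT-TAA-GG-G-GG"
print(ctc_greedy_collapse(frames))
```

The blank is what allows homopolymers and repeated bases such as the GGG in TTAGGG to survive the merging of repeated frames.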
Dorado
current, high-performance, and officially supported production basecaller software developed by Oxford Nanopore Technologies (ONT) for processing the raw electrical signals from their sequencers.
Integrated bioinformatics features such as:
Modified base calling: detecting epigenetic markers like 5mC and 6mA directly from the signal.
Duplex basecalling: combining signals from both strands of a DNA molecule for the highest possible accuracy.
Read trimming and alignment: post-processing steps like removing adapters and aligning reads using tools like minimap2.
What is the difference between reference mapping and de novo assembly?