Lecture 6: Gene annotation, gene expression and function


52 Terms

1
New cards

What is gene annotation

The process of identifying and describing regions of biological interest within a genome – BOTH functionally and structurally

2
New cards

What are three main steps in gene annotation

  1. Identifying noncoding regions

  2. Identifying coding regions (=gene prediction)

  3. Attaching biological information to these elements

3
New cards

What are the main approaches for identifying genes within the genome

Intrinsic methods: based on DNA sequence alone

  • Open reading frame (ORF)

  • Gene codon bias

  • Splicing sites

Extrinsic methods: comparing to known data

  • Gene homology and related genomes

  • Comparison to RNA expression

4
New cards

Describe the Open reading frame (ORF)

  • ORF is a segment of DNA starting with a start codon (usually ATG) and ending with a stop codon (TAA, TAG, TGA)

    • 6 reading frames because DNA is double stranded and each strand can be read in 3 different ways depending on which base position of the codon you start at.

    • search in all 6 reading frames 5’→3’

  • Identifying ORFs helps find the parts of DNA that code for a protein → reveals the functional part of the DNA

  • To find genes, we look for long stretches (=segments) without stop codons:

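    • In random DNA, 3 of the 64 possible codons are stop codons, so a stop codon turns up about every 21 codons (~64 bases) in any reading frame. A stop-free stretch much longer than that likely means a gene.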

  • More difficult for eukaryotes: genes are often split into exons separated by introns, and the exons can be short and scattered, so a single long ORF is rarely seen.
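The six-frame ORF search described in this card can be sketched in a few lines of Python. This is a minimal illustration, not a real gene finder: the function names and the `min_codons` cutoff (counted here including the start and stop codon) are assumptions made for the example, and introns are ignored entirely.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, so the other strand is also read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, str]]:
    """Return (strand, frame, orf_sequence) for each ATG...stop ORF.

    min_codons is a hypothetical length cutoff, counted including the
    start and the stop codon.
    """
    orfs = []
    forward = seq.upper()
    for strand, dna in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop codon
            for i in range(frame, len(dna) - 2, 3):
                codon = dna[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, dna[start:i + 3]))
                    start = None
    return orfs
```

For a toy sequence such as `"CCATGAAATTTGGGTAACC"` with `min_codons=3`, the only ORF found is on the forward strand in the third reading frame. For real genomes the cutoff would be far higher, and eukaryotic genes need intron-aware methods.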

5
New cards

Describe how exons and introns are identified using gene codon bias

  • Codon bias = an organism's preference for certain codons for a given amino acid (differs between organisms)

  • Codon bias is seen only in exons → helps identify exons

  • ORF scanning is effective for prokaryotes but less so for eukaryotes → codon bias helps identify protein-coding segments in the DNA
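One way to picture codon bias as a gene-finding signal: learn codon frequencies from known coding sequence of the organism, then score candidate windows by how often they use the common codons. The sketch below is deliberately simplified and every name in it is hypothetical. Real measures normalise per amino acid, which this example does not.

```python
from collections import Counter
from typing import Dict, List


def split_codons(seq: str) -> List[str]:
    """Split an in-frame sequence into complete codons (a trailing partial codon is dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def codon_frequencies(known_cds: str) -> Dict[str, float]:
    """Relative frequency of each codon in known coding sequence."""
    codons = split_codons(known_cds)
    if not codons:
        return {}
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}


def bias_score(window: str, reference: Dict[str, float]) -> float:
    """Mean reference frequency of the codons in a window.

    Simplified assumption: a higher score means the window uses the
    organism's common codons and therefore looks more exon-like.
    """
    codons = split_codons(window)
    if not codons:
        return 0.0
    return sum(reference.get(c, 0.0) for c in codons) / len(codons)
```

A window scored in the correct reading frame of an exon should score higher than the same region read in another frame or an intron, which is what makes the signal useful.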

6
New cards

Describe how exons and introns are identified using the splicing sites

  • Exon-intron boundaries are not located just anywhere – they are marked by specific sequences.

  • By comparing many exon-intron boundaries, consensus sequences for the boundaries have been identified

  • Consensus sequence = a typical sequence that is often found at the same position in many genes. Useful markers to find the edges of genes, to locate where exons are in the genome.

  • Example: consensus sequence for vertebrates

    • Py = T or C

    • N = any nucleotide

7
New cards

What are the specific elements that help to identify genes

  • Start/stop codon

  • Poly-A signals and terminators (important for ending transcription)

  • Promoter:

    • CpG islands

    • Binding sites for regulatory proteins

8
New cards

What are CpG islands

  • A region of about 1 kbp that is rich in CG dinucleotides (the C in CG can be methylated)

  • Found upstream of many genes

  • In humans, 70% of proximal promoters contain a CpG island

  • Not all gene promoters contain a CpG island, but if a CpG island is present, a gene most often starts just downstream of it
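A CpG island can be recognised numerically by its GC content and by how many CG dinucleotides it has compared with what base composition alone would predict. The sketch below uses thresholds that are not from these notes: at least 200 bp, GC content above 50% and an observed/expected CpG ratio above 0.6, a commonly quoted rule of thumb. Function names are hypothetical.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC content, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")
    expected = c * g / n  # CpG count expected from base composition alone
    ratio = observed / expected if expected else 0.0
    return (c + g) / n, ratio


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the assumed rule-of-thumb thresholds to one candidate window."""
    gc, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and ratio > min_ratio
```

In practice this test would be applied in a sliding window upstream of candidate genes.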

9
New cards

How do you identify binding sites through ChIP-seq

  • ChIP-seq = Chromatin Immunoprecipitation sequencing

  • This method shows where regulatory proteins bind to DNA to control gene expression, which is especially useful when there is no clear sequence pattern.

Steps:

  1. Crosslink proteins to the DNA in chromatin (so they stay bound together)

  2. Break DNA into fragments (sonication)

  3. Use antibodies to pull out only the DNA fragments bound to the protein of interest.

  4. Break the crosslinks, discard proteins.

  5. Sequence the DNA → You now know where that protein was bound!

10
New cards

How do you locate genes for noncoding RNA

  • Not all genes encode proteins. Some genes produce RNA molecules that function directly as RNAs — they are not translated into proteins, but still play essential roles in the cell.

  • Noncoding RNA includes:

    • tRNA

    • rRNA

    • Other short and long RNA molecules participating in e.g:

      • Alternate splicing

      • Posttranscriptional gene regulation

      • Chromatin remodelling

      • Protein interactions

  • Noncoding RNA is typically conserved by structure rather than by sequence

  • There are programs that test stem length, loop size, stability etc.

11
New cards

Describe homology search (extrinsic method)

  • Homology = shared ancestry. If two sequences are homologous, they likely came from the same gene in an ancestor and might have similar functions

  • BLAST: compare sequence to known sequences in a database. It tells you:

    • is this gene similar to anything known

    • does it exist in other organisms

12
New cards

Why are protein comparisons better than DNA in homology search

The genetic code is redundant: multiple DNA codons can code for the same amino acid. So, two DNA sequences can look different but still produce the same protein. That’s why comparing protein sequences often gives more meaningful results for function and annotation.
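The redundancy argument can be checked directly: two DNA sequences that differ at most positions can still translate to the same protein. The sketch below builds the standard genetic code and compares identity at the DNA and protein level. The two example sequences are made up for illustration.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence (stop codons become '*')."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical example: same protein, different codon choices
dna_a = "ATGCTGAGCCGT"
dna_b = "ATGTTATCAAGA"
```

Here `translate(dna_a)` and `translate(dna_b)` both give `MLSR`, yet the DNA sequences match at fewer than half of their positions. A protein-level search would therefore detect a relationship that a DNA-level search could miss.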

13
New cards

Describe why using related genomes can help with gene annotation (extrinsic method)

If you're not sure whether a DNA sequence is a real gene:

  • Check if the same or similar gene exists in a closely related species.

  • If yes, it adds confidence that this gene is real and functional, not just a random ORF (open reading frame).

Why it's important:

  • Prevents false positives in gene annotation: sometimes short DNA sequences look like genes but aren't real (called "spurious ORFs").

  • Seeing conservation across species (especially functionally important genes) helps validate annotations.

14
New cards

How can transcriptome comparison help with gene annotation (extrinsic method)

  • Compare all RNA (all transcripts) from different cells to see which genes are active in different cell types.

  • If a region of the genome is being transcribed into RNA, it's likely a gene (or regulatory RNA).

  • Mapping the RNA back onto the genome helps confirm gene locations.

How it's used:

  • In genome annotation pipelines, these data help validate predictions made from ORF finding and homology.

You get a functional confirmation that the gene is actually expressed.

15
New cards

Why should annotation exclude pseudogenes

  • Pseudogenes = broken versions of real genes

  • They may look like genes but can’t make functional proteins.

16
New cards

How can annotation tools detect and filter out pseudogenes (not labelling them as real, functional genes)

Common problems in pseudogenes:

  1. Missing promoter: no way to start transcription.

  2. Missing start codon: can’t begin translation.

  3. Frameshifts: small insertion/deletion that throws off the reading frame.

  4. Early stop codons: translation ends too early = nonfunctional protein.

  5. Missing introns (typical of processed pseudogenes, which were copied back into the genome from mRNA)

  6. Partial deletion
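Some of the sequence-level problems in the list above can be tested automatically on a predicted coding sequence. The sketch below covers only three of them (missing start codon, frameshift, early stop codon). Promoters, introns and partial deletions need other evidence. The function name and flag texts are hypothetical.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def pseudogene_flags(cds: str) -> List[str]:
    """Return warning flags for a predicted coding sequence (start to stop)."""
    cds = cds.upper()
    flags = []
    if not cds.startswith("ATG"):
        flags.append("missing start codon")
    if len(cds) % 3 != 0:
        # A length that is not a multiple of three hints at an insertion/deletion
        flags.append("length not a multiple of 3 (possible frameshift)")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # Any stop codon before the final codon truncates the protein
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        flags.append("premature stop codon")
    return flags
```

An empty list does not prove that a gene is functional. It only means none of these simple checks fired.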

17
New cards

How do we know if we have done a good annotation

  • Use BUSCO (= Benchmarking Universal Single-Copy Orthologs)

  • BUSCO on the assembly → checks whether the DNA sequence contains all the important genes = measures the quality of the genome.

  • BUSCO on the annotation → checks whether the annotated genes are correct and complete = measures the quality of the annotation.

  • For example: all mammals should have the same core set of metabolic genes

18
New cards

What is functional annotation

  • = determining what the protein encoded by a gene does (done once the genes have been annotated)

  • Homology can give important clues to protein function

  • Often includes:

    • Domain/motif searches

    • Orthology searches

    • Homology searches

19
New cards

What is the domain/motif search in functional annotation

  • Search in specific parts of a protein → indicates what the protein can do

  • DNA-binding motif: a small, recurring functional recognition pattern, e.g. the TATA box

  • Catalytic domains: a larger, independent functional unit, e.g. a DNA-binding domain

  • Databases and tools like Pfam, InterPro, and SignalP help identify these features.

20
New cards

What is the orthology search in functional annotation

These look for equivalent genes (orthologs) in other organisms. Since orthologs often maintain the same function through evolution, this can offer strong functional clues.

21
New cards

What is the homology search in functional annotation

 Tools like BLAST compare sequences against large databases to find similar proteins. Functional predictions can then be made based on known roles of matching proteins.

22
New cards

What can the information from a gene sequence help determine about protein function

  • Functions based on homology:

  • Secondary structure determination: Determines whether regions of a protein are likely to form alpha helices, beta sheets, or coils/loops. These structures influence how the protein folds and interacts.

  • Transmembrane domain prediction: Identifies hydrophobic regions that may embed into cell membranes. These regions are typical in membrane-bound receptors or transporters. Helps determine if the protein is cytosolic or membrane bound.

  • Signal peptides: Short sequences at the start of proteins that direct them to specific locations in the cell (e.g. mitochondria, endoplasmic reticulum).

  • 3D structure modeling: Gives a complete shape of the protein, which is critical for understanding how it interacts with other molecules.
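Transmembrane domain prediction from the list above is often introduced with a sliding-window hydropathy plot. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6. The window and cutoff are conventional choices assumed for this example, not values from these notes, and modern predictors are considerably more sophisticated.

```python
from typing import List

# Kyte-Doolittle hydropathy values (positive = hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every window along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def has_membrane_candidate(protein: str, window: int = 19,
                           threshold: float = 1.6) -> bool:
    """True if any window is hydrophobic enough to suggest a membrane-spanning segment."""
    return any(score > threshold for score in hydropathy_profile(protein, window))
```

A protein with no window above the cutoff is more likely cytosolic. One or more peaks suggest a membrane-bound protein and indicate roughly where the membrane-spanning segments lie.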

23
New cards

What is forward genetics

This approach starts with an observable trait (phenotype), such as a visible defect or disease. Researchers then try to find which gene is responsible. This method is useful when the phenotype is known, but the genetic basis is not.

What gene causes the phenotype?

24
New cards

What is reverse genetics

In this approach, scientists start with a specific gene of interest and then alter it (e.g. by knocking it out, mutating it, or overexpressing it) to see what effect that has on the organism. The goal is to deduce the gene’s function by observing what happens when it is changed.

Reverse genetics is often studied in vivo using:

  • mutagenesis of the protein

  • gene knock-out

  • down regulation

  • overexpression

What is the gene’s function?

25
New cards

How do you detect a transcript through Northern blot

  • A method to detect RNA in a sample

  • Same technique as Southern blot but for RNA

How it works:

  1. RNA is separated by gel electrophoresis.

  2. Transferred from the gel to a membrane.

  3. A labeled probe is added and hybridizes to a specific RNA sequence on the membrane.

  4. The probe shows where (and if) the RNA is present.

Uses:

  • Determine where/when a gene is expressed.

  • Measure transcript length.

  • Detect splice variants.

26
New cards

What is Northern blot used for

  • Determine where/when a gene is expressed.

  • Measure transcript length.

  • Detect splice variants.

27
New cards

When do you analyze a transcript

  • You don’t have a complete gene sequence (or don’t trust it)

  • You need to know the upstream/downstream UTRs (untranslated regions)

  • You may need to verify the sequence of your favorite gene before trying to investigate it.

28
New cards

Why do you need to detect start/end of a transcript

Helps identify regulatory regions and transcription boundaries

29
New cards

What methods are used to detect start/end of a transcript

  • S1 nuclease mapping

  • Primer extension

  • RACE-PCR

30
New cards

What is S1 nuclease mapping

What it does:

  • Detects the 5' and 3' ends of RNA.

  • Also identifies exons and potential splice variants (= different combinations of exons from one gene).

How it works:

  1. RNA hybridizes with DNA → forms a DNA-mRNA heteroduplex (hybrid).

  2. Introns in the DNA form loops that do not hybridize (base-pair) with the RNA.

  3. S1 nuclease cuts unpaired single-stranded DNA (loops).

  4. RNA is degraded by alkali.

  5. The ssDNA fragments that were protected by RNA remain → can be analyzed by sequencing or electrophoresis.

31
New cards

What is RACE-PCR

What it does:

  • Amplifies the start or end of an RNA sequence.

  • Helps find exact 5' or 3' ends of mRNAs.

Steps:

  1. Reverse transcriptase + primer → reverse transcription of RNA to cDNA.

  2. RNA is denatured.

  3. A poly-A tail is added to the 3′ end of the cDNA with terminal transferase.

  4. Second primer binds to A-tail.

  5. Second strand synthesis with Taq-polymerase

  6. PCR amplifies the cDNA sequence.

  7. The amplified cDNA fragments are sequenced to determine start or end of the RNA.

32
New cards

What does identification of regulation of gene expression show

Finding where regulatory proteins (like transcription factors) bind to DNA.

33
New cards

What methods are used to identify regulation of gene expression

  • (ChIP-seq)

  • Gel retardation (Electrophoretic mobility shift assay, EMSA)

  • Footprinting with DNase I

  • Modification interference assay

34
New cards

What is the principle of gel retardation (EMSA)

  • DNA bound to a protein moves slower in a gel than DNA alone.

  • Shifted band = DNA-protein complex.

Shifted = the band ends up higher in the gel

  • Used to confirm if a protein binds a specific DNA fragment.

  • Shows whether the protein binds the DNA and how strongly.

35
New cards

What is DNase footprinting

Principle:

  • Identifies exact binding sites of DNA-binding proteins.

  • Detects protected regions where proteins block DNase I from cutting DNA.

Steps:

  1. DNA is labeled at one end (end label) and mixed with a regulatory protein that binds it.

  2. DNase I is added → it cuts DNA except where the protein is bound.

  3. Fragments are separated by gel → the “gap” or “footprint” on the gel shows where the protein was bound.

  • End-label: marks one end of DNA so fragment sizes can be seen clearly.

  • Protein binds DNA → protects a region from DNase cutting.

  • DNase I cuts only unprotected (protein-free) DNA.

  • Footprint = gap in band pattern on gel → shows where protein was bound.

  • Only DNA fragments run in the gel, not the proteins.
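The logic of reading a footprint can be shown with an idealised simulation: if DNase I could cut after every unprotected position, the end-labelled fragments would form a complete ladder of lengths, and the lengths missing from the protein-bound lane mark the binding site. The sketch below is purely illustrative. The function names are hypothetical and real digestion is partial and uneven.

```python
from typing import Iterable, List, Set


def labelled_fragment_lengths(dna_length: int,
                              protected: Iterable[int] = ()) -> Set[int]:
    """Lengths of end-labelled fragments when every unprotected position is cut.

    A cut after position p (counted from the labelled end) yields a labelled
    fragment of length p. Positions listed in `protected` are never cut.
    """
    blocked = set(protected)
    return {pos for pos in range(1, dna_length) if pos not in blocked}


def footprint(free_lane: Set[int], bound_lane: Set[int]) -> List[int]:
    """Fragment lengths present without protein but missing with protein."""
    return sorted(free_lane - bound_lane)
```

With a 20 bp fragment and a protein covering positions 8 to 12, the bound lane lacks exactly the bands of length 8 to 12, which is the gap seen on the gel.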

36
New cards

What is modification interference assay

Determines which DNA bases are essential for protein binding

Steps:

  1. DNA is chemically modified (e.g., G bases are methylated).

  2. Protein is added. If the modification prevents binding, that base is important.

  3. Compare modified fragments that did or did not bind the protein via gel electrophoresis.

37
New cards

How do you find out what the found sequence does (identifying the regulation of gene expression)

Deletion analysis

38
New cards

What is deletion analysis

Tests what happens when control elements that we have found (enhancers/silencers of gene expression) are deleted.

The result should be:

  • Delete enhancer → gene expression reduces

  • Delete silencer → gene expression increases

39
New cards

What is a reporter gene

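A gene with an easily detected product that is placed under the same regulatory sequences as the original gene, so that its expression mimics the expression pattern of the original gene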

40
New cards

How can reporter genes facilitate deletion analysis

A reporter is used to track gene expression easily, because it gives a clear and measurable signal, unlike many ordinary genes.

41
New cards

How do you analyze proteins and their functions

Change your protein of interest - study the effect:

  • Introduce a point mutation - In vitro or site-directed mutagenesis

  • Change larger parts of your protein – swaps and truncations

42
New cards

Name two ways to introduce mutations in proteins

  • Artificial gene synthesis

  • Mutations by PCR

43
New cards

How does artificial gene synthesis work

  • Create overlapping short DNA fragments = oligonucleotides (you design them yourself).

  • Assemble them into a full gene with DNA polymerase and ligase.
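Designing the overlapping oligonucleotides can be sketched as tiling the gene with fixed-length fragments that alternate between the two strands, so that neighbouring oligos can anneal through their overlaps before polymerase and ligase fill in and join them. The oligo length and overlap below are arbitrary assumptions, and a real design would also balance melting temperatures.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_oligos(gene: str, oligo_len: int = 40, overlap: int = 20) -> List[str]:
    """Tile a gene with overlapping oligos, alternating top and bottom strand.

    Every oligo is written 5'->3'. Odd-numbered oligos are taken from the
    bottom strand so that they can base-pair with their neighbours.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    gene = gene.upper()
    step = oligo_len - overlap
    oligos = []
    for n, start in enumerate(range(0, len(gene) - overlap, step)):
        fragment = gene[start:start + oligo_len]
        oligos.append(fragment if n % 2 == 0 else reverse_complement(fragment))
    return oligos
```

For the toy gene `"AAACCCGGGTTT"` with `oligo_len=6` and `overlap=2` this produces three oligos. The second is the reverse complement of the gene's positions 5 to 10, so it pairs with the end of the first oligo and the start of the third.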

44
New cards

How do you mutate by PCR

  • Use primers that contain mutations (forward and reverse).

  • Run PCR to amplify DNA with the mutation.

  • Used to precisely alter DNA and test effects.
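A mutagenic primer pair is a short stretch of the template with the desired base change in the middle, plus its reverse complement for the other strand. The sketch below builds such a pair for a single substitution. The flank length is an arbitrary assumption and the function names are hypothetical. Real primer design also considers melting temperature and GC content.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_primers(template: str, position: int, new_base: str,
                      flank: int = 10) -> Tuple[str, str]:
    """Return (forward, reverse) primers carrying a point mutation.

    position is the 0-based index of the base to change. Both primers are
    written 5'->3' and cover `flank` bases on each side of the mutation
    (fewer near the ends of the template).
    """
    template = template.upper()
    if not 0 <= position < len(template):
        raise ValueError("position lies outside the template")
    mutated = template[:position] + new_base.upper() + template[position + 1:]
    start = max(0, position - flank)
    end = min(len(template), position + flank + 1)
    forward = mutated[start:end]
    return forward, reverse_complement(forward)
```

Because both primers carry the change, every PCR product made from them contains the mutation.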

45
New cards

How do you analyze a protein through swaps and truncations

Strategies:

  • Promoter swap: test how a different promoter affects expression.

  • Motif swap: exchange a domain or motif between genes.

  • Truncations: cut out parts of the gene/protein to study the function of specific regions (if they are important for the function of the gene/protein).

46
New cards

Name ways to change gene expression in vivo (in the organism)

  • Gene editing with well-established tools:

    • TALENs, Zinc Finger Nucleases, CRISPR/Cas9
      → Allow precise changes in DNA.

  • Gene knockout/downregulation: turn off or decrease the expression of a gene

    • Homologous recombination (deletion cassettes)

    • CRISPR interference

    • RNA interference, or recombination

  • Gene overexpression: make more of a gene’s product, from the same or different species.

47
New cards

A deletion cassette is a way to knock out a gene by homologous recombination. How does it work

Example with mice:

  1. Insert modified DNA with selectable markers.

  2. Replace the gene via homologous recombination (the original gene is replaced by the modified DNA).

  3. Select the cells in which the replacement succeeded → insert them into embryos → grow chimeric mice (mice with both normal and knockout cells).

  4. Crossbreed to get homozygous knockout mice (mice in which both copies of the gene are knocked out).

48
New cards

RNA interference is a way to knock down a gene.

How does it work

  • A post-transcriptional method to silence genes using double-stranded RNA (dsRNA).

  • The dsRNA matches a gene’s mRNA and triggers its degradation (the dsRNA is perceived as a threat, e.g. viral RNA, so the matching mRNA is broken down as a defense), reducing protein production.

  • The genes are temporarily silenced, not permanently removed (unlike a gene knockout)

49
New cards

What is the mechanism of RNA interference

  1. Dicer enzyme chops long dsRNA into small interfering RNAs (siRNA).

  2. siRNA joins a complex called RISC (RNA-induced silencing complex).

  3. RISC uses one siRNA strand to find and bind mRNA with a matching sequence.

  4. Bound mRNA is cleaved and degraded, stopping translation.

    → This is a natural defense against viruses, which often have dsRNA.
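Step 3, where RISC uses one siRNA strand to find a matching mRNA, comes down to base-pair complementarity: the guide strand pairs with the mRNA region that is its reverse complement. The sketch below finds such sites by exact matching. That is a simplification (real targeting tolerates some mismatches), and the names and example sequence are made up.

```python
from typing import List

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.upper().translate(RNA_COMPLEMENT)[::-1]


def target_sites(mrna: str, guide: str) -> List[int]:
    """0-based start positions in the mRNA that pair perfectly with the guide strand."""
    mrna = mrna.upper()
    site = rna_reverse_complement(guide)
    return [
        i for i in range(len(mrna) - len(site) + 1)
        if mrna[i:i + len(site)] == site
    ]


# Hypothetical mRNA and a 21-nt guide strand designed against positions 5-25
mrna = "AUGGCUACGUAGCUAGGCAUCGAUUAA"
guide = rna_reverse_complement(mrna[5:26])
```

`target_sites(mrna, guide)` returns the position the guide was designed against. An mRNA without a complementary site returns an empty list and would be left intact.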

50
New cards

How do you introduce dsRNA for RNAi

  1. Antisense RNA: Make RNA that is reverse-complementary to your target → binds to target RNA and creates dsRNA.

  2. Hairpin RNA: RNA is engineered to fold back on itself, creating internal dsRNA.

51
New cards

CRISPR interference is a way to knock down a gene. How does it work

  • Interference = the gene is still present, but its expression (protein production) is blocked or significantly reduced.

  • It uses dCas9, a modified form of Cas9 that cannot cut DNA due to inactivating mutations.

  • A guide RNA (sgRNA) directs dCas9 to a specific DNA sequence.
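Choosing where an sgRNA can direct dCas9 can be sketched as a pattern search. The example assumes the commonly used Streptococcus pyogenes (d)Cas9, which needs a 20-nt target immediately followed by an NGG motif (the PAM). That requirement is not stated in these notes. Only the given strand is scanned, and the function name is hypothetical.

```python
import re
from typing import List, Tuple


def find_target_sites(dna: str) -> List[Tuple[int, str]]:
    """Return (start, 20-nt target) for every site followed by an NGG PAM.

    The lookahead lets overlapping candidate sites be reported. Only the
    strand passed in is searched.
    """
    dna = dna.upper()
    pattern = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna)]
```

For CRISPRi, candidate sites near the promoter or early in the gene body would then be picked from this list, matching the two blocking modes of dCas9.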

52
New cards

What is the mechanism of CRISPR interference

Blocking gene expression:  

1. Initiation block:

  • dCas9 binds near the promoter region.

  • This prevents RNA polymerase from binding, so transcription cannot start.

2. Elongation block:

  • dCas9 binds within the gene body (= the coding part of the gene).

  • RNA polymerase may bind, but is blocked during transcription, so transcription stops partway.

Key points:

  • CRISPRi blocks transcription, while RNA interference (RNAi) acts after transcription: it degrades the mRNA so that it cannot be translated.

  • The gene itself is not cut or removed, only silenced.