Lecture 6: Genome annotation, gene expression and function

Last updated 1:03 PM on 5/14/25

41 Terms

Genome annotation

The process of identifying and describing regions of biological interest within a genome, both structurally and functionally. Three steps: 1) identifying noncoding regions, 2) identifying coding regions (gene prediction), 3) attaching biological information to these elements.

Transposable elements (TE)

One type of repeat sequence in the genome. A TE is a DNA sequence that can change its position within a genome, sometimes creating or reversing mutations and altering the cell's genetic identity and genome size. TEs make up as much as 90% of some genomes and are key contributors to eukaryotic genome size. Identifying them is the first annotation task; dedicated programs exist for this.

Methods for identifying genes

Intrinsic methods: ORF scanning, splice sites and codon bias. Extrinsic methods: homology to known genes and related genomes, comparison to RNA expression data.

Open reading frame (ORF) scanning

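Finds the start codon (often ATG) and the stop codon (TAA, TAG or TGA) of a gene. All 6 possible reading frames must be searched (3 on each strand). It is ab initio gene prediction (from the beginning, using the sequence alone). In random sequence a stop codon is expected roughly every 64 bp (3 of the 64 codons are stops, so about one codon in 21); real genes are longer, so long ORFs stand out.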
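
A minimal sketch of a six-frame ORF scan, assuming a plain DNA string with only A, C, G and T. The function names, the `min_len` cutoff and the toy sequence are illustrative choices, not part of the lecture; real gene finders add many more signals.

```python
# Minimal six-frame ORF scan (illustrative sketch, not a full gene predictor).
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame):
    """Yield (start, end) of ATG...stop ORFs in one forward frame (0, 1 or 2)."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == START:
            start = i
        elif start is not None and codon in STOPS:
            yield start, i + 3  # end is exclusive and includes the stop codon
            start = None


def find_orfs(seq, min_len=9):
    """Return ORFs from all six frames as (strand, frame, start, end, sequence).

    Coordinates on the '-' strand refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame):
                if end - start >= min_len:
                    found.append((strand, frame, start, end, s[start:end]))
    return found


demo = "CCATGAAATTTGGGTAACC"  # hypothetical toy sequence
for orf in find_orfs(demo):
    print(orf)
```

In practice `min_len` would be set far above the roughly 64 bp expected between random stop codons, so that chance ORFs are filtered out.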

Finding ORF

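ORF scanning is effective in prokaryotic genomes but more difficult in eukaryotes, where genes are split into exons and introns. Individual exons can be shorter than the roughly 64 bp expected between random stop codons, so ORF length alone is not a reliable signal. Codon bias helps because it is only present in exons. Intron/exon boundaries can also be used.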
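
Codon bias means that synonymous codons are not used equally often. A minimal sketch of how codon usage could be tabulated for an in-frame sequence; the function name and the toy sequence are assumptions made for illustration.

```python
from collections import Counter


def codon_usage(cds):
    """Return the relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical toy sequence: four leucine codons, three of them CTG.
usage = codon_usage("CTGCTGCTGTTA")
print(usage)
```

A gene finder could compare such frequencies in a sliding window against the usage known for the organism's genes; windows that match are more likely to be exons.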

Identification of intron/exon boundaries

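Relies on consensus sequences at the splice sites (most introns begin with GT and end with AG). The consensus is not distinctive enough to make the search trivial. Used alongside ORF scanning to delimit exons.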
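
A minimal sketch of why the consensus alone is not enough: every GT followed later by an AG is a candidate intron, and such pairs occur by chance very often. The function name and the `min_len` cutoff are illustrative assumptions.

```python
import re


def candidate_introns(seq, min_len=20):
    """Return (start, end) spans that begin with GT and end with AG.

    `end` is exclusive. Only the minimal GT...AG rule is checked here.
    """
    seq = seq.upper()
    donors = [m.start() for m in re.finditer("(?=GT)", seq)]
    acceptors = [m.start() + 2 for m in re.finditer("(?=AG)", seq)]
    return [(d, a) for d in donors for a in acceptors if a - d >= min_len]


toy = "AAAGTCCCCCCAGTTT"  # hypothetical toy sequence
for start, end in candidate_introns(toy, min_len=6):
    print(start, end, toy[start:end])
```

On a real genome this rule returns far too many candidates, which is why it is combined with other evidence such as codon bias or transcript data.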

Homology search

Search for similar sequences in other genomes, for example by running the search tool BLAST against sequence databases. We can look at nucleotide identity or amino acid identity (sequences are usually more similar at the protein level). The closer the relative, the more similar the sequences and the easier the comparison.
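
A minimal sketch of percent identity between two sequences that are already aligned and of equal length. The toy sequences are hypothetical: both DNA strings encode Leu-Lys-Gly but use different synonymous codons, which illustrates why identity is often higher at the protein level.

```python
def percent_identity(a, b):
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


dna_1, dna_2 = "CTGAAAGGC", "TTAAAGGGT"  # hypothetical; both encode Leu-Lys-Gly
protein_1, protein_2 = "LKG", "LKG"

print(round(percent_identity(dna_1, dna_2), 1))  # nucleotide identity
print(percent_identity(protein_1, protein_2))    # amino acid identity
```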

Related genomes

Compared by homology. The comparison can help identify features such as start/stop codons, promoter regions, poly(A) signals and terminators, sequences coding for signal peptides, and conserved regions and motifs. It can also show whether a sequence is part of an exon.

Binding site identification ChIP-seq

Chromatin immunoprecipitation sequencing. Identifies the pieces of DNA that bind a specific protein. Proteins are cross-linked to the DNA, and the chromatin is isolated and cut into small pieces. The protein-DNA complexes are precipitated with a protein-specific antibody; the cross-links are then reversed and the protein is digested, and the released DNA is sequenced.

Comparison to RNA expression

Using transcriptomics, the genomic data can be compared with RNA/cDNA sequences. Useful for telling apart what are actually expressed genes and what are pseudogenes.

Pseudogenes

Nonfunctional segments of DNA that resemble functional genes. They need to be excluded from the gene set. Common defects: missing promoter, missing start codon, frameshift, premature stop codon, missing introns, partial deletion.
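
A minimal sketch of how some of these defects could be flagged from the sequence alone, assuming a candidate coding sequence with introns already removed. The function name and the wording of the messages are illustrative; defects such as a missing promoter cannot be seen this way.

```python
STOPS = {"TAA", "TAG", "TGA"}


def sequence_defects(cds):
    """List simple pseudogene-like defects in a candidate coding sequence."""
    cds = cds.upper()
    defects = []
    if not cds.startswith("ATG"):
        defects.append("missing start codon")
    if len(cds) % 3 != 0:
        defects.append("length not a multiple of 3 (possible frameshift)")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # A stop codon anywhere before the final codon is premature.
    if any(codon in STOPS for codon in codons[:-1]):
        defects.append("premature stop codon")
    return defects


print(sequence_defects("ATGAAATAA"))     # intact toy ORF
print(sequence_defects("ATGTAGAAATAA"))  # toy ORF with an early stop
```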

BUSCO

Quality control after assembly. Stands for Benchmarking Universal Single-Copy Orthologs. A tool for assessing the completeness of genome assemblies.

Functional annotation

Assigning a function to a gene. Works best for "single gene phenotypes".

Information that can be gathered from a gene sequence

Function can be inferred by homology using different bioinformatics tools. 3D structure is determined experimentally via crystallization. Secondary structure is easier to predict from the sequence. Transmembrane regions are also easy to predict by looking at the hydrophobicity of the amino acids.
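
A minimal sketch of a hydrophobicity scan using the Kyte-Doolittle hydropathy scale. The window of 19 residues and the threshold of 1.6 are commonly quoted settings for spotting transmembrane helices and are assumptions here, as are the function names and toy sequences.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every window of `window` residues along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(protein, window=19, threshold=1.6):
    """Start positions of windows hydrophobic enough to suggest a membrane span."""
    profile = hydropathy_profile(protein, window)
    return [i for i, score in enumerate(profile) if score >= threshold]


toy = "D" * 19 + "L" * 19  # hypothetical: charged stretch, then hydrophobic stretch
print(candidate_tm_windows(toy))
```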

Outline for genome annotation

1) Transposable elements 2) ORF identification 3) Homology search 4) Related genomes 5) Binding site identification

Outline for gene expression and function identification

1) Detection of a transcript 2) Methods for transcript analysis 3) Identifying the regulation of gene expression 4) Analyzing proteins and their function.

Detection of a transcript methods

Via northern blot or qPCR.

Northern blot

An RNA detection method. RNA is separated by electrophoresis, transferred from the gel to a membrane and visualized with a labeled probe. Good for comparing the expression of genes in different tissues. The length of the RNA, alternative splicing and retained introns can be seen as differences in band size.

qPCR

Quantitative PCR. Expression of a gene can be compared with the expression of housekeeping genes. This method cannot say anything about transcript length or the presence of introns. An alternative to Northern blot for RNA detection.
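
A minimal sketch of one common way to express qPCR results relative to a housekeeping gene, the 2^(-ddCt) calculation. It assumes near-100% amplification efficiency for both genes; the function name, parameter names and Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in a sample versus a control condition.

    Each target Ct is first normalized to the housekeeping (reference) gene,
    then the sample is compared with the control: fold = 2 ** -(ddCt).
    Assumes the amount of product doubles every cycle.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target crosses the threshold 2 cycles earlier
# in the sample, while the housekeeping gene is unchanged.
print(relative_expression(22.0, 18.0, 24.0, 18.0))
```

A lower Ct means more starting template, so a difference of 2 cycles corresponds to about four times more transcript.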

Methods for transcript analysis

S1 nuclease mapping, primer extension and RACE-PCR. Used when one does not trust the sequence or needs to know the upstream/downstream regions. They are used to analyze introns/exons and the start and end of the transcript.

S1 nuclease mapping

RNA is mixed with single-stranded DNA of the same gene. The DNA has introns, the RNA does not. The two strands base-pair where they match and form a heteroduplex (RNA-DNA hybrid), and the intron regions loop out. Treatment with S1 nuclease cuts away the single-stranded DNA. The RNA is then degraded with alkali, leaving the protected DNA fragment to be sized by gel electrophoresis.

RACE-PCR

Rapid amplification of cDNA ends. Needs one primer of known sequence that anneals within the RNA. cDNA is synthesized using reverse transcriptase, then denatured. A residues are added to the 3' end of the cDNA with terminal transferase. A second primer anneals to this poly(A) tail on the cDNA. PCR is then carried out with Taq polymerase as usual.

Methods for identifying protein binding sites

1) Gel retardation (electrophoretic mobility shift assay, EMSA) 2) Footprinting with DNase I 3) Modification interference assay.

Gel retardation (Electrophoretic mobility shift assay, EMSA)

Method for identifying protein binding sites. The DNA fragment is mixed with the regulatory protein of interest and run on a gel. A fragment with bound protein migrates more slowly, so its band is shifted (retarded). Shows that the protein binds the fragment, but cannot tell where.

Footprinting with DNase I

Method for identifying protein binding sites by finding the DNA that the protein blocks. All DNA fragments are end-labeled. DNase I cuts the DNA into pieces, but not at the positions where the protein is bound. This is followed by gel electrophoresis. Compared with a lane without protein, the fragment sizes that are missing form the footprint. The blocked site is slightly larger than the actual protein binding site.
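
A minimal sketch of how the footprint could be read off the two lanes, assuming each lane is reduced to the set of labeled fragment lengths seen on the gel and that the protected region is contiguous. The function name and the lane data are hypothetical.

```python
def footprint(control_lengths, protein_lengths):
    """Return (shortest, longest) fragment length missing from the protein lane.

    Fragment lengths are measured from the labeled end, so a missing length
    corresponds to a position where DNase I could not cut. Returns None if
    no fragment is missing.
    """
    missing = sorted(set(control_lengths) - set(protein_lengths))
    if not missing:
        return None
    return missing[0], missing[-1]


control_lane = list(range(1, 31))  # hypothetical: cuts at every position 1..30
protein_lane = [n for n in control_lane if not 12 <= n <= 18]  # cuts 12..18 blocked

print(footprint(control_lane, protein_lane))
```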

Modification interference assay

Method for identifying the exact protein binding site. End-labeled restriction fragments are treated with dimethyl sulfate, which modifies a specific base on each fragment. Nuclear extract is added; the protein will not bind if its binding site is blocked by the modification. Gel electrophoresis is used to purify the fragments that did not bind protein. The chemical piperidine then cuts these fragments at the modified base, and a second gel electrophoresis determines the fragment sizes and thereby the positions needed for binding.

Piperidine

A chemical reagent (not a protein) used to find the exact protein binding site in the modification interference assay. Piperidine cuts where the modification was made in the restriction fragments that the protein did not bind. Gel electrophoresis of the cut fragments then shows exactly where the protein binding site is.

Methods for analyzing proteins and their functions

1) Deletion analysis 2) swaps and truncations 3) gene mutations.

Deletion analysis

The gene under study is replaced with a reporter gene. Parts of the upstream regulatory region are then deleted and the effect on reporter expression is observed.

Reporter genes

Used to mimic the expression pattern of the original genes. Their expression is easy to detect. Used to see expression in different tissues for example. Often used by swapping the gene under study with the reporter gene.

Swaps and truncations

Used to study protein function and what regulates what. One can swap promoters or motifs, or truncate parts of the protein, and analyze what happens. One can also delete part of the sequence, for example the enhancer or the promoter.

Gene mutations

Used to study protein function. Nucleotides in a gene are added, deleted or modified via artificial gene synthesis, PCR mutagenesis (using longer primers that carry the desired change) or base pair editing.

Gene editing

Used to study protein expression in vivo. Tools: transcription activator-like effector nucleases (TALENs), zinc finger nucleases (ZFNs, zinc fingers), and clustered regularly interspaced short palindromic repeats (CRISPR/Cas9).

Gene knockout/downregulation

Can be used to study protein expression. Done through homologous recombination with deletion cassettes, RNA interference or CRISPR interference. A way to change gene expression in vivo.

Ways to study and change gene expression in vivo

1) Gene editing: TALENs, zinc fingers, CRISPR/Cas9. 2) Gene knockout/downregulation: homologous recombination (deletion cassettes), RNA interference, CRISPR interference. 3) Overexpression of a gene from the same or another species.

Knockout by homologous recombination

Uses a deletion cassette. Requires the target gene in the genomic DNA and a piece of DNA with homologous regions, carrying a positive selection marker inside and a negative selection marker outside the homologous region. Because of the homology, the cassette and the target gene swap places. The gene is deleted, so it is neither transcribed nor translated. A method used to get information about that specific protein.

RNA interference

Post-transcriptional gene silencing. A sequence-specific suppression of gene expression by dsRNA. The dsRNA targets mRNA with the same sequence for breakdown. Not 100% effective, some mRNA escapes degradation. Good for studying genes that are lethal when knocked out. Can target several gene copies. 2 ways: antisense or RNAi.

Antisense

Type of RNA interference. Uses RNA that is the reverse complement of the target mRNA. It forms double-stranded RNA by base pairing with the target mRNA.
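
A minimal sketch of deriving an antisense RNA as the reverse complement of an mRNA, assuming an RNA string with only A, C, G and U. The function name and the toy sequence are illustrative.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna):
    """Return the reverse complement of an mRNA, written 5' to 3'."""
    return mrna.upper().translate(RNA_COMPLEMENT)[::-1]


target = "AUGGCC"  # hypothetical fragment of a target mRNA
print(antisense_rna(target))
```

Taking the reverse complement twice returns the original sequence, which is a quick sanity check on the pairing.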

RNAi

One type of RNA interference. Introducing dsRNA via hairpin loops.

RNAi mechanism

dsRNA binds to the protein Dicer, which cleaves it into smaller fragments. One RNA strand is loaded into the RISC complex and links the complex to the mRNA strand via base pairing. The mRNA is then cleaved and destroyed, so no protein can be synthesized from it.

CRISPR interference

Many variations exist. The gene will still be there but very little of it will be expressed. A way to study protein function.