Genome annotation
The process of identifying and describing regions of biological interest within a genome, both functionally and structurally. 3 steps: 1) Identifying noncoding regions. 2) Identifying coding regions (= gene prediction). 3) Attaching biological information to these elements.
Transposable elements (TE)
One type of repeat sequence in the genome. A DNA sequence that can change its position within a genome, sometimes creating or reversing mutations and altering the cell's genetic identity and genome size. Make up as much as 90% of some genomes and are key contributors in eukaryotic genomes. The first annotation task to tackle. Specific programs identify them.
Methods for identifying genes
Intrinsic methods: ORF scanning, splice sites and codon bias. Extrinsic methods: gene homology, related genomes, comparison to RNA expression.
Open reading frame (ORF) scanning
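Finds the start codon (often ATG) and the stop codons (TAA, TAG, TGA). Need to search in all 6 possible reading frames (3 on each strand). It is ab initio gene prediction (from the beginning). In random sequence a stop codon appears on average about every 64 bp (3 of the 64 codons are stops, so roughly one per 21 codons); genes are longer.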
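A minimal sketch of six-frame ORF scanning, assuming ATG as the only start codon and a plain DNA string as input. The function names and the `min_codons` cutoff are illustrative choices, not part of any particular annotation tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Scan all 6 reading frames for ATG...stop stretches.

    Returns (strand, frame, start, end, orf) tuples. Coordinates index
    the strand that was scanned, and the stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATAGGG"))
```

In practice `min_codons` would be set well above 21, since shorter stop-free stretches are expected by chance.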
Finding ORF
Effective in prokaryotic genomes, more difficult in eukaryotes because introns interrupt the coding sequence. Exons can be shorter than 64 bp, so ORF length alone does not identify them. Use codon bias, since it is only present in exons. Can also use intron/exon boundaries.
Identification of intron/exon boundaries
Relies on consensus sequences. These are not distinct enough to make the search trivial. Used to find the ORF count.
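A small sketch of why consensus-based boundary searches are not trivial. It assumes the common GT...AG rule (introns usually begin with GT and end with AG), which the card does not spell out, and it uses a hypothetical `min_len` cutoff. Even a short sequence yields several candidates.

```python
def candidate_introns(seq, min_len=6):
    """List (start, end) spans that begin with GT and end with AG.

    Every GT is paired with every AG far enough downstream, so most
    candidates are false positives. That is the point of the example.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [j for j in range(len(seq) - 1) if seq[j:j + 2] == "AG"]
    return [(d, a + 2) for d in donors for a in acceptors if a + 2 - d >= min_len]


seq = "ATGGTAAGTCCAGGA"
for start, end in candidate_introns(seq):
    print(start, end, seq[start:end])
```

Real gene finders weigh longer, position-specific consensus patterns together with other evidence instead of matching two dinucleotides.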
Homology search
Search for similar sequences in other genomes, for example with the search tool BLAST against a sequence database. We can look at nucleotide identity or amino acid identity (sequences are usually more similar at the protein level). The closer the relative, the more similar the sequences and the easier the comparison.
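A toy illustration of why identity tends to be higher at the protein level: synonymous codon changes lower nucleotide identity but leave the amino acids unchanged. The sketch assumes two already-aligned sequences of equal length, and `CODON_TABLE` is a deliberately partial excerpt of the standard genetic code covering only the codons used here. The two example genes are made up.

```python
# Partial excerpt of the standard genetic code (only the codons used below).
CODON_TABLE = {
    "ATG": "M", "GCT": "A", "GCC": "A",
    "AAA": "K", "AAG": "K", "TTA": "L", "CTG": "L",
}


def translate(dna):
    """Translate a DNA string codon by codon using CODON_TABLE."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def percent_identity(a, b):
    """Percentage of identical positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


gene_a = "ATGGCTAAATTA"  # hypothetical gene
gene_b = "ATGGCCAAGCTG"  # hypothetical homolog with synonymous changes
print(round(percent_identity(gene_a, gene_b), 1))
print(percent_identity(translate(gene_a), translate(gene_b)))
```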
Related genomes
Can be compared using homology. Can help identify "sites" such as start/stop codons, promoter regions, poly(A) signals and terminators, sequences coding for signal peptides, conserved regions and motifs. Can show whether they are part of exons or not.
Binding site identification ChIP-seq
Chromatin immunoprecipitation sequencing. Identifies pieces of DNA that bind a specific protein. Proteins are cross-linked to the DNA, which is isolated and cut into small pieces. The protein-DNA complexes are precipitated with a protein-specific antibody, then the cross-links are reversed and the protein is digested, leaving the bound DNA to be sequenced.
Comparison to RNA expression
Using transcriptomics one can compare the genomic data to RNA/cDNA. Good for seeing which sequences are actual genes and which are pseudogenes.
Pseudogenes
Nonfunctional segments of DNA that resemble functional genes. Need to be excluded from the gene set. Common defects: missing promoter, missing start codon, frameshift, premature stop codon, missing introns, partial deletion.
BUSCO
Quality control after assembling. Stands for: benchmarking universal single-copy orthologs. Tool for assessment of genome assemblies.
Functional annotation
Assigning a gene a function. Works better with "single gene phenotypes".
Information that can be gathered from a gene sequence
Bioinformatics and homology by using different tools. 3D structure via crystallization. Secondary structure is easier to predict. Transmembrane regions are also easy to predict by looking at the hydrophobicity of the amino acids.
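One common way to look at hydrophobicity is a sliding-window hydropathy average. The sketch below assumes the Kyte-Doolittle scale with a 19-residue window and a 1.6 cutoff, a conventional choice that the card itself does not name. The test protein is artificial.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein, window=19):
    """Mean hydropathy of every window of the given length."""
    scores = [KD[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def likely_tm_starts(protein, window=19, threshold=1.6):
    """Window start positions whose mean hydropathy exceeds the threshold."""
    return [i for i, score in enumerate(hydropathy_windows(protein, window))
            if score > threshold]


# Artificial protein: charged flanks around a 19-residue leucine stretch.
protein = "D" * 19 + "L" * 19 + "K" * 19
print(likely_tm_starts(protein))
```

Windows that overlap the hydrophobic stretch only partly can still pass the cutoff, so the hits form a run of positions rather than a single one.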
Outline for genome annotation
1) Transposable elements 2) ORF identification 3) Homology search 4) Related genomes 5) Binding site identification
Outline for gene expression and function identification
1) Detection of a transcript 2) Methods for transcript analysis 3) Identifying the regulation of gene expression 4) Analyzing proteins and their function.
Detection of a transcript methods
Via northern blot or qPCR.
Northern blot
An RNA detection analysis. Electrophoresis with RNA, then transfer from gel to membrane to visualize it. Good for comparing expression of genes in different tissues. Length of RNA, alternative splicing and introns can be seen.
qPCR
Quantitative PCR. Expression can be compared with that of housekeeping genes. Cannot say anything about length or presence of introns. Alternative to Northern blot for RNA detection.
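Comparison against a housekeeping gene is often done with the 2^(-ΔΔCt) calculation. The sketch below assumes that method and roughly 100% amplification efficiency (the amount doubles each cycle). Neither is stated on the card, and the Ct values are invented for illustration.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene, normalized to a housekeeping gene.

    Ct is the cycle at which fluorescence crosses the threshold.
    A lower Ct means more starting template.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # normalize treated sample
    delta_control = ct_target_control - ct_ref_control   # normalize control sample
    return 2 ** -(delta_sample - delta_control)


# Invented Ct values: the target crosses threshold 2 cycles earlier in the sample.
print(relative_expression(22.0, 18.0, 24.0, 18.0))
```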
Methods for transcript analysis
S1 nuclease mapping, primer extension and RACE-PCR. Used when one does not trust the sequence or needs to know the upstream/downstream regions. Used to analyze introns/exons and the start and end of the sequence.
S1 nuclease mapping
RNA is mixed with DNA of the same gene. The DNA has introns, the RNA does not. They base-pair into a heteroduplex (RNA-DNA hybrid), with the intron regions looping out as single-stranded DNA. S1 nuclease cuts away the single-stranded DNA. The RNA is then degraded by alkali, leaving single-stranded DNA fragments to be analyzed.
RACE-PCR
Rapid amplification of cDNA ends. Needs one known primer that binds within the RNA. cDNA is synthesized from this primer using reverse transcriptase, then denatured from the RNA. A's are added to the 3' end of the cDNA with terminal transferase. A second primer anneals to this poly(A) tail on the cDNA. PCR is then carried out with Taq polymerase as usual.
Methods for identifying protein binding sites
1) EMSA (electrophoretic mobility shift assay), a type of gel retardation 2) Footprinting with DNase I 3) Modification interference assay.
Gel retardation (Electrophoretic mobility shift assay, EMSA)
Method for identifying protein binding sites. Mix the DNA fragment with the regulatory protein of interest. Separate on a gel based on size. The band shifts (migrates more slowly) if the protein is bound. Cannot tell where on the fragment the protein binds.
Footprinting with DNase I
Method for identifying protein binding sites; it identifies blocked DNA. All DNA fragments are end-labeled. DNase I cuts the DNA into pieces, but not at the position where the protein is bound. Followed by gel electrophoresis. Comparing the cleavage sizes with an unprotected control shows the protected sizes as a gap, the footprint. The blocked site is slightly larger than the protein binding site.
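The comparison step can be pictured as a set difference between two gel lanes. This is only a schematic sketch with idealized fragment sizes (one band per possible cut position). The function name and the data are hypothetical.

```python
def footprint(control_sizes, protein_sizes):
    """Return (smallest, largest) fragment size missing from the protein lane.

    Fragment sizes present without protein but absent with protein mark
    cut positions that the bound protein shielded from the nuclease.
    """
    missing = sorted(set(control_sizes) - set(protein_sizes))
    if not missing:
        return None  # no protection detected
    return missing[0], missing[-1]


control_lane = range(1, 21)                                  # every cut position seen
protein_lane = [n for n in range(1, 21) if not 8 <= n <= 12]  # sizes 8-12 missing
print(footprint(control_lane, protein_lane))
```

On a real gel the bands are read by eye or by densitometry, and protection is rarely this clean.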
Modification interference assay
Method for identifying the exact protein binding site. Uses end-labeled restriction fragments and adds dimethyl sulfate, which modifies a specific base on the fragment. Nuclear extract is added; the protein will not bind if the binding site is blocked by the modification. Gel electrophoresis is used to purify the unbound fragments, then the chemical piperidine cuts at the modification and the fragment size is determined by gel electrophoresis.
Piperidine
Chemical (an organic compound, not a protein) used to find the exact protein binding site in the modification interference assay. Piperidine cuts where the modification was made in the restriction fragment that the protein did not bind to. By gel electrophoresis we can determine exactly where the protein binding site is.
Methods for analyzing proteins and their functions
1) Deletion analysis 2) swaps and truncations 3) gene mutations.
Deletion analysis
The gene under study is replaced with a reporter gene. Parts of the regulatory region are then deleted and the effect on reporter expression is observed.
Reporter genes
Used to mimic the expression pattern of the original genes. Their expression is easy to detect. Used to see expression in different tissues for example. Often used by swapping the gene under study with the reporter gene.
Swaps and truncations
Used to study protein function and what regulates what. One can swap promoters, motifs, truncated regions etc. and analyze what happens. One can also delete part of the sequence, for example the enhancer or the promoter.
Gene mutations
Used to study protein function by adding, deleting or modifying nucleotides in a gene. Done via artificial gene synthesis, PCR-based mutagenesis, base pair editing, or longer primers that introduce changes into the gene.
Gene editing
Study of protein expression in vivo. Done with TAL effector nucleases (TALENs), zinc finger nucleases (ZFNs, zinc fingers) or clustered regularly interspaced short palindromic repeats (CRISPR/Cas9).
Gene knockout/downregulation
Can be used to study protein expression. Through homologous recombination and deletion cassettes, RNA interference or CRISPR interference. Way to change gene expression in vivo.
Ways to study and change gene expression in vivo
1) Gene editing: TALENs, zinc fingers, CRISPR/Cas9. 2) Gene knockdown/downregulation: homologous recombination (deletion cassettes), RNA interference, CRISPR interference. 3) Gene overexpression of a gene from the same or another species.
Knockout by homologous recombination
Using a deletion cassette. Need target gene in DNA, a homologous part of DNA with a positive (inside) and negative (outside the homologous region) selection marker. Due to homology the parts swap places. A method used to get information about that specific protein. It will cause no expression and no translation, the gene is deleted.
RNA interference
Post-transcriptional gene silencing. A sequence-specific suppression of gene expression by dsRNA. The dsRNA targets mRNA with the same sequence for breakdown. Not 100% effective, some mRNA will pass through. Good for studying genes that are lethal when knocked out. Can target several gene copies. 2 ways: antisense or RNAi.
Antisense
Type of RNA interference. Uses reverse complementary RNA. It makes double stranded RNA when reacting to target mRNA.
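"Reverse complementary" can be made concrete with a few lines. This is a minimal sketch with a made-up mRNA fragment. The function name is illustrative.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna):
    """Return the reverse complement of an mRNA, written 5' to 3'."""
    return mrna.upper().translate(RNA_COMPLEMENT)[::-1]


# Made-up mRNA fragment; the antisense strand can base-pair with it
# along its whole length to form double-stranded RNA.
print(antisense_rna("AUGGCC"))
```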
RNAi
One type of RNA interference. Introducing dsRNA via hairpin loops.
RNAi mechanism
dsRNA binds to the protein Dicer, which cleaves it into smaller fragments. One RNA strand is loaded into the RISC complex and guides the complex to the mRNA via base pairing. The mRNA is then cleaved and destroyed. No protein can be synthesized.
CRISPR interference
Many variations exist. The gene will still be there but very little will be expressed. A way to study protein function.