Bioinformatics: Illumina Sequencing and RNA-seq

Objective of Research: Develop a small-molecule drug that promotes neuroregeneration after neuronal injury by enhancing the inherent repair mechanisms of nerve cells.

Hypothesis: Certain genes that become activated in response to nerve damage may play a crucial role in recovery and neuron growth. Prolonging the activation of these genes is expected to improve long-term neuroregeneration, potentially leading to better recovery outcomes for individuals with nerve injuries.

Method for Gene Identification:

Utilize RNA sequencing (RNA-seq) to quantify and analyze gene expression changes that occur as a result of neuronal injury.
Focus on the identification and characterization of genes that exhibit upregulation following injury events, which may serve as therapeutic targets for drug development.
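As a rough illustration of how upregulated genes could be flagged, the sketch below computes a log2 fold change between pre- and post-injury expression values. The gene names, expression values, pseudocount, and 2-fold threshold are all hypothetical, and a real analysis would use biological replicates and a statistical test rather than a single ratio.

```python
import math

# Hypothetical normalized expression values per gene (not real data).
pre_injury = {"GeneA": 50.0, "GeneB": 10.0, "GeneC": 200.0}
post_injury = {"GeneA": 55.0, "GeneB": 80.0, "GeneC": 40.0}


def log2_fold_change(before, after, pseudocount=1.0):
    """Return log2((after + pseudocount) / (before + pseudocount)) per shared gene.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes that are not expressed in one condition.
    """
    return {
        gene: math.log2((after[gene] + pseudocount) / (before[gene] + pseudocount))
        for gene in before
        if gene in after
    }


def upregulated(before, after, min_log2fc=1.0):
    """Return genes whose log2 fold change is at least min_log2fc (2-fold by default)."""
    changes = log2_fold_change(before, after)
    return sorted(gene for gene, value in changes.items() if value >= min_log2fc)


print(upregulated(pre_injury, post_injury))  # ['GeneB']
```

Only GeneB clears the threshold here: GeneA barely changes and GeneC goes down.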

Back to Basics: DNA Structure

DNA Structure: DNA is structured as a double helix consisting of two anti-parallel strands that are intertwined. This unique formation is essential for its replication and functionality in biological processes.
Direction of transcription: RNA polymerase synthesizes RNA in the 5’ to 3’ direction while reading the DNA template strand 3’ to 5’. This directionality is vital for the accuracy of gene expression and the formation of messenger RNA (mRNA). Either DNA strand can serve as the template, depending on the gene, but RNA synthesis always proceeds 5’ to 3’.
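The strand relationship can be made concrete with a small sketch. It assumes every sequence is written 5' to 3' and uses a made-up 9-base sequence; the function names are illustrative only.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the opposite strand of a DNA sequence, also written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe(template_5to3):
    """Return the mRNA (5'->3') produced from a template strand given 5'->3'.

    The polymerase reads the template 3'->5', so the RNA is the reverse
    complement of the template, with U in place of T.
    """
    return reverse_complement(template_5to3).replace("T", "U")


top = "ATGGCCTAA"                 # hypothetical top strand, 5'->3'
bottom = reverse_complement(top)  # the anti-parallel bottom strand, 5'->3'

print(bottom)              # TTAGGCCAT
print(transcribe(bottom))  # AUGGCCUAA (matches the top strand, with U for T)
print(transcribe(top))     # UUAGGCCAU (a different RNA from the same locus)
```

The two calls to transcribe show why knowing the strand of origin matters: the same stretch of DNA gives two different RNA sequences depending on which strand is the template.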

Transcriptome Analysis

Pre-Injury Transcriptome:
- Sample A expresses a set of genes, including GeneA, GeneB, GeneC, GeneD, GeneE, and GeneF, which serves as a baseline for understanding normal neuronal function and expression patterns.
Post-Injury Transcriptome:
- Sample B indicates significant alterations in the gene expression profile immediately following the neuronal injury, highlighting the dynamic biological response of neurons to trauma.

RNA Sequencing Process

RNA Enrichment Methods:
- Enrich for mRNA either by selecting polyA+ RNA or by depleting ribosomal RNA (rRNA); both approaches reduce the rRNA background that would otherwise dominate the sequencing reads.
Library Preparation Steps:
- RNA Extraction: Lyse the samples and isolate total RNA.
- Library Preparation: Employ a commercial kit, such as the Lexogen Corall, to convert the isolated RNA into a sequencing library.
- Sequencing: Run the library on an Illumina instrument, such as the NovaSeq 6000, which offers the high throughput needed for comprehensive analysis.
Data Analysis:
- Utilize a structured RNA-seq pipeline for the analysis and interpretation of sequencing outputs, which allows for robust quantification of gene expression levels and comparison between samples.
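One common step when comparing samples is scaling raw read counts by sequencing depth, because a deeper-sequenced sample has more reads for every gene. The sketch below uses counts per million (CPM) as one simple normalization. The counts are hypothetical, and this is not a description of any specific pipeline.

```python
def counts_per_million(counts):
    """Scale raw per-gene read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical raw counts; Sample B was sequenced four times deeper than Sample A.
sample_a = {"GeneA": 500, "GeneB": 100, "GeneC": 400}
sample_b = {"GeneA": 1000, "GeneB": 1600, "GeneC": 1400}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)

for gene in sample_a:
    print(gene, cpm_a[gene], cpm_b[gene])
```

In the raw counts GeneA doubles from 500 to 1000, but after normalization its share of the library falls from 500,000 to 250,000 CPM, so the apparent increase was only an effect of sequencing depth.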

Understanding RNA Sequencing with Illumina Technology

Sequencing Process:
- Extract total RNA from the target cells and reverse-transcribe it into complementary DNA (cDNA), which is then converted into double-stranded DNA (dsDNA) suitable for sequencing.
Key Steps in Illumina Sequencing:
- Library preparation, followed by cluster formation on a flow cell, sequencing of the fragments, and identification of the sequences from the generated data.
Details on key components:
- Each DNA fragment receives adaptors so that it can bind to the flow cell and be primed for sequencing. During sequencing, fluorescently labelled nucleotides (dATP, dTTP, dCTP, dGTP) are incorporated one at a time, and laser excitation of the label identifies each base.

Questions Addressed during RNA-seq Library Preparation

Critical questions to investigate include:
- What are the expression levels of each gene under different experimental conditions (e.g., comparing Sample A versus Sample B)?
- Can we accurately identify the precise nucleotide sequences of transcribed RNA molecules?
- Which strand of DNA contributed to the RNA molecule, as this is crucial for mapping and data analysis?
- How can we ensure proper distinction between samples when loading RNA-seq libraries on a flow cell to accurately track the origins of reads and avoid mix-ups?

Paired-End Library Structure (Dual Indexes)

Basic Structure:
- Each sequencing run generates Read 1 and Read 2, plus a separate index read for each index; Reads 1 and 2 capture the two ends of each DNA fragment and are critical for reconstructing the original sequences.
- The library carries the P5 and P7 flow cell adaptor sequences, along with binding sites for the read and index sequencing primers.
- Employ dual indexing strategies to effectively track and differentiate reads from each sample, ensuring data integrity during analysis.
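A minimal sketch of how dual indexes let reads be assigned back to their samples is shown below. The index sequences, sample sheet layout, and exact-match rule are assumptions for illustration; production demultiplexing software typically works on full run output and may tolerate mismatches.

```python
from collections import defaultdict

# Hypothetical sample sheet: (index 1, index 2) pair -> sample name.
SAMPLE_SHEET = {
    ("ATCACG", "TTAGGC"): "SampleA",
    ("CGATGT", "TGACCA"): "SampleB",
}


def demultiplex(reads, sample_sheet):
    """Group read IDs by sample using an exact match on both index sequences.

    Reads whose index pair is not in the sample sheet go to "Undetermined".
    """
    bins = defaultdict(list)
    for read_id, index1, index2 in reads:
        sample = sample_sheet.get((index1, index2), "Undetermined")
        bins[sample].append(read_id)
    return dict(bins)


reads = [
    ("read1", "ATCACG", "TTAGGC"),
    ("read2", "CGATGT", "TGACCA"),
    ("read3", "ATCACG", "TGACCA"),  # index 1 of SampleA paired with index 2 of SampleB
]

print(demultiplex(reads, SAMPLE_SHEET))
```

The third read shows the benefit of two indexes: with a single index it would have been assigned to SampleA, whereas the mismatched pair is set aside as undetermined.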

Read Length Considerations in Sequencing

Typical Read Lengths:
- Short read runs generally yield sequences of around 100 base pairs (bp), depending on the application and desired resolution.
- In paired-end sequencing, the available sequencing cycles are split between Read 1 and Read 2, so each fragment is read from both ends. This samples longer fragments more fully than a single read from one end.
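To illustrate how a paired-end run samples a fragment, the sketch below derives Read 1 and Read 2 from a made-up 20-base fragment. It assumes the usual convention that Read 2 is reported as the reverse complement of the fragment's far end; the read length of 6 is chosen only to keep the example small.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def paired_end_reads(fragment, read_length):
    """Return (Read 1, Read 2) for a fragment written 5'->3'.

    Read 1 is the first read_length bases of the fragment. Read 2 starts at
    the opposite end on the other strand, so it is the reverse complement of
    the last read_length bases.
    """
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2


fragment = "ACGTTGCAAGGCTTAACCGG"  # hypothetical 20-base fragment
read1, read2 = paired_end_reads(fragment, 6)

print(read1)  # ACGTTG
print(read2)  # CCGGTT
print(len(fragment) - 2 * 6, "bases in the middle are not sequenced")
```

When the fragment is longer than the two reads combined, the middle is not sequenced directly, but both ends can still be placed during mapping.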

Specifics of Lexogen Corall Kit (Stranded Protocol)

Steps Involved:
- Key procedures include reverse transcription, linker oligo addition, and subsequent ligation of sequences.
- Sample barcodes (indexes) and flow cell adaptors are added to the fragments by polymerase chain reaction (PCR), producing a library that is ready for sequencing.
Determining Read Lengths:
- The flow cell kit and the sequencing settings determine the achievable read length; for instance, a 100-cycle kit supports shorter reads, while a 300-cycle kit supports longer ones.
- Targeting a yield of 20-30 million reads per sample is ideal, contingent on optimal loading conditions on the flow cell, to ensure effective coverage and reliability of results.
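The planning arithmetic behind these two points can be sketched as follows. The flow cell output of 400 million reads is a hypothetical figure, and the cycle split assumes an even division between Read 1 and Read 2 with index cycles not counted; actual values depend on the instrument and kit.

```python
def paired_end_read_length(total_cycles):
    """Return the read length per read when cycles are split evenly across two reads."""
    return total_cycles // 2


def samples_per_run(total_reads, reads_per_sample):
    """Return how many samples fit in a run at the given per-sample read depth."""
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    return total_reads // reads_per_sample


FLOW_CELL_READS = 400_000_000  # hypothetical output, for illustration only

print(paired_end_read_length(100))                   # 50
print(paired_end_read_length(300))                   # 150
print(samples_per_run(FLOW_CELL_READS, 20_000_000))  # 20
print(samples_per_run(FLOW_CELL_READS, 30_000_000))  # 13
```

Under these assumptions, moving from 20 to 30 million reads per sample reduces the number of samples that can share the run from 20 to 13.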