Bioinformatics Techniques for Protein Identification and Analysis

Overview of Bioinformatics in Protein Identification

  • Aim: Identify an unknown protein (LAB1) involved in cancer survival.
  • Techniques: Sequence-specific DNA affinity chromatography and Edman degradation peptide sequencing.
  • Outcome: Obtained the amino acid sequence of a potential DNA-binding protein (Query1).

Query Sequence Details

  • Amino Acid Sequence (Query1):

TQIPLSQPIQIAQDLQQLQQLQQQNLNLQQFVLVHPTTNLQPAQFIISQTPQ…

  • Sequence length: 50+ amino acids.
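As a quick check on the length statement, the visible part of Query1 can be measured in Python. This is a minimal sketch: it uses only the fragment printed above, because the full sequence is truncated at the "…". The variable and function names are illustrative. The visible fragment is 52 residues long, and glutamine (Q) is its most frequent residue.

```python
from collections import Counter

# Only the visible part of Query1; the handout truncates the rest at "…".
QUERY1_FRAGMENT = "TQIPLSQPIQIAQDLQQLQQLQQQNLNLQQFVLVHPTTNLQPAQFIISQTPQ"


def describe(seq: str) -> tuple[int, list[tuple[str, int]]]:
    """Return the sequence length and the three most frequent residues."""
    counts = Counter(seq.upper())
    return len(seq), counts.most_common(3)


length, top_residues = describe(QUERY1_FRAGMENT)
print(f"Visible fragment: {length} aa; most frequent residues: {top_residues}")
```

The full Query1 is longer, so 52 is only a lower bound on its length.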

Part 1: Performing a BLAST Search

  • BLAST (Basic Local Alignment Search Tool): A bioinformatics tool for comparing an input sequence against a database to identify homologous sequences.
  • Steps for BLASTP search:
    1. Open the NCBI BLAST web page in a web browser.
    2. Select 'Human' as the organism.
    3. Choose the BLASTP option (for protein sequences).
    4. Paste Query1 into the provided input box.
    5. Click on the BLAST button to start the search.
    6. After the search completes, review the results in the Graphic Summary and Descriptions tabs.
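The same submission can in principle be made programmatically through NCBI's BLAST URL API. The sketch below only builds the request URL and does not send it. It assumes the Blast.cgi endpoint with the CMD=Put, PROGRAM, DATABASE, QUERY and ENTREZ_QUERY parameters. The "nr" database and the "txid9606[Organism]" filter are assumptions that approximate the web form's "Human" selection, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of NCBI's BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blastp_put_url(query: str, database: str = "nr",
                         entrez_query: str = "txid9606[Organism]") -> str:
    """Build (but do not send) a CMD=Put request for a BLASTP search.

    database and entrez_query are assumptions: 'nr' plus a human taxid
    filter approximates choosing 'Human' on the web form.
    """
    params = {
        "CMD": "Put",            # submit a new search
        "PROGRAM": "blastp",     # protein query against a protein database
        "DATABASE": database,
        "QUERY": query,
        "ENTREZ_QUERY": entrez_query,
    }
    return f"{BLAST_URL}?{urlencode(params)}"


url = build_blastp_put_url("TQIPLSQPIQIAQDLQQLQQ")
print(url)
```

A Put request returns a request ID (RID), which is then polled with CMD=Get to retrieve results. That polling step is left out here to keep the sketch offline.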

Analyzing Results

Key Metrics Explained:
  • Query Cover: The percentage of the query sequence length that is covered by the aligned segments.
  • E-value: The number of alignments with at least this score expected to occur by chance in a database of this size; lower values indicate more significant matches.
  • Bit Score: A normalized alignment score that is independent of the scoring system and database size; higher scores indicate better alignments.
  • Accession Number: Unique identifier for each entry in the database.
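Two of these metrics can be reproduced by hand. The sketch below makes a simplifying assumption: query cover is taken to be the union of the query positions covered by the alignment segments (HSPs), with overlaps counted once. It also uses the standard relation between bit score S' and E-value, E = m · n · 2^(−S'), where m and n are the query and database lengths. BLAST itself uses effective lengths, so its reported numbers can differ slightly. The function names are hypothetical.

```python
def query_cover(query_len: int, hsps: list[tuple[int, int]]) -> float:
    """Percent of query positions covered by at least one HSP.

    Each HSP is a (q_start, q_end) pair of 1-based, inclusive query
    coordinates; overlapping HSPs are merged so no position counts twice.
    """
    covered = 0
    last_end = 0  # highest query position counted so far
    for start, end in sorted(hsps):
        start = max(start, last_end + 1)  # skip already-counted positions
        if end >= start:
            covered += end - start + 1
            last_end = end
    return 100.0 * covered / query_len


def evalue_from_bits(bit_score: float, m: int, n: int) -> float:
    """Expected chance hits E = m * n * 2**(-S'), where S' is the bit score
    and m, n are the (effective) query and database lengths in residues."""
    return m * n * 2.0 ** (-bit_score)


print(query_cover(100, [(1, 40), (31, 70)]))   # 70.0: positions 1-70
print(evalue_from_bits(10, 32, 32))            # 1.0
```

Because of the 2^(−S') term, each additional bit of score halves the E-value. This is why bit score and E-value always rank hits from the same search in the same order.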
Questions to Answer:
  1. Identify the protein matched by the search.
  2. Determine whether the Query1 sequence aligns with a known protein, and examine any differences (mismatches or gaps) in the alignment.
  3. Collect essential information about the identified protein, including:
    • Name of the protein
    • Function
    • Interacting partners
    • Transcript variants
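For question 2, differences can be listed directly from the two rows of a pairwise alignment (query and subject) as shown in the BLAST alignment view. This is a minimal sketch under stated assumptions: the rows are copied without spaces, "-" marks a gap, and identity is computed as identical columns divided by alignment length. The alignment rows in the example are made up.

```python
def compare_alignment(query_row: str, subject_row: str, q_start: int = 1):
    """Compare two equal-length rows of a pairwise alignment ('-' = gap).

    Returns (percent_identity, differences). Identity is identical columns
    divided by alignment length (gaps included). Each difference is
    (query_position, query_char, subject_char); query_position is None
    when the query has a gap in that column.
    """
    if not query_row or len(query_row) != len(subject_row):
        raise ValueError("rows must be non-empty and of equal length")
    identical = 0
    differences = []
    q_pos = q_start - 1
    for q, s in zip(query_row, subject_row):
        pos = None
        if q != "-":
            q_pos += 1          # advance only on real query residues
            pos = q_pos
        if q == s:
            identical += 1
        else:
            differences.append((pos, q, s))
    return 100.0 * identical / len(query_row), differences


identity, diffs = compare_alignment("TQIPLSQ", "TQIPVSQ")  # made-up rows
print(round(identity, 1), diffs)
```

Passing the alignment's query start coordinate as q_start keeps the reported positions in Query1 numbering.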

Part 2: DNA Sequence Manipulation

  • Retrieve Full-Length Nucleotide Sequence:
    • Search the Nucleotide database for the mRNA record that encodes the protein, and select the correct entry from the results.
    • Convert the mRNA sequence to FASTA format for downstream bioinformatics work.
    • Confirm the start codon (ATG, read as AUG in the mRNA) and identify the PCR primers needed to amplify the full-length coding sequence.
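These three steps can be sketched in a few lines of plain Python. The sketch formats a sequence as FASTA, finds the first candidate ATG start codon, and derives a naive primer pair. The primer pair is the first bases of the coding sequence and the reverse complement of its last bases. This is a simplification: real primer design also considers melting temperature, GC content and secondary structure. The first ATG in an mRNA is also not always the annotated start, so check it against the CDS feature of the record. The toy sequence and all function names are hypothetical.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def to_fasta(header: str, seq: str, width: int = 70) -> str:
    """Format a sequence as FASTA: '>' header line, then fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{header}\n" + "\n".join(lines) + "\n"


def first_atg(dna: str) -> int:
    """0-based index of the first ATG, or -1 if there is none.

    Only a candidate start: an upstream ATG in the 5' UTR may not be the
    real one, so compare it with the record's CDS annotation.
    """
    return dna.upper().find("ATG")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def naive_primers(cds: str, length: int = 20) -> tuple[str, str]:
    """Forward primer = first `length` nt of the CDS (starting at ATG);
    reverse primer = reverse complement of the last `length` nt.
    Ignores melting temperature, GC clamp and secondary structure."""
    return cds[:length], reverse_complement(cds[-length:])


mrna = "GGCACCATGAAACCCGGGTTTTAA"     # hypothetical toy sequence
start = first_atg(mrna)                # 6
fwd, rev = naive_primers(mrna[start:], length=6)
print(to_fasta("toy_cds", mrna[start:], width=10))
print(fwd, rev)                        # ATGAAA TTAAAA
```

For a real transcript, the default primer length of 20 nt is a common starting point rather than a rule.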

Part 3: Domain Identification

  • Isolate DNA Binding Domain:
    • Use PROSITE to identify conserved domains in Oct-1.
  • Highlight the two identified domains (the POU-specific domain and the POU homeodomain) in the amino acid sequence, and record their corresponding DNA sequences in the results.
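Domain boundaries from PROSITE are reported as residue positions, so the matching DNA comes from converting them to coding-sequence coordinates. Residue k is encoded by nucleotides 3k−2 to 3k, counting the A of the ATG as position 1. This sketch assumes that numbering. If you work in mRNA coordinates instead, add the CDS start offset given in the GenBank record. The domain positions in the example are made up, not Oct-1's actual boundaries.

```python
def aa_to_cds_range(aa_start: int, aa_end: int) -> tuple[int, int]:
    """Map a 1-based, inclusive residue range to CDS nucleotide positions.

    CDS position 1 is the A of the ATG start codon, so residue k is encoded
    by nucleotides 3k-2 .. 3k.
    """
    if not 1 <= aa_start <= aa_end:
        raise ValueError("expected 1 <= aa_start <= aa_end")
    return 3 * aa_start - 2, 3 * aa_end


def domain_dna(cds: str, aa_start: int, aa_end: int) -> str:
    """Slice the coding DNA for a residue range (1-based, inclusive)."""
    nt_start, nt_end = aa_to_cds_range(aa_start, aa_end)
    return cds[nt_start - 1:nt_end]  # to 0-based, end-exclusive slicing


print(aa_to_cds_range(10, 20))                 # (28, 60)
print(domain_dna("ATGAAACCCGGG", 2, 3))        # AAACCC
```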
Structural Information of POU Domain
  • Use PyMOL to visualize the POU domain structure.
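    1. Enter "fetch 1POU" at the PyMOL command line to download the structure (PDB ID 1POU) from the Protein Data Bank.
    2. Use the display options (for example, the cartoon representation) to examine structural features such as the helix-turn-helix motif.
    3. Save screenshots of the structural analysis for documentation.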
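To make the screenshots reproducible, the steps above can be saved as a PyMOL command script (.pml). The sketch below is plain Python that writes the script. It assumes PyMOL's standard fetch, hide, show, dss, color, orient and png commands. The colour scheme, image size and file names are illustrative choices and are not part of the exercise.

```python
from pathlib import Path

# One PyMOL command per line (.pml syntax). Colours and image size are
# illustrative choices, not requirements of the exercise.
PML_COMMANDS = [
    "fetch 1pou",            # download PDB entry 1POU (needs internet access)
    "hide everything",
    "show cartoon",          # cartoon view makes helices easy to pick out
    "dss",                   # assign secondary structure
    "color grey, all",
    "color red, ss h",       # helices
    "color yellow, ss s",    # strands, if present
    "orient",
    "png 1pou_cartoon.png, width=1200, height=900, dpi=300, ray=1",
]


def build_pml() -> str:
    """Return the commands as the text of a .pml script."""
    return "\n".join(PML_COMMANDS) + "\n"


def write_pml(path: str) -> Path:
    """Write the script so the same view can be regenerated later."""
    out = Path(path)
    out.write_text(build_pml())
    return out


print(build_pml())
```

For example, write_pml("1pou_view.pml") followed by running "pymol 1pou_view.pml" should redraw the same view and save the image. Colouring helices separately from the rest of the chain makes the helix-turn-helix arrangement easier to point out in the screenshots.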
Conclusion
  • The sequence analysis identifies LAB1 (Query1) as the Oct-1 transcription factor (encoded by POU2F1), a POU-domain DNA-binding protein that regulates gene expression, consistent with its proposed role in cancer cell survival.