3.1 Choosing the Right Tech Stack for LLM Applications

Cloud Providers

  • Most users prefer major cloud providers such as Azure, Google Cloud, and AWS.
  • Azure AI Studio: Most mature enterprise-grade AI offering.
  • Google: Strong competitor to Azure with a complete tool offering.
  • Amazon Bedrock: Up-and-coming, building on SageMaker functionality; not yet generally available (GA) at the time of recording.

Open Source Stack

  • LangChain:
    • Built-in orchestration for calling different models.
    • Memory management.
    • Task orchestration.
  • Auto-GPT:
    • Built on top of LangChain.
    • Aims for agent-based functionality.
    • Mimics human-like reasoning and acting (based on the ReAct research paper).
    • Breaks down tasks into subtasks (e.g., researching generative AI in cybersecurity).
    • Early stages but promising.
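The agent pattern described above can be sketched in a few lines: a goal is decomposed into subtasks, each subtask is executed, and results are accumulated. This is a minimal illustration only; in a real Auto-GPT-style agent both the planner and the executor would be LLM calls, and all names below are made up for the sketch.

```python
# Toy Auto-GPT-style loop: plan -> execute each subtask -> collect results.
def plan(goal: str) -> list[str]:
    # In a real agent, an LLM would generate this plan from the goal.
    return [
        f"Research background on: {goal}",
        f"Summarize key findings for: {goal}",
        f"Draft a report on: {goal}",
    ]

def execute(subtask: str) -> str:
    # Stub standing in for the LLM/tool call that performs the subtask.
    return f"[done] {subtask}"

def run_agent(goal: str) -> list[str]:
    results = []
    for subtask in plan(goal):            # decompose the goal
        results.append(execute(subtask))  # act on each subtask
    return results

results = run_agent("generative AI in cybersecurity")
for line in results:
    print(line)
```

The value of the pattern is the decomposition step: the agent iterates subtask by subtask instead of attempting the whole goal in one prompt.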

Vector Databases

  • Pinecone:
    • Stores embeddings.
    • Also referred to as embedding database or vector embedding database.
  • Chroma:
    • Alternative to Pinecone.
  • LlamaIndex:
    • Not a vector database itself, but a data framework often used alongside one.
    • Connects to various data sources (PDF, Reddit, wikis).
    • Indexes data so you avoid building custom connectors.
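What Pinecone and Chroma do at their core can be shown with a toy in-memory store: keep (id, embedding) pairs and answer queries by cosine similarity. The embeddings below are invented for illustration; in practice they come from an embedding model.

```python
# Toy "vector database": upsert embeddings, query by cosine similarity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class VectorStore:
    def __init__(self):
        self.items = {}  # id -> embedding

    def upsert(self, item_id, embedding):
        self.items[item_id] = embedding

    def query(self, embedding, top_k=1):
        # Rank stored items by similarity to the query embedding.
        ranked = sorted(self.items.items(),
                        key=lambda kv: cosine(embedding, kv[1]),
                        reverse=True)
        return [item_id for item_id, _ in ranked[:top_k]]

store = VectorStore()
store.upsert("doc-phishing", [0.9, 0.1, 0.0])
store.upsert("doc-malware",  [0.1, 0.9, 0.2])
print(store.query([0.8, 0.2, 0.1], top_k=1))  # nearest stored document
```

Production systems replace the linear scan with approximate nearest-neighbor indexes, but the interface (upsert, query by similarity) is the same idea.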

Model Hubs

  • Hosts open-source large language models.
  • Hugging Face: Major platform for hosting LLMs.
  • GitHub: Another option.

Azure AI Studio

  • Microsoft emphasizes AI as a survival strategy, heavily investing in it.
  • Allows building and training custom models.
  • Users can browse foundation models and train using Azure's GPUs.
  • Azure OpenAI service:
    • Can be "grounded" to reduce hallucination.
    • Built-in vector index.
    • Text embedding API.
    • Image embedding API (different algorithm due to pixel-based nature).
  • Retrieval Augmented Generation (RAG):
    • Uses a vector database as context/memory.
    • Retrieves relevant information and passes it to the LLM to augment the response.
    • The LLM organizes the retrieved information and generates a response.
  • Facilitates prompt workflow management and collaboration.
  • Meets security compliance requirements with built-in safety features.
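The RAG flow above can be sketched end to end: retrieve context, "ground" the prompt with it, then generate. The knowledge snippets, retriever, and LLM call below are all stubs invented for the sketch, not the Azure OpenAI API.

```python
# Minimal RAG sketch: retrieve -> build grounded prompt -> generate.
KNOWLEDGE = {
    "vector index": "Azure OpenAI can index embeddings for retrieval.",
    "grounding": "Grounding constrains the model to supplied context.",
}

def retrieve(question: str) -> list[str]:
    # Stand-in for a vector-index similarity search.
    return [text for key, text in KNOWLEDGE.items() if key in question.lower()]

def build_prompt(question: str, context: list[str]) -> str:
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {question}"

def generate(prompt: str) -> str:
    # Stand-in for the LLM completion call.
    return "Grounded answer based on: " + prompt.splitlines()[1]

question = "What does grounding do?"
prompt = build_prompt(question, retrieve(question))
print(generate(prompt))
```

The key step for reducing hallucination is `build_prompt`: the model is instructed to answer only from the retrieved context rather than from its parametric memory.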

Comparison of Cloud Offerings

  • AWS:
    • First infrastructure-as-a-service cloud, with a large market share.
    • Trailing Google and Azure in the AI space.
    • Risks falling permanently behind if it doesn't catch up within a year.
  • Vendor Locking:
    • Users tend to stay with a platform once they've integrated models and data.
    • Consider open-source options to avoid vendor lock-in.

Security Considerations for Small Enterprises

  • Need a security team to vet models and tech stack components (e.g., Pinecone).
  • High demand for cybersecurity personnel with LLM expertise.

AWS Foundation Model

  • Subcategories include a runtime (inference) service.
  • Amazon Bedrock (potentially available by the time of viewing):
    • Lacked text-generation capabilities at the time of recording.
  • Microsoft: Using OpenAI's GPT.
  • Google: Using PaLM.
  • Code Generation:
    • Google: Codey.
    • Microsoft: GPT.
  • Image Generation:
    • Google: Imagen.
    • Microsoft: DALL-E (OpenAI's model, offered through Azure OpenAI).
  • Translation:
    • Google offers a translation API.
    • Microsoft and Amazon need to catch up in this space.

Model Catalog/Hub

  • Platform for publishing models for developers.
  • Amazon SageMaker: JumpStart (commercial, smaller models) and Titan (Amazon's larger first-party models).
  • Hugging Face: Major open-source hub.
  • Google: Model Garden (both open source and commercial).
  • Azure: Built-in foundation model catalog, including options from Hugging Face.

Vector Databases in different Cloud Providers

*Every cloud provider offers a solution based on pgvector (a PostgreSQL extension for vector similarity search).

  • Amazon RDS: Supports pgvector.
  • Google: pgvector built into its managed relational database offerings.
  • Azure: Cosmos DB and Azure Cache.

Model Deployment and Fine-Tuning

  • Amazon SageMaker: Good for model deployment.
  • Vertex AI & Azure ML: Catching up.
  • Azure OpenAI: Preferred for fine-tuning.
  • Vertex AI: Next in line.
  • Bedrock: Not yet available.
  • No-Code Deployment:
    • Azure Power Apps or Google's Gen App Builder.

Code Completion

  • Amazon: CodeWhisperer.
  • Google: Duet AI (potential name change).
  • GitHub: Copilot (uses GPT behind the scenes).
  • Developers should use these tools to improve productivity, but must conduct due diligence for security vulnerabilities using static and dynamic application security testing (SAST and DAST).

Azure OpenAI Architecture Stack

  • LLM application accessible to end-users via API calls.
  • Orchestrator (e.g., LangChain or Microsoft's Semantic Kernel SDK) manages API calls.
  • Semantic Kernel:
    • Supports the .NET platform (well suited to Microsoft shops).
    • Complementary to LangChain but lacks some of its connectors.
  • Azure Cognitive Search:
    • Indexing database for vector data from Azure Blob Storage.
    • Cosmos DB for a NoSQL index.
    • Requires Azure OpenAI for embedding before indexing.
    • Offers similarity search for querying.
      *A typical flow combines the user's prompt with a query against the knowledge index to generate the response.
  • Azure Cognitive Search indexes internal data and supports text and image embeddings.
  • LangChain (open source) is supported.

Open Source Stack (OPL Stack)

  • O: Open-source LLMs.
  • P: Pinecone (vector database).
  • L: LangChain (orchestration).
  • Alternative to Pinecone: Chroma (supports on-premise deployment).
  • OpenAI can be selected for the "O" depending on business requirements; GPT-4 is more expensive than GPT-3.5.
  • LangChain manages prompts, indexing, and memory, and supports agent functionality (reasoning and acting).
    *OpenAI works well for the "O", but it can also be any LLM from Hugging Face.
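The OPL wiring can be sketched as three swappable parts: an "O" (any LLM, stubbed below as a function), a "P" (a vector/knowledge store, stubbed as a dict), and an "L" (an orchestration layer managing the prompt, retrieval, and conversation memory). This is an illustrative sketch of the pattern, not the LangChain API; every name is made up.

```python
# Sketch of the OPL pattern: LLM + store + orchestration with memory.
class Chain:
    def __init__(self, llm, store):
        self.llm = llm        # "O": any LLM, passed in as a callable
        self.store = store    # "P": knowledge/vector store
        self.memory = []      # "L": running conversation history

    def run(self, question: str) -> str:
        context = self.store.get(question, "no match")
        history = " | ".join(self.memory)
        prompt = f"history: {history}\ncontext: {context}\nq: {question}"
        answer = self.llm(prompt)
        self.memory.append(question)  # remember the turn
        return answer

fake_llm = lambda prompt: "answer drawing on " + prompt.splitlines()[1]
store = {"what is RAG?": "RAG retrieves context before generation"}
chain = Chain(fake_llm, store)
print(chain.run("what is RAG?"))
print(chain.memory)  # the chain remembers prior turns
```

Because the LLM and the store are injected, swapping GPT-3.5 for GPT-4, or Pinecone for Chroma, changes only the constructor arguments, which is the practical benefit of the OPL separation.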

Hugging Face

  • Hosts thousands of models.
  • Requires security vetting.

Four-Panel Step Breakdown (Silicon to Application)

  1. Hardware (Silicon Level):
    • NVIDIA (GPU provider).
    • AMD (competing with NVIDIA).
    • Amazon, Microsoft, Cerebras (working on ML processing chips).
    • Google (TPU).
  2. Cloud:
    • Azure, AWS, Google Cloud.
  3. Foundation Models (LLMs):
    • Proprietary: OpenAI, Cohere, Anthropic, Google, Microsoft.
    • Open Source: Hugging Face, GitHub (Stable Diffusion, LLaMA, Bloom, GLM, Falcon).
  4. Tooling and Applications:
    • FMOps for deploying and training models.
    • Tools for building applications, evaluation, orchestration, and connecting to data sources and APIs.
    • Applications: ChatGPT, Jasper, GitHub Copilot, and various AI tools.