3.1 Choosing the Right Tech Stack for LLM applications
Cloud Providers
- The majority of users prefer the major cloud providers: Azure, Google, and Amazon.
- Azure AI Studio: Most mature enterprise-grade AI offering.
- Google: Strong competitor to Azure with a complete tool offering.
- Amazon Bedrock: Up-and-coming, leveraging SageMaker functionalities; not yet generally available (GA).
Open Source Stack
- LangChain:
- Built-in orchestration for calling different models.
- Memory management.
- Task orchestration.
- Auto-GPT:
- Built on top of LangChain.
- Aims for agent-based functionality.
- Mimics human-like thinking and reasoning (based on the ReAct research paper).
- Breaks down tasks into subtasks (e.g., researching generative AI in cybersecurity).
- Early stages but promising.
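The task-decomposition idea behind Auto-GPT can be sketched with a stubbed model call; `fake_llm` below is a placeholder, not a real API, and the subtasks are illustrative:

```python
# Minimal agent-loop sketch: a goal is split into subtasks, each "executed"
# in turn. fake_llm stands in for a real LLM call (e.g., via LangChain).

def fake_llm(prompt: str) -> str:
    # Placeholder: a real agent would call a hosted model here.
    if prompt.startswith("Decompose:"):
        return "1. Survey threat models\n2. Review LLM defenses\n3. Summarize findings"
    return f"Done: {prompt}"

def run_agent(goal: str) -> list[str]:
    # Ask the model for a numbered plan, then execute each subtask.
    plan = fake_llm(f"Decompose: {goal}")
    subtasks = [line.split(". ", 1)[1] for line in plan.splitlines()]
    return [fake_llm(task) for task in subtasks]

results = run_agent("research generative AI in cybersecurity")
```

Real agents add a feedback loop (the result of one subtask feeds the next prompt), but the plan-then-execute skeleton is the same.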
Vector Databases
- Pinecone:
- Stores embeddings.
- Also referred to as embedding database or vector embedding database.
- Chroma:
- Alternative to Pinecone.
- LlamaIndex:
- Data framework (not itself a vector database) that connects to various data sources (PDF, Reddit, wikis).
- Indexes data so you don't have to build custom connectors.
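What a vector database like Pinecone or Chroma does at its core — store embeddings and return the nearest ones to a query — can be sketched in plain Python; the three-dimensional vectors are toy stand-ins for real model embeddings:

```python
import math

# Toy embedding store: id -> vector. Real systems use high-dimensional
# embeddings from a model plus approximate nearest-neighbor indexes.
store = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.0, 1.0, 0.2],
    "doc3": [0.8, 0.2, 0.1],
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def query(vec, top_k=2):
    # Rank stored vectors by similarity to the query vector.
    ranked = sorted(store, key=lambda k: cosine(store[k], vec), reverse=True)
    return ranked[:top_k]

hits = query([1.0, 0.0, 0.0])  # most similar documents first
```

A managed vector database adds persistence, metadata filtering, and indexes that make this search fast at scale.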
Model Hubs
- Hosts open-source large language models.
- Hugging Face:
- Major platform for hosting LLMs.
- GitHub is another option.
Azure AI Studio
- Microsoft emphasizes AI as a survival strategy, heavily investing in it.
- Allows building and training custom models.
- Users can browse foundation models and train using Azure's GPUs.
- Azure OpenAI service:
- Can be "grounded" to reduce hallucination.
- Built-in vector index.
- Text embedding API.
- Image embedding API (different algorithm due to pixel-based nature).
- Retrieval Augmented Generation (RAG):
- Uses vector database as context/memory.
- Retrieves information and uses LLM to augment response.
- Organizes information and generates a response.
- Facilitates prompt workflow management and collaboration.
- Meets security compliance requirements with built-in safety features.
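The RAG pattern above — retrieve context, then let the LLM answer using that context — reduces to a short loop. `retrieve` and `generate` below are stubs standing in for a vector-database query and a model call; the knowledge snippets are invented for illustration:

```python
# RAG sketch: retrieve relevant snippets, then build an augmented prompt.
# In Azure OpenAI terms, retrieve would hit the vector index ("grounding")
# and generate would call the chat/completions API.

KNOWLEDGE = {
    "refunds": "Refunds are processed within 5 business days.",
    "shipping": "Orders ship within 24 hours.",
}

def retrieve(question: str) -> list[str]:
    # Stub: keyword match instead of a real similarity search.
    return [text for key, text in KNOWLEDGE.items() if key in question.lower()]

def generate(prompt: str) -> str:
    # Stub: a real system would send this prompt to the LLM.
    return prompt

def rag_answer(question: str) -> str:
    # Augment the user's question with retrieved context.
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

answer = rag_answer("How fast are refunds?")
```

Grounding the model this way is what reduces hallucination: the answer is constrained to the retrieved context rather than the model's parametric memory.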
Comparison of Cloud Offerings
- AWS:
- First major infrastructure-as-a-service cloud, with a large market share.
- Trailing Google and Azure in the AI space.
- Risk of falling permanently behind if it doesn't catch up within a year.
- Vendor Locking:
- Users tend to stay with a platform once they've integrated models and data.
- Consider open-source options to avoid vendor lock-in.
Security Considerations for Small Enterprises
- Need a security team to vet models and tech stack components (e.g., Pinecone).
- High demand for cybersecurity personnel with LLM expertise.
AWS Foundation Model
- Subcategories include runtime.
- Amazon Bedrock: may be generally available by the time of viewing.
- Lacked text-generation capabilities at the time of recording.
- Microsoft: Using OpenAI's GPT.
- Google: Using PaLM.
- Code Generation:
- Google: Codey.
- Microsoft: GPT.
- Image Generation:
- Google: Imagen.
- Microsoft: DALL-E (OpenAI's model, offered through Azure).
- Translation:
- Google offers an API.
- Microsoft and Amazon need to catch up in the general AI space.
Model Catalog/Hub
- Platform for publishing models for developers.
- Amazon SageMaker: JumpStart (commercial, smaller catalog) and Amazon's own Titan models (larger).
- Hugging Face: Major open-source hub.
- Google: Model Garden (both open source and commercial).
- Azure: Foundation model built-in with options from Hugging Face.
Vector Databases in different Cloud Providers
*Every cloud provider offers a solution based on the pgvector extension.
- Amazon RDS: implemented with pgvector.
- Google: pgvector built into its relational database services.
- Azure: Cosmos DB and Azure Cache.
Model Deployment and Fine-Tuning
- Amazon SageMaker: Good for model deployment.
- Vertex AI & Azure ML: Catching up.
- Azure OpenAI: Preferred for fine-tuning.
- Vertex: Next in line.
- Bedrock: Not quite available.
- No-Code Deployment:
- Azure Power Apps or Google's Gen App Builder.
Code Completion
- Amazon: CodeWhisperer.
- Google: Duet AI (potential name change).
- GitHub: Copilot (uses GPT behind the scenes).
- Developers should use these tools to improve productivity but must conduct due diligence for security vulnerabilities using SAST and DAST (static and dynamic application security testing).
Azure OpenAI Architecture Stack
- LLM application accessible to end-users via API calls.
- Orchestrator (e.g., LangChain or Microsoft's Semantic Kernel SDK) manages API calls.
- Semantic Kernel:
- Supports .NET platform (suited for Microsoft shops).
- Complementary to LangChain but lacks some connectors.
- Azure Cognitive Search:
- Indexing database for vector data from Azure Blob.
- Cosmos DB for NoSQL index.
- Requires Azure OpenAI for embedding before indexing.
- Offers similarity search for querying.
*A typical setup combines the user's prompt with a query over the knowledge base to generate a response.
- Azure Cognitive Search indexes internal data and supports text and image embeddings.
- LangChain (open source) is supported.
Open Source Stack (OPL Stack)
- O: Open-source LLMs.
- P: Pinecone (vector database).
- L: LangChain (orchestration).
- Alternative to Pinecone is Chroma (supports on-premise deployment).
- OpenAI models can be selected depending on business requirements; GPT-4 is more expensive than GPT-3.5.
- LangChain manages prompts, indexing, and memory, and supports agent functionalities (reasoning and acting).
*OpenAI is a good choice for the "O", but it can also be any LLM from Hugging Face.
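LangChain's memory management — carrying prior turns into each new prompt — can be sketched without the library; the `ChatMemory` class below is illustrative, not LangChain's actual API:

```python
# Minimal conversation-memory sketch: each new prompt includes the running
# history, which is how an orchestrator gives a stateless LLM "memory".

class ChatMemory:
    def __init__(self):
        self.turns = []

    def add(self, role: str, text: str):
        # Record one turn of the conversation.
        self.turns.append(f"{role}: {text}")

    def build_prompt(self, user_input: str) -> str:
        # Prepend the full history to the new user input.
        history = "\n".join(self.turns)
        return f"{history}\nuser: {user_input}" if history else f"user: {user_input}"

memory = ChatMemory()
memory.add("user", "My name is Ada.")
memory.add("assistant", "Hi Ada!")
prompt = memory.build_prompt("What is my name?")
```

Production orchestrators add summarization or windowing so the history stays within the model's context limit.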
Hugging Face
- Hosts thousands of models.
- Requires security vetting.
Four-Panel Step Breakdown (Silicon to Application)
- Hardware (Silicon Level):
- NVIDIA (GPU provider).
- AMD (competing with NVIDIA).
- Amazon, Microsoft, Cerebras (working on ML processing chips).
- Google (TPU).
- Cloud:
- Azure, AWS, Google Cloud.
- Foundation Models (LLMs):
- Proprietary: OpenAI, Cohere, Anthropic, Google, Microsoft.
- Open Source: Hugging Face, GitHub (Stable Diffusion, LLaMA, Bloom, GLM, Falcon).
- Tooling and Applications:
- FMOps for deploying and training models.
- Tools for building applications, evaluation, orchestration, and connecting to data sources and APIs.
- Applications: ChatGPT, Jasper, GitHub Copilot, and various AI tools.