3.1 Choosing the Right Tech Stack for LLM applications
Cloud Providers
- The majority of users prefer the major cloud providers: Azure, Google, and Amazon.
- Azure AI Studio: Most mature enterprise-grade AI offering.
- Google: Strong competitor to Azure with a complete tool offering.
- Amazon Bedrock: Up-and-coming, leveraging SageMaker functionalities; not yet generally available (GA).
Open Source Stack
- LangChain:
- Built-in orchestration for calling different models.
- Memory management.
- Task orchestration.
- Auto-GPT:
- Built on top of LangChain.
- Aims for agent-based functionality.
- Mimics human-like thinking and reasoning (based on the ReAct research paper).
- Breaks down tasks into subtasks (e.g., researching generative AI in cybersecurity).
- Early stages but promising.
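The task-decomposition idea behind Auto-GPT can be sketched with a stubbed model call; `fake_llm` below is a placeholder, not a real API, and the subtasks are illustrative:

```python
# Minimal agent-loop sketch: a goal is split into subtasks, each "executed"
# in turn. fake_llm stands in for a real LLM call (e.g., via LangChain).

def fake_llm(prompt: str) -> str:
    # Placeholder: a real agent would call a hosted model here.
    if prompt.startswith("Decompose:"):
        return "1. Survey threat models\n2. Review LLM defenses\n3. Summarize findings"
    return f"Done: {prompt}"

def run_agent(goal: str) -> list[str]:
    # Ask the model for a numbered plan, then execute each subtask.
    plan = fake_llm(f"Decompose: {goal}")
    subtasks = [line.split(". ", 1)[1] for line in plan.splitlines()]
    return [fake_llm(task) for task in subtasks]

results = run_agent("research generative AI in cybersecurity")
```

Real agents add a feedback loop (the result of one subtask feeds the next prompt), but the plan-then-execute skeleton is the same.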
Vector Databases
- Pinecone:
- Stores embeddings.
- Also referred to as embedding database or vector embedding database.
- Chroma:
- Alternative to Pinecone.
- LlamaIndex:
- Data framework (not itself a vector database) that connects to various data sources (PDF, Reddit, wikis).
- Indexes data so you don't have to build custom connectors.
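What a vector database like Pinecone or Chroma does at its core — store embeddings and return the nearest ones to a query — can be sketched in plain Python; the three-dimensional vectors are toy stand-ins for real model embeddings:

```python
import math

# Toy embedding store: id -> vector. Real systems use high-dimensional
# embeddings from a model plus approximate nearest-neighbor indexes.
store = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.0, 1.0, 0.2],
    "doc3": [0.8, 0.2, 0.1],
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def query(vec, top_k=2):
    # Rank stored vectors by similarity to the query vector.
    ranked = sorted(store, key=lambda k: cosine(store[k], vec), reverse=True)
    return ranked[:top_k]

hits = query([1.0, 0.0, 0.0])  # most similar documents first
```

A managed vector database adds persistence, metadata filtering, and indexes that make this search fast at scale.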
Model Hubs
- Hosts open-source large language models.
- Hugging Face:
- Major platform for hosting LLMs.
- GitHub is another option.
Azure AI Studio
- Microsoft emphasizes AI as a survival strategy, heavily investing in it.
- Allows building and training custom models.
- Users can browse foundation models and train using Azure's GPUs.
- Azure OpenAI service:
- Can be "grounded" to reduce hallucination.
- Built-in vector index.
- Text embedding API.
- Image embedding API (different algorithm due to pixel-based nature).
- Retrieval Augmented Generation (RAG):
- Uses vector database as context/memory.
- Retrieves information and uses LLM to augment response.
- Organizes information and generates a response.
- Facilitates prompt workflow management and collaboration.
- Meets security compliance requirements with built-in safety features.
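The RAG pattern above — retrieve context, then let the LLM answer using that context — reduces to a short loop. `retrieve` and `generate` below are stubs standing in for a vector-database query and a model call; the knowledge snippets are invented for illustration:

```python
# RAG sketch: retrieve relevant snippets, then build an augmented prompt.
# In Azure OpenAI terms, retrieve would hit the vector index ("grounding")
# and generate would call the chat/completions API.

KNOWLEDGE = {
    "refunds": "Refunds are processed within 5 business days.",
    "shipping": "Orders ship within 24 hours.",
}

def retrieve(question: str) -> list[str]:
    # Stub: keyword match instead of a real similarity search.
    return [text for key, text in KNOWLEDGE.items() if key in question.lower()]

def generate(prompt: str) -> str:
    # Stub: a real system would send this prompt to the LLM.
    return prompt

def rag_answer(question: str) -> str:
    # Augment the user's question with retrieved context.
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

answer = rag_answer("How fast are refunds?")
```

Grounding the model this way is what reduces hallucination: the answer is constrained to the retrieved context rather than the model's parametric memory.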
Comparison of Cloud Offerings
- AWS:
- First major infrastructure-as-a-service cloud, with a large market share.
- Trailing Google and Azure in the AI space.
- Risk of falling permanently behind if it doesn't catch up within a year.
- Vendor Locking:
- Users tend to stay with a platform once they've integrated models and data.
- Consider open-source options to avoid vendor lock-in.
Security Considerations for Small Enterprises
- Need a security team to vet models and tech stack components (e.g., Pinecone).
- High demand for cybersecurity personnel with LLM expertise.
AWS Foundation Model
- Subcategories include runtime.
- Amazon Bedrock: may be generally available by the time of viewing.
- Lacked text-generation capabilities at the time of recording.
- Microsoft: Using OpenAI's GPT.
- Google: Using PaLM.
- Code Generation:
- Google: Codey.
- Microsoft: GPT.
- Image Generation:
- Google: Imagen.
- Microsoft: DALL-E (OpenAI's model, offered through Azure).
- Translation:
- Google offers an API.
- Microsoft and Amazon need to catch up in the general AI space.
Model Catalog/Hub
- Platform for publishing models for developers.
- Amazon SageMaker: JumpStart (commercial, smaller catalog) and Amazon's own Titan models (larger).
- Hugging Face: Major open-source hub.
- Google: Model Garden (both open source and commercial).
- Azure: Foundation model built-in with options from Hugging Face.
Vector Databases in different Cloud Providers
*Every cloud provider offers a solution based on the pgvector extension.
- Amazon RDS: implemented with pgvector.
- Google: pgvector built into its relational database services.
- Azure: Cosmos DB and Azure Cache.
Model Deployment and Fine-Tuning
- Amazon SageMaker: Good for model deployment.
- Vertex AI & Azure ML: Catching up.
- Azure OpenAI: Preferred for fine-tuning.
- Vertex: Next in line.
- Bedrock: Not quite available.
- No-Code Deployment:
- Azure Power Apps or Google's Gen App Builder.
Code Completion
- Amazon: CodeWhisperer.
- Google: Duet AI (potential name change).
- GitHub: Copilot (uses GPT behind the scenes).
- Developers should use these tools to improve productivity but must conduct due diligence for security vulnerabilities using SAST and DAST (static and dynamic application security testing).
Azure OpenAI Architecture Stack
- LLM application accessible to end-users via API calls.
- Orchestrator (e.g., LangChain or Microsoft's Semantic Kernel SDK) manages API calls.
- Semantic Kernel:
- Supports .NET platform (suited for Microsoft shops).
- Complementary to LangChain but lacks some connectors.
- Azure Cognitive Search:
- Indexing database for vector data from Azure Blob.
- Cosmos DB for NoSQL index.
- Requires Azure OpenAI for embedding before indexing.
- Offers similarity search for querying.
*A typical setup combines the user's prompt with a query over the knowledge base to generate a response.
- Azure Cognitive Search indexes internal data and supports text and image embeddings.
- LangChain (open source) is supported.
Open Source Stack (OPL Stack)
- O: Open-source LLMs.
- P: Pinecone (vector database).
- L: LangChain (orchestration).
- Alternative to Pinecone is Chroma (supports on-premise deployment).
- OpenAI models can be selected depending on business requirements; GPT-4 is more expensive than GPT-3.5.
- LangChain manages prompts, indexing, and memory, and supports agent functionalities (reasoning and acting).
*OpenAI is a good choice for the "O", but it can also be any LLM from Hugging Face.
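LangChain's memory management — carrying prior turns into each new prompt — can be sketched without the library; the `ChatMemory` class below is illustrative, not LangChain's actual API:

```python
# Minimal conversation-memory sketch: each new prompt includes the running
# history, which is how an orchestrator gives a stateless LLM "memory".

class ChatMemory:
    def __init__(self):
        self.turns = []

    def add(self, role: str, text: str):
        # Record one turn of the conversation.
        self.turns.append(f"{role}: {text}")

    def build_prompt(self, user_input: str) -> str:
        # Prepend the full history to the new user input.
        history = "\n".join(self.turns)
        return f"{history}\nuser: {user_input}" if history else f"user: {user_input}"

memory = ChatMemory()
memory.add("user", "My name is Ada.")
memory.add("assistant", "Hi Ada!")
prompt = memory.build_prompt("What is my name?")
```

Production orchestrators add summarization or windowing so the history stays within the model's context limit.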
Hugging Face
- Hosts thousands of models.
- Requires security vetting.
Four-Panel Step Breakdown (Silicon to Application)
- Hardware (Silicon Level):
- NVIDIA (GPU provider).
- AMD (competing with NVIDIA).
- Amazon, Microsoft, Cerebras (working on ML processing chips).
- Google (TPU).
- Cloud:
- Azure, AWS, Google Cloud.
- Foundation Models (LLMs):
- Proprietary: OpenAI, Cohere, Anthropic, Google, Microsoft.
- Open Source: Hugging Face, GitHub (Stable Diffusion, LLaMA, Bloom, GLM, Falcon).
- Tooling and Applications:
- FMOps for deploying and training models.
- Tools for building applications, evaluation, orchestration, and connecting to data sources and APIs.
- Applications: ChatGPT, Jasper, GitHub Copilot, and various AI tools.