AI Platforms and Tools
Last updated: 2026-04-06
Quick answer: Most production agent work sits on a small set of model APIs, plus optional orchestration, retrieval, and observability layers—chosen for governance, latency, and cost, not hype.
Definition
In this KB, AI platforms and tools means the products and layers teams use to run LLM-based systems: hosted or self-hosted models and the APIs that expose them; assistant or IDE copilots for humans; frameworks that wire together prompts, tools, and memory; retrieval and vector stores for grounding; and MLOps or observability tooling when models are trained, fine-tuned, or closely measured. The list below is representative, not exhaustive; names change, categories persist.
- Model and cloud APIs: Providers that expose chat, completion, embeddings, and sometimes image or audio through HTTP APIs and SDKs (e.g. OpenAI, Anthropic, Google Gemini, Azure OpenAI-style offerings).
- End-user assistants: Chat UIs and copilots (e.g. ChatGPT, Claude, Microsoft Copilot, Gemini) used for research, drafting, and coding assistance.
- Agent and RAG frameworks: Libraries that structure tool calling, graphs, and retrieval (e.g. LangChain, LangGraph, LlamaIndex, Semantic Kernel).
- Retrieval and vectors: Embedding indexes and databases (e.g. Pinecone, Weaviate, Qdrant, or pgvector in PostgreSQL).
- Open-weight and local inference: Models you host yourself or via third-party GPU hosts (e.g. Llama, Mistral, Qwen families) when latency, cost, or data residency matters.
- Hubs and lifecycle: Model registries, datasets, and experiment tracking (e.g. Hugging Face, Weights & Biases, MLflow) when training or fine-tuning is in scope.
- Tool protocols: Standard ways to attach capabilities to agents (e.g. MCP alongside direct REST integrations).
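Because every category above is exposed through an API or SDK, a common hedge against lock-in is to keep application code behind a thin, provider-agnostic interface. The sketch below illustrates the idea in Python; `ChatModel`, `FakeProvider`, and `answer` are illustrative names, not any real SDK's API, and a real adapter would wrap an actual provider client.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Minimal interface a provider adapter must satisfy."""

    def complete(self, prompt: str) -> str: ...


class FakeProvider:
    """Stand-in for a real hosted API client (OpenAI, Anthropic, etc.)."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


def answer(model: ChatModel, question: str) -> str:
    # Application code depends only on the ChatModel interface,
    # so swapping providers means writing one new adapter class.
    return model.complete(question)


print(answer(FakeProvider(), "hello"))  # echo: hello
```

The design choice being sketched: governance, latency, and cost comparisons stay cheap when the provider boundary is one class, not calls scattered through the codebase.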
Why it matters
Swarm and agent designs inherit constraints from whatever sits underneath: rate limits, context windows, logging, regional deployment, and who can revoke a key. Choosing categories deliberately keeps you from painting yourself into a corner when you later add tool boundaries, MCP, or stricter approval flows.
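Rate limits in particular are a constraint every layer above the model API must absorb. A common pattern is retry with exponential backoff and jitter; the sketch below simulates it with a fake flaky call (`RateLimitError`, `call_with_backoff`, and `flaky` are illustrative names, not any provider's SDK).

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for an HTTP 429 from a model API."""


def call_with_backoff(fn, max_retries=5, base_delay=0.01):
    """Retry fn on rate-limit errors, doubling the delay each attempt."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Exponential backoff plus jitter to avoid synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))


# Simulated endpoint that rate-limits twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return "ok"


print(call_with_backoff(flaky))  # ok
```

In production this logic usually lives in the SDK or an orchestration layer, but knowing where it sits matters when you trace latency spikes.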
When to use
Use a hosted model API when you need speed to market and acceptable data-handling terms. Add retrieval when answers must cite private docs. Add an orchestration framework when workflows branch, call many tools, or need repeatable patterns. Consider open-weight or dedicated hosting when cost at scale, offline use, or strict residency applies.
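The retrieval step above reduces to nearest-neighbor search over embeddings. A minimal sketch, using toy hand-written vectors in place of real embeddings (in practice these come from a provider's embedding API and live in a store such as pgvector or Qdrant):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy document "embeddings" keyed by title (illustrative values only).
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "api reference": [0.1, 0.8, 0.3],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "how do refunds work?"

# Retrieval: pick the document whose embedding is closest to the query.
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # refund policy
```

A vector database does the same comparison with approximate indexes at scale; the grounded document is then passed to the model alongside the user's question.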
When not to use
Skip heavy stacks for a one-off script or a single prompt with no tools. Avoid bolting on vector search before you have a clear chunking, permission, and evaluation story—otherwise you add latency and cost without reliability.
Failure modes
- Treating “the model” as the whole system: weak approval and intent boundaries, unscoped tools, and no tracing when something goes wrong in production.
- Chasing every new tool category instead of nailing observability, evals, and rollback.
Related pages
Categories · LLMs in agentic systems · MCP vs direct API integration · Engineering code review swarm · Tool boundaries and execution · Design your first swarm