How to Integrate S.A.M.I. with Your Claude or Custom LLM Setup
S.A.M.I. exposes an integration server that any Model Context Protocol client can attach to. Here is a practical walkthrough of the connection flow, available tools, authentication, and the design choices we made.
The SAMI Team
Engineering
The Model Context Protocol is an open standard for connecting LLMs to external data sources and tools in a host-agnostic way. S.A.M.I. exposes an integration server that speaks this protocol, which means any compatible client — Claude Desktop, Cursor, Zed, or a custom agent you build — can attach to S.A.M.I. and use it as a tool provider.
This post walks through the full connection setup, the available tools, authentication, and some design decisions that are worth understanding before you build on top of this surface.
What the Integration Server Exposes
S.A.M.I.'s integration server provides three categories of tools:
- Agent execution tools. Submit a task to the S.A.M.I. multi-agent orchestrator and receive structured output. You can specify the consensus strategy, which roles to involve, and whether to require static analysis.
- Knowledge retrieval tools. Query the user's RAG index — documents, code, or external knowledge bases that have been ingested through the S.A.M.I. desktop app. Useful for giving the host LLM access to project-specific context without stuffing it into the system prompt.
- Platform management tools. Read plan status, remaining quota, saved prompts, and agent run history. Mostly useful for tooling that needs to track usage programmatically.
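As a rough illustration of what invoking one of these tools looks like on the wire, the sketch below builds a Model Context Protocol `tools/call` request, which the protocol frames as JSON-RPC 2.0. The tool name `sami_agent_run` appears later in this post; the argument names (`task`, `consensus_strategy`, `roles`, `require_static_analysis`) are assumptions made for illustration, since the real parameter names live in the integration reference.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids for this process.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request in the shape MCP defines."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# All argument names below are hypothetical; check the integration reference.
request = build_tool_call(
    "sami_agent_run",
    {
        "task": "Summarize the open TODOs in the payments module.",
        "consensus_strategy": "majority",
        "roles": ["reviewer", "security"],
        "require_static_analysis": True,
    },
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library builds this envelope for you; seeing it spelled out mainly helps when debugging a custom agent.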
Authentication
The integration server sits behind the same API gateway as the rest of S.A.M.I.'s backend. Authentication uses a scoped integration token generated by an administrator from the admin panel — not a self-service user key. This keeps the integration surface audited and revocable without touching user accounts.
The token is passed as a Bearer token in the Authorization header. Each external system should have its own token so a single revocation does not affect other integrations. Tokens can be given an expiry and are listed with their last-used timestamp in the admin panel.
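A minimal sketch of attaching the token, using only the Python standard library. The endpoint URL and the `SAMI_INTEGRATION_TOKEN` environment variable are placeholders invented for this example, not documented values; the request is only constructed here, not sent.

```python
import os
import urllib.request


def build_request(url: str, token: str, body: bytes) -> urllib.request.Request:
    """Return a POST request carrying the integration token as a Bearer token."""
    if not token:
        raise ValueError("an integration token is required")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical endpoint and environment variable name, for illustration only.
token = os.environ.get("SAMI_INTEGRATION_TOKEN", "example-token")
req = build_request("https://sami.example.invalid/mcp", token, b"{}")
print(req.get_method(), req.full_url)
```

Reading the token from the environment rather than hard-coding it keeps per-system tokens easy to rotate after a revocation.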
A public packaged integration client is planned but not yet released. If you are building on this surface today, reach out to support@sami-agent.com — we can guide you through the current connection flow directly.
A Concrete Integration: Retrieval-Augmented Code Review
Here is a practical workflow: you have a codebase indexed in S.A.M.I., and you want to build a custom code review agent that uses your own LLM but pulls relevant context from the project's RAG index before generating review comments.
Step one: index your project. Open the S.A.M.I. desktop app, point it at your project directory, and trigger an index run. Documents are chunked, embedded, and stored in the project vector index. This is a one-time setup, with incremental updates as files change.
Step two: in your agent, call the sami_retrieve tool with the code diff or function you want to review as the query. The tool returns the top-k most semantically similar chunks from your codebase — related functions, similar patterns, existing tests, documentation snippets. You inject these chunks into the context window of your LLM before it generates the review.
Step three: optionally, send the LLM's draft review to the sami_agent_run tool with a verification task — asking S.A.M.I.'s reviewer and security roles to check the review itself for false positives or missed issues. This is meta-consensus: your LLM writes the review, S.A.M.I. validates it.
The whole pipeline can be orchestrated in roughly 50 lines of Python using the anthropic SDK with tool use, or in any other agent framework that speaks the Model Context Protocol.
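The three steps above can be sketched with the LLM and the tool transport passed in as plain callables, so the flow is visible without committing to a particular SDK. The `call_tool` and `generate` callables, the argument names, and the assumed response shape (a `chunks` list whose items have `source` and `text` fields) are all hypothetical stand-ins for whatever the integration reference specifies.

```python
from typing import Callable

# Hypothetical signatures: call_tool(name, arguments) performs one tool call
# and returns the parsed JSON result; generate(prompt) calls your own LLM.
ToolCaller = Callable[[str, dict], dict]
Generator = Callable[[str], str]


def build_review_prompt(diff: str, chunks: list) -> str:
    """Place retrieved chunks ahead of the diff so the LLM sees project context."""
    context = "\n\n".join(
        f"[{i}] {chunk.get('source', 'unknown')}\n{chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "You are reviewing a code change.\n\n"
        f"Relevant project context:\n{context or '(none retrieved)'}\n\n"
        f"Diff to review:\n{diff}\n"
    )


def review_diff(
    diff: str,
    call_tool: ToolCaller,
    generate: Generator,
    top_k: int = 5,
    verify: bool = False,
) -> dict:
    """Retrieve context, draft a review, and optionally ask S.A.M.I. to check it."""
    # Step two: fetch similar chunks (argument names are assumptions).
    retrieved = call_tool("sami_retrieve", {"query": diff, "top_k": top_k})
    draft = generate(build_review_prompt(diff, retrieved.get("chunks", [])))

    result = {"draft": draft, "verification": None}
    if verify:
        # Step three: hand the draft back for a second opinion.
        result["verification"] = call_tool(
            "sami_agent_run",
            {
                "task": "Check this code review for false positives "
                        "or missed issues:\n" + draft,
                "roles": ["reviewer", "security"],
            },
        )
    return result
```

Injecting `call_tool` and `generate` also makes the pipeline easy to exercise in tests with stubbed responses before pointing it at a live server.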
Rate Limits and Quota
Tool calls consume quota from your S.A.M.I. plan. Agent execution calls are metered by token usage across all the models involved in the consensus run. Retrieval calls are cheaper — they are read-only queries against the vector index with no LLM inference.
The sami_quota tool lets you check remaining quota programmatically so your agent can back off gracefully rather than hitting a 429 mid-workflow. We recommend checking quota at the start of any long-running pipeline.
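A pre-flight check along those lines might look like the sketch below. The `remaining_tokens` field is an assumed name for whatever the `sami_quota` response actually contains, and `call_tool` is a hypothetical callable that performs one tool call and returns the parsed JSON result.

```python
from typing import Callable, Optional

ToolCaller = Callable[[str, dict], dict]


def has_enough_quota(call_tool: ToolCaller, required_tokens: int) -> bool:
    """Ask sami_quota whether the plan can cover an estimated token budget."""
    quota = call_tool("sami_quota", {})
    # 'remaining_tokens' is a hypothetical field name.
    return quota["remaining_tokens"] >= required_tokens


def run_if_affordable(
    call_tool: ToolCaller,
    required_tokens: int,
    pipeline: Callable[[], dict],
) -> Optional[dict]:
    """Run the pipeline only when the quota check passes; otherwise skip it."""
    if not has_enough_quota(call_tool, required_tokens):
        return None  # caller can reschedule or alert instead of hitting a 429
    return pipeline()
```

The token estimate is necessarily approximate, so it is worth leaving some headroom rather than running right up to the limit.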
The server enforces rate limits per integration token, not per IP address. If you run multiple agents against the same token, they share a single rate limit. For high-throughput pipelines on the Team or Enterprise plan, contact support for adjusted limits.
Design Choices Worth Knowing
Stateless tool calls. Each tool call is stateless — the server does not maintain a session between calls. If you want continuity across calls (e.g., a multi-turn conversation with an agent), you must pass the prior context in each request. This is deliberate: stateless calls are easier to retry, cache, and scale horizontally.
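Because the server keeps no session, the client has to carry the conversation itself. One way to do that, sketched here with a hypothetical `context` argument holding prior turns, is to keep the history locally and attach a copy of it to every call.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Client-side history that is resent with every stateless tool call."""

    turns: list = field(default_factory=list)

    def next_arguments(self, task: str) -> dict:
        # 'task' and 'context' are assumed argument names for illustration.
        # The copy keeps later updates from mutating an in-flight request.
        return {"task": task, "context": list(self.turns)}

    def record(self, task: str, output: str) -> None:
        """Store a completed exchange so the next call can include it."""
        self.turns.append({"task": task, "output": output})


conversation = Conversation()
first = conversation.next_arguments("List the public functions in utils.py")
conversation.record(first["task"], "parse_config, load_env")
second = conversation.next_arguments("Which of those lack tests?")
print(len(first["context"]), len(second["context"]))
```

Since each request is fully self-describing, a failed call can be retried as-is without any server-side cleanup.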
Structured output only. All tool responses are JSON with a defined schema. We do not return raw markdown or prose from tool calls. This makes the responses easier to parse and embed into downstream prompts without additional cleaning.
No streaming on tool calls. Streaming is available on the chat API but not on tool calls. Agent runs that take longer than 30 seconds return a run ID; you poll the sami_run_status tool to retrieve the result when it is ready. This avoids HTTP timeouts on long consensus runs.
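The polling pattern for long runs can be sketched as follows. The `run_id` argument, the `status` field, and its `completed` and `failed` values are assumptions about the `sami_run_status` schema, and `call_tool` is a hypothetical callable that performs one tool call and returns the parsed JSON result.

```python
import time
from typing import Callable

ToolCaller = Callable[[str, dict], dict]

# Hypothetical terminal states; the real values are in the integration reference.
TERMINAL_STATES = ("completed", "failed")


def wait_for_run(
    call_tool: ToolCaller,
    run_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll sami_run_status until the run finishes or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = call_tool("sami_run_status", {"run_id": run_id})
        if status.get("status") in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run {run_id} did not finish within {timeout}s")
        sleep(interval)
```

Polls count as tool calls against the shared rate limit, so a fixed interval of a few seconds, or a gradually increasing one, is usually a safer default than a tight loop.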
Bring-your-own provider credentials are Enterprise only. When an Enterprise customer supplies its own provider credentials, they are used only inside S.A.M.I.'s managed infrastructure and are never returned through the integration interface. The server surfaces tool results, not credentials.
Getting Started
The full integration reference — tools, parameters, and response schemas — is available once your plan grants integration access. Integration tokens are issued by an administrator from the admin panel. If you build something interesting on top of this surface, or want early access, we would like to hear about it: support@sami-agent.com.