Soma Agent

January 15, 2026 · Allan Karanja

Searching through company documentation is hard. The information we want is usually scattered around, rarely in one place, and until now the only tool we've had was search. In the age of AI, it's surprisingly easy to build a simple agent that surfaces what you need. What if instead the flow was:

  1. You ask a question
  2. The agent reasons about it, then uses tools like listing, reading, and indexing files (tool definitions are sketched after this list)
  3. The agent reads the relevant files
  4. It synthesizes an answer
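
Step 2's tools can be declared for Claude's tool-use API as plain JSON schemas. Here is a minimal sketch; the names and schemas (list_files, read_file, semantic_search) are illustrative assumptions, not the project's actual definitions:

    # Hypothetical tool definitions in the Anthropic tool-use format;
    # names and schemas are illustrative, not taken from the repo
    TOOLS = [
        {
            "name": "list_files",
            "description": "List files under a folder in the indexed docs.",
            "input_schema": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
        {
            "name": "read_file",
            "description": "Read the full contents of a file.",
            "input_schema": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
        {
            "name": "semantic_search",
            "description": "Search the embedding index and return the closest chunks.",
            "input_schema": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    ]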

The agent loops until it has enough context to respond. This nested loop is what powers most agents today:

main.py handles user input and passes it to the agent

agent.py runs the ReAct loop: receive text, call tools if needed, return text

    while True:
        # Send the conversation so far, plus tool definitions, to the model
        response = self.llm.chat(self.messages, self.tools)

        if response.stop_reason == "tool_use":
            # Run each tool the model requested; results are fed back
            # into self.messages before the next iteration
            for block in response.content:
                if block.type == "tool_use":
                    self._execute_tool(block.name, block.input)
            ...
        else:
            # No tool calls left, so the model has enough context to answer
            return response.text
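
For the loop to make progress, each tool result has to land back in the conversation. Here is a minimal sketch of what `_execute_tool` might do, assuming tools are plain Python methods dispatched by name; the method name comes from the snippet above, but the body is an assumption, not the repo's code.

    def _execute_tool(self, name, tool_input):
        # Hypothetical dispatch table: tool name -> handler method
        handlers = {
            "list_files": self.list_files,
            "read_file": self.read_file,
            "semantic_search": self.semantic_search,
        }
        try:
            result = handlers[name](**tool_input)
        except Exception as exc:
            # Report failures to the model instead of crashing the loop
            result = f"tool error: {exc}"
        # Append the result so the next llm.chat() call can see it
        self.messages.append({"role": "user", "content": str(result)})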

How does the agent know which tool to use? Claude is the best model I’ve found for this; it just gets tool use. We index folders using a lightweight embedding model (all-MiniLM-L6-v2, ~90MB). Then the agent can search semantically or read files directly.
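
As a rough sketch of that indexing step, the following uses sentence-transformers with all-MiniLM-L6-v2; the paragraph chunking and cosine-similarity ranking are assumptions for illustration, not necessarily how the repo does it.

    # Sketch of a small semantic index; chunking and ranking are illustrative
    from pathlib import Path

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # ~90MB, runs locally

    def build_index(folder):
        chunks = []
        for path in Path(folder).rglob("*.md"):
            for para in path.read_text(errors="ignore").split("\n\n"):
                if para.strip():
                    chunks.append((str(path), para))
        # Normalized embeddings turn cosine similarity into a dot product
        texts = [text for _, text in chunks]
        vectors = model.encode(texts, normalize_embeddings=True)
        return chunks, vectors

    def search(query, chunks, vectors, top_k=5):
        q = model.encode([query], normalize_embeddings=True)[0]
        scores = vectors @ q
        best = np.argsort(-scores)[:top_k]
        return [(chunks[i][0], float(scores[i])) for i in best]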

Why this works

Pure RAG has a limitation: user intent doesn’t map cleanly to vector similarity. “How do I deploy?” might match docs about Docker, CI/CD, and environments. Vector search returns close chunks but can’t reason about which ones answer your question.

The agent reads the semantic search results, decides if it needs more context, reads specific files, and synthesizes.

I wrote about building one of the first RAG systems back in 2023. The problems from that era, tiny context windows and the poor translation of a user query into intent during semantic search, are mostly solved: context windows grew from 4K to 200K tokens and model tool use improved. The agent approach sidesteps the remaining limitations by letting the model reason about what to read next.

From here you could add more tools (grep, web search), a web UI, or package it as a CLI. The core stays the same: let the agent figure out what to read. You just ask.
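
As one example, a grep tool is mostly another schema entry plus a small handler, following the same assumed tool format as above (the names and docs folder are placeholders):

    # Hypothetical grep tool: one more schema entry plus one more handler
    import re
    from pathlib import Path

    GREP_TOOL = {
        "name": "grep",
        "description": "Return lines matching a regex across the docs folder.",
        "input_schema": {
            "type": "object",
            "properties": {"pattern": {"type": "string"}},
            "required": ["pattern"],
        },
    }

    def grep(pattern, folder="docs"):
        hits = []
        for path in Path(folder).rglob("*.md"):
            for i, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                if re.search(pattern, line):
                    hits.append(f"{path}:{i}: {line.strip()}")
        return "\n".join(hits[:50])  # cap output so it fits in context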

Check out the code on GitHub
