Early Days of RAG
February 16, 2023 · Allan Karanja
Building the first RAG systems
A few months after ChatGPT was released, while experimenting with it, I stumbled on a tool that helped you read documents. You would highlight a piece of text, and it would explain it using context from the same document. It was powered by GPT-3. At the time, late 2022 or early 2023, it was ahead of the curve and so impressive that I had to build one myself. This is how I ended up creating one of the earliest RAG systems.
I wanted a tool that would let you have a Q&A session with any PDF or document you have. The first problem was fitting the document into ChatGPT's (GPT-3.5) tiny context window of 4,096 tokens. The answer was text embeddings: they convert text into a list of numbers that captures its meaning. These vectors are then stored in a vector database. This is key, because you can now run semantic search (search by meaning) over the uploaded document and retrieve only the relevant context.
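The embed-store-search loop can be sketched in a few lines. This is a toy illustration, not the original code: the `embed` function below is a bag-of-words stand-in for a real embedding model, and the "vector database" is just a Python list with cosine-similarity scoring.

```python
import numpy as np

# Toy stand-in for a real embedding model: a bag-of-words vector over
# a tiny vocabulary. A real model maps text to a dense vector that
# captures meaning, not just word overlap.
VOCAB = ["refund", "policy", "shipping", "days", "cost", "return"]

def embed(text: str) -> np.ndarray:
    words = text.lower().split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b / denom)

# "Vector database": each document chunk is stored alongside its vector.
chunks = [
    "Our refund policy allows a return within 30 days",
    "Shipping cost depends on the destination",
]
index = [(c, embed(c)) for c in chunks]

def search(query: str, k: int = 1) -> list[str]:
    # Embed the query, then rank stored chunks by similarity.
    q = embed(query)
    scored = sorted(index, key=lambda cv: cosine(q, cv[1]), reverse=True)
    return [c for c, _ in scored[:k]]

print(search("what is the refund policy"))
# → ['Our refund policy allows a return within 30 days']
```

A production system would swap in an embedding API and an approximate nearest-neighbour index, but the shape of the pipeline is the same.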
When a query came in, what should ideally have happened is that the LLM would first extract the keywords and feed those to the database. This would improve retrieval, since longer queries carried too much noise, typically arriving in the form: "What are the key points mentioned?", "Summarize the document", etc. In practice, doubling the number of API calls to marginally improve retrieval didn't make sense design-, cost- or performance-wise. The search returned the top-k matches, and these were fed to the LLM as context. If the answer was not present, we attempted, unsuccessfully, to ground the LLM with the following prompt:
```python
# `question` and `files_string` (the retrieved extracts, formatted as
# ###\n"filename"\nfile text) are assumed to be filled in by the caller.
prompt = f"Given a question, try to answer it using the content of the file extracts below, and if you cannot answer, or find " \
         f"a relevant file, just output \"I couldn't find the answer to that question in your files.\".\n\n" \
         f"If the answer is not contained in the files or if there are no file extracts, respond with \"I couldn't find the answer " \
         f"to that question in your files.\" If the question is not actually a question, respond with \"That's not a valid question.\"\n\n" \
         f"In the cases where you can find the answer, first give the answer. Then explain how you found the answer from the source or sources, " \
         f"and use the exact filenames of the source files you mention. Do not make up the names of any other files other than those mentioned " \
         f"in the files context. Give the answer in markdown format. " \
         f"Use the following format:\n\nQuestion: <question>\n\nFiles:\n<###\n\"filename 1\"\nfile text>\n<###\n\"filename 2\"\nfile text>...\n\n" \
         f"Answer: <answer or \"I couldn't find the answer to that question in your files\" or \"That's not a valid question.\">\n\n" \
         f"Question: {question}\n\n" \
         f"Files:\n{files_string}\n" \
         f"Answer:"
```
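The retrieval-then-prompt flow described above can be sketched end to end. The names here (`search`, `build_prompt`, the sample files) are illustrative, not from the original system; a real `search` would embed the query and run a nearest-neighbour lookup over stored chunk vectors.

```python
def search(query: str, k: int) -> list[tuple[str, str]]:
    # Placeholder retrieval: return the top-k (filename, chunk) pairs.
    # A real system would rank stored chunks by embedding similarity.
    store = [
        ("report.pdf", "Revenue grew 12% year over year."),
        ("notes.txt", "Meeting moved to Thursday."),
    ]
    return store[:k]

def build_prompt(question: str, k: int = 2) -> str:
    # Format each retrieved chunk the way the grounding prompt expects:
    # ###\n"filename"\nfile text
    files_string = "\n".join(
        f'###\n"{name}"\n{text}' for name, text in search(question, k)
    )
    return (
        "Given a question, try to answer it using the content of the "
        "file extracts below, and if you cannot answer, say so.\n\n"
        # (full grounding instructions elided for brevity)
        f"Question: {question}\n\n"
        f"Files:\n{files_string}\n"
        "Answer:"
    )

print(build_prompt("How did revenue change?"))
```

The resulting string is what gets sent to the LLM as a single completion request; the model's answer is then shown to the user verbatim.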
There was a real opportunity to monetize this and offer it as a service, given its novelty. But there was a huge gap between being a fun tool and running it in production. Among its problems was retrieving the wrong context, as the semantic search wasn't powerful enough. Over time, better technology made this approach less necessary: context windows have grown, and AIs have become more agentic, able to plan and use tools.