How RAG, protocols, and agents are evolving LLMs

Robert Balaban
Lead Data Engineer

By now, we've all been interacting with Large Language Models (LLMs) on a near-daily basis. It could be in a direct manner, through tools such as ChatGPT, Google’s Gemini, or Anthropic’s Claude. Or it could be in an indirect manner, as with Google’s AI Overviews embedded into many search engine results pages.

As for how you can use LLMs to support your business objectives, the options are many – with new developments appearing every day. Anthropic’s Model Context Protocol recently made headlines, as did Google’s Agent2Agent Protocol. This article explores the rapid yet natural evolution of LLMs, so that your business has all the information it needs to choose the right approach.

Constraints leading to innovation

In November 2022, when ChatGPT was first unveiled to the public, its model was trained on data up to a knowledge cutoff of September 2021. Consequently, the original ChatGPT lacked information or awareness of any events or developments occurring after that date.

This September 2021 cutoff applied uniformly to GPT-3.5, which powered ChatGPT at its launch, as well as to the initial iterations of GPT-4. While OpenAI subsequently released updated versions with more recent knowledge cutoffs, the inaugural public release was constrained to a fixed snapshot of information.

Of course, the people using ChatGPT wanted more. This technical limitation sparked a significant rethinking of how we keep AI systems in sync with the current world. A few key developments embody this evolution.

Retrieval-Augmented Generation (RAG) 

RAG is the process through which an LLM combines its pre-trained knowledge with real-time or up-to-date information retrieved from external sources.

Here’s how it works:

1. (R)etrieval: When you ask a question, the system first searches (or "retrieves") relevant information from external databases, documents, or the internet. This could be anything from recent news articles to specific datasets.

2. (A)ugmentation: The retrieved information is then fed into the LLM alongside your original query. This gives the model the latest context it needs to provide an accurate and relevant answer.

3. (G)eneration: Finally, the LLM uses both its pre-trained knowledge and the newly retrieved information to craft a suitable response. 
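To make these three steps concrete, here is a minimal, framework-free sketch in Python. The toy keyword retriever and the stubbed generate_answer call are hypothetical stand-ins for a real vector database and a real LLM API:

```python
# Minimal RAG sketch: retrieve -> augment -> generate.
# The retriever and generate_answer() are stand-ins for a real
# vector store and a real LLM API call.

DOCUMENTS = [
    "The support portal moved to a new domain in 2024.",
    "The Q3 release added single sign-on for enterprise customers.",
    "Refunds are processed within five business days.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    # 1. Retrieval: score documents by keyword overlap with the query.
    #    A production system would use embeddings and a vector database.
    terms = set(query.lower().split())
    ranked = sorted(DOCUMENTS, key=lambda d: -len(terms & set(d.lower().split())))
    return ranked[:k]

def augment(query: str, context: list[str]) -> str:
    # 2. Augmentation: prepend the retrieved passages to the user's question.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using this context:\n{joined}\n\nQuestion: {query}"

def generate_answer(prompt: str) -> str:
    # 3. Generation: in practice, send the assembled prompt to your LLM.
    return f"[LLM response to a prompt of {len(prompt)} characters]"

question = "How fast are refunds processed?"
print(generate_answer(augment(question, retrieve(question))))
```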

Various frameworks have been developed to streamline RAG implementations, and while the space is quickly evolving, a few feature-rich, general-purpose frameworks have emerged as leaders. Beyond those, there are countless others which take a more specialized approach to solving RAG. Some notable mentions are dsRAG, which focuses on dense, information-rich documents such as reports, and DeepEval, a framework for evaluating and testing LLM/RAG output, with new capabilities added regularly.

Evolution beyond RAG

While RAG is great for tackling the knowledge cutoff problem, it’s not always enough when you need to pull information from multiple sources at once. Imagine you have a database and an API server – both packed with useful info for your users. How do you get your LLM to tap into both seamlessly?

The answer is Model Context Protocol (MCP), an open protocol introduced by Anthropic. Think of MCP as a gateway that enables LLMs to selectively access and combine information from various sources in a structured way. MCP can also go beyond retrieval and take actions – interacting with and updating those sources.

With MCP, your LLM doesn’t have to rely on just one source or one bespoke retrieval pipeline, which makes it a natural next step beyond RAG for connecting your LLMs with business knowledge.

Advantages of MCP over RAG

1. Multiple sources: one protocol connects the model to databases, APIs, file systems, and other services, rather than a separate retrieval pipeline per source.
2. Structured access: each source describes the data and tools it exposes, so the model can discover and combine them selectively.
3. Actions, not just retrieval: MCP servers can expose operations that update the underlying systems, not only read from them.
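As a concrete illustration, here is a minimal MCP server sketch using the FastMCP helper from the official Python SDK. The orders system behind it is hypothetical, and both tool bodies are stubs:

```python
# Minimal MCP server sketch using the official Python SDK (pip install mcp).
# It exposes one retrieval tool and one action tool over a hypothetical
# orders system; the tool bodies are stubbed for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orders-server")

@mcp.tool()
def get_order_status(order_id: str) -> str:
    """Retrieval: look up the status of an order."""
    # A real server would query your database or API here.
    return f"Order {order_id}: shipped"

@mcp.tool()
def add_order_note(order_id: str, note: str) -> str:
    """Action beyond retrieval: attach a note to an order."""
    # A real server would write to the underlying system here.
    return f"Note added to order {order_id}: {note}"

if __name__ == "__main__":
    mcp.run()  # serves over stdio so an MCP-capable client can connect
```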

The new agency paradigm

While traditional LLM interactions are reactive (you ask, it responds), agents are proactive – they can break down complex tasks, make decisions, and execute multi-step workflows without constant human guidance. Think of agents as having an internal feedback loop over each step, allowing them to autonomously execute actions.
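In rough pseudocode, that feedback loop looks like the sketch below. The plan, act, and summarize functions are hypothetical stand-ins for LLM calls and tool executions:

```python
# Schematic agent loop: the model plans a step, acts, observes the result,
# and feeds it back into the next decision until the goal is met.
# plan(), act(), and summarize() are stand-ins for LLM and tool calls.

def plan(history: list[str]) -> str:
    # A real agent would ask the LLM to choose the next action from its tools.
    return "finish" if len(history) > 3 else f"step-{len(history)}"

def act(action: str) -> str:
    # A real agent would execute a tool call (API request, database query, etc.).
    return f"result of {action}"

def summarize(history: list[str]) -> str:
    # A real agent would have the LLM compose the final answer from the history.
    return " | ".join(history)

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = [f"goal: {goal}"]
    for _ in range(max_steps):
        action = plan(history)            # decide the next step
        if action == "finish":
            break
        observation = act(action)         # execute it
        history.append(f"{action} -> {observation}")  # the feedback loop
    return summarize(history)

print(run_agent("compile a competitor pricing report"))
```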

As with RAG, several frameworks and SDKs have appeared that help with the implementation of agentic systems. Each has its strengths and weaknesses. 

A few of the frameworks we recommend trying out are:

LangChain (Python): Open-source and modular; excels in LLM integration, context/memory handling, and external tool orchestration.

LangGraph (Python, built on LangChain): Graph-based extension of LangChain for stateful, multi-agent workflows, with advanced error recovery and debugging.

CrewAI (Python, LangChain ecosystem): Orchestrates teams of specialized agents; supports both no-code and code-first deployments.

OpenAI Agents SDK (Python): Comprehensive tools for building and deploying AI agents with native GPT integration; streamlined API access and model fine-tuning capabilities.

Microsoft AutoGen (Python): Multi-agent conversations and collaborative problem-solving through configurable agent roles; supports both automated workflows and human-in-the-loop interactions.

Google Agent Development Kit (ADK) (Python, Google Cloud): Creation of intelligent agents with access to Google AI models and services; scalable deployment and integration with Google Cloud infrastructure.

The above are code-first solutions, but there have also been advancements in the low-code/no-code movement to integrate agents seamlessly into workflows.

The progression is clear: 

1. Static knowledge: AI systems relying on pre-existing data, with no real-time updates.
2. Dynamic retrieval (RAG): AI retrieving and leveraging real-time information to support better decision-making.
3. Connected systems (MCP): AI integrating with various business systems to streamline operations and enable seamless connectivity.
4. Autonomous agents: AI making independent decisions and executing tasks without human intervention.

Each step builds upon the last, creating increasingly capable AI systems that don't just understand your business but actively participate in running it.

Future trends that will influence LLMs

While the LLM space is evolving at a breakneck pace, we can speculate on a few trends that we believe have staying power. 

  1. Smarter RAG: Advanced architectures and techniques to improve the accuracy of RAG, such as retrieval with feedback loops, iterative retrieval, and semantic chunking (sketched in the first example after this list). These innovations will help address fundamental limitations of traditional RAG, such as context loss, poor relevance, and insufficient handling of multi-step queries.
  2. Testing and evaluation: Expect to see a surge in frameworks like DeepEval that will go beyond static benchmarking and enable dynamic, continuous evaluation for production-ready AI. 
  3. Guardrailing RAG: As RAG systems mature, guardrails will become foundational. Data privacy and leakage top the list of reasons companies hesitate to deploy such systems to production. Already, there are techniques and tools to mitigate some of those risks, such as:
    1. Microsoft Presidio: helps with detecting, masking, and anonymizing PII data and more (a masking sketch follows this list).
    2. Guardrails AI: a comprehensive suite of tools that helps with data leak prevention, hallucination detection, and more.

Similar tools and best practices will continue to gain traction and evolve to offer real-time risk mitigation, adaptive privacy controls, and explainability mechanisms for regulated industries.

  4. Enhanced communication protocols: Protocols like COMPASS, FIPA, and Google’s Agent2Agent Protocol will evolve into orchestration backbones for multi-agent systems, enabling autonomous collaboration at scale with minimal human intervention.
  5. Bias mitigation frameworks: These aim to identify and reduce biases in AI systems to ensure equitable outcomes. New approaches will embed fairness and compliance directly into model training and inference pipelines, making it easier to build responsible AI by default.
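As promised above, here is a minimal sketch of semantic chunking: splitting text at sentence boundaries and merging neighbors whose embeddings are similar, so chunks follow meaning rather than a fixed length. The embed function is a toy stand-in for a real sentence-embedding model:

```python
# Semantic chunking sketch: split on sentences, then merge neighbors whose
# embeddings are similar, so chunks track topics instead of character counts.
import math
import re

def embed(sentence: str) -> list[float]:
    # Toy letter-frequency "embedding"; real systems would call an
    # actual sentence-embedding model here.
    vec = [0.0] * 26
    for ch in sentence.lower():
        if ch.isalpha():
            vec[ord(ch) - 97] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def semantic_chunks(text: str, threshold: float = 0.8) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = [sentences[0]]
    for sent in sentences[1:]:
        # Merge into the previous chunk if the topic appears to continue.
        if cosine(embed(chunks[-1]), embed(sent)) >= threshold:
            chunks[-1] += " " + sent
        else:
            chunks.append(sent)
    return chunks
```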
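And on the guardrailing side, here is a minimal sketch of PII masking with Microsoft Presidio before text ever reaches an LLM, assuming the presidio-analyzer and presidio-anonymizer packages are installed:

```python
# PII masking sketch with Microsoft Presidio.
# Requires: pip install presidio-analyzer presidio-anonymizer
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

text = "Contact Jane Doe at jane.doe@example.com about invoice 4521."

# Detect PII entities (names, email addresses, phone numbers, ...).
results = analyzer.analyze(text=text, language="en")

# Replace each detected span with a placeholder before sending downstream.
masked = anonymizer.anonymize(text=text, analyzer_results=results)
print(masked.text)  # e.g. "Contact <PERSON> at <EMAIL_ADDRESS> about invoice 4521."
```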

We keep up with the latest 

Count on change to remain the constant for LLMs. The evolution is happening quickly, so quickly that it can be difficult for businesses to keep up. A recent article from The Economist found that 42% of companies are abandoning most of their Generative AI projects – up from 17% just last year.

Transcenda can help your team tap into these powerful tools and keep your AI projects on track, so you can elevate your business and gain a competitive advantage in your sector. Connect with us.

Robert Balaban is a Lead Data Engineer at Transcenda. With over 10 years of experience working across various startups and Fortune 500 companies, Robert focuses on transforming raw information into actionable insights. He specializes in real-time data architectures and data warehouse and lakehouse design.
