By now, we've all been interacting with Large Language Models (LLMs) on a near-daily basis. It could be in a direct manner, through options such as ChatGPT, Google's Gemini, or Claude. Or it could be in an indirect manner, as with Google's AI Overviews embedded in many search engine results pages.
As for how you can use LLMs to support your business objectives, the options are many – with new ripples appearing every day. Model Context Protocol recently made headlines, as did Google's Agent2Agent Protocol. This article explores the rapid yet natural evolution of LLMs, so that your business has all the information it needs to choose the right approach.
In November 2022, when ChatGPT was first unveiled to the public, its model was trained on data up to a knowledge cutoff of September 2021. Consequently, the original ChatGPT lacked information or awareness of any events or developments occurring after that date.
This September 2021 cutoff applied uniformly to both GPT-3.5 and the initial iterations of GPT-4, which powered ChatGPT at its launch. While OpenAI subsequently released updated versions with more recent knowledge cutoffs, the inaugural public release could draw only on what its training data contained as of that date.
Of course, the people using ChatGPT wanted more. This technical limitation sparked a significant rethinking of how we keep AI systems in sync with the current world. A few key developments embody this evolution.
Retrieval-Augmented Generation (RAG) is the process through which an LLM combines its pre-trained knowledge with real-time or up-to-date information retrieved from external sources.
Here’s how it works (see the sketch after these steps):
1. (R)etrieval: When you ask a question, the system first searches (or "retrieves") relevant information from external databases, documents, or the internet. This could be anything from recent news articles to specific datasets.
2. (A)ugmentation: The retrieved information is then fed into the LLM alongside your original query. This gives the model the latest context it needs to provide an accurate and relevant answer.
3. (G)eneration: Finally, the LLM uses both its pre-trained knowledge and the newly retrieved information to craft a suitable response.
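To make those three steps concrete, here's a minimal sketch in Python. The toy word-overlap scorer stands in for a real vector search, and the document set, prompt wording, and `gpt-4o-mini` model name are illustrative assumptions rather than a recommended setup (it assumes the `openai` package and an `OPENAI_API_KEY` in your environment):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

DOCUMENTS = [
    "Q3 revenue grew 12% year over year, driven by the EMEA region.",
    "The enterprise support SLA guarantees a 4-hour first response time.",
    "Our refund policy allows returns within 30 days of purchase.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """(R)etrieval: rank documents by naive word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def answer(query: str) -> str:
    # (A)ugmentation: prepend the retrieved snippets to the user's question.
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    # (G)eneration: the model combines its pre-trained knowledge with the context.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name - substitute your own
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(answer("What is the enterprise support SLA?"))
```

In production, the retrieval step is usually backed by an embedding model and a vector database, but the shape of the loop – retrieve, augment, generate – stays the same.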
Various frameworks have been developed to streamline RAG implementations, and while the space is quickly evolving, two leading frameworks for building RAG systems have emerged:

- LangChain – a general-purpose framework for composing LLM pipelines, with a broad ecosystem of integrations for retrievers, vector stores, and model providers.
- LlamaIndex – a data framework focused on ingesting, indexing, and querying your own data for LLM applications.
While these two frameworks are the most feature-rich, there are countless others that take a more specialized approach to RAG. Notable mentions include dsRAG, which focuses on dense, information-rich reports, and DeepEval, a framework for evaluating and testing LLM and RAG output, with new capabilities added regularly.
While RAG is great for tackling the knowledge cutoff problem, it’s not always enough when you need to pull information from multiple sources at once. Imagine you have a database and an API server – both packed with useful info for your users. How do you get your LLM to tap into both seamlessly?
The answer is Model Context Protocol (MCP). Think of MCP as the gateway that enables LLMs to selectively access and combine information from various sources in a structured way. MCP also goes beyond retrieval: it can take actions, interacting with and updating those sources.
With MCP, your LLM doesn’t have to rely on just one source; it can pull from multiple places and run actions and updates against those sources, yielding more comprehensive and up-to-date answers. It’s a natural next step beyond RAG for connecting your LLMs with business knowledge.
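To illustrate, here's a sketch of a small MCP server built with FastMCP from the official Python SDK (`pip install mcp`). The tool names and hard-coded return values are hypothetical placeholders for a real database query, a real API call, and a real ticketing action:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("business-data")

@mcp.tool()
def get_customer(customer_id: str) -> dict:
    """Look up a customer record (placeholder for a real database query)."""
    return {"id": customer_id, "name": "Acme Corp", "tier": "enterprise"}

@mcp.tool()
def get_order_status(order_id: str) -> str:
    """Check an order's status (placeholder for a real API call)."""
    return f"Order {order_id}: shipped"

@mcp.tool()
def update_ticket(ticket_id: str, status: str) -> str:
    """An action, not just retrieval: update a support ticket's status."""
    return f"Ticket {ticket_id} set to '{status}'"

if __name__ == "__main__":
    mcp.run()  # serves over stdio so an MCP-capable client can connect
```

An MCP-capable client (such as an LLM-powered assistant) can connect to this server, discover the tools it exposes, and call them as needed – retrieval and actions through one structured protocol.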
While traditional LLM interactions are reactive (you ask, it responds), agents are proactive – they can break down complex tasks, make decisions, and execute multi-step workflows without constant human guidance. Think of agents as having an internal feedback loop over each step, allowing them to autonomously execute actions.
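Here's a bare-bones sketch of that loop in Python. The `llm()` function is a hypothetical stand-in that replays a canned script so the example runs end to end; a real agent would call an actual model and real tools:

```python
TOOLS = {
    "search": lambda q: f"Found: the 2024 revenue figure was $12M (query: '{q}')",
    "math": lambda expr: str(eval(expr)),  # demo only; never eval untrusted input
}

# Canned decisions standing in for real model output.
SCRIPT = [
    {"tool": "search", "input": "2024 revenue"},
    {"tool": "math", "input": "12 * 1.15"},
    {"answer": "Projected 2025 revenue at 15% growth: about $13.8M."},
]

def llm(messages: list[dict]) -> dict:
    """Hypothetical model call: returns either a tool request or a final answer."""
    return SCRIPT[sum(m["role"] == "assistant" for m in messages)]

def run_agent(task: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):  # the internal feedback loop
        decision = llm(messages)
        messages.append({"role": "assistant", "content": str(decision)})
        if "answer" in decision:  # the agent decides the task is complete
            return decision["answer"]
        result = TOOLS[decision["tool"]](decision["input"])    # execute the chosen step
        messages.append({"role": "tool", "content": result})   # observe, then loop again
    return "Stopped: step budget exhausted."

print(run_agent("Project next year's revenue assuming 15% growth."))
```

Agent frameworks handle the messy parts of this loop for you: tool schemas, retries, memory, and guardrails.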
As with RAG, several frameworks and SDKs have appeared that help with the implementation of agentic systems. Each has its strengths and weaknesses.
A few of the frameworks we recommend trying out are:
The above are code-first solutions, but the low-code/no-code movement has also made strides in integrating agents seamlessly into workflows. Some of these no-code solutions are:
The progression is clear:

1. LLMs answer from static, pre-trained knowledge.
2. RAG grounds those answers in fresh, external information.
3. MCP connects models to multiple live sources and lets them take action.
4. Agents chain these capabilities into autonomous, multi-step workflows.
Each step builds upon the last, creating increasingly capable AI systems that don't just understand your business but actively participate in running it.
While the LLM space is evolving at a breakneck pace, we can speculate on a few trends that we believe have staying power.
Safety and governance tools and best practices will continue to gain traction, evolving to offer real-time risk mitigation, adaptive privacy controls, and explainability mechanisms for regulated industries.
Count on change to remain the constant for LLMs. The evolution is happening quickly, so quickly that it can be difficult for businesses to keep up. A recent article from The Economist found that 42% of companies are abandoning most of their Generative AI projects – up from 17% just last year.
Transcenda can help your team tap into these powerful tools and keep your AI projects on track, elevating your business and gaining a competitive advantage in your sector. Connect with us.