From Models to Systems: The Next Evolution of AI Lies in Vertical Agentic Pipelines
- Author: Baran Cezayirli, Technologist
With 20+ years in tech, product innovation, and system design, I scale startups and build robust software, always pushing the boundaries of possibility.
- The Myth of the Prompt
- The Rise of Vertical Agentic Systems
- Context Engineering: The New Data Engineering
- Pipelines, Not Prompts
- Humans in the Loop: From Oversight to Co-Creation
- Defining the Future Pipeline
- Conclusion: Augmentation, Not Replacement
In the early days of machine learning, progress was primarily measured by the performance of individual models on benchmarks. Achievements included classifiers that surpassed state-of-the-art results and regression models that improved R² scores—these were seen as significant milestones. However, as the field evolved, practitioners came to a crucial realization: real-world problems are not solved by single models but by systems composed of multiple models, data pipelines, and feedback loops.
This insight led to the development of ML pipelines—coordinated sequences that involve data ingestion, feature engineering, model training, validation, deployment, and monitoring. The focus shifted from simply finding "the best model" to designing a comprehensive system that effectively connects data, models, and people to create value.
Now, as we enter the era of Large Language Models (LLMs) and agentic AI, we see history repeating itself.
The Myth of the Prompt
The enthusiasm surrounding LLMs has been nothing short of remarkable — and justifiably so. Today, with a single well-crafted prompt, we can generate intricate code, summarize lengthy reports, draft persuasive marketing copy, or engage in complex strategic reasoning. However, beneath this surface excitement lies a crucial reality: a single prompt issued to an LLM does not equate to a fully functional system.
While prompts can simulate intelligence, they fall short of delivering sustained, long-term value. A prompt can inspire creativity and spark new ideas, but it cannot reliably execute an end-to-end task. Similarly, a single prompt can initiate a workflow, but it cannot manage or sustain that workflow to completion.
The past decade of progress in machine learning has taught a vital lesson: scaling models is not the same as scaling solutions. Clever prompting of a sophisticated model will never remove the need for thoughtfully designed, domain-specific, multi-model systems: systems that integrate business context, relevant data, and human oversight to drive meaningful outcomes and adapt to real-world complexity.
In essence, while LLMs represent a significant leap forward in artificial intelligence capabilities, leveraging them effectively requires a more holistic approach that acknowledges the importance of structured systems and human involvement in executing complex tasks.
The Rise of Vertical Agentic Systems
The next frontier of AI lies in what I call Vertical Agentic Systems—purpose-built AI pipelines that combine multiple specialized models, domain-specific data, and human expertise to address specific business challenges.
These systems are considered vertical because they focus on a single domain, such as energy disaggregation, financial analysis, legal research, or healthcare operations. They are agentic because they do more than process data; they take actions, make recommendations, and interact with humans and other systems in real time.
Unlike generic chatbots or prompt-based assistants, vertical agentic systems feature:
- Context Engineering Pipelines: Analogous to data engineering in machine learning, these pipelines collect, structure, and deliver the right information to models in the right format, ensuring the quality of context, not just the quality of data.
- Specialized Models: Not every task requires a 70-billion-parameter language model. Smaller models trained for specific subtasks, such as classification, prediction, entity recognition, and retrieval, can operate efficiently and accurately.
- Business-Oriented Logic: The pipeline must encode domain workflows, decision trees, and constraints so that the system produces usable outputs, not just elegant answers.
- Human-in-the-Loop Feedback: No AI system should operate in isolation. Humans should act not merely as failsafes but as co-pilots, contributing domain insight, feedback, and ethical judgment.
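As a rough illustration, these four ingredients can be wired together behind a single, inspectable interface. Everything here (`AgenticStep` and the lambda stand-ins) is a hypothetical sketch, not a real framework:

```python
from dataclasses import dataclass
from typing import Callable

# Sketch: each ingredient is a plain callable so every stage stays
# replaceable and the overall flow stays easy to audit.

@dataclass
class AgenticStep:
    build_context: Callable[[str], dict]   # context engineering pipeline
    model: Callable[[dict], str]           # specialized model
    apply_rules: Callable[[str], str]      # business-oriented logic
    human_review: Callable[[str], str]     # human-in-the-loop feedback

    def run(self, request: str) -> str:
        ctx = self.build_context(request)
        draft = self.model(ctx)
        checked = self.apply_rules(draft)
        return self.human_review(checked)

# Toy stand-ins, just to show the flow end to end.
step = AgenticStep(
    build_context=lambda req: {"request": req, "docs": ["policy.md"]},
    model=lambda ctx: f"draft answer for: {ctx['request']}",
    apply_rules=lambda d: d + " [compliance-checked]",
    human_review=lambda d: d + " [approved]",
)
print(step.run("refund my last invoice"))
```

The point of the shape, not the toy bodies: the model is one stage among four, and the human review sits inside the pipeline rather than bolted on afterwards.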
This approach transforms AI from being an oracle into an organism—an integrated system that learns, adapts, and collaborates.
Context Engineering: The New Data Engineering
Data engineering focuses on collecting and cleaning information so that raw data is usable and reliable. Context engineering goes further: it curates and structures knowledge so that data becomes comprehensible and applicable.
In an agentic system, where autonomous agents interact and operate, the quality of the context layer determines the quality of the intelligence the system exhibits. This layer serves as the foundational knowledge base, telling the model what it knows, what it assumes, and what it may infer. Effective context engineering combines retrieval pipelines that surface relevant data, knowledge graphs that interconnect complex information, memory stores that retain critical insights, and business-rule embeddings that encode organizational knowledge. Together, these components keep models supplied with precise, relevant, and up-to-date information, enabling them to make informed decisions.
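A minimal sketch of such a context-assembly step, with in-memory toys standing in for the retriever, memory store, and rule base (all names here are hypothetical):

```python
# Hypothetical context-engineering step: retrieval, memory, and
# business rules are merged into one structured context object
# that is handed to the model.

def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    # Naive keyword-overlap ranking stands in for a real retriever.
    scored = sorted(
        corpus.items(),
        key=lambda kv: -sum(w in kv[1].lower() for w in query.lower().split()),
    )
    return [text for _, text in scored[:k]]

def build_context(query: str, corpus: dict[str, str],
                  memory: list[str], rules: list[str]) -> dict:
    return {
        "query": query,
        "evidence": retrieve(query, corpus),  # retrieval pipeline
        "memory": memory[-3:],                # recent interaction memory
        "constraints": rules,                 # embedded business rules
    }

corpus = {
    "billing": "Refunds are processed within 5 business days.",
    "tech": "Restart the device before opening a ticket.",
}
ctx = build_context(
    "how long does a refund take",
    corpus,
    memory=["user asked about invoices yesterday"],
    rules=["never promise same-day refunds"],
)
print(ctx["evidence"][0])
```

The model never sees raw data; it sees this curated object, which is exactly where "context quality" gets engineered.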
Just as data engineering laid the groundwork for scalable machine learning, context engineering will do the same for scalable reasoning systems. By improving the quality and accessibility of contextual knowledge, it not only lifts model performance but also lets models reason more effectively, yielding deeper insights and more accurate outcomes.
Pipelines, Not Prompts
Imagine a customer support automation pipeline. Instead of relying on a single LLM to handle everything, a well-designed system could involve several specialized components:
- A classifier to identify customer intent (such as billing, technical issues, or product feedback).
- A retrieval model to fetch relevant documents or past tickets.
- A specialized summarizer to condense lengthy conversations.
- A policy model to ensure compliance and manage escalation procedures.
- Finally, an LLM-based agent to craft a human-like, empathetic response.
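The five stages above can be sketched as composable functions. Every model here is a toy stand-in (the names, knowledge base, and escalation rule are all hypothetical); in practice each stage would be a small trained model or an LLM call behind the same interface:

```python
# Toy stand-ins for the five specialized stages of the support pipeline.

def classify_intent(msg: str) -> str:            # 1. intent classifier
    m = msg.lower()
    if "charge" in m or "invoice" in m:
        return "billing"
    if "crash" in m or "error" in m:
        return "technical"
    return "feedback"

def retrieve_tickets(intent: str) -> list[str]:  # 2. retrieval model
    kb = {"billing": ["Refunds take 5 days."], "technical": ["Try reinstalling."]}
    return kb.get(intent, [])

def summarize(history: list[str]) -> str:        # 3. summarizer
    return " / ".join(history[-2:])  # keep only the recent turns

def needs_escalation(intent: str, msg: str) -> bool:  # 4. policy model
    return intent == "billing" and "dispute" in msg.lower()

def draft_reply(intent: str, docs: list[str], summary: str) -> str:  # 5. agent
    context = docs[0] if docs else "no reference found"
    return f"[{intent}] Thanks for reaching out. {context} (re: {summary})"

def support_pipeline(message: str, history: list[str]) -> str:
    intent = classify_intent(message)
    docs = retrieve_tickets(intent)
    summary = summarize(history + [message])
    if needs_escalation(intent, message):
        return "[escalated] handed to a human agent"
    # The drafted reply is still reviewed by a human before sending.
    return draft_reply(intent, docs, summary)

print(support_pipeline("I was double charged", ["hello"]))
```

Because each stage has a narrow contract, any one of them can be swapped, evaluated, or scaled independently of the rest.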
Each of these models is smaller, faster, and optimized for its specific subtask, allowing them to work together more effectively than a monolithic LLM prompt. This approach enhances both reliability and cost efficiency. Most importantly, a support agent remains involved in the process, reviewing and refining the system's suggestions to ensure accuracy.
This design is not automation aimed at replacement; it's automation designed for amplification.
Humans in the Loop: From Oversight to Co-Creation
The role of humans in AI systems must transition from a passive "monitor" to an active "collaborator." In the context of vertical agentic systems, where specialized knowledge is paramount, domain experts assume a vital role in several key areas:
Firstly, they are responsible for defining the system's ontology, which encompasses the fundamental concepts that are deemed significant within the system. This process involves clarifying how decisions are generated, what metrics are used to evaluate outcomes, and identifying the trade-offs that must be navigated in various scenarios. By establishing a comprehensive ontology, experts ensure that the AI operates within the relevant parameters that reflect real-world complexities.
Secondly, domain experts shape and refine the feedback loop that is essential for the system's performance. This process involves tasks such as accurately labeling data, scoring the outputs generated by the AI, and continuously refining the decision-making rules based on observed outcomes. Their expertise allows for the identification of anomalies or patterns that the system may overlook, ensuring that the AI remains aligned with the intended goals and expectations.
Additionally, these experts play a critical role in guiding the continuous learning process of the AI. As business demands evolve and new challenges arise, human collaborators help the AI adapt by integrating fresh insights and perspectives. This iterative process not only enhances the robustness of the AI's capabilities but also ensures that the system remains relevant and effective in a dynamic environment.
This human-in-the-loop structure is vital not only for safeguarding the accuracy and ethical considerations of AI systems, but it also actively fosters innovation. By aligning the system's intelligence with human expertise, we can unlock new possibilities, enhance decision-making, and create AI technologies that are not only powerful but also socially responsible and attuned to real-world needs.
Defining the Future Pipeline
Building a vertical agentic system is a complex design challenge: it requires drawing the boundaries between human input and machine execution, and between the individual models and the overarching business logic they support. The key is striking the right balance between automation, where tasks are fully delegated to machines, and augmentation, where technology amplifies human capabilities.
The future of artificial intelligence extends beyond the development of general-purpose models that aspire to encompass a wide array of functions. Instead, it focuses on the strategic architecture of specialized pipelines that seamlessly integrate multiple elements to achieve superior outcomes. This approach hinges on four key components:
- Specialization: By employing narrowly defined models that excel at specific tasks, businesses can ensure greater accuracy and efficiency. These specialized models bring expertise to particular domains, enhancing the quality of outputs tailored for unique applications.
- Integration: This involves the coordinated orchestration of different components within the system, ensuring that data flows smoothly and that each part contributes to a cohesive whole. Effective integration is crucial for maximizing the capabilities of the individual models and fostering collaborative problem-solving.
- Feedback: Continuous improvement is facilitated through ongoing human interaction and feedback loops. This collaboration ensures that AI systems evolve based on real-world applications and experiences, allowing for refinement over time. Feedback mechanisms help identify areas for improvement and adapt to changing conditions or user needs.
- Contextualization: It is critical to ground AI applications in real-world data and knowledge, which provides the necessary context for decision-making. By contextualizing the outputs of AI systems, organizations can ensure that responses are relevant and applicable, resulting in more meaningful interactions and outcomes.
Ultimately, this framework shifts the focus from merely assessing "what the AI can do" to evaluating "what the AI system delivers." By prioritizing system performance over individual model performance, organizations can create more robust solutions that consistently meet business objectives and enhance overall operational effectiveness.
Conclusion: Augmentation, Not Replacement
We find ourselves at a crossroads similar to the one machine learning faced a decade ago, when researchers focused on improving model performance while practitioners quietly built systems that solved real-world problems.
The lesson remains relevant: progress does not come from simply increasing intelligence; it comes from designing systems that effectively integrate that intelligence.
Vertical agentic systems, which are grounded in context, powered by specialized models, and guided by human expertise, are not just the next step; they are essential for a sustainable and effective future in AI.
The objective is not to replace human skills but to augment them—creating systems that enhance human capabilities instead of diminishing them.
Ultimately, the most potent form of intelligence is not artificial; it is collaborative.