The Ever-Growing Power of Small Models


Recent AI media coverage has followed a familiar pattern: a massive new model is released, making the rounds with beta testers and eventually the public, but it’s barely a month or two before rumors start to swell about an even bigger one supposedly being trained to replace it. Yet another eye-watering boost in parameter count, more data than ever before, and, of course, promises of earth-shattering capabilities on the other side. It’s hard not to get caught up in the hype, even as an expert, and it’s easy to believe this is the reality of life on the cutting edge. Scale at any cost, and data wherever one can find it. This trend was initially motivated by OpenAI’s first scaling-law paper, whose findings were later refined by researchers at DeepMind in what is commonly known as the Chinchilla scaling law. They’re called large language models for a reason, right?

Not so fast. A term like "large" is relative, after all, and for more and more applications—especially in the enterprise, where cost, control, and trust matter more than anywhere else—eye-watering parameter counts aren't as important as the hype and headlines would have you believe. In fact, for many of our customers, excessive scale sometimes does more harm than good.

It’s an insight we’re already applying here at Salesforce to great effect. When we released CodeGen in 2022, for example, it was one of the world’s first text-to-code models. Of course, all that power—and it takes a lot to translate natural language into code that executes at all, let alone reliably—came at a steep cost. Its latest release, CodeGen 2.5, however, used training techniques like multi-epoch training and flash attention to compete with larger models at half the size. It’s part of our larger sustainable AI initiative, discussed in more detail here.

Allow me to start by dispelling the misconception that increasing parameter count is the only, or even the best, way to improve performance. While there’s no doubt that scaling can be a powerful technique—though warnings of diminishing returns have circulated just as long, even if the pattern has held thus far—it’s important to remember that effective AI deployments come in countless forms, and parameter count is just one of many variables that determine how well they solve problems in the real world. So let’s talk about why thinking small may be your best bet for success in deploying enterprise AI.

Cost to Serve

The first issue worth addressing is, of course, the biggest hurdle in any enterprise AI application: cost to serve. AI is unusually compute-intensive regardless of how it’s deployed, and the relationship between model size and expense is a clear one. As parameter counts grow, training and inference alike demand more silicon, more power, and more downstream costs like maintenance. To put this in perspective, consider that each of a model’s parameters—numbering in the millions and sometimes billions—contributes a small but tangible compute cost, measured in floating-point operations, for every fragment of input data (known as a token) the model processes. Summed across a full request, this is what makes even conceptually simple tasks like answering a question so expensive. It’s a cost measured in speed as well, with training times ballooning and inference slowing. For organizations aiming to serve large communities of users—sometimes an entire planet of customers—these are significant drawbacks. The fact that smaller models can cut both cost and latency, sometimes significantly, makes them powerful alternatives.
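To make the arithmetic concrete, here’s a back-of-the-envelope sketch in Python. It uses the common rule of thumb that a forward pass costs roughly two floating-point operations per parameter per token; the model sizes and token count below are illustrative assumptions, not measurements of any particular system.

```python
def inference_flops(n_params: float, n_tokens: float) -> float:
    """Approximate forward-pass cost: ~2 FLOPs per parameter per token."""
    return 2 * n_params * n_tokens

# Illustrative comparison: a 7B-parameter model vs. a 175B-parameter one,
# each serving a request that consumes 1,000 tokens.
for n_params in (7e9, 175e9):
    flops = inference_flops(n_params, 1_000)
    print(f"{n_params / 1e9:>5.0f}B params: ~{flops:.2e} FLOPs per 1,000-token request")
```

Under this approximation, a model 25x smaller is roughly 25x cheaper per request, before counting any additional savings in latency or hardware utilization, and that multiplier applies to every one of the millions of requests a planet-scale service handles.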

Performance

Of course, cost savings don’t matter much if the resulting deployment can’t offer competitive performance. But the assumption that smaller models must perform worse than their bigger siblings is, thankfully, simply wrong. First, it’s important to understand that model performance doesn’t exist along a single dimension; for instance, a model’s ability to solve problems within a single domain—say, answering IT questions or providing customer service—is largely independent of its ability to generalize smoothly across multiple, unrelated domains. Small models can excel at the former by focusing their depth on a smaller set of tasks, even if they’re admittedly ill-equipped to compete on the latter—there’s no substitute for hundreds of billions of parameters when you want to be everything to everyone, after all. But in the enterprise, that generalist ability is almost entirely moot.

Consider the headlines that have captured the public’s interest over the last couple of years. Many of them concern the seemingly magical ability of extremely large models to answer questions on just about every topic imaginable, and even cross domains in a single prompt, as in the ever-popular pastiche examples seen across social media: a question about plumbing answered in the style of Shakespeare, a summary of the War of 1812 rendered as Jay-Z lyrics, and so on. These make for a fun party trick, and have done wonders to popularize the power of AI. But they’re an excess few enterprise users will ever need, in a setting where entertainment and novelty matter a lot less than productivity.

Conversely, for companies looking to build models focused on a well-defined domain (knowledge retrieval, technical support, answering customer questions, and the like), small models are often neck and neck with large ones. In fact, with the right strategy, they can outperform them altogether. A number of models from the open-source world, including our own XGen 7B, a model trained on longer sequences of data (useful for tasks like summarizing large volumes of text, writing code, and predicting protein sequences), consistently exceed the performance of larger models by leveraging better pretraining and data-curation strategies.

Fine-Tuning, Data Curation, and Ownership

Smaller language models present a compelling advantage in training and fine-tuning for specific tasks. Unlike their larger counterparts, these models require significantly less computational power and data to reach optimal performance. This reduced scale translates into a more streamlined and efficient training process, the ability to iterate and test faster, and the possibility of more extensive validation. Moreover, smaller models can be fine-tuned more effectively to specialize in particular domains or tasks. Their compact nature allows for a more focused learning process, enabling them to adapt quickly and accurately to the nuances of specific datasets or applications. This efficiency in training and fine-tuning not only saves time and resources but also results in models that are more adept at handling targeted tasks, making them a practical choice for enterprises seeking specialized AI capabilities.
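As a concrete illustration, here’s a minimal sketch of one popular approach: parameter-efficient fine-tuning with LoRA via the Hugging Face peft library. The checkpoint name, target modules, and hyperparameters below are illustrative assumptions, not a prescribed recipe.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative checkpoint: any small causal LM you have access to works here.
base_model = AutoModelForCausalLM.from_pretrained("Salesforce/xgen-7b-8k-base")

# LoRA trains small low-rank adapter matrices instead of all of the base
# weights, which is part of what makes iterating on a domain-specific
# dataset fast and cheap.
lora_config = LoraConfig(
    r=8,                                  # adapter rank: capacity vs. cost
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # a common choice for attention layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically a small fraction of total weights
```

From here, the adapted model plugs into a standard training loop over a curated, domain-specific dataset; because only the adapter weights are updated, iteration cycles shrink accordingly.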

They can also encourage developers to focus on smaller, more curated datasets that describe unique problems in clear, understandable terms. Small models are inherently suited to smaller datasets, which makes such training material not just easier and more cost-effective to assemble, but considerably safer. Organizations can focus on the data they already know and trust, and, most importantly, own—helping avoid the numerous pitfalls of copyright, toxicity, and unpredictability that so often undercut a generative AI deployment’s reliability. And because these datasets are so tightly focused on a domain-specific task, they can train powerful, purpose-built models that do things no general-purpose alternative can touch.

Scaling in Other Ways

While we're on the topic of performance, I want to touch on orchestration, an issue I've grown more and more interested in over the last year. Orchestration refers to the connection of multiple models into a single deployment, analogous to multiple human workers coming together as a team. Even small models can do amazing things when composed with one another, especially when each is geared towards a specific strength that the others might lack: one model to focus on information retrieval, one to focus on user interactions, another to focus on the generation of content and reports, and so on. In fact, smaller models are arguably a more natural choice in such cases, as their specialized focus makes their role in the larger whole easier to define and validate. In other words, orchestration means small models can be combined to solve ever-bigger problems, all while retaining the virtues of their small size—each can still be cleanly trained, tuned, and understood with an ease large models can't touch. And it's yet another example of why a simple parameter count can often be misleading.
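To make the pattern concrete, here’s a minimal sketch of an orchestrator in Python. The specialist functions are placeholders standing in for calls to real, separately trained small models, and the keyword-based router is deliberately simplistic; a production system might use a dedicated router model instead.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Specialist:
    """One small model with a narrow, well-defined role."""
    name: str
    handles: Callable[[str], bool]  # can this specialist take the request?
    run: Callable[[str], str]       # stand-in for invoking the underlying model

# Placeholder model calls; a real deployment would query three separately
# trained (and separately validated) small models here.
def retrieval_model(query: str) -> str:
    return f"[retrieval] top documents for: {query}"

def support_model(query: str) -> str:
    return f"[support] troubleshooting steps for: {query}"

def report_model(query: str) -> str:
    return f"[reports] generated summary for: {query}"

SPECIALISTS = [
    Specialist("retrieval", lambda q: "find" in q or "search" in q, retrieval_model),
    Specialist("support", lambda q: "error" in q or "broken" in q, support_model),
    Specialist("reports", lambda q: "summarize" in q or "report" in q, report_model),
]

def orchestrate(query: str) -> str:
    """Route each request to the first specialist whose domain matches."""
    for specialist in SPECIALISTS:
        if specialist.handles(query.lower()):
            return specialist.run(query)
    return support_model(query)  # naive fallback; a router model could decide instead

print(orchestrate("Summarize last quarter's support tickets"))
```

Because each specialist’s contract is narrow, each model can be trained, evaluated, and swapped out independently, which is exactly the property that makes small models a natural fit for this pattern.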

A Marketplace of Custom Models

In fact, as I've discussed previously, small models and orchestrated solutions that leverage them might be so well-suited to specific tasks, with such clear domains and simple interfaces, that their applicability extends beyond a single organization. It's not hard to imagine entire marketplaces forming around this idea, as small, useful models proliferate across industries. Over time, I can see such model marketplaces transforming enterprise AI in the same way app stores once transformed our relationship with mobile devices. More and more, I expect to see such models leveraged by users with little to no AI expertise of their own, content to simply plug and play.

Environmental Impact

On a related note, as industries all over the world face increasing pressure to curb emissions, compute costs are facing heavy scrutiny. This is an especially awkward truth given the meteoric rise of interest in enterprise AI and its often significant silicon requirements. For companies that want to explore the future of this technology while still helping to make the world a greener, cleaner place, small models can mean the difference between an AI strategy that works and one that runs afoul of regulators.

As mentioned above, sustainability is part of our mandate even at the research level, and the results speak for themselves. When combined with efficient hardware architecture and a focus on low-carbon data centers, our small model strategy has helped reduce our AI-related emissions by 68.8%—avoiding 105 tons of carbon dioxide equivalents compared to the global average.

Trust

Finally, one of the subtlest but most important benefits of small models happens to coincide with our central value here at Salesforce—trust. But trust is a goal; achieving it in practice requires tangible measures, the first and foremost of which is transparency. Here, small models truly shine. As mentioned above, their reduced size means their training data can be evaluated more clearly and exhaustively, making it easier than ever to ensure that desirable content, ideas, and patterns are fed into the model—going a long way towards improving the quality and safety of what comes out. And because their parameter counts are lower, the potential for unanticipated capabilities or behaviors to emerge is reduced as well.

Smaller datasets also make it easier and more efficient to document the training process that went into the model—an increasingly important transparency measure as the role of LLMs grows to include mission-critical applications that don’t just require reliability, but accountability as well, both in advance and in hindsight. It’s how we’re ensuring our models align with the expectations of groups like Stanford’s Center for Research on Foundation Models, whose recently published Foundation Model Transparency Index has helped bring the question of model transparency to the forefront of the conversation.

Measures like fine-tuning can be more effective as well, given that there’s a smaller neural network to influence, boosting efforts to control output and encourage the model to follow rules. In this sense, small models are more analogous to a highly focused expert trained in a single task or set of tasks than to a generalist fielding requests from all directions. They can play a more disciplined, predictable role, working within a space more understandable to developers and administrators. As enterprise AI grows to support more and more of a company’s operations—not to mention its reputation—the value of this virtue can’t be overstated.

Conclusion

I believe generative AI is entering a smarter second phase, one commonly seen in the evolution of technology: after an explosive emergence in which capabilities evolve fast and the shortest path to success is favored, we’re reevaluating our strategy in favor of something more nuanced. The early days of LLMs—a funny phrase to use given that all of this is still, undeniably, early—have shown us just how powerful this technology can be. But the time has come to work out new paths to that power, with less spending, more efficiency, and an increased emphasis on values like trust and clarity. Smaller models are unlikely to capture the public’s imagination the way the big ones have, but for those of us looking to solve real problems—the kind that span continents and affect millions of people, if not billions—they’re quickly reshaping the landscape and pointing to a more inclusive future for AI in which everyone can benefit.


Special thanks to Alex Michael and Shafiq Rayhan Joty for their contributions to the writing of this piece.