Think Together
by Tim R. Davidson, March, 2026
Speed changes the rules of the game
The promise of artificial intelligence, like all technology, is to compress time: given an objective, can a specific outcome be improved upon or achieved more quickly? Technological accelerations can be disruptive; they can broaden the playing field by opening new dimensions or subsume existing efforts entirely. Few domains make this dynamic as legible as financial markets, where the relationship between speed and value is unusually explicit.
Historically, one could divide investment decisions into two classes. First, there are short-term, reactionary decisions based on news events. For example, a public company announces that its CEO is stepping down. While this event may or may not influence the mid- or long-term value of the company in question, it does represent a change to the information previously used to determine that value. As a result, it forces many market participants to update their beliefs, likely resulting in short-term fluctuations. The short-term investor makes money by correctly anticipating the reactions of "other" market participants. These decisions have to be made in a matter of seconds, or at most minutes.
The second type of decision entails how a collection of such events over time affects the longer-term valuation of the company. This usually requires combining a large number of disparate information sources and weighing the likelihood and importance of different factors. Such longer-term investment theses are less time dependent and can take days, months, or even years to develop.
As exchanges turned electronic in the early nineties, a third, new type of strategy emerged: high-frequency trading (HFT). Roughly speaking, HFT strategies exploit short-lived pricing mismatches between the demand and supply of certain financial instruments. These (statistical) mismatches enable "arbitrage" opportunities, which are risk-free profit events. For instance, if party A is willing to buy a security for x and party B is willing to sell it for x-y, a third party C can buy from B and instantaneously sell to A, netting y of profit in the process. In efficient markets, such mismatches should in theory not exist. Yet, what HFT practitioners found was that when trading bots were fast enough, they actually did [6, 7].
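The arithmetic of such an arbitrage can be sketched in a few lines. The prices, quantity, and the `arbitrage_profit` helper are all illustrative, not taken from any real trading system:

```python
# Minimal illustration of the arbitrage arithmetic described above.
# All values and names are hypothetical.

def arbitrage_profit(bid: float, ask: float, quantity: int) -> float:
    """Risk-free profit when a buyer's bid exceeds a seller's ask."""
    spread = bid - ask
    return spread * quantity if spread > 0 else 0.0

# Party A bids x = 100.00; party B asks x - y = 99.97 (so y = 0.03).
profit = arbitrage_profit(bid=100.00, ask=99.97, quantity=1_000)
print(profit)  # 1,000 units at a 0.03 spread
```

In practice, of course, the entire difficulty lies in detecting the mismatch and executing both legs before it closes, which is what pushes execution windows toward nanoseconds.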
How fast is fast enough? In the late nineties, fast enough meant bots capable of spotting and executing opportunities in less than a second. Today, such bots have to operate in windows measured in nanoseconds. However, to achieve such dazzling speeds, bots must be relatively simplistic: While enormous amounts of up-front reasoning may go into designing profitable HFT strategies, "test-time" reasoning is necessarily confined to simple rules and data processing. This generally means strategies revolve around a deep understanding of "system infrastructure," e.g., how do information bits travel from one place to another and how are they processed?
The advent of general-purpose agents powered by language models (LMs) is changing this picture in two important ways. Firstly, the new generation of bots is capable of processing unstructured, nuanced data events. Instead of relying on low-level system events, agents can interpret (to some degree) the significance of the departing CEO or a climate disaster. Secondly, humans can only process the information of one event at a time — we cannot parallelize ourselves. In contrast, AI can endlessly replicate itself to process many events at the same time.
Now imagine bots capable of attending to different events in parallel at dazzling speeds and combining insights without any friction. Even if humans remained superior in anticipating the reaction of market participants to a single event, surely there exists a threshold where processing multiple events simultaneously provides a richer signal than processing fewer signals in greater depth. Before long, the objective is no longer to anticipate the reactions of human market participants, but those of other AI agents — mirroring the developments in HFT. However, while HFT introduced a new investment-decision category largely orthogonal to the existing categories, capable agentic systems are more likely to erase the existing category of short-term investment decisions made by humans entirely.
A natural next question becomes: what happens to longer-term investment decisions? And, taken more generally: what other decision areas might humans lose simply by being too slow — a concern Kulveit et al. [5] frame as gradual disempowerment?
From single context to distributed agentic systems
The trading scenario above is an instance of a broader pattern. Its defining feature is not finance per se, but the fact that relevant information is distributed across many sources, arrives asynchronously, and must be synthesized under time pressure. To see why this pattern demands a fundamentally different approach to using LMs, it helps to first characterize the default one.
Over the past decade, the AI industry has primarily invested in building increasingly larger LMs. This strategy is mainly driven by empirical "scaling laws," which promise growth in model performance commensurate with computational and data resources [4]. The default pattern for using such LMs is to pass information relevant to a specific objective into their context, after which the model performs one or multiple inference passes to produce an outcome. This works when solutions to problems rely on centralized, static information, and holds as long as the model's context window has enough capacity to process everything at once. Consequently, the core bottlenecks to time compression in this setting are larger context windows and stronger reasoning power.
Issues emerge once the required information cannot be passed into a single context simultaneously. This can happen when information exceeds the context's volume, spans sources with different privilege profiles, or arrives asynchronously.
When information volume exceeds the context capacity, the user (or model) has to make the non-trivial decision of how best to divide information across different inference passes (e.g., "refactor this giant code base"). Alternatively, information might come from multiple sources with different privilege profiles; even if the parties have access to instances of the same model, this still requires separate inference passes. For instance, "solve the bug in my sensitive local code, introduced by a closed-source software package." Finally, information might arrive from different sources with asynchronous, stochastic arrival times. To process all of it in a single context, one would have to decide on a time-chunking strategy — but in settings like the trading decisions discussed above, waiting for "all" events to finish before performing a unified inference pass is neither well-defined nor feasible.
At a (very) high level of abstraction, all three scenarios share a similar pattern. To give it a working handle, let's call it "BotMapReduce" (BMR): information is mapped to different model instances that produce insights through independent inference passes, which must then be reduced into a final output.
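As a rough illustration, the BMR pattern can be sketched as follows. Here `call_model` is a placeholder for a real inference pass, and the model names, routing rule, and event strings are invented:

```python
# A minimal sketch of the "BotMapReduce" (BMR) pattern: events are mapped
# to independent model instances, whose insights are then reduced into a
# single output. `call_model` is a stand-in for an actual LM inference call.
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, event: str) -> str:
    # Placeholder: a real system would run an inference pass here.
    return f"[{model}] insight on: {event}"

def bot_map_reduce(events: list[str], route, reduce_fn) -> str:
    # Map: each event is routed to a model instance; passes run in parallel.
    with ThreadPoolExecutor() as pool:
        insights = list(pool.map(lambda e: call_model(route(e), e), events))
    # Reduce: combine the independent, unstructured insights.
    return reduce_fn(insights)

events = ["CEO departure at ACME", "flood warning in region X"]
route = lambda e: "corp-model" if "CEO" in e else "climate-model"
summary = bot_map_reduce(events, route, reduce_fn="\n".join)
print(summary)
```

The hard parts the essay discusses live precisely in the two functions left abstract here: `route` (how to distribute information given a dependency graph) and `reduce_fn` (how to merge insights that need not share a format).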
BMR is fundamentally different from the default LM-usage pattern. Depending on the problem, it (i) may require reasoning about an information dependency graph to effectively distribute information across instances, as in the case of large code bases; (ii) produces variable, unstructured insights from independent instances that are unlikely to always fit a predetermined output format; and (iii) may have to complete all of this under severe time pressure.
A shift in model-building trade-offs
BMR problems force us to rethink Pareto optimality in model development. Time-constrained BMR problems make this paradigm shift especially clear: The solution to the Riemann hypothesis or a cure for cancer will be the same tomorrow or next decade. In contrast, solutions to time-sensitive problems like the short-term investment setting sketched out before are highly ephemeral: interpreting market events correctly but too late renders the interpretation worthless.
Such time-sensitive coordination problems extend well beyond trading. In the ER, physicians combine data from heterogeneous sensors in different modalities under severe resource constraints. In combat, tactical decisions depend on many simultaneous signals — some produced by adversarial opponents — to guide outcomes that cannot be undone.
Part of what makes coordination in such scenarios so challenging is that signals are produced by heterogeneous sources that do not necessarily follow a centrally agreed-upon protocol. This "protocol uncertainty" necessitates models capable of "ad hoc collaboration" — fluid, on-the-fly reasoning about the best way to present information to another model instance and interpret information presented to them [2].
Extreme time pressure introduces a first trade-off: models should be sufficiently strong reasoners to process specific event types, but no stronger, so as to not lose time. This implies more research is needed into effective routing strategies [8] and into building small models [1]. Given a problem, which model possesses the optimal quality-speed ratio to solve it? And for a problem space of interest, what are the optimal scopes to specialize models towards?
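One way to make the first trade-off concrete is a latency-aware router that picks the strongest model whose expected latency fits the time budget. The model names, latencies, and quality scores below are invented for illustration:

```python
# Sketch of quality-speed routing: given a latency budget, choose the
# strongest model still expected to answer in time. All model names,
# latencies, and quality scores are hypothetical.

MODELS = [
    # (name, expected_latency_ms, quality_score)
    ("tiny-specialist", 5, 0.70),
    ("mid-generalist", 50, 0.85),
    ("frontier", 900, 0.98),
]

def route(latency_budget_ms: float) -> str:
    feasible = [m for m in MODELS if m[1] <= latency_budget_ms]
    if not feasible:
        # No model fits the budget: fall back to the fastest one available.
        return min(MODELS, key=lambda m: m[1])[0]
    # Among feasible models, maximize quality.
    return max(feasible, key=lambda m: m[2])[0]

print(route(10))    # only the tiny specialist fits the budget
print(route(1000))  # budget large enough for the frontier model
```

A real router would of course estimate latency and quality per problem type rather than per model, which is exactly the specialization question raised above.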
Assuming we can successfully train such specialized models and route applicable problems to them, a second trade-off emerges in the reduce step: specialization versus communication. As models become more specialized, it becomes increasingly challenging to present insights using a format that is optimal for models specialized in a different area. Consider two models, one optimized to parse corporate events in Chinese and the other to parse climate events in Turkish. Naive optimization of the quality-speed trade-off would mandate that each model retain only tokens relevant to its respective language and minimize storing any factual knowledge beyond its specific event type. However, this narrow optimization makes it increasingly unlikely that the two models can effectively exchange information, likely necessitating a third model capable of processing both of their outputs. This in turn raises a new question: is it faster to have two models that can communicate about their respective insights directly, or does it make sense to introduce additional models, and, if so, what should their capability scopes be?
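The closing question can be framed as a toy latency comparison, under invented numbers: specialists that share enough vocabulary to exchange insights directly pay an overhead on every call, while narrow specialists plus a mediator pay an extra sequential hop:

```python
# Toy comparison of the two designs discussed above, with invented numbers:
# (a) two specialists broad enough to talk to each other directly, each
#     paying a per-call overhead for the extra shared vocabulary, versus
# (b) two narrow specialists plus a third mediator model merging outputs.

def direct_latency(base_ms: float, vocab_overhead_ms: float) -> float:
    # Both specialists run in parallel, each slowed by the shared capacity.
    return base_ms + vocab_overhead_ms

def mediated_latency(base_ms: float, mediator_ms: float) -> float:
    # Narrow specialists run in parallel at full speed; the mediator then
    # adds a sequential hop to combine their outputs.
    return base_ms + mediator_ms

# With a 10 ms base pass, the direct design wins exactly when the
# per-call vocabulary overhead is smaller than the mediator's hop.
print(direct_latency(10.0, 4.0))
print(mediated_latency(10.0, 8.0))
```

Even this crude model shows why there is no obvious fixed point: the "right" answer depends on quantities (overheads, hop costs) that themselves change as scopes are redrawn.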
At its core, this is a recursive design problem: each additional model introduced to mediate between specialists reproduces the same specialization-versus-communication tension at a new level of the hierarchy, with no obvious fixed point.
Speed changes who gets to play
To summarize: many important problems of interest involve distributed information that requires models capable of dynamic communication. For some of these problems there are strong time constraints, necessitating the development of small, specialized models that are still capable of effectively communicating their findings.
Neither of these requirements has computational resources as a strong bottleneck, marking a departure from the dominant scaling-law paradigm [3]. For a research community that has spent the better part of a decade competing in an arena biased toward those with the largest GPU clusters, a class of problems where the key challenges are architectural and algorithmic reopens the playing field. It can also usher in an era of new companies building for increasingly narrow problem domains, each finding value in small, specialized models tailored to their specific information environment.
Several concrete challenges remain. We lack evaluation frameworks for distributed, time-constrained multi-model coordination. Routing problems to specialist models in real time, when the dependency structure is itself uncertain, remains unsolved. And recent empirical work reveals a "collaboration gap" where models that perform well individually degrade substantially when required to coordinate, even with identical copies of themselves [2], though brief priming interactions from stronger models can close much of this gap, suggesting the bottleneck is establishing common ground, not reasoning capability.
Think together, think small: many big problems need it.
Thanks to Bob West, Marija Šakota, Caglar Gulcehre, and Josh Joseph for valuable feedback on drafts of this post.
——
Please cite this post as follows:
Tim R. Davidson. Think Together. Blog post, 2026.
Or use the BibTeX citation:
@misc{davidson2026thinktogether,
author = {Davidson, Tim R.},
title = {Think Together},
howpublished = {\url{https://www.trdavidson.com/think-together}},
year = {2026},
month = {March},
note = {Blog post}
}
——
References
[1] Peter Belcak, Greg Heinrich, Shizhe Diao, Yonggan Fu, Xin Dong, Saurav Muralidharan, Yingyan Celine Lin, and Pavlo Molchanov. Small language models are the future of agentic AI. arXiv preprint arXiv:2506.02153, 2025.
[2] Tim R. Davidson, Adam Fourney, Saleema Amershi, Robert West, Eric Horvitz, and Ece Kamar. The collaboration gap: exploration and benchmarking of open-world agentic cooperation, 2025. URL https://arxiv.org/abs/2511.02687.
[3] Sara Hooker. On the slow death of scaling. Available at SSRN 5877662, 2025.
[4] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.
[5] Jan Kulveit, Raymond Douglas, Nora Ammann, Deger Turan, David Krueger, and David Duvenaud. Gradual disempowerment: Systemic existential risks from incremental ai development. arXiv preprint arXiv:2501.16946, 2025.
[6] Michael Lewis. Flash boys: A Wall Street revolt. W. W. Norton & Company, 2014.
[7] Scott Patterson. Dark pools: The rise of the machine traders and the rigging of the US stock market. Crown Currency, 2013.
[8] Marija Šakota, Maxime Peyrard, and Robert West. Fly-swat or cannon? Cost-effective language model choice via meta-modeling. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, pp. 606–615, 2024.