Artificial intelligence is often discussed in terms of models, GPUs, and breakthroughs in reasoning.

But underneath those discussions lies a much more fundamental unit.

Tokens.

Every prompt sent to a language model is converted into tokens. Every reasoning step a model performs produces tokens. Every generated response is ultimately a stream of tokens.

Prompt → Tokens → Model → Tokens → Response

In other words, tokens are the smallest measurable unit of AI work.
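To make that first conversion concrete, here is a minimal sketch using the open-source tiktoken library as one example tokenizer. The exact token boundaries and counts vary by model and tokenizer; this is just an illustration of text becoming token IDs and back.

```python
# Minimal tokenization sketch using the open-source tiktoken library.
# Token counts and boundaries differ across models and tokenizers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

prompt = "Tokens are the smallest measurable unit of AI work."
token_ids = enc.encode(prompt)      # prompt -> token IDs
round_trip = enc.decode(token_ids)  # token IDs -> text

print(len(token_ids), "tokens")     # a short sentence is only a handful of tokens
print(token_ids[:5], "...")
```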

Over the past year, researchers and infrastructure teams have increasingly measured AI demand not in users, models, or GPUs, but in tokens generated and processed.

And the numbers are growing extremely quickly.

The scale of token consumption

One way to understand the scale of this growth is to look at inference platforms that route requests across multiple AI models.

OpenRouter, an inference gateway used by developers to access many different models, provides a rare view into real-world usage. Its recent research analyzed more than 100 trillion tokens of real interactions, offering one of the most comprehensive datasets of how tokens are actually consumed in production systems.

More recent operational data shows how quickly the demand is accelerating. In early 2026, OpenRouter processed 13 trillion tokens in a single week, roughly double the 6.4 trillion tokens processed in the first week of January.

Even these numbers represent only a small slice of global AI activity, because large AI providers run most inference workloads internally.

For example, Google reported generating roughly 1.3 quadrillion tokens per month across its AI systems in 2025.

That translates to more than 40 trillion tokens per day from a single company.

OpenRouter: 100T+ (research), 13T/week (early 2026, 2× Jan 6.4T)
Google: 1.3 quadrillion/month, 40T+/day
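As a quick sanity check on that per-day figure, here is a back-of-the-envelope sketch, assuming a 30-day month:

```python
# Back-of-the-envelope check: 1.3 quadrillion tokens per month -> tokens per day.
tokens_per_month = 1.3e15   # reported figure, ~1.3 quadrillion
days_per_month = 30         # simplifying assumption

tokens_per_day = tokens_per_month / days_per_month
print(f"{tokens_per_day:.2e} tokens/day")  # ~4.3e13, i.e. over 40 trillion per day
```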

Seen in this light, it becomes easier to understand why infrastructure teams increasingly talk about token throughput as the real measure of AI demand.

Why token usage is increasing so rapidly

The interesting question is not only how many tokens are produced, but why that number keeps climbing so quickly.

Part of the answer lies in the changing nature of AI workloads.

Early language models were mostly used for question-answering or text generation. A user would ask a question, and the model would produce a short response. The number of tokens involved in each interaction was relatively small.

But modern AI systems increasingly operate as multi-step processes rather than single request-response exchanges.

Agents plan tasks, call tools, analyze results, and revise outputs. Coding assistants generate code, review it, debug it, and iterate. Document reasoning systems read large corpora before generating an answer.

Each of these steps consumes tokens.

Research on agentic software engineering workflows shows that token usage is often dominated not by the first generation step, but by review and refinement loops.

Code review: ~59% of token consumption
Input tokens: >50% of total usage

This means that AI systems are no longer simple input-output tools. They behave more like iterative computational processes that generate tokens continuously while solving a task.
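A minimal sketch makes the compounding effect visible. The step names and token counts below are entirely hypothetical; the point is only that every plan, tool call, and review pass adds to the running total, and that input (context) tokens tend to dominate.

```python
# Hypothetical agent run: each step consumes input (context) tokens and
# produces output tokens, so the total grows with every iteration.
steps = [
    # (step name, input tokens, output tokens) -- illustrative numbers only
    ("plan task",        2_000,    400),
    ("call tool",        2_500,    300),
    ("analyze results",  4_000,    800),
    ("draft code",       5_000,  1_500),
    ("review & revise",  8_000,  1_200),
    ("final answer",     9_000,    600),
]

total_in = sum(i for _, i, _ in steps)
total_out = sum(o for _, _, o in steps)

print(f"input tokens:  {total_in:,}")   # input/context dominates in this sketch
print(f"output tokens: {total_out:,}")
print(f"total tokens:  {total_in + total_out:,}")
```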

Another factor contributing to the growth of token consumption is the expansion of context windows.

Language models can now process hundreds of thousands — and sometimes millions — of tokens of context in a single interaction. This allows AI systems to read entire repositories, datasets, or document collections.

But it also dramatically increases the number of tokens required for each task.

The result is that the number of tokens per query is rising, even when the number of users remains constant.
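A rough illustration of that effect, with entirely assumed numbers for file size and repository scope:

```python
# Illustrative only: how per-query input tokens grow with context size.
# All numbers here are assumptions, not measurements.
tokens_per_file = 600    # assumed average size of a source file in tokens
files_in_context = 400   # assumed number of files pulled into the prompt

context_tokens = tokens_per_file * files_in_context
print(f"{context_tokens:,} input tokens before the model writes a single word")
# 240,000 tokens of context for one query -- orders of magnitude more than a
# short question-and-answer exchange, even though it is still just one user.
```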

Tokens as the unit of AI economics

Because tokens correspond directly to model computation, they have become a useful unit for thinking about the economics of AI systems.

Every token generated by a model requires GPU computation, memory bandwidth, networking infrastructure, and ultimately electricity.

Organizations deploying AI systems increasingly track metrics such as the following when designing their infrastructure:

tokens per second
cost per million tokens
energy per token
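As a rough sketch of how these metrics relate, the calculation might look like the following. Every input value here (token volume, GPU-hour cost, power draw, GPU count) is a hypothetical assumption; only the formulas are the point.

```python
# Hypothetical serving numbers; the formulas, not the values, are the point.
tokens_generated = 5_000_000   # tokens produced in a measurement window
window_seconds = 3_600         # one hour
gpu_hour_cost_usd = 4.00       # assumed blended cost per GPU-hour
gpu_power_kw = 1.0             # assumed average draw per GPU, incl. overhead
gpus_used = 8

tokens_per_second = tokens_generated / window_seconds
cost_per_million = (gpu_hour_cost_usd * gpus_used) / (tokens_generated / 1e6)
energy_kwh_per_million = (gpu_power_kw * gpus_used * window_seconds / 3_600) / (tokens_generated / 1e6)

print(f"tokens/second:        {tokens_per_second:,.0f}")
print(f"cost per 1M tokens:   ${cost_per_million:.2f}")
print(f"energy per 1M tokens: {energy_kwh_per_million:.2f} kWh")
```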

Deloitte notes that companies are beginning to treat tokens as the core unit of AI cost management, since the total cost of AI systems depends on how efficiently infrastructure can generate tokens.

Importantly, the cost of generating tokens is not determined solely by GPUs. Networking, storage systems, cooling infrastructure, and data-center power also contribute significantly to the overall economics of inference systems.

Physical limits of token production

Perhaps the most striking perspective comes from research that links token generation directly to energy consumption.

One recent paper explores the physical limits of token production by estimating how much AI output can be generated given projected electricity allocations.

Under certain assumptions about model efficiency and compute infrastructure, the authors estimate that projected U.S. AI electricity consumption could support approximately 6.5 × 10¹⁷ tokens annually.

~326 TWh/year → ~6.5×10¹⁷ tokens/year
~225,000 tokens per person per day
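Working backwards from those headline numbers gives a feel for the scale. This sketch assumes a world population of roughly 8 billion, which is what appears to reproduce the per-person figure; the energy and token totals are the paper's estimates.

```python
# Rough reconstruction of the headline figures; population is an assumption.
electricity_twh_per_year = 326   # projected U.S. AI electricity budget
tokens_per_year = 6.5e17         # estimated annual token output
world_population = 8e9           # assumption used to reproduce the per-person figure

wh_per_token = electricity_twh_per_year * 1e12 / tokens_per_year
tokens_per_person_per_day = tokens_per_year / world_population / 365

print(f"~{wh_per_token * 1000:.2f} mWh per token")  # ~0.5 mWh, roughly a couple of joules
print(f"~{tokens_per_person_per_day:,.0f} tokens per person per day")  # ~220,000
```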

The implication is striking.

AI output is not an abstract digital resource. It is ultimately constrained by physical systems — compute hardware, data centers, and energy supply.

Every token generated by an AI model is a tiny unit of computation powered by electricity somewhere in the world.

A new perspective on the AI economy

Seen from this perspective, the AI economy can be understood as a system built around token generation:

Applications → Token Demand
Infrastructure → Token Production
Energy → Physical Limit

Applications generate token demand; infrastructure exists to produce tokens efficiently; and energy systems ultimately set the physical limit on how many tokens the world can produce.

Understanding AI may therefore require a subtle shift in perspective: instead of focusing only on models or applications, it may be more useful to follow the flow of tokens moving through the global AI system.

Because in the end, tokens are the smallest unit of intelligence produced by machines.
