The tokenization tax: Why AI costs non-English speakers up to 5x more per query, and how India is fighting back

When Vivek Raghavan saw ChatGPT for the first time, he faced a choice that defines the AI era for most of the world’s population. He could wait for Silicon Valley to eventually localize its models for Indian languages, accepting whatever pricing and priorities American companies decided on. Or he could build something himself, optimized for a country where hundreds of millions of people carry smartphones but many of them struggle with English and spotty bandwidth. He chose the harder path. And the logic behind that choice contains lessons that extend far beyond India.

The conventional wisdom about AI development runs something like this: the race belongs to whoever has the most compute. Train larger models on more data with more GPUs, and you win. OpenAI, Google, Anthropic, and a handful of Chinese labs dominate because they can spend heavily on infrastructure that most countries will never afford. The implicit conclusion is that everyone else should simply wait for these models to trickle down, accessed through APIs, on someone else’s terms.

That framing misses what is actually happening on the ground in India, and what it means for dozens of other nations facing the same structural constraints. The startups building AI in India aren’t trying to out-compute San Francisco. They’re doing something more interesting: proving that constraint itself can be a design philosophy, and that models built for scarcity often outperform expensive ones in the specific contexts where they’re needed most.

The tokenization tax nobody talks about

Before you can understand why India’s approach matters, you need to understand a technical problem that operates like an invisible tax on non-English speakers. Large language models process text by breaking it into tokens, small chunks of characters that the model treats as discrete units. English, because it dominated the training data for most major models, gets efficient tokenization. Common English words often map to a single token.

Indian languages don’t get this treatment. According to Sarvam AI’s research, tokenization inefficiencies can make queries in Indian languages cost up to five times more per interaction than equivalent English queries. Research has shown that simple Hindi sentences can require several times more tokens than their English equivalents. Because API usage is typically billed per token, every API call, every chatbot interaction, every medical query can cost several times more when conducted in a language spoken by hundreds of millions of people.
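One contributor to this gap is visible without any model at all. Many tokenizers start from UTF-8 bytes, and Devanagari characters take three bytes each where basic English letters take one. The sketch below is illustrative only: the two example words are assumptions chosen for the demonstration, and real tokenizers merge frequent byte sequences, so actual token counts depend on the tokenizer and its training data.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per Unicode code point in text."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)


# Illustrative greetings (assumed examples, not taken from any benchmark).
english = "hello"
hindi = "नमस्ते"

for label, word in [("English", english), ("Hindi", hindi)]:
    raw_bytes = len(word.encode("utf-8"))
    print(
        f"{label}: {len(word)} code points, {raw_bytes} bytes, "
        f"{utf8_bytes_per_char(word):.1f} bytes per code point"
    )
```

A tokenizer that has learned few merges for a script falls back toward these raw units, which is roughly where the extra tokens come from.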

This is not a bug. It is the structural outcome of building AI for one linguistic context and then exporting it globally. The tokenization gap functions as a regressive tax: the people who can least afford expensive AI interactions, those who don’t speak English fluently, pay the most per query. When the tech industry discusses making AI accessible to all, this is the reality that phrase often conceals.
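The arithmetic behind the tax is simple when pricing is per token. A rough sketch follows, in which the price, the English token count, and the multipliers are all hypothetical placeholders rather than any provider's real figures:

```python
def query_cost(tokens: int, price_per_1k_tokens: float) -> float:
    """Cost of one query when billing is proportional to token count."""
    return tokens / 1000 * price_per_1k_tokens


# Hypothetical numbers for illustration only.
PRICE_PER_1K = 0.01   # assumed price in dollars per 1,000 tokens
ENGLISH_TOKENS = 200  # assumed size of one query when written in English

# 5 corresponds to the "up to five times" upper bound cited above.
for multiplier in (1, 3, 5):
    tokens = ENGLISH_TOKENS * multiplier
    cost = query_cost(tokens, PRICE_PER_1K)
    print(f"{multiplier}x tokens: {tokens} tokens, ${cost:.4f} per query")
```

Because cost scales linearly with tokens, any multiplier in token count passes straight through to the bill, and it compounds over every query a clinic or school makes.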

Sarvam AI’s response was to build better tokenizers for Indian languages, reducing the cost per interaction at the root level. This sounds like a minor engineering fix. It isn’t. It determines whether voice-enabled medical triage in rural Tamil Nadu is economically viable or not. It determines whether an AI tutor can affordably explain algebra in Marathi to a fourteen-year-old who has never seen a laptop.
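To see why a tokenizer trained on the right data lowers cost, consider byte-pair encoding (BPE), a widely used tokenization technique. The source does not say which algorithm Sarvam AI uses, so what follows is a toy sketch of BPE in general, with made-up one- and two-line corpora, and not a description of their tokenizer. All function names here are hypothetical.

```python
from collections import Counter


def merge_pair(symbols: tuple, pair: tuple) -> tuple:
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus: list, num_merges: int) -> list:
    """Learn up to `num_merges` merge rules, starting from single code points."""
    words = Counter(tuple(w) for line in corpus for w in line.split())
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair becomes a new token
        merges.append(best)
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[merge_pair(symbols, best)] += freq
        words = merged_words
    return merges


def tokenize(text: str, merges: list) -> list:
    """Split text on whitespace, then apply the learned merges in order."""
    tokens = []
    for word in text.split():
        symbols = tuple(word)
        for pair in merges:
            symbols = merge_pair(symbols, pair)
        tokens.extend(symbols)
    return tokens


# Tiny invented corpora, for illustration only.
english_merges = train_bpe(["the language of the nation"], num_merges=10)
hindi_merges = train_bpe(["भारत में भाषा", "भारत की भाषा"], num_merges=10)

sample = "भारत की भाषा"
print("no merges:       ", len(tokenize(sample, [])))
print("English-trained: ", len(tokenize(sample, english_merges)))
print("Hindi-trained:   ", len(tokenize(sample, hindi_merges)))
```

Merges learned from English text never match the Hindi sample, so it stays split into individual code points, while merges learned from Hindi text shrink the token count. Real tokenizers work at far larger scale, but the dependence on training data is the same.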

Building on top, not from scratch

The smartest thing about India’s frugal AI movement is what it doesn’t do. It doesn’t try to train foundation models from zero. That would require billions of dollars in compute that India’s startups don’t have and don’t need.

Instead, Sarvam AI developed OpenHathi, an open-source project that takes existing large language models and teaches them Indian language capabilities. The approach involves bolting Indian language skills onto existing models, then creating smaller, domain-specific models in fields like finance or medicine that are much cheaper and more efficient to use.

This is an architectural insight, not just an engineering shortcut. The expensive part of building an LLM is the base layer: the massive pre-training run that teaches the model general language understanding. But the valuable part, the part that actually serves users, is often the domain-specific fine-tuning on top. India’s startups are essentially saying: let Silicon Valley and Beijing spend billions on the foundation. We’ll build the floors that people actually live on.

Sarvam’s flagship model is an LLM trained across multiple Indian languages, sized to be large enough to be useful but small enough to be affordable. It can run healthcare triage. It can power educational tutors. It handles code-mixed queries, the natural way millions of Indians actually speak, blending Hindi and English and regional languages in a single sentence.

Compare this to GPT-4 or Claude, which are designed to do everything for everyone. Sarvam’s model is designed to do specific things, extremely well, for people who have been poorly served by general-purpose English models. The constraint produced better design.

Krutrim and the infrastructure question

Sarvam AI isn’t alone. Bhavish Aggarwal, co-founder of Ola Cabs, launched Krutrim in late 2023 with a different but complementary approach. Krutrim’s model was trained on a large volume of tokens and is designed to understand and generate text in India’s official languages. Where Sarvam focuses on bolting language capabilities onto existing open-source models, Krutrim is building with India’s infrastructure limitations baked into the design from day one.

The model is optimized to run without supercomputers. That sentence deserves a pause. Most frontier AI development assumes access to data centers packed with NVIDIA H100s. Krutrim assumes the opposite: schools, government offices, and small businesses that need powerful AI at low cost, running on modest hardware.
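A back-of-the-envelope calculation shows why model size and numeric precision decide what hardware a model can run on. The parameter counts and bit widths below are hypothetical: the source does not state Krutrim's model size or whether it relies on reduced precision, and the estimate covers weights only, ignoring activations and other runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical model sizes and precisions, for illustration only.
for num_params in (2e9, 7e9, 70e9):
    for bits in (16, 8, 4):
        gb = weight_memory_gb(num_params, bits)
        print(f"{num_params / 1e9:.0f}B params at {bits}-bit: about {gb:.1f} GB")
```

Under these assumptions, a smaller model stored at lower precision fits in the memory of ordinary hardware, while a much larger one at full precision does not.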

The broader context here is AI4Bharat, an initiative launched at IIT Madras that laid much of the groundwork. AI4Bharat builds lightweight systems designed for low-end smartphones and low-bandwidth networks. India has numerous dialects across its diverse linguistic landscape. The challenge isn’t just linguistic. It’s about getting usable AI to people whose internet connection drops out when it rains and whose phone cost less than a pair of running shoes.

Pratyush Kumar, who co-founded Sarvam AI with Raghavan, brings a background in AI and systems engineering with advanced degrees from leading technical institutions. The academic lineage matters because it signals something important: India’s frugal AI movement isn’t a second-rate imitation of Western AI. It’s world-class talent making deliberate engineering choices to optimize for a different set of constraints.

Why this is a blueprint, not just an Indian story

This piece makes a specific claim: that India’s approach offers a blueprint for resource-strapped nations. I want to defend that claim carefully, because “blueprint” can easily become a lazy metaphor.

Here’s what I mean concretely. Most countries in the Global South face some combination of the following constraints: limited compute infrastructure, populations that primarily speak non-English languages, low per-capita income that makes expensive API calls unsustainable, and fragile internet connectivity. India faces all of these at massive scale, which means the solutions being developed there are stress-tested against precisely the conditions that other nations will encounter.

The OpenHathi approach, taking open-source foundation models and fine-tuning them for local languages and use cases, is directly transferable. A team in Kenya could apply the same method to Swahili and Kikuyu. A team in Indonesia could do it for Bahasa Indonesia and Javanese. The tokenization problem Sarvam tackled for Hindi affects any language underrepresented in a model’s training data, whatever its script. The infrastructure optimization Krutrim pioneered applies anywhere data centers are scarce.

In my recent piece on the economic variable that determines whether AI creates or destroys jobs, I argued that the critical factor isn’t AI capability but the rate at which new tasks and economic roles emerge to absorb displaced labor. India’s frugal AI models are relevant here too. By making AI affordable enough to deploy in healthcare clinics and rural schools, they’re creating new categories of work: AI system administrators in district hospitals, voice-interface trainers for regional languages, local-language content curators for educational platforms.

This contrasts sharply with the Western AI deployment pattern, where the primary effect has been productivity gains for already-skilled knowledge workers in wealthy economies. As Silicon Canals has explored, AI job displacement predictions rest on metrics that many economists consider unreliable. The Indian model suggests a different trajectory entirely: AI that doesn’t displace existing workers so much as it creates new capabilities for populations that currently have no access to the services AI can provide.

The sovereignty question is really a power question

Raghavan and his co-founders frame their work around the concept of sovereign AI—and it’s easy to dismiss this as nationalist branding. It isn’t. The question of who controls AI models, who owns the data they’re trained on, and whose cultural context they embed is a power question with economic consequences that will compound for decades.

When a rural Indian doctor uses ChatGPT through an API to triage patients, the data from those interactions flows to OpenAI’s servers. The model’s medical knowledge reflects American and European clinical practices. The cost per query is set in dollars. Every element of that interaction involves dependency on a foreign company’s priorities.

When the same doctor uses a Sarvam AI model deployed locally, the data stays within Indian jurisdiction. The medical knowledge is fine-tuned for conditions and treatment protocols common in India. The cost structure is designed for Indian economics. Sovereign AI isn’t about nationalism. It’s about structural independence in a technology that will increasingly mediate access to healthcare, education, financial services, and government.

The dependency pattern is already visible across Southeast Asia. Thailand, Vietnam, and the Philippines all face the same choice: rely on American and Chinese AI platforms (ceding data flows, cultural framing, and economic value to foreign corporations) or invest in local alternatives. The costs of inaction are not hypothetical. Every API call routed through a US provider means patient data, financial records, and educational interactions stored under foreign jurisdiction, subject to foreign pricing decisions, and optimized for foreign contexts. India’s startups are demonstrating that investing in local alternatives is technically feasible without requiring sovereign wealth fund-sized budgets.

The design philosophy that informed India’s Aadhaar identity system and the Unified Payments Interface (UPI) is visible in these AI efforts. Build open. Build cheap. Build at scale. UPI processes billions of transactions because it was designed for the constraints of the Indian market, not imported from a different economic context. The same logic is driving frugal AI development.

What the blueprint actually requires

I want to be honest about the limitations. A blueprint is useless without builders, and most countries in the Global South lack India’s specific advantages: a massive domestic market of hundreds of millions of smartphone users that makes commercial AI development viable, a deep bench of world-class AI researchers at institutions like IIT Madras and IIT Bombay, and a government that has, at various points, supported digital public infrastructure investment.

A country of 10 million people with three universities and limited technical talent cannot simply copy Sarvam AI’s playbook. But it can adopt the core principles. Use open-source foundation models rather than building from scratch. Invest in local-language tokenization. Design for low-bandwidth, low-compute environments. Prioritize domain-specific applications in healthcare and education over general-purpose chatbots.

The open-source dimension is critical. By releasing models on platforms like Hugging Face, Sarvam AI and the AI4Bharat initiative make their work available to developers anywhere. A team in Senegal doesn’t need to solve the same tokenization problems from scratch. They can build on what India has already done and adapt it for Wolof or Fulani.

This is how technology transfer works when it’s driven by shared constraints rather than corporate licensing. And it’s a fundamentally different model from the one Silicon Valley offers, where access depends on ability to pay API fees denominated in US dollars.

Frugality as competitive advantage

There’s a tendency in tech coverage to frame frugal approaches as inferior versions of the “real” thing. Smaller models are seen as less capable. Lower budgets are treated as handicaps. The implicit assumption is that if India had more money, it would build GPT-5.

That assumption misunderstands what’s happening. Raghavan didn’t build a cheaper ChatGPT. He built something different: AI that works in Hindi on a phone with intermittent internet, that can triage medical symptoms for a village health worker, that can teach mathematics in a student’s mother tongue. GPT-5, no matter how capable, cannot do these things as effectively because it wasn’t designed for these constraints.

The frugal approach produces models that are genuinely better for their intended users. Not better on benchmarks. Better at being useful in the real conditions where most of the world’s population lives.

When I ran Ideapod, I learned something painful: shifting from a grand vision to doing small things that actually help users is what creates viability. Sarvam AI’s vision of bringing AI to all of India became viable only when the team focused on specific, concrete problems: tokenization costs, bandwidth constraints, voice-first interfaces for populations that prefer speaking to typing.

The Global South doesn’t need a cheaper version of American AI. It needs a different kind of AI, one designed from the ground up for the conditions that most humans actually live in. India is building it. The question now is whether other countries recognize the blueprint for what it is and start adapting it to their own constraints.

That recognition will require abandoning the idea that AI development means competing with OpenAI. It means accepting that a large language model trained on multiple Indian languages and optimized for low-end smartphones is, for most of the world, more valuable than a trillion-parameter model that requires a data center to run and charges in dollars per query.

The future of AI isn’t one model to rule them all. It’s thousands of models, each shaped by the specific constraints and needs of the people they serve. India is proving that this future works. The rest of the Global South is watching.
