Towards AI-ready data for India



India’s digital transformation has made government data more accessible than ever. The next frontier is to make this data work seamlessly with the AI systems that citizens and businesses increasingly rely on for information and guidance.

Critical information for policy making and for easing the lives of citizens often sits in government databases, regulatory portals, certification systems and official statistical repositories. These sources are accurate, structured and authoritative, yet largely inaccessible to analytical and modern AI tools. AI systems can generate fluent answers from scraped Internet content, but by design they remain cut off from government databases.

As a result, when citizens, policy-makers or entrepreneurs use AI tools to seek guidance on regulatory and planning requirements, such as government certification or eligibility for government schemes, the AI produces informed guesses rather than definitive answers. The knowledge is not absent; the AI simply cannot access the systems where that knowledge actually resides.

Citizens and enterprises alike increasingly turn to AI to unearth such information, and that use will grow as awareness of AI’s capabilities spreads. The need for AI-ready data is therefore more urgent than ever.

India generates huge amounts of data in agriculture, commerce, health, education and other civil services.

The eSankhyiki (e-statistics) portal of the National Statistical Office alone contains 135 million records of socio-economic indicators such as GDP, industrial production, consumer prices and labor force surveys. Through its Digital India initiative, the government has already taken important steps in digitizing public services and improving access to schemes and regulatory information through UMANG, MyScheme and several other portals, generating information on a large scale.

Although these efforts have made information more accessible to citizens and businesses, data remains dispersed across multiple platforms and departments. The next step is to extend this remarkable digital transformation to AI systems by making our information portals interoperable with them. A simple natural-language query could then draw accurate, authoritative answers directly from these government sources and give citizens and businesses reliable guidance in real time.

Data exists in a number of distinct systems and portals, each designed to serve specific purposes and audiences. A wealth of resources is available across departments, documents and datasets, each providing unique insights, although often in different formats and contexts. These resources need to be made AI-ready. Linking them effectively can give citizens and businesses a more holistic and actionable view of the information available. AI has already proven it can operate at this scale, processing huge datasets, converting unstructured documents into usable formats, and answering complex questions in natural language.

The opportunity lies in applying this technology to government data. By bridging the gap between AI tools and the rich information already available, we can unlock economic value for millions of users, including enterprises.

Reducing information friction benefits citizens and businesses alike, but it can be especially transformative for small businesses and the marginalized, for whom timely intelligence and regulatory guidance can open up opportunities that were previously out of reach. One of the best use cases could be in the micro, small and medium enterprise (MSME) sector.

When these enterprises can efficiently access market information, compliance requirements and scheme eligibility, they compete more effectively in global markets, optimize their operations and grow faster. Manufacturing competitiveness strengthens, employment expands in cities and towns, and supply chains become more flexible. The multiplier effect of improved information access on this huge enterprise base translates into significant economic acceleration. With such enablement, the MSME sector can truly act as a champion, as envisaged in this year’s Budget.

All this is possible only when we have AI-ready data. AI cannot compensate for a weak data foundation. NSO India, the nodal agency for official statistics for the Government of India, has taken several steps toward data harmonization, such as a national metadata structure, a statistical quality assurance framework, and a compilation of unique identifiers, codes and taxonomies to be used while creating planned data layers for interoperability.

The data accessibility requirement is met through a digital bouquet for data dissemination consisting of a website, the eSankhyiki portal, a mobile app, a microdata portal and a metadata portal. These applications provide government data in an interoperable manner, using APIs for consumption by both humans and machines. Measured against these parameters, NSO India’s data is AI-ready.

On February 6, 2026, the National Statistical Office, India took a significant step forward by launching a Model Context Protocol (MCP) server, which provides an open technology layer over the eSankhyiki repository. This infrastructure allows AI systems to connect directly to datasets on eSankhyiki and query them programmatically. Now anyone can ask an AI assistant about price trends, employment patterns or industrial production and get answers synthesized from official data in real time.
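To give a sense of what "query them programmatically" can look like, here is a minimal sketch of the kind of message an AI client sends to an MCP server. MCP messages follow JSON-RPC 2.0, and tool invocations use the `tools/call` method. The tool name `get_indicator_series` and its argument keys are hypothetical and chosen only for illustration; they are not taken from the NSO server, whose actual tools may differ.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request in the shape MCP uses for tool calls."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument keys, for illustration only.
request = build_tool_call(
    request_id=1,
    tool_name="get_indicator_series",
    arguments={"dataset": "CPI", "start": "2024-01", "end": "2024-12"},
)

# Serialize the request as it would travel over the wire.
print(json.dumps(request, indent=2))
```

In practice the AI assistant builds such requests on the user's behalf, so the person asking the question only ever sees natural language.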

Market analysis that previously required specialized expertise is becoming accessible through natural-language queries. Use cases began arriving from users almost immediately, ranging from dashboard creation to deep-dive data analysis. This represents a fundamental shift in how government data serves the economy.

In an AI-driven world, publishing and organizing data is not enough. Data must be reliable, interoperable, relevant and machine-usable. Users need to know where the data comes from and whether they can trust it. Privacy protection needs to be built into the architecture, not added as an afterthought. The governance framework needs to clearly define lawful use. Quality assurance needs to be maintained continuously, flagging anomalies before they propagate through the AI system.

These elements determine whether AI produces reliable outputs and builds public trust. The NSO implementation includes these safeguards by design. MCP provides access to verified official data through server-controlled protocols while keeping the data close to its official source. When datasets are standardized, harmonized, and made searchable through common APIs, they become an AI-ready foundation on which any model or application can safely build.

The initial deployment includes seven datasets: Periodic Labor Force Survey, Consumer Price Index, Annual Survey of Industries, Index of Industrial Production, National Accounts Statistics, Wholesale Price Index and Environmental Statistics, with more to come. Each can be queried individually or in combination.

Consider what this makes possible. An MSME exploring new markets can analyze employment trends and industrial potential in the target sectors through an interaction with the AI assistant. A researcher studying regional economic patterns can combine industrial production data with labor force surveys without manually reconciling different formats and time periods. Infrastructure development occurs through progressive integration. As more authoritative data sources join the network, queries can integrate them with other datasets. Each new connection expands and multiplies the analytical possibilities across all existing sources, enabling deeper insights and more comprehensive decision making.
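The reconciliation that the researcher would otherwise do by hand can be sketched as follows: a minimal example, assuming one series is published monthly and the other quarterly. The function names and all the numbers are placeholders invented for illustration; they are not official statistics and do not reflect any eSankhyiki interface.

```python
from collections import defaultdict
from statistics import mean


def monthly_to_quarterly(monthly):
    """Average monthly values ('YYYY-MM' keys) into quarters ('YYYY-Qn' keys)."""
    buckets = defaultdict(list)
    for period, value in monthly.items():
        year, month = period.split("-")
        quarter = (int(month) - 1) // 3 + 1
        buckets[f"{year}-Q{quarter}"].append(value)
    return {q: mean(values) for q, values in buckets.items()}


def join_on_period(left, right):
    """Pair up values for the periods present in both series."""
    return {p: (left[p], right[p]) for p in sorted(left.keys() & right.keys())}


# Placeholder numbers, NOT official statistics.
production_index_monthly = {
    "2024-01": 100.0, "2024-02": 102.0, "2024-03": 104.0,
    "2024-04": 110.0, "2024-05": 112.0, "2024-06": 114.0,
}
participation_quarterly = {"2024-Q1": 50.0, "2024-Q2": 51.0, "2024-Q3": 52.0}

combined = join_on_period(
    monthly_to_quarterly(production_index_monthly),
    participation_quarterly,
)
print(combined)
```

Only the quarters covered by both series survive the join, which is exactly the bookkeeping that harmonized periods and shared identifiers take off the researcher's hands.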

MoSPI’s data reconciliation and dataset linking initiatives are laying the foundation for a data-ready future. Government datasets can become accessible foundations upon which any developer, researcher or enterprise can build. Small businesses gain access to market information that was previously available only to large corporations. Policy makers receive real-time feedback. Researchers connect datasets that were previously incompatible.

As datasets connect across the network, each connection multiplies the analytical possibilities across all existing sources. The economic benefits will grow as adoption grows. When data becomes a horizontal capability that spans all sectors, innovations stop being trapped in silos.

This approach recognizes a fundamental principle: those who generate the data should benefit from it. India’s competitive advantage in the AI era will come from responsibly mobilizing diverse, relevant, high-quality data.

The question is no longer whether data can create economic value. The question is whether we design systems where that value is shared widely, advancing both prosperity and the public good. AI-ready data infrastructure, built on a foundation of trust and security, transforms scattered information into accessible intelligence.

This is how India builds the data foundation that powers AI for economic growth at population scale, en route to a developed India.

(Views expressed are personal)

This article is written by Saurabh Garg, Secretary, Ministry of Statistics and Programme Implementation, and Shalini Kapoor, Chief Strategist, Data and AI, EkStep Foundation.

