How an army of data labellers are behind India's AI boom

Roshni Majumdar
Jan 23, 2020 | 17:22

The Indian AI sector is growing by leaps and bounds, and the data labelling business is expected to be valued at $1.2 billion by 2023.

The AI boom has been a long time coming, and it may finally be here. At Cogito Tech, a data-annotation firm in Noida’s Sector 2, just 30 minutes from the heart of Delhi on a good day, hundreds of young women and men are patiently using digital tools to identify and label the images on their screens. They’re mostly graduates from universities around Delhi, and they’re working in tandem with an American company, Labelbox, which creates data labelling software for companies looking to train their machines to perform AI-related tasks.


And they aren’t the only ones. In Kerala’s Palakkad district, Mujeeb Kolasseri, founder of Infolks, a firm that labels images for companies in the United States and Europe, is constantly thinking about how to hire more college graduates from his village. Mujeeb had long been an independent 'crowdworker' for Mechanical Turk, Amazon’s crowdsourcing marketplace, where thousands of remote workers annotate data (among other things) for very little money. In 2016, he founded his own company, beginning with just six people. Today, he employs 350, and in the next two years he hopes to take that number to more than 2,000.

Data labellers at work at Infolks. (Photo: Mujeeb Kolasseri)

But what, exactly, is data labelling? It is the manual classification of information, which allows machines to make sense of data and perform the tasks they have been given. Since machines have no understanding of the natural world, humans have to 'teach' them to recognise patterns by giving them data that has already been labelled. Labelling some kinds of data is easy: when you mark an email as important on Gmail, you’re teaching Google to recognise the kind of emails that are important to you. Going forward, Google will flag similar emails for you. Other tasks are more complex, such as when a company is developing AI for autonomous vehicles. In this case, a car has to be taught what a tree or a person looks like. This process, which keeps humans involved in training the machine, is known as human-in-the-loop machine learning.
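For readers who want to see the idea in miniature, here is a rough sketch in Python of how human-assigned labels let a program sort new emails. Everything in it is an illustrative assumption: the sample subjects, the label names and the word-counting method are made up for this example and are not how Gmail or any company mentioned here actually works.

```python
from collections import Counter

# Hypothetical hand-labelled examples: (email subject, label chosen by a human)
LABELLED = [
    ("invoice due tomorrow", "important"),
    ("meeting with client moved", "important"),
    ("project deadline reminder", "important"),
    ("weekend sale on shoes", "not_important"),
    ("new discount coupons inside", "not_important"),
    ("sale ends tonight", "not_important"),
]


def train(examples):
    """Count how often each word appears under each human-assigned label."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def predict(counts, text):
    """Pick the label whose labelled examples share the most words with text."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)


model = train(LABELLED)
print(predict(model, "client invoice reminder"))
print(predict(model, "shoes sale tonight"))
```

The program never learns what an invoice or a sale is. It only repeats the pattern in the labels people gave it, which is why the quality of those labels matters so much.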


Consider an example to gain a sense of why humans are so crucial to the process. In 2016, Russian operatives bought a large number of political ads on Facebook, paying for them in Russian rubles. The scale of this effort was massive: Russian-backed content directly reached nearly 126 million Americans, about half of those eligible to vote. But Facebook’s algorithms, which can process billions of data points, entirely missed the signal. They weren’t able to connect the dots between rubles being spent on American political ads and foreign interference in the US election. How did this happen? Partly, the algorithms had not been taught to look for such interference. They were also working with lots of unlabelled, or noisy, data floating on the platform.

Facebook algorithms entirely missed the signal when Russian operatives bought political ads on the site that reached millions of American voters. (Representational image: Reuters)

That is why human data labellers are so important: they not only provide machines with labelled data but also with just the right amount of it. In the world of big data, there is always going to be more data. What matters, then, is labelling the right data. “If you don’t find a grey shirt in Flipkart, you can go to Myntra,” explains Ajinkya Malasane, co-founder of Playment, a data-labelling platform based out of Bengaluru. In other words, humans are the only ones who can provide companies with high-quality, or accurate, data sets. There are numbers to prove that the sector is growing.


Over the past five years, the Indian AI sector attracted $150 million worth of investments in more than 400 companies. Though this number pales in comparison to investment in the US or Europe, which runs into the billions, the domestic AI sector is growing by leaps and bounds. Investment really kicked off after 2016, when the figure rose by 75 percent, from $44 million that year to $77 million in 2017. Data analytics firms were a major component of the 1,200 new tech startups in the country in 2018. The data labelling business is expected to reach $1.2 billion by 2023. Plus, AI has the potential to add $957 billion to India’s economy in 2035 if the sector receives the right kind of support.

Industry leaders and young workers are holding out hope for that result. Many college graduates at Cogito Tech say they are happy with their jobs and hope to scale up their skills to meet the demands of future work. If the sector does see the kind of support it is hoping for, there is room for a new kind of AI revolution in India that Malasane calls “Tech 3.0.” Rohan Agrawal, the founder and CEO of Cogito Tech, who formerly ran a BPO, agrees. "At least we’re going to eliminate the whole ordeal of talking to people who’re yelling at us," he says, referring to the hellish work scenario at call centres. Data labellers are quietly teaching machines to do the work instead, and they’re perfectly happy.

Last updated: January 23, 2020 | 17:22