AI data labeling is a big thing in China
Helps autonomous vehicles learn what objects to avoid
JIAXIAN, CHINA — Yi Yake and his boyhood friends grew up in a farming village in central China, swinging sickles to harvest the family wheat crop.
As young adults, they ventured out of the farm economy. Yi got a job marketing computer games. One of his friends worked in a fireworks store.
Today Yi drives a white BMW and, along with two childhood buddies, employs over 200 people in what is quickly becoming a boom industry in China: artificial intelligence.
Their company, located in a city near their parents’ village in Henan province, provides an essential early service in the AI process, labeling images and videos to help make computers smarter. Before a self-driving car can learn to avoid hitting people or trees, it must learn what people and trees look like — by digesting thousands of images labeled by thousands of humans.
Demand for labeling is exploding in China as large tech companies, banks and others attempt to use AI to improve their products and services. Many of these companies are clustered in big cities like Beijing and Shanghai, but the lower-tech labeling business is spreading some of the new-tech money out to smaller towns, providing jobs beyond agriculture and manufacturing.
The science is mired in controversy in China, where the ruling Communist Party is using AI to help it identify and track people in mass-surveillance programs, most prominently in the largely Muslim province of Xinjiang, according to Human Rights Watch. The rights group has raised concerns that China’s private sector has aided the government surveillance by providing AI-powered software and other services.
Yi said his business, Ruijin Science & Tech, mostly works for Chinese tech giants Baidu and Alibaba, labeling footage captured by autonomous cars. Data labeling for autonomous vehicles is taking off in both the United States and China, as both countries invest heavily in the technology.
“I was working on online game promotion and never heard about the AI labeling business,” Yi said during a break in his office, serving green tea in small ceramic cups. He and his partners noticed other small firms getting into the field and decided to try it out with an initial investment of $15,000. “We believed this business could become better and bigger,” he said.
Just outside his office, in a large room resembling a field house, employees worked cheek-by-jowl at long rows of computers, examining blurry images filmed by self-driving vehicles.
The employees, who earn $350 to $550 a month, drew digital boxes around each object on the screen, and labeled them from a drop-down menu — vehicle, human, obstacle, animal. If they selected “vehicle,” another drop-down menu with more options appeared — small car, motorbike, truck, train.
“Sometimes there could be a train at a crossing,” explained 30-year-old Kang Qing, though he added he’d not yet encountered one after 18 months on the job. The workers mostly label objects located directly on the road, but when they see a human near the road, they label that, too, because a human could theoretically move into traffic.
“When I was young I heard about AI from robots in movies. It was a term that sounded mysterious to me,” Kang said. “It’s still mysterious, but I’ve learned more about it, and I’ve developed a more reasonable view. It’s humans setting the rules for AI, and the scary feeling mostly comes from the movies, I think.”
At the Beijing headquarters of Baidu — China’s answer to Google — autonomous cars, buses and sweepers roam the campus. The company is testing self-driving cars in 13 cities, where they cruise around in regular traffic. The Lincoln-brand cars, made by Ford, have a human driver at the wheel as a backup in case something goes wrong.
For now the cars don’t ferry passengers, but Baidu says it plans to introduce a self-driving taxi service in the city of Changsha in October. The taxis, called Apollo Go, will be connected to the city’s new 5G wireless Internet network, Baidu said. The company is using cars made by Chinese company FAW Group for that project.
In the United States, Google offshoot Waymo last year launched the nation’s first commercial self-driving taxi service, in the Phoenix area. Industry experts say U.S. autonomous-car companies generally outsource their image labeling to freelancers who work at home, or to lower-cost workers in countries such as India or the Philippines.
Much of Ruijin’s business comes directly from Baidu and Alibaba, but sometimes the company gets jobs through outsourcing companies, said Liu Zhanjie, one of Yi’s business partners. On a few occasions, that work has involved drawing digital boxes around photos of human faces, for use in payment apps, Liu said.
Facial-recognition screens are cropping up at retailers across China, allowing customers to pay by having their faces scanned. Amos Toh, an AI researcher at Human Rights Watch, said there is concern among rights activists that the proliferation of facial recognition for commercial use could give the Chinese government more pools of data to access for public surveillance.
Liu said Ruijin’s clients were using the data for commercial purposes. How else it might be used down the road “is beyond our job or knowledge,” he said.
On a recent afternoon, Ruijin workers were handling a job for Microsoft, which had hired the company through an intermediary.
The assignment was simple: drawing boxes around handwritten Japanese or Korean characters. A different company would later label each character with its proper name, so a computer could recognize the text. The process, known as optical character recognition, or OCR, is a basic form of AI.
The Ruijin employees guessed the work was aimed at improving a translation app, but they didn’t know for sure. Microsoft declined to comment.
“We can’t understand the text,” said He Yongchao, 29, a high school graduate from the area whose previous tech experience was limited to playing computer games. “We just draw a frame around the image to crop the text — that’s all we’re responsible for.”
AI used to sound extraordinary to him, He said. “Now I have more knowledge about it. ... The intelligence becomes intelligent based on a huge amount of manual work.”