The computer vision field is one of the most promising areas of technological development. Advances in computer vision will underpin autonomous car infrastructure, improve medical diagnosis, and, in the insurance industry, automate claims settlement.
Yet computers need to be told what they are looking at before they can learn how to react. In response, a new "data labeling" industry has emerged. Workers look at countless pictures and videos and label different elements such as traffic signals, pedestrians, and other cars. This manual investment is essential before a computer vision system can be put into production and start making decisions.
This poses an interesting question: who will own the computer vision infrastructure of the future? A host of early-stage companies are pursuing computer vision opportunities at present, but few of them have the resources to invest in these intensive back-office operations.
Tech monoliths like Google, Amazon, and Microsoft have both the capital to invest in this kind of enterprise and existing enterprise cloud relationships through which they can offer computer vision as a technology layer. It could be that this particular area of AI becomes a battle between big tech companies rather than a space for early-stage companies.
Today, companies are embracing artificial intelligence as a way to automate decision-making and help drive new business opportunities. The challenge is that the algorithms underpinning the technology are as naive as newborns. They need to be fed millions of labeled examples before they can learn to "see". For a self-driving car algorithm to be taught the meaning of road signs, or to tell the difference between a child and a fox, hours of footage have to be watched and objects tagged, frame by frame. An hour of video takes eight hours to annotate. In fact, a McKinsey report from 2018 listed data labeling as the biggest obstacle to AI adoption in industry.