Implementing Google Photos Cloud Vision API

May 15, 2017 9:05:00 AM

Even before there was such a thing as a digital computer, science fiction authors and other dreamers imagined sentient machines that could see and understand what they saw. Just the “seeing” part, identifying objects in an image, has proven to be one of the thorniest problems in computing. However, with recent advances in machine learning, we are now a step closer to realizing that goal: Google has released its Google Photos Cloud Vision API for developers everywhere to use.

The Google Photos Cloud Vision system uses deep-learning algorithms to perform several advanced image-processing tasks:

  • Identify objects in an image: Google claims that it has taught its system to identify thousands of different objects, and that the system can identify multiple objects in the same image. In particular, the system recognizes logos and famous landmarks.
  • Recognize text in an image: The system’s optical character recognition algorithms can extract text in any of several major languages.
  • Identify sentiment: Although the system cannot (yet) match a face with its owner’s name, it can make a pretty good guess as to the emotional state of the person on the basis of the facial features shown in the image.
  • Identify inappropriate content: The system can classify adult and violent content in images.
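
To make the list above a little more concrete, here is a minimal Python sketch of how an app might ask for several of these tasks in one request. It assumes the system is reached through Google's public Cloud Vision REST interface, where each task is selected by a feature type such as LABEL_DETECTION or SAFE_SEARCH_DETECTION. The helper name build_annotate_request, the FEATURES dictionary keys, and the max_results default are illustrative choices of ours, not part of any Google library.

```python
import base64
import json

# Feature type names as used by the Cloud Vision REST interface (assumed here).
# The dictionary keys are our own shorthand for the tasks listed above.
FEATURES = {
    "objects": "LABEL_DETECTION",
    "logos": "LOGO_DETECTION",
    "landmarks": "LANDMARK_DETECTION",
    "text": "TEXT_DETECTION",
    "faces": "FACE_DETECTION",
    "inappropriate": "SAFE_SEARCH_DETECTION",
}


def build_annotate_request(image_bytes, capabilities, max_results=10):
    """Return the JSON-serializable body for one image annotation request."""
    return {
        "requests": [
            {
                # Image bytes travel as base64 text inside the JSON body.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": FEATURES[name], "maxResults": max_results}
                    for name in capabilities
                ],
            }
        ]
    }


# Placeholder bytes stand in for a real image file read from disk.
body = build_annotate_request(b"fake image bytes", ["objects", "text"])
print(json.dumps(body, indent=2))
```

Asking for several feature types in a single request means the image only has to be uploaded once, which matters on mobile connections.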

The computational heavy lifting is performed on Google’s cloud service, so the system can be easily integrated into any web or mobile app.
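
Because the work happens on Google's servers, the client side can be as small as one HTTP POST. The sketch below builds such a request using only the Python standard library, without sending it. The endpoint URL and the key query parameter are assumptions based on Google's public Cloud Vision REST interface; make_http_request and the "YOUR_API_KEY" placeholder are hypothetical names used for illustration.

```python
import json
import urllib.request

# Assumed endpoint of the Cloud Vision REST interface.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def make_http_request(body, api_key):
    """Build (but do not send) the POST request for an annotation body."""
    return urllib.request.Request(
        url=f"{ANNOTATE_URL}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "YOUR_API_KEY" is a placeholder; a real key comes from your own Google Cloud project.
request = make_http_request({"requests": []}, "YOUR_API_KEY")
print(request.get_method(), request.full_url)
# Sending it would look like: urllib.request.urlopen(request).read()
```

Any platform that can issue an HTTPS request and parse JSON could do the same, which is why the same service can sit behind both web and mobile apps.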

The possibilities are practically without limit. You could, for example, create applications that:

  • Automatically flag inappropriate visual content in your blog’s public comments (no more checking and releasing each post manually!)
  • Identify and filter all-image spam email
  • Gauge audience reactions to a performance
  • Automatically tag collections of photos
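
As a hedged sketch of the first and last ideas above, the Python below decides whether to hold an image for review and picks tags for a photo from an already-parsed response. It assumes the response shape of Google's Cloud Vision REST interface (a safeSearchAnnotation with likelihood strings, and labelAnnotations with a description and score). The function names, the "LIKELY" threshold, the 0.8 score cutoff, and the sample data are all our own illustrative choices.

```python
# Likelihood values as they appear in Cloud Vision responses (assumed), weakest first.
LIKELIHOODS = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def should_flag(safe_search, threshold="LIKELY"):
    """Return True if adult or violent content is rated at or above the threshold."""
    limit = LIKELIHOODS.index(threshold)
    return any(
        LIKELIHOODS.index(safe_search.get(category, "UNKNOWN")) >= limit
        for category in ("adult", "violence")
    )


def tags_from_labels(labels, min_score=0.8):
    """Return label descriptions whose confidence score meets min_score."""
    return [
        label["description"]
        for label in labels
        if label.get("score", 0.0) >= min_score
    ]


# Hand-written sample shaped like one entry of a response; not real API output.
sample = {
    "safeSearchAnnotation": {"adult": "VERY_UNLIKELY", "violence": "LIKELY"},
    "labelAnnotations": [
        {"description": "concert", "score": 0.93},
        {"description": "crowd", "score": 0.61},
    ],
}

print(should_flag(sample["safeSearchAnnotation"]))
print(tags_from_labels(sample["labelAnnotations"]))
```

Both thresholds are worth tuning against your own content: a stricter threshold sends more images to a human reviewer, while a looser one lets more through automatically.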

You get the idea.

Google Photos Cloud Vision API: Better with Time

One of the most intriguing features of the Google Photos Cloud Vision API system is that the more it is used, the better it will become. Because it is based on machine learning, as more applications use the system, its repertoire of object recognition will grow, and the accuracy will improve over time.

What does the future hold? One obvious next step would be extending the system’s capabilities from still photos to full-motion video. Because video involves far more data, processing recorded video will probably come before real-time analysis of live video feeds. Once that happens, though, there will be essentially no limit to what machine vision systems can do.

Machines’ inability to understand and react to visual data has long kept them from the level of automation (and autonomy) that humans have dreamed of. We now appear to be on the verge of turning that corner. Strap in: it’s going to be an exciting ride!

Written by Brian Geary

Brian is a true believer in the Agile process. He often assists the development process by performing the product owner role. In addition to his technical background, he is an experienced account manager with a background in design and marketing.
