Experimental Google software that can describe a complex scene could lead to better image search or apps to help the visually impaired.
Researchers at Google have created software that can use complete sentences to accurately describe scenes shown in photos—a significant advance in the field of computer vision. When shown a photo of a game of ultimate Frisbee, for example, the software responded with the description “A group of young people playing a game of frisbee.” The software can even count, giving answers such as “Two pizzas sitting on top of a stove top oven.”
Until now, most efforts to create software that understands images have focused on the easier task of identifying single objects. Descriptions that capture several objects and how they relate to one another could make images more accessible to visually impaired people and improve image search for everyone.