Bing is making its image search engine more precise through improved understanding of the relationships between user queries, images, and webpages.
Bing is also bringing multi-granularity matching to image search with new vector match, attribute match, and Best Representative Query match techniques.
Bing explains how these enhancements will improve image search:
“… Bing image search has employed many deep learning techniques to map both query and document into semantic space greatly improving our search quality. There are however still many hard cases where users search for objects with specific context or attributes (for example: {blonde man with a mustache}, {dance outfits for girls with a rose}) which cannot be satisfied by current search stack. This prompted us to develop further enhancements.”
Here’s more about Bing’s multi-granularity matching.
Vector match
Using the above-mentioned example of “dance outfits for girls with a rose,” Bing illustrates how its new vector match for image search works:
“With recent advancements we incorporated BERT/Transformer technology leveraging 1) pre-trained knowledge to better interpret text information … 2) attention mechanism to embed the image and webpage with the awareness of each other, so that the embedded document is a good summarization of the salient areas of the image and the key points on the webpage.”
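To make the idea of matching in a shared vector space concrete, here is a minimal sketch. It does not reproduce Bing's BERT/Transformer models: the three-dimensional vectors, the document names, and the helper functions `cosine_similarity` and `rank_by_vector_match` are all hypothetical stand-ins chosen for illustration. It only shows how a query embedding could be compared with document embeddings once both exist.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_vector_match(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings. In a real system these would come from a
# trained model that embeds the query, and the image together with its webpage.
query_vec = [0.9, 0.8, 0.7]  # "dance outfits for girls with a rose"
doc_vecs = {
    "girl_in_dance_outfit_with_rose": [0.8, 0.9, 0.6],
    "rose_garden": [0.1, 0.0, 0.9],
    "dance_studio": [0.9, 0.1, 0.0],
}

ranking = rank_by_vector_match(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In this toy setup, the document whose vector points in nearly the same direction as the query ranks first, while documents that cover only part of the query (just the rose, or just the dancing) score lower.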
Attribute match
Attribute match uses a set of techniques to extract object attributes from both the query and the source document, then matches the two on those attributes.
Using the example query “elderly man swimming pictures,” Bing shows how it applies attribute detectors to extract descriptions of the person’s appearance and behavior.
“Despite the webpage having insufficient textual information for this image, we are now able to detect certain similar attributes from the image content and its surrounding text. Now the query and document can be considered a “precise match” since they share the same attributes.”
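The attribute comparison described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Bing's implementation: the keyword lexicon `ATTRIBUTE_LEXICON`, the attribute names, and the hard-coded `image_detector_output` are hypothetical stand-ins for real attribute detectors, which Bing has not described in detail.

```python
# Hypothetical lexicon mapping words to (attribute type, value) pairs.
ATTRIBUTE_LEXICON = {
    "elderly": ("age", "elderly"),
    "senior": ("age", "elderly"),
    "man": ("gender", "male"),
    "swimming": ("activity", "swimming"),
    "swims": ("activity", "swimming"),
}


def extract_attributes(text):
    """Collect the attributes named by any word in the text."""
    attributes = set()
    for token in text.lower().split():
        token = token.strip(".,!?")
        if token in ATTRIBUTE_LEXICON:
            attributes.add(ATTRIBUTE_LEXICON[token])
    return attributes


def is_precise_match(query_attrs, doc_attrs):
    """True when every attribute asked for in the query is found in the document."""
    return bool(query_attrs) and query_attrs <= doc_attrs


query_attrs = extract_attributes("elderly man swimming pictures")

# Assumed output of image-content detectors for one image (hard-coded here).
image_detector_output = {("age", "elderly"), ("gender", "male")}
# Sparse surrounding text that on its own does not describe the person.
surrounding_text = "Grandpa swims every morning."

doc_attrs = image_detector_output | extract_attributes(surrounding_text)
print(is_precise_match(query_attrs, doc_attrs))
```

The surrounding text alone yields only the swimming attribute, so the match succeeds only once attributes detected in the image itself are combined with it, which mirrors the case Bing describes.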
Best Representative Query (BRQ) match
Bing has enriched the metadata for images with Best Representative Query information. The Best Representative Query for a given image is a query that the image would be a good result for.
BRQs resemble user queries, which means they can be naturally and easily matched to incoming queries. They are typically a summarization of the main topics of the webpage and the major image content.
Generating a richer set of BRQs for images will lead to better search results, Bing says.
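Because BRQs look like user queries, matching them against an incoming query can be as simple as comparing two short strings. The sketch below is one possible way to do that, assuming word-overlap (Jaccard) scoring; Bing has not said which matching method it uses, and the image IDs, BRQ strings, and function names here are hypothetical.

```python
def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words two sets have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_brq_score(query, brqs):
    """Score an image by its best-matching BRQ."""
    query_tokens = tokenize(query)
    return max((jaccard(query_tokens, tokenize(brq)) for brq in brqs), default=0.0)


# Hypothetical image metadata: each image carries a list of BRQs.
image_brqs = {
    "img_001": ["girls dance outfit with rose", "rose dance costume"],
    "img_002": ["elderly man swimming", "senior swimmer in pool"],
}

query = "elderly man swimming pictures"
scores = {
    image_id: best_brq_score(query, brqs)
    for image_id, brqs in image_brqs.items()
}
best_image = max(scores, key=scores.get)
print(best_image, scores[best_image])
```

Under this scoring, an image with more BRQs has more chances to line up with a given query, which is consistent with Bing's point that a richer set of BRQs helps.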