BERT stands for Bidirectional Encoder Representations from Transformers. Unlike earlier language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. In other words, BERT lets a model work out what each word in a sentence means by taking the full surrounding context into account. As a result, the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
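To make the "one additional output layer" idea concrete, here is a minimal sketch of fine-tuning a pre-trained BERT encoder for a sentence-pair classification task. It assumes the Hugging Face transformers library and PyTorch, which are not part of the original text; the example pair and its label are hypothetical.

```python
# Minimal sketch: fine-tune pre-trained BERT with a single classification head,
# assuming the Hugging Face `transformers` library (an assumption, not from the text).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# BertForSequenceClassification places one linear output layer on top of the
# pre-trained encoder -- the "one additional output layer" described above.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# A toy language-inference style pair: premise and hypothesis (hypothetical data).
inputs = tokenizer("A man is playing guitar.", "A person makes music.",
                   return_tensors="pt")
labels = torch.tensor([1])  # hypothetical label: 1 = entailment

outputs = model(**inputs, labels=labels)
outputs.loss.backward()  # gradients flow through both the new head and the encoder
```

In practice this loss would be minimized over a labeled dataset with an optimizer; the point here is only that no task-specific architecture beyond the single output layer is required.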
For example, consider the following two sentences:
Sentence 1: What is your date of birth?
Sentence 2: I went out for a date with John.
The meaning of the word ‘date’ is different in these two sentences.
Unidirectional contextual models generate a representation of each word based only on the words that precede it, i.e., they would represent ‘date’ based on ‘What is your’ but not on ‘of birth.’ BERT, however, represents ‘date’ using both its previous and next context: ‘What is your … of birth?’
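This difference can be observed directly by comparing the embeddings BERT produces for ‘date’ in the two sentences above. The sketch below assumes the Hugging Face transformers library and PyTorch, which the original text does not mention.

```python
# Minimal sketch: compare BERT's contextual embeddings of "date" in the two
# example sentences, assuming the Hugging Face `transformers` library.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["What is your date of birth?",
             "I went out for a date with John."]

embeddings = []
with torch.no_grad():
    for text in sentences:
        enc = tokenizer(text, return_tensors="pt")
        hidden = model(**enc).last_hidden_state[0]            # (seq_len, 768)
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
        embeddings.append(hidden[tokens.index("date")])        # vector for "date"

# The similarity is well below 1.0: the same word gets different vectors
# because BERT conditions on the full left and right context of each sentence.
print(torch.cosine_similarity(embeddings[0], embeddings[1], dim=0).item())
```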
Google has shared several examples illustrating this search update. One such example is the query: “do estheticians stand a lot at work.”