We are experiencing exponential growth in smart speaker technology. It is estimated that by 2020, 50% of all web searches will be voice-enabled, and by 2021, there could be more smart speakers than humans on the planet. There is good reason for these predictions.
Spoken language is the most fundamental form of intelligence and has been an essential tool for communication and collaboration between humans since the dawn of history. According to University of Reading professor Mark Pagel, “Language has played a more important role in our species’ recent evolution than have our genes.”
Smart speakers rely on a set of complex artificial intelligence (AI) technologies. They listen to sound waves and convert them into words using automatic speech recognition (ASR). Then, they convert those words into meaning using natural language understanding (NLU). Once the meaning is understood, the smart speaker responds using natural language generation (NLG). These relatively small devices mask layers of complex technology to deliver a simple, frictionless user experience.
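To make the three stages concrete, here is a minimal sketch of the ASR, NLU and NLG chain described above. Everything in it is an assumption made for illustration: the function names, the `Intent` class and the keyword rules are hypothetical, the "audio" is simply UTF-8 text standing in for sound waves, and real smart speakers use trained models at every stage rather than hand-written rules.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Hypothetical container for what the user wants (the 'meaning')."""
    name: str
    slots: dict = field(default_factory=dict)


def recognize_speech(audio: bytes) -> str:
    """ASR stand-in: turn 'audio' into words.

    Assumption: the bytes are already UTF-8 text, so the example runs
    without a speech model. A real system would decode sound waves here.
    """
    return audio.decode("utf-8").strip().lower()


def understand(text: str) -> Intent:
    """NLU stand-in: map words to an intent using simple keyword rules."""
    cleaned = text.rstrip("?.! ")
    if "weather" in cleaned:
        # Take whatever follows " in " as the city, if present.
        _, found, city = cleaned.partition(" in ")
        return Intent("get_weather", {"city": city if found else "your area"})
    if cleaned.startswith("play "):
        return Intent("play_music", {"track": cleaned[len("play "):]})
    return Intent("unknown")


def generate_response(intent: Intent) -> str:
    """NLG stand-in: turn the intent back into a spoken-style sentence."""
    if intent.name == "get_weather":
        return f"Here is the weather forecast for {intent.slots['city']}."
    if intent.name == "play_music":
        return f"Playing {intent.slots['track']}."
    return "Sorry, I did not understand that."


def handle_utterance(audio: bytes) -> str:
    """Run the full chain: sound -> words -> meaning -> reply."""
    text = recognize_speech(audio)
    intent = understand(text)
    return generate_response(intent)
```

Calling `handle_utterance(b"Play jazz")` walks through all three stages in order. The user only ever sees the final sentence, which is the sense in which the device masks the layers beneath it.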
Yet the transformational power of this technology transcends the sum of its components and is driven by the perception of its human users. Throughout technology’s evolution, we have had to use our eyes, our hands, our fingers and our brains to interact with it. This kept technology an inanimate object, unquestionably nonhuman, even as some devices, such as smartphones, became very close companions. With smart speakers, this is about to change fundamentally, as we are likely to humanize technology in ways that we have not seen before.
Occasionally, human interaction with technology takes a giant leap that changes us. One great example is touchscreen technology on smartphones and the introduction of the iPhone, which essentially overthrew Nokia’s dominance of the cellphone market. The touchscreen enabled many new applications for social media, weather, navigation, reading books, listening to music and much more. These apps, in turn, have fundamentally changed the way we socialize, learn, collaborate, work and relate to each other. Smartphones have become our closest and dearest devices and our primary portals to the web and the rest of the world.
I believe a potentially bigger shift is unfolding today with smart speakers. It is a new way of interacting with technology, the web and the world, and its promising ability to provide frictionless interactions will possibly have a bigger impact on us than any other type of technology.
So how is this frictionless technology different? And how will it alter existing businesses and create new business opportunities?
First, this technology is enormously convenient: we speak to devices instead of looking at screens and typing on keyboards to convey our intention. With smartphones and web browsers, we rely on our eyes and our fingers to interact with these devices, but most importantly, we rely on our brains to make choices and decisions. Let’s take the example of searching the web for “smart speaker” using Google. We get a list of answers and ads. As we scroll through the results, we select the hyperlink that matches our intention. We are in charge and decide which results we want to review.
With a smart speaker, the device performs a search based on its understanding of our intention. It searches the web, reviews the results, selects one and utters an answer, making the decision on our behalf. It transforms the experience from a search engine to an “action engine.” And that is a major change. The other key shift is that, possibly unknowingly, we are trusting the companies behind the smart speakers with the accuracy of the results they produce.
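The difference between the two experiences can be sketched in a few lines. This is an illustration under stated assumptions, not a description of how any real assistant ranks or selects answers: the `Result` type, the function names and the relevance scores are all hypothetical.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    """Hypothetical search result with a made-up relevance score."""
    title: str
    score: float


def search_engine(results: List[Result]) -> List[Result]:
    """Screen-based search: return every result, ranked. The user decides."""
    return sorted(results, key=lambda r: r.score, reverse=True)


def action_engine(results: List[Result]) -> str:
    """Voice-based search: pick one result and speak it. The device decides."""
    if not results:
        return "Sorry, I could not find an answer."
    best = max(results, key=lambda r: r.score)
    return f"Here is what I found: {best.title}"
```

Given the same candidate results, `search_engine` hands back the whole ranked list, while `action_engine` collapses it to a single spoken answer. Everything the second function leaves out is a choice the user never sees, which is why trust in the company behind the device matters so much more.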
Second, this technology is likely to affect existing businesses and business models. A Google search for the phrase “smart speaker” returns millions of results and displays ads for smart speakers. Google is projected to become the first digital advertising company to generate over $100 billion in ad revenue in 2019. How will such revenue be affected when we are only talking to a smart speaker?
Currently, no ads are “spoken” to us when we search the web using Alexa, Siri or Google Home. Will we tolerate voice ads in the future? Will we see the current digital ads business model transformed by this technology? Will new voice applications/skills be the way of the future, and will they unseat the dominance of mobile apps? Will web search and the search engine optimization industry be affected, and how soon will we transition from a mobile-first to a voice-first paradigm?
No matter what the exact answers to those questions are, we know that several business models will change. At the same time, new business opportunities are unfolding for voice-first apps and for companies that can retool to deliver them.
Our perception of technology is shifting radically. For the first time, technology is speaking to us in a human voice, masking layers of complex AI that are abstracted and simplified through voice commands. With technology talking to us, we are unconsciously elevating it. We are humanizing it. We are starting to attribute traits to our smart speakers, much as we do with other humans. This anthropomorphic attribution is ushering in a new era in which technology is elevated to a human-level companion.