

A Tale of Two Conferences, Part II: SpeechTek Talks the Talk
A clash of Nerds versus Suits at two industry events


As mentioned in my last post, I attended SpeechTEK in Washington, DC (April 24-26), followed by Forrester’s Digital Transformation conference in Chicago (May 8-10). In contrast to Forrester, SpeechTEK offered depth of content and practicality in terms of real-life application.

This was my second year going to SpeechTEK, and I walked in with limited expectations, mainly because I had gained so much from my first year of attending: a ton of discussion on speech technologies and their applications. For example, last year I learned quite a bit about speech computing, far-field voice recognition, the eventual dominance of voice for search, and the use of voice-based bots for customer service. It was an enlightening first time, but as a veteran of many conferences I tempered my outlook, knowing that attending a second time usually results in only incremental insights. Basically, tech trends don’t evolve that quickly.

To my surprise, SpeechTEK 2017 was a completely new experience. The topics for 2017 were related to the 2016 event but were not simple updates. I felt as if I had suddenly arrived at my sophomore year of college: what I learned last year still applied, but we were moving on to brand new material, and I needed to be fluent in the prior year’s lessons to keep up.

SpeechTEK — Engineering and Existential Crises

This conference was for nerds and not really for suits. However, the material was presented in a readily digestible manner for those of us who don’t do deep engineering. Although acronyms like NLU (Natural Language Understanding), DL (Deep Learning), TTS (Text To Speech), and VUI (Voice User Interface) were flung about, the presenters graciously defined the terms and helped participants understand how the concepts behind the acronyms were manifest in products or solutions.

For example, many of these concepts came up in discussing the growing use and sophistication of chatbots, which are now often voice-enabled. The trend of pairing artificial intelligence with voice technology has reached practical maturity. Bots can now understand context, anticipate needs, and apply previous interactions to help users navigate a transaction or support request. As Aspect’s Joe Gagnon put it in a plenary session, companies need to consider chatbots and the like as digital employees: they have roles and responsibilities and should be counted as part of the resource pool, with workload planning that accounts for what both human and digital workers can support.


The mainstage sessions promoted interaction and drew valuable insights from the conference’s participants. The business executives were prepared to present their ideas and then discuss them in a way that was meaningful to both technologists and business leaders. They shared their concepts and successes, but it was also clear they wanted to be challenged.

The conference wasn’t all great, but it certainly was intriguing, and it bridged the technology and business divide. One key learning was what a good voice-based digital experience requires. Many of the presenters covered this in technical terms, but Intuit’s Wolf Paulus did a great job of explaining each of the foundational concepts through incredibly funny and nerdy examples while interacting with an Echo Dot jerry-rigged to his car and his house.

I cannot go through each of his principles (read this if you want the full presentation), but a couple of them showed why artificial intelligence is critical. One was the concept of shared control of a conversation. For humans to work effectively using voice, the discourse must not simply be a human request and a digital speech response. The digital solution and the human must be able to interact as near equals, allowing the machine to take control to help the human get what he needs. An example Paulus used was a human checking his bank balance. The machine asked, “Which account?” and then gave the answer the human requested. Knowing that the human has multiple accounts is about context, not sharing. The sharing came next, when the digital assistant asked a follow-up question citing history, something to the tune of, “The last time you asked about this account, you made reservations for a vacation. Are you about to travel, and do you need help with that?”
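To make the principle concrete, here is a minimal Python sketch of that shared-control exchange. It is only an illustration of the pattern, not anything Paulus presented; the account data, history entries, and function names are all hypothetical.

```python
# Minimal sketch of a shared-control (mixed-initiative) dialogue.
# All account data, history entries, and names are hypothetical.

ACCOUNTS = {"checking": 1850.00, "savings": 12400.00}

# Prior interactions the assistant can cite to take initiative.
HISTORY = {"savings": "made reservations for a vacation"}

def balance_dialogue(account=None):
    turns = []
    # Context: the assistant knows there are multiple accounts,
    # so it takes control of the conversation to disambiguate.
    if account is None:
        turns.append("Assistant: Which account, checking or savings?")
        account = "savings"  # stand-in for the user's spoken reply
        turns.append(f"User: {account.capitalize()}, please.")
    turns.append(f"Assistant: Your {account} balance is ${ACCOUNTS[account]:,.2f}.")
    # Shared control: rather than waiting passively, the assistant
    # uses history to propose a useful next step.
    if account in HISTORY:
        turns.append(
            f"Assistant: The last time you asked about this account, you "
            f"{HISTORY[account]}. Are you about to travel, and do you need "
            f"help with that?"
        )
    return turns

for turn in balance_dialogue():
    print(turn)
```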

Moving things towards natural language usage is critical for us to use speech more and fingers just a little less.

The other design lesson from Paulus was to make sure the machine can offer random responses to a query. In most computing requests a user asks a question like, “How much money do I have in the bank?” A typical digital response is, “In this bank account you have a balance of $x.” Random responses are more natural in that they don’t always follow the same construct when delivering the same information. At times the answer could be, “$x.” Another time it could be, “Currently, it’s $x.” The point is that if the responses always have the same construct, the human perceives the digital assistant as limited. Moving things towards natural language usage is critical for us to use speech more and fingers just a little less.
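That kind of variation is simple to implement. Here is a minimal Python sketch, assuming a hand-written set of templates; the phrasing and names are my own, not from Paulus’s talk.

```python
import random

# Hypothetical response templates for a balance query; each delivers
# the same information with a different construct.
BALANCE_TEMPLATES = [
    "In this bank account you have a balance of ${amount:,.2f}.",
    "${amount:,.2f}.",
    "Currently, it's ${amount:,.2f}.",
    "You have ${amount:,.2f} in that account.",
]

def balance_response(amount):
    # Picking a template at random keeps repeated answers from
    # sounding mechanical, even though the content is identical.
    return random.choice(BALANCE_TEMPLATES).format(amount=amount)

# Three calls, three (likely) different phrasings of the same fact.
for _ in range(3):
    print(balance_response(1850.00))
```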

With respect to trends, the conference did some thoughtful work in having participants consider speech technology as more than just chatbots or digital assistants. The speakers brought in the worlds of gaming and robotics. It’s not that they expected the business people in attendance to build games; rather, they used games to explain how speech could be used for more than obtaining information. Speech can also command actions and, at the highest level, allow game characters to assist the human in planning and strategizing, kind of like a teammate. The robotics presenters showed some of the same, indicating that robots could be companions as well as task performers.

In both the gaming and robotics cases there were certainly nerdy discussions of algorithms, artificial intelligence, and NLG (Natural Language Generation). However, for game characters and robots to work effectively with humans, they need to be able to show emotions (e.g., embarrassment when the AI makes a wrong guess), pick up on non-verbal cues, and have personalities so humans understand their limitations.

The technology is showing great promise and utility, but the spotlight is on its role in digital assistants.

The conference did reveal that there is a bit of an existential crisis going on in the world of speech. The speech veterans spoke of decades of dedication to the field. The technology landscape evolved slowly, with mainstream implementations mainly taking the form of interactive voice response (IVR) units and other menu-like solutions. Truly natural and sophisticated voice interactions have generally failed to meet the high expectations set by sci-fi movies. Now the technology is showing great promise and utility: the spotlight is on its role in digital assistants (e.g., Siri, Echo/Alexa, Google), and hotshot analyst firms such as Gartner and Forrester are espousing speech as a key element of digital transformation. Although the voice veterans were excited to gain the attention, they were being realistic engineers and expressed concern that they might not deliver what the world now expects.

It’s just like engineers to assume failure. They live with it every day while delivering life-altering tech to all of us.

My two spring 2017 conferences couldn’t have been more different. One promised a vision of transformation but provided little context and content. Not all bad, but hardly the equal of a tech-oriented conference that used examples and case studies to help us understand some of the toughest and most abstract concepts.