Digital assistants like the Google Home and the Amazon Echo are growing in popularity every day. Not to be left behind, both Microsoft and Apple have entered the market as well, with the Invoke and HomePod, respectively. These assistants are increasingly capable, some incorporating visual elements with their own screens even as others are finding their way into other devices, including TVs and smartphones. So prolific are digital assistants that their installed base is set to exceed 7.5 billion active devices by 2021 (Ovum).
With all of this in mind, we developed a simple skill, or app, for one of these digital assistants. Our skill was designed for demonstration purposes, so we picked a fun topic: where to go for lunch.
We decided to go with Amazon’s Alexa. Not only is it currently the most popular digital assistant, but with the recent additions of the Echo Show and Echo Spot, it can also display an interactive interface in response to your statements.
Our Lunch Bot skill wasn’t deployed publicly, so you can’t use our version on your own Alexa. However, the source code and technical details are public on GitHub. In this post, we’ll share four main design challenges and the lessons we learned, so you can keep them in mind as you brainstorm digital assistant skills of your own.
Understanding Intent & NLP
User commands are created via Intents, which are essentially callouts to an API. Each device has its own proprietary NLP (Natural Language Processing) system. We used Amazon’s Alexa Skills Kit to create Lunch Bot, and you’ll find that many NLP systems out there use similar management interfaces.
Lunch Bot’s primary intent is called “getidea” and as you would expect, it gets a lunch idea. To tell Alexa how a user might ask for an idea, we first gave Alexa as many examples as we possibly could. Lunch Bot has 15 possible questions for this intent.
- “where should I go”
- “can you give me an idea”
The beauty of NLP is that, using these examples, Amazon will make a best guess when a user speaks, so a user only has to get close to one of our examples. In defining 15 ways to trigger our intent, we get 50 or more for free. This also means that Alexa gets it wrong sometimes, triggering our intent for phrases we didn’t want. What’s more, creating the same definition in Google’s NLP system may result in a different set of “free” examples, so we will get subtly different behavior on different devices.
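As a rough illustration of how sample utterances attach to an intent, here is a minimal sketch that builds a dictionary shaped like an Alexa interaction model. The first two samples come from this post; the third sample and the helper name `build_interaction_model` are assumptions made for the example, not Lunch Bot's actual definition.

```python
import json

# The first two utterances are quoted in the post; the third is a made-up example.
GET_IDEA_SAMPLES = [
    "where should I go",
    "can you give me an idea",
    "what should I have for lunch",
]


def build_interaction_model(invocation_name, intent_name, samples):
    """Return a dict shaped like an Alexa interaction model with one custom intent."""
    return {
        "interactionModel": {
            "languageModel": {
                "invocationName": invocation_name,
                "intents": [
                    # No slots here; the intent is triggered purely by its samples.
                    {"name": intent_name, "slots": [], "samples": list(samples)},
                ],
            }
        }
    }


model = build_interaction_model("lunch bot", "getidea", GET_IDEA_SAMPLES)
print(json.dumps(model, indent=2))
```

Every additional sample gives the NLP system another anchor to generalize from, which is why listing as many phrasings as possible pays off.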
After an intent is triggered, all of the slots (the form data that was collected) are sent to our API on our own servers. At this point, we get to process the data, look up additional information, and decide how we want Lunch Bot to respond.
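The request-and-response cycle described above might look roughly like the sketch below. It follows the general shape of Alexa's JSON request and response envelopes, but the `cuisine` slot and the `pick_idea` helper are hypothetical and are not taken from Lunch Bot's source.

```python
def extract_slots(request):
    """Flatten Alexa's slot objects into a plain {name: value} dict."""
    slots = request.get("intent", {}).get("slots") or {}
    return {name: slot.get("value") for name, slot in slots.items()}


def pick_idea(slots):
    """Hypothetical lookup; a real skill would query a database or service."""
    if slots.get("cuisine") == "sandwiches":  # "cuisine" is an assumed slot name
        return "Jimmy Johns"
    return "Reids' Fine Foods"


def speak(text, end_session=False):
    """Wrap plain text in an Alexa-style response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_request(event):
    """Route an incoming request to the matching piece of skill logic."""
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        return speak("Hi, I can give you some lunch ideas!")
    if request.get("type") == "IntentRequest":
        if request["intent"]["name"] == "getidea":
            idea = pick_idea(extract_slots(request))
            return speak(f"How does {idea} sound?")
    return speak("Sorry, I didn't catch that.")
```

Keeping slot extraction and response formatting in small helpers leaves the decision logic (`pick_idea` here) free of platform details.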
Alexa’s skills can be conversational, but only to a limited extent. You won’t be able to have a realistic conversation, but you can arrange your intents in a way that makes interactions with the skill feel natural. For example, this is the exchange a co-worker had today while playing with our Lunch Bot skill:
User: Let’s talk to Lunch Bot.
Alexa: Hi, I can give you some lunch ideas!
User: Ok, let’s hear one then.
Alexa: How does Reids’ Fine Foods sound?
User: I’m not sure about that today.
Alexa: Ok, I’ll suggest that less often. What about Jimmy Johns?
User: That’ll work for me!
Alexa: Ok that idea will come up more often!
This conversation is possible on all of the Amazon devices, including the “Amazon Alexa” iOS and Android applications. On devices with a screen, like the Echo Show and Echo Spot, you will even see some food-related stock photos we chose. This write-once-run-everywhere ability is not without its costs, however. You’ll want to pay special attention to the following areas of your skill design.
1. Sandboxed by NLP Technology
Digital assistants are restricted to using proprietary NLP technology. When parsing the users’ speech and text, Amazon devices use the Alexa Skills Kit while Google uses Dialogflow. That means you are limited to what their environments support. Using third-party NLP is difficult, and Amazon and Google explicitly forbid it in most cases. While your website’s chatbot may be amazing, plugging Alexa into the existing system will be difficult.
That doesn’t mean you can’t share much of the same code and backend logic. Ultimately, all chatbots call out to a web server to actually perform actions.
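One way to share backend logic is to keep the decision-making in a platform-neutral function and wrap it in thin adapters. The sketch below assumes Alexa's JSON envelope and a Dialogflow-style webhook body (`queryResult`, `fulfillmentText`); the function names are hypothetical and this is not how Lunch Bot itself is structured.

```python
def lunch_core(intent, params):
    """Platform-neutral logic: takes an intent name and parameters, returns reply text."""
    if intent == "getidea":
        return "How does Jimmy Johns sound?"
    return "Sorry, I didn't catch that."


def alexa_adapter(event):
    """Translate an Alexa-style request into a core call and back."""
    intent = event["request"].get("intent", {})
    params = {k: v.get("value") for k, v in (intent.get("slots") or {}).items()}
    text = lunch_core(intent.get("name", ""), params)
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": False,
        },
    }


def dialogflow_adapter(body):
    """Translate a Dialogflow-style webhook body into a core call and back."""
    result = body["queryResult"]
    text = lunch_core(result["intent"]["displayName"], result.get("parameters", {}))
    return {"fulfillmentText": text}
```

Only the adapters know about each vendor's payload format, so adding another assistant means writing one more small wrapper rather than duplicating the skill.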
2. Persistent Subject
“Persisting a subject” refers to the subject of a conversation being stated in the beginning, then implied for the remainder of the conversation. In the case of Lunch Bot, this is the current lunch idea we are talking about. You can ask for an idea, then follow up with additional commands without having to explicitly restate the idea.
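A persisted subject can be approximated with session attributes, which Alexa echoes back to the skill on each turn of a session. The sketch below is an assumption about how that could work; the `badidea` intent name and the `last_idea` key are hypothetical.

```python
def respond(text, attributes):
    """Build an Alexa-style response that carries session attributes forward."""
    return {
        "version": "1.0",
        "sessionAttributes": attributes,
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": False,
        },
    }


def handle_intent(event):
    """Remember the last idea so follow-up intents can refer to it implicitly."""
    attributes = dict(event.get("session", {}).get("attributes") or {})
    intent = event["request"]["intent"]["name"]
    if intent == "getidea":
        attributes["last_idea"] = "Jimmy Johns"  # fixed value to keep the sketch short
        return respond(f"How does {attributes['last_idea']} sound?", attributes)
    if intent == "badidea":  # hypothetical intent name
        last = attributes.get("last_idea")
        if last is None:
            return respond("I haven't suggested anything yet.", attributes)
        return respond(f"Ok, I'll suggest {last} less often.", attributes)
    return respond("Sorry, I didn't catch that.", attributes)
```

The user never has to repeat the restaurant name: the follow-up intent reads it from the attributes returned with the previous turn.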
Alexa and Google Home, like many NLP systems, can automatically ask follow-up questions to fill in missing data points. For example, if you ask “What is the average airspeed velocity of a laden swallow?” they will automatically ask the follow-up question, “What do you mean, an African or European swallow?”
After all of the required slots are filled, your intent function will be called. Asking follow-up questions is persisting context in a limited manner.
3. Deep Conversational A.I.
Amazon Alexa doesn’t provide many ways to have a stateful conversation flow. If you want some intents to become available only during parts of your conversations, you’ll have to manually build your own solution. As far as Alexa is concerned, all of the intents your skill supports are always available for the user to execute at any time.
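A hand-rolled solution can be as small as a table of which intents are allowed in which conversation state. The sketch below is one possible approach, not Lunch Bot's implementation; the state names and the `goodidea` and `badidea` intent names are hypothetical.

```python
# Which intents make sense in each conversation state (all names are assumed).
ALLOWED_INTENTS = {
    "start": {"getidea"},
    "idea_offered": {"getidea", "goodidea", "badidea"},
}


def dispatch(state, intent):
    """Return (new_state, reply), refusing intents that are not allowed in this state."""
    if intent not in ALLOWED_INTENTS.get(state, set()):
        return state, "Ask me for a lunch idea first."
    if intent == "getidea":
        return "idea_offered", "How does Jimmy Johns sound?"
    if intent == "goodidea":
        return "start", "Ok that idea will come up more often!"
    # Remaining case is "badidea": stay in the same state and offer something else.
    return "idea_offered", "Ok, I'll suggest that less often. What about another idea?"
```

In a real skill the current state would need to travel between turns, for example in the session attributes, because Alexa itself will still route any intent at any time.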
When you cannot predict the order in which commands will run, it’s very hard to build a coherent conversation. This results in many skills being small, or having all of their functionality expressed as robotic commands such as “MySkill do thing one” and “MySkill do thing two”. Lunch Bot has a total of 5 commands. Each focuses on a single subject: the last idea. Even with these simple requirements, a realistic conversation is difficult.
4. Varied Responses
The responses Lunch Bot will give in most situations are varied, which adds tremendously to the feeling of natural conversation. The sentences Lunch Bot speaks are less predictable but still have the same clear meaning. Like a human, its responses have different levels of emotion and reflect the personality of the individual speaking.
For a skill with so few commands, varied responses make it much more pleasant to ask for an idea three or four times until you get a good one. Alexa and Google Home support SSML (Speech Synthesis Markup Language), which allows you to specify both what to say and how to say it.
Lunch Bot is a bit flippant.
That was a bad idea
“Well all the other bots like that spot.”
“That’s your opinion.”
That was a good idea
“I’m a Lunch Bot, good ideas are kind of my thing.”
“Glad I could help.”
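Varying the reply can be as simple as picking at random from a list and wrapping the result in SSML. The sketch below reuses the responses quoted above; the intent names, the `varied_ssml` helper, and the short pause are assumptions added for illustration.

```python
import random
from xml.sax.saxutils import escape

# Responses quoted in the post, keyed by hypothetical intent names.
RESPONSES = {
    "badidea": [
        "Well all the other bots like that spot.",
        "That's your opinion.",
    ],
    "goodidea": [
        "I'm a Lunch Bot, good ideas are kind of my thing.",
        "Glad I could help.",
    ],
}


def varied_ssml(intent, rng=random):
    """Pick one response at random and return it as an SSML outputSpeech object."""
    text = rng.choice(RESPONSES[intent])
    # escape() guards against &, < and > breaking the markup;
    # the break tag adds a short pause before the reply.
    ssml = f'<speak><break time="300ms"/>{escape(text)}</speak>'
    return {"type": "SSML", "ssml": ssml}
```

Passing a seeded `random.Random` instance as `rng` makes the choice repeatable, which helps when testing a skill whose replies are otherwise unpredictable.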
Business Applications for Digital Assistant Skills
Our Lunch Bot demonstration did a great job of highlighting the considerations companies need to keep in mind while designing digital assistant apps. Imagine the enhanced customer experience you could deliver with an app designed to deliver product or service recommendations, just as Lunch Bot was trained to deliver restaurant recommendations.
As you plot out your desired conversation flow, keep the above limitations in mind. It’s impossible to predict each and every variation, but a well-designed AI-powered digital assistant will give your customers a friendly, natural way to interact with your business.
Many companies are already leveraging NLP, AI, and intents in multiple channels, so make sure you are ready to manage these challenges. Working with an experienced partner will help improve the functionality of your app, while ensuring you get the best return on your technology investment. See how Skookum can help.