Conversational AI (artificial intelligence) dialogue systems such as Amazon Alexa and Google Assistant are reshaping computing, user experience, and lifestyles for generations to come. According to Google, it sold one smart home device every second during the 2017 holiday season. However, these systems face a unique challenge: because they are expected to answer user queries consistently and effectively, they need comprehensive vocabulary knowledge in every area a question might touch. A knowledge base of that size, containing a response to almost every possible question, cannot realistically be loaded into a system in advance. This presents the problem of creating, maintaining, and continuously expanding a dynamic knowledge base, one that lets a system handle a term outside its vocabulary and then learn it for future reference. Furthermore, growing this dynamic knowledge base manually is both expensive and tedious, and therefore impractical. A team of researchers at Osaka University in Japan believes it has found a solution to this problem in a process called implicit confirmation.
Typically, when a conversational AI system encounters an unknown term, it tries to understand the term by asking the user a series of simple, abrupt questions, which often causes the user to lose interest and quickly disengage. The team at Osaka University, which includes Kohei Ono, Ryu Takeda, Eric Nichols, Mikio Nakano, and Kazunori Komatani, instead proposes using natural dialogue to help the computer determine the category of an unknown word encountered during a conversation with a human. The main idea is to continue the dialogue through implicit confirmation, instead of directly asking the meaning of the new word. The researchers chose implicit confirmation because explicit confirmation requests can be frustrating and often fail to keep the user engaged, which in turn means the system fails to assist the user and defeats the purpose of a voice-activated device. For example, if the user says “the Mutton Biryani was good”, and the computer responds “Is Mutton Biryani Italian?”, the incorrect assumption degrades the user experience: the user may feel obligated to make corrections, or may simply abandon a conversation that has already drifted away from its main idea. Instead, the computer can implicitly confirm or reject its prediction of the word's category by saying something like, “There are other Italian restaurants opening nearby where you can get the same dish.” Depending on whether the user responds “great!” or “Mutton Biryani is not an Italian dish”, the computer can then arrive at an answer such as, “Some other restaurants that serve Mutton Biryani near you are Taj and Bhukara Grill”. A mechanism of continuous dialogue that involves implicit confirmation will eventually file Mutton Biryani under Indian cuisine in the device’s knowledge base.
Though implicit confirmation provides a very user-friendly approach to lexical acquisition, it creates new challenges for the developers of voice-activated systems. The first major problem is the actual determination of the category for a new word. In a natural dialogue, users can respond with varied and indirect expressions instead of simple affirmative or negative responses, which can make it hard for the machine to interpret their answer. For example, the user might say, “I baked Pandoro yesterday”, and the computer might respond, “Wow, sounds tasty. I really like Japanese food”. If the user then responds with something along the lines of “Yes, me too”, the computer would incorrectly categorize Pandoro as Japanese food when it is, in fact, Italian. The second major problem, though similar in nature, can be more damaging. The user can sometimes explicitly provide a completely incorrect answer, which would cause the computer to add incorrect knowledge to its knowledge base. In this case, Pandoro could end up being listed as a vegetable!
The researchers have considered these problems and have proposed viable solutions. To address the first obstacle, the difficulty of category determination, they propose designing and implementing a feature set for machine-learning-based classification of responses. A feature set is a group of traits that a machine learning model examines in order to reach a conclusion. Within the feature set used by the researchers, the user’s first statement is denoted U1, the system’s response to it is S1, and the user’s reply to S1 is U2. The feature set consists of true-or-false tests such as: U2 includes an expression affirmative to S1, U1 includes the category name used in S1, U2 includes a word preventing topic change in S1, and U1 includes any interrogative, amongst other such cases. The model chosen was logistic regression, a machine learning model whose dependent variable is categorical. Here the output takes only two values, 0 and 1, a binary “false” or “true” indicating whether the user’s reply confirms the category the system proposed for the unknown word. The researchers trained this model on phrases that were known to correspond to either a confirmation or a denial. Once the model had seen enough known examples to learn a mathematical relationship between the binary feature values and what the users meant, it could be used to maintain a natural flow of conversation and adapt to odd responses from the user. Regarding the second problem, where the user may provide an incorrect category, the team proposes linking all instances of the chatbot or voice interface to a server and using a statistical model to calculate the probability that a word w belongs to a category c, based on all the interactions the system has had regarding that word in the past.
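The response classifier described in the first half of the paragraph above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the cue-word lists, the toy training dialogues, and the plain gradient-descent training loop are invented, and only three of the features the article names are implemented. It is not the researchers' implementation.

```python
import math
import re

# Hypothetical cue-word lists; the article does not give the real lexicons.
AFFIRMATIVE = {"yes", "yeah", "great", "right", "sure"}
INTERROGATIVES = {"what", "where", "who", "which", "how", "why", "when"}

def tokens(text):
    """Lowercase a sentence and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))

def extract_features(u1, s1_category, u2):
    """Binary features over U1 (user), the category used in S1, and U2 (user)."""
    u1_tokens, u2_tokens = tokens(u1), tokens(u2)
    return [
        1.0 if u2_tokens & AFFIRMATIVE else 0.0,         # U2 is affirmative to S1
        1.0 if s1_category.lower() in u1_tokens else 0.0,  # U1 names S1's category
        1.0 if u1_tokens & INTERROGATIVES else 0.0,      # U1 includes an interrogative
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, lr=0.5, epochs=500):
    """Fit logistic regression weights with simple stochastic gradient descent."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict_proba(model, u1, s1_category, u2):
    """Probability that U2 confirms the category proposed in S1."""
    weights, bias = model
    x = extract_features(u1, s1_category, u2)
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

# Invented toy dialogues: (U1, category used in S1, U2, label), 1 = confirmation.
EXAMPLES = [
    ("The Mutton Biryani was good", "Italian", "Great!", 1),
    ("The Mutton Biryani was good", "Italian", "Mutton Biryani is not an Italian dish", 0),
    ("I had paella yesterday", "Spanish", "Yes, I love that place", 1),
    ("I had paella yesterday", "Japanese", "No, paella is Spanish", 0),
    ("I love Spanish paella", "Spanish", "Yeah, sure", 1),
    ("Where can I get pho?", "Vietnamese", "Hmm, it's something else", 0),
]

MODEL = train(
    [extract_features(u1, cat, u2) for u1, cat, u2, _ in EXAMPLES],
    [label for _, _, _, label in EXAMPLES],
)

p = predict_proba(MODEL, "I baked Pandoro yesterday", "Italian", "Yes, it was great")
print(f"P(confirmation) = {p:.2f}")
```

A classifier this small would still be fooled by the “Yes, me too” reply from the Pandoro example, since that reply looks affirmative on the surface; a richer feature set is presumably what lets a real system tell such cases apart.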
Under this server-side scheme, if the confidence Conf(w, c), the estimated probability that word w belongs to category c, is greater than 0.5, meaning the system is more likely right than wrong, then the system treats its category prediction as correct. Thus, even if one user gives an outlying incorrect answer to an implicit confirmation request, the answers gathered from other interactions allow the system to categorize the word accurately and respond appropriately.
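The thresholding step can be sketched as follows. The article does not give the formula behind Conf(w, c), so this sketch assumes the simplest possible stand-in: the average, over all recorded dialogues, of the probability that the user confirmed the category. The class name `CategoryConfidence` and its methods are hypothetical.

```python
from collections import defaultdict

class CategoryConfidence:
    """Pools evidence about (word, category) pairs across many dialogues."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self._scores = defaultdict(list)  # (word, category) -> confirmation scores

    def record(self, word, category, p_confirm):
        """Store one dialogue's estimate that the user confirmed the category."""
        self._scores[(word.lower(), category.lower())].append(p_confirm)

    def conf(self, word, category):
        """Assumed Conf(w, c): mean confirmation score, or 0.0 with no evidence."""
        scores = self._scores.get((word.lower(), category.lower()), [])
        return sum(scores) / len(scores) if scores else 0.0

    def accepted(self, word, category):
        """Accept the category only when Conf(w, c) is strictly above the threshold."""
        return self.conf(word, category) > self.threshold

store = CategoryConfidence()

# Three users go along with "Italian"; one outlier appears to deny it.
for score in (0.9, 0.8, 0.85, 0.1):
    store.record("Pandoro", "Italian", score)

# One mistaken user accepts "vegetable"; two others reject it.
for score in (0.9, 0.1, 0.05):
    store.record("Pandoro", "vegetable", score)

print(store.accepted("Pandoro", "Italian"))    # the single outlier is outvoted
print(store.accepted("Pandoro", "vegetable"))  # the single mistake is outvoted
```

With only one recorded dialogue, a plain average like this would accept whatever that single user said, so a practical system would likely also require a minimum number of observations before trusting the result.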
This research, which significantly advances the voice-activated user experience, has broader implications for computer science as well as for people's everyday lives. For years, computer scientists have been striving to develop True AI, or Artificial General Intelligence, in which a machine can emulate and perform just as well as a human in all domains of human life. Implicit confirmation can allow a machine to learn new things while maintaining natural conversation, and can thus allow computers to converse in the same way humans do, which is a gigantic leap towards True AI. For everyday people, implicit confirmation can have an equally large impact: they can converse with smart home devices and the technology around them just as naturally as they would with another human being, without being interrupted or running into deadlock situations where the machine does not understand what the user said. We may never have to hear a “Sorry, I don’t know how to answer that” again.