Live chatbots are certainly not new; in fact, I recall interacting with one in the mid-2000s. So why am I talking about them now? Well, according to a survey by Oracle, approximately 80% of companies want to be using chatbots in their business by 2020. This raises the question: is your business considering chatbots, and if so, do you have the expertise to test them? I have recently been developing a chatbot for our company website to assist the recruitment team, and I have found that testing chatbots is quite different from testing traditional software. There are several reasons for this:
The first is non-linear input. What do I mean by that? No two humans articulate themselves in the same way, so it is impossible to cover every possible user input; from a testing perspective, 100% coverage is simply unachievable.
Another reason is non-deterministic behaviour. Many chatbots are built on top of cloud-based learning services that continue to adapt and learn from repeated interactions in order to improve. This means that repeating the same test cases can distort and skew the cloud service’s model of “real-life interactions”. In traditional software testing, each assertion has an expected value. This is not the case for chatbots, as the “expected behaviour” constantly changes.
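One way to cope with replies that drift over time is to assert on stable properties of a response rather than its exact text. A sketch of that approach, where `get_bot_reply` is a hypothetical stub for a call to the live bot:

```python
def get_bot_reply(utterance: str) -> str:
    # Stand-in for a call to the live bot; real replies vary over time.
    return "Hi Sam! Thanks for getting in touch - how can I help today?"

reply = get_bot_reply("Hello, my name is Sam")

# Property-style checks that survive rewording of the reply:
checks = {
    "greets_by_name": "sam" in reply.lower(),
    "is_nonempty": len(reply.strip()) > 0,
    "reasonable_length": len(reply) < 500,
}
```

These checks still pass if the underlying service rewords its greeting, which exact string equality would not.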
A final reason is that, due to the nature of human interaction, chatbots must be able to handle unexpected inputs.
So how exactly do we test a chatbot? You may be asking yourself, “Where do I start?” I recently stumbled upon a Chrome extension for testing chatbots called Alma. Alma is perfect for beginners: it guides the end user through a series of short chatbot tests and helps identify common design or functional issues, separated into 7 categories:
Research has shown that users are more likely to continue using a chatbot if it has a personality. Humans are relational beings; we crave genuine interaction, so by appearing more “human-like” a chatbot is more likely to be used.
So, how do we test a personality? Well, it isn’t an exact science and is somewhat subjective, but here are some key considerations:
There is nothing more frustrating than being on the phone to customer services and not being given the information that you require. It is important that a chatbot provides the right level of information to the end user, not too much and not too little. The quality of the information provided by the chatbot must be considered.
Some further questions to consider:
Users have the freedom to say whatever they want to a chatbot and, depending on the platform, may even be able to send images, videos, emojis or voice notes. Is the chatbot capable of understanding these things?
Some other questions worth asking: does the bot understand…
Sooner or later, the end user is inevitably going to utter something that the chatbot doesn’t understand. I’m sure we have all said something to a chatbot and received the response “Sorry, I don’t understand”. To an end user this is extremely unhelpful! The more refined chatbots will attempt to clarify the misunderstanding, remind the user of their scope (perhaps by giving the user a list of options) and, if possible, redirect them to a human.
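That refined fallback behaviour is something we can test for directly: given an unrecognised utterance, does the reply restate the bot’s scope and offer a human handoff? A minimal sketch, with `handle_unrecognised` as a hypothetical fallback handler:

```python
def handle_unrecognised(utterance: str) -> str:
    """Hypothetical fallback: clarify, restate scope, offer a human."""
    return (
        "Sorry, I didn't quite catch that. I can help with:\n"
        "1. Booking a table\n"
        "2. Cancelling a booking\n"
        "Or type 'agent' to speak to a human."
    )

reply = handle_unrecognised("flibble wobble")

# A helpful fallback should list the bot's scope and offer an escape hatch.
restates_scope = "Booking" in reply and "Cancelling" in reply
offers_human = "agent" in reply or "human" in reply
```

Asserting on these two properties, rather than the exact wording, lets the copy change without breaking the test.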
Chatbots often integrate with several other services; you may have a chatbot that calls a RESTful web service to retrieve the weather, for example. What happens if this web service is unavailable? How does the chatbot handle this?
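A dependency outage is easy to simulate in a test. Below is a sketch of the degraded path, assuming a hypothetical `fetch_weather` integration; here the outage is hard-coded so the test exercises the fallback:

```python
class ServiceUnavailable(Exception):
    """Raised when an upstream dependency cannot be reached."""

def fetch_weather(city: str) -> str:
    # Stand-in for a real HTTP call; we simulate an outage here.
    raise ServiceUnavailable("503 from upstream")

def weather_reply(city: str) -> str:
    try:
        return f"It's {fetch_weather(city)} in {city}."
    except ServiceUnavailable:
        # Degrade gracefully instead of crashing or going silent.
        return ("I can't reach the weather service right now - "
                "please try again in a few minutes.")

reply = weather_reply("London")
```

In a real suite you would stub the integration both ways, checking the happy path when the service responds and a graceful, informative reply when it does not.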
Assessing intelligence can be quite subjective, but I believe there are two main areas that we can look at to evaluate the intelligence of the chatbot under test. They are:
Context: Context informs a chatbot on how best to respond to an utterance. Does your chatbot understand later inputs in light of earlier ones? Does it respond differently depending on your geographical location, past interactions or other factors such as the time of day or season?
Knowledge & Memory: Our human interactions are based on the assumption that people will remember us and any previous conversations we have had. A chatbot needs to mimic this. Does it remember things like your name, age and personal preferences?
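The memory half of this is straightforward to test: tell the bot a fact early in a session and check it is recalled later. A minimal sketch, assuming a simple per-user key-value store (a hypothetical `SessionMemory`, not any particular platform’s API):

```python
class SessionMemory:
    """Toy per-session fact store standing in for a bot's real memory."""

    def __init__(self):
        self._facts = {}

    def remember(self, key: str, value: str) -> None:
        self._facts[key] = value

    def recall(self, key: str):
        return self._facts.get(key)  # None if the fact was never stored

memory = SessionMemory()

# Early in the conversation the user shares some facts...
memory.remember("name", "Sam")
memory.remember("preference", "window seat")

# ...and later the bot should be able to recall them.
greeting = f"Welcome back, {memory.recall('name')}!"
```

A fuller test would also cover forgetting: does the memory persist across sessions when it should, and is it cleared when the user asks for it to be?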
Chatbots are built with a number of dialog flows. A chatbot for a restaurant for example, might have one flow for booking a table and another for cancelling a booking. How easy is it to navigate through each flow? Is it possible to change to another one mid-flow? Are the different flows made apparent to a new user?
Onboarding is the process of integrating someone into something new, for example a new customer or client. In this context it is the process of a new user familiarising themselves with a chatbot that they haven’t used before. According to a Silicon Valley analyst, “The average app loses 77 percent of its users in the first three days after they install it. After a month, 90 percent of users eventually stop using the app, and by the 90-day mark, only 5 percent of users continue using a given app”. User expectations nowadays are very high, so if an application isn’t up to scratch a user will simply stop using it. It is therefore imperative that the onboarding process is well designed and properly tested.
Chatbottest, the creators of Alma, have also published a list of 120 questions, available on GitHub and separated into the 7 categories mentioned above. These act as a great beginner’s guide for anyone new to chatbot testing. It can be found at: https://github.com/chatbottest-com/guide/wiki
If you would like to learn more about testing chatbots - get in touch!
Felix Walne - Senior Test Engineer