
Testing Chatbots

Live chatbots are certainly not new; in fact, I recall interacting with one in the mid-2000s. So why am I talking about them now? Well, according to a survey by Oracle, approximately 80% of companies want to be using chatbots in their business by 2020. This raises the question: is your business considering the use of chatbots and, if so, do you have the expertise to test them? I have recently been developing a chatbot for our company website to assist the recruitment team, and have found that testing chatbots is quite different from testing traditional software. There are several reasons for this:

The first is non-linear input. What do I mean by that? Well, no two humans will articulate themselves in the same way, so it is impossible to cover every possible user input. From a testing perspective, achieving 100% coverage is simply not feasible.

Another reason is non-deterministic behaviour. Many chatbots are built on top of learning cloud services, which continue to adapt and learn from repeated interactions in order to improve their service. This means that repeating the same test cases can distort and skew the cloud service’s assumptions of “real-life interactions”. In traditional software testing, each assertion has an expected value. This is not the case for chatbots, as the “expected behaviour” will constantly change.
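
One way to cope with changing responses is to assert on a property of the reply rather than on one exact string. The following is only a minimal Python sketch of that idea: `toy_bot` is a hypothetical stand-in for the chatbot under test, and the phrases being checked are assumptions chosen for illustration.

```python
import random

def toy_bot(utterance):
    """Hypothetical stand-in for the chatbot under test; replies vary between calls."""
    return random.choice([
        "Hello! How can I help?",
        "Hi there, what can I do for you?",
    ])

def is_acceptable(reply, required_phrases):
    """Pass if the reply contains at least one of the phrases, ignoring case."""
    lowered = reply.lower()
    return any(phrase in lowered for phrase in required_phrases)

# Send the same input repeatedly; every reply must satisfy the property.
replies = [toy_bot("hello") for _ in range(20)]
all_acceptable = all(is_acceptable(r, ["help", "do for you"]) for r in replies)
```

The check passes whichever greeting the bot picks, so the test does not break each time the wording changes.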

A third reason is that, due to the nature of human interaction, chatbots must be able to handle unexpected inputs.

So how exactly do we test a chatbot? You may be asking yourself “Where do I start?” I recently stumbled upon a Chrome extension for testing chatbots called Alma. Alma is perfect for beginners: it guides the end user through a series of short chatbot tests and assists in identifying common design or functional issues. These are separated into the 7 categories covered below.


Personality

Research has shown that users are more likely to continue using a chatbot if it has a personality. Humans are relational beings and we crave true interaction, so by appearing more “human-like” a chatbot is more likely to be used.

So, how do we test a personality? Well, it’s not an exact science and it is somewhat subjective, but here are some key considerations:

  • Personality strength: Does the chatbot give itself a name? Does it have a profile picture of itself? Is there a consistent tone or perceived personality throughout the conversation?
  • Personality suitability: Is the personality appropriate for its purpose? Are the chatbot’s answers appropriate to a typical end user?
  • Personality adaptability: Does the personality adapt to a given situation? For example, does it show sympathy and compassion to an angry or agitated user, or adjust its tone to celebrate a user’s happiness or satisfaction?


Answering

There is nothing more frustrating than being on the phone to customer services and not being given the information that you require. It is important that a chatbot provides the right level of information to the end user: not too much and not too little. The quality of the information provided by the chatbot must also be considered.

Some further questions to consider:

  • Are the chatbot answers clear or ambiguous?
  • Are answers presented in a clear format?
  • Are responses spelt properly with the correct grammar?
  • Does the chatbot have a range of responses to the same user input?
  • Is the response time under 5 seconds? If the chatbot needs to process or gather some information, does it both inform the user and politely ask them to wait?
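
Two of the questions above, response time and variety of responses, lend themselves to a simple automated check. The sketch below is illustrative only: `toy_bot` and its three phrasings are hypothetical, and a real test would call the chatbot through whatever interface it exposes.

```python
import itertools
import time

# Hypothetical bot that rotates through several phrasings of the same answer.
_phrasings = itertools.cycle([
    "We open at 9am.",
    "Our doors open at 9 in the morning.",
    "Opening time is 9am.",
])

def toy_bot(utterance):
    return next(_phrasings)

def timed_reply(bot, utterance):
    """Return the bot's reply and how long it took, in seconds."""
    start = time.perf_counter()
    reply = bot(utterance)
    return reply, time.perf_counter() - start

# Ask the same question several times, recording each reply and its duration.
results = [timed_reply(toy_bot, "When do you open?") for _ in range(6)]
slowest = max(elapsed for _, elapsed in results)
distinct_replies = {reply for reply, _ in results}
```

A tester could then assert that `slowest` is under 5 seconds and that `distinct_replies` contains more than one phrasing.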


Understanding

Users have the freedom to say whatever they want to a chatbot and, depending on the platform, may even be able to send images, videos, emojis or voice notes. Is the chatbot capable of understanding these things?

Some other questions worth noting: does the bot understand…

  • Spelling or typing mistakes?
  • Idioms?
  • Sarcasm?
  • Invalid data formats?
  • Small talk, greetings, manners, gratitude or frustration?
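
Checks like the first item in this list can be written as a table of deliberately misspelt inputs paired with the intent a tester expects. The following Python sketch assumes a hypothetical keyword-based bot (`detect_intent` and `KEYWORD_INTENTS` are invented for illustration) and uses the standard library's `difflib` for fuzzy matching; a real chatbot would use its own language-understanding service.

```python
import difflib

# Hypothetical keyword-to-intent table for a toy restaurant bot.
KEYWORD_INTENTS = {
    "book": "book_table",
    "cancel": "cancel_booking",
    "hello": "greeting",
}

def detect_intent(utterance):
    """Return the intent of the first word that closely matches a keyword."""
    for word in utterance.lower().split():
        matches = difflib.get_close_matches(
            word, list(KEYWORD_INTENTS), n=1, cutoff=0.8
        )
        if matches:
            return KEYWORD_INTENTS[matches[0]]
    return "fallback"

# Misspelt inputs paired with the intent a tester would expect.
cases = {
    "I want to bok a table": "book_table",
    "cancle my reservation": "cancel_booking",
    "helo": "greeting",
    "thx 4 the help": "fallback",
}
results = {text: detect_intent(text) for text in cases}
```

Comparing `results` with `cases` shows which misspellings the bot copes with and which fall through to its fallback response.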

Error Handling

Sooner or later, the end user will inevitably utter something that the chatbot doesn’t understand. I’m sure we have all said something to a chatbot and received the response “Sorry, I don’t understand”. To an end user this is extremely unhelpful! Some of the more refined chatbots will attempt to clarify the misunderstanding, remind the user of their scope (perhaps by giving the user a list of options) and, if possible, redirect them to a human.

Chatbots often integrate with several other services. For example, you may have a chatbot that calls a RESTful web service to retrieve the weather. What happens if this web service is unavailable? How does the chatbot handle this?
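
The unavailable-service scenario can be simulated by swapping the real dependency for one that always fails. This is a minimal sketch under stated assumptions: `weather_reply`, `unavailable_service` and `working_service` are hypothetical names, and the use of `ConnectionError` is an assumption about how a failed call would surface.

```python
def weather_reply(city, fetch_weather):
    """Build the bot's answer, degrading gracefully if the service call fails."""
    try:
        forecast = fetch_weather(city)
    except ConnectionError:
        return ("Sorry, I can't reach the weather service right now. "
                "Please try again shortly.")
    return f"The forecast for {city} is {forecast}."

def unavailable_service(city):
    """Test double that simulates the web service being down."""
    raise ConnectionError("weather service unavailable")

def working_service(city):
    """Test double that simulates a successful response."""
    return "sunny"

reply_when_down = weather_reply("London", unavailable_service)
reply_when_up = weather_reply("London", working_service)
```

Passing the dependency in as a parameter is a design choice made here so that the failure can be triggered on demand, without needing the real service to go down.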


Intelligence

Assessing intelligence can be quite subjective, but I believe there are two main areas that we can look at to evaluate the intelligence of the chatbot under test. They are:

Context: Context informs a chatbot on how best to respond to an utterance. Does your chatbot understand later inputs in light of earlier ones? Does it respond differently depending on your geographical location, past interactions, or other factors such as the time of day or season?

Knowledge & Memory: Our human interactions are based upon the assumption that people will remember us and any previous conversations we may have had. A chatbot needs to be able to mimic this. Does it remember things like your name, age, personal preferences, etc.?
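
Memory can be tested with a multi-turn conversation in which a later question depends on an earlier statement. The sketch below uses a hypothetical `ToyBot` class that stores the user's name in a dictionary; it illustrates the shape of the test, not how any particular chatbot platform stores context.

```python
import re

class ToyBot:
    """Hypothetical bot that stores the user's name between turns."""

    def __init__(self):
        self.memory = {}

    def reply(self, utterance):
        introduced = re.search(r"my name is (\w+)", utterance, re.IGNORECASE)
        if introduced:
            self.memory["name"] = introduced.group(1)
            return f"Nice to meet you, {self.memory['name']}!"
        if "what is my name" in utterance.lower():
            name = self.memory.get("name")
            if name:
                return f"Your name is {name}."
            return "I don't know your name yet."
        return "Sorry, I don't understand."

# Turn 1 supplies the fact; turn 2 checks that it was remembered.
bot = ToyBot()
first = bot.reply("Hi, my name is Sam")
second = bot.reply("What is my name?")

# A fresh conversation should not know the name.
stranger = ToyBot().reply("What is my name?")
```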


Navigation

Chatbots are built with a number of dialog flows. A chatbot for a restaurant, for example, might have one flow for booking a table and another for cancelling a booking. How easy is it to navigate through each flow? Is it possible to change to another one mid-flow? Are the different flows made apparent to a new user?
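
The restaurant example can be turned into a small navigation test: start one flow, then try to switch to the other before finishing. The `RestaurantBot` class below is a hypothetical toy that uses naive keyword matching, so it is only a sketch of the test's shape, not of a production dialog manager.

```python
class RestaurantBot:
    """Hypothetical bot with two flows: booking a table and cancelling a booking."""

    def __init__(self):
        self.flow = None  # name of the flow currently in progress, if any

    def reply(self, utterance):
        text = utterance.lower()
        # Flow-changing keywords are checked first so a user can switch mid-flow.
        if "cancel" in text:
            self.flow = "cancel"
            return "Which booking would you like to cancel?"
        if "book" in text:
            self.flow = "book"
            return "How many people is the table for?"
        if self.flow == "book":
            self.flow = None
            return f"Table booked for {utterance}."
        if self.flow == "cancel":
            self.flow = None
            return "Your booking has been cancelled."
        # No flow in progress: tell a new user what is available.
        return "I can book a table or cancel a booking. Which would you like?"

bot = RestaurantBot()
opening = bot.reply("Hello")
bot.reply("I'd like to book a table")
flow_after_booking_request = bot.flow
bot.reply("Actually, cancel my reservation instead")
flow_after_switch = bot.flow
```

Here the opening reply covers whether the flows are made apparent to a new user, and the two recorded flow values cover switching mid-flow.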


Onboarding

Onboarding is the process of integrating someone into something new, for example a new customer or client. In this context it is the process of a new user familiarising themselves with a chatbot that they haven’t used before. According to a Silicon Valley analyst, “The average app loses 77 percent of its users in the first three days after they install it. After a month, 90 percent of users eventually stop using the app, and by the 90-day mark, only 5 percent of users continue using a given app”. User expectations nowadays are very high, so if an application isn’t up to scratch a user will simply stop using it. It is therefore imperative that the onboarding process is well designed and properly tested.


Chatbottest, the creators of Alma, have also created a list of 120 questions, available on GitHub and separated into the 7 categories mentioned above. These act as a great beginner’s guide for anyone new to chatbot testing. The list can be found at: https://github.com/chatbottest-com/guide/wiki

If you would like to learn more about testing chatbots - get in touch!

Felix Walne - Senior Test Engineer

