
The Robots are Coming (but don’t panic)

Testing is under pressure! The sea changes in technology, delivery practices, and user and business expectations are all putting pressure on testers - raising questions about what testing is done, and how software quality can be guaranteed.

Testing Pressures

At ROQ, we’re busy delivering for our clients on current technologies, but we’re also looking ahead to where things are heading. AI increasingly features in the conversations that we’re having with CIOs, and we’re being asked to look at how we’ll do testing in projects that incorporate machine learning technologies. In addition, we’ve been experimenting internally with some AI-based technologies to find out how they can help us test.

All of this is why we decided to talk about AI at Test Expo. Before starting on the meat of it, though, I would like to consider what we mean when we talk about artificial intelligence.

AI: Strong vs Weak

Artificial intelligence has been divided into two broad categories.

Strong AI refers to a machine intelligence that can apply itself to any problem, like the human mind. Also referred to as Artificial General Intelligence (AGI), strong AI is associated with consciousness, sentience and mind. Society’s view on this is heavily influenced by science fiction books and movies.

From HAL 9000 in ‘2001: A Space Odyssey’ to Skynet in the Terminator movies, there are plenty of stories of strong AI going horribly wrong. Benevolent fictional AI is perhaps rarer, but still very much in evidence. Examples include Star Trek, where the computer on the Enterprise is very capable, if somewhat devoid of personality, and the quirky hyper-intelligent Minds in the Culture series by Iain M Banks, who play a central role in looking after everyone’s well being and happiness.

The fictional view carries weight, because it drives public perceptions of AI, and there are also examples of science fiction driving actual science. For example, Marvin Minsky, one of the founding fathers of AI research, cited Isaac Asimov’s story ‘Runaround’, in which Asimov set out the Three Laws of Robotics, as an inspiration for his interest in artificial minds.

Strong AI exists only in fiction today. You can find different views about how far in the future it lies, and also about how desirable or dangerous it would be. Most commentators seem to think that strong AI is many decades away, but there are some who worry that it might be just one or two decades!

What we have today is weak AI, or narrow AI. This doesn’t have those qualities of sentience or consciousness, but is instead able to solve narrow, focused problems. This seems much less valuable, but even narrow AI can be interesting and extremely helpful to humanity.


One way in which weak AI is being vividly demonstrated is in games. It’s surprising to remember, but just thirty years ago it was widely believed that a computer could never beat a human being at chess. When IBM’s Deep Blue beat the reigning world chess champion, Garry Kasparov, in a six-game match in May 1997, that perception was forever shattered.

IBM Deep Blue

After this success in chess, IBM went on in 2011 to pit its Watson computer system against two of the most successful human players of Jeopardy!, an American TV game show in which contestants are presented with general knowledge clues in the form of answers, and must phrase their responses in the form of questions. The show features wordplay, puns, and deep general knowledge. Watson won comfortably, showing that machines could now handle natural language, including idiom and puns, well enough to respond intelligibly to humans.

Both of these systems, however, relied heavily on human knowledge and data from past games. DeepMind, an AI company that Google acquired in 2014, has stepped beyond this limitation with its AlphaGo system, which plays the ancient Chinese game of Go. Go has much simpler rules than chess, but it offers a far greater number of alternative moves each turn, and a number of possible positions that defies understanding: far more than the estimated number of atoms in the observable universe. The most recent version, known as AlphaGo Zero, is remarkable in that it trained itself from scratch, with no human game data, by playing millions of games against itself and learning from them. This approach has shown us that training a machine learning system with copious data and human expertise is not always required, which is a significant advance.

Real World Applications

Large companies such as IBM invest in game-winning computer systems as they provide such a vivid demonstration of their technology in action. Watching Watson winning at Jeopardy was a compelling and ground-breaking spectacle.

However, IBM has to make money from Watson. Since Jeopardy!, IBM has enhanced the Watson software and decomposed it into individual services that can be accessed via APIs on its Bluemix cloud platform. Anyone can now use these to incorporate machine learning into their applications. It’s not just IBM. Amazon, Google, Microsoft, HP Enterprise, and others all have cloud-based machine learning APIs.

IBM has also produced substantial applications using this technology, such as Watson for Oncology, which helps cancer doctors with diagnosis and treatment. At the same time, DeepMind is working with the UK’s National Health Service, with the aim of employing AI to help relieve the pressure on clinicians. These are important uses of narrow AI which move things forward for humankind.

Other applications are rapidly emerging in fields as diverse as HR and recruitment, self-driving vehicles, IT service delivery, customer service, cyber security, and buying online advertising. So AI is here today, and it’s about driving profits, not just winning at games.

Should we Panic?

This array of capabilities and applications shows that weak AI is able to add a lot of value in many areas - but it still has sharp limits. There are still many areas where it simply can’t compete with humans. Jobs that rely heavily on empathy, ethical judgment, creativity, exploration, understanding, analysis, and the application of knowledge will perhaps always be performed better by humans.

Testing roles require all of these qualities, which is why I don’t think that we need to panic just yet about the advent of machine intelligence. AI can certainly help testers and, as time goes by, there will be more and more examples where integration of AI into testing tools saves testers time and effort. This will have the effect of moving the actual activities of testers further into the areas where only humans can operate.

Learning from Defects

At ROQ we have been experimenting with the value that AI can deliver from analysing the data associated with testing, such as defects, logs and tickets.

In particular, we have achieved some interesting results from using IBM Watson Analytics, an AI-powered data analytics service, to examine the defect logs from a programme of work that involved testing a browser-based application and the underlying database.

Defect logs

Whilst we had some thoughts on the areas we might explore and the questions we might be able to answer, there was also real value in the queries that Watson suggested when presented with our defect data. Our early work showed that we could gain real insight into the patterns of defect severity, the relationship between Actual and Estimated resolution times for defects, and which testers were best at detecting particular types of defects. We also see potential for investigating defect root causes using the AI tools, and perhaps even for monitoring whether improvement initiatives (in areas such as data and environment management) are actually delivering benefits for our testing process.
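To make the kind of question concrete, here is a minimal hand-rolled sketch in Python of two of the analyses mentioned above: counting defects by severity, and comparing Actual with Estimated resolution times. It does not use Watson Analytics or reproduce our results; the records, field names and figures are invented purely for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical defect records: every field name and value here is an
# assumption made for this example, not data from a real project.
DEFECTS = [
    {"id": 1, "severity": "High", "estimated_hours": 4.0, "actual_hours": 8.0},
    {"id": 2, "severity": "High", "estimated_hours": 6.0, "actual_hours": 9.0},
    {"id": 3, "severity": "Medium", "estimated_hours": 2.0, "actual_hours": 2.0},
    {"id": 4, "severity": "Low", "estimated_hours": 1.0, "actual_hours": 0.5},
    {"id": 5, "severity": "Medium", "estimated_hours": 4.0, "actual_hours": 6.0},
]


def severity_counts(defects):
    """Count how many defects were logged at each severity."""
    return Counter(d["severity"] for d in defects)


def overrun_by_severity(defects):
    """Mean ratio of actual to estimated resolution time per severity.

    A value above 1 means defects of that severity took longer to
    resolve than estimated, on average.
    """
    ratios = defaultdict(list)
    for d in defects:
        ratios[d["severity"]].append(d["actual_hours"] / d["estimated_hours"])
    return {severity: mean(values) for severity, values in ratios.items()}


if __name__ == "__main__":
    print(severity_counts(DEFECTS))
    print(overrun_by_severity(DEFECTS))
```

Even a simple summary like this can show where estimates are least reliable. The appeal of an analytics service is that it suggests such questions for you, rather than requiring you to think of each one in advance.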

Testing AI

The other question for testers around AI is how they should approach testing applications that have AI in them. AI systems are primed with training data rather than coded, so it’s hard to produce a definitive set of test cases: unlike transactional systems, there is no single deterministic result to a query directed at an AI system. There are no simple answers to this problem.

One of the more common types of AI system being implemented today is the chatbot. When a user approaches a chatbot, the most important task of the AI is to correctly identify the user’s intent, so that the chatbot can help them achieve their goal. This is done by training the AI with many examples of how users would express their goal, exactly as it would be typed or spoken by real users. Once identified, each of these examples is tagged with the intent of the user. For a worked example of this, and to learn how to test a chatbot, look out for our upcoming blog post on this topic.
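In the meantime, the idea of tagging example utterances with intents, and then testing against a threshold rather than a single expected answer, can be sketched roughly as follows. This is a toy illustration under stated assumptions: the word-counting `classify_intent` function merely stands in for a real trained model, and the utterances, intent names and the 0.8 pass mark are all hypothetical.

```python
from collections import Counter

# Hypothetical training examples: (utterance as a user might type it, intent tag).
TRAINING = [
    ("what is my account balance", "check_balance"),
    ("how much money do I have", "check_balance"),
    ("show me my balance please", "check_balance"),
    ("I want to reset my password", "reset_password"),
    ("forgot my password", "reset_password"),
    ("help me change my password", "reset_password"),
]

# Held-out utterances the model has not seen, used only for testing.
HELD_OUT = [
    ("balance on my account", "check_balance"),
    ("password reset please", "reset_password"),
    ("how much money is in my account", "check_balance"),
    ("I forgot my login password", "reset_password"),
]


def train(examples):
    """Count how often each word appears under each intent (toy model)."""
    counts = {}
    for text, intent in examples:
        counts.setdefault(intent, Counter()).update(text.lower().split())
    return counts


def classify_intent(model, text):
    """Return the intent whose training words overlap most with the utterance."""
    words = text.lower().split()
    scores = {
        intent: sum(word_counts[w] for w in words)
        for intent, word_counts in model.items()
    }
    return max(scores, key=scores.get)


def accuracy(model, examples):
    """Fraction of examples for which the predicted intent matches the tag."""
    hits = sum(classify_intent(model, text) == intent for text, intent in examples)
    return hits / len(examples)


model = train(TRAINING)
score = accuracy(model, HELD_OUT)

# The test asserts on aggregate accuracy, not on every individual answer,
# because a trained system is not expected to be right every time.
assert score >= 0.8, f"intent accuracy too low: {score:.2f}"
```

The design point is the final assertion: the tester agrees an acceptable accuracy with the business and checks the system against it using phrasing the model was not trained on.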

For an example of testing a different type of AI system, you might be interested to read ‘Test Automation for Machine Learning: An Experience Report’ by Angie Jones, now a senior automation engineer at Twitter. This article was published in the April 2017 edition of Testing Trapeze magazine, and it describes the approach that she devised for testing a machine learning algorithm and what she learned along the way. It’s a great illustration of how testers will need to think in order to engage with this type of problem.


Yes, the robots are coming and they will change what testers do, and how they do it. However, organisations still need to achieve the same outcome of software that works, and they will continue to need empathic, creative, curious testers in order to successfully do that.

Ian Smith, Head of Innovation - ROQ