From an IT security point of view, the current approach to GUI test automation is careless or even dangerous. And here is why...

A general principle in IT security is to forbid everything and allow only what is really needed. This reduces your attack surface and, with it, the number of problems you can encounter. In most situations (e.g. when configuring a firewall), this means applying a whitelist: forbid everything and allow only individual, listed exceptions. And make sure to review and document those exceptions.

Compare this to the current state of the art in GUI test automation. With tools like Selenium, the de facto standard in web test automation, it is the other way around. These tools allow every change in the software under test (SUT), unless you manually create an explicit check for it. With regard to changes, this is a blacklisting approach. If you are familiar with software test automation, you know that there are good reasons for this: every such check is brittle, and every check adds maintenance effort. But regardless of why it is that way, does it make sense? After all, false negatives (regressions that go unnoticed because no check covers them) will erode trust in your test automation.

Being defensive would mean checking everything and allowing only individual, documented exceptions. Every other change to the software should be highlighted and reviewed. This is comparable to the “track changes” mode in Word, or to version control for source code. And it is the only way not to miss the dancing pony on the screen that you didn’t create a check for. At the end of the day, this is what automated tests are for: finding regressions.

Of course, for that approach to work in practice, there are a few necessary preconditions:

  1. We need the execution of the SUT to be *repeatable* (e.g. by using the same test data). This is a sensible idea anyway, and it is much easier with today’s virtualization and containerization tools than it was a couple of years ago.
  2. We need to deal with the *multiplication of changes*. Every change to the software shows up in multiple tests, possibly multiple times per test. E.g. if the logo on a website changes, this may well affect each and every test. Yet a change should only need to be reviewed once.
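To make the second precondition more concrete, here is a minimal Python sketch of how detected changes could be grouped so that each distinct change is reviewed only once, no matter how many tests it appears in. The `Change` record, the `group_changes` function, and the test names are hypothetical illustrations, not the data model of any particular tool; a real tool would also need a reliable way to recognize the same GUI element across tests.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One detected difference (hypothetical model): which element changed and how."""
    element: str    # identifier of the GUI element, e.g. "logo"
    expected: str   # value in the approved state
    actual: str     # value observed in the current run


def group_changes(results: dict[str, list[Change]]) -> dict[Change, list[str]]:
    """Map each distinct change to the list of tests in which it occurred."""
    grouped: dict[Change, list[str]] = defaultdict(list)
    for test_name, changes in results.items():
        for change in changes:
            # The same change may occur several times within one test; list the test once.
            if test_name not in grouped[change]:
                grouped[change].append(test_name)
    return dict(grouped)


# Hypothetical results of three tests after the website logo was replaced.
results = {
    "test_login": [Change("logo", "logo_v1.png", "logo_v2.png")],
    "test_search": [Change("logo", "logo_v1.png", "logo_v2.png"),
                    Change("search-button", "Search", "Find")],
    "test_checkout": [Change("logo", "logo_v1.png", "logo_v2.png")],
}

for change, tests in group_changes(results).items():
    print(f"{change.element}: {change.expected!r} -> {change.actual!r} in {len(tests)} test(s)")
```

Instead of five individual differences, the reviewer sees two distinct changes. In such a design, accepting the logo change once could update the approved state of all three tests.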

The dose makes the poison

Every piece of software has an ideal number of checks. Everything that can change without ever causing a problem should not be checked, and everything that must not change should be checked.

There are two important considerations when choosing between the two approaches:

  1. How do you reach that middle ground in the most effective way?
  2. From which “side” is it less risky to approach it, in case you miss the ideal spot?

IT security guidelines recommend erring on the side of caution. So if both approaches required the same amount of effort, you should choose whitelisting. But, of course, the effort is usually not the same.

A real-life example

Imagine your software features a table. With a blacklisting approach, your GUI test should contain a check for every column of every row. With seven columns and seven rows, this means 49 checks, just for the table. And if any of the displayed data ever changes, you have to copy and paste the changes manually to adjust the checks.

With a whitelisting approach, on the other hand, the complete table is checked by default. You then only need to exclude volatile data or components (typically the build number or the current date and time). And if the data ever changes, maintaining the test is much easier, because you usually (depending on the tool) have efficient ways to update the checks. Guess which of the two approaches requires less effort...
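The whitelisting side of this example can be sketched in a few lines of Python. Instead of 49 hand-written assertions, `diff_table` (a hypothetical helper, not part of any tool mentioned here) compares every cell of an approved snapshot against the current table by default and skips only the columns explicitly listed as volatile. The table contents and the timestamp column are made up for the example.

```python
from itertools import zip_longest
from typing import Optional

Table = list[list[str]]
Difference = tuple[int, int, Optional[str], Optional[str]]  # (row, column, approved, actual)


def diff_table(approved: Table, actual: Table,
               ignored_columns: frozenset = frozenset()) -> list[Difference]:
    """Report every differing cell, except in columns explicitly marked as volatile.

    Missing or extra rows and cells show up as None on the respective side,
    so structural changes are reported instead of silently passing.
    """
    differences: list[Difference] = []
    for r, (old_row, new_row) in enumerate(zip_longest(approved, actual, fillvalue=[])):
        for c, (old, new) in enumerate(zip_longest(old_row, new_row)):
            if c in ignored_columns:
                continue  # documented exception, e.g. a timestamp column
            if old != new:
                differences.append((r, c, old, new))
    return differences


# Hypothetical 7x7 table: six data columns plus a "last updated" column (index 6).
approved = [[f"r{r}c{c}" for c in range(6)] + ["2018-01-01 10:00"] for r in range(7)]
actual = [row.copy() for row in approved]
for row in actual:
    row[6] = "2018-06-30 17:42"   # volatile: differs on every run
actual[3][2] = "r3c2-changed"    # a regression nobody wrote an explicit check for

print(diff_table(approved, actual, ignored_columns=frozenset({6})))
# [(3, 2, 'r3c2', 'r3c2-changed')]
```

Without the `ignored_columns` exception, the same call would also report the seven timestamp cells, which is exactly the noise the documented exception is meant to suppress.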

Text-based vs pixel-based whitelist tests

There are already tools out there that let you create whitelist tests. Some are purely visual/pixel-based, such as PDiff, Applitools and the like. This approach comes with benefits and drawbacks. It is universally applicable, no matter whether you check a website or a PDF document. On the other hand, if the same change appears multiple times, it is hard to handle all occurrences in one go. Whitelisting changes (i.e. excluding parts of the image) can be a problem, too.
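To illustrate what ignoring changes means in a pixel-based comparison, here is a deliberately naive Python sketch: it compares two grayscale images pixel by pixel and skips rectangles marked as volatile. The in-memory image format and the `Region` and `changed_pixels` names are assumptions made for this example; PDiff and Applitools use far more sophisticated comparisons.

```python
from typing import NamedTuple, Sequence

Image = list[list[int]]  # hypothetical format: grayscale values, one list per pixel row


class Region(NamedTuple):
    """Axis-aligned rectangle (in pixel coordinates) whose changes are ignored."""
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def changed_pixels(expected: Image, actual: Image,
                   ignore: Sequence[Region] = ()) -> list[tuple[int, int]]:
    """Return (x, y) of every differing pixel that lies outside all ignored regions."""
    if len(expected) != len(actual) or any(len(a) != len(b) for a, b in zip(expected, actual)):
        raise ValueError("image dimensions differ")
    return [
        (x, y)
        for y, (row_e, row_a) in enumerate(zip(expected, actual))
        for x, (pe, pa) in enumerate(zip(row_e, row_a))
        if pe != pa and not any(region.contains(x, y) for region in ignore)
    ]


expected = [[0] * 4 for _ in range(4)]
actual = [row.copy() for row in expected]
actual[0][2] = actual[0][3] = 255   # a clock in the top-right corner changed
actual[3][0] = 255                  # an unexpected change

clock = Region(left=2, top=0, width=2, height=1)
print(changed_pixels(expected, actual, ignore=[clock]))
# [(0, 3)]
```

Note that the ignored region is tied to pixel coordinates: if the layout shifts, the rectangle no longer covers the clock. And a change that appears in many screenshots shows up as separate pixel differences in each of them, with nothing that identifies it as the same change.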

Other tools are text-based, such as ApprovalTests and TextTest. These tools are often more robust, but can only be applied to text, so checking PDFs, websites, or software GUIs is not directly possible. They also rely on third-party tools for the actual text diffing and usually come without proper means to ignore changes. This can be a real pain point, e.g. if the goal is to check XML, log files, or proprietary formats containing volatile data.
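One way to ignore volatile data in text-based approval tests is to normalize ("scrub") the text before diffing it. The following Python sketch uses only the standard library (`re` and `difflib`); the `SCRUBBERS` patterns, the `verify` function, and the log lines are hypothetical and do not reflect the API of ApprovalTests or TextTest.

```python
import difflib
import re

# Hypothetical scrubbers: each regex for volatile data is replaced by a stable placeholder.
SCRUBBERS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"), "<TIMESTAMP>"),
    (re.compile(r"build \d+"), "build <N>"),
]


def scrub(text: str) -> str:
    """Replace all volatile fragments so they no longer cause differences."""
    for pattern, placeholder in SCRUBBERS:
        text = pattern.sub(placeholder, text)
    return text


def verify(approved: str, received: str) -> list[str]:
    """Return a unified diff of the scrubbed texts; an empty list means 'approved'."""
    return list(difflib.unified_diff(
        scrub(approved).splitlines(),
        scrub(received).splitlines(),
        fromfile="approved", tofile="received", lineterm="",
    ))


approved = (
    "2018-05-02 09:15:00 INFO server started (build 1041)\n"
    "2018-05-02 09:15:01 INFO listening on port 8080\n"
)
received = (
    "2018-06-30 17:42:13 INFO server started (build 1077)\n"
    "2018-06-30 17:42:14 INFO listening on port 8081\n"
)
print("\n".join(verify(approved, received)))
```

Only the changed port shows up as a difference; timestamps and build numbers are replaced by the same placeholders on both sides. Each pattern, however, has to be written and maintained by hand for every format you check.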

Shameless self-promotion

I am aware of only one tool that is semantic, can be applied to software GUIs, is not pixel-based (although it can be), and easily lets you ignore volatile elements: ReTest.

Dr. Jeremias Rößler

He received his doctorate from the Chair of Software Engineering at Saarland University and has more than ten years of professional experience in developing custom software.