Parallel Selenium testing

In this post we publish an English translation of Dima Yakubosky’s first talk at the SeleniumCamp 2012 conference, which took place in February in Kiev, Ukraine. The text for each slide precedes the slide it relates to. Here it is …

What will I talk about?

I will tell you about the standard tools that let us execute tests in parallel, TestNG and Selenium Grid, and about a real system that can launch tests in different browsers in parallel with each other.

I will show you some problems that arise when the same test is executed in parallel, and a few ways to fix them. We will also talk about when it makes sense to increase the number of parallel threads you are using.


Let’s look at an elementary test suite containing two tests, a and b, which do not depend on each other.

There are two helper functions, prepareBrowser() and freeBrowser(), in which we prepare and free our browser, and main().

To begin, we need to launch a() and b() independently of each other. As I said before, the TestNG testing framework will help us with this. It can execute your tests in parallel. It can also execute a function just before or right after each test. For this, the helper methods and the test methods should be marked with the special annotations @BeforeMethod, @AfterMethod and @Test.

This is how our class looks now. The main() method can remain empty.

To execute our tests in parallel with the help of TestNG, we set parameters in the testng.xml configuration file, which you see in the picture. Let’s not get into details; you can read about it all on the TestNG website. Just notice that we set thread-count=2 and parallel=methods. We launch TestNG with Ant, so besides the configuration file we need to prepare a build.xml file for Ant.

When we launch TestNG, the following things happen: TestNG finds all the methods marked as @Test. It then creates two parallel threads and starts executing each method in its own thread. Before and after each method, prepareBrowser() and freeBrowser() are executed in each thread.

That’s it: tests a() and b() are executed in parallel. Now if you start your Selenium Server and run your tests, you’ll have two browsers, one for each test. I must say that TestNG doesn’t start all threads at exactly the same time; it does so with a slight delay. At least, that is how it looks to us as observers: we see one test start working and, some time later, the other. The order in which TestNG selects them is random. Remember this, it’ll be useful later.
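The behaviour described above can be imitated with the Java standard library alone. The following is only a rough sketch and not TestNG itself: the class name ParallelRunSketch, the event log and the empty test bodies are assumptions made for illustration, and a fixed pool of two threads stands in for thread-count=2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Stdlib-only sketch, NOT TestNG: two independent tests on a small thread pool,
// each wrapped in the prepareBrowser()/freeBrowser() hooks.
class ParallelRunSketch {
    // Thread-safe log of what happened, in the order it was recorded.
    static final List<String> events = new CopyOnWriteArrayList<>();

    static void prepareBrowser(String test) { events.add("prepare:" + test); }
    static void freeBrowser(String test)    { events.add("free:" + test); }

    // Wraps a test body roughly the way @BeforeMethod/@AfterMethod wrap a @Test method.
    static Runnable wrap(String name, Runnable body) {
        return () -> {
            prepareBrowser(name);
            try {
                body.run();
                events.add("run:" + name);
            } finally {
                freeBrowser(name); // always release the browser, even if the test fails
            }
        };
    }

    static List<String> runAll(int threadCount) {
        events.clear();
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        pool.submit(wrap("a", () -> { /* body of test a (hypothetical) */ }));
        pool.submit(wrap("b", () -> { /* body of test b (hypothetical) */ }));
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tests did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        // The interleaving of a and b may differ from run to run.
        System.out.println(runAll(2));
    }
}
```

Within one test the order is always prepare, run, free, but how the events of a and b interleave is not fixed, which matches what we observe with the real framework.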

You might want to execute your tests in several browsers. This can also be done with the help of TestNG. A special method is created which acts as a DataProvider; in this case it supplies the “*chrome” and “*opera” strings. TestNG executes each test twice, once for each parameter.
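The effect of such a data provider on the number of runs can be pictured with a few lines of plain Java. This is not the TestNG API, only a sketch of the resulting test-by-browser matrix; the class BrowserMatrixSketch and the run labels are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stdlib-only sketch: every test is scheduled once per browser string.
class BrowserMatrixSketch {
    static List<String> expand(List<String> tests, List<String> browsers) {
        List<String> runs = new ArrayList<>();
        for (String test : tests) {
            for (String browser : browsers) {
                runs.add(test + "[" + browser + "]"); // one run per (test, browser) pair
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(expand(Arrays.asList("a", "b"),
                                  Arrays.asList("*chrome", "*opera")));
    }
}
```

With two tests and two browser strings this yields four runs, so the thread count and the number of available browsers need to be planned accordingly.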

This way we can have many tests running at the same time, and at some point one machine may not be enough to run them all. Besides, if you run many tests on the same Selenium server, it is hard for each process to take a screenshot, as another browser may be in the way. You might also need to run tests on different OSs, and you will need more than one server to do that. In cases like these it makes sense to use several servers.

Good, we now have several virtual machines prepared, each running a Selenium Server. But we need to somehow distribute the tests among the machines. Doing this manually is not very easy.

Selenium Grid solves this problem with ease. It consists of two parts: a Selenium Hub and Selenium RCs. An RC is a Selenium server, and there can be several in one grid. The Hub distributes RCs among tests, so it is the single access point to the Selenium RC pool. It is also transparent: tests don’t know that they are interacting with the Hub, which simply relays commands and results. If a test contacts the Hub but there are no RCs available, the test will wait until one becomes available. Only then can it begin to work with an RC.
The Hub allows us to pool RCs running on different machines and OSs.
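The waiting behaviour of the Hub can be modelled as a blocking pool. The sketch below is not Selenium Grid code: HubSketch, acquire() and release() are hypothetical names, RCs are represented by plain strings, and a timeout is used instead of an unbounded wait so that the example always terminates.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Stdlib-only model of a hub: a single access point to a pool of RCs.
class HubSketch {
    private final BlockingQueue<String> freeRcs;

    HubSketch(List<String> rcs) {
        this.freeRcs = new LinkedBlockingQueue<>(rcs);
    }

    // Waits up to timeoutMillis for a free RC; returns null if none became free.
    String acquire(long timeoutMillis) {
        try {
            return freeRcs.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Returns an RC to the pool so that a waiting test can pick it up.
    void release(String rc) {
        freeRcs.add(rc);
    }

    public static void main(String[] args) {
        HubSketch hub = new HubSketch(Arrays.asList("rc1"));
        String rc = hub.acquire(100);          // the only RC is handed out
        System.out.println(hub.acquire(50));   // no free RC, so the wait times out
        hub.release(rc);
        System.out.println(hub.acquire(100));  // the RC is available again
    }
}
```

A test that finds the pool empty simply blocks in acquire() until another test releases its RC, which is the property that lets tests stay unaware of how many RCs exist.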

On that note, let’s wrap up the basics of parallel testing. Next I will talk about the nuances involved in parallel tests.

I have to mention how we execute tests in different browsers, and what our web application under test (WAUT) is. To automate testing, we wrote our own system, Nerrvana.

To execute tests, we load them into the system and tell it when, in which browsers and in how many threads per browser to execute them. When the time comes for another execution, the system creates a Selenium Grid on virtual machines for each environment: a hub and a number of RCs, one RC per thread. Tests are loaded and executed on each hub. When they are finished, results are downloaded and placed in separate folders. This way we can have either different tests, or the same test in different browsers, executing at the same time.

At some point we realised that others may want to use our system, and we are soon releasing it as a service – www.nerrvana.com. If you are interested, you are welcome to beta test it!

Let me explain a few more things about our system that you may find useful. As you can see, tests require a substantial number of machines. Let’s say we need to execute a test in 5 browsers with 2 threads per browser: we will need 5*(2+1)=15 virtual machines (two RCs plus one hub for each browser). This is too much for one physical machine, which is why we have three. They are regular machines with dual-core Intel processors and 2GB RAM. Each can support around 8 virtual machines, but with that many the tests take longer to run.
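The machine count follows from one hub plus one RC per thread in every grid. As a small sketch of that arithmetic (the class and method names are hypothetical):

```java
// Each browser gets its own grid: one hub VM plus one RC VM per thread.
class GridSizing {
    static int vmsNeeded(int browsers, int threadsPerBrowser) {
        return browsers * (threadsPerBrowser + 1);
    }

    public static void main(String[] args) {
        System.out.println(vmsNeeded(5, 2)); // 5 * (2 + 1) = 15
    }
}
```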

The most resource-intensive process is starting a virtual machine. The tests themselves and the browsers don’t load the server as much. The system itself controls the distribution of its workload among the Xen servers.

A machine starts like this. We have three virtual machine templates: Linux Tester, Linux Hub and Windows Tester. When a virtual machine is created, a copy of the template is made and all work is done with this copy.

But simply copying the image is a time-consuming process; besides, several such operations running at once slow each other down. So we use the snapshot capability of LVM. The image isn’t copied completely. The VM is started and, while it is working, only the differences between the template and the running machine are recorded. Because tests run for a fairly short time and barely change the image, this doesn’t affect the speed of work, but it reduces the time needed to create a virtual machine image from 1 minute (copying 2-3 gigabytes) to a few seconds (creating a virtual snapshot).

Starting a machine takes from 20 to 50 seconds; Windows machines take a little longer. The system prepares machines in advance so that tests can start as soon as possible. The limit on how many can be started in advance is available memory, as each virtual machine must have enough RAM: Hub, 128 MB; Linux tester, 196 MB; Windows tester, 256 MB.
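With these figures it is easy to estimate how much RAM one grid reserves. The sketch below assumes a grid is one hub plus a number of testers of the same kind; the class GridMemory and its method are hypothetical names, and only the three memory sizes come from the talk.

```java
// RAM reserved by one grid: one hub plus N testers of the same type.
class GridMemory {
    static final int HUB_MB = 128;
    static final int LINUX_TESTER_MB = 196;
    static final int WINDOWS_TESTER_MB = 256;

    static int gridMemoryMb(int testers, int testerMb) {
        return HUB_MB + testers * testerMb;
    }

    public static void main(String[] args) {
        System.out.println(gridMemoryMb(2, LINUX_TESTER_MB));   // 128 + 2 * 196 = 520
        System.out.println(gridMemoryMb(2, WINDOWS_TESTER_MB)); // 128 + 2 * 256 = 640
    }
}
```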

As I said, the source code of our tests is copied to the hub of each grid and executed when the tests are launched. Question: how do we supply different browsers with the same source code? The solution was simple: before starting each hub, we edit the grid_configuration.yml file so that when a test requests the *chrome browser, it gets access to any available RC. For each grid it will be an RC with the required browser.

This is the interface for planning the execution of tests. As you see, there are 5 browsers on 2 OSs.

In the system’s report, we can see the results for each browser. Also (I will talk about reports in my second presentation), tests can send alerts to the system, for example all events marked as WARN or above.

You can also see the execution time for each browser.

Now a few words regarding our WAUT. This is a simple question/answer web application. Questions can be asked, and answered. You can obviously search and filter results using tags. There are users with different sets of privileges.

Also, questions can be edited and answers can be marked as correct. All these functions are checked by Selenium tests.

Now that you can imagine how our application under test works, let’s talk about the nuances.

In one case, where the same test runs in different browsers, the test creates a few questions with different tags and numbers of answers, and then checks filtering by tags, date and number of answers. The first thing that comes to mind: each test needs to name its questions and tags in a unique way, or the test won’t be able to tell which questions are its own and which aren’t. Or it could delete another test’s question instead of its own.
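One simple way to get such unique names is to append a random suffix to a readable base. This is a sketch under the assumption that a UUID suffix is acceptable in question titles and tags; the talk does not say how the names were actually generated, and the class UniqueNames is hypothetical.

```java
import java.util.UUID;

// Builds names that are very unlikely to collide between parallel test runs.
class UniqueNames {
    static String unique(String base) {
        // Dashes are dropped so the result stays a single word, convenient for tags.
        return base + "_" + UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        System.out.println(unique("tag"));
        System.out.println(unique("question"));
    }
}
```

Keeping the readable prefix makes it easier to recognise leftovers from tests when inspecting the application by hand.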

It soon became apparent that unique keys are not always the answer. We faced the following problem: parallel tests create a lot of questions, and the questions page only displays the most recent few, 15 for example. At some point one of the tests realises that its questions have been pushed off the page without it knowing, and begins to panic.

We couldn’t find a way out that didn’t affect our Answers application itself. We decided to let tests see only the questions they have asked. To minimise the tests’ effect on the product itself, we implemented this with the help of tags, which were already unique to each test and which Answers could already filter by. Now each test, before starting its work, opens a URL like this:
?tags_for_tests=unique_tag1,unique_tag2,unique_tag3
If Answers gets a request like this, every following query that extracts questions and tags has an extra condition added: “AND tag IN (unique_tag1, unique_tag2, unique_tag3)”. The passed value is stored in the session, so it only affects the test which supplied it. Done!
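Both halves of this trick can be sketched in a few lines. Everything here apart from the tags_for_tests parameter name is an assumption: the talk does not show the Answers code or say what language it is written in, the class TagsForTestsSketch is hypothetical, the tags are assumed to be URL-safe, and the condition is built with placeholders rather than inlined values so it could be used in a prepared statement.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of the test side (building the URL) and the application side
// (turning the stored tags into an extra SQL condition).
class TagsForTestsSketch {
    // Test side: the query string a test opens before it starts working.
    static String queryString(List<String> tags) {
        return "?tags_for_tests=" + String.join(",", tags);
    }

    // Application side: split the parameter value that was stored in the session.
    static List<String> parse(String paramValue) {
        return Arrays.asList(paramValue.split(","));
    }

    // Application side: condition appended to queries, one "?" placeholder per tag.
    static String extraCondition(List<String> tags) {
        if (tags.isEmpty()) {
            return ""; // ordinary user session: no extra filtering
        }
        return " AND tag IN (" + String.join(", ", Collections.nCopies(tags.size(), "?")) + ")";
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("unique_tag1", "unique_tag2", "unique_tag3");
        System.out.println(queryString(tags));
        System.out.println(extraCondition(tags));
    }
}
```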

How do we test the production version of our system if all added questions become visible to the public? This means that questions asked by tests must be marked somehow, and not shown to real users. How do we do this?

If tests constantly worked under the same user name, we could filter out questions related to tests by that user. But Answers allows guests, who don’t have a user_id, to ask questions. This means that tests have to find another way to signal Answers. This can be done in a few ways; for example, we could use a special substring in the subject of each question, and filter the results that way. But this is a very costly alternative.

Because we already have a parameter named tags_for_tests, which all tests set as they begin to work, we can just check if this parameter is present. If it is, the question is from a test; we mark it as such and keep it hidden from everyone except tests.

Let’s return to parallel test execution. I have prepared a graph of execution times for different numbers of threads, from 1 to 25. In my tests there are 7 classes which can be executed in parallel, but for some reason the time stops changing after 5 threads! Let’s try to figure out when and why there is no point in increasing the number of threads.

First, it is obvious that there is no point in setting more threads than the number of methods or classes which you execute in parallel.

Second, if all your tests are executed on the same machine, it becomes apparent after a while that they begin to slow each other down. Even though the overall execution time will be a little shorter, each test will run much slower than in a single-threaded execution. This may upset your expectations about the speed of your WAUT and cause unexpected timeouts.

Third, your WAUT may not have enough ‘stamina’, and we end up with the same result: timeouts.

Fourth, let’s examine this interesting example. TestNG, by default, distributes tests over available threads randomly. Let’s say we have 4 tests: one of 4 minutes, one of 1 minute and two of 2 minutes. If we have 1 or 4 threads, everything is peachy.

Now let’s see how the run time changes with three threads.

And with two, the difference is noticeable, right?

It’s probably a good idea to group shorter tests together. It would be nice if we could tell TestNG the approximate length of each test, so that it could schedule tests this way by itself.
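The effect of ordering can be checked with a small simulation. It is only a model: it assumes each thread takes the next test from the list as soon as it becomes free, which is not necessarily how TestNG schedules internally, and the longest-first ordering is a common scheduling heuristic used here for comparison, not a TestNG feature. The class ScheduleSketch and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Simulates running tests of known length (in minutes) on a fixed number of threads.
class ScheduleSketch {
    // Total run time when tests are started in the given order,
    // each on whichever thread becomes free first.
    static int makespan(int[] durations, int threads) {
        PriorityQueue<Integer> freeAt = new PriorityQueue<>();
        for (int i = 0; i < threads; i++) {
            freeAt.add(0); // every thread is free at minute 0
        }
        int end = 0;
        for (int d : durations) {
            int finish = freeAt.poll() + d; // earliest free thread takes the test
            freeAt.add(finish);
            end = Math.max(end, finish);
        }
        return end;
    }

    // Returns a copy of the durations sorted from longest to shortest.
    static int[] longestFirst(int[] durations) {
        int[] sorted = durations.clone();
        Arrays.sort(sorted);
        for (int i = 0, j = sorted.length - 1; i < j; i++, j--) {
            int tmp = sorted[i];
            sorted[i] = sorted[j];
            sorted[j] = tmp;
        }
        return sorted;
    }

    public static void main(String[] args) {
        int[] unlucky = {1, 2, 2, 4}; // the 4-minute test happens to start last
        for (int threads = 1; threads <= 4; threads++) {
            System.out.println(threads + " thread(s): unlucky order "
                    + makespan(unlucky, threads) + " min, longest first "
                    + makespan(longestFirst(unlucky), threads) + " min");
        }
    }
}
```

In this model, two threads need 6 minutes when the long test starts last and 5 minutes when it starts first; with three threads the figures are 5 and 4 minutes. With 1 or 4 threads the order makes no difference (9 and 4 minutes).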

Give some thought to your parallel testing and, maybe, more servers won’t be needed for faster testing.




2 comments

  1. Igor says:

    We have just added parallel testing option in Nerrvana.

  2. MAdhumitha says:

    Very useful post