What is Sikuli?

Sikuli is a test automation tool that uses image recognition algorithms to identify GUI elements. It provides its own IDE as well as libraries that can be used in other projects, e.g. in any Java project.

Pros

  • it lets you automate anything that is displayed on the screen
  • it can reach things that are out of range for other test automation tools
  • there’s no need to access the application’s structure; all Sikuli needs to interact with the app are images, i.e. parts of screenshots
  • cross-platform
  • freeware
  • it’s extremely easy to create a simple automated test, even for people with no technical or programming skills

Cons

  • relying only on image recognition has big limitations, e.g. an application’s UI tends to change quite often, so the image set has to be kept up to date. Even a small change in colours or position can result in a test failure
  • versioning image files is inconvenient
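
To make the first point more concrete, here is a minimal sketch of naive template matching on grayscale pixel grids. It is not Sikuli’s actual algorithm; the class name, the method names and the scoring formula are my own assumptions, chosen only to show why a modest colour change can push a match below the required similarity.

```java
/** Naive template matching on grayscale pixel grids (values 0-255). Illustration only. */
class TemplateMatchSketch {

    /** Similarity in [0, 1] at one position: 1.0 means every pixel is identical. */
    static double similarity(int[][] screen, int[][] template, int top, int left) {
        long diff = 0;
        int h = template.length;
        int w = template[0].length;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                diff += Math.abs(screen[top + y][left + x] - template[y][x]);
            }
        }
        return 1.0 - (double) diff / (255.0 * h * w);
    }

    /** Returns {row, col} of the best match, or null if it scores below minSimilarity. */
    static int[] find(int[][] screen, int[][] template, double minSimilarity) {
        int[] best = null;
        double bestScore = -1.0;
        // Slide the template over every position where it fits entirely on the screen.
        for (int top = 0; top + template.length <= screen.length; top++) {
            for (int left = 0; left + template[0].length <= screen[0].length; left++) {
                double score = similarity(screen, template, top, left);
                if (score > bestScore) {
                    bestScore = score;
                    best = new int[] {top, left};
                }
            }
        }
        return bestScore >= minSimilarity ? best : null;
    }
}
```

With a strict threshold such as 0.99, a button whose pixels are slightly darker than in the stored screenshot is no longer found, while a looser threshold such as 0.9 still finds it but also accepts more false positives. In Sikuli itself the corresponding knob is the similarity setting of a Pattern.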

Generally, Sikuli isn’t the best solution for large test automation projects due to its limitations, although I’ve seen people implement large frameworks with it, using the Page Object design pattern and so on. Where this tool performs surprisingly well is in atypical situations and in small, quick automation tasks.

Let’s have a look at two test cases I created with Sikuli. The first one combines Selenium and Sikuli in Java. The other one is implemented solely in the Sikuli IDE.

Example: Sikuli + Selenium WebDriver + TestNG + Java

The test steps to automate are as follows:

  • Go to "https://code.google.com/p/altdrag"
  • Click "Download installer"
  • Click the "Save File" button
  • Click the "Save" button in the next window
  • Press Win+D to minimize all windows
  • Verify that the downloaded file is on the desktop
  • Delete the file

Step 1: Download and install Sikuli
http://www.sikuli.org
Step 2: Include Sikuli in your project
Add the "sikuli-java.jar" library to your project in your IDE
Step 3: Create screenshots
Picture2 Picture3 Picture4 Picture5
Step 4: Write the script:

package sikuli;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.sikuli.script.FindFailed;
import org.sikuli.script.Key;
import org.sikuli.script.KeyModifier;
import org.sikuli.script.Screen;
import org.testng.annotations.Test;

public class SikuliExample {

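	@Test
	public void fileDownload() throws FindFailed {
		// WebDriver: open the page and start the download
		WebDriver driver = new FirefoxDriver();
		try {
			driver.get("https://code.google.com/p/altdrag");
			driver.findElement(By.linkText("Download installer"))
					.click();

			// Sikuli: handle the native dialogs that Selenium cannot reach
			Screen screen = new Screen();
			screen.click("images/saveFile.PNG");
			screen.click("images/save.PNG");
			screen.type("d", KeyModifier.WIN); // Win+D minimizes all windows
			screen.wait("images/icon.PNG"); // throws FindFailed if the file icon never shows up
			screen.click("images/icon.PNG");
			screen.type(Key.DELETE);
			screen.click("images/yes.PNG"); // confirm the deletion
		} finally {
			// End the browser session even if a Sikuli step fails
			driver.quit();
		}
	}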
}

The test above uses Selenium WebDriver to start the Firefox browser, open the desired page and click the download link. Then Sikuli handles the Windows UI, which is out of range for Selenium. It saves the file, verifies that it exists on the desktop and then deletes it.
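
The verify-and-delete part does not have to go through image matching. As an alternative, here is a minimal sketch that checks the file system directly with plain Java. The class and method names are hypothetical, and both the directory and the file name are parameters because they depend on your machine and on the download.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** File-system check as an alternative to finding the desktop icon by image. */
class DownloadCheckSketch {

    /**
     * Returns true if the file existed in the directory and was deleted,
     * false if it was not there.
     */
    static boolean verifyAndDelete(Path directory, String fileName) {
        Path file = directory.resolve(fileName);
        if (!Files.isRegularFile(file)) {
            return false;
        }
        try {
            Files.delete(file);
        } catch (IOException e) {
            // Surface the failure without forcing callers to handle a checked exception
            throw new UncheckedIOException(e);
        }
        return true;
    }
}
```

In a test you would assert on the return value, passing the desktop directory (often the "Desktop" folder under the user’s home directory, but that is an assumption) and the name of the downloaded file. This variant does not care about icon appearance, but it also no longer exercises the desktop UI, so which one you want depends on what the test is meant to prove.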

Here’s a demo of what the test execution looks like:

Example: Sikuli IDE

Sikuli provides its own IDE:

Picture6

It’s quite an intuitive tool. All the basic commands are listed in the left panel: click(), hover(), type() etc. As you can see in the screenshot above, the test script is written as a mixture of code and screenshot thumbnails, which makes it perfectly understandable even for people who don’t know any programming language. Scripts are written in Jython, so you can extend them as you wish with any Python programming constructs and Java libraries.
There are also a few handy tools for creating screenshots and regions. Another helpful tool is the matching preview window:

Picture7

It lets you see, in real time, the result of the image recognition algorithm for a given screenshot against the content currently displayed on the screen.

OK, now we will automate the following test flow in the Sikuli IDE:

  • Run the script on Windows
  • Switch to Ubuntu on VMware
  • Open Firefox
  • Navigate to Rio de Janeiro on Google Maps
  • Collapse side panel
  • Enable Google Street View on Astoria Palace
  • Look around Copacabana until you find a girl on a bicycle
  • Zoom in on the girl

As you can see, these steps would be a pain in the neck to implement in other test automation tools. We have to handle the Windows UI, then enter a virtual machine, navigate to a web page and then interact with dynamically generated web content.
In Sikuli we can do it with a simple script like this:

Picture8

The code is self-explanatory so let’s just see how it works:

Bottom line

I hope this post gives you an idea of what Sikuli is and when it could be useful. It is certainly not a tool you could base all your test automation on, but it’s worth remembering, as one day it could save you a lot of work on some simple tasks.