Integrating OCR with Appium, Part 1 - Dilato Innovative Technology (Beijing) Limited

One of the unfortunate realities of mobile automation is that not every UI element is automatable in practice. This could be because the element is built from a custom class with no accessibility or automation support. It could be because there’s no uniquely identifying locator strategy and selector that can be applied to the element. Or imagine a 2D or 3D game with no traditional UI controls at all, just pixels painted by a rendering engine.

Historically, Appium hasn’t tried to support these use cases. It relies on its automation backends, XCUITest and UiAutomator2, to locate elements, so if they couldn’t find an element, Appium couldn’t find it either.
The Appium team finally decided to support a small set of visual detection features, which are available as of Appium 1.9.0.

In this article, we’ll take a look at the most common use case for these features, namely image element finding. In Part 2 of this series, we’ll look at other general OCR libraries that achieve image recognition.

Use the find-by-image strategy to locate UI components:
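
A minimal sketch of such a lookup in Python follows. It assumes driver is an already-connected Appium session object exposing the usual find_element(by, value) method, and it passes Appium's image strategy by its raw name, "-image", so the snippet needs no client import. The helper name tap_by_image is hypothetical.

```python
# Raw name of Appium's image locator strategy
# (the Python client exposes the same string as AppiumBy.IMAGE).
IMAGE_STRATEGY = "-image"


def tap_by_image(driver, reference_image_b64):
    """Find the screen region matching the reference image and tap it.

    `driver` is assumed to be a connected Appium session;
    `reference_image_b64` is the Base64-encoded reference image.
    """
    element = driver.find_element(IMAGE_STRATEGY, reference_image_b64)
    element.click()
    return element
```

The returned object behaves like an ordinary element for basic actions such as clicking, which is what keeps the rest of the test code unchanged.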

Once we have an image element, we can click it just like any other element. Of course, for this to work we need a Base64-encoded version of our image file:
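
One way to produce that string, sketched with the Python standard library only. The function name encode_image_b64 and the file name in the usage note are illustrative assumptions, not part of Appium.

```python
import base64
from pathlib import Path


def encode_image_b64(image_path):
    """Read an image file and return its bytes as a Base64 ASCII string."""
    raw_bytes = Path(image_path).read_bytes()
    return base64.b64encode(raw_bytes).decode("ascii")
```

For example, encode_image_b64("reference.png") would return the string to hand to the image locator, where reference.png is a hypothetical cropped screenshot of the control to find.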

One great thing is that finding elements by image supports both implicit and explicit wait strategies, so your tests can robustly wait until your reference image matches something on the screen:
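
The explicit-wait idea can be sketched as a plain polling loop. In a real suite you would more likely use Selenium's WebDriverWait with an expected condition. The version below is a standard-library stand-in that only assumes driver.find_element raises an exception while no match is on screen. The name wait_for_image and its default timeout values are hypothetical.

```python
import time


def wait_for_image(driver, reference_image_b64, timeout=10.0, poll_interval=0.5):
    """Retry the image lookup until it succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # "-image" is the raw name of Appium's image locator strategy.
            return driver.find_element("-image", reference_image_b64)
        except Exception as error:  # a real client raises NoSuchElementException
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"no image match within {timeout} seconds"
                ) from error
        time.sleep(poll_interval)
```

Polling like this tolerates animations and slow screen transitions, because the lookup is simply retried until the reference image appears.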

Here’s the relevant code
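
As a rough end-to-end sketch of the flow described in this article, the function below sets an implicit wait, encodes a reference image, finds the matching element, and clicks it. It assumes driver is a connected Appium session with the standard implicitly_wait and find_element methods. The function name tap_reference_image, the image path, and the 10-second default are illustrative assumptions.

```python
import base64
from pathlib import Path


def tap_reference_image(driver, image_path, implicit_wait_seconds=10):
    """Encode the reference image, find its match on screen, and tap it."""
    # With an implicit wait set, the lookup keeps retrying for this long.
    driver.implicitly_wait(implicit_wait_seconds)

    # Appium expects the reference image as a Base64 string.
    image_b64 = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")

    # "-image" is the raw name of Appium's image locator strategy.
    driver.find_element("-image", image_b64).click()
    return image_b64
```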

About the Author: Dilato