When building robust automation frameworks, we often face the challenge of testing both web-based and desktop interfaces. Rather than writing completely separate logic for each platform, we can apply the Abstraction Pattern to unify our automation efforts under a common interface. In this post, we'll explore how to design a flexible, scalable solution that supports both Selenium-based web automation and PyAutoGUI-powered desktop automation — allowing our test logic to work seamlessly across platforms.
One of the most powerful design choices behind such a framework is the abstraction pattern. It lets us define a consistent interface that different implementations can follow, no matter the underlying technology (Selenium for web, PyAutoGUI for desktop, and so on).
At the core of the abstraction is the UIAutomationInterface, an abstract base class that outlines all the actions our automation system must support, such as click, type_text, find_element, and take_screenshot.
Here's a simplified version:
from abc import ABC, abstractmethod

class UIAutomationInterface(ABC):
    @abstractmethod
    def click(self, element_identifier: str) -> bool: pass

    @abstractmethod
    def type_text(self, element_identifier: str, text: str) -> bool: pass

    @abstractmethod
    def find_element(self, element_identifier: str): pass

    @abstractmethod
    def get_text(self, element_identifier: str) -> str: pass

    @abstractmethod
    def wait_for_element(self, element_identifier: str, timeout: int = 10) -> bool: pass

    @abstractmethod
    def take_screenshot(self, filename: str) -> bool: pass

    @abstractmethod
    def close(self) -> None: pass
This interface defines a contract that any automation class must fulfill — whether it's interacting with a browser or the desktop UI.
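To make the contract concrete, here's a minimal sketch of what platform-agnostic test logic could look like. The login_flow function and the identifier values are illustrative, not part of the framework; the only requirement is that the automation argument implements UIAutomationInterface.

def login_flow(automation: UIAutomationInterface, username_field: str,
               password_field: str, login_button: str) -> bool:
    # The test talks only to the abstract interface, never to Selenium or PyAutoGUI directly.
    if not automation.wait_for_element(username_field, timeout=15):
        return False
    automation.type_text(username_field, "demo_user")
    automation.type_text(password_field, "demo_password")
    automation.click(login_button)
    # Capture the resulting state for debugging.
    return automation.take_screenshot("login_attempt.png")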
By decoupling our test logic from the implementation details, we gain several important benefits: the same test flow runs on every supported platform, implementations can be swapped through configuration instead of code changes, and new platforms can be added later without touching existing tests.
With the interface in place, the next step is to implement the UIAutomationInterface for web testing. This is where Selenium WebDriver comes into play.
The WebAutomation class encapsulates browser-based actions like clicking, typing, and navigation — while abstracting away the WebDriver-specific complexities from our test logic.
The constructor supports launching Chrome or Firefox, with optional headless mode for CI pipelines or performance runs:
if browser.lower() == 'chrome':
    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument('--headless')
    self.driver = webdriver.Chrome(options=options)
This makes the framework adaptable across multiple environments and testing needs.
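For reference, a fuller constructor might look like the sketch below. Only the constructor is shown, and the default_timeout parameter and the Firefox branch are assumptions based on the prose rather than verbatim framework code.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

class WebAutomation(UIAutomationInterface):
    def __init__(self, browser: str = 'chrome', headless: bool = False, default_timeout: int = 10):
        if browser.lower() == 'chrome':
            options = webdriver.ChromeOptions()
            if headless:
                options.add_argument('--headless')
            self.driver = webdriver.Chrome(options=options)
        elif browser.lower() == 'firefox':
            options = webdriver.FirefoxOptions()
            if headless:
                options.add_argument('--headless')
            self.driver = webdriver.Firefox(options=options)
        else:
            raise ValueError(f"Unsupported browser: {browser}")
        # Shared explicit wait reused by click(), type_text(), and friends.
        self.wait = WebDriverWait(self.driver, default_timeout)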
Rather than hard-coding Selenium's By strategies everywhere, this implementation uses a clean prefix-based convention:
def _parse_element_identifier(self, element_identifier: str) -> Tuple[str, str]:
    if element_identifier.startswith('id:'):
        return By.ID, element_identifier[3:]
    elif element_identifier.startswith('xpath:'):
        return By.XPATH, element_identifier[6:]
    # More strategies...
This allows us to define element locators declaratively in a config file:
"username_field": {
"web": "id:username"
}
The benefit? Element definitions are centralized and flexible — easy to override per platform without touching the test code.
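For completeness, a fuller version of the parser could cover the remaining By strategies. The css:, name:, and class: prefixes below are illustrative choices; the actual implementation may support a different set.

from typing import Tuple
from selenium.webdriver.common.by import By

def _parse_element_identifier(self, element_identifier: str) -> Tuple[str, str]:
    # Map each prefix to its Selenium locator strategy.
    prefix_map = {
        'id:': By.ID,
        'xpath:': By.XPATH,
        'css:': By.CSS_SELECTOR,
        'name:': By.NAME,
        'class:': By.CLASS_NAME,
    }
    for prefix, by in prefix_map.items():
        if element_identifier.startswith(prefix):
            return by, element_identifier[len(prefix):]
    # Fall back to treating the raw string as an ID.
    return By.ID, element_identifier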
Actions like click() and type_text() are implemented using standard Selenium methods — but wrapped with WebDriverWait to improve stability:
def click(self, element_identifier: str) -> bool:
    by, value = self._parse_element_identifier(element_identifier)
    element = self.wait.until(EC.element_to_be_clickable((by, value)))
    element.click()
    return True
This ensures tests wait for elements to be ready before interacting, reducing flakiness caused by timing issues or slow-loading UIs.
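type_text() follows the same wait-then-act pattern. Here's a minimal sketch; clearing the field before typing is an assumption about the desired behavior, not something the post specifies.

from selenium.webdriver.support import expected_conditions as EC

def type_text(self, element_identifier: str, text: str) -> bool:
    by, value = self._parse_element_identifier(element_identifier)
    # Wait until the field is visible before interacting with it.
    element = self.wait.until(EC.visibility_of_element_located((by, value)))
    element.clear()
    element.send_keys(text)
    return True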
The take_screenshot() method uses driver.save_screenshot() to preserve the browser state — useful for debugging failures:
def take_screenshot(self, filename: str) -> bool:
    return self.driver.save_screenshot(filename)
While Selenium is robust and widely supported, it comes with trade-offs: it needs browser drivers installed and kept in sync with the browser, it runs slower than API-level checks, and it is sensitive to timing and DOM changes. In exchange, you get mature tooling, rich locator strategies, and broad cross-browser support.

To support non-web platforms, the framework also includes a DesktopAutomation class that handles interactions with desktop applications using PyAutoGUI.
PyAutoGUI enables automation by simulating keyboard and mouse events — including clicks, typing, and screenshots — based on pixel recognition.
Unlike Selenium, PyAutoGUI doesn't use DOM locators. Instead, it relies on image recognition and coordinates. Here's how the click() method interprets identifiers:
def click(self, element_identifier: str) -> bool:
    if element_identifier.startswith('text:'):
        # Look for a pre-captured image of the text label, e.g. text_login.png
        location = pyautogui.locateOnScreen(f"text_{element_identifier[5:]}.png")
    elif ',' in element_identifier:
        # Raw screen coordinates such as "450,300"
        x, y = map(int, element_identifier.split(','))
        pyautogui.click(x, y)
        return True
    else:
        # Treat the identifier as an image file to match on screen
        location = pyautogui.locateOnScreen(element_identifier)
    if location:
        pyautogui.click(pyautogui.center(location))
        return True
    return False
This allows test configuration to define desktop elements in flexible ways:
An element can be referenced with a text: prefix (matched against a pre-captured image of the label), with raw x,y screen coordinates, or with the path to a reference image. Each strategy supports slightly different use cases, making the approach versatile for testing custom GUIs.
To enter text in a desktop field, the framework clicks on the element first, then types using PyAutoGUI:
def type_text(self, element_identifier: str, text: str) -> bool:
    if self.click(element_identifier):
        pyautogui.typewrite(text)
        return True
    return False
This simulates real user input and works even for native input fields outside the browser.
The wait_for_element() method continually polls for a match using image-based detection:
def wait_for_element(self, element_identifier: str, timeout: int = 10) -> bool:
    start_time = time.time()
    while time.time() - start_time < timeout:
        if self.find_element(element_identifier):
            return True
        time.sleep(0.5)
    return False
This helps reduce test flakiness in slower or animation-heavy desktop environments.
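For completeness, find_element() on the desktop side can interpret identifiers the same way click() does. The confidence argument below requires OpenCV to be installed and is an optional tweak, not something the framework requires; the text_<name>.png naming convention is likewise an assumption.

def find_element(self, element_identifier: str):
    if element_identifier.startswith('text:'):
        # Match a pre-captured image of the text label.
        return pyautogui.locateOnScreen(f"text_{element_identifier[5:]}.png")
    if ',' in element_identifier:
        # Coordinates are always considered "found"; return them as a point.
        x, y = map(int, element_identifier.split(','))
        return (x, y)
    # Otherwise treat the identifier as an image file to locate on screen.
    return pyautogui.locateOnScreen(element_identifier, confidence=0.9)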
While PyAutoGUI excels at taking screenshots, directly extracting text from those screenshots presents a challenge. This process typically relies on Optical Character Recognition (OCR), a capability not built into PyAutoGUI.
I encourage you to implement text extraction as an exercise by integrating an OCR solution. The current get_text method simply flags the gap:
def get_text(self, element_identifier: str) -> str:
    print("Warning: Text extraction from desktop requires OCR implementation")
    return ""
If text validation is a critical part of your automation, I highly recommend you explore and integrate tools like Tesseract OCR or similar solutions into your workflow. Taking the time to set this up will significantly enhance the capabilities of your automation scripts.
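As one possible direction, here's a sketch of get_text() backed by Tesseract through the pytesseract package. It assumes the Tesseract binary is installed and that find_element() returns a screen region (the Box produced by locateOnScreen); both are assumptions on top of the framework as described.

import pyautogui
import pytesseract  # requires the Tesseract OCR binary to be installed separately

def get_text(self, element_identifier: str) -> str:
    location = self.find_element(element_identifier)
    if not location:
        return ""
    # Grab only the matched region and run OCR on it.
    region_image = pyautogui.screenshot(region=(location.left, location.top,
                                                location.width, location.height))
    return pytesseract.image_to_string(region_image).strip()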
While powerful, desktop automation comes with caveats: it depends on screen resolution and scaling, it breaks when the UI's visual appearance changes, and it needs a live display rather than a headless environment. In return, it works with virtually any application regardless of the technology it is built with, and it drives the UI the way a real user would.

One of the key strengths of this framework is the ability to run the same test logic across different platforms, web or desktop, with no changes to the test flow. This is made possible by leveraging abstraction, configuration management, and the factory pattern.
To run tests for either platform, we simply specify the platform in the automation_config.json file:
{
    "platform": "web"  // or "desktop"
}
Then, execute our test suite:
test_suite = CrossPlatformTestSuite('automation_config.json')
test_suite.run_test_suite()
The same test case definitions are reused regardless of whether we're testing a browser app or a desktop UI. This is the core benefit of designing with platform-agnostic logic.
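The post focuses on the building blocks rather than the suite internals, but conceptually the suite can stay very small. The sketch below is illustrative: the test steps and the AutomationFactory.create(config) call are assumptions about how the pieces described next fit together.

class CrossPlatformTestSuite:
    def __init__(self, config_path: str):
        self.config = AutomationConfig(config_path)
        # The factory hides whether we get WebAutomation or DesktopAutomation.
        self.automation = AutomationFactory.create(self.config)

    def run_test_suite(self) -> None:
        try:
            # Identifiers are resolved per platform from the config file.
            username = self.config.get_element_identifier('username_field')
            login = self.config.get_element_identifier('login_button')
            self.automation.type_text(username, 'demo_user')
            self.automation.click(login)
            self.automation.take_screenshot('login_step.png')
        finally:
            self.automation.close()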
The AutomationConfig class loads the configuration from a JSON file and provides platform-specific settings and element identifiers:
def get_element_identifier(self, element_name: str) -> str:
    platform = self.config.get('platform', 'web')
    elements = self.config.get('elements', {})
    return elements[element_name][platform]
Each element in our test — like a button or input field — maps to different locators per platform:
"elements": {
"login_button": {
"web": "id:login-btn",
"desktop": "login_button.png"
}
}
This allows tests to remain the same while adapting behavior based on the platform.
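Putting those pieces together, a minimal AutomationConfig can be as small as the sketch below; only get_element_identifier appears in the post, so the constructor shown here is an assumption.

import json

class AutomationConfig:
    def __init__(self, config_path: str):
        # Load the JSON file once; callers query it through helper methods.
        with open(config_path, 'r') as f:
            self.config = json.load(f)

    def get_element_identifier(self, element_name: str) -> str:
        platform = self.config.get('platform', 'web')
        elements = self.config.get('elements', {})
        return elements[element_name][platform]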
The AutomationFactory takes care of instantiating the correct automation class (WebAutomation or DesktopAutomation) depending on the platform set in the config:
if platform.lower() == 'web':
    return WebAutomation(...)
elif platform.lower() == 'desktop':
    return DesktopAutomation()
The test suite doesn't need to know the details — it just works with the abstract UIAutomationInterface.
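A complete factory stays very small. In the sketch below, the create(config) signature and the 'web' settings block in the config are assumptions that line up with the earlier snippets.

class AutomationFactory:
    @staticmethod
    def create(config: AutomationConfig) -> UIAutomationInterface:
        platform = config.config.get('platform', 'web')
        if platform.lower() == 'web':
            web_settings = config.config.get('web', {})
            return WebAutomation(browser=web_settings.get('browser', 'chrome'),
                                 headless=web_settings.get('headless', False))
        elif platform.lower() == 'desktop':
            return DesktopAutomation()
        raise ValueError(f"Unknown platform: {platform}")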
Benefits of This Design: switching platforms is a one-line config change, the test suite never touches platform-specific APIs, and element definitions stay centralized and easy to override per platform.
While our framework already supports clean abstraction and cross-platform execution, real-world QA often throws curveballs that require more advanced handling. Here are some key enhancements and techniques worth considering as we scale our automation efforts.
Many modern applications aren't purely web or desktop; they're hybrid apps. For example, a desktop application may launch a web browser as part of its workflow, or a native shell may embed web content, as in Electron-style apps.
To handle this, we can enhance the factory and interface logic to combine multiple automation engines within the same test run.
Example strategy: use a composite automation interface that delegates actions to the appropriate engine:
class HybridAutomation(UIAutomationInterface):
    def __init__(self):
        self.web = WebAutomation(...)
        self.desktop = DesktopAutomation()

    def click(self, element_id):
        if element_id.startswith("web:"):
            return self.web.click(element_id[4:])
        elif element_id.startswith("desktop:"):
            return self.desktop.click(element_id[8:])
Configure hybrid elements explicitly in automation_config.json:
"login_button": {
"hybrid": "web:id:login-btn",
"desktop": "login_button.png"
}
Good screenshot practices can greatly aid debugging — especially in CI environments or when reviewing test failures from remote teams.
Suggested Naming Pattern:
<test_name>_<step>_<timestamp>.png
Example:
login_failure_20250614_123045.png
In our take_screenshot() method, we could automate this:
def take_screenshot(self, test_name: str, step: str) -> str:
    timestamp = time.strftime("%Y%m%d_%H%M%S")
    filename = f"{test_name}_{step}_{timestamp}.png"
    pyautogui.screenshot().save(filename)
    return filename
This structure helps correlate screenshots to logs and makes archiving or auto-deleting based on age easier.
Our UIAutomationInterface is just a starting point. As our test coverage expands, we might want to support additional interactions like drag-and-drop, hover, right-click, scrolling, or OCR-based element detection; one such extension is sketched below.
As our automation grows, so should our tooling. Abstraction doesn't mean limitation — it's a foundation for extensibility.
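As an example of what such an extension could look like, here's a sketch that adds a hover action to the contract and to both implementations. The method name and the Selenium/PyAutoGUI calls shown are one possible mapping, not the framework's official API.

from selenium.webdriver import ActionChains

class UIAutomationInterface(ABC):
    # ...existing abstract methods...
    @abstractmethod
    def hover(self, element_identifier: str) -> bool: pass

class WebAutomation(UIAutomationInterface):
    # ...existing methods...
    def hover(self, element_identifier: str) -> bool:
        by, value = self._parse_element_identifier(element_identifier)
        element = self.wait.until(EC.presence_of_element_located((by, value)))
        ActionChains(self.driver).move_to_element(element).perform()
        return True

class DesktopAutomation(UIAutomationInterface):
    # ...existing methods...
    def hover(self, element_identifier: str) -> bool:
        location = pyautogui.locateOnScreen(element_identifier)
        if location:
            pyautogui.moveTo(pyautogui.center(location))
            return True
        return False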
Building a cross-platform automation framework using abstraction, configuration, and the factory pattern provides numerous advantages. It allows the same test logic to be reused across web and desktop environments without code duplication, making the test suite more maintainable and scalable. By separating configuration from execution, the platform can be switched simply by modifying a config file, removing the need to rewrite test cases. This design also simplifies debugging through consistent screenshot naming and logging mechanisms, making it easier to identify issues across platforms.
This pattern is particularly beneficial when our application spans multiple interfaces, such as a desktop application that opens a web browser or when our testing needs to evolve rapidly. It's also ideal for QA engineers who want to focus on test scenarios rather than the details of automation implementation, and for teams looking to streamline their test strategy across different technologies.
Looking ahead, this framework can be extended to support mobile platforms using tools like Appium or even API-level testing by integrating HTTP clients and validation logic. With minimal effort, the interface can also support advanced interactions like drag-and-drop or OCR-based element detection.
The full code implementation discussed in this blog post is available on our GitHub page, where you're welcome to experiment, fork, and adapt it to your own projects. Whether you're testing a desktop, web, or eventually a mobile app, this approach lays a strong foundation for flexible and unified test automation.