When building robust automation frameworks, we often face the challenge of testing both web-based and desktop interfaces. Rather than writing completely separate logic for each platform, we can apply the Abstraction Pattern to unify our automation efforts under a common interface. In this post, we'll explore how to design a flexible, scalable solution that supports both Selenium-based web automation and PyAutoGUI-powered desktop automation — allowing our test logic to work seamlessly across platforms.
One of the most powerful design choices behind such a framework is the abstraction pattern. It lets us define a consistent interface that different implementations can follow, no matter the underlying technology (Selenium for web, PyAutoGUI for desktop, and so on).
At the core of the abstraction is the UIAutomationInterface, an abstract base class that outlines all the actions our automation system must support, such as click, type_text, find_element, and take_screenshot.
Here's a simplified version:
from abc import ABC, abstractmethod

class UIAutomationInterface(ABC):
    @abstractmethod
    def click(self, element_identifier: str) -> bool: pass

    @abstractmethod
    def type_text(self, element_identifier: str, text: str) -> bool: pass

    @abstractmethod
    def find_element(self, element_identifier: str): pass

    @abstractmethod
    def get_text(self, element_identifier: str) -> str: pass

    @abstractmethod
    def wait_for_element(self, element_identifier: str, timeout: int = 10) -> bool: pass

    @abstractmethod
    def take_screenshot(self, filename: str) -> bool: pass

    @abstractmethod
    def close(self) -> None: pass
This interface defines a contract that any automation class must fulfill — whether it's interacting with a browser or the desktop UI.
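To make the contract concrete, here's a minimal sketch of what platform-agnostic test logic could look like. The login_flow function and the identifier values are illustrative, not part of the framework; the only requirement is that the automation argument implements UIAutomationInterface.

def login_flow(automation: UIAutomationInterface, username_field: str,
               password_field: str, login_button: str) -> bool:
    # The test talks only to the abstract interface, never to Selenium or PyAutoGUI directly.
    if not automation.wait_for_element(username_field, timeout=15):
        return False
    automation.type_text(username_field, "demo_user")
    automation.type_text(password_field, "demo_password")
    automation.click(login_button)
    # Capture the resulting state for debugging.
    return automation.take_screenshot("login_attempt.png")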
By decoupling our test logic from the implementation details, we gain several important benefits: the same test flow runs on every supported platform, implementations can be swapped through configuration instead of code changes, and new platforms can be added later without touching existing tests.
With the interface in place, the next step is to implement the UIAutomationInterface for web testing. This is where Selenium WebDriver comes into play.
The WebAutomation class encapsulates browser-based actions like clicking, typing, and navigation — while abstracting away the WebDriver-specific complexities from our test logic.
The constructor supports launching Chrome or Firefox, with optional headless mode for CI pipelines or performance runs:
if browser.lower() == 'chrome':
    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument('--headless')
    self.driver = webdriver.Chrome(options=options)
This makes the framework adaptable across multiple environments and testing needs.
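For reference, a fuller constructor might look like the sketch below. Only the constructor is shown, and the default_timeout parameter and the Firefox branch are assumptions based on the prose rather than verbatim framework code.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

class WebAutomation(UIAutomationInterface):
    def __init__(self, browser: str = 'chrome', headless: bool = False, default_timeout: int = 10):
        if browser.lower() == 'chrome':
            options = webdriver.ChromeOptions()
            if headless:
                options.add_argument('--headless')
            self.driver = webdriver.Chrome(options=options)
        elif browser.lower() == 'firefox':
            options = webdriver.FirefoxOptions()
            if headless:
                options.add_argument('--headless')
            self.driver = webdriver.Firefox(options=options)
        else:
            raise ValueError(f"Unsupported browser: {browser}")
        # Shared explicit wait reused by click(), type_text(), and friends.
        self.wait = WebDriverWait(self.driver, default_timeout)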
Rather than hard-coding Selenium's By strategies everywhere, this implementation uses a clean prefix-based convention:
def _parse_element_identifier(self, element_identifier: str) -> Tuple[str, str]:
    if element_identifier.startswith('id:'):
        return By.ID, element_identifier[3:]
    elif element_identifier.startswith('xpath:'):
        return By.XPATH, element_identifier[6:]
    # More strategies...
This allows us to define element locators declaratively in a config file:
"username_field": {
"web": "id:username"
}
The benefit? Element definitions are centralized and flexible — easy to override per platform without touching the test code.
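For completeness, a fuller version of the parser could cover the remaining By strategies. The css:, name:, and class: prefixes below are illustrative choices; the actual implementation may support a different set.

from typing import Tuple
from selenium.webdriver.common.by import By

def _parse_element_identifier(self, element_identifier: str) -> Tuple[str, str]:
    # Map each prefix to its Selenium locator strategy.
    prefix_map = {
        'id:': By.ID,
        'xpath:': By.XPATH,
        'css:': By.CSS_SELECTOR,
        'name:': By.NAME,
        'class:': By.CLASS_NAME,
    }
    for prefix, by in prefix_map.items():
        if element_identifier.startswith(prefix):
            return by, element_identifier[len(prefix):]
    # Fall back to treating the raw string as an ID.
    return By.ID, element_identifier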
Actions like click() and type_text() are implemented using standard Selenium methods — but wrapped with WebDriverWait to improve stability:
def click(self, element_identifier: str) -> bool:
    by, value = self._parse_element_identifier(element_identifier)
    element = self.wait.until(EC.element_to_be_clickable((by, value)))
    element.click()
    return True
This ensures tests wait for elements to be ready before interacting, reducing flakiness caused by timing issues or slow-loading UIs.
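type_text() follows the same wait-then-act pattern. Here's a minimal sketch; clearing the field before typing is an assumption about the desired behavior, not something the post specifies.

from selenium.webdriver.support import expected_conditions as EC

def type_text(self, element_identifier: str, text: str) -> bool:
    by, value = self._parse_element_identifier(element_identifier)
    # Wait until the field is visible before interacting with it.
    element = self.wait.until(EC.visibility_of_element_located((by, value)))
    element.clear()
    element.send_keys(text)
    return True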
The take_screenshot() method uses driver.save_screenshot() to preserve the browser state — useful for debugging failures:
def take_screenshot(self, filename: str) -> bool:
    return self.driver.save_screenshot(filename)
While Selenium is robust and widely supported, it comes with trade-offs: it needs browser drivers installed and kept in sync with the browser, it runs slower than API-level checks, and it is sensitive to timing and DOM changes. In exchange, you get mature tooling, rich locator strategies, and broad cross-browser support.

To support non-web platforms, the framework also includes a DesktopAutomation class that handles interactions with desktop applications using PyAutoGUI.
PyAutoGUI enables automation by simulating keyboard and mouse events — including clicks, typing, and screenshots — based on pixel recognition.
Unlike Selenium, PyAutoGUI doesn't use DOM locators. Instead, it relies on image recognition and coordinates. Here's how the click() method interprets identifiers:
def click(self, element_identifier: str) -> bool:
    if element_identifier.startswith('text:'):
        # Look for a pre-captured image of the text label, e.g. text_login.png
        location = pyautogui.locateOnScreen(f"text_{element_identifier[5:]}.png")
    elif ',' in element_identifier:
        # Raw screen coordinates such as "450,300"
        x, y = map(int, element_identifier.split(','))
        pyautogui.click(x, y)
        return True
    else:
        # Treat the identifier as an image file to match on screen
        location = pyautogui.locateOnScreen(element_identifier)
    if location:
        pyautogui.click(pyautogui.center(location))
        return True
    return False
This allows test configuration to define desktop elements in flexible ways:
An element can be referenced with a text: prefix (matched against a pre-captured image of the label), with raw x,y screen coordinates, or with the path to a reference image. Each strategy supports slightly different use cases, making the approach versatile for testing custom GUIs.
To enter text in a desktop field, the framework clicks on the element first, then types using PyAutoGUI:
def type_text(self, element_identifier: str, text: str) -> bool:
    if self.click(element_identifier):
        pyautogui.typewrite(text)
        return True
    return False
This simulates real user input and works even for native input fields outside the browser.
The wait_for_element() method continually polls for a match using image-based detection:
def wait_for_element(self, element_identifier: str, timeout: int = 10) -> bool:
    start_time = time.time()
    while time.time() - start_time < timeout:
        if self.find_element(element_identifier):
            return True
        time.sleep(0.5)
    return False
This helps reduce test flakiness in slower or animation-heavy desktop environments.
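For completeness, find_element() on the desktop side can interpret identifiers the same way click() does. The confidence argument below requires OpenCV to be installed and is an optional tweak, not something the framework requires; the text_<name>.png naming convention is likewise an assumption.

def find_element(self, element_identifier: str):
    if element_identifier.startswith('text:'):
        # Match a pre-captured image of the text label.
        return pyautogui.locateOnScreen(f"text_{element_identifier[5:]}.png")
    if ',' in element_identifier:
        # Coordinates are always considered "found"; return them as a point.
        x, y = map(int, element_identifier.split(','))
        return (x, y)
    # Otherwise treat the identifier as an image file to locate on screen.
    return pyautogui.locateOnScreen(element_identifier, confidence=0.9)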
While PyAutoGUI excels at taking screenshots, directly extracting text from those screenshots presents a challenge. This process typically relies on Optical Character Recognition (OCR), a capability not built into PyAutoGUI.
I encourage you to implement text extraction as an exercise by integrating an OCR solution. The current get_text method simply flags the gap:
def get_text(self, element_identifier: str) -> str:
    print("Warning: Text extraction from desktop requires OCR implementation")
    return ""
If text validation is a critical part of your automation, I highly recommend you explore and integrate tools like Tesseract OCR or similar solutions into your workflow. Taking the time to set this up will significantly enhance the capabilities of your automation scripts.
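As one possible direction, here's a sketch of get_text() backed by Tesseract through the pytesseract package. It assumes the Tesseract binary is installed and that find_element() returns a screen region (the Box produced by locateOnScreen); both are assumptions on top of the framework as described.

import pyautogui
import pytesseract  # requires the Tesseract OCR binary to be installed separately

def get_text(self, element_identifier: str) -> str:
    location = self.find_element(element_identifier)
    if not location:
        return ""
    # Grab only the matched region and run OCR on it.
    region_image = pyautogui.screenshot(region=(location.left, location.top,
                                                location.width, location.height))
    return pytesseract.image_to_string(region_image).strip()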
While powerful, desktop automation comes with caveats: it depends on screen resolution and scaling, it breaks when the UI's visual appearance changes, and it needs a live display rather than a headless environment. In return, it works with virtually any application regardless of the technology it is built with, and it drives the UI the way a real user would.

One of the key strengths of this framework is the ability to run the same test logic across different platforms, web or desktop, with no changes to the test flow. This is made possible by leveraging abstraction, configuration management, and the factory pattern.
To run tests for either platform, we simply specify the platform in the automation_config.json file:
{
    "platform": "web"  // or "desktop"
}
Then, execute our test suite:
test_suite = CrossPlatformTestSuite('automation_config.json')
test_suite.run_test_suite()
The same test case definitions are reused regardless of whether we're testing a browser app or a desktop UI. This is the core benefit of designing with platform-agnostic logic.
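The post focuses on the building blocks rather than the suite internals, but conceptually the suite can stay very small. The sketch below is illustrative: the test steps and the AutomationFactory.create(config) call are assumptions about how the pieces described next fit together.

class CrossPlatformTestSuite:
    def __init__(self, config_path: str):
        self.config = AutomationConfig(config_path)
        # The factory hides whether we get WebAutomation or DesktopAutomation.
        self.automation = AutomationFactory.create(self.config)

    def run_test_suite(self) -> None:
        try:
            # Identifiers are resolved per platform from the config file.
            username = self.config.get_element_identifier('username_field')
            login = self.config.get_element_identifier('login_button')
            self.automation.type_text(username, 'demo_user')
            self.automation.click(login)
            self.automation.take_screenshot('login_step.png')
        finally:
            self.automation.close()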
The AutomationConfig class loads the configuration from a JSON file and provides platform-specific settings and element identifiers:
def get_element_identifier(self, element_name: str) -> str:
    platform = self.config.get('platform', 'web')
    elements = self.config.get('elements', {})
    return elements[element_name][platform]
Each element in our test — like a button or input field — maps to different locators per platform:
"elements": {
"login_button": {
"web": "id:login-btn",
"desktop": "login_button.png"
}
}
This allows tests to remain the same while adapting behavior based on the platform.
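Putting those pieces together, a minimal AutomationConfig can be as small as the sketch below; only get_element_identifier appears in the post, so the constructor shown here is an assumption.

import json

class AutomationConfig:
    def __init__(self, config_path: str):
        # Load the JSON file once; callers query it through helper methods.
        with open(config_path, 'r') as f:
            self.config = json.load(f)

    def get_element_identifier(self, element_name: str) -> str:
        platform = self.config.get('platform', 'web')
        elements = self.config.get('elements', {})
        return elements[element_name][platform]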
The AutomationFactory takes care of instantiating the correct automation class (WebAutomation or DesktopAutomation) depending on the platform set in the config:
if platform.lower() == 'web':
    return WebAutomation(...)
elif platform.lower() == 'desktop':
    return DesktopAutomation()
The test suite doesn't need to know the details — it just works with the abstract UIAutomationInterface.
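A complete factory stays very small. In the sketch below, the create(config) signature and the 'web' settings block in the config are assumptions that line up with the earlier snippets.

class AutomationFactory:
    @staticmethod
    def create(config: AutomationConfig) -> UIAutomationInterface:
        platform = config.config.get('platform', 'web')
        if platform.lower() == 'web':
            web_settings = config.config.get('web', {})
            return WebAutomation(browser=web_settings.get('browser', 'chrome'),
                                 headless=web_settings.get('headless', False))
        elif platform.lower() == 'desktop':
            return DesktopAutomation()
        raise ValueError(f"Unknown platform: {platform}")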
Benefits of This Design: switching platforms is a one-line config change, the test suite never touches platform-specific APIs, and element definitions stay centralized and easy to override per platform.
While our framework already supports clean abstraction and cross-platform execution, real-world QA often throws curveballs that require more advanced handling. Here are some key enhancements and techniques worth considering as we scale our automation efforts.
Many modern applications aren't purely web or desktop; they're hybrid apps. For example, a desktop application may launch a web browser as part of its workflow, or a native shell may embed web content, as in Electron-style apps.
To handle this, we can enhance the factory and interface logic to combine multiple automation engines within the same test run.
Example strategy: use a composite automation interface that delegates actions to the appropriate engine:
class HybridAutomation(UIAutomationInterface):
    def __init__(self):
        self.web = WebAutomation(...)
        self.desktop = DesktopAutomation()

    def click(self, element_id):
        if element_id.startswith("web:"):
            return self.web.click(element_id[4:])
        elif element_id.startswith("desktop:"):
            return self.desktop.click(element_id[8:])
Configure hybrid elements explicitly in automation_config.json:
"login_button": {
"hybrid": "web:id:login-btn",
"desktop": "login_button.png"
}
Good screenshot practices can greatly aid debugging — especially in CI environments or when reviewing test failures from remote teams.
Suggested Naming Pattern:
<test_name>_<step>_<timestamp>.png
Example:
login_failure_20250614_123045.png
In our take_screenshot() method, we could automate this:
def take_screenshot(self, test_name: str, step: str) -> str:
    timestamp = time.strftime("%Y%m%d_%H%M%S")
    filename = f"{test_name}_{step}_{timestamp}.png"
    pyautogui.screenshot().save(filename)
    return filename
This structure helps correlate screenshots to logs and makes archiving or auto-deleting based on age easier.
Our UIAutomationInterface is just a starting point. As our test coverage expands, we might want to support additional interactions like drag-and-drop, hover, right-click, scrolling, or OCR-based element detection; one such extension is sketched below.
As our automation grows, so should our tooling. Abstraction doesn't mean limitation — it's a foundation for extensibility.
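As an example of what such an extension could look like, here's a sketch that adds a hover action to the contract and to both implementations. The method name and the Selenium/PyAutoGUI calls shown are one possible mapping, not the framework's official API.

from selenium.webdriver import ActionChains

class UIAutomationInterface(ABC):
    # ...existing abstract methods...
    @abstractmethod
    def hover(self, element_identifier: str) -> bool: pass

class WebAutomation(UIAutomationInterface):
    # ...existing methods...
    def hover(self, element_identifier: str) -> bool:
        by, value = self._parse_element_identifier(element_identifier)
        element = self.wait.until(EC.presence_of_element_located((by, value)))
        ActionChains(self.driver).move_to_element(element).perform()
        return True

class DesktopAutomation(UIAutomationInterface):
    # ...existing methods...
    def hover(self, element_identifier: str) -> bool:
        location = pyautogui.locateOnScreen(element_identifier)
        if location:
            pyautogui.moveTo(pyautogui.center(location))
            return True
        return False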
Building a cross-platform automation framework using abstraction, configuration, and the factory pattern provides numerous advantages. It allows the same test logic to be reused across web and desktop environments without code duplication, making the test suite more maintainable and scalable. By separating configuration from execution, the platform can be switched simply by modifying a config file, removing the need to rewrite test cases. This design also simplifies debugging through consistent screenshot naming and logging mechanisms, making it easier to identify issues across platforms.
This pattern is particularly beneficial when our application spans multiple interfaces, such as a desktop application that opens a web browser or when our testing needs to evolve rapidly. It's also ideal for QA engineers who want to focus on test scenarios rather than the details of automation implementation, and for teams looking to streamline their test strategy across different technologies.
Looking ahead, this framework can be extended to support mobile platforms using tools like Appium or even API-level testing by integrating HTTP clients and validation logic. With minimal effort, the interface can also support advanced interactions like drag-and-drop or OCR-based element detection.
The full code implementation discussed in this blog post is available on our GitHub page, where you're welcome to experiment, fork, and adapt it to your own projects. Whether you're testing a desktop, web, or eventually a mobile app, this approach lays a strong foundation for flexible and unified test automation.