
Detecting Duplicate XPath Locators with Puppeteer

Jul 7th 2024 | 16 min read | medium
Tools: puppeteer 22.12.1, JavaScript ES6, Node.js 20.15.0, acorn 8.12.0

Have you ever encountered this situation? You're writing a test script and need to target an element. You add an XPath locator (perhaps because there's no automation ID). Then you discover that another locator, under a different name, already targets the same element. As you update your code, you find yet another locator for that element! This redundancy quickly turns into a maintenance headache. In this blog post, we'll write a script that identifies all locators in our test scripts that target the same element.

Test Example

Before we dive into the technical details, let's look at a small example to understand the problem of duplicate XPath locators. Here, we have a Playwright page class that represents a simple web page. This class contains several locators, some of which point to the same element:

                        
const { expect } = require("@playwright/test");

class HomePage {
    constructor(page) {
        this.page = page;
    }
                            
    get firstPlantTypeSelector() {
        return this.page.locator(`//select[@name='plant-type']`);
    }
                            
    get secondPlantTypeSelector() {
        return this.page.locator(`//select[@id='plants']`);
    }
                            
    get thirdPlantTypeSelector() {
        return this.page.locator(`//div[@class='controls']/select[1]`);
    }
                            
    get firstGardenNotesTextArea() {
        return this.page.locator(`//textarea[@id='garden-notes-auto-id']`);
    }
                            
    get secondGardenNotesTextArea() {
        return this.page.locator(`//div[@class='main-container']/textarea`);
    }
                            
    // Additional locators...
                            
    async selectPlantType() {
        await this.firstPlantTypeSelector.click();
    }
                            
    async enterGardenNotes(notes) {
        await this.firstGardenNotesTextArea.fill(notes);
        await this.trackButton.click();
    }
                            
    async verifyTaskListValue() {
        await expect(this.thirdVegetableName).toContainText("Tomatoes");
    }
}
                            
module.exports = HomePage;
                      

In this example, firstPlantTypeSelector, secondPlantTypeSelector, and thirdPlantTypeSelector are different locators pointing to the same select element on the web page. Similarly, firstGardenNotesTextArea and secondGardenNotesTextArea point to the same textarea element. Identifying and managing such duplicates is essential to keep the test scripts clean and maintainable.

The Parser

The parser's main purpose is to read through the test script, locate the XPath locators, and extract them for further analysis. The specific implementation can differ based on how the locators are defined and used within our script. For instance, in JavaScript-based test scripts, XPath locators might be embedded within template literals or string literals, often within page object model classes.

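The getLocatorsFromFile function, exported from parser.js, reads the file at filePath and parses its content into an Abstract Syntax Tree (AST) using the Acorn parser. It relies on Node's built-in fs module and on the Parser class imported from the acorn package.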

                        
export function getLocatorsFromFile(filePath) {
    const code = fs.readFileSync(filePath, "utf-8");
    const ast = Parser.parse(code, { ecmaVersion: 2020 });
                      

An empty array named locators is initialized. This array will hold the locator objects that are extracted from the file.

                        
let locators = [];
                      

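The traverse function walks the AST recursively. It takes two arguments: the current node and parent. Despite its name, parent is not the node's direct parent but the enclosing getter method. It is passed down unchanged so that every locator found inside a getter can be labeled with that getter's name. If the node is an array, traverse visits each element. If it is an object, traverse checks for the patterns that identify locators.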

                        
function traverse(node, parent) {
    if (Array.isArray(node)) {
        node.forEach((child) => traverse(child, parent));
        return;
    }
                            
    if (node && typeof node === "object") {
        if (
            node.type === "CallExpression" &&
            node.callee.type === "MemberExpression"
        ) {
                      

Within traverse, the code checks whether the current node is a method call on this.page (for example, this.page.locator(...)). If the call's first argument is a template literal, the code joins the literal's static text parts (its quasis) into the XPath string. It then adds that string to the locators array, named after the enclosing getter.

                        
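            const pageObject = node.callee.object;
            // Match calls of the form this.page.<method>(...)
            if (
                pageObject.type === "MemberExpression" &&
                pageObject.object.type === "ThisExpression" &&
                pageObject.property.name === "page"
            ) {
                const argument = node.arguments[0];
                // Guard against calls without arguments, e.g. this.page.locator()
                if (argument && argument.type === "TemplateLiteral") {
                    // Name the locator after the enclosing getter
                    const functionName = parent && parent.key && parent.key.name ? parent.key.name : "";
                    locators.push({
                        name: functionName,
                        // Join the static text parts of the template literal
                        value: argument.quasis
                            .map((quasi) => quasi.value.cooked)
                            .join(""),
                    });
                }
            }
        }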
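The snippet above keeps only the static text of a template literal. A locator such as `//li[@data-id='${id}']` would therefore be recorded as `//li[@data-id='']`, which may match nothing or the wrong element. Below is a minimal sketch of a stricter extractor. It assumes the same ESTree node shapes that Acorn produces. The helper name extractStaticLocator is hypothetical. It also accepts plain string literals, and it returns null for locators that cannot be resolved without running the code, so those can be skipped or reported instead of compared.

```javascript
// Hypothetical helper: return the locator string for a Literal or a
// TemplateLiteral argument, or null when the value depends on runtime data.
function extractStaticLocator(argument) {
    if (!argument) {
        return null; // e.g. this.page.locator() called without arguments
    }
    if (argument.type === "Literal" && typeof argument.value === "string") {
        return argument.value;
    }
    if (argument.type === "TemplateLiteral") {
        // Template literals with ${...} expressions cannot be evaluated statically
        if (argument.expressions.length > 0) {
            return null;
        }
        return argument.quasis.map((quasi) => quasi.value.cooked).join("");
    }
    return null; // identifiers, string concatenations, function calls, ...
}

// Hand-built nodes shaped like Acorn's output, for illustration only
const staticTemplate = {
    type: "TemplateLiteral",
    expressions: [],
    quasis: [
        { type: "TemplateElement", value: { raw: "//select[@id='plants']", cooked: "//select[@id='plants']" }, tail: true },
    ],
};
const dynamicTemplate = {
    type: "TemplateLiteral",
    expressions: [{ type: "Identifier", name: "id" }],
    quasis: [
        { type: "TemplateElement", value: { raw: "//li[@data-id='", cooked: "//li[@data-id='" }, tail: false },
        { type: "TemplateElement", value: { raw: "']", cooked: "']" }, tail: true },
    ],
};

console.log(extractStaticLocator(staticTemplate));  // //select[@id='plants']
console.log(extractStaticLocator(dynamicTemplate)); // null
```

Inside traverse, a non-null result from this helper would take the place of the quasis join.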
                      

After checking for locators, the traverse function recursively processes all child nodes of the current node.

                        
        for (let key in node) {
            if (node[key] && typeof node[key] === "object") {
                traverse(node[key], parent);
            }
        }
    }
}
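As written, the check matches any method call on this.page. A getter that returned this.page.getByText(`Tomatoes`) would therefore have "Tomatoes" recorded as if it were an XPath. Below is a minimal sketch of a narrower predicate. It assumes XPath locators are always created with this.page.locator(...). The name isPageLocatorCall is hypothetical, and the node objects are hand-built to mirror Acorn's ESTree output.

```javascript
// Hypothetical predicate: true only for calls of the form this.page.locator(...)
function isPageLocatorCall(node) {
    if (node.type !== "CallExpression" || node.callee.type !== "MemberExpression") {
        return false;
    }
    const { object, property } = node.callee;
    return (
        !node.callee.computed &&
        property.type === "Identifier" &&
        property.name === "locator" &&
        object.type === "MemberExpression" &&
        object.object.type === "ThisExpression" &&
        object.property.type === "Identifier" &&
        object.property.name === "page"
    );
}

// Builds a node shaped like Acorn's output for this.page.<method>()
function pageCall(method) {
    return {
        type: "CallExpression",
        callee: {
            type: "MemberExpression",
            computed: false,
            object: {
                type: "MemberExpression",
                computed: false,
                object: { type: "ThisExpression" },
                property: { type: "Identifier", name: "page" },
            },
            property: { type: "Identifier", name: method },
        },
        arguments: [],
    };
}

console.log(isPageLocatorCall(pageCall("locator")));   // true
console.log(isPageLocatorCall(pageCall("getByText"))); // false
```

Inside traverse, this predicate could replace the nested type checks on node.callee and pageObject.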
                      

The function then iterates over the top-level statements of the AST, looking for class declarations. For each class, it passes the body of every getter method to traverse, along with the getter itself, so that each locator can be named after its getter.

                        
ast.body.forEach((node) => {
    if (node.type === "ClassDeclaration") {
        node.body.body.forEach((method) => {
            if (method.kind === "get") {
                traverse(method.value.body, method);
            }
        });
    }
});
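This loop only inspects top-level ClassDeclaration nodes. In ESTree, a class written as export class HomePage {...} or export default class HomePage {...} is wrapped in an ExportNamedDeclaration or ExportDefaultDeclaration node. The loop would skip such page objects. (Acorn also needs sourceType: "module" to parse export syntax at all.) Below is a minimal sketch that also unwraps those export forms. The name findPageClasses is hypothetical. The CommonJS form module.exports = class {...}, which produces a ClassExpression inside an assignment, is deliberately left out.

```javascript
// Hypothetical helper: collect class nodes from the top level of an ESTree
// Program, including classes wrapped in export declarations.
function findPageClasses(ast) {
    const classes = [];
    for (const node of ast.body) {
        if (node.type === "ClassDeclaration") {
            classes.push(node);
        } else if (
            (node.type === "ExportNamedDeclaration" || node.type === "ExportDefaultDeclaration") &&
            node.declaration &&
            (node.declaration.type === "ClassDeclaration" || node.declaration.type === "ClassExpression")
        ) {
            classes.push(node.declaration); // unwrap the exported class
        }
    }
    return classes;
}

// Hand-built Program with one plain and one exported class
const program = {
    type: "Program",
    body: [
        { type: "ClassDeclaration", id: { type: "Identifier", name: "HomePage" }, body: { type: "ClassBody", body: [] } },
        {
            type: "ExportNamedDeclaration",
            declaration: { type: "ClassDeclaration", id: { type: "Identifier", name: "CartPage" }, body: { type: "ClassBody", body: [] } },
            specifiers: [],
        },
    ],
};

console.log(findPageClasses(program).map((cls) => cls.id.name)); // [ 'HomePage', 'CartPage' ]
```

Each returned class can be fed into the same getter loop, for example findPageClasses(ast).forEach((cls) => cls.body.body.forEach(...)).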
                      

Finally, the function returns the array of locators that were found in the file.

                        
    return locators;
}
                      

Applying this code to the page class above produces output similar to the following:

                        
[
    {
        name: 'firstPlantTypeSelector',
        value: "//select[@name='plant-type']"
    },
    {
        name: 'secondPlantTypeSelector',
        value: "//select[@id='plants']"
    },
    {
        name: 'thirdPlantTypeSelector',
        value: "//div[@class='controls']/select[1]"
    },
    // remaining locators
]
                      

Detecting Duplicates with Puppeteer

Once we have extracted the XPath locators from our test script, the next step is to identify which of these locators point to the same element on the web page. We can achieve this using Puppeteer, a Node library that provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol.

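These steps run inside checkLocatorEquivalence(url, locators), the async function exported from locatorDetection.js, with puppeteer imported from the puppeteer package. First, we launch a new browser instance, open a new page, and navigate to the URL under test.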

                        
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
                      

Next, we iterate through the locators and resolve each one on the page. The first matching element's outer HTML becomes a key in a dictionary (elementHandles), which maps each key to the list of locators that reached that element. Locators written with different XPaths but resolving to the same element therefore end up in the same entry.

                        
const elementHandles = {};

for (let locator of locators) {
    const elementHandle = await page.$$(`xpath/.${locator.value}`);
    if (elementHandle.length > 0) {
        const element = elementHandle[0];
        const elementId = await page.evaluate((el) => el.outerHTML, element);
                            
        if (!elementHandles[elementId]) {
            elementHandles[elementId] = [];
        }
        elementHandles[elementId].push(locator);
    }
}
                      

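The page.$$() function returns all elements matching a selector. The xpath/ prefix tells Puppeteer to treat the rest of the string as an XPath expression. The leading . anchors that expression at the query root, which for page.$$() is the document, so //select[@id='plants'] is evaluated as .//select[@id='plants']. Only the first match of each locator is compared. The page.evaluate() function runs the given function in the context of the page and returns its result. Here it returns the element's outer HTML, which serves as the identifier. Note that this is a heuristic: two distinct elements with identical markup produce the same key and would be reported as duplicates.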
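The loop builds its selector as `xpath/.${locator.value}`. Prepending the dot works for locators that start with a slash. A locator that begins with a parenthesis, such as (//div[@class='controls']/select)[1], would instead become .(//div...), which is not valid XPath. Below is a small sketch of a helper that adds the dot only when the expression starts with /. The name toXPathSelector is hypothetical. The rule is an assumption chosen to keep the original behavior for the common case.

```javascript
// Hypothetical helper: build a Puppeteer "xpath/" selector from a raw locator.
// Paths starting with "/" get a leading "." (as in the original loop);
// anything else, e.g. "(//div)[1]", is passed through unchanged.
function toXPathSelector(xpath) {
    const trimmed = xpath.trim();
    return trimmed.startsWith("/") ? `xpath/.${trimmed}` : `xpath/${trimmed}`;
}

console.log(toXPathSelector("//select[@id='plants']"));
// xpath/.//select[@id='plants']
console.log(toXPathSelector("(//div[@class='controls']/select)[1]"));
// xpath/(//div[@class='controls']/select)[1]
```

In the loop, page.$$(toXPathSelector(locator.value)) would then replace the inline template string.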

The loop has already grouped locators by element. To report only the duplicates, we keep the dictionary entries that contain more than one locator.

                        
const result = Object.values(elementHandles).filter(
    (group) => group.length > 1
);
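For a stricter check than comparing outerHTML strings, locators can instead be grouped by DOM node identity, so that distinct elements with identical markup are not merged. The sketch below is written against two injected async functions so that it can run without a browser. The names groupBySameElement, resolve, and isSameElement are hypothetical. With Puppeteer, resolve could be async (locator) => (await page.$$(`xpath/.${locator.value}`))[0] ?? null. isSameElement could be (a, b) => page.evaluate((x, y) => x === y, a, b), because element handles passed to page.evaluate() arrive in the page as the underlying DOM nodes.

```javascript
// Hypothetical sketch: group locators whose first match is the very same element.
// resolve(locator)    -> element, or null when nothing matches
// isSameElement(a, b) -> true when a and b are the same node
async function groupBySameElement(locators, resolve, isSameElement) {
    const groups = []; // each entry: { element, locators: [...] }
    for (const locator of locators) {
        const element = await resolve(locator);
        if (!element) {
            continue; // locator matched nothing on this page
        }
        let group = null;
        for (const candidate of groups) {
            if (await isSameElement(candidate.element, element)) {
                group = candidate;
                break;
            }
        }
        if (group) {
            group.locators.push(locator);
        } else {
            groups.push({ element, locators: [locator] });
        }
    }
    // Keep only elements reached by more than one locator
    return groups.filter((g) => g.locators.length > 1).map((g) => g.locators);
}

// Stub "page": two distinct <li> nodes with identical markup, plus one <select>
const select = { tag: "select" };
const itemA = { tag: "li" };
const itemB = { tag: "li" };
const dom = {
    "//select[@id='plants']": select,
    "//select[@name='plant-type']": select,
    "(//li)[1]": itemA,
    "(//li)[2]": itemB,
};

groupBySameElement(
    Object.keys(dom).map((value, i) => ({ name: `locator${i}`, value })),
    async (locator) => dom[locator.value] ?? null,
    async (a, b) => a === b,
).then((duplicates) => console.log(duplicates.map((group) => group.map((l) => l.name))));
// [ [ 'locator0', 'locator1' ] ]
```

This makes one comparison round trip per existing group for each locator. That is usually acceptable for a single page object, but the cost grows quadratically with the number of locators.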
                      

Finally, we close the browser and return the result, which contains groups of duplicate locators.

                        
await browser.close();
return result;
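If page.goto() or one of the XPath queries throws (for example, because of a malformed locator), the code above never reaches browser.close(). In a long-running process that catches the error, the browser would then stay open. Below is a minimal sketch of a wrapper that always closes the browser. It is written against an injected launch function so it can be exercised without a real browser. The name withBrowser is hypothetical. In the real script, launch would be () => puppeteer.launch().

```javascript
// Hypothetical wrapper: launch a browser, run work(browser), and always close it.
async function withBrowser(launch, work) {
    const browser = await launch();
    try {
        return await work(browser);
    } finally {
        await browser.close(); // runs on success and on error
    }
}

// Stub browser for illustration; with Puppeteer, pass () => puppeteer.launch()
function makeStubBrowser() {
    return {
        closed: false,
        async close() {
            this.closed = true;
        },
    };
}

const stub = makeStubBrowser();
withBrowser(async () => stub, async () => {
    throw new Error("bad locator");
}).catch((error) => {
    console.log(error.message, stub.closed); // bad locator true
});
```

Inside checkLocatorEquivalence, the page setup, the locator loop, and the filter step would move into the work callback, which then returns the groups of duplicates.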
                      

Putting It All Together

In this section, we'll integrate the parser and the Puppeteer-based duplicate detection to identify duplicate XPath locators in a test script and analyze the results.

The script starts by importing the necessary functions from the parser and locator detection modules.

                        
import { checkLocatorEquivalence } from "./locatorDetection.js";
import { getLocatorsFromFile } from "./parser.js";
                      

Next, the URL of the web page to be tested and the file path of the test script containing the locators are defined.

                        
const url = ""; // URL of the webpage under test
const filePath = ""; // File path to the script containing locators
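Hard-coding empty strings means the script has to be edited before every run. One option, sketched here for Node.js 20 as listed above, is to read both values from the command line with the built-in util.parseArgs. The option names --url and --file, the script name, and the readConfig helper are hypothetical. The sketch is shown in CommonJS form. In the ES module script above, use import { parseArgs } from "node:util" instead.

```javascript
const { parseArgs } = require("node:util");

// Hypothetical CLI: node findDuplicates.js --url <page URL> --file <path to page object>
function readConfig(args) {
    const { values } = parseArgs({
        args,
        options: {
            url: { type: "string" },  // URL of the web page under test
            file: { type: "string" }, // path to the script containing locators
        },
    });
    if (!values.url || !values.file) {
        throw new Error("Usage: --url <page URL> --file <path to page object>");
    }
    return { url: values.url, filePath: values.file };
}

// In the real script: const { url, filePath } = readConfig(process.argv.slice(2));
console.log(readConfig(["--url", "https://example.com", "--file", "./HomePage.js"]));
// { url: 'https://example.com', filePath: './HomePage.js' }
```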
                      

The getLocatorsFromFile function reads the specified test script file, parses it, and extracts the XPath locators.

                        
let locators = getLocatorsFromFile(filePath);
                      

The checkLocatorEquivalence function is called with the URL and the extracted locators. It returns a promise that resolves to an array of groups, where each group lists the locators that point to the same element. If duplicates are found, they are logged to the console, and further actions (such as removing or flagging them) can be taken. Otherwise, a message stating that no duplicates were found is logged.

                        
const duplicates = await checkLocatorEquivalence(url, locators);
if (duplicates.length > 0) {
    console.log("Duplicate locators found:", duplicates);
    // Handle duplicates as needed, e.g., remove or flag them
} else {
    console.log("No duplicate locators found.");
}
                      

Understanding the Response

The script's output consists of a list of duplicate locators if any are found. Each entry in this list represents a group of locators that point to the same element. For instance:

                        
Duplicate locators found: [
    [
        { 
            name: "firstPlantTypeSelector", 
            value: "//select[@name='plant-type']" 
        },
        { 
            name: "secondPlantTypeSelector", 
            value: "//select[@id='plants']" 
        },
        { 
            name: "thirdPlantTypeSelector", 
            value: "//div[@class='controls']/select[1]" 
        }
    ],
    [
        { 
            name: "firstGardenNotesTextArea", 
            value: "//textarea[@id='garden-notes-auto-id']" 
        },
        { 
            name: "secondGardenNotesTextArea", 
            value: "//div[@class='main-container']/textarea" 
        }
    ]
]
                      

In this example, three locators (firstPlantTypeSelector, secondPlantTypeSelector, and thirdPlantTypeSelector) all point to the same select element, and two locators (firstGardenNotesTextArea and secondGardenNotesTextArea) point to the same textarea element. The response structure makes it easy to identify and handle duplicate locators in our test scripts.

Conclusion

In this blog post, we've walked through the process of identifying duplicate XPath locators in test scripts. We started by extracting locators from a file with an Acorn-based parser. We then used Puppeteer to navigate to the specified URL and group the locators that resolve to the same element. Together, these steps provide a practical way to detect duplicate locators and keep test scripts clean and maintainable.

As a next step, you can enhance this solution by implementing logic to automatically remove or flag duplicate locators once they have been identified. This will further streamline your test scripts and reduce potential errors.
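Below is a minimal sketch of such flagging logic. It picks one locator per group to keep and flags the rest. The preference rule is an assumption and not part of the original project: locators that use an @id attribute are kept first, then the shortest expression. The names rank and flagRedundantLocators are hypothetical.

```javascript
// Hypothetical ranking: prefer @id-based XPaths, then shorter expressions.
function rank(locator) {
    const usesId = /@id\s*=/.test(locator.value) ? 0 : 1;
    return [usesId, locator.value.length];
}

// For each group of duplicates, keep one locator and flag the others.
function flagRedundantLocators(groups) {
    return groups.map((group) => {
        const sorted = [...group].sort((a, b) => {
            const [idA, lenA] = rank(a);
            const [idB, lenB] = rank(b);
            return idA - idB || lenA - lenB;
        });
        const [keep, ...redundant] = sorted;
        return { keep: keep.name, redundant: redundant.map((l) => l.name) };
    });
}

// The duplicate groups from the example response above
const duplicates = [
    [
        { name: "firstPlantTypeSelector", value: "//select[@name='plant-type']" },
        { name: "secondPlantTypeSelector", value: "//select[@id='plants']" },
        { name: "thirdPlantTypeSelector", value: "//div[@class='controls']/select[1]" },
    ],
    [
        { name: "firstGardenNotesTextArea", value: "//textarea[@id='garden-notes-auto-id']" },
        { name: "secondGardenNotesTextArea", value: "//div[@class='main-container']/textarea" },
    ],
];

console.log(flagRedundantLocators(duplicates));
```

For the example response, this keeps secondPlantTypeSelector and firstGardenNotesTextArea. It flags firstPlantTypeSelector, thirdPlantTypeSelector, and secondGardenNotesTextArea as candidates for removal.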

The complete code for this project is available on our GitHub page. Feel free to clone the repository, try it out, and customize it to suit your needs. Happy coding!