The Green Report | Downloads That Don't Fail: Automating File Verification

Downloads That Don't Fail: Automating File Verification

Sep 15th 2024 18 min read

medium

Verifying downloads in automation testing is often overlooked, yet it's crucial to ensure our files are delivered correctly every time. In this post, we'll dive into practical strategies and code examples to automate file verification, helping us catch download failures before they impact our users. Say goodbye to unreliable downloads and hello to robust, automated file validation!

Challenges in Download Verification

Download verification is a critical part of automation testing, but it's often overlooked or considered too challenging to implement effectively. Unlike testing UI interactions or form submissions, verifying downloads involves working outside the browser's scope, dealing with file systems, and handling asynchronous processes, which can complicate test automation. Here are some common challenges that make download verification tricky:

Timing Issues and Asynchronous Downloads: One of the biggest challenges in download verification is handling timing. Downloads are asynchronous and can vary in duration depending on file size, network speed, or server response time. If a test script moves forward before the download is complete, it may incorrectly report a failure or miss checking the file altogether. Synchronizing test actions with download completion requires carefully designed wait mechanisms to avoid false negatives or timeouts.
Unknown or Dynamic Download Locations: Automation frameworks often download files to default locations or temporary directories, which can vary depending on the environment or browser settings. If our test script doesn't explicitly set a known download directory, locating and verifying the downloaded file becomes cumbersome. Managing and consistently configuring the download paths across different test environments is crucial for reliable verification but is often missed during setup.
Verifying Various File Types and Contents: Verifying downloaded files is not just about confirming the file exists; it's about ensuring it's the correct file with the expected contents. Different file types (PDFs, images, CSVs, etc.) require different validation methods. For instance, checking text in a PDF involves reading and parsing its contents, while verifying an image download may require checking its resolution or comparing it against a reference image. These diverse validation needs demand additional tools and libraries, adding complexity to the automation script.
Handling Failed or Partial Downloads: Downloads can fail or complete partially due to interruptions like network issues, server errors, or browser timeouts. Detecting these failures isn't always straightforward; sometimes, a file may appear complete but is corrupted or empty. Test scripts need robust mechanisms to check not just for the presence of a file but also for its integrity, size, and completeness, adding an extra layer of verification that can be challenging to automate.
Dealing with Browser Security Restrictions: Modern browsers have built-in security features that can interfere with automated download verification, such as prompts for file types that are considered dangerous or restrictions on downloading multiple files in quick succession. These security measures can cause automation scripts to hang or fail, requiring additional steps to configure browser settings or handle pop-up dialogs programmatically.
Environment-Specific Behaviors: Download behavior can differ across operating systems, browsers, or network conditions, leading to inconsistent test results. What works seamlessly on one system might fail on another due to permission issues, folder access restrictions, or browser-specific quirks. Managing these environment-specific behaviors requires meticulous setup and adjustments, often leading to overlooked verification steps.

These challenges highlight why download verification is often sidelined during test automation. However, overcoming these obstacles with the right strategies, tools, and best practices ensures that our downloads are tested thoroughly and reliably, making our automation suite more robust and dependable.

Setting Up Our Testing Environment

Setting up our testing environment properly is essential for effective download verification. Let's look at a comparison of download directory configuration in Selenium and Playwright, including the advantages of using a custom download path.

1. Selenium (Python):

In Selenium, we can customize the download directory using browser options. This setup directs the browser to save downloaded files to a specific location on our machine.

                
from selenium import webdriver
from selenium.webdriver.common.by import By
                    
options = webdriver.ChromeOptions()
prefs = {"download.default_directory": "/path/to/downloads"}
options.add_experimental_option("prefs", prefs)
                    
driver = webdriver.Chrome(options=options)
driver.get("https://www.test.com")
                    
download_button = driver.find_element(By.ID, "download-button")
download_button.click()

Benefits:

Ensures all files are saved in a controlled location, making it straightforward to locate and validate them.
Avoids interference from the browser's default download behavior, which can vary between environments.
Simplifies cleanup by allowing us to clear the entire directory after tests complete.
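
As a minimal sketch of the setup above, the preferences can be built by a small helper so that every test uses the same absolute directory. The helper names build_chrome_download_prefs and create_driver are hypothetical, and disabling the save prompt through download.prompt_for_download is an assumption about what our setup needs rather than a requirement.

```python
from pathlib import Path


def build_chrome_download_prefs(download_dir):
    """Return Chrome prefs that send downloads to download_dir (hypothetical helper)."""
    # Resolve to an absolute path and create the folder up front so the
    # location does not depend on the current working directory.
    target = Path(download_dir).resolve()
    target.mkdir(parents=True, exist_ok=True)
    return {
        "download.default_directory": str(target),
        # Assumption: we do not want a "Save as" dialog to interrupt the test.
        "download.prompt_for_download": False,
    }


def create_driver(download_dir):
    """Start Chrome with the prefs above (requires Selenium to be installed)."""
    # Imported lazily so build_chrome_download_prefs has no Selenium dependency.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_chrome_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

Keeping the prefs in one function means the same dictionary can be reused by every test that needs a browser, instead of repeating the path in each script.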

2. Playwright (JavaScript):

Playwright allows us to handle downloads seamlessly by using its built-in event handling for downloads and specifying a custom directory for saving files. The following example demonstrates how to save a file to a specific path when a download is triggered.

                
const { test } = require("@playwright/test");

test("Download file to specified path", async ({ page }) => {
    await page.goto("https://www.test.com");
                    
    const downloadPromise = page.waitForEvent("download");
    await page.getByText("Download Sample File").click();
    const download = await downloadPromise;
                    
    await download.saveAs("/path/to/downloads/" + download.suggestedFilename());
});

Also, we can specify where to persist downloaded files using the downloadsPath option in browserType.launch().

Benefits:

We have full control over where the file is saved, eliminating issues with default directories.
Playwright's waitForEvent method ensures the download is handled asynchronously, preventing timing issues and making the test more reliable.
The browser's suggested filename, read through download.suggestedFilename(), is reused for the saved file, simplifying file handling and validation.

Why Configuring the Download Directory Matters

Setting a custom download directory in our testing environment offers several key advantages:

Predictability and Control: By directing downloads to a known location, we eliminate the guesswork of locating files and reduce potential test failures caused by missing files.
Consistency Across Environments: Custom directories standardize download paths across different machines and testing environments, making our tests more reliable and easier to maintain.
Simplified File Verification: With a consistent download location, our scripts can easily access and verify downloaded files, enabling more thorough checks, such as content validation, file size inspection, and integrity testing.
Efficient Cleanup: Cleaning up test artifacts becomes straightforward since we know exactly where all downloaded files are stored, preventing disk clutter and ensuring a clean testing state.

Configuring our download directory is a foundational step that simplifies the entire process of download verification, allowing us to build more effective and dependable automation tests.

Triggering the Download Action

Simulating a user-triggered download is an essential part of verifying download functionality in automated tests. Downloads can be initiated in various ways, such as clicking a link, submitting a form, or interacting with a button. Choosing the right way to trigger the download in a test is crucial for accurately simulating real-world user behavior.

Various Download Initiation Methods

Clicking a Link to Trigger a Download

One of the simplest ways to trigger a download is by clicking a link. This is commonly used when users download files like PDFs or images from a webpage. The automation framework clicks the link, and the download is triggered in the background.

Example (Selenium - Python):

                
from selenium import webdriver
from selenium.webdriver.common.by import By
                    
driver = webdriver.Chrome()
driver.get("https://www.test.com")
                    
download_link = driver.find_element(By.LINK_TEXT, "Click to Download")
download_link.click()

Example (Playwright - JavaScript):

                
await page.goto("https://www.test.com");

const downloadPromise = page.waitForEvent("download");
await page.click(`text="Click to Download"`);
const download = await downloadPromise;
                  
await download.saveAs("/path/to/downloads/" + download.suggestedFilename());

Best Practices:

Ensure Correct Element: Make sure the correct link is being selected. Using text selectors, like Click to Download, ensures the script interacts with the right element.
Wait for Download Event: Use an event listener (waitForEvent('download')) to wait until the download process starts, ensuring that the download is properly triggered before continuing the test.

Submitting a Form to Trigger a Download

In some cases, downloads are triggered by submitting a form, such as when a user fills out information and downloads a receipt or invoice. In this scenario, the form submission is automated, and the download is triggered once the form is processed.

Example (Selenium - Python):

                
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.test.com")
                    
first_name = driver.find_element(By.NAME, "first-name")
first_name.send_keys("John")
                    
last_name = driver.find_element(By.NAME, "last-name")
last_name.send_keys("Doe")
                    
submit_button = driver.find_element(By.ID, "submit-button")
submit_button.click()

Example (Playwright - JavaScript):

                
await page.goto("https://www.test.com");

await page.fill('input[name="first-name"]', "John");
await page.fill('input[name="last-name"]', "Doe");
                  
const downloadPromise = page.waitForEvent("download");
await page.click('button[type="submit"]');
const download = await downloadPromise;
                  
await download.saveAs("/path/to/downloads/" + download.suggestedFilename());

Clicking a Button to Trigger a Download

Many web applications have buttons that directly trigger a download (e.g., "Download Report" or "Export CSV"). Simulating this action is straightforward with automation tools by clicking the button and handling the download event, as we have already seen in the setup section.

Example (Selenium - Python):

                
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.test.com")
                    
download_button = driver.find_element(By.ID, "download-button")
download_button.click()

Example (Playwright - JavaScript):

                
await page.goto("https://www.test.com");

const downloadPromise = page.waitForEvent("download");
await page.click("#download-button");
const download = await downloadPromise;
                  
await download.saveAs("/path/to/downloads/" + download.suggestedFilename());

Best Practices:

Ensure Button Visibility: Make sure the button is visible and clickable by checking if it's properly loaded on the page before triggering the click action.
Event Handling: Use waitForEvent('download') to capture the download as soon as the button click starts it; download.saveAs() then waits for the file to finish downloading before the test continues.

Handling and Waiting for Downloads to Complete

Once a download is triggered, the next critical step is to handle the download properly and ensure that the file is fully downloaded before proceeding with further test steps. Different automation tools offer various methods to manage and wait for downloads to complete. This section covers how to handle downloads in Selenium by polling the directory and in Playwright using its built-in download handling methods. We'll also discuss managing timing issues and ensuring the file is downloaded successfully.

Handling Downloads in Selenium: Polling the Directory

Selenium doesn't have built-in support for download handling, so a common approach is to poll the download directory and wait until the file appears. We need to monitor the directory for the expected file, then proceed once the file is detected.

                
import os
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
                    
options = webdriver.ChromeOptions()
download_directory = "/path/to/downloads/"
prefs = {"download.default_directory": download_directory}
options.add_experimental_option("prefs", prefs)
                    
driver = webdriver.Chrome(options=options)
driver.get("https://www.test.com")
                    
download_button = driver.find_element(By.ID, "download-button")
download_button.click()
                    
file_name = "sample-file.txt"
file_path = os.path.join(download_directory, file_name)
                    
timeout = 30  # Maximum wait time in seconds
poll_interval = 1  # Time between checks in seconds
elapsed_time = 0
                    
while not os.path.exists(file_path):
    time.sleep(poll_interval)
    elapsed_time += poll_interval
    if elapsed_time > timeout:
        raise Exception(f"Download timed out. File not found: {file_name}")
                    
print(f"File downloaded successfully: {file_name}")
                    
driver.quit()

Tips for Managing Timing Issues in Selenium:

Polling Frequency: Adjust the polling interval (e.g., 1 second) to balance performance and reliability. Polling too frequently may lead to unnecessary overhead, while polling too infrequently could cause longer wait times.
Timeout: Set a reasonable timeout to avoid hanging tests. If the file is not downloaded within the specified time, raise an exception or log an error.
File Extensions: While polling, be aware of temporary file extensions like .crdownload in Chrome, which appear during the download process. Ensure you wait until the final file is fully downloaded.
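
The tips above can be combined into one helper. The following is a rough sketch, not a definitive implementation: the name wait_for_completed_download is hypothetical, the .part suffix is included on the assumption that Firefox may also be used, and the "size unchanged between two polls" rule is a heuristic we chose for illustration.

```python
import time
from pathlib import Path

# Suffixes browsers use while a download is still in progress:
# Chrome writes ".crdownload", Firefox writes ".part".
TEMP_SUFFIXES = (".crdownload", ".part")


def wait_for_completed_download(directory, file_name, timeout=30.0, poll_interval=0.5):
    """Return the file path once it exists, no temp files remain, and its size is stable."""
    directory = Path(directory)
    target = directory / file_name
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        # Any leftover temp file in the directory counts as "still downloading".
        in_progress = any(p.name.endswith(TEMP_SUFFIXES) for p in directory.iterdir())
        if target.exists() and not in_progress:
            size = target.stat().st_size
            if size == last_size:
                return target  # same size on two consecutive polls
            last_size = size
        time.sleep(poll_interval)
    raise TimeoutError(f"Download did not complete within {timeout}s: {target}")
```

Because any temporary file in the directory blocks the wait, this sketch works best when each test downloads into its own, otherwise empty folder.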

Handling Downloads in Playwright: Built-in Download Handling

Unlike Selenium, Playwright has built-in support for handling downloads, making it much easier to manage the download process. As we have already seen, Playwright can capture download events and wait for the file to be completely downloaded before saving it to a specific path.

                
const downloadPromise = page.waitForEvent("download");

await page.click("#download-button");
                  
const download = await downloadPromise;
                  
const filePath = "/path/to/downloads/" + download.suggestedFilename();
await download.saveAs(filePath);

How Playwright Manages Downloads:

Event Listener: Playwright's waitForEvent('download') method waits for the download event to occur. This ensures that the download is captured immediately after it's triggered, preventing timing issues.
saveAs Method: The download.saveAs() method saves the file to a specified location, and Playwright waits until the download is fully completed before proceeding.
Error Handling: If the download fails, download.saveAs() rejects with an error, and download.failure() returns the failure reason, so failed downloads can be detected without manually checking the directory.

Verifying Downloaded Files

Once the download process is complete, the next crucial step in download testing is to verify the integrity of the downloaded files. This involves ensuring that the file has the correct name and type, verifying its content, and performing checks to ensure the file isn't empty or corrupted.

File Name and Type Verification

Ensuring the downloaded file has the correct name and extension is the first step in verifying that the download worked as expected. Automation scripts should compare the actual file name and extension against what is expected.

Example (Selenium - Python):

                
import os

expected_file_name = "sample-file.txt"
download_directory = "/path/to/downloads"
file_path = os.path.join(download_directory, expected_file_name)
                    
if os.path.exists(file_path) and file_path.endswith('.txt'):
    print(f"File name and type verification passed: {expected_file_name}")
else:
    raise Exception(f"File name or type verification failed: {file_path}")

Example (Playwright - JavaScript):

                
const fs = require("fs");
const path = require("path");
                    
const expectedFileName = "sample-file.txt";
const downloadDirectory = "/path/to/downloads";
const filePath = path.join(downloadDirectory, expectedFileName);
                    
if (fs.existsSync(filePath) && filePath.endsWith(".txt")) {
    console.log(`File name and type verification passed: ${expectedFileName}`);
} else {
    throw new Error(`File name or type verification failed: ${filePath}`);
}
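
Checking the extension only confirms what the file is called, not what it is. As a hedged sketch, we can also compare the first bytes of the file against the well-known signature of its format. The helper name verify_name_and_type and the small SIGNATURES table are assumptions made for illustration; extend the table for the formats your application serves.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of a few common formats.
SIGNATURES = {
    ".pdf": b"%PDF-",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
}


def verify_name_and_type(file_path, expected_name):
    """Check the file name, then check the header bytes for known extensions."""
    path = Path(file_path)
    if path.name != expected_name:
        raise AssertionError(f"Unexpected file name: {path.name!r} != {expected_name!r}")
    signature = SIGNATURES.get(path.suffix.lower())
    if signature is not None:
        with path.open("rb") as handle:
            header = handle.read(len(signature))
        if header != signature:
            raise AssertionError(f"{path.name} does not look like a {path.suffix} file")
    return True
```

This catches a common failure mode where the server returns an HTML error page that the browser saves under the expected file name.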

Content Verification

Content verification is crucial when downloading files that need to contain specific data, such as text files, PDFs, or images. We need to ensure that the downloaded file not only has the correct name but also contains the expected content.

For text files, verifying the content is straightforward. We can read the file and check if it contains specific keywords or phrases.

Example (Selenium - Python):

                
with open(file_path, 'r', encoding='utf-8') as file:
    content = file.read()
                    
if "expected text" in content:
    print("Text content verification passed")
else:
    raise Exception("Text content verification failed")

Example (Playwright - JavaScript):

                
const fileContent = fs.readFileSync(filePath, "utf-8");

if (fileContent.includes("testing download")) {
    console.log("Text content verification passed");
} else {
    throw new Error("Text content verification failed");
}
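
The same idea extends to structured text such as CSV exports, where checking the header row and the number of data rows is usually more meaningful than searching for a phrase. This is a minimal sketch using Python's standard csv module; the function name verify_csv and the example column names are hypothetical.

```python
import csv


def verify_csv(file_path, expected_header, min_rows=1):
    """Check the header row and return the number of data rows in the CSV file."""
    with open(file_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)  # None when the file is empty
        row_count = sum(1 for _ in reader)
    if header != list(expected_header):
        raise AssertionError(f"Unexpected CSV header: {header}")
    if row_count < min_rows:
        raise AssertionError(f"Expected at least {min_rows} data rows, found {row_count}")
    return row_count
```

For an export with columns id and name, a call such as verify_csv(file_path, ["id", "name"]) would fail if the header changed or the file contained no data rows.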

To verify the content of PDFs, we can use libraries like PyPDF2 in Python to extract and check the text inside the PDF.

                
import PyPDF2

with open(file_path, 'rb') as file:
    pdf_reader = PyPDF2.PdfReader(file)
    first_page = pdf_reader.pages[0]
    text = first_page.extract_text()
                    
if "expected content" in text:
    print("PDF content verification passed")
else:
    raise Exception("PDF content verification failed")

Similarly, we can use the pdf-parse library for JavaScript.

                
const fs = require("fs");
const pdf = require("pdf-parse");

const dataBuffer = fs.readFileSync(filePath);

const data = await pdf(dataBuffer);
                  
const text = data.text;
                 
if (text.includes("expected content")) {
    console.log("PDF content verification passed");
} else {
    throw new Error("PDF content verification failed");
}

For images, content verification may involve comparing the downloaded image against an expected reference image. Tools like Pillow in Python or img-diff-js in JavaScript can be used for pixel-based comparisons.

Example (Python using Pillow):

                
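from PIL import Image, ImageChops

# Normalize both images to RGB so they share the same mode
downloaded_image = Image.open(file_path).convert("RGB")
reference_image = Image.open("/path/to/reference-image.png").convert("RGB")

# A size mismatch already means the images differ, so check it before comparing pixels
if downloaded_image.size != reference_image.size:
    raise Exception("Image verification failed: dimensions differ")

diff = ImageChops.difference(downloaded_image, reference_image)

# getbbox() returns None when every pixel of the difference image is zero
if not diff.getbbox():
    print("Image verification passed")
else:
    raise Exception("Image verification failed")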

Example (JavaScript using img-diff-js):

                
const { imgDiff } = require("img-diff-js");

imgDiff({
    actualFilename: filePath,
    expectedFilename: "/path/to/reference-image.jpg",
    diffFilename: "/path/to/diff.png",
}).then((result) => {
    if (result.imagesAreSame) {
        console.log("Image verification passed");
    } else {
        throw new Error("Image verification failed");
    }
});

File Size Verification

To ensure that the downloaded file isn't empty or corrupted, checking the file size is a quick and effective method. A file that is too small may indicate that the download failed or the file is incomplete.

Example (Selenium - Python):

                
file_size = os.path.getsize(file_path)
                    
if file_size > 0:
    print(f"File size verification passed: {file_size} bytes")
else:
    raise Exception("File size verification failed: File is empty or corrupted")

Example (Playwright - JavaScript):

                
const stats = fs.statSync(filePath);
const fileSizeInBytes = stats.size;
                    
if (fileSizeInBytes > 0) {
    console.log(`File size verification passed: ${fileSizeInBytes} bytes`);
} else {
    throw new Error("File size verification failed: File is empty or corrupted");
}

Best Practices:

Set Minimum File Size: Define a reasonable minimum file size threshold depending on the type of file. For example, an empty text file may be valid, but a 0-byte PDF or image indicates a failed download.
Check for File Corruption: If the file size is too small compared to the expected size, it might be corrupted or incomplete.
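
One way to express these thresholds is a small lookup keyed by file extension. The numbers below are hypothetical placeholders, not recommendations; they should be tuned to the files your application actually produces.

```python
import os

# Hypothetical minimum sizes in bytes, per extension.
MIN_SIZE_BY_EXTENSION = {".txt": 0, ".csv": 1, ".pdf": 1024, ".png": 100}
DEFAULT_MIN_SIZE = 1  # used for extensions that are not listed above


def verify_min_size(file_path):
    """Return the file size, or raise if it is below the threshold for its type."""
    extension = os.path.splitext(file_path)[1].lower()
    minimum = MIN_SIZE_BY_EXTENSION.get(extension, DEFAULT_MIN_SIZE)
    size = os.path.getsize(file_path)
    if size < minimum:
        raise AssertionError(
            f"{file_path} is {size} bytes, expected at least {minimum} for {extension or 'this type'}"
        )
    return size
```

With this table an empty text file passes, while an empty PDF is reported as a failed download.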

Handling Edge Cases

In real-world automation testing, file downloads don't always go smoothly. Various factors, such as network issues, incorrect file paths, or even server-side errors, can lead to failed or incomplete downloads. To ensure the robustness of our test scripts, it's important to handle these edge cases effectively.

Handling Download Failures

Download failures can occur due to reasons such as network interruptions, incorrect file paths, or server-side errors.

To account for these potential issues, our tests should have mechanisms to detect and handle download failures gracefully.

Example (Selenium - Python):

                
import os
import time
                    
def wait_for_download(file_path, timeout=30):
    for _ in range(timeout):
        if os.path.exists(file_path):
            return True
        time.sleep(1)
    return False
                    
file_path = "/path/to/downloads/sample-file.txt"
                    
# Attempt to download the file
# (Add your Selenium download code here)
                    
if wait_for_download(file_path):
    print("File downloaded successfully")
else:
    raise Exception("File download failed after timeout")

Example (Playwright - JavaScript):

                
const fs = require('fs');
                    
async function waitForDownload(downloadPath, timeout = 30000) {
    const startTime = Date.now();
    while (Date.now() - startTime < timeout) {
        if (fs.existsSync(downloadPath)) {
            return true;
        }
        await new Promise(resolve => setTimeout(resolve, 1000));
    }
    return false;
}
                    
const downloadPath = 'C:/path/to/downloads/sample-file.txt';
                    
// Trigger the download action
                    
const downloadCompleted = await waitForDownload(downloadPath);
if (downloadCompleted) {
    console.log('Download succeeded');
} else {
    throw new Error('Download failed or timed out');
}

Strategies for Handling Failures:

Timeout Handling: Set a reasonable timeout period within which the file should be downloaded. If the file is not detected after the timeout, flag the test as failed.
Network Retry Mechanism: If the download fails due to network issues, implement retry logic to attempt the download again after a short delay.

Handling Incorrect or Corrupted Files

In some cases, the download may be complete, but the file is incorrect or corrupted. This could happen due to server-side errors or incomplete downloads. One approach to address this is to verify the file's integrity after download, such as by checking its size or content.

Example (File Integrity Check - Python):

                
import os

# Set a minimum file size in bytes
min_file_size = 1024
                    
if os.path.getsize(file_path) > min_file_size:
    print("File size is valid")
else:
    raise Exception("Downloaded file is too small, might be corrupted")

Example (File Size Check in Playwright - JavaScript):

                
const stats = fs.statSync(downloadPath);
const fileSizeInBytes = stats.size;
                    
// Set a minimum file size to ensure the file isn't corrupted
if (fileSizeInBytes > 1024) {
    console.log('File size is valid');
} else {
    throw new Error('Downloaded file is too small, might be corrupted');
}
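
A size threshold cannot tell a valid file from a corrupted one of similar length. When the expected file is known in advance, a checksum comparison is a stricter integrity check. This is a sketch that assumes we have a trusted SHA-256 value for the file (for example, computed from the source file on the server); the function names are hypothetical.

```python
import hashlib


def sha256_of(file_path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(file_path, expected_sha256):
    """Raise if the file's digest differs from the expected value."""
    actual = sha256_of(file_path)
    if actual != expected_sha256.lower():
        raise AssertionError(f"Checksum mismatch: expected {expected_sha256}, got {actual}")
    return actual
```

Reading in chunks keeps memory use flat for large downloads. This approach only fits files whose content is deterministic; generated reports with timestamps need content checks instead.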

Implementing Retry Mechanisms

A common strategy to handle intermittent issues, such as network blips or server timeouts, is to implement a retry mechanism. This allows the test to retry the download action a certain number of times before failing.

I've already covered retry patterns in a separate blog post, so I encourage you to experiment with different approaches to find the one that best fits your scripts.
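
As a starting point, a retry wrapper can be as small as the sketch below. It is one possible shape, not the pattern from the other post: the name download_with_retry is hypothetical, and it takes two callables that we supply ourselves, one that triggers the download and one that waits for the file and raises on failure.

```python
import time


def download_with_retry(trigger_download, wait_for_file, attempts=3, delay=2.0):
    """Trigger a download and wait for it, retrying a fixed number of times."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            trigger_download()
            return wait_for_file()  # expected to raise if the file never arrives
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
            if attempt < attempts:
                time.sleep(delay)  # short pause before the next attempt
    raise RuntimeError(f"Download failed after {attempts} attempts") from last_error
```

Keeping the attempt count low matters here: a download that fails consistently should still fail the test quickly rather than hide a real defect behind retries.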

Handling Edge Cases with Error Handling

Beyond retrying, we can handle download errors gracefully by adding error-handling mechanisms within our test scripts. This ensures the tests fail with meaningful error messages, making it easier to debug issues.

Example (Selenium - Python with Try-Except):

                
try:
    # Trigger download and wait for file
    if wait_for_download(file_path):
        print("File downloaded successfully")
    else:
        raise Exception("File download failed after timeout")
except Exception as e:
    print(f"Error occurred during download: {e}")
    raise  # Re-raise so the test still fails after the error is logged

Example (Playwright - JavaScript with Try-Catch):

                
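try {
    // Trigger download; downloadWithRetry stands in for your own retry helper
    await downloadWithRetry(page);
} catch (error) {
    console.error(`Download error: ${error.message}`);
    throw error; // Re-throw so the test still fails after the error is logged
}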

Best Practices for Download Verification

1. Environment Cleanup

After tests are complete, it's essential to delete downloaded files to prevent clutter and potential interference in future tests. Regular cleanup also helps maintain available disk space, especially when dealing with large files.
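
A simple way to guarantee cleanup is to give each test its own temporary folder and remove it when the test ends, even if the test fails. The sketch below uses only the Python standard library; the name temporary_download_dir is hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_download_dir(prefix="downloads-"):
    """Create a fresh download folder and delete it when the block exits."""
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        # Runs on success and on failure, so no files leak between tests.
        shutil.rmtree(path, ignore_errors=True)
```

Inside a with temporary_download_dir() as download_dir: block, the path can be passed to the browser's download settings, and everything in it is removed once the block exits.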

2. Stable Paths and Reliable File Names

Using consistent download paths and predictable file names minimizes the chances of false negatives. A custom download directory ensures files are always saved in a known location, and validating filenames helps confirm the correct file was downloaded.
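
File names are not always as predictable as we would like: when a file with the same name already exists, Chrome typically saves the new one with a suffix such as " (1)". One hedged way to cope is to snapshot the directory before the click and look for whatever file is new afterwards. The helper names below are hypothetical, and the temporary suffixes are assumptions covering Chrome and Firefox.

```python
import time
from pathlib import Path


def snapshot(directory):
    """Return the set of file names currently in the directory."""
    return {p.name for p in Path(directory).iterdir() if p.is_file()}


def find_new_download(directory, before, timeout=30.0, poll_interval=0.5,
                      temp_suffixes=(".crdownload", ".part")):
    """Wait for a file that was not in the 'before' snapshot and return its path."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_files = snapshot(directory) - before
        finished = sorted(n for n in new_files if not n.endswith(temp_suffixes))
        pending = new_files - set(finished)
        # Only report a file once no in-progress temp files remain.
        if finished and not pending:
            return Path(directory) / finished[0]
        time.sleep(poll_interval)
    raise TimeoutError(f"No new download appeared in {directory} within {timeout}s")
```

The snapshot is taken with before = snapshot(download_dir) right before the click; the returned path can then be checked against the expected name or a name pattern.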

3. Performance Considerations

Verifying large or multiple downloads can slow down tests. To improve performance, limit the scope of validation (e.g., check file sizes before deeper inspection) and run parallel validations for multiple downloads. Using minimum file size thresholds can quickly rule out incomplete or corrupted files.
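
The two ideas in this tip, cheap checks first and validations in parallel, can be sketched together as follows. The function names are hypothetical, deep_check is any callable of our own that raises when the content is wrong, and the worker count is an arbitrary example value.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def validate_file(file_path, min_size, deep_check=None):
    """Return (file_path, error); error is None when the file passes."""
    try:
        size = os.path.getsize(file_path)
    except OSError as error:
        return file_path, f"missing or unreadable: {error}"
    # Cheap size check first; skip the expensive inspection if it already fails.
    if size < min_size:
        return file_path, f"too small: {size} bytes"
    if deep_check is not None:
        try:
            deep_check(file_path)
        except Exception as error:
            return file_path, f"content check failed: {error}"
    return file_path, None


def validate_all(file_paths, min_size=1, deep_check=None, max_workers=4):
    """Validate files in parallel and return a dict of path -> error for failures."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda p: validate_file(p, min_size, deep_check), file_paths)
        return {path: error for path, error in results if error is not None}
```

An empty result means every file passed. Threads suit this kind of work because most of the time is spent reading files rather than computing.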

Conclusion

Automating file download verification is a critical step in ensuring that files are correctly delivered and functional in our applications. By addressing common challenges like handling downloads, verifying file content, and managing edge cases, we can build more reliable and efficient tests. Following best practices such as environment cleanup, using stable paths, and optimizing performance will enhance the stability of our automation framework. Ultimately, with the right strategies in place, we can confidently ensure that downloads don't fail and meet expectations every time.

As always, you can find code examples and a demo website on our GitHub repository for hands-on experimentation.