The Green Report | Automating Integrity Checks with Python and MySQL

Automating Integrity Checks with Python and MySQL

Apr 13th 2024 10 min read

easy

database

Ensuring the reliability and integrity of database operations is paramount. Automated database testing emerges as a crucial tool in this pursuit, offering a systematic approach to validate data consistency and functionality. In this blog post, we delve into the realm of automated testing, specifically focusing on MySQL databases. Through the lens of an art gallery scenario, we explore the significance of automated testing in maintaining data integrity and reliability.

Setting Up the Database

To begin our journey into automated database testing, let's first set up a MySQL database to represent an art gallery. Below are the SQL commands to create the necessary tables and populate them with sample data:

                                     
import mysql.connector

# Connect to your MySQL server
db_connection = mysql.connector.connect(
    host="your_host",
    user="your_user",
    password="your_password",
    database="art_gallery"
)
                        
db_cursor = db_connection.cursor()
                        
# Create artists table
db_cursor.execute("CREATE TABLE artists (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), nationality VARCHAR(255))")
                        
# Create paintings table
db_cursor.execute("CREATE TABLE paintings (id INT AUTO_INCREMENT PRIMARY KEY, title VARCHAR(255), artist_id INT, year_drawn INT, FOREIGN KEY (artist_id) REFERENCES artists(id))")
                        
# Create painting details table
db_cursor.execute("CREATE TABLE painting_details (id INT AUTO_INCREMENT PRIMARY KEY, painting_id INT, medium VARCHAR(255), dimensions VARCHAR(255), FOREIGN KEY (painting_id) REFERENCES paintings(id))")
                        
# Insert data into artists table
artist_data = [
    ("Leonardo da Vinci", "Italian"),
    ("Vincent van Gogh", "Dutch"),
    ("Pablo Picasso", "Spanish"),
    ("Claude Monet", "French"),
    ("Georgia O'Keeffe", "American")
]
db_cursor.executemany("INSERT INTO artists (name, nationality) VALUES (%s, %s)", artist_data)
db_connection.commit()
                        
# Insert data into paintings table
painting_data = [
    ("Mona Lisa", 1, 1503),
    ("Starry Night", 2, 1889),
    ("Guernica", 3, 1937),
    ("Water Lilies", 4, 1919),
    ("Black Iris", 5, 1926)
]
db_cursor.executemany("INSERT INTO paintings (title, artist_id, year_drawn) VALUES (%s, %s, %s)", painting_data)
db_connection.commit()
                        
# Insert data into painting details table
painting_details_data = [
    (1, "Oil on panel", "30 x 20 in"),
    (2, "Oil on canvas", "29 x 36.3 in"),
    (3, "Oil on canvas", "11 ft x 25.6 ft"),
    (4, "Oil on canvas", "6 ft 6.75 in × 13 ft 11.5 in"),
    (5, "Oil on canvas", "36 x 30 in")
]
db_cursor.executemany("INSERT INTO painting_details (painting_id, medium, dimensions) VALUES (%s, %s, %s)", painting_details_data)
db_connection.commit()
                        
                        
db_cursor.close()
db_connection.close()

By executing the above SQL commands, we'll create a MySQL database with tables representing artists, paintings, and painting details, along with sample data reflecting an art gallery environment. This database will serve as the foundation for our automated testing endeavors.

Writing Test Automation Script

Now that we have our MySQL database set up, let's dive into the process of writing a Python script to automate the testing of the art gallery database. This script will streamline the testing process by executing various operations and comparing the results.

Here are the key steps involved in writing the test automation script:

Connecting to the Database: The script will establish a connection to the MySQL database using the mysql.connector library in Python.
Retrieving Initial Data: Before making any modifications, the script will retrieve the initial state of the database. This includes fetching information about tables, columns, and data.
Executing Stored Procedure to Modify Data: Next, the script will execute a stored procedure or series of SQL queries to modify the data in the database. For example, we might update certain column values or insert new records.
Retrieving Modified Data: After the modifications are applied, the script will retrieve the modified state of the database.
Comparing Initial and Modified Data: Finally, the script will compare the initial and modified data to identify any differences. This comparison will encompass various aspects, such as table structures, column values, and overall data consistency.

By automating these steps, we can efficiently test the integrity and reliability of the database operations, ensuring that our art gallery database remains consistent and accurate.

Code Walkthrough

Now, let's dissect the Python script responsible for automating the testing of our art gallery database. We'll break down the script into sections and provide an in-depth explanation of each part.

Importing Dependencies:

The script begins by importing the mysql.connector library, which allows us to interact with MySQL databases. Additionally, it imports the alter_data function from a module named stored_procedures. This function is responsible for executing stored procedures or SQL queries to modify the data in the database.

                                     
import mysql.connector
from stored_procedures import alter_data

Helper Functions:

The get_table_data function retrieves data from a specified table in the database and returns it in a structured format.

                                     
def get_table_data(cursor, table_name):
    cursor.execute(f"SELECT * FROM {table_name}")
    columns = cursor.column_names
    data = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return {'columns': columns, 'data': data}

get_database_data retrieves the entire database schema and data, organizing it into a dictionary format for easy manipulation.

                                     
def get_database_data(connection):
    cursor = connection.cursor()
    cursor.execute("SHOW TABLES")
    tables = [table[0] for table in cursor.fetchall()]
    database_data = {}
    for table in tables:
        database_data[table] = get_table_data(cursor, table)
    cursor.close()
    return database_data

The Core Function:

The compare_data function serves as the core mechanism for identifying differences between the initial and modified data. Let's break down this function part by part, explaining each piece of code:

The function initializes a boolean variable differences_found to track whether any differences are detected between the initial and modified data.

                                     
def compare_data(initial_data, modified_data):
    differences_found = False

The function iterates through the keys (table names) in the modified_data dictionary and checks if each table exists in the initial_data. If a table exists in modified_data but not in initial_data, it indicates a newly added table.

                                     
for table_name in modified_data.keys():
    if table_name not in initial_data:
        print(f"New table '{table_name}' found in the modified data.")
        differences_found = True

The function iterates through each table in the initial_data dictionary and retrieves the corresponding data from the modified_data. It then compares the column structures and data rows between the initial and modified data for each table. If differences are found in column structures or data rows, the function prints relevant information about the differences.

                                     
for table_name, initial_table_data in initial_data.items():
    modified_table_data = modified_data.get(table_name)
                
    if not modified_table_data:
        print(f"Table '{table_name}' is missing in the modified data.")
        continue
                
    initial_columns = initial_table_data['columns']
    modified_columns = modified_table_data['columns']
                
    if initial_columns != modified_columns:
        print(f"Columns in table '{table_name}' are different.")
        print("Initial columns:", initial_columns)
        print("Modified columns:", modified_columns)
        differences_found = True
                
    initial_rows = initial_table_data['data']
    modified_rows = modified_table_data['data']
                
    if initial_rows != modified_rows:
        print(f"Data in table '{table_name}' has differences.")
        for i, (initial_row, modified_row) in enumerate(zip(initial_rows, modified_rows)):
            if initial_row != modified_row:
                print(f"Difference in row {i+1}:")
                for column, initial_value in initial_row.items():
                    modified_value = modified_row.get(column)
                    if initial_value != modified_value:
                        print(f"Column '{column}': Initial value '{initial_value}', Modified value '{modified_value}'")
        differences_found = True

Finally, if no differences are found, the function prints a message indicating that the data is identical.

                                     
if not differences_found:
    print("The data is identical.")

The Stored Procedure:

The alter_data function serves as our stored procedure, and its implementation can be customized according to specific requirements. The primary objective is to modify the existing data, enabling thorough testing of the compare_data function. An example implementation of this function will be available on our GitHub page for reference.

The Flow:

The script establishes a connection to the MySQL database using the provided connection parameters (host, user, password, database).

                                     
initial_db = mysql.connector.connect(
    host="your_host",
    user="your_user",
    password="your_password",
    database="art_gallery"
)

Before making any modifications, the script retrieves the initial state of the database using the get_database_data function. This includes fetching information about tables, columns, and data.

                                     
initial_db_state = get_database_data(initial_db)

Next, the script executes the alter_data function to modify the data in the database. This function may perform operations such as updating column values or inserting new records, depending on the specific requirements.

                                     
alter_data(initial_db)

After the modifications are applied, the script again calls the get_database_data function to retrieve the modified state of the database.

                                     
modified_db_state = get_database_data(initial_db)

Finally, the script calls the compare_data function to compare the initial and modified data obtained from the database. This comparison helps identify any differences, ensuring that the database operations were executed correctly.

                                     
compare_data(initial_db_state, modified_db_state)

Executing the Test Script

To execute the test automation script, simply run the Python script file containing the code.

Upon execution, the script will connect to the MySQL database, retrieve initial data, execute the stored procedure to modify the data, retrieve modified data, and compare the initial and modified data.

Here's an example output of the script:

Data altered successfully.
New table 'test_table' found in the modified data.
Data in table 'artists' has differences.
Difference in row 2:
Column 'nationality': Initial value 'Dutch', Modified value 'Unknown'
Difference in row 3:
Column 'nationality': Initial value 'Spanish', Modified value 'Unknown'
Difference in row 4:
Column 'nationality': Initial value 'French', Modified value 'Unknown'
Difference in row 5:
Column 'nationality': Initial value 'American', Modified value 'Unknown'
Data in table 'painting_details' has differences.
Difference in row 2:
Column 'medium': Initial value 'Oil on canvas', Modified value 'Canvas'
Difference in row 3:
Column 'medium': Initial value 'Oil on canvas', Modified value 'Canvas'
Difference in row 4:
Column 'medium': Initial value 'Oil on canvas', Modified value 'Canvas'
Difference in row 5:
Column 'medium': Initial value 'Oil on canvas', Modified value 'Canvas'
Data in table 'paintings' has differences.
Difference in row 1:
Column 'year_drawn': Initial value '1503', Modified value '1900'
Difference in row 2:
Column 'year_drawn': Initial value '1889', Modified value '1900'

Explanation of Results:

The output confirms that the data alteration process was successful.
A new table named 'test_table' was added during the modification.
Differences were detected in the 'artists' table, where the nationality of all artists except those of Italian origin was changed to 'Unknown'.
Similarly, the 'painting_details' table saw modifications where the medium of all paintings originally listed as 'Oil on canvas' was updated to 'Canvas'.
The 'paintings' table also exhibited differences, specifically in the 'year_drawn' column. Years prior to 1900 were updated to 1900 to standardize the data.

By systematically comparing the initial and modified data, the script effectively identifies differences such as new tables, removed tables, differing columns, and data discrepancies. This comprehensive testing approach ensures the integrity and reliability of database operations.

Conclusion

Automated database testing plays a critical role in maintaining the integrity and reliability of data-driven applications. By systematically validating database operations, organizations can minimize the risk of errors, inconsistencies, and data corruption. The test automation script presented in this blog post exemplifies the power of automated testing in identifying differences between initial and modified data, ensuring the consistency of database operations.

Through the execution of the script, we observed how it effectively detected modifications such as new tables, changes in column values, and alterations to data entries. This systematic approach not only streamlines the testing process but also enhances confidence in the reliability of database operations.

As we reflect on the effectiveness of the test automation script, it becomes evident that automated database testing is indispensable for modern software development practices. By encouraging readers to implement similar testing practices in their database projects, we advocate for a proactive approach to data integrity and reliability, ultimately leading to improved software quality and user satisfaction.

The complete code examples can be found on our GitHub page. Additionally, a stored procedure has been included to restore the database to its original state. Feel free to explore and try it out!