
Claude Code Hooks as a Test Quality Gate

May 10th 2026 · 9 min read
medium
claude
ai/ml
shell
qa

Code review catches a lot of things, but test hygiene is rarely one of them. Hardcoded waits, missing assertions, and forbidden selectors tend to slip through because reviewers are focused on logic and coverage, not on whether a test is following framework conventions. By the time someone notices, the pattern has already been copied across five other test files. Claude Code hooks let you stop this at the source. They are shell commands that fire automatically as Claude Code edits your files, turning your conventions into an enforced gate rather than a guideline. This post shows you how to set them up specifically for QA automation, so bad test patterns get caught the moment they are written, not the moment they reach review.

What Are Claude Code Hooks and How Do They Work

Claude Code hooks are shell commands that run automatically when Claude Code performs certain actions in your project. Instead of relying on Claude Code to follow your rules every time, hooks let you enforce them deterministically at the tool level. No matter what Claude Code writes, your hook runs, checks it, and can stop the session in its tracks if something looks wrong.

There are two hook events that matter most for QA quality gates:

PreToolUse fires before Claude Code performs an action, such as writing a new file. This is useful for blocking something before it happens.

PostToolUse fires after Claude Code has performed an action, such as editing an existing file. This is useful for scanning what was just written and flagging issues immediately.

Hooks are configured in .claude/settings.json at the root of your project. The basic structure looks like this:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "bash .claude/hooks/qa_gate.sh"
          }
        ]
      }
    ]
  }
}

In this example, every time Claude Code edits or writes a file, it automatically runs a shell script. The matcher field is a regular expression over tool names, so Edit|Write covers both tools. Claude Code pipes a JSON description of the tool call to the hook on stdin, and the script pulls the modified file's path out of that payload. That script is where your quality checks live.
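
For reference, this is roughly what the payload for a write to a test file looks like, abbreviated to the fields the scripts in this post rely on (they use jq to extract file_path):

{
  "hook_event_name": "PostToolUse",
  "tool_name": "Write",
  "tool_input": {
    "file_path": "/home/user/project/tests/e2e/test_checkout.py",
    "content": "..."
  }
}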

The key behaviour to understand is what happens when a hook fails. If your script exits with code 2, Claude Code treats it as a blocking failure and feeds everything the script wrote to stderr back into the session, giving the model visibility into exactly what was flagged before anything else continues. Other non-zero exit codes surface the output to you without blocking, which is why the scripts in this post send their diagnostics to stderr and exit with 2 on a violation.
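
PreToolUse hooks behave the same way, except that a blocking exit fires before the tool call runs, so the action never happens at all. As a minimal sketch, here is a hypothetical gate that stops Claude Code from hand-editing generated fixture files (the protect_fixtures.sh name and the tests/fixtures/ layout are illustrative, not a convention from any particular framework):

#!/bin/bash
# .claude/hooks/protect_fixtures.sh (hypothetical example)

# Diagnostics go to stderr so Claude Code can read them on a blocking exit.
exec >&2

# The target file's path arrives as JSON on stdin (requires jq).
FILE=$(jq -r '.tool_input.file_path // empty')

# Exit 2 here blocks the Write before it ever touches the disk.
if [[ "$FILE" == */tests/fixtures/* ]]; then
  echo "Blocked: fixture files are generated. Edit the generator instead."
  exit 2
fi

exit 0

Registered under a PreToolUse key with a Write matcher, this runs before the file exists rather than after.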

Setting Up Your First Hook: Detecting Hardcoded Waits

Hardcoded waits are one of the most common test smells in any automation framework. They make your suite slower, they mask real timing issues, and they tend to multiply once one gets through review. The fix is always the same: use a proper wait condition instead. The problem is that Claude Code can introduce them too, especially when it is iterating quickly through a failing test loop.

A hook that catches them immediately after a file is edited is one of the highest-value quality gates you can add.

Here is what the hook script looks like:

#!/bin/bash
# .claude/hooks/check_hardcoded_waits.sh

# Send all output to stderr so Claude Code can read it on a blocking exit.
exec >&2

# The edited file's path arrives as JSON on stdin (requires jq);
# CLAUDE_PROJECT_DIR is set by Claude Code, so strip it to get a relative path.
FILE=$(jq -r '.tool_input.file_path // empty')
FILE=${FILE#"$CLAUDE_PROJECT_DIR"/}

# Only check Python test files
if [[ "$FILE" != tests/*.py ]]; then
  exit 0
fi

# Check for hardcoded wait patterns
if grep -nHE "time\.sleep\(|waitForTimeout\(|Thread\.sleep\(" "$FILE"; then
  echo ""
  echo "QA GATE FAILED: Hardcoded wait detected in $FILE"
  echo "Replace with an explicit wait condition:"
  echo "  Playwright: page.wait_for_selector(), expect(locator).to_be_visible()"
  echo "  Selenium:   WebDriverWait(driver, timeout).until(...)"
  exit 2
fi

exit 0

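Before wiring the script up, you can smoke-test it by hand with a fake payload. Assuming jq is installed and tests/e2e/test_checkout.py contains a time.sleep call, the gate should fire and return the blocking exit code:

CLAUDE_PROJECT_DIR=$PWD bash .claude/hooks/check_hardcoded_waits.sh \
  <<< '{"tool_input":{"file_path":"'"$PWD"'/tests/e2e/test_checkout.py"}}'
echo "exit code: $?"  # expect 2 when a hardcoded wait is present
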
And the corresponding hook configuration in .claude/settings.json:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "bash .claude/hooks/check_hardcoded_waits.sh"
          }
        ]
      }
    ]
  }
}

When Claude Code edits a test file and the hook fires, here is what a failure looks like in the terminal:

tests/e2e/test_checkout.py:14: time.sleep(2)

QA GATE FAILED: Hardcoded wait detected in tests/e2e/test_checkout.py
Replace with an explicit wait condition:
  Playwright: page.wait_for_selector(), expect(locator).to_be_visible()
  Selenium: WebDriverWait(driver, timeout).until(...)

Claude Code sees this output, recognises the hook failed, and will attempt to fix the issue before continuing. In most cases it will replace the hardcoded wait with the correct pattern on its own, which is exactly the behaviour you want.

Enforcing Your Selector Strategy

Even with your selector strategy clearly documented in CLAUDE.md, forbidden selectors still sneak in. Claude Code might borrow a pattern from an older test file, or infer a selector from the page structure rather than following your conventions. A hook that scans every edited file for non-approved selector patterns closes that gap permanently.
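
If your suite predates the convention, it is worth sweeping it once before enabling the hook, so the gate does not immediately fire on legacy files. A one-off grep over the same forbidden patterns does the job:

grep -rnE "By\.XPATH|find_element\(By\.ID|find_element\(By\.CLASS_NAME|css='" tests/ pages/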

Here is the hook script:

#!/bin/bash
# .claude/hooks/check_selectors.sh

# Send all output to stderr so Claude Code can read it on a blocking exit.
exec >&2

# The edited file's path arrives as JSON on stdin (requires jq).
FILE=$(jq -r '.tool_input.file_path // empty')
FILE=${FILE#"$CLAUDE_PROJECT_DIR"/}

# Only check Python test files and page objects
if [[ "$FILE" != tests/*.py && "$FILE" != pages/*.py ]]; then
  exit 0
fi

VIOLATIONS=0

# Check for forbidden selector patterns
if grep -nHE "By\.XPATH|find_element\(By\.ID|find_element\(By\.CLASS_NAME|css='" "$FILE"; then
  echo ""
  echo "QA GATE FAILED: Forbidden selector pattern detected in $FILE"
  echo "Your framework only allows data-testid attributes:"
  echo "  Correct:   page.locator(\"[data-testid='submit-btn']\")"
  echo "  Forbidden: By.XPATH, By.ID, By.CLASS_NAME, css selectors"
  VIOLATIONS=1
fi

# Flag selector calls that do not mention data-testid on the same line
if grep -nHE "locator\(|find_element\(" "$FILE" | grep -v "data-testid"; then
  echo ""
  echo "QA GATE WARNING: Selector found that may not use data-testid in $FILE"
  echo "Verify that all locators follow the data-testid convention."
  VIOLATIONS=1
fi

if [ $VIOLATIONS -ne 0 ]; then
  exit 2
fi

exit 0

And the hook configuration:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "bash .claude/hooks/check_selectors.sh"
          }
        ]
      }
    ]
  }
}

When a violation is detected, the terminal output looks like this:

tests/e2e/test_login.py:22: self.driver.find_element(By.ID, "username").send_keys(email)

QA GATE FAILED: Forbidden selector pattern detected in tests/e2e/test_login.py
Your framework only allows data-testid attributes:
  Correct: page.locator("[data-testid='username-input']")
  Forbidden: By.XPATH, By.ID, By.CLASS_NAME, css selectors

What makes this hook particularly valuable is that it enforces a rule that is very easy to get wrong under time pressure. When Claude Code is iterating quickly through a failing test, it will sometimes fall back on whatever selector is most obvious from the page structure. This hook ensures that convenience never overrides your conventions.

Catching Missing Assertions

A test that navigates, fills in fields, clicks buttons, and never asserts anything is one of the most dangerous patterns in an automation suite. It will pass every single time, give you a false sense of coverage, and tell you nothing when something breaks. Claude Code can produce these too, particularly when it is focused on getting a test to run without errors rather than on what the test is actually verifying.

Here is a hook that checks every edited test file for the presence of at least one assertion:

#!/bin/bash
# .claude/hooks/check_assertions.sh

# Send all output to stderr so Claude Code can read it on a blocking exit.
exec >&2

# The edited file's path arrives as JSON on stdin (requires jq).
FILE=$(jq -r '.tool_input.file_path // empty')
FILE=${FILE#"$CLAUDE_PROJECT_DIR"/}

# Only check test files
if [[ "$FILE" != tests/*.py ]]; then
  exit 0
fi

# Skip files that contain no test functions
if ! grep -qE "def test_" "$FILE"; then
  exit 0
fi

# Count test functions in the file
TEST_COUNT=$(grep -cE "def test_" "$FILE")

# Count assertion patterns
ASSERTION_COUNT=$(grep -cE "assert |expect\(|should\.|to_be_|to_have_|to_contain_" "$FILE")

if [ "$ASSERTION_COUNT" -eq 0 ]; then
  echo ""
  echo "QA GATE FAILED: No assertions found in $FILE"
  echo "Every test must verify at least one expected outcome."
  echo "Examples:"
  echo "  assert page.url.endswith('/dashboard')"
  echo "  expect(page.locator(\"[data-testid='success-msg']\")).to_be_visible()"
  exit 2
fi

# Warn if there are more test functions than assertions
if [ "$TEST_COUNT" -gt "$ASSERTION_COUNT" ]; then
  echo ""
  echo "QA GATE WARNING: $FILE has $TEST_COUNT test functions but only $ASSERTION_COUNT assertions."
  echo "Some tests may be missing meaningful verifications."
fi

exit 0

And the hook configuration:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "bash .claude/hooks/check_assertions.sh"
          }
        ]
      }
    ]
  }
}

A failure in the terminal looks like this:

QA GATE FAILED: No assertions found in tests/e2e/test_profile.py
Every test must verify at least one expected outcome.
Examples:
  assert page.url.endswith('/dashboard')
  expect(page.locator("[data-testid='success-msg']")).to_be_visible()

The warning case is equally useful. If a file has five test functions but only one assertion, that is a strong signal that Claude Code wrote tests that interact with the UI without actually verifying anything meaningful. Catching this at the file edit stage means you are not discovering it during a failed production deployment.
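
For the warning case, the output looks like this (counts are illustrative):

QA GATE WARNING: tests/e2e/test_profile.py has 5 test functions but only 2 assertions.
Some tests may be missing meaningful verifications.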

Combining Hooks into a Full QA Gate

Running three separate hooks works, but consolidating them into a single script is cleaner and easier to maintain. One script, one hook configuration entry, and a clear output that tells you exactly what failed and why.

Here is the combined script:

#!/bin/bash
# .claude/hooks/qa_gate.sh

# Send all output to stderr so Claude Code can read it on a blocking exit.
exec >&2

# The edited file's path arrives as JSON on stdin (requires jq).
FILE=$(jq -r '.tool_input.file_path // empty')
FILE=${FILE#"$CLAUDE_PROJECT_DIR"/}

VIOLATIONS=0

# Only check Python test files and page objects
if [[ "$FILE" != tests/*.py && "$FILE" != pages/*.py ]]; then
  exit 0
fi

echo "Running QA gate checks on $FILE..."

# Check 1: Hardcoded waits
if grep -nHE "time\.sleep\(|waitForTimeout\(|Thread\.sleep\(" "$FILE"; then
  echo ""
  echo "FAILED [Hardcoded Wait]: Replace with an explicit wait condition."
  echo "  Playwright: page.wait_for_selector(), expect(locator).to_be_visible()"
  echo "  Selenium:   WebDriverWait(driver, timeout).until(...)"
  VIOLATIONS=$((VIOLATIONS + 1))
fi

# Check 2: Forbidden selectors
if grep -nHE "By\.XPATH|find_element\(By\.ID|find_element\(By\.CLASS_NAME|css='" "$FILE"; then
  echo ""
  echo "FAILED [Forbidden Selector]: Use data-testid attributes only."
  echo "  Correct:   page.locator(\"[data-testid='submit-btn']\")"
  echo "  Forbidden: By.XPATH, By.ID, By.CLASS_NAME, css selectors"
  VIOLATIONS=$((VIOLATIONS + 1))
fi

# Check 3: Missing assertions (test files only)
if [[ "$FILE" == tests/*.py ]] && grep -qE "def test_" "$FILE"; then
  ASSERTION_COUNT=$(grep -cE "assert |expect\(|should\.|to_be_|to_have_|to_contain_" "$FILE")
  TEST_COUNT=$(grep -cE "def test_" "$FILE")

  if [ "$ASSERTION_COUNT" -eq 0 ]; then
    echo ""
    echo "FAILED [Missing Assertions]: Every test must verify at least one outcome."
    VIOLATIONS=$((VIOLATIONS + 1))
  elif [ "$TEST_COUNT" -gt "$ASSERTION_COUNT" ]; then
    echo ""
    echo "WARNING [Assertion Coverage]: $TEST_COUNT tests but only $ASSERTION_COUNT assertions found."
    echo "Some tests may be missing meaningful verifications."
  fi
fi

# Summary
if [ $VIOLATIONS -gt 0 ]; then
  echo ""
  echo "QA gate failed with $VIOLATIONS violation(s) in $FILE"
  echo "Claude Code will attempt to fix the issues above before continuing."
  exit 2
fi

echo "QA gate passed."
exit 0

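If you want to watch the combined gate fire without involving Claude Code at all, you can feed it a deliberately bad file by hand (the test body below is a throwaway illustration):

mkdir -p tests/e2e
cat > tests/e2e/test_demo.py <<'EOF'
import time

def test_demo(page):
    page.goto("/checkout")
    time.sleep(2)  # hardcoded wait, and no verification afterwards
EOF

CLAUDE_PROJECT_DIR=$PWD bash .claude/hooks/qa_gate.sh \
  <<< '{"tool_input":{"file_path":"'"$PWD"'/tests/e2e/test_demo.py"}}'

This should report both a hardcoded wait and a missing assertion, then exit with code 2.
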
And the single hook configuration entry that replaces all three individual ones:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "bash .claude/hooks/qa_gate.sh"
          }
        ]
      }
    ]
  }
}

When multiple violations are detected in the same file, the output gives you a clear picture of everything that needs fixing in one go:

Running QA gate checks on tests/e2e/test_checkout.py...

tests/e2e/test_checkout.py:14: time.sleep(2)

FAILED [Hardcoded Wait]: Replace with an explicit wait condition.
  Playwright: page.wait_for_selector(), expect(locator).to_be_visible()
  Selenium: WebDriverWait(driver, timeout).until(...)

tests/e2e/test_checkout.py:22: self.driver.find_element(By.ID, "confirm-btn").click()

FAILED [Forbidden Selector]: Use data-testid attributes only.
  Correct: page.locator("[data-testid='confirm-btn']")
  Forbidden: By.XPATH, By.ID, By.CLASS_NAME, css selectors

QA gate failed with 2 violation(s) in tests/e2e/test_checkout.py
Claude Code will attempt to fix the issues above before continuing.

The important thing to understand here is what happens next. When a hook exits with the blocking code, Claude Code reads the diagnostics it wrote to stderr, understands what was flagged, and attempts to fix the violations before moving on. In practice this means the gate rarely stops your workflow for long. Claude Code self-corrects, the hook fires again on the corrected edit, and work continues once everything passes. You get the enforcement without the interruption.

As your framework evolves, the qa_gate.sh script becomes the natural home for any new convention you want to enforce. Add a check, commit the script, and every future Claude Code session in that project will respect it automatically.
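
For example, if your conventions forbid skipping tests without a documented reason, a fourth check appended to qa_gate.sh might look like this (the rule itself is just an illustration):

# Check 4: Skipped tests must carry a reason
if grep -nHE "@pytest\.mark\.skip\(\)|@pytest\.mark\.skip$" "$FILE"; then
  echo ""
  echo "FAILED [Unexplained Skip]: Use @pytest.mark.skip(reason=\"...\") instead."
  VIOLATIONS=$((VIOLATIONS + 1))
fi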

Conclusion

Bad test patterns are not always the result of carelessness. They happen because the feedback loop is too slow. By the time a hardcoded wait or a missing assertion reaches code review, it has already cost someone time. Claude Code hooks move that feedback to the earliest possible moment, the instant a file is written, without adding any manual steps to your workflow.

Start with one check from the qa_gate.sh script, get comfortable with how hooks behave, and expand from there. The enforcement scales with your conventions, so the more rules you document, the more useful the gate becomes.

The complete code for all hook scripts and configuration examples covered in this post is available on our GitHub page.