
Extension Hooks in Promptfoo: Building a Custom LLM Testing Pipeline That Adapts at Runtime

Jan 4th 2026 · 10 min read · medium

Tags: ai/ml, JavaScript (ES6), promptfoo 0.120.8, CI/CD

When testing LLM applications, you often need more flexibility than static YAML configurations can provide. Maybe you need to load test cases from a database at runtime, enrich test data with customer context from your CRM, or generate custom reports for your CI/CD pipeline. Promptfoo has a powerful but rarely discussed feature called extension hooks that makes all of this possible. These hooks let you inject custom code at four key points in the evaluation lifecycle, transforming a static test suite into a dynamic, adaptive pipeline. In this post, we'll build a complete example that demonstrates all four hooks while testing a customer support chatbot.

Understanding the Hook Lifecycle

Promptfoo's extension hooks follow a familiar pattern that QA automation engineers will recognize from testing frameworks like pytest, JUnit, or Mocha. There are four hooks available, each executing at a specific point in the evaluation lifecycle.

The Four Hooks

beforeAll runs once before the entire test suite begins. This is where you set up global fixtures, load additional test cases from external sources, or add default assertions that should apply to every test. Think of it as your suite-level setup method.

beforeEach runs before each individual test case. Use this hook to enrich test data with runtime context, modify variables based on external state, or add conditional assertions based on test properties. If you've ever wanted to inject customer data from a CRM or add stricter validations for certain test categories, this is where you do it.

afterEach runs after each individual test case completes. This hook is perfect for tracking metrics, logging results to external systems, or triggering alerts when tests fail. You have access to both the test configuration and the evaluation result, making it easy to correlate inputs with outputs.

afterAll runs once after the entire test suite finishes. Use this hook to generate summary reports, calculate aggregate statistics, send notifications to Slack or other channels, or implement quality gates for your CI/CD pipeline.

How Context Mutation Works

The beforeAll and beforeEach hooks have a special capability: they can modify the evaluation state. When promptfoo calls these hooks, it passes a context object containing the relevant data. You can mutate specific properties of this context to change how tests run.

Here's the critical part: to persist your changes, the hook must return the modified context. If you forget to return the context, your modifications will be lost.

                
async function extensionHook(hookName, context) {
  if (hookName === 'beforeAll') {
    // Add a new test case dynamically
    context.suite.tests.push({
      description: 'Dynamically added test',
      vars: { input: 'test value' },
    });
    
    // IMPORTANT: Return context to persist changes
    return context;
  }
  
  if (hookName === 'beforeEach') {
    // Modify variables for this specific test
    context.test.vars.timestamp = new Date().toISOString();
    
    // IMPORTANT: Return context to persist changes
    return context;
  }
  
  // afterEach and afterAll don't need to return anything
  // because they run after the evaluation is complete
}
                

The afterEach and afterAll hooks don't need to return anything since they execute after the evaluation has already completed. Any data you want to preserve from these hooks should be written to external storage like files, databases, or monitoring systems.
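For example, here's a minimal sketch of an afterEach branch that appends each result to a local JSONL file. The results/ directory matches the project layout we'll set up below, but the file name and record shape are assumptions:

// At the top of hooks.js (CommonJS; use import if your hooks file is an ES module)
const fs = require('fs');
const path = require('path');

// Inside extensionHook(hookName, context)
if (hookName === 'afterEach') {
  const record = {
    description: context.test.description,
    success: context.result.success,
    score: context.result.score,
    latencyMs: context.result.latencyMs,
  };

  // Append one JSON line per completed test for later analysis
  fs.mkdirSync('results', { recursive: true });
  fs.appendFileSync(
    path.join('results', 'eval-log.jsonl'),
    JSON.stringify(record) + '\n'
  );
}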

What's Available in Each Context

Each hook receives different data in its context object:

  • beforeAll: context.suite.tests, context.suite.prompts, context.suite.defaultTest, context.suite.scenarios
  • beforeEach: context.test.description, context.test.vars, context.test.assert, context.test.options
  • afterEach: context.test (same as beforeEach), context.result.success, context.result.score, context.result.latencyMs
  • afterAll: context.suite (same as beforeAll), context.results (array of all test results)

Understanding this lifecycle and context structure is essential before we dive into the practical implementation. In the next section, we'll build a real example that uses all four hooks to create a dynamic testing pipeline for a customer support chatbot.

Practical Example: Dynamic Test Case Injection

Let's build a practical example that demonstrates the real power of extension hooks. We'll create a testing pipeline for a customer support chatbot that dynamically loads test cases, enriches them with customer data, and generates custom reports.

Project Setup

Before we look at the hooks, here's the basic structure of our project:

  • 📁 promptfoo-hooks
    • 📄 promptfooconfig.yaml
    • 📄 hooks.js
    • 📁 prompts
      • 📄 customer_support_v1.txt
      • 📄 customer_support_v2.txt
    • 📁 results
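
The prompt files are plain-text templates that reference the same variables our test cases supply. As an illustrative sketch (the wording is an assumption, but the {{variable}} placeholders are how promptfoo substitutes test vars), customer_support_v1.txt might look like this:

You are a friendly customer support agent for an online store.

Customer name: {{customer_name}}
Inquiry type: {{inquiry_type}}
Customer message: {{message}}

Respond helpfully and concisely, and escalate to a human agent when you cannot resolve the issue.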

The configuration file references our hooks using the extensions property. Notice that you must specify both the file path and the function name, separated by a colon.

                
extensions:
  - file://hooks.js:extensionHook
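
For orientation, the rest of promptfooconfig.yaml sits alongside that extensions entry. Here's a minimal sketch; the provider and the static test case are illustrative placeholders, not requirements:

prompts:
  - file://prompts/customer_support_v1.txt
  - file://prompts/customer_support_v2.txt

providers:
  - openai:gpt-4o-mini

tests:
  - description: 'Static test: Billing question'
    vars:
      customer_name: 'Alex'
      inquiry_type: 'billing'
      message: 'Why was I charged twice this month?'
    assert:
      - type: llm-rubric
        value: 'Response addresses the duplicate charge politely'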
                
Loading Test Cases from External Sources

One of the most common needs in enterprise testing is loading test cases from external systems like databases, test management tools, or spreadsheets. The beforeAll hook is perfect for this because it runs once before any tests execute.

Let's start with a function that simulates fetching test cases from an external source. In a real scenario, this could query a database, call an API, or read from a shared Google Sheet.

                
function loadDynamicTestCases() {
  console.log('[beforeAll] Loading dynamic test cases...');
  
  return [
    {
      description: '[Dynamic] Edge case: Empty message',
      vars: {
        customer_name: 'EdgeUser1',
        inquiry_type: 'unknown',
        message: '',
      },
      assert: [
        {
          type: 'llm-rubric',
          value: 'Response gracefully handles empty input',
        },
      ],
    },
    {
      description: '[Dynamic] Multilingual: Spanish inquiry',
      vars: {
        customer_name: 'Maria',
        inquiry_type: 'general',
        message: 'Hola, necesito ayuda con mi pedido',
      },
      assert: [
        {
          type: 'llm-rubric',
          value: 'Response acknowledges the Spanish language',
        },
      ],
    },
  ];
}
                

Now we inject these test cases into the suite using the beforeAll hook. The key here is accessing context.suite.tests and pushing our dynamic tests into the array.

                
if (hookName === 'beforeAll') {
  const dynamicTests = loadDynamicTestCases();
  
  context.suite.tests.push(...dynamicTests);
  console.log(`Injected ${dynamicTests.length} dynamic test cases`);
  
  return context;
}
                

When the evaluation runs, promptfoo will execute both the static tests defined in your YAML file and the dynamically injected tests. From promptfoo's perspective, there's no difference between them.

Adding Global Assertions

The beforeAll hook also lets you add assertions that should apply to every test case. This is useful for enforcing baseline quality standards across your entire suite.

Let's add a global assertion that checks response length. We do this by modifying context.suite.defaultTest, which contains properties that get merged into every test case.

                
if (hookName === 'beforeAll') {
  if (!context.suite.defaultTest) {
    context.suite.defaultTest = { assert: [] };
  }
  
  context.suite.defaultTest.assert.push({
    type: 'javascript',
    value: 'output.length > 10 && output.length < 2000',
  });
  
  console.log('Added global response length assertion');
  
  return context;
}
                

Every test case will now automatically include this assertion, ensuring that no response is too short or excessively long.

Modifying Variables at Runtime

The beforeEach hook opens up even more possibilities. Since it runs before each individual test, you can customize tests based on their properties or external state.

Here's a fun example from the promptfoo documentation: changing all languages to "Pirate dialect". While whimsical, it demonstrates how you can transform variables programmatically.

                
if (hookName === 'beforeEach') {
  context.test.vars.language = `Pirate ${context.test.vars.language}`;
  
  return context;
}
                

A more practical use case is injecting timestamps or environment information into your tests. This helps with debugging and traceability.

                
if (hookName === 'beforeEach') {
  context.test.vars.test_timestamp = new Date().toISOString();
  context.test.vars.environment = process.env.NODE_ENV || 'development';
  
  return context;
}
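
You can also branch on test properties, which is how you'd implement the "stricter validations for certain test categories" idea mentioned earlier. Here's a sketch that reuses the inquiry_type variable from our chatbot tests; the rubric wording is just an example:

if (hookName === 'beforeEach') {
  // Billing conversations get an extra, stricter assertion
  if (context.test.vars.inquiry_type === 'billing') {
    context.test.assert = context.test.assert || [];
    context.test.assert.push({
      type: 'llm-rubric',
      value: 'Response does not promise a refund before verifying the account',
    });
  }

  return context;
}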
                
Environment-Based Configuration

Hooks can also adapt your test suite based on the runtime environment. For example, you might want stricter thresholds when running in CI compared to local development.

                
if (hookName === 'beforeAll') {
  const isCI = process.env.CI === 'true';
  
  if (isCI) {
    console.log('CI environment detected');
    
    // Guard in case defaultTest or its assert list hasn't been defined yet
    context.suite.defaultTest = context.suite.defaultTest || {};
    context.suite.defaultTest.assert = context.suite.defaultTest.assert || [];
    
    context.suite.defaultTest.assert.push({
      type: 'latency',
      threshold: 5000, // milliseconds
    });
  }
  
  return context;
}
                

When your tests run in a CI pipeline, they'll automatically enforce a 5 second latency threshold. Local runs skip this check, giving developers more flexibility during experimentation.

The combination of beforeAll for suite-level setup and beforeEach for test-level customization gives you fine-grained control over your evaluation pipeline. In the next section, we'll put these hooks to work in a CI/CD pipeline with quality gates, reports, and notifications.

CI/CD Integration Tips

Extension hooks become especially valuable when integrated into your CI/CD pipeline. The afterAll hook can enforce quality gates, generate reports in CI-friendly formats, and send notifications when tests fail.

Implementing Quality Gates

A quality gate prevents deployments when test quality drops below acceptable thresholds. You can implement this in the afterAll hook by checking the pass rate and setting the exit code accordingly.

                
if (hookName === 'afterAll') {
  const totalTests = context.results.length;
  const passed = context.results.filter(r => r.success).length;
  const passRate = (passed / totalTests) * 100;
  
  console.log(`Pass rate: ${passRate.toFixed(1)}%`);
  
  if (process.env.CI === 'true' && passRate < 80) {
    console.log('Quality gate failed: pass rate below 80%');
    process.exitCode = 1;
  }
}
                

When running in CI, this hook will fail the build if fewer than 80% of tests pass. Local development runs skip this check, allowing developers to iterate without blocking on incomplete work.
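
Quality gates pair naturally with report generation. Here's a sketch of an afterAll branch that also writes a machine-readable summary into the results/ directory from our project layout; the file name and fields are assumptions:

// At the top of hooks.js (if not already required)
const fs = require('fs');

// Inside extensionHook(hookName, context)
if (hookName === 'afterAll') {
  const totalTests = context.results.length;
  const passed = context.results.filter(r => r.success).length;

  const summary = {
    generatedAt: new Date().toISOString(),
    totalTests,
    passed,
    failed: totalTests - passed,
    passRate: totalTests > 0 ? (passed / totalTests) * 100 : 0,
  };

  // Write a summary report that CI can archive as a build artifact
  fs.mkdirSync('results', { recursive: true });
  fs.writeFileSync('results/summary.json', JSON.stringify(summary, null, 2));
  console.log('Wrote results/summary.json');
}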

JUnit Output for CI Systems

Most CI platforms like Jenkins, GitLab CI, and GitHub Actions natively support JUnit XML reports. Promptfoo can output results in this format using the --output flag.

                
npx promptfoo eval --output results.xml
                

Combine this with your hook-generated reports to get both machine-readable results for CI and human-readable summaries for debugging.

Caching for Faster Builds

LLM API calls are slow and expensive. Promptfoo's caching system stores responses locally so identical prompts don't trigger new API calls. Configure caching in your CI pipeline using environment variables.

                
# GitHub Actions example
env:
  PROMPTFOO_CACHE_PATH: ~/.cache/promptfoo
  PROMPTFOO_CACHE_TTL: 86400  # 24 hours

steps:
  - name: Cache promptfoo results
    uses: actions/cache@v4
    with:
      path: ~/.cache/promptfoo
      key: promptfoo-${{ hashFiles('prompts/**', 'promptfooconfig.yaml') }}
      
  - name: Run evaluation
    run: npx promptfoo eval --output results.xml
                

The cache key includes a hash of your prompts and configuration, so the cache invalidates automatically when you make changes that would affect results.

Notifications from Hooks

The afterAll hook can send notifications to Slack, Teams, or any webhook-based service. This keeps your team informed about test results without checking the CI dashboard.

                
if (hookName === 'afterAll') {
  const passed = context.results.filter(r => r.success).length;
  const failed = context.results.length - passed;
  
  if (process.env.SLACK_WEBHOOK_URL && failed > 0) {
    await fetch(process.env.SLACK_WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        text: `LLM Eval: ${failed} tests failed out of ${context.results.length}`,
      }),
    });
  }
}
                

By combining quality gates, caching, and notifications, your hooks transform promptfoo from a local testing tool into a production-grade CI/CD component.

Conclusion

Extension hooks unlock a level of flexibility that transforms promptfoo from a simple evaluation tool into a dynamic testing framework. By injecting code at key points in the lifecycle, you can load tests from any data source, enrich them with runtime context, track custom metrics, and enforce quality gates in your CI/CD pipeline. The patterns we covered here are just the starting point. Once you understand how hooks interact with the evaluation lifecycle, you can adapt them to fit almost any testing workflow.

The complete code examples from this post are available on our GitHub page. Clone the repository, add your API key, and run npx promptfoo eval to see the hooks in action.

Further Reading: