The Green Report | Quarantining Tests: A pytest Pattern for Failing Tests Without Breaking CI

Quarantining Tests: A pytest Pattern for Failing Tests Without Breaking CI

Jun 20th 2026 5 min read

medium

ci/cd

github

strategy

A test starts failing, not because the feature broke, but because the test itself is flaky, outdated, or tied to an unstable dependency. Nobody has time to fix it right now, so it just sits there, turning the pipeline red and training the team to ignore failures. The usual fixes, deleting the test or skipping it silently, both throw away useful information. What you actually want is a middle ground: keep the test running, stop it from blocking the build, and make sure someone is forced to revisit it later. That's what test quarantine is for.

Quarantine is not the same as skip

It's tempting to reach for @pytest.mark.skip and move on. But skipping a test hides it completely. It stops running, stops reporting, and quietly disappears from your test output. A few months later nobody remembers it exists, let alone why.

Quarantine works differently. A quarantined test keeps running on every build. You still see whether it's passing or failing, and you can track whether it's getting better, worse, or just flaky in the same way every time. The only thing that changes is that its result no longer blocks the pipeline. It's visible, tracked, and temporary by design, not buried.

Defining a quarantine marker

The first step is giving yourself a way to tag a test as quarantined, along with enough context to act on it later. A bare @pytest.mark.skip("flaky") tells you nothing. You want a reason, a tracking ticket, and an expiry date.

Start by registering the marker in pytest.ini (or pyproject.toml) so pytest doesn't warn about unknown markers:

                
[pytest]
markers =
    quarantine(reason, ticket, expires): mark a test as quarantined, non-blocking, with metadata for tracking

Then use it directly on a test:

                
import pytest

@pytest.mark.quarantine(
    reason="Flaky due to race condition in async setup",
    ticket="QA-482",
    expires="2026-08-01"
)
def test_user_session_timeout():
    ...

At this point the marker is just metadata. It doesn't do anything on its own yet, that's what the next section handles: making pytest actually treat these tests differently at runtime.

Making quarantine non-blocking

Tagging a test with quarantine is just metadata until you tell pytest to act on it. The cleanest way is a conftest.py hook that converts quarantined tests into xfail at collection time. That way pytest still runs them and reports the outcome, but a failure won't tank your exit code.

                
# conftest.py
import pytest

def pytest_collection_modifyitems(config, items):
    for item in items:
        marker = item.get_closest_marker("quarantine")
        if marker:
            reason = marker.kwargs.get("reason", "Quarantined test")
            item.add_marker(pytest.mark.xfail(reason=reason, strict=False))

With strict=False, a quarantined test that fails shows up as XFAIL (expected failure, doesn't break the build), and one that unexpectedly passes shows up as XPASS (also doesn't break the build, but signals it might be ready to come out of quarantine). You get visibility in both directions without anyone having to babysit the test manually.

Wiring it into CI

The xfail conversion handles non-blocking at the pytest level, but it's worth splitting quarantined tests into their own CI job anyway. It keeps your stable test report clean, lets you set continue-on-error as a second safety net, and makes it easy to post quarantine results somewhere visible (a Slack channel, a dashboard) without mixing them into the main signal.

                
# .github/workflows/test.yml
jobs:
  test-stable:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v7
      - run: pip install -r requirements.txt
      - run: pytest -m "not quarantine"

  test-quarantine:
    runs-on: ubuntu-latest
    continue-on-error: true
    steps:
      - uses: actions/checkout@v7
      - run: pip install -r requirements.txt
      - run: pytest -m "quarantine"

test-stable only runs the tests that matter for merge decisions. test-quarantine runs the rest, reports its own results, and continue-on-error: true makes sure it never blocks a PR even if the xfail logic in conftest.py somehow doesn't apply.

The catch: expiry

Quarantine only works if it's temporary. Without a forcing function, tagged tests just accumulate forever, and you end up with a graveyard of xfail tests nobody looks at again. The expires field from the marker exists for exactly this reason: to give quarantine a built-in deadline.

A small check, run as a separate CI step, can enforce it:

                
# check_quarantine_expiry.py
import sys
from datetime import date
import pytest

def check_expired_quarantines(session):
    expired = []
    for item in session.items:
        marker = item.get_closest_marker("quarantine")
        if marker:
            expires = date.fromisoformat(marker.kwargs["expires"])
            if expires < date.today():
                expired.append((item.nodeid, marker.kwargs.get("ticket", "no ticket")))

    if expired:
        print("The following quarantined tests have expired:")
        for nodeid, ticket in expired:
            print(f"  {nodeid} (ticket: {ticket})")
        sys.exit(1)

Wire it into CI as its own step, run after collection, and it'll fail the build the moment a quarantined test outlives its deadline, forcing someone to either fix it or formally decide to extend the ticket. No more "we'll get to it eventually."

What's next

This whole setup still relies on a person noticing a test is flaky and manually tagging it. That works fine at a small scale, but it doesn't hold up once you have hundreds of tests and a team that's busy shipping features instead of triaging CI noise.

The natural next step is automating detection itself: rerunning failing tests a handful of times, tracking their pass rate over multiple builds, and auto-applying the quarantine marker once a test drops below a flakiness threshold. That removes the human bottleneck entirely, at the cost of needing somewhere to store historical run data. That's a good topic for a follow-up post on its own.

For now, a marker, a non-blocking hook, a split CI job, and an expiry check will get you most of the way there: tests stay visible, builds stay green, and nothing sits in limbo forever. A runnable version of this code example is available on our GitHub page.