In February 2026, an AI agent was pointed at McKinsey's internal platform with nothing but a domain name. Two hours later it had full read and write access to 46 million chat messages, 57,000 employee accounts, and decades of proprietary research. The vulnerability wasn't exotic. It was SQL injection, one of the oldest bugs in the book. And it was hiding in a place most automation suites never think to look: the JSON keys.
OWASP ZAP is one of the most widely used security scanners in the world. It's free, actively maintained, and has saved countless teams from shipping vulnerable code. But it didn't catch the McKinsey bug. Neither did McKinsey's own internal scanners, despite the platform running in production for over two years.
To understand why, you need to understand how most SQL injection scanners think.
When a tool like ZAP tests an endpoint for SQLi, it looks for places where user input flows into a database query. It finds those places by targeting the values in a request — query string parameters, form fields, JSON values, request bodies. It then fuzzes those values with a library of known payloads: single quotes, UNION statements, boolean conditions, sleep commands. If the response changes in a meaningful way, it flags the field as vulnerable.
This approach catches a huge proportion of real-world SQLi. The reason it's the default is that values are where user input almost always lives. When a user searches for something, filters a list, or submits a form, the data they provide ends up as a value. Scanners are built around this assumption, and for most applications it holds.
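To make that concrete, here is a minimal sketch of how a value-based fuzz pass generates mutated requests. The payload sample and function name are illustrative, not ZAP's actual engine:

```python
# A tiny sample of classic value payloads; real scanners carry hundreds.
SQLI_VALUE_PAYLOADS = ["'", "' OR '1'='1", "1 UNION SELECT NULL--"]

def mutate_values(params: dict):
    """Yield one mutated copy of the request per (field, payload) pair.

    Note what gets fuzzed: the values only. The keys pass through untouched.
    """
    for field in params:
        for payload in SQLI_VALUE_PAYLOADS:
            mutated = dict(params)
            mutated[field] = payload
            yield field, payload, mutated
```

A scanner replays each mutated request and diffs the responses against a baseline to decide whether a field is injectable.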
The blind spot is the structure of the request itself.
In the McKinsey case, the API accepted a JSON body where the keys represented the fields to search by. The values were safely parameterised — a developer had done the right thing there. But the keys were taken from the request and concatenated directly into the SQL string before execution. No scanner thought to put a single quote in a field name. No checklist included "what happens if the key is malformed?" The request structure was treated as trusted infrastructure, not as user input.
This is the category of bug that falls through the gap between "the scanner passed" and "the app is secure." It doesn't show up in automated reports. It doesn't trigger WAF rules written for value-based injection. It requires someone — or something — to think about the request differently and ask: what if the key itself is the attack surface?
Imagine you have a search endpoint. A client sends a JSON body like this:
{
  "user_id": "42"
}
Your backend receives that and builds a database query to find records where user_id equals 42. Straightforward enough.
Now imagine the developer who built this wanted it to be flexible. Instead of hardcoding which field to search by, they let the key in the JSON body decide. Whatever key the client sends becomes the column name in the query. The code looks something like this:
for key, value in body.items():
    sql = f"SELECT * FROM messages WHERE {key} = ?"
    rows = db.execute(sql, (value,))
Notice what's happening here. The value — the "42" part — is passed as a parameter using ?. This is the safe pattern. The database driver handles it, escaping anything dangerous before it touches the query. A developer looking at this code might feel confident. They're using parameterisation, which is the textbook defence against SQL injection.
But the key — the "user_id" part — is dropped directly into the query string using an f-string. No escaping. No validation. No parameterisation. The database receives whatever the client puts there, verbatim.
So what happens when a client sends this instead:
{
  "user_id'; DROP TABLE messages;--": "42"
}
The query becomes:
SELECT * FROM messages WHERE user_id'; DROP TABLE messages;-- = ?
The database is handed two statements: a broken SELECT, then a DROP TABLE. Whether the DROP actually executes depends on the driver (many, including Python's sqlite3 module, refuse stacked statements in a single execute call), but even where it is blocked, the resulting syntax error confirms the injection. And the ? placeholder at the end is never evaluated at all, because the trailing -- has commented it out.
That's the whole bug. One part of the request was protected. The other part was trusted without question. An attacker only needs to find the unprotected part, and in this case it was sitting in plain sight.
The reason this is easy to miss in code review is that the dangerous line looks safe. Parameterisation is present. The developer clearly knew about SQL injection. The mistake isn't ignorance — it's an assumption that the key is not user input. In a well-designed internal API that assumption might even be reasonable. But on a publicly accessible endpoint with no authentication, that assumption is a vulnerability.
Before writing any security test, you need something to test against. Running injection payloads at a production system, even your own, is risky. You need a local target that behaves exactly like the real bug.
We built a small Flask app with two endpoints. One is intentionally vulnerable, mirroring the McKinsey pattern exactly. The other has the fix applied. Both sit on top of a SQLite database seeded with dummy users and messages.
# Vulnerable: key is concatenated raw into the query
@app.route("/api/search", methods=["POST"])
def search():
    body = request.get_json()
    for key, value in body.items():
        sql = f"SELECT * FROM messages WHERE {key} = ?"
        rows = con.execute(sql, (value,))
    return jsonify({"results": [dict(r) for r in rows]})
# Fixed: key is checked against an allowlist first
ALLOWED_FIELDS = {"id", "user_id", "body"}

@app.route("/api/search/safe", methods=["POST"])
def search_safe():
    body = request.get_json()
    for key in body:
        if key not in ALLOWED_FIELDS:
            return jsonify({"error": f"Unknown field: {key}"}), 400
Having both endpoints in the same app is intentional. It lets your tests do two things at once: confirm the vulnerability exists in the broken version, and confirm the fix holds in the safe version. This pattern of keeping a vulnerable twin alongside the patched implementation is something worth carrying into your own security test projects.
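Stripped of the Flask plumbing, the core of both endpoints reduces to two plain functions over SQLite. This is a self-contained sketch of the lab's logic rather than the repo code verbatim; make_db stands in for the seeded database:

```python
import sqlite3

ALLOWED_FIELDS = {"id", "user_id", "body"}

def make_db() -> sqlite3.Connection:
    # In-memory stand-in for the lab's seeded SQLite database.
    con = sqlite3.connect(":memory:")
    con.row_factory = sqlite3.Row
    con.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id TEXT, body TEXT)")
    con.execute("INSERT INTO messages (user_id, body) VALUES ('1', 'hello')")
    return con

def vulnerable_search(con, body):
    # Mirrors the broken endpoint: the key is interpolated raw, so a
    # malformed key reaches the SQL parser and raises OperationalError.
    for key, value in body.items():
        sql = f"SELECT * FROM messages WHERE {key} = ?"
        return [dict(r) for r in con.execute(sql, (value,))]

def safe_search(con, body):
    # Mirrors the fixed endpoint: the key is allowlisted before it
    # gets anywhere near the query string.
    for key, value in body.items():
        if key not in ALLOWED_FIELDS:
            raise ValueError(f"Unknown field: {key}")
        sql = f"SELECT * FROM messages WHERE {key} = ?"
        return [dict(r) for r in con.execute(sql, (value,))]
```

Breaking the vulnerable twin is as simple as passing a key containing a single quote; the safe twin rejects the same key before the database ever sees it.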
The test suite covers three techniques. Each one probes the keys of the JSON body, not the values. Together they form a layered net: if the vulnerability exists, at least one of them will catch it.
The simplest technique. Send a malformed key and check whether the response leaks a database error. If the key is being concatenated into SQL, a single quote will break the query syntax and the database will complain. If that error message makes it into the HTTP response, you have confirmation of the vulnerability.
We maintain a list of known database error signatures and check every response against them:
DB_ERROR_SIGNATURES = [
    "syntax error",
    "no such column",
    "unrecognized token",
    "unclosed quotation",
]

def has_sql_error(resp: requests.Response) -> bool:
    text = resp.text.lower()
    return any(sig in text for sig in DB_ERROR_SIGNATURES)
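The payload list the test parametrises over might look like this. These entries are illustrative; the full list lives in the repo:

```python
# Each entry is (label, payload-as-key). Labels become readable pytest IDs.
ERROR_KEY_PAYLOADS = [
    ("single quote", "user_id'"),
    ("quote with comment", "user_id' --"),
    ("unbalanced parenthesis", "user_id)"),
    ("stacked statement", "user_id'; DROP TABLE messages;--"),
]
```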
The test then sends each payload as a key and asserts that the safe endpoint returns a 400 with no error leakage:
@pytest.mark.parametrize("label,key", ERROR_KEY_PAYLOADS)
def test_safe_endpoint_rejects_unknown_keys(self, label, key):
    resp = post(SAFE_URL, {key: "anything"})
    assert resp.status_code == 400
    assert not has_sql_error(resp)
Error messages are not always visible. A hardened application might suppress them while still executing the injected SQL. Boolean blind testing handles this case.
The idea is to send two requests: one with a condition that is always true, and one that is always false. If the application is vulnerable, the responses will differ because the database is actually evaluating the condition. If the fix is working, both requests get rejected identically before they ever reach the database.
def test_safe_endpoint_boolean_blind(self, label, true_key, false_key):
    resp_t = post(SAFE_URL, {true_key: "x"})
    resp_f = post(SAFE_URL, {false_key: "x"})
    assert resp_t.status_code == 400
    assert resp_f.status_code == 400

    len_diff = abs(len(resp_t.text) - len(resp_f.text))
    assert len_diff < 50
Both must be rejected with a 400 and the response sizes must be nearly identical. Any meaningful difference in content is a signal that the condition was evaluated, which means the key reached the database.
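The true/false key pairs themselves are not shown above; here is one illustrative pair for the WHERE {key} = ? pattern. It is deliberately quote-free so that the injected query stays syntactically valid and the condition is genuinely evaluated (the name and entries are hypothetical):

```python
# Each entry is (label, always-true key, always-false key). Against the
# vulnerable endpoint, the true key yields rows and the false key none;
# against the fixed endpoint, both are rejected identically.
BOOLEAN_KEY_PAYLOADS = [
    ("arithmetic pair",
     "user_id = user_id AND 1=1 AND id",   # condition always true
     "user_id = user_id AND 1=2 AND id"),  # condition always false
]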
The last line of defence covers cases where errors are suppressed and responses are deliberately uniform. A time-based test injects a payload designed to make the database pause. If the response is delayed, the query ran.
We use a heavy cross join against SQLite's internal schema table as a substitute for a sleep function, since SQLite does not have a native delay command:
TIME_PAYLOADS = [
    ("SQLite heavy query",
     "body' AND (SELECT COUNT(*) FROM sqlite_master m1, sqlite_master m2) > 0 AND '1'='1",
     5.0),
]
The test measures elapsed time and fails if the safe endpoint takes longer than the threshold. A properly implemented allowlist rejects the key in microseconds. If it takes seconds, the query executed.
def test_safe_endpoint_no_delay(self, label, key, min_delay):
    t0 = time.monotonic()
    resp = post(SAFE_URL, {key: "test"}, timeout=10)
    elapsed = time.monotonic() - t0
    assert resp.status_code == 400
    assert elapsed < min_delay
Three techniques, all targeting the same blind spot. None of them would be generated automatically by a standard scanner. Each one had to be written with the specific attack surface in mind, which is exactly the point.
The fix is straightforward. Before the key touches the query, check it against a list of known valid field names. If it is not on the list, reject the request immediately.
ALLOWED_FIELDS = {"id", "user_id", "body"}

for key in body:
    if key not in ALLOWED_FIELDS:
        return jsonify({"error": f"Unknown field: {key}"}), 400
Nothing gets concatenated into SQL until the key has been explicitly approved. It does not matter how clever the payload is. If the field name is not recognised, the request never reaches the database.
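A hardcoded allowlist is the right default. If the set of searchable columns genuinely needs to be dynamic, one hedged alternative is to derive the allowlist from the live schema instead of trusting the client. This sketch assumes SQLite and a server-controlled table name:

```python
import sqlite3

def columns_of(con: sqlite3.Connection, table: str) -> set[str]:
    # Build the allowlist from the table's actual schema. The table name
    # here must come from the server, never from the request.
    return {row[1] for row in con.execute(f"PRAGMA table_info({table})")}
```

A client-supplied key is then checked against columns_of(con, "messages") exactly as it would be against a hardcoded set.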
The important thing is that the tests verify both sides of this. They confirm that injected keys are rejected, and they confirm that legitimate keys still work. That second assertion is easy to overlook when writing security tests, but it matters. A fix that breaks real functionality is not a fix you can ship.
def test_known_good_key(self):
    resp = post(SAFE_URL, {"user_id": "1"})
    assert resp.status_code == 200
    assert "results" in resp.json()

def test_unknown_key_rejected(self):
    resp = post(SAFE_URL, {"totally_unknown_column": "x"})
    assert resp.status_code == 400
This is what makes the lab setup valuable. You can break the fix, watch the tests fail, restore it, and watch them pass. The feedback loop is immediate and the confidence it gives you is grounded in something real.
Three things worth acting on immediately.
Audit your existing test suite for key injection coverage. If your SQL injection tests only send payloads as values, they have a blind spot. Go through every endpoint that accepts a JSON body and ask whether the keys are validated before they touch any query.
Never treat scanner results as a security sign-off. ZAP and tools like it are excellent at catching known patterns at scale. They are not a substitute for tests written with your specific architecture in mind.
Treat request structure as an attack surface. Field names, header names, query parameter keys — anywhere your application reads the shape of a request rather than just its content is a place worth probing.
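As a starting point for that probing, a reusable key-fuzzing helper might look like this. The function and its heuristics are illustrative, not part of any scanner's API; post_fn is any callable with a requests.post-style signature:

```python
def fuzz_json_keys(post_fn, url, payload_keys, value="x"):
    """Send each payload as a JSON *key* and collect suspicious responses.

    A response is flagged when it is a server error or leaks a SQL syntax
    error, the same signals the error-based technique above relies on.
    """
    findings = []
    for key in payload_keys:
        resp = post_fn(url, json={key: value})
        if resp.status_code >= 500 or "syntax error" in resp.text.lower():
            findings.append((key, resp.status_code))
    return findings
```

Pointing this at every JSON endpoint in a staging environment is a cheap first pass before writing endpoint-specific tests.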
The cost of adding key injection coverage to your suite is low. The cost of missing it, as McKinsey found out, is not. The complete code examples for the lab app and the full pytest suite are available on our GitHub page.