Credential-Blind Agentic Pentesting, Part II: Deny by Default, or How I Stopped Writing Regexes

This is the second article in the series. Part I built the core: a host-side proxy that swaps secrets and identities for stable typed tokens before the model sees them, and swaps them back to real values before the command runs. It worked on real machines. But I want to be honest about how it worked, because the how is the whole reason this part exists.
It worked because I taught it the shape of each tool. I knew what a secretsdump line looks like, what a netexec --users table looks like, what a cracked hash from hashcat looks like, and I wrote a parser for each. That is a blocklist. Blocklists do not scale to every tool in the world, and worse, they cannot catch a secret that has no shape at all. So this part inverts the problem, tests the inversion live, and then asks the literature whether anyone has done this before.

The uncomfortable question

Part I ended on a real result and a real box. For this part I changed the box on purpose, because changing the target between sessions is the only honest way to test whether the thing generalizes or whether I have quietly overfit to one machine. The new target is HTB Shibuya: a Windows Server 2022 domain controller, shibuya.vl, with an Active Directory Certificate Services misconfiguration at the end of the chain. Different domain, different tooling, different final technique from the box in Part I.

I started by reusing the proxy from Part I, the one with the tool-aware detection packs. And while I was extending those packs to cover Shibuya’s quirks, the uncomfortable question landed, the one I had been avoiding:

You are writing a regex for --users, a parser for secretsdump, a rule for the MSSQL identity functions. On a real engagement there will be a tool you have never seen, with an output you have never parsed. What then? Is this research, or is it a very long list of regexes?

It is a fair question, and the answer was uncomfortable, so I sat with it. The packs are not research. They are maintenance. They are a treadmill: every new tool, every new output format, every new box is another rule. And there is a deeper problem underneath the treadmill, one that no amount of rules can fix.

Why detection can never be complete

Here is the very first secret Shibuya hands you. You authenticate as a low-privilege machine account, you enumerate users, and one account has its password sitting in the description field:

SMB   shibuya.vl   445   AWSJPDC0522   svc_autojoin   2025-02-15 07:51:49 0   K5&A6Dw9d8jrKWhV

K5&A6Dw9d8jrKWhV is a password. Look at it. There is nothing about that string, as a string, that says password. It is not a hash with a fixed length. It is not a Kerberos ticket with a $krb5 prefix. It is sixteen random-looking characters that could just as easily be a token, a filename, a serial number, or noise. The only reason I know it is a password is that I know netexec --users prints each account’s description, and administrators sometimes leave passwords there. That is tool knowledge. That is a pack.

And it gets worse, because the most valuable secret of all has the least shape. A cracked password is, by construction, a human-memorable string. Later on Shibuya I capture a relayed NetNTLMv2 hash for one account, crack it, and get back Sail2Boat3. There is no regex on Earth that distinguishes Sail2Boat3 the password from Sail2Boat3 the boat name in a sentence. Detection by form is not merely hard here. It is information-theoretically impossible. A signature-less secret has no signature to match.

So a blocklist, the detect-the-bad approach, has two failure modes that are not bugs but properties. It is incomplete against tools you have not seen, and it is incomplete against secrets that have no form. You cannot regex your way out of either. I needed to stop detecting.
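To make the two failure modes concrete, here is a minimal sketch of the detect-the-bad approach, using a hypothetical two-rule pack far smaller than a real one: a 32-hex-character NT hash and a $krb5 ticket prefix. Each rule recognizes a secret by its form, which is exactly why neither can see a secret that has none.

```typescript
// Hypothetical signature rules of the kind a detection pack contains.
// Each one recognizes a secret purely by its form.
const SIGNATURES: { name: string; re: RegExp }[] = [
  { name: "nt_hash", re: /^[0-9a-f]{32}$/i }, // 32 hex characters
  { name: "kerberos", re: /^\$krb5/ },        // $krb5tgs$..., $krb5asrep$...
];

// Returns the name of the first rule that matches, or null when nothing does.
function detectBySignature(value: string): string | null {
  for (const rule of SIGNATURES) {
    if (rule.re.test(value)) return rule.name;
  }
  return null;
}

console.log(detectBySignature("31d6cfe0d16ae931b73c59d7e0c089c0")); // nt_hash (the NT hash of an empty password)
console.log(detectBySignature("K5&A6Dw9d8jrKWhV")); // null: the description-field password
console.log(detectBySignature("Sail2Boat3"));       // null: the cracked password
```

Adding rules only moves the line. Every rule needs a shape, and these two passwords have none.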

The inversion: keep the generic, tokenize the rest

The move is to flip the question. Detecting the sensitive is an infinite problem. But describing the non-sensitive is a finite one.

Think about what is actually in a line of tool output. There are two kinds of token in it. There are tokens that are generic, that recur across every engagement on Earth, that carry no secret: English words, the names of tools and flags, numbers, dates, version strings, protocol constants like SMB and TCP and LDAP. And there are tokens that are specific to this engagement: the hostname, the domain, the username, the password, the hash. The second set is exactly the sensitive set. And the second set is just the complement of the first.

That gives a clean statement:

sensitive = output − generic

The generic vocabulary is finite, closed, and shared across all engagements. It does not depend on which tool produced the line. So instead of enumerating the infinite set of bad things, I describe the finite set of safe things, and I tokenize everything else by default. I called it deny by default, borrowing the firewall posture, because that is exactly what it is: default deny, allow only the known-good.

The beauty of this is the completeness it buys for free. K5&A6Dw9d8jrKWhV is not an English word, not a number, not a date, not a protocol constant. So it falls through to the default and gets tokenized, without any rule that knows about description fields. Sail2Boat3 is not in the dictionary. Tokenized. AWSJPDC0522, the hostname, is not a word. Tokenized. The signature-less secret that no blocklist could catch is caught by construction, because the burden of proof is reversed. I no longer have to prove a token is secret. The token has to prove it is safe, and a random password cannot.
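The inversion is small enough to sketch directly. Here GENERIC is a deliberately tiny, hypothetical stand-in for the real generic vocabulary (the system word list plus the number, date, version and protocol rules described in the next section). The function never asks whether an atom looks like a secret; it only asks whether the atom can prove it is safe.

```typescript
// Hypothetical stand-in for the generic vocabulary. It lists only safe things;
// there is no rule anywhere that describes what a password looks like.
const GENERIC = new Set(["smb", "ldap", "the", "account", "sail", "boat"]);

function isGeneric(atom: string): boolean {
  return /^\d+$/.test(atom) || GENERIC.has(atom.toLowerCase());
}

// sensitive = output - generic: keep every atom that cannot prove it is generic.
function sensitiveAtoms(atoms: string[]): string[] {
  return atoms.filter((a) => !isGeneric(a));
}

const atoms = ["SMB", "445", "AWSJPDC0522", "svc_autojoin", "K5&A6Dw9d8jrKWhV", "the", "account", "Sail2Boat3"];
console.log(sensitiveAtoms(atoms));
// AWSJPDC0522, svc_autojoin, K5&A6Dw9d8jrKWhV, Sail2Boat3
```

Note that sail and boat are generic words, yet the atom Sail2Boat3 is not, so it falls through to the default.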

This is the same inversion a firewall makes, and it is information-theoretic at heart: the sensitive material is the surprising, high-information, engagement-specific part of the text, and the safe material is the low-information, reusable boilerplate. The data minimization literature has a formal version of this idea, which I will come back to.

What the engine actually does

The engine has four layers, and only one of them has to be perfect.

Layer one, completeness: deny by default. I split the text into atoms on the separators that carry structure, and for each atom I ask one question, “is this provably generic?” An atom is generic if it is a pure number, a date or time, a semantic version, a protocol constant from a small closed list, or a word in a generic dictionary. I use the system word list, /usr/share/dict/words, which holds roughly 479 thousand entries. IP addresses are explicitly not generic, because an IP is always engagement topology, so they are checked first; a dotted quad would otherwise pass both the date rule and the version rule. Everything that is not provably generic is tokenized. No seeds, no per-tool rules.
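The exact separator set is an assumption in this sketch: whitespace, backslash, =, comma, semicolon, quotes, brackets, | and & split atoms, while dots, colons and hyphens stay inside an atom so that dates, times, versions, IPs and FQDNs arrive whole at the generic test. splitAtoms is a hypothetical name.

```typescript
// Assumed separator set (hypothetical): whitespace, backslash, '=', ',', ';',
// quotes, brackets, '|' and '&'. Dots, colons and hyphens are kept inside atoms.
const SEPARATORS = /[\s\\=,;"'()[\]{}<>|&]+/;

function splitAtoms(text: string): string[] {
  return text.split(SEPARATORS).filter((atom) => atom.length > 0);
}

const line = "SMB   shibuya.vl   445   AWSJPDC0522   svc_autojoin   2025-02-15 07:51:49 0   K5&A6Dw9d8jrKWhV";
console.log(splitAtoms(line));
// SMB, shibuya.vl, 445, AWSJPDC0522, svc_autojoin, 2025-02-15, 07:51:49, 0, K5, A6Dw9d8jrKWhV
```

Because & is in this assumed set, the description-field password splits into two atoms. Neither is generic, so nothing leaks, but the value no longer travels as a single token, which is the value fragmentation issue in the open ledger below. Each resulting atom then goes through the generic test.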

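// isIPv4, PROTO_STOP and dict() (the /usr/share/dict/words set) are engine helpers not shown here.
export function isGenericAtom(a: string): boolean {
  if (a.length < 2) return true;                       // single characters: punctuation, lone letters and digits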
  if (isIPv4(a)) return false;                          // topology, never generic
  if (/^\d+$/.test(a)) return true;                     // pure integer
  if (/^[0-9]{1,4}([-/:.][0-9]{1,4}){1,6}[zZ]?$/.test(a)) return true; // date / time
  if (/^v?\d+(\.\d+){1,3}$/.test(a)) return true;       // semantic version
  const s = a.toLowerCase();
  if (PROTO_STOP.has(s)) return true;                   // SMB, TCP, LDAP, true, false ...
  if (dict().has(s)) return true;                       // generic English / tech word
  return false;                                         // engagement-specific -> SENSITIVE
}

Notice there is not a single target value in there. No shibuya, no htb, no TLD list. That matters, because in Part I I had quietly hardcoded a list of lab TLDs (htb, local, vl) to recognize domains, and a reader would be right to call that a disguised hardcode. On a real engagement the domain could end in anything. This version recognizes nothing about the target and everything about the generic background, which is the only thing that is actually stable.
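The excerpt above leans on three helpers it does not define. To run it in isolation, here is a self-contained version with hypothetical stand-ins: a strict dotted-quad isIPv4, a short PROTO_STOP list, and a handful of ordinary words in place of /usr/share/dict/words. The classification rules and their order are unchanged from the excerpt.

```typescript
// Hypothetical stand-ins for the engine's helpers.
const PROTO_STOP = new Set(["smb", "tcp", "udp", "ldap", "rpc", "true", "false"]);
const WORDS = new Set(["account", "computer", "domain", "red", "purple", "boat", "sail"]); // stands in for /usr/share/dict/words
const dict = (): Set<string> => WORDS;

// Strict dotted quad: exactly four decimal octets, each 0-255.
function isIPv4(a: string): boolean {
  const parts = a.split(".");
  return parts.length === 4 && parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
}

// Same rules, same order, as the excerpt above.
function isGenericAtom(a: string): boolean {
  if (a.length < 2) return true;                 // single characters
  if (isIPv4(a)) return false;                   // first: the date and version rules would accept a dotted quad
  if (/^\d+$/.test(a)) return true;              // pure integer
  if (/^[0-9]{1,4}([-/:.][0-9]{1,4}){1,6}[zZ]?$/.test(a)) return true; // date / time
  if (/^v?\d+(\.\d+){1,3}$/.test(a)) return true; // semantic version
  const s = a.toLowerCase();
  if (PROTO_STOP.has(s)) return true;            // protocol constant
  if (dict().has(s)) return true;                // dictionary word
  return false;                                  // engagement-specific -> SENSITIVE
}

for (const atom of ["445", "2025-02-15", "07:51:49", "v1.2.3", "SMB", "10.0.0.5", "AWSJPDC0522", "K5&A6Dw9d8jrKWhV", "red"]) {
  console.log(atom, isGenericAtom(atom) ? "generic" : "SENSITIVE");
}
```

The last atom shows the one weakness: red is a dictionary word, so it comes back generic even when it is a username.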

Layer two, structure: universal output shapes. A handful of output shapes, not tools, recur across the entire offensive tooling ecosystem, because most of the tools build on the same few libraries. A key: value pair. A DOMAIN\principal. A colon-delimited record like user:rid:hash. The DC= components of a certificate subject. There are about a dozen of these; they are finite and stable, and they are not per-tool parsers. They do two jobs: they give a token a type, for utility, and they catch the one hard case I will get to shortly.
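A minimal sketch of three of those shapes doing their typing job: DOMAIN\principal, a pwdump-style user:rid:lmhash:nthash::: record, and the DC= components of a distinguished name. The structuralHits function, the Hit type and the exact patterns are assumptions for illustration, not the engine’s code. None of the patterns names a tool.

```typescript
type Hit = { value: string; type: "DOMAIN" | "USER" | "NTLM" };

// Collects every match of a regex that has the g flag.
function allMatches(re: RegExp, line: string): RegExpExecArray[] {
  const out: RegExpExecArray[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(line)) !== null) out.push(m);
  return out;
}

function structuralHits(line: string): Hit[] {
  const hits: Hit[] = [];
  // DOMAIN\principal ('$' allowed for machine accounts)
  for (const m of allMatches(/([A-Za-z0-9_.-]+)\\([A-Za-z0-9_.$-]+)/g, line)) {
    hits.push({ value: m[1], type: "DOMAIN" }, { value: m[2], type: "USER" });
  }
  // user:rid:lmhash:nthash::: (only the NT hash is typed here)
  const rec = /^([^:\s]+):\d+:([0-9a-f]{32}):([0-9a-f]{32}):::/i.exec(line);
  if (rec) hits.push({ value: rec[1], type: "USER" }, { value: rec[3], type: "NTLM" });
  // DC= components of a DN or certificate subject
  for (const m of allMatches(/\bDC=([^,\s]+)/gi, line)) hits.push({ value: m[1], type: "DOMAIN" });
  return hits;
}

console.log(structuralHits("SHIBUYA\\red"));                                // SHIBUYA as DOMAIN, red as USER
console.log(structuralHits("CN=Administrator,CN=Users,DC=shibuya,DC=vl")); // shibuya and vl as DOMAIN
```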

Layer three, reversibility: the vault. A deterministic, host-only, in-memory map from value to stable token and back. This is the only layer that must be perfect, and it is trivial to make perfect because it is a hash map. Same value, same token, every time. This is what lets the model say “pass-the-hash with EXEGOL_SECRET_NTLM_7” and have the real hash substituted back in before the command runs.
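A minimal sketch of such a vault, assuming the EXEGOL_<TYPE>_<n> token form used throughout this article and one counter per type. The Vault class and its method names are hypothetical. In this sketch it is two maps and a counter, which is why this layer can be made perfect. Token numbers depend on the order in which values are first seen within a session.

```typescript
class Vault {
  private forward = new Map<string, string>();  // real value -> token
  private reverse = new Map<string, string>();  // token -> real value
  private counters = new Map<string, number>(); // next number per token type

  // Same value, same token: a value seen before always gets its first token back.
  tokenize(value: string, type: string): string {
    const existing = this.forward.get(value);
    if (existing !== undefined) return existing;
    const n = (this.counters.get(type) ?? 0) + 1;
    this.counters.set(type, n);
    const token = `EXEGOL_${type}_${n}`;
    this.forward.set(value, token);
    this.reverse.set(token, value);
    return token;
  }

  // Host-side only: maps a token back to the real value, or undefined if unknown.
  resolve(token: string): string | undefined {
    return this.reverse.get(token);
  }
}

const vault = new Vault();
console.log(vault.tokenize("K5&A6Dw9d8jrKWhV", "SECRET_PASSWORD")); // EXEGOL_SECRET_PASSWORD_1
console.log(vault.tokenize("K5&A6Dw9d8jrKWhV", "SECRET_PASSWORD")); // EXEGOL_SECRET_PASSWORD_1 again
console.log(vault.resolve("EXEGOL_SECRET_PASSWORD_1"));             // K5&A6Dw9d8jrKWhV
```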

Layer four, utility: a learned model. Not built yet, and the literature review told me exactly what it should be, so I will hold it for the sources section. Its job is not completeness. Layer one owns completeness. Its job is to recover the bits of utility that deny by default oversteps, and to handle the one residual case.

The thing I want to stress is the demotion. In Part I the regexes were the guarantee. Here they are downgraded to signals and typing hints. The guarantee now comes from the inversion, which cannot be defeated by an unknown format, because an unknown format is still not in the dictionary.

What I see versus what the model sees

This is the part that makes it real. Here is the actual --users output from Shibuya, the thing my tools print, the thing I see:

SMB  shibuya.vl  445  AWSJPDC0522  _admin         2025-02-15 07:55:29 0  Built-in account for administering the computer/domain
SMB  shibuya.vl  445  AWSJPDC0522  svc_autojoin   2025-02-15 07:51:49 0  K5&A6Dw9d8jrKWhV
SMB  shibuya.vl  445  AWSJPDC0522  Simon.Watson   2025-02-16 10:23:34 0

Here is the exact same text after the engine, the thing the model sees:

SMB  EXEGOL_DOMAIN_2  445  EXEGOL_HOST_2  EXEGOL_USER_2   2025-02-15 07:55:29 0  Built-in account for administering the computer/domain
SMB  EXEGOL_DOMAIN_2  445  EXEGOL_HOST_2  EXEGOL_USER_5   2025-02-15 07:51:49 0  EXEGOL_SECRET_PASSWORD_1
SMB  EXEGOL_DOMAIN_2  445  EXEGOL_HOST_2  EXEGOL_USER_4   2025-02-16 10:23:34 0

The password K5&A6Dw9d8jrKWhV became EXEGOL_SECRET_PASSWORD_1. The username svc_autojoin became EXEGOL_USER_5. The domain, the host, and the built-in admin account that I will eventually impersonate are all tokens. And crucially, the structure survives. The model can still read that there is a user, that the user has a secret, that they sit on a host in a domain. It can reason “the account EXEGOL_USER_5 has a credential, EXEGOL_SECRET_PASSWORD_1, exposed in its description; try authenticating with it.” It can do the whole attack. It just never sees a single real value. The dates and the English description stay readable because they are generic, and that readability is exactly the utility deny by default is careful to preserve.

When the model emits a command, the vault runs in reverse. nxc smb EXEGOL_HOST_2 -u EXEGOL_USER_5 -p EXEGOL_SECRET_PASSWORD_1 becomes the real command with the real host, the real account, and the real password, host side, after the model has spoken, on its way to the container. The model proposed a working attack against a value it cannot name.
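One way to sketch that reverse step, under two assumptions: the command is handled as an argument vector rather than a shell string, and the reverse map comes from the vault. Matching whole tokens with a regex, instead of naive string replacement, keeps EXEGOL_USER_5 from rewriting the front of EXEGOL_USER_50, and refusing unknown tokens keeps a hallucinated token from reaching the target. detokenizeArgv is a hypothetical name.

```typescript
// Matches one whole vault token such as EXEGOL_USER_5 or EXEGOL_SECRET_PASSWORD_1.
const TOKEN = /\bEXEGOL_[A-Z]+(?:_[A-Z]+)*_\d+\b/g;

function detokenizeArgv(argv: string[], reverse: ReadonlyMap<string, string>): string[] {
  return argv.map((arg) =>
    arg.replace(TOKEN, (tok) => {
      const real = reverse.get(tok);
      if (real === undefined) throw new Error(`unknown token ${tok}`); // never pass a guess through
      return real;
    }),
  );
}

const reverse = new Map([
  ["EXEGOL_HOST_2", "AWSJPDC0522"],
  ["EXEGOL_USER_5", "svc_autojoin"],
  ["EXEGOL_SECRET_PASSWORD_1", "K5&A6Dw9d8jrKWhV"],
]);
const proposed = "nxc smb EXEGOL_HOST_2 -u EXEGOL_USER_5 -p EXEGOL_SECRET_PASSWORD_1".split(" ");
console.log(detokenizeArgv(proposed, reverse));
// nxc, smb, AWSJPDC0522, -u, svc_autojoin, -p, K5&A6Dw9d8jrKWhV
```

Keeping it as an argument vector, and launching it with something like Node’s child_process.execFile, which does not go through a shell by default, matters here: the real password contains &, which a shell would treat as an operator.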

Three bugs the box found for me

Research writeups love to show the clean result. The interesting part is always the wrong turns, so here are three, because each one taught me something about measurement, not just about code.

The audit was lying to me. My first instinct was to keep the leak audit from Part I: scan the redacted text for anything that still looks like an IP, an FQDN, or a structured secret. On that --users output, the audit cheerfully reported zero leaks. And yet _admin and K5&A6Dw9d8jrKWhV were sitting right there in cleartext in the model’s view. The audit could not see them, because _admin is a username with no structural signature and the password has no form, which is the exact same blind spot that made detection impossible in the first place. An audit that only checks for things with a signature will always validate itself. The lesson, which I now treat as a rule: the only honest measurement is an oracle, a ground-truth list of the real secrets, held host side, never shown to the model, and used only to count. The 0xdf writeup is that oracle. Everything I report as a percentage in this article is measured against ground truth, not against what the engine thought it caught.

A regex that never matched. To decide whether to apply the description-field logic, I tested the command with /\b--users\b/. It never fired. The word boundary \b does not exist between a space and a hyphen, because both are non-word characters (nor between the start of the string and a hyphen), so the pattern can never match a standalone --users flag. The flag I was keying on was silently always false. This is the second time in this project a word boundary quietly broke a check, the first being @@SERVERNAME in Part I, and the pattern is always the same: a check that looks obviously correct returns the wrong answer in silence, and only the empirical test reveals it. I now distrust every “obvious” check until I have watched it run.
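The failure and one possible fix side by side. The replacement pattern is an assumption about how the flag appears: it requires --users to start the string or follow whitespace, and to end at whitespace, = or the end of the string, so a longer flag that merely begins with --users does not count. Splitting the command into arguments and checking for an exact --users element would work equally well.

```typescript
const broken = /\b--users\b/;
// No \b at the front: the flag must begin the string or follow whitespace,
// and must end at whitespace, '=' or the end of the string.
const fixed = /(?:^|\s)--users(?=[\s=]|$)/;

const cmd = "nxc smb shibuya.vl -u user -p pass --users";
console.log(broken.test(cmd)); // false: no word boundary between ' ' and '-'
console.log(fixed.test(cmd));  // true
console.log(fixed.test("nxc smb shibuya.vl --users-list out.txt")); // false: hypothetical longer flag
```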

The measurement itself over-counted. Late in the multi-box run, my oracle reported the username red leaking. I went to find it in the redacted output, and the only occurrence of the substring red was inside the English word credentials, in a tool banner. The username red had been tokenized correctly everywhere it actually appeared. My oracle was doing a naive substring match, so it counted red inside c-red-entials. A real leak and a substring coincidence are not the same thing, and conflating them under-reports your own success. The fix was word-boundary-aware presence checks in the measurement, the same lesson as the second bug, applied to the measuring instrument instead of the engine.
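A sketch of a boundary-aware presence check for the oracle. It deliberately avoids \b: ground-truth values can begin or end with non-word characters (a $krb5 hash, for instance), and \b breaks on those exactly as it did on --users. Instead it requires that the value is not flanked by a letter, digit or underscore. Matching is case-insensitive, an assumption that errs toward reporting a leak rather than missing one. leaked and escapeRegExp are hypothetical helpers, and the lookbehind needs an ES2018 runtime.

```typescript
// Escapes regex metacharacters so a value like "$krb5..." is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True only if `value` appears as a whole atom: not preceded or followed by
// [A-Za-z0-9_], so "red" inside "credentials" does not count.
function leaked(redacted: string, value: string): boolean {
  const re = new RegExp(`(?<![A-Za-z0-9_])${escapeRegExp(value)}(?![A-Za-z0-9_])`, "i");
  return re.test(redacted);
}

console.log("[*] Dumping cached credentials".includes("red")); // true: the naive check over-counts
console.log(leaked("[*] Dumping cached credentials", "red"));   // false
console.log(leaked("EXEGOL_DOMAIN_1\\red", "red"));             // true: a real leak
```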

None of these were in the engine logic. Two were in the measurement and one was a silent always-false flag. That ratio is itself a finding: when the thing you are testing is “did anything leak,” the measurement is as likely to be wrong as the system, and a measurement that is wrong in the optimistic direction is the most dangerous kind.

The one hard case: a username that is also a word

Deny by default has exactly one weakness, and it is worth stating precisely because it is the whole frontier. It re-exposes an identity that happens to be a dictionary word.

On Shibuya the machine accounts are red and purple. Those are colors. They are in the dictionary, so layer one treats them as generic and re-exposes them. Same story for operator, a local account that is also an English word. The high-entropy secrets are all caught, the passwords and hashes and tickets, because none of them are words. The leak is only ever a human-friendly identity that collides with the generic vocabulary.

This is where layer two earns its place, and it does so without a single per-tool rule. The fix is structural position, not a list of names. An atom that sits in an identity position is an identity, dictionary word or not. red appears as EXEGOL_HOST_2\red, after a backslash, in the DOMAIN\principal shape, so it is a principal. operator appears as the head of operator:1000:hash, the universal pwdump record shape, so it is a principal. Promote both to tokens. I think of it as guilt by structural association: a token bound to sensitive material, through a shape that is true of the whole ecosystem rather than one tool, is itself sensitive.
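A minimal sketch of that promotion rule, covering just the two positions named above. The structural check runs before the dictionary, so a dictionary word in an identity position never gets the chance to prove itself generic. identityAtoms, shouldTokenize and the injected isGeneric callback are hypothetical names.

```typescript
// Atoms in an identity position on this line, lowercased: after a backslash
// (DOMAIN\principal) or at the head of a user:rid:... record.
function identityAtoms(line: string): Set<string> {
  const ids = new Set<string>();
  const afterBackslash = /\\([A-Za-z0-9_.$-]+)/g;
  let m: RegExpExecArray | null;
  while ((m = afterBackslash.exec(line)) !== null) ids.add(m[1].toLowerCase());
  const head = /^([A-Za-z0-9_.$-]+):\d+:/.exec(line);
  if (head) ids.add(head[1].toLowerCase());
  return ids;
}

// Structural position wins over the dictionary: guilt by structural association.
function shouldTokenize(atom: string, line: string, isGeneric: (a: string) => boolean): boolean {
  if (identityAtoms(line).has(atom.toLowerCase())) return true;
  return !isGeneric(atom);
}

// Tiny stand-in dictionary for the example.
const words = new Set(["red", "purple", "operator"]);
const isWord = (a: string): boolean => words.has(a.toLowerCase());

console.log(shouldTokenize("red", "EXEGOL_HOST_2\\red", isWord)); // true: principal position
console.log(shouldTokenize("red", "the sky is red", isWord));     // false: just a word
```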

With that, Shibuya goes to a clean sweep. Across the full root chain, every secret, every piece of topology, every identity including the dictionary-word ones, redacted. Twenty-five out of twenty-five ground-truth values, zero leaks, zero per-tool rules. But Shibuya is one box, and one box is exactly the overfitting trap I was trying to avoid, so the real test is later.

Live on Shibuya, blind, all the way up

Before the breadth test, the depth test. I drove the actual chain on the live box, with the engine in the loop, to prove the blind property holds through real privilege escalation and not just on captured text.

The interesting tier is lateral movement, because it is where a secret is discovered, hidden, and reused without ever being seen. Shibuya has a user logged in over RDP, nigel.mills. The technique is a cross-session relay with RemotePotato0: you coerce that user’s session into authenticating to you, and you capture the NetNTLMv2 hash. It took a few wrong turns to get it working headless. The output buffered and never flushed over a plain SSH pipe, so I forced a PTY. The relay port had to land in the firewall’s allowed range, so I aligned it. And then it fired, against the real session, and the capture came back:

[+] Received the relayed authentication on the RPC relay server
NTLMv2 Hash : Nigel.Mills::SHIBUYA:8a0bff96562863cd:23668f66e02a114ce8f7c13fb825ef64:0101...
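That capture is itself a colon-delimited record. One plausible split, sketched below, reads the user::domain:challenge:ntproofstr:blob layout that hashcat’s mode 5600 takes, so the user and domain can be tokenized as identities and the whole record as one secret. parseNetNtlmV2 is a hypothetical helper, and since the capture above is truncated, the blob in the example is a made-up placeholder.

```typescript
interface NetNtlmV2 {
  user: string;
  domain: string;
  challenge: string; // 8-byte server challenge, 16 hex characters
  ntProof: string;   // NTProofStr, 16 bytes, 32 hex characters
  blob: string;      // client blob, variable length
}

function parseNetNtlmV2(record: string): NetNtlmV2 | null {
  const m = /^([^:\s]+)::([^:\s]+):([0-9a-f]{16}):([0-9a-f]{32}):([0-9a-f]+)$/i.exec(record.trim());
  if (!m) return null;
  return { user: m[1], domain: m[2], challenge: m[3], ntProof: m[4], blob: m[5] };
}

// The blob here is a placeholder, not the real capture.
const parsed = parseNetNtlmV2("Nigel.Mills::SHIBUYA:8a0bff96562863cd:23668f66e02a114ce8f7c13fb825ef64:0101000000000000");
console.log(parsed?.user, parsed?.domain); // Nigel.Mills SHIBUYA
```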

That NetNTLMv2 went into the engine, became a token, and I cracked the token’s underlying value host side with hashcat against rockyou:

NIGEL.MILLS::SHIBUYA:...:Sail2Boat3

Sail2Boat3. A discovered secret, never seen by the model, cracked, and ready to reuse through its token. That is the loop from Part I, now running one tier deeper, on a value the engine had to discover rather than one I provided.

The foothold tier worked the same way. A local account hash, recovered from registry hives inside a backup image and dumped with secretsdump, was tokenized, then resolved back into a real pass-the-hash that authenticated as simon.watson and read the user flag. The model saw EXEGOL_SECRET_NTLM_x and proposed the attack. The attack ran with the real hash.

I will be honest about the top tier. The final step is an ADCS ESC1 abuse with certipy, and certipy kept timing out on its LDAP bind from my container, a connectivity quirk that has nothing to do with the research. I made a deliberate call there. The goal of this work is to study the abstraction, not to re-root a box that 0xdf already rooted and documented in public. So for the ESC1 stage I used the writeup’s real output as the tool output, and ran it through the engine like any other stage. The values are real, the redaction is real, the measurement is real. What I did not do is re-execute a step that adds nothing to the question the article is actually asking. Stating that plainly is more useful than pretending I rooted it cleanly.

Universality, measured: ten boxes, zero per-tool rules

This is the test that decides whether the inversion is real or whether I overfit to Shibuya. I took ten Active Directory machines from 0xdf’s catalog, deliberately spanning very different tools and techniques, pulled the real tool output from each writeup, built a ground-truth oracle for each, and ran the same engine with zero per-tool rules over all of them.

The spread is wide on purpose. ADCS shadow credentials and ESC abuse. MSSQL and a config INI with a service password. bcrypt hashes from a web database. RID brute and a password hidden in a description. gMSA passwords, DPAPI keys, Kerberos S4U. LAPS and a john-cracked PFX. Timeroasting and SNTP hashes. Splunk secret decryption and an /etc/shadow dump. BloodHound, targeted Kerberoast, secretsdump. If the engine were secretly a Shibuya-specific or tool-specific trick, this is where it would fall apart.

Box             Ground-truth values redacted
Certificate     100%
Cicada          100%
Fluffy          100%
Haze            100%
Monteverde      100%
RustyKey        100%
Timelapse       100%
Vintage         100%
EscapeTwo        81%
Administrator    76%
----------------------------------------------------------
Total           209 / 219 = 95.4%, zero per-tool rules

Eight of ten boxes redact every single ground-truth value with no rules at all. The aggregate is 95.4 percent. And the ten misses are not scattered noise. Every single one is the same class: a username that is a first name (olivia, michael, emily, ethan, angela, oscar, kevin, ryan) or a two-letter login like sa, sitting in a bare table column with no DOMAIN\ prefix and no colon-delimited record around it, so the structural shapes do not fire and the dictionary re-exposes it. Every high-entropy secret on every box (the bcrypt and NT hashes, the NetNTLMv2, the krb5 hashes, the DPAPI keys, the Splunk ciphertext, the gMSA passwords, and the $6$ shadow hashes) was redacted at one hundred percent.

That is the result I actually care about. The thing that scales, the completeness on signature-less and unknown-format secrets, holds across ten different tool ecosystems with no per-tool knowledge. The thing that does not yet scale is a single, narrow, well-understood class: a friendly name in an unstructured position. I know exactly what it is, I can point at all ten instances, and the literature, it turns out, knows exactly how to close it.
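A small arithmetic check on the aggregate. The 95.4 percent pools every ground-truth value (209 of 219 redacted, so 10 misses), which means boxes with more values weigh more. Averaging the ten per-box percentages instead gives a slightly different figure; since the table’s percentages are rounded, that second figure is approximate. The function names are illustrative.

```typescript
// Pooled (micro) coverage: all redacted values over all ground-truth values.
function pooledCoverage(redacted: number, total: number): number {
  return (100 * redacted) / total;
}

// Mean of per-box percentages (macro): every box weighs the same, whatever its size.
function meanOfPercentages(percentages: number[]): number {
  return percentages.reduce((sum, p) => sum + p, 0) / percentages.length;
}

// Rounded per-box figures from the table: eight boxes at 100%, then EscapeTwo and Administrator.
const perBox = [100, 100, 100, 100, 100, 100, 100, 100, 81, 76];

console.log(pooledCoverage(209, 219).toFixed(1)); // 95.4
console.log(meanOfPercentages(perBox).toFixed(1)); // 95.7
```

The pooled figure is arguably the right headline for a leak count, since every leaked value counts once whichever box it came from.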

What the literature already knows

I did not want to claim novelty without doing the reading, so I ran a deep multilingual literature sweep across the PII, de-identification, tokenization, and LLM-privacy worlds, and verified the claims adversarially before trusting them. A few things came back sharply.

Microsoft Presidio is the named version of the trap I climbed out of. Its PatternRecognizer binds one regex or deny-list to one entity type, and covering a new type means registering a new recognizer. That is exactly the per-type enumeration treadmill. Presidio is excellent engineering, and it is structurally the thing deny by default refuses to be.

The closest neighbor to my idea is Philter, and it is closest in the most useful way. Philter, a clinical de-identification system published in npj Digital Medicine, also inverts to an allowlist: it keeps a safe-word whitelist of around 195 thousand terms and scrubs the rest. That is the same inversion I am making. The difference is two properties I need that it lacks. Philter is destructive (it replaces with asterisks) and one-way (there is no reverse map). My vault keeps the inversion but makes it reversible, which is what lets the agent reuse a secret it never saw. So the inversion is not new. The inversion made bidirectional, in service of reuse, is.

The reversible-token problem is a solved problem in another field. Format-preserving encryption, NIST’s FF1 and FF3 modes, is exactly a reversible, format-preserving token, the same primitive that vaultless tokenization uses for credit card numbers under PCI-DSS. FF3 is broken and you should use FF1 or FF3-1, but the point stands: if I want tokens that do not fragment a password on a stray &, the payments world has the cryptography ready.

The hard case has a name and a tool. The dictionary-word identity, my one residual leak class, is precisely what learned named-entity recognition is good at, because a model learns that olivia in a username column is a person whether or not olivia is in a dictionary. GLiNER, a zero-shot NER model that tags any entity type from a natural-language label, and its multilingual PII variants are the right shape for layer four. The same direction also addresses my other admitted weakness: my dictionary is English, and a real engagement has logs in other languages. Two caveats, flagged honestly. I could not firmly verify the exact arXiv identifier for the multilingual PII variant, so treat that specific citation as provisional until reconfirmed. And I did confirm that GLiNER does not make the rest of the redaction pipeline unnecessary, so it is an addition, not a replacement.

And the reason to do any of this architecturally rather than by instruction is now well evidenced. Multi-turn extraction attacks exceed seventy percent attack success against behavioral defenses, where single-turn attacks sit in the low single digits, and the data-minimization work shows that an LLM asked to minimize its own exposure does not do it reliably. Both point the same way. You cannot ask the model to keep a secret. You have to make the secret not be there. That is the entire thesis, and it is the thing deny by default delivers by construction.

Putting it together, the verdict of the review was clean. No prior system is, at once, deny-by-default tokenize-everything, bidirectional with deterministic reverse-substitution into the executed command, and aimed at arbitrary engagement-specific identities discovered live from tool output. The pieces all exist. The composition, for a credential-blind agent, is the contribution.

What is still open

In the spirit of Part I, the honest ledger.

The dictionary-word identity. Ten leaks across ten boxes, all one class, all friendly names in unstructured columns. The plan is a local NER layer. It is designed, not built.
Multilingual coverage. The generic dictionary is English. CJK and Russian coverage is unproven, and a real audit will have non-English logs. The NER layer is also the answer here, but it is unvalidated, and I will not claim it until I measure it.
Value fragmentation. Deny by default tokenizes atoms, so a password with an internal separator can split across two tokens. That protects against leakage but harms clean reuse, which is exactly the case for adopting FF1 format-preserving tokens.
The deployable audit is still incomplete. It always will be, because a deployable audit has no oracle. In research I measure against ground truth. In production the structural audit is a smoke alarm, not a proof, and I should say so loudly rather than dress it up.
One provider, still. As in Part I, the provider-agnostic claim is earned by construction but not yet measured across models. That debt is still open.

None of these are fatal. All of them are written down with a direction, which remains the difference between research and a demo.

Where the series goes next

The learned layer. Build the local NER pass, close the dictionary-word class, and bring in multilingual coverage. This is the single highest-value next step, and the literature handed me the design.
Format-preserving tokens. Adopt FF1 so values stop fragmenting and round-trip cleanly, with the vault key in the OS secret store.
Adversarial robustness, for real. The test I keep promising: a multi-turn adversary trying to make the agent reveal the value behind a token. With deny by default the value is not in the context at all, so the prediction is that it cannot, and the evidence on behavioral defenses says this is exactly where instruction-based approaches fail.
Integration. Fold the engine into the two choke points of a real agentic tool, so this becomes a switch and not a proof of concept.

Part I showed the core works on real machines. Part II throws away the part that did not scale, inverts it, and shows the inversion holds across ten different tool ecosystems with no per-tool knowledge, with one narrow, named, fixable gap. The series is converging on the thing I actually want: a pipeline you can hand real credentials, let it discover more, and trust to never leak a single secret, identity, IP or domain to whatever model sits behind the API, no matter the tool, no matter the box.

Sources

Microsoft Presidio, adding recognizers (the per-type architecture): microsoft.github.io/presidio/analyzer/adding_recognizers
Philter, allowlist-based clinical de-identification, npj Digital Medicine 2020: nature.com/articles/s41746-020-0258-y
NIST SP 800-38G, format-preserving encryption FF1/FF3: csrc.nist.gov/pubs/sp/800/38/g/upd1/final
GLiNER, zero-shot named-entity recognition: arxiv.org/abs/2311.08526
Hide-and-Seek (HaS), reversible anonymize-then-restore for remote LLMs: arxiv.org/abs/2309.03057
PAPILLON, privacy-conscious delegation, NAACL 2025: aclanthology.org/2025.naacl-long.173
C-sanitized, an information-theoretic framework for document sanitization: arxiv.org/abs/1406.4285
Operationalizing Data Minimization for Privacy-Preserving LLM Prompting: arxiv.org/pdf/2510.03662
Beyond Jailbreaking, auditing contextual privacy in LLM agents: arxiv.org/pdf/2506.10171
HTB Shibuya writeup (0xdf): 0xdf.gitlab.io/2025/06/19/htb-shibuya.html
Active Directory writeups used for the breadth test (0xdf): Certificate, EscapeTwo, Administrator, Cicada, Fluffy, Vintage, Timelapse, Monteverde, RustyKey, Haze, all at 0xdf.gitlab.io/tags#active-directory
