Policy Pulse - Issue #13 | Week of May 3, 2026

AISI's second frontier-model cyber evaluation in four weeks confirms what Issue #12 hinted at: this is a trend, not a Mythos one-off. NIST formally drops enrichment for ~29,000 backlogged CVEs. UK researchers warn at CyberUK 2026 that Britain is now the only major western economy with no statutory defence for cyber pros.


Top Story

AISI's GPT-5.5 evaluation confirms the Mythos pattern: frontier-model offensive cyber capability is now a trend, not an outlier

On April 30, 2026, the UK AI Security Institute (AISI) published its evaluation of OpenAI's GPT-5.5. The model scored 71.4% on expert-level cyber tasks and became the second model, after Anthropic's Claude Mythos Preview, to complete "The Last Ones", AISI's 32-step end-to-end network attack range. The range is a corporate intrusion simulation, built with SpecterOps, that AISI estimates would take a human expert roughly 20 hours. GPT-5.5 finished it in 2 of 10 attempts; Mythos managed 3 of 10 in AISI's April 13 evaluation. (AISI)

The headline number isn't the story. The story is the four-week interval. AISI's previous frontier-model evaluation, of Claude Mythos Preview on April 14, was framed by the Institute and most coverage as a Mythos-specific uplift event. GPT-5.5's results, from a model built by a different lab on a different architecture, suggest that offensive cyber capability is emerging as a byproduct of broader gains in reasoning, code generation, and autonomous task execution rather than as a deliberately trained capability. AISI states this explicitly: "if offensive cyber skill is emerging as a byproduct of wider improvements... then further advances could arrive in quick succession."

For VDP operators, the AISI report's most operationally relevant finding is buried below the capability numbers: AISI red-teamers identified a universal jailbreak that elicited harmful content across all malicious cyber queries tested, including in multi-turn agentic settings. The jailbreak took six hours of expert effort to develop. That is the disclosure pathway problem: a single prompt-injection-class finding in a frontier model now affects an estimated hundreds of millions of downstream API consumers, and the existing VDP infrastructure (intake forms, severity scoring, coordinated disclosure timelines) was not designed for that blast radius.

Why it matters for VDP: Two government AI institutes have now published independent capability evaluations of frontier models in roughly four weeks. The disclosure norms — who gets advance notice, how findings are coordinated, what intake channels exist for AI-discovered or AI-enabled vulnerabilities — are being set by precedent right now, mostly without VDP-community input. Programs that intake AI-related submissions need to decide, before the volume hits, whether universal-jailbreak-class findings are in scope, who triages them, and what the coordinated disclosure window looks like when the affected surface is "every API consumer."

Throwback: In Issue #12, we covered AISI's Claude Mythos evaluation as the first government assessment of a frontier model's offensive cyber capability. This week's GPT-5.5 result is the second data point in that line, and it is the one that turns Mythos from "event" into "trend."


Upcoming Deadlines & Events

| Date | Agency | Event/Deadline | Action Required | Link |
| --- | --- | --- | --- | --- |
| May 5, 2026 | NIST | Cyber AI Profile Spring Working Session #2 | Register and attend; written input on Secure/Defend/Thwart focus areas | NCCoE Cyber AI Profile |
| May 12, 2026 | NIST | Cyber AI Profile Spring Working Session #3 | Final spring session before initial public draft (IPD) | NCCoE Cyber AI Profile |
| May 2026 (window) | CISA | CIRCIA Final Rule expected publication | Critical infrastructure operators: review 72-hour incident / 24-hour ransom-payment reporting obligations against current playbooks | CIRCIA FAQs |
| September 11, 2026 | European Commission | EU Cyber Resilience Act vulnerability and incident reporting obligations begin | Manufacturers placing products in the EU: stand up CVD intake channel, 24h early warning + 72h notification + 14-day final report capability via the CRA Single Reporting Platform | CRA Reporting |
| March 2026 (passed; track follow-on) | CISA / MITRE | 11-month CVE Program contract extension expires | Watch for: CVE Foundation transition update, FY27 funding decision, NVD coordination plan | CVE Foundation |
| October 2027 (next cycle) | US Copyright Office / Library of Congress | Tenth Triennial Section 1201 Proceeding opens | Plan now: AI trustworthiness research carve-out is the active community ask for the 2027 cycle | Section 1201 Proceedings |

This Week in Policy

AI & Emerging Tech Security

  • AISI publishes GPT-5.5 cyber capability evaluation (April 30, 2026): GPT-5.5 hits 71.4% on expert-level cyber tasks, completes the 32-step "Last Ones" network attack range in 2 of 10 attempts, and solves a reverse-engineering challenge in 10 minutes 22 seconds at $1.73 in API cost (vs. ~12 hours for a human expert). A universal jailbreak elicited violative cyber content across all tested queries. (AISI)
    • Why it matters for VDP: Universal-jailbreak-class findings in frontier APIs are coordination problems, not single-vendor bugs. Programs need an explicit AI-finding intake decision before submission volume forces one.
  • CAISI evaluates DeepSeek V4 Pro (released April 27, 2026): NIST's CAISI ran the model against CTF-Archive-Diamond (285 challenges from the pwn.college platform) and found that DeepSeek V4 Pro is the most capable PRC model to date but lags the closed frontier by ~8 months on cyber. (NIST)
    • Why it matters for VDP: First public US-government evaluation of a major open-weight model's cyber capability. Open-weight + capable-enough means the disclosure perimeter for AI-enabled vulnerabilities now includes uncontrolled fine-tunes, and VDP intake guidance needs to acknowledge that.
  • NIST Cyber AI Profile spring working sessions (April 28, May 5, May 12): Three virtual sessions on the NIST IR 8596 preliminary draft (December 2025), structured around the Secure / Defend / Thwart focus areas. Initial public draft expected later in 2026. (NCCoE)
    • Why it matters for VDP: This is the working-session window where VDP-specific language can still get into the IPD. After IPD, the input surface narrows to formal comment.

Federal Strategy & Regulation

  • CIRCIA final rule slips to May 2026: CISA confirmed the Cyber Incident Reporting for Critical Infrastructure Act final rule is now expected to publish in May 2026, after the agency hosted virtual town halls between March 9 and April 2 to absorb harmonization feedback. The estimated covered population remains over 300,000 entities across 16 critical infrastructure sectors. (CyberScoop)
    • Why it matters for VDP: CIRCIA reporting and VDP intake are not the same pipeline, but the 72-hour and 24-hour clocks apply once an incident is "substantial," and most program owners do not yet have a documented bridge between researcher submissions and CIRCIA-triggering events. Now is the right window to write that bridge.
  • CISA Emergency Directives retired in January 2026 remain retired: For context on Issue #12's KEV-volume framing, the Emergency Directive (ED) mechanism is off the table and KEV is the operational signal. (CISA)
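
One way to picture the "bridge" between researcher submissions and CIRCIA-triggering events described in the CIRCIA item above is a record that starts a clock the moment a submission is judged to describe a covered incident. The sketch below is illustrative only. The `SubmissionBridge` class, its field names, and the `VDP-0001` identifier are hypothetical, and it assumes the clock starts at the program owner's own determination time; the final rule text may define the trigger differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Windows as described in this bulletin; confirm against the final rule text.
INCIDENT_REPORT_WINDOW = timedelta(hours=72)
RANSOM_REPORT_WINDOW = timedelta(hours=24)


@dataclass
class SubmissionBridge:
    """Hypothetical record linking a VDP submission to CIRCIA clocks."""

    submission_id: str
    received_at: datetime
    # Set when the program owner decides the report describes a covered incident.
    incident_determined_at: Optional[datetime] = None
    # Set only if a ransom payment was actually made.
    ransom_paid_at: Optional[datetime] = None

    def time_remaining(self, now: datetime) -> Dict[str, timedelta]:
        """Return time left on each running clock; negative means overdue."""
        clocks: Dict[str, timedelta] = {}
        if self.incident_determined_at is not None:
            due = self.incident_determined_at + INCIDENT_REPORT_WINDOW
            clocks["incident_report"] = due - now
        if self.ransom_paid_at is not None:
            due = self.ransom_paid_at + RANSOM_REPORT_WINDOW
            clocks["ransom_report"] = due - now
        return clocks


bridge = SubmissionBridge(
    submission_id="VDP-0001",  # hypothetical identifier
    received_at=datetime(2026, 5, 4, 10, 0, tzinfo=timezone.utc),
    incident_determined_at=datetime(2026, 5, 4, 16, 0, tzinfo=timezone.utc),
)
remaining = bridge.time_remaining(datetime(2026, 5, 5, 16, 0, tzinfo=timezone.utc))
print(remaining["incident_report"])  # 2 days, 0:00:00
```

Keeping `received_at` separate from `incident_determined_at` is deliberate: a submission arriving does not by itself start a clock, so no clock appears in the output until someone records a determination.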

CVE & Vulnerability Programs

  • NIST formally moves to risk-based NVD enrichment (effective April 15, 2026): NIST will prioritize enrichment of CVEs in CISA's KEV catalog (target: one business day), CVEs for software used by the federal government, and CVEs for critical software per EO 14028. Backlogged CVEs with an NVD publish date earlier than March 1, 2026 have been reclassified as "Not Scheduled" (third-party tracking puts the count near 29,000). Going forward, NIST will not routinely provide a separate severity score for non-prioritized CVEs. Q1 2026 submissions ran nearly one-third higher than Q1 2025. (NIST)
    • Why it matters for VDP: "Not Scheduled" is not the same as "low severity." Programs that depend on NVD enrichment (CVSS, CPE, CWE) for triage now need a documented fallback for non-KEV, non-federal-software CVEs. The 263% submission growth (2020-2025) cited by NIST is not slowing.
  • CVE Foundation transition window now active: With the 11-month CISA-MITRE contract extension (April 2025) running through approximately March 2026, the CVE Foundation, launched April 16, 2025, is the standing alternative governance structure. Multiple non-US governments and dozens of private-sector companies have publicly pledged support. (CyberScoop)
  • UK CyberUp Campaign at CyberUK 2026 publishes "Protections for cyber researchers: How the UK is being left behind": Briefing argues Australia, Belgium, France, Germany, Hong Kong, Malta, Portugal and the United States have already secured legal protections for cyber professionals; the UK has not. Campaign continues to push the four-pillar Defence Framework (Harm vs. Benefit, Proportionality, Intent, Competence) for inclusion in any forthcoming Computer Misuse Act 1990 reform. (Computer Weekly)
    • Why it matters for VDP: The CMA is still the law most likely to chill UK-based researcher participation in cross-border VDP. A statutory defence is the single intervention with the largest expected impact on UK researcher coverage in international programs.
  • DOJ 2022 good-faith research charging policy remains the operative US position: No statutory CFAA reform has progressed. The May 2022 policy still sets the federal prosecutorial posture for good-faith security research, and it is still not binding on courts, civil litigants, or state law. (EFF)
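
The "documented fallback" called for in the NVD enrichment item above could be as small as a routing function. The sketch below is a hypothetical illustration, not NIST's process: the `CveRecord` fields, the bucket names, and the use of a CNA-supplied CVSS score as the fallback signal are all assumptions. The severity bands follow the CVSS v3.x qualitative rating scale. The key design choice is that a CVE with no score lands in manual review instead of being treated as low severity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CveRecord:
    """Hypothetical minimal view of a CVE for triage routing."""

    cve_id: str
    in_kev: bool = False             # listed in CISA's KEV catalog
    federal_software: bool = False   # software used by the federal government
    eo14028_critical: bool = False   # critical software per EO 14028
    cna_cvss: Optional[float] = None  # score supplied by the CNA, if any


def cvss_rating(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


def triage(cve: CveRecord) -> str:
    """Route a CVE to NVD-prioritized handling or a local fallback."""
    if cve.in_kev or cve.federal_software or cve.eo14028_critical:
        return "nvd-prioritized"
    if cve.cna_cvss is None:
        # "Not Scheduled" is not "low severity": unknown goes to a human.
        return "manual-review"
    return "fallback-" + cvss_rating(cve.cna_cvss)


# Placeholder IDs, not real CVEs.
print(triage(CveRecord("CVE-0000-0001", in_kev=True)))    # nvd-prioritized
print(triage(CveRecord("CVE-0000-0002", cna_cvss=9.8)))   # fallback-critical
print(triage(CveRecord("CVE-0000-0003")))                 # manual-review
```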

International Developments

  • EU CRA vulnerability reporting clock now four months out: From September 11, 2026, manufacturers placing products with digital elements on the EU market must report actively exploited vulnerabilities through the CRA Single Reporting Platform (SRP): 24-hour early warning, 72-hour notification, 14-day final report (after corrective measure). All products on the market before December 11, 2027 are in scope. (European Commission)
    • Why it matters for VDP: This is the largest single expansion of mandatory CVD intake in years. Every product manufacturer with EU customers needs a documented CVD policy and a defined intake channel before September 11.
  • Pall Mall Process annual cycle continues: UK-France led process on commercial cyber intrusion capabilities continues through the Industry Guidelines drafting work for 2026, with explicit references to the Budapest Convention and UN Cybercrime Convention as anchoring frameworks. (UK Government)
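
For teams turning the CRA reporting item above into a playbook, the three deadlines can be computed mechanically. This is a minimal sketch under stated assumptions: the function and field names are hypothetical, the 24-hour and 72-hour clocks are assumed to start when the manufacturer becomes aware of active exploitation, and the 14-day clock is assumed to start when a corrective measure is available, as summarized in this bulletin. Check the regulation text before relying on it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

EARLY_WARNING_WINDOW = timedelta(hours=24)
NOTIFICATION_WINDOW = timedelta(hours=72)
FINAL_REPORT_WINDOW = timedelta(days=14)


@dataclass(frozen=True)
class CraDeadlines:
    early_warning_due: datetime
    notification_due: datetime
    # None until a corrective measure exists, because that event starts the clock.
    final_report_due: Optional[datetime]


def cra_deadlines(
    aware_at: datetime,
    corrective_measure_at: Optional[datetime] = None,
) -> CraDeadlines:
    """Compute the three reporting deadlines from the two trigger times."""
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    if corrective_measure_at is not None and corrective_measure_at.tzinfo is None:
        raise ValueError("corrective_measure_at must be timezone-aware")
    final_due = None
    if corrective_measure_at is not None:
        final_due = corrective_measure_at + FINAL_REPORT_WINDOW
    return CraDeadlines(
        early_warning_due=aware_at + EARLY_WARNING_WINDOW,
        notification_due=aware_at + NOTIFICATION_WINDOW,
        final_report_due=final_due,
    )


deadlines = cra_deadlines(
    aware_at=datetime(2026, 9, 14, 9, 0, tzinfo=timezone.utc),
    corrective_measure_at=datetime(2026, 9, 20, 12, 0, tzinfo=timezone.utc),
)
print(deadlines.early_warning_due.isoformat())  # 2026-09-15T09:00:00+00:00
print(deadlines.notification_due.isoformat())   # 2026-09-17T09:00:00+00:00
print(deadlines.final_report_due.isoformat())   # 2026-10-04T12:00:00+00:00
```

Requiring timezone-aware timestamps avoids a quiet failure mode where an intake system logs local time and the computed deadline is hours off.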

Friends of disclose.io

Zack Whittaker: "Why every organization should make it easy to report security flaws"

Zack Whittaker's May 2 piece in this week in security is the cleanest restatement of the disclose.io thesis we have read this year, and it lands at exactly the right moment. With the EU CRA's mandatory CVD intake clock four months out, NIST's NVD enrichment changes pushing more triage work back to the program owner, and frontier-model capability evaluations starting to land disclosures of AI-enabled vulnerabilities at unprecedented blast radius, "make it easy to report" has graduated from best practice to operational floor.

Whittaker's argument is structural: when there is no legitimate channel, the channel becomes the press. The piece walks through several cases — Express, Bluspark, Home Depot, CSC ServiceWorks, Practice by Numbers — where the absence of a security email or security.txt forced researchers and concerned users to escalate via journalists, and the resulting story carried the reputational damage rather than the underlying flaw. The companies that came out of those stories cleanest were the ones that did the right thing after the fact: published a disclosure policy, committed to a security reporting page, made the intake real.

Key findings:

  • Companies without dedicated security contact channels regularly end up surfaced through media instead of researchers, which inverts the cost structure of disclosure
  • A security email plus a security.txt file is, in 2026, the minimum viable intake — not the aspirational target
  • How a company responds to a disclosed vulnerability matters more for reputation than the existence of the vulnerability itself
  • VDP and bug bounty programs measurably improve risk management, not just optics
  • The cost of not having a channel is rising fast as researcher volume, AI-discovered findings, and regulatory intake mandates compound
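
For readers who have not yet published a security.txt file, the sketch below renders a minimal one. It assumes the RFC 9116 format, in which `Contact` and `Expires` are the required fields and the file is served at `/.well-known/security.txt`. The `render_security_txt` helper is hypothetical, and the `example.com` addresses are placeholders to replace with a real mailbox and policy page.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence

ALLOWED_CONTACT_SCHEMES = ("mailto:", "https://", "tel:")


def render_security_txt(
    contacts: Sequence[str],
    expires: datetime,
    policy: Optional[str] = None,
) -> str:
    """Render a minimal security.txt body with the two required fields."""
    if not contacts:
        raise ValueError("at least one Contact value is required")
    for contact in contacts:
        if not contact.startswith(ALLOWED_CONTACT_SCHEMES):
            raise ValueError(f"Contact must be a URI: {contact!r}")
    if expires.tzinfo is None:
        raise ValueError("expires must be timezone-aware")

    # Contacts are listed in order of preference.
    lines = [f"Contact: {contact}" for contact in contacts]
    stamp = expires.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines.append(f"Expires: {stamp}")
    if policy:
        lines.append(f"Policy: {policy}")
    return "\n".join(lines) + "\n"


body = render_security_txt(
    contacts=["mailto:security@example.com"],  # placeholder address
    expires=datetime(2027, 1, 1, tzinfo=timezone.utc),
    policy="https://example.com/security-policy",  # placeholder URL
)
print(body, end="")
# Contact: mailto:security@example.com
# Expires: 2027-01-01T00:00:00Z
# Policy: https://example.com/security-policy
```

The `Expires` field is the one most often left to go stale, so generating the file from a script that runs on a schedule is a reasonable way to keep it current.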

Zack has been one of the most consistent voices in security journalism on coordinated disclosure for over a decade. this week in security remains essential reading for VDP operators tracking how disclosure stories actually break in the real world. Subscribe at zackwhittaker.com.


Policy Pulse is a weekly bulletin from disclose.io. Keeping the security research community informed on policy that affects our work.

Have a tip or want to contribute? Reply to this email, reach out on Twitter/X or Bluesky, or drop a comment in the community forum.