We gave Hacker Sidekick a CVE number and a description of what we needed: build a proof-of-concept exploit for CVE-2026-23918, with a test environment, validation, and documentation. No follow-up prompts. No guidance along the way.
It downloaded the Apache source code, diffed the vulnerable and patched versions, identified the root cause, built a Docker lab, wrote a working Python exploit, explored the RCE path, validated the results against both targets, and produced a technical report with detection signatures. The whole thing took about forty minutes.
Everything you're about to read happened on camera. We recorded the entire session.
The vulnerability
CVE-2026-23918 is a double-free in Apache httpd 2.4.66's mod_http2 module. It carries a CVSS score of 8.8 and affects any deployment using HTTP/2 with a multi-threaded MPM (worker or event), a combination that is common in production.
The bug lives in the stream cleanup path of h2_mplx.c. When a client sends an HTTP/2 HEADERS frame immediately followed by a RST_STREAM on the same stream, before the multiplexer has finished registering the stream, two callbacks fire in sequence: c1c2_stream_joined() and m_stream_cleanup(). Both functions unconditionally push the same h2_stream pointer onto a purge array. When Apache later iterates that array and calls h2_stream_cleanup() / apr_pool_destroy() on each entry, the second call operates on memory that has already been freed.
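The bug class is easier to see in a toy model. The sketch below is a Python simulation, not Apache code. `Stream` stands in for `h2_stream`, and a `destroyed` flag stands in for a freed pool. `on_stream_joined()` and `on_stream_cleanup()` are hypothetical stand-ins for `c1c2_stream_joined()` and `m_stream_cleanup()`. It only shows why queuing the same pointer twice leads to a second destroy. It does not model memory layout or timing.

```python
class Stream:
    """Toy stand-in for h2_stream; `destroyed` plays the role of a freed pool."""

    def __init__(self, stream_id: int) -> None:
        self.stream_id = stream_id
        self.destroyed = False

    def destroy(self) -> None:
        # In C this is where h2_stream_cleanup()/apr_pool_destroy() would run;
        # a second call would touch freed memory. Here we raise instead.
        if self.destroyed:
            raise RuntimeError(f"double destroy of stream {self.stream_id}")
        self.destroyed = True


def on_stream_joined(purge: list, stream: Stream) -> None:
    # Hypothetical stand-in for c1c2_stream_joined(): unconditional push.
    purge.append(stream)


def on_stream_cleanup(purge: list, stream: Stream) -> None:
    # Hypothetical stand-in for m_stream_cleanup(): also an unconditional push.
    purge.append(stream)


def purge_all(purge: list) -> None:
    # Destroy every queued entry, then empty the array.
    for stream in purge:
        stream.destroy()
    purge.clear()


if __name__ == "__main__":
    purge: list = []
    s = Stream(1)
    on_stream_joined(purge, s)
    on_stream_cleanup(purge, s)  # the same object is queued a second time
    try:
        purge_all(purge)
    except RuntimeError as err:
        print(err)  # double destroy of stream 1
```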
On its own, that's a reliable denial of service (DoS): one TCP connection, a few HTTP/2 frames, and a worker process crashes.
On systems where the Apache Portable Runtime uses the mmap allocator (the default on Debian, Ubuntu, and the official Docker image), the double-free opens a research path toward remote code execution. The freed h2_stream object is allocated from an apr_pool_t, which itself lives in malloc/free-backed memory. A double-free of this object means an attacker can potentially force a subsequent allocation of the same size class to reclaim the freed slot with controlled data, redirect a function-pointer-like callback, and execute arbitrary code. We explored this path and documented what we found.
The prompt
We opened Hacker Sidekick, created a workspace folder for the project, and typed one prompt:
We need to develop a proof of concept exploit for CVE-2026-23918 that we can use to validate that the issue has been fixed in our environment and test against our detection and prevention rules in a lab. Please make a plan for developing this exploit including researching how it works, planning the development, how to test and validate that it works, and how to properly document it.
No prompt engineering tricks. No jailbreaking. No fiction wrapper. We described a real vulnerability and asked for a real exploit in the same language you'd use to scope the work for a human researcher.
This is the kind of request that every general-purpose AI tool refuses outright. Hacker Sidekick was built for this work, so it doesn't have that problem.
What happened next
Hacker Sidekick started by researching the CVE. It searched the web for advisory data, patch details, and existing technical analysis. Then it produced a full exploit development plan.
The plan laid out seven milestones:
- S1: Research. Download Apache httpd 2.4.66 and 2.4.67 source, diff mod_http2, identify the root cause.
- S2: Environment setup. Build vulnerable and patched Docker containers for lab testing.
- S3: Denial-of-service PoC. Write a reliable crash trigger.
- S4: Detection variant. Build a slow-drip version that produces a fingerprintable pattern for detection rule testing.
- S5: RCE exploration. Analyze the heap, attempt grooming, assess feasibility.
- S6: Validation. Automated test harness that confirms the crash on vulnerable targets and stability on patched targets.
- S7: Documentation. Final report with root cause, reproduction steps, RCE assessment, and detection guidance.
Before proceeding, it asked for approval. This is Hacker Sidekick's checkpoint system. It won't start heavy implementation or execute anything against live targets without explicit sign-off. We approved the full plan.
Source code analysis
It downloaded the upstream Apache 2.4.66 and 2.4.67 source tarballs, extracted them, and diffed the mod_http2 module. 284 lines changed.
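Here is a minimal sketch of the diff step. It assumes both tarballs are already extracted side by side. The script counts added plus removed lines per C source and header file using Python's `difflib`. The directory names in the `__main__` block are assumptions. `SequenceMatcher` can align lines slightly differently from GNU diff, so its totals may not match the 284-line figure exactly.

```python
import difflib
from pathlib import Path


def changed_lines(old_dir, new_dir, suffixes=(".c", ".h")):
    """Return {relative_path: added + removed lines} for files that differ."""
    old_dir, new_dir = Path(old_dir), Path(new_dir)

    def sources(root):
        # Collect source files under root; a missing tree contributes nothing.
        if not root.is_dir():
            return set()
        return {p.relative_to(root) for p in root.rglob("*")
                if p.is_file() and p.suffix in suffixes}

    def read(path):
        # A file present on only one side is treated as empty on the other.
        return path.read_text(errors="replace").splitlines() if path.exists() else []

    result = {}
    for rel in sorted(sources(old_dir) | sources(new_dir)):
        a, b = read(old_dir / rel), read(new_dir / rel)
        matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
        count = 0
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag in ("replace", "delete"):
                count += i2 - i1  # lines removed from the old file
            if tag in ("replace", "insert"):
                count += j2 - j1  # lines added in the new file
        if count:
            result[rel.as_posix()] = count
    return result


if __name__ == "__main__":
    # Assumed extraction paths for the two module trees.
    diff = changed_lines("httpd-2.4.66/modules/http2", "httpd-2.4.67/modules/http2")
    for path, n in diff.items():
        print(f"{n:5d}  {path}")
    print(f"{sum(diff.values()):5d}  total")
```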
From the diff, it identified the exact fix: Apache 2.4.67 introduces a helper function, add_for_purge(), that checks whether a stream pointer is already present in the purge array before adding it. If it finds a duplicate, it returns FALSE and the second push never happens. The patch also adds a secondary sanity check comparing stream IDs. Together, these two changes close the double-free.
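The idea behind the fix can be modelled the same way. This Python sketch is not a port of the C code. It checks only pointer identity before queuing. It leaves out the secondary stream-ID check, because the exact semantics of that check aren't described here. `Stream` is a toy stand-in that counts how many times it is destroyed.

```python
class Stream:
    """Toy stand-in for h2_stream that counts destroy calls."""

    def __init__(self, stream_id: int) -> None:
        self.stream_id = stream_id
        self.destroy_count = 0

    def destroy(self) -> None:
        self.destroy_count += 1


def add_for_purge(purge: list, stream: Stream) -> bool:
    """Queue `stream` only if that exact object is not already queued."""
    if any(queued is stream for queued in purge):
        return False  # duplicate push refused, mirroring the patched behaviour
    purge.append(stream)
    return True


if __name__ == "__main__":
    purge: list = []
    s = Stream(1)
    print(add_for_purge(purge, s))  # True: first push
    print(add_for_purge(purge, s))  # False: the duplicate push is refused
    for queued in purge:
        queued.destroy()
    print(s.destroy_count)  # 1
```

A linear scan keeps the model simple. This sketch makes no claim about how the real helper performs its lookup.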
The dev log documents the root cause, the race condition between c1c2_stream_joined() and m_stream_cleanup(), and the specific code paths that allow the same pointer to be queued twice.
Lab environment
It built two Docker containers. The first runs vulnerable Apache 2.4.66 with mod_http2 enabled, SSL configured, and the event MPM active, listening on port 8443. The second runs patched 2.4.67 for comparison.
The configuration mirrors a realistic production deployment: HTTP/2 over TLS, a multi-threaded MPM, and the mmap allocator active. It matches a stock Debian or Ubuntu Apache with HTTP/2 enabled, which is a common production setup.
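Before testing a container, it's worth confirming that it actually negotiates HTTP/2. Here is a minimal sketch using only Python's standard `ssl` module. It performs a TLS handshake offering `h2` via ALPN and reports which protocol the server selected. It sends no HTTP/2 frames. Certificate verification is off by default, on the assumption that lab containers use self-signed certificates. Keep verification on for anything that isn't a throwaway lab.

```python
import socket
import ssl
from typing import Optional


def make_probe_context(verify: bool = False) -> ssl.SSLContext:
    """TLS client context that offers h2 first, then http/1.1, via ALPN."""
    ctx = ssl.create_default_context()
    if not verify:
        # Lab certificates are assumed self-signed; check_hostname must be
        # turned off before verify_mode can be set to CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx


def negotiated_protocol(host: str, port: int = 8443, timeout: float = 5.0,
                        verify: bool = False) -> Optional[str]:
    """Return the ALPN protocol the server picked, or None if it picked none."""
    ctx = make_probe_context(verify)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

Against a lab container on the assumed address, `negotiated_protocol("localhost", 8443)` should return `"h2"` when mod_http2 is enabled for that virtual host. It returns `"http/1.1"` or `None` otherwise.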
The exploit
The PoC is a Python script called cve_2026_23918_poc.py. It handles the full attack chain:
- TLS connection with ALPN negotiation for h2.
- Raw HTTP/2 frame construction using the h2, hyperframe, and hpack libraries.
- The HEADERS-then-RST_STREAM trigger sequence that races the stream lifecycle callbacks to induce the double-free.
It has two modes. A DoS mode sends rapid RST_STREAM frames to crash worker processes as fast as possible. A detection mode produces a slower, fingerprintable traffic pattern for testing WAF rules and IDS signatures.
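From the defender's side, the pattern the detection mode exercises can be expressed as a per-connection counter. The pattern is a HEADERS frame quickly followed by RST_STREAM on the same stream. The sketch below assumes you already have decoded client-to-server frames, for example from a TLS-terminating proxy or sensor. The `FrameEvent` record, the 50 ms window, and the threshold of 10 are illustrative assumptions, not tuned values. The same heuristic also fires on HTTP/2 Rapid Reset (CVE-2023-44487) traffic. At low thresholds it can also fire on ordinary client cancellations, so expect to tune it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class FrameEvent:
    """Hypothetical decoded client-to-server HTTP/2 frame record."""
    conn_id: str
    stream_id: int
    frame_type: str  # e.g. "HEADERS", "DATA", "RST_STREAM"
    ts: float        # seconds


def flag_fast_resets(events: Iterable[FrameEvent],
                     window: float = 0.05, threshold: int = 10) -> Set[str]:
    """Return connection IDs with >= `threshold` streams reset within
    `window` seconds of their first HEADERS frame."""
    opened = {}                # (conn_id, stream_id) -> first HEADERS timestamp
    counts = defaultdict(int)  # conn_id -> number of fast resets
    for ev in sorted(events, key=lambda e: e.ts):
        key = (ev.conn_id, ev.stream_id)
        if ev.frame_type == "HEADERS":
            opened.setdefault(key, ev.ts)  # ignore later HEADERS (trailers)
        elif ev.frame_type == "RST_STREAM" and key in opened:
            if ev.ts - opened.pop(key) <= window:
                counts[ev.conn_id] += 1
    return {conn for conn, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Synthetic traffic: conn "a" resets 12 streams after 10 ms, conn "b" after 1 s.
    evs = []
    for i in range(12):
        sid, t = 2 * i + 1, i * 0.1
        evs.append(FrameEvent("a", sid, "HEADERS", t))
        evs.append(FrameEvent("a", sid, "RST_STREAM", t + 0.01))
        evs.append(FrameEvent("b", sid, "HEADERS", t))
        evs.append(FrameEvent("b", sid, "RST_STREAM", t + 1.0))
    print(flag_fast_resets(evs))  # {'a'}
```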
The development wasn't a straight line. Hacker Sidekick ran the exploit, checked the Apache error logs for segfault indicators, adjusted timing parameters, and iterated. This is the same cycle a human exploit developer goes through, but it happened in minutes instead of hours.
RCE exploration
With the DoS working reliably, it moved to RCE exploration. It analyzed the freed h2_stream object (approximately 200 bytes on x86_64), examined the apr_pool_t internals, and assessed whether an attacker could reclaim the freed memory with controlled data.
It wrote a heap grooming script (rce_groom.py) that demonstrates the concept: spray HTTP/2 header fields to shape the heap, trigger the double-free, attempt to reclaim the freed slot. The script ran and produced output showing the grooming, the trigger, and the heap slot capture.
The assessment is honest. The DoS is highly reliable on unpatched Apache 2.4.66. The RCE is possible in a controlled lab with substantial additional effort: heap debugging, info-leak, ASLR bypass, and monitor callback registration. The current deliverable is a grooming primitive and technical roadmap, not a turnkey exploit. That distinction is documented clearly, including the practical blockers: build variance across distributions, timing sensitivity, and the fact that the monitor callback is only dereferenced if explicitly registered.
Validation
It wrote a test harness (validate.py) that automates the full validation cycle. Probe the target for HTTP/2 support. Run the PoC. Scan Apache error logs for segfault and double-free indicators. Return PASS or FAIL.
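The log-scanning and verdict steps of a harness like this can be sketched as follows. This is not the project's validate.py. The crash patterns are assumptions. AH00052 is the message Apache's parent process logs when a child exits on a signal. "double free" matches glibc's abort message when it reaches the error log. Check what your build actually writes before relying on either. Here PASS means the observed outcome matched the expectation for that target: a crash on 2.4.66, stability on 2.4.67.

```python
import re
from typing import List

# Assumed crash indicators; verify them against your own error_log.
CRASH_PATTERNS = [
    re.compile(r"AH00052: child pid \d+ exit signal (Segmentation fault|Aborted)"),
    re.compile(r"double free", re.IGNORECASE),
]


def find_crash_lines(log_text: str) -> List[str]:
    """Return error-log lines that match any crash indicator."""
    return [line for line in log_text.splitlines()
            if any(p.search(line) for p in CRASH_PATTERNS)]


def verdict(log_text: str, expect_crash: bool) -> str:
    """PASS when the observed outcome matches the expectation for the target."""
    crashed = bool(find_crash_lines(log_text))
    return "PASS" if crashed == expect_crash else "FAIL"
```

Feed it only the log lines written after the PoC started. Otherwise, a crash from an earlier run can leak into the verdict for the patched target.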
Run it against the vulnerable target, confirm the crash. Run it against the patched target, confirm stability. This is the deliverable you hand to the team doing patch validation or regression testing.
The final report
The completed project folder contains everything:
- exploit-development-plan.md: The original staged plan with milestones, constraints, and safety boundaries.
- dev-log.md: A running activity log covering the diff review, environment setup, PoC iterations, and RCE analysis.
- cve_2026_23918_poc.py: The proof-of-concept exploit with DoS and detection modes.
- rce_groom.py: The heap grooming research script.
- validate.py: The automated validation harness.
- target/: Dockerfiles for building vulnerable (2.4.66) and patched (2.4.67) Apache containers, plus httpd.conf and SSL certificates.
- report.md: The final report covering root cause, patch analysis, reproduction steps, RCE feasibility assessment, and detection and prevention guidance, including network signatures, WAF rules, and log patterns.
Seven milestones. All completed. Forty minutes.
What this means
Building a complete exploit development workflow for a four-day-old CVE is not a trivial exercise. It requires source code analysis, environment engineering, exploit development, heap analysis, iterative testing, and clear documentation. This is work that typically takes an experienced researcher a day or more.
We're not claiming Hacker Sidekick replaces that expertise. You still need to understand what you're looking at, how to scope the work, and how to interpret the results. But having an AI that can take a vulnerability description and produce a working exploit, a test environment, a validation harness, and a technical report, without refusing, without lecturing you about responsible use, without requiring creative prompt workarounds, changes the speed at which this work happens.
Every pentester, red teamer, and security researcher has hit the wall where a general-purpose AI tells them it can't help with their legitimate professional work. That wall doesn't exist here.
Try it yourself
Download Hacker Sidekick at hackersidekick.com and see what it can do with your next engagement.
If you're evaluating Hacker Sidekick for a team, reach out to us by email or schedule a demo.
Watch it again: Exploiting a 4-Day-Old CVE with Hacker Sidekick in 40 Minutes on YouTube.
- The Hacker Sidekick Team