GENERATIVE AI-ASSISTED EXPLOIT PAYLOAD GENERATION FROM PUBLIC VULNERABILITY ADVISORY FEEDS: DESIGN, IMPLEMENTATION, AND EMPIRICAL STUDY

Dariga Yermakhankyzy; Syed Imran Moazzam Shah

Authors

Dariga Yermakhankyzy, School of Information Technology and Engineering, Kazakh-British Technical University, Almaty, Kazakhstan, https://orcid.org/0000-0005-4911-1259
Syed Imran Moazzam Shah, School of Information Technology and Engineering, Kazakh-British Technical University, Almaty, Kazakhstan, https://orcid.org/0000-0002-9289-5467

Keywords:

vulnerability intelligence, CVE automation, large language models, payload generation, penetration testing, NVD, GHSA, DVWA, XSS, CMDI, LFI, threat emulation, empirical evaluation

Abstract

Modern web applications face an ever-growing catalogue of disclosed vulnerabilities, yet translating Common Vulnerabilities and Exposures (CVE) records into actionable security test cases remains a labor-intensive, largely manual process. This paper presents an automated threat-emulation pipeline that connects two public vulnerability intelligence feeds, the National Vulnerability Database (NVD) and the GitHub Security Advisory (GHSA) database, to a target web application under assessment. The pipeline operates in four stages: (1) structured CVE acquisition and severity-based filtering using the Common Vulnerability Scoring System (CVSS) and the Common Weakness Enumeration (CWE) taxonomy; (2) advisory enrichment, which fetches GHSA patch-level commit diffs and affected-package metadata via the GitHub GraphQL API; (3) context-aware exploit payload generation, in which a Large Language Model (LLM), specifically Claude Sonnet, transforms structured CVE context into targeted HTTP payloads; and (4) automated, LLM-assisted response-evidence validation that classifies each probe attempt against a Damn Vulnerable Web Application (DVWA) reference target as a true positive (TP) or a false positive (FP). An empirical evaluation over 240 CVE records produced 712 probe alerts spanning the Cross-Site Scripting (XSS), Command Injection (CMDI), and Local File Inclusion (LFI) vulnerability classes. Among CVEs whose weakness type matched DVWA's attack surface, the system achieved a detection rate of 95.1%, a precision of 0.999, a recall of 1.0, an F1-score of 0.9993, and a Matthews Correlation Coefficient (MCC) of 0.9992.
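For readers less familiar with the reported metrics, the following minimal Python sketch shows how precision, recall, F1-score, and MCC are conventionally derived from a binary confusion matrix. The function name `classification_metrics` and the counts in the usage lines are illustrative assumptions; they are not the confusion matrix from this study, and the sketch is not the authors' evaluation code.

```python
import math


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration only; zero denominators yield 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # Matthews Correlation Coefficient: uses all four cells of the matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Illustrative counts only (NOT the paper's data).
metrics = classification_metrics(tp=90, fp=10, fn=0, tn=100)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike precision and recall, MCC also accounts for true negatives, which is why it is often reported alongside the F1-score when class sizes are unbalanced.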
