Application of Large Language Models to Automatic Classification of Vulnerabilities According to the CVSS 3.1 Standard

Authors

  • Michal Walkowski, Department of Telecommunications and Teleinformatics, Wroclaw University of Science and Technology
  • Nikita Zhukov, Wroclaw University of Science and Technology
  • Slawomir Sujecki, Wroclaw University of Science and Technology, https://orcid.org/0000-0003-4588-6741

Abstract

We evaluated three chatbot models (ChatGPT-4o-mini, Gemini 2.0 Flash, DeepSeek Chat) for automated CVSS 3.1 vulnerability scoring on 4,459 CVE records. Gemini achieved the highest accuracy across prompt strategies, ChatGPT showed inconsistencies between the vectors and the scores it produced, and DeepSeek underestimated severity. The results suggest that chatbots can support analysts but require validation mechanisms.
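One possible validation mechanism for vector-score inconsistencies is to recompute the base score from the vector a model returns and compare it with the score the model reports. The sketch below is illustrative only and is not the authors' implementation: it follows the base-score equations and metric weights of the public CVSS 3.1 specification, and the names `base_score`, `roundup`, and `is_consistent` are hypothetical helpers introduced here.

```python
import math

# Metric weights from the CVSS 3.1 specification (base metrics only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on Scope (U = unchanged, C = changed).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value):
    """Round up to one decimal place as defined in CVSS 3.1 (Appendix A)."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS 3.1 base score from a base vector string."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(part.split(":") for part in parts)
    scope = m["S"]

    # Impact sub-score.
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15

    # Exploitability sub-score.
    exploitability = (
        8.22
        * WEIGHTS["AV"][m["AV"]]
        * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[scope][m["PR"]]
        * WEIGHTS["UI"][m["UI"]]
    )

    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


def is_consistent(vector, reported_score):
    """True if the reported score matches the score implied by the vector."""
    return abs(base_score(vector) - reported_score) < 1e-9


# A model answer that pairs this vector with a score of 7.5 would be flagged.
vector = "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"
print(base_score(vector))          # 9.8
print(is_consistent(vector, 7.5))  # False
```

A check of this kind only confirms that the vector and the score agree with each other. It does not establish that the vector itself is a correct assessment of the vulnerability, so it complements rather than replaces analyst review.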

Published

2026-02-17

Section

Applied Informatics