>_~/projects/ai4security-research
AI research / cybersecurity / 2024

Model Assessment

Compared general-purpose and cybersecurity-specific open-source models on tasks such as security interpretation, alert enrichment, and context-aware response generation.

Alert Enrichment

Built an enrichment flow around alerts, assets, vulnerabilities, and synthetic records so a model could generate summaries, remediation guidance, and contributing factors.

Anomaly Detection

Tested LLM-assisted anomaly classification on industrial network-derived NetFlows, comparing in-context learning, fine-tuning, and reasoning-first prompting.

Project Context

The work surveyed practical AI use cases for cybersecurity teams, focusing on model accessibility, local deployment constraints, domain specialization, and how LLM-assisted reasoning might improve analysis of alerts, assets, and vulnerabilities.

Two tracks were explored in parallel: Hugging Face open-source model assessment for alert enrichment and security-data interpretation, and anomaly detection experiments using Azure OpenAI workflows on network-flow data derived from industrial traffic captures.

Metadata
Domain:
Cybersecurity / AI Research
Primary Themes:
Alert Enrichment / RAG / Anomaly Detection
Model Families:
Mistral / Lily Cybersecurity / SecurityLLM / GPT-4o
Deployment Focus:
Quantized local inference + hosted API evaluation
Data Context:
Alerts / Assets / CVEs / Industrial NetFlows
Open-Source Model Workflow

Hugging Face Discovery

Filtered cybersecurity-domain models and narrowed the field to candidates that were realistic to run, compare, and evaluate in a security-research workflow.

filter models -> cybersecurity -> quantized candidates

Local Deployment

Focused on quantized 4-bit models that could be deployed locally with GPU support, reducing hardware cost while keeping experiments practical.

quantized gguf -> llama.cpp -> local inference

Prompted Security Analysis

Tested whether models could read alert context, asset data, and vulnerability records and turn that information into useful analyst-facing summaries.

alert + asset + cve context -> summary + remediation
Alert Enrichment And Local Inference

The first major use case combined alerts, assets, and vulnerability context into a security-enrichment pipeline. Synthetic datasets were created to avoid confidentiality and integrity risks while still giving the models realistic security data to reason over.

Domain-tuned models such as Lily Cybersecurity and SecurityLLM outperformed the baseline Mistral run in the presentation’s qualitative results, even though all tested models still carried heavy inference costs.

Hugging Face cybersecurity model search
Model discovery phase for narrowing cybersecurity-domain candidates on Hugging Face.
Local model loading code
Example of loading a quantized open-source cybersecurity model for local inference.
Alert enrichment response sample
Sample model output summarizing a security alert and associated vulnerability context.
Anomaly Detection Track

The second track used the UNB CIC Modbus 2023 dataset, converting PCAPs to NetFlows and then reformatting those flows into structured LLM prompts. The goal was to determine whether LLMs could classify benign versus anomalous industrial traffic under different prompting and fine-tuning strategies.

Approach 1: In-Context Learning

Accuracy 0.7300 / Recall 0.9700

Provided labeled examples of normal and anomalous NetFlows directly in the prompt to guide classification. It performed well on recall but remained constrained by token limits.

Approach 2: Fine-Tuning

Accuracy 0.7240 / F1 0.1882

Fine-tuned a hosted model on labeled NetFlow data to avoid prompt-size limits. It scaled example volume better, but the measured classification quality was weaker than the best prompt-driven runs.

Approach 3: Reasoning First

Accuracy 0.7300 / Recall 1.0000

Asked the model to explain its reasoning before producing a prediction, reducing post-hoc hallucinated justifications and yielding the strongest overall balance in the deck’s reported results.

Synthetic vulnerability data sample
Structured vulnerability data used as part of the alert-enrichment context layer.
NetFlow prompt format
Structured message format used for anomaly-detection prompting with network-flow data.
Conclusion

The strongest results came from domain-tuned cybersecurity models and from prompt structures that force reasoning before prediction, especially when the task depends on structured evidence and contextual security knowledge.

Final Thoughts

This project sharpened practical thinking around model selection, quantization, prompt design, synthetic data safety, and how to evaluate whether an AI workflow is actually useful for analysts instead of only sounding impressive in a demo.