Metadata-Version: 2.4
Name: shield-security
Version: 0.6.0
Summary: SHIELD — Automated security layer for LLM-assisted web development
Home-page: https://github.com/Aliidrees1234/llm-security-research
Author: Ali Yasin Idrees
Author-email: aa1466805@gmail.com
Keywords: security,sast,bandit,semgrep,llm,ai,code-security,vulnerability
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: bandit>=1.9.4
Requires-Dist: bcrypt>=4.0.0
Requires-Dist: colorama>=0.4.6
Requires-Dist: astor>=0.8.1
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: semgrep; extra == "dev"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# LLM Security in Web Applications
### Emerging Threats, Vulnerability Taxonomies, and Mitigation Frameworks

> **Independent Academic Research — Ali Yasin Idrees — 2026**

[![SHIELD Security Gate](https://github.com/Aliidrees1234/llm-security-research/actions/workflows/shield_layer4_gate.yml/badge.svg)](https://github.com/Aliidrees1234/llm-security-research/actions/workflows/shield_layer4_gate.yml)
![Python](https://img.shields.io/badge/Python-3.14-blue)
![Version](https://img.shields.io/badge/SHIELD-v0.6.0-black)
![Tests](https://img.shields.io/badge/tests-91%20passing-brightgreen)

---

## Overview

This repository contains the complete empirical research, implementation code, and validation results for an independent study comparing the security of web application code generated by three leading free-tier Large Language Models.

| Model | Provider | Architecture |
|-------|----------|-------------|
| **GPT-5 mini** | OpenAI | RLHF / InstructGPT |
| **Claude Sonnet 4.6** | Anthropic | Constitutional AI |
| **Gemini 3 Flash** | Google DeepMind | Multimodal Transformer |

**20 standardized web development tasks** generated **60 Python Flask and Node.js Express files**, analyzed using **Bandit 1.9.4** and **Semgrep 1.155.0**.

---

## Key Findings

- **52 security findings** across 60 generated files — **zero files were completely clean**
- **debug=True (CWE-94)** present in **27 of 30 Flask files** across all three models
- **Knowledge-Action Gap confirmed at 100%** — all models warned against vulnerabilities they simultaneously generated
- **Claude Sonnet 4.6** achieved the lowest vulnerability density: **1.89 I/100L** (baseline) → **0.17 I/100L** (after SHIELD Layer 3) — the study minimum
- **SHIELD Layer 3 FDSP** eliminated 100% of High-severity findings at **K=1** across all models — 38 auto-remediations in a single automated pass

### I/100L Vulnerability Density — All Conditions

| Condition | GPT-5 mini | Claude S.4.6 | Gemini 3 Flash | Average |
|-----------|-----------|-------------|---------------|---------|
| Without SHIELD | 4.96 | 1.89 | 6.00 | 4.28 |
| With SHIELD Layer 1 | 1.44 | 1.02 | 1.02 | 1.16 |
| With SHIELD Layer 3 | 1.16 | **0.17** ⭐ | 1.55 | 0.96 |
| Total Reduction | 77% | **91%** | 74% | 77% avg |

> **I/100L** (Issues Per 100 Lines) — a normalization metric introduced in this study to enable valid cross-model comparison by controlling for code verbosity differences.

---

## SHIELD Framework

**Secure Hybrid Integration and Enforcement Layer for LLM-assisted Development**

A five-layer security architecture spanning the complete lifecycle of LLM-assisted web development.

```
┌──────┬──────────────────────────┬──────────────────────────────┐
│  L1  │  Prompt Engineering      │  SPT — 9 mandatory rules     │
│  L2  │  Generation Guidance     │  GRASP DAG — 8 SCP nodes     │
│  L3  │  Validation & Remediation│  FDSP — K=3 auto-fix engine  │
│  L4  │  Deployment Gate         │  GitHub Actions CI/CD        │
│  L5  │  Runtime Monitoring      │  WAF + Canary + JIT perms    │
└──────┴──────────────────────────┴──────────────────────────────┘
```

| Layer | File | Status |
|-------|------|--------|
| Layer 1 | Built into prompts | ✅ Empirically Validated |
| Layer 2 | `shield_layer2_grasp.py` | ✅ Implemented |
| Layer 3 | `shield_layer3_fdsp.py` | ✅ Empirically Validated |
| Layer 4 | `.github/workflows/shield_layer4_gate.yml` | ✅ Live on this repo |
| Layer 5 | `shield_layer5_runtime.py` | ✅ Implemented |
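
As an illustration of the Layer 4 idea, the sketch below shows how a deployment gate could decide pass or fail from a Bandit JSON report (as produced by `bandit -r <path> -f json`). It is not the logic in `shield_layer4_gate.yml`, which is not reproduced here. The function names, the sample report, and the "block on High only" policy are assumptions made for this example; only the `results` list and its `issue_severity` field come from Bandit's JSON output.

```python
from collections import Counter


def count_by_severity(report: dict) -> Counter:
    """Tally findings in a Bandit JSON report by their `issue_severity` field."""
    return Counter(item["issue_severity"].upper() for item in report.get("results", []))


def gate_passes(report: dict, blocking: tuple[str, ...] = ("HIGH",)) -> bool:
    """Return True when no finding has a severity listed in `blocking` (assumed policy)."""
    counts = count_by_severity(report)
    return all(counts[level] == 0 for level in blocking)


# Hypothetical report fragment, trimmed to the fields this sketch reads.
sample_report = {
    "results": [
        {"test_id": "B201", "issue_severity": "HIGH", "filename": "app.py", "line_number": 12},
        {"test_id": "B105", "issue_severity": "LOW", "filename": "app.py", "line_number": 3},
    ]
}

passed = gate_passes(sample_report)
print("gate passed" if passed else "gate blocked")
```

In a CI job, a script like this would typically exit with a non-zero status when the gate is blocked, so that the pipeline step fails.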

---

## Repository Structure

```
llm-security-research/
│
├── .github/workflows/
│   └── shield_layer4_gate.yml        # SHIELD Layer 4 — CI/CD gate (LIVE)
│
├── snippets/                          # Generated code corpora
│   ├── chatgpt/                       # GPT-5 mini without SHIELD
│   ├── claude/                        # Claude S.4.6 without SHIELD
│   ├── gemini/                        # Gemini 3 Flash without SHIELD
│   ├── chatgpt_shield/                # GPT-5 mini with SHIELD Layer 1
│   ├── claude_shield/                 # Claude S.4.6 with SHIELD Layer 1
│   └── gemini_shield/                 # Gemini 3 Flash with SHIELD Layer 1
│
├── results/layer3_fdsp/
│   └── fdsp_master_results.json       # Layer 3 validation raw results
│
├── shield_layer2_grasp.py             # SHIELD Layer 2 — GRASP DAG
├── shield_layer3_fdsp.py              # SHIELD Layer 3 — FDSP engine
├── shield_layer5_runtime.py           # SHIELD Layer 5 — runtime monitoring
│
├── LLM_Security_Research_In_Web_Applications # Full research paper
├── LLM_Security_Research_FINAL.xlsx     # Results dashboard
└── README.md
```

---

## Installation

```bash
git clone https://github.com/Aliidrees1234/llm-security-research
cd llm-security-research
pip install -e .
```

## Usage

```bash
# Scan a project
shield scan ./my_project

# Scan with HTML report
shield scan ./my_project --report report.html

# Scan quietly (High/Medium only)
shield scan ./my_project --quiet

# Auto-fix vulnerabilities
shield fix ./my_project --output ./fixed --k 3

# Fix with report
shield fix ./my_project --output ./fixed --report after.html

# Compare before and after
shield compare before.json after.json --report comparison.html

# Initialize config file
shield init

# Show version
shield version
```

## Run Tests

```bash
pip install pytest
pytest tests/ -v
# 91 tests passing
```

---

## Running Individual Layers

### Layer 2 — GRASP DAG (Prompt Enhancer)

Analyzes a developer prompt and injects relevant SCP security constraints automatically.

```python
from shield_layer2_grasp import GRASPEngine

engine = GRASPEngine()
prompt = "Write a Flask login endpoint with SQLite"
secured = engine.build_secured_prompt(prompt)
# Send `secured` (not the raw prompt) to your LLM
```
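
To make the idea of constraint injection concrete, here is a deliberately simplified sketch. It swaps the GRASP DAG for a flat keyword lookup, so it does not reflect how `GRASPEngine` works internally. The `CONSTRAINTS` table, its rule texts, and the `inject_constraints` function are all hypothetical.

```python
# Hypothetical trigger keywords mapped to hypothetical constraint text.
CONSTRAINTS = {
    "sql": "Use parameterized queries; never build SQL with string formatting.",
    "login": "Hash passwords with bcrypt; never store or compare plaintext.",
    "upload": "Validate file types and sanitize filenames before saving.",
    "flask": "Do not enable debug=True outside local development.",
}


def inject_constraints(prompt: str) -> str:
    """Append every constraint whose keyword occurs in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    selected = [rule for keyword, rule in CONSTRAINTS.items() if keyword in lowered]
    if not selected:
        return prompt  # nothing matched, so leave the prompt untouched
    rules = "\n".join(f"- {rule}" for rule in selected)
    return f"{prompt}\n\nSecurity requirements:\n{rules}"


secured_prompt = inject_constraints("Write a Flask login endpoint with SQLite")
print(secured_prompt)
```

For the example prompt, the `sql`, `login`, and `flask` keywords match and the `upload` rule is left out. A dependency graph, as the GRASP name suggests, would additionally let one constraint pull in the constraints it depends on.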

### Layer 3 — FDSP Auto-Remediation

```bash
pip install bandit

python shield_layer3_fdsp.py \
  --corpora snippets/chatgpt snippets/claude snippets/gemini \
  --output results/layer3 \
  --k 3
```

**Actual results from this study (6 April 2026). The Reduction column refers to High-severity findings:**

```
Corpus          K0 High   K1 High   K0 I/100L   K1 I/100L   Reduction
chatgpt/           11         0        3.30         1.16        100%
claude/            11         0        1.50         0.17        100%
gemini/             9         0        4.42         1.55        100%
```

### Layer 5 — Runtime Monitoring

```python
from flask import Flask
from shield_layer5_runtime import ShieldMiddleware

app = Flask(__name__)
shield = ShieldMiddleware(app, block_injections=True)
# The middleware now adds prompt-injection blocking, canary detection, and JIT permissions
```

---

## Research Contributions

### 1. Knowledge-Action Gap (Definition 2.1)
The first formal definition of the phenomenon where LLMs warn against vulnerabilities they simultaneously generate:

> *A model M exhibits the Knowledge-Action Gap with respect to vulnerability class V if: (a) M generates code containing instances of V when prompted for a web development task in V's scope; and (b) M's textual explanation includes explicit warnings against V.*

Confirmed at **100% rate** across all three models for `debug=True`.
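
Definition 2.1 can be read as a simple predicate over one model response. The sketch below is one possible encoding, assuming each response has already been reduced to two sets of CWE identifiers: those a scanner found in the generated code and those the accompanying explanation warned against. The `ModelResponse` type, the `gap_rate` helper, and the sample data are hypothetical, and the way the rate is normalized here is an assumption rather than the study's stated method.

```python
from dataclasses import dataclass, field


@dataclass
class ModelResponse:
    """One LLM answer, reduced to the CWEs in its code and the CWEs its prose warns about."""
    code_cwes: set[str] = field(default_factory=set)
    warned_cwes: set[str] = field(default_factory=set)


def exhibits_gap(response: ModelResponse, vulnerability: str) -> bool:
    """Conditions (a) and (b) of Definition 2.1 for a single response."""
    return vulnerability in response.code_cwes and vulnerability in response.warned_cwes


def gap_rate(responses: list[ModelResponse], vulnerability: str) -> float:
    """Share of responses containing the vulnerability that also warn against it."""
    affected = [r for r in responses if vulnerability in r.code_cwes]
    if not affected:
        return 0.0
    return sum(exhibits_gap(r, vulnerability) for r in affected) / len(affected)


# Hypothetical responses, not data from the study.
responses = [
    ModelResponse({"CWE-94"}, {"CWE-94"}),
    ModelResponse({"CWE-94", "CWE-89"}, {"CWE-94"}),
    ModelResponse({"CWE-89"}, set()),
]
rate = gap_rate(responses, "CWE-94")
print(f"gap rate for CWE-94: {rate:.0%}")
```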

### 2. I/100L Normalization Metric
Introduced to eliminate the verbosity confound in cross-model security comparison:

```
I/100L = (Total Issues ÷ Lines of Code) × 100
```

Without I/100L: GPT (17 issues) ≈ Claude (18 issues) — appears equal.
With I/100L: GPT (4.96) vs Claude (1.89) — reveals **2.6× density difference**.
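
The formula and the figures quoted above can be checked with a few lines of Python. This is a minimal sketch: the function names are chosen for this example, and the line counts in the first two calls are hypothetical round numbers. The density values in `reported` are the ones given in this README.

```python
def issues_per_100_lines(total_issues: int, lines_of_code: int) -> float:
    """I/100L = (total issues / lines of code) * 100."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return 100 * total_issues / lines_of_code


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Hypothetical counts: equal issue totals in files of different length.
terse = issues_per_100_lines(10, 200)    # 5.0
verbose = issues_per_100_lines(10, 500)  # 2.0

# Densities reported in this README: (without SHIELD, with SHIELD Layer 3).
reported = {
    "GPT-5 mini": (4.96, 1.16),
    "Claude Sonnet 4.6": (1.89, 0.17),
    "Gemini 3 Flash": (6.00, 1.55),
}
reductions = {model: round(percent_reduction(b, a)) for model, (b, a) in reported.items()}
density_ratio = round(4.96 / 1.89, 1)  # GPT-5 mini vs Claude baseline density

print(reductions, density_ratio)
```

The per-model reductions come out at 77, 91, and 74 percent, and the baseline density ratio at 2.6, matching the values stated in this document.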

### 3. FDSP K=1 Convergence
The engine was designed for up to K=3 iterations. All three models reached **zero High-severity findings at K=1** — 38 automated fixes in a single pass.
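
The control flow behind "up to K iterations, stop once no High-severity findings remain" can be sketched as a bounded scan-fix loop. This is an illustration under stated assumptions, not the code in `shield_layer3_fdsp.py`: the `remediate` function, the finding dictionaries, and the toy scanner and fixer are hypothetical stand-ins for a real scanner such as Bandit and a real rewrite engine.

```python
from typing import Callable

Finding = dict  # assumed shape, e.g. {"rule": "B201", "severity": "HIGH"}


def remediate(
    source: str,
    scan: Callable[[str], list[Finding]],
    fix: Callable[[str, Finding], str],
    max_iterations: int = 3,
) -> tuple[str, int]:
    """Scan, fix every High finding, then rescan, for at most `max_iterations` passes.

    Returns the final source and the number of fix passes that were needed.
    """
    passes = 0
    while passes < max_iterations:
        high = [f for f in scan(source) if f["severity"] == "HIGH"]
        if not high:
            break  # converged: nothing High-severity is left
        for finding in high:
            source = fix(source, finding)
        passes += 1
    return source, passes


def toy_scan(source: str) -> list[Finding]:
    """Toy scanner: flags Flask debug mode only."""
    return [{"rule": "B201", "severity": "HIGH"}] if "debug=True" in source else []


def toy_fix(source: str, finding: Finding) -> str:
    """Toy fixer: turns Flask debug mode off."""
    return source.replace("debug=True", "debug=False")


fixed, passes_used = remediate("app.run(debug=True)", toy_scan, toy_fix)
print(fixed, passes_used)
```

In this toy run the loop converges after one fix pass, which is the situation the K=1 result describes: the second scan finds nothing High-severity, so the remaining iterations are never used.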

### 4. Temporal Audit
Seven major LLM security citations were mis-dated (2026 instead of 2025). Corrected dates documented in Chapter 2. The year 2026 represents **industrial adoption**, not discovery.

---

## Prompt Catalog

All 20 tasks contained **zero security instructions** — measuring intrinsic model behavior only.

| ID | Framework | Primary CWE | Vulnerability Class |
|----|-----------|------------|-------------------|
| A01 | Python Flask | CWE-94, CWE-259 | debug=True, Hardcoded key |
| A02 | Python Flask | CWE-89, CWE-639 | SQL injection, IDOR |
| A03 | Python Flask | CWE-434, CWE-22 | Unrestricted upload, Path traversal |
| A04 | Python Flask | CWE-916, CWE-20 | Weak hashing, Input validation |
| A05 | Python Flask | CWE-89 | SQL injection via LIKE |
| A06 | Python Flask | CWE-285, CWE-330 | Broken access control |
| A07 | Python Flask | CWE-259, CWE-640 | Hardcoded credentials |
| A08 | Python Flask | CWE-22, CWE-200 | Path traversal, Info exposure |
| A09 | Python Flask | CWE-79, CWE-327 | XSS, Weak hash |
| A10 | Python Flask | CWE-78 | OS command injection / RCE |
| B01 | Node.js Express | CWE-345 | JWT algorithm confusion |
| B02 | Node.js Express | CWE-943 | NoSQL injection |
| B03 | Node.js Express | CWE-22 | Path traversal |
| B04 | Node.js Express | CWE-916 | Weak hashing |
| B05 | Node.js Express | CWE-89/943 | Injection via filter |
| B06 | Node.js Express | CWE-79 | Reflected XSS |
| B07 | Node.js Express | CWE-78 | OS command injection |
| B08 | Node.js Express | CWE-620 | Unverified password change |
| B09 | Node.js Express | CWE-918 | SSRF |
| B10 | Node.js Express | CWE-79 | Stored XSS |

---

## Tools & Scan Dates

| Tool | Version | Scope | Scan Date |
|------|---------|-------|-----------|
| Bandit | 1.9.4 | Python Flask (all 6 corpora) | 22 March 2026 |
| Semgrep | 1.155.0 | Node.js Express (all 6 corpora) | 25 March 2026 |
| FDSP Engine | 1.0 | Python (3 without-SHIELD corpora) | 6 April 2026 |

---

## Changelog

### v0.5.0
- `.shieldrc` config file — project-level configuration
- `shield init` — auto-generates config with project detection
- `--quiet` flag — hides Low findings for cleaner output
- Scan timing — shows how long each scan step took
- MIT License — open for contributions
- PyPI ready — `pip install shield-security`

### v0.4.0
- HTML scan reports with security grade
- JSON export for programmatic use
- `shield compare` — before/after comparison report

### v0.3.0
- Full AST engine — all 11 rules converted
- 89 tests passing

### v0.2.0
- AST engine for B201, B602, B105
- Hybrid fallback system

### v0.1.0
- CLI tool: `shield scan`, `shield fix`, `shield version`
- 41 tests

---

## Citation

```bibtex
@misc{idrees2026llmsecurity,
  author    = {Idrees, Ali Yasin},
  title     = {LLM Security in Web Applications: Emerging Threats,
               Vulnerability Taxonomies, and Mitigation Frameworks},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/Aliidrees1234/llm-security-research}
}
```

---

## Author

**Ali Yasin Idrees** — Independent Researcher, 2026

---

*This research was conducted independently, without institutional affiliation or external funding.*
*SHIELD Layer 4 is live and protecting this repository — every push is automatically scanned.*
