Beyond "Dumb" Chatbots: Building a Stateful Security Gatekeeper with LangGraph

Posted on May 14, 2026

The industry is currently obsessed with “prompt engineering.” But if you’re building a security gatekeeper for your project dependencies, a better prompt won’t save you. The problem isn’t that LLMs are “dumb”; it’s that they are stateless. Asking a standard LLM to audit a package.json is like asking a genius lawyer to review a contract while they are suffering from short-term memory loss.

If you want AI to actually gatekeep your dependencies, you don’t need another chat window. You need a stateful agentic workflow.

In this post, I’ll break down the architecture of an automated auditor that combines LangGraph, Qdrant, and Hybrid Logic to move beyond simple chatbots and into production-ready security engineering.

The Problem: Stateless Hallucinations

When you send a list of dependencies to a standard LLM, it tries to do everything at once: fetch data, remember policies, and make a judgment. In a security context, “helpfulness” is a vulnerability. An LLM might hallucinate a “safe” verdict for a package with a friendly MIT license while completely ignoring five critical CVEs simply because it wasn’t “looking” at the vulnerability data at that exact moment in the context window.

We needed a system that follows a rigorous, multi-step process—a Nervous System for AI.

Orchestration Over Prompting: The LangGraph Approach

By using LangGraph, we treat the security audit as a directed graph. Each step is a specialized node with an isolated responsibility. Most importantly, it introduces State—a shared memory that persists across the entire audit.

Here is how we define the “chain of command” in code:

# app/agents/graph.py
from typing import TypedDict

from langgraph.graph import StateGraph, END
from app.agents.nodes import scout_node, lawyer_node, architect_node

# We define a shared State that flows through the nodes
class AgentState(TypedDict):
    raw_package_list: list        # Dependencies parsed from package.json
    analyzed_dependencies: list   # Enriched entries: licenses, CVEs, verdicts
    current_step: str             # Name of the node currently executing
    errors: list                  # Non-fatal errors collected along the way

workflow = StateGraph(AgentState)

# Each node is a specialist with a single responsibility
workflow.add_node("scout", scout_node)      # Data gathering (NPM/OSV/GitHub)
workflow.add_node("lawyer", lawyer_node)     # Policy enforcement (RAG)
workflow.add_node("architect", architect_node) # Decision cleanup & Alternatives

# Set the entry point and edges
workflow.set_entry_point("scout")
workflow.add_edge("scout", "lawyer")
workflow.add_edge("lawyer", "architect")
workflow.add_edge("architect", END)

app_graph = workflow.compile()
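
Once compiled, the graph runs like any other LangChain runnable. Here is a minimal sketch of kicking off an audit; the sample package and the empty starting fields are illustrative placeholders, not the project’s real entry point:

import asyncio

async def run_audit():
    # The initial state mirrors AgentState; the sample package is a placeholder
    initial_state = {
        "raw_package_list": [{"name": "left-pad", "version": "1.3.0"}],
        "analyzed_dependencies": [],
        "current_step": "scout",
        "errors": [],
    }
    # Compiled LangGraph graphs expose ainvoke() like any async runnable
    return await app_graph.ainvoke(initial_state)

report = asyncio.run(run_audit())
print(report["analyzed_dependencies"])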

Hybrid Intelligence: When Python Must Silence the AI

One of the biggest lessons in AI engineering is knowing when not to use AI. We don’t trust an LLM to count vulnerabilities. We use Hybrid Logic: Python handles the deterministic, “hard” rules, while the LLM (DeepSeek-R1) handles the semantic, “soft” reasoning about license texts.

In our “Lawyer” node, the code enforces the policy before the AI even gets a chance to argue:

# app/agents/nodes/lawyer.py
from app.agents.graph import AgentState
# `llm` (the DeepSeek-R1 client) and `search_policy` (the Qdrant lookup)
# are shared project helpers defined elsewhere in the app.

async def lawyer_node(state: AgentState):
    issues = state["analyzed_dependencies"]

    for issue in issues:
        # 1. Hard Logic: Python enforces deterministic limits
        vuln_count = len(issue['vulnerabilities'])
        is_too_vulnerable = vuln_count > 3  # Company policy threshold

        # 2. Soft Logic: AI interprets legal nuance via RAG
        # We query Qdrant for the relevant company license policies
        policy_context = await search_policy("License compliance rules")

        prompt = (
            f"Analyze this package: {issue['package_name']}.\n"
            f"License text: {issue.get('license_text', 'not available')}\n"
            f"Policy: {policy_context}"
        )
        res = await llm.ainvoke(prompt)

        # 3. Final Verdict: Python has the final say
        if is_too_vulnerable or "FORBIDDEN" in res.content.upper():
            issue['verdict'] = "FORBIDDEN"
        else:
            issue['verdict'] = "SAFE"

    return {"analyzed_dependencies": issues}

Deep Scanning: Beyond the API Label

Registry APIs (like NPM) often provide incomplete license data. A package might be labeled “BSD,” but the actual LICENSE file on GitHub could contain custom “copyleft” clauses.

Our Scout node doesn’t just trust the metadata. It performs a Deep Scan: if the license is ambiguous, it hits the GitHub API to pull the raw text. This ensures the LLM is making decisions based on legal evidence, not just a three-letter tag.
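
To make that concrete, here is a rough sketch of such a deep-scan fallback using httpx and GitHub’s license endpoint. The helper name and the ambiguity check are illustrative; the real Scout node is the subject of the next post:

import httpx

AMBIGUOUS_LABELS = {"UNKNOWN", "UNLICENSED", "CUSTOM", "SEE LICENSE IN LICENSE"}

async def deep_scan_license(repo: str, registry_license: str | None) -> str | None:
    """Fetch the raw LICENSE text from GitHub when registry metadata is ambiguous.

    `repo` is the "owner/name" slug extracted from the package's repository URL.
    """
    if registry_license and registry_license.upper() not in AMBIGUOUS_LABELS:
        return None  # The registry label looks trustworthy; no deep scan needed

    url = f"https://api.github.com/repos/{repo}/license"
    headers = {"Accept": "application/vnd.github.raw+json"}  # ask for raw file content
    async with httpx.AsyncClient() as client:
        resp = await client.get(url, headers=headers)
        resp.raise_for_status()
        return resp.text  # The actual LICENSE file, ready for the Lawyer node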

Production-Ready Output

Finally, the Architect node cleans the state. It removes massive license texts (which were only needed for analysis) and generates a clean, actionable JSON report. If a package is forbidden, it uses a faster model (Llama 3.1) to suggest compatible alternatives.

# app/agents/nodes/architect.py
from app.agents.graph import AgentState
# `fast_llm` is the lighter Llama 3.1 client reserved for suggestions.

async def architect_node(state: AgentState):
    for issue in state["analyzed_dependencies"]:
        # Remove bloat for the final JSON response
        if "license_text" in issue:
            del issue["license_text"]

        if issue['verdict'] == "FORBIDDEN":
            # Direct recommendation for the dev, from the faster model
            reco = await fast_llm.ainvoke(
                f"Suggest a safe NPM alternative for {issue['package_name']}"
            )
            issue['reasoning'] = issue.get('reasoning', '') + f"\nRECO: {reco.content}"

    return {"analyzed_dependencies": state["analyzed_dependencies"]}

The Verdict

Building security tools with AI isn’t about the model you choose; it’s about the architecture you build around it. By moving away from “chatting” and toward a stateful agentic graph, we’ve created a system that gathers proof, checks the manual (RAG), and enforces rules (Hybrid Logic).

If you are still just “prompting,” you are building toys. If you want to build tools, start mapping your graphs.

What’s Next?

Structure is only half the battle. Your security audit is only as good as the data feeding it, and as it turns out, registry APIs are notoriously unreliable when it comes to legal compliance. In the next post, we’ll dive into the implementation of the Scout node—specifically how to build a GitHub-based deep scanner to fetch raw legal evidence when NPM metadata lies.

Stay tuned.