Your employees are generating AI code nobody governs - here is what to build
AI-generated code has 2.74x more vulnerabilities than human-written code. Most of it lives on employee laptops with zero scanning, logging, or continuity planning. Here is a three-pillar blueprint that costs about $12,600 a year to implement.

A salesperson at a mid-size company runs three Python scripts every morning before his first coffee. They pull prospects from a CRM, score them against engagement data, and push the day’s outbound list into a spreadsheet that feeds his email sequencing tool. Claude Code built every line. The scripts work brilliantly. IT has no idea any of it exists.
Now think about what happens when his laptop is closed. When it’s asleep on a plane. When he takes a new job next quarter and walks out with his credentials, his scripts, and the institutional knowledge of how those campaigns actually ran. Everything stops. Nobody gets an alert. The outbound pipeline just… goes quiet. And some poor operations manager spends two weeks figuring out why lead volume dropped off a cliff before anyone even thinks to check if a script was involved.
That’s one person.
In advisory work with mid-size companies, the pattern is always the same. It’s never just one script. With 40 salespeople potentially doing similar things, plus marketing building their own automations, plus finance cobbling together reporting tools, plus HR generating onboarding checklists - what starts as one clever hack on one laptop becomes an invisible code footprint spread across the entire company. Georgetown CSET published a report on precisely this class of risk, and the conclusion is blunt: organizations don’t know what they’re running, and they can’t secure what they can’t see.
Three problems that hit fast without governance
The thing is, most companies don’t notice the problem until something breaks. By then, the debt has been compounding for months. Here’s the failure cascade that plays out over and over.
No visibility into what exists. IT can’t protect what it doesn’t know about. Nobody has a registry of which scripts exist, what data they touch, what libraries they depend on. If a Python package gets a zero-day vulnerability tomorrow, there’s no way to find which employee laptops are running it. And the vulnerability surface is genuinely alarming. Veracode’s research found that AI-generated code contains 2.74x more security flaws than human-written code. Apiiro found that CVSS 7.0+ vulnerabilities - the serious ones - appear 2.5x more often in AI-generated code. Even worse, about 20% of AI-suggested package names point to libraries that don’t exist. Attackers have caught on. They register those hallucinated package names and plant malicious code inside them. It’s called slopsquatting, and it’s a nightmare to defend against when you don’t even know which packages your people are installing.
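Until a private package mirror exists (Pillar 3 below), even a crude allowlist check beats nothing against slopsquatting. A minimal sketch in Python - the approved set is illustrative and the requirement parsing is deliberately simplistic; real requirements files have more edge cases than this handles:

```python
# Sketch: flag requirements entries that aren't on a vetted allowlist.
# APPROVED contents are illustrative - a real list would be maintained by IT.
APPROVED = {"requests", "pandas", "openpyxl"}

def vet_requirements(lines):
    """Return requirement lines naming packages outside the approved set."""
    flagged = []
    for line in lines:
        # Strip version specifiers; ignores extras/markers for simplicity.
        name = line.strip().split("==")[0].split(">=")[0].lower()
        if name and not name.startswith("#") and name not in APPROVED:
            flagged.append(line.strip())
    return flagged
```

A hallucinated package like `reqeusts` never matches the allowlist, so it gets flagged before anyone runs `pip install` on it.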
No continuity when people leave or laptops sleep. Scripts on individual machines stop running when the machine stops running. Obvious, right? But companies treat laptop-resident automation like it’s infrastructure. It isn’t. Cyberhaven’s research puts hard numbers on the departure risk: 88% of IT workers would take sensitive data when fired, and 50% of ex-employees can still access corporate apps after they leave. That salesperson’s scripts contain API keys, CRM credentials, and business logic that walks out the door with him. And even when scripts keep running, the AI models underneath them change. IBM calls it agentic drift - model performance degrades 20-30% within six months without monitoring. The script that worked perfectly in January is silently producing worse results by July. Nobody notices because nobody’s watching.
No support path when things break. Here’s a scenario that’s basically a weekly occurrence at companies with unmanaged AI code. A user submits a support ticket: “my script doesn’t work.” IT opens the ticket, stares at it, and has absolutely nothing to go on. They never had access to the script, don’t know what it does, can’t see error logs, don’t know what API it calls or what data it processes. How exactly should they troubleshoot that? IBM estimates that 90% of enterprise AI usage is shadow AI. A separate finding puts the number of employees using AI tools at 96%. So IT is now responsible for supporting a codebase they’ve never seen, written in languages they may not know, running on machines they don’t fully control. That’s not a support model. That’s a proper mess.
The three pillars you actually need
Enough about the problem. Here’s the architecture that works, based on real advisory engagements. Everything here is costed, phased, and built for companies that don’t have a dedicated platform engineering team. The whole thing runs about $12,600 a year.
Pillar 1: Centralized code repository. Set up a GitHub organization where ALL user-generated code lives. Not on laptops. Not in personal Dropbox folders. Not emailed around as zip files. Organize repositories by use case, not by person - marketing-prospecting-agent/, finance-automation/, ops-reporting/. This way, when someone leaves, the code stays. Each repo gets a standard structure: scripts/, config/, docs/, tests/, .github/, and a CLAUDE.md file.
Why CLAUDE.md specifically? Because it functions as an organizational coding standard that the AI reads and follows automatically. In building Tallyfy, CLAUDE.md became the single most effective way to enforce rules across dozens of repositories. It tells Claude Code which tools and packages to use, what data can and cannot be accessed, and which code patterns are required. Every repo gets one. No exceptions.
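As a sketch, a minimal CLAUDE.md for one of these repos might read like this - the specific rules are illustrative, not a standard, and should be adapted to your own stack:

```markdown
# CLAUDE.md - organizational coding standard (illustrative example)

## Tooling
- Use Python 3.12 managed by uv; never install packages outside a virtual environment.
- Install packages only from the internal PyPI mirror.

## Data access
- CRM API access is read-only unless this repo is in the critical tier.
- Never write customer data to local files.

## Code patterns
- All secrets come from environment variables; never hardcode credentials.
- Every script needs a README.md and at least one test under tests/.
```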
GitHub Advanced Security gives you three automated scanning tools that run on every commit. CodeQL performs data-flow analysis - not text pattern matching - to find SQL injection, hardcoded credentials, and weak cryptography. Dependabot monitors every package dependency and auto-creates pull requests when vulnerable versions are detected. Secret scanning catches API keys from over 200 services before they hit the repository. Use CODEOWNERS files so every merge requires sign-off from both a business stakeholder and a technical reviewer. For secrets management, GitHub Secrets handles it immediately; Azure Key Vault is the medium-term move.
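A CODEOWNERS file implementing that dual sign-off can be a few lines - the team handles below are placeholders:

```
# .github/CODEOWNERS - example only; replace team handles with your own.
# Business stakeholder + technical reviewer on every change:
*               @acme/biz-owners @acme/dev-reviewers

# Critical-tier integration scripts additionally require security review:
/scripts/crm/   @acme/security @acme/dev-reviewers
```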
Pillar 2: Centralized scheduled jobs. This is the pillar that eliminates the “laptop sleeping on a plane” problem entirely. Move every scheduled script off employee machines and into cloud-based execution.
Here’s how the options compare:
| Platform | Cost | Best for | Notes |
|---|---|---|---|
| Windows Task Scheduler | Free (current state) | Nothing - avoid entirely | Dies with laptop, zero logging, no alerting |
| GitHub Actions | ~$15-30/month | First migration target | 50,000 free minutes included, covers most use cases |
| Azure Functions | ~$1-5/month | Mission-critical jobs | Slightly more setup, better monitoring |
| Claude Cloud Tasks | Included with plan | Simpler recurring tasks | Minimum 1-hour interval |
A typical GitHub Actions workflow for a daily Python job looks like this:
```yaml
name: Daily Prospect Scoring
on:
  schedule:
    - cron: '0 7 * * 1-5'   # 7 AM on weekdays (GitHub Actions cron runs in UTC)
  workflow_dispatch:         # Manual trigger button
jobs:
  score-prospects:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - run: python scripts/score_prospects.py
        env:
          CRM_API_KEY: ${{ secrets.CRM_API_KEY }}
          SMTP_PASSWORD: ${{ secrets.SMTP_PASSWORD }}
```

The wins are staggering once you list them out. The job runs on cloud servers, so laptop state is irrelevant. Full execution logs are captured automatically. Automatic retry on failure. Email or Slack alerts when something goes wrong. A complete audit trail for compliance. And critically - when someone leaves the company, the job keeps running. Someone else takes ownership of the repo and that’s it. I’ve written about non-interactive Claude Code patterns that use this exact approach for scheduled automation, and the same architecture applies here.
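On the script side, the counterpart of that workflow is reading secrets from the environment and failing loudly when one is missing. A minimal sketch - the helper is an illustrative pattern, not a library API:

```python
# Sketch: how a migrated script should consume the secrets the workflow
# injects. Variable names like CRM_API_KEY mirror the workflow; the
# helper itself is an assumed convention, not a standard.
import os

class MissingSecretError(RuntimeError):
    """Raised when an expected secret was not injected into the job."""

def get_required_env(name: str) -> str:
    """Fail fast with a clear error if a secret is absent from the environment."""
    value = os.environ.get(name)
    if not value:
        # In GitHub Actions this error lands in the job log, so a
        # misconfigured secret is diagnosed in seconds rather than days.
        raise MissingSecretError(f"required environment variable {name} is not set")
    return value
```

A script then opens with `crm_api_key = get_required_env("CRM_API_KEY")` and never touches a credentials file on disk.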
Pillar 3: Managed runtime. Deploy Python via Intune or whatever MDM you’re already using. Consistent version, IT-approved, no admin rights needed to install packages. Standardize on uv over pip - Anthropic explicitly recommends this for Claude Code environments. uv auto-manages virtual environments, resolves dependencies faster, and eliminates the “works on my machine” class of problems. Medium-term, set up a private PyPI mirror using JFrog Artifactory (~$50/month) or the free Sonatype Nexus, so IT pre-vets every package and employee machines install from the internal repository only. This kills the slopsquatting attack vector dead, because hallucinated packages never make it into your internal registry.
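Once the internal mirror exists, pointing every managed machine at it is one config file pushed via MDM. A sketch, assuming a hypothetical internal host name:

```ini
; pip.conf (Linux/macOS) or pip.ini (Windows) - deployed through your MDM.
; Host name is hypothetical; substitute your Artifactory or Nexus URL.
[global]
index-url = https://pypi.internal.example.com/simple/
```

uv can be pointed at the same index via its `--index-url` flag or the `UV_INDEX_URL` environment variable, so both toolchains resolve from the vetted source.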
Microsoft recently open-sourced their Agent Governance Toolkit for exactly this class of runtime security problems. Worth watching as the tooling matures.
Not all code needs the same review
One of the biggest mistakes in AI code governance is treating every script the same way. A Python script that generates a PDF report from internal data does not need the same review process as a script that writes records into Salesforce. A risk-based tiering system prevents the governance from becoming so painful that people route around it - which is the whole shadow AI problem restated. If the approved path is annoying, people go back to their laptops. The shadow AI prevention problem is fundamentally a supply problem, not a policy problem.
Critical tier - anything that writes to production systems like CRM, ERP, financial databases, or customer data. Full code review by the dev team. Security scan approval required. IT sign-off. Monitoring and alerting configured before it goes live. Examples: Salesforce integration scripts, CRM API writes, financial reporting that feeds downstream systems.
Standard tier - code that reads from systems or processes internal data. Code review plus automated scanning through CodeQL and Dependabot. Examples: reporting dashboards, data analysis scripts, internal metrics aggregation.
Low-risk tier - personal productivity scripts that don’t touch external systems at all. Standard naming conventions, standard repo structure, automated scanning only. Examples: Excel file processing, internal document generation, local file format conversions.
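The tier rules above are simple enough to encode as a function that lives in the governed repo itself, so the classification is reviewable code rather than tribal knowledge. A sketch - the two boolean inputs are an assumed declaration convention, not an established schema:

```python
# Sketch: the three-tier risk model as a reviewable function.
# Inputs describe a script's declared data access (an assumed convention).
def classify_tier(writes_external: bool, reads_internal_systems: bool) -> str:
    """Return the review tier for a script based on its declared data access."""
    if writes_external:
        return "critical"   # writes to CRM/ERP/financial systems: full review
    if reads_internal_systems:
        return "standard"   # reads internal data: review + automated scanning
    return "low-risk"       # local-only productivity scripts: scanning only
```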
Regardless of tier, every piece of AI-generated code must follow five non-negotiable rules. First, it gets committed to centralized GitHub - not kept on personal machines. Second, it passes automated security scanning before merge. Third, secrets are managed through GitHub Secrets or Key Vault - never hardcoded, never in config files, never in environment variables on laptops. Fourth, it includes a README explaining what it does and how to run it. Fifth, it gets reviewed by at least one other person before production use. Lawfare published a sharp analysis of the legal liability dimension here - when AI-generated code causes a breach, the question of who’s responsible gets messy fast. Documentation and review trails aren’t just good hygiene; they’re legal protection.
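Two of those five rules - the README requirement and the ban on hardcoded secrets - are cheap to backstop locally before code ever reaches GitHub. A hedged sketch: this is a crude regex pass, not a substitute for CodeQL or secret scanning, and the pattern will miss plenty of real secrets:

```python
# Sketch: a local pre-commit backstop for two of the five rules.
# The regex is deliberately simple and illustrative only.
import re
from pathlib import Path

SECRET_PATTERN = re.compile(
    r'(api[_-]?key|password|secret)\s*=\s*["\'][^"\']+["\']', re.IGNORECASE
)

def audit_repo(root: str) -> list[str]:
    """Return a list of rule violations found under the given repo root."""
    problems = []
    repo = Path(root)
    if not (repo / "README.md").exists():
        problems.append("missing README.md")
    for script in repo.rglob("*.py"):
        for lineno, line in enumerate(
            script.read_text(errors="ignore").splitlines(), 1
        ):
            if SECRET_PATTERN.search(line):
                problems.append(f"{script}:{lineno}: possible hardcoded secret")
    return problems
```

Wired into a pre-commit hook, this catches the most embarrassing mistakes before they ever hit the remote, where the real scanners take over.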
The operating model that makes this work is a lightweight Center of Excellence. IT and dev handle technical review and repo management. Business liaisons in each department own use case approval - they know whether a particular automation solves a real problem or is just busywork. Advisory input covers architecture guidance and best practices. This maps to the same LLMOps discipline patterns you’d use for any AI system in production: monitoring, versioning, human review gates.
The OWASP Top 10 for LLM Applications provides a solid checklist for what your security scanning should catch. And if you’re in regulated industries, the EU AI Act Article 26 requirements around audit trails and human oversight apply to exactly this kind of employee-generated AI code.
What it costs and how fast you can move
Honestly, the timeline surprises people. This isn’t a twelve-month transformation project. It’s six months, phased, with value from month one.
Months 1-2 (Foundation). Set up the GitHub organization. Create the first repository by migrating that salesperson’s scripts - the same ones from the opening scenario. Deploy Python via MDM across managed devices. Enable GitHub Advanced Security on the first repo. Move API keys from text files and .env files into GitHub Secrets. Migrate the first scheduled tasks from laptop-based execution to GitHub Actions. This phase alone eliminates the single biggest risk: the bus factor of one person’s laptop being the only place critical automation lives.
Months 3-4 (Scale). Set up the private PyPI repository so packages come from a vetted internal source. Configure Azure Functions for the two or three jobs that are genuinely mission-critical and need better monitoring than GitHub Actions provides. Stand up an Application Insights dashboard so IT can see execution history, failure rates, and performance trends across all automated jobs. Create the second and third repos as new use cases emerge from other departments. Establish CODEOWNERS files with dual business-technical review requirements.
Months 5-6 (Mature). Run the first security audit across all user-generated code repositories. Refine the risk-based tier system now that you have real data on what people are actually building. Train power users on Git basics - commit, push, pull request. It’s not hard. It just hasn’t been taught. Establish a quarterly governance review cadence. Document SLAs for IT code support so the support team knows what response time each tier gets. When consulting with companies at this stage, the most common feedback is that it’s less work than they expected, because the automated scanning catches 80% of issues before a human ever looks.
The budget breakdown:
| Item | Annual cost |
|---|---|
| GitHub Team plan (~100 users at $4/user/month) | ~$4,800 |
| GitHub Advanced Security (~10 repos at $45/repo/month) | ~$5,400 |
| Azure Functions | ~$600 |
| Application Insights | ~$1,200 |
| Private PyPI mirror (optional) | ~$600 |
| Total | ~$12,600/year |
Compare that against the cost of not doing it. An estimated 0.5 to 1 FTE of reactive IT support - people troubleshooting scripts they’ve never seen, reverse-engineering automations after someone leaves, manually checking code for hardcoded API keys - runs $30,000 to $50,000 a year. And that’s before you factor in the cost of a single security incident from unscanned code containing a known vulnerability. The enterprise cost comparison for AI coding tools is a separate analysis, but the governance layer described here applies regardless of which tool your people use.
The ROI math is a no-brainer. Spend $12,600 to save $30,000-50,000 in reactive support costs, prevent unknown security exposure, and ensure business continuity when employees travel or leave. That doesn’t even count the compliance value - having an audit trail that shows what code exists, who reviewed it, and when it was scanned is the kind of evidence regulators want to see.
The first employee running AI code on their laptop is user number one. Everything in this blueprint starts with that single use case - one person, one repo, one scheduled job migrated off a laptop - and scales to the whole company. Every month you wait, the debt compounds. Not linearly. Exponentially. Because every new hire who discovers Claude Code or Cursor or Copilot starts generating code on day one. And if there’s nowhere approved for that code to go, it stays on their laptop, unscanned, unmonitored, invisible. Until it isn’t.
About the Author
Amit Kothari is an experienced consultant, advisor, coach, and educator specializing in AI and operations for executives and their companies. With 25+ years of experience and as the founder of Tallyfy (raised $3.6m), he helps mid-size companies identify, plan, and implement practical AI solutions that actually work. Originally British and now based in St. Louis, MO, Amit combines deep technical expertise with real-world business understanding.
Disclaimer: The content in this article represents personal opinions based on extensive research and practical experience. While every effort has been made to ensure accuracy through data analysis and source verification, this should not be considered professional advice. Always consult with qualified professionals for decisions specific to your situation.