Amit Kothari CEO of Tallyfy, AI advisor at Blue Sheen

Your employees are generating AI code nobody governs - here is what to build

In brief

45% of AI-generated code contains security flaws. Most of it lives on employee laptops with zero scanning, logging, or continuity planning. Here is a three-pillar blueprint that costs a fraction of the reactive support it replaces.


Apr 7, 2026 · Updated Jun 12, 2026 · AI

A salesperson at a mid-size company runs three Python scripts every morning before his first coffee. They pull prospects from a CRM, score them against engagement data, and push the day’s outbound list into a spreadsheet that feeds his email sequencing tool. Claude Code built every line. The scripts work brilliantly. IT has no idea any of it exists.

Now think about what happens when his laptop is closed. When it’s asleep on a plane. When he takes a new job next quarter and walks out with his credentials, his scripts, and the institutional knowledge of how those campaigns actually ran. Everything stops. Nobody gets an alert. The outbound pipeline just… goes quiet. And some poor operations manager spends two weeks figuring out why lead volume dropped off a cliff before anyone even thinks to check if a script was involved.

That’s one person.

In advisory work with mid-size companies, the pattern is always the same. It’s never just one script. With 40 salespeople potentially doing similar things, plus marketing building their own automations, plus finance cobbling together reporting tools, plus HR generating onboarding checklists - what starts as one clever hack on one laptop becomes an invisible code footprint spread across the entire company. Georgetown CSET published a report on precisely this class of risk, and the conclusion is blunt: organizations don’t know what they’re running, and they can’t secure what they can’t see.

What three problems hit fast without governance?

This is not hypothetical. Most companies don’t notice the problem until something breaks, and by then the debt has been compounding for months. Here’s the failure cascade that plays out over and over.

Failure cascade from employee laptop offline through stopped jobs to IT receiving blind support tickets

No visibility into what exists. IT can’t protect what it doesn’t know about. Nobody has a registry of which scripts exist, what data they touch, what libraries they depend on. If a Python package gets a zero-day vulnerability tomorrow, there’s no way to find which employee laptops are running it. And the vulnerability surface is alarming. Veracode’s research found that 45% of AI-generated code contains security flaws. Apiiro found that as AI coding assistants accelerated output, privilege escalation paths jumped 322% and architectural design flaws spiked 153%. Even worse, about 20% of AI-suggested package names point to libraries that don’t exist. Attackers have caught on. They register those hallucinated package names and plant malicious code inside them. It’s called slopsquatting, and it’s a nightmare to defend against when you don’t even know which packages your people are installing.

Mid-2026 update: the models writing this code keep getting safer at the source, but that does not let you off the hook. When Anthropic shipped Claude Opus 4.8 in May 2026, it said the model is around four times less likely than its predecessor to let flaws in code it has written pass unremarked. Good. It still does not see the other 39 scripts on the other 39 laptops. Model quality reduces the per-script risk; it does nothing about the visibility gap, which is an org problem, not a model problem.

No continuity when people leave or laptops sleep. Scripts on individual machines stop running when the machine stops running. Obvious, right? But companies treat laptop-resident automation like it’s infrastructure. It isn’t. A Beyond Identity survey puts hard numbers on the departure risk: 91% of employees still had access to company files after they were offboarded, and more than a quarter admitted taking financial data on the way out. That salesperson’s scripts contain API keys, CRM credentials, and business logic that walks out the door with him. And even when scripts keep running, the AI models underneath them change. IBM calls it agentic drift - an agent that runs perfectly today can quietly degrade as the models beneath it shift. The script that worked perfectly in January is silently producing worse results by July. Nobody notices because nobody’s watching.

No support path when things break. Here’s a scenario that’s basically a weekly occurrence at companies with unmanaged AI code. A user submits a support ticket: “my script doesn’t work.” IT opens the ticket, stares at it, and has absolutely nothing to go on. They never had access to the script, don’t know what it does, can’t see error logs, don’t know what API it calls or what data it processes. How exactly should they troubleshoot that? IBM reports that employee use of generative AI tools jumped from 74% to 96% in a single year, the vast majority of it ungoverned. So IT is now responsible for supporting a codebase they’ve never seen, written in languages they may not know, running on machines they don’t fully control. That’s not a support model. That’s a proper mess.

The three pillars you actually need

Enough about the problem. The fix is not technically hard, but most teams under-build it anyway. Here’s the architecture that works, based on real advisory engagements. Everything here is phased and built for companies that don’t have a dedicated platform engineering team. The whole thing runs for a fraction of what a single support hire costs.

Three governance pillars showing code repository with security scanning, scheduled jobs, and managed Python runtime

Pillar 1: Centralized code repository. Set up a GitHub organization where ALL user-generated code lives. Not on laptops. Not in personal Dropbox folders. Not emailed around as zip files. Organize repositories by use case, not by person - marketing-prospecting-agent/, finance-automation/, ops-reporting/. This way, when someone leaves, the code stays. Each repo gets a standard structure: scripts/, config/, docs/, tests/, .github/, and a CLAUDE.md file.

Why CLAUDE.md specifically? Because it functions as an organizational coding standard that the AI reads and follows automatically. In building Tallyfy, CLAUDE.md became the single most effective way to enforce rules across dozens of repositories. It tells Claude Code which tools and packages to use, what data can and cannot be accessed, and which code patterns are required. Every repo gets one. No exceptions.
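A standard structure only helps if something checks for it. Here is a minimal sketch of such a check, assuming the layout listed above. The function name `missing_entries` and the two constants are my own illustration, not a GitHub feature, and you would wire it into whatever CI step suits you.

```python
from pathlib import Path

# Layout taken from the standard structure described above.
# Adjust these tuples if your own convention differs.
REQUIRED_DIRS = ("scripts", "config", "docs", "tests", ".github")
REQUIRED_FILES = ("CLAUDE.md",)


def missing_entries(repo_root: Path) -> list[str]:
    """Return the required directories and files that are absent from repo_root."""
    # Directories are reported with a trailing slash so they read differently from files.
    missing = [f"{name}/" for name in REQUIRED_DIRS if not (repo_root / name).is_dir()]
    missing += [name for name in REQUIRED_FILES if not (repo_root / name).is_file()]
    return missing
```

Calling `missing_entries(Path("."))` from the repo root returns an empty list when everything is in place, so a CI step can fail the build whenever the list is non-empty.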

GitHub’s Code Security and Secret Protection add-ons (the two products the old GitHub Advanced Security bundle split into) give you three automated scanning tools that run on every commit. CodeQL performs data-flow analysis - not text pattern matching - to find SQL injection, hardcoded credentials, and weak cryptography. Dependabot monitors every package dependency and auto-creates pull requests when vulnerable versions are detected. Secret scanning catches API keys using over 200 token patterns before they hit the repository. Use CODEOWNERS files with branch protection so every merge requires code-owner review. One caveat: when several owners are listed for a path, approval from any one of them satisfies GitHub, so getting sign-off from both a business stakeholder and a technical reviewer takes an extra rule, such as requiring two approvals. For secrets management, GitHub Secrets handles it immediately; Azure Key Vault is the medium-term move.

Pillar 2: Centralized scheduled jobs. This is the pillar that eliminates the “laptop sleeping on a plane” problem. Move every scheduled script off employee machines and into cloud-based execution.

Here’s how the options compare:

Platform	Cost	Best for	Limitations and notes
Windows Task Scheduler	Free (current state)	Nothing - avoid	Dies with laptop, zero logging, no alerting
GitHub Actions	Low	First migration target	Team plan includes free monthly minutes, covers most use cases
Azure Functions	Minimal	Mission-critical jobs	Slightly more setup, better monitoring
Claude Cloud Tasks	Included with plan	Simpler recurring tasks	Minimum 1-hour interval

A typical GitHub Actions workflow for a daily Python job looks like this:

name: Daily Prospect Scoring
on:
  schedule:
    - cron: '0 7 * * 1-5' # 07:00 UTC, Monday to Friday (cron schedules run in UTC)
  workflow_dispatch: # Manual trigger button

jobs:
  score-prospects:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - run: python scripts/score_prospects.py
        env:
          CRM_API_KEY: ${{ secrets.CRM_API_KEY }}
          SMTP_PASSWORD: ${{ secrets.SMTP_PASSWORD }}

The wins are concrete. The job runs on cloud servers, so laptop state is irrelevant. Execution logs are captured automatically. GitHub can email a notification when a run fails, and retries or Slack alerts can be added with a few extra workflow steps. The run history gives you an audit trail for compliance. And critically, when someone leaves the company, the job keeps running. Someone else takes ownership of the repo and that’s it. I’ve written about non-interactive Claude Code patterns that use this exact approach for scheduled automation, and the same architecture applies here.
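On the Python side, the script only needs to read the secrets that the workflow injects through `env:`. Below is a minimal sketch of how `scripts/score_prospects.py` might do that. The helper names (`require_env`, `load_settings`, `MissingSecretError`) are hypothetical. The idea is to fail loudly at startup, so a missing secret shows up in the run log rather than as a silent half-run.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Only the variable name is reported; the value itself never goes into logs.
        raise MissingSecretError(f"required secret {name} is not set")
    return value


def load_settings() -> dict[str, str]:
    """Collect the two secrets named in the workflow's env block."""
    return {
        "crm_api_key": require_env("CRM_API_KEY"),
        "smtp_password": require_env("SMTP_PASSWORD"),
    }
```

Because nothing is read from a local `.env` file, the same script behaves identically for whoever inherits the repo.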

Pillar 3: Managed runtime. Deploy Python via Intune or whatever MDM you’re already using. Consistent version, IT-approved, no admin rights needed to install packages. Standardize on uv over pip - it is the package manager the Claude Code docs feature. uv auto-manages virtual environments, resolves dependencies faster, and eliminates the “works on my machine” class of problems. Medium-term, set up a private PyPI mirror using JFrog Artifactory or the free Sonatype Nexus, so IT pre-vets every package and employee machines install from the internal repository only. This kills the slopsquatting attack vector dead, because hallucinated packages never make it into your internal registry.
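Until the private mirror exists, a rough stand-in is to compare each repo’s requirements file against a list of packages IT has vetted. The sketch below assumes a plain `requirements.txt` with one simple requirement per line. The allowlist contents and the function names are hypothetical, and it does not parse URL or VCS requirements.

```python
import re

# Hypothetical allowlist; in practice this would mirror whatever IT has vetted.
APPROVED_PACKAGES = frozenset({"requests", "pandas", "openpyxl"})

# Captures the leading project name of a line such as "pandas>=2.0".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 describes: lowercase, runs of -_. become -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unapproved(requirements_text: str, approved: frozenset[str] = APPROVED_PACKAGES) -> list[str]:
    """Return normalized names of requirements that are not on the approved list."""
    allowed = {normalize(p) for p in approved}
    flagged = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r / -e
            continue
        match = _NAME_RE.match(line)
        if match and normalize(match.group(1)) not in allowed:
            flagged.append(normalize(match.group(1)))
    return flagged
```

A hallucinated package name simply never appears on the allowlist, so it gets flagged before anyone installs it.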

Microsoft recently open-sourced their Agent Governance Toolkit for exactly this class of runtime security problems. Worth watching as the tooling matures.

Not all code needs the same review

This might sound counterintuitive, but the goal is not maximum review. One of the biggest mistakes in AI code governance is treating every script the same way. A Python script that generates a PDF report from internal data does not need the same review process as a script that writes records into Salesforce. A risk-based tiering system prevents the governance from becoming so painful that people route around it, which is the whole shadow AI problem restated. If the approved path is annoying, people go back to their laptops. The shadow AI prevention problem is fundamentally a supply problem, not a policy problem.

Risk-based code review tiers from critical production access through standard to low-risk personal productivity

Critical tier - anything that writes to production systems like CRM, ERP, financial databases, or customer data. Full code review by the dev team. Security scan approval required. IT sign-off. Monitoring and alerting configured before it goes live. Examples: Salesforce integration scripts, CRM API writes, financial reporting that feeds downstream systems.

Standard tier - code that reads from systems or processes internal data. Code review plus automated scanning through CodeQL and Dependabot. Examples: reporting dashboards, data analysis scripts, internal metrics aggregation.

Low-risk tier - personal productivity scripts that don’t touch external systems at all. Standard naming conventions, standard repo structure, automated scanning only. Examples: Excel file processing, internal document generation, local file format conversions.
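The three tiers reduce to two questions about a script, which makes them easy to encode. This is a sketch under my own assumptions: the `ScriptProfile` fields are a hypothetical self-declaration (say, from a small manifest in the repo), and the review steps are copied from the tier descriptions above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    CRITICAL = "critical"
    STANDARD = "standard"
    LOW_RISK = "low-risk"


@dataclass(frozen=True)
class ScriptProfile:
    """Hypothetical self-declared facts about a script."""
    writes_to_production: bool      # CRM, ERP, financial databases, customer data
    touches_business_systems: bool  # reads from systems or processes internal data


def classify(profile: ScriptProfile) -> Tier:
    """Map a script profile to a review tier; the riskiest matching rule wins."""
    if profile.writes_to_production:
        return Tier.CRITICAL
    if profile.touches_business_systems:
        return Tier.STANDARD
    return Tier.LOW_RISK


# Review steps per tier, taken from the tier descriptions.
REVIEW_STEPS = {
    Tier.CRITICAL: [
        "dev team code review",
        "security scan approval",
        "IT sign-off",
        "monitoring and alerting configured",
    ],
    Tier.STANDARD: ["code review", "automated scanning"],
    Tier.LOW_RISK: ["automated scanning"],
}
```

A script that only converts local Excel files declares both flags as false and lands in `Tier.LOW_RISK`, so it never waits on a human reviewer.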

Regardless of tier, every piece of AI-generated code must follow five non-negotiable rules. First, it gets committed to centralized GitHub - not kept on personal machines. Second, it passes automated security scanning before merge. Third, secrets are managed through GitHub Secrets or Key Vault - never hardcoded, never in config files, never in environment variables on laptops. Fourth, it includes a README explaining what it does and how to run it. Fifth, it gets reviewed by at least one other person before production use. Lawfare published a sharp analysis of the legal liability dimension here - when AI-generated code causes a breach, the question of who’s responsible gets messy fast. Documentation and review trails aren’t just good hygiene; they’re legal protection.

The operating model that makes this work is a lightweight Center of Excellence. IT and dev handle technical review and repo management. Business liaisons in each department own use case approval - they know whether a particular automation actually solves a real problem or is bikeshedding. Advisory input covers architecture guidance and best practices. This maps to the same LLMOps discipline patterns you’d use for any AI system in production: monitoring, versioning, human review gates.

The OWASP Top 10 for LLM Applications provides a solid checklist for what your security scanning should catch. And if you’re in regulated industries, check whether the EU AI Act’s Article 26 obligations for deployers of high-risk AI systems, including log retention and human oversight, reach this kind of employee-generated AI code.

What it costs and how fast you can move

The more I look at it, the more I think the cost question is a red herring. The timeline surprises people. This isn’t a twelve-month project. It’s six months, phased, with value from month one.

Months 1-2 (Foundation). Set up the GitHub organization. Create the first repository by migrating that salesperson’s scripts - the same ones from the opening scenario. Deploy Python via MDM across managed devices. Enable the GitHub Code Security and Secret Protection add-ons on the first repo. Move API keys from text files and .env files into GitHub Secrets. Migrate the first scheduled tasks from laptop-based execution to GitHub Actions. This phase alone eliminates the single biggest risk: the bus factor of one person’s laptop being the only place critical automation lives.

Months 3-4 (Scale). Set up the private PyPI repository so packages come from a vetted internal source. Configure Azure Functions for the two or three jobs that are mission-critical and need better monitoring than GitHub Actions provides. Stand up an Application Insights dashboard so IT can see execution history, failure rates, and performance trends across all automated jobs. Create the second and third repos as new use cases emerge from other departments. Establish CODEOWNERS files with dual business-technical review requirements.

Months 5-6 (Mature). Run the first security audit across all user-generated code repositories. Refine the risk-based tier system now that you have real data on what people are actually building. Train power users on Git basics - commit, push, pull request. It’s not hard. It just hasn’t been taught. Establish a quarterly governance review cadence. Document SLAs for IT code support so the support team knows what response time each tier gets. When consulting with companies at this stage, the most common feedback is that it’s less work than they expected, because the automated scanning catches 80% of issues before a human ever looks.

The budget breakdown:

Item	Relative cost
GitHub Team plan (per user)	Modest base subscription
GitHub Code Security + Secret Protection (per active committer)	The bulk of the spend
Azure Functions	Negligible
Application Insights	Small
Private PyPI mirror (optional)	Small
Total	A fraction of one support FTE

Compare that against the cost of not doing it. An estimated 0.5 to 1 FTE of reactive IT support - people troubleshooting scripts they’ve never seen, reverse-engineering automations after someone leaves, manually checking code for hardcoded API keys - costs several times more than the governance stack. And that’s before you factor in the cost of a single security incident from unscanned code containing a known vulnerability. The enterprise cost comparison for AI coding tools is a separate analysis, but the governance layer described here applies regardless of which tool your people use.

The ROI math is a no-brainer. Spend a fraction of one support hire to save several times that in reactive support costs, prevent unknown security exposure, and ensure business continuity when employees travel or leave. That doesn’t even count the compliance value: an audit trail that shows what code exists, who reviewed it, and when it was scanned is the kind of evidence regulators want to see.

I called this a no-brainer just above. That oversimplifies it. The math is plain; the politics are not. Getting IT, dev, and the business stakeholder to all sign off on a new spend, even a small one, and then asking salespeople and marketers to actually move their scripts off their own machines, is the part that takes leadership air cover. The technical plan is the easy half. Buy-in is the work.

I built Tallyfy because I kept seeing this exact failure mode in operations work: knowledge living inside one person’s head, with no version control, no successor, no audit trail. AI-generated code on a salesperson’s laptop is that pattern wearing a hoodie.

The first employee running AI code on their laptop is user number one. Everything in this blueprint starts with that single use case (one person, one repo, one scheduled job migrated off a laptop) and scales to the whole company. Every month you wait, the debt compounds. Not linearly. Exponentially. Because every new hire who discovers Claude Code or Cursor or Copilot starts generating code on day one. And if there’s nowhere approved for that code to go, it stays on their laptop, unscanned, unmonitored, invisible. Until it isn’t.

A bus factor of one, dressed up as productivity. That’s what unmanaged AI code is. Fix it now, while it’s still one person.

Tags: ai-governance, code-security, shadow-ai, enterprise-ai, devops, claude-code

About the Author

Amit Kothari is an experienced consultant, advisor, coach, and educator specializing in AI and operations for executives and their companies. With 25+ years of experience, he is the Co-Founder & CEO of Tallyfy® (raised $3.6m, the Workflow Made Easy® platform) and Partner at Blue Sheen, an AI advisory firm for mid-size companies. He helps companies identify, plan, and implement practical AI solutions that actually work. Originally British and now based in St. Louis, MO, Amit combines deep technical expertise with real-world business understanding.

Disclaimer: The content in this article represents personal opinions based on extensive research and practical experience. While every effort has been made to ensure accuracy through data analysis and source verification, this should not be considered professional advice. Always consult with qualified professionals for decisions specific to your situation.

