
CEO of Tallyfy · AI advisor at Blue Sheen for mid-size companies

Your employees are generating AI code nobody governs - here is what to build

45% of AI-generated code contains security flaws. Most of it lives on employee laptops with zero scanning, logging, or continuity planning. Here is a three-pillar blueprint that costs a fraction of the reactive support it replaces.

A salesperson at a mid-size company runs three Python scripts every morning before his first coffee. They pull prospects from a CRM, score them against engagement data, and push the day’s outbound list into a spreadsheet that feeds his email sequencing tool. Claude Code built every line. The scripts work brilliantly. IT has no idea any of it exists.

Now think about what happens when his laptop is closed. When it’s asleep on a plane. When he takes a new job next quarter and walks out with his credentials, his scripts, and the institutional knowledge of how those campaigns actually ran. Everything stops. Nobody gets an alert. The outbound pipeline just… goes quiet. And some poor operations manager spends two weeks figuring out why lead volume dropped off a cliff before anyone even thinks to check if a script was involved.

That’s one person.

In advisory work with mid-size companies, the pattern is always the same. It’s never just one script. With 40 salespeople potentially doing similar things, plus marketing building their own automations, plus finance cobbling together reporting tools, plus HR generating onboarding checklists - what starts as one clever hack on one laptop becomes an invisible code footprint spread across the entire company. Georgetown CSET published a report on precisely this class of risk, and the conclusion is blunt: organizations don’t know what they’re running, and they can’t secure what they can’t see.

What three problems hit fast without governance?

This is not a hypothetical. Most companies don’t notice the problem until something breaks, and by then the debt has been compounding for months. Here’s the failure cascade that plays out over and over.

[Figure: Failure cascade from employee laptop offline through stopped jobs to IT receiving blind support tickets]

No visibility into what exists. IT can’t protect what it doesn’t know about. Nobody has a registry of which scripts exist, what data they touch, what libraries they depend on. If a Python package gets a zero-day vulnerability tomorrow, there’s no way to find which employee laptops are running it. And the vulnerability surface is alarming. Veracode’s research found that 45% of AI-generated code contains security flaws. Apiiro found that as AI coding assistants accelerated output, privilege escalation paths jumped 322% and architectural design flaws spiked 153%. Even worse, about 20% of AI-suggested package names point to libraries that don’t exist. Attackers have caught on. They register those hallucinated package names and plant malicious code inside them. It’s called slopsquatting, and it’s a nightmare to defend against when you don’t even know which packages your people are installing.

No continuity when people leave or laptops sleep. Scripts on individual machines stop running when the machine stops running. Obvious, right? But companies treat laptop-resident automation like it’s infrastructure. It isn’t. A Beyond Identity survey puts hard numbers on the departure risk: 91% of employees still had access to company files after they were offboarded, and more than a quarter admitted taking financial data on the way out. That salesperson’s scripts contain API keys, CRM credentials, and business logic that walks out the door with him. And even when scripts keep running, the AI models underneath them change. IBM calls it agentic drift - an agent that runs perfectly today can quietly degrade as the models beneath it shift. The script that worked perfectly in January is silently producing worse results by July. Nobody notices because nobody’s watching.

No support path when things break. Here’s a scenario that’s basically a weekly occurrence at companies with unmanaged AI code. A user submits a support ticket: “my script doesn’t work.” IT opens the ticket, stares at it, and has absolutely nothing to go on. They never had access to the script, don’t know what it does, can’t see error logs, don’t know what API it calls or what data it processes. How exactly should they troubleshoot that? IBM reports that employee use of generative AI tools jumped from 74% to 96% in a single year, the vast majority of it ungoverned. So IT is now responsible for supporting a codebase they’ve never seen, written in languages they may not know, running on machines they don’t fully control. That’s not a support model. That’s a proper mess.

The three pillars you actually need

The fix is not technically hard, but most teams under-build it anyway. Here’s the architecture that works, based on real advisory engagements. Everything here is phased and built for companies that don’t have a dedicated platform engineering team. The whole thing runs for a fraction of what a single support hire costs.

[Figure: Three governance pillars showing code repository with security scanning, scheduled jobs, and managed Python runtime]

Pillar 1: Centralized code repository. Set up a GitHub organization where ALL user-generated code lives. Not on laptops. Not in personal Dropbox folders. Not emailed around as zip files. Organize repositories by use case, not by person - marketing-prospecting-agent/, finance-automation/, ops-reporting/. This way, when someone leaves, the code stays. Each repo gets a standard structure: scripts/, config/, docs/, tests/, .github/, and a CLAUDE.md file.

Why CLAUDE.md specifically? Because it functions as an organizational coding standard that the AI reads and follows automatically. In building Tallyfy, CLAUDE.md became the single most effective way to enforce rules across dozens of repositories. It tells Claude Code which tools and packages to use, what data can and cannot be accessed, and which code patterns are required. Every repo gets one. No exceptions.
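The standard layout is easy to check mechanically. The sketch below is one way to do it, assuming the repo is checked out locally: it reports which of the required directories and files are missing. The function and constant names are hypothetical.

```python
from pathlib import Path

# Layout named in the article: five directories plus a CLAUDE.md file.
REQUIRED_DIRS = ("scripts", "config", "docs", "tests", ".github")
REQUIRED_FILES = ("CLAUDE.md",)


def missing_entries(repo_root: Path) -> list[str]:
    """Return the required directories and files that are absent from repo_root."""
    missing = [f"{name}/" for name in REQUIRED_DIRS if not (repo_root / name).is_dir()]
    missing += [name for name in REQUIRED_FILES if not (repo_root / name).is_file()]
    return missing
```

A check like this could run as a step in each repo's workflow and fail when the returned list is not empty, which keeps the "no exceptions" rule from depending on anyone's memory.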

GitHub’s Code Security and Secret Protection add-ons (the two products the old GitHub Advanced Security bundle split into) give you three automated scanning tools that run on every commit. CodeQL performs data-flow analysis - not text pattern matching - to find SQL injection, hardcoded credentials, and weak cryptography. Dependabot monitors every package dependency and auto-creates pull requests when vulnerable versions are detected. Secret scanning catches API keys using over 200 token patterns before they hit the repository. Use CODEOWNERS files so every merge requires sign-off from both a business stakeholder and a technical reviewer. For secrets management, GitHub Secrets handles it immediately; Azure Key Vault is the medium-term move.

Pillar 2: Centralized scheduled jobs. This is the pillar that eliminates the “laptop sleeping on a plane” problem. Move every scheduled script off employee machines and into cloud-based execution.

Here’s how the options compare:

Platform | Cost | Best for | Limitation
Windows Task Scheduler | Free (current state) | Nothing - avoid | Dies with laptop, zero logging, no alerting
GitHub Actions | Low | First migration target | Team plan includes free monthly minutes, covers most use cases
Azure Functions | Minimal | Mission-critical jobs | Slightly more setup, better monitoring
Claude Cloud Tasks | Included with plan | Simpler recurring tasks | Minimum 1-hour interval

A typical GitHub Actions workflow for a daily Python job looks like this:

name: Daily Prospect Scoring
on:
  schedule:
    - cron: '0 7 * * 1-5' # 07:00 UTC, Monday to Friday (schedule times are in UTC)
  workflow_dispatch: # Manual trigger button

jobs:
  score-prospects:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - run: python scripts/score_prospects.py
        env:
          CRM_API_KEY: ${{ secrets.CRM_API_KEY }}
          SMTP_PASSWORD: ${{ secrets.SMTP_PASSWORD }}

The wins are concrete. The job runs on cloud servers, so laptop state is irrelevant. Full execution logs are captured automatically. GitHub can email a notification when a run fails, and Slack alerts or retry-on-failure can be added with an extra workflow step or in the script itself. There is a complete audit trail for compliance. And critically, when someone leaves the company, the job keeps running: someone else takes ownership of the repo and that’s it. I’ve written about non-interactive Claude Code patterns that use this exact approach for scheduled automation, and the same architecture applies here.
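On the script side, the workflow above hands CRM_API_KEY to the job as an environment variable. The sketch below is a minimal illustration of how scripts/score_prospects.py could consume it, not the actual script: it fails fast when the secret is missing, retries a flaky call with exponential backoff, and returns a non-zero status so the run is marked as failed. fetch_and_score is a hypothetical placeholder for the real CRM logic.

```python
import os
import sys
import time


def require_env(name: str) -> str:
    """Return the named environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value


def with_retries(task, attempts: int = 3, base_delay: float = 2.0, sleep=time.sleep):
    """Call task() up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            print(f"attempt {attempt} failed ({exc}); retrying", file=sys.stderr)
            sleep(base_delay * 2 ** (attempt - 1))


def fetch_and_score(api_key: str) -> int:
    """Hypothetical placeholder: the real script would call the CRM API here."""
    return 0


def main() -> int:
    """Run the job and return a process exit status (0 = success, 1 = failure)."""
    try:
        api_key = require_env("CRM_API_KEY")
        with_retries(lambda: fetch_and_score(api_key))
    except Exception as exc:
        print(f"job failed: {exc}", file=sys.stderr)
        return 1
    return 0

# In the real script, finish with: sys.exit(main())
```

The exit status is what connects the script to the platform: a non-zero exit code fails the workflow step, which marks the run as failed and makes it visible in the run history.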

Pillar 3: Managed runtime. Deploy Python via Intune or whatever MDM you’re already using. Consistent version, IT-approved, no admin rights needed to install packages. Standardize on uv over pip - it is the package manager the Claude Code docs feature. uv auto-manages virtual environments, resolves dependencies faster, and eliminates the “works on my machine” class of problems. Medium-term, set up a private PyPI mirror using JFrog Artifactory or the free Sonatype Nexus, so IT pre-vets every package and employee machines install from the internal repository only. This kills the slopsquatting attack vector dead, because hallucinated packages never make it into your internal registry.

Microsoft recently open-sourced their Agent Governance Toolkit for exactly this class of runtime security problems. Worth watching as the tooling matures.

Not all code needs the same review

This might sound counterintuitive, but the goal is not maximum review. One of the biggest mistakes in AI code governance is treating every script the same way. A Python script that generates a PDF report from internal data does not need the same review process as a script that writes records into Salesforce. A risk-based tiering system prevents the governance from becoming so painful that people route around it, which is the whole shadow AI problem restated. If the approved path is annoying, people go back to their laptops. The shadow AI prevention problem is fundamentally a supply problem, not a policy problem.

[Figure: Risk-based code review tiers from critical production access through standard to low-risk personal productivity]

Critical tier - anything that writes to production systems like CRM, ERP, financial databases, or customer data. Full code review by the dev team. Security scan approval required. IT sign-off. Monitoring and alerting configured before it goes live. Examples: Salesforce integration scripts, CRM API writes, financial reporting that feeds downstream systems.

Standard tier - code that reads from systems or processes internal data. Code review plus automated scanning through CodeQL and Dependabot. Examples: reporting dashboards, data analysis scripts, internal metrics aggregation.

Low-risk tier - personal productivity scripts that don’t touch external systems at all. Standard naming conventions, standard repo structure, automated scanning only. Examples: Excel file processing, internal document generation, local file format conversions.
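The three tiers reduce to a short decision rule. The sketch below is one way to encode it, assuming each script carries a small self-declared profile. The ScriptProfile fields and the classify function are hypothetical names, and the check lists paraphrase the tier descriptions above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    CRITICAL = "critical"
    STANDARD = "standard"
    LOW_RISK = "low-risk"


@dataclass(frozen=True)
class ScriptProfile:
    """Hypothetical self-declared metadata for one script."""
    name: str
    writes_to_production: bool = False
    reads_systems_or_internal_data: bool = False


def classify(profile: ScriptProfile) -> Tier:
    """Pick the strictest tier that applies; writing to production always wins."""
    if profile.writes_to_production:
        return Tier.CRITICAL
    if profile.reads_systems_or_internal_data:
        return Tier.STANDARD
    return Tier.LOW_RISK


# Review steps per tier, taken from the tier descriptions.
REQUIRED_CHECKS = {
    Tier.CRITICAL: [
        "full code review by the dev team",
        "security scan approval",
        "IT sign-off",
        "monitoring and alerting configured",
    ],
    Tier.STANDARD: ["code review", "automated scanning"],
    Tier.LOW_RISK: ["automated scanning"],
}
```

Checking the strictest condition first means a script that both reads and writes lands in the critical tier, which matches the intent that write access to production drives the review level.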

Regardless of tier, every piece of AI-generated code must follow five non-negotiable rules. First, it gets committed to centralized GitHub - not kept on personal machines. Second, it passes automated security scanning before merge. Third, secrets are managed through GitHub Secrets or Key Vault - never hardcoded, never in config files, never in environment variables on laptops. Fourth, it includes a README explaining what it does and how to run it. Fifth, it gets reviewed by at least one other person before production use. Lawfare published a sharp analysis of the legal liability dimension here - when AI-generated code causes a breach, the question of who’s responsible gets messy fast. Documentation and review trails aren’t just good hygiene; they’re legal protection.

The operating model that makes this work is a lightweight Center of Excellence. IT and dev handle technical review and repo management. Business liaisons in each department own use case approval - they know whether a particular automation actually solves a real problem or is bikeshedding. Advisory input covers architecture guidance and best practices. This maps to the same LLMOps discipline patterns you’d use for any AI system in production: monitoring, versioning, human review gates.

The OWASP Top 10 for LLM Applications provides a solid checklist for what your security scanning should catch. And if you’re in regulated industries, check this kind of employee-generated AI code against the EU AI Act’s Article 26, which sets deployer obligations around human oversight and record-keeping for high-risk AI systems.

What it costs and how fast you can move

The more I look at it, the more I think the cost question is a red herring. It’s the timeline that surprises people. This isn’t a twelve-month major project. It’s six months, phased, with value from month one.

Months 1-2 (Foundation). Set up the GitHub organization. Create the first repository by migrating that salesperson’s scripts - the same ones from the opening scenario. Deploy Python via MDM across managed devices. Enable GitHub Code Security and Secret Protection (the successors to the GitHub Advanced Security bundle) on the first repo. Move API keys from text files and .env files into GitHub Secrets. Migrate the first scheduled tasks from laptop-based execution to GitHub Actions. This phase alone eliminates the single biggest risk: the bus factor of one person’s laptop being the only place critical automation lives.

Months 3-4 (Scale). Set up the private PyPI repository so packages come from a vetted internal source. Configure Azure Functions for the two or three jobs that are mission-critical and need better monitoring than GitHub Actions provides. Stand up an Application Insights dashboard so IT can see execution history, failure rates, and performance trends across all automated jobs. Create the second and third repos as new use cases emerge from other departments. Establish CODEOWNERS files with dual business-technical review requirements.

Months 5-6 (Mature). Run the first security audit across all user-generated code repositories. Refine the risk-based tier system now that you have real data on what people are actually building. Train power users on Git basics - commit, push, pull request. It’s not hard. It just hasn’t been taught. Establish a quarterly governance review cadence. Document SLAs for IT code support so the support team knows what response time each tier gets. When consulting with companies at this stage, the most common feedback is that it’s less work than they expected, because the automated scanning catches 80% of issues before a human ever looks.
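One Foundation-phase step, moving API keys out of .env files, goes faster with a list of which secrets need creating. The sketch below is a minimal illustration, assuming simple KEY=value files: it returns the key names only, so the values never get printed or logged. The function name is hypothetical.

```python
def env_key_names(env_text: str) -> list[str]:
    """Return the variable names defined in .env-style text, never the values."""
    names = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        if key.isidentifier():
            names.append(key)
    return names
```

Each name in the result becomes one entry to create in GitHub Secrets, after which the .env file can be deleted from the laptop.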

The budget breakdown:

Item | Relative cost
GitHub Team plan (per user) | Modest base subscription
GitHub Code Security + Secret Protection (per active committer) | The bulk of the spend
Azure Functions | Negligible
Application Insights | Small
Private PyPI mirror (optional) | Small
Total | A fraction of one support FTE

Compare that against the cost of not doing it. An estimated 0.5 to 1 FTE of reactive IT support - people troubleshooting scripts they’ve never seen, reverse-engineering automations after someone leaves, manually checking code for hardcoded API keys - costs several times more than the governance stack. And that’s before you factor in the cost of a single security incident from unscanned code containing a known vulnerability. The enterprise cost comparison for AI coding tools is a separate analysis, but the governance layer described here applies regardless of which tool your people use.

The ROI math is a no-brainer. Spend a fraction of one support hire to save several times that in reactive support costs, prevent unknown security exposure, and ensure business continuity when employees travel or leave. That doesn’t even count the compliance value: an audit trail that shows what code exists, who reviewed it, and when it was scanned is the kind of evidence regulators want to see.

I called this a no-brainer just above. That oversimplifies it. The math is plain; the politics are not. Getting IT, dev, and the business stakeholder to all sign off on a new spend, even a small one, and then asking salespeople and marketers to actually move their scripts off their own machines, is the part that takes leadership air cover. The technical plan is the easy half. Buy-in is the work.

I built Tallyfy because I kept seeing this exact failure mode in operations work: knowledge living inside one person’s head, with no version control, no successor, no audit trail. AI-generated code on a salesperson’s laptop is that pattern wearing a hoodie.

The first employee running AI code on their laptop is user number one. Everything in this blueprint starts with that single use case: one person, one repo, one scheduled job migrated off a laptop. From there it scales to the whole company. Every month you wait, the debt compounds. Not linearly. Exponentially. Because every new hire who discovers Claude Code or Cursor or Copilot starts generating code on day one. And if there’s nowhere approved for that code to go, it stays on their laptop, unscanned, unmonitored, invisible. Until it isn’t.

A bus factor of one, dressed up as productivity. That’s what unmanaged AI code is. Fix it now, while it’s still one person.

About the Author

Amit Kothari is an experienced consultant, advisor, coach, and educator specializing in AI and operations for executives and their companies. With 25+ years of experience, he is the Co-Founder & CEO of Tallyfy® (raised $3.6m, the Workflow Made Easy® platform) and Partner at Blue Sheen, an AI advisory firm for mid-size companies. He helps companies identify, plan, and implement practical AI solutions that actually work. Originally British and now based in St. Louis, MO, Amit combines deep technical expertise with real-world business understanding.

Disclaimer: The content in this article represents personal opinions based on extensive research and practical experience. While every effort has been made to ensure accuracy through data analysis and source verification, this should not be considered professional advice. Always consult with qualified professionals for decisions specific to your situation.
