Why organizing your files comes well before any sort of AI
A Seagate and IDC study of 1,500 enterprise leaders found 68% of business data goes completely unused. AI amplifies whatever state your files are in. If SharePoint is a graveyard of duplicate presentations and orphaned project sites, AI just indexes the mess faster. File organization is not a nice-to-have before AI adoption. It is a prerequisite.

Key takeaways
- AI tools are dramatically faster on local files: no authentication, no API calls, no latency. Local SSD reads take microseconds versus hundreds of milliseconds per cloud API call. That speed gap reshapes what AI can practically do.
- OneDrive sync breaks above 300,000 items: Microsoft's own documentation caps synced files at 300,000 across all libraries, and performance starts degrading well before that threshold. Syncing a departmental root directory can grind your machine to a halt.
- SharePoint version history eats storage silently: the default keeps 500 versions per file with no auto-expiration. A 200MB PowerPoint with 10 versions quietly becomes 2GB. Practitioners report version history consuming 20-40% of total storage.
- One master file system, not copies: duplicate files scattered across personal OneDrive accounts mean AI does not know which copy is authoritative. Every AI answer becomes a coin flip between conflicting versions.
AI amplifies whatever state your files are in. Clean, well-organized files with clear naming and logical folder structures? AI tools will find things quickly, produce accurate summaries, and connect information across documents. A sprawling mess of duplicate files, abandoned project folders, and five versions of “Q3 Budget FINAL v3 ACTUALLY FINAL.xlsx”? AI will index that mess faster than any human ever could, and confidently produce answers sourced from the wrong version.
The thing is, most organizations treat file organization as a future cleanup project. Something for a slow quarter. But if you’re rolling out Claude, Copilot, or any AI assistant that touches your documents, your file system is no longer a passive storage layer. It is the training ground. And the quality of what AI produces depends entirely on the quality of what it can find.
AI amplifies your file mess
The speed difference between AI working on local files versus cloud files is not marginal. It is orders of magnitude.
When Claude Code or any local AI tool reads a file from your SSD, that read happens in microseconds. There’s no authentication handshake, no API call to Microsoft’s servers, no waiting for OneDrive to figure out whether you’ve got the latest version synced. The file is just there. Fast.
When that same tool needs to reach into SharePoint through an API? Every single file access involves an HTTP request, authentication, throttling checks, and data transfer. Hundreds of milliseconds per call. Multiply that across hundreds or thousands of files in a typical document analysis task, and you’re looking at the difference between a job that takes seconds and one that takes minutes. Or minutes versus hours.
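A back-of-envelope model makes that gap concrete. The figures below are illustrative assumptions, not measurements from any particular tenant:

```python
# Rough latency model: local SSD reads versus per-file cloud API calls.
# All three constants are assumed round numbers for illustration.

LOCAL_READ_S = 0.0001   # ~100 microseconds per local file read (assumed)
API_CALL_S = 0.3        # ~300 ms per cloud API round trip (assumed)
FILES = 2_000           # files touched in a typical analysis task (assumed)

local_total = FILES * LOCAL_READ_S
cloud_total = FILES * API_CALL_S

print(f"Local:  {local_total:.1f} s")            # 0.2 s
print(f"Cloud:  {cloud_total / 60:.1f} min")     # 10.0 min
print(f"Slowdown: {cloud_total / local_total:.0f}x")  # 3000x
```

Even if the assumed numbers are off by a factor of two in either direction, the conclusion holds: per-file cloud access turns a seconds-long job into a minutes-long one.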
This is not theoretical. Anyone who has tried running an AI coding assistant against a locally cloned repository versus a cloud-mounted drive knows the feeling. Local is snappy. Cloud is painful.
But here’s where it gets messy. Most enterprise files don’t live locally. They live in SharePoint. In OneDrive. In Teams channels that someone created for a project two years ago and never cleaned up. And the organizations that need AI most urgently are precisely the ones with the most chaotic file systems, because the chaos is what’s driving the need for AI in the first place.
The Seagate and IDC Rethink Data study surveyed 1,500 global enterprise leaders and found that enterprise data grows at roughly 42% year over year. That alone is striking. But the number that genuinely bothers me is this: 68% of data available to businesses goes completely unused. Not poorly used. Not underused. Completely untouched.
So you’ve got an exponentially growing pile of files, two-thirds of which nobody ever looks at, and now you’re asking AI to make sense of it all. Good luck.
Data quality issues compound the problem, breaking AI projects in ways most teams don't anticipate. Bad data in, bad answers out. But bad file organization is even more basic than bad data quality. You can't even assess data quality if you don't know where your data lives.
The 300,000 file ceiling nobody mentions
There’s a hard limit in the Microsoft 365 world that I keep bringing up in discussions because almost nobody knows about it.
Microsoft’s own documentation states that OneDrive sync should handle no more than 300,000 files across all synced libraries. That’s not 300,000 per library. That’s 300,000 total, across everything you’re syncing.
That sounds like a lot until you think about what happens in practice.
Every Teams channel creates a SharePoint site behind the scenes. Every SharePoint site has document libraries. Every document library accumulates files over months and years. Old project sites never get deleted. Nobody archives anything. And SharePoint’s default version history keeps 500 versions per file with no automatic expiration.
Five hundred versions. Per file. No expiry.
A practical example from Nikki Chapple, a SharePoint practitioner who did the math: a 200MB PowerPoint file with just 10 versions consumes 2GB of storage. She found that cleaning up version history on one site collection saved 44% of total storage. Nearly half the storage was old versions nobody would ever look at again.
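That arithmetic generalizes easily. A minimal sketch, using the same simplifying assumption as the example above (each retained version stored as a full copy):

```python
# Storage consumed by a file plus its retained versions, assuming every
# version is a full copy (the same simplification as the worked example).

def version_storage_gb(file_mb: float, versions: int) -> float:
    """Total storage in GB (decimal, as storage billing uses)."""
    return file_mb * versions / 1000

print(version_storage_gb(200, 10))   # the 200MB deck with 10 versions -> 2.0
print(version_storage_gb(200, 500))  # at the 500-version default -> 100.0
```

At the 500-version default, that single deck could in principle occupy 100GB, which is why trimming version limits is usually the first cleanup win.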
An IT admin testing the new 64-bit OneDrive client pushed it to 308,000 files and watched the internal .DAT tracking file balloon to roughly 350MB. Their conclusion was basically “please don’t sync everything in OneDrive.” Performance degradation starts well before the 300,000 ceiling. Other practitioners have identified roughly 100,000 items as the practical threshold where things start getting clunky.
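If you want to know where your own synced folders stand against those thresholds, a short script can count items locally. The root path is a placeholder; point it at whatever you actually sync:

```python
import os

# Count items (files + folders) under a synced root and compare against
# the thresholds discussed above: Microsoft's documented 300,000-item
# sync ceiling and the ~100,000-item practical limit.

SYNC_CEILING = 300_000
PRACTICAL_LIMIT = 100_000

def count_items(root: str) -> int:
    """Return the number of files and folders under root."""
    total = 0
    for _, dirs, files in os.walk(root):
        total += len(dirs) + len(files)
    return total

if __name__ == "__main__":
    root = os.path.expanduser("~/OneDrive")  # placeholder: your synced root
    n = count_items(root)
    print(f"{n:,} items under {root}")
    if n > SYNC_CEILING:
        print("Over the documented 300,000-item sync ceiling.")
    elif n > PRACTICAL_LIMIT:
        print("Over the ~100,000-item practical threshold; expect slowdowns.")
```

Run it before syncing a new library, not after your laptop starts grinding.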
In conversations I’ve had with IT teams at mid-size companies, the story is always the same. Someone syncs a departmental root directory. Their laptop grinds to a halt. OneDrive’s CPU usage spikes. Files show sync conflicts. The IT team pushes back and tells people to un-sync, but nobody knows which folders they actually need, so they sync everything or nothing.
This is the exact environment people are now trying to drop AI tools into. Sort of like trying to teach someone to cook in a kitchen where every drawer is jammed with utensils from three previous tenants and the pantry has seven opened bags of flour.
Selective sync is not optional
The fix isn’t complicated. It’s just deliberate.
The concept I keep pushing is selective sync as a conscious act. Not “sync everything and hope for the best.” Not “sync nothing and work exclusively in the browser.” But specifically choosing which folders to sync locally for a specific purpose, working with them, and un-syncing when done.
For AI projects specifically, the pattern looks like this. Create a dedicated workspace in SharePoint for the project. Populate it with the specific documents AI needs to work with. Sync that workspace locally using OneDrive selective sync. Point your AI tool at the local folder. Work. When the project is done or when you need to move on, un-sync and clean up.
This approach solves multiple problems at once. The AI tool gets fast local file access. Your machine doesn’t choke on 300,000 synced files. You know exactly which documents the AI is reading. And you maintain a clean separation between “files AI is actively using” and “the giant archive of everything the company has ever produced.”
A mid-size company I worked with found this out the hard way during their AI rollout. Over a thousand employees across multiple locations, years of accumulated creative assets, marketing materials, project deliverables. IT had been fighting with users about syncing for ages because people kept syncing entire departmental libraries and crashing their laptops. The AI initiative was basically dead on arrival. Not because the AI tools didn’t work. Because nobody could get files to the AI tools in a usable state.
The resolution? They created dedicated AI project spaces in SharePoint. Small, focused libraries with only the documents relevant to each project. Users synced just those libraries locally. Treated selective sync as a deliberate act rather than a default. Honestly, the shift in mindset was harder than any technical change.
A SharePoint admin writing on Medium documented a similar cleanup and found version history alone consuming 20-40% of total storage across their environment. Think about that. Up to 40% of your SharePoint storage bill might be old file versions that serve no purpose.
Once you’ve got your files properly organized and synced, setting up SharePoint and OneDrive for Claude specifically becomes straightforward. But the organization has to come first.
If your team is stuck between AI ambition and file chaos, Amit helps organizations sort out their document infrastructure so AI tools have something coherent to work with.
One master file system between people
This one keeps coming up and it drives me a bit mad.
Five people working on a proposal. Each person has a copy on their personal OneDrive. One person also has a copy in a Teams channel. Another person emailed a version to an external partner, who sent back comments on a PDF printed from a different version. The “final” version exists in at least three locations, and none of them match.
Now ask an AI tool to summarize the proposal.
Which version does it use? Whichever one it finds first. And it won’t tell you there are four other versions with conflicting information unless you specifically ask. The AI doesn’t know which copy is authoritative. It just sees files.
This isn’t an AI readiness problem in the traditional sense. Nobody puts “file deduplication” on their AI readiness checklist. But it is one of the most common reasons AI produces wrong answers. Not because the AI is hallucinating. Because the AI is accurately summarizing the wrong document.
The fix is boring and organizational, not technical. One master location for each document. One source of truth. If people need to work on something collaboratively, they work on the same file in SharePoint, not copies scattered across personal drives. Microsoft built co-authoring into Word, Excel, and PowerPoint years ago. It works. People just don’t use it because habits from the email-attachment era are deeply ingrained.
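Before consolidating to one master location, it helps to see how many byte-identical copies are already scattered around. A hedged sketch that groups files by content hash (the root path is a placeholder, and this only catches exact duplicates, not diverged versions):

```python
import hashlib
import os
from collections import defaultdict

def file_hash(path: str) -> str:
    """SHA-256 of a file's contents, read in 1MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(roots):
    """Map each content hash to the list of paths sharing it (2+ only)."""
    by_hash = defaultdict(list)
    for root in roots:
        for dirpath, _, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    by_hash[file_hash(path)].append(path)
                except OSError:
                    continue  # skip unreadable or locked files
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

if __name__ == "__main__":
    # Placeholder root: point this at the synced folders people save to.
    for paths in find_duplicates([os.path.expanduser("~/OneDrive")]).values():
        print(" == ".join(paths))
```

Diverged copies (the harder problem) won't hash identically, but even an exact-duplicate report usually makes the case for a single source of truth.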
The average person spends about 2.5 hours per week just searching for files. That’s over 120 hours a year per employee. In a company of 200 people, that’s 24,000 hours annually spent looking for things. AI can theoretically reduce that search time. But only if there’s a coherent filing system to search through.
The pattern I recommend is simple. Every project gets one SharePoint site or library. All project documents live there. Personal OneDrive is for personal documents only, not shared work. When a project ends, its library gets archived, not deleted. Clear naming conventions. No “New Folder (2)” ever again.
Mind you, this sounds obvious when written down. In practice, getting 200 people to change how they save files is one of the toughest organizational changes you can attempt. Harder than rolling out a new CRM. Harder than switching email platforms. Because file saving is something people do dozens of times a day without thinking, and changing unconscious habits requires sustained pressure over months.
But if you skip this step, your AI initiative will produce beautiful, confident, precisely wrong answers sourced from outdated copies of documents that should have been deleted six months ago. And that’s worse than no AI at all, because at least before AI, people knew they had to double-check which version was current. AI removes that healthy skepticism. People trust the AI’s answer without asking “which version of the document did you read?”
This connects directly to why AI projects fail at the organizational level. The technology works fine. The organization around it doesn’t.
Storage costs add up quietly
Nobody watches SharePoint storage costs until the bill arrives.
SharePoint Online extra storage costs approximately EUR 0.18 per GB per month. That works out to roughly EUR 2,160 annually for one extra terabyte. Not catastrophic. But it adds up when you’re growing at 42% year over year and never deleting anything.
The truly painful comparison is what that same storage costs elsewhere. Azure Blob Storage charges a fraction of what SharePoint charges per gigabyte. The same file sitting in Azure cold storage costs dramatically less than that file sitting in SharePoint with 500 versions. And here’s the kicker: roughly 70% of files in typical SharePoint environments haven’t been accessed in the last 90 days.
So you’re paying premium SharePoint rates for files nobody has opened in three months. With 500 historical versions attached. And growing at 42% per year.
Before any AI rollout, the cleanup is worth real money. Reducing version history limits from 500 to something sensible like 50 or 100 versions. Archiving old project sites to cheaper storage. Deleting abandoned Teams channels and their associated SharePoint sites. Establishing retention policies that actually expire content instead of keeping everything forever.
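The economics are easy to model. The EUR 0.18/GB/month rate comes from the figures above; the excess volume, version-history share, and cold-storage rate are illustrative assumptions:

```python
# Rough annual-cost model for SharePoint storage bloat.
SHAREPOINT_EUR_PER_GB_MONTH = 0.18  # from the figures above
ARCHIVE_EUR_PER_GB_MONTH = 0.02     # assumed cold-tier price for comparison

def annual_cost(gb: float, rate: float) -> float:
    return gb * rate * 12

excess_gb = 5 * 1000       # 5 TB beyond the included pool (assumed)
version_share = 0.30       # mid-range of the reported 20-40% (assumed)

total = annual_cost(excess_gb, SHAREPOINT_EUR_PER_GB_MONTH)
versions_only = annual_cost(excess_gb * version_share, SHAREPOINT_EUR_PER_GB_MONTH)
if_archived = annual_cost(excess_gb, ARCHIVE_EUR_PER_GB_MONTH)

print(f"5 TB extra in SharePoint:  EUR {total:,.0f}/yr")
print(f"  of which old versions:   EUR {versions_only:,.0f}/yr")
print(f"Same 5 TB in cold storage: EUR {if_archived:,.0f}/yr")
```

Under these assumptions, trimming versions and archiving cold content turns a five-figure annual line item into a four-figure one, before counting any AI benefit.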
The storage savings alone often justify the organizational effort. But the real payoff comes when AI tools start working against a clean, well-structured file system instead of a digital landfill. Faster processing. More accurate results. Fewer hallucinations caused by conflicting document versions. Less time for humans to verify AI outputs because the AI was working with good inputs in the first place.
Done right, the cleanup before AI saves money twice. Once on storage costs, and again on the time people spend correcting AI outputs that were wrong because the source files were a nightmare.
Every discussion about AI readiness eventually circles back to the same unglamorous truth. The technology is ready. The files are not. And no amount of prompt engineering or model selection will fix an answer that came from the wrong version of a document buried in a SharePoint site that should have been archived two years ago. Fix the files first. Then bring in the AI.
About the Author
Amit Kothari is an experienced consultant, advisor, coach, and educator specializing in AI and operations for executives and their companies. With 25+ years of experience and as the founder of Tallyfy (raised $3.6m), he helps mid-size companies identify, plan, and implement practical AI solutions that actually work. Originally British and now based in St. Louis, MO, Amit combines deep technical expertise with real-world business understanding.
Disclaimer: The content in this article represents personal opinions based on extensive research and practical experience. While every effort has been made to ensure accuracy through data analysis and source verification, this should not be considered professional advice. Always consult with qualified professionals for decisions specific to your situation.