The Great Media Library Cleanup: When Storage Costs Became a Wake-Up Call
The AWS bill that landed in my inbox on a Thursday morning made me do a double-take. Our S3 storage costs had tripled over the past six months, and the jump wasn't gradual. It looked sudden, like someone had flipped a switch.
"Are we backing up everything twice now?" I asked our DevOps engineer.
"No, it's all media files," he replied. "Your Twill projects are uploading a lot of assets."
I pulled up the storage analytics and stared at the numbers. 847GB of images, videos, and documents across our client projects. That seemed... excessive. Especially since most of our sites were fairly straightforward corporate and campaign pages.
"How much of this is actually being used?" I asked.
That's when we discovered a problem that most CMS teams run into but few talk about: digital hoarding.
The Upload-and-Forget Problem
Here's how it typically happened: A content team would get access to Twill's media library for the first time. They'd see the clean, organized interface—folders, tags, search functionality—and think, "Perfect, let's get everything uploaded so we have it when we need it."
Then they'd dump in every asset from their last three campaigns. Every photo from the product shoot (including the 47 takes of the same angle). Every version of every logo (including the ones with the tiny typo that got fixed). Every video file (including the raw footage that was only supposed to be for internal review).
The media library was so easy to use that it became a digital junk drawer. Upload first, organize later. Except "later" never came.
We started noticing patterns in our GitHub issues:
"Media library becomes slow with large numbers of files" (Issue #154)
"Search functionality times out with 10k+ assets" (Issue #298)
"File browser pagination breaks with massive datasets" (Issue #445)
But the real wake-up call was realizing that most of these files were digital ghosts. Uploaded with good intentions, never actually used in any published content, but still sitting in S3 accumulating charges month after month.
The Reference Tracking Nightmare
The obvious solution seemed simple: build a tool to identify unused media files and delete them. How hard could it be?
Very hard, as it turns out.
Twill's flexible architecture meant that media files could be referenced in dozens of different ways:
Direct references in block content
Background images in CSS customizations
Featured images attached to modules
Gallery collections that might be used across multiple pages
PDF files linked in rich text fields
Video files embedded in custom blocks
Images used in email templates
Assets referenced in JSON fields for API integrations
Just because a file wasn't visibly displayed on a page didn't mean it wasn't being used. And just because it was being used today didn't mean it would be used tomorrow when the content team updated that campaign page.
Our first attempt at cleanup was a disaster. We built a script that identified "unused" files by checking database references, ran it on a staging environment, and confidently deleted 200+ assets that appeared to be orphaned.
Then we pushed to production and watched as half of the client's image galleries turned into broken image placeholders.
The files weren't unused—they were being referenced dynamically through a custom field structure that our cleanup script didn't know about. We spent the next six hours frantically restoring files from backups while the client's marketing team wondered why their perfectly good website had suddenly broken.
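In hindsight, the safer shape for that script would have been a report-only pass that treats a file as "in use" if any source mentions it, including IDs buried in nested JSON. Here's a minimal sketch of that idea in Python. It is not the script we ran and not Twill's schema: the key names `media_id` and `file_id`, and the idea of feeding it raw JSON strings, are assumptions made up for illustration.

```python
import json
from typing import Any, Iterable, Iterator, List

# Hypothetical key names; a real project would list whatever keys
# its own custom fields use to point at media records.
ID_KEYS = frozenset({"media_id", "file_id"})


def iter_media_ids(value: Any) -> Iterator[int]:
    """Recursively walk decoded JSON and yield ints stored under ID-like keys."""
    if isinstance(value, dict):
        for key, item in value.items():
            is_plain_int = isinstance(item, int) and not isinstance(item, bool)
            if key in ID_KEYS and is_plain_int:
                yield item
            else:
                # Keep descending: references can sit several levels deep.
                yield from iter_media_ids(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_media_ids(item)


def find_orphan_candidates(
    all_media_ids: Iterable[int],
    direct_refs: Iterable[int],
    json_blobs: Iterable[str],
) -> List[int]:
    """Return IDs that no known source references. Reports only; deletes nothing."""
    referenced = set(direct_refs)
    for blob in json_blobs:
        referenced.update(iter_media_ids(json.loads(blob)))
    return sorted(set(all_media_ids) - referenced)


if __name__ == "__main__":
    blobs = [
        '{"gallery": {"items": [{"media_id": 3}, {"caption": "x"}]}}',
        '{"hero": {"file_id": 4}}',
    ]
    print(find_orphan_candidates([1, 2, 3, 4, 5], [1, 2], blobs))
```

The output is a list of candidates for a human to review, not a delete list. A scan like this still only knows about the sources it is given, which is exactly the gap that bit us.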
The Real Cost of Digital Hoarding
Storage costs were just the tip of the iceberg. The bigger problems were more subtle:
Performance degradation: Media library interfaces that worked fine with 100 files became unusable with 10,000. Content teams started complaining that finding the right image took longer than creating the content that used it.
Decision paralysis: When you have 500 product photos in a folder, choosing the right one becomes overwhelming. Content creators would spend more time browsing assets than actually building pages.
Version confusion: Multiple uploads of similar files led to constant questions: "Is this the approved logo?" "Which version of this image should I use?" "Is the high-res version the same as the web version?"
Backup complexity: Our deployment and backup processes slowed down dramatically as they tried to sync massive asset directories across environments.
Support overhead: Every "my image isn't displaying" ticket required forensic work to figure out which of the twelve similar filenames was the right one.
The International Energy Agency project made this painfully clear. They'd upload batches of charts and infographics for their energy reports, but by the time they were ready to publish, they couldn't remember which files were the final versions and which were works-in-progress. We spent more time on asset archaeology than actual content management.
What Actually Worked
Instead of building a perfect automated cleanup system, we learned to attack the problem from multiple angles:
Upload governance: Added file naming conventions and folder structures during client onboarding. Not exciting, but way more effective than trying to organize chaos after the fact.
Version control: Built simple tools for marking files as "draft," "approved," or "archived" so teams could manage their own asset lifecycles without accidentally deleting something important.
Usage tracking: Instead of trying to reverse-engineer what was being used, we started tracking when files were actually served to end users. Files that hadn't been requested in 6+ months got flagged for review.
Bulk operations: Added tools for content managers to select and delete multiple files at once, making cleanup feel manageable instead of overwhelming.
Storage policies: Implemented automatic archiving for files older than a certain age, with easy restoration if someone actually needed something from the archives.
Client education: Started having explicit conversations during project kickoffs about asset management strategies, not just technical capabilities.
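To make the usage-tracking idea concrete, here's a rough Python sketch of the flagging rule: anything not served recently gets queued for review, and files already marked "archived" are skipped. The `MediaRecord` shape, the field names, and the 180-day default (my approximation of "6+ months") are all assumptions for illustration, not what our production tooling looks like.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass
class MediaRecord:
    path: str
    status: str                      # "draft", "approved", or "archived"
    uploaded_at: datetime
    last_served: Optional[datetime]  # None if no request was ever logged


def flag_for_review(
    records: Iterable[MediaRecord],
    now: datetime,
    max_idle_days: int = 180,
) -> List[str]:
    """Return paths of non-archived files with no activity since the cutoff."""
    cutoff = now - timedelta(days=max_idle_days)
    flagged = []
    for record in records:
        if record.status == "archived":
            continue  # already out of the active library
        # A never-served file is judged by its upload date, so a fresh
        # upload is not flagged before anyone has had a chance to use it.
        last_activity = record.last_served or record.uploaded_at
        if last_activity < cutoff:
            flagged.append(record.path)
    return flagged
```

Flagging is deliberately separate from deleting. The list goes to a person, who can archive, keep, or remove each file.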
The Hard Conversation
The most important change wasn't technical—it was cultural. We had to start having honest conversations with clients about their upload habits.
"I know it feels safer to upload everything," I'd tell new content teams, "but every file you upload becomes someone else's problem six months from now. Either they'll spend time managing it, or they'll spend money storing it, or they'll spend frustration trying to find the right version."
Some clients pushed back. "Storage is cheap," they'd say. "Why not just keep everything?"
But storage costs were never the whole issue. The bigger issue was that unlimited upload capacity created unlimited organizational debt. Every additional asset made the media library slightly less usable for everyone else on the team.
The clients who got the best results from Twill were the ones who treated their media library like a tool, not a warehouse. They uploaded what they needed, organized it as they went, and regularly cleaned up what they weren't using anymore.
The Lesson That Stuck
The great media library cleanup taught me that technical solutions can't fix organizational problems. We could build the most sophisticated asset management system in the world, but if people uploaded files without thinking about long-term consequences, we'd still end up with digital hoarding.
The real solution was designing workflows that made good asset hygiene feel natural instead of burdensome. Making it easy to organize files as you upload them. Building tools that helped people find what they needed instead of browsing through everything they'd ever uploaded. Creating gentle reminders about storage costs and performance implications.
These days, when clients ask about Twill's media capabilities, we don't just demo the upload interface. We show them the file organization tools, the bulk management features, and the usage analytics. Because the goal isn't just to store their assets—it's to help them manage their assets in a way that serves their team instead of overwhelming it.
Our AWS bills are back to reasonable levels. More importantly, content teams actually use their media libraries instead of treating them like digital attics. Because the best CMS feature isn't unlimited storage—it's helping people stay organized so they can find what they need when they need it.
The cleanup is never really finished. But now it's part of the workflow instead of a crisis waiting to happen.