I Ran Paperclip AI for 2 Weeks. Here’s What I Learned

Paperclip is getting hyped as an “OpenClaw killer” that can autonomously manage AI agents and run them like a company. For the past two weeks, I’ve been running Paperclip to help manage a couple of my open-source projects. In this post, I’ll share how that experience has gone so far, what worked, what didn’t, and whether Paperclip is worth trying for your own projects.

What exactly is Paperclip?

From the docs:

Paperclip is a control plane for managing autonomous AI agents like a real company. Instead of running the agents directly, it provides the structure around them: organizing roles, tracking tasks, controlling budgets and token usage, and adding governance and oversight. In practice, it acts as the coordination layer that helps AI agents work together in a more controlled, auditable, and goal-aligned way.

First day: the hype and the initial setup

Paperclip started trending on X, much like OpenClaw did in its early days, so I decided to set it up in my homelab and give it a try.

I spun up a VM, set up an Ansible project, and installed Paperclip by following the docs. Then I installed Codex and hooked it up to my $20 plan. Here is the Ansible playbook if you’re interested.

The first step was to create a company, define goals, and assign some initial tasks to the CEO. I used the following values to test the setup, mostly copied from the docs:

Company Name: Jd Corp
Mission: Build the #1 AI note-taking app at $1M MRR in 3 months
CEO Agent prompt: Hire your first engineer and create a hiring plan
CEO Agent prompt description:

You are the CEO. You set the direction for the company.

- hire a founding engineer
- write a hiring plan
- break the roadmap into concrete tasks and start delegating work

After that, I created the company and launched it.

Within a few seconds, the company was ready. The CEO proposed hiring a founding engineer and asked for approval. Since I’m the board member, I have to approve all hiring. The new hire was essentially a copy of the CEO agent with instructions to work on technical tasks. The CEO automatically created Jira-style issues and assigned them to the founding engineer. The two agents worked through a few issues, then gradually stopped doing much of anything.

The prompt and goals were pretty vague, so they created some initial plans and then gave up. Still, it was beautiful to watch. I could already see the potential in this kind of setup. That pushed me to try it on one of my actual open-source projects, which brings us to the next phase: a setup that was actually useful.

Functional setup: AI Experiments company

I tinkered around for a while, creating a few more companies and learning how the system worked. Eventually, I created a company to manage my open-source project Kubernetes AI SRE. The setup now looks something like this:

Company Goal: Build a set of AI agents that can autonomously work on various technical and non-technical projects
CEO Agent: Empty prompt

I learned pretty quickly that giving the CEO an initial prompt without defining the project, workspace, and execution context is mostly useless. So instead, I created a CEO with no tasks, cloned the repo into a workspace, and then created a task for the CEO to set up the project in Paperclip and hire an engineer.

Here was my exact prompt:

# AIE-1: Hire someone to develop k8s-ai-sre

The app code is in the repo https://github.com/kmjaygoodadeep/k8s-ai-sre
Use the working folder /home/paperclip/workspace/k8s-ai-sre-project. The repo is already cloned in a subfolder there
Make a new project "Kubernetes AI SRE Agent" setup to above
Hire a new engineer - gpt-5.3-codex to work on this
The skill https://skills.sh/laguagu/claude-code-nextjs-skills/openai-agents-sdk might be useful
he should check the PLAN.md file in the repo to organize himself around tasks, and create issues in paperclip accordingly. Keep track of the work in memory to avoid duplicate work between runs. 
Maybe use git worktrees to work on multiple tasks in parallel
he should have necessary memory skills and tools to self organize, access issues, escalate to CEO when needed etc.
Always push features as PRs and never push directly to master. 
Check for open PRs before starting new work to avoid duplicates. Keep the status up-to date in the project issue in paperclip
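
The worktree suggestion in the prompt can be sketched roughly as follows. This is a minimal illustration, not what the agent actually ran: the branch naming scheme, the `worktrees` folder layout, the placeholder repo path, and the task IDs `AIE-2` and `AIE-3` are all assumptions. The only real interface used is the standard `git worktree add -b <branch> <path> <base>` command, which gives each task its own checkout and branch so parallel work does not collide.

```python
from pathlib import PurePosixPath
import shlex


def worktree_command(repo: PurePosixPath, task_id: str, base: str = "master") -> list[str]:
    """Build a `git worktree add` command giving one task its own checkout and branch."""
    branch = f"task/{task_id.lower()}"  # hypothetical branch naming scheme
    checkout = repo.parent / "worktrees" / task_id.lower()  # hypothetical folder layout
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(checkout), base]


if __name__ == "__main__":
    repo = PurePosixPath("workspace/repo")  # placeholder, not the real clone path
    for task in ("AIE-2", "AIE-3"):  # hypothetical issue IDs
        print(shlex.join(worktree_command(repo, task)))
```

Building the command as a list rather than a string keeps it safe to hand to `subprocess.run` without shell quoting issues.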

This time, things went much better. I started seeing genuinely useful pull requests. But the system was still making a lot of mistakes, which led to the next phase: improving the agents.

Making the agents useful

It’s cool to see an org chart with a CEO, CTO, CFO, and so on. But honestly, for my project, the CEO alone could probably do most of the job. I still decided to keep one engineer in addition to the CEO.

What I noticed was that the founding engineer was making a lot of mistakes. It wasn’t using the Paperclip issue system properly, escalating to the CEO when needed, or behaving the way I expected. So I looked into the CEO’s instructions for comparison.

It was using an OpenClaw-style system built around four markdown files:

AGENTS.md: Defines the role and refers to other files
HEARTBEAT.md: Defines what to do on each heartbeat (a cron job every hour), such as checking active issues and delegating tasks
SOUL.md: Defines tone and style
TOOLS.md: Lists the available tools

This was effectively the operating manual for the CEO, and it was pretty solid.

I couldn’t say the same for the founding engineer. When the CEO hired the new agent, its AGENTS.md was just two lines explaining its role. That was a big reason why the agent was so ineffective.

So I did some research, combined that with the structure already used for the CEO, and created the same four markdown files for the founding engineer. Once I did that, things started working much better.
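
If you want to script this step instead of creating the files by hand, a sketch along these lines could work. The four file names come from the CEO setup described above; the target directory, the role name, and the one-line placeholder bodies are assumptions, and where Paperclip actually expects these files to live is something to check in your own install.

```python
from pathlib import Path
import tempfile

# File names as used by the CEO agent; the one-line bodies are placeholders only.
AGENT_FILES = {
    "AGENTS.md": "Defines the role and refers to the other files.",
    "HEARTBEAT.md": "What to do on each heartbeat, such as checking active issues.",
    "SOUL.md": "Tone and style.",
    "TOOLS.md": "The available tools.",
}


def scaffold_agent(agent_dir: Path, role: str) -> list[Path]:
    """Create any missing instruction files for one agent, never overwriting existing ones."""
    agent_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name, purpose in AGENT_FILES.items():
        path = agent_dir / name
        if path.exists():
            continue  # keep hand-tuned instructions intact
        path.write_text(f"# {path.stem} ({role})\n\n{purpose}\n", encoding="utf-8")
        created.append(path)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "founding-engineer"  # hypothetical location
        for path in scaffold_agent(target, "Founding Engineer"):
            print("created", path.name)
```

Skipping files that already exist matters here, because most of the value ends up in the edits you make to `AGENTS.md` afterwards.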

It created around 50 pull requests over the course of a week. I still had to constantly adjust the instructions in AGENTS.md to steer it. For example:

Don’t push directly to master
Always open a PR
Always run end-to-end validation before raising a PR
Use kind to run a local cluster
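
Rules like these are only soft guidance when they live in AGENTS.md. As a hedged illustration of how the same rules could be checked mechanically, for example from a git hook or a CI step, here is a small sketch. The `PullRequestAttempt` type and the `violations` function are hypothetical and are not part of Paperclip.

```python
from dataclasses import dataclass

PROTECTED_BRANCHES = {"master"}


@dataclass
class PullRequestAttempt:
    branch: str       # branch the agent wants to push
    e2e_passed: bool  # whether end-to-end validation (e.g. on a kind cluster) succeeded


def violations(attempt: PullRequestAttempt) -> list[str]:
    """Return the rules an attempt breaks; an empty list means it may proceed."""
    problems = []
    if attempt.branch in PROTECTED_BRANCHES:
        problems.append(f"do not push directly to {attempt.branch}; open a PR from a feature branch")
    if not attempt.e2e_passed:
        problems.append("end-to-end validation has not passed")
    return problems


if __name__ == "__main__":
    for problem in violations(PullRequestAttempt(branch="master", e2e_passed=False)):
        print("blocked:", problem)
```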

Once I was happy with the setup, I was able to ask the CEO to duplicate the agent, create more agents, and delegate more tasks.

Caveats and closing thoughts

Paperclip is not production-ready at the moment. If you’re aiming for a stable setup, it’s probably better to wait until the hype cycle cools down a bit.

Here are some pitfalls I ran into, so you can avoid them:

Do not think of Paperclip’s org structure as being analogous to a real company. The org chart with a CEO, CFO, CMO, and so on is mostly a marketing device. It’s better to think of Paperclip as a nice interface for managing multiple independent agents.
Take your time designing the agents. Add your own skills, customize AGENTS.md, and install the tools they need on the system. Do not rely on the CEO agent to hire a new agent and assume it will work out of the box. It won’t.
There are still bugs. Check the Paperclip issue tracker if you run into one. Given how quickly things are moving, there’s a good chance the issue has already been reported or fixed.
Use a decent model. I tried GPT-5.3-Codex, GPT-5.4, some OpenCode free models, and MiniMax. The GPT models worked really well through Codex, but MiniMax 2.5 was mostly useless because it couldn’t properly figure out how to use Paperclip issues. MiniMax 2.7, on the other hand, worked pretty well through pi-agent.
Watch your token usage. I recommend having specialized agents for specialized tasks and using cheaper models for straightforward work. MiniMax 2.7 is a very cheap option that, in my experience, handled around 95% of what I would otherwise use Claude Sonnet for, and I never ran out of credits on their $9 plan.
Think about security. I would not run this on my local laptop. Rent a $5 DigitalOcean VPS instead, or buy a small used PC and run it in a VM.
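
The token-usage tip above boils down to a routing decision: cheap model for straightforward work, stronger model for everything else. A minimal sketch of that idea, assuming hypothetical task categories (the model names are simply the ones mentioned in this post, and how you assign a model to an agent depends on your Paperclip setup):

```python
CHEAP_MODEL = "MiniMax 2.7"
STRONG_MODEL = "gpt-5.3-codex"

# Hypothetical categories of work considered straightforward enough for the cheap model.
STRAIGHTFORWARD = {"docs", "triage", "status-update"}


def pick_model(task_kind: str) -> str:
    """Route straightforward task kinds to the cheap model, everything else to the strong one."""
    return CHEAP_MODEL if task_kind in STRAIGHTFORWARD else STRONG_MODEL


if __name__ == "__main__":
    for kind in ("docs", "feature"):
        print(kind, "->", pick_model(kind))
```

Defaulting unknown task kinds to the stronger model is a deliberate choice here: a misrouted hard task tends to cost more in retries than the tokens saved.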

Would I recommend Paperclip today? Yes, but only if you enjoy experimenting and don’t mind getting your hands dirty. It is not a plug-and-play system yet, but for builders who are willing to tune it properly, there is already a lot of value to unlock.
