What I learned building 200 AI tools in 18 months. The ten rules I follow now that I did not know at the start.

In the past 18 months, I have built approximately 200 AI tools. Production tools for clients. Internal tools for my own business. Experimental tools to test ideas. Small utilities. Complex systems. Tools that saved companies hundreds of thousands of dollars and tools that were abandoned after a week.

These are the ten rules I follow now that I did not know when I started.

Rule 1: The person who does the work must be in the room when you build the tool.

Not their manager. Not a business analyst who documented their process. The actual person who does the work every day.

I have built tools based on process documentation that was technically correct and practically useless. The documentation described the official process. The person who does the work follows a modified process they have optimized over years of experience. The modifications are not documented because they seem like common sense to the person doing the work.

When you build the tool with the practitioner in the room, they tell you things the documentation does not contain. "Oh, we do not actually do it that way. We skip that step because it does not apply to 90% of cases." "The form says to check both databases, but the second one has not been updated since 2019, so I only check the first one." "That field is technically required but I always leave it blank because the next step overwrites it anyway."

These details determine whether the tool works in practice or only in theory.

Rule 2: Build the tool that saves 30 minutes, not the tool that saves 30 hours.

The tool that saves 30 minutes is small enough to build quickly, specific enough to test immediately, and modest enough that nobody expects it to transform the organization. It just works, quietly, every day.

The tool that saves 30 hours is ambitious. It requires months of development, integration with multiple systems, buy-in from multiple stakeholders, and a change management process. It might save 30 hours when it is done. It might also never ship because the scope expanded, the requirements changed, or the stakeholders could not agree on priorities.

I have shipped 15 tools that each save 30 minutes a day. Together, they save 7.5 hours per day (15 × 30 minutes = 450 minutes). I could not have shipped one tool that saves 7.5 hours per day because the complexity would have been unmanageable.

Rule 3: The governance file is not optional and it is not an afterthought.

Every project gets a CLAUDE.md file before any code is written. The governance decisions — what data the tool can access, what it cannot do, who reviews its output, how errors are handled — are made first because they constrain the design.

When governance comes after the build, it restricts a tool that was built without constraints. This creates friction because the tool has to be modified to comply with rules that were not considered during design. When governance comes before the build, it shapes a tool that is inherently compliant. No retrofitting required.

Rule 4: If you cannot explain the tool's behavior in one sentence, the tool is too complex.

"This tool reads incoming invoices, extracts line items, and matches them against purchase orders." One sentence. Clear. Testable. Anyone can understand what the tool does and verify whether it is doing it correctly.

"This tool leverages AI to optimize our accounts payable workflow through intelligent document processing, automated matching, exception handling, approval routing, and predictive analytics." That is a marketing sentence, not an explanation. The tool described by the second sentence will take six months to build, will do none of those things well, and will be abandoned within a year.

Every tool I have built that lasted was explainable in one sentence. Every tool I have built that was abandoned required a paragraph.

Rule 5: Demo on real data or do not demo at all.

When I show a client what the tool does, I use their actual data. Their real invoices. Their real patient records (de-identified if necessary). Their real customer emails.

Demo data is seductive because it is clean, consistent, and designed to make the tool look good. Real data is messy, inconsistent, and exposes the tool's limitations. Showing limitations during the demo is better than discovering them during deployment. A client who sees the tool struggle with their messy data and watches me fix it in real time trusts the tool more than a client who sees a perfect demo and then encounters problems later.

Rule 6: The tool must fail visibly.

When the tool makes a mistake, the mistake must be obvious. Not hidden in a log file. Not silently producing a wrong number. Visibly wrong in a way that the user notices immediately.

I design tools to flag uncertainty. If the tool is less than 90% confident in a classification, it does not classify — it shows the user the options and asks them to decide. If the tool extracts a number from a document and the number is outside the expected range, it highlights the number in red and says "verify this."

Silent failures are catastrophic in production. A tool that quietly miscategorizes 5% of documents creates a mess that takes months to untangle. A tool that flags the 5% it is uncertain about and asks a human to decide never creates that mess.
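A minimal sketch of the two checks described above, assuming the tool exposes a confidence score per candidate label and that someone has supplied an expected range for each extracted number. The 90% threshold comes from the text; the names `classify_or_ask`, `check_amount`, and `Decision` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

CONFIDENCE_THRESHOLD = 0.90  # from the text: below 90%, the tool asks instead of deciding


@dataclass
class Decision:
    label: Optional[str]   # None means "not classified, the user must choose"
    needs_review: bool
    options: List[str]     # candidate labels shown to the user, best first
    message: str


def classify_or_ask(scores: Dict[str, float]) -> Decision:
    """scores maps each candidate label to a confidence between 0 and 1 (assumed input)."""
    if not scores:
        raise ValueError("no candidate labels to choose from")
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    if scores[best] < CONFIDENCE_THRESHOLD:
        # Fail visibly: return no label and hand the options to the user.
        return Decision(None, True, ranked, "Low confidence: please choose a category.")
    return Decision(best, False, [], "")


def check_amount(value: float, low: float, high: float) -> Optional[str]:
    """Return a visible warning when an extracted number is outside the expected range."""
    if not (low <= value <= high):
        return f"verify this: {value} is outside the expected range {low} to {high}"
    return None
```

In a real interface the warning string would drive whatever the user actually sees, such as the red highlight. The point of the sketch is only that an uncertain result never comes back looking like a confident one.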

Rule 7: Measure before you build and measure after you deploy.

Before building, I measure the current state of the problem. How many hours does this process take? How many errors occur per month? What is the cost of those errors? These numbers justify the build and set the success criteria.

After deploying, I measure the same things. How many hours does the process take now? How many errors? What is the cost? The comparison tells me whether the tool worked. Not whether people like it. Not whether it is technically impressive. Whether it solved the problem it was built to solve.
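One way to keep the before-and-after comparison honest is to record the same three numbers both times and compute the change mechanically. The sketch below assumes monthly figures. The `ProcessMetrics` fields and the sample values are made up for illustration and are not taken from any real project.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ProcessMetrics:
    hours_per_month: float
    errors_per_month: int
    cost_per_error: float

    @property
    def error_cost_per_month(self) -> float:
        return self.errors_per_month * self.cost_per_error


def percent_change(before: float, after: float) -> float:
    """Negative values mean a reduction relative to the baseline."""
    if before == 0:
        raise ValueError("baseline is zero, percent change is undefined")
    return (after - before) / before * 100.0


def compare(before: ProcessMetrics, after: ProcessMetrics) -> Dict[str, float]:
    """Compare the same measurements taken before the build and after deployment."""
    return {
        "hours": percent_change(before.hours_per_month, after.hours_per_month),
        "errors": percent_change(before.errors_per_month, after.errors_per_month),
        "error_cost": percent_change(before.error_cost_per_month, after.error_cost_per_month),
    }


# Hypothetical numbers: measured once before building, once after deploying.
baseline = ProcessMetrics(hours_per_month=120, errors_per_month=40, cost_per_error=250)
deployed = ProcessMetrics(hours_per_month=90, errors_per_month=20, cost_per_error=250)
result = compare(baseline, deployed)
```

With these sample values, hours drop by 25% and errors by 50%. The comparison says nothing about whether anyone enjoys using the tool, which is the point.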

I have built tools that people loved but that did not move the metrics. I have built tools that people were indifferent about but that halved the error rate. The second category is more valuable. Metrics do not lie. Enthusiasm does.

Rule 8: Plan for the tool to be wrong 5% of the time and design accordingly.

No AI tool is 100% accurate. Designing for perfection produces tools that are either extremely conservative (rejecting anything uncertain, creating bottlenecks) or extremely dangerous (accepting everything, passing errors downstream).

Designing for 95% accuracy with a clear process for the 5% produces tools that are useful immediately and improve over time. The 5% gets flagged, reviewed by a human, and corrected. The corrections feed back into the tool. Accuracy improves. The 5% becomes 3%, then 2%.
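The flag, review, and correct loop can be sketched as a small queue. This assumes the tool already reports whether it is confident in each result. `ReviewQueue` and its methods are hypothetical names, and how the logged corrections feed back into the tool is left open here, because that depends on how the tool is built.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ReviewQueue:
    pending: List[Tuple[str, str]] = field(default_factory=list)            # (item_id, predicted)
    corrections: List[Tuple[str, str, str]] = field(default_factory=list)   # (item_id, predicted, corrected)
    processed: int = 0

    def route(self, item_id: str, predicted: str, confident: bool) -> Optional[str]:
        """Accept confident results; hold uncertain ones for a human."""
        self.processed += 1
        if confident:
            return predicted
        self.pending.append((item_id, predicted))
        return None

    def resolve(self, item_id: str, corrected: str) -> None:
        """Record the human's decision so it can be fed back into the tool later."""
        for i, (pending_id, predicted) in enumerate(self.pending):
            if pending_id == item_id:
                del self.pending[i]
                self.corrections.append((item_id, predicted, corrected))
                return
        raise KeyError(item_id)

    def flagged_rate(self) -> float:
        """Share of processed items that needed a human; the number to watch over time."""
        flagged = len(self.pending) + len(self.corrections)
        return flagged / self.processed if self.processed else 0.0
```

Tracking `flagged_rate()` month over month is one way to see whether the 5% is actually shrinking.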

The organizations that get the most value from AI tools are the ones that accept imperfection and build processes around it. The organizations that demand perfection before deployment never deploy.

Rule 9: The best tools disappear into the workflow.

The tools that get used longest are the ones that users forget are AI-powered. They do not have a separate interface. They do not require a separate step. They are embedded in the process the user already follows.

The invoice processing tool does not require the user to open a new application, upload invoices, and review results. It monitors the email inbox where invoices already arrive, processes them automatically, and adds a review step to the approval workflow that already exists. The user's process does not change. One of the steps just got faster.

Tools that require behavior change have a built-in adoption barrier. Tools that enhance existing behavior have a built-in adoption advantage.

Rule 10: You will rebuild the tool in six months and that is fine.

The first version of every tool is wrong. Not completely wrong — it works, it solves the problem, it creates value. But six months of real-world use reveals things that no amount of planning could predict. Edge cases. Workflow changes. New requirements. Shifted priorities.

The rebuild is not a failure. It is a feature. The first version was built with assumptions. The rebuild is built with experience. The second version is always better because it incorporates six months of data about how the tool is actually used.

I no longer try to build the perfect tool on the first attempt. I build the functional tool, deploy it, learn from six months of use, and rebuild with the knowledge I gained. This approach ships faster, creates value sooner, and produces better tools in the long run.
