I have sat through 30 or 40 AI vendor presentations in the past two years, both as a buyer evaluating tools for clients and as a builder evaluating whether a vendor's solution is better than what I can build custom. Most of the presentations are 45 minutes long. Most of the useful information comes out in the first 15 minutes. Here is what to pay attention to and what to ignore.
The five things that matter
1. Can they show you the tool working on your data?
Not their demo data. Your data. If you are evaluating an AI tool for document processing, bring a stack of your actual documents and ask them to process one live. If you are evaluating a customer service tool, give them a transcript of a real customer conversation and ask the tool to respond.
Vendors who can run your data through their system in real time have confidence in their product. Vendors who need to "set up a custom demo environment" or "configure the system for your use case first" are buying time because their demo data is curated to make the tool look good.
If the vendor cannot process your data in the meeting, the tool is not ready for your environment. Full stop. You will spend months in implementation discovering the same thing.
2. What happens when the tool is wrong?
Every AI tool makes mistakes. The question is not whether it makes mistakes. The question is what the workflow looks like when it does.
Ask the vendor to show you an example where the tool produces an incorrect result. Then ask: how does a user identify that the result is incorrect? How do they correct it? How does the correction feed back into the system to prevent the same mistake?
If the vendor claims the tool does not make mistakes, leave the meeting. If the vendor shows you a clear, practical workflow for catching, correcting, and learning from errors, that is a product built for real-world use.
The error handling workflow tells you more about the vendor's maturity than any feature list. Companies that have deployed their tool in production environments have built error handling because they have encountered errors. Companies that have only demoed their tool have not.
3. What does the tool NOT do?
Ask the vendor to name three things their tool cannot do that a customer might expect it to do. The answer reveals two things: whether the vendor understands their own product's limitations, and whether they will be honest with you about them.
A good vendor says: "Our document processing tool handles structured forms well but struggles with handwritten notes. It works best with English-language documents and has limited accuracy with other languages. It does not integrate with SAP natively — we use a middleware connector that adds latency."
A bad vendor says: "Our tool can handle any document in any format." That is either a lie or a product so early in development that nobody has tested it against real-world variety.
The limitations a vendor discloses tell you whether the product will work in your environment. The limitations they hide will become your problems after the contract is signed.
4. How does pricing scale?
Ask for the pricing at your current volume, at 3x your current volume, and at 10x your current volume. Do the math per transaction, per document, or per user at each level.
Many AI tools have pricing that looks reasonable at demo scale and becomes prohibitive at production scale. A tool that costs $0.10 per document sounds cheap until you process 500,000 documents per year and realize you are paying $50,000 annually for something a custom tool could do for a one-time build cost plus whatever it takes to maintain it.
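The per-document math above is easy to run before the meeting. Here is a minimal sketch in Python, assuming a flat per-document price; real vendor quotes are often tiered, so substitute the numbers they actually give you. The function names, the $100,000 build cost, and the $10,000 annual maintenance figure are hypothetical placeholders, not numbers from any vendor.

```python
def pricing_table(base_volume, price_cents_per_unit, multipliers=(1, 3, 10)):
    """Return (multiplier, volume, annual_cost_in_dollars) for each volume level.

    Prices are given in cents to keep the arithmetic exact.
    Assumes a flat per-unit price, which is a simplification.
    """
    rows = []
    for m in multipliers:
        volume = base_volume * m
        annual_cost = volume * price_cents_per_unit / 100
        rows.append((m, volume, annual_cost))
    return rows


def breakeven_years(build_cost, annual_vendor_cost, annual_maintenance=0.0):
    """Years until a one-time build pays for itself versus the vendor fee.

    Returns None if the custom tool costs as much or more per year to
    maintain than the vendor charges, meaning it never breaks even.
    """
    annual_savings = annual_vendor_cost - annual_maintenance
    if annual_savings <= 0:
        return None
    return build_cost / annual_savings


if __name__ == "__main__":
    # 500,000 documents per year at $0.10 each, as in the example above.
    for multiplier, volume, cost in pricing_table(500_000, 10):
        print(f"{multiplier:>2}x  {volume:>9,} docs  ${cost:>10,.0f} per year")

    # Hypothetical custom build: $100,000 once, $10,000 per year to maintain.
    years = breakeven_years(100_000, 50_000, 10_000)
    print(f"Break-even after {years} years")
```

If the vendor's quote at 10x volume does not come with a lower per-document price, that is worth raising in the same conversation.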
Also ask what happens to your data if you stop paying. Can you export everything? Is there a transition period? Are there termination fees? The answers to these questions matter more when you are trying to leave than when you are trying to buy, and the vendor knows that.
5. Who maintains the tool after deployment?
AI tools require ongoing maintenance. Models drift. Data patterns change. New edge cases emerge. Who handles that?
Some vendors provide managed service — they monitor the tool, retrain when needed, and handle updates. That is valuable if you do not have internal AI expertise. Some vendors deploy and disappear — the tool works until it does not, and then you call support. Some vendors require your team to manage the tool, which means you need internal expertise or you need to build it.
Ask specifically: when the tool's accuracy drops from 95% to 88% because our document formats changed, what happens? Who notices? Who fixes it? How long does it take?
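The "who notices?" question has a concrete shape. Here is one minimal sketch, assuming someone on your side spot-checks a sample of the tool's outputs and records whether each one was correct. The class name, the 200-check window, and the 3-point tolerance are illustrative assumptions, not a description of how any vendor monitors its product.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent spot-checked outputs."""

    def __init__(self, baseline=0.95, tolerance=0.03, window=200):
        self.baseline = baseline      # accuracy you were promised or measured at launch
        self.tolerance = tolerance    # how far below baseline you will accept
        self.results = deque(maxlen=window)  # oldest checks fall off automatically

    def record(self, was_correct):
        """Log one human-verified result (True if the tool was right)."""
        self.results.append(bool(was_correct))

    def accuracy(self):
        """Share of correct results in the window, or None with no data."""
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def needs_attention(self):
        """True once a full window of checks falls below baseline minus tolerance."""
        if len(self.results) < self.results.maxlen:
            return False  # not enough checks yet to judge
        return self.accuracy() < self.baseline - self.tolerance


if __name__ == "__main__":
    monitor = AccuracyMonitor(baseline=0.95, tolerance=0.03, window=100)
    # Simulate 100 spot checks in which 88 outputs were correct.
    for i in range(100):
        monitor.record(i < 88)
    print(monitor.accuracy(), monitor.needs_attention())
```

Whoever runs something like this is the one who notices. If the vendor's answer is "nobody", the job lands on your team by default.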
The ten things that do not matter
1. The demo. Demos are designed to impress. They show the tool working perfectly on perfect data in a perfect environment. Your environment is not perfect. Your data is not perfect. The demo tells you what the tool can do in ideal conditions. You need to know what it does in your conditions.
2. The benchmark scores. A tool that scores 96% on an industry benchmark might score 73% on your data. Benchmarks measure performance on standardized test data, which is useful for comparing tools against each other but not useful for predicting performance in your specific environment.
3. The customer logos. Large company logos on a vendor's slide deck mean that a large company purchased the tool. They do not mean the large company is happy with the tool, that the tool is still in use, or that the large company's use case is similar to yours.
4. The founding team's credentials. The founders went to Stanford and worked at Google. Interesting. Irrelevant to whether the tool processes your invoices correctly. Judge the product, not the pedigree.
5. The investor backing. The company raised $50 million. This means investors believe the company will grow. It does not mean the product is good. Some of the worst enterprise tools I have evaluated were backed by impressive investors.
6. The market analyst endorsement. Gartner put them in the top right quadrant. This means Gartner's evaluation criteria rated them highly. Your evaluation criteria might be different. Analyst reports are a starting point for research, not a substitute for evaluation.
7. The feature roadmap. The vendor promises features coming in Q3. Buy the product that exists today, not the product that might exist in six months. Roadmaps change. Priorities shift. The feature you need might never ship.
8. The AI model they use. The vendor uses GPT-4, Claude, or their own proprietary model. The model matters less than the implementation. A well-implemented tool on a less impressive model will usually outperform a poorly implemented tool on a cutting-edge model.
9. The number of integrations. The vendor integrates with 200 platforms. You use 3 of them. Ask about those 3 specifically. The other 197 are marketing, not value.
10. The conference presence. The vendor sponsored five conferences and their booth was the biggest. This means they have a marketing budget. It does not mean they have a good product.
The 15-minute evaluation
Minutes 1-3: Show me the tool working on my data.
Minutes 4-6: Show me what happens when the tool gets something wrong.
Minutes 7-9: What are three things your tool cannot do?
Minutes 10-12: Walk me through pricing at my volume, 3x, and 10x.
Minutes 13-15: Who maintains this after deployment and what happens when accuracy drops?
If the vendor can answer all five clearly and specifically, continue the conversation. If they fail on even one of them, you have saved yourself months of evaluation time and possibly years of regret.