Every week, I talk to a company that wants to implement AI. They have a use case. They have budget. They have executive sponsorship. They are ready.
Then I ask three questions:
Where does your customer data live? How many systems contain it, and which one is the source of truth?
When was the last time someone verified that your reporting data matches your transactional data?
If a customer asked you to delete all their data tomorrow, how many systems would you need to touch, and how long would it take?
The answers are usually some variation of: "I would need to check with IT," "We have not done that recently," and "That is a good question."
These companies are not ready for AI. They are ready for data governance. AI comes after.
What a data governance dashboard actually shows
The dashboard I built over a weekend has four sections. None of them involve machine learning, language models, or anything that would qualify as artificial intelligence. All of them are prerequisites for any AI implementation.
Section 1: Data inventory. A list of every system that contains business data, what kind of data it holds, who owns it, and when it was last audited. For the company I built this for, the list included 14 systems. The CTO thought they had 8. The six he did not know about included a legacy CRM that three salespeople still used, a department-level Access database with 40,000 customer records, and a contractor's personal Google Sheet that contained pricing data used in proposals.
You cannot govern what you do not know exists. The inventory is step one.
Section 2: Data quality scores. For each system in the inventory, the dashboard shows basic quality metrics: completeness (what percentage of required fields are populated), consistency (do the same records match across systems), freshness (when was the data last updated), and uniqueness (how many duplicate records exist).
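Two of these metrics, completeness and uniqueness, can be computed per system without any cross-system joins. A toy sketch over plain dict rows (a real version would query each system's API):

```python
# Toy table of customer rows; values are invented for illustration.
rows = [
    {"email": "a@x.com", "address": "1 Main St", "phone": "555-0100"},
    {"email": "a@x.com", "address": "1 Main St", "phone": None},  # duplicate email
    {"email": "b@x.com", "address": None,        "phone": "555-0101"},
]
required = ["email", "address", "phone"]

# Completeness: populated required fields / total required-field slots.
filled = sum(1 for r in rows for f in required if r[f] is not None)
completeness = filled / (len(rows) * len(required))

# Uniqueness: distinct records / total records, keyed on email.
unique_emails = {r["email"] for r in rows}
uniqueness = len(unique_emails) / len(rows)

print(round(completeness, 2), round(uniqueness, 2))  # → 0.78 0.67
```

Consistency needs a shared identifier to match records across systems, and freshness is just the maximum last-modified timestamp per table; both fit the same pattern.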
The company's primary CRM had 23% duplicate customer records. Their billing system and CRM disagreed on customer addresses for 18% of accounts. Their product database had 340 SKUs with no price attached. None of this was visible until the dashboard made it visible.
Section 3: Data lineage. For key business metrics — revenue, customer count, active subscriptions, churn rate — the dashboard traces the data from its source system through every transformation to the final reported number. Revenue, for example, started in the billing system, was pulled by an ETL job into a data warehouse, aggregated by a stored procedure, and displayed in a BI dashboard. At each step, there was an opportunity for the number to change.
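The lineage for a metric is just an ordered chain of stages, each annotated with the transformation applied there. A minimal sketch, with stage names and annotations invented for illustration (they are not the company's actual job names):

```python
# Each metric maps to an ordered list of (stage, transformation) steps.
lineage = {
    "monthly_revenue": [
        ("billing_system", "source: finalized and pending invoices"),
        ("etl_job",        "nightly extract; pending invoices included"),
        ("warehouse",      "stored procedure: sum by month"),
        ("bi_dashboard",   "displayed as 'Revenue'"),
    ],
}

def trace(metric: str) -> str:
    """Render the chain so every hop where the number can change is visible."""
    return " -> ".join(stage for stage, _ in lineage[metric])

print(trace("monthly_revenue"))
# → billing_system -> etl_job -> warehouse -> bi_dashboard
```

The value is not the rendering; it is that writing the chain down forces someone to read the ETL configuration and notice annotations like "pending invoices included."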
The company discovered that their reported monthly revenue was consistently 3% higher than their actual collected revenue because the ETL job included pending invoices that the BI dashboard reported as finalized revenue. A 3% error that nobody caught because nobody traced the lineage.
Section 4: Access map. Who has access to what data, through which systems, with what permissions. The dashboard pulls user lists and permission sets from each system and presents a unified view. The company found that 12 former employees still had active credentials in at least one system. Two of them had admin access to the billing system.
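Finding stale credentials falls out of the unified view almost for free: merge the per-system user lists into one matrix, then intersect with the list of former employees. A sketch with made-up users and systems:

```python
# Per-system user/permission lists, as pulled from each admin API (invented data).
system_users = {
    "crm":     {"alice": "admin", "bob": "read"},
    "billing": {"alice": "read",  "carol": "admin"},
}
former_employees = {"carol"}  # from HR's offboarding list

# Unified matrix: user -> {system: permission}
matrix: dict[str, dict[str, str]] = {}
for system, users in system_users.items():
    for user, perm in users.items():
        matrix.setdefault(user, {})[system] = perm

# Former employees with any remaining access, and where.
stale = {u: perms for u, perms in matrix.items() if u in former_employees}
print(stale)  # → {'carol': {'billing': 'admin'}}
```

The same matrix answers the inverse question, which admins exist per system, by iterating over systems instead of users.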
Why this comes before AI
If you train an AI model on data that has 23% duplicates, the model learns from duplicate records. If you build an AI reporting tool on data with a 3% revenue discrepancy, the AI reports the wrong number confidently. If you deploy an AI tool accessible to people who should not have data access, you have automated a security problem.
Every AI failure I have seen in enterprise settings traces back to a data problem, not an AI problem. The model worked correctly on bad data. The AI was not wrong. The data was wrong, and the AI faithfully reproduced the wrongness at scale.
The data governance dashboard is the diagnostic. It tells you whether your data is ready for AI or whether AI will amplify problems you do not know you have.
What the build looked like
Friday evening: Connected to the company's four primary systems (CRM, billing, product database, data warehouse) using their standard APIs. Wrote queries to pull record counts, field completeness, and last-modified timestamps.
Saturday morning: Built the quality scoring engine. Completeness is straightforward — count non-null values divided by total records for required fields. Consistency required matching records across systems by a common identifier (email address in this case) and comparing fields that should match. Freshness is the maximum last-modified date per table.
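The consistency check described above amounts to an inner join on the shared identifier followed by a field comparison. A sketch with invented records, using email as the join key:

```python
# Records from two systems, keyed by email (the shared identifier). Invented data.
crm = {
    "a@x.com": {"address": "1 Main St"},
    "b@x.com": {"address": "9 Oak Ave"},
}
billing = {
    "a@x.com": {"address": "1 Main St"},
    "b@x.com": {"address": "9 Oak Avenue"},  # disagrees with CRM
}

# Compare fields that should match for every record present in both systems.
shared = crm.keys() & billing.keys()
mismatches = [e for e in shared
              if crm[e]["address"] != billing[e]["address"]]
consistency = 1 - len(mismatches) / len(shared)

print(sorted(mismatches), consistency)  # → ['b@x.com'] 0.5
```

Note that "9 Oak Ave" vs "9 Oak Avenue" counts as a mismatch here; a production check would normalize addresses before comparing, which is its own small project.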
Saturday afternoon: Built the lineage tracer. This was the most manual part — I had to read the ETL job configuration to understand the transformation steps. Once mapped, the dashboard displays the chain visually with the transformation applied at each step.
Sunday: Built the access map by pulling user/role data from each system's admin API. Presented it in a matrix: users on one axis, systems on the other, permission level in each cell.
The dashboard runs on a scheduled refresh — nightly for quality scores, weekly for access maps. It sends an alert when a quality score drops below a threshold or when a new admin-level access is granted.
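The alerting logic is a threshold comparison over the latest scores. A sketch, with thresholds and scores invented for illustration:

```python
# Minimum acceptable score per metric (illustrative values).
thresholds = {"completeness": 0.90, "uniqueness": 0.95}

# Scores from tonight's refresh (invented).
latest = {"completeness": 0.92, "uniqueness": 0.77}

alerts = [
    f"{metric} dropped to {score:.0%} (threshold {thresholds[metric]:.0%})"
    for metric, score in latest.items()
    if score < thresholds[metric]
]
for alert in alerts:
    print(alert)  # in production this goes to email or chat, not stdout
```

The same pattern handles the access alert: diff tonight's admin list against last night's and fire on any addition.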
The reaction
The CTO's initial response was discomfort. The dashboard showed problems he knew existed but had never quantified. Seeing "23% duplicate rate" on a screen is different from vaguely knowing "we probably have some duplicates."
But the discomfort turned productive quickly. Within a week, they had a data cleanup initiative with specific targets: reduce duplicates to under 5%, resolve the revenue discrepancy, deactivate former employee accounts, and audit the six unknown systems.
Two months later, they had clean enough data to start their AI project. The AI initiative that would have launched on bad data instead launched on data that had been inventoried, cleaned, and monitored. The AI tools they built produced results the team trusted because the underlying data was trustworthy.
The cost of skipping this step
I know a company that skipped data governance and went straight to AI. They built a customer churn prediction model. The model identified 200 customers as "high churn risk." The sales team reached out to all 200. Forty-seven of them were duplicate records, repeat entries for the same small set of customers. Twelve of them were former customers who had already churned and been re-entered as new leads. Eight of them were test accounts created by the QA team.
The sales team burned a week of effort on a list that was 33% garbage. The AI model was not wrong. It correctly identified those records as anomalous. The problem was that the data was a mess, and nobody checked before handing it to a model.
Data governance would have caught every one of those issues before the model was trained.
What this means for your company
If you are planning an AI initiative, start here. Build a data governance dashboard. Inventory your systems. Measure your data quality. Trace your lineage. Map your access.
This is not exciting work. It does not demo well. Nobody writes LinkedIn posts about reducing their duplicate record rate from 23% to 4%. But it is the work that determines whether your AI investment produces reliable results or expensive mistakes.
The dashboard takes a weekend to build. The data cleanup takes weeks to months depending on the mess. The AI initiative that launches on clean, governed data produces value from day one instead of producing confident-sounding answers based on data nobody trusts.