While there's growing consensus that AI-assisted and AI-native software engineering increases productivity, there is no consensus on how to measure the productivity and efficiency gains from AI. For now, there's a temporary respite from justifying AI ROI, but over time it's important to measure ROI effectively to ensure AI investments are appropriately sized.
AI Hype Metrics
Would a customer pay more because you added 10,000 lines of code or increased the number of PRs? If not, those are not the metrics that matter.
What we're seeing so far is people falling back on measures that are known to be bad productivity metrics. For example, thousands of lines of code, or KLOC, was shown to be a poor measure of software engineering productivity decades ago. The story of KLOC goes back to Microsoft's early days. In the 1980s, IBM measured productivity in KLOC and did not want to pay Microsoft for code that was more efficient and had fewer lines. When we see people in 2026 use lines of code generated as a proxy for productivity, we know that it's not a great measure of productivity, nor is it a good measure of value creation.
KLOC does not measure productivity. Nor does it measure value creation.
Another proxy for productivity used recently has been the number of pull requests (PRs) per engineer per day. Anyone who has managed an engineering org can tell you that productivity metrics should not be easily gameable. The number of PRs is easily gameable. Split every PR in two and you've "doubled output". In this situation, the metric is completely valueless for measuring human and AI-assisted productivity.
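To make the gaming concrete, here is a minimal Python sketch. The `Change` type, the `split_in_two` helper, and the PR titles and sizes are all hypothetical, invented for illustration. It splits each PR in half and shows that the PR count doubles while the total amount of change stays exactly the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A hypothetical pull request: a title and how many lines it touches."""
    title: str
    lines_changed: int


def split_in_two(change: Change) -> list[Change]:
    """Split one PR into two smaller PRs that together contain the same work."""
    first_half = change.lines_changed // 2
    second_half = change.lines_changed - first_half
    return [
        Change(f"{change.title} (part 1)", first_half),
        Change(f"{change.title} (part 2)", second_half),
    ]


# Illustrative, made-up PRs.
original = [Change("Add export button", 120), Change("Fix login redirect", 45)]

# The "gamed" version: every PR is split in two before it is opened.
gamed = [part for change in original for part in split_in_two(change)]

print(len(original), len(gamed))  # the PR count doubles
print(sum(c.lines_changed for c in original) == sum(c.lines_changed for c in gamed))
```

Nothing about the work delivered has changed between `original` and `gamed`; only the count has.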
Another issue with using PRs as a productivity metric is that the metric does not account for what's in the PR. Are these small, focused PRs that are important to the business? Or do the PRs contain low-value busy work? Essentially, the problem with counting PRs is that the count doesn't capture any sense of value. People could be very busy doing the wrong things that do not help the business.
Counting the number of PRs is easily gameable and does not capture any sense of value.
If an organization is dead set on using PRs as a metric, then one option is to only count PRs that are tied to work approved by a product manager (PM). This ensures that the PRs have been prioritized and tied to a hypothesis about business value creation.
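One way this filter could look in practice is sketched below in Python. It assumes each PR carries a link to a work item and that the set of PM-approved work items is available from your planning tool. The `PullRequest` type, the `count_approved_prs` function, the ticket keys, and the author names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PullRequest:
    """A hypothetical PR record; work_item is None when the PR is untracked."""
    pr_id: int
    author: str
    work_item: Optional[str]


def count_approved_prs(
    prs: list[PullRequest], approved_items: set[str]
) -> dict[str, int]:
    """Count PRs per author, keeping only PRs tied to PM-approved work items."""
    counts: dict[str, int] = {}
    for pr in prs:
        if pr.work_item in approved_items:
            counts[pr.author] = counts.get(pr.author, 0) + 1
    return counts


# Illustrative, made-up data.
prs = [
    PullRequest(1, "ana", "PROD-12"),
    PullRequest(2, "ana", None),        # no linked work item: not counted
    PullRequest(3, "ben", "PROD-15"),
    PullRequest(4, "ben", "CHORE-3"),   # linked, but not PM-approved: not counted
    PullRequest(5, "ben", "PROD-12"),
]
approved = {"PROD-12", "PROD-15"}

approved_counts = count_approved_prs(prs, approved)
print(approved_counts)
```

The filter only removes unprioritized work from the count. It is still a count of PRs, so it inherits the splitting problem described above.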
A Proposed Efficiency Metric
Prior to the IPO at a previous company, we spent considerable time thinking about and defining engineering efficiency metrics. Ultimately, a good engineering efficiency metric should capture business value creation. After all, we're measuring efficiency, not activity.
While it's tempting to isolate Engineering from Product and Design, the three are inextricably connected. By looking at the Product Development Lifecycle (PDLC) and not limiting ourselves to the Software Development Lifecycle (SDLC), we find that a Product Requirements Document (PRD) captures customer value. From this nugget of truth, we can extract the value creation metric embedded in the PDLC to measure Engineering productivity.
Now, you may point out that the value creation / productivity metric embedded in the PRD is cross-functional and therefore does not isolate Engineering. True, and that is exactly the point. In a scaled organization, efficiency, productivity, and value creation are cross-functional.
Key point: In a scaled organization, efficiency, productivity, and value creation metrics are cross-functional.
You have likely noticed that measuring AI-enabled productivity gains is organization-specific. There's no one-size-fits-all productivity metric that will capture all the nuances of your organization. Is your value creation based on customer usage of a new feature? Adoption by F50 customers? Is it increased retention? Whatever the metric, use a measurement of how your organization creates customer value to track how much AI-assisted and AI-native software engineering has increased productivity.
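As a rough sketch of what tracking this could look like, the Python below compares value delivered per engineer-week across two periods. Everything here is an assumption for illustration: the `Period` type, the choice of engineer-weeks as the effort unit, and the numbers, which are placeholders rather than real results. Substitute whatever value metric your organization actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """A hypothetical measurement window."""
    label: str
    value_delivered: float  # in your organization's own value unit
    engineer_weeks: float   # effort spent during the window


def efficiency(period: Period) -> float:
    """Value delivered per engineer-week."""
    if period.engineer_weeks <= 0:
        raise ValueError("engineer_weeks must be positive")
    return period.value_delivered / period.engineer_weeks


def relative_change(before: Period, after: Period) -> float:
    """Fractional change in efficiency from one period to the next."""
    baseline = efficiency(before)
    if baseline == 0:
        raise ValueError("baseline efficiency is zero; relative change is undefined")
    return (efficiency(after) - baseline) / baseline


# Placeholder numbers, not real measurements.
before = Period("before AI adoption", value_delivered=1200.0, engineer_weeks=60.0)
after = Period("after AI adoption", value_delivered=1500.0, engineer_weeks=50.0)

change = relative_change(before, after)
print(f"{change:.0%}")
```

The numerator is a customer value measure, not an activity count, so shipping more code without moving that measure does not register as a gain. A before-and-after comparison like this does not by itself show that AI caused the change.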
The bottom line is that it's extremely important to measure efficiency, not activity. As AI usage expands in your organization, maintain a firm understanding of what actually matters to the business and customers. How can you measure it? How can you instrument this? How can you use AI to do more of the things that matter?