How to assess UX quality in AI-powered products

Julia Kucherenko
Lead Product Designer

AI has accelerated the shift from deterministic to non-deterministic interfaces. We’ve gone from predictable scenarios to infinite possibilities. And when users have that much freedom, how do we guide them toward the goals they actually need to achieve?

For the designers and engineers responsible for building AI products, answering this question requires more than traditional UX assessment.

In this blog post, we discuss how these changes affect our assessment of UX quality in AI-powered products. And we explore why modern UX assessments must account for both technological capability and human needs. 

Challenges of UX for AI vs. traditional digital products

Designing and assessing AI products brings the open-world game analogy to mind. Just as open-world gamers explore, combine, and build things in unexpected ways, users will engage with AI in ways that are difficult to predict. This makes a truly comprehensive UX assessment much harder to perform.

And that’s not the only challenge that designers face:

The data dilemma

AI systems inherit the biases present in their training data. Complicating matters, humans then inherit the biases present in the AI systems they use, creating a cascading effect in which stereotypes and prejudices become embedded in user interactions.

In the end, AI is only as good as its data. The challenge for UX designers is to identify when outputs reflect data bias versus legitimate patterns and to create interfaces that allow users to recognize and report problematic results.

Then there is the intensifying demand for data transparency. People want to know where AI gets its information and how their own data contributes to the system’s learning. This creates tension between the need to provide transparency (which can overwhelm users with technical details) and the need for a clean, usable interface.

The hallucination problem

Have you ever asked an AI agent to provide citations for its responses, only to get an “Oops, you’re right. I made that up” response? Yes, AI does hallucinate, and the problem isn’t going away: one test revealed that an AI model had a 79% hallucination rate.

AI models are built to predict the next word and to offer an affirmative response no matter what. They don’t like to admit that they don’t know something, and rarely, if ever, do. The prevalence of hallucinations adds another wrinkle to the challenge of assessing UX quality in these products, especially as it relates to user trust and confidence.

The delegation paradox

Is AI a tool that you use? A colleague? Your personal butler? The relationship users form with AI fundamentally shapes their expectations and interaction patterns.

Recent research reveals that over-reliance on AI for cognitive tasks actually reduces activity in the brain regions associated with critical thinking and creative problem-solving. This creates a delegation-without-understanding trap: users become less capable of evaluating AI outputs because they’ve outsourced the thinking process itself.

Thus, UX designers must create interfaces that keep users cognitively engaged, perhaps through required review steps, active collaboration patterns, or views that expose the AI’s reasoning process, as in the sketch below. And then they must find reliable ways to assess the UX quality of these interfaces.
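
To make the required-review idea concrete, here is a minimal sketch of a gate that surfaces the model’s reasoning and blocks acceptance until the user explicitly confirms it. The AIDraft shape and the console flow are hypothetical, not any specific product’s API.

```python
# A minimal sketch of a required-review gate; the AIDraft shape and the
# confirmation flow are illustrative assumptions, not a real product API.
from dataclasses import dataclass

@dataclass
class AIDraft:
    output: str     # the AI-generated content
    reasoning: str  # the model's stated reasoning, shown before acceptance

def apply_with_review(draft: AIDraft) -> str:
    """Show the reasoning first, then require explicit confirmation."""
    print("AI reasoning:\n" + draft.reasoning)
    answer = input("Apply this output? Type 'yes' after reviewing: ")
    if answer.strip().lower() != "yes":
        raise RuntimeError("Not applied: the draft was not reviewed and accepted.")
    return draft.output
```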

Human-centered AI frameworks

In such uncharted territory, optimizing and measuring UX quality requires a return to the fundamentals. The core principles of human-computer interaction still apply, despite the breakneck speed of AI’s evolution. Humans are still humans. We’re stubborn in our behaviors, instincts, and patterns. We’re consistent in our need for clarity, control, and trust.

This stability in human nature should remain a cornerstone of UX assessment. To this end, leading tech companies have developed a number of comprehensive methodologies, commonly referred to as human-centered AI frameworks, which help shape design, assessment, and continuous iteration.

AI Design Fundamentals - Human-Context Model (IBM)

The IBM framework centers on understanding AI within human contexts and on mapping how different levels of AI capability require different interaction paradigms. It gives designers a structured approach to defining appropriate human-AI boundaries.

Successful AI design is about finding the optimal balance point where AI enhances rather than replaces human judgment.

Explainability Rubric (Google)

This rubric offers a systematic approach to making AI decisions interpretable. It provides specific criteria for when, how, and to whom AI systems should explain their reasoning. Teams can use this rubric to determine the appropriate level of transparency for different contexts.

Explainability isn’t binary but exists on a spectrum.

HAX Toolkit (Microsoft)

Microsoft’s Human-AI eXperience (HAX) Toolkit provides 18 guidelines for human-AI interaction. These guidelines are distilled from more than 150 design recommendations across industry and academia.

HAX addresses the full lifecycle of AI interaction, from mental model formation to error recovery. Many teams rely on HAX for its practical, actionable guidance, which helps designers anticipate and prevent common AI UX failures before they occur.

Institutionalize human-AI considerations early in the design process.

Methods & metrics for evaluating AI-powered UX

Another challenge is finding the right methods for evaluating AI-powered UX. The list of available methods is long, and relying on all of them may prove unsustainable for efficiency-minded design teams. For clarity, we’ve organized key methods into five high-level categories:

User Research Approaches – Methods like user interviews, field studies, embedded feedback, and even synthetic personas help capture diverse perspectives and ensure AI tools are grounded in real-world contexts.

Testing Strategies – Beyond standard QA, teams are encouraged to probe edge cases, examine bias, prioritize security and privacy, and commit to continuous monitoring.

Trust-Building Mechanisms – Trust doesn’t happen automatically; it’s strengthened through transparent signals like inline citations, confidence indicators, human oversight, and performance benchmarks (see the sketch after this list).

Benchmarks – Success with AI requires clarity. Defining realistic KPIs, setting baseline measurements before rollout, and moving past superficial “we added AI” metrics ensure long-term value.

Feedback Loops & Documentation – Continuous learning is emphasized through regular user interviews, stakeholder alignment, and cross-functional reviews, alongside careful documentation of failures, edge cases, and institutional knowledge.
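
To make confidence indicators concrete, here is a minimal sketch that maps a model’s confidence score to a user-facing label. It assumes the model exposes a 0.0-1.0 score alongside each response; the thresholds and labels are illustrative, not a standard.

```python
# A minimal sketch of a confidence indicator; assumes the model exposes a
# 0.0-1.0 confidence score. Thresholds and labels are illustrative only.
def confidence_label(score: float) -> str:
    if score >= 0.9:
        return "High confidence"
    if score >= 0.6:
        return "Medium confidence: verify key facts"
    return "Low confidence: treat as a starting point only"

print(confidence_label(0.72))  # -> "Medium confidence: verify key facts"
```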

As far as quantitative metrics go, think about the user effort ratio. For example, how many prompts/iterations were needed to get useful output? What about time to task completion, accuracy rate, and citation/source verification rate? 
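
As a concrete illustration, here is a minimal sketch that computes these four metrics from hypothetical session logs. The field names, and the definition of “useful output” as an accepted session, are assumptions made for the example.

```python
# A minimal sketch computing the quantitative metrics above from
# hypothetical session logs; all field names are illustrative assumptions.
sessions = [
    {"prompts": 4, "seconds": 210, "accepted": True,  "citations": 3, "verified": 2},
    {"prompts": 1, "seconds": 45,  "accepted": True,  "citations": 2, "verified": 2},
    {"prompts": 7, "seconds": 600, "accepted": False, "citations": 0, "verified": 0},
]

accepted = [s for s in sessions if s["accepted"]]

# User effort ratio: prompts spent per useful (accepted) output
effort_ratio = sum(s["prompts"] for s in sessions) / max(len(accepted), 1)
# Time to task completion, averaged over successful sessions
avg_completion = sum(s["seconds"] for s in accepted) / max(len(accepted), 1)
# Accuracy rate: share of sessions that ended in accepted output
accuracy = len(accepted) / len(sessions)
# Citation/source verification rate: verified citations over all citations
total_citations = sum(s["citations"] for s in sessions)
verification_rate = sum(s["verified"] for s in sessions) / max(total_citations, 1)

print(f"effort ratio: {effort_ratio:.1f} prompts per useful output")
print(f"avg time to completion: {avg_completion:.0f}s")
print(f"accuracy rate: {accuracy:.0%}")
print(f"citation verification rate: {verification_rate:.0%}")
```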

Qualitative metrics may include confidence metrics that indicate user trust in AI responses. Many design teams look at perceived effort (how users feel about the interaction) and value vs. novelty assessments (is it genuinely useful or just technically impressive?). 
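
Even these softer signals can be tracked numerically once collected. Below is a minimal sketch that aggregates hypothetical post-task survey responses into the three signals just named; the 1-5 Likert items and field names are assumptions for the example, not a standard instrument.

```python
# A minimal sketch aggregating hypothetical post-task survey responses;
# the 1-5 Likert items and field names are assumptions, not a standard.
responses = [
    {"trust": 4, "perceived_effort": 2, "useful": True,  "impressive": True},
    {"trust": 2, "perceived_effort": 4, "useful": False, "impressive": True},
    {"trust": 5, "perceived_effort": 1, "useful": True,  "impressive": False},
]

n = len(responses)
avg_trust = sum(r["trust"] for r in responses) / n
avg_effort = sum(r["perceived_effort"] for r in responses) / n
# Value vs. novelty: "impressive but not useful" suggests novelty without value
novelty_only = sum(r["impressive"] and not r["useful"] for r in responses) / n

print(f"avg trust (1-5): {avg_trust:.1f}")
print(f"avg perceived effort (1-5, lower is better): {avg_effort:.1f}")
print(f"novelty-without-value share: {novelty_only:.0%}")
```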

Finally, don’t overlook key business impact metrics. Assess your actual workflow improvements vs. what you projected when first scoping the project. Look at support ticket resolution rates, user retention, and satisfaction scores.

A note on responsible innovation in AI-powered product design

The path to responsible AI innovation begins with acknowledging a fundamental truth: AI systems are only as ethical as the data they're trained on and the humans who design them. Since we live in a biased world, these biases inevitably seep into our AI systems. 

This reality demands that product teams implement robust human oversight mechanisms, not as a temporary measure but as a permanent feature of AI-powered products. The human ability to disrupt – to recognize when something feels wrong despite what the data suggests – remains irreplaceable in maintaining ethical standards.

Responsible innovation also means resisting the temptation to over-automate, especially in high-stakes domains like finance or healthcare, where accuracy matters more than speed. The goal isn’t to create AI that replaces human judgment, but rather AI that augments human capabilities while respecting human values, cultural contexts, and the fundamental need for human connection in our increasingly automated world.

Related reading: Best practices for building sustainable digital products

Parting thought: bring core design principles back to AI

Though we still see products that remove human control where users want it, or that fail to cite sources or show reasoning, designers are waking up to a new reality: many of our core design principles still apply in the world of AI.

AI isn’t going anywhere. New challenges in UX design for AI-powered products emerge every day. By evolving the ways that we assess UX quality in these products, we have an opportunity to create AI that truly serves human needs.

Transcenda can help. Our team works with leading brands to create user-centric AI solutions. Contact us today to learn how we can bring our expertise to your most important projects. 

With more than 5 years of experience in product design, Julia conducts design research, orchestrates processes, and fosters collaboration with stakeholders as a Lead Product Designer at Transcenda.
