AI has accelerated the shift from deterministic to non-deterministic interfaces. We’ve gone from predictable scenarios to infinite possibilities. And when users have such freedom, how do we guide them toward the goals they need to achieve?
For the designers and engineers responsible for building AI products, answering this question requires more than traditional UX assessment.
In this blog post, we discuss how these changes affect our assessment of UX quality in AI-powered products. And we explore why modern UX assessments must account for both technological capability and human needs.
Designing and assessing AI products brings the open-world game analogy to mind. Just as open-world gamers explore, interact, and build in unexpected ways, so too will users engage with AI in ways that are difficult to predict. This makes performing a truly comprehensive UX assessment much more difficult.
And that’s not the only challenge that designers face:
AI systems inherit the biases present in their training data. Complicating the matter, humans inherit the biases present in the AI systems that they use. Together, these create a cascading effect where stereotypes and prejudices become embedded in user interactions.
In the end, AI is only as good as its data. The challenge for UX designers is to identify when outputs reflect data bias versus legitimate patterns and to create interfaces that allow users to recognize and report problematic results.
Then there is the intensifying demand for data transparency. People want to know where AI gets its information and how their own data contributes to the system’s learning. This creates tension between the need to provide transparency (which can overwhelm users with technical details) and the need for a clean, usable interface.
Have you ever asked an AI agent to provide citations for its responses, only to get an “Oops, you’re right. I made that up” response? Yes, AI does hallucinate, and the problem isn’t going away: one test revealed that an AI model had a 79% hallucination rate.
AI models have a tendency to predict words and attempt affirmative responses no matter what. AI doesn’t like to say that it doesn’t know something, and rarely, if ever, admits as much. The prevalence of hallucinations adds another wrinkle to the challenge of assessing UX quality in these products, especially as it relates to user trust and confidence.
Is AI a tool that you use? A colleague? Your personal butler? The relationship users form with AI fundamentally shapes their expectations and interaction patterns.
Recent research reveals that over-reliance on AI for cognitive tasks actually reduces activity in the brain regions associated with critical thinking and creative problem-solving. This creates a delegation-without-understanding trap: users become less capable of evaluating AI outputs because they’ve outsourced the thinking process itself.
Thus, UX designers must create interfaces that keep users cognitively engaged, perhaps through required review steps, active collaboration patterns, or views that expose the AI’s reasoning process. And then they must find reliable ways to assess the UX quality of these interfaces.
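To make the idea of a required review step concrete, here is a minimal sketch in TypeScript. It assumes a hypothetical draft-then-approve flow; the type and function names (AiDraft, markReviewed, approveDraft) are illustrative, not a prescribed implementation.

```typescript
type ReviewState = "pending" | "reviewed" | "approved";

interface AiDraft {
  content: string;
  reasoningSummary: string; // surfaced so the "why" stays visible to the user
  state: ReviewState;
}

// The user must explicitly mark the draft as reviewed (optionally editing it)
// before it can be accepted; the AI output never flows straight through.
function markReviewed(draft: AiDraft, edits?: string): AiDraft {
  return { ...draft, content: edits ?? draft.content, state: "reviewed" };
}

function approveDraft(draft: AiDraft): AiDraft {
  if (draft.state !== "reviewed") {
    throw new Error("Review required before this output can be accepted.");
  }
  return { ...draft, state: "approved" };
}
```

The same gate can wrap any AI-generated artifact (an email draft, a code suggestion, a summary) so that acceptance is an active choice rather than a default.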
In such uncharted territory, optimizing and measuring UX quality requires a return to the fundamentals. The core principles of human-computer interaction still apply, despite the breakneck speed of AI’s evolution. Humans are still humans. We’re stubborn in our behaviors, instincts, and patterns. We’re consistent in our need for clarity, control, and trust.
This stability in human nature should still be a cornerstone for UX assessment. To this end, leading tech companies have developed a number of comprehensive methodologies, commonly referred to as human-centered AI frameworks, that help shape design, assessment, and continuous iteration.
The IBM framework centers on understanding AI within human contexts and mapping how different levels of AI capability require different interaction paradigms. The framework gives designers a structured approach to defining appropriate human-AI boundaries based on:
This rubric offers a systematic approach to making AI decisions interpretable. It provides specific criteria for when, how, and to whom AI systems should explain their reasoning. Teams can use this rubric to determine the appropriate level of transparency for different contexts.
Microsoft’s Human-AI eXperience (HAX) Toolkit provides 18 guidelines for human-AI interaction. These guidelines are distilled from more than 150 design recommendations across industry and academia.
HAX addresses the full lifecycle of AI interaction, from mental model formation to error recovery. Many teams rely on HAX for its practical, actionable guidance, which helps designers anticipate and prevent common AI UX failures before they occur.
Another challenge is finding the right methods for evaluating AI-powered UX. The list of available methods is long, and relying on them all may prove unsustainable for efficiency-minded design teams. For clarity, we’ve organized key methods into six high-level categories:
As far as quantitative metrics go, think about the user effort ratio. For example, how many prompts/iterations were needed to get useful output? What about time to task completion, accuracy rate, and citation/source verification rate?
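As a rough illustration, here is one way these signals might be computed from interaction logs. The SessionLog shape and its field names are assumptions made for this sketch, not a standard schema:

```typescript
interface SessionLog {
  promptCount: number;       // prompts/iterations before the user accepted an output
  taskCompleted: boolean;
  durationSeconds: number;
  citationsProvided: number;
  citationsVerified: number; // citations that resolved to a real, relevant source
}

function summarizeSessions(sessions: SessionLog[]) {
  const completed = sessions.filter((s) => s.taskCompleted);
  const totalCitations = sessions.reduce((sum, s) => sum + s.citationsProvided, 0);
  const verifiedCitations = sessions.reduce((sum, s) => sum + s.citationsVerified, 0);

  return {
    // User effort ratio: average prompts needed to reach a useful output
    avgPromptsToUsefulOutput:
      completed.reduce((sum, s) => sum + s.promptCount, 0) / (completed.length || 1),
    // Task completion and time to completion
    completionRate: completed.length / (sessions.length || 1),
    avgSecondsToCompletion:
      completed.reduce((sum, s) => sum + s.durationSeconds, 0) / (completed.length || 1),
    // Citation/source verification rate
    citationVerificationRate: totalCitations ? verifiedCitations / totalCitations : 0,
  };
}
```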
Qualitative metrics may include confidence metrics that indicate user trust in AI responses. Many design teams look at perceived effort (how users feel about the interaction) and value vs. novelty assessments (is it genuinely useful or just technically impressive?).
Finally, don’t overlook key business impact metrics. Assess your actual workflow improvements vs. what you projected when first scoping the project. Look at support ticket resolution rates, user retention, and satisfaction scores.
The path to responsible AI innovation begins with acknowledging a fundamental truth: AI systems are only as ethical as the data they're trained on and the humans who design them. Since we live in a biased world, these biases inevitably seep into our AI systems.
This reality demands that product teams implement robust human oversight mechanisms, not as a temporary measure but as a permanent feature of AI-powered products. The human ability to disrupt – to recognize when something feels wrong despite what the data suggests – remains irreplaceable in maintaining ethical standards.
Responsible innovation also means:
Product teams must resist the temptation to over-automate, especially in high-stakes domains like finance or healthcare, where accuracy matters more than speed. The goal isn't to create AI that replaces human judgment but rather AI that augments human capabilities while respecting human values, cultural contexts, and the fundamental need for human connection in our increasingly automated world.
Related reading: Best practices for building sustainable digital products
Though we still see many instances of AI products removing human control where users want it, or failing to cite sources or show reasoning, designers are waking up to a new reality: many of our core design principles still apply in the world of AI.
AI isn’t going anywhere. New challenges in UX design for AI-powered products emerge every day. By evolving the ways that we assess UX quality in these products, we have an opportunity to create AI that truly serves human needs.
Transcenda can help. Our team works with leading brands to create user-centric AI solutions. Contact us today to learn how we can bring our expertise to your most important projects.