Validating a Survey on AI Adoption Through Cognitive Interviews

Problem

A survey that assumes the answer is not measuring opinion. It is confirming it.

The survey was designed to measure AI adoption in UX: frequency of use, trust in outputs, workplace concerns. Question 6 asked only about the most significant benefits of AI in UX. That framing told participants what to notice before they answered.

Other questions had different problems. Terms such as “transparency,” “consistency,” and “human oversight” appeared in Q12 and Q14 without definitions. Two pairs of questions covered related topics so closely that participants read them as duplicates. Two other questions each combined separate ideas, so no single answer could address both. The response format also changed between Q12 and Q14 for the same situation: “Nothing” in one place, “None” in another.

None of these were obvious from inside the team. The survey had been through multiple rounds of internal review and it looked finished. Most were fixable with rewording. Q6 needed a structural reframe.

Stakes

One biased question does not produce one biased response. It shapes every conclusion built from it.

If the questions are loaded, the data is too. Findings that look rigorous can rest on instrument assumptions that never appeared in the output.

A participant who held a mixed view of AI in UX had no room to express it. The data would have reflected what the question permitted, and every conclusion built from those responses would carry that error forward.

Research Strategy

You find out what a participant reads by watching them read it.

A researcher reading their own questions is often blind to the assumptions in them. Cognitive interviews with a think-aloud protocol make participant interpretation visible in real time.

Participants: 8 UX professionals
Product designers, UX researchers, HCI students
Sessions: ~20 minutes each
In-person and remote (Zoom)
Experience range: 0 to 10+ years

01

Complete the survey out loud, in order.

Participants narrated their interpretation of each question and what influenced their answer. Moderators probed when participants went quiet or answered without elaborating.
02

Probe on language, not just answers.

Standard probes included: “What does that term mean to you?”, “Did any part of that question seem unclear?”, “Did the response options let you express what you actually think?”
03

Debrief after completing the full survey.

Post-survey questions surfaced patterns that question-level probes could not, such as which sections felt repetitive and which response scales felt mismatched.

Evidence

Six categories of instrument problems, surfaced before a single response was collected.

Problems appeared across multiple sections of the survey. Most were fixable with targeted revision. One required a structural reframe, not just rewording.

01
Assumption-laden framing.

Several participants paused on Q6 and noted it constrained them to one type of response. One said they had started to answer before realizing the question gave them no way to express a mixed or critical view. This was a validity issue, not merely a phrasing one.

Before: “What do you consider the most significant benefits of using AI in your UX work?”

After: “What do you consider the most significant benefits and drawbacks of using AI in your UX work?”
02
Ambiguous terminology.

Terms such as “transparency,” “consistency,” and “human oversight” appeared in Q12 and Q14 without definitions. When probed, participants defined the same term in meaningfully different ways, so identical answers could rest on different meanings.
03
Double-barreled questions.

Q11 and Q17 each combined two distinct ideas. Participants answered and then qualified, noting the question seemed to ask two things. At least one chose an answer that applied to only one clause and said so unprompted.
04
Redundant questions.

Q6 and Q15 covered overlapping themes; so did Q16 and Q19. Participants flagged both pairs without prompting, asking whether the questions were intentionally different or whether they had missed a distinction.
05
Inconsistent response formatting.

Q12 used “Nothing” as a null response. Q14 used “None.” At least one participant stopped and asked whether the two meant different things. They did not.
06
Poor response options.

In more than half of the sessions, the participant wanted a response option that did not exist. Several selected the closest available answer and said it did not reflect their actual experience.

Recommendation

Fix the framing first. Terminology second. Formatting last.

The framing issue on Q6 required a structural change. The rest required targeted revision.

01

Broaden Q6’s framing.

Q6 was revised to ask about both benefits and drawbacks of AI in UX. The revision opened space for critical or mixed responses where the original permitted only positive ones.
02

Define ambiguous terms explicitly.

“Transparency,” “consistency,” and “human oversight” were given inline definitions in Q12 and Q14. Participants should not have to bring their own definition to a measurement question.
03

Separate double-barreled questions.

Q11 and Q17 were restructured to isolate each distinct idea into its own question. The survey grew by two questions. The data became interpretable.
04

Expand limited response options.

Q2, Q5, and Q6 received additional answer choices based on options participants named during the interviews. Q12 was revised to better capture the range of concerns participants described wanting to express.
05

Standardize response formatting and phrasing.

The null options “Nothing” and “None” were unified under a single wording across Q12, Q14, Q18, and Q20. Response phrasing and formatting were aligned across all questions where the same construct appeared.

Outcome

Instrument Quality

Before: a draft survey with a framing assumption in Q6, two double-barreled items, and terminology participants defined differently. After: a revised 22-question instrument with broader framing, defined terms, and consistent formatting.

Six problem categories, requiring revisions across the full instrument, were caught before a single response was collected.

Scope of revision

6 problem categories. Revisions across the full instrument. The survey grew from 20 to 22 questions after two double-barreled items were split into four.

Cost of skipping

Eight sessions of approximately 20 minutes each, run before distribution: roughly 160 minutes of interviewing in total. The alternative is discovering instrument problems after data collection, when fixing them means starting over.

Reflection

The framing catch was the most important finding. The timing was not optimal.

Keep: reading your own instrument as a skeptical participant.

The most important catch required a vantage point the team did not have: that of someone who did not share the assumptions built into the instrument. That stance, treating your own questions as a skeptical participant would, is the practical skill that cognitive interviews formalize.

Change: run cognitive interviews earlier, on a rougher draft.

By the time interviews ran, the survey had been through multiple rounds of internal review. It looked finished. That polish may have raised the threshold for pushback: participants were more likely to assume they had missed something than to flag a question as broken. Run cognitive interviews while the draft is rough enough that participants feel licensed to criticize it.