UX Heuristic Compass · 3-Source Cross-Surface Audit

Compass Suite

jonathankhobson.github.io · 2 Desktop Evaluations · 1 Mobile Evaluation
2026-05-03 · 14 heuristics · 102 items per evaluation · Claude Sonnet 4.6 + Codex GPT-5.5
Claude · Desktop
B+
78.0%
Codex · Desktop
B+
81.2%
3-Source Blend
B-
72.5%
blended across surfaces
Codex · Mobile
C
58.3%
MISSION AMPLIFICATION
The Compass Suite is a portfolio of AI productivity tools — its website is the primary sales surface. Two desktop evaluations agree it works well (B+ at 78.0% and B+ at 81.2%). The mobile surface tells a completely different story. Closing that gap is the top priority before any audience growth initiative.
3-SOURCE COMPARISON
CLAUDE
Claude Sonnet 4.6
Desktop
B+
78.0%
CODEX
Codex GPT-5.5
Desktop
B+
81.2%
CODEX
Codex GPT-5.5
Mobile
C
58.3%
SURFACE GAP
Desktop avg (79.6%) outscores Mobile (58.3%) by 21.3pp — a structural gap, not a styling gap.
PLAIN LANGUAGE READ

Two independent desktop evaluations — one by Claude, one by Codex — both confirm the desktop experience is solid (both B+: 78.0% and 81.2%). The same evaluator that scored the desktop B+ then scored the mobile C. That's the same tool, same criteria, same evaluator — applied to two different viewports.

The mobile surface has a systemic responsive-layout failure. Navigation tabs clip off-screen. CTAs get cut off. Accessibility scores F (18.81%) on mobile while scoring B+ on desktop. The design system works. The responsive implementation doesn't.

One notable evaluator split: Claude flags vocabulary jargon (H02 C / 58%) while Codex Desktop scores it A++ (100%). Two of the three evaluations flag H09 error recovery as a real problem. And all three evaluations point, to different degrees, toward the mobile H11 accessibility failure.

⚠️
BEFORE SCALING TO MOBILE AUDIENCES

A first-time mobile visitor may need to scroll sideways or infer hidden labels. The same evaluator scored Desktop B+ (81.2%) and Mobile C (58.3%) — a 22.9-point drop on identical criteria.

H11 Accessibility: F (18.81%) on mobile — undersized touch targets, WCAG AA contrast failures, missing ARIA roles. H08 Design: D (45.31%) on mobile — layout overflow makes CTAs unreliable. These block real users right now and should be treated as P0 bugs, not design backlog.

CRITICAL · H11 ACCESSIBILITY

Mobile F (18.81%) vs Desktop B+ (78.06%)

Both desktop evaluations already score H11 below their overall grades, and the Codex mobile evaluation collapses it to an F. A 59-point drop between Codex Desktop and Codex Mobile on the same framework confirms this is a responsive implementation failure, not a design language flaw.

Claude Desktop: B- 68.75%
Codex Desktop: B+ 78.06%
Codex Mobile: F 18.81%
CROSS-EVALUATOR ALERT · H09 ERROR RECOVERY

2 of 3 evaluations flag error recovery as failing

Codex Desktop scored H09 at C- (50%) — identical to Codex Mobile. Claude Desktop scored B (75%). The 2-of-3 agreement overrides the lone optimistic reading. Error recovery pathways need structured remediation.

Claude Desktop: B 75%
Codex Desktop: C- 50%
Codex Mobile: C- 50%
EVALUATOR DIVERGENCE · H02 VOCABULARY

42pp gap between desktop evaluators — investigate

Claude Desktop (58.33%) and Codex Desktop (100%) diverge by the largest margin of any heuristic. Claude flags jargon (MCP, Plugin Zip, Claude Cowork) as a real barrier; Codex reads the vocabulary as appropriate for the target audience. Mobile (58.33%) aligns with Claude. Consider a user research session to resolve the split.

Claude Desktop: C 58.33%
Codex Desktop: A++ 100%
Codex Mobile: C 58.33%

Executive Summary

DESKTOP CONSENSUS STRENGTHS
  • H03 User Control: both desktop evals in the A range (95% / 85%)
  • H01 System Status: B+ to A consensus (80.56% / 94.44%)
  • H04 Consistency: B+ to A (82.1% / 92.9%)
  • H06 Recognition: B to A- on desktop (75% / 87.5%)
  • H08 Design: A- on both desktops — strong visual foundation
MOBILE CRITICAL FAILURES
  • H11 Accessibility: F (18.81%) — urgent P0
  • H08 Design: D (45.31%) — layout collapse
  • H14 UX Writing: D- (35.11%) — label truncation
  • H05 Error Prevention: D (48.4%)
  • H09 Error Recovery: C- (50%) — confirmed on desktop too
CROSS-EVALUATOR FINDINGS
  • H09 Error Recovery: 2 of 3 evaluations score C- (50%)
  • H02 Vocabulary: major split — user research needed
  • H10 Help Docs: Claude A++ vs Codex B on desktop — resolve gap
  • H13 Journey: desktop A- vs mobile C+ — narrative breaks on mobile
  • H11: weakest heuristic in the blend (55.2%) — both desktop evaluations warn; mobile confirms the failure

Heuristic Scorecard — 3-Source Blend

Scores shown as: Blend · Claude Desktop / Codex Desktop / Codex Mobile (%)

H01 Visibility of System Status · A- 84.3% · 80.6 / 94.4 / 77.8 · Desktop +10pp
H02 Match Between System and the Real World · B- 72.2% · 58.3 / 100.0 / 58.3 · ↕ 42pp spread · Desktop +21pp
H03 User Control and Freedom · A- 86.7% · 95.0 / 85.0 / 80.0 · Desktop +10pp
H04 Consistency and Standards · A- 83.3% · 82.1 / 92.9 / 75.0 · Desktop +12pp
H05 Error Prevention · B- 69.2% · 75.0 / 84.2 / 48.4 · ↕ 36pp spread · Desktop +31pp
H06 Recognition Rather Than Recall · B+ 79.2% · 75.0 / 87.5 / 75.0 · Desktop +6pp
H07 Flexibility and Efficiency of Use · B+ 77.3% · 83.3 / 81.4 / 67.1 · Desktop +15pp
H08 Aesthetic and Minimalist Design · B 73.4% · 85.9 / 89.0 / 45.3 · ↕ 44pp spread · Desktop +42pp
H09 Help Users Recognize, Diagnose, and Recover from Errors · C 58.3% · 75.0 / 50.0 / 50.0 · Desktop +12pp
H10 Help and Documentation · B+ 80.0% · 100.0 / 72.5 / 67.5 · ↕ 33pp spread · Desktop +19pp
H11 Accessibility and Inclusive Design · C 55.2% · 68.8 / 78.1 / 18.8 · ↕ 59pp spread · Desktop +55pp
H12 Empathetic Engagement · C+ 63.9% · 58.3 / 79.2 / 54.2 · Desktop +15pp
H13 Customer Journey Coherence · B- 72.7% · 83.3 / 71.5 / 63.2 · Desktop +14pp
H14 UX Writing Quality · C 59.1% · 71.4 / 70.8 / 35.1 · ↕ 36pp spread · Desktop +36pp

Key Findings by Source

Claude Sonnet 4.6 — Desktop Evaluation (B+ / 78.01%)

H02 · CLAUDE DESKTOP · High Impact · Jargon Barrier — MCP, Plugin Zip, Claude Cowork

Technical product names used as primary navigation labels without consumer-readable explanations. Claude scored H02 at C (58.33%) — the only desktop evaluator to flag vocabulary as a structural friction point. Mobile aligns with Claude's reading (58.33%), creating a 2-vs-1 split that warrants user research to resolve.

H08 · CLAUDE DESKTOP · Medium · Competing CTAs Without Clear Visual Hierarchy

Multiple primary-styled buttons appear in close proximity on key pages. Desktop scores A- (85.94%) overall on design, but the CTA hierarchy finding is a targeted gap within an otherwise strong visual execution.

H11 · CLAUDE DESKTOP · Medium-High · Accessibility: Contrast Failures and Missing Semantic Labels

Desktop scores B- (68.75%) — not failing, but not safe to scale. Several text/background combos fail WCAG 2.1 AA. Some interactive components lack semantic ARIA labels. When mobile is added, the picture worsens dramatically.

H10 · CLAUDE DESKTOP · Strength · Help Documentation: Perfect Score — Raise as Benchmark

Claude scores H10 at A++ (100%). Help docs are comprehensive, well-linked, and contextually placed. This is the benchmark standard the rest of the product should aspire to. Notably, Codex Desktop scored this B (72.45%) — the two desktop evaluators split 28 points here.

Codex GPT-5.5 — Desktop Evaluation (B+ / 81.17%)

H04 · CODEX DESKTOP · Moderate · Consistency and Trust Cues Need Review

Codex's top desktop finding. Navigation, labels, and components show minor inconsistencies across screens that erode the trust pattern. H04 still scores A overall (92.86%) — the inconsistency is a targeted issue, not a systemic failure. Cross-reference with Claude's similar H04 observation.

H09 · CODEX DESKTOP · High Impact, Cross-Evaluator Confirmed · Error Recovery: C- (50%) — Matches Mobile Exactly

Codex Desktop scores H09 at C- (50%) — the same score as Codex Mobile. This is the most alarming desktop finding: error recovery is failing on both surfaces. Claude Desktop's B (75%) reading is the optimistic outlier. 2 of 3 evaluations agree: error recovery needs structured remediation.

H13 · CODEX DESKTOP · Medium · Customer Journey Friction at Decision Points

Two of the top-5 Codex Desktop findings relate to Customer Journey (H13, 71.54% B). Decision points in the journey lack clarity and recovery affordances. This compounds on mobile where the journey narrative breaks further (63.21% C+).

H02 · CODEX DESKTOP · Evaluator Split — Investigate · Vocabulary Scored A++ — Disagrees with Claude and Mobile

Codex reads product vocabulary (MCP, Plugin Zip) as appropriate for the developer-oriented target audience. This is the largest single evaluator split in the audit — 41.67 percentage points from Claude. Neither reading is wrong; they reflect different user mental models. A user research session with non-developer visitors would clarify which reading is accurate for the growth market.

Codex GPT-5.5 — Mobile Evaluation (C / 58.26%)

H11 · MOBILE · P0 Critical — Fix Immediately · Accessibility F — Structural Mobile Failure

Touch targets below 44px WCAG minimum, contrast failures compounding under mobile ambient conditions, missing ARIA roles on tab navigation. The 59-point gap between Codex Desktop (78.06%) and Codex Mobile (18.81%) on the same heuristic confirms this is a responsive implementation failure.
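The contrast and touch-target thresholds cited in this finding can be checked mechanically. Below is a minimal Python sketch, not the audit tooling, using the WCAG 2.1 relative-luminance and contrast-ratio formulas; the function names are ours, and the 44px floor echoes the minimum cited above.

```python
# Illustrative sketch, not the audit tooling: the two WCAG 2.1 checks cited
# in this finding, expressed as pure functions. Function names are ours.

def relative_luminance(rgb):
    """WCAG 2.1 relative luminance for an (R, G, B) tuple in 0-255."""
    def channel(c):
        c = c / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio between two colors, from 1:1 up to 21:1."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

def passes_aa(fg, bg, large_text=False):
    """WCAG AA contrast: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)

def touch_target_ok(width_px, height_px, minimum=44):
    """Touch-target check against the 44px minimum cited in the finding."""
    return width_px >= minimum and height_px >= minimum

# Black on white is the maximum possible contrast, 21:1.
assert round(contrast_ratio((0, 0, 0), (255, 255, 255))) == 21
```

Mid-grays illustrate the failure mode: (150, 150, 150) on white lands near 3:1, well below the 4.5:1 AA floor for body text.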

H08 · MOBILE · P0 Critical — Fix Immediately · Systemic Responsive Layout Collapse

Navigation tabs clip at viewport edge on screens under ~390px. Feature section content overflows horizontally. Primary CTAs partially obscured. Users must scroll horizontally to access core navigation — an anti-pattern that signals broken layout architecture. Desktop A- (89.03%) proves the design system is sound; the responsive CSS layer is not.

H14 · MOBILE · High Impact · Label Truncation Destroys Information

Navigation and sidebar labels written for desktop (e.g., "Advanced Audit Report Generator") truncate to meaningless fragments on viewports under 420px. Mobile UX writing requires abbreviated variants or icon+label hybrid layouts at small breakpoints. Desktop B (70.79%) is acceptable; Mobile D- (35.11%) is not.
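The remediation pattern named here, authored short variants instead of blind CSS truncation, can be sketched in a few lines. The label string and the 420px breakpoint come from the finding; the variant map and function names are hypothetical.

```python
# Hypothetical sketch: serve an authored short label below the mobile
# breakpoint rather than letting CSS truncate mid-word. The variant map
# is illustrative, not taken from the product.

SHORT_LABELS = {
    "Advanced Audit Report Generator": "Audit Reports",
}

def nav_label(full_label, viewport_px, breakpoint=420):
    """Return the authored short variant at or below the breakpoint; never a blind cut."""
    if viewport_px <= breakpoint:
        return SHORT_LABELS.get(full_label, full_label)
    return full_label

assert nav_label("Advanced Audit Report Generator", 390) == "Audit Reports"
assert nav_label("Advanced Audit Report Generator", 1024) == "Advanced Audit Report Generator"
```

Labels without an authored variant fall back to the full string, so the map can be filled in incrementally.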

H05 · MOBILE · High Impact · Error Prevention Absent at Mobile Interaction Layer

Mobile-specific error prevention (oversized tap zones, autocorrect-resistant inputs, keyboard-aware layout shifts, touch-feedback states) is not implemented. D (48.4%) vs A- (84.15%) on desktop shows this is a mobile-only gap where the desktop patterns simply were not extended.

H10 · MOBILE · Medium · Help Documentation Not Accessible on Mobile

The same help content that scores A++ on desktop (Claude) scores B (72.45%) for Codex Desktop and B- (67.45%) for mobile. Multi-column doc layouts, wide reference tables, and collapsible sidebars do not adapt to mobile viewports. The content is there — the presentation fails on small screens.

Remediation Triage Matrix

Scores shown as: Claude Desktop / Codex Desktop / Codex Mobile
Heuristic Area · C·D / X·D / X·M · Owner · Timeline · Impact
🟠 H11 Accessibility · 69 / 78 / 19 · Mobile dev + a11y lead · Sprint 1 · Mobile F → touch targets, contrast, ARIA roles
🟠 H09 Error Recovery · 75 / 50 / 50 · Dev + content · Sprint 1 · 2 of 3 evaluations flag this; mobile path broken
🟢 H08 Aesthetic Design · 86 / 89 / 45 · Mobile / responsive dev · Sprint 1–2 · Mobile D — layout clipping; desktop A- baseline
🟠 H14 UX Writing · 71 / 71 / 35 · Content + front-end · Sprint 1–2 · Mobile D- label truncation; all sources agree B range on desktop
🟡 H05 Error Prevention · 75 / 84 / 48 · Mobile dev · Sprint 2 · Mobile D — no mobile error states
🟡 H02 System ↔ Real World · 58 / 100 / 58 · Content / product · Sprint 2 · Evaluator split: glossary strategy recommended
🟡 H12 Empathetic Engagement · 58 / 79 / 54 · Design + content · Sprint 2–3 · Onboarding tone; progressive jargon disclosure
🟡 H13 Customer Journey · 83 / 72 / 63 · Design + product · Sprint 3 · Mobile C+ breaks desktop A- narrative arc
🟢 H10 Help & Documentation · 100 / 72 / 67 · Content · Sprint 3 · Desktop split (A++ vs B) — mobile-proof docs
🟢 H04 Consistency · 82 / 93 / 75 · Design system · Sprint 3 · Mobile slippage vs strong desktop; button audit
🔴 Critical (<47%) · 🟠 High (<60%) · 🟡 Medium (<73%) · 🟢 Monitor (≥73%)
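The color legend reduces to a first-match lookup on a heuristic's blended score. A minimal sketch follows; the band edges come from the legend above, while the tier names are ours.

```python
# The triage legend as a first-match lookup on a heuristic's blended score.
# Band edges are from the legend; the tier names are ours.

def triage_tier(blend_pct):
    """Map a blended quality % to the matrix's color tier."""
    if blend_pct < 47:
        return "critical"   # red
    if blend_pct < 60:
        return "high"       # orange
    if blend_pct < 73:
        return "medium"     # yellow
    return "monitor"        # green

# H11's 55.2% blend lands in the orange 'high' band, matching the matrix row.
assert triage_tier(55.2) == "high"
```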

What's Working — Confirmed Across Multiple Evaluations

H03 USER CONTROL · A+ / A- / B+

Both desktop evaluations score H03 in the A range, and mobile holds a solid B+ (80%). Undo paths, cancel options, and navigation escape hatches are reliably present across desktop and mobile.

H01 SYSTEM STATUS · B+ / A / B+

Status visibility scores well across all three evaluations — one of the few areas where mobile performance matches desktop. Loading states and feedback loops work.

H08 DESIGN · DESKTOP · A- / A-

Both desktop evaluations score H08 at A- (85.94% / 89.03%). The design system is strong. The mobile failure (D / 45.31%) is a responsive CSS implementation issue — not a design language failure.

H04 CONSISTENCY · B+ / A / B

Desktop consistency is a genuine strength — especially for Codex Desktop (A / 92.86%). Even mobile scores B (75.01%), meaning consistency principles are more robustly applied than most heuristics.

H06 RECOGNITION · B / A- / B

Familiar UI patterns, predictable component placement, and icon conventions work well on both surfaces. The vocabulary debate (H02) is separate from recognition — patterns are legible even if labels are jargon-heavy.

H10 HELP DOCS · A++ / B / B-

Claude's A++ (100%) is a standout result. Even the lower Codex Desktop reading (B / 72.45%) and mobile reading (B- / 67.45%) represent acceptable baselines. The content foundation is strong; mobile delivery needs work.

Recommended Roadmap

SPRINT 1 — CRITICAL (Weeks 1–2)
  • H11 Mobile Accessibility: WCAG AA audit; 44px minimum touch targets; fix contrast failures; add ARIA roles to all interactive components including tab navigation
  • H08 Responsive Layout: Fix nav tab overflow at all breakpoints 320–768px; ensure CTAs are never clipped; test horizontal scroll elimination
  • H09 Error Recovery (Desktop + Mobile): Structured remediation — 2 of 3 evaluations flag this. Mobile error recovery path, desktop help anchor links
SPRINT 2 — HIGH PRIORITY (Weeks 3–5)
  • H14 Label Shortening: Abbreviated label variants for all nav items >20 chars; icon+label layout at ≤420px breakpoint
  • H05 Mobile Error Prevention: Enlarged tap zones; input validation before submission; keyboard-aware layout shifts
  • H02 Vocabulary Resolution: Commission user research session with non-developer target users to settle the evaluator split; implement glossary if research confirms Claude's reading
  • H12 Empathetic Onboarding: Reframe hero copy for non-developer curiosity; add outcome framing before feature listing
SPRINT 3 — MEDIUM PRIORITY (Weeks 6–10)
  • H10 Mobile Help: Responsive doc layouts; single-column mobile format; contextual help anchors on product pages
  • H13 Journey Continuity: Ensure mobile navigation preserves the desktop journey narrative; bottom-nav pattern for primary actions
  • H04 Design System Responsive Audit: Apply button color and spacing tokens consistently at all breakpoints
  • H08 CTA Hierarchy: One dominant primary CTA per section; secondary actions visually demoted
ONGOING — MONITOR
  • H03 User Control: rated in the A range on desktop and B+ on mobile. Regression-test with each navigation change.
  • H01 System Status: Solid cross-surface. Extend mobile loading state patterns as responsive fixes land.
  • H04 Consistency: Strong on desktop. Audit each new component addition for responsive token application.

Methodology

Evaluation Sources

Claude Sonnet 4.6 — Desktop (viewport ≥1024px). 102 items, 14 heuristics. B+ / 78.01%.
Codex GPT-5.5 — Desktop (viewport ≥1024px). 102 items, 14 heuristics. B+ / 81.17%.
Codex GPT-5.5 — Mobile (viewport ≤430px). 102 items, 14 heuristics. C / 58.26%.

Scoring Formula

Quality % = ((4 − avg_severity) / 4) × 100

Severity: 0 None, 1 Cosmetic, 2 Minor, 3 Major, 4 Catastrophic

Blended = (Claude Desktop + Codex Desktop + Codex Mobile) / 3

Overall = mean of all 14 blended heuristic scores
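The formulas above transcribe directly into code for anyone reproducing the numbers. A minimal sketch; the helper names are ours, and the severity lists below are illustrative, not audit data.

```python
# Direct transcription of the scoring formulas above. Severity ratings are
# 0-4 per item (None, Cosmetic, Minor, Major, Catastrophic).

def quality_pct(severities):
    """Quality % = ((4 - avg_severity) / 4) * 100."""
    avg = sum(severities) / len(severities)
    return (4 - avg) / 4 * 100

def blended(claude_desktop, codex_desktop, codex_mobile):
    """Equal-weight blend of the three evaluation scores."""
    return (claude_desktop + codex_desktop + codex_mobile) / 3

# All-'None' severities score 100%; all-'Catastrophic' score 0%.
assert quality_pct([0, 0, 0]) == 100.0
assert quality_pct([4, 4]) == 0.0

# The report's overall blend: (78.01 + 81.17 + 58.26) / 3 ≈ 72.5.
assert round(blended(78.01, 81.17, 58.26), 1) == 72.5
```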

Grade Scale

A+ ≥95%
A ≥90%
A- ≥83%
B+ ≥77%
B ≥73%
B- ≥66%
C+ ≥60%
C ≥54%
C- ≥47%
D+ ≥40%
F ≥0%
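The scale implements as a first-match lookup over band floors. One caveat: the table as printed has no D or D- band, although some per-heuristic grades in this report use them; this sketch follows the printed bands only.

```python
# First-match lookup over the printed grade bands. Note: the published scale
# omits D and D- bands that some per-heuristic grades in the report use;
# this sketch follows the table as printed.

GRADE_BANDS = [
    (95, "A+"), (90, "A"), (83, "A-"),
    (77, "B+"), (73, "B"), (66, "B-"),
    (60, "C+"), (54, "C"), (47, "C-"),
    (40, "D+"), (0, "F"),
]

def grade(quality_pct):
    """Map a quality percentage to the first band whose floor it meets."""
    for floor, letter in GRADE_BANDS:
        if quality_pct >= floor:
            return letter
    return "F"

assert grade(81.2) == "B+"   # Codex Desktop headline
assert grade(72.5) == "B-"   # 3-source blend
assert grade(18.8) == "F"    # Mobile H11
```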

Heuristic Framework

H01–H10: Nielsen Norman 10 Usability Heuristics.

H11: Accessibility and Inclusive Design (extended).

H12–H14: UX Heuristic Compass extended framework — Empathetic Engagement, Customer Journey Coherence, UX Writing Quality.

Next Steps

WEEK 1

H11 + H08 sprint kick-off. WCAG audit. Fix touch targets, contrast, nav tab overflow.

WEEK 2

H09 error recovery remediation on both surfaces. Schedule H02 vocabulary user research session.

WEEK 3–4

H14 label shortening. H05 mobile error prevention. H12 onboarding rewrite based on research.

WEEK 5+

Re-audit mobile surface after Sprint 1–2. Target: mobile above 70% (from 58.26%). Close the 21pp desktop/mobile gap by 50%.

Compass Suite UX Audit — 3-Source Blended Score 72.5% · B- · 2026-05-03
Claude Sonnet 4.6 (Desktop) · Codex GPT-5.5 (Desktop) · Codex GPT-5.5 (Mobile) · UX Heuristic Compass Framework