UX Heuristic Evaluation

Oppia.org — UX Audit Report

Desktop · 11 Core Heuristics (83 items) + H12–H14 Supplemental · Advanced Agent Mode + Human Evaluation Panel
AI-1: Claude (Sonnet 4.6 — Medium Reasoning) · AI-2: Codex (GPT-5.5 — Extra High Reasoning) · Human Panel: Human 1, Human 2, Human 3, Human 4, Human 5 · Synthesized 2026-05-03
B-  ·  71.1%
Claude AI-1: B- / 68.5% · Codex AI-2: B / 72.0% · Human panel avg: 71.6% · Blended (3-source): B- / 71.1%

Mission Amplification

Oppia serves under-resourced learners worldwide — many accessing lessons on 2G/3G connections, shared devices, or during brief windows between other responsibilities. Every loading delay, every lost progress state, and every navigation dead end carries a higher cost here than on a mainstream consumer platform. This report weights findings accordingly.

Overall Grade
B-
71.1%
Multiple Issues — Addressable
Claude AI-1: 68.5%
Codex AI-2: 72.0%
Human panel: 71.6%
3-source avg · 11 core heuristics
Plain Language Read

Oppia's content model and design language are genuinely good — the story-based metaphor, clean typography, and no-login browsing all work well. The platform is held back by a small number of high-impact infrastructure problems: progress never saves for any user, the homepage leads with a fundraising overlay, and the lesson player offers no visible exit. All three evaluation sources agree on these priorities — a focused sprint could move this to a solid B.

Before You Use This Interface at Scale

Authenticated user progress is not persisting for any account — confirmed independently by two AI audits and three of five human evaluators. Scaling Oppia's learner base before this is resolved will amplify drop-off proportionally. Resolve the persistence pipeline before any growth or outreach campaign.

Critical Finding — Confirmed by All Three Sources

Progress does not save for any user — including authenticated accounts

The 0% progress ring never updates for guests or for logged-in users. This was independently confirmed by both AI audits (Claude H06: 68.75%; Codex H06: 50% — Catastrophic) and by 3 of 5 human evaluators — then escalated by the HIL auditor, who personally verified that it breaks for authenticated accounts too.

Claude AI-1: H06 severity high, H07 raised by HIL
Codex AI-2: H06 severity 4 — Catastrophic
Human panel: 3/5 evaluators flagged independently

Executive Summary

All three evaluation sources — Claude (Sonnet 4.6), Codex (GPT-5.5), and five subject-matter experts — reached the same core conclusion: Oppia's content and design language are genuinely good, but the platform's infrastructure undermines the learning experience it promises.

The story-based metaphor works. The no-login exploration model is smart. The visual design is clean. But a learner who completes a lesson finds their progress ring still at zero. A new learner landing on the homepage sees a donation modal before seeing a single sentence about what Oppia teaches. A learner who exits the lesson player by accident finds no obvious way back in.

The highest-impact fixes — progress persistence, the donation modal, the loading screen — are engineering and product-policy changes, not redesigns. They are fast to ship and carry outsized impact for Oppia's primary audience: learners in under-resourced communities where every second of load time and every moment of lost progress matters.

The 3-source blended evaluation produces a grade of B- / 71.1% — a high B-minus that reflects real strengths alongside high-impact, addressable gaps. Getting to a solid B+ requires fixing only the top four issues.

Heuristic Scorecard — 3-Source Blended (AI-1 + AI-2 + Human Panel)

Evaluators: Claude (Sonnet 4.6) · Codex (GPT-5.5) · Human 1 · Human 2 · Human 3 · Human 4 · Human 5 · 83 items · Severity 0–4
Quality % = ((4 − avg severity) / 4) × 100. Blended scores average all three sources equally.
H01
B-
70.5%
Visibility of System Status
Claude (AI-1): 52.8%
Codex (AI-2): 80.6%
Human panel: 78.1%
⚡ HIL escalated: loading delay especially harmful for low-bandwidth learners.
H02
B+
81.7%
Match Between System and the Real World
Claude (AI-1): 83.3%
Codex (AI-2): 83.3%
Human panel: 78.3%
H03
C+
61.8%
User Control and Freedom
Claude (AI-1): 60.0%
Codex (AI-2): 60.0%
Human panel: 65.5%
⚡ HIL accepted as scored.
H04
B
76.0%
Consistency and Standards
Claude (AI-1): 70.2%
Codex (AI-2): 77.4%
Human panel: 80.4%
H05
B+
80.3%
Error Prevention
Claude (AI-1): 80.0%
Codex (AI-2): 85.0%
Human panel: 76.0%
H06
C+
61.2%
Recognition Rather Than Recall
Claude (AI-1): 68.8%
Codex (AI-2): 50.0%
Human panel: 65.0%
⚡ HIL escalated (Critical): authenticated/logged-in users also lose progress. Both AI audits + human panel agree this is the #1 fix.
H07
B-
67.3%
Flexibility and Efficiency of Use
Claude (AI-1): 66.7%
Codex (AI-2): 63.9%
Human panel: 71.4%
⚡ HIL escalated: resume capability is table-stakes for learners with intermittent internet.
H08
B
74.3%
Aesthetic and Minimalist Design
Claude (AI-1): 76.6%
Codex (AI-2): 79.7%
Human panel: 66.6%
H09
B+
82.9%
Help Users Recognize and Recover from Errors
Claude (AI-1): 75.0%
Codex (AI-2): 100.0%
Human panel: 73.8%
H10
B-
71.3%
Help and Documentation
Claude (AI-1): 70.0%
Codex (AI-2): 80.0%
Human panel: 64.0%
H11
C
54.8%
Accessibility (POUR)
Claude (AI-1): 50.0%
Codex (AI-2): 50.0%
Human panel: 64.4%
⚡ HIL accepted as scored. Full axe-core test strongly recommended.
Score movement: 2-source (prev) → 3-source (current)
H01: 65% → 70.5% (B-)
H02: 81% → 81.7% (B+)
H03: 63% → 61.8% (C+)
H04: 75% → 76.0% (B)
H05: 78% → 80.3% (B+)
H06: 67% → 61.2% (C+)
H07: 69% → 67.3% (B-)
H08: 72% → 74.3% (B)
H09: 74% → 82.9% (B+)
H10: 67% → 71.3% (B-)
H11: 57% → 54.8% (C)
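The scoring arithmetic used throughout this report (quality percentage from average severity, and the equal-weight blend of the three sources) can be sketched directly. Function names are illustrative, not part of any Oppia codebase; the H01 row is used as a worked example.

```typescript
/** Quality % = ((4 − avg severity) / 4) × 100, on the 0–4 severity scale. */
function qualityPercent(avgSeverity: number): number {
  return ((4 - avgSeverity) / 4) * 100;
}

/** Blended score: simple average of all available source scores, to one decimal. */
function blended(scores: number[]): number {
  const sum = scores.reduce((a, b) => a + b, 0);
  return Math.round((sum / scores.length) * 10) / 10;
}

// H01: Claude 52.8%, Codex 80.6%, Human panel 78.1% → 70.5% blended
const h01 = blended([52.8, 80.6, 78.1]);
```

An average severity of 2.0 on this scale maps to exactly 50%, which is how both AI audits arrived at their H11 scores.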

Prioritized Findings

1
Critical · H06 + H07 — Recognition Rather Than Recall · Flexibility & Efficiency

Progress does not save for any user — including authenticated accounts

Observed Issue
The circular progress indicator remains at 0% and never increments — even after completing a checkpoint and returning. HIL calibration confirmed authenticated (logged-in) accounts also lose progress. Both AI audits independently confirmed this: Claude scored H06 at 68.75% and H07 at 66.67%; Codex scored H06 at 50% (Catastrophic) and H07 at 63.89%. The cross-model agreement on this finding is the strongest signal in the entire report.
Recommendation
Audit and fix the authenticated user progress-persistence pipeline. Add a "Sign in to save progress" nudge at lesson start for guests. Display a last-visited indicator on the classroom page so returning learners know where they left off.
⚡ Mission impact: Oppia serves learners with intermittent internet access. A learner who drops off mid-lesson due to connectivity, then returns to find zero progress, may not return a third time.
👥 All three evaluation sources agree. Codex (Catastrophic/severity-4): "progress does not reliably save or resume even for authenticated users."
Evidence: Topic Page (authenticated + guest), Classroom Page; HIL auditor confirmation; 3/5 human evaluators flagged; both AI audits scored this as high-severity.
2
Major · H08 — Aesthetic and Minimalist Design

Donation modal overlays homepage value proposition on every visit

Observed Issue
A full-width donation prompt appears over the homepage hero on every page load, obscuring the core value proposition. Evaluator D (human panel) flagged this as creating competing messages at the most critical entry point.
Recommendation
Implement a frequency cap: show the modal at most once per 30-day session. Add a persistent low-profile donation link in the global nav. Ensure the homepage value proposition and primary "Learn" CTA are fully visible above the fold on first visit.
⚡ Mission impact: First impressions are highest stakes for under-resourced learners. An intrusive fundraising modal immediately signals "this site wants something from me."
👥 NEW finding from human evaluation panel. Not captured in either AI audit.
Evidence: Homepage — evaluator D (human panel), confirmed present on every visit.
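The recommended 30-day frequency cap is a small piece of client logic. A minimal sketch, assuming a localStorage-backed timestamp; the storage key and function name are hypothetical, and a Storage-like interface is injected so the logic can be exercised outside a browser.

```typescript
interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const MODAL_KEY = 'donationModalLastShown'; // hypothetical key name
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

/** Returns true (and records the timestamp) at most once per 30 days. */
function shouldShowDonationModal(store: KVStore, now: number): boolean {
  const raw = store.getItem(MODAL_KEY);
  // Suppress the modal if it was shown within the last 30 days.
  if (raw !== null && now - Number(raw) < THIRTY_DAYS_MS) return false;
  store.setItem(MODAL_KEY, String(now));
  return true;
}
```

In the browser this would be called with `window.localStorage` and `Date.now()`; the persistent nav-bar donation link then carries the fundraising message between showings.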
3
Major · H01 — Visibility of System Status

Blank loading screen (3–5 s) with "Loading | Oppia" page title

Observed Issue
During Angular SPA bootstrap, users see a white screen for 3–5 seconds with no spinner, skeleton, or progress indicator. The browser tab reads "Loading | Oppia" — unhelpful to both sighted and screen-reader users. Page title never resets on SPA navigation events.
Recommendation
Add a lightweight server-side rendered skeleton or branded spinner in the HTML shell. Fix the Angular router to update the document title on every route change.
⚡ Mission impact: On a 3G or satellite connection, 3–5 seconds of silence reads as a broken page. Learners may reload or abandon before content loads.
Evidence: Homepage, Lesson Player (Playwright capture). HIL auditor confirmed. Codex: "loading/status feedback is not reliably perceivable for assistive tech."
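The title fix is mostly a routing concern: Angular 14+ ships a `TitleStrategy` hook (and per-route `title` properties) that can replace the stale "Loading | Oppia" title on every navigation. The mapping itself is a pure function; this sketch uses an illustrative fallback rule, not Oppia's actual conventions.

```typescript
/**
 * Derive the document title for a route. With no route title available,
 * fall back to the bare site name rather than a stale "Loading" string.
 */
function pageTitle(routeTitle: string | undefined): string {
  return routeTitle ? `${routeTitle} | Oppia` : 'Oppia';
}
```

Wired into a custom `TitleStrategy`, this runs on every successful navigation, so the tab title, screen-reader announcements, and browser history all stay in sync.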
4
Major · H03 — User Control and Freedom

No visible exit affordance inside the full-screen lesson player

Observed Issue
The lesson player fills the viewport with no visible Back, Close, or Exit button. Users can only exit via browser Back or a small breadcrumb. Both AI models independently confirmed this at the same severity (60%).
Recommendation
Add a clearly labeled "← Back to [Topic Name]" button in the lesson player header.
⚡ Mission impact: A learner who feels trapped in a lesson they are not ready for may close the browser entirely.
👥 Cross-model consensus: Claude, Codex, and 4/5 human evaluators all flagged independently.
Evidence: Lesson Player page. Confirmed by 4/5 human evaluators. Both AI audits at identical score.
5
Major · H11 — Accessibility (POUR)

POUR surface failures: multiple H1s, no ARIA live region, weak focus ring

Observed Issue
Classroom pages contain 2–3 competing H1 elements. No ARIA live region announces SPA navigation completions. Focus-ring contrast falls below the WCAG 3:1 non-text contrast minimum. Codex specifically called out keyboard-only users: "no visible Exit, Back to Topic, or X is available, and status changes are not clearly announced."
Recommendation
Enforce a single H1 per page. Add role="status" announcer for SPA navigation. Audit and increase focus ring contrast across the design system.
⚡ Mission impact: POUR failures directly block access for screen-reader-dependent users — Oppia's most under-served learning population.
👥 Both AI audits independently scored H11 at 50% (C-). Codex severity-3: "Keyboard-only and assistive-technology users face a major control issue."
Evidence: Accessibility snapshot (Playwright), Classroom Page. Both AI models scored H11 at 50% (C-). Human evaluators slightly more lenient at 64.4%.
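The `role="status"` announcer is a small wrapper around one persistent live-region element. A minimal sketch, with the DOM interface narrowed so the logic can be tested without a browser; the class name and message wording are illustrative.

```typescript
/** The only DOM surface the announcer needs. */
interface StatusElement {
  textContent: string | null;
}

/**
 * Writes navigation completions into a single <div role="status"> element
 * so screen readers announce them politely.
 */
class NavigationAnnouncer {
  constructor(private readonly region: StatusElement) {}

  announce(pageTitle: string): void {
    // Clear first: some screen readers skip re-announcing identical text.
    this.region.textContent = '';
    this.region.textContent = `Navigated to ${pageTitle}`;
  }
}
```

The live-region element should exist from initial page load (hidden visually but not from assistive tech), since regions injected at announce time are often missed.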
6
Minor · H04 — Consistency and Standards

Inconsistent vocabulary: "Classroom", "Topic", "Chapter", "Course" used interchangeably

Observed Issue
The same learning structure is called a "Classroom" in the nav, a "Course" in the URL, a "Topic" on the card, and a "Chapter" inside the lesson player.
Recommendation
Pick one vocabulary set (recommended: Topic/Lesson matching the lesson player) and apply consistently site-wide.
Evidence: Homepage, Math Classroom, Place Values topic, Lesson Player breadcrumbs.
7
Minor · H07 — Flexibility and Efficiency of Use

No "Continue where you left off" affordance anywhere on the platform

Observed Issue
No resume widget exists anywhere. Returning users — especially those interrupted by poor connectivity — have no visual anchor to restart from. Codex confirmed: "returning-user efficiency fails at the core task."
Recommendation
Add a "Continue where you left off" card to the homepage (logged-in) and a "Last visited" indicator on topic cards.
⚡ Mission impact: Resume capability is table-stakes for serving learners with unreliable internet.
👥 Human evaluator consensus: all 5 flagged absence of resume/continue affordance.
Evidence: Homepage, Math Classroom, Topic page. Confirmed by 5/5 human evaluators and both AI audits.
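The selection logic behind a resume widget is straightforward once progress persists: surface the most recently visited lesson that is started but not complete. A sketch under that assumption; the field and function names are hypothetical, not Oppia's data model.

```typescript
interface LessonProgress {
  lessonId: string;
  lastVisited: number; // epoch ms
  completed: boolean;
}

/** Pick the lesson the "Continue where you left off" card should target. */
function resumeTarget(items: LessonProgress[]): string | null {
  const candidates = items
    .filter(i => !i.completed)
    .sort((a, b) => b.lastVisited - a.lastVisited); // newest first
  return candidates.length > 0 ? candidates[0].lessonId : null;
}
```

Returning `null` when everything is complete (or nothing is started) lets the homepage fall back to the default topic grid instead of an empty card.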
8
Minor · H08 — Aesthetic and Minimalist Design

Interactive buttons lack visible clicked/active state feedback

Observed Issue
Multiple CTAs and lesson-navigation buttons show no active/clicked visual change on press. Human evaluators rated this at a consensus severity of 2.0/4 — the highest agreed severity among H08 findings.
Recommendation
Add a clear :active CSS state (background shift or scale transform) to all buttons and interactive lesson elements.
👥 Human evaluator consensus (avg severity 2.0/4). Not captured in either AI audit.
Evidence: Human evaluation panel — 5/5 evaluators flagged; avg severity 2.0.
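The fix is a few lines of CSS. An illustrative fragment only — the selector names are placeholders, not Oppia's actual class names:

```css
/* Visible pressed state for interactive controls (placeholder selectors). */
.oppia-button:active,
.lesson-nav-button:active {
  transform: scale(0.97);   /* subtle press-down effect */
  filter: brightness(0.92); /* darken to confirm the click registered */
}
```

Applying this at the design-system level (rather than per component) keeps the feedback consistent across all current and future buttons.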
9
Minor · H01 — Visibility of System Status

SPA page title does not update on navigation; no ARIA live announcement

Observed Issue
"Loading | Oppia" persists as the page title through SPA navigation, and no ARIA live region announces when content finishes loading.
Recommendation
Implement Angular RouterModule title strategy. Add an ARIA live region (role="status") that announces page title changes on route transitions.
⚡ Mission impact: Screen-reader users hear nothing when content loads — they cannot confirm if navigation succeeded.
Evidence: Lesson Player, Classroom Page (Playwright accessibility snapshot). Codex confirmed.
10
Minor · H08 — Aesthetic and Minimalist Design

Cookie consent banner covers lower viewport on every page

Observed Issue
The cookie banner appears on every page and covers ~15% of the lower viewport. No decline option — only "OK".
Recommendation
Persist consent choice in localStorage on OK click. Add "Manage preferences" option for GDPR/CCPA compliance.
Evidence: Homepage, Lesson Player.
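Persisting the consent choice is a small state machine: the banner renders only until the user has made an explicit choice, and "declined" is stored just like "accepted". A sketch with a hypothetical storage key and an injected Storage-like interface so the logic is testable outside the browser.

```typescript
type Consent = 'accepted' | 'declined';

interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const CONSENT_KEY = 'cookieConsent'; // hypothetical key name

/** Show the banner only while no explicit choice has been recorded. */
function shouldShowBanner(store: KVStore): boolean {
  const v = store.getItem(CONSENT_KEY);
  return v !== 'accepted' && v !== 'declined';
}

/** Record the user's choice so the banner stays dismissed across visits. */
function recordConsent(store: KVStore, choice: Consent): void {
  store.setItem(CONSENT_KEY, choice);
}
```

In the browser, `window.localStorage` would back the store; a "Manage preferences" link can later overwrite the stored value without re-showing the banner on every page.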

What Is Working Well

Prioritized Fix Roadmap

Sprint · Action
Immediate · Audit and fix authenticated user progress persistence pipeline (H06 — Critical, confirmed by all 3 sources)
Immediate · Add "Sign in to save progress" prompt at lesson start for guest users (H07)
Sprint 1 · Implement donation-modal frequency cap (max once / 30 days); persistent nav link instead (H08 — NEW from human panel)
Sprint 1 · Add skeleton screen / branded spinner to lesson player loading state (H01)
Sprint 1 · Add visible "← Back to [Topic]" button in lesson player header (H03 — confirmed by all 3 sources)
Sprint 1 · Fix Angular router to set meaningful page title on every navigation event (H01)
Sprint 2 · Add ARIA live region for SPA navigation announcements (H01/H11 — Codex severity-3)
Sprint 2 · Add "Continue where you left off" widget on homepage and classrooms (H07)
Sprint 2 · Add visible :active CSS state to all buttons and interactive elements (H08 — human consensus)
Sprint 2 · Standardize vocabulary: one term set across nav, headings, breadcrumbs, URLs (H04)
Sprint 3 · Fix H1 heading hierarchy across all classroom page templates (H11)
Sprint 3 · Persist cookie consent in localStorage; add decline / manage-preferences option (H08)
Sprint 3 · Run full axe-core + manual screen-reader audit — surface audit covers visible issues only (H11)

Supplemental Heuristics — H12–H14

These three dimensions extend beyond the core framework of the ten Nielsen Norman heuristics plus H11 Accessibility. H13 (Customer Journey) was assessed by both Codex and the human panel; H12 and H14 are human-panel only. These scores do not alter the core blended grade.

H12 · Human panel only
B-
68.8%
Empathetic Engagement & Inclusion
Human panel: 68.8%
H13 · Codex + Human panel
C-
53.3%
Customer Journey & Satisfaction
Codex (AI-2): 54.2%
Human panel: 52.5%
H14 · Human panel only
B-
67.5%
UX Writing — Content & Tone
Human panel: 67.5%
B-
68.8%
H12
Empathetic Engagement & Inclusion
Human panel: 68.8%
Measures whether the product acknowledges learner emotion, celebrates milestones, and creates belonging.
Findings: Story-based framing and audio narration add warmth. No milestone celebrations, no learner community features.
Recommendation: Add micro-celebration at lesson completion. Consider a first-lesson completion badge for new users.
C-
53.3%
H13
Customer Journey & Satisfaction
Codex (AI-2): 54.2%
Human panel: 52.5%
Measures end-to-end journey coherence — first visit through repeat engagement.
Findings: Weakest supplemental at 53.3% blended. Codex (54.2%) and human panel (52.5%) independently arrived at near-identical scores. Absence of onboarding path, progress dashboard, and re-engagement mechanism all flagged.
🔨 Codex finding: "Personalized continuity is not dependable for the central learning task — progress does not reliably save or resume even for authenticated users." Scored severity 4 (Catastrophic).
Recommendation: Define a first-time learner onboarding sequence. Add a learner dashboard showing completed and in-progress topics.
B-
67.5%
H14
UX Writing — Content & Tone
Human panel: 67.5%
Measures microcopy, button labels, error messages, and instructional tone.
Findings: Generally friendly and accessible tone. Modal button copy ("OK") could be more action-specific.
Recommendation: Audit modal and button CTA copy for specificity. Replace "OK"/"Submit" with action-descriptive labels.

Methodology & Evidence Limits

AI-1 — Claude (Sonnet 4.6, Medium Reasoning): 83 checklist items rated via live Playwright browser session. Pages: Homepage, Math Classroom, Place Values Topic, Lesson Player. Agent modes: Generalist UX Researcher, Accessibility Specialist, Content Strategist. HIL calibration gates at mid-audit and final review.

AI-2 — Codex (GPT-5.5, Extra High Reasoning): Independent heuristic evaluation against the same 11 core heuristics + H13 Customer Journey. Report: oppia-uxhc-final-report.md (2026-05-02). 83 items, same 0–4 severity scale.

Human Evaluation Panel: 5 evaluators (Human 1, Human 2, Human 3, Human 4, Human 5) assessed the same interface using the same 0–4 severity scale. Source: HE_Unified_Scorecard_LOL_V2.xlsx, synthesized 2026-04-09. H12–H14 supplemental assessed by human panel only.

Blending method: Simple average of all available source scores per heuristic. H01–H11 use 3-way average (AI-1 + AI-2 + Human). H13 uses 2-way (AI-2 + Human). H12 and H14 are human-panel only. Supplemental heuristics not included in core blended grade.

H11 Accessibility: Surface audit only — not a full WCAG 2.1/2.2 compliance test. Both AI audits scored H11 at 50% (C-). A dedicated axe-core or manual screen-reader audit is strongly recommended.

H09 Error Recovery: Codex found no observable failures and scored 100%. Claude and human panel had limited evidence. Overall confidence on H09 is low — the high Codex score should not be taken as confirmation that error recovery is strong.

Out of scope: Authenticated post-completion flows, mobile views, error state messaging, post-login dashboard.

Recommended Next Validation Steps

  1. Verify with engineering: is authenticated progress loss a known bug or undocumented decision? This is the #1 priority — confirmed by all three evaluation sources.
  2. A/B test the donation modal with a frequency cap (once / 30 days) — measure impact on first-lesson conversion rate.
  3. Run axe-core automated scan across all 4 audited pages to establish a WCAG baseline.
  4. Test on a throttled 3G connection to measure actual lesson load time for the target audience.
  5. Conduct a 5-minute usability test with a first-time learner: observe homepage → first lesson navigation.
  6. Run a keyboard-only and screen-reader pass on the lesson player (Codex recommendation): loading announcements, page title changes, heading hierarchy, and exit controls.
  7. Check lesson player on Android Chrome (Oppia’s primary mobile platform) for exit affordance and loading behavior.