
In Aesop’s “The Dog and His Reflection,” a dog loses his bone because he is distracted by an illusion, and the fable ends with the moral “It is very foolish to be greedy.” As a 6th-grade ELA teacher, I have wondered if educational technology is doing the same thing to our students. Are we dropping the “bone” of reading comprehension to chase the illusion of gamified engagement? There is a constant push and pull between traditional print and digital edtech. We know that reading on a screen is fundamentally different from reading a physical book, but how does the medium actually impact a student’s cognitive load and metacognition?
To begin understanding this in my own classroom, I designed an A/B study to investigate the cognitive friction of digital reading versus physical paper. I measured how the medium itself impacts reading comprehension, extraneous cognitive load, and a student’s own perception of their learning. What I found challenged some of my assumptions about educational technology.
The Methodology: 1 Assignment, 2 Versions
To test this in a real-world environment, I built an A/B comparative study using Aesop’s Fables, adapted from my district curriculum. My goal was to isolate the medium as the sole variable.
Group A – Digital Escape Room (Genially): I designed a digital, interactive lesson in Genially. To prevent students from bypassing the reading, I used a “Microsite” navigation mode that required them to correctly answer comprehension questions to unlock the next room.
Group B – Physical Paper Packet: To mirror the Genially experience, I stripped the gamified escape room narrative but maintained the exact same texts, questions, and attention paid to the spatial layout. To mimic the digital stops, I implemented physical “teacher checkpoints” where students had to verify their answers with me before turning the page.
The Reality of the Classroom: Glitches and “The Grass is Greener”
Day 1 involved 89 students (47 Digital, 42 Paper). Immediately after the activity, students completed an adapted NASA Task Load Index survey via Google Forms to measure their subjective cognitive load (Mental Demand, Effort, Navigational Friction, and Frustration).
Both groups reported similar levels of frustration (averaging roughly 2.3 to 2.45 out of 5). From what I observed, a significant portion of this initial frustration stemmed from a lack of autonomy over which version they received. Sitting across from each other, two boys with different versions each complained that he wanted the other version, neither hearing that his peer was equally annoyed: a classic “grass is greener” phenomenon.
However, the digital group faced a unique issue: technical glitches. Genially experienced a bug where the screen would turn white, forcing students to reset the entire escape room. I could see the frustration on my students’ faces each time they ran into this issue. Additionally, an unclickable golden egg icon caused early confusion. “Was I supposed to click on that to read it?” was asked more than once per hour. These issues raise the question: Is it developmentally appropriate for 11- and 12-year-olds to navigate tech failures while performing a cognitively demanding reading task?
Data Analysis: Gamification vs. Deep Focus
On Day 2, I administered a comprehension check via Google Forms. Students read “The Dog and His Reflection” and answered questions to measure skill transfer.

Because of an early release schedule and poor attendance, only 59 of the 89 Day 1 students were present for both days. After matching those 59 students across the two days, I upskilled in Google Sheets to anonymize the data and used Google Gemini to run inferential statistical analyses.
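For readers who would rather script the matching step than do it in a spreadsheet, here is a minimal Python sketch of the same idea. It is not what I actually ran (I used Google Sheets). The student IDs, field names, salt, and rows are made up for illustration and are not the study's data.

```python
import hashlib


def anonymize(student_id: str, salt: str) -> str:
    """Replace a student ID with a short salted SHA-256 digest."""
    digest = hashlib.sha256((salt + student_id).encode("utf-8")).hexdigest()
    return digest[:10]


def match_days(day1: dict, day2: dict, salt: str) -> list:
    """Keep only students present on both days, one merged record each."""
    matched = []
    # The set intersection of the keys drops anyone absent on either day.
    for student_id in sorted(day1.keys() & day2.keys()):
        record = {"participant": anonymize(student_id, salt)}
        record.update(day1[student_id])  # Day 1 survey responses
        record.update(day2[student_id])  # Day 2 comprehension score
        matched.append(record)
    return matched


# Made-up example rows (hypothetical IDs and field names).
day1 = {
    "s001": {"group": "digital", "effort": 2},
    "s002": {"group": "paper", "effort": 4},
    "s003": {"group": "paper", "effort": 3},
}
day2 = {
    "s001": {"score": 2},
    "s003": {"score": 3},
    "s004": {"score": 1},
}

matched = match_days(day1, day2, salt="hypothetical-salt")
print(len(matched))  # 2: only s001 and s003 appear on both days
```

The salt matters: hashing a short ID without one is easy to reverse by simply hashing every possible ID, so the salt should be kept private and out of any shared file.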
Finding 1: The comprehension scores (out of 3 points) were incredibly similar: the digital group averaged 2.14, and the paper group averaged 2.17. They learned the material equally well. However, the way they arrived at those scores varied widely.
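A group comparison like this one is commonly checked with Welch's t-test, which does not assume the two groups have equal variance or equal size. The sketch below computes the t statistic and degrees of freedom using only the Python standard library. The scores are invented for illustration, not the study's data, and the function name `welch_t` is my own.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(a) / n_a
    se2_b = variance(b) / n_b
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Invented comprehension scores out of 3 (not the study's data).
digital = [2, 3, 1, 2, 3, 2]
paper = [2, 3, 2, 3, 1, 2, 3]

t_stat, dof = welch_t(digital, paper)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A t value close to zero, as with these made-up scores, means the group means are too close to distinguish. Turning t and df into a p-value needs the t distribution, which is not in the standard library; if SciPy is available, `scipy.stats.ttest_ind(digital, paper, equal_var=False)` runs the whole test in one call.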
Finding 2: The paper group felt they had to exert more effort than the digital group. Recall that the paper group had to check in with me personally three times to verify their answers, whereas the digital group was completely self-paced. I watched students click through the digital Genially module and more or less declare, “I read 7 fables in 5 minutes,” with complete sincerity. They were skimming for keywords to solve a puzzle. The higher effort reported by the paper group likely reflects the focus required for genuine comprehension.

Finding 3: Perhaps the most crucial finding was the correlation between the user interface and learning. For the digital group, there was a statistically significant, moderate negative correlation between Navigational Demand and Comprehension Score (r = -0.41, p = 0.029). The more a student felt they had to “work” to navigate the Genially module, the lower their comprehension score tended to be. A correlation cannot prove cause, but the pattern suggests that the digital environment is fragile: bad UX may eat into the mental bandwidth needed for reading.
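To make the r value above less of a black box, here is a small sketch of how a Pearson correlation coefficient is computed from paired ratings and scores. The numbers are invented for illustration, not the study's data, and `pearson_r` is a hand-written helper rather than the routine Gemini ran.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Co-deviation: how the two variables move together around their means.
    co_dev = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return co_dev / (spread_x * spread_y)


# Invented data: navigation ratings (1-5) and comprehension scores (0-3).
navigational_demand = [1, 2, 2, 3, 4, 5]
comprehension_score = [3, 3, 2, 2, 1, 1]

r = pearson_r(navigational_demand, comprehension_score)
print(f"r = {r:.2f}")
```

One way to read an r value is to square it: r = -0.41 gives r² of about 0.17, meaning roughly 17% of the variation in comprehension scores lines up with variation in navigational demand. That is a meaningful relationship, but it leaves most of the variation unexplained.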

The Confidence Check: How Paper Keeps Readers Honest
One of the most noticeable differences was in student confidence. The paper group reported feeling significantly more confident in their learning. Furthermore, statistical analysis showed a positive correlation between Confidence and Comprehension Score for the paper group—meaning their perceived confidence accurately predicted their actual mastery.
This aligns with current neuroscience and educational research regarding calibration accuracy, the degree to which a reader’s self-assessment matches their actual performance. As noted by Bruggink et al. (2022), readers often experience a gap in calibration accuracy when using screens, tending to think they understand the text better than they actually do. Furthermore, Golan et al. (2018) highlighted this exact metacognitive gap, revealing that students are much more accurate in self-evaluating their understanding on physical paper.
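One simple way to put a number on calibration is to place confidence and performance on the same 0-to-1 scale and subtract. The sketch below does that. It assumes a 1-to-5 confidence rating and a score out of 3, and the `calibration_bias` helper is my own illustration, not the measure used in the cited studies or in my analysis.

```python
def calibration_bias(confidence, score, conf_min=1, conf_max=5, max_score=3):
    """Scaled confidence minus scaled performance.

    Positive values mean overconfidence, negative values mean
    underconfidence, and zero means perfectly calibrated.
    """
    confidence_scaled = (confidence - conf_min) / (conf_max - conf_min)
    performance_scaled = score / max_score
    return confidence_scaled - performance_scaled


def mean_absolute_bias(pairs):
    """Average size of the calibration gap across (confidence, score) pairs."""
    return sum(abs(calibration_bias(c, s)) for c, s in pairs) / len(pairs)


# Invented (confidence, score) pairs, not the study's data.
overconfident_reader = calibration_bias(confidence=5, score=1)
calibrated_reader = calibration_bias(confidence=5, score=3)
print(overconfident_reader > 0, calibrated_reader == 0)
```

A smaller mean absolute bias for one group would indicate that its students judged their own understanding more accurately, which is the pattern the research above describes for paper.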
Physical paper provides an embodied learning experience. When my paper group missed a question at my teacher checkpoint, they referenced the physical text to find the correct answer. Although comprehension scores were nearly identical across the two groups, the slower, more deliberate nature of paper reading appeared to foster more precise metacognitive calibration.
The Moral of the Study: Designing for Substance
This project is grounded in the realities of instructional design. Learning how to manage, anonymize, and interpret raw data using AI makes me a vastly more effective learning experience designer. My students loved feeling like they were part of a real study, and in the future, I would love to build in an additional layer of student autonomy regarding their medium choice.
Most importantly, this data directly challenges much of my own work where I am building in the digital space. LXD exists beyond screens, though “digital” often feels like the default lens through which our field is viewed. Aesop’s moral for the dog was “Beware lest you lose the substance by grasping at the shadow.” As learning experience designers, we need to heed the same warning. Digital gamification is a beautiful shadow, but physical, distraction-free reading is often of greater substance to our learners. Moving forward, this A/B testing experience will impact how I evaluate educational technology before utilizing it. I will continue to challenge my pre-existing beliefs, push back against the systems that default to screens, and—above all else—collect the data. (My administrators will love to hear that news!)
Note on Data Analysis and AI Usage
To ensure rigorous standards for this A/B comparative study, the dataset was fully anonymized to protect student privacy prior to evaluation. Data cleaning, inferential statistical analysis (including Welch’s independent t-tests and Pearson correlation coefficients), and data visualization were conducted using Google Gemini’s advanced Python data analysis environment. All statistical outputs were reviewed and interpreted by the author to synthesize the findings presented in this blog post.
References
Bruggink, M., Swart, N., van der Lee, A., & Segers, E. (2022). Teaching reading comprehension in a digital world. Literacy, 56(2), 154-165.
Golan, D., Barzillai, M., & Katzir, T. (2018). The effect of presentation mode on children’s reading preferences, performance, and self-evaluations. Computers & Education, 126, 346-358.
