Forget To Remember

Roger Craig and the Sabermetrics of Jeopardy

You would think that “Jeopardy” devotees would be familiar with the nature of forgetting. When the TV game show first aired in 1964, Lyndon Johnson was president; today the typical “Jeopardy” viewer is retirement age, a blue-haired grandpa who has a hard time recalling his AOL password, never mind the answer to: Who wrote The Bridge on the River Kwai?

But it turns out the key to remembering “Jeopardy” trivia might actually be a kind of strategic forgetting. Enter Roger Craig, a mild-mannered data scientist who dominated his turn on the game show by taking advantage of the data-driven science of memory, racking up the largest one-day total in the history of the game.

This statistical rags-to-riches narrative has become a common trope: A data guru mixes CSV files with a dash of Bayesian modeling and gets powerful results. Baseball has Billy Beane, the savvy, stats-hungry general manager made famous in Moneyball. Politics has Nate Silver, the t-test guru behind FiveThirtyEight. Now there’s Roger Craig, the guy who brought data analytics to “Jeopardy.”

Roger’s Sabermetrics-like approach relied on a key bit of memory science: researchers believe that every recollection comes with an expiration date. Unless a memory brims with emotion — like that awkward kiss in the back of a movie theater — the memory comes with a timer. When the timer goes off and we haven’t re-engaged the recollection, it’s gone.

In the argot of the learning sciences, the rate of memory failure is known as “the forgetting curve,” and it basically describes the rate at which the timer will go off. Generally speaking, it goes off pretty fast. Within a few moments, people typically forget around 10 percent of what they’ve learned. Within a few hours, it’s around 50 percent.

Roger Craig likes to talk (and even joke) about the forgetting curve, because there’s an easy way to improve it: If we revisit a fact or detail time and time again, we retain it for much longer. In other words, if we come across something again after a few days or even a few minutes, we’re much less likely to forget it.

Imagine you want to remember Elvis Presley’s middle name, which is Aaron. That’s the solid line in the chart below. If you remind yourself of the name a few minutes later, that’s the dotted line. It suggests that reminding yourself a few minutes later will help you retain the memory a little longer. But it doesn’t help by much: After a few weeks, you’ll still be likely to have forgotten Elvis’s middle name.

But if you review the name again a few days after the initial learning — the name is Aaron, the name is Aaron — then the dotted line shifts a bit farther out, and now remembering looks more like this:

And if you revisit the name again some weeks later (his name is Aaron, his name is Aaron), forgetting now looks more like this:

The key thing is the shift of the curve. That’s the sign of learning. It shows that spacing out learning over time can have a big effect. Because when there’s more time in between learning, it limits the eventual forgetting, and so you can answer Alex Trebek in the form of a question when he inquires about Elvis’s middle name, “What is Aaron?”
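The shifting curve can be sketched numerically in a few lines of Python. This is only an illustration under stated assumptions: it assumes a simple exponential forgetting curve, and it assumes that each review resets the clock and triples how long the memory lasts. The formula, the two-day starting "stability," and the tripling factor are all hypothetical choices, not figures from the research described here.

```python
import math


def retention(days_since_review: float, stability_days: float) -> float:
    """Assumed exponential forgetting curve: fraction still remembered after a delay."""
    return math.exp(-days_since_review / stability_days)


def retention_after_reviews(review_days, test_day, stability_days=2.0, growth=3.0):
    """Estimate retention on test_day, with initial learning on day 0.

    Assumption: every review before the test resets the forgetting clock
    and multiplies the memory's stability by `growth`.
    """
    last_review = 0.0
    for day in sorted(review_days):
        if day > test_day:
            break  # reviews after the test cannot help
        stability_days *= growth
        last_review = day
    return retention(test_day - last_review, stability_days)


if __name__ == "__main__":
    test_day = 30  # arbitrary day on which Trebek asks about Elvis
    schedules = [
        ("no review", []),
        ("one review after 3 days", [3]),
        ("reviews after 3 and 14 days", [3, 14]),
    ]
    for label, reviews in schedules:
        kept = retention_after_reviews(reviews, test_day)
        print(f"{label}: {kept:.1%} retained on day {test_day}")
```

Under these assumptions, each added review leaves more of the memory intact on the test day, which is the outward shift of the curve described above.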

Craig came across the research on forgetting some two decades ago, and he realized almost immediately that he could use it to win at trivia games by learning details in a more spaced-out manner. He didn’t care all that much about what exactly he studied, at least when it came to “Jeopardy.” He wasn’t inherently interested in the answers to questions like, “Yukata is a traditional style of this robelike belted garment” (Kimono.) He just wanted to dominate the show, and he first tried out as a junior in high school. “I’m very competitive,” he said. “I like to win.”

In the late ’90s, Craig created his own computer tool to help him remember facts by spacing them out over time. “Think barebones flashcard-type software,” he said of the program, which had him revisiting facts over days or weeks to account for his inevitable misremembering. Later, Craig began using software known as Anki, which had fewer bugs than his home-grown version. More importantly, it had a sophisticated algorithm that more accurately predicted when he would forget things, so that he revisited facts in better alignment with his rate of forgetting.

If Craig learned the capital of Mali (Bamako) using Anki, the software would test him on the fact again in a set number of hours, when there would be a high chance that he had forgotten the answer. If Craig got the answer right, the software would then revisit the fact in a few weeks, when there was a good chance that he had again forgotten the name of the capital of Mali (Bamako). Or as the developers declare on the site, “Only practice the material that you’re about to forget.”
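The scheduling idea can be sketched in Python. This is not Anki's actual algorithm. It is a deliberately simplified scheduler that assumes the gap before the next review grows by a fixed factor after a correct answer and resets to one day after a miss. The `Card` class, the `review` function, the factor of 2.5, and the dates are all hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Card:
    question: str
    answer: str
    due: date               # next day the card should be shown
    interval_days: int = 1  # assumed starting gap between reviews


def review(card: Card, correct: bool, today: date, factor: float = 2.5) -> Card:
    """Reschedule a card after a review (simplified, not Anki's real scheduler)."""
    if correct:
        # Remembered: wait longer before asking again.
        card.interval_days = math.ceil(card.interval_days * factor)
    else:
        # Forgotten: start over with a short gap.
        card.interval_days = 1
    card.due = today + timedelta(days=card.interval_days)
    return card


if __name__ == "__main__":
    # Arbitrary illustration dates.
    card = Card("What is the capital of Mali?", "Bamako", due=date(2024, 1, 1))
    history = [
        (date(2024, 1, 1), True),
        (date(2024, 1, 4), True),
        (date(2024, 1, 12), False),
    ]
    for today, correct in history:
        review(card, correct, today)
        print(f"{today}: correct={correct}, next review {card.due}")
```

The design choice to notice is that the interval depends on the learner's own answers, so well-known facts drift out to long gaps while shaky ones keep coming back.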

Craig landed on “Jeopardy” for the first time in September 2010. (He also tried out in 2006 and 2008.) In a studio in Los Angeles, he stood across from Alex Trebek and set the record for the most money won in a single game, blowing away the mark that Ken Jennings had set some years earlier.

When Craig got back to his hotel that night, he was worried that he had known too many of the answers and that the producers of the show thought he had duped them in some way. He thought to himself, “Oh wow, maybe it’s worked too well,” and had a hard time sleeping. Would “Jeopardy” invite him back? Would Trebek think he’d cheated? But the show welcomed him back the following day. He continued his victory streak, too, winning a half-dozen more games. Later Craig also aced the Tournament of Champions, which pits the game’s all-stars against each other.

Can one tweak make a “Jeopardy” champion? Didn’t other people study and re-study the necessary “Jeopardy” trivia, like the 1956 Oscar nomination for best sound mixing? (The King and I.) Like all good answers, it’s complicated. Craig argues that most people who try out for the game don’t take a very rigorous approach to learning the facts. In particular, they don’t do enough to account for forgetting.

“Most people who develop their own systems for preparation tend to focus on the buzzer and non-study aspects of it,” Craig says. Craig’s approach to “Jeopardy” also had another important data-based twist: he had figured out a way to download all the “Jeopardy” questions ever asked from a “Jeopardy” fan website and then turn them into a single dataset. Since questions often repeat themselves on the show, the tool gave him an added edge.
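To see why a single dataset helps, here is a rough Python sketch of how one might spot answers that come up more than once. It does not reproduce Craig's tool. The record layout (`category`, `answer`) and the handful of rows are made-up placeholders, and `most_repeated_answers` is a hypothetical helper.

```python
from collections import Counter

# Made-up placeholder rows; a real archive would hold many thousands of clues.
clues = [
    {"category": "WORLD CAPITALS", "answer": "Bamako"},
    {"category": "THE KING", "answer": "Aaron"},
    {"category": "AFRICA", "answer": "bamako "},
    {"category": "FASHION", "answer": "Kimono"},
    {"category": "MIDDLE NAMES", "answer": "Aaron"},
]


def most_repeated_answers(clues, top_n=3):
    """Return (answer, count) pairs for answers that appear more than once."""
    # Normalize spacing and case so "Bamako" and "bamako " count as the same answer.
    counts = Counter(clue["answer"].strip().lower() for clue in clues)
    return [(answer, n) for answer, n in counts.most_common(top_n) if n > 1]


if __name__ == "__main__":
    for answer, n in most_repeated_answers(clues):
        print(f"{answer}: asked {n} times")
```

Answers that recur would be natural candidates for the flashcard deck, since they are the ones most likely to come up again.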

What’s clear is that even the smallest attempts to space learning out over time help us remember a lot more. When there are greater lengths of time between practice sessions, we gain a lot more. When Nate Kornell was doing his post-doc at UCLA some years ago, for instance, he found that some undergrads would sit in the library and quiz themselves with many small piles of flashcards. They flipped through the small piles and then tucked the cards away, feeling like the material about organic chemistry or Russian history was well learned.

But Kornell noticed that other students took a different approach, creating one large pile of flashcards, sometimes an inch or two thick, which forced them to learn the material in a more spaced-out way, with longer intervals in between words or concepts. He reasoned that the size of the pile might time-shift what was being learned. Students who used one large pile would gain more because their learning would be more spaced out over time and thus do more to account for their forgetting.
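The reasoning about pile size comes down to simple arithmetic, sketched below in Python. If a student flips through a pile in order, the number of other cards seen between two looks at the same card is the pile size minus one. The deck of 20 cards, the piles of 5, and the four passes per pile are assumptions chosen for illustration, and `study_order` and `average_gap` are hypothetical helpers.

```python
def study_order(cards, pile_size, passes):
    """Order in which cards are seen: split the deck into piles and flip
    through each pile `passes` times before moving to the next pile."""
    order = []
    for start in range(0, len(cards), pile_size):
        pile = cards[start:start + pile_size]
        for _ in range(passes):
            order.extend(pile)
    return order


def average_gap(order):
    """Average number of other cards seen between two looks at the same card."""
    last_seen = {}
    gaps = []
    for position, card in enumerate(order):
        if card in last_seen:
            gaps.append(position - last_seen[card] - 1)
        last_seen[card] = position
    return sum(gaps) / len(gaps)


cards = [f"word{i}" for i in range(20)]  # assumed deck of 20 vocabulary cards
one_big_pile = study_order(cards, pile_size=20, passes=4)
four_small_piles = study_order(cards, pile_size=5, passes=4)

# Both plans show the same total number of cards, so study time is comparable.
print(len(one_big_pile), len(four_small_piles))
print(average_gap(one_big_pile), average_gap(four_small_piles))
```

With these assumed numbers, both plans involve 80 card views, but the single pile puts 19 other cards between repetitions while the small piles put only 4, so the same effort is spread over longer intervals.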

So he drew up an experiment. In the lab, one group of subjects would practice learning vocabulary words using one large pile of flashcards. A second group would learn the exact same material using four smaller piles. All of the students would study GRE-level words like effulgent. (What means “brilliant”?)

For their part, the subjects in the study thought that they would learn more if they studied four smaller piles; before the experiment began, most of them said as much. They were like people who cram all of their “Jeopardy” studying into a single day. They wanted to take the easier, visually less intimidating approach to studying the material.

But Kornell found the opposite, and the results of spacing the learning, of distributing the practice over a longer period of time, were dramatic. The subjects who practiced using one large stack of flashcards scored thirteen percentage points higher on average, even though the amount of time spent learning was the same. What’s more, some of the subjects who studied with one large pile of cards learned about a third more of the words.

Kornell is now a professor at Williams College, and he believes that spacing helps people know what they don’t know. When we revisit a fact or idea after a period of time, we’re less likely to be overconfident in our answers. In other words, when we do more to adjust our learning for forgetting, we’re less cocky about what we know.

Roger Craig still swears by the power of spacing out practice, and his approach has spread within the small field of dedicated “Jeopardy” contestants. When Craig was on the shuttle bus to Battle of the Decades, a 30th-anniversary “Jeopardy” tournament held in 2014, some of the other contestants told Craig they had relied on his strategies to prep for the event. As of yet, Craig has not achieved what might be the highest honor of “Jeopardy,” which is to become an answer to a question. Or as Trebek would like the solution to be worded: Who used the science of memory to win at “Jeopardy”?

Ulrich Boser is a senior fellow at the Center for American Progress. This article was adapted from his new book, Learn Better.