Duolingo Research

Science powers our mission to make language education free and accessible to everyone.

Data & Tools

  • 2020 Notification Bandit Data

    Replication data for our KDD 2020 paper, "A Sleeping, Recovering Bandit Algorithm for Optimizing Recurring Notifications." Contains 200 million examples of practice reminder push notifications sent to Duolingo users over a 35-day period, with the template used, whether the user converted within 2 hours, and other metadata.

  • 2020 STAPLE Shared Task Data

    Data for the 2020 Shared Task on Simultaneous Translation And Paraphrase for Language Education (STAPLE). This corpus contains more than 3 million pairs of English sentences with multiple possible translations into Portuguese, Hungarian, Japanese, Korean, and Vietnamese.

  • 2018 SLAM Shared Task Data

    Data for the 2018 Shared Task on Second Language Acquisition Modeling (SLAM). This corpus contains 7 million words produced by learners of English, Spanish, and French. It includes user demographics, morpho-syntactic metadata, response times, and longitudinal errors for 6k+ users over 30 days.

  • Spaced Repetition Data

    Data used to develop our half-life regression (HLR) spaced repetition algorithm. This is a collection of 13 million user-word pairs for learners of several languages with a variety of language backgrounds. It includes practice recall rates, lag times between practices, and other morpho-lexical metadata.

Our Team

  • Burr Settles, AI + Machine Learning
  • André Horie, AI + Machine Learning
  • Bożena Pająk, Learning + Curriculum
  • Erin Gustafson, Data Science + Analytics
  • Chris Brust, AI + Machine Learning
  • Cindy Berger, Learning + Curriculum
  • Angela DiCostanzo, Learning + Curriculum
  • Cindy Blanco, Learning + Curriculum
  • Lisa Bromberg, Learning + Curriculum
  • Jenna Lake, AI + Machine Learning
  • Bill McDowell, AI + Machine Learning
  • Lowell Reade, UX Research
  • Klinton Bicknell, AI + Machine Learning
  • Will Monroe, AI + Machine Learning
  • Geoff LaFlair, Assessment + Psychometrics
  • Hope Wilson, Learning + Curriculum
  • Kevin Yancey, AI + Machine Learning
  • Xiangying Jiang, Learning + Curriculum
  • Jessica Becker, Learning + Curriculum
  • Graham Arthur, Data Science + Analytics
  • Stephen Mayhew, AI + Machine Learning
  • Meredith McDermott, UX Research
  • Andrew Runge, AI + Machine Learning
  • Connor Brem, AI + Machine Learning
  • Anna Savage, UX Research
  • Emily Moline, Learning + Curriculum
  • Ben Collier, Data Science + Analytics
  • Elizabeth Strong, Learning + Curriculum
  • Cory Wheeler, Learning + Curriculum
  • Lauren Bilsky, AI + Machine Learning
  • Emma Gibson, Learning + Curriculum
  • James Leow, Learning + Curriculum
  • Danchen Yang, Learning + Curriculum
  • Isabel Deibel, Learning + Curriculum
  • Elizabeth Onstwedder, Learning + Curriculum
  • Kevin Lenzo, AI + Machine Learning
  • Mancy Liao, Assessment + Psychometrics
  • Nora Gordon, Learning + Curriculum
  • Sharon Wilkinson, Learning + Curriculum
  • Naveen Shankar, Data Science + Analytics
  • Antony Kunnan, Assessment + Psychometrics
  • Jackie Bialostozky, Learning + Curriculum
  • Lucy Portnoff, Data Science + Analytics
  • Ramsey Cardwell, Assessment + Psychometrics
  • Alina von Davier, Assessment + Psychometrics
  • Hannah Pileggi, UX Research
  • Yigal Attali, Assessment + Psychometrics
  • Anita Bowles, Learning + Curriculum
  • Audrey Kittredge, Learning + Curriculum
  • Ben Reuveni, Learning + Curriculum
  • J.R. Lockwood, Assessment + Psychometrics
  • Rich Forest, Learning + Curriculum
  • Mark Lock, Data Science + Analytics
  • Lisa Frumkes, Learning + Curriculum
  • Will Belzak, Assessment + Psychometrics