Data and Technology for Fact-checking (2018-2019)


Today, our society is struggling with an unprecedented volume of falsehoods, hyperbole and half-truths that harm democracy, health, the economy and national security. Fact-checking is a vital tool for defending against this onslaught.

Despite the rise of fact-checking efforts globally, fact-checkers are increasingly overwhelmed and struggle to reach some segments of the public with their messages.

Project Description

This Bass Connections project seeks to leverage the power of data and computing to help make fact-checking and dissemination of fact-checks to the public more effective, scalable and sustainable. Building on the work of a Data+ team, the project team will build databases, systems and apps to achieve the following goals:

  • Make fact-checkers more effective. By monitoring media and data sources and aggregating public interest, the team aims to identify important, check-worthy claims automatically and in real time. This feed will decrease fact-checkers’ response time and guard against any potential bias (or perception thereof) in selecting what to fact-check.
  • Help media consumers identify misinformation and disinformation faster, and make them feel like stakeholders in fact-checking. The team will make it easier for people to search for claims and get alerted automatically as soon as they are exposed to misinformation. Usage data and feedback will in turn help identify check-worthy claims and diversify the coverage of fact-checking.
  • Gain experience and learn lessons on building a sustainable, collaborative and inclusive ecosystem for fact-checking in the long run. Team members will design an open data and system infrastructure and smart algorithms, as well as best practices that will facilitate sharing and reuse of fact-checking efforts in the future.

Anticipated Outcomes

A system that provides fact-checkers with live feeds of check-worthy claims mined automatically from various sources and from public interest; apps and/or websites with features that help fact-checks reach customers; open, collaborative data and infrastructure that let technologists and journalists work together on fact-checking.

Student Opportunities

Students will work closely with faculty and graduate student mentors, and interact with collaborators at other universities and Google. They will acquire relevant computing techniques and apply them to building databases and software in a highly collaborative setting.

The team will consist of 5-6 undergraduate researchers and two Ph.D. students (at least one from Computer Science; the other may be recruited from related disciplines such as Electrical & Computer Engineering, Statistics, Public Policy or Political Science). Both graduate students will serve as project managers and mentors; depending on expertise and commitment, they may also lead system development.

Optional related courses include Computer Science 216: Everything Data; Computer Science 316: Introduction to Databases; and relevant courses in Public Policy (e.g., 371: News as Moral Battleground).

The team will have weekly project meetings as well as monthly all-hands meetings. The team will be further broken into groups by task (with possible overlap in membership): data and infrastructure, which focuses on data wrangling, information extraction, text analytics, databases and system scalability; search, which focuses on the problem of searching the database of fact-checks, drawing on techniques from natural language processing, information retrieval and machine learning; and app, which focuses on tools, websites or apps for fact-checkers as well as end users.

Evaluation of individual tasks will be done in ways specific to the problems they address. For example, team leaders have curated a claim-matching benchmark for training and evaluating search algorithms. Feedback on first releases will also be used for evaluation.
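To make the idea of claim matching and its evaluation concrete, here is a minimal sketch, not the project's actual method or benchmark. It assumes a simple TF-IDF cosine-similarity search over a tiny, made-up set of fact-checks, and scores it with recall at k against hypothetical (query, matching fact-check) pairs. All names (FACT_CHECKS, BENCHMARK, search, recall_at_k) and all example texts are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical fact-checked claims; not taken from any real database.
FACT_CHECKS = [
    "The unemployment rate fell to its lowest level in fifty years",
    "Vaccines do not cause autism according to large studies",
    "The city budget doubled spending on road repairs last year",
]

# Hypothetical benchmark: (user query, index of the matching fact-check).
BENCHMARK = [
    ("unemployment is the lowest in fifty years", 0),
    ("do vaccines cause autism", 1),
    ("spending on road repairs doubled", 2),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(fact_checks):
    """Return smoothed IDF weights and one TF-IDF vector per fact-check."""
    docs = [Counter(tokenize(text)) for text in fact_checks]
    n = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in doc)
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in doc_freq.items()}
    vectors = [{tok: tf * idf[tok] for tok, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, idf, vectors, k=3):
    """Return indices of the k fact-checks most similar to the query."""
    counts = Counter(tokenize(query))
    # Tokens never seen in the fact-checks carry no weight and are dropped.
    qvec = {tok: tf * idf[tok] for tok, tf in counts.items() if tok in idf}
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine(qvec, vectors[i]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(benchmark, idf, vectors, k=3):
    """Fraction of benchmark queries whose gold fact-check is in the top k."""
    hits = sum(1 for query, gold in benchmark if gold in search(query, idf, vectors, k))
    return hits / len(benchmark)


if __name__ == "__main__":
    idf, vectors = build_index(FACT_CHECKS)
    print("recall@1:", recall_at_k(BENCHMARK, idf, vectors, k=1))
```

A real benchmark would be far larger and would include paraphrases with little word overlap, which is where techniques from natural language processing and machine learning would be expected to outperform this bag-of-words baseline.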

Duke undergraduates and graduate students can apply for this project team beginning on January 24. The priority deadline is February 16 at 5:00 p.m.

Students may be interested in a related Data+ summer project, Data and Technology for Fact-checking (May 29 – August 3, 2018); that application is open until February 24, but Data+ applications are evaluated on a rolling basis.


Fall 2018 – Spring 2019  

  • Fall 2018: Build on development work of Data+ team; release live feeds of check-worthy claims, app/website with enhanced search quality and pop-up fact-checking
  • Spring 2019: Conduct re-evaluation and further enhancements; improve algorithms and systems as needed; develop additional features, such as aggregating user search requests to identify new check-worthy claims and a subscription service to notify users as soon as a previously searched claim is checked; prepare for official release and final report


Independent study credit available for fall and spring semesters

Faculty/Staff Team Members

William Adair, Sanford School of Public Policy-DeWitt Wallace Center for Media and Democracy*
Pankaj Agarwal, Arts & Sciences-Computer Science*
Jun Yang, Arts & Sciences-Computer Science*

Community Team Members

James T. Hamilton, Stanford University
Chengkai Li, University of Texas, Arlington
Cong Yu, Google Research

* denotes team leader


