Creating Artificial Worlds with AI to Improve Energy Access Data (2021-2022)


Critical information for energy access decision-making and electricity system planning is not universally available, including information on village-level electricity access and reliability as well as the location and characteristics of power system infrastructure. Decision-makers need this information to determine the best strategies for deploying energy resources, including where to prioritize development and how electrification should be accomplished.

Previous work has shown that these information gaps can be filled by applying machine learning techniques to satellite and aerial imagery. However, major challenges remain in scaling these techniques to large datasets that span varied geographies and many categories of objects. To be effective, more robust techniques need to be developed to assess infrastructure relevant for sustainable transitions.

Project Description

This project continues the work of the 2020-2021 team. In past projects, team members began developing deep learning models that can detect energy infrastructure in satellite imagery. However, these models struggle when they are applied in settings that differ from the training data, such as different geographies, sensor modalities and seasons of data collection.

Building on previous teams’ work, this year’s team will use 3D modeling to represent energy infrastructure in satellite imagery and will expand the methods for generating synthetic imagery to include generative models. Team members will investigate techniques for creating realistic, synthetically generated training data from even less information. The team will create a tool for generating labeled synthetic remote sensing imagery of energy infrastructure, along with a dataset produced by that tool.

The goals of this project are to: 

  1. Apply synthetic data generation techniques including generative models and style transfer for creating synthetic overhead imagery
  2. Compare these techniques with traditional, high-effort techniques for training data collection and recent 3D modeling approaches
  3. Share these tools and any generated data in an open-source repository to encourage use by other researchers and decision makers 
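As a rough illustration of what "labeled synthetic overhead imagery" can mean, the sketch below pastes a small object patch onto a background tile at a random location and records its bounding box as the label. It is a deliberately simple copy-and-paste stand-in, not the team's 3D modeling or generative approach; the function name `make_synthetic_sample`, the array sizes and the pixel values are all assumptions made for the example.

```python
import numpy as np


def make_synthetic_sample(background, obj_patch, rng):
    """Paste obj_patch at a random spot in a copy of background.

    Returns the composited image and a bounding-box label
    (row_min, col_min, row_max, col_max), with max values exclusive.
    """
    bg_h, bg_w = background.shape[:2]
    ob_h, ob_w = obj_patch.shape[:2]
    if ob_h > bg_h or ob_w > bg_w:
        raise ValueError("object patch must fit inside the background")

    # Choose a top-left corner that keeps the patch fully inside the tile.
    top = int(rng.integers(0, bg_h - ob_h + 1))
    left = int(rng.integers(0, bg_w - ob_w + 1))

    image = background.copy()  # leave the original background untouched
    image[top:top + ob_h, left:left + ob_w] = obj_patch
    bbox = (top, left, top + ob_h, left + ob_w)
    return image, bbox


# Hypothetical inputs: a noisy 64x64 RGB "terrain" tile and a bright 8x8 "object".
rng = np.random.default_rng(0)
background = rng.integers(60, 120, size=(64, 64, 3), dtype=np.uint8)
obj_patch = np.full((8, 8, 3), 255, dtype=np.uint8)

image, bbox = make_synthetic_sample(background, obj_patch, rng)
```

Because the label is produced by the same step that places the object, no manual annotation is needed, which is the main appeal of synthetic data over hand-labeled imagery.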

Learn more about this project team by viewing the team's video.

Anticipated Outputs

Dataset of geographically diverse synthetic overhead imagery; tool shared on GitHub for synthetic generation of overhead imagery; project website and final presentation; conference or journal paper 

Student Opportunities

Ideally, this project team will include 2 graduate students and 6 undergraduate students from disciplines such as engineering, computer science, economics, public policy, environmental science, mathematics and statistics. Students with some experience in quantitative methods are preferred. Experience with programming is required, and experience with Python programming is beneficial.

Students will participate in a formalized research process including problem definition, literature review, research project design, analysis and interpretation. Team members will engage in team-based problem solving; learn how to convey technical and nontechnical information; describe the relationship between access to electricity, economic well-being, human health, land use and environmental impacts; and explain the fundamentals of machine learning and deep learning for computer vision. Students will also have the chance to apply advanced data analytics tools to energy data, including computational tools such as the Python programming language and deep learning algorithms. 

Graduate students will take on a leadership role as project managers. Ideally, these students will have excellent management skills and preferably strong technical skills or a willingness to get up to speed rapidly on technical skills. The project managers will attend all team meetings and be responsible for the day-to-day work of the project. They will help guide and support team members’ needs, identify topics that the team could use help with and keep the team on track with deadlines. They will also help determine the project’s research direction.


Fall 2021 – Spring 2022

  • Fall 2021: Complete intensive short course on data analytics, energy systems, energy access, research techniques and project management; compete with other students in a Kaggle machine learning competition; develop goals and roles
  • Spring 2022: Execute project; meet with guest speakers from industry and academia; refine research results; create presentations and posters


Academic credit available for fall and spring semesters; summer funding available

See the earlier related team, Deep Learning for Rare Energy Infrastructure in Satellite Imagery (2020-2021), and the Data+ summer project, Creating Artificial Worlds with AI to Improve Energy Access Data (2021).


Image: Wind farm, by pfly, licensed under CC BY 2.0


Team Leaders

  • Kyle Bradbury, Pratt School of Engineering-Electrical & Computer Engineering|Energy Initiative
  • Jordan Malof, Pratt School of Engineering-Electrical & Computer Engineering

Graduate Team Members

  • Saksham Jain, Electrical/Computer Engg-MS
  • Boya Sun, Interdisciplinary Data Science - Masters
  • Katherine Wu, Master of Engineering Mgmt-MEG

Undergraduate Team Members

  • Madeleine Jones
  • Caleb Kornfein, Statistical Science (BS), Computer Science (BS2)
  • Aya Lahlou
  • Yuxi Long
  • Madeline Rubin
  • Caroline Tang, Mathematics (BS), Statistical Science (BS2)
  • Frank Willard
  • Yucheng Zhang

Faculty/Staff Team Members

  • Leslie Collins, Pratt School of Engineering-Electrical & Computer Engineering
  • T. Robert Fetter, Nicholas Institute for Environmental Policy Solutions
  • Marc Jeuland, Sanford School of Public Policy
  • Luana Lima, Nicholas School of the Environment-Environmental Sciences and Policy
  • Robyn Meeks, Sanford School of Public Policy