Published: 5/15/2023 WIN
Howard University and Morgan State University have partnered with Georgetown University’s Massive Data Institute for an initiative that aims to make environmental data accessible and usable for a wider range of researchers. Since launching a year and a half ago, the Environmental Impact Data Collaborative has created an interactive platform that hosts more than 145 datasets alongside digital tools that make it faster and easier to combine and analyze them.
“People can use our platform for data science tools, using programming languages to transform the data, merge the data and analyze and visualize the data,” said Michael Bailey, the collaborative’s director and a government professor at Georgetown. “But we really are focused on having impact—we want to avoid just having data for data’s sake.”
To that end, the collaborative supports more than 10 different data science projects examining environmental justice and climate change. Researchers at Howard lead five of those projects, which focus on environmental justice issues related to air pollution, health, transportation and homelessness.
Dr. Legand Burge, a Howard computer science professor, currently works on a project that collects and organizes community-level air quality data in D.C. and Baltimore. He also serves as the coordinator for Howard’s partnership within the Environmental Impact Data Collaborative.
“Georgetown is one of those sites where you can actually get access to census data and all these various kinds of data, and they’re responsible for managing and governing it, making it accessible to folks,” Burge said. “What [Howard] brings to the table I think is the fact that a lot of our projects are looking at marginalized communities dealing with underrepresented populations.”
Part of Burge’s project involves collecting air quality data from residential properties, and his team has been experimenting with new ways to give individuals more control and ownership of the data they share with researchers. Burge said that Howard’s and Morgan State’s partnerships within the collaborative can make it easier to reach vulnerable communities that may otherwise feel more reluctant to engage with academic institutions.
“If anyone wants to do research and they want to get real, real-time data, especially from vulnerable communities, there is a level of trust that needs to be established,” he said. “Working with churches or local organizations that are grassroots organizations already in the community is the best way to go.”
Tackling the problem with big datasets
Huge databases are often unwieldy and difficult to work with. Moreover, different sources organize information differently, and even within a single dataset, inconsistencies can significantly slow down the research process. Data analysts sometimes cite an “80/20 rule”—80% of their time is spent getting information cleaned up and prepared and 20% is spent on actual analysis.
Bailey said that the collaborative’s platform will speed up that preparation work and widen the range of people who can do meaningful analysis of environmental data beyond those with specialized data skills.
“For high-end users, we might save hours or a couple of days, but then, for middle or more novice users, we could save months,” he said. “A high-end user who’s a sophisticated data scientist with experience in the environmental field could do a lot of this without us. We’re trying to expand that set of folks.”
At Morgan State, the collaborative supports a small team of computer scientists developing machine learning processes that would assist with pulling data from the internet and making it usable for researchers. Dr. Paul Wang, the university’s chair of computer science, said that making more information both available and usable is a key part of solving climate and environmental justice issues.
“How are you going to meet a goal without knowing where the key areas to address are?” Wang said.
In addition to the three universities involved, the collaborative includes five other entities spanning the private, public and nonprofit sectors. Some partnered with university researchers and students on specific projects, and others helped provide new data that hadn’t previously been made public, according to an annual report on the collaborative released by the Massive Data Institute earlier this year.
Most of the data housed within the Environmental Impact Data Collaborative’s platform is public information from sources such as the federal government’s Justice40 initiative, EPA data on air, water and landfill toxics, and weather data. The collection includes local, state and national datasets, but currently does not have international data.
Funding for the initiative comes from the Bezos Earth Fund, which Amazon founder Jeff Bezos launched in 2020 to support projects addressing climate change and nature. The Massive Data Institute at Georgetown’s school of public policy received a $3.2 million grant from the fund in 2021 to launch the collaborative.