Big data’s biggest challenge: climate change

June 23, 2014, 2:36 PM UTC
Climate Central's prediction for how lower Manhattan will look in the year 2100 due to rising sea levels.
Image courtesy Climate Central

Global sea levels are about eight inches higher today than they were in 1880, and they are expected to rise another two to seven feet during this century. At the same time, some 5 million people in the U.S. live in 2.6 million coastal homes situated less than 4 feet above high tide.

Do the math: Climate change is a problem, whatever its cause.

The problem? Actually making those complex calculations is an extremely challenging proposition. To understand the impact of climate change at the local level, you’ll need more than back-of-the-napkin mathematics.

You’ll need big data technology.

Surging Seas is an interactive map and tool developed by the nonprofit Climate Central that shows in graphic detail the threats from sea-level rise and storm surges to all of the 3,000-plus coastal towns, cities, counties and states in the continental United States. With detail down to neighborhood scale—search for a specific location or zoom down as necessary—the tool matches areas with flooding risk timelines and provides links to fact sheets, data downloads, action plans, embeddable widgets, and other items.

It’s the kind of number-crunching that was all but impossible only a few years ago.

‘Just as powerful, just as big’

“Our strategy is to tell people about their climate locally in ways they can understand, and the only way to do that is with big data analysis,” said Richard Wiles, vice president for strategic communications and director of research with Climate Central. “Big data allows you to say simple, clear things.”

Two types of big data are in use today to help understand and deal with climate change, Wiles said. The first is relatively recent data so voluminous and complex that it couldn’t be manipulated effectively before, such as NASA images of heat over cities. This kind of data “literally was too big to handle not that long ago,” he said, “but now you can handle it on a regular computer.”

The second type of big data is older datasets that may be less-than-reliable. This data “was always kind of there,” Wiles said, such as historic temperature trends in the United States. That kind of dataset is not overly complex, but it can be fraught with gaps and errors. “A guy in Oklahoma may have broken his thermometer back in 1936,” Wiles said, meaning that there could be no measurements at all for two months of that year.

Address those issues, and existing data can be “just as powerful, just as big,” Wiles said. “It makes it possible to make the story very local.”
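To make the broken-thermometer problem concrete, here is a minimal sketch of one common way to patch short gaps in a record: linear interpolation between the nearest readings on either side. It is not Climate Central's method, which the article does not describe; the function name `fill_gaps` and the sample temperatures are hypothetical.

```python
from typing import List, Optional


def fill_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate interior gaps (None) between known readings.

    Gaps at the very start or end are left as None, because there is
    no reading on one side to interpolate from.
    """
    filled = list(values)
    # Indices of the readings that actually exist.
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over each pair of neighboring known readings.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            delta = values[right] - values[left]
            filled[i] = values[left] + delta * (i - left) / span
    return filled


# Hypothetical monthly mean temperatures (degrees F); two months are missing.
monthly_means = [38.0, 42.0, None, None, 66.0, 75.0]
print(fill_gaps(monthly_means))
```

Real climate records are usually repaired with more careful techniques, such as comparing against neighboring stations, so this should be read only as an illustration of the idea.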

Climate Central imports data from historical government records to produce highly localized graphics for about 150 local TV weather forecasters across the U.S., illustrating climate change in each station’s particular area. For example, “Junes in Toledo are getting hotter,” Wiles said. “We use these data all the time to try to localize the climate change story so people can understand it.”

‘One million hours of computation’

Though the Climate Central map is an effective tool for illustrating the problem of rising sea levels, big data technology is also helping researchers model, analyze, and predict the effects of climate change.

“Our goal is to turbo-charge the best science on massive data to create novel insights and drive action,” said Rebecca Moore, engineering manager for Google Earth Engine. Google Earth Engine aims to bring together the world’s satellite imagery—trillions of scientific measurements dating back almost 40 years—and make it available online along with tools for researchers.

Global deforestation, for example, “is a significant contributor to climate change, and until recently you could not find a detailed current map of the state of the world’s forests anywhere,” Moore said. That changed last November when Science magazine published the first high-resolution maps of global forest change from 2000 to 2012, powered by Google Earth Engine.

“We ran forest-mapping algorithms developed by Professor Matt Hansen of University of Maryland on almost 700,000 Landsat satellite images—a total of 20 trillion pixels,” she said. “It required more than one million hours of computation, but because we ran the analysis on 10,000 computers in parallel, Earth Engine was able to produce the results in a matter of days.”

On a single computer, that analysis would have taken more than 15 years. Anyone in the world can view the resulting interactive global map on a PC or mobile device.
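The arithmetic behind those figures can be sketched in a few lines. The sketch assumes the work divides evenly across machines with no overhead and treats each "computer" as one unit of compute; the article specifies neither, and the function name `wall_clock_hours` is hypothetical.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365


def wall_clock_hours(total_compute_hours: float, machines: int) -> float:
    """Elapsed hours, assuming perfect parallel scaling (an idealization)."""
    return total_compute_hours / machines


total_hours = 1_000_000  # "more than one million hours of computation"
machines = 10_000        # "10,000 computers in parallel"

parallel_days = wall_clock_hours(total_hours, machines) / HOURS_PER_DAY
serial_years = wall_clock_hours(total_hours, 1) / HOURS_PER_YEAR

print(f"{parallel_days:.1f} days on {machines:,} machines")
print(f"{serial_years:.0f} years on one unit of compute")
```

Under these assumptions the parallel run takes roughly four days, in line with "a matter of days." The serial figure depends heavily on what counts as a single computer, for example how many processor cores it has, so it should be read as an order-of-magnitude estimate.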

‘We have sensors everywhere’

Propelling such developments, meanwhile, is the fact that data is being collected today on a larger scale than ever before.

“Big data in climate first means that we have sensors everywhere: in space, looking down via remote sensing satellites, and on the ground,” said Kirk Borne, a data scientist and professor at George Mason University. Those sensors are continually recording information about weather, land use, vegetation, oceans, ice cover, precipitation, drought, water quality, and many more variables, he said. They are also tracking correlations between datasets: biodiversity changes, invasive species, and at-risk species, for example.

Two large monitoring projects of this kind are NEON—the National Ecological Observatory Network—and OOI, the Ocean Observatories Initiative.

“All of these sensors also deliver a vast increase in the rate and the number of climate-related parameters that we are now measuring, monitoring, and tracking,” Borne said. “These data give us increasingly deeper and broader coverage of climate change, both temporally and geospatially.”

Climate research is one of the largest applications of scientific modeling and simulation, Borne said. Efforts are focused not on tomorrow’s weather but on decades and centuries into the future.

“Huge climate simulations are now run daily, if not more frequently,” he said. These simulations have increasingly higher horizontal spatial resolution (tens of kilometers, versus hundreds of kilometers in older simulations); higher vertical resolution, referring to the number of atmospheric layers that can be modeled; and higher temporal resolution, zeroing in on minutes or hours as opposed to days or weeks, he added.

The output of each daily simulation amounts to petabytes of data and requires an assortment of tools for storing, processing, analyzing, visualizing, and mining.

‘All models are wrong, but some are useful’

Interpreting climate change data may be the most challenging part.

“When working with big data, it is easy to create a model that explains the correlations that we discover in our data,” Borne said. “But we need to remember that correlation does not imply causation, and so we need to apply systematic scientific methodology.”

It’s also important to heed the maxim that “all models are wrong, but some are useful,” Borne said, quoting statistician George Box. “This is especially critical for numerical computer simulations, where there are so many assumptions and ‘parameterizations of our ignorance.’

“What fixes that problem—and also addresses Box’s warning—is data assimilation,” Borne said, referring to the process by which “we incorporate the latest and greatest observational data into the current model of a real system in order to correct, adjust, and validate. Big data play a vital and essential role in climate prediction science by providing corrective actions through ongoing data assimilation.”
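The core idea of data assimilation can be illustrated with a single number. The sketch below blends one model forecast with one observation, weighting each by how uncertain it is; this is the scalar form of the update used in Kalman-style filters. Operational climate systems work with millions of variables and far more elaborate schemes, and the function name `assimilate` and all values here are hypothetical.

```python
from typing import Tuple


def assimilate(forecast: float, forecast_var: float,
               observation: float, observation_var: float) -> Tuple[float, float]:
    """Correct a model forecast with an observation.

    Each input is weighted by its uncertainty (variance). Returns the
    corrected estimate and the variance of that estimate.
    """
    # The gain is near 1 when the model is much less certain than the
    # observation, and near 0 when the observation is the noisy one.
    gain = forecast_var / (forecast_var + observation_var)
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var


# Hypothetical example: the model says 15.0 degrees with variance 4.0;
# a sensor reads 17.0 degrees with variance 1.0.
estimate, variance = assimilate(15.0, 4.0, 17.0, 1.0)
print(estimate, variance)
```

Because the sensor is assumed to be more trustworthy than the model in this example, the corrected estimate lands closer to the observation, and its variance is smaller than that of either input.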

‘We are in a data revolution’

Earlier this year, the Obama administration launched a climate data website with more than 100 curated, high-quality data sets, Web services, and tools that anyone can use to help prepare for the effects of climate change. At the same time, NASA invited citizens to help find solutions to the coastal flooding challenge at an April mass-collaboration event.

More recently, UN Global Pulse launched a Big Data Climate Challenge to crowdsource projects that use big data to address the economic dimensions of climate change.

“We’ve already received submissions from 20 countries in energy, smart cities, forestry and agriculture,” said Miguel Luengo-Oroz, chief scientist for Global Pulse, which focuses on relief and development efforts around the world. “We also hope to see submissions from fields such as architecture, green data centers, risk management and material sciences.”

Big data can allow for more efficient responses to emerging crises, distributed access to knowledge, and greater understanding of the effects personal and policy decisions have on the planet’s climate, Luengo-Oroz added.

“But it’s not the data that will save us,” he said. “It’s the analysis and usage of the data that can help us make better decisions for climate action. Just like with climate change, it is no longer a question of, ‘is this happening?’ We are in a data revolution.”