“The tribe has grown,” Robert Kirkpatrick notes, looking out at the audience.
Robert’s the Director of UN Global Pulse. There’s been a lot of data in the crisis mapping community for some time, but it required proactive collection. Now, we can observe in real time and on a massive scale what people are already sharing with each other. And perhaps even more transformative is the “digital exhaust”, the data we generate just by using services around the world. We interact not just with each other, but with businesses and maps and search. The private sector has spent trillions of dollars building the cloud, and now we have human sensor networks that map immediately to human needs. We can passively observe collective human behavior in real time.
This is not the world in which the United Nations was founded. Everything moves faster now. The pace of change has absolutely exploded, and I don’t think it’s ever going to slow down. Realtime isn’t just faster; it’s fundamentally different, and requires different rules.
2011 was a pretty significant year for data. More data was created last year than in all of human existence before. Facebook hit a billion users last week. If social networks were countries, they’d rank among the top six most populous countries in the world.
Mobiles, cloud, and social combine to produce ever larger amounts of data.
Jakarta is the tweeting-est city on earth. When we map the tweets, it’s clear that Indonesians spend a lot of time sitting in traffic. They take tons of photos, too.
The point of all this data?
1. Better early warning to detect trends, anomalies, and allow earlier response
2. Real-time awareness, with a better picture of needs supporting more effective planning and implementation
3. Real-time feedback to understand sooner where needs are changing, or not being met, to allow for more rapid iteration
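To make the early-warning idea concrete, here is a minimal sketch of one common way to flag anomalies in a daily count series: compare each day against the mean and standard deviation of the days just before it. This is not Global Pulse's method; the function name `flag_anomalies`, the 7-day window, the threshold of 3, and the sample numbers are all assumptions made up for illustration.

```python
from statistics import mean, stdev

def flag_anomalies(counts, window=7, threshold=3.0):
    """Return indices whose value sits more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to judge against; skip it.
            continue
        z = (counts[i] - mu) / sigma
        if abs(z) > threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily counts of messages mentioning a keyword (made-up data).
daily_mentions = [20, 22, 19, 21, 20, 23, 21, 22, 20, 95, 24, 21]
print(flag_anomalies(daily_mentions))
```

A trailing window keeps the check cheap enough to run as each day's data arrives, which is the point of real-time warning. The trade-off is that one big spike inflates the baseline for the following days, so a second spike right after it may go unflagged.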
The point of real-time data is that we get the information in time to do something with it. The feedback loop allows us to intervene and change the story before it’s over. Months-old nutrition data does little to help hungry people.
There are many research areas to explore, from preparedness to migration to climate change adaptation. Huge populations are moving to urban slums. Program monitoring and evaluation keeps coming up: can we get faster feedback on whether our programs are having the desired effect, rather than wait until the three-year program cycle is over? And can we use some of these data techniques to approximate statistics? Mobile providers have modeled phone usage and can project, with 80% accuracy, a user’s gender and age.
As an example, let’s look at how we can use mobile networks as drought sensors in the Sahel.
Problems, we’ve got
Privacy, validation, access to data, and that most existential of questions: does it actually change anything? If the organization responsible doesn’t act, does it matter if we have all of this data and its insights?
Big data is a huge human rights issue: as a raw public good, it’s also a source of tremendous risk. Three draft guidelines:
- Never analyze personally-identifiable data
- Never analyze confidential data
- Never seek to re-identify individuals
Rob finds a space between Mark Zuckerberg and the nation of Germany in the spectrum of whether data privacy is pointless or sacrosanct. It’s a public good, but it comes with risks.
User-generated data poses additional challenges. People lie. Dark areas on a social network map don’t represent areas not worth tweeting about; our sensor network is unevenly distributed.
NLP isn’t great at sentiment, particularly sarcasm or irony. A tweet can contain gigabytes of context instantly apparent to a brain with tens of billions of neurons, and nothing to a computer.
Behavioral data has selection bias from the start.
Media coverage itself drives behavior change, so it’s tricky to measure cause and effect.
Apophenia: We sometimes think we see trends where there are none. Correlation is not causality.
This crowd data is faster. It’s not a replacement for the hard evidence and existing methods. But its speed can change outcomes.
Telescopes and macroscopes each have their own issues. 96% of the universe is dark energy or dark matter, which doesn’t reflect light. Likewise, most of our data is behind corporate firewalls or is otherwise unshareable. So much of the information about people is not available to the people charged with representing them and ensuring they are healthy. Can we find a way to share data that doesn’t compromise business or privacy?
The future of the human race
This isn’t just about corporate social responsibility. If we could get the private sector to engage in sharing some of this data in a way that doesn’t produce business risk… Many businesses recognize that this volatile world is a terrible climate for business. We need a global real-time public/private data commons. Which models, tools, and technologies would allow a data commons to work for everyone?
“We think a data commons is the only way we are going to survive as a species for the next 100 years.”
Getting there will require safe space for experimentation. Many attempts will not work, but need to be tried.
Pulse Lab Jakarta is a shared facility with data scientists, research fellows, and private companies with tools and data to share. They’re actively recruiting partners to experiment in the most social-media-rich area of the world.
Additional labs will follow. Kampala, in East Africa, is less of a social media hub, but offers extraordinary opportunities for financial data. Makassar, in South Sulawesi, and Medan, in North Sumatra, are also regions of focus.
Issues of interest include food prices, urban poverty, and financial behavior.
The UN is a platform, which requires partners. They partner with governments to establish Pulse Labs, conduct research, and make it available to the world.
Their approach uses different data sources for various purposes. Twitter’s great for food prices, and useless for fears about job security. They correlate this information with official data sources, and visualize trends and map the results.
- ForSight gets a shoutout for monitoring conversations based on keywords. The global soybean shortage leads to a big spike in conversations about tofu and tempeh.
- Global Pulse also uses SAS’s Social Media Analytics and Text Miner tools
- They’re using linear regression analysis of social media conversations to determine if they can predict inflation of certain types of food. When prices go up, people tweet about that food. Except eggs. Indonesians tweet more about eggs as the price drops (“Mmm…omelet”).
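The regression idea in the last bullet can be sketched with ordinary least squares on two weekly series: how often a food is mentioned, and what it costs. This is only an illustration under stated assumptions, not the analysis Global Pulse ran; the helper names `fit_line` and `r_squared` and every number below are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up weekly data: mention counts and a price index for two foods.
rice_mentions = [120, 150, 170, 210, 260, 300]
rice_prices = [100, 102, 103, 106, 109, 112]
egg_mentions = [300, 260, 220, 180, 150, 130]
egg_prices = [90, 94, 97, 101, 104, 106]

rice_slope, rice_intercept = fit_line(rice_mentions, rice_prices)
egg_slope, egg_intercept = fit_line(egg_mentions, egg_prices)

print("rice slope:", rice_slope,
      "R^2:", r_squared(rice_mentions, rice_prices, rice_slope, rice_intercept))
print("egg slope:", egg_slope,
      "R^2:", r_squared(egg_mentions, egg_prices, egg_slope, egg_intercept))
```

In this toy data the rice slope comes out positive (more chatter as prices rise) and the egg slope negative, mirroring the omelet exception. A good fit still says nothing about cause, only that the two series move together.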
Does this finding apply elsewhere? We’re going to need to map what these findings mean, and where they apply.
Our social media signals are getting stronger. People talk about basic needs using social media more than they used to. And the temporal correlation is growing stronger, too. Greater distribution of the human sensor network improves our results.
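One way to put a number on "temporal correlation" is to slide the social media series against the official series and see which delay gives the strongest Pearson correlation. The sketch below assumes two equally spaced series of the same length; the function names and the toy data are hypothetical and are not drawn from the talk.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant series; report 0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)

def lagged_correlation(signal, reference, max_lag):
    """Correlate signal[t] with reference[t + lag] for lag = 0..max_lag."""
    results = {}
    for lag in range(max_lag + 1):
        a = signal[:len(signal) - lag]
        b = reference[lag:]
        results[lag] = pearson(a, b)
    return results

# Toy data: the "official" series is the social signal delayed by two steps.
social = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
official = [0, 0] + social[:-2]

by_lag = lagged_correlation(social, official, max_lag=4)
best_lag = max(by_lag, key=by_lag.get)
print(best_lag, by_lag[best_lag])
```

If the strongest correlation sits at a positive lag, the social signal is running ahead of the official numbers, which is exactly the head start an early-warning system needs.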
Our weather tools are pretty sophisticated because they pull from a large number of weather stations AND algorithmic detectors. Could we do the same for the monitoring of human networks?
This community is at an inflection point. The data’s everywhere, the tools are taking off, and the future’s exciting.