Why Use Private Data for Public Good

I wrote a piece for Harvard Business Review about data philanthropy, where private corporations donate or otherwise share valuable data with public partners like local government and non-profits. This piece introduces the idea, makes the business case, and begins to explore how an internal champion might go about executing such a project.

Fortunately, the post went live the very same day that John and I attended UN Global Pulse’s excellent Responsible Data Forum on Private Sector Data Sharing (organized with the Data & Society Research Institute and the Rockefeller Foundation). The attendees represented an incredible range and depth of experience in this nascent field. Together we began drafting additional resources, like a road map showing how to commit data philanthropy, and a starter kit. I’ll share these as soon as they’re ready (or sooner, if you’re interested in helping to shape them).

Participatory Aid Marketplace: Designing Online Channels for Digital Humanitarians

(a summary of my MIT Media Lab Master’s thesis)

Unlike my thesis readers, who may or may not have made it through all 244 pages, you get to experience the condensed version. The full PDF is here, if you’re into reading and citations.

Participatory Aid
People are using information and communication technologies (like the internet) to help each other in times of crisis, whether natural or man-made. This trend is the evolution of a concept known as “mutual aid”, introduced by Russian polymath Peter Kropotkin in 1902 in his argument that our natural sociable inclinations towards cooperation and mutual support are underserved by capitalism’s exclusive focus on the self-interested individual. My own argument responds to the way bureaucracies underserve informal and public-led solutions.

The practice of mutual aid has been greatly accelerated and extended by the internet’s global reach. I introduce the term “participatory aid” to describe the new reality where people all over the planet can participate in providing aid in various forms to their fellow humans. In many of these cases, that aid is mediated at least partially by technology, rather than exclusively by formal aid groups.

Formal aid groups like the UN and Red Cross are facing disintermediation not entirely unlike what we’ve seen in the music, travel, and news industries. Members of the public are increasingly turning towards direct sources in crises rather than large, bureaucratic intermediaries. Information is increasingly likely to originate from people on the ground in affected places rather than from news companies, and there is a rich and growing number of ways to help, as well.

You are more than your bank account

Holmes Wilson, internet activism, and why we need you

(originally posted on Civic MIT)

Fight for the Future is known for its massive viral organizing campaigns that changed Internet history both nationally and globally. Faced with the proposed Stop Online Piracy Act (SOPA) and PROTECT IP Act (PIPA), legislation that would have jeopardized the open Internet as we know it, Fight for the Future organized the largest and most visible online protest in history. Holmes Wilson has also co-founded Miro, OpenCongress, and Amara. He’s been at the forefront of a range of open internet and participatory culture projects and campaigns.

Holmes Wilson (foreground) and Dalek (background)

The internet delivers newfound powers of expression
The key thing about the internet that drives Holmes’s passion for it is that it gives us a new power, which ultimately translates to freedom of expression. But not in the conventional sense. The freedom of expression the internet enables isn’t just about speaking. It’s about making art, starting a business, overthrowing a government, building a new government, realizing dreams, and the ability to give your greatest gift to the world. When we think about expression this way, failing to preserve this medium becomes unthinkable: it would stifle human potential.

But that power is inherently fragile
The internet is fragile. The power it gives people is not necessarily stable, and there are significant threats to it. The most prominent threat of the past year was SOPA/PIPA. In some ways, these seemed like very small, reasonable changes to the law. Existing law says sites aren’t responsible for content that users generate; these bills would have made site owners responsible for users’ content. The consequence would have been that any copyright holder could have taken down any site where their content appeared, and any site built on user-generated content would have had to aggressively police user behavior and contributions. Most of the harm would have been invisible. If SOPA had been in effect when YouTube was founded, we wouldn’t have YouTube.



Talking Fast II: More CrisisMapper Ignite Sessions

Luis Capelo (@luiscape) of the Digital Humanitarian Network (DH) loves volunteers. DH exists to stimulate more interaction between humanitarian volunteers and large humanitarian institutions.

There’s information overload in humanitarian responses. How do we collect and make sense of all this information? Luis credits humanitarian orgs with doing the hard work of adapting, but it’s a rough sea to navigate. Volunteer & Technical Communities (VTCs) thrive in this environment. They’re nimble, lightweight, and technically advanced. Luis thinks it’s time to stop questioning whether VTCs can help, and begin to dive into how these groups can collaborate.

DH aims to create a consortium of groups that mediates between the two worlds and reduces the cost of collaboration.
They have a simplified activation process: activate volunteers, triage the incoming volume, and forward requests to VTCs. They’ve produced a guide to managing the activation of VTCs.

July and August of this year saw the first two activations. OCHA and ACAPS came to the DH network for help. OCHA wanted to build a pre-crisis profile of every country. ACAPS wanted to include VTCs in the formal assessment process.


Cat Graham (@Peaceful_intent) of Humanity Road works in multinational crisismapping. They specialize in the first hours of an event (12-, 24-, and 48-hour windows), when self-directed work teams, trained and given a mission, come online.

Their forthcoming QuickNets microtasking platform is open source and free, and will stay that way. Each row of data has an ‘anonymize’ button. It’s been tested at the RIMPAC and Pacific Endeavor exercises. 20 volunteers from 8 nations stepped up to model communications for the tabletop exercise.
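QuickNets’ internals aren’t described in the talk. As a minimal sketch of what a per-row “anonymize” action might do, the Python below assumes each row is a dict and treats a hypothetical set of field names (`name`, `phone`, `handle`) as personal data. It replaces those values with truncated, salted SHA-256 digests, so duplicate reports from one person can still be linked without showing who sent them.

```python
import hashlib

# Hypothetical field names treated as personally identifying; not QuickNets' schema.
PII_FIELDS = ("name", "phone", "handle")


def anonymize_row(row: dict, salt: str) -> dict:
    """Return a copy of `row` with PII values replaced by salted SHA-256 pseudonyms."""
    cleaned = dict(row)  # leave the caller's row untouched
    for field in PII_FIELDS:
        value = cleaned.get(field)
        if value:  # skip missing or empty values
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            cleaned[field] = "anon-" + digest[:12]
    return cleaned


row = {"name": "Jane Doe", "phone": "+15550100", "message": "Road blocked at bridge"}
print(anonymize_row(row, salt="exercise-2012"))
```

This is pseudonymization rather than true anonymization: short values like phone numbers can be brute-forced if the salt leaks, so the salt should be kept secret and rotated per deployment.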

Ka-Ping Yee (@zestyping) is an engineer on Google’s Crisis Response team. He uses Stratomap, the open-source tool behind Google Crisis Map. It’s on Google Code, and there’s a hosted version available. There are important datasets for crisis response, but they’re all hosted on different websites, so it’s difficult to get them in front of the right decisionmakers. Some of these maps have crappy UI, or don’t allow the databases to be combined. Data publishers have tools to publish, but map curators could offer even greater value if they were able to mash up maps and databases from various providers. We gain great insight when we can synthesize various pieces of information.

The Google Crisis Map provides a range of useful layers, from traffic to weather to user-submitted YouTube videos. Your map mashup can point to live feeds around the web, and it will be updated in realtime as those data sources are updated.

Users can share a customized view of the map, including its layers.
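Sharing a customized view generally means serializing the view state into a link. The sketch below is illustrative only: the base URL, the parameter names (`ll`, `z`, `layers`), and the layer IDs are hypothetical and are not Google Crisis Map’s actual URL scheme.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical base URL and parameter names; not Google Crisis Map's real scheme.
BASE_URL = "https://maps.example.org/crisis"


def share_url(lat: float, lon: float, zoom: int, layers: list[str]) -> str:
    """Encode the map center, zoom level, and enabled layers as a shareable link."""
    query = urlencode({
        "ll": f"{lat:.4f},{lon:.4f}",  # center, rounded to 4 decimal places
        "z": zoom,
        "layers": ",".join(layers),
    })
    return f"{BASE_URL}?{query}"


def parse_view(url: str) -> dict:
    """Recover the view state from a link produced by share_url."""
    params = parse_qs(urlparse(url).query)
    lat, lon = (float(x) for x in params["ll"][0].split(","))
    layer_field = params.get("layers", [""])[0]  # parse_qs drops empty values
    return {
        "lat": lat,
        "lon": lon,
        "zoom": int(params["z"][0]),
        "layers": [name for name in layer_field.split(",") if name],
    }


link = share_url(40.7128, -74.0060, 12, ["traffic", "weather", "shelters"])
print(link)
print(parse_view(link))
```

Keeping the whole view in the query string means the recipient needs no account or saved state to see exactly the same layers.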

Brian Root (@brian_root) of Human Rights Watch shows us a US map depicting the deportation patterns of US Immigration and Customs Enforcement (ICE). The group produced a report on the human rights implications of ICE’s detention and deportation policies. They needed data to show ICE’s movements, but only ICE had the data they wanted. Through Freedom of Information Act requests, they were able to procure some data.

After cleaning the data, they were able to show the number of facilities involved and the facilities sending the most cases, and to identify problem cases in which a detainee had been transferred numerous times across the country. They were able to visualize findings about the costs, human and financial, of transferring detainees.
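HRW’s actual analysis isn’t shown in the talk. As a rough sketch of how these three findings can fall out of transfer records, the Python below assumes each record is a (detainee ID, from facility, to facility) tuple, uses made-up sample data, and picks an arbitrary threshold of three transfers for a “problem case”.

```python
from collections import Counter

# Made-up sample records: (detainee_id, from_facility, to_facility).
transfers = [
    ("A1", "Facility X", "Facility Y"),
    ("A1", "Facility Y", "Facility Z"),
    ("A1", "Facility Z", "Facility W"),
    ("B2", "Facility X", "Facility Z"),
    ("C3", "Facility Y", "Facility X"),
    ("C3", "Facility X", "Facility Z"),
]


def summarize(records, threshold=3):
    """Count distinct facilities, top sending facilities, and repeatedly transferred detainees."""
    facilities = {f for _, src, dst in records for f in (src, dst)}
    senders = Counter(src for _, src, _ in records)
    per_detainee = Counter(detainee for detainee, _, _ in records)
    problem_cases = sorted(d for d, n in per_detainee.items() if n >= threshold)
    return len(facilities), senders.most_common(2), problem_cases


print(summarize(transfers))
```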

But maps and data do not effective advocacy make. Drilling down to the state level was more useful for getting the attention of local media and local politicians. In January of this year, ICE issued a directive to limit the number of transfers, in large part due to HRW’s report.

Brian asks the audience to consider the human rights research that could be done with the mapping experience sitting in this room.

Clarence Wardell (@cwardell) led a research team at the University of Arkansas following the Social Media and Emergency Management conference. One of the main concerns highlighted in their summary report was high-level resistance to using data because of verification concerns. They took the strategy of conceding the “is it perfect?” argument up front, and instead arguing that the data was nevertheless useful. There is room between horseshoes and hand grenades.

They mapped the verified and unverified data points together, framing a Traveling Salesman problem for disaster responders: which relief tour makes the best use of time and ground covered? They tested multiple approaches.
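The team’s specific approaches aren’t listed, so as one illustrative baseline, here is a nearest-neighbor heuristic for a relief tour. It assumes planar (x, y) coordinates in kilometers, made-up points, and a hypothetical `include_unverified` switch for comparing a tour of verified reports only against one that also visits unverified reports. Nearest-neighbor is fast but does not guarantee an optimal tour.

```python
import math

# Made-up report locations on a flat grid (km): (label, x, y, verified?)
reports = [
    ("r1", 2.0, 0.0, True),
    ("r2", 5.0, 0.0, False),
    ("r3", 2.0, 2.5, True),
    ("r4", 0.0, 4.0, False),
]


def nearest_neighbor_tour(start, points):
    """Greedy tour: from the current position, always visit the closest unvisited point."""
    tour, total, here = [], 0.0, start
    remaining = list(points)
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(here, (p[1], p[2])))
        total += math.dist(here, (nxt[1], nxt[2]))
        here = (nxt[1], nxt[2])
        tour.append(nxt[0])
        remaining.remove(nxt)
    total += math.dist(here, start)  # return to base
    return tour, total


def plan(include_unverified: bool, base=(0.0, 0.0)):
    """Build a tour over verified reports, optionally adding unverified ones."""
    stops = [r for r in reports if r[3] or include_unverified]
    return nearest_neighbor_tour(base, stops)


print(plan(include_unverified=False))
print(plan(include_unverified=True))
```

Comparing the two totals gives a rough sense of how much extra travel it costs to check unverified reports on the same run.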

Munish Puri (@RecordedFuture) time-travels by looking at data over time. Hindsight + insight leads to foresight. Text has predictive power when it is loaded with temporal references.

They looked at temper, time, and tone in the Georgian conflict. Volume and velocity didn’t equal veracity. Munish pointed to Licklider’s idea of intelligence amplification.

“Discovery consists of seeing what everybody has seen and thinking what nobody has thought.” (Albert Szent-Györgyi)

Clionadh Raleigh (@acledinfo) is Director of the Armed Conflict Location and Event Dataset (ACLED). They report on all violent events across Africa. Dates, locations, actor types, event types, and territory exchanges are collected to produce trend reports for others.

What’s happening in Africa?
We can use SpatialKey to detect and investigate patterns. The agents of violence have changed drastically in the last few years. Civil wars are declining, but political militias increasingly threaten civilians. We can follow the movements and attacks of groups like the LRA and Boko Haram. Boko Haram, for instance, never attacks troops.

We’re seeing more trans-national threats from Islamist groups, and violence in Ethiopia has been increasing over the last 15 years.

Their analysis points to increased urbanization, among other factors, as a driver of today’s violence. The data informs policy, academic, and public research. They offer special reports on topics like Islamist violence.

Steven Livingston (@ICTlivingston) introduces Mapping the Maps and Crowdglobe, an Internews platform to visualize geospatial data. Different map themes emerge in different regions. In Western Europe, crowdmaps are used for entertainment, leisure, and media reporting.

They also surveyed crowdmappers, 80% of whom were men, with an average age of 40. Many of the dead maps are simply the result of users experimenting, with no intention of creating a map to begin with.

Only 6% of respondents promoted their maps using traditional media.

Jonne Catshoek is a conflict analyst working in the Republic of Georgia. In 2008, Russia and Georgia battled over South Ossetia. Conflict continues despite the international community’s involvement. Jonne blames security strategies that are not responsive enough to the needs of local communities. The Elva platform (code here) they developed lets community representatives report community needs by SMS; the information is then published online for the wide range of actors outside the community.

Jonne estimates they spend 20% of their time developing software and 80% understanding local community needs. The resulting local trust allows better information sharing and more reliable information.

They use SMS because smartphone penetration remains low. Trained monitors code a lot of information into a single SMS.
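Elva’s actual message format isn’t described in the talk. The sketch below shows the general idea behind packing many observations into one SMS: a shared codebook, entirely hypothetical here, maps single-letter codes to indicators with a severity level from 0 to 3, and the encoder checks the 160-character limit of a single standard SMS (assuming plain ASCII text).

```python
# Hypothetical codebook shared by trained monitors; not Elva's real scheme.
CODEBOOK = {"S": "security incident", "W": "water shortage", "F": "food shortage", "H": "health"}
MAX_SMS_CHARS = 160  # one standard SMS, assuming plain ASCII characters


def encode_report(village: str, levels: dict[str, int]) -> str:
    """Pack a village code and per-indicator severity levels (0-3) into one short SMS."""
    parts = [village] + [f"{code}{level}" for code, level in sorted(levels.items())]
    message = " ".join(parts)
    if len(message) > MAX_SMS_CHARS:
        raise ValueError("report does not fit in a single SMS")
    return message


def decode_report(message: str) -> dict:
    """Expand a coded SMS back into readable indicator names and severity levels."""
    village, *codes = message.split()
    readings = {}
    for token in codes:
        code, level = token[0], int(token[1:])
        if code not in CODEBOOK or not 0 <= level <= 3:
            raise ValueError(f"unknown or malformed token: {token}")
        readings[CODEBOOK[code]] = level
    return {"village": village, "readings": readings}


sms = encode_report("V017", {"W": 2, "S": 1, "F": 0})
print(sms)
print(decode_report(sms))
```

Because the decoder rejects unknown codes, a mistyped report fails loudly instead of being silently mapped.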

They’re expanding beyond violence into weather and agricultural information feeds for communities. The community is also heavily reporting security incidents, helping security providers respond appropriately.

Patrick Vinck (@developmentdata) is at the Harvard Humanitarian Initiative. They conduct surveys on countries’ peace and conflict, the way we have regular reports on health and other human factors. KoBo Toolbox is a data collection instrument designed to assist in this effort. It helps surveyors create their forms and export the questions. It even has a recommendation engine that suggests questions based on the topic being surveyed.

The same tool can collect voice, written text, images, and videos in one place. It works offline.

KoBo Map links to a spreadsheet (a Google spreadsheet, a CSV file, etc.) and visualizes the data it contains. It’s lightweight enough for slow connections.
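KoBo Map’s internals aren’t described here. As a hedged illustration of the spreadsheet-to-map step, this sketch reads CSV text with assumed `latitude`/`longitude` columns (the column names and sample rows are hypothetical) and emits a GeoJSON FeatureCollection, a format most web-mapping libraries can display. Rows without usable coordinates are skipped rather than guessed.

```python
import csv
import io
import json

# Hypothetical survey export; column names and values are made up.
SAMPLE_CSV = """site,latitude,longitude,households
Camp A,2.0469,45.3182,120
Camp B,,,80
Camp C,2.0511,45.3259,95
"""


def csv_to_geojson(text: str) -> dict:
    """Turn rows with latitude/longitude columns into GeoJSON Point features."""
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            lat, lon = float(row["latitude"]), float(row["longitude"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or malformed coordinates
        props = {k: v for k, v in row.items() if k not in ("latitude", "longitude")}
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}


print(json.dumps(csv_to_geojson(SAMPLE_CSV), indent=2))
```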

Taylor Owen (@taylor_owen) mapped the historical US bombing of Cambodia for his PhD thesis. He quotes Nixon demanding that Kissinger “crack the hell out of them,” with an unlimited budget. The result was over 200,000 sorties over an 8-year period. The 115,000 records detailing the planes, bombs, and tonnage remained secret until President Clinton opened the records to Vietnam for the purposes of de-mining the land.

Taylor used the data to produce timelines and fact-check the official timeline. The data can change our understanding of history. The bombings started earlier than the US said, continued after the peace treaties, and hit civilian areas where Kissinger said we wouldn’t. We can even see Watergate’s effect on the bombs dropped in Cambodia.

We can see how the US bombing pushed the Khmer Rouge east, into Viet Cong territory, where the movement developed from an agrarian socialist revolution into an anti-imperialist group.

It doesn’t take much to radicalize a population, Taylor says. In one instance, a single bomb dropped on a village produced 70 radicalized recruits.

Henry Kissinger’s record of claims on this issue, including his Second Rule of Engagement (“We won’t bomb within a mile of a village”), turns out to be wildly incorrect once we map the bombing patterns. The US bombed heavily populated areas near Phnom Penh, for example.
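Checking a rule like “we won’t bomb within a mile of a village” against sortie records comes down to a distance test between two sets of coordinates. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6,371 km; the village and strike coordinates are made-up placeholders, not records from the actual dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
MILE_KM = 1.609344


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def strikes_near_villages(strikes, villages, radius_km=MILE_KM):
    """Return (strike_id, village_name) pairs closer than radius_km."""
    return [
        (sid, vname)
        for sid, slat, slon in strikes
        for vname, vlat, vlon in villages
        if haversine_km(slat, slon, vlat, vlon) < radius_km
    ]


# Made-up placeholder coordinates.
villages = [("Village 1", 11.5600, 104.9200)]
strikes = [("s1", 11.5650, 104.9200), ("s2", 11.6500, 104.9200)]
print(strikes_near_villages(strikes, villages))
```

This brute-force comparison checks every strike against every village; at the scale of 115,000 records, a spatial index would be the more practical choice.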

Government data is often deleted, or at least classified for long periods of time. We need to work with the data when it does get released, so we can understand its historical implications.

Patrick Florance is in town from Tufts University to talk about the Open Geoportal (OGP), an open source project at Tufts for rapidly discovering and previewing geospatial data. It’s a collaborative effort to take on big geospatial datasets.

You can shop around for datasets and drop them into your virtual shopping cart. You can incorporate third party web services and share the data all over the web. It works with common web mapping tools and will have over 20,000 data layers available by December.

Josh Campbell (@disruptivegeo) is a geographer at the Department of State. He’s working to connect the US government’s purchasing power for commercially available satellite imagery with VTCs’ need for this imagery.

In a few weeks, Haiti went from being barely mapped at all to being mapped in enough detail to support on-the-ground action. The State Department attributes this incredible development to the existing OpenStreetMap (OSM) community, empowered by satellite imagery served through web services.

The US government buys a LOT of satellite imagery, and is contractually required to share it. In the Horn of Africa crisis, they experimented with letting volunteer mappers map refugee camps. 29 volunteers produced 50,000 nodes of data on a previously blank refugee camp map. The volunteers provided not just roads and streets, but also footways, paths, hydrologic features, and other rich data.

The project showed that people would map where there is imagery. But they’re interested in mapping the human elements, as well. The Red Cross hosted a mapping party, and went to Uganda to train locals and fire responders to use OSM. Locals annotated the data with place names and restaurants. The local community received a map of the density of grass huts (a fire hazard).

John Crowley (@jcrowley) works at Camp Roberts to connect the top-down and bottom-up aid groups. He outlines four issues:

1. Law: agencies are having trouble navigating the policies that govern their use of crowd data
2. Trust: trust in the data, and trust in VTCs’ processes
3. Security: the Arab Spring and Anonymous have shown that we can’t secure all the voices in a system
4. Voice: we could have a bigger collective capacity than we’ve ever had in human history. But what happens when that moves faster than our governments?

Agencies ask, how can we control this? Bad news is, you can’t. Good news is, we can begin to see how we can coordinate in this space. But we need to step outside of bureaucracies that only allow information to flow down.

We need space to fail, where it won’t disrupt actual operations. That is the purpose of Camp Roberts. Crashing is allowed. They bring together the players in the space to bridge capability gaps.

How do we repeat Haiti?
A range of actors not traditionally in the same space were brought together. An 18-month exercise brought them up against many legal and policy walls, but they were able to show the process worked.

FEMA came and asked about doing the same with tornadoes. Civil Air Patrol and many VTCs came together and designed a new workflow, which was used in Hurricane Isaac two weeks after it was created.

Can we scale this innovation process to all the other agencies around the world? That is the idea behind an Open Humanitarian Initiative.

This process of bringing people together into safe spaces requires combining the wisdom of the old with the innovation of the new. How can we bring together the human race to learn to heal itself?

Talking fast at CrisisMappers: the Ignite Talks


Dr. Jen Ziemke (co-founder of CrisisMappers) welcomes a room packed with a wide variety of professionals and volunteers. CrisisMappers started in 2009 as a network designed to help its members stay in touch. The group has grown to 5,000 members, organized on a Google Group and a Ning network.

For the newbies in the room, what is CrisisMapping?
Jen breaks it down into the data coming in, the visualization of the data, and the response: how does it affect decisions on the ground?

Changing technology is clearly a primary driver of crisis mapping. Mobile technology, and the ability to crowdsource shared experiences and visualize them on a map or elsewhere, has enabled crisis mapping. Beyond mapping, this community quickly becomes a broader group of digital humanitarians, using technology to help communities affected by crisis.

Patrick Meier (the other co-founder of CrisisMappers) hops on stage. He’s currently at the Qatar Foundation Computing Research Institute, where they’re researching how to gather information from social media.
In 2009, the field of crisismapping was just beginning to take shape, and it was easy to know every project and get to know all of the people behind the projects, often over drinks. The field has exploded, and matured, and the problems the field faces have grown more difficult.

The community met last year in Geneva to begin tackling some of the challenges. The computer security behind emerging humanitarian technologies was an obvious area of concern. John Crowley spearheaded this effort at the Camp Roberts event. Phoebe Winpope* has taken on issues of data privacy and security, working with professionals in that space.

Wendy Harman and her team at the American Red Cross have launched the Digital Operations Center, driving home the point that social media for disaster response is here to stay.

Andrej Verity has launched the Digital Humanitarian Network to facilitate the space between volunteer technologists and large humanitarian organizations.

Geeks Without Bounds and Random Hacks of Kindness are also driving forward innovation in this space.

Disaster-affected communities are increasingly the source of big data. When Japan was struck by earthquake, tsunami, and nuclear disaster, a TON of data was shared online. Patrick argues that we need hybrid methodologies that combine crowdsourcing with the speed and scalability of advanced machine learning algorithms. We need to use multiple channels to listen to communities.

Verification remains an important challenge for aid organizations looking to make use of social media. Media organizations are actually leading the way in this department. The BBC has had a User Generated Content hub in London since 2007.

Monitoring and evaluation are also important. Perceptions of digital humanitarian technologies are high, but the real evidence supporting them is pretty thin. We need strong, independent evaluations of these technologies’ impact.

CrisisMappers 2013 will be held in Nairobi, Kenya (applause). This will be the fifth annual conference, and Jen and Patrick have decided to step down as organizers, and adopt OpenStreetMap’s model, where members of the network pitch to host and organize the annual conference.

Patrick is also inspired by TEDx, and hopes to see the CrisisMappers brand, logo, and website repurposed to support far more local events in this space.

Ignite talks!

Lin Wells (@STAR_TIDES) gives an overview of the STAR-TIDES network and its 1,500 nodes. It is public-private and trans-national. They work in post-disaster, post-war, and impoverished settings, over both the short and long term (disasters vs. refugee situations). They work in domestic or foreign situations, whether or not the military is involved. Their focus areas are shelter, water, power, cooking, lighting, sanitation, and ICT.

Lin says technology alone is never enough. Building social networks and developing trust, as well as understanding how policy is adapted in the field, all matter.

TIDES hosts annual technology demos at Fort McNair. They’re also present at Camp Roberts. The real world events they’ve supported include floods, wildfires, election monitoring, and more.

Jakob Rogstadius (@JakobRogstadius) introduces Crisis Tracker, which crowdsources the curation of information like tweets during a crisis. Twitter poses a challenge: at 140 characters, tweets are too short for computers to understand, but they arrive too quickly for humans to cope with.

The Crisis Tracker platform helps people sift through the noise, identify the novel pieces of information in the stream, reduce the inflow rate, and act on actionable content. Similar messages are clustered and junk is filtered out. 30,000 tweets become 2,000 stories, and then 7 unique pieces of information per hour. The system looks at metadata like the timestamp, and then the crowd annotates the stories.
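Crisis Tracker’s own clustering method isn’t described in the talk. As a simplified stand-in, this sketch groups near-duplicate messages greedily by the Jaccard similarity of their word sets. The 0.5 threshold and the sample tweets are assumptions, and a production system would need something that scales better than comparing each message against every existing story.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set, ignoring punctuation, URLs, and @mentions."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return set(re.findall(r"[a-z0-9]+", text))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(messages: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedy single pass: add each message to the first similar story, else start a new one."""
    stories: list[tuple[set[str], list[str]]] = []
    for msg in messages:
        words = tokens(msg)
        for story_words, members in stories:
            if jaccard(words, story_words) >= threshold:
                members.append(msg)
                story_words |= words  # grow the story's vocabulary in place
                break
        else:
            stories.append((words, [msg]))
    return [members for _, members in stories]


tweets = [
    "Bridge on Main St collapsed, avoid the area",
    "RT @user: Bridge on Main St collapsed avoid the area!",
    "Shelter open at Central High School",
    "Main St bridge collapsed, avoid area http://t.co/x",
]
for story in cluster(tweets):
    print(len(story), story[0])
```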

Why human involvement?
Humans can process text and images in ways computers cannot. Humans are also adaptable to rapid changes.

The system is up and running for the Syrian civil war. You can drill down into individual stories and see who shared them, links to multimedia content, and similar stories.

Volunteers have appreciated the platform’s ability to aggregate and filter. The system picks up stories, then automatically ranks, sorts, and filters them to surface the top stories. A team of 6-8 volunteers can pull out the top items. With 40-60 volunteers, you can keep a detailed log of the event at a very local level.

The project is free and open source.

Mona Chalabi (@MonaChalabi) brings us back to the challenges posed by both slow- and sudden-onset crises. Crisis maps were unable to stop delays in aid distribution in Haiti. Have we reached a plateau in the use of GPS? FedEx uses RFID to track packages. It’s a barcode on steroids: a chip stores coded data, an antenna transmits the signal, and a computer receives it for processing. Many subway cards use this technology.

In Haiti, hundreds of containers arrived each day, and those that were recorded were tracked using manual processes like Excel spreadsheets, leaving plenty of room for error. RFID would be much more efficient at managing the supply chain. The chips can be re-used.
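To make the contrast with hand-kept spreadsheets concrete, here is a minimal sketch of what automated RFID reads make possible. It assumes each reader at a checkpoint emits a (tag ID, location, timestamp) event; the tag IDs, location names, times, and the 24-hour “stalled” rule are all invented for illustration.

```python
from datetime import datetime

# Invented read events: (tag_id, reader_location, ISO timestamp)
reads = [
    ("TAG-001", "Port", "2010-01-20T08:00"),
    ("TAG-002", "Port", "2010-01-20T08:05"),
    ("TAG-001", "Warehouse", "2010-01-20T14:30"),
    ("TAG-001", "Distribution site", "2010-01-21T09:10"),
]


def last_known_locations(events):
    """Keep the most recent read for each tag, regardless of arrival order."""
    latest = {}
    for tag, location, stamp in events:
        when = datetime.fromisoformat(stamp)
        if tag not in latest or when > latest[tag][1]:
            latest[tag] = (location, when)
    return {tag: loc for tag, (loc, _) in latest.items()}


def stalled(events, now, max_hours=24):
    """Tags not read for more than max_hours: candidates for a lost or held container."""
    latest = {}
    for tag, _, stamp in events:
        when = datetime.fromisoformat(stamp)
        latest[tag] = max(when, latest.get(tag, when))
    return sorted(t for t, w in latest.items() if (now - w).total_seconds() > max_hours * 3600)


print(last_known_locations(reads))
print(stalled(reads, now=datetime(2010, 1, 21, 12, 0)))
```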

The effective distribution of resources in a crisis can be the difference between people being fed and people being tear-gassed by UN troops afraid of a large crowd. RFID also combats information asymmetry in the space: the information is captured automatically rather than entered by one group and verified by another, and anyone can use it. That also makes it more difficult for corrupt officials to interfere with the chain of supplies.

Simple supply chain logistics remain a major challenge for aid agencies and governments in crisis.

Nate Smith (@nas_smith) is with Development Seed and Mapbox. Communicating the many complicated factors that go into a crisis situation is a challenge in itself.

Context is important, such as the conditions in the Horn of Africa prior to the droughts.

The Sahel Food Crisis project was much more than mapping, Nate says. Getting access to the raw data and communicating it in an appropriate way was part of the challenge.

Workflow is a problem. How we access data, and what we do with it, can slow things down. FEWS NET maintains its data in PDFs, which slows down developers looking to do anything with it.

What are the right colors and pieces of information to include in a map?

We must design for shareability. Tools must be built so that people can actually use them. If you’re publishing a lot of maps, your website should provide an endpoint for each map, so others can take and use it.

Mapbox is also looking at building large data browsers that will support applications. The Sahel Food Crisis site is open for collaboration on GitHub.

Richard Stronkman (@rstronkman) is founder of Twitcident, another service for listening to the voice of the community during incidents. So much information is produced, with 400 million tweets per day. It’s a lot of noise, but when there’s a crisis, the information rates go up.

Twitcident provides early warning to identify increased risks and potential incidents.
When incidents do occur, it supports crisis management.

In the Netherlands, they’ve worked with police forces, an event security company, and the Dutch railway operator. On Queen’s Day, the Utrecht police force asked them to produce a map of incidents in a real-time dashboard. The team was able to identify threats towards the Royal family, leading to police visits.

At the summer carnival, the team worked to counter the spread of false rumors. They found rumors at an early stage and helped the police publicly disprove them before they took off.

The team was able to identify a lack of drinking water at a large-scale water fight early in the event, and help organizers react.

The group publishes their findings as academic research and in the technical press.

Shadrock Roberts (@Shadrocker) works at USAID’s Geo Center. The USAID Development Credit Authority uses credit guarantees to encourage local banks to lend to underserved communities. They wanted to map their work to see how they could do it better. They had a beautiful dataset of 100,000 records over the program’s 12-year history, but the location information was all over the place.

The Geo Center team cleaned up the dataset with a hybrid human-computer method. An automated process handled the easy fixes, but they raised a lot of eyebrows when they suggested crowdsourcing the remainder. They built an application on… The technical infrastructure was a hurdle, but so were the legal policies restricting how the agency could use the data. They cozied up to the lawyers and received help from VTCs to clarify what volunteers would do with the data.
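The Geo Center’s actual parser and application aren’t described in detail. A minimal sketch of the hybrid split, assuming a hypothetical lookup table of known place names: records whose location string normalizes to a known entry are resolved automatically, and everything else goes into a queue for volunteers.

```python
import re

# Hypothetical gazetteer of already-known locations (normalized name -> standard form).
GAZETTEER = {
    "nairobi": "Nairobi, Kenya",
    "kampala": "Kampala, Uganda",
}


def normalize(raw: str) -> str:
    """Lowercase and strip punctuation and extra whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", raw.lower()).split())


def triage(records):
    """Split records into machine-resolved matches and a queue for human volunteers."""
    resolved, crowd_queue = [], []
    for rec_id, raw_location in records:
        key = normalize(raw_location)
        if key in GAZETTEER:
            resolved.append((rec_id, GAZETTEER[key]))
        else:
            crowd_queue.append((rec_id, raw_location))
    return resolved, crowd_queue


records = [(1, "NAIROBI."), (2, " Kampala "), (3, "Plot 12, near the old market, Gulu dist.")]
done, todo = triage(records)
print(done)
print(todo)
```

The volunteer queue keeps the original, un-normalized text, since the messy details are exactly what a human needs to interpret it.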

Involving volunteers went beyond crowdsourced tasks. Volunteers began a much broader discussion about development and the Development Credit Authority. Social media mentions of the organization went up significantly. 300 volunteers took on 10,000 records in 16 hours with 85% accuracy.

The goal wasn’t just a map, but to make the data open for others. You can find it by Googling “USAID Crowdsourcing Transparency” or tweeting at @USAID_credit.

Jerri Husch is here to talk about Action Intelligence. How do we make sense of massive amounts of data? As with electrical outlets, we have competing standards and cultural influences in different places. Jerri argues that we need an adaptable standard. If we look at societies from afar, we might make the mistake of assuming everything’s objective. But the world doesn’t work that way. A cricket is a pest in one place, and a delicacy in another.

Action Intelligence allows us to manage and analyze multiple dimensions and link data. We need to know who’s doing what, where, and when. We need the data immediately in a crisis, and we need it to stick around for the long term as legacy data.

They can link actors to one another, or link actors to actions.

The process goes like this:
1. Collect data, often with university teams, in standardized ways
2. Classify and code that data, by place and time, at micro or macro levels
3. Visualize the data. They use free, open source tools, which allows for an adaptable standard that works anywhere in the world.
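The talk doesn’t specify Action Intelligence’s data model. As a hedged sketch of linking actors to one another and to actions, with each link coded by place and time, the Python below uses a plain dataclass; every actor, action, place code, and date is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Link:
    """One coded relationship: who, with what or whom, where, and when."""
    actor: str
    target: str  # another actor, or an action such as "water distribution"
    kind: str    # "actor-actor" or "actor-action"
    place: str   # standardized place code (hypothetical)
    when: date


links = [
    Link("NGO A", "water distribution", "actor-action", "DIST-04", date(2012, 3, 1)),
    Link("NGO A", "Local council", "actor-actor", "DIST-04", date(2012, 3, 2)),
    Link("NGO B", "water distribution", "actor-action", "DIST-07", date(2012, 3, 5)),
]


def who_does_what_where(records, kind="actor-action"):
    """Index links of one kind by place: place -> sorted list of (actor, target)."""
    index = defaultdict(list)
    for link in records:
        if link.kind == kind:
            index[link.place].append((link.actor, link.target))
    return {place: sorted(pairs) for place, pairs in index.items()}


print(who_does_what_where(links))
```

Because each link also carries a date, the same records can later be filtered by time for the long-term, legacy view.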

Andrew Turner (@ajturner), formerly of GeoIQ, has joined ESRI as CTO of their Research & Development Center.

“Big Data” means different things to different people, but we know it’s huge. In a crisis, where situations are short-lived and life-threatening, data can help. In Haiti, crowdsourced information was really good, but it still needed someone on the ground to act on it.

GIS lets us count things and perform geospatial analysis of information. We need to evolve and build learning algorithms, because our current techniques are pretty easily fooled. @Racerboy8 provided useful data from his house during the Colorado wildfires, but his house wasn’t actually in danger. He was just being helpful.

At the NYC Marathon last year, FEMA looked at sensors throughout the crowd to visualize the crowd’s movement over time and space.

We can model situations before they occur, and detect communities in advance of a crisis.

Anahi Ayala (@anahi_ayala) works for Internews, which supports local media across the globe to empower communities. Anahi’s at the Center for Innovation and Learning, looking at how to incorporate new technologies.

Merging crowdsourced data with official information has proven hard because crowdsourced reports are difficult to verify. In the Ukrainian elections, they’re collecting information not only from social media, but also from trained electoral monitors and journalists. Users can see verified reports vs. untrusted sources on a map.

The team dissects all incoming information in an attempt to verify it. They look at the context, the content, and the source of the information. There are digital traces everywhere online: who are you already friends with, followed by, or directly engaging with?

The content itself can be verified. We can crowdsource, triangulate, follow up with the source, and check whether the weather in a submitted video matches the actual conditions.

Every event occurs in a context in a country. Reports can be verified based on knowledge of the existing situation in a place and time.
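The verification described here is judgment-driven, and the talk gives no scoring formula. Purely as an illustration of combining the three lenses (source, content, and context) to place reports on “verified” or “unverified” map layers, this sketch assigns hypothetical integer weights to hypothetical yes/no checks. The weights and the cutoff of 7 out of 10 points are arbitrary assumptions.

```python
# Hypothetical checks and integer weights (out of 10); real verification relies on human judgment.
WEIGHTS = {
    "source_known": 3,          # source: existing ties, history of reliable reports
    "content_corroborated": 4,  # content: triangulated, confirmed by follow-up, weather matches
    "context_plausible": 3,     # context: fits what is known about that place and time
}
VERIFIED_CUTOFF = 7


def score(checks: dict[str, bool]) -> int:
    """Sum the weights of the checks a report passed (missing checks count as failed)."""
    return sum(w for name, w in WEIGHTS.items() if checks.get(name, False))


def layer_for(checks: dict[str, bool]) -> str:
    """Choose the map layer a report belongs on."""
    return "verified" if score(checks) >= VERIFIED_CUTOFF else "unverified"


report_a = {"source_known": True, "content_corroborated": True, "context_plausible": False}
report_b = {"source_known": True, "context_plausible": True}
print(layer_for(report_a), layer_for(report_b))
```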

Everyone’s adopting their own verification methods. Yes, falsification of information is always possible. But so is verification. Machine learning makes verification faster and cheaper for organizations. The question today is not whether or not you can verify information, but how to make it of high enough quality and timely enough to be acted upon.

Kuo-Yu Slayer Chuang (@darkensiva) goes by Slayer. He brings us back to the Titanic, which sank in a time when SOS technology wasn’t standardized. Open GeoSMS is a standard that combines SMS and location. Smartphones can embed all sorts of geo information in messages. They are designing a user-centric application to make collaboration easier.
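To make the Open GeoSMS idea concrete, here is a small parser. It assumes the message embeds a URL whose query string carries `q=<lat>,<lon>` followed by a `GeoSMS` marker, which is our reading of published Open GeoSMS examples rather than a full implementation of the standard; the function name `parse_geosms` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed message shape: a URL containing "q=<lat>,<lon>" followed by a "GeoSMS" marker.
GEOSMS_PATTERN = re.compile(
    r"https?://\S*?[?&]q=(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)\S*?GeoSMS",
    re.IGNORECASE,
)


def parse_geosms(message: str) -> Optional[Tuple[float, float, str]]:
    """Return (lat, lon, free_text) if the SMS carries a location, else None."""
    match = GEOSMS_PATTERN.search(message)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Reject coordinates that cannot exist on Earth.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    # Everything outside the URL is treated as the human-readable note.
    text = (message[:match.start()] + message[match.end():]).strip()
    return lat, lon, text
```

Because the location travels as ordinary text, a plain feature phone can forward it, while a smartphone or server can pull out coordinates for mapping.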

Lars Peter Nissen (@ACAPSproject) asks: how do we make sure the data we collect actually becomes useful for decision-makers? Mistakes happen in large-scale, multi-agency responses, and potential impact is stymied. In Haiti, we used only a third of the data gathered, which wasted much of the effort spent collecting it.

Disasters are never what we expect. Decisions are made when they have to be made, not at some ideal point in time. And we’ll always have massive information gaps, with plenty of known unknowns and unknown unknowns.

Three principles:
1. Know what you need to know. It sounds obvious, but do you?
2. Make sense, not data. Don’t collect data if you don’t know what you’ll use it for.
3. Don’t be precisely right, be approximately right.

With Internews, ACAPS designed the Global Emergency Overview (GEO). A snapshot gives you a quick overview of what’s happening globally. Short summaries provide a basic understanding of specific crises. Then you can drill down into 20-page analyses that help you distinguish between different types of needs in the field.

Sara Farmer (@bodaceacat) is a core team member at the Standby Task Force, which is two years old and has 1,000 volunteers. They’re generally known for turning social media feeds into maps. But they’re not just about information; they’re also about knowledge and analysis. They support HXL and other standards.

Their Disaster Needs Analysis (DNA) reports provide information about locales before disasters occur. Teams investigated and mapped the data available for each country. They created baseline indicator sets, and set up a workflow to collect, store, and distribute the data. A fleet of scrapers converted online data into machine-readable tables. The data was cleaned into standard formats for country names, dates, and geo references. Gaps are filled in with estimates and proxies. Experts (through Hunchworks) also help fill in the data.
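The cleaning step described for the DNA reports (standard country names and dates, gaps filled with proxies) might look something like the sketch below. The alias table, accepted date formats, row fields, and the `quality` flag are illustrative assumptions; the Task Force’s real scrapers and schema were not described in the talk.

```python
from datetime import datetime
from typing import Optional

# Hypothetical alias table; a real pipeline would use a full ISO 3166 gazetteer.
COUNTRY_ALIASES = {
    "cote d'ivoire": "CIV", "ivory coast": "CIV",
    "drc": "COD", "democratic republic of the congo": "COD",
    "burma": "MMR", "myanmar": "MMR",
}
# Assumed input formats; note "%d/%m/%Y" treats slashed dates as day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def normalize_country(name: str) -> Optional[str]:
    """Map a scraped country name to an ISO alpha-3 code, or None if unknown."""
    return COUNTRY_ALIASES.get(name.strip().lower())


def normalize_date(raw: str) -> Optional[str]:
    """Convert any accepted date format to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def fill_gap(value: Optional[float], proxy: Optional[float]) -> tuple:
    """Prefer the measured value, fall back to a proxy, and record which was used."""
    if value is not None:
        return value, "measured"
    if proxy is not None:
        return proxy, "proxy"
    return None, "missing"


def clean_row(row: dict) -> dict:
    value, quality = fill_gap(row.get("value"), row.get("proxy"))
    return {
        "country": normalize_country(row["country"]),
        "date": normalize_date(row["date"]),
        "value": value,
        "quality": quality,
    }
```

Keeping a `quality` flag next to each filled value lets analysts see which numbers are measurements and which are stand-ins.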

Phil Harris (@geofeedia) sees every demographic using social media more and more. It’s not just Facebook and Twitter – there are image-rich services that didn’t exist two years ago. Smartphones drive more user generated content.

They set geofences on London and monitored the Olympics, aggregating 175,000 posts from YouTube, Twitter, Flickr, Picasa, and Instagram. Instagram was a surprisingly large source of posts (36%). Only 31% of the posts contained keywords like London or Olympics, so the majority of these posts wouldn’t have been found by traditional keyword search. Geofeedia sells a service to monitor social media.
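A geofence is essentially a point-in-polygon test on each post’s coordinates. The sketch below uses the standard ray-casting test on raw longitude/latitude (adequate for a city-sized fence, ignoring Earth’s curvature), then measures what share of the fenced posts a keyword search would also have caught. It is not Geofeedia’s implementation, and the post records and keywords are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def in_geofence(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting point-in-polygon test on planar lon/lat coordinates."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point to the east cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def keyword_share(posts: Sequence[Dict], fence: List[Point], keywords: Sequence[str]) -> float:
    """Fraction of geofenced posts that a keyword search would also have found."""
    fenced = [p for p in posts if in_geofence(p["location"], fence)]
    if not fenced:
        return 0.0
    kws = [k.lower() for k in keywords]
    hits = sum(any(k in p["text"].lower() for k in kws) for p in fenced)
    return hits / len(fenced)
```

A low share here is exactly the London finding: location catches posts that never mention the event by name.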

Colleen McCue (@geoeye) works in geospatial predictive analytics. Again, we can learn a lot from private sector marketers. Product positioning in the supermarket is critical. Location also matters when you’re talking about bad actors in the humanitarian world. The Lord’s Resistance Army has struck across the borders of several African nations. We can target them based on past behavior. And we can segment the population by crime types, like marketers. Looting, abduction, incidental homicides, and murders produce different geospatial patterns. Individual factors can influence violent crimes. Roads and porters are critical to a successful abduction. IDP camps and other population clusters are attractive to the LRA, just as banks are attractive to thieves.

“Distance from murders” turns out to be a major factor in instances of isolated murders, suggesting a key behavioral difference from incidental homicides that occur over the course of another crime. The group is compiling signature profiles of different behaviors, and using advanced analytics to produce actionable recommendations prior to events occurring.
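The talk doesn’t say exactly how “distance from murders” is computed. One plausible definition, offered here as an assumption, is the great-circle distance from each event to the nearest earlier murder, using the haversine formula. The event fields (`time`, `loc`, `type`) are hypothetical.

```python
import math
from typing import List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_from_prior_murders(events: List[dict]) -> List[Optional[float]]:
    """For each event in time order, km to the nearest earlier murder (None if none yet)."""
    features: List[Optional[float]] = []
    murders_so_far: List[Tuple[float, float]] = []
    for event in sorted(events, key=lambda e: e["time"]):
        if murders_so_far:
            features.append(min(haversine_km(event["loc"], m) for m in murders_so_far))
        else:
            features.append(None)
        # Only murders extend the reference set; other crime types do not.
        if event["type"] == "murder":
            murders_so_far.append(event["loc"])
    return features
```

Using only earlier events keeps the feature honest for prediction: it never peeks at incidents that hadn’t happened yet.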

Kalev Leetaru brings us back to 38,000 years ago, where we find the first written records. 550 years ago, we get the printing press. Today, we’re producing incredible amounts of information and written records as a species.

The history of conflict can teach us about the present. We can visualize NGO reports and the global media tone towards a nation like Egypt or a leader like Mubarak to see when they’ve lost global credibility. We can track geographic affinity for a leader like Osama bin Laden. We can map conflicts within a nation by various factors.

Dave Warner is “dangerously over-educated and works in a memo-free environment.” He wants to make smart people smarter and explode the dots on the map into more complicated pins that contain significantly more information. His maps look much more like something out of a strategy video game than GIS software.

Dave mapped an audience by their Wikipedia entries, and academics by geography across the country.

Social media as a giant human sensor network critical to the survival of the human race

“The tribe has grown,” Robert Kirkpatrick notes, looking out at the audience.

Robert’s the Director of UN Global Pulse. There’s been a lot of data in the crisis mapping community for some time, but it required proactive collection. Now, we can observe in real time and on a massive scale what people are already sharing with each other. And perhaps even more transformative is the “digital exhaust”, the data we generate just by using services around the world. We interact not just with each other, but with businesses and maps and search. The private sector has spent trillions of dollars building the cloud, and now we have human sensor networks that map immediately to human needs. We can passively observe collective human behavior in real time.

This is not the world in which the United Nations was founded. Everything moves faster now. The pace of change has absolutely exploded, and I don’t think it’s ever going to slow down. Realtime isn’t just faster; it’s fundamentally different, and requires different rules.

2011 was a pretty significant year for data: more data was created that year than in all of human history before it. Facebook hit a billion users last week. If social networks were countries, they’d rank among the six most populous countries in the world.

Mobiles, cloud, and social combine to produce ever larger amounts of data.

Jakarta is the tweeting-est city on earth. When we map the tweets, it’s clear that Indonesians spend a lot of time sitting in traffic. They take tons of photos, too.

The point of all this data?

1. Better early warning to detect trends, anomalies, and allow earlier response
2. Real-time awareness, with a better picture of needs supporting more effective planning and implementation
3. Real-time feedback to understand sooner where needs are changing, or not being met, to allow for more rapid iteration

The point of real-time data is that we get the information in time to do something with it. The feedback loop allows us to intervene and change the story before it’s over. Months-old nutrition data does little to help hungry people.
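Early warning (point 1 above) often starts with something as simple as flagging days when a signal jumps well outside its recent range. The sketch below is a rolling z-score detector, a common baseline technique rather than Global Pulse’s method; the input could be, say, daily counts of posts about a food, and the 7-observation window and threshold of 3 are arbitrary choices.

```python
from statistics import mean, stdev
from typing import List


def flag_anomalies(series: List[float], window: int = 7, threshold: float = 3.0) -> List[int]:
    """Return indices whose value lies more than `threshold` standard deviations
    from the mean of the preceding `window` observations."""
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a spread")
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline: any change at all is unusual.
            if series[i] != mu:
                flagged.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged
```

Because the baseline only looks backward, the check can run as each new day of data arrives, which is what makes the feedback loop fast.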

There are many research areas to explore, from preparedness to migration to climate change adaptation. Huge populations are moving to urban slums. Program Monitoring and Evaluation keeps coming up: can we get faster feedback on whether our programs are having the desired effect, rather than waiting until the three-year program cycle is over? And can we use some of these data techniques to approximate statistics? Mobile providers have modeled phone usage and can predict a user’s gender and age with 80% accuracy.

As an example, let’s look at how we can use mobile networks as drought sensors in the Sahel.

Problems, we’ve got
Privacy, validation, access to data, and that most existential of questions: does it actually change anything? If the organization responsible doesn’t act, does it matter if we have all of this data and its insights?

Big data is a huge human rights issue. As a raw public good, it’s also a source of tremendous risk. Three draft guidelines:

  1. Never analyze personally-identifiable data
  2. Never analyze confidential data
  3. Never seek to re-identify individuals
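One way to put the first and third guidelines into practice is to publish only aggregate counts, refuse to group by identifying fields, and suppress small groups that could single someone out. This small-cell suppression sketch is a standard disclosure-control technique, not a description of Global Pulse’s process; the field names and the minimum cell size of 10 are assumptions, and suppression alone does not guarantee anonymity.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical fields treated as personally identifiable; never used for grouping.
PII_FIELDS = {"name", "phone_number", "handle"}
MIN_CELL_SIZE = 10  # assumed threshold below which a group is too small to publish


def aggregate(records: Iterable[dict], keys: Tuple[str, ...]) -> Dict[tuple, int]:
    """Count records per group of `keys`, dropping groups smaller than MIN_CELL_SIZE."""
    if PII_FIELDS & set(keys):
        raise ValueError("refusing to group by personally identifiable fields")
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    # Small cells are withheld rather than published, reducing re-identification risk.
    return {group: n for group, n in counts.items() if n >= MIN_CELL_SIZE}
```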

Rob places himself somewhere between Mark Zuckerberg and the nation of Germany on the spectrum from “data privacy is pointless” to “data privacy is sacrosanct.” Data is a public good, but it comes with risks.

User-generated data poses additional challenges. People lie. Dark areas on a social network map don’t represent areas not worth tweeting about; our sensor network is unevenly distributed.

NLP isn’t great at sentiment, particularly sarcasm or irony. A tweet can carry gigabytes of context that is instantly apparent to a brain with billions of neurons, but means nothing to a computer.

Behavioral data has selection bias from the start.

Media coverage itself drives behavior change, so it’s tricky to measure cause and effect.

Apophenia: we sometimes think we see trends where there are none. Correlation is not causation.

This crowd data is faster. It’s not a replacement for the hard evidence and existing methods. But its speed can change outcomes.

Telescopes and macroscopes each have their own issues. 96% of the universe is dark energy or dark matter, which doesn’t reflect light. Likewise, most of our data is behind corporate firewalls or is otherwise unshareable. So much of the information about people is not available to the people charged with representing them and ensuring they are healthy. Can we find a way to share data that doesn’t compromise business or privacy?

The future of the human race

This isn’t just about corporate social responsibility. If we could get the private sector to share some of this data in a way that doesn’t create business risk… Many businesses recognize that this volatile world is a terrible climate for business. We need a global, real-time public/private data commons. Which models, tools, and technologies would allow a data commons to work for everyone?

“We think a data commons is the only way we are going to survive as a species for the next 100 years.”

Getting there will require safe space for experimentation. Many attempts will not work, but need to be tried.

Pulse Lab Jakarta is a shared facility with data scientists, research fellows, and private companies with tools and data to share. They’re actively recruiting partners to experiment in the most social-media-rich area of the world.

Additional labs will follow. Kampala, in East Africa, is less of a social media hub, but offers extraordinary opportunities for financial data. Makassar, in South Sulawesi, and Medan, in North Sumatera, are also regions of focus.

Issues of interest include food prices, urban poverty, and financial behavior.

The UN is a platform, which requires partners. They partner with governments to establish Pulse Labs, conduct research, and make it available to the world.

Their approach uses different data sources for various purposes. Twitter’s great for food prices, and useless for fears about job security. They correlate this information with official data sources, and visualize trends and map the results.

Some tools:

  1. ForSight gets a shoutout for monitoring conversations based on keywords. The global soybean shortage led to a big spike in conversations about tofu and tempeh.
  2. Global Pulse also uses SAS’s Social Media Analytics and Text Miner tools
  3. They’re using linear regression analysis of social media conversations to determine if they can predict inflation of certain types of food. When prices go up, people tweet about that food. Except eggs. Indonesians tweet more about eggs as the price drops (“Mmm…omelet”).
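The regression idea in the last item can be sketched with ordinary least squares in plain Python, regressing a food price on the volume of conversations about that food. Any series you feed it here are illustrative; this is not Global Pulse’s data or model. A positive slope matches the usual “prices up, tweets up” pattern, while a negative one would resemble the egg case.

```python
import math
from typing import List, Tuple


def ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(x: List[float], y: List[float]) -> float:
    """Correlation in [-1, 1]: sign gives direction, magnitude gives strength."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)
```

With real data you would also need to handle lags, seasonality, and confounders such as media coverage before reading the slope as predictive.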

Does this finding apply elsewhere? We’re going to need to map what these findings mean, and where they apply.

Our social media signals are getting stronger. People talk about basic needs using social media more than they used to. And the temporal correlation is growing stronger, too. Greater distribution of the human sensor network improves our results.

Our weather tools are pretty sophisticated because they pull from a large number of weather stations AND algorithmic detectors. Could we do the same for the monitoring of human networks?

This community is at an inflection point. The data’s everywhere, the tools are taking off, and the future’s exciting.

A word from the #ICCM Sponsors

  • Niels Holm-Nielsen, of the World Bank and the Global Facility for Disaster Reduction and Recovery, says this event is quite important for them. The annual cost of disasters has grown over the past several decades, much faster than the global economy (not that the global economy is growing too quickly these days). This is a problem from a development point of view: for about half of the world’s nations, disasters pose a significant hurdle to development. Niels is heartened by the people, ideas, and technologies represented in the room. The World Bank cannot address this problem on its own, and looks to build stronger partnerships and relations with the emerging field of volunteer technology communities.
  • Salim Saway works on ESRI’s Global Affairs team. They’ve attended all four CrisisMappers conferences, and sponsored the last three. They support the disaster relief community with data, tools, and licenses.
  • Christiaan Adams is here representing Google’s Crisis Response team. They’re excited by the data, collection of imagery, open tools, standards for sharing, and other developments. He says that crises force us to think faster and more creatively than usual, and encourage an environment of community and collaboration.
  • Tara Cordyack of GeoEye points out the critical need for commercially-available satellite imagery in a crisis. Time and time again, having this imagery leads to lives saved, money saved, and infrastructure saved. They’re happy to support this community with imagery and analytical services.
  • Dan Palmer, professor of Computer Science at John Carroll University. They hosted the first CrisisMappers conference. They’re a Jesuit university focused on helping others, social justice, and engaging with the world. On campus, they have a Center for Crisis Mapping, working in crowdsourcing and spatial analysis. It’s not a single academic discipline or department. Sociologists, computer scientists, and many more fields are brought into the mix. They are planning an expedition to Uganda to map resources.
  • Camille Cassidy represents DigitalGlobe, which also provides Earth imagery, with three satellites serving a range of uses.