Unpacking open data: power, politics and the influence of infrastructures

Liveblog of a #Berkman lunch written with Erhardt Graeff.

Tim Davies (@timdavies) is a social researcher with interests in civic participation and civic technologies. He has spent the last five years focussing on the development of the open government data landscape around the world, from his MSc work at the Oxford Internet Institute on Data and Democracy, the first major study of data.gov.uk, through to leading a 12-country study on the Emerging Impacts of Open Data in Developing Countries for the World Wide Web Foundation.

A broad coalition of companies, governments, and other entities has come together to open data. This work is based on the belief that opening data creates myriad benefits for society, including transparency and economic value.

Does open data reconfigure power relationships in the political space? The gap between the past, the promise, and the reality of open data remains wide.


For the last 5 years, Tim’s been following the spread of open data policy and practice, initially in the UK and then increasingly around the world. He’s starting from the desire that those people affected by decisions have a role in shaping those decisions, and asks whether there are certain open data structures and practices that achieve this normative goal.

Today, the evidence for open data’s impact is often anecdotal. Tim seeks to establish not only whether open data has impact, but also which conditions improve the chances that it does.

He puts forward three null hypotheses:
H1: Open data is not delivering widespread inclusive civic engagement;
H2: Open data is not delivering scalable innovation;
H3: Open data is not substantially shifting the balance of power between citizen and state.

The Open Data Barometer

A comparison of international open data policies

Open Data Research following the usage of open data in 12 developing countries and developing standards: http://standard.open-contracting.org/.


Open data’s roots can be found in the civic technology movement. Tim harkens back to Sebastopol, 2007, when Carl Malamud and others set forth principles for open government data. They had built civic tools, but were often frustrated by a lack of access to the data needed to power the tools. They were interested in very specific datasets, but this didn’t make for a very compelling movement, so a broader movement emerged.

This open government data strand joined another thread, particularly strong in Europe: the public sector information industry. In Europe, state data was often open, with downstream value chains making use of the information. (is this what Tim said?)

Civic technologists sought free access to data, whereas the public sector information industry sought to capture value in exchange for providing government data. In the UK and US, governments were dealing with political crises and trying to stem democratic disaffection. Open data portals emerged as a potential solution.

The various factions converged on what Tim calls the standard model of open data. The big tent of open data was defined around:

  1. Pro-actively published
  2. Machine readable
  3. Legally re-usable

Machine readability concerns the format the data is in (CSV or XML files are preferred, for example) but doesn’t look at how the data is structured internally. Legal re-usability comes from the explicit application of an open license.

This creates a fairly binary definition of what is and is not open data. Any personally identifying data, or data derived from such data, cannot be included in such systems. We’ve restricted the open data space with this dichotomy.
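To make the binary character of this definition concrete, here is a minimal sketch of the three criteria above as a yes/no check. It is not something Tim presented: the `Dataset` fields, the list of machine-readable formats, and the list of open licenses are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative assumptions only; neither list comes from the talk.
MACHINE_READABLE_FORMATS = {"csv", "xml", "json"}
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "OGL-UK-3.0"}


@dataclass
class Dataset:
    """Hypothetical metadata record for one published dataset."""
    name: str
    proactively_published: bool      # released without waiting for a request
    file_format: str                 # e.g. "csv" or "pdf"
    license_id: Optional[str]        # None when no explicit license is applied
    contains_personal_data: bool = False


def meets_standard_model(ds: Dataset) -> bool:
    """Return True only if every criterion holds; there is no partial credit."""
    machine_readable = ds.file_format.lower() in MACHINE_READABLE_FORMATS
    legally_reusable = ds.license_id in OPEN_LICENSES
    return (
        ds.proactively_published
        and machine_readable
        and legally_reusable
        and not ds.contains_personal_data
    )


budget = Dataset("budget", True, "CSV", "CC-BY-4.0")
scanned_report = Dataset("scanned report", True, "pdf", "CC-BY-4.0")
print(meets_standard_model(budget), meets_standard_model(scanned_report))
```

Note that the check only looks at the file format and the license label. As the text says, it has nothing to say about how well the data is structured internally.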

The theory of change is you-centric. Data comes out of government, private groups use it in apps or other intermediary ways, and you get impact. It’s a domino model. Deeper questions emerge. What kind of change are we actually creating?


Is the standard model being applied around the world?

The power to choose what is shared remains with governments. Tim shows a graph of datasets that have been opened across 77 countries. The data most often opened, such as census and trade data, are the least sought after by open data requesters, whereas much of the most demanded data, such as maps and transport data, is opened less often. And the data with the greatest potential to hold government accountable, around company and land issues, are the least opened.

Very few of the open datasets Tim analyzed actually meet the three central goals of proactive publication, machine readability, and legal re-use.

Brazil’s open data

The unequal application of the standard model becomes evident as Tim looks at the policies in each country he has studied. The initial launches of open data policies in the US, UK, and India were framed in democratic terms, whereas later open data policy rhetoric has centered on economic potential.

The portal is the common denominator across each country. But other terms, like making data freely available, or specifying its license, are less standard.

Currently, countries are focusing a lot on the “high-value” datasets that might have the greatest impact. Tim points out the UK’s National Information Infrastructure and Denmark’s Good Basic Data for Everyone.

Infrastructures are generally invisible in our daily lives, only becoming visible when they break down. They set the frameworks within which actions take place. And their malleability is limited because of the knock-on effects of changing them. But every so often there are opportunities to make major change.

Standards tend to proliferate. But in open government, the governments themselves are helping set standards. The Open Government Partnership is encouraging common adoption of standards, and the G8 Open Data Charter nudges governments towards interoperability.

Open data standards are often treated as purely technical issues, but Tim cites Interop in arguing that our decisions around what to make interoperable, or not, go much deeper than technology.

Tim has been fascinated by the standardization of the contracting process and how that shapes the outcomes for open data standards. Standards don’t just shape the data the government publishes — they also shape how government works with data internally. The Open Contracting Data Standard model encourages governments to release data on an ongoing, iterative basis, rather than dump large numbers of datasets online.
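As a rough illustration of that iterative model, the sketch below treats each publication about a contracting process as a small "release" and folds the releases, oldest first, into a single current view. The field names `ocid`, `id`, `date`, and `tag` follow the Open Contracting Data Standard. The sample values and the naive merge are assumptions made for illustration, and the merge is far simpler than the standard's own merging rules.

```python
from copy import deepcopy


def deep_merge(base: dict, update: dict) -> dict:
    """Return a copy of base with update applied; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are simply overwritten by the newer release.
            merged[key] = deepcopy(value)
    return merged


def compile_releases(releases: list) -> dict:
    """Fold releases in date order so that later information wins."""
    compiled: dict = {}
    for release in sorted(releases, key=lambda r: r["date"]):
        compiled = deep_merge(compiled, release)
    return compiled


# Made-up releases for one contracting process (same ocid, different ids).
releases = [
    {"ocid": "ocds-example-0001", "id": "1", "date": "2014-01-10T00:00:00Z",
     "tag": ["tender"],
     "tender": {"title": "Road repairs", "status": "active"}},
    {"ocid": "ocds-example-0001", "id": "2", "date": "2014-03-02T00:00:00Z",
     "tag": ["award"],
     "tender": {"status": "complete"},
     "awards": [{"id": "a1", "value": {"amount": 50000, "currency": "GBP"}}]},
]

current = compile_releases(releases)
print(current["tender"])
```

Each release stays small and can be published as soon as something happens, while anyone who wants the full picture can rebuild it from the stream.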

There is an opportunity to see the opening of data as part of a broader process that rethinks government infrastructures towards openness. But there are many threats to this civic potential. The shift from open data as a civic virtue towards economic utility means that government partners are more likely to be economic actors. We could end up locked into infrastructures that don’t serve the public good.

Recommendations:

  1. We need to situate data in context. It shouldn’t be dumped into abstracted, decontextualized data portals.
  2. We should move from epiphenomenal data to active data: away from releasing coincidentally associated data and towards data that is connected to specific activity, in the same way public registers traditionally documented the work of government.
  3. We should shift from “raw data now” to an inclusive information infrastructure where we consider who’s involved in making these decisions.

Question & Answer

Q: [About who is at the table on the standardization process and why.]

A: Governments will identify a data sharing need, or the open data movement will generate pressure for some set of data. This triggers a standardization process. This does not necessarily mean that corporate interests enter here and shape that standard. De facto standardization is the norm: the standard is set by whoever moves first in a space, typically a group that commits to using the data in some public way, as with Google Maps data.

Q: I’m interested in how interested cities are in the open data movement. The public’s running with it and asking questions of cities that cities aren’t capable of responding to.

A: A number of case studies have looked at cities and what we’re probably seeing is that those individuals inside government who are directly involved in open data policy are creating space for collaboration between parties, but this is usually an accidental artifact of open data’s culture and positive relationships.

Q: The power of the permalink in open data—openness as a stream rather than a drop. In NY, massage and physical therapists mobilized against regulations that would affect their business after they were able to get a permalink to the legislation. But there is an effort against permalinking to legislative data to protect business interests in selling legal data. What are your views on permalinks and the stream versus the drop?

A: That’s an absolutely key point on the way the technical and web infrastructures play out. A permanent URL to online documents is a good baseline to start with, because that makes the document a public object we can discuss.

Q: Do you have a definition of civic hacking and what role should it play in open data?

A: Jolly good question. I’m not sure I have one. If I were to create a definition of civic hacking, it would be broad enough to include not just code, but also information about local communities and rethinking how those communities reimagine how those spaces should work.

Q: Interested in the shift from the rhetoric of democracy to economy. I’m wondering if one of the dangers is like when free becomes a business model, does open become a business model that governments can use—and how does that neo-liberal turn affect the democratic qualities of government?

A: The key work here is that of Joanne Bates, who wrote a great PhD thesis on the UK’s open data policy with a critique of its neoliberal turn, where data is provided in place of services, or where government data becomes a subsidy to all sorts of private industries. These are important questions: How to prioritize what data gets released? How is the decision shaped and who is involved? There is an “open data user group” in the UK that is made up largely of commercial interests and that invites civil society orgs to participate.

Q: How do you have a nuanced discussion around inclusive information infrastructures if the mantra is so uniformly open = good?

A: One way to deal with that is to create broader language that incorporates other modes of sharing. There is a need for much better language in this space. Open language has been appropriated to talk about private data sharing and other things which clouds the issue.

Q: Locally, there is a company called Bridj that looks at how public transportation is utilized in the city. They want to create an opportunity for small buses for hire on-demand. This could create heavy congestion at bus stops in the city of Cambridge. They are cherry picking from other public transportation companies. Bridj is even offering to give what it finds back to the city as data, feeling a responsibility to do so. But the city is typically a data producer, not a data ingester. Have you seen other examples of that, such as where the city is in the position to ingest data?

A: This is a great example, and the key idea here is that governments need to think about two-way flows of data, and they’re not often set up to do this. In the UK, OpenStreetMappers used open data to improve the government’s inaccurate stop locations, but there’s been little evidence that the government has successfully re-integrated the volunteer corrections.

Q: Civil society institutional investors hold lots of money and power. That information isn’t machine readable, but it does exist in IRS 990 forms. Civic education can map, manage, and analyze these pooled assets and make these resources part of the public conversation. In the corporate world right now, there’s a huge amount of effort going into standardizing corporate disclosure and reporting across economic, environmental, and other data. As the platforms get created, my focus is on the emerging educational infrastructure that will be needed for those intermediaries working with citizens to re-discover the voice and power they already have. How do we engage and empower “plain people” to understand the data?

A: It is key to look at those intermediary abilities. It’s not just about who those individuals are but how they fit into a wider ecosystem. Any attempt to take data and mediate it changes it. What kinds of organizations can engage at the grassroots? One of the findings in the developing countries research was a need to do capacity building in local communities, not just with governments, recognizing that it’s a long, multi-year process. We don’t have good methods for creating the relevant communities of practitioners and there is a need to bridge cultures in order to do that work.

Q: Where is the talk of standardization in the health world, internationally or locally, or is there none?

A: That’s not something I know a lot about. I know there is an issue around what is public data and what is patient data and how you maintain privacy.

Q: Did your research notice a difference in ‘open standard’ adequacy between national and municipal datasets?

A: The cities case studies from the ODD project would help here — I haven’t looked at that.