Data Collaboratives: Matching Demand with Supply of (Corporate) Data to solve Public Problems


Blog by Stefaan G. Verhulst, IrynaSusha and Alexander Kostura: “Data Collaboratives refer to a new form of collaboration, beyond the public-private partnership model, in which participants from different sectors (private companies, research institutions, and government agencies) share data to help solve public problems. Several of society’s greatest challenges — from climate change to poverty — require greater access to big (but not always open) data sets, more cross-sector collaboration, and increased capacity for data analysis. Participants at the workshop and breakout session explored the various ways in which data collaborative can help meet these needs.

Matching supply and demand of data emerged as one of the most important and overarching issues facing the big and open data communities. Participants agreed that more experimentation is needed so that new, innovative and more successful models of data sharing can be identified.

How to discover and enable such models? When asked how the international community might foster greater experimentation, participants indicated the need to develop the following:

· A responsible data framework that serves to build trust in sharing data would be based upon existing frameworks but also accommodates emerging technologies and practices. It would also need to be sensitive to public opinion and perception.

· Increased insight into different business models that may facilitate the sharing of data. As experimentation continues, the data community should map emerging practices and models of sharing so that successful cases can be replicated.

· Capacity to tap into the potential value of data. On the demand side,capacity refers to the ability to pose good questions, understand current data limitations, and seek new data sets responsibly. On the supply side, this means seeking shared value in collaboration, thinking creatively about public use of private data, and establishing norms of responsibility around security, privacy, and anonymity.

· Transparent stock of available data supply, including an inventory of what corporate data exist that can match multiple demands and that is shared through established networks and new collaborative institutional structures.

· Mapping emerging practices and models of sharing. Corporate data offers value not only for humanitarian action (which was a particular focus at the conference) but also for a variety of other domains, including science,agriculture, health care, urban development, environment, media and arts,and others. Gaining insight in the practices that emerge across sectors could broaden the spectrum of what is feasible and how.

In general, it was felt that understanding the business models underlying data collaboratives is of utmost importance in order to achieve win-win outcomes for both private and public sector players. Moreover, issues of public perception and trust were raised as important concerns of government organizations participating in data collaboratives….(More)”

6 lessons from sharing humanitarian data


Francis Irving at LLRX: “The Humanitarian Data Exchange (HDX) is an unusual data hub. It’s made by the UN, and is successfully used by agencies, NGOs, companies, Governments and academics to share data.

They’re doing this during crises such as the Ebola epidemic and the Nepal earthquakes, and every day to build up information in between crises.

There are lots of data hubs which are used by one organisation to publish data, far fewer which are used by lots of organisations to share data. The HDX project did a bunch of things right. What were they?

Here are six lessons…

1) Do good design

HDX started with user needs research. This was expensive, and was immediately worth it because it stopped a large part of the project which wasn’t needed.

The user needs led to design work which has made the website seem simple and beautiful – particularly unusual for something from a large bureaucracy like the UN.

HDX front page

2) Build on existing software

When making a hub for sharing data, there’s no need to make something from scratch. Open Knowledge’s CKANsoftware is open source, this stuff is a commodity. HDX has developers who modify and improve it for the specific needs of humanitarian data.

ckan

3) Use experts

HDX is a great international team – the leader is in New York, most of the developers are in Romania, there’s a data lab in Nairobi. Crucially, they bring in specific outside expertise: frog design do the user research and design work;ScraperWiki, experts in data collaboration, provide operational management.

ScraperWiki logo

4) Measure the right things

HDX’s metrics are about both sides of its two sided network. Are users who visit the site actually finding and downloading data they want? Are new organisations joining to share data? They’re avoiding “vanity metrics”, taking inspiration from tech startup concepts like “pirate metrics“.

HDX metrics

5) Add features specific to your community

There are endless features you can add to data hubs – most add no value, and end up a cost to maintain. HDX add specific things valuable to its community.

For example, much humanitarian data is in “shape files”, a standard for geographical information. HDX automatically renders a beautiful map of these – essential for users who don’t have ArcGIS, and a good check for those that do.

Syrian border crossing

6) Trust in the data

The early user research showed that trust in the data was vital. For this reason, anyone can’t just come along and add data to it. New organisations have to apply – proving either that they’re known in humanitarian circles, or have quality data to share. Applications are checked by hand. It’s important to get this kind of balance right – being too ideologically open or closed doesn’t work.

Apply HDX

Conclusion

The detail of how a data sharing project is run really matters….(More)”

Citizen-Generated Data and Governments: Towards a Collaborative Model


Civicus: “…we’re very happy today to launch “Citizen-Generated Data and Governments: Towards a Collaborative Model”.

This piece explores the idea that governments could host and publish citizen-generated data (CGD) themselves, and whether this could mean that data is applied more widely and in a more sustainable way. It was inspired by a recent meeting in Buenos Aires with Argentine civil society organizations and government representatives, hosted by the City of Buenos Aires Innovation and Open Government Lab (Laboratorio de innovación y Gobierno Abierto de la Ciudad de Buenos Aires).

Screen Shot 2015-10-26 at 20.58.06

The meeting was organized to explore how people within government think about citizen-generated data, and discuss what would be needed for them to consider it as a valid method of data generation. One of the most novel and exciting ideas that surfaced was the potential for government open data portals, such as that managed by the Buenos Aires Innovation Lab, to host and publish CGD.

We wrote this report to explore this issue further, looking at existing models of data collaboration and outlining our first thoughts on the benefits and obstacles this kind of model might face. We welcome feedback from those with deeper expertise into different aspects of citizen-generated data, and look forward to refining these thoughts in the future together with the broader community…(More)”

Harnessing the Data Revolution for Sustainable Development


US State Department Fact Sheet on “U.S. Government Commitments and Collaboration with the Global Partnership for Sustainable Development Data”: “On September 27, 2015, the member states of the United Nations agreed to a set of Sustainable Development Goals (Global Goals) that define a common agenda to achieve inclusive growth, end poverty, and protect the environment by 2030. The Global Goals build on tremendous development gains made over the past decade, particularly in low- and middle-income countries, and set actionable steps with measureable indicators to drive progress. The availability and use of high quality data is essential to measuring and achieving the Global Goals. By harnessing the power of technology, mobilizing new and open data sources, and partnering across sectors, we will achieve these goals faster and make their progress more transparent.

Harnessing the data revolution is a critical enabler of the global goals—not only to monitor progress, but also to inclusively engage stakeholders at all levels – local, regional, national, global—to advance evidence-based policies and programs to reach those who need it most. Data can show us where girls are at greatest risk of violence so we can better prevent it; where forests are being destroyed in real-time so we can protect them; and where HIV/AIDS is enduring so we can focus our efforts and finish the fight. Data can catalyze private investment; build modern and inclusive economies; and support transparent and effective investment of resources for social good…..

The Global Partnership for Sustainable Development Data (Global Data Partnership), launched on the sidelines of the 70th United Nations General Assembly, is mobilizing a range of data producers and users—including governments, companies, civil society, data scientists, and international organizations—to harness the data revolution to achieve and measure the Global Goals. Working together, signatories to the Global Data Partnership will address the barriers to accessing and using development data, delivering outcomes that no single stakeholder can achieve working alone….The United States, through the U.S. President’s Emergency Plan for AIDS Relief (PEPFAR), is joining a consortium of funders to seed this initiative. The U.S. Government has many initiatives that are harnessing the data revolution for impact domestically and internationally. Highlights of our international efforts are found below:

Health and Gender

Country Data Collaboratives for Local Impact – PEPFAR and the Millennium Challenge Corporation(MCC) are partnering to invest $21.8 million in Country Data Collaboratives for Local Impact in sub-Saharan Africa that will use data on HIV/AIDS, global health, gender equality, and economic growth to improve programs and policies. Initially, the Country Data Collaboratives will align with and support the objectives of DREAMS, a PEPFAR, Bill & Melinda Gates Foundation, and Girl Effect partnership to reduce new HIV infections among adolescent girls and young women in high-burden areas.

Measurement and Accountability for Results in Health (MA4Health) Collaborative – USAID is partnering with the World Health Organization, the World Bank, and over 20 other agencies, countries, and civil society organizations to establish the MA4Health Collaborative, a multi-stakeholder partnership focused on reducing fragmentation and better aligning support to country health-system performance and accountability. The Collaborative will provide a vehicle to strengthen country-led health information platforms and accountability systems by improving data and increasing capacity for better decision-making; facilitating greater technical collaboration and joint investments; and developing international standards and tools for better information and accountability. In September 2015, partners agreed to a set of common strategic and operational principles, including a strong focus on 3–4 pathfinder countries where all partners will initially come together to support country-led monitoring and accountability platforms. Global actions will focus on promoting open data, establishing common norms and standards, and monitoring progress on data and accountability for the Global Goals. A more detailed operational plan will be developed through the end of the year, and implementation will start on January 1, 2016.

Data2X: Closing the Gender GapData2X is a platform for partners to work together to identify innovative sources of data, including “big data,” that can provide an evidence base to guide development policy and investment on gender data. As part of its commitment to Data2X—an initiative of the United Nations Foundation, Hewlett Foundation, Clinton Foundation, and Bill & Melinda Gates Foundation—PEPFAR and the Millennium Challenge Corporation (MCC) are working with partners to sponsor an open data challenge to incentivize the use of gender data to improve gender policy and practice….(More)”

See also: Data matters: the Global Partnership for Sustainable Development Data. Speech by UK International Development Secretary Justine Greening at the launch of the Global Partnership for Sustainable Development Data.

Data Collaboratives: Sharing Public Data in Private Hands for Social Good


Beth Simone Noveck (The GovLab) in Forbes: “Sensor-rich consumer electronics such as mobile phones, wearable devices, commercial cameras and even cars are collecting zettabytes of data about the environment and about us. According to one McKinsey study, the volume of data is growing at fifty percent a year. No one needs convincing that these private storehouses of information represent a goldmine for business, but these data can do double duty as rich social assets—if they are shared wisely.

Think about a couple of recent examples: Sharing data held by businesses and corporations (i.e. public data in private hands) can help to improve policy interventions. California planners make water allocation decisions based upon expertise, data and analytical tools from public and private sources, including Intel, the Earth Research Institute at the University of California at Santa Barbara, and the World Food Center at the University of California at Davis.

In Europe, several phone companies have made anonymized datasets available, making it possible for researchers to track calling and commuting patterns and gain better insight into social problems from unemployment to mental health. In the United States, LinkedIn is providing free data about demand for IT jobs in different markets which, when combined with open data from the Department of Labor, helps communities target efforts around training….

Despite the promise of data sharing, these kind of data collaboratives remain relatively new. There is a need toaccelerate their use by giving companies strong tax incentives for sharing data for public good. There’s a need for more study to identify models for data sharing in ways that respect personal privacy and security and enable companies to do well by doing good. My colleagues at The GovLab together with UN Global Pulse and the University of Leiden, for example, published this initial analysis of terms and conditions used when exchanging data as part of a prize-backed challenge. We also need philanthropy to start putting money into “meta research;” it’s not going to be enough to just open up databases: we need to know if the data is good.

After years of growing disenchantment with closed-door institutions, the push for greater use of data in governing can be seen as both a response and as a mirror to the Big Data revolution in business. Although more than 1,000,000 government datasets about everything from air quality to farmers markets are openly available online in downloadable formats, much of the data about environmental, biometric, epidemiological, and physical conditions rest in private hands. Governing better requires a new empiricism for developing solutions together. That will depend on access to these private, not just public data….(More)”

Interactive app lets constituents help balance their city’s budget


Springwise: “In this era of information, political spending and municipal budgets are still often shrouded in confusion and mystery. But a new web app called Balancing Act hopes to change that, by enabling US citizens to see the breakdown of their city’s budget via adjustable, comprehensive pie charts.

Created by Colorado-based consultants Engaged Public, Balancing Act not only shows citizens the current budget breakdown, it also enables them to experiment with hypothetical future budgets, adjusting spending and taxes to suit their own priorities. The project aims to engage and inform citizens about the money that their mayors and governments assign on their behalf and allow them to have more of a say in the future of their city. The resource has already been utilized by Pedro Segarra, Mayor of Hartford, Connecticut, who asked his citizens for their input on how best to balance the USD 49 million.

The system can be used to help governments understand the wants and needs of their constituents, as well as enable citizens to see the bigger picture when it comes to tough or unappealing policies. Eventually it can even be used to create the world’s first crowdsourced budget, giving the public the power to make their preferences heard in a clear, comprehensible way…(More)”

Selected Readings on Data Governance


Jos Berens (Centre for Innovation, Leiden University) and Stefaan G. Verhulst (GovLab)

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data governance was originally published in 2015.

Context
The field of Data Collaboratives is premised on the idea that sharing and opening-up private sector datasets has great – and yet untapped – potential for promoting social good. At the same time, the potential of data collaboratives depends on the level of societal trust in the exchange, analysis and use of the data exchanged. Strong data governance frameworks are essential to ensure responsible data use. Without such governance regimes, the emergent data ecosystem will be hampered and the (perceived) risks will dominate the (perceived) benefits. Further, without adopting a human-centered approach to the design of data governance frameworks, including iterative prototyping and careful consideration of the experience, the responses may fail to be flexible and targeted to real needs.

Selected Readings List (in alphabetical order)

Annotated Selected Readings List (in alphabetical order)

Better Place Lab, “Privacy, Transparency and Trust.” Mozilla, 2015. Available from: http://www.betterplace-lab.org/privacy-report.

  • This report looks specifically at the risks involved in the social sector having access to datasets, and the main risks development organizations should focus on to develop a responsible data use practice.
  • Focusing on five specific countries (Brazil, China, Germany, India and Indonesia), the report displays specific country profiles, followed by a comparative analysis centering around the topics of privacy, transparency, online behavior and trust.
  • Some of the key findings mentioned are:
    • A general concern on the importance of privacy, with cultural differences influencing conception of what privacy is.
    • Cultural differences determining how transparency is perceived, and how much value is attached to achieving it.
    • To build trust, individuals need to feel a personal connection or get a personal recommendation – it is hard to build trust regarding automated processes.

Montjoye, Yves Alexandre de; Kendall, Jake and; Kerry, Cameron F. “Enabling Humanitarian Use of Mobile Phone Data.” The Brookings Institution, 2015. Available from: http://www.brookings.edu/research/papers/2014/11/12-enabling-humanitarian-use-mobile-phone-data.

  • Focussing in particular on mobile phone data, this paper explores ways of mitigating privacy harms involved in using call detail records for social good.
  • Key takeaways are the following recommendations for using data for social good:
    • Engaging companies, NGOs, researchers, privacy experts, and governments to agree on a set of best practices for new privacy-conscientious metadata sharing models.
    • Accepting that no framework for maximizing data for the public good will offer perfect protection for privacy, but there must be a balanced application of privacy concerns against the potential for social good.
    • Establishing systems and processes for recognizing trusted third-parties and systems to manage datasets, enable detailed audits, and control the use of data so as to combat the potential for data abuse and re-identification of anonymous data.
    • Simplifying the process among developing governments in regards to the collection and use of mobile phone metadata data for research and public good purposes.

Centre for Democracy and Technology, “Health Big Data in the Commercial Context.” Centre for Democracy and Technology, 2015. Available from: https://cdt.org/insight/health-big-data-in-the-commercial-context/.

  • Focusing particularly on the privacy issues related to using data generated by individuals, this paper explores the overlap in privacy questions this field has with other data uses.
  • The authors note that although the Health Insurance Portability and Accountability Act (HIPAA) has proven a successful approach in ensuring accountability for health data, most of these standards do not apply to developers of the new technologies used to collect these new data sets.
  • For non-HIPAA covered, customer facing technologies, the paper bases an alternative framework for consideration of privacy issues. The framework is based on the Fair Information Practice Principles, and three rounds of stakeholder consultations.

Center for Information Policy Leadership, “A Risk-based Approach to Privacy: Improving Effectiveness in Practice.” Centre for Information Policy Leadership, Hunton & Williams LLP, 2015. Available from: https://www.informationpolicycentre.com/uploads/5/7/1/0/57104281/white_paper_1-a_risk_based_approach_to_privacy_improving_effectiveness_in_practice.pdf.

  • This white paper is part of a project aiming to explain what is often referred to as a new, risk-based approach to privacy, and the development of a privacy risk framework and methodology.
  • With the pace of technological progress often outstripping the capabilities of privacy officers to keep up, this method aims to offer the ability to approach privacy matters in a structured way, assessing privacy implications from the perspective of possible negative impact on individuals.
  • With the intended outcomes of the project being “materials to help policy-makers and legislators to identify desired outcomes and shape rules for the future which are more effective and less burdensome”, insights from this paper might also feed into the development of innovative governance mechanisms aimed specifically at preventing individual harm.

Centre for Information Policy Leadership, “Data Governance for the Evolving Digital Market Place”, Centre for Information Policy Leadership, Hunton & Williams LLP, 2011. Available from: http://www.huntonfiles.com/files/webupload/CIPL_Centre_Accountability_Data_Governance_Paper_2011.pdf.

  • This paper argues that as a result of the proliferation of large scale data analytics, new models governing data inferred from society will shift responsibility to the side of organizations deriving and creating value from that data.
  • It is noted that, with the reality of the challenge corporations face of enabling agile and innovative data use “In exchange for increased corporate responsibility, accountability [and the governance models it mandates, ed.] allows for more flexible use of data.”
  • Proposed as a means to shift responsibility to the side of data-users, the accountability principle has been researched by a worldwide group of policymakers. Tailing the history of the accountability principle, the paper argues that it “(…) requires that companies implement programs that foster compliance with data protection principles, and be able to describe how those programs provide the required protections for individuals.”
  • The following essential elements of accountability are listed:
    • Organisation commitment to accountability and adoption of internal policies consistent with external criteria
    • Mechanisms to put privacy policies into effect, including tools, training and education
    • Systems for internal, ongoing oversight and assurance reviews and external verification
    • Transparency and mechanisms for individual participation
    • Means of remediation and external enforcement

Crawford, Kate; Schulz, Jason. “Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harm.” NYU School of Law, 2014. Available from: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2325784&download=yes.

  • Considering the privacy implications of large-scale analysis of numerous data sources, this paper proposes the implementation of a ‘procedural data due process’ mechanism to arm data subjects against potential privacy intrusions.
  • The authors acknowledge that some privacy protection structures already know similar mechanisms. However, due to the “inherent analytical assumptions and methodological biases” of big data systems, the authors argue for a more rigorous framework.

Letouze, Emmanuel, and; Vinck, Patrick. “The Ethics and Politics of Call Data Analytics”, DataPop Alliance, 2015. Available from: http://static1.squarespace.com/static/531a2b4be4b009ca7e474c05/t/54b97f82e4b0ff9569874fe9/1421442946517/WhitePaperCDRsEthicFrameworkDec10-2014Draft-2.pdf.

  • Focusing on the use of Call Detail Records (CDRs) for social good in development contexts, this whitepaper explores both the potential of these datasets – in part by detailing recent successful efforts in the space – and political and ethical constraints to their use.
  • Drawing from the Menlo Report Ethical Principles Guiding ICT Research, the paper explores how these principles might be unpacked to inform an ethics framework for the analysis of CDRs.

Data for Development External Ethics Panel, “Report of the External Ethics Review Panel.” Orange, 2015. Available from: http://www.d4d.orange.com/fr/content/download/43823/426571/version/2/file/D4D_Challenge_DEEP_Report_IBE.pdf.

  • This report presents the findings of the external expert panel overseeing the Orange Data for Development Challenge.
  • Several types of issues faced by the panel are described, along with the various ways in which the panel dealt with those issues.

Federal Trade Commission Staff Report, “Mobile Privacy Disclosures: Building Trust Through Transparency.” Federal Trade Commission, 2013. Available from: www.ftc.gov/os/2013/02/130201mobileprivacyreport.pdf.

  • This report looks at ways to address privacy concerns regarding mobile phone data use. Specific advise is provided for the following actors:
    • Platforms, or operating systems providers
    • App developers
    • Advertising networks and other third parties
    • App developer trade associations, along with academics, usability experts and privacy researchers

Mirani, Leo. “How to use mobile phone data for good without invading anyone’s privacy.” Quartz, 2015. Available from: http://qz.com/398257/how-to-use-mobile-phone-data-for-good-without-invading-anyones-privacy/.

  • This paper considers the privacy implications of using call detail records for social good, and ways to mitigate risks of privacy intrusion.
  • Taking example of the Orange D4D challenge and the anonymization strategy that was employed there, the paper describes how classic ‘anonymization’ is often not enough. The paper then lists further measures that can be taken to ensure adequate privacy protection.

Bernholz, Lucy. “Several Examples of Digital Ethics and Proposed Practices” Stanford Ethics of Data conference, 2014, Available from: http://www.scribd.com/doc/237527226/Several-Examples-of-Digital-Ethics-and-Proposed-Practices.

  • This list of readings prepared for Stanford’s Ethics of Data conference lists some of the leading available literature regarding ethical data use.

Abrams, Martin. “A Unified Ethical Frame for Big Data Analysis.” The Information Accountability Foundation, 2014. Available from: http://www.privacyconference2014.org/media/17388/Plenary5-Martin-Abrams-Ethics-Fundamental-Rights-and-BigData.pdf.

  • Going beyond privacy, this paper discusses the following elements as central to developing a broad framework for data analysis:
    • Beneficial
    • Progressive
    • Sustainable
    • Respectful
    • Fair

Lane, Julia; Stodden, Victoria; Bender, Stefan, and; Nissenbaum, Helen, “Privacy, Big Data and the Public Good”, Cambridge University Press, 2014. Available from: http://www.dataprivacybook.org.

  • This book treats the privacy issues surrounding the use of big data for promoting the public good.
  • The questions being asked include the following:
    • What are the ethical and legal requirements for scientists and government officials seeking to serve the public good without harming individual citizens?
    • What are the rules of engagement?
    • What are the best ways to provide access while protecting confidentiality?
    • Are there reasonable mechanisms to compensate citizens for privacy loss?

Richards, Neil M, and; King, Jonathan H. “Big Data Ethics”. Wake Forest Law Review, 2014. Available from: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2384174.

  • This paper describes the growing impact of big data analytics on society, and argues that because of this impact, a set of ethical principles to guide data use is called for.
  • The four proposed themes are: privacy, confidentiality, transparency and identity.
  • Finally, the paper discusses how big data can be integrated into society, going into multiple facets of this integration, including the law, roles of institutions and ethical principles.

OECD, “OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data”. Available from: http://www.oecd.org/sti/ieconomy/oecdguidelinesontheprotectionofprivacyandtransborderflowsofpersonaldata.htm.

  • A globally used set of principles to inform thought about handling personal data, the OECD privacy guidelines serve as one the leading standards for informing privacy policies and data governance structures.
  • The basic principles of national application are the following:
    • Collection Limitation Principle
    • Data Quality Principle
    • Purpose Specification Principle
    • Use Limitation Principle
    • Security Safeguards Principle
    • Openness Principle
    • Individual Participation Principle
    • Accountability Principle

The White House Big Data and Privacy Working Group, “Big Data: Seizing Opportunities, Preserving Values”, White House, 2015. Available from: https://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_5.1.14_final_print.pdf.

  • Documenting the findings of the White House big data and privacy working group, this report lists i.a. the following key recommendations regarding data governance:
    • Bringing greater transparency to the data services industry
    • Stimulating international conversation on big data, with multiple stakeholders
    • With regard to educational data: ensuring data is used for the purpose it is collected for
    • Paying attention to the potential for big data to facilitate discrimination, and expanding technical understanding to stop discrimination

William Hoffman, “Pathways for Progress” World Economic Forum, 2015. Available from: http://www3.weforum.org/docs/WEFUSA_DataDrivenDevelopment_Report2015.pdf.

  • This paper treats i.a. the lack of well-defined and balanced governance mechanisms as one of the key obstacles preventing particularly corporate sector data from being shared in a controlled space.
  • An approach that balances the benefits against the risks of large scale data usage in a development context, building trust among all stake holders in the data ecosystem, is viewed as key.
  • Furthermore, this whitepaper notes that new governance models are required not just by the growing amount of data and analytical capacity, and more refined methods for analysis. The current “super-structure” of information flows between institutions is also seen as one of the key reasons to develop alternatives to the current – outdated – approaches to data governance.

Mapping the Next Frontier of Open Data: Corporate Data Sharing


Stefaan Verhulst at the GovLab (cross-posted at the UN Global Pulse Blog): “When it comes to data, we are living in the Cambrian Age. About ninety percent of the data that exists today has been generated within the last two years. We create 2.5 quintillion bytes of data on a daily basis—equivalent to a “new Google every four days.”
All of this means that we are certain to witness a rapid intensification in the process of “datafication”– already well underway. Use of data will grow increasingly critical. Data will confer strategic advantages; it will become essential to addressing many of our most important social, economic and political challenges.
This explains–at least in large part–why the Open Data movement has grown so rapidly in recent years. More and more, it has become evident that questions surrounding data access and use are emerging as one of the transformational opportunities of our time.
Today, it is estimated that over one million datasets have been made open or public. The vast majority of this open data is government data—information collected by agencies and departments in countries as varied as India, Uganda and the United States. But what of the terabyte after terabyte of data that is collected and stored by corporations? This data is also quite valuable, but it has been harder to access.
The topic of private sector data sharing was the focus of a recent conference organized by the Responsible Data Forum, Data and Society Research Institute and Global Pulse (see event summary). Participants at the conference, which was hosted by The Rockefeller Foundation in New York City, included representatives from a variety of sectors who converged to discuss ways to improve access to private data; the data held by private entities and corporations. The purpose for that access was rooted in a broad recognition that private data has the potential to foster much public good. At the same time, a variety of constraints—notably privacy and security, but also proprietary interests and data protectionism on the part of some companies—hold back this potential.
The framing for issues surrounding sharing private data has been broadly referred to under the rubric of “corporate data philanthropy.” The term refers to an emerging trend whereby companies have started sharing anonymized and aggregated data with third-party users who can then look for patterns or otherwise analyze the data in ways that lead to policy insights and other public good. The term was coined at the World Economic Forum meeting in Davos, in 2011, and has gained wider currency through Global Pulse, a United Nations data project that has popularized the notion of a global “data commons.”
Although still far from prevalent, some examples of corporate data sharing exist….

Help us map the field

A more comprehensive mapping of the field of corporate data sharing would draw on a wide range of case studies and examples to identify opportunities and gaps, and to inspire more corporations to allow access to their data (consider, for instance, the GovLab Open Data 500 mapping for open government data) . From a research point of view, the following questions would be important to ask:

  • What types of data sharing have proven most successful, and which ones least?
  • Who are the users of corporate shared data, and for what purposes?
  • What conditions encourage companies to share, and what are the concerns that prevent sharing?
  • What incentives can be created (economic, regulatory, etc.) to encourage corporate data philanthropy?
  • What differences (if any) exist between shared government data and shared private sector data?
  • What steps need to be taken to minimize potential harms (e.g., to privacy and security) when sharing data?
  • What’s the value created from using shared private data?

We (the GovLab; Global Pulse; and Data & Society) welcome your input to add to this list of questions, or to help us answer them by providing case studies and examples of corporate data philanthropy. Please add your examples below, use our Google Form or email them to us at corporatedata@thegovlab.org”

Journey tracking app will use cyclist data to make cities safer for bikes


Springwise: “Most cities were never designed to cater for the huge numbers of bikes seen on their roads every day, and as the number of cyclists grows, so do the fatality statistics thanks to limited investment in safe cycle paths. While Berlin already crowdsources bikers’ favorite cycle routes and maps them through the Dynamic Connections platform, a new app called WeCycle lets cyclists track their journeys, pooling their data to create heat maps for city planners.
Created by the UK’s TravelAI transport startup, WeCycle taps into the current consumer trend for quantifying every aspect of life, including journey times. By downloading the free iOS app, London cyclists can seamlessly create stats each time they get on their bike. They app runs in the background and uses the device’s accelerometer to smartly distinguish walking or running from cycling. They can then see how far they’ve traveled, how fast they cycle and every route they’ve taken. Additionally, the app also tracks bus and car travel.
Anyone that downloads the app agrees that their data can be anonymously sent to TravelAI, creating an accurate and real-time information resource. It aims to create tools such as heat maps and behavior monitoring for cities and local authorities to learn more about how citizens are using roads to better inform their transport policies.
WeCycle follows in the footsteps of similar apps such as Germany’s Radwende and the Toronto Cycling App — both released this year — in taking a popular trend and turning into data that could help make cities a safer place to cycle….Website: www.travelai.info

Sharing Data Is a Form of Corporate Philanthropy


Matt Stempeck in HBR Blog:  “Ever since the International Charter on Space and Major Disasters was signed in 1999, satellite companies like DMC International Imaging have had a clear protocol with which to provide valuable imagery to public actors in times of crisis. In a single week this February, DMCii tasked its fleet of satellites on flooding in the United Kingdom, fires in India, floods in Zimbabwe, and snow in South Korea. Official crisis response departments and relevant UN departments can request on-demand access to the visuals captured by these “eyes in the sky” to better assess damage and coordinate relief efforts.

DMCii is a private company, yet it provides enormous value to the public and social sectors simply by periodically sharing its data.
Back on Earth, companies create, collect, and mine data in their day-to-day business. This data has quickly emerged as one of this century’s most vital assets. Public sector and social good organizations may not have access to the same amount, quality, or frequency of data. This imbalance has inspired a new category of corporate giving foreshadowed by the 1999 Space Charter: data philanthropy.
The satellite imagery example is an area of obvious societal value, but data philanthropy holds even stronger potential closer to home, where a wide range of private companies could give back in meaningful ways by contributing data to public actors. Consider two promising contexts for data philanthropy: responsive cities and academic research.
The centralized institutions of the 20th century allowed for the most sophisticated economic and urban planning to date. But in recent decades, the information revolution has helped the private sector speed ahead in data aggregation, analysis, and applications. It’s well known that there’s enormous value in real-time usage of data in the private sector, but there are similarly huge gains to be won in the application of real-time data to mitigate common challenges.
What if sharing economy companies shared their real-time housing, transit, and economic data with city governments or public interest groups? For example, Uber maintains a “God’s Eye view” of every driver on the road in a city:
stempeck2
Imagine combining this single data feed with an entire portfolio of real-time information. An early leader in this space is the City of Chicago’s urban data dashboard, WindyGrid. The dashboard aggregates an ever-growing variety of public datasets to allow for more intelligent urban management.
stempeck3
Over time, we could design responsive cities that react to this data. A responsive city is one where services, infrastructure, and even policies can flexibly respond to the rhythms of its denizens in real-time. Private sector data contributions could greatly accelerate these nascent efforts.
Data philanthropy could similarly benefit academia. Access to data remains an unfortunate barrier to entry for many researchers. The result is that only researchers with access to certain data, such as full-volume social media streams, can analyze and produce knowledge from this compelling information. Twitter, for example, sells access to a range of real-time APIs to marketing platforms, but the price point often exceeds researchers’ budgets. To accelerate the pursuit of knowledge, Twitter has piloted a program called Data Grants offering access to segments of their real-time global trove to select groups of researchers. With this program, academics and other researchers can apply to receive access to relevant bulk data downloads, such as an period of time before and after an election, or a certain geographic area.
Humanitarian response, urban planning, and academia are just three sectors within which private data can be donated to improve the public condition. There are many more possible applications possible, but few examples to date. For companies looking to expand their corporate social responsibility initiatives, sharing data should be part of the conversation…
Companies considering data philanthropy can take the following steps:

  • Inventory the information your company produces, collects, and analyzes. Consider which data would be easy to share and which data will require long-term effort.
  • Think who could benefit from this information. Who in your community doesn’t have access to this information?
  • Who could be harmed by the release of this data? If the datasets are about people, have they consented to its release? (i.e. don’t pull a Facebook emotional manipulation experiment).
  • Begin conversations with relevant public agencies and nonprofit partners to get a sense of the sort of information they might find valuable and their capacity to work with the formats you might eventually make available.
  • If you expect an onslaught of interest, an application process can help qualify partnership opportunities to maximize positive impact relative to time invested in the program.
  • Consider how you’ll handle distribution of the data to partners. Even if you don’t have the resources to set up an API, regular releases of bulk data could still provide enormous value to organizations used to relying on less-frequently updated government indices.
  • Consider your needs regarding privacy and anonymization. Strip the data of anything remotely resembling personally identifiable information (here are some guidelines).
  • If you’re making data available to researchers, plan to allow researchers to publish their results without obstruction. You might also require them to share the findings with the world under Open Access terms….”