Paper by Jennifer Larson et al. for Political Networks Workshops & Conference 2016: “Pinning down the role of social ties in the decision to protest has been notoriously elusive, largely due to data limitations. The era of social media and its global use by protesters offers an unprecedented opportunity to observe real-time social ties and online behavior, though often without an attendant measure of real-world behavior. We collect data on Twitter activity during the 2015 Charlie Hebdo protests in Paris which, unusually, record both real-world protest attendance and high-resolution network structure. We specify a theory of participation in which an individual’s decision depends on her exposure to others’ intentions, and network position determines exposure. Our findings are strong and consistent with this theory, showing that, relative to comparable Twitter users, protesters are significantly more connected to one another via direct, indirect, triadic, and reciprocated ties. These results offer the first large-scale empirical support for the claim that social network structure influences protest participation….(More)”
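The abstract does not spell out how exposure is measured. As a rough illustration of the idea that network position determines exposure, here is a minimal Python sketch that assumes a directed “follows” graph and defines a user’s exposure as the share of accounts she follows that have signalled an intention to protest. That definition and the toy accounts (`ana`, `ben`, `cam`, `dee`) are hypothetical, not the authors’ specification or data.

```python
# Hypothetical sketch: reciprocated ties and "exposure" in a directed follow graph.
# The exposure definition and toy data are illustrative assumptions,
# not the measures used in the Larson et al. paper.
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (follower, followed)


def reciprocated_ties(edges: Set[Edge]) -> Set[Tuple[str, ...]]:
    """Return each mutual follow pair once, as an alphabetically sorted tuple."""
    return {tuple(sorted(edge)) for edge in edges if (edge[1], edge[0]) in edges}


def exposure(user: str, edges: Set[Edge], intends: Dict[str, bool]) -> float:
    """Share of the accounts `user` follows that signalled intent to protest."""
    followed = [dst for (src, dst) in edges if src == user]
    if not followed:
        return 0.0
    return sum(intends.get(dst, False) for dst in followed) / len(followed)


# Toy network: ana and ben follow each other; ana follows cam; cam follows dee.
edges = {("ana", "ben"), ("ben", "ana"), ("ana", "cam"), ("cam", "dee")}
intends = {"ben": True, "cam": False, "dee": True}

print(reciprocated_ties(edges))         # {('ana', 'ben')}
print(exposure("ana", edges, intends))  # 0.5
```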
Transforming governance: how can technology help reshape democracy?
Research Briefing by Matt Leighninger: “Around the world, people are asking how we can make democracy work in new and better ways. We are frustrated by political systems in which voting is the only legitimate political act, concerned that many republics don’t have the strength or appeal to withstand authoritarian figures, and disillusioned by the inability of many countries to address the fundamental challenges of health, education and economic development.
We can no longer assume that the countries of the global North have ‘advanced’ democracies, and that the nations of the global South simply need to catch up. Citizens of these older democracies have increasingly lost faith in their political institutions; Northerners cherish their human rights and free elections, but are clearly looking for something more. Meanwhile, in the global South, new regimes based on a similar formula of rights and elections have proven fragile and difficult to sustain. And in Brazil, India and other Southern countries, participatory budgeting and other valuable democratic innovations have emerged. The stage is set for a more equitable, global conversation about what we mean by democracy.
How can we adjust our democratic formulas so that they are more sustainable, powerful, fulfilling – and, well, democratic? Some of the parts of this equation may come from the development of online tools and platforms that help people to engage with their governments, with organisations and institutions, and with each other. Often referred to collectively as ‘civic technology’ or ‘civic tech’, these tools can help us map public problems, help citizens generate solutions, gather input for government, coordinate volunteer efforts, and help neighbours remain connected. If we want to create democracies in which citizens have meaningful roles in shaping public decisions and solving public problems, we should be asking a number of questions about civic tech, including:
- How can online tools best support new forms of democracy?
- What are the examples of how this has happened?
- What are some variables to consider in comparing these examples?
- How can we learn from each other as we move forward?
This background note has been developed to help democratic innovators explore these questions and examine how their work can provide answers….(More)”
The Seductions of Quantification: Measuring Human Rights, Gender Violence, and Sex Trafficking
Book by Sally Engle Merry: “We live in a world where seemingly everything can be measured. We rely on indicators to translate social phenomena into simple, quantified terms, which in turn can be used to guide individuals, organizations, and governments in establishing policy. Yet counting things requires finding a way to make them comparable. And in the process of translating the confusion of social life into neat categories, we inevitably strip it of context and meaning—and risk hiding or distorting as much as we reveal.
With The Seductions of Quantification, leading legal anthropologist Sally Engle Merry investigates the techniques by which information is gathered and analyzed in the production of global indicators on human rights, gender violence, and sex trafficking. Although such numbers convey an aura of objective truth and scientific validity, Merry argues persuasively that measurement systems constitute a form of power by incorporating theories about social change in their design but rarely explicitly acknowledging them. For instance, the US State Department’s Trafficking in Persons Report, which ranks countries in terms of their compliance with antitrafficking activities, assumes that prosecuting traffickers as criminals is an effective corrective strategy—overlooking cultures where women and children are frequently sold by their own families. As Merry shows, indicators are indeed seductive in their promise of providing concrete knowledge about how the world works, but they are implemented most successfully when paired with context-rich qualitative accounts grounded in local knowledge….(More)”.
Estonia Is Demonstrating How Government Should Work in a Digital World
Motherboard: “In May, Manu Sporny became the 10,000th “e-Resident” of Estonia. Sporny, the founder and CEO of a digital payments and identity company located in the United States, has never set foot in Estonia. However, he heard about the country’s e-Residency program and decided it would be an obvious choice for his company’s European headquarters.
People like Sporny are why Estonia launched a digital residency program in December 2014. The program allows anyone in the world to apply for a digital identity, which will let them: establish and run a location-independent business online, get easier access to EU markets, open a bank account and conduct e-banking, use international payment service providers, declare taxes, and sign all relevant documents and contracts remotely…
One of the most essential components of a functioning digital society is a secure digital identity. The state and the private sector need to know who is accessing these online services. Likewise, users need to feel secure that their identity is protected.
Estonia found the solution to this problem. In 2002, we started issuing residents a mandatory ID-card with a chip that empowers them to categorically identify themselves and verify legal transactions and documents through a digital signature. A digital signature has been legally equivalent to a handwritten one throughout the European Union—not just in Estonia—since 1999.
With this new digital identity system, the state could serve not only sparsely populated areas, but also the entire Estonian diaspora. Estonians anywhere in the world could maintain a connection to their homeland via e-services, contribute to the legislative process, and even participate in elections. Once the government realized that it could scale this service worldwide, it seemed logical to offer its e-services to those without physical residency in Estonia. This meant Estonia suddenly had value as a service, not only as a place to live.
What does “Country as a Service” mean?
With the rise of a global internet, we’ve seen more skilled workers and businesspeople offering their services across nations, regardless of their physical location. A survey by Intuit estimates that such independent workers will make up 40 percent of the US workforce by 2020.
These entrepreneurs and skilled artisans are ultimately looking for the simplest way to create and maintain a legal, global identity as an outlet for their global offerings.
They look to other countries, not because they are looking for a tax haven, but because barriers put up by their own governments have prevented them from incorporating and maintaining a business.
The most important thing for these entrepreneurs is that the creation and upkeep of the company is easy and hassle-free. It is also important that, despite being incorporated in a different nation, they remain honest taxpayers within their country of physical residence.
This is exactly what Estonia offers—a location-independent, hassle-free and fully-digital economic and financial environment where entrepreneurs can run their own company globally….
When an e-Resident establishes a company, it means that the company will likely start using the services offered by other Estonian companies (like creating a bank account, partnering with a payment service provider, seeking assistance from accountants, auditors and lawyers). As more clients are created for Estonian companies, their growth potential increases, along with the growth potential of the Estonian economy.
Eventually, there will be more residents outside borders than inside them
If states fail to redesign and simplify the machinery of bureaucracy and make it location-independent, there will be an opportunity for countries that can offer such services across borders.
Estonia has learned that it’s incredibly important in a small state to serve primarily small and micro businesses. In order to sustain a nation on this, we must automate and digitize processes to scale. Estonia’s model, for instance, is location-independent, making it simple to scale successfully. We hope to acquire at least 10 million digital residents (e-Residents) in a way that is mutually beneficial for the nation-states where these people are tax residents….(More)”
Open access: All human knowledge is there—so why can’t everybody access it?
Glyn Moody at ArsTechnica: “In 1836, Anthony Panizzi, who later became principal librarian of the British Museum, gave evidence before a parliamentary select committee. At that time, he was only first assistant librarian, but even then he had an ambitious vision for what would one day become the British Library. He told the committee:
I want a poor student to have the same means of indulging his learned curiosity, of following his rational pursuits, of consulting the same authorities, of fathoming the most intricate inquiry as the richest man in the kingdom, as far as books go, and I contend that the government is bound to give him the most liberal and unlimited assistance in this respect.
He went some way to achieving that goal of providing general access to human knowledge. In 1856, after 20 years of labour as Keeper of Printed Books, he had helped boost the British Museum’s collection to over half a million books, making it the largest library in the world at the time. But there was a serious problem: to enjoy the benefits of those volumes, visitors needed to go to the British Museum in London.
Imagine, for a moment, if it were possible to provide access not just to those books, but to all knowledge for everyone, everywhere: the ultimate realisation of Panizzi’s dream. In fact, we don’t have to imagine: it is possible today, thanks to the combined technologies of digital texts and the Internet. The former means that we can make as many copies of a work as we want, for vanishingly small cost; the latter offers a way to deliver those copies to anyone with an Internet connection. The global rise of low-cost smartphones means that group will soon include even the poorest members of society in every country.
That is to say, we have the technical means to share all knowledge, and yet we are nowhere near providing everyone with the ability to indulge their learned curiosity as Panizzi wanted it.
What’s stopping us? That’s the central question that the “open access” movement has been asking, and trying to answer, for the last two decades. Although tremendous progress has been made, with more knowledge freely available now than ever before, there are signs that open access is at a critical point in its development, which could determine whether it will ever succeed in realising Panizzi’s plan.
Table of Contents
- The arcana of academic publishing
- What about us?
- In the beginning was arXiv
- Scholarly skywriting
- Opening up the Americas
- Public Library of Science
- Open access is born
- CERN’s SCOAP3
- PLoS ONE
- Gold open access
- Hybrid problems
- Green open access
- The empire strikes back
- Diamond open access
- From Aaron Swartz…
- …to Sci-Hub”
Connect the corporate dots to see true transparency
Gillian Tett at the Financial Times: “…In all this, a crucial point is often forgotten: simply amassing data will not solve the problem of transparency. What is also needed is a way for analysts to track the connections that exist between companies scattered across different national jurisdictions.
There are more than 45,000 companies listed on global stock exchanges and, according to Chris Taggart of OpenCorporates, an independent data company, there are between 250m and 400m unlisted groups. Many of these appear in national registries but, since registries are extremely fragmented, it is very difficult for shareholders or regulators to form a complete picture of company activity.
It also creates financial stability risks. One reason why it is currently hard to track the scale of Chinese corporate debt, say, is that it is being issued by an opaque web of legal entities. Similarly, regulators struggled to cope with the fallout from the Lehman Brothers collapse in 2008 because the bank was operating almost 3,000 different legal entities around the world.
Is there a solution to this? A good place to start would be for governments to put their corporate registries online. Another crucial step would be for governments and companies to agree on a common standard for labelling legal entities, so that these can be tracked across borders.
Happily, work on that has begun: in 2014, the Global Legal Entity Identifier Foundation was created. It supports the implementation and use of “legal entity identifiers”, a data standard that identifies participants in financial transactions. Groups such as the Data Coalition in Washington DC are lobbying for laws that would force companies to use LEIs…. However, this inter-governmental project is moving so slowly that the private sector may be a better bet. In recent years, companies such as Dun & Bradstreet have begun to amass proprietary information about complex corporate webs, and computer nerds are also starting to use the power of big data to join up the corporate dots in a public format.
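The article describes the LEI only as a data standard. Under ISO 17442 an LEI is a 20-character alphanumeric code whose last two characters are check digits computed with ISO 7064 MOD 97-10, the same scheme used for IBANs. The minimal Python sketch below checks only that a code is well formed under that scheme; it does not confirm that any entity is actually registered, and the 18-character prefix in the example is invented.

```python
# Minimal sketch of LEI check-digit logic (ISO 17442 layout, ISO 7064 MOD 97-10).
# Validates format only; it cannot tell whether an LEI has actually been issued.


def _to_digits(code: str) -> str:
    """Keep digits as-is and map letters A-Z to 10-35, as MOD 97-10 requires."""
    return "".join(c if c.isdigit() else str(ord(c) - ord("A") + 10) for c in code.upper())


def _is_ascii_alnum(code: str) -> bool:
    return code.isascii() and code.isalnum()


def lei_check_digits(prefix: str) -> str:
    """Compute the two check digits for an 18-character LEI prefix."""
    if len(prefix) != 18 or not _is_ascii_alnum(prefix):
        raise ValueError("expected 18 ASCII letters or digits")
    remainder = int(_to_digits(prefix + "00")) % 97
    return f"{98 - remainder:02d}"


def is_valid_lei(lei: str) -> bool:
    """A well-formed LEI leaves a remainder of 1 when read as a number mod 97."""
    return len(lei) == 20 and _is_ascii_alnum(lei) and int(_to_digits(lei)) % 97 == 1


prefix = "5493001KJTIIGC8Y1R"  # invented prefix, for illustration only
lei = prefix + lei_check_digits(prefix)
print(lei, is_valid_lei(lei))  # prints the completed code and True
```

Because 97 is prime, changing any single digit alters the remainder, which is why a mistyped digit is caught.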
OpenCorporates is a good example. Over the past five years, a dozen staff there have been painstakingly scraping national corporate registries to create a database designed to show how companies are connected around the world. This is far from complete but data from 100m entities have already been logged. And in the wake of the Panama Papers, more governments are coming on board — data from the Cayman Islands are currently being added and France is likely to collaborate soon.
Sadly, these moves will not deliver real transparency straight away. If you type “MIO” into the search box on the OpenCorporates website, you will not see a map of all of McKinsey’s activities — at least not yet.
The good news, however, is that with every data scrape, or use of an LEI, the picture of global corporate activity is becoming slightly less opaque thanks to the work of a hidden army of geeks. They deserve acclaim and support — even (or especially) from management consultants….(More)”
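Tett’s point is that records only become useful once the links between entities can be followed across jurisdictions. A minimal Python sketch of that idea, assuming each registry can export parent-to-subsidiary links keyed by a shared identifier (such as an LEI), merges the links into one graph and walks it breadth-first to find a whole corporate group. The registry names and entity IDs are hypothetical, and the sketch is not meant to describe how OpenCorporates or any registry structures its data.

```python
# Hypothetical sketch: merge parent -> subsidiary links from several registries
# and find every entity a parent controls directly or indirectly.
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]  # (parent_id, subsidiary_id), using a shared identifier


def build_graph(registries: Dict[str, List[Link]]) -> Dict[str, Set[str]]:
    """Union the ownership links reported by each national registry."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for links in registries.values():
        for parent, child in links:
            graph[parent].add(child)
    return graph


def corporate_group(root: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """Breadth-first search for all entities reachable from `root`."""
    seen: Set[str] = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, set()):
            if child != root and child not in seen:  # guard against ownership loops
                seen.add(child)
                queue.append(child)
    return seen


# Invented data: no single registry shows the full chain on its own.
registries = {
    "registry_uk": [("HOLDCO", "SUB-UK"), ("SUB-UK", "SUB-KY")],
    "registry_ky": [("SUB-KY", "SUB-FR"), ("OTHERCO", "SUB-X")],
}
graph = build_graph(registries)
print(sorted(corporate_group("HOLDCO", graph)))  # ['SUB-FR', 'SUB-KY', 'SUB-UK']
```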
Selected Readings on Data Collaboratives
By Neil Britto, David Sangokoya, Iryna Susha, Stefaan Verhulst and Andrew Young
The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data collaboratives was originally published in 2017.
The term data collaborative refers to a new form of collaboration, beyond the public-private partnership model, in which participants from different sectors (including private companies, research institutions, and government agencies) can exchange data to help solve public problems. Several of society’s greatest challenges, from addressing climate change to public health to job creation to improving the lives of children, require greater access to data, more collaboration between public- and private-sector entities, and an increased ability to analyze datasets. In the coming months and years, data collaboratives will be essential vehicles for harnessing the vast stores of privately held data toward the public good.
Selected Reading List (in alphabetical order)
- G. Agaba, et al – Big data and Positive Social Change in the Developing World: A White Paper for Practitioners and Researchers – a white paper describing the potential of big data, and corporate data in particular, to positively benefit development efforts.
- C. Ansell and A. Gash – Collaborative Governance in Theory and Practice – a journal article describing the emerging practice of public-private partnerships, particularly those built around data sharing.
- Amparo Ballivian and Bill Hoffman – Public-Private Partnerships for Data: Issues Paper for Data Revolution Consultation – an issues paper prepared by the World Bank on financing and sustaining the post-2015 “data revolution” movement through data public-private partnerships.
- Matthew Brack and Tito Castillo – Data Sharing for Public Health: Key Lessons from Other Sectors – a Chatham House report describing the need for data sharing and collaboration for global public health emergencies and potential lessons learned from the commercial sector.
- Yves-Alexandre de Montjoye, Jake Kendall, and Cameron F. Kerry – Enabling Humanitarian Use of Mobile Phone Data – an issues paper from the Brookings Institution on leveraging the benefits of mobile phone data for humanitarian use while minimizing risks to privacy.
- Silja M. Eckartz, Wout J. Hofman, Anne Fleur Van Veenstra – A Decision Model for Data Sharing – a paper proposing a decision model for data sharing arrangements aimed at addressing identified risks and challenges.
- Harlan M. Krumholz et al. – Sea Change in Open Science and Data Sharing Leadership by Industry – a review of industry-led efforts and cross-sector collaborations to share data from clinical trials to inform clinical practice.
- Institute of Medicine (IOM) – Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk – a consensus, peer-reviewed IOM report recommending how to promote responsible clinical trial data sharing and minimize risks and challenges of sharing.
- Gideon Mann – Private Data and the Public Good – the transcript of a keynote talk on the potential of leveraging corporate data to help solve public problems.
- D. Pastor Escuredo, A. Morales-Guzmán, et al. – Flooding through the Lens of Mobile Phone Activity – an analysis of aggregated and anonymized call detail records (CDRs), conducted in collaboration with the UN, the Government of Mexico, academia and Telefonica, which suggests high potential in using shared telecom data to improve early warning and emergency management mechanisms.
- M. Perkmann and H. Schildt – Open Data Partnerships Between Firms and Universities: The Role of Boundary Organizations – a paper highlighting the advantages of third-party organizations enabling data sharing between industry and academia to uncover new insights to benefit the public good.
- Matt Stempeck – Sharing Data Is A Form Of Corporate Philanthropy – a Harvard Business Review article on data philanthropy, the practice of companies donating data for public good, and its benefits and challenges.
- N. Robin, T. Klein, J. Jütting – Public-Private Partnerships for Statistics: Lessons Learned, Future Steps – a working paper describing how privately held data sources could fill current gaps in the efforts of National Statistics Offices.
- Elizabeth Stuart, Emma Samman, William Avis, and Tom Berliner – The data revolution: finding the missing millions – the Overseas Development Institute’s annual report focused on solutions toward a sustainable data revolution.
- L. Taylor and R. Schroeder – Is Bigger Better? The Emergence of Big Data as a Tool for International Development Policy – a paper describing how data, such as privately held mobile phone data, could improve development policy.
- Willem G. van Panhuis, Proma Paul, Claudia Emerson, John Grefenstette, Richard Wilder, Abraham J. Herbst, David Heymann, and Donald S. Burke – A systematic review of barriers to data sharing in public health – a literature review of potential barriers to public health data sharing.
- Stefaan Verhulst and David Sangokoya – Mapping the Next Frontier of Open Data: Corporate Data Sharing – this essay describes an emerging taxonomy of activities involving corporate data sharing for public good, an emerging trend in which companies share anonymized and aggregated data with third-party users towards data-driven policymaking and greater public good.
- Stefaan Verhulst and David Sangokoya – Data Collaboratives: Exchanging Data to Improve People’s Lives – an essay on leveraging the potential of data to solve complex public problems through data collaboratives and four critical accelerators towards responsible data sharing and collaboration.
- Stefaan Verhulst, Iryna Susha, Alexander Kostura – Data Collaboratives: Matching Supply of (Corporate) Data to Solve Public Problems – a report describing emerging practice, opportunities and challenges in data collaboratives as identified at the International Data Responsibility Conference.
- F. Welle Donker, B. van Loenen, A. K. Bregt – Open Data and Beyond – a case study examining the opening of private data by Dutch energy network administrator Liander.
- World Economic Forum – Data-driven development: pathways for progress – an overview report from the World Economic Forum on the existing data deficit and the value and impact of big data for sustainable development.
Annotated Selected Readings List (in alphabetical order)
Agaba, G., Akindès, F., Bengtsson, L., Cowls, J., Ganesh, M., Hoffman, N., . . . Meissner, F. “Big Data and Positive Social Change in the Developing World: A White Paper for Practitioners and Researchers.” 2014. http://bit.ly/25RRC6N.
- This white paper, produced by “a group of activists, researchers and data experts”, explores the potential of big data to improve development outcomes and spur positive social change in low- and middle-income countries. Using examples, the authors discuss four areas in which the use of big data can impact development efforts:
- Advocating and facilitating by “open[ing] up new public spaces for discussion and awareness building”;
- Describing and predicting through the detection of “new correlations and the surfac[ing] of new questions”;
- Facilitating information exchange through “multiple feedback loops which feed into both research and action,” and
- Promoting accountability and transparency, especially as a byproduct of crowdsourcing efforts aimed at “aggregat[ing] and analyz[ing] information in real time.”
- The authors argue that in order to maximize the potential of big data’s use in development, “there is a case to be made for building a data commons for private/public data, and for setting up new and more appropriate ethical guidelines.”
- They also identify a number of challenges, especially when leveraging data made accessible from a number of sources, including private sector entities, such as:
- Lack of general data literacy;
- Lack of open learning environments and repositories;
- Lack of resources, capacity and access;
- Challenges of sensitivity and risk perception with regard to using data;
- Storage and computing capacity; and
- Externally validating data sources for comparison and verification.
Ansell, C. and Gash, A. “Collaborative Governance in Theory and Practice.” Journal of Public Administration Research and Theory 18 (4), 2008. http://bit.ly/1RZgsI5.
- This article describes collaborative arrangements that include public and private organizations working together and proposes a model for understanding an emergent form of public-private interaction informed by 137 diverse cases of collaborative governance.
- The article suggests that factors significant to successful partnering processes and outcomes include:
- Shared understanding of challenges,
- Trust building processes,
- The importance of recognizing seemingly modest progress, and
- Strong indicators of commitment to the partnership’s aspirations and process.
- The authors provide a “contingency theory model” that specifies relationships between different variables that influence outcomes of collaborative governance initiatives. Three “core contingencies” for successful collaborative governance initiatives identified by the authors are:
- Time (e.g., decision making time afforded to the collaboration);
- Interdependence (e.g., a high degree of interdependence can mitigate negative effects of low trust); and
- Trust (e.g. a higher level of trust indicates a higher probability of success).
Ballivian, Amparo, and Bill Hoffman. “Public-Private Partnerships for Data: Issues Paper for Data Revolution Consultation.” World Bank, 2015. Available from: http://bit.ly/1ENvmRJ
- This World Bank report provides background on forming public-private partnerships for data in order to inform the UN’s Independent Expert Advisory Group (IEAG) on sustaining a “data revolution” in sustainable development.
- The report highlights the critical position of private companies within the data value chain and reflects on key elements of a sustainable data PPP: “common objectives across all impacted stakeholders, alignment of incentives, and sharing of risks.” In addition, the report describes the risks and incentives of public and private actors, and the principles needed for “build[ing] the legal, cultural, technological and economic infrastructures to enable the balancing of competing interests.” These principles include understanding; experimentation; adaptability; balance; persuasion and compulsion; risk management; and governance.
- Examples of data collaboratives cited in the report include HP Earth Insights, Orange Data for Development Challenges, Amazon Web Services, IBM Smart Cities Initiative, and the Governance Lab’s Open Data 500.
Brack, Matthew, and Tito Castillo. “Data Sharing for Public Health: Key Lessons from Other Sectors.” Chatham House, Centre on Global Health Security. April 2015. Available from: http://bit.ly/1DHFGVl
- The Chatham House report provides an overview of public health surveillance data sharing, highlighting the benefits and challenges of shared health data and the complexity in adapting technical solutions from other sectors for public health.
- The report describes data sharing processes from several perspectives, including in-depth case studies of actual data sharing in practice at the individual, organizational and sector levels. Among the key lessons for public health data sharing, the report strongly highlights the need to harness momentum for action and maintain collaborative engagement: “Successful data sharing communities are highly collaborative. Collaboration holds the key to producing and abiding by community standards, and building and maintaining productive networks, and is by definition the essence of data sharing itself. Time should be invested in establishing and sustaining collaboration with all stakeholders concerned with public health surveillance data sharing.”
- Examples of data collaboratives include H3Africa (a collaboration between NIH and Wellcome Trust) and NHS England’s care.data programme.
de Montjoye, Yves-Alexandre, Jake Kendall, and Cameron F. Kerry. “Enabling Humanitarian Use of Mobile Phone Data.” The Brookings Institution, Issues in Technology Innovation. November 2014. Available from: http://brook.gs/1JxVpxp
- Using Ebola as a case study, the authors describe the value of using private telecom data for uncovering “valuable insights into understanding the spread of infectious diseases as well as strategies into micro-target outreach and driving update of health-seeking behavior.”
- The authors highlight the absence of a common legal and standards framework for “sharing mobile phone data in privacy-conscientious ways” and recommend “engaging companies, NGOs, researchers, privacy experts, and governments to agree on a set of best practices for new privacy-conscientious metadata sharing models.”
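The annotation calls for privacy-conscientious sharing models without specifying one. A common pattern in this space is to share aggregated mobility flows rather than raw records. The Python sketch below is a hypothetical illustration of that pattern: it assumes each call detail record is reduced to a subscriber ID, a timestamp and a coarse region, counts region-to-region moves, and suppresses flows observed fewer than `k` times. The record layout and threshold are assumptions, and small-cell suppression reduces rather than eliminates re-identification risk.

```python
# Hypothetical sketch: aggregate call detail records (CDRs) into an
# origin-destination (OD) matrix and suppress small flows before sharing.
from collections import Counter
from typing import Dict, List, Tuple

Record = Tuple[str, int, str]  # (subscriber_id, timestamp, region): assumed layout


def od_matrix(records: List[Record], k: int = 5) -> Dict[Tuple[str, str], int]:
    """Count moves between consecutive regions per subscriber; keep flows seen >= k times."""
    visits: Dict[str, List[Tuple[int, str]]] = {}
    for subscriber, ts, region in records:
        visits.setdefault(subscriber, []).append((ts, region))

    flows: Counter = Counter()
    for trail in visits.values():
        trail.sort()  # chronological order for each subscriber
        for (_, origin), (_, dest) in zip(trail, trail[1:]):
            if origin != dest:
                flows[(origin, dest)] += 1

    # Only aggregate counts leave this function; subscriber IDs do not.
    return {flow: n for flow, n in flows.items() if n >= k}


# Six subscribers move from region A to B; one moves from A to C.
records = [(f"u{i}", 1, "A") for i in range(6)] + [(f"u{i}", 2, "B") for i in range(6)]
records += [("v1", 1, "A"), ("v1", 2, "C")]
print(od_matrix(records, k=5))  # {('A', 'B'): 6}
```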
Eckartz, Silja M., Hofman, Wout J., Van Veenstra, Anne Fleur. “A decision model for data sharing.” Vol. 8653 LNCS. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2014. http://bit.ly/21cGWfw.
- This paper proposes a decision model for data sharing of public and private data based on a literature review and three case studies in the logistics sector.
- The authors identify five categories of barriers to data sharing and offer a decision model for identifying potential interventions to overcome each barrier:
- Ownership. Possible interventions likely require improving trust among those who own the data through, for example, involvement and support from higher management.
- Privacy. Interventions include “anonymization by filtering of sensitive information and aggregation of data,” and access control mechanisms built around identity management and regulated access.
- Economic. Interventions include a model where data is shared only with a few trusted organizations, and yield management mechanisms to ensure negative financial consequences are avoided.
- Data quality. Interventions include identifying additional data sources that could improve the completeness of datasets, and efforts to improve metadata.
- Technical. Interventions include making data available in structured formats and publishing data according to widely agreed upon data standards.
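The privacy intervention Eckartz et al. describe, “anonymization by filtering of sensitive information and aggregation of data,” can be illustrated with a short Python sketch. It assumes shipment records from the logistics setting with invented field names (`customer`, `unit_price`, `route`, `weight_kg`): sensitive fields are dropped first, then only per-route totals are shared. This illustrates the general idea only, not the authors’ decision model.

```python
# Hypothetical sketch of "filtering and aggregation" before sharing logistics data.
# Field names and which fields count as sensitive are illustrative assumptions.
from collections import defaultdict
from typing import Any, Dict, Iterable, List

SENSITIVE_FIELDS = {"customer", "unit_price"}


def filter_sensitive(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop fields the data owner has marked as sensitive."""
    return [{k: v for k, v in row.items() if k not in SENSITIVE_FIELDS} for row in rows]


def aggregate_by_route(rows: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Share total shipped weight per route instead of individual shipments."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["route"]] += float(row["weight_kg"])
    return dict(totals)


shipments = [
    {"customer": "Acme", "route": "RTM-DUI", "weight_kg": 120.0, "unit_price": 3.5},
    {"customer": "Beta", "route": "RTM-DUI", "weight_kg": 80.0, "unit_price": 4.0},
    {"customer": "Acme", "route": "RTM-ANR", "weight_kg": 50.0, "unit_price": 2.0},
]
print(aggregate_by_route(filter_sensitive(shipments)))  # {'RTM-DUI': 200.0, 'RTM-ANR': 50.0}
```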
Hoffman, Sharona and Podgurski, Andy. “The Use and Misuse of Biomedical Data: Is Bigger Really Better?” American Journal of Law & Medicine 497, 2013. http://bit.ly/1syMS7J.
- This journal article explores the benefits and, in particular, the risks related to large-scale biomedical databases bringing together health information from a diversity of sources across sectors. Some data collaboratives examined in the piece include:
- MedMining – a company that extracts EHR data, de-identifies it, and offers it to researchers. The data sets that MedMining delivers to its customers include ‘lab results, vital signs, medications, procedures, diagnoses, lifestyle data, and detailed costs’ from inpatient and outpatient facilities.
- Explorys has formed a large healthcare database derived from financial, administrative, and medical records. It has partnered with major healthcare organizations such as the Cleveland Clinic Foundation and Summa Health System to aggregate and standardize health information from ten million patients and over thirty billion clinical events.
- Hoffman and Podgurski note that such large-scale biomedical databases have many potential uses, with those likely to benefit including “researchers, regulators, public health officials, commercial entities, lawyers,” as well as “healthcare providers who conduct quality assessment and improvement activities,” regulatory monitoring entities like the FDA, and “litigants in tort cases to develop evidence concerning causation and harm.”
- They argue, however, that these databases pose several risks:
- The data contained in biomedical databases is surprisingly likely to be incorrect or incomplete;
- Systemic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of research results, especially in answering causal questions;
- Data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policymakers.
Krumholz, Harlan M., et al. “Sea Change in Open Science and Data Sharing Leadership by Industry.” Circulation: Cardiovascular Quality and Outcomes 7.4. 2014. 499-504. http://1.usa.gov/1J6q7KJ
- This article provides a comprehensive overview of industry-led efforts and cross-sector collaborations in data sharing by pharmaceutical companies to inform clinical practice.
- The article details the types of data being shared and the early activities of GlaxoSmithKline (“in coordination with other companies such as Roche and ViiV”); Medtronic and the Yale University Open Data Access Project; and Janssen Pharmaceuticals (Johnson & Johnson). The article also describes the range of involvement in data sharing among pharmaceutical companies including Pfizer, Novartis, Bayer, AbbVie, Eli Lilly, AstraZeneca, and Bristol-Myers Squibb.
Mann, Gideon. “Private Data and the Public Good.” Medium. May 17, 2016. http://bit.ly/1OgOY68.
- This Medium post from Gideon Mann, the Head of Data Science at Bloomberg, shares his prepared remarks given at a lecture at the City College of New York. Mann argues for the potential benefits of increasing access to private sector data, both to improve research and academic inquiry and also to help solve practical, real-world problems. He also describes a number of initiatives underway at Bloomberg along these lines.
- Mann argues that data generated at private companies “could enable amazing discoveries and research,” but is often inaccessible to those who could put it to those uses. Beyond research, he notes that corporate data could, for instance, benefit:
- Public health – including suicide prevention, addiction counseling and mental health monitoring.
- Legal and ethical questions – especially as they relate to “the role algorithms have in decisions about our lives,” such as credit checks and resume screening.
- Mann recognizes the privacy challenges inherent in private sector data sharing, but argues that it is a common misconception that the only two choices are “complete privacy or complete disclosure.” He believes that flexible frameworks for differential privacy could open up new opportunities for responsibly leveraging data collaboratives.
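To make the “neither complete privacy nor complete disclosure” point concrete, here is a minimal sketch of the textbook Laplace mechanism for differential privacy, applied to a single counting query. It is not Bloomberg's or Mann's framework; the query, the count of 1200 and the epsilon values are hypothetical. The privacy parameter epsilon acts as a dial between the two extremes Mann rejects.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a zero-centred Laplace distribution.

    The difference of two independent exponential draws, each with mean
    `scale`, is Laplace-distributed with scale parameter `scale`.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Adding or removing one person changes a count by at most 1, so the
    query's sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
# Hypothetical query: how many users in a region searched for a given term.
true_count = 1200
for eps in (0.1, 1.0, 10.0):
    # Smaller epsilon means stronger privacy and a noisier released answer.
    print(f"epsilon={eps}: released count = {private_count(true_count, eps, rng):.1f}")
```

Averaged over many releases the noise cancels out, which is why aggregate research questions can still be answered while any single released figure reveals little about one individual.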
Pastor Escuredo, D., Morales-Guzmán, A. et al, “Flooding through the Lens of Mobile Phone Activity.” IEEE Global Humanitarian Technology Conference, GHTC 2014. Available from: http://bit.ly/1OzK2bK
- This report describes how mobile data can be used to understand the impact of disasters and improve disaster management. The study examined the Mexican state of Tabasco in 2009 and was carried out by a multidisciplinary, multi-stakeholder consortium involving the UN World Food Programme (WFP), Telefonica Research, Technical University of Madrid (UPM), Digital Strategy Coordination Office of the President of Mexico, and UN Global Pulse.
- Telefonica Research, a division of the multinational telecommunications company Telefonica, provided call detail records covering flood-affected areas for nine months. This data was combined with “remote sensing data (satellite images), rainfall data, census and civil protection data.” The results demonstrated that “analysing mobile activity during floods could be used to potentially locate damaged areas, efficiently assess needs and allocate resources (for example, sending supplies to affected areas).”
- In addition to the results, the study highlighted “the value of a public-private partnership on using mobile data to accurately indicate flooding impacts in Tabasco, thus improving early warning and crisis management.”
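One simple way to turn call detail records into a “where did something happen?” signal is to compare each area's activity on a given day with its own normal baseline and flag large deviations. The sketch below does this with a z-score; it is an illustrative assumption, not the method used in the Tabasco study, and the areas, counts and threshold are hypothetical. Both a sharp drop (for example, from damaged infrastructure) and a sharp spike (for example, emergency calls) are flagged for a human to inspect.

```python
from statistics import mean, stdev


def flag_anomalous_areas(baseline, observed, threshold=3.0):
    """Flag areas whose call activity deviates sharply from their own baseline.

    baseline:  dict mapping area -> list of daily call counts on normal days
    observed:  dict mapping area -> call count on the day being examined
    threshold: how many standard deviations count as anomalous
    Returns a dict mapping each flagged area to its z-score.
    """
    flagged = {}
    for area, history in baseline.items():
        if area not in observed or len(history) < 2:
            continue  # need at least two baseline days to estimate spread
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # a perfectly flat baseline gives no usable scale
        z = (observed[area] - mu) / sigma
        if abs(z) >= threshold:
            flagged[area] = round(z, 2)
    return flagged


# Hypothetical daily call counts per area (not data from the Tabasco study).
baseline = {
    "area_A": [100, 104, 98, 102, 96],
    "area_B": [50, 52, 49, 51, 48],
    "area_C": [80, 82, 79, 81, 78],
}
flood_day = {"area_A": 40, "area_B": 120, "area_C": 80}
print(flag_anomalous_areas(baseline, flood_day))  # {'area_A': -18.97, 'area_B': 44.27}
```

In practice such a signal would be combined with the other sources the study lists (satellite images, rainfall, census and civil protection data) rather than used alone.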
Perkmann, M. and Schildt, H. “Open data partnerships between firms and universities: The role of boundary organizations.” Research Policy, 44(5), 2015. http://bit.ly/25RRJ2c.
- This paper discusses the concept of a “boundary organization” in relation to industry-academic partnerships driven by data. Boundary organizations perform mediated revealing, allowing firms to disclose their research problems to a broad audience of innovators and simultaneously minimize the risk that this information would be adversely used by competitors.
- The authors identify two especially important challenges for private firms to enter open data or participate in data collaboratives with the academic research community that could be addressed through more involvement from boundary organizations:
- First is a challenge of maintaining competitive advantage. The authors note that, “the more a firm attempts to align the efforts in an open data research programme with its R&D priorities, the more it will have to reveal about the problems it is addressing within its proprietary R&D.”
- The second involves the misalignment of incentives between the private and academic fields. Perkmann and Schildt argue that a firm seeking to build collaborations around its opened data “will have to provide suitable incentives that are aligned with academic scientists’ desire to be rewarded for their work within their respective communities.”
Robin, N., Klein, T., & Jütting, J. “Public-Private Partnerships for Statistics: Lessons Learned, Future Steps.” OECD. 2016. http://bit.ly/24FLYlD.
- This working paper acknowledges the growing body of work on how different types of data (e.g., telecom data, social media, sensors and geospatial data) can address data gaps relevant to National Statistical Offices (NSOs).
- Four models of public-private interaction for statistics are described: in-house production of statistics by a data provider for an NSO; transfer of data sets from private entities to NSOs; transfer of data to a third-party provider that manages both the NSO and private-entity data; and the outsourcing of NSO functions.
- The paper highlights challenges to public-private partnerships involving data (e.g., technical challenges, data confidentiality, risks, limited incentives for participation); suggests that deliberate and highly structured approaches to such partnerships require enforceable contracts; and emphasizes both the trade-off between data specificity and accessibility and the importance of pricing mechanisms that reflect the capacity and capability of national statistical offices.
- Case studies referenced in the paper include:
- The in-house analysis of call detail records by a mobile network operator (MNO), Telefonica;
- A third-party data provider and steward of travel statistics (Positium);
- The Data for Development (D4D) challenge organized by MNO Orange; and
- Statistics Netherlands’ use of social media to predict consumer confidence.
Stuart, Elizabeth, Samman, Emma, Avis, William, Berliner, Tom. “The data revolution: finding the missing millions.” Overseas Development Institute, 2015. Available from: http://bit.ly/1bPKOjw
- The authors of this report highlight the need for good quality, relevant, accessible and timely data for governments to extend services into underrepresented communities and implement policies towards a sustainable “data revolution.”
- The solutions proposed in this report from the Overseas Development Institute focus on capacity-building activities of national statistical offices (NSOs), alternative sources of data (including shared corporate data) to address gaps, and building strong data management systems.
Taylor, L., & Schroeder, R. “Is bigger better? The emergence of big data as a tool for international development policy.” GeoJournal, 80(4). 2015. 503-518. http://bit.ly/1RZgSy4.
- This journal article describes how privately held data, namely “digital traces” of consumer activity, “are becoming seen by policymakers and researchers as a potential solution to the lack of reliable statistical data on lower-income countries.”
- They focus especially on three categories of data collaborative use cases:
- Mobile data as a predictive tool for issues such as human mobility and economic activity;
- Use of mobile data to inform humanitarian response to crises; and
- Use of born-digital web data as a tool for predicting economic trends, and the implications these have for low- and middle-income countries (LMICs).
- They note, however, that a number of challenges and drawbacks exist for these types of use cases, including:
- Access to private data sources often must be negotiated or bought, “which potentially means substituting negotiations with corporations for those with national statistical offices;”
- The meaning of such data is not always simple or stable, and local knowledge is needed to understand how people are using the technologies in question;
- Bias in proprietary data can be hard to understand and quantify;
- Lack of privacy frameworks; and
- Power asymmetries, wherein “LMIC citizens are unwittingly placed in a panopticon staffed by international researchers, with no way out and no legal recourse.”
van Panhuis, Willem G., Proma Paul, Claudia Emerson, John Grefenstette, Richard Wilder, Abraham J. Herbst, David Heymann, and Donald S. Burke. “A systematic review of barriers to data sharing in public health.” BMC Public Health 14, no. 1 (2014): 1144. Available from: http://bit.ly/1JOBruO
- The authors of this report conduct a systematic literature review of potential barriers to public health data sharing. They classify twenty potential barriers in six categories: “technical, motivational, economic, political, legal and ethical.” In this taxonomy, “the first three categories are deeply rooted in well-known challenges of health information systems for which structural solutions have yet to be found; the last three have solutions that lie in an international dialogue aimed at generating consensus on policies and instruments for data sharing.”
- The authors suggest the need for a “systematic framework of barriers to data sharing in public health” in order to accelerate access and use of data for public good.
Verhulst, Stefaan and Sangokoya, David. “Mapping the Next Frontier of Open Data: Corporate Data Sharing.” In: Gasser, Urs and Zittrain, Jonathan and Faris, Robert and Heacock Jones, Rebekah, “Internet Monitor 2014: Reflections on the Digital World: Platforms, Policy, Privacy, and Public Discourse (December 15, 2014).” Berkman Center Research Publication No. 2014-17. http://bit.ly/1GC12a2
- This essay describes a taxonomy of current corporate data sharing practices for public good: research partnerships; prizes and challenges; trusted intermediaries; application programming interfaces (APIs); intelligence products; and corporate data cooperatives or pooling.
- Examples of data collaboratives include: Yelp Dataset Challenge, the Digital Ecologies Research Partnership, BBVA Innova Challenge, Telecom Italia’s Big Data Challenge, NIH’s Accelerating Medicines Partnership and the White House’s Climate Data Partnerships.
- The authors highlight important questions to consider towards a more comprehensive mapping of these activities.
Verhulst, Stefaan and Sangokoya, David, 2015. “Data Collaboratives: Exchanging Data to Improve People’s Lives.” Medium. Available from: http://bit.ly/1JOBDdy
- The essay describes data collaboratives as a new form of collaboration in which participants from different sectors exchange data to help solve public problems. These forms of collaboration can improve people’s lives through data-driven decision-making; information exchange and coordination; and shared standards and frameworks for multi-actor, multi-sector participation.
- The essay cites four activities that are critical to accelerating data collaboratives: documenting value and measuring impact; matching public demand and corporate supply of data in a trusted way; training and convening data providers and users; and experimenting with and scaling existing initiatives.
- Examples of data collaboratives include NIH’s Precision Medicine Initiative; the Mobile Data, Environmental Extremes and Population (MDEEP) Project; and Twitter-MIT’s Laboratory for Social Machines.
Verhulst, Stefaan, Susha, Iryna, Kostura, Alexander. “Data Collaboratives: Matching Supply of (Corporate) Data to Solve Public Problems.” Medium. February 24, 2016. http://bit.ly/1ZEp2Sr.
- This piece articulates a set of key lessons learned during a session at the International Data Responsibility Conference focused on identifying emerging practices, opportunities and challenges confronting data collaboratives.
- The authors list a number of privately held data sources that could create positive public impacts if made more accessible in a collaborative manner, including:
- Data for early warning systems to help mitigate the effects of natural disasters;
- Data to help understand human behavior as it relates to nutrition and livelihoods in developing countries;
- Data to monitor compliance with weapons treaties;
- Data to more accurately measure progress related to the UN Sustainable Development Goals.
- To identify and expand on emerging practice in the space, the authors describe a number of current data collaborative experiments, including:
- Trusted Intermediaries: Statistics Netherlands partnered with Vodafone to analyze mobile call data records in order to better understand mobility patterns and inform urban planning.
- Prizes and Challenges: Orange Telecom, which has been a leader in this type of data collaboration, provided several examples of the company’s initiatives, such as the use of call data records to track the spread of malaria, as well as its experience with Challenge 4 Development.
- Research partnerships: The Data for Climate Action project is an ongoing large-scale initiative incentivizing companies to share their data to help researchers answer particular scientific questions related to climate change and adaptation.
- Sharing intelligence products: JPMorgan Chase shares macroeconomic insights it has gained by leveraging its data through the newly established JPMorgan Chase Institute.
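Several of these experiments, such as the Statistics Netherlands and Vodafone mobility work, start from the same basic step: turning call data records into counts of movements between areas. The sketch below builds such an origin-destination table under simple assumptions (each record already pseudonymised and mapped to an area; a “trip” is any change of area between a user's consecutive records). It is a hypothetical illustration, not the method of any partnership named above, and real projects add anonymisation and aggregation safeguards before anything is shared.

```python
from collections import Counter, defaultdict


def origin_destination(records):
    """Count movements between areas from call detail records.

    records: iterable of (user_id, timestamp, area) tuples, where area is the
    region served by the cell tower that handled the call. A movement is
    counted each time a user's consecutive records fall in different areas.
    """
    by_user = defaultdict(list)
    for user, timestamp, area in records:
        by_user[user].append((timestamp, area))
    od = Counter()
    for events in by_user.values():
        events.sort()  # chronological order per user
        for (_, origin), (_, destination) in zip(events, events[1:]):
            if origin != destination:
                od[(origin, destination)] += 1
    return od


# Hypothetical, already pseudonymised records: (user, hour of day, area).
records = [
    ("u1", 8, "north"), ("u1", 12, "centre"), ("u1", 18, "north"),
    ("u2", 9, "south"), ("u2", 13, "centre"),
    ("u3", 10, "centre"), ("u3", 15, "centre"),
]
for (origin, destination), trips in sorted(origin_destination(records).items()):
    print(f"{origin} -> {destination}: {trips}")
```

Only the aggregated table, not the per-user records, is the kind of output a trusted intermediary would typically pass on.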
- In order to capitalize on the opportunities provided by data collaboratives, a number of needs were identified:
- A responsible data framework;
- Increased insight into different business models that may facilitate the sharing of data;
- Capacity to tap into the potential value of data;
- Transparent stock of available data supply; and
- Mapping emerging practices and models of sharing.
Vogel, N., Theisen, C., Leidig, J. P., Scripps, J., Graham, D. H., & Wolffe, G. “Mining mobile datasets to enable the fine-grained stochastic simulation of Ebola diffusion.” Procedia Computer Science. 2015. http://bit.ly/1TZDroF.
- The paper presents a research study conducted on the basis of the mobile call records shared with researchers in the framework of the Data for Development Challenge by the mobile operator Orange.
- The study discusses the data analysis approach in relation to developing a simulation of Ebola diffusion built around “the interactions of multi-scale models, including viral loads (at the cellular level), disease progression (at the individual person level), disease propagation (at the workplace and family level), societal changes in migration and travel movements (at the population level), and mitigating interventions (at the abstract government policy level).”
- The authors argue that the use of their population, mobility, and simulation models provides more accurate simulation details than high-level analytical predictions, and that the D4D mobile datasets provide high-resolution information useful for modeling developing regions and hard-to-reach locations.
Welle Donker, F., van Loenen, B., & Bregt, A. K. “Open Data and Beyond.” ISPRS International Journal of Geo-Information, 5(4). 2016. http://bit.ly/22YtugY.
- This research develops a monitoring framework to assess the effects of open (private) data, using a case study of the Dutch energy network administrator Liander.
- Focusing on the potential impacts of open private energy data – beyond ‘smart disclosure’ where citizens are given information only about their own energy usage – the authors identify three attainable strategic goals:
- Continuously optimize performance on services, security of supply, and costs;
- Improve management of energy flows and insight into energy consumption;
- Help customers save energy and switch over to renewable energy sources.
- The authors propose a seven-step framework for assessing the impacts of Liander data, in particular, and open private data more generally:
- Develop a performance framework that describes what the program is about, including the organization’s mission and strategic goals;
- Identify the most important elements, or key performance areas which are most critical to understanding and assessing your program’s success;
- Select the most appropriate performance measures;
- Determine the gaps between what information you need and what is available;
- Develop and implement a measurement strategy to address the gaps;
- Develop a performance report which highlights what you have accomplished and what you have learned;
- Learn from your experiences and refine your approach as required.
- While the authors note that the true impacts of this open private data will likely not come into view in the short term, they argue that, “Liander has successfully demonstrated that private energy companies can release open data, and has successfully championed the other Dutch network administrators to follow suit.”
World Economic Forum, 2015. “Data-driven development: pathways for progress.” Geneva: World Economic Forum. http://bit.ly/1JOBS8u
- This report provides an overview of the existing data deficit and the value and impact of big data for sustainable development.
- The authors of the report focus on four main priorities towards a sustainable data revolution: commercial incentives and trusted agreements with public- and private-sector actors; the development of shared policy frameworks, legal protections and impact assessments; capacity building activities at the institutional, community, local and individual level; and lastly, recognizing individuals as both producers and consumers of data.
The trouble with Big Data? It is called the “recency bias”.
One of the problems with such a rate of information increase is that the present moment will always loom far larger than even the recent past. Imagine looking back over a photo album representing the first 18 years of your life, from birth to adulthood. Let’s say that you have two photos for your first two years. Assuming a rate of information increase matching that of the world’s data, you will have an impressive 2,000 photos representing the years six to eight; 200,000 for the years 10 to 12; and a staggering 200,000,000 for the years 16 to 18. That’s more than three photographs for every single second of those final two years.
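The numbers in the photo-album thought experiment imply a tenfold increase every two years (2 photos, then 2,000 three periods later, and so on); that growth rate is inferred from the figures, not stated by the author. The short calculation below checks the “more than three photographs per second” claim and shows how lopsided the album becomes.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignores leap days


def photos_per_period(first=2, growth=10, periods=9):
    """Photo counts for consecutive two-year periods (years 0-2, 2-4, ..., 16-18),
    assuming the count grows by a constant factor each period."""
    return [first * growth ** i for i in range(periods)]


counts = photos_per_period()
last = counts[-1]                                  # years 16-18
per_second = last / (2 * SECONDS_PER_YEAR)         # photos per second in the final two years
share_of_last = last / sum(counts)                 # fraction of the album from years 16-18
share_first_eight = sum(counts[:4]) / sum(counts)  # fraction from years 0-8

print(counts[3], counts[5], last)   # 2000 200000 200000000
print(round(per_second, 2))         # 3.17
print(round(share_of_last, 3))      # 0.9
print(f"{share_first_eight:.1e}")
```

Under this assumption the final two years account for about 90% of the album, while the first eight years make up roughly one hundred-thousandth of it.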
This isn’t a perfect analogy with global data, of course. For a start, much of the world’s data increase is due to more sources of information being created by more people, along with far larger and more detailed formats. But the point about proportionality stands. If you were to look back over a record like the one above, or try to analyse it, the more distant past would shrivel into meaningless insignificance. How could it not, with so many times less information available?
Here’s the problem with much of the big data currently being gathered and analysed. The moment you start looking backwards to seek the longer view, you have far too much of the recent stuff and far too little of the old. Short-sightedness is built into the structure, in the form of an overwhelming tendency to over-estimate short-term trends at the expense of history.
To understand why this matters, consider the findings from social science about ‘recency bias’, which describes the tendency to assume that future events will closely resemble recent experience. It’s a version of what is also known as the availability heuristic: the tendency to base your thinking disproportionately on whatever comes most easily to mind. It’s also a universal psychological attribute. If the last few years have seen exceptionally cold summers where you live, for example, you might be tempted to state that summers are getting colder – or that your local climate may be cooling. In fact, you shouldn’t read anything whatsoever into the data. You would need to take a far, far longer view to learn anything meaningful about climate trends. In the short term, you’d be best not speculating at all – but who among us can manage that?
The same tends to be true of most complex phenomena in real life: stock markets, economies, the success or failure of companies, war and peace, relationships, the rise and fall of empires. Short-term analyses aren’t only invalid – they’re actively unhelpful and misleading. Just look at the legions of economists who lined up to pronounce events like the 2009 financial crisis unthinkable right until it happened. The very notion that valid predictions could be made on that kind of scale was itself part of the problem.
It’s also worth remembering that novelty tends to be a dominant consideration when deciding what data to keep or delete. Out with the old and in with the new: that’s the digital trend in a world where search algorithms are intrinsically biased towards freshness, and where so-called link rot infests everything from Supreme Court decisions to entire social media services. A bias towards the present is structurally engrained in almost all the technology surrounding us, not least thanks to our habit of ditching most of our once-shiny machines after about five years.
What to do? This isn’t just a question of being better at preserving old data (although this wouldn’t be a bad idea, given just how little currently lasts decades rather than years). More importantly, it’s about determining what is worth preserving in the first place, and what it means to cull information meaningfully in the name of knowledge.
What’s needed is something that I like to think of as “intelligent forgetting”: teaching our tools to become better at letting go of the immediate past in order to keep its larger continuities in view. It’s an act of curation akin to organising a photograph album – albeit with more maths….(More)”
Searching for Someone: From the “Small World Experiment” to the “Red Balloon Challenge,” and beyond
Essay by Manuel Cebrian, Iyad Rahwan, Victoriano Izquierdo, Alex Rutherford, Esteban Moro and Alex (Sandy) Pentland: “Our ability to search social networks for people and information is fundamental to our success. We use our personal connections to look for new job opportunities, to seek advice about what products to buy, to match with romantic partners, to find a good physician, to identify business partners, and so on.
Despite living in a world populated by seven billion people, we are able to navigate our contacts efficiently, only needing a handful of personal introductions before finding the answer to our question, or the person we are seeking. How does this come to be? In folk culture, the answer to this question is that we live in a “small world.” The catchphrase was coined in 1929 by the visionary author Frigyes Karinthy in his Chain-Links essay, where these ideas were first put forward.
Let me put it this way: Planet Earth has never been as tiny as it is now. It shrunk — relatively speaking of course — due to the quickening pulse of both physical and verbal communication. We never talked about the fact that anyone on Earth, at my or anyone’s will, can now learn in just a few minutes what I think or do, and what I want or what I would like to do. Now we live in fairyland. The only slightly disappointing thing about this land is that it is smaller than the real world has ever been. — Frigyes Karinthy, Chain-Links, 1929
Then, it was just a dystopian idea reflecting the anxiety of living in an increasingly connected world. But there was no empirical evidence that this was actually the case, and it took almost 40 years to find any.
Six Degrees of Separation
In 1967, legendary psychologist Stanley Milgram conducted a ground-breaking experiment to test this “small world” hypothesis. He started with random individuals in the U.S. Midwest and asked them to send packages to people in Boston, Massachusetts, whose address was not given. Participants could contribute to this “search” only by forwarding the package to someone they knew on a first-name basis. Milgram expected that successful searches (if any!) would require hundreds of individuals along the chain from the initial sender to the final recipient.
Surprisingly, however, Milgram found that the average path length was somewhere between 5.5 and 6 individuals, which made social search look astonishingly efficient. Although the experiment drew some methodological criticism, its findings were profound. However, what it did not answer is why social networks have such short paths in the first place. The answer was not obvious. In fact, there were reasons to suspect that short paths were just a myth: social networks are very cliquish. Your friends’ friends are likely to also be your friends, and thus most social paths are short and circular. This “cliquishness” suggests that our search through the social network can easily get “trapped” within our close social community, making social search highly inefficient.
Architectures for Social Search
Again, it took a long time (more than 30 years) before this riddle was solved. In a seminal 1998 paper in Nature, Duncan Watts and Steven Strogatz came up with an elegant mathematical model to explain the existence of these short paths. They started from a social network that is very cliquish, i.e., most of your friends are also friends of one another. In this model, the world is “large” since the social distance among individuals is very long. However, if we take only a tiny fraction of these connections (say, one out of every hundred links) and rewire them to random individuals in the network, that same world suddenly becomes “small.” These random connections allow individuals to jump to faraway communities very quickly, using them as social network highways, and thus reduce the average path length dramatically.
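The rewiring idea can be tried directly. The following is a minimal sketch of the Watts-Strogatz construction: start from a ring where each person knows their five nearest neighbours on either side, rewire each link with probability 1% (the essay's “one out of every hundred links”), then compare average shortest-path length (by breadth-first search) and clustering before and after. The network size and parameters are illustrative choices, not values from the original paper.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours on either side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj


def rewire(adj, n, k, p, rng):
    """Watts-Strogatz rewiring: each lattice edge (i, i + j) is, with
    probability p, detached from i + j and reattached to a random node."""
    for j in range(1, k + 1):
        for i in range(n):
            old = (i + j) % n
            if old not in adj[i] or rng.random() >= p:
                continue
            candidates = [v for v in range(n) if v != i and v not in adj[i]]
            if not candidates:
                continue
            new = rng.choice(candidates)
            adj[i].discard(old)
            adj[old].discard(i)
            adj[i].add(new)
            adj[new].add(i)
    return adj


def average_path_length(adj):
    """Mean shortest-path length (in hops) over all reachable ordered pairs."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def clustering(adj):
    """Average share of each node's neighbour pairs that are themselves linked."""
    scores = []
    for nbrs in adj.values():
        nbrs = list(nbrs)
        d = len(nbrs)
        if d < 2:
            scores.append(0.0)
            continue
        linked = sum(1 for a in range(d) for b in range(a + 1, d)
                     if nbrs[b] in adj[nbrs[a]])
        scores.append(2 * linked / (d * (d - 1)))
    return sum(scores) / len(scores)


n, k = 500, 5
lattice = ring_lattice(n, k)
small_world = rewire(ring_lattice(n, k), n, k, p=0.01, rng=random.Random(1))

L0, C0 = average_path_length(lattice), clustering(lattice)
L1, C1 = average_path_length(small_world), clustering(small_world)
print(f"lattice    : avg path {L0:.1f}, clustering {C0:.2f}")
print(f"1% rewired : avg path {L1:.1f}, clustering {C1:.2f}")
```

The point to look for is the combination: the average path length falls sharply while clustering barely moves, so the network stays cliquish yet becomes “small.”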
While this theoretical insight suggests that social networks are searchable due to the existence of short paths, it does not yet say much about the “procedure” that people use to find these paths. There is no reason, a priori, that we should know how to find these short chains, especially since there are many chains, and no individual has knowledge of the network structure beyond their immediate community. People do not know how the friends of their friends are connected among themselves, and therefore it is not obvious that they would have a good way of navigating their social network while searching.
Soon after Watts and Strogatz came up with this model at Cornell University, a computer scientist across campus, Jon Kleinberg, set out to investigate whether such “small world” networks are searchable. In a landmark Nature article, “Navigation in a Small World,” published in 2000, he showed that social search is easy without global knowledge of the network, but only for a very specific value of the probability of long-range connectivity (i.e., the probability that we know somebody far removed from us, socially, in the social network). Using publicly available data from the social media site LiveJournal, David Liben-Nowell and colleagues showed that real-world social networks do indeed have these particular long-range ties. It appears the social architecture of the world we inhabit is remarkably fine-tuned for searchability….
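Kleinberg's model places people on a grid, gives each one a long-range contact chosen with probability proportional to distance raised to a power −r, and asks whether greedy, purely local forwarding finds the target quickly. In his analysis this works well only when r equals the dimension of the grid (2 for the two-dimensional grid of the Nature paper). The sketch below simplifies to a one-dimensional ring, where the corresponding exponent is 1; the ring size, number of trials and seeds are arbitrary illustrative choices. Kleinberg's result is asymptotic, so at this small size the gap between exponents may be modest.

```python
import random


def ring_distance(a, b, n):
    """Number of steps between a and b around a ring of n nodes."""
    d = abs(a - b)
    return min(d, n - d)


def long_range_contacts(n, r, rng):
    """Give each node one long-range contact, chosen with probability
    proportional to distance ** (-r) (Kleinberg's inverse-power rule)."""
    contacts = []
    for u in range(n):
        others = [v for v in range(n) if v != u]
        weights = [ring_distance(u, v, n) ** (-r) for v in others]
        contacts.append(rng.choices(others, weights=weights, k=1)[0])
    return contacts


def greedy_route(source, target, contacts, n):
    """Decentralized search: each holder knows only its two ring neighbours
    and its own long-range contact, and forwards to whichever is closest to
    the target. Returns the number of hops taken."""
    current, hops = source, 0
    while current != target:
        options = [(current - 1) % n, (current + 1) % n, contacts[current]]
        current = min(options, key=lambda v: ring_distance(v, target, n))
        hops += 1
    return hops


n = 1000
pair_rng = random.Random(7)
pairs = [(pair_rng.randrange(n), pair_rng.randrange(n)) for _ in range(200)]
for r in (0.0, 1.0, 2.0):
    contacts = long_range_contacts(n, r, random.Random(11))
    avg = sum(greedy_route(s, t, contacts, n) for s, t in pairs) / len(pairs)
    print(f"r = {r}: average greedy path length {avg:.1f} hops")
```

Because a ring neighbour always moves one step closer, greedy search here never fails; the question Kleinberg answered is how fast it succeeds, which depends on how long-range ties are distributed.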
The Tragedy of the Crowdsourcers
Some recent efforts have been made to try and disincentivize sabotage. If verification is also rewarded along the recruitment tree, then the individuals who recruited the saboteurs would have a clear incentive to verify, halt, and punish the saboteurs. This theoretical solution is yet to be tested in practice, and it is conjectured that a coalition of saboteurs, where saboteurs recruit other saboteurs pretending to “vet” them, would make recursive verification futile.
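The recruitment-tree incentives discussed here build on the recursive scheme the MIT team used in the 2009 DARPA Network Challenge (the “Red Balloon Challenge”): $2,000 to the person who found a balloon, $1,000 to whoever recruited them, $500 to that person's recruiter, and so on, so the payout per balloon never exceeds $4,000. A minimal sketch of that halving split follows; the participant names are hypothetical, and reusing the same split for verification rewards, as the paragraph proposes, is an assumption rather than a tested design.

```python
def recursive_rewards(chain, finder_reward=2000.0, decay=0.5):
    """Split a reward along a recruitment chain.

    chain: participants ordered from the person who found the target back up
    to the earliest recruiter, e.g. ["finder", "finder's recruiter", ...].
    Each step up the chain receives `decay` times the previous person's share,
    so the total never exceeds finder_reward / (1 - decay).
    """
    if not 0 <= decay < 1:
        raise ValueError("decay must be in [0, 1)")
    rewards = {}
    amount = finder_reward
    for person in chain:
        rewards[person] = amount
        amount *= decay
    return rewards


# Hypothetical chain: Dana found the balloon, Carl recruited Dana, Ana recruited Carl.
payout = recursive_rewards(["Dana", "Carl", "Ana"])
print(payout)                # {'Dana': 2000.0, 'Carl': 1000.0, 'Ana': 500.0}
print(sum(payout.values()))  # 3500.0
```

The bounded total is what makes the scheme affordable; the same structure is what gives each recruiter a stake in whom they bring in, which is the lever the verification proposal tries to use.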
If we are to believe theory, the outlook for reducing sabotage in social search is not promising. We recently proposed the “Crowdsourcing Dilemma,” a game-theoretic analysis of the fundamental trade-off between the potential for increased productivity of social search and the possibility of being set back by malicious behavior, including misinformation. Our results show that, in competitive scenarios, such as those with multiple social searches competing for the same information, malicious behavior is the norm, not an anomaly: a result contrary to conventional wisdom. Even worse, and counterintuitively, making sabotage more costly does not deter saboteurs but leads all the competing teams to a less desirable outcome, with more aggression and less efficient collective search for talent.
These empirical and theoretical findings have cautionary implications for the future of social search, and crowdsourcing in general. Social search is surprisingly efficient, cheap, easy to implement, and functional across multiple applications. But there are also surprises in the amount of evildoing that the social searchers will stumble upon while recruiting. As we get deeper and deeper into the recruitment tree, we stumble upon that evil force lurking in the dark side of the network.
Evil mutates and regenerates in the crowd in new forms impossible to anticipate by the designers or participants themselves. Crowdsourcing and its enemies will always be engaged in a co-evolutionary arms race.
Talent is there to be searched and recruited. But so are evil and malice. Ultimately, crowdsourcing experts need to figure out how to recruit more of the former, while deterring more of the latter. We might be living in a small world, but the cost and fragility of navigating it could harm any potential strategy to leverage the power of social networks….
Being searchable is a way of being closely connected to everyone else, which is conducive to contagion, group-think, and, most crucially, makes it hard for individuals to differentiate from each other. Evolutionarily, for better or worse, our brain makes us mimic others, and whether this copying of others ends up being part of the Wisdom of the Crowds, or the “stupidity of many,” it is highly sensitive to the scenario at hand.
Katabasis, or the myth of the hero that descends to the underworld and comes back stronger, is as old as time and pervasive across ancient cultures. Creative people seem to need to “get lost.” Grigori Perelman, Shinichi Mochizuki, and Bob Dylan all disappeared for a few years to reemerge later as more creative versions of themselves. Others like J. D. Salinger and Bobby Fischer also vanished, and never came back to the public sphere. If others cannot search and find us, we gain some slack, some room to escape from what we are known for by others. Searching for our true creative selves may rest on the difficulty of others finding us….(More)”
Private Data and the Public Good
Gideon Mann’s remarks on the occasion of the Robert Kahn distinguished lecture at The City College of New York on 5/22/16: “…and opportunities about a specific aspect of this relationship: the broader need for computer science to engage with the real world. Right now, a key aspect of this relationship is being built around the risks and opportunities of the emerging role of data.
Ultimately, I believe that these relationships, between computer science and the real world, between data science and real problems, hold the promise to vastly increase our public welfare. And today, we, the people in this room, have a unique opportunity to debate and define a more moral data economy….
The hybrid research model proposes something different. It embeds, as it were, researchers as practitioners. The thought was always that you would be going about your regular run of business, would face a need to innovate to solve a crucial problem, and would do something novel. At that point, you might choose to work some extra time and publish a paper explaining your innovation. In practice, this model rarely works as expected. Tight deadlines mean the innovation that people do in the normal course of business is incremental.
This model separates research from scientific publication and shortens the time window of research to what can be realized in a few years. For me, this always felt like a tremendous loss with respect to the older so-called “ivory tower” research model. It didn’t seem at all clear how this kind of model would produce the sea change of thought engendered by Shannon’s work, nor did it seem that Claude Shannon would ever want to work there. This kind of environment would never support the freestanding wonder, like the robot mouse that Shannon worked on. Moreover, I always believed that publication and participation in the scientific community are crucial to research. Without this engagement, it feels like something different: innovation, perhaps.
It is clear that the monopolistic environment that enabled AT&T to support this ivory-tower research doesn’t exist anymore.
Now, the hybrid research model was one model of research at Google, but there is another model as well: the moonshot model, as exemplified by Google X. Google X brought together focused research teams to drive research and development around a particular project, Google Glass and the self-driving car being two notable examples. Here the focus isn’t research but building a new product, with research as a potentially crucial blocking issue. Since the goal of Google X is directly to develop a new product, by definition they don’t publish papers along the way, but they’re not as tied to short-term deliverables as the rest of Google is. However, they are again decidedly un-Bell-Labs-like: a secretive, tightly focused, non-publishing group. DeepMind is a similarly constituted initiative, working, for example, on a best-in-the-world Go-playing algorithm, with publications happening sparingly.
Unfortunately, both of these approaches, the hybrid research model and the moonshot model, stack the deck towards a particular kind of research: research that leads to relatively short-term products that generate corporate revenue. While this kind of research is good for society, it isn’t the only kind of research that we need. We urgently need research that is long-term, and that is undertaken even without a clear local financial impact. In some sense this is a “tragedy of the commons”, where a shared public good (the commons) is not supported because everyone can benefit from it without giving back. Academic research is thus a non-rival, non-excludable good, and so, predictably, will be underfunded. In certain cases, this takes on an ethical dimension, particularly in health care, where the choice of which diseases to study and address has a tremendous potential to affect human life. Should we research heart disease or malaria? This decision has a huge impact on global human health, but is vastly informed by the potential profit from each of these various medicines….
Private Data means research is out of reach
The larger point that I want to make is that, in the absence of places in industry where long-term research can be done, academia has a tremendous potential opportunity. Unfortunately, it is actually quite difficult to do the work that needs to be done in academia, since many of the resources needed to push the state of the art are found only in industry: in particular, data.
Of course, academia also lacks machine resources, but this is a simpler problem to fix. It’s a matter of money: resources from the government could go to enabling research groups to build their own data centers or to acquire computational resources from the market, e.g. Amazon. This is aided by the compute philanthropy that Google and Microsoft practice, granting compute cycles to academic organizations.
But the data problem is much harder to address. The data being collected and generated at private companies could enable amazing discoveries and research, but is impossible for academics to access. The lack of access to private data from companies actually has much more significant effects than inhibiting research. In particular, the consumer-level data collected by social networks and internet companies could do much more than ad targeting.
Just for public health (suicide prevention, addiction counseling, mental health monitoring), there is enormous potential in the use of our online behavior to aid the most needy, and academia and non-profits are set up to enable this work, while companies are not.
To give one example, anorexia and other eating disorders are vicious killers. 20 million women and 10 million men suffer from a clinically significant eating disorder at some time in their lives, and eating disorders have the highest mortality rate of any mental health disorder, with a jaw-dropping estimated mortality rate of 10%, both directly from injuries caused by the disorder and from suicide resulting from it.
Eating disorders are particular in that sufferers often seek out confirmatory information: blogs, images and pictures that glorify and validate what sufferers see as “lifestyle” choices. Browsing behavior that seeks out images and guidance on how to starve yourself is a key indicator that someone is suffering. Tumblr, Pinterest and Instagram are places where people host and seek out this information. Tumblr has tried to help address this severe mental health issue by banning blogs that advocate self-harm and by adding public service announcements to searches for terms related to anorexia. But clearly, this is not the be-all and end-all of the work that could be done to detect and assist people at risk of dying from eating disorders. Moreover, this data could also help us understand the nature of those disorders themselves….
There is probably a role for a data ombudsman within private organizations: someone to protect the interests of the public’s data inside an organization, like a ‘public editor’ at a newspaper, depending on how it is set up. This person would be there to protect and articulate the interests of the public, which probably means both sides: making sure a company’s data is used for public good where appropriate, and making sure the public’s ‘right’ to privacy is appropriately safeguarded (and probably making sure the public is informed when their data is compromised).
Next, we need a platform to enable collaboration around social good, both among companies and between companies and academics. This platform would give trusted users access to a wide variety of data and speed up the process of research.
Finally, I wonder if there is a way that government could support research sabbaticals inside of companies. Clearly, the opportunities for this research far outstrip what is currently being done…(More)”