New book by David McCandless: “In this mind-blowing follow-up to the bestselling Information is Beautiful, the undisputed king of infographics David McCandless uses stunning and unique visuals to reveal unexpected insights into how the world really works. Every minute of every hour of every day we are bombarded with information – be it on television, in print or online. How can we relate to this mind-numbing overload? Enter David McCandless and his amazing infographics: simple, elegant ways to understand information too complex or abstract to grasp any way but visually. McCandless creates dazzling displays that blend the facts with their connections, contexts and relationships, making information meaningful, entertaining – and beautiful. Knowledge is Beautiful is an endlessly fascinating spin through the world of visualized data, all of it bearing the hallmark of David McCandless’s ground-breaking signature style. Taking infographics to the next level, Knowledge is Beautiful offers a deeper, more wide-ranging look at the world and its history. Covering everything from dog breeds and movie plots to the most commonly used passwords and crazy global warming solutions, Knowledge is Beautiful is guaranteed to enrich your understanding of the world.”
What Cars Did for Today’s World, Data May Do for Tomorrow’s
Quentin Hardy in the New York Times: “New technology products head at us constantly. There’s the latest smartphone, the shiny new app, the hot social network, even the smarter thermostat.
As great (or not) as all these may be, each thing is a small part of a much bigger process that’s rarely admired. They all belong inside a world-changing ecosystem of digital hardware and software, spreading into every area of our lives.
Thinking about what is going on behind the scenes is easier if we consider the automobile, also known as “the machine that changed the world.” Cars succeeded through the widespread construction of highways and gas stations. Those things created a global supply chain of steel plants and refineries. Seemingly unrelated things, including suburbs, fast food and drive-time talk radio, arose in the success.
Today’s dominant industrial ecosystem is relentlessly acquiring and processing digital information. It demands newer and better ways of collecting, shipping, and processing data, much the way cars needed better road building. And it’s spinning out its own unseen businesses.
A few recent developments illustrate the new ecosystem. General Electric plans to announce Monday that it has created a “data lake” method of analyzing sensor information from industrial machinery in places like railroads, airlines, hospitals and utilities. G.E. has been putting sensors on everything it can for a couple of years, and now it is out to read all that information quickly.
The company, working with an outfit called Pivotal, said that in the last three months it has looked at information from 3.4 million miles of flights by 24 airlines using G.E. jet engines. G.E. said it figured out things like possible defects 2,000 times as fast as it could before.
The company has to, since it’s getting so much more data. “In 10 years, 17 billion pieces of equipment will have sensors,” said William Ruh, vice president of G.E. software. “We’re only one-tenth of the way there.”
It hardly matters if Mr. Ruh is off by five billion or so. Billions of humans are already augmenting that number with their own packages of sensors, called smartphones, fitness bands and wearable computers. Almost all of that will get uploaded someplace too.
Shipping that data creates challenges. In June, researchers at the University of California, San Diego announced a method of engineering fiber optic cable that could make digital networks run 10 times faster. The idea is to get more parts of the system working closer to the speed of light, without involving the “slow” processing of electronic semiconductors.
“We’re going from millions of personal computers and billions of smartphones to tens of billions of devices, with and without people, and that is the early phase of all this,” said Larry Smarr, director of the California Institute for Telecommunications and Information Technology, located inside U.C.S.D. “A gigabit a second was fast in commercial networks, now we’re at 100 gigabits a second. A terabit a second will come and go. A petabit a second will come and go.”
In other words, Mr. Smarr thinks commercial networks will eventually be 10,000 times as fast as today’s best systems. “It will have to grow, if we’re going to continue what has become our primary basis of wealth creation,” he said.
Add computation to collection and transport. Last month, U.C. Berkeley’s AMP Lab, created two years ago for research into new kinds of large-scale computing, spun out a company called Databricks, that uses new kinds of software for fast data analysis on a rental basis. Databricks plugs into the one million-plus computer servers inside the global system of Amazon Web Services, and will soon work inside similar-size megacomputing systems from Google and Microsoft.
It was the second company out of the AMP Lab this year. The first, called Mesosphere, enables a kind of pooling of computing services, building the efficiency of even million-computer systems….”
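The "10,000 times as fast" figure in the excerpt above follows from simple unit arithmetic on the speeds Mr. Smarr names. A minimal sketch, assuming decimal SI prefixes (giga = 10^9, tera = 10^12, peta = 10^15); the variable and function names are illustrative and do not come from the article:

```python
# Bits per second for the prefixes named in the quote (decimal SI prefixes assumed).
GIGABIT = 10**9
TERABIT = 10**12
PETABIT = 10**15

today_bps = 100 * GIGABIT  # "now we're at 100 gigabits a second"
future_bps = 1 * PETABIT   # "a petabit a second will come and go"


def speedup(new_bps: int, old_bps: int) -> float:
    """Return how many times faster new_bps is than old_bps."""
    return new_bps / old_bps


print(speedup(TERABIT, today_bps))     # 10.0
print(speedup(future_bps, today_bps))  # 10000.0
```

A terabit a second is ten times today's 100 gigabits, and a petabit a second is a thousand times that again, which gives the 10,000-fold jump.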
The infrastructure Africa really needs is better data reporting
Quartz: “This week African leaders met with officials in Washington and agreed to billions of dollars of US investments and infrastructure deals. But the terrible state of statistical reporting in most of Africa means that it will be nearly impossible to gauge how effective these deals are at making Africans, or the American investors, better off.
Data reporting on the continent is sketchy. Just look at the recent GDP revisions of large countries. How is it that Nigeria’s April GDP recalculation catapulted it ahead of South Africa, making it the largest economy in Africa overnight? Or that Kenya’s economy is actually 20% larger (paywall) than previously thought?
Indeed, countries in Africa get noticeably bad scores on the World Bank’s Bulletin Board on Statistical Capacity, an index of data reporting integrity.
A recent working paper from the Center for Global Development (CGD) shows how politics influence the statistics released by many African countries…
But in the long run, dodgy statistics aren’t good for anyone. They “distort the way we understand the opportunities that are available,” says Amanda Glassman, one of the CGD report’s authors. US firms have pledged $14 billion in trade deals at the summit in Washington. No doubt they would like to know whether high school enrollment promises to create a more educated workforce in a given country, or whether its people have been immunized for viruses.
Overly optimistic indicators also distort how a government decides where to focus its efforts. If school enrollment appears to be high, why implement programs intended to increase it?
The CGD report suggests increased funding to national statistical agencies, and making sure that they are wholly independent from their governments. President Obama is talking up $7 billion into African agriculture. But unless cash and attention are given to improving statistical integrity, he may never know whether that investment has borne fruit.”
The Emergence of Government Innovation Teams
Hollie Russon Gilman at TechTank: “A new global currency is emerging. Governments understand that people at home and abroad evaluate them based on how they use technology and innovative approaches in their service delivery and citizen engagement. This raises opportunities, and critical questions about the role of innovation in 21st century governance.
Bloomberg Philanthropies and Nesta, the UK’s Innovation foundation, recently released a global report highlighting 20 government innovation teams. Importantly, the study included teams that were established and funded by all levels of government (city, regional and national) and that aim to find creative solutions to seemingly intractable problems. The report covers teams across six continents and draws out some basic principles and commonalities that are instructive for all types of innovators, inside and outside of government.
Using Government to Locally Engage
One of the challenges of representational democracy is that elected officials and government officials spend time in bureaucracies isolated from the very people they aim to serve. Perhaps there can be different models. For example, Seoul’s Innovation Bureau is engaging citizens to re-design and re-imagine public services. Seoul is dedicated to becoming a Sharing City; including Tool Kit Centers where citizens can borrow machinery they would rarely use that would also benefit the whole community. This approach puts citizens at the center of their communities and leverages government to work for the people…
As I’ve outlined in an earlier TechTank post, there are institutional constraints for governments to try the unknown. There are potential electoral costs, greater disillusionment, and gaps in vital service delivery. Yet, despite all of these barriers there are a variety of promising tools. For example, Finland has Sitra, an Innovation fund, whose mission is to foster experimentation to transform a diverse set of policy issues including sustainable energy and healthcare. Sitra invests in both practical research and experiments to further public sector issues, as well as in early stage companies.
We need a deeper understanding of the opportunities, and challenges, of innovation in government. Luckily there are many researchers, think-tanks, and organizations beginning analysis. For example, Professor and Associate Dean Anita McGahan, of the Rotman School of Management at the University of Toronto, calls for a more strategic approach toward understanding the use of innovation, including big data, in the public sector…”
Digital Footprints: Opportunities and Challenges for Online Social Research
Paper by Golder, Scott A. and Macy, Michael for the Annual Review of Sociology: “Online interaction is now a regular part of daily life for a demographically diverse population of hundreds of millions of people worldwide. These interactions generate fine-grained time-stamped records of human behavior and social interaction at the level of individual events, yet are global in scale, allowing researchers to address fundamental questions about social identity, status, conflict, cooperation, collective action, and diffusion, both by using observational data and by conducting in vivo field experiments. This unprecedented opportunity comes with a number of methodological challenges, including generalizing observations to the offline world, protecting individual privacy, and solving the logistical challenges posed by “big data” and web-based experiments. We review current advances in online social research and critically assess the theoretical and methodological opportunities and limitations. [J]ust as the invention of the telescope revolutionized the study of the heavens, so too by rendering the unmeasurable measurable, the technological revolution in mobile, Web, and Internet communications has the potential to revolutionize our understanding of ourselves and how we interact…. [T]hree hundred years after Alexander Pope argued that the proper study of mankind should lie not in the heavens but in ourselves, we have finally found our telescope. Let the revolution begin. —Duncan Watts”
Fifteen open data insights
Tim Davies from ODRN: “…below are the 15 points from the three-page briefing version, and you can find a full write-up of these points for download. You can also find reports from all the individual project partners, including a collection of quick-read research posters over on the Open Data Research Network website.
15 insights into open data supply, use and impacts
(1) There are many gaps to overcome before open data availability can lead to widespread effective use and impact. Open data can lead to change through a ‘domino effect’, or by creating ripples of change that gradually spread out. However, often many of the key ‘domino pieces’ are missing, and local political contexts limit the reach of ripples. Poor data quality, low connectivity, scarce technical skills, weak legal frameworks and political barriers may all prevent open data triggering sustainable change. Attentiveness to all the components of open data impact is needed when designing interventions.
(2) There is a frequent mismatch between open data supply and demand in developing countries. Counting datasets is a poor way of assessing the quality of an open data initiative. The datasets published on portals are often the datasets that are easiest to publish, not the datasets most in demand. Politically sensitive datasets are particularly unlikely to be published without civil society pressure. Sometimes the gap is on the demand side – as potential open data users often do not articulate demands for key datasets.
(3) Open data initiatives can create new spaces for civil society to pursue government accountability and effectiveness. The conversation around transparency and accountability that ideas of open data can support is as important as the datasets in some developing countries.
(4) Working on open data projects can change how government creates, prepares and uses its own data. The motivations behind an open data initiative shape how government uses the data itself. Civil society and entrepreneurs interacting with government through open data projects can help shape government data practices. This makes it important to consider which intermediaries gain insider roles shaping data supply.
(5) Intermediaries are vital to both the supply and the use of open data. Not all data needed for governance in developing countries comes from government. Intermediaries can create data, articulate demands for data, and help translate open data visions from political leaders into effective implementations. Traditional local intermediaries are an important source of information, in particular because they are trusted parties.
(6) Digital divides create data divides in both the supply and use of data. In some developing countries key data is not digitised, or a lack of technical staff has left data management patchy and inconsistent. Where Internet access is scarce, few citizens can have direct access to data or services built with it. Full access is needed for full empowerment, but offline intermediaries, including journalists and community radio stations, also play a vital role in bridging the gaps between data and citizens.
(7) Where information is already available and used, the shift to open data involves data evolution rather than data revolution. Many NGOs and intermediaries already access the information which is now becoming available as data. Capacity building should start from existing information and data practices in organisations, and should look for the step-by-step gains to be made from a data-driven approach.
(8) Officials’ fears about the integrity of data are a barrier to more machine-readable data being made available. The publication of data as PDF or in scanned copies is often down to a misunderstanding of how open data works. Only copies can be changed, and originals can be kept authoritative. Helping officials understand this may help increase the supply of data.
(9) Very few datasets are clearly openly licensed, and there is low understanding of what open licenses entail. There are mixed opinions on the importance of a focus on licensing in different contexts. Clear licenses are important to building a global commons of interoperable data, but may be less relevant to particular uses of data on the ground. In many countries wider conversations about licensing are yet to take place.
(10) Privacy issues are not on the radar of most developing country open data projects, although commercial confidentiality does arise as a reason preventing greater data transparency. Much state held data is collected either from citizens or from companies. Many countries in the ODDC study have weak or absent privacy laws and frameworks, yet participants in the studies raised few personal privacy considerations. By contrast, a lack of clarity, and officials’ concerns, about potential breaches of commercial confidentiality when sharing data gathered from firms was a barrier to opening data.
(11) There is more to open data than policies and portals. Whilst central open data portals act as a visible symbol of open data initiatives, a focus on portal building can distract attention from wider reforms. Open data elements can also be built on existing data sharing practices, and data made available through the locations where citizens, NGOs and businesses already go to access information.
(12) Open data advocacy should be aware of, and build upon, existing policy foundations in specific countries and sectors. Sectoral transparency policies for local government, budget and energy industry regulation, amongst others, could all have open data requirements and standards attached, drawing on existing mechanisms to secure sustainable supplies of relevant open data in developing countries. In addition, open data conversations could help make existing data collection and disclosure requirements fit better with the information and data demands of citizens.
(13) Open data is not just a central government issue: local government data, city data, and data from the judicial and legislative branches are all important. Many open data projects focus on the national level, and only on the executive branch. However, local government is closer to citizens, urban areas bring together many of the key ingredients for successful open data initiatives, and transparency in other branches of government is important to secure citizens’ democratic rights.
(14) Flexibility is needed in the application of definitions of open data to allow locally relevant and effective open data debates and advocacy to emerge. Open data is made up of various elements, including proactive publication, machine-readability and permissions to re-use. Countries at different stages of open data development may choose to focus on one or more of these, recognising that adopting all elements at once could hinder progress. It is important both to define open data clearly and to avoid a reductive debate that does not recognise progressive steps towards greater openness.
(15) There are many different models for an open data initiative: including top-down, bottom-up and sector-specific. Initiatives may also be state-led, civil society-led and entrepreneur-led in their goals and how they are implemented – with consequences for the resources and models required to make them sustainable. There is no one-size-fits-all approach to open data. More experimentation, evaluation and shared learning on the components, partners and processes for putting open data ideas into practice must be a priority for all who want to see a world where open-by-default data drives real social, political and economic change.
You can read more about each of these points in the full report.”
Quantifying the Interoperability of Open Government Datasets
Paper by Pieter Colpaert, Mathias Van Compernolle, Laurens De Vocht, Anastasia Dimou, Miel Vander Sande, Peter Mechant, Ruben Verborgh, and Erik Mannens, to be published in Computer: “Open Governments use the Web as a global dataspace for datasets. It is in the interest of these governments to be interoperable with other governments worldwide, yet there is currently no way to identify relevant datasets to be interoperable with and there is no way to measure the interoperability itself. In this article we discuss the possibility of comparing identifiers used within various datasets as a way to measure semantic interoperability. We introduce three metrics to express the interoperability between two datasets: the identifier interoperability, the relevance and the number of conflicts. The metrics are calculated from a list of statements which indicate for each pair of identifiers in the system whether they identify the same concept or not. While a lot of effort is needed to collect these statements, the return is high: not only relevant datasets are identified, also machine-readable feedback is provided to the data maintainer.”
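The abstract above names three metrics but does not give their formulas. As a rough illustration only, the sketch below assumes one plausible reading: "relevance" counts identifier pairs stated to denote the same concept, "identifier interoperability" is the share of those pairs that use an identical identifier, and a "conflict" is an identical identifier stated to denote different concepts. These definitions, the `Statement` type, the function name and the example URIs are all hypothetical and are not taken from the paper:

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One statement about a pair of identifiers, one from each dataset (hypothetical shape)."""
    id_a: str            # identifier used in dataset A
    id_b: str            # identifier used in dataset B
    same_concept: bool   # True if both identifiers denote the same concept


def interoperability_metrics(statements: list[Statement]) -> dict[str, float]:
    """Compute three illustrative metrics under the assumed definitions above."""
    shared = [s for s in statements if s.same_concept]
    matching = [s for s in shared if s.id_a == s.id_b]
    conflicts = [s for s in statements if s.id_a == s.id_b and not s.same_concept]
    relevance = len(shared)
    return {
        "relevance": relevance,
        "identifier_interoperability": len(matching) / relevance if relevance else 0.0,
        "conflicts": len(conflicts),
    }


# Example statements with made-up URIs.
statements = [
    Statement("http://example.org/city/ghent", "http://example.org/city/ghent", True),
    Statement("http://example.org/a/gent", "http://example.org/city/ghent", True),
    Statement("http://example.org/id/1", "http://example.org/id/1", False),
    Statement("http://example.org/a/x", "http://example.org/b/y", False),
]

metrics = interoperability_metrics(statements)
print(metrics)
```

In this toy input, two pairs share a concept, one of them under an identical identifier, and one identical identifier is reused for different concepts. The conflict list is the kind of machine-readable feedback a data maintainer could act on.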
Selected Readings on Economic Impact of Open Data
The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of open data was originally published in 2014.
Open data is publicly available data – often released by governments, scientists, and occasionally private companies – that is made available for anyone to use, in a machine-readable format, free of charge. Considerable attention has been devoted to the economic potential of open data for businesses and other organizations, and it is now widely accepted that open data plays an important role in spurring innovation, growth, and job creation. From new business models to innovation in local governance, open data is being quickly adopted as a valuable resource at many levels.
Measuring and analyzing the economic impact of open data in a systematic way is challenging, and governments as well as other providers of open data seek to provide access to the data in a standardized way. As governmental transparency increases and open data changes business models and activities in many economic sectors, it is important to understand best practices for releasing and using non-proprietary, public information. Costs, social challenges, and technical barriers also influence the economic impact of open data.
These selected readings are intended as a first step toward answering whether, and how, we can assess if opening data spurs economic impact.
Selected Reading List (in alphabetical order)
- Carla Bonina — New Business Models and the Values of Open Data: Definitions, Challenges, and Opportunities. – Paper provides an introduction to open data and open data business models, evaluating their potential economic value and identifying future challenges for the effectiveness of open data
- John Carpenter and Phil Watts — Assessing the Value of OS OpenData™ to the Economy of Great Britain – Synopsis – A study examining the economic impact of the OS OpenData initiative to the economy of Great Britain.
- Capgemini Consulting — The Open Data Economy: Unlocking Economic Value by Opening Government and Public Data – Paper analyzes trends in open government data interventions among different countries with the goal of identifying best practices for stimulating economic impact and creating economic value.
- Deloitte — Open Growth: Stimulating Demand for Open Data in the UK. – Explores emerging data-driven business models and their potential to stimulate demand for open data in the UK economy.
- Nicholas Gruen, John Houghton and Richard Tooth — Open for Business: How Open Data Can Help Achieve the G20 Growth Target — Assesses existing literature, presents in-depth case studies, and proposes key strategies for institutions to open data to spur economic development and growth.
- Felipe I Heusser — Understanding Open Government Data and Addressing Its Impact (draft version) – Early research on open data initiatives and their economic impact in developing countries.
- Alex Howard — San Francisco Looks to Tap into the Open Data Economy – This article examines San Francisco’s use of open data in municipal governance.
- Noor Huijboom and Tijs Van den Broek — Open Data: An International Comparison of Strategies — This paper examines five countries and their open data strategies, identifying key features, main barriers, and drivers of progress for open data programs.
- James Manyika, Michael Chui, Diana Farrell, Steve Van Kuiken, Peter Groves, and Elizabeth Almasi Doshi — Open Data: Unlocking Innovation and Performance with Liquid Information — Focuses on quantifying the potential value of open data in critical domains of the global economy.
- Alida Moore — Congressional Transparency Caucus: How Open Data Creates Jobs — Summary of the March 24th briefing of the Congressional Transparency Caucus on the need to increase government transparency through adopting open data initiatives for job creation.
- Andrew Stott — Open Data for Economic Growth — Examines five archetypes of businesses using open data, and provides recommendations for governments trying to maximize economic growth from open data.
Annotated Selected Reading List (in alphabetical order)
Bonina, Carla. New Business Models and the Values of Open Data: Definitions, Challenges, and Opportunities. NEMODE 3K – Small Grants Call 2013. http://bit.ly/1xGf9oe
- In this paper, Dr. Carla Bonina provides an introduction to open data and open data business models, evaluating their potential economic value and identifying future challenges for the effectiveness of open data, such as personal data and privacy, the emerging data divide, and the costs of collecting, producing and releasing open (government) data.
Carpenter, John and Phil Watts. Assessing the Value of OS OpenData™ to the Economy of Great Britain – Synopsis. June 2013. Accessed July 25, 2014. http://bit.ly/1rTLVUE
- John Carpenter and Phil Watts of Ordnance Survey undertook a study to examine the economic impact of open data to the economy of Great Britain. Using a variety of methods such as case studies, interviews, download analysis, adoption rates, impact calculation, and CGE modeling, the authors estimate that the OS OpenData initiative will deliver a net increase in GDP of £13 – 28.5 million for Great Britain in 2013.
Capgemini Consulting. The Open Data Economy: Unlocking Economic Value by Opening Government and Public Data. Capgemini Consulting. Accessed July 24, 2014. http://bit.ly/1n7MR02
- This report explores how governments are leveraging open data for economic benefits. Using a comparative approach, the authors study important open data from organizational, technological, social and political perspectives. The study highlights the potential of open data to drive profit through increasing the effectiveness of benchmarking and other data-driven business strategies.
Deloitte. Open Growth: Stimulating Demand for Open Data in the UK. Deloitte Analytics. December 2012. Accessed July 24, 2014. http://bit.ly/1oeFhks
- This early paper on open data by Deloitte uses case studies and statistical analysis on open government data to create models of businesses using open data. They also review the market supply and demand of open government data in emerging sectors of the economy.
Gruen, Nicholas, John Houghton and Richard Tooth. Open for Business: How Open Data Can Help Achieve the G20 Growth Target. Accessed July 24, 2014. http://bit.ly/UOmBRe
- This report highlights the potential economic value of the open data agenda in Australia and the G20. The report provides an initial literature review on the economic value of open data, as well as a set of case studies on the economic value of open data, and a set of recommendations for how open data can help the G20 and Australia achieve target objectives in the areas of trade, finance, fiscal and monetary policy, anti-corruption, employment, energy, and infrastructure.
Heusser, Felipe I. Understanding Open Government Data and Addressing Its Impact (draft version). World Wide Web Foundation. http://bit.ly/1o9Egym
- The World Wide Web Foundation, in collaboration with IDRC has begun a research network to explore the impacts of open data in developing countries. In addition to the Web Foundation and IDRC, the network includes the Berkman Center for Internet and Society at Harvard, the Open Development Technology Alliance and Practical Participation.
Howard, Alex. San Francisco Looks to Tap Into the Open Data Economy. O’Reilly Radar: Insight, Analysis, and Reach about Emerging Technologies. October 19, 2012. Accessed July 24, 2014. http://oreil.ly/1qNRt3h
- Alex Howard points to San Francisco as one of the first municipalities in the United States to embrace an open data platform. He outlines how open data has driven innovation in local governance. Moreover, he discusses the potential impact of open data on job creation and government technology infrastructure in the City and County of San Francisco.
Huijboom, Noor and Tijs Van den Broek. Open Data: An International Comparison of Strategies. European Journal of ePractice. March 2011. Accessed July 24, 2014. http://bit.ly/1AE24jq
- This article examines five countries and their open data strategies, identifying key features, main barriers, and drivers of progress for open data programs. The authors outline the key challenges facing European and other national open data policies, highlighting the emerging role open data initiatives are playing in political and administrative agendas around the world.
Manyika, James, Michael Chui, Diana Farrell, Steve Van Kuiken, Peter Groves, and Elizabeth Almasi Doshi. Open Data: Unlocking Innovation and Performance with Liquid Information. McKinsey Global Institute. October 2013. Accessed July 24, 2014. http://bit.ly/1lgDX0v
- This research focuses on quantifying the potential value of open data in seven “domains” in the global economy: education, transportation, consumer products, electricity, oil and gas, health care, and consumer finance.
Moore, Alida. Congressional Transparency Caucus: How Open Data Creates Jobs. Socrata. April 2, 2014. Accessed July 30, 2014. http://bit.ly/1n7OJpp
- Socrata provides a summary of the March 24th briefing of the Congressional Transparency Caucus on the need to increase government transparency through adopting open data initiatives. They include key takeaways from the panel discussion, as well as their role in making open data available for businesses.
Stott, Andrew. Open Data for Economic Growth. The World Bank. June 25, 2014. Accessed July 24, 2014. http://bit.ly/1n7PRJF
- In this report, the World Bank examines the evidence for the economic potential of open data, holding that the potential is quite large despite variation in the published estimates and methodological difficulties in assessing it. The report presents five archetypes of businesses using open data and provides recommendations for governments trying to maximize economic growth from open data.
Sharing Data Is a Form of Corporate Philanthropy
Matt Stempeck in HBR Blog: “Ever since the International Charter on Space and Major Disasters was signed in 1999, satellite companies like DMC International Imaging have had a clear protocol with which to provide valuable imagery to public actors in times of crisis. In a single week this February, DMCii tasked its fleet of satellites on flooding in the United Kingdom, fires in India, floods in Zimbabwe, and snow in South Korea. Official crisis response departments and relevant UN departments can request on-demand access to the visuals captured by these “eyes in the sky” to better assess damage and coordinate relief efforts.
Back on Earth, companies create, collect, and mine data in their day-to-day business. This data has quickly emerged as one of this century’s most vital assets. Public sector and social good organizations may not have access to the same amount, quality, or frequency of data. This imbalance has inspired a new category of corporate giving foreshadowed by the 1999 Space Charter: data philanthropy.
The satellite imagery example is an area of obvious societal value, but data philanthropy holds even stronger potential closer to home, where a wide range of private companies could give back in meaningful ways by contributing data to public actors. Consider two promising contexts for data philanthropy: responsive cities and academic research.
The centralized institutions of the 20th century allowed for the most sophisticated economic and urban planning to date. But in recent decades, the information revolution has helped the private sector speed ahead in data aggregation, analysis, and applications. It’s well known that there’s enormous value in real-time usage of data in the private sector, but there are similarly huge gains to be won in the application of real-time data to mitigate common challenges.
What if sharing economy companies shared their real-time housing, transit, and economic data with city governments or public interest groups? For example, Uber maintains a “God’s Eye view” of every driver on the road in a city.
Imagine combining this single data feed with an entire portfolio of real-time information. An early leader in this space is the City of Chicago’s urban data dashboard, WindyGrid. The dashboard aggregates an ever-growing variety of public datasets to allow for more intelligent urban management.
Over time, we could design responsive cities that react to this data. A responsive city is one where services, infrastructure, and even policies can flexibly respond to the rhythms of its denizens in real-time. Private sector data contributions could greatly accelerate these nascent efforts.
Data philanthropy could similarly benefit academia. Access to data remains an unfortunate barrier to entry for many researchers. The result is that only researchers with access to certain data, such as full-volume social media streams, can analyze and produce knowledge from this compelling information. Twitter, for example, sells access to a range of real-time APIs to marketing platforms, but the price point often exceeds researchers’ budgets. To accelerate the pursuit of knowledge, Twitter has piloted a program called Data Grants offering access to segments of their real-time global trove to select groups of researchers. With this program, academics and other researchers can apply to receive access to relevant bulk data downloads, such as a period of time before and after an election, or a certain geographic area.
Humanitarian response, urban planning, and academia are just three sectors within which private data can be donated to improve the public condition. There are many more possible applications, but few examples to date. For companies looking to expand their corporate social responsibility initiatives, sharing data should be part of the conversation…
Companies considering data philanthropy can take the following steps:
- Inventory the information your company produces, collects, and analyzes. Consider which data would be easy to share and which data will require long-term effort.
- Think about who could benefit from this information. Who in your community doesn’t have access to this information?
- Who could be harmed by the release of this data? If the datasets are about people, have they consented to its release? (i.e. don’t pull a Facebook emotional manipulation experiment).
- Begin conversations with relevant public agencies and nonprofit partners to get a sense of the sort of information they might find valuable and their capacity to work with the formats you might eventually make available.
- If you expect an onslaught of interest, an application process can help qualify partnership opportunities to maximize positive impact relative to time invested in the program.
- Consider how you’ll handle distribution of the data to partners. Even if you don’t have the resources to set up an API, regular releases of bulk data could still provide enormous value to organizations used to relying on less-frequently updated government indices.
- Consider your needs regarding privacy and anonymization. Strip the data of anything remotely resembling personally identifiable information.
- If you’re making data available to researchers, plan to allow researchers to publish their results without obstruction. You might also require them to share the findings with the world under Open Access terms….”
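The privacy step in the checklist above can be illustrated with a minimal sketch. It assumes records arrive as Python dictionaries; the field names (`name`, `email`, `phone`, `user_id`) and the `anonymize` helper are hypothetical and not taken from the article. The sketch drops direct identifiers and replaces the record ID with a keyed hash, so partners can still link rows from the same person without seeing the original ID.

```python
import hashlib
import hmac

# Hypothetical field names; adjust to the dataset being released.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def anonymize(record, secret_key, id_field="user_id"):
    """Return a copy of `record` without direct identifiers.

    The ID field, if present, is replaced by a truncated HMAC-SHA256
    digest so the same person maps to the same pseudonym across rows.
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if id_field in cleaned:
        digest = hmac.new(
            secret_key,
            str(cleaned[id_field]).encode("utf-8"),
            digestmod=hashlib.sha256,
        )
        cleaned[id_field] = digest.hexdigest()[:16]
    return cleaned


# Example record (made-up values for illustration only).
record = {
    "user_id": 42,
    "name": "Ada Example",
    "email": "ada@example.org",
    "city": "Chicago",
    "trips": 7,
}
released = anonymize(record, secret_key=b"rotate-me")
```

Dropping named fields is only a first pass. Remaining attributes such as location or timestamps can still re-identify people when combined, so a real release would need a broader privacy review than this sketch covers.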
'Big Data' Will Change How You Play, See the Doctor, Even Eat
We’re entering an age of personal big data, and its impact on our lives will surpass that of the Internet. Data will answer questions we could never before answer with certainty—everyday questions like whether that dress actually makes you look fat, or profound questions about precisely how long you will live.
Every 20 years or so, a powerful technology moves from the realm of backroom expertise and into the hands of the masses. In the late 1970s, computing made that transition, from mainframes in glass-enclosed rooms to personal computers on desks. In the late 1990s, the first web browsers made networks, which had been for science labs and the military, accessible to any of us, giving birth to the modern Internet.
Each transition touched off an explosion of innovation and reshaped work and leisure. In 1975, 50,000 PCs were in use worldwide. Twenty years later: 225 million. The number of Internet users in 1995 hit 16 million. Today it’s more than 3 billion. In much of the world, it’s hard to imagine life without constant access to both computing and networks.
The 2010s will be the coming-out party for data. Gathering, accessing and gleaning insights from vast and deep data has been a capability locked inside enterprises long enough. Cloud computing and mobile devices now make it possible to stand in a bathroom line at a baseball game while tapping into massive computing power and databases. On the other end, connected devices such as the Nest thermostat or Fitbit health monitor and apps on smartphones increasingly collect new kinds of information about everyday personal actions and habits, turning it into data about ourselves.
More than 80 percent of data today is unstructured: tangles of YouTube videos, news stories, academic papers, social network comments. Unstructured data has been almost impossible to search for, analyze and mix with other data. A new generation of computers—cognitive computing systems that learn from data—will read tweets or e-books or watch video, and comprehend its content. Somewhat like brains, these systems can link diverse bits of data to come up with real answers, not just search results.
Such systems can work in natural language. The progenitor is the IBM Watson computer that won on Jeopardy in 2011. Next-generation Watsons will work like a super-powered Google. (Google today is a data-searching wimp compared with what’s coming.)
Sports offers a glimpse into the data age. Last season the NBA installed in every arena technology that can “watch” a game and record, in 48 minutes of action, more than 4 million data points about every movement and shot. That alone could yield new insights for NBA coaches, such as which group of five players most efficiently passes the ball around….
Think again about life before personal computing and the Internet. Even if someone told you that you’d eventually carry a computer in your pocket that was always connected to global networks, you would’ve had a hard time imagining what that meant—imagining WhatsApp, Siri, Pandora, Uber, Evernote, Tinder.
As data about everything becomes ubiquitous and democratized, layered on top of computing and networks, it will touch off the most spectacular technology explosion yet. We can see the early stages now. “Big data” doesn’t even begin to describe the enormity of what’s coming next.”