The Brave New World of Good


Brad Smith: “Welcome to the Brave New World of Good. Once almost the exclusive province of nonprofit organizations and the philanthropic foundations that fund them, today the terrain of good is disputed by social entrepreneurs, social enterprises, impact investors, big business, governments, and geeks. Their tools of choice are markets, open data, innovation, hackathons, and disruption. They cross borders, social classes, and paradigms with the swipe of a touch screen. We seem poised to unleash a whole new era of social and environmental progress, accompanied by unimagined economic prosperity.
As a brand, good is unassailably brilliant. Who could be against it? It is virtually impossible to write an even mildly skeptical blog post about good without sounding, well, bad, or at least a bit old-fashioned. For the record, I firmly believe there is much in the brave new world of good that is helping us find our way out of the tired and often failed models of progress and change on which we have for too long relied. Still, there are assumptions worth questioning and questions worth answering to ensure that the good we seek is the good that can be achieved.

Open Data
Second only to “good” in terms of marketing genius is the concept of “open data.” An offspring of previous movements such as “open source,” “open content,” and “open access,” open data in the Internet age has come to mean data that is machine-readable, free to access, and free to use, re-use, and re-distribute, subject to attribution. Fully open data goes way beyond posting your .pdf document on a Web site (as neatly explained by Tim Berners-Lee’s five-star framework).
When it comes to government, there is a rapidly accelerating movement around the world that is furthering transparency by making vast stores of data open. Ditto on the data of international aid funders like the United States Agency for International Development, the World Bank, and the Organisation for Economic Co-operation and Development. The push has now expanded to the tax return data of nonprofits and foundations (IRS Forms 990). Collection of data by government has a business model; it’s called tax dollars. However, open data is not born pure. Cleaning that data, making it searchable, and building and maintaining reliable user interfaces is complex, time-consuming, and often expensive. That requires a consistent stream of income of the kind that can only come from fees, subscriptions, or, increasingly less so, government.
Foundation grants are great for short-term investment, experimentation, or building an app or two, but they are no substitute for a scalable business model. Structured, longitudinal data are vital to social, environmental, and economic progress. In a global economy where government is retreating from the funding of public goods, figuring out how to pay for that data is one of our greatest challenges.”

Towards an effective framework for building smart cities: Lessons from Seoul and San Francisco


New paper by JH Lee, MG Hancock, MC Hu in Technological Forecasting and Social Change: “This study aims to shed light on the process of building an effective smart city by integrating various practical perspectives with a consideration of smart city characteristics taken from the literature. We developed a framework for conducting case studies examining how smart cities were being implemented in San Francisco and Seoul Metropolitan City. The study’s empirical results suggest that effective, sustainable smart cities emerge as a result of dynamic processes in which public and private sector actors coordinate their activities and resources on an open innovation platform. The different yet complementary linkages formed by these actors must further be aligned with respect to their developmental stage and embedded cultural and social capabilities. Our findings point to eight ‘stylized facts’, based on both quantitative and qualitative empirical results that underlie the facilitation of an effective smart city. In elaborating these facts, the paper offers useful insights to managers seeking to improve the delivery of smart city developmental projects.”
 

Global Open Data Initiative moving forward


“The Global Open Data Initiative will serve as a guiding voice internationally on open data issues. Civil society groups who focus on open data have often been isolated to single national contexts, despite the similar challenges and opportunities repeating themselves in countries across the globe. The Global Open Data Initiative aims to help share valuable resources, guidance and judgment, and to clarify the potential for government open data across the world.
Provide a leading vision for how governments approach open data. Open data commitments are among the most popular commitments for countries participating in the Open Government Partnership. The Global Open Data Initiative recommendations and resources will help guide open data initiatives and others as they seek to design and implement strong, effective open data initiatives and policies. Global Open Data Initiative resources will also help civil society actors who will be evaluating government initiatives.
Increase awareness of open data. The Global Open Data Initiative will work to advance the understanding of open data issues, challenges, and resources by promoting best practices, engaging in online and offline dialogue, and supporting networking between organizations both new and familiar to the open data arena.
Support the development of the global open data community, especially in civil society. Civil society organizations (CSOs) have a key role to play as suppliers, intermediaries, and users of open data, though at present, relatively few organizations are engaging with open data and the opportunities it presents. Most CSOs lack the awareness, skills and support needed to be active users and providers of open data in ways that can help them meet their goals. The Global Open Data Initiative aims to help CSOs engage with and use open data in whatever area they work on, be it climate change, democratic rights, land governance or financial reform.
Our immediate focus is on two activities:

  1. Consult with members of the CSO community around the world about what they think is important in this area.
  2. Develop a set of principles in collaboration with the CSO community to guide open government data policies and approaches and to help initiate, strengthen and further elevate conversations between governments and civil society.”

From Collective Intelligence to Collective Intelligence Systems


New paper by A. Kornrumpf and U. Baumol in the International Journal of Cooperative Information Systems: “Collective intelligence (CI) has become a popular research topic over the past few years. However, the CI debate suffers from several problems such as that there is no unanimously agreed-upon definition of CI that clearly differentiates between CI and related terms such as swarm intelligence (SI) and collective intelligence systems (CIS). Furthermore, a model of such CIS is lacking for purposes of research and the design of new CIS. This paper aims at untangling the definitions of CI and other related terms, especially CIS, and at providing a semi-structured model of CIS as a first step towards more structured research. The authors of this paper argue that CI can be defined as the ability of sufficiently large groups of individuals to create an emergent solution for a specific class of problems or tasks. The authors show that other alleged properties of CI which are not covered by this definition, are, in fact, properties of CIS and can be understood by regarding CIS as complex socio-technical systems (STS) that enable the realization of CI. The model defined in this article serves as a means to structure open questions in CIS research and helps to understand which research methodology is adequate for different aspects of CIS.”

Towards an information systems perspective and research agenda on crowdsourcing for innovation


New paper by A Majchrzak and A Malhotra in The Journal of Strategic Information Systems: “Recent years have seen an increasing emphasis on open innovation by firms to keep pace with the growing intricacy of products and services and the ever changing needs of the markets. Much has been written about open innovation and its manifestation in the form of crowdsourcing. Unfortunately, most management research has taken the information system (IS) as a given. In this essay we contend that IS is not just an enabler but rather can be a shaper that optimizes open innovation in general and crowdsourcing in particular. This essay is intended to frame crowdsourcing for innovation in a manner that makes more apparent the issues that require research from an IS perspective. In doing so, we delineate the contributions that the IS field can make to the field of crowdsourcing.

  • Reviews participation architectures supporting current crowdsourcing, finding them inadequate for innovation development by the crowd.

  • Identifies 3 tensions for explaining why a participation architecture for crowdsourced innovation is difficult.

  • Identifies affordances for the participation architectures that may help to manage the tension.

  • Uses the tensions and possible affordances to identify research questions for IS scholars.”

The Value of Personal Data


The Digital Enlightenment Yearbook 2013 is dedicated this year to Personal Data: “The value of personal data has traditionally been understood in ethical terms as a safeguard for personality rights such as human dignity and privacy. However, we have entered an era where personal data are mined, traded and monetized in the process of creating added value – often in terms of free services including efficient search, support for social networking and personalized communications. This volume investigates whether the economic value of personal data can be realized without compromising privacy, fairness and contextual integrity. It brings scholars and scientists from the disciplines of computer science, law and social science together with policymakers, engineers and entrepreneurs with practical experience of implementing personal data management.
The resulting collection will be of interest to anyone concerned about privacy in our digital age, especially those working in the field of personal information management, whether academics, policymakers, or those working in the private sector.”

A Global Online Network Lets Health Professionals Share Expertise


Rebecca Weintraub, Aaron C. Beals, Sophie G. Beauvais, Marie Connelly, Julie Rosenberg Talbot, Aaron VanDerlip, and Keri Wachter in HBR Blog Network: “In response, our team at the Global Health Delivery Project at Harvard launched an online platform to generate and disseminate knowledge in health care delivery. With guidance from Paul English, chief technology officer of Kayak, we borrowed a common tool from business, professional virtual communities (PVCs), and adapted it to leverage the wisdom of the crowds. In business, PVCs are used for knowledge management and exchange across multiple organizations, industries, and geographies. In health care, we thought, they could be a rapid, practical means for diverse professionals to share insights and tactics. As GHDonline’s rapid growth and success have demonstrated, they can indeed be a valuable tool for improving the efficiency, quality, and ultimate value of health care delivery….
Creating a professional virtual network that would be high quality, participatory, and trusted required some trial and error both in terms of the content and technology. What features would make the site inviting, accessible, and useful? How could members establish trust? What would it take to involve professionals from differing time zones in different languages?
The team launched GHDonline in June 2008 with public communities in tuberculosis-infection control, drug-resistant tuberculosis, adherence and retention, and health information technology. Bowing to the reality of the sporadic electricity service and limited internet bandwidth available in many countries, we built a lightweight platform, meaning that the site minimized the use of images and only had features deemed essential….
Even with early successes in terms of membership growth and daily postings to communities, user feedback and analytics directed the team to simplify the user navigation and experience. Longer, more nuanced, in-depth conversations in the communities were turned into “discussion briefs”: two-page, moderator-reviewed summaries of the conversations. The GHDonline team integrated Google Translate to accommodate the growing number of non-native English speakers. New public communities were launched for nursing, surgery, and HIV and malaria treatment and prevention. You can view all of the features of GHDonline here (PDF).”

Using Big Data to Ask Big Questions


Chase Davis in the SOURCE: “First, let’s dispense with the buzzwords. Big Data isn’t what you think it is: Every federal campaign contribution over the last 30-plus years amounts to several tens of millions of records. That’s not Big. Neither is a dataset of 50 million Medicare records. Or even 260 gigabytes of files related to offshore tax havens—at least not when Google counts its data in exabytes. No, the stuff we analyze in pursuit of journalism and app-building is downright tiny by comparison.
But you know what? That’s ok. Because while super-smart Silicon Valley PhDs are busy helping Facebook crunch through petabytes of user data, they’re also throwing off intellectual exhaust that we can benefit from in the journalism and civic data communities. Most notably: the ability to ask Big Questions.
Most of us who analyze public data for fun and profit are familiar with small questions. They’re focused, incisive, and often have the kind of black-and-white, definitive answers that end up in news stories: How much money did Barack Obama raise in 2012? Is the murder rate in my town going up or down?
Big Questions, on the other hand, are speculative, exploratory, and systemic. As the name implies, they are also answered at scale: Rather than distilling a small slice of a dataset into a concrete answer, Big Questions look at entire datasets and reveal small questions you wouldn’t have thought to ask.
Can we track individual campaign donor behavior over decades, and what does that tell us about their influence in politics? Which neighborhoods in my city are experiencing spikes in crime this week, and are police changing patrols accordingly?
Or, by way of example, how often do interest groups propose cookie-cutter bills in state legislatures?

Looking at Legislation

Even if you don’t follow politics, you probably won’t be shocked to learn that lawmakers don’t always write their own bills. In fact, interest groups sometimes write them word-for-word.
Sometimes those groups even try to push their bills in multiple states. The conservative American Legislative Exchange Council has gotten some press, but liberal groups, social and business interests, and even sororities and fraternities have done it too.
On its face, something about elected officials signing their names to cookie-cutter bills runs head-first against people’s ideal of deliberative democracy; hence, it tends to make news. Those can be great stories, but they’re often limited in scope to a particular bill, politician, or interest group. They’re based on small questions.
Data science lets us expand our scope. Rather than focusing on one bill, or one interest group, or one state, why not ask: How many model bills were introduced in all 50 states, period, by anyone, during the last legislative session? No matter what they’re about. No matter who introduced them. No matter where they were introduced.
Now that’s a Big Question. And with some basic data science, it’s not particularly hard to answer—at least at a superficial level.

Analyze All the Things!

Just for kicks, I tried building a system to answer this question earlier this year. It was intended as an example, so I tried to choose methods that would make intuitive sense. But it also makes liberal use of techniques often applied to Big Data analysis: k-means clustering, matrices, graphs, and the like.
If you want to follow along, the code is here….
To make exploration a little easier, my code represents similar bills in graph space, shown at the top of this article. Each dot (known as a node) represents a bill. And a line connecting two bills (known as an edge) means they were sufficiently similar, according to my criteria (a cosine similarity of 0.75 or above). Loaded into visualization software like Gephi, the graph makes it easy to click around the clusters and see what pops out. So what do we find?
There are 375 clusters in total. Because of the limitations of our data, many of them represent vague, subject-specific bills that just happen to have similar titles even though the legislation itself is probably very different (think things like “Budget Bill” and “Campaign Finance Reform”). This is where having full bill text would come in handy.
But mixed in with those bills are a handful of interesting nuggets. Several bills that appear to be modeled after legislation by the National Conference of Insurance Legislators appear in multiple states, among them: a bill related to limited lines travel insurance; another related to unclaimed insurance benefits; and one related to certificates of insurance.”
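The similarity graph Davis describes can be sketched in a few lines. The following is a minimal illustration, not his code: it assumes plain word-count vectors over bill titles, a small hand-picked stopword list, and connected components of the graph as the clusters (the k-means step mentioned in the article is not reproduced here). The sample titles and all function names are hypothetical; only the 0.75 cosine-similarity cutoff comes from the article.

```python
import math
import re
from collections import Counter, defaultdict

SIMILARITY_THRESHOLD = 0.75  # cutoff quoted in the article

# Hypothetical stopword list: boilerplate words common to many bill titles.
STOPWORDS = {"a", "an", "act", "the", "to", "of", "relating", "bill"}


def title_vector(title):
    """Turn a bill title into a sparse word-count vector."""
    words = re.findall(r"[a-z]+", title.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_edges(vectors, threshold=SIMILARITY_THRESHOLD):
    """Return (i, j) index pairs whose similarity meets the threshold."""
    edges = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                edges.append((i, j))
    return edges


def connected_clusters(node_count, edges):
    """Group linked nodes into clusters; unlinked bills are left out."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = set()
    clusters = []
    for start in range(node_count):
        if start in seen or start not in neighbors:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


# Hypothetical titles, loosely modeled on the insurance bills mentioned above.
titles = [
    "An act relating to limited lines travel insurance",
    "Limited lines travel insurance producers",
    "An act relating to unclaimed life insurance benefits",
    "Unclaimed life insurance benefits act",
    "Certificates of insurance",
    "Budget bill",
]
vectors = [title_vector(t) for t in titles]
edges = similarity_edges(vectors)
for cluster in connected_clusters(len(titles), edges):
    print([titles[i] for i in cluster])
```

Comparing every pair of titles is quadratic in the number of bills, which is workable for short titles but would likely need a smarter index for full bill text. The edge list could also be exported for a tool such as Gephi rather than clustered in code.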

The Shutdown’s Data Blackout


Opinion piece by Katherine G. Abraham and John Haltiwanger in The New York Times: “Today, for the first time since 1996 and only the second time in modern memory, the Bureau of Labor Statistics will not issue its monthly jobs report, as a result of the shutdown of nonessential government services. This raises an important question: Are the B.L.S. report and other economic data that the government provides “nonessential”?

If we’re trying to understand how much damage the shutdown or sequestration cuts are doing to jobs or the fragile economic recovery, they are definitely essential. Without robust economic data from the federal government, we can speculate, but we won’t really know.

In the last two shutdowns, in 1995 and 1996, the Congressional Budget Office estimated the economic damage at around 0.5 percent of the gross domestic product. This time, Moody’s estimates that a three-to-four-week shutdown might subtract 1.4 percent (annualized) from gross domestic product growth this quarter and take $55 billion out of the economy. Democrats tend to play up such projections; Republicans tend to play them down. If the shutdown continues, though, we’ll all be less able to tell what impact it is having, because more reports like the B.L.S. jobs report will be delayed, while others may never be issued.

In fact, sequestration cuts that affected 2013 budgets are already leading federal statistics agencies to defer or discontinue dozens of reports on everything from income to overseas labor costs. The economic data these agencies produce are key to tracking G.D.P., earnings and jobs, and to informing the Federal Reserve, the executive branch and Congress on the state of the economy and the impact of economic policies. The data are also critical for decisions made by state and local policy makers, businesses and households.

The combined budget for all the federal statistics agencies totals less than 0.1 percent of the federal budget. Yet the same across-the-board-cut mentality that led to sequester and shutdown has shortsightedly cut statistics agencies, too, as if there were something “nonessential” about spending money on accurately assessing the economic effects of government actions and inactions. As a result, as we move through the shutdown, the debt-ceiling fight and beyond, reliable, essential data on the impact of policy decisions will be harder to come by.

Unless the sequester cuts are reversed, funding for economic data will shrink further in 2014, on top of a string of lean budget years. More data reports will be eliminated at the B.L.S., the Census Bureau, the Bureau of Economic Analysis and other agencies. Even more insidious damage will come from compromising the methods for producing the reports that still are paid for and from failing to prepare for the future.

To save money, survey sample sizes will be cut, reducing the reliability of national data and undermining local statistics. Fewer resources will be devoted to maintaining the listings used to draw business survey samples, running the risk that surveys based on those listings won’t do as good a job of capturing actual economic conditions. Hiring and training will be curtailed. Over time, the availability and quality of economic indicators will diminish.

That would be especially paradoxical and backward at a time when economic statistics can and should be advancing through technological innovation instead of being marched backward by politics. Integrating survey data, administrative data and commercial data collected with scanners and other digital technologies could produce richer, more useful information with less of a burden on businesses and households.

Now more than ever, framing sound economic policy depends on timely and accurate information about the economy. Bad or ill-targeted data can lead to bad or ill-targeted decisions about taxes and spending. The tighter the budget and the more contentious the political debate around it, the more compelling the argument for investing in federal data that accurately show how government policies are affecting the economy, so we can target the most effective cuts or spending or other policies, and make ourselves accountable for their results. That’s why Congress should restore funding to the federal statistical agencies at a level that allows them to carry out their critical work.”
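The op-ed's point that smaller survey samples reduce reliability follows from basic sampling arithmetic. As a rough sketch, assuming simple random sampling and a 95 percent confidence level (real federal surveys use more complex designs), the margin of error for an estimated proportion shrinks only with the square root of the sample size. The sample sizes below are hypothetical and not taken from any agency.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (simple random sample)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Hypothetical sample sizes: each row halves the previous one.
for n in (60000, 30000, 15000):
    print(f"n={n:>6}: +/- {100 * margin_of_error(n):.2f} percentage points")
```

Under these assumptions, halving the sample multiplies the margin of error by the square root of 2, roughly 1.41, so a budget cut that halves a sample widens the uncertainty around every estimate by about 41 percent.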

Commons at the Intersection of Peer Production, Citizen Science, and Big Data: Galaxy Zoo


New paper by Michael J. Madison: “The knowledge commons research framework is applied to a case of commons governance grounded in research in modern astronomy. The case, Galaxy Zoo, is a leading example of at least three different contemporary phenomena. In the first place Galaxy Zoo is a global citizen science project, in which volunteer non-scientists have been recruited to participate in large-scale data analysis via the Internet. In the second place Galaxy Zoo is a highly successful example of peer production, sometimes known colloquially as crowdsourcing, by which data are gathered, supplied, and/or analyzed by very large numbers of anonymous and pseudonymous contributors to an enterprise that is centrally coordinated or managed. In the third place Galaxy Zoo is a highly visible example of data-intensive science, sometimes referred to as e-science or Big Data science, by which scientific researchers develop methods to grapple with the massive volumes of digital data now available to them via modern sensing and imaging technologies. This chapter synthesizes these three perspectives on Galaxy Zoo via the knowledge commons framework.”