Discovering the Language of Data: Personal Pattern Languages and the Social Construction of Meaning from Big Data


Paper in Interdisciplinary Science Reviews: “This paper attempts to address two issues relevant to the sense-making of Big Data. First, it presents a case study for how a large dataset can be transformed into both a visual language and, in effect, a ‘text’ that can be read and interpreted by human beings. The case study comes from direct observation of graduate students at the IIT Institute of Design who investigated task-switching behaviours, as documented by productivity software on a single user’s laptop and a smart phone. Through a series of experiments with the resulting dataset, the team effects a transformation of that data into a catalogue of visual primitives — a kind of iconic alphabet — that allow others to ‘read’ the data as a corpus and, more provocatively, suggest the formation of a personal pattern language. Second, this paper offers a model for human-technical collaboration in the sense-making of data, as demonstrated by this and other teams in the class. Current sense-making models tend to be data- and technology-centric, and increasingly presume data visualization as a primary point of entry of humans into Big Data systems. This alternative model proposes that meaningful interpretation of data emerges from a more elaborate interplay between algorithms, data and human beings….(More)”

 

Big Data for Social Good


Introduction to a Special Issue of the Journal “Big Data” by Charlie Catlett and Rayid Ghani: “…organizations focused on social good are realizing the potential as well but face several challenges as they seek to become more data-driven. The biggest challenge they face is a paucity of examples and case studies on how data can be used for social good. This special issue of Big Data is targeted at tackling that challenge and focuses on highlighting some exciting and impactful examples of work that uses data for social good. The special issue is just one example of the recent surge in such efforts by the data science community. …

This special issue solicited case studies and problem statements that would either highlight (1) the use of data to solve a social problem or (2) social challenges that need data-driven solutions. From roughly 20 submissions, we selected 5 articles that exemplify this type of work. These cover five broad application areas: international development, healthcare, democracy and government, human rights, and crime prevention.

“Understanding Democracy and Development Traps Using a Data-Driven Approach” (Ranganathan et al.) details a data-driven analysis of the relationships among democracy, cultural values, and socioeconomic indicators, identifying two types of “traps” that hinder the development of democracy. The authors use historical data to detect causal factors and to predict how long a given country can be expected to take to overcome these traps.

“Targeting Villages for Rural Development Using Satellite Image Analysis” (Varshney et al.) discusses two case studies that use data and machine learning techniques for international economic development—solar-powered microgrids in rural India and targeting financial aid to villages in sub-Saharan Africa. In the process, the authors stress the importance of understanding the characteristics and provenance of the data and the criticality of incorporating local “on the ground” expertise.

In “Human Rights Event Detection from Heterogeneous Social Media Graphs,” Chen and Neil describe efficient and scalable techniques to use social media in order to detect emerging patterns in human rights events. They test their approach on recent events in Mexico and show that they can accurately detect relevant human rights–related tweets prior to international news sources, and in some cases, prior to local news reports, which could potentially lead to more timely, targeted, and effective advocacy by relevant human rights groups.

“Finding Patterns with a Rotten Core: Data Mining for Crime Series with Core Sets” (Wang et al.) describes a case study with the Cambridge Police Department, using a subspace clustering method to analyze the department’s full housebreak database, which contains detailed information on thousands of crimes spanning more than a decade. They find that the method allows human crime analysts to handle vast amounts of data and provides new insights into true patterns of crime committed in Cambridge….(More)”
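
As a rough illustration of the kind of analysis Wang et al. describe, the sketch below groups burglary records that agree on several modus operandi attributes. It is only a toy: the attribute names, sample records, and the threshold of three shared attributes are invented for illustration, and the authors’ actual method is a subspace-clustering approach built around core sets rather than this simple pairwise comparison.

```python
from itertools import combinations

# Toy housebreak records; the attributes and values are invented for
# illustration, not drawn from the Cambridge Police Department database.
crimes = [
    {"id": 1, "entry": "back door", "tool": "pry bar", "time": "day",   "premise": "apartment"},
    {"id": 2, "entry": "back door", "tool": "pry bar", "time": "day",   "premise": "house"},
    {"id": 3, "entry": "window",    "tool": "none",    "time": "night", "premise": "house"},
    {"id": 4, "entry": "back door", "tool": "pry bar", "time": "day",   "premise": "apartment"},
]

def shared_attributes(a, b):
    """Attribute names (other than the id) on which two crime records agree."""
    return {k for k in a if k != "id" and a[k] == b.get(k)}

# Flag pairs of crimes that agree on at least three attributes as candidates
# for belonging to the same crime series.
for a, b in combinations(crimes, 2):
    common = shared_attributes(a, b)
    if len(common) >= 3:
        print(f"crimes {a['id']} and {b['id']} share {sorted(common)}")
```

A real crime-series method also has to cope with missing fields, near-matches, and thousands of records, which is what motivates the more sophisticated clustering the paper describes.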

The Data Disclosure Decision


“The CIO Council Innovation Committee has released its first Open Data case study, The Data Disclosure Decision, showcasing the Department of Education (Education) Disclosure Review Board.
The Department of Education is a national warehouse for open data across a decentralized educational system, managing and exchanging education-related data from across the country. Education collects large amounts of aggregate data at the state, district, and school level, disaggregated by a number of demographic variables. A majority of the data Education collects is considered personally identifiable information (PII), making data disclosure avoidance plans a mandatory component of Education’s data releases. With its expansive data sets and the need to protect sensitive information, Education quickly recognized the need to organize and standardize its data disclosure protocol.
Education formally established the Disclosure Review Board when Secretary of Education Arne Duncan signed its charter in August 2013. Since its inception, the Disclosure Review Board has achieved substantial successes and has greatly increased the volume and quality of data being released. Education’s Disclosure Review Board is continually learning through its open data journey and improving its approach through cultural change and leadership buy-in.
Learn more about Education’s Disclosure Review Board’s story by reading The Data Disclosure Decision, where you will find the full account of its experience and what it learned along the way.”

Citizens Connect


Harvard Business School Case Study by Mitchell Weiss: “Funding to scale Citizens Connect, Boston’s 311 app, is both a blessing and a burden and tests two public entrepreneurs. In 2012, the Commonwealth of Massachusetts provides Boston’s Mayor’s Office of New Urban Mechanics with a grant to scale Citizens Connect across the state. The money gives two co-creators of Citizens Connect, Chris Osgood and Nigel Jacob, a chance to grow their vision for citizen-engaged governance and civic innovation, but it also requires that the two City of Boston leaders sit on a formal selection committee that pits their original partner, Connected Bits, against another player that might meet the specific requirements for delivering a statewide version. The selection and scaling process raises questions beyond just which partner to choose. What would happen to the Citizens Connect brand as Osgood and Jacob’s product spreads across the state? Who could best help them scale their work nationally after that? Which business models were best positioned to drive that growth? What intellectual property arrangements would best enable it? And what role should the two city employees have, anyway, in scaling Citizens Connect outside of Boston in the first place? These questions hung in the air as they pondered the one big one about passing over Connected Bits for another partner: should they?…(More)”

Scenario Planning Case Studies Using Open Government Data


New Paper by Robert Power, Bella Robinson, Lachlan Rudd, and Andrew Reeson: “The opportunity for improved decision making has been enhanced in recent years through the public availability of a wide variety of information. In Australia, government data is routinely made available and maintained in the http://data.gov.au repository. This is a single point of reference for data that can be reused for purposes beyond that originally considered by the data custodians. Similarly, a wealth of citizen information is available from the Australian Bureau of Statistics. Combining this data allows informed decisions to be made through planning scenarios.

We present two case studies that demonstrate the utility of data integration and web mapping. As a simple proof of concept, the user can explore different scenarios in each case study by indicating the relative weightings to be used for the decision-making process. Both case studies are demonstrated as a publicly available interactive map-based website….(More)”
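
To make the idea of user-supplied “relative weightings” concrete, here is a minimal sketch of the weighted multi-criteria scoring a scenario tool of this kind might perform. The region names, indicator values, and weights are placeholders invented for illustration; they are not taken from the paper, data.gov.au, or the Australian Bureau of Statistics.

```python
# Minimal sketch of weighted multi-criteria scoring for scenario planning.
# All names and numbers below are illustrative assumptions.
regions = {
    "Region A": {"population_growth": 0.8, "broadband_access": 0.6, "transport_links": 0.4},
    "Region B": {"population_growth": 0.3, "broadband_access": 0.9, "transport_links": 0.7},
    "Region C": {"population_growth": 0.5, "broadband_access": 0.5, "transport_links": 0.9},
}

def score(indicators, weights):
    """Weighted average of indicators already normalised to the range [0, 1]."""
    return sum(indicators[k] * w for k, w in weights.items()) / sum(weights.values())

# One possible scenario: the user weights broadband access more heavily
# than the other criteria.
weights = {"population_growth": 1.0, "broadband_access": 3.0, "transport_links": 1.0}

for name, indicators in sorted(regions.items(), key=lambda kv: score(kv[1], weights), reverse=True):
    print(f"{name}: {score(indicators, weights):.2f}")
```

Changing the weights re-ranks the regions, which is, in simplified form, the kind of exploration the interactive maps support, with results displayed spatially rather than as a ranked list.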

The story of the sixth myth of open data and open government


Paper by Ann-Sofie Hellberg and Karin Hedström: “The aim of this paper is to describe a local government effort to realise an open government agenda. This is done using a storytelling approach….The empirical data is based on a case study. We participated in, as well as followed, the process of realising an open government agenda on a local level, where citizens were invited to use open public data as the basis for developing apps and external web solutions. Based on an interpretative tradition, we chose storytelling as a way to scrutinize the competition process. In this paper, we present a story about the competition process using the story elements put forward by Kendall and Kendall (2012).

….Our research builds on existing work by proposing a sixth myth: that the “public” wants to make use of open data. We provide empirical insights into the challenge of gaining benefits from open public data. In particular, we illustrate the difficulties in getting citizens interested in using open public data. Our case shows that people seem to like the idea of open public data, but do not necessarily participate actively in the data re-use process…..This study illustrates the difficulties of promoting the re-use of open public data. Public organisations that want to pursue an open government agenda can use our findings as empirical insights… (More)”

 

Motivations for sustained participation in crowdsourcing: The role of talk in a citizen science case study


Paper by C.B. Jackson, C. Østerlund, G. Mugar, and K.D.V. Hassman in the Proceedings of the Forty-eighth Hawai’i International Conference on System Sciences (HICSS-48): “The paper explores the motivations of volunteers in a large crowdsourcing project and contributes to our understanding of the motivational factors that lead to deeper engagement beyond initial participation. Drawing on the theory of legitimate peripheral participation (LPP) and the literature on motivation in crowdsourcing, we analyze interview and trace data from a large citizen science project. The analyses identify ways in which the technical features of the projects may serve as motivational factors leading participants towards sustained participation. The results suggest volunteers first engage in activities to support knowledge acquisition, later share knowledge with other volunteers, and finally increase participation in Talk through a punctuated process of role discovery…(More)”

Ebola’s Information Paradox


Steven Johnson at The New York Times: “…The story of the Broad Street outbreak is perhaps the most famous case study in public health and epidemiology, in large part because it led to the revolutionary insight that cholera was a waterborne disease, not airborne as most believed at the time. But there is another element of the Broad Street outbreak that warrants attention today, as popular anxiety about Ebola surges across the airwaves and subways and living rooms of the United States: not the spread of the disease itself, but the spread of information about the disease.

It was a full seven days after Baby Lewis became ill, and four days after the Soho residents began dying in mass numbers, before the outbreak warranted the slightest mention in the London papers, a few short lines indicating that seven people had died in the neighborhood. (The report understated the growing death toll by an order of magnitude.) It took two entire weeks before the press began treating the outbreak as a major news event for the city.

Within Soho, the information channels were equally unreliable. Rumors spread throughout the neighborhood that the entire city had succumbed at the same casualty rate, and that London was facing a catastrophe on the scale of the Great Fire of 1666. But this proved to be nothing more than rumor. Because the Soho crisis had originated with a single-point source — the poisoned well — its range was limited compared with its intensity. If you lived near the Broad Street well, you were in grave danger. If you didn’t, you were likely to be unaffected.

Compare this pattern of information flow to the way news spreads now. On Thursday, Craig Spencer, a New York doctor, was given a diagnosis of Ebola after presenting a high fever, and the entire world learned of the test result within hours of the patient himself learning it. News spread with similar velocity several weeks ago with the Dallas Ebola victim, Thomas Duncan. In a sense, it took news of the cholera outbreak a week to travel the 20 blocks from Soho to Fleet Street in 1854; today, the news travels at nearly the speed of light, as data traverses fiber-optic cables. Thanks to that technology, the news channels have been on permanent Ebola watch for weeks now, despite the fact that, as the joke went on Twitter, more Americans have been married to Kim Kardashian than have died in the United States from Ebola.

As societies and technologies evolve, the velocities with which disease and information can spread vary. The tremendous population density of London in the 19th century enabled the cholera bacterium to spread through a neighborhood with terrifying speed, while the information about that terror moved more slowly. This was good news for the mental well-being of England’s wider population, which was spared the anxiety of following the death count as if it were a stock ticker. But it was terrible from a public health standpoint; the epidemic had largely faded before the official institutions of public health even realized the magnitude of the outbreak….

Information travels faster than viruses do now. This is why we are afraid. But this is also why we are safe.”

The Power of Data Analytics to Transform Government


Hugo Moreno at Forbes: “It’s mind-boggling to consider the amount of information governments collect on their citizens. We often just expect them to manage and analyze it in a way that will benefit the general public and facilitate government transparency. However, it can be difficult to organize, manage and extract insights from these large, diverse data sets. According to “Analytics Paves the Way for Better Government,” a Forbes Insights case study sponsored by SAP, government leaders have called for investment in Big Data analytics capabilities to modernize government services and aid their economies. State and federal governments have begun to recognize the benefits of applying analytics. In fact, McKinsey & Co. estimates that by digitizing information, disseminating public data sets and applying analytics to improve decision making, governments around the world can act as catalysts for more than $3 trillion in economic value.
Governor Mike Pence of Indiana understands the importance of data and is keeping it at the center of his long-term vision for improving the management and effectiveness of government programs and making Indiana a leader in data-driven decision making. A year after taking office, he ordered state agencies to collaborate and share data to improve services. Data sharing is not a common practice in states, but the governor recognized that sharing data would lead to a successful enterprise.
Insights from analytics will help Indiana pursue six public policy goals: Increase private sector employment; attract new investment to the state; improve the quality of the state’s workforce; improve the health, safety and well-being of families; increase high school graduation rates; and improve the math and reading skills of elementary students….”

Proof: How Crowdsourced Election Monitoring Makes a Difference


Patrick Meier at iRevolution: “My colleagues Catie Bailard & Steven Livingston have just published the results of their empirical study on the impact of citizen-based crowdsourced election monitoring. Readers of iRevolution may recall that my doctoral dissertation analyzed the use of crowdsourcing in repressive environments and specifically during contested elections. This explains my keen interest in the results of my colleagues’ new data-driven study, which suggests that crowdsourcing does have a measurable and positive impact on voter turnout.

Reclaim Naija

Catie and Steven are “interested in digitally enabled collective action initiatives” spearheaded by “nonstate actors, especially in places where the state is incapable of meeting the expectations of democratic governance.” They are particularly interested in measuring the impact of said initiatives. “By leveraging the efficiencies found in small, incremental, digitally enabled contributions (an SMS text, phone call, email or tweet) to a public good (a more transparent election process), crowdsourced elections monitoring constitutes [an] important example of digitally-enabled collective action.” To be sure, “the successful deployment of a crowdsourced elections monitoring initiative can generate information about a specific political process—information that would otherwise be impossible to generate in nations and geographic spaces with limited organizational and administrative capacity.”

To this end, their new study tests for the effects of citizen-based crowdsourced election monitoring during the 2011 Nigerian presidential elections. More specifically, they analyzed close to 30,000 citizen-generated reports of failures, abuses and successes, which were publicly crowdsourced and mapped as part of the Reclaim Naija project. Controlling for a number of factors, Catie and Steven find that the number and nature of crowdsourced reports are “significantly correlated with increased voter turnout.”

In conclusion, the authors argue that “digital technologies fundamentally change information environments and, by doing so, alter the opportunities and constraints that the political actors face.” This new study is an important contribution to the literature and should be required reading for anyone interested in digitally-enabled, crowdsourced collective action. Of course, the analysis focuses on “just” one case study, which means that the effects identified in Nigeria may not occur in other crowdsourced election monitoring efforts. But that’s another reason why this study is important—it will no doubt catalyze future research to determine just how generalizable these initial findings are.”
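
For readers curious about the basic shape of such an analysis, the fragment below computes a simple correlation between per-district report counts and turnout. The numbers are invented placeholders, and this bivariate correlation is far cruder than the controlled model Bailard and Livingston estimate; it is meant only to show what “correlated with increased voter turnout” means operationally.

```python
import numpy as np

# Invented per-district figures, used only to illustrate the computation;
# these are not the Reclaim Naija data.
reports = np.array([12, 45, 7, 60, 23, 31, 5, 52])    # crowdsourced election reports
turnout = np.array([41, 58, 39, 63, 47, 50, 36, 60])  # voter turnout, percent

r = np.corrcoef(reports, turnout)[0, 1]
print(f"Pearson correlation between report volume and turnout: {r:.2f}")
```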