Big Data for Social Good


Introduction to a Special Issue of the Journal “Big Data” by Charlie Catlett and Rayid Ghani: “…organizations focused on social good are realizing the potential as well but face several challenges as they seek to become more data-driven. The biggest challenge they face is a paucity of examples and case studies on how data can be used for social good. This special issue of Big Data is targeted at tackling that challenge and focuses on highlighting some exciting and impactful examples of work that uses data for social good. The special issue is just one example of the recent surge in such efforts by the data science community. …

This special issue solicited case studies and problem statements that would either highlight (1) the use of data to solve a social problem or (2) social challenges that need data-driven solutions. From roughly 20 submissions, we selected 5 articles that exemplify this type of work. These cover five broad application areas: international development, healthcare, democracy and government, human rights, and crime prevention.

“Understanding Democracy and Development Traps Using a Data-Driven Approach” (Ranganathan et al.) details a data-driven model between democracy, cultural values, and socioeconomic indicators to identify a model of two types of “traps” that hinder the development of democracy. They use historical data to detect causal factors and make predictions about the time expected for a given country to overcome these traps.

“Targeting Villages for Rural Development Using Satellite Image Analysis” (Varshney et al.) discusses two case studies that use data and machine learning techniques for international economic development—solar-powered microgrids in rural India and targeting financial aid to villages in sub-Saharan Africa. In the process, the authors stress the importance of understanding the characteristics and provenance of the data and the criticality of incorporating local “on the ground” expertise.

In “Human Rights Event Detection from Heterogeneous Social Media Graphs,” Chen and Neil describe efficient and scalable techniques to use social media in order to detect emerging patterns in human rights events. They test their approach on recent events in Mexico and show that they can accurately detect relevant human rights–related tweets prior to international news sources, and in some cases, prior to local news reports, which could potentially lead to more timely, targeted, and effective advocacy by relevant human rights groups.

“Finding Patterns with a Rotten Core: Data Mining for Crime Series with Core Sets” (Wang et al.) describes a case study with the Cambridge Police Department, using a subspace clustering method to analyze the department’s full housebreak database, which contains detailed information from thousands of crimes from over a decade. They find that the method allows human crime analysts to handle vast amounts of data and provides new insights into true patterns of crime committed in Cambridge…(More)”

An In-Depth Analysis of Open Data Portals as an Emerging Public E-Service


Paper by Martin Lnenicka: “Governments collect and produce large amounts of data. Increasingly, governments worldwide have started to implement open data initiatives and also launch open data portals to enable the release of these data in open and reusable formats. Therefore, a large number of open data repositories, catalogues and portals have been emerging in the world. The greater availability of interoperable and linkable open government data catalyzes secondary use of such data, so they can be used for building useful applications which leverage their value, allow insight, provide access to government services, and support transparency. The efficient development of successful open data portals makes it necessary to evaluate them systematically, in order to understand them better and assess the various types of value they generate, and identify the required improvements for increasing this value. Thus, the attention of this paper is directed particularly to the field of open data portals. The main aim of this paper is to compare the selected open data portals on the national level using content analysis and propose a new evaluation framework, which further improves the quality of these portals. It also establishes a set of considerations for involving businesses and citizens to create e-services and applications that leverage the datasets available from these portals….(More)”

Information transparency of public administrations. The right of the people to know and the duty to disseminate public information actively


New book by Miguel Angel Blanes Climent: “This research work analyzes the legal and judicial situation in the world’s main democracies and within the United Nations, the Council of Europe, and the European Union. It is therefore a powerful tool for finding out who can access publicly funded information, as well as how, when, where, and what kind of information citizens can obtain. More importantly, it explains which administrative and judicial remedies can be pursued when information is not provided, and what the disciplinary, financial, and criminal consequences are. The work examines in detail the new Ley 19/2013, of 9 December, on transparency, access to public information and good governance, as well as the existing regional legislation on the matter.
Access to sensitive information receives special attention: the awardees and final cost of public contracts; urban-planning and environmental data; budgets and public accounts; the salaries, allowances, and travel of elected officials and civil servants; the financing of political parties, trade unions, and business organizations; health-care and housing waiting lists; recipients of subsidies; institutional advertising; public services of general interest provided by private entities (telecommunications, electricity, gas, postal services) and public-service concessionaires (water, waste, transport, health care), etc.
The information that resists publication is precisely the information that allows citizens to monitor the management of public affairs, demand accountability, and report cases of waste or corruption. The author coins the motto: ‘transparency is like sincerity: we demand it of others and limit our own.’” …(More)

Turning Government Data into Better Public Service


OMB Blog: “Every day, millions of people use their laptops, phones, and tablets to check the status of their tax refund, get the latest forecast from the National Weather Service, book a campsite at one of our national parks, and much more. There were more than 1.3 billion visits to websites across the Federal Government in just the past 90 days.

Today, during Sunshine Week when we celebrate openness and transparency in government, we are pleased to release the Digital Analytics Dashboard, a new window into the way people access the government online. For the first time, you can see how many people are using a Federal Government website, which pages are most popular, and which devices, browsers, and operating systems people are using. We’ll use the data from the Digital Analytics Program to focus our digital service teams on the services that matter most to the American people, and analyze how much progress we are making. The Dashboard will help government agencies understand how people find, access, and use government services online to better serve the public – all while protecting privacy. The program does not track individuals. It anonymizes the IP addresses of all visitors and then uses the resulting information in the aggregate….(More)”
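
The excerpt does not say how the Digital Analytics Program anonymizes IP addresses. As a rough illustration only, the Python sketch below uses one common approach: zeroing the low-order bits of each address before anything is stored, then reporting only aggregate counts. The prefix lengths, the `anonymize_ip` helper, and the sample visits are all assumptions made for this example, not details of the actual program.

```python
import ipaddress
from collections import Counter


def anonymize_ip(ip, v4_prefix=24, v6_prefix=48):
    """Zero the host bits of an address so it no longer identifies one visitor.

    The prefix lengths are illustrative assumptions, not the program's settings.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its enclosing network
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


# Hypothetical visit log: (visitor IP, browser). Addresses are documentation ranges.
visits = [
    ("203.0.113.57", "Chrome"),
    ("203.0.113.99", "Chrome"),
    ("198.51.100.7", "Firefox"),
]

# Only the masked addresses and aggregate counts are kept.
masked_addresses = {anonymize_ip(ip) for ip, _ in visits}
browser_counts = Counter(browser for _, browser in visits)
```

With this masking, the first two visitors become indistinguishable by address, while the browser totals that a dashboard would display are unaffected.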

 

The Missing Information That Municipal-Bond Investors Need


Marc Joffe at Governing: “…There are many reasons why the municipal market lacks sophistication in this area, but a big part of the problem has been a lack of free (or even low-cost) financial-statement data. In this regard, some strides are being made. First, the 2009 launch by the Municipal Securities Rulemaking Board (MSRB) of its Electronic Municipal Market Access (EMMA) system gave investors a one-stop shop for municipal financial disclosure. But as the Securities and Exchange Commission (SEC) observed recently, a large number of municipal-bond issuers have been posting their statements late or not at all. The commission’s Municipal Continuing Disclosure Cooperation Initiative has greatly increased the number of statements on EMMA. Finally, late this year the Census Bureau is expected to begin posting federal single-audit submissions online. These packages include the same basic financial statements typically found in municipal market disclosure.

But the simple publication of thousands of voluminous PDFs does not provide the degree of transparency needed to raise the level of municipal-bond-market financial literacy. The vast majority of investors and analysts lack the patience and/or technical skills needed to extract the valuable needles of insight from this haystack of disclosure.

Investors in corporate securities do not face these difficulties. For the last 20 years, company financial reports have been available in textual form on the SEC’s Electronic Data Gathering, Analysis and Retrieval system. As a result, corporate financial-statement data is freely available in convenient forms around the Internet: Yahoo Finance, MarketWatch, Morningstar and your broker’s website are just a few of the places you can find this data.

So while corporate investors can readily compare the financial statistics of a safe company like Apple to an insolvent one like Radio Shack, municipal investors cannot easily perform the same exercise for Dallas and Detroit.

It wasn’t always this way. Between 1909 and 1931, the Census Bureau published an annual volume entitled “Financial Statistics of Cities Having a Population of Over 30,000.” The final edition — available at the St. Louis Federal Reserve’s website — covered 311 American cities and included hundreds of revenue, expenditure, asset and liability data points for each municipality. Unfortunately, ever since 1931, Census financial data on local governments has become less comprehensive, less timely and less comprehensible to the lay user.

In the years after 1931, we lost the understanding that comparative local-government financial statistics were a public good. While we might look to the federal government to once again offer this information in today’s era of heightened need, it may be challenged to take on this role in an era of sequesters.

But while we may need the private sector to provide this public good, the federal government can greatly reduce the cost of compiling a local-government financial-statement database. The SEC has required companies to file financial statements in text form — rather than via PDF — since the mid-1990s. In 2008, the SEC further standardized company financial reporting by requiring firms to file their statements in the form of eXtensible Business Reporting Language (XBRL), which imposes a consistent format on all filings. To date, neither the SEC nor the MSRB has pursued a similar course with respect to municipal financial disclosure.

Next week, the Data Transparency Coalition, a group that advocates for the use of XBRL, will hold a Financial Regulation Summit featuring numerous congressional representatives and regulators. Perhaps the extension of XBRL to the municipal-bond market can find its way onto the agenda….(More)

Gamification harnesses the power of games to motivate


Kevin Werbach at the Conversation: “Walk through any public area and you’ll see people glued to their phones, playing mobile games like Game of War and Candy Crush Saga. They aren’t alone. 59% of Americans play video games, and contrary to stereotypes, 48% of gamers are women. The US$100 billion video game industry is among the least-appreciated business phenomena in the world today.

But this isn’t an article about video games. It’s about where innovative organizations are applying the techniques that make those games so powerfully engaging: everywhere else.

Gamification is the perhaps-unfortunate name for the growing practice of applying structural elements, design patterns, and psychological insights from game design to business, education, health, marketing, crowdsourcing and other fields. Over the past four years, gamification has gone through a cycle of (over-)hype and (overblown) disappointment common for technological trends. Yet if you look carefully, you’ll see it everywhere.

Tapping into pieces of games

Gamification involves two primary mechanisms. The first is to take design structures from games, such as levels, achievements, points, and leaderboards — in my book, For the Win, my co-author and I label them “game elements” — and incorporate them into activities. The second, more subtle but ultimately more effective, is to mine the rich vein of design techniques that game designers have developed over many years. Good games pull you in and carry you through a journey that remains engaging, using an evolving balance of challenges and a stream of well crafted, actionable feedback.

Many enterprises now use tools built on top of Salesforce.com’s customer relationship management platform to motivate employees through competitions, points and leaderboards. Online learning platforms such as Khan Academy commonly challenge students to “level up” by sprinkling game elements throughout the process. Even games are now gamified: Microsoft’s Xbox One and Sony’s PS4 consoles offer a meta-layer of achievements and trophies to promote greater game-play.

The differences between a gamified system that incorporates good design principles and one that doesn’t aren’t always obvious on the surface. They show up in the results.

Duolingo is an online language-learning app. It’s pervasively and thoughtfully gamified: points, levels, achievements, bonuses for “streaks,” visual progression indicators, even a virtual currency with various ways to spend it. The well integrated gamification is a major differentiator for Duolingo, which happens to be the most successful tool of its kind. With over 60 million registered users, it teaches languages to more people than the entire US public school system.

Most of the initial high-profile cases of gamification were for marketing: for example, USA Network ramped up its engagement numbers with web-based gamified challenges for fans of its shows, and Samsung gave points and badges for learning about its products.

Soon it became clear that other applications were equally promising. Today, organizations are using gamification to enhance employee performance, promote health and wellness activities, improve retention in online learning, help kids with cancer endure their treatment regimen, and teach people how to code, to name just a few examples. Gamification has potential anywhere that motivation is an important element of success.

Gamification works because our responses to games are deeply hard-wired into our psychology. Game design techniques can activate our innate desires to recognize patterns, solve puzzles, master challenges, collaborate with others, and be in the driver’s seat when experiencing the world around us. They can also create a safe space for experimentation and learning. After all, why not try something new when you know that even if you fail, you’ll get another life?…(More)

What Your Tweets Say About You


at the New Yorker: “How much can your tweets reveal about you? Judging by the last nine hundred and seventy-two words that I used on Twitter, I’m about average when it comes to feeling upbeat and being personable, and I’m less likely than most people to be depressed or angry. That, at least, is the snapshot provided by AnalyzeWords, one of the latest creations from James Pennebaker, a psychologist at the University of Texas who studies how language relates to well-being and personality. One of Pennebaker’s most famous projects is a computer program called Linguistic Inquiry and Word Count (L.I.W.C.), which looks at the words we use, and in what frequency and context, and uses this information to gauge our psychological states and various aspects of our personality….

Take a study, out last month, from a group of researchers based at the University of Pennsylvania. The psychologist Johannes Eichstaedt and his colleagues analyzed eight hundred and twenty-six million tweets across fourteen hundred American counties. (The counties contained close to ninety per cent of the U.S. population.) Then, using lists of words—some developed by Pennebaker, others by Eichstaedt’s team—that can be reliably associated with anger, anxiety, social engagement, and positive and negative emotions, they gave each county an emotional profile. Finally, they asked a simple question: Could those profiles help determine which counties were likely to have more deaths from heart disease?

The answer, it turned out, was yes….

The researchers have a theory: they suggest that “the language of Twitter may be a window into the aggregated and powerful effects of the community context.” They point to other epidemiological studies which have shown that general facts about a community, such as its “social cohesion and social capital,” have consequences for the health of individuals. Broadly speaking, people who live in poorer, more fragmented communities are less healthy than people living in richer, integrated ones. “When we do a sub-analysis, we find that the power that Twitter has is in large part accounted for by community and socioeconomic variables,” Eichstaedt told me when we spoke over Skype. In short, a young person’s negative, angry, and stressed-out tweets might reflect his or her stress-inducing environment—and that same environment may have negative health repercussions for other, older members of the same community….(More)”
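
The word-list step described in the excerpt above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the researchers’ method: the two tiny lexicons, the `county_profile` function, and the sample tweets are all hypothetical, and the actual study used much larger word lists and further statistical modelling.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons; real word lists are far larger.
LEXICONS = {
    "anger": {"hate", "angry", "furious"},
    "positive": {"great", "love", "wonderful"},
}


def county_profile(tweets, lexicons=LEXICONS):
    """Return, per category, the share of all words in the tweets that match it."""
    words = [w for tweet in tweets for w in re.findall(r"[a-z']+", tweet.lower())]
    total = len(words)
    counts = Counter(words)
    return {
        category: (sum(counts[w] for w in vocab) / total if total else 0.0)
        for category, vocab in lexicons.items()
    }


# Hypothetical tweets from one county.
profile = county_profile(["I hate traffic", "Great day, love it"])
```

Each county’s profile is a vector of category rates, which could then be compared against county-level health statistics.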

Secrecy versus openness: Internet security and the limits of open source and peer production


Dissertation by Andreas Schmidt: “Open source and peer production have been praised as organisational models that could change the world for the better. It is commonly asserted that almost any societal activity could benefit from distributed, bottom-up collaboration — by making societal interaction more open, more social, and more democratic. However, we also need to be mindful of the limits of these models. How could they function in environments hostile to openness? Security is a societal domain more prone to secrecy than any other, except perhaps for romantic love. In light of the destructive capacity of contemporary cyber attacks, how has the Internet survived without a comprehensive security infrastructure? Secrecy vs. openness describes the realities of Internet security production through the lenses of open source and peer production theories. The study offers a glimpse into the fascinating communities of technical experts, who played a pivotal role when the chips were down for the Internet after large-scale attacks. After an initial flirtation with openness in the early years, operational Internet security communities have put in place institutional mechanisms that have resulted in less open forms of social production…(More)”

Using open legislative data to map bill co-sponsorship networks in 15 countries


François Briatte at OpeningParliament.org: “A few years back, Kamil Gregor published a post under the title “Visualizing politics: Network analysis of bill sponsors”. His post, which focused on the lower chamber of the Czech Parliament, showed how basic social network analysis can support the exploration of parliamentary work, by revealing the ties that members of parliament create between each other through the co-sponsorship of private bills….In what follows, I would like to quickly report on a small research project that I have developed over the years, under the name “parlnet”.

Legislative data on bill co-sponsorship

This project looks at bill co-sponsorship networks in European countries. Many parliaments allow their members to co-sponsor each other’s private bills, which makes it possible to represent these parliaments as collaborative networks, where a tie exists between two MPs if they have co-sponsored legislation together.

This idea is not new: it was pioneered by James Fowler in the United States, and has been the subject of extensive research in American politics, both on the U.S. Congress and on state legislatures. Similar research also exists on the bill co-sponsorship networks of parliaments in Argentina, Chile and Romania.

Inspired by this research and by Baptiste Coulmont’s visualisation of the French lower chamber, I surveyed the parliamentary websites of the following countries:

  • all 28 current members of the European Union;
  • 4 members of the EFTA: Iceland, Liechtenstein, Norway, and Switzerland.

This search returned 19 parliamentary chambers from 15 countries for which it was (relatively) easy to extract legislative data, either through open data portals like data.riksdagen.se in Sweden or data.stortinget.no in Norway, or from official parliamentary websites directly….After splitting the data into legislative periods separated by nationwide elections, I was able to draw a large collection of networks showing bill co-sponsorship in these 19 chambers….In this graph, each point (or node) is a Belgian MP, and each tie between two MPs indicates that they have co-sponsored at least one bill together. The colors and abbreviations used in the graph are party-related codes, which combine information on the parliamentary group and linguistic community of each MP. Because this kind of graph can be interesting to explore in more detail, I have also built interactive visualizations out of them, in order to show more detailed information on the MPs who participate in bill co-sponsorship…

The parlnet project was coded in R, and its code is public so that it might benefit from external contributions. The list of countries and chambers that it covers is not exhaustive: in some cases like Portugal, I simply failed to retrieve the data. More talented coders might therefore be able to add to the current database.

Bill co-sponsorship networks illustrate how open legislative data provided by parliaments can be turned into interactive tools that easily convey some information about parliamentary work, including, but not limited to:

  • the role of parliamentary party leaders in managing the legislation produced by their groups
  • the impact of partisan discipline and ideology on legislative collaboration between MPs
  • the extent of cross-party cooperation in various parliamentary environments and chambers… (More)
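
The tie definition quoted above (two MPs are linked if they have co-sponsored at least one bill together) can be sketched as follows. This is a minimal Python illustration, not the parlnet code, which is written in R. The `cosponsorship_edges` function, the bill identifiers, and the MP names are hypothetical, and the actual project may weight or direct its ties differently.

```python
from collections import Counter
from itertools import combinations


def cosponsorship_edges(bills):
    """Count, for each unordered pair of MPs, the bills they sponsored together."""
    edges = Counter()
    for sponsors in bills.values():
        # sorted(set(...)) removes duplicate names and gives each pair a fixed order
        for a, b in combinations(sorted(set(sponsors)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical data: bill identifier -> list of sponsoring MPs.
bills = {
    "B1": ["Ann", "Bob", "Cid"],
    "B2": ["Ann", "Bob"],
    "B3": ["Dee"],
}

edges = cosponsorship_edges(bills)

# Degree: the number of distinct colleagues each MP has co-sponsored with.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1
```

A bill with a single sponsor creates no tie, so an MP who only sponsors alone stays out of the network. The edge counts can serve as tie weights when the graph is drawn.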

UNESCO demonstrates global impact through new transparency portal


“Opendata.UNESCO.org is intended to present comprehensive, quality and timely information about UNESCO’s projects, enabling users to find information by country/region, funding source, and sector and providing comprehensive project data, including budget, expenditure, completion status, implementing organization, project documents, and more. It publishes program and financial information that are in line with UN system-experience of the IATI (International Aid Transparency Initiative) standards and other relevant transparency initiatives. UNESCO is now part of more than 230 organizations that have published to the IATI Registry, which brings together donor and developing countries, civil society organizations and other experts in aid information who are committed to working together to increase the transparency of aid.

Since its creation 70 years ago, UNESCO has tirelessly championed the causes of education, culture, natural sciences, social and human sciences, communication and information, globally. For instance – started in March 2010, the program for the Enhancement of Literacy in Afghanistan (ELA) benefited from a $19.5 million contribution by Japan. It aimed to improve the level of literacy, numeracy and vocational skills of the adult population in 70 districts of 15 provinces of Afghanistan. Over the next three years, until April 2013, the ELA programme helped some 360,000 adult learners in General Literacy competency. An interactive map allows for an easy identification of UNESCO’s high-impact programs, and up-to-date information of current and future aid allocations within and across countries.

Public participation and interactivity are key to the success of any open data project. http://Opendata.UNESCO.org will evolve as Member States and partners get involved, by displaying data on their own websites and sharing data among different networks, building and sharing applications, providing feedback, comments, and recommendations. …(More)”