Thinking Ahead – Essays on Big Data, Digital Revolution, and Participatory Market Society


New book by Dirk Helbing: “The rapidly progressing digital revolution is now touching the foundations of the governance of societal structures. Humans are on the verge of evolving from consumers to prosumers, and old, entrenched theories – in particular sociological and economic ones – are falling prey to these rapid developments. The original assumptions on which they are based are being questioned. Each year we produce as much data as in the entire human history – can we possibly create a global crystal ball to predict our future and to optimally govern our world? Do we need wide-scale surveillance to understand and manage the increasingly complex systems we are constructing, or would bottom-up approaches such as self-regulating systems be a better solution to creating a more innovative, more successful, more resilient, and ultimately happier society? Working at the interface of complexity theory, quantitative sociology and Big Data-driven risk and knowledge management, the author advocates the establishment of new participatory systems in our digital society to enhance coordination, reduce conflict and, above all, reduce the “tragedies of the commons,” resulting from the methods now used in political, economic and management decision-making….(More)”

New surveys reveal dynamism, challenges of open data-driven businesses in developing countries


Alla Morrison at World Bank Open Data blog: “Was there a class of entrepreneurs emerging to take advantage of the economic possibilities offered by open data, were investors keen to back such companies, were governments tuned to and responsive to the demands of such companies, and what were some of the key financing challenges and opportunities in emerging markets? As we began our work on the concept of an Open Fund, we partnered with Ennovent (India), MDIF (East Asia and Latin America) and Digital Data Divide (Africa) to conduct short market surveys to answer these questions, with a focus on trying to understand whether a financing gap truly existed in these markets. The studies were fairly quick (4-6 weeks) and reached only a small number of companies (193 in India, 70 in Latin America, 63 in South East Asia, and 41 in Africa – and not everybody responded) but the findings were fairly consistent.

  • Open data is still a very nascent concept in emerging markets, and there’s only a small class of entrepreneurs/investors that is aware of the economic possibilities; there’s a lot of work to do in the ‘enabling environment’
    • In many regions the distinction between open data, big data, and private sector generated/scraped/collected data was blurry at best among entrepreneurs and investors (some of our findings consequently are better indicators of data-driven rather than open data-driven businesses)
  • There’s a small but growing number of open data-driven companies in all the markets we surveyed and these companies target a wide range of consumers/users and are active in multiple sectors
    • A large percentage of identified companies operate in sectors with high social impact – health and wellness, environment, agriculture, transport. For instance, in India, after excluding business analytics companies, a third of data companies seeking financing are in healthcare and a fifth in food and agriculture, and some of them have the low-income population or the rural segment of India as an intended beneficiary segment. In Latin America, the number of companies in business services, research and analytics was closely followed by health, environment and agriculture. In Southeast Asia, business, consumer services, and transport came out in the lead.
    • We found the highest number of companies in Latin America and Asia with the following countries leading the way – Mexico, Chile, and Brazil, with Colombia and Argentina closely behind in Latin America; and India, Indonesia, Philippines, and Malaysia in Asia
  • An actionable pipeline of data-driven companies exists in Latin America and in Asia
    • We heard demand for different kinds of financing (equity, debt, working capital), but the majority of the need was for equity and quasi-equity in amounts ranging from US$100,000 to US$5 million, with averages between US$2 million and US$3 million depending on the region.
  • There’s a significant financing gap in all the markets
    • The investment sizes required, while they range up to several million dollars, are generally small. Analysis of more than 300 data companies in Latin America and Asia indicates a total estimated need for financing of more than $400 million
  • Venture capital firms generally don’t recognize data as a separate sector and group data-driven companies with their standard information and communication technology (ICT) investments
    • Interviews with founders suggest that moving beyond seed stage is particularly difficult for data-driven startups. While many companies are able to cobble together an initial seed round augmented by bootstrapping to get their idea off the ground, they face a great deal of difficulty when trying to raise a second, larger seed round or Series A investment.
    • From the perspective of startups, investors favor banal e-commerce (e.g., according to Tech in Asia, out of the $645 million in technology investments made public across the region in 2013, 92% were related to fashion and online retail) or consumer service startups and ignore open data-focused startups even if they have a strong business model and solid key performance indicators. The space is ripe for a long-term investor with a generous risk appetite and multiple bottom line goals.
  • Poor data quality was the number one issue these companies reported.
    • Companies reported significant waste and inefficiency in accessing/scraping/cleaning data.

The analysis below borrows heavily from the work done by the partners. We should of course mention that the findings are provisional and should not be considered authoritative (please see the section on methodology for more details)….(More).”

Big Data, Little Data, No Data


New book by Christine L. Borgman: “Big Data” is on the covers of Science, Nature, the Economist, and Wired magazines, on the front pages of the Wall Street Journal and the New York Times. But despite the media hyperbole, as Christine Borgman points out in this examination of data and scholarly research, having the right data is usually better than having more data; little data can be just as valuable as big data. In many cases, there are no data—because relevant data don’t exist, cannot be found, or are not available. Moreover, data sharing is difficult, incentives to do so are minimal, and data practices vary widely across disciplines.

Borgman, an often-cited authority on scholarly communication, argues that data have no value or meaning in isolation; they exist within a knowledge infrastructure—an ecology of people, practices, technologies, institutions, material objects, and relationships. After laying out the premises of her investigation—six “provocations” meant to inspire discussion about the uses of data in scholarship—Borgman offers case studies of data practices in the sciences, the social sciences, and the humanities, and then considers the implications of her findings for scholarly practice and research policy. To manage and exploit data over the long term, Borgman argues, requires massive investment in knowledge infrastructures; at stake is the future of scholarship….(More)”

Discovering the Language of Data: Personal Pattern Languages and the Social Construction of Meaning from Big Data


Paper in Interdisciplinary Science Reviews: “This paper attempts to address two issues relevant to the sense-making of Big Data. First, it presents a case study for how a large dataset can be transformed into both a visual language and, in effect, a ‘text’ that can be read and interpreted by human beings. The case study comes from direct observation of graduate students at the IIT Institute of Design who investigated task-switching behaviours, as documented by productivity software on a single user’s laptop and a smart phone. Through a series of experiments with the resulting dataset, the team effects a transformation of that data into a catalogue of visual primitives — a kind of iconic alphabet — that allow others to ‘read’ the data as a corpus and, more provocatively, suggest the formation of a personal pattern language. Second, this paper offers a model for human-technical collaboration in the sense-making of data, as demonstrated by this and other teams in the class. Current sense-making models tend to be data- and technology-centric, and increasingly presume data visualization as a primary point of entry of humans into Big Data systems. This alternative model proposes that meaningful interpretation of data emerges from a more elaborate interplay between algorithms, data and human beings….(More)”
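The paper documents the team’s actual encoding; purely as a hedged illustration of the general idea, the sketch below maps logged task-switching events to a handful of glyphs so that a stretch of activity can be scanned like a line of text. The event categories, glyph choices, and sample log are invented for illustration and are not the IIT team’s visual primitives.

```python
# Hypothetical sketch only: encode a task-switching log as a string of glyphs
# so that activity patterns can be "read" at a glance. The categories, glyphs,
# and sample data are invented and are not the visual primitives from the paper.

GLYPHS = {
    ("email", "short"): "·",   # brief email check
    ("email", "long"):  "▪",   # sustained email session
    ("code",  "short"): "-",   # quick edit
    ("code",  "long"):  "█",   # deep work block
    ("web",   "short"): "~",   # quick lookup
    ("web",   "long"):  "≈",   # extended browsing
}

def encode(log, long_threshold_min=10):
    """Turn (app_category, minutes) events into a readable glyph string."""
    chars = []
    for category, minutes in log:
        duration = "long" if minutes >= long_threshold_min else "short"
        chars.append(GLYPHS.get((category, duration), "?"))
    return "".join(chars)

sample_day = [("email", 3), ("code", 45), ("web", 2), ("code", 30),
              ("email", 15), ("web", 25), ("code", 5), ("email", 2)]

print(encode(sample_day))  # prints ·█~█▪≈-· : frequent task-switching is visible at a glance
```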

 

Learning to See Data


Benedict Carey in the New York Times: “For the past year or so, genetic scientists at the Albert Einstein College of Medicine in New York have been collaborating with a specialist from another universe: Daniel Kohn, a Brooklyn-based painter and conceptual artist.

Mr. Kohn has no training in computers or genetics, and he’s not there to conduct art therapy classes. His role is to help the scientists with a signature 21st-century problem: Big Data overload.

Advanced computing produces waves of abstract digital data that in many cases defy interpretation; there’s no way to discern a meaningful pattern in any intuitive way. To extract some order from this chaos, analysts need to continually reimagine the ways in which they represent their data — which is where Mr. Kohn comes in. He spent 10 years working with scientists and knows how to pose useful questions. He might ask, for instance, What if the data were turned sideways? Or upside down? Or what if you could click on a point on the plotted data and see another dimension?….

And so it is in many fields, whether predicting climate, flagging potential terrorists or making economic forecasts. The information is all there, great expanding mountain ranges of it. What’s lacking is the tracker’s instinct for picking up a trail, the human gut feeling for where to start looking to find patterns and meaning. But can such creative instincts really be trained systematically? And even if they could, wouldn’t it take years to do so?

The answers are yes and no, at least when it comes to some advanced skills. And that should give analysts drowning in data some cause for optimism.

Scientists working in a little-known branch of psychology called perceptual learning have shown that it is possible to fast-forward a person’s gut instincts both in physical fields, like flying an airplane, and more academic ones, like deciphering advanced chemical notation. The idea is to train specific visual skills, usually with computer-game-like modules that require split-second decisions. Over time, a person develops a “good eye” for the material, and with it an ability to extract meaningful patterns instantaneously.

Perceptual learning is such an elementary skill that people forget they have it. It’s what we use as children to make distinctions between similar-looking letters, like U and V, long before we can read. It’s the skill needed to distinguish an A sharp from a B flat (both the notation and the note), or between friendly insurgents and hostiles in a fast-paced video game. By the time we move on to sentences and melodies and more cerebral gaming — “chunking” the information into larger blocks — we’ve forgotten how hard it was to learn all those subtle distinctions in the first place….(More)

Can Big Data Measure Livability in Cities?


PlaceILive: “Big data helps us measure and predict consumer behavior, hurricanes and even pregnancies. It has revolutionized the way we access and use information. That being said, so far big data has not been able to tackle bigger issues like urbanization or improve the livability of cities.

A new startup, www.placeilive.com, thinks big data should and can be used to measure livability. They aggregated open data from government institutions and social media to create a tool that can calculate just that…. PlaceILive wants to help people and governments better understand their cities so that they can make smarter decisions. Cities can become more sustainable, while users save money and time when choosing a new home.

Not everyone is eager to read long lists of raw data. Therefore, they created appealing, user-friendly maps that visualize the statistics, offering users fast and accessible information on the neighborhoods that matter to them.

Another cornerstone of PlaceILive is their Life Quality Index: an algorithm that takes aspects like transportation, safety, and affordability into account, making it possible for people to easily compare the livability of different houses. You can read more on the methodology and sources here.
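PlaceILive’s exact formula isn’t given here, but composite indices of this kind are typically weighted aggregations of normalized sub-scores. The sketch below is only a hypothetical illustration of that pattern; the weights, categories, and 0–10 scale are assumptions, not PlaceILive’s methodology.

```python
# Hypothetical sketch of a composite livability score as a weighted average of
# normalized sub-scores. Weights, categories, and scale are assumptions and are
# not PlaceILive's actual Life Quality Index methodology.

WEIGHTS = {"transportation": 0.3, "safety": 0.4, "affordability": 0.3}

def life_quality_index(scores, weights=WEIGHTS):
    """Combine 0-10 sub-scores into a single 0-100 index."""
    if set(scores) != set(weights):
        raise ValueError("scores must cover exactly the weighted categories")
    weighted_avg = sum(weights[k] * scores[k] for k in weights)  # still on a 0-10 scale
    return round(weighted_avg * 10, 1)                           # rescale to 0-100

apartment_a = {"transportation": 9, "safety": 6, "affordability": 4}
apartment_b = {"transportation": 5, "safety": 8, "affordability": 7}

print(life_quality_index(apartment_a))  # 63.0
print(life_quality_index(apartment_b))  # 68.0 -> easier to compare than raw statistics
```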

[Image: Life Quality Index press release]

In its beta form, the site features five cities—New York City, Chicago, San Francisco, London and Berlin. When you click on the New York portal, for instance, you can search for the place you want to know more about by borough, zip code, or address. Using New York as an example, it looks like this….(More)

Big Data for Social Good


Introduction to a Special Issue of the Journal “Big Data” by Charlie Catlett and Rayid Ghani: “…organizations focused on social good are realizing the potential as well but face several challenges as they seek to become more data-driven. The biggest challenge they face is a paucity of examples and case studies on how data can be used for social good. This special issue of Big Data is targeted at tackling that challenge and focuses on highlighting some exciting and impactful examples of work that uses data for social good. The special issue is just one example of the recent surge in such efforts by the data science community. …

This special issue solicited case studies and problem statements that would either highlight (1) the use of data to solve a social problem or (2) social challenges that need data-driven solutions. From roughly 20 submissions, we selected 5 articles that exemplify this type of work. These cover five broad application areas: international development, healthcare, democracy and government, human rights, and crime prevention.

“Understanding Democracy and Development Traps Using a Data-Driven Approach” (Ranganathan et al.) details a data-driven model relating democracy, cultural values, and socioeconomic indicators, identifying two types of “traps” that hinder the development of democracy. The authors use historical data to detect causal factors and make predictions about the time expected for a given country to overcome these traps.

“Targeting Villages for Rural Development Using Satellite Image Analysis” (Varshney et al.) discusses two case studies that use data and machine learning techniques for international economic development—solar-powered microgrids in rural India and targeting financial aid to villages in sub-Saharan Africa. In the process, the authors stress the importance of understanding the characteristics and provenance of the data and the criticality of incorporating local “on the ground” expertise.

In “Human Rights Event Detection from Heterogeneous Social Media Graphs,” Chen and Neil describe efficient and scalable techniques to use social media in order to detect emerging patterns in human rights events. They test their approach on recent events in Mexico and show that they can accurately detect relevant human rights–related tweets prior to international news sources, and in some cases, prior to local news reports, which could potentially lead to more timely, targeted, and effective advocacy by relevant human rights groups.

“Finding Patterns with a Rotten Core: Data Mining for Crime Series with Core Sets” (Wang et al.) describes a case study with the Cambridge Police Department, using a subspace clustering method to analyze the department’s full housebreak database, which contains detailed information from thousands of crimes from over a decade. They find that the method allows human crime analysts to handle vast amounts of data and provides new insights into true patterns of crime committed in Cambridge…..(More)
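The authors’ subspace-clustering algorithm is considerably more sophisticated, but the underlying intuition, linking crimes that agree on a subset of modus-operandi attributes, can be roughed out as below. This is a hedged toy sketch, not the method from the paper; the attributes, records, and matching threshold are invented.

```python
# Toy sketch of the intuition behind crime-series detection: link crimes that
# agree on enough modus-operandi attributes. This is NOT the subspace-clustering
# method of Wang et al.; the attributes, records, and threshold are invented.

from itertools import combinations

crimes = {
    1: {"entry": "rear window", "day": "weekday", "time": "afternoon", "premise": "apartment"},
    2: {"entry": "rear window", "day": "weekday", "time": "afternoon", "premise": "house"},
    3: {"entry": "front door",  "day": "weekend", "time": "night",     "premise": "house"},
    4: {"entry": "rear window", "day": "weekday", "time": "morning",   "premise": "apartment"},
}

def shared_attributes(a, b):
    """Attributes on which two crime records agree."""
    return {k for k in a if a[k] == b[k]}

def candidate_links(records, min_shared=3):
    """Pairs of crimes that match on at least `min_shared` attributes."""
    links = []
    for (i, a), (j, b) in combinations(records.items(), 2):
        common = shared_attributes(a, b)
        if len(common) >= min_shared:
            links.append((i, j, sorted(common)))
    return links

for i, j, attrs in candidate_links(crimes):
    print(f"crimes {i} and {j} may belong to one series (match on {attrs})")
# -> links crimes 1 and 2 (entry/day/time) and crimes 1 and 4 (entry/day/premise)
```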

Data for policy: when the haystack is made of needles. A call for contributions


Diana Vlad-Câlcic at the European Commission: “If policy-making is ‘whatever government chooses to do or not to do’ (Th. Dye), then how do governments actually decide? Evidence-based policy-making is not a new answer to this question, but it is constantly challenging both policy-makers and scientists to sharpen their thinking, their tools and their responsiveness.  The European Commission has recognised this and has embedded in its processes, namely through Impact Assessment, policy monitoring and evaluation, an evidence-informed decision-making approach.

With four parameters I can fit an elephant, and with five I can make him wiggle his trunk. (John von Neumann)

New data technologies raise the bar for advanced modelling, dynamic visualisation, real-time data flows and a variety of data sources, from sensors to cell phones to the Internet itself. An abundance of (big) data, a haystack made of needles, but do public administrations have the right tools and skills to exploit it? How much of it adds real value to established statistics and to scientific evidence? Are the high hopes and the high expectations partly just hype? And what lessons can we learn from experience?

To explore these questions, the European Commission is launching a study with the Oxford Internet Institute, Technopolis and CEPS on ‘Data for policy: big data and other innovative data-driven approaches for evidence-informed policymaking’. As a first step, the study will collect examples of initiatives in public institutions at national and international level, where innovative data technologies contribute to the policy process. It will eventually develop case studies for EU policies.

Contribute to the collective reflection by sharing with us good practices and examples you have from other public administrations. Follow the developments of the study also on Twitter @data4policyEU

Big Data Is an Economic Justice Issue, Not Just a Privacy Problem


In the Huffington Post: “The control of personal data by “big data” companies is not just an issue of privacy but is becoming a critical issue of economic justice, argues a new report issued by the organization Data Justice, which itself is being publicly launched in conjunction with the report….

At the same time, big data is fueling economic concentration across our economy. As a handful of data platforms generate massive amounts of user data, the barriers to entry rise, since potential competitors have little data themselves to entice advertisers compared with the incumbents, who have both the concentrated processing power and the supply of user data to dominate particular sectors. With little competition, companies end up with little incentive to either protect user privacy or share the economic value of that user data with the consumers generating those profits.

The report argues for a threefold approach to making big data work for everyone in the economy, not just for the big data platforms’ shareholders:

  • First, regulators need to strengthen user control of their own data by both requiring explicit consent for all uses of the data and better informing users of how it’s being used and how companies profit from that data.
  • Second, regulators need to factor control of data into merger review, and to initiate antitrust actions against companies like Google where monopoly control of a sector like search advertising has been established.
  • Third, policymakers should restrict practices that harm consumers, including banning price discrimination where consumers are not informed of all discount options available and bringing the participation of big data platforms in marketing financial services under the regulation of the Consumer Financial Protection Bureau.

Data Justice itself has been founded as an organization “to promote public education and new alliances to challenge the danger of big data to workers, consumers and the public.” It will work to educate the public, policymakers and organizational allies on how big data is contributing to economic inequality in the economy. Its new website at datajustice.org is intended to bring together a wide range of resources highlighting the economic justice aspects of big data.”

States Use Big Data to Nab Tax Fraudsters


At Governing: “It’s tax season again. For most of us, that means undergoing the laborious and thankless task of assembling financial records and calculating taxes for state and federal returns. But for a small group of us, tax season is profit season. It’s the time of year when fraudsters busy themselves with stealing identities and electronically submitting fraudulent tax returns for refunds.
Nobody knows for sure just how much tax return fraud is committed, but the amount is rising fast. According to the U.S. Treasury, the number of identified fraudulent federal returns has increased by 40 percent from 2011 to 2012, an increase of more than $4 billion. Ten years ago, New York state stopped refunds on 50,000 fraudulently filed tax returns. Last year, the number of stopped refunds was 250,000, according to Nonie Manion, executive deputy commissioner for the state’s Department of Taxation and Finance….
To combat the problem, state revenue and tax agencies are using software programs to sift through mounds of data and detect patterns that would indicate when a return is not valid. Just about every state with a tax fraud detection program already compares tax return data with information from other state agencies and private firms to spot incorrect mailing addresses and stolen identities. Because so many returns are filed electronically, fraud-spotting systems look for suspicious Internet protocol (IP) addresses. For example, tax auditors in New York noticed that similar IP addresses in Fort Lauderdale, Fla., were submitting a series of returns for refunds. When the state couldn’t match the returns with any employer data, they were flagged for further scrutiny and ultimately found to be fraudulent.
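The article doesn’t describe the states’ software in detail; as a hedged sketch, one screening rule it mentions, many refund filings from a similar IP address whose reported employer can’t be matched to state records, could look roughly like this. Field names, thresholds, and records are hypothetical.

```python
# Hypothetical sketch of one screening rule described in the article: flag refund
# returns when many filings share an IP address and the reported employer cannot
# be matched against state wage records. All identifiers and data are invented.

from collections import Counter

known_employers = {"EIN-1001", "EIN-1002"}   # stand-in for state employer/wage data

returns = [
    {"id": "R1", "ip": "203.0.113.7",  "employer": "EIN-9999", "refund": 4200},
    {"id": "R2", "ip": "203.0.113.7",  "employer": "EIN-9998", "refund": 3900},
    {"id": "R3", "ip": "203.0.113.7",  "employer": "EIN-9997", "refund": 4100},
    {"id": "R4", "ip": "198.51.100.4", "employer": "EIN-1001", "refund": 650},
]

def flag_returns(records, employer_db, max_per_ip=2):
    """Return IDs of filings that warrant manual review."""
    filings_per_ip = Counter(r["ip"] for r in records)
    flagged = []
    for r in records:
        suspicious_ip = filings_per_ip[r["ip"]] > max_per_ip   # many filings, one address
        unknown_employer = r["employer"] not in employer_db    # no matching wage data
        if suspicious_ip and unknown_employer:
            flagged.append(r["id"])
    return flagged

print(flag_returns(returns, known_employers))   # ['R1', 'R2', 'R3']
```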
High-tech analytics is one way states keep up in the war on fraud. The other is accurate data. The third component is well-trained staff. But it takes time and money to put together the technology and the expertise to combat the growing sophistication of fraudsters….(More)”