Yochai Benkler at MIT Technology Review on Distributed Innovation and Creativity, Peer Production, and Commons in a Networked Economy: “A decade ago, Wikipedia and open-source software were treated as mere curiosities in business circles. Today, these innovations represent a core challenge to how we have thought about property and contract, organization theory and management, over the past 150 years.
For the first time since before the Industrial Revolution, the most important inputs into some of the most important economic sectors are radically distributed in the population, and the core capital resources necessary for these economic activities have become widely available in wealthy countries and among the wealthier populations of emerging economies. This technological feasibility of social production generally, and peer production — the kind of network collaboration of which Wikipedia is the most prominent example — more specifically, is interacting with the high rate of change and the escalating complexity of global innovation and production systems.
Increasingly, in both the business literature and business practice, we see a shift toward a range of open innovation models that allow more fluid flows of information, talent, and projects across organizations.
Peer production, the most significant organizational innovation that has emerged from Internet-mediated social practice, is large-scale collaborative engagement by groups of individuals who come together to produce products more complex than they could have produced on their own. Organizationally, it combines three core characteristics: decentralization of conception and execution of problems and solutions; harnessing of diverse motivations; and separation of governance and management from property and contract.
These characteristics make peer production highly adept at experimentation, innovation, and adaptation in changing and complex environments. If the Web was innovation on a commons-based model — allocating access and use rights in resources without giving anyone exclusive rights to exclude anyone else — Wikipedia’s organizational innovation is in problem-solving.
Wikipedia’s user-generated content model incorporates knowledge that simply cannot be managed well, either because it is tacit knowledge (possessed by individuals but difficult to communicate to others) or because it is spread among too many people to contract for. The user-generated content model also permits organizations to explore a space of highly diverse interests and tastes that was too costly for traditional organizations to explore.
Peer production allows a diverse range of people, regardless of affiliation, to dynamically assess and reassess available resources, projects, and potential collaborators and to self-assign to projects and collaborations. By leaving these elements to self-organization dynamics, peer production overcomes the lossiness of markets and bureaucracies, and its benefits are sufficient that the practice has been widely adopted by firms and even governments.
In a networked information economy, commons-based practices and open innovation provide an evolutionary model typified by repeated experimentation and adoption of successful adaptation rather than the more traditional, engineering-style approaches to building optimized systems.
Commons-based production and peer production are edge cases of a broader range of openness strategies that trade off the freedom of these two approaches and the manageability and appropriability that many more-traditional organizations seek to preserve. Some firms are using competitions and prizes to diversify the range of people who work on their problems, without ceding contractual control over the project. Many corporations are participating in networks of firms engaging in a range of open collaborative innovation practices with a more manageable set of people, resources, and projects to work with than a fully open-to-the-world project. And the innovation clusters anchored around universities represent an entrepreneurial model at the edge of academia and business, in which academia allows for investment in highly uncertain innovation, and the firms allow for high-risk, high-reward investment models.
Continued Progress and Plans for Open Government Data
Steve VanRoekel and Todd Park at the White House: “One year ago today, President Obama signed an executive order that made open and machine-readable data the new default for government information. This historic step is helping to make government-held data more accessible to the public and to entrepreneurs while appropriately safeguarding sensitive information and rigorously protecting privacy.
Freely available data from the U.S. government is an important national resource, serving as fuel for entrepreneurship, innovation, scientific discovery, and economic growth. Making information about government operations more readily available and useful is also core to the promise of a more efficient and transparent government. This initiative is a key component of the President’s Management Agenda and our efforts to ensure the government is acting as an engine to expand economic growth and opportunity for all Americans. The Administration is committed to driving further progress in this area, including by designating Open Data as one of our key Cross-Agency Priority Goals.
Over the past few years, the Administration has launched a number of Open Data Initiatives aimed at scaling up open data efforts across the Health, Energy, Climate, Education, Finance, Public Safety, and Global Development sectors. The White House has also launched Project Open Data, designed to share best practices, examples, and software code to assist federal agencies with opening data. These efforts have helped unlock troves of valuable data—that taxpayers have already paid for—and are making these resources more open and accessible to innovators and the public.
Other countries are also opening up their data. In June 2013, President Obama and other G7 leaders endorsed the Open Data Charter, in which the United States committed to publish a roadmap for our nation’s approach to releasing and improving government data for the public.
Building upon the Administration’s Open Data progress, and in fulfillment of the Open Data Charter, today we are excited to release the U.S. Open Data Action Plan. The plan includes a number of exciting enhancements and new data releases planned in 2014 and 2015, including:
- Small Business Data: The Small Business Administration’s (SBA) database of small business suppliers will be enhanced so that software developers can create tools to help manufacturers more easily find qualified U.S. suppliers, ultimately reducing the transaction costs to source products and manufacture domestically.
- Smithsonian American Art Museum Collection: The Smithsonian American Art Museum’s entire digitized collection will be opened to software developers to make educational apps and tools. Today, even museum curators do not have easily accessible information about their art collections. This information will soon be available to everyone.
- FDA Adverse Drug Event Data: Each year, healthcare professionals and consumers submit millions of individual reports on drug safety to the Food and Drug Administration (FDA). These anonymous reports are a critical tool to support drug safety surveillance. Today, this data is only available through limited quarterly reports. But the Administration will soon be making these reports available in their entirety so that software developers can build tools to help pull potentially dangerous drugs off shelves faster than ever before.
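To make the scale of such a release concrete, here is a minimal sketch of how a developer might tally newly opened adverse-event reports by drug. The plan above does not specify an API, so the endpoint URL and JSON field names below are hypothetical placeholders, not a real FDA service.

```python
# Illustrative sketch only: the endpoint and response shape are invented placeholders.
import collections
import requests

BASE_URL = "https://example.gov/fda/adverse-events"  # hypothetical endpoint

def count_reports_by_drug(query: str, limit: int = 100) -> collections.Counter:
    """Fetch adverse-event reports matching `query` and tally them by drug name."""
    resp = requests.get(BASE_URL, params={"search": query, "limit": limit}, timeout=30)
    resp.raise_for_status()
    counts = collections.Counter()
    for report in resp.json().get("results", []):
        for drug in report.get("drugs", []):
            counts[drug.get("name", "unknown")] += 1
    return counts

if __name__ == "__main__":
    for drug, n in count_reports_by_drug("serious:true").most_common(10):
        print(f"{drug}: {n} reports")
```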
We look forward to implementing the U.S. Open Data Action Plan, and to continuing to work with our partner countries in the G7 to take the open data movement global”.
Can Big Data Stop Wars Before They Happen?
Foreign Policy: “It has been almost exactly two decades since conflict prevention shot to the top of the peace-building agenda, as large-scale killings shifted from interstate wars to intrastate and intergroup conflicts. What could we have done to anticipate and prevent the 100 days of genocidal killing in Rwanda that began in April 1994, or the massacre of thousands of Bosnian Muslims at Srebrenica just over a year later? The international community recognized that conflict prevention could no longer be limited to diplomatic and military initiatives, but that it also requires earlier intervention to address the causes of violence between nonstate actors, including tribal, religious, economic, and resource-based tensions.
For years, even as it was pursued as doggedly as personnel and funding allowed, early intervention remained elusive, a kind of Holy Grail for peace-builders. This might finally be changing. The rise of data on social dynamics and what people think and feel — obtained through social media, SMS questionnaires, increasingly comprehensive satellite information, news-scraping apps, and more — has given the peace-building field hope of harnessing a new vision of the world. But to cash in on that hope, we first need to figure out how to understand all the numbers and charts and figures now available to us. Only then can we expect to predict and prevent events like the recent massacres in South Sudan or the ongoing violence in the Central African Republic.
A growing number of initiatives have tried to make it across the bridge between data and understanding. They’ve ranged from small nonprofit shops of a few people to massive government-funded institutions, and they’ve been moving forward in fits and starts. Few of these initiatives have been successful in documenting incidents of violence actually averted or stopped. Sometimes that’s simply because violence, or the absence of it, isn’t verifiable. The growing literature on big data and conflict prevention today is replete with caveats about “overpromising and underdelivering” and the persistent gap between early warning and early action. In the case of the Conflict Early Warning and Response Mechanism (CEWARN) in the Horn of Africa — one of the earliest and most prominent attempts at early intervention — it is widely accepted that the project largely failed to use the data it retrieved for effective conflict management. It relied heavily on technology to produce large databases, while lacking the personnel to effectively analyze them or take meaningful early action.
To be sure, disappointments are to be expected when breaking new ground. But they don’t have to continue forever. This pioneering work demands not just data and technology expertise. Also critical is cross-discipline collaboration between the data experts and the conflict experts, who know intimately the social, political, and geographic terrain of different locations. What was once a clash of cultures over the value and meaning of metrics when it comes to complex human dynamics needs to morph into collaboration. This is still pretty rare, but if the past decade’s innovations are any prologue, we are hopefully headed in the right direction.
* * *
Over the last three years, the U.S. Defense Department, the United Nations, and the U.S. intelligence community have all launched programs to parse the masses of public data now available, scraping and analyzing details from social media, blogs, market data, and myriad other sources to achieve variations of the same goal: anticipating when and where conflict might arise. The Defense Department’s Information Volume and Velocity program is designed to use “pattern recognition to detect trends in a sea of unstructured data” that would point to growing instability. The U.N.’s Global Pulse initiative’s stated goal is to track “human well-being and emerging vulnerabilities in real-time, in order to better protect populations from shocks.” The Open Source Indicators program at the Intelligence Advanced Research Projects Activity, the research arm of the U.S. intelligence community, aims to anticipate “political crises, disease outbreaks, economic instability, resource shortages, and natural disasters.” Each looks to the growing stream of public data to detect significant population-level changes.
Large institutions with deep pockets have always been at the forefront of efforts in the international security field to design systems for improving data-driven decision-making. They’ve followed the lead of large private-sector organizations where data and analytics rose to the top of the corporate agenda. (In that sector, the data revolution is promising “to transform the way many companies do business, delivering performance improvements not seen since the redesign of core processes in the 1990s,” as David Court, a director at consulting firm McKinsey, has put it.)
What really defines the recent data revolution in peace-building, however, is that it is transcending size and resource limitations. It is finding its way to small organizations operating at local levels and using knowledge and subject experts to parse information from the ground. It is transforming the way peace-builders do business, delivering data-led programs and evidence-based decision-making not seen since the field’s inception in the latter half of the 20th century.
One of the most famous recent examples is the 2013 Kenyan presidential election.
In March 2013, the world was watching and waiting to see whether the vote would produce more of the violence that had left at least 1,300 people dead and 600,000 homeless during and after the 2007 elections. In the intervening years, a web of NGOs worked to set up early-warning and early-response mechanisms to defuse tribal rivalries, party passions, and rumor-mongering. Many of the projects were technology-based initiatives trying to leverage data sources in new ways — including a collaborative effort spearheaded and facilitated by a Kenyan nonprofit called Ushahidi (“witness” in Swahili) that designs open-source data collection and mapping software. The Umati (meaning “crowd”) project used an Ushahidi program to monitor media reports, tweets, and blog posts to detect rising tensions, frustration, calls to violence, and hate speech — and then sorted and categorized it all on one central platform. The information fed into election-monitoring maps built by the Ushahidi team, while mobile-phone provider Safaricom donated 50 million text messages to a local peace-building organization, Sisi ni Amani (“We are Peace”), so that it could act on the information by sending texts — a medium that had been used to incite and fuel violence during the 2007 elections — aimed at preventing violence and quelling rumors.
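As a rough illustration of the kind of sorting and categorizing such a platform performs, the sketch below flags incoming posts by keyword. The categories and keywords are invented for this example and are far cruder than what the Umati project actually used, which combined software with human review.

```python
# Minimal sketch of keyword-based triage of incoming posts into categories.
# Categories and keywords are invented for illustration only.
from typing import Dict, List

CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "call_to_violence": ["attack", "burn", "fight them"],
    "rumor": ["i heard", "they say", "unconfirmed"],
    "tension": ["angry", "protest", "blocked"],
}

def categorize(post: str) -> List[str]:
    """Return every category whose keywords appear in the post (case-insensitive)."""
    text = post.lower()
    return [cat for cat, words in CATEGORY_KEYWORDS.items()
            if any(w in text for w in words)] or ["uncategorized"]

posts = [
    "They say youths blocked the polling station in Phase 4",
    "Everything calm at our station this morning",
]
for p in posts:
    print(categorize(p), "-", p)
```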
The first challenges came around 10 a.m. on the opening day of voting. “Rowdy youth overpowered police at a polling station in Dandora Phase 4,” one of the informal settlements in Nairobi that had been a site of violence in 2007, wrote Neelam Verjee, programs manager at Sisi ni Amani. The young men were blocking others from voting, and “the situation was tense.”
Sisi ni Amani sent a text blast to its subscribers: “When we maintain peace, we will have joy & be happy to spend time with friends & family but violence spoils all these good things. Tudumishe amani [“Maintain the peace”] Phase 4.” Meanwhile, security officers, who had been called separately, arrived at the scene and took control of the polling station. Voting resumed with little violence. According to interviews collected by Sisi ni Amani after the vote, the message “was sent at the right time” and “helped to calm down the situation.”
In many ways, Kenya’s experience is the story of peace-building today: Data is changing the way professionals in the field think about anticipating events, planning interventions, and assessing what worked and what didn’t. But it also underscores the possibility that we might be edging closer to a time when peace-builders at every level and in all sectors — international, state, and local, governmental and not — will have mechanisms both to know about brewing violence and to save lives by acting on that knowledge.
Three important trends underlie the optimism. The first is the sheer amount of data that we’re generating. In 2012, humans plugged into digital devices managed to generate more data in a single year than over the course of world history — and that rate more than doubles every year. As of 2012, 2.4 billion people — 34 percent of the world’s population — had a direct Internet connection. The growth is most stunning in regions like the Middle East and Africa where conflict abounds; access has grown 2,634 percent and 3,607 percent, respectively, in the last decade.
The growth of mobile-phone subscriptions, which allow their owners to be part of new data sources without a direct Internet connection, is also staggering. In 2013, there were almost as many cell-phone subscriptions in the world as there were people. In Africa, there were 63 subscriptions per 100 people, and there were 105 per 100 people in the Arab states.
The second trend has to do with our expanded capacity to collect and crunch data. Not only do we have more computing power enabling us to produce enormous new data sets — such as the Global Database of Events, Language, and Tone (GDELT) project, which tracks almost 300 million conflict-relevant events reported in the media between 1979 and today — but we are also developing more-sophisticated methodological approaches to using these data as raw material for conflict prediction. New machine-learning methodologies, which use algorithms to make predictions (like a spam filter, but much, much more advanced), can provide “substantial improvements in accuracy and performance” in anticipating violent outbreaks, according to Chris Perry, a data scientist at the International Peace Institute.
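As a hedged illustration of what such machine-learning pipelines involve, the sketch below trains a standard classifier on synthetic, GDELT-style event counts. The features, labels, and data are all made up; it shows the shape of the approach, not any particular group’s model, and a real pipeline would need careful feature engineering and out-of-sample validation.

```python
# Illustrative sketch: a classifier over synthetic event-count features
# (e.g. protests, hate-speech mentions, troop movements, food-price reports)
# to flag district-months at risk of violence.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n = 500
# Hypothetical monthly counts for four media-derived indicators per district.
X = rng.poisson(lam=[5, 8, 2, 3], size=(n, 4)).astype(float)
# Synthetic label: violence more likely when protests and hate speech are both high.
y = ((X[:, 0] + X[:, 1] + rng.normal(0, 2, n)) > 14).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```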
This brings us to the third trend: the nature of the data itself. When it comes to conflict prevention and peace-building, progress is not simply a question of “more” data, but also different data. For the first time, digital media — user-generated content and online social networks in particular — tell us not just what is going on, but also what people think about the things that are going on. Excitement in the peace-building field centers on the possibility that we can tap into data sets to understand, and preempt, the human sentiment that underlies violent conflict.
Realizing the full potential of these three trends means figuring out how to distinguish between the information, which abounds, and the insights, which are actionable. It is a distinction that is especially hard to make because it requires cross-discipline expertise that combines the wherewithal of data scientists with that of social scientists and the knowledge of technologists with the insights of conflict experts.
How Helsinki Became the Most Successful Open-Data City in the World
Olli Sulopuisto in Atlantic Cities: “If there’s something you’d like to know about Helsinki, someone in the city administration most likely has the answer. For more than a century, this city has funded its own statistics bureaus to keep data on the population, businesses, building permits, and most other things you can think of. Today, that information is stored and made freely available on the internet by an appropriately named agency, City of Helsinki Urban Facts.
There’s a potential problem, though. Helsinki may be Finland’s capital and largest city, with 620,000 people. But it’s only one of more than a dozen municipalities in a metropolitan area of almost 1.5 million. So in terms of urban data, if you’re only looking at Helsinki, you’re missing out on more than half of the picture.
Helsinki and three of its neighboring cities are now banding together to solve that problem. Through an entity called Helsinki Region Infoshare, they are bringing together their data so that a fuller picture of the metro area can come into view.
That’s not all. At the same time these datasets are going regional, they’re also going “open.” Helsinki Region Infoshare publishes all of its data in formats that make it easy for software developers, researchers, journalists and others to analyze, combine or turn into web-based or mobile applications that citizens may find useful. In four years of operation, the project has produced more than 1,000 “machine-readable” data sources such as a map of traffic noise levels, real-time locations of snow plows, and a database of corporate taxes.
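A minimal sketch of what consuming one of these machine-readable datasets might look like for a developer, assuming a plain CSV download; the URL is a placeholder, not an actual Helsinki Region Infoshare endpoint.

```python
# Sketch only: load a published open dataset and take a first look at it,
# as a developer might before building an app on top of it.
import pandas as pd

DATASET_URL = "https://example.org/hri/snow_plows.csv"  # hypothetical dataset location

def summarize(url: str = DATASET_URL) -> pd.DataFrame:
    """Load a CSV dataset and print its size and columns."""
    df = pd.read_csv(url)
    print(f"{len(df)} rows, columns: {list(df.columns)}")
    return df

if __name__ == "__main__":
    summarize()
```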
A global leader
All of this has put the Helsinki region at the forefront of the open-data movement that is sweeping cities across much of the world. The concept is that all kinds of good things can come from assembling city data, standardizing it and publishing it for free. Last month, Helsinki Region Infoshare was presented with the European Commission’s prize for innovation in public administration.
The project is creating transparency in government and a new digital commons. It’s also fueling a small industry of third-party application developers who take all this data and turn it into consumer products.
For example, Helsinki’s city council has a paperless system called Ahjo for handling its agenda items, minutes and exhibits that accompany council debates. Recently, the datasets underlying Ahjo were opened up. The city built a web-based interface for browsing the documents, but a software developer who doesn’t even live in Helsinki created a smartphone app for it. Now anyone who wants to keep up with just about any decision Helsinki’s leaders have before them can do so easily.
Another example is a product called BlindSquare, a smartphone app that helps blind people navigate the city. An app developer took the Helsinki region’s data on public transport and services, and mashed it up with location data from the social networking app Foursquare as well as mapping tools and the GPS and artificial voice capabilities of new smartphones. The product now works in dozens of countries and languages and sells for about €17 ($24 U.S.).
…
Helsinki also runs competitions for developers who create apps with public-sector data. That’s nothing new — BlindSquare won the Apps4Finland and European OpenCities app challenges in 2012. But this year, they’re trying a new approach to the app challenge concept, funded by the European Commission’s prize money and Sitra.
It’s called Datademo. Instead of looking for polished but perhaps random apps to heap fame and prize money on, Datademo is trying to get developers to aim their creative energies toward general goals city leaders think are important. The current competition specifies that apps have to use open data from the Helsinki region or from Finland to make it easier for citizens to find information and participate in democracy. The competition also gives developers seed funding upfront.
Datademo received more than 40 applications in its first round. Of those, the eight best suggestions were given three months and €2,000 ($2,770 U.S.) to implement their ideas. The same process will be repeated twice, resulting in dozens of new app ideas that will get a total of €48,000 ($66,000 U.S.) in development subsidies. Keeping with the spirit of transparency, the voting and judging process is open to all who submit an idea for each round….”
The advent of crowdfunding innovations for development
SciDevNet: “FundaGeek, TechMoola and RocketHub have more in common than just their curious names. These are all the monikers of crowdfunding websites dedicated to raising money for science and technology projects. As the coffers that traditionally funded research and development have been squeezed in recent years, several such sites have sprouted up.
In 2013, general crowdfunding site Kickstarter saw a total of US$480 million pledged to its projects by three million backers. That’s up from US$320 million in 2012, US$99 million in 2011 and just US$28 million in 2010. Kickstarter expects the figures to climb further this year, and not just for popular projects such as films and books.
Science and technology projects — particularly those involving simple designs — are starting to make waves on these sites. And new sites, such as those bizarrely named ones, are now catering specifically for scientific projects, widening the choice of platforms on offer and raising crowdfunding’s profile among the global scientific community online.
All this means that crowdfunding is fast becoming one of the most significant innovations in funding the development of technology that can aid poor communities….
A good example of how crowdfunding can help the developing world is the GravityLight, a product launched on Indiegogo over a year ago that uses gravity to create light. Not only did UK design company Therefore massively exceed its initial funding target — ultimately raising US$400,000 instead of a planned US$55,000 — it amassed a global network of investors and distributors that has allowed the light to be trialled in 26 countries as of last December.
The light was developed in-house after Therefore was given a brief to produce a cheap solar-powered lamp by private clients. Although this project faltered, the team independently set out to produce a lamp to replace the ubiquitous and dangerous kerosene lamps widely used in remote areas in Africa. After several months of development, Therefore had designed a product that is powered by a rope with a heavy weight on its end being slowly drawn through the light’s gears (see video)…
Crowdfunding is not always related to a specific product. Earlier this year, Indiegogo hosted a project hoping to build a clean energy store in a Ugandan village. The idea is to create an ongoing supply chain for technologies such as cleaner-burning stoves, water filters and solar lights that will improve or save lives, according to ENVenture, the project’s creators. [1] The US$2,000 target was comfortably exceeded…”
#Bring back our girls
After Nigerian protestors marched on parliament in the capital, Abuja, on April 30 to call for action, people in cities around the world have followed suit and organised their own marches.
A social media campaign under the hashtag #BringBackOurGirls started trending in Nigeria two weeks ago and has now been tweeted more than one million times. It was first used on April 23 at the opening ceremony for a UNESCO event honouring the Nigerian city of Port Harcourt as the 2014 World Book Capital City. A Nigerian lawyer in Abuja, Ibrahim M. Abdullahi, tweeted the call made by Dr. Oby Ezekwesili, Vice President of the World Bank for Africa, in her speech: “Bring Back the Girls!”
Another mass demonstration took place outside the Nigerian Defence Headquarters in Abuja on May 6, and many other protests have been organised in response to a social media campaign asking people around the world to march and wear red in solidarity. People came out in protest at the Nigerian embassy in London, as well as in Los Angeles and New York.
A global “social media march” has also been organised asking supporters to use their networks to promote the campaign for 200 minutes on May 8.
A petition started on Change.org by a Nigerian woman in solidarity with the schoolgirls has now been signed by more than 300,000 supporters.
Amnesty International and UNICEF have backed the campaign, as have world leaders and celebrities, including Hillary Clinton, Malala Yousafzai and rappers Wyclef Jean and Chris Brown, whose mention of the campaign was retweeted more than 10,000 times.
After three weeks of silence, Nigerian President Goodluck Jonathan vowed in early May to find the schoolgirls, stating: “wherever these girls are, we’ll get them out”. On the same day, US Secretary of State John Kerry pledged assistance from the US.”
EU: Have your say on Future and Emerging Technologies!
European Commission: “Do you have a great idea for a new technology that is not possible yet? Do you think it can become realistic by putting Europe’s best minds on the task? Share your view and the European Commission – via the Future and Emerging Technologies (FET) programme (@fet_eu, #FET_eu) – can make it happen. The consultation is open till 15 June 2014.
The aim of the public consultation launched today is to identify promising and potentially game-changing directions for future research in any technological domain.
Neelie Kroes (@NeelieKroesEU), Vice-President of the European Commission responsible for the Digital Agenda, said: “From protecting the environment to curing disease – the choices and investments we make today will make a difference to the jobs and lives we enjoy tomorrow. Researchers and entrepreneurs, innovators, creators or interested bystanders – whoever you are, I hope you will take this opportunity to take part in determining Europe’s future”.
The consultation is organised as a series of discussions, in which contributors can suggest ideas for a new FET Proactive initiative or discuss the 9 research topics identified in the previous consultation to determine whether they are still relevant today.
The ideas collected via the public consultation will contribute to future FET work programmes, notably the next one (2016-17). This participative process has already been used to draft the current work programme (2014-15).
Background
€2.7 billion will be invested in Future and Emerging Technologies (FET) under the new research programme Horizon 2020 (#H2020, 2014-2020). This represents a nearly threefold increase in budget compared to the previous research programme, FP7. FET actions are part of the Excellent Science pillar of Horizon 2020.
The objective of FET is to foster radical new technologies by exploring novel and high-risk ideas building on scientific foundations. By providing flexible support to goal-oriented and interdisciplinary collaborative research, and by adopting innovative research practices, FET research seizes the opportunities that will deliver long-term benefit for our society and economy.
FET Proactive initiatives aim to mobilise interdisciplinary communities around promising long-term technological visions. They build up the necessary base of knowledge and know-how for kick-starting a future technology line that will benefit Europe’s future industries and citizens in the decades to come. FET Proactive initiatives complement the FET Open scheme, which funds small-scale projects on future technology, and FET Flagships, which are large-scale initiatives to tackle ambitious interdisciplinary science and technology goals.
FET previously launched an online consultation (2012-13) to identify research topics for the current work programme. Around 160 ideas were submitted. The European Commission did an exhaustive analysis and produced an informal clustering of these ideas into broad topics. Nine topics were identified as candidates for a FET Proactive initiative. Three are included in the current programme, namely Global Systems Science; Knowing, Doing, Being; and Quantum Simulation.”
Open Government Data Gains Global Momentum
Wyatt Kash in Information Week: “Governments across the globe are deepening their strategic commitments and working more closely to make government data openly available for public use, according to public and private sector leaders who met this week at the inaugural Open Government Data Forum in Abu Dhabi, hosted by the United Nations and the United Arab Emirates, April 28-29.
Data experts from Europe, the Middle East, the US, Canada, Korea, and the World Bank highlighted how one country after another has set into motion initiatives to expand the release of government data and broaden its use. Those efforts are gaining traction due to multinational organizations, such as the Open Government Partnership, the Open Data Institute, The World Bank, and the UN’s e-government division, that are trying to share practices and standardize open data tools.
In the latest example, the French government announced April 24 that it is joining the Open Government Partnership, a group of 64 countries working jointly to make their governments more open, accountable, and responsive to citizens. The announcement caps a string of policy shifts, which began with the formal release of France’s Open Data Strategy in May 2011 and which parallel similar moves by the US.
The strategy committed France to providing “free access and reuse of public data… using machine-readable formats and open standards,” said Romain Lacombe, head of innovation for the French prime minister’s open government task force, Etalab. The French government is taking steps to end the practice of selling datasets, such as civil and case-law data, and is making them freely reusable. France launched a public data portal, Data.gouv.fr, in December 2011 and joined a G8 initiative to engage with open data innovators worldwide.
For South Korea, open data is not just about achieving greater transparency and efficiency, but is seen as digital fuel for a nation that by 2020 expects to achieve “ambient intelligence… when all humans and things are connected together,” said Dr. YoungSun Lee, who heads South Korea’s National Information Society Agency.
He foresees open data leading to a shift in the ways government will function: from an era of e-government, where information is delivered to citizens, to one where predictive analysis will foster a “creative government,” in which “government provides customized services for each individual.”
The open data movement is also propelling innovative programs in the United Arab Emirates. “The role of open data in directing economic and social decisions pertaining to investments… is of paramount importance” to the UAE, said Dr. Ali M. Al Khouri, director general of the Emirates Identity Authority. It also plays a key role in building public trust and fighting corruption, he said….”
Saving Big Data from Big Mouths
Cesar A. Hidalgo in Scientific American: “It has become fashionable to bad-mouth big data. In recent weeks the New York Times, Financial Times, Wired and other outlets have all run pieces bashing this new technological movement. To be fair, many of the critiques have a point: There has been a lot of hype about big data and it is important not to inflate our expectations about what it can do.
But little of this hype has come from the actual people working with large data sets. Instead, it has come from people who see “big data” as a buzzword and a marketing opportunity—consultants, event organizers and opportunistic academics looking for their 15 minutes of fame.
Most of the recent criticism, however, has been weak and misguided. Naysayers have been attacking straw men, focusing on worst practices, post hoc failures and secondary sources. The common theme has been to a great extent obvious: “Correlation does not imply causation,” and “data has biases.”
Critics of big data have been making three important mistakes:
First, they have misunderstood big data, framing it narrowly as a failed revolution in social science hypothesis testing. In doing so they ignore areas where big data has made substantial progress, such as data-rich Web sites, information visualization and machine learning. If there is one group of big-data practitioners that the critics should worship, they are the big-data engineers building the social media sites where their platitudes spread. Engineering a site rich in data, like Facebook, YouTube, Vimeo or Twitter, is extremely challenging. These sites are possible because of advances made quietly over the past five years, including improvements in database technologies and Web development frameworks.
Big data has also contributed to machine learning and computer vision. Thanks to big data, Facebook algorithms can now match faces almost as accurately as humans do.
And detractors have overlooked big data’s role in the proliferation of computational design, data journalism and new forms of artistic expression. Computational artists, journalists and designers—the kinds of people who congregate at meetings like Eyeo—are using huge sets of data to give us online experiences that are unlike anything we experienced in paper. If we step away from hypothesis testing, we find that big data has made big contributions.
The second mistake critics often make is to confuse the limitations of prototypes with fatal flaws. This is something I have experienced often. For example, in Place Pulse—a project I created with my team at the M.I.T. Media Lab—we used Google Street View images and crowdsourced visual surveys to map people’s perception of a city’s safety and wealth. The original method was rife with limitations that we dutifully acknowledged in our paper. Google Street View images are taken at arbitrary times of the day and show cities from the perspective of a car. City boundaries were also arbitrary. To overcome these limitations, however, we needed a first data set. Producing that first limited version of Place Pulse was a necessary part of the process of making a working prototype.
A year has passed since we published Place Pulse’s first data set. Now, thanks to our focus on “making,” we have computer vision and machine-learning algorithms that we can use to correct for some of these easy-to-spot distortions. Making is allowing us to correct for time of the day and dynamically define urban boundaries. Also, we are collecting new data to extend the method to new geographical boundaries.
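To make the crowdsourced-survey idea concrete, here is a minimal sketch that turns pairwise “which image looks safer?” votes into per-image scores using simple win rates. It illustrates the general technique, not the scoring model the Place Pulse team actually used.

```python
# Sketch: score images from pairwise votes as wins / total appearances.
# The vote data below is invented for illustration.
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def win_rate_scores(votes: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Each vote is (winner_image_id, loser_image_id); return win rate per image."""
    wins, seen = defaultdict(int), defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        seen[winner] += 1
        seen[loser] += 1
    return {img: wins[img] / seen[img] for img in seen}

votes = [("img_a", "img_b"), ("img_a", "img_c"), ("img_c", "img_b"), ("img_b", "img_a")]
for img, score in sorted(win_rate_scores(votes).items(), key=lambda kv: -kv[1]):
    print(f"{img}: perceived-safety score {score:.2f}")
```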
Those who fail to understand that the process of making is iterative are in danger of being too quick to condemn promising technologies. In 1920 the New York Times published a prediction that a rocket would never be able to leave the atmosphere. Similarly erroneous predictions were made about the car or, more recently, about the iPhone’s market share. In 1969 the Times had to publish a retraction of its 1920 claim. What similar retractions will need to be published in the year 2069?
Finally, the doubters have relied too heavily on secondary sources. For instance, they made a piñata out of the 2008 Wired piece by Chris Anderson framing big data as “the end of theory.” Others have criticized projects for claims that their creators never made. A couple of weeks ago, for example, Gary Marcus and Ernest Davis published a piece on big data in the Times. There they wrote about another of my group’s projects, Pantheon, which is an effort to collect, visualize and analyze data on historical cultural production. Marcus and Davis wrote that Pantheon “suggests a misleading degree of scientific precision.” As an author of the project, I have been unable to find where I made such a claim. Pantheon’s method section clearly states that: “Pantheon will always be—by construction—an incomplete resource.” That same section contains a long list of limitations and caveats as well as the statement that “we interpret this data set narrowly, as the view of global cultural production that emerges from the multilingual expression of historical figures in Wikipedia as of May 2013.”
Bickering is easy, but it is not of much help. So I invite the critics of big data to lead by example. Stop writing op-eds and start developing tools that improve on the state of the art. They are much appreciated. What we need are projects that are worth imitating and that we can build on, not obvious advice such as “correlation does not imply causation.” After all, true progress is not something that is written, but made.”
Looking for the Needle in a Stack of Needles: Tracking Shadow Economic Activities in the Age of Big Data
Manju Bansal in MIT Technology Review: “The undocumented guys hanging out in the home-improvement-store parking lot looking for day labor, the neighborhood kids running a lemonade stand, and Al Qaeda terrorists plotting to do harm all have one thing in common: They operate in the underground economy, a shadowy zone where businesses, both legitimate and less so, transact in the currency of opportunity, away from traditional institutions and their watchful eyes.
One might think that this alternative economy is limited to markets that rank low on the Transparency International rankings (sub-Saharan Africa and South Asia, for instance). However, a recent University of Wisconsin report estimates the value of the underground economy in the United States at about $2 trillion, about 15% of the total U.S. GDP. And a 2013 study coauthored by Friedrich Schneider, a noted authority on global shadow economies, estimated the European Union’s underground economy at more than 18% of GDP, or a whopping 2.1 trillion euros. More than two-thirds of the underground activity came from the most developed countries, including Germany, France, Italy, Spain, and the United Kingdom.
Underground economic activity is a multifaceted phenomenon, with implications across the board for national security, tax collections, public-sector services, and more. It includes the activity of any business that relies primarily on old-fashioned cash for most transactions — ranging from legitimate businesses (including lemonade stands) to drug cartels and organized crime.
Though it’s often soiled, heavy to lug around, and easy to lose to theft, cash is still king simply because it is so easy to hide from the authorities. With the help of the right bank or financial institution, “dirty” money can easily be laundered and come out looking fresh and clean, or at least legitimate. A case in point is the global bank HSBC, which agreed to pay U.S. regulators $1.9 billion in fines to settle charges of money laundering on behalf of Mexican drug cartels. According to a U.S. Senate subcommittee report, that process involved transferring $7 billion in cash from the bank’s branches in Mexico to those in the United States. Just for reference, each $100 bill weighs one gram, so to transfer $7 billion, HSBC had to physically transport 70 metric tons of cash across the U.S.-Mexican border.
The Financial Action Task Force, an intergovernmental body established in 1989, has estimated the total amount of money laundered worldwide to be around 2% to 5% of global GDP. Many of these transactions seem, at first glance, to be perfectly legitimate. Therein lies the conundrum for a banker or a government official: How do you identify, track, control, and, one hopes, prosecute money launderers, when they are hiding in plain sight and their business is couched in networked layers of perfectly defensible legitimacy?
Enter big-data tools, such as those provided by SynerScope, a Holland-based startup that is a member of the SAP Startup Focus program. This company’s solutions help unravel the complex networks hidden behind the layers of transactions and interactions.
Networks, good or bad, are near omnipresent in almost any form of organized human activity, and particularly in banking and insurance. SynerScope takes data from both structured and unstructured data fields and transforms these into interactive computer visuals that display graphic patterns that humans can use to quickly make sense of information. Spotting deviations in complex networked processes can easily be put to use in fraud detection for insurance, banking, e-commerce, and forensic accounting.
SynerScope’s approach to big-data business intelligence is centered on data-intense compute and visualization that extend the human “sense-making” capacity in much the same way that a telescope or microscope extends human vision.
To understand how SynerScope helps authorities track and halt money laundering, it’s important to understand how the networked laundering process works. It typically involves three stages.
1. In the initial, or placement, stage, launderers introduce their illegal profits into the financial system. This might be done by breaking up large amounts of cash into less-conspicuous smaller sums that are then deposited directly into a bank account, or by purchasing a series of monetary instruments (checks, money orders) that are then collected and deposited into accounts at other locations.
2. After the funds have entered the financial system, the launderer commences the second stage, called layering, which uses a series of conversions or transfers to distance the funds from their sources. The funds might be channeled through the purchase and sales of investment instruments, or the launderer might simply wire the funds through a series of accounts at various banks worldwide.
Such use of widely scattered accounts for laundering is especially prevalent in those jurisdictions that do not cooperate in anti-money-laundering investigations. Sometimes the launderer disguises the transfers as payments for goods or services.
3. Having successfully processed the criminal profits through the first two phases, the launderer then proceeds to the third stage, integration, in which the funds re-enter the legitimate economy. The launderer might invest the funds in real estate, luxury assets, or business ventures.
Current detection tools compare individual transactions against preset profiles and rules. Sophisticated criminals quickly learn how to make their illicit transactions look normal for such systems. As a result, rules and profiles need constant and costly updating.
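A toy version of such a per-transaction rule check might look like the following; the reporting threshold and transaction fields are invented for illustration, and real systems maintain many such profiles.

```python
# Toy rule check: flag cash deposits just under a fixed reporting threshold,
# the classic "structuring" pattern that simple rule engines look for.
THRESHOLD = 10_000  # hypothetical cash-reporting threshold

def flag_structuring(transactions, margin=0.1):
    """Return deposits within `margin` (10%) below the threshold."""
    low = THRESHOLD * (1 - margin)
    return [t for t in transactions
            if t["type"] == "cash_deposit" and low <= t["amount"] < THRESHOLD]

txns = [
    {"id": 1, "type": "cash_deposit", "amount": 9_500},
    {"id": 2, "type": "wire", "amount": 9_800},
    {"id": 3, "type": "cash_deposit", "amount": 4_000},
]
print(flag_structuring(txns))  # flags transaction 1 only
```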
But SynerScope’s flexible visual analysis uses a network angle to detect money laundering. It shows the structure of the entire network with data coming in from millions of transactions, a structure that launderers cannot control. With just a few mouse clicks, SynerScope’s relation and sequence views reveal structural interrelationships and interdependencies. When those patterns are mapped on a time scale, it becomes virtually impossible to hide abnormal flows.
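By way of illustration only, and not as SynerScope’s proprietary method, the sketch below builds a directed graph of account-to-account transfers and surfaces two simple network-level red flags: pass-through accounts that forward nearly everything they receive, and cycles that return funds to their origin. The transfer data is synthetic.

```python
# Sketch of a network-angle view of transfers using networkx (synthetic data).
import networkx as nx

transfers = [  # (from_account, to_account, amount) -- synthetic data
    ("A", "B", 9500), ("B", "C", 9400), ("C", "D", 9300),
    ("D", "A", 9200), ("E", "B", 120),
]

G = nx.DiGraph()
for src, dst, amount in transfers:
    prev = G.edges[src, dst]["amount"] if G.has_edge(src, dst) else 0
    G.add_edge(src, dst, amount=prev + amount)

# Pass-through score: accounts that forward nearly everything they receive.
for node in G.nodes:
    inflow = sum(d["amount"] for _, _, d in G.in_edges(node, data=True))
    outflow = sum(d["amount"] for _, _, d in G.out_edges(node, data=True))
    if inflow and outflow and min(inflow, outflow) / max(inflow, outflow) > 0.9:
        print(f"{node}: possible layering hub (in={inflow}, out={outflow})")

# Cycles of transfers that return funds to their origin are another red flag.
print("cycles:", list(nx.simple_cycles(G)))
```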
