What Is Big Data?


datascience@berkeley Blog: ““Big Data.” It seems like the phrase is everywhere. The term was added to the Oxford English Dictionary in 2013, appeared in Merriam-Webster’s Collegiate Dictionary by 2014, and Gartner’s just-released 2014 Hype Cycle shows “Big Data” passing the “Peak of Inflated Expectations” and on its way down into the “Trough of Disillusionment.” Big Data is all the rage. But what does it actually mean?
A commonly repeated definition cites the three Vs: volume, velocity, and variety. But others argue that it’s not the size of data that counts, but the tools being used, or the insights that can be drawn from a dataset.
To settle the question once and for all, we asked 40+ thought leaders in publishing, fashion, food, automobiles, medicine, marketing and every industry in between how exactly they would define the phrase “Big Data.” Their answers might surprise you! Take a look below to find out what big data is:

  1. John Akred, Founder and CTO, Silicon Valley Data Science
  2. Philip Ashlock, Chief Architect of Data.gov
  3. Jon Bruner, Editor-at-Large, O’Reilly Media
  4. Reid Bryant, Data Scientist, Brooks Bell
  5. Mike Cavaretta, Data Scientist and Manager, Ford Motor Company
  6. Drew Conway, Head of Data, Project Florida
  7. Rohan Deuskar, CEO and Co-Founder, Stylitics
  8. Amy Escobar, Data Scientist, 2U
  9. Josh Ferguson, Chief Technology Officer, Mode Analytics
  10. John Foreman, Chief Data Scientist, MailChimp

FULL LIST at datascience@berkeley Blog”

“From Bitcoin to Burning Man and Beyond”


IDCubed: “From Bitcoin to Burning Man and Beyond: The Quest for Autonomy and Identity in a Digital Society explores a new generation of digital technologies that are re-imagining the very foundations of identity, governance, trust and social organization.
The fifteen essays of this book stake out the foundations of a new future – a future of open Web standards and data commons, a society of decentralized autonomous organizations, a world of trustworthy digital currencies and self-organized and expressive communities like Burning Man.
Among the contributors are Alex “Sandy” Pentland of the M.I.T. Human Dynamics Laboratory, former FCC Chairman Reed E. Hundt, long-time IBM strategist Irving Wladawsky-Berger, monetary system expert Bernard Lietaer, Silicon Valley entrepreneur Peter Hirshberg, journalist Jonathan Ledgard and H-Farm cofounder Maurizio Rossi.
From Bitcoin to Burning Man and Beyond was edited by Dr. John H. Clippinger, cofounder and executive director of ID3, and David Bollier, an Editor at ID3 who is also an author, blogger and scholar who studies the commons. The book, published by ID3 in association with Off the Common Books, reflects ID3’s vision of the huge, untapped potential for self-organized, distributed governance on open platforms.
The book is available in print and ebook formats (Kindle and epub) from Amazon.com and Off the Common Books. The book, licensed under a Creative Commons Attribution-NonCommercial-ShareAlike license (BY-NC-SA), may also be downloaded for free as a pdf file from ID3.
One chapter that inspires the book’s title traces the 28-year history of Burning Man, the week-long encampment in the Nevada desert that has hosted remarkable experimentation in new forms of self-governance by large communities. Other chapters explore such cutting-edge concepts as

  • evolvable digital contracts that could supplant conventional legal agreements;
  • smartphone currencies that could help Africans meet their economic needs more effectively;
  • the growth of the commodity-backed Ven currency; and
  • new types of “solar currencies” that borrow techniques from Bitcoin to enable more efficient, cost-effective solar generation and sharing by homeowners.

From Bitcoin to Burning Man and Beyond also introduces the path-breaking software platform that ID3 has developed called “Open Mustard Seed,” or OMS. The just-released open source program enables the rise of new types of trusted, self-healing digital institutions on open networks, which in turn will make possible new sorts of privacy-friendly social ecosystems.
“OMS is an integrated, open source package of programs that lets people collect and share personal information in secure, transparent and accountable ways, enabling authentic, trusted social and economic relationships to flourish,” said Dr. John H. Clippinger, executive director of ID3, an acronym for the Institute for Institutional Innovation and Data-Driven Design.
“The software builds individual privacy, security and trusted exchange into the very design of the system. In effect, OMS represents a new authentication, privacy and sharing layer for the Internet,” said Clippinger “– a new way to share personal information selectively and securely, without access by unauthorized third parties.”
A two-minute video introducing the capabilities of OMS can be viewed here.”

Big Data and Chicago's Traffic-cam Scandal


Holman Jenkins in the Wall Street Journal: “The danger is microscopic regulation that we invite via the democratic process.
Big data techniques are new in the world. It will take time to know how to feel about them and whether and how they should be legally corralled. For sheer inanity, though, there’s no beating a recent White House report quivering about the alleged menace of “digital redlining,” or the use of big-data marketing tactics in ways that supposedly disadvantage minority groups.
This alarm rests on an extravagant misunderstanding. Redlining was a crude method banks used to avoid losses in bad neighborhoods even at the cost of missing some profitable transactions—exactly the inefficiency big data is meant to improve upon. Failing to lure an eligible customer into a sale, after all, is hardly the goal of any business.
The real danger of the new technologies lies elsewhere, which the White House slightly touches upon in some of its fretting about police surveillance. The danger is microscopic regulation of our daily activities that we will invite on ourselves through the democratic process.
Soon it may be impossible to leave our homes without our movements being tracked by traffic and security cameras able to read license plates, identify faces and pull up data about any individual, from social media postings to credit reports.
Private businesses are just starting to use these techniques to monitor shoppers in front of shelves of goodies. Towns and cities have already embraced such techniques as revenue grabs, encouraged by private contractors peddling automated traffic cameras.
Witness a festering Chicago scandal. This month came federal indictments of a former city bureaucrat, an outside consultant, and the former CEO of Redflex Traffic Systems, the company that operated the city’s traffic cameras until last year….”
 

In democracy and disaster, emerging world embraces 'open data'


Jeremy Wagstaff at Reuters: “‘Open data’ – the trove of data-sets made publicly available by governments, organizations and businesses – isn’t normally linked to high-wire politics, but just may have saved last month’s Indonesian presidential elections from chaos.
Data is considered open when it’s released for anyone to use and in a format that’s easy for computers to read. The uses are largely commercial, such as the GPS data from U.S.-owned satellites, but data can range from budget numbers and climate and health statistics to bus and rail timetables.
It’s a revolution that’s swept the developed world in recent years as governments and agencies like the World Bank have freed up hundreds of thousands of data-sets for use by anyone who sees a use for them. Data.gov, a U.S. site, lists more than 100,000 data-sets, from food calories to magnetic fields in space.
Consultants McKinsey reckon open data could add up to $3 trillion worth of economic activity a year – from performance ratings that help parents find the best schools to governments saving money by releasing budget data and asking citizens to come up with cost-cutting ideas. All the apps, services and equipment that tap the GPS satellites, for example, generate $96 billion of economic activity each year in the United States alone, according to a 2011 study.
But so far open data has had a limited impact in the developing world, where officials are wary of giving away too much information, and where there’s the issue of just how useful it might be: for most people in emerging countries, property prices and bus schedules aren’t top priorities.
But last month’s election in Indonesia – a contentious face-off between a disgraced general and a furniture-exporter turned reformist – highlighted how powerful open data can be in tandem with a handful of tech-smart programmers, social media savvy and crowdsourcing.
“Open data may well have saved this election,” said Paul Rowland, a Jakarta-based consultant on democracy and governance…”
 

Riding the Second Wave of Civic Innovation


Jeremy Goldberg at Governing: “Innovation and entrepreneurship in local government increasingly require mobilizing talent from many sectors and skill sets. Fortunately, the opportunities for nurturing cross-pollination between the public and private sectors have never been greater, thanks in large part to the growing role of organizations such as Bayes Impact, Code for America, Data Science for Social Good and Fuse Corps.
Indeed, there’s reason to believe that we might be entering an even more exciting period of public-private collaboration. As one local-government leader recently put it to me when talking about the critical mass of pro-bono civic-innovation efforts taking place across the San Francisco Bay area, “We’re now riding the second wave of civic pro-bono and civic innovation.”
As an alumnus of Fuse Corps’ executive fellows program, I’m convinced that the opportunities initiated by it and similar organizations are integral to civic innovation. Fuse Corps brings civic entrepreneurs with experience across the public, private and nonprofit sectors to work closely with government employees to help them negotiate project design, facilitation and management hurdles. The organization’s leadership training emphasizes “smallifying” (building innovation capacity by breaking big challenges down into smaller tasks in a shorter timeframe) and making “little bets” (low-risk actions aimed at developing and testing an idea).
Since 2012, I have managed programs and cross-sector networks for the Silicon Valley Talent Partnership. I’ve witnessed a groundswell of civic entrepreneurs from across the region stepping up to participate in discussions and launch rapid-prototyping labs focused on civic innovation.
Cities across the nation are creating new roles and programs to engage these civic start-ups. They’re learning that what makes these projects, and specifically civic pro-bono programs, work best is a process of designing, building, operationalizing and bringing them to scale. If you’re setting out to create such a program, here’s a short list of best practices:
Assets: Explore existing internal resources and knowledge to understand the history, departmental relationships and overall functions of the relevant agencies or departments. Develop a compendium of current service/volunteer programs.
City policies/legal framework: Determine what the city charter, city attorney’s office or employee-relations rules and policies say about procurement, collective bargaining and public-private partnerships.
Leadership: The support of the city’s top leadership is especially important during the formative stages of a civic-innovation program, so it’s important to understand how the city’s form of government will impact the program. For example, in a “strong mayor” government the ability to make definitive decisions on a public-private collaboration may be unlikely to face the same scrutiny as it might under a “council/mayor” government.
Cross-departmental collaboration: This is essential. Without the support of city staff across departments, innovation projects are unlikely to take off. Convening a “tiger team” of individuals who are early adopters of such initiatives is an important step. Ultimately, city staffers best understand the needs and demands of their departments or agencies.
Partners from corporations and philanthropy: Leveraging existing partnerships will help to bring together an advisory group of cross-sector leaders and executives to participate in the early stages of program development.
Business and member associations: For the Silicon Valley Talent Partnership, the Silicon Valley Leadership Group has been instrumental in advocating for pro-bono volunteerism with the cities of Fremont, San Jose and Santa Clara….”

Detroit and Big Data Take on Blight


Susan Crawford in Bloomberg View: “The urban blight that has been plaguing Detroit was, until very recently, made worse by a dearth of information about the problem. No one could tell how many buildings needed fixing or demolition, or how effectively city services were being delivered to them (or not). Today, thanks to the combined efforts of a scrappy small business, tech-savvy city leadership and substantial philanthropic support, the extent of the problem is clear.
The question now is whether Detroit has the heart to use the information to make hard choices about its future.
In the past, when the city foreclosed on properties for failure to pay back taxes, it had no sense of where those properties were clustered. The city would auction off the houses for the bargain-basement price of $500 each, but the auction was entirely undocumented, so neighbors were unaware of investment opportunities, big buyers were gaming the system, and, as often as not, arsonists would then burn the properties down. The result of this blind spot was lost population, lost revenue and even more blight.
Then along came Jerry Paffendorf, a San Francisco transplant, who saw what was needed. His company, Loveland Technologies, started mapping all the tax-foreclosed and auctioned properties. Impressed with Paffendorf’s zeal, the city’s Blight Task Force, established by President Barack Obama and funded by foundations and the state Housing Development Authority, hired his team to visit every property in the city. That led to MotorCityMapping.org, the first user-friendly collection of information about all the attributes of every property in Detroit — including photographs.
Paffendorf calls this map a “scan of the genome of the city.” It shows more than 84,000 blighted structures and vacant lots; in eight neighborhoods, crime, fires and other torments have led to the abandonment of more than a third of houses and businesses. To demolish all those houses, as recommended by the Blight Task Force, will cost almost $2 billion. Still more money will then be needed to repurpose the sites….”

Open Intellectual Property Casebook


New book by James Boyle & Jennifer Jenkins: “… This book, the first in a series of Duke Open Coursebooks, is available for free download under a Creative Commons license. It can also be purchased in a glossy paperback print edition for $29.99, $130 cheaper than other intellectual property casebooks.
This book is an introduction to intellectual property law, the set of private legal rights that allows individuals and corporations to control intangible creations and marks—from logos to novels to drug formulae—and the exceptions and limitations that define those rights. It focuses on the three main forms of US federal intellectual property—trademark, copyright and patent—but many of the ideas discussed here apply far beyond those legal areas and far beyond the law of the United States.
The book is intended to be a textbook for the basic Intellectual Property class, but because it is an open coursebook, which can be freely edited and customized, it is also suitable for an undergraduate class, or for a business, library studies, communications or other graduate school class. Each chapter contains cases and secondary readings and a set of problems or role-playing exercises involving the material. The problems range from a video of the Napster oral argument to counseling clients about search engines and trademarks, applying the First Amendment to digital rights management and copyright or commenting on the Supreme Court’s new rulings on gene patents.
Intellectual Property: Law & the Information Society is current as of August 2014. It includes discussions of such issues as the Redskins trademark cancelations, the Google Books case and the America Invents Act. Its illustrations range from graphs showing the growth in patent litigation to comic book images about copyright. The best way to get some sense of its coverage is to download it. In coming weeks, we will provide a separate fuller webpage with a table of contents and individual downloadable chapters.
The Center has also published an accompanying supplement of statutory and treaty materials that is available for free download and low cost print purchase.”

Google's fact-checking bots build vast knowledge bank


Hal Hodson in the New Scientist: “The search giant is automatically building Knowledge Vault, a massive database that could give us unprecedented access to the world’s facts

GOOGLE is building the largest store of knowledge in human history – and it’s doing so without any human help. Instead, Knowledge Vault autonomously gathers and merges information from across the web into a single base of facts about the world, and the people and objects in it.

The breadth and accuracy of this gathered knowledge is already becoming the foundation of systems that allow robots and smartphones to understand what people ask them. It promises to let Google answer questions like an oracle rather than a search engine, and even to turn a new lens on human history.

Knowledge Vault is a type of “knowledge base” – a system that stores information so that machines as well as people can read it. Where a database deals with numbers, a knowledge base deals with facts. When you type “Where was Madonna born” into Google, for example, the place given is pulled from Google’s existing knowledge base.

This existing base, called Knowledge Graph, relies on crowdsourcing to expand its information. But the firm noticed that growth was stalling; humans could only take it so far. So Google decided it needed to automate the process. It started building the Vault by using an algorithm to automatically pull in information from all over the web, using machine learning to turn the raw data into usable pieces of knowledge.

Knowledge Vault has pulled in 1.6 billion facts to date. Of these, 271 million are rated as “confident facts”, to which Google’s model ascribes a more than 90 per cent chance of being true. It does this by cross-referencing new facts with what it already knows.

“It’s a hugely impressive thing that they are pulling off,” says Fabian Suchanek, a data scientist at Télécom ParisTech in France.

Google’s Knowledge Graph is currently bigger than the Knowledge Vault, but it only includes manually integrated sources such as the CIA Factbook.

Knowledge Vault offers Google fast, automatic expansion of its knowledge – and it’s only going to get bigger. As well as the ability to analyse text on a webpage for facts to feed its knowledge base, Google can also peer under the surface of the web, hunting for hidden sources of data such as the figures that feed Amazon product pages, for example.

Tom Austin, a technology analyst at Gartner in Boston, says that the world’s biggest technology companies are racing to build similar vaults. “Google, Microsoft, Facebook, Amazon and IBM are all building them, and they’re tackling these enormous problems that we would never even have thought of trying 10 years ago,” he says.

The potential of a machine system that has the whole of human knowledge at its fingertips is huge. One of the first applications will be virtual personal assistants that go way beyond what Siri and Google Now are capable of, says Austin…”

Our future government will work more like Amazon


Michael Case in The Verge: “There is a lot of government in the United States. Several hundred federal agencies, 535 voting members in two houses of Congress, more than 90,000 state and local governments, and over 20 million Americans involved in public service.

We say we have a government for and by the people. But the way American government conducts its day-to-day business does not feel like anything we, the people weaned on the internet, would design in 2014. Most interactions with the US government don’t resemble anything else we’re used to in our daily lives….

But if the government is ever going to completely retool itself to provide sensible services to a growing, aging, diversifying American population, it will have to do more than bring in a couple innovators and throw data at the public. At the federal level, these kinds of adjustments will require new laws to change the way money is allocated to executive branch agencies so they can coordinate the purchase and development of a standard set of tools. State and local governments will have to agree on standard tools and data formats as well so that the mayor of Anchorage can collaborate with the governor of Delaware.

Technology is the answer to a lot of American government’s current operational shortcomings. Not only are the tools and systems most public servants use outdated and suboptimal, but the organizations and processes themselves have also calcified around similarly out-of-date thinking. So the real challenge won’t be designing cutting edge software or high tech government facilities — it’s going to be conjuring the will to overcome decades of old thinking. It’s going to be convincing over 90,000 employees to learn new skills, coaxing a bitterly divided Congress to collaborate on something scary, and finding a way to convince a timid and distracted White House to put its name on risky investments that won’t show benefits for many years.

But! If we can figure out a way for governments across the country to perform their basic functions and provide often life-saving services, maybe we can move on to chase even more elusive government tech unicorns. Imagine voting from your smartphone, having your taxes calculated and filed automatically with a few online confirmations, or filing for your retirement at a friendly tablet kiosk at your local government outpost. Government could — feasibly — be not only more effective, but also a pleasure to interact with someday. Someday.”

America in Decay


Francis Fukuyama in Foreign Affairs: “… Institutions are “stable, valued, recurring patterns of behaviour”, as Huntington put it, the most important function of which is to facilitate collective action. Without some set of clear and relatively stable rules, human beings would have to renegotiate their interactions at every turn. Such rules are often culturally determined and vary across different societies and eras, but the capacity to create and adhere to them is genetically hard-wired into the human brain. A natural tendency to conformism helps give institutions inertia and is what has allowed human societies to achieve levels of social cooperation unmatched by any other animal species.
The very stability of institutions, however, is also the source of political decay. Institutions are created to meet the demands of specific circumstances, but then circumstances change and institutions fail to adapt. One reason is cognitive: people develop mental models of how the world works and tend to stick to them, even in the face of contradictory evidence. Another reason is group interest: institutions create favored classes of insiders who develop a stake in the status quo and resist pressures to reform.
In theory, democracy, and particularly the Madisonian version of democracy that was enshrined in the US Constitution, should mitigate the problem of such insider capture by preventing the emergence of a dominant faction or elite that can use its political power to tyrannize over the country. It does so by spreading power among a series of competing branches of government and allowing for competition among different interests across a large and diverse country.
But Madisonian democracy frequently fails to perform as advertised. Elite insiders typically have superior access to power and information, which they use to protect their interests. Ordinary voters will not get angry at a corrupt politician if they don’t know that money is being stolen in the first place. Cognitive rigidities or beliefs may also prevent social groups from mobilizing in their own interests. For example, in the United States, many working-class voters support candidates promising to lower taxes on the wealthy, despite the fact that such tax cuts will arguably deprive them of important government services.
Furthermore, different groups have different abilities to organize to defend their interests. Sugar producers and corn growers are geographically concentrated and focused on the prices of their products, unlike ordinary consumers or taxpayers, who are dispersed and for whom the prices of these commodities are only a small part of their budgets. Given institutional rules that often favor special interests (such as the fact that Florida and Iowa, where sugar and corn are grown, are electoral swing states), those groups develop an outsized influence over agricultural and trade policy. Similarly, middle-class groups are usually much more willing and able to defend their interests, such as the preservation of the home mortgage tax deduction, than are the poor. This makes such universal entitlements as Social Security or health insurance much easier to defend politically than programs targeting the poor only.
Finally, liberal democracy is almost universally associated with market economies, which tend to produce winners and losers and amplify what James Madison termed the “different and unequal faculties of acquiring property.” This type of economic inequality is not in itself a bad thing, insofar as it stimulates innovation and growth and occurs under conditions of equal access to the economic system. It becomes highly problematic, however, when the economic winners seek to convert their wealth into unequal political influence. They can do so by bribing a legislator or a bureaucrat, that is, on a transactional basis, or, what is more damaging, by changing the institutional rules to favor themselves — for example, by closing off competition in markets they already dominate, tilting the playing field ever more steeply in their favor.
Political decay thus occurs when institutions fail to adapt to changing external circumstances, either out of intellectual rigidities or because of the power of incumbent elites to protect their positions and block change. Decay can afflict any type of political system, authoritarian or democratic. And while democratic political systems theoretically have self-correcting mechanisms that allow them to reform, they also open themselves up to decay by legitimating the activities of powerful interest groups that can block needed change.
This is precisely what has been happening in the United States in recent decades, as many of its political institutions have become increasingly dysfunctional. A combination of intellectual rigidity and the power of entrenched political actors is preventing those institutions from being reformed. And there is no guarantee that the situation will change much without a major shock to the political order….”