Data Discrimination Means the Poor May Experience a Different Internet


MIT Technology Review: “Data analytics are being used to implement a subtle form of discrimination, while anonymous data sets can be mined to reveal health data and other private information, a Microsoft researcher warned this morning at MIT Technology Review’s EmTech conference.
Kate Crawford, principal researcher at Microsoft Research, argued that these problems could be addressed with new legal approaches to the use of personal data.
In a new paper, she and a colleague propose a system of “due process” that would give people more legal rights to understand how data analytics are used in determinations made against them, such as denial of health insurance or a job. “It’s the very start of a conversation about how to do this better,” Crawford, who is also a visiting professor at the MIT Center for Civic Media, said in an interview before the event. “People think ‘big data’ avoids the problem of discrimination, because you are dealing with big data sets, but in fact big data is being used for more and more precise forms of discrimination—a form of data redlining.”
During her talk this morning, Crawford added that with big data, “you will never know what those discriminations are, and I think that’s where the concern begins.”

The Best American Infographics 2013


New book by Gareth Cook: “The rise of infographics across virtually all print and electronic media—from a striking breakdown of classic cocktails to a graphic tracking 200 influential moments that changed the world to visually arresting depictions of Twitter traffic—reveals patterns in our lives and our world in fresh and surprising ways. In the era of big data, where information moves faster than ever, infographics provide us with quick, often influential bursts of art and knowledge—on the environment, politics, social issues, health, sports, arts and culture, and more—to digest, to tweet, to share, to go viral.
The Best American Infographics captures the finest examples from the past year, including the ten best interactive infographics, of this mesmerizing new way of seeing and understanding our world.”
See also a selection of some of them in Wired.
 

If big data is an atomic bomb, disarmament begins in Silicon Valley


At GigaOM: “Big data is like atomic energy, according to scientist Albert-László Barabási in a Monday column on Politico. It’s very beneficial when used ethically, and downright destructive when turned into a weapon. He argues scientists can help resolve the damage done by government spying by embracing the principles of nuclear nonproliferation that helped bring an end to Cold War fears and distrust.
Barabási’s analogy is rather poetic:

“Powered by the right type of Big Data, data mining is a weapon. It can be just as harmful, with long-term toxicity, as an atomic bomb. It poisons trust, straining everything from human relations to political alliances and free trade. It may target combatants, but it cannot succeed without sifting through billions of data points scraped from innocent civilians. And when it is a weapon, it should be treated like a weapon.”

I think he’s right, but I think the fight to disarm the big data bomb begins in places like Silicon Valley and Madison Avenue. And it’s not just scientists; all citizens should have a role…
I write about big data and data mining for a living, and I think the underlying technologies and techniques are incredibly valuable, even if the applications aren’t always ideal. On the one hand, advances in machine learning from companies such as Google and Microsoft are fantastic. On the other hand, Facebook’s newly expanded Graph Search makes Europe’s proposed right-to-be-forgotten laws seem a lot more sensible.
But it’s all within the bounds of our user agreements and beauty is in the eye of the beholder.
Perhaps the reason we don’t vote with our feet by moving to web platforms that embrace privacy, even though we suspect it’s being violated, is that we really don’t know what privacy means. Instead of regulating what companies can and can’t do, perhaps lawmakers can mandate a degree of transparency that actually lets users understand how data is being used, not just what data is being collected. Great, some company knows my age, race, ZIP code and web history: What I really need to know is how it’s using that information to target, discriminate against or otherwise serve me.
An intelligent national discussion about the role of the NSA is probably in order. For all anyone knows, it could even turn out we’re willing to put up with more snooping than the government might expect. But until we get a handle on privacy from the companies we choose to do business with, I don’t think most Americans have the stomach for such a difficult fight.”

Smart Cities Turn Big Data Into Insight [Infographic]


Mark van Rijmenam in SmartDataCollective: “Cities around the globe are confronted with growing populations, aging infrastructure, reduced budgets, and the challenge of doing more with less. Applying big data technologies within cities can provide valuable insights that can keep a city habitable. The City of Songdo is a great example of a connected city, where all connected devices create a smart city that is optimized for the ever-changing conditions in that same city. IBM recently released an infographic showing the vast opportunities of smart cities and the possible effects on the economy.”
[Infographic: Smarter Cities: Turning Big Data into Insight]

AskThem


AskThem is a project of the Participatory Politics Foundation, a 501(c)(3) non-profit organization with a mission to increase civic engagement. AskThem is supported by a charitable grant from the Knight Foundation’s Tech For Engagement initiative.
AskThem is a free & open-source website for questions-and-answers with public figures. It’s a not-for-profit tool for a stronger democracy, with open data for informed and engaged communities.
AskThem allows you to:

  • Find and ask questions to over 142,000 elected officials nationwide: federal, state and city levels of government.
  • Get signatures for your question or petition, have it delivered over email or Twitter, and push for a public response.
  • See questions from people near you, sign on to questions you care about, and review answers from public figures.

It’s like a version of “We The People” for every elected official, from local city council members all the way up to U.S. senators. Enter your email above to be the first to ask a question when we launch and see previews of the site this Fall.
Elected officials: enter your email above and we’ll send you more information about signing up to answer questions on AskThem. It’s a free and non-partisan service to respond to your constituents in an open public forum and update them over email about your work. Or, be a leader in open-government and sign up now.
Issue-based organizations and media: sign up to help promote questions to government from people in your area. We’re working to launch with partnerships that build greater public accountability.
Previously known as the OpenGovernment.org project, AskThem is open-source and uses open government data – our code is available on GitHub – contributions welcome. For more development updates & discussion, join our low-traffic Google Group.
We’re a small non-profit organization actively seeking charitable funding support – help us launch this powerful new tool for public dialogue! Email us for a copy of our non-profit funding prospectus. If you can make a tax-exempt gift to support our work, please donate to PPF via OpenCongress. More background on the project is available on our Knight NewsChallenge proposal from March 2013.
Questions, feedback, ideas? Email David Moore, Executive Director of PPF – david at ppolitics.org, Twitter: @ppolitics; like our page on Facebook & follow @AskThemPPF on Twitter. Stay tuned!”

The Brave New World of Good


Brad Smith: “Welcome to the Brave New World of Good. Once almost the exclusive province of nonprofit organizations and the philanthropic foundations that fund them, today the terrain of good is disputed by social entrepreneurs, social enterprises, impact investors, big business, governments, and geeks. Their tools of choice are markets, open data, innovation, hackathons, and disruption. They cross borders, social classes, and paradigms with the swipe of a touch screen. We seem poised to unleash a whole new era of social and environmental progress, accompanied by unimagined economic prosperity.
As a brand, good is unassailably brilliant. Who could be against it? It is virtually impossible to write an even mildly skeptical blog post about good without sounding, well, bad — or at least a bit old-fashioned. For the record, I firmly believe there is much in the brave new world of good that is helping us find our way out of the tired and often failed models of progress and change on which we have for too long relied. Still, there are assumptions worth questioning and questions worth answering to ensure that the good we seek is the good that can be achieved.

Open Data
Second only to “good” in terms of marketing genius is the concept of “open data.” An offspring of previous movements such as “open source,” “open content,” and “open access,” open data in the Internet age has come to mean data that is machine-readable, free to access, and free to use, re-use, and re-distribute, subject to attribution. Fully open data goes way beyond posting your .pdf document on a Web site (as neatly explained by Tim Berners-Lee’s five-star framework).
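Berners-Lee’s scheme is cumulative: each star presupposes the ones below it (one star: on the Web under an open license, in any format, such as a PDF; two: structured, machine-readable data; three: a non-proprietary format such as CSV; four: URIs to identify things; five: links to other data for context). The minimal Python sketch below encodes that ladder as a checklist. The `Dataset` fields and the `star_rating` function are hypothetical names chosen for illustration, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative only)."""
    on_web_with_open_license: bool  # 1 star: online, openly licensed, any format
    machine_readable: bool          # 2 stars: structured data, not a scanned table or PDF
    non_proprietary_format: bool    # 3 stars: e.g. CSV rather than Excel
    uses_uris: bool                 # 4 stars: things are identified by URIs
    linked_to_other_data: bool      # 5 stars: links to other data for context


def star_rating(d: Dataset) -> int:
    """Return 0-5 stars; a level only counts if every level below it is met."""
    levels = [
        d.on_web_with_open_license,
        d.machine_readable,
        d.non_proprietary_format,
        d.uses_uris,
        d.linked_to_other_data,
    ]
    stars = 0
    for met in levels:
        if not met:
            break  # the ladder is cumulative, so stop at the first unmet level
        stars += 1
    return stars


pdf_report = Dataset(True, False, False, False, False)
csv_table = Dataset(True, True, True, False, False)
print(star_rating(pdf_report))  # 1
print(star_rating(csv_table))   # 3
```

Under this reading, an openly licensed PDF earns a single star however useful its contents, while the same table published as CSV reaches three.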
When it comes to government, there is a rapidly accelerating movement around the world that is furthering transparency by making vast stores of data open. Ditto on the data of international aid funders like the United States Agency for International Development, the World Bank, and the Organisation for Economic Co-operation and Development. The push has now expanded to the tax return data of nonprofits and foundations (IRS Forms 990). Collection of data by government has a business model; it’s called tax dollars. However, open data is not born pure. Cleaning that data, making it searchable, and building and maintaining reliable user interfaces is complex, time-consuming, and often expensive. That requires a consistent stream of income of the kind that can only come from fees, subscriptions, or, increasingly less so, government.
Foundation grants are great for short-term investment, experimentation, or building an app or two, but they are no substitute for a scalable business model. Structured, longitudinal data are vital to social, environmental, and economic progress. In a global economy where government is retreating from the funding of public goods, figuring out how to pay for the cost of that data is one of our greatest challenges.”

Towards an effective framework for building smart cities: Lessons from Seoul and San Francisco


New paper by JH Lee, MG Hancock, MC Hu in Technological Forecasting and Social Change: “This study aims to shed light on the process of building an effective smart city by integrating various practical perspectives with a consideration of smart city characteristics taken from the literature. We developed a framework for conducting case studies examining how smart cities were being implemented in San Francisco and Seoul Metropolitan City. The study’s empirical results suggest that effective, sustainable smart cities emerge as a result of dynamic processes in which public and private sector actors coordinate their activities and resources on an open innovation platform. The different yet complementary linkages formed by these actors must further be aligned with respect to their developmental stage and embedded cultural and social capabilities. Our findings point to eight ‘stylized facts’, based on both quantitative and qualitative empirical results that underlie the facilitation of an effective smart city. In elaborating these facts, the paper offers useful insights to managers seeking to improve the delivery of smart city developmental projects.”
 

Global Open Data Initiative moving forward


“The Global Open Data Initiative will serve as a guiding voice internationally on open data issues. Civil society groups who focus on open data have often been isolated to single national contexts, despite the similar challenges and opportunities repeating themselves in countries across the globe. The Global Open Data Initiative aims to help share valuable resources, guidance and judgment, and to clarify the potential for government open data across the world.
Provide a leading vision for how governments approach open data. Open data commitments are among the most popular commitments for countries participating in the Open Government Partnership. The Global Open Data Initiative recommendations and resources will help guide open data initiatives and others as they seek to design and implement strong, effective open data initiatives and policies. Global Open Data Initiative resources will also help civil society actors who will be evaluating government initiatives.
Increase awareness of open data. The Global Open Data Initiative will work to advance the understanding of open data issues, challenges, and resources by promoting best practices, engaging in online and offline dialogue, and supporting networking between organizations both new and familiar to the open data arena.
Support the development of the global open data community especially in civil society. Civil society organizations (CSOs) have a key role to play as suppliers, intermediaries, and users of open data, though at present, relatively few organizations are engaging with open data and the opportunities it presents. Most CSOs lack the awareness, skills and support needed to be active users and providers of open data in ways that can help them meet their goals. The Global Open Data Initiative aims to help CSOs engage with and use open data in whatever area they work on, be it climate change, democratic rights, land governance or financial reform.
Our immediate focus is on two activities:

  1. To consult with members of the CSO community around the world about what they think is important in this area
  2. To develop a set of principles in collaboration with the CSO community to guide open government data policies and approaches and to help initiate, strengthen and further elevate conversations between governments and civil society.”

The Value of Personal Data


The Digital Enlightenment Yearbook 2013 is dedicated this year to Personal Data:  “The value of personal data has traditionally been understood in ethical terms as a safeguard for personality rights such as human dignity and privacy. However, we have entered an era where personal data are mined, traded and monetized in the process of creating added value – often in terms of free services including efficient search, support for social networking and personalized communications. This volume investigates whether the economic value of personal data can be realized without compromising privacy, fairness and contextual integrity. It brings scholars and scientists from the disciplines of computer science, law and social science together with policymakers, engineers and entrepreneurs with practical experience of implementing personal data management.
The resulting collection will be of interest to anyone concerned about privacy in our digital age, especially those working in the field of personal information management, whether academics, policymakers, or those working in the private sector.”

Using Big Data to Ask Big Questions


Chase Davis in the SOURCE: “First, let’s dispense with the buzzwords. Big Data isn’t what you think it is: Every federal campaign contribution over the last 30-plus years amounts to several tens of millions of records. That’s not Big. Neither is a dataset of 50 million Medicare records. Or even 260 gigabytes of files related to offshore tax havens—at least not when Google counts its data in exabytes. No, the stuff we analyze in pursuit of journalism and app-building is downright tiny by comparison.
But you know what? That’s ok. Because while super-smart Silicon Valley PhDs are busy helping Facebook crunch through petabytes of user data, they’re also throwing off intellectual exhaust that we can benefit from in the journalism and civic data communities. Most notably: the ability to ask Big Questions.
Most of us who analyze public data for fun and profit are familiar with small questions. They’re focused, incisive, and often have the kind of black-and-white, definitive answers that end up in news stories: How much money did Barack Obama raise in 2012? Is the murder rate in my town going up or down?
Big Questions, on the other hand, are speculative, exploratory, and systemic. As the name implies, they are also answered at scale: Rather than distilling a small slice of a dataset into a concrete answer, Big Questions look at entire datasets and reveal small questions you wouldn’t have thought to ask.
Can we track individual campaign donor behavior over decades, and what does that tell us about their influence in politics? Which neighborhoods in my city are experiencing spikes in crime this week, and are police changing patrols accordingly?
Or, by way of example, how often do interest groups propose cookie-cutter bills in state legislatures?

Looking at Legislation

Even if you don’t follow politics, you probably won’t be shocked to learn that lawmakers don’t always write their own bills. In fact, interest groups sometimes write them word-for-word.
Sometimes those groups even try to push their bills in multiple states. The conservative American Legislative Exchange Council has gotten some press, but liberal groups, social and business interests, and even sororities and fraternities have done it too.
On its face, something about elected officials signing their names to cookie-cutter bills runs head-first against people’s ideal of deliberative democracy—hence, it tends to make news. Those can be great stories, but they’re often limited in scope to a particular bill, politician, or interest group. They’re based on small questions.
Data science lets us expand our scope. Rather than focusing on one bill, or one interest group, or one state, why not ask: How many model bills were introduced in all 50 states, period, by anyone, during the last legislative session? No matter what they’re about. No matter who introduced them. No matter where they were introduced.
Now that’s a Big Question. And with some basic data science, it’s not particularly hard to answer—at least at a superficial level.

Analyze All the Things!

Just for kicks, I tried building a system to answer this question earlier this year. It was intended as an example, so I tried to choose methods that would make intuitive sense. But it also makes liberal use of techniques often applied to Big Data analysis: k-means clustering, matrices, graphs, and the like.
If you want to follow along, the code is here….
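Davis’s own code is not reproduced in this excerpt. As a rough illustration of one technique he names, here is a minimal k-means (Lloyd’s algorithm) sketch in plain Python. The toy two-dimensional vectors, imagined as counts of two words in bill titles, and the function name `kmeans` are assumptions for illustration. It uses the first k points as starting centroids so the result is reproducible; practical implementations usually pick random or k-means++ starting points instead.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster points into k groups with Lloyd's algorithm (illustrative sketch)."""
    # Deterministic start: use the first k points as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical toy vectors: (count of "insurance", count of "budget") per bill title.
points = [(5, 0), (0, 5), (4, 1), (1, 4), (5, 1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0]
```

One practical difference from a thresholded similarity graph: k-means needs the number of clusters chosen in advance, whereas a graph’s clusters fall out of the data once the cutoff is set.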
To make exploration a little easier, my code represents similar bills in graph space, shown at the top of this article. Each dot (known as a node) represents a bill. And a line connecting two bills (known as an edge) means they were sufficiently similar, according to my criteria (a cosine similarity of 0.75 or above). Loaded into visualization software such as Gephi, the graph makes it easy to click around the clusters and see what pops out. So what do we find?
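The article specifies only the similarity measure and the 0.75 cutoff. The sketch below fills in the rest with assumptions: each bill title becomes a plain bag-of-words vector (Davis may well have used TF-IDF weighting or other preprocessing), the bill IDs and titles are invented, and `vectorize`, `cosine_similarity`, and `similarity_edges` are hypothetical names. Cosine similarity is the dot product of two vectors divided by the product of their lengths, so 1.0 means identical word proportions and 0.0 means no shared words.

```python
import math
from collections import Counter
from itertools import combinations


def vectorize(text):
    """Bag-of-words term counts (an assumed, deliberately simple representation)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Dot product of two term-count vectors divided by the product of their norms."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty title is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_edges(bills, threshold=0.75):
    """Return (id, id, score) for every pair of bills at or above the threshold."""
    vectors = {bill_id: vectorize(title) for bill_id, title in bills.items()}
    edges = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(vectors.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            edges.append((id_a, id_b, round(score, 3)))
    return edges


# Hypothetical bills keyed by "STATE-number" IDs.
bills = {
    "TX-1": "limited lines travel insurance act",
    "OH-7": "an act relating to limited lines travel insurance",
    "CA-3": "budget act of 2013",
}
print(similarity_edges(bills))  # [('TX-1', 'OH-7', 0.791)]
```

The returned pairs form the edge list of the graph, with the bills as nodes; the score can double as an edge weight. Comparing every pair is quadratic in the number of bills, which is manageable for one legislative session but worth keeping in mind at larger scale.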
There are 375 clusters in total. Because of the limitations of our data, many of them represent vague, subject-specific bills that just happen to have similar titles even though the legislation itself is probably very different (think things like “Budget Bill” and “Campaign Finance Reform”). This is where having full bill text would come in handy.
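The excerpt does not spell out how the 375 clusters were counted. One natural reading, which this sketch assumes, is that a cluster is a connected component of the similarity graph: a set of bills joined by any chain of edges. The union-find helper `connected_components` is a hypothetical name. It also returns bills with no edges as single-bill clusters, which you may want to filter out before counting.

```python
def connected_components(nodes, edges):
    """Group nodes so that two nodes share a group if a path of edges joins them."""
    parent = {n: n for n in nodes}

    def find(n):
        # Walk up to the root, halving the path as we go to keep trees shallow.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Union step: merge the groups at the two ends of every edge.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect nodes by root, preserving the input order of nodes.
    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())


# Hypothetical graph: A-B-C form one chain, D-E another.
print(connected_components(["A", "B", "C", "D", "E"], [("A", "B"), ("B", "C"), ("D", "E")]))
# [['A', 'B', 'C'], ['D', 'E']]
```

Because membership only requires a chain of edges, two bills in the same component need not be similar to each other directly, which is one way generic titles can end up grouped together.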
But mixed in with those bills are a handful of interesting nuggets. Several bills that appear to be modeled after legislation by the National Conference of Insurance Legislators appear in multiple states, among them: a bill related to limited lines travel insurance; another related to unclaimed insurance benefits; and one related to certificates of insurance.”
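Flagging clusters whose bills come from more than one state, as with the insurance bills above, is a simple filter. A sketch, assuming (hypothetically) that clusters arrive as lists of bill IDs and that each ID begins with a state code followed by a hyphen, as in `OH-7`; `multi_state_clusters` is an illustrative name. Generic titles such as “Budget Bill” will also span states, so this narrows the search for model legislation rather than proving one bill was copied from another.

```python
def multi_state_clusters(clusters, min_states=2):
    """Return (states, cluster) for clusters whose bills span at least min_states states.

    Assumes each bill ID looks like "OH-7": a state code, a hyphen, then a number.
    """
    flagged = []
    for cluster in clusters:
        states = {bill_id.split("-", 1)[0] for bill_id in cluster}
        if len(states) >= min_states:
            flagged.append((sorted(states), cluster))
    return flagged


# Hypothetical clusters of bill IDs.
clusters = [["TX-1", "OH-7", "MN-2"], ["CA-3", "CA-9"]]
print(multi_state_clusters(clusters))
# [(['MN', 'OH', 'TX'], ['TX-1', 'OH-7', 'MN-2'])]
```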