Facebook Nation


Essay by Leonidas Donskis in the Hedgehog Review: “A German anthropologist who lived in England and was studying the society and culture of Kosovo Albanians once said something that lodged firmly in my memory: Whenever she wanted to make an interesting thought or position available to Albanian students or intellectuals, she simply published it on her Facebook page. “You don’t have to add a comment or explain anything,” she said with a smile. “The posting of it on Facebook is a sure sign that the message is a good one.”
She promised to perform this operation with a few thoughts of mine she had heard and liked in one of my lectures. When she noticed my surprise, she cheerfully explained that the Albanians consider themselves such a particularly dispersed lot, such a diaspora nation par excellence, that they are inclined to view themselves as a coherent collectivity only on Facebook. Without it, their friends, relatives, and family members living in Albania and elsewhere would have no tangible ties to one another.
This made me think that today’s version of Ahasver, the Wandering Jew of legend, is a Facebook user, and that Facebook—not the physical world—is where that displaced person wanders.
The Facebook Nation is host to an ongoing referendum in which its denizens cast their ballots daily, hourly, even minute by minute. Let’s make no mistake about what it means to be in this peculiar digital republic. For a lover, as Milan Kundera put it, to be is to live in the eye of one’s beloved. And what is it to be for people who don’t know where they are or where they want to be or even if they exist at all? Quite simply, it is to be liked on Facebook.
Facebook is where everyone becomes his or her own journalist, a situation that has forced real journalists to become Facebooking and tweeting simulacra of their former selves, or else to abandon journalism for other pursuits, whether literary or academic or something altogether different. Évariste Gamelin, the protagonist of Anatole France’s 1912 novel Les dieux ont soif (The Gods Are Thirsty), is a painter, a passionate disciple of Jacques Louis David, and a young fanatic who doesn’t know whether he is fatally in love with his charming girlfriend or with his mother, the Revolution. He believes that after the Revolution every citizen will become a judge of himself and of the Republic. But modernity plays a joke on him: It does not fulfill this promise. It had no intention of doing so. Instead of turning into judges, we all became journalists…”

Networks and Hierarchies


Essay in the American Interest on whether political hierarchy in the form of the state has met its match in today’s networked world: “…To all the world’s states, democratic and undemocratic alike, the new informational, commercial, and social networks of the internet age pose a profound challenge, the scale of which is only gradually becoming apparent. First, email achieved a dramatic improvement in the ability of ordinary citizens to communicate with one another. Then the internet came to have an even greater impact on the ability of citizens to access information. The emergence of search engines marked a quantum leap in this process. The advent of laptops, smartphones, and other portable devices then emancipated electronic communication from the desktop. With the explosive growth of social networks came another great leap, this time in the ability of citizens to share information and ideas.
It was not immediately obvious how big a challenge all this posed to the established state. There was a great deal of cheerful talk about the ways in which the information technology revolution would promote “smart” or “joined-up” government, enhancing the state’s ability to interact with citizens. However, the efforts of Anonymous, Wikileaks and Edward Snowden to disrupt the system of official secrecy, directed mainly against the U.S. government, have changed everything. In particular, Snowden’s revelations have exposed the extent to which Washington was seeking to establish a parasitical relationship with the key firms that operate the various electronic networks, acquiring not only metadata but sometimes also the actual content of vast numbers of phone calls and messages. Techniques of big-data mining, developed initially for commercial purposes, have been adapted to the needs of the National Security Agency.
The most recent, and perhaps most important, network challenge to hierarchy comes with the advent of virtual currencies and payment systems like Bitcoin. Since ancient times, states have reaped considerable benefits from monopolizing or at least regulating the money created within their borders. It remains to be seen how big a challenge Bitcoin poses to the system of national fiat currencies that has evolved since the 1970s and, in particular, how big a challenge it poses to the “exorbitant privilege” enjoyed by the United States as the issuer of the world’s dominant reserve (and transaction) currency. But it would be unwise to assume, as some do, that it poses no challenge at all….”

No silver bullet: De-identification still doesn’t work


Arvind Narayanan and Edward W. Felten: “Paul Ohm’s 2009 article Broken Promises of Privacy spurred a debate in legal and policy circles on the appropriate response to computer science research on re-identification techniques. In this debate, the empirical research has often been misunderstood or misrepresented. A new report by Ann Cavoukian and Daniel Castro is full of such inaccuracies, despite its claims of “setting the record straight.” In a response to this piece, Ed Felten and I point out eight of our most serious points of disagreement with Cavoukian and Castro. The thrust of our arguments is that (i) there is no evidence that de-identification works either in theory or in practice and (ii) attempts to quantify its efficacy are unscientific and promote a false sense of security by assuming unrealistic, artificially constrained models of what an adversary might do. Specifically, we argue that:

  1. There is no known effective method to anonymize location data, and no evidence that it’s meaningfully achievable.
  2. Computing re-identification probabilities based on proof-of-concept demonstrations is silly.
  3. Cavoukian and Castro ignore many realistic threats by focusing narrowly on a particular model of re-identification.
  4. Cavoukian and Castro concede that de-identification is inadequate for high-dimensional data. But nowadays most interesting datasets are high-dimensional.
  5. Penetrate-and-patch is not an option.
  6. Computer science knowledge is relevant and highly available.
  7. Cavoukian and Castro apply different standards to big data and re-identification techniques.
  8. Quantification of re-identification probabilities, which permeates Cavoukian and Castro’s arguments, is a fundamentally meaningless exercise.

Data privacy is a hard problem. Data custodians face a choice between roughly three alternatives: sticking with the old habit of de-identification and hoping for the best; turning to emerging technologies like differential privacy that involve some trade-offs in utility and convenience; and using legal agreements to limit the flow and use of sensitive data. These solutions aren’t fully satisfactory, either individually or in combination, nor is any one approach the best in all circumstances. Change is difficult. When faced with the challenge of fostering data science while preventing privacy risks, the urge to preserve the status quo is understandable. However, this is incompatible with the reality of re-identification science. If a “best of both worlds” solution exists, de-identification is certainly not that solution. Instead of looking for a silver bullet, policy makers must confront hard choices.”
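
Differential privacy, mentioned above as one of the emerging alternatives, answers aggregate queries with carefully calibrated random noise rather than trying to strip identifiers from records. As a rough illustration only (the dataset, query, and epsilon value below are assumptions made for this sketch, not anything from the report or the response), a Laplace-mechanism count query can look like this in Python:

  import numpy as np

  def private_count(records, predicate, epsilon=0.5):
      """Differentially private count of records matching predicate."""
      true_count = sum(1 for r in records if predicate(r))
      sensitivity = 1.0  # adding or removing one record changes a count by at most 1
      noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
      return true_count + noise

  # Hypothetical example: count patients over 65 without exposing any individual row.
  patients = [{"age": 70}, {"age": 42}, {"age": 68}, {"age": 55}]
  print(private_count(patients, lambda r: r["age"] > 65))

Smaller values of epsilon mean more noise and stronger privacy, which is exactly the utility trade-off the authors refer to.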

Introduction to Open Geospatial Consortium (OGC) Standards


Joseph McGenn; Dominic Taylor; Gail Millin-Chalabi (Editor); Kamie Kitmitto (Editor) at Jorum: “The onset of the Information Age and Digital Revolution has created a knowledge-based society where the internet acts as a global platform for the sharing of information. In a geospatial context, this has resulted in an advancement of techniques in how we acquire, study and share geographic information, and with the development of Geographic Information Systems (GIS), locational services, and online mapping, spatial data has never been more abundant. The transformation to this digital era has not been without its drawbacks, and a forty-year lack of common policies on data sharing has resulted in compatibility issues and great diversity in how software and data are delivered. Essential to the sharing of spatial information is interoperability, whereby different programmes can exchange and open data from various sources seamlessly. Applying universal standards across a sector provides interoperable solutions. The Open Geospatial Consortium (OGC) facilitates interoperability by providing open standard specifications which organisations can use to develop geospatial software. This means that two separate pieces of software or platforms, if developed using open standard specifications, can exchange data without compatibility issues. By defining these specifications and standards the OGC plays a crucial role in how geospatial information is shared on a global scale. Standard specifications are the invisible glue that holds information systems together, without which data sharing would generally be an arduous task. On some level they keep the world spinning, and this course will instil some appreciation for them from a geospatial perspective. This course introduces users to the OGC and all the common standards in the context of geoportals and mapping solutions. These standards are defined and explored using a number of platforms, and interoperability is demonstrated in a practical sense. Finally, users will implement these standards to develop their own platforms for sharing geospatial information.”
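
To make the interoperability point concrete: any WMS-compliant server answers the same standard requests, whatever software runs behind it. A minimal sketch of an OGC WMS GetCapabilities call (the server URL below is a placeholder, not a real endpoint from the course) might look like this:

  import requests

  WMS_ENDPOINT = "https://example.org/geoserver/wms"  # hypothetical WMS server

  params = {
      "service": "WMS",             # which OGC service is being addressed
      "version": "1.3.0",           # WMS specification version
      "request": "GetCapabilities", # standard operation: list layers, formats, CRSs
  }

  response = requests.get(WMS_ENDPOINT, params=params, timeout=30)
  response.raise_for_status()
  print(response.text[:500])        # start of the XML capabilities document

Because the request parameters are defined by the OGC WMS standard rather than by any vendor, the same client code works against any conforming server.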

Brief survey of crowdsourcing for data mining


Paper by Guo Xintong, Wang Hongzhi, Yangqiu Song, and Gao Hong in Expert Systems with Applications: “Crowdsourcing allows large-scale and flexible invocation of human input for data gathering and analysis, which introduces a new paradigm to the data mining process. Traditional data mining methods often require experts in the analytic domain to annotate the data. However, this is expensive and usually takes a long time. Crowdsourcing enables the use of heterogeneous background knowledge from volunteers and distributes the annotation process into small portions of effort from different contributors. This paper reviews the state of the art in crowdsourcing for data mining in recent years. We first review the challenges and opportunities of data mining tasks using crowdsourcing, and summarize their framework. Then we highlight several exemplar works in each component of the framework, including question design, data mining and quality control. Finally, we conclude with the limitations of crowdsourcing for data mining and suggest related areas for future research.”
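
One of the framework components named in the abstract, quality control, typically rests on redundant annotation plus an aggregation rule. A minimal sketch, with made-up workers and labels rather than anything from the paper, is majority voting:

  from collections import Counter

  def majority_vote(worker_labels):
      """Return the most common label and its share of the votes."""
      counts = Counter(worker_labels)
      label, votes = counts.most_common(1)[0]
      return label, votes / len(worker_labels)

  # Three (hypothetical) workers label the same item; their agreement is a rough quality signal.
  consensus, agreement = majority_vote(["spam", "spam", "not_spam"])
  print(consensus, agreement)  # spam 0.67

More sophisticated schemes weight workers by their estimated reliability, but the redundancy-plus-aggregation pattern is the same.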

New Commerce Department report explores huge benefits, low cost of government data


Mark Doms, Under Secretary for Economic Affairs, in a blog: “Today we are pleased to roll out an important new Commerce Department report on government data. “Fostering Innovation, Creating Jobs, Driving Better Decisions: The Value of Government Data” arrives as our society increasingly focuses on how the intelligent use of data can make our businesses more competitive, our governments smarter, and our citizens better informed.

And when it comes to data, as the Under Secretary for Economic Affairs, I have a special appreciation for the Commerce Department’s two preeminent statistical agencies, the Census Bureau and the Bureau of Economic Analysis. These agencies inform us on how our $17 trillion economy is evolving and how our population (318 million and counting) is changing, data critical to our country. Although “Big Data” is all the rage these days, the government has been in this business for a long time: the first Decennial Census was in 1790, gathering information on close to four million people, a huge dataset for its day, and not too shabby by today’s standards as well.

Just how valuable is the data we provide? Our report seeks to answer this question by exploring the range of federal statistics and how they are applied in decision-making. Examples of our data include gross domestic product, employment, consumer prices, corporate profits, retail sales, agricultural supply and demand, population, international trade and much more.

Clearly, as shown in the report, the value of this information to our society far exceeds its cost – and not just because the price tag is shockingly low: three cents, per person, per day. Federal statistics guide trillions of dollars in annual investments at an average annual cost of $3.7 billion: just 0.02 percent of our $17 trillion economy covers the massive amount of data collection, processing and dissemination. With a statistical system that is comprehensive, consistent, confidential, relevant and accessible, the federal government is uniquely positioned to provide a wide range of statistics that complement the vast and growing sources of private sector data.

Our federally collected information is frequently “invisible,” because attribution is not required. But it flows daily into myriad commercial products and services. Today’s report identifies the industries that intensively use our data and provides a rough estimate of the size of this sector. The lower-bound estimate suggests government statistics help private firms generate revenues of at least $24 billion annually – more than six times what we spend for the data. The upper-bound estimate suggests annual revenues of $221 billion!

This report takes a first crack at putting an actual dollars and cents value to government data. We’ve learned a lot from this initial study, and look forward to honing in even further on that figure in our next report.”
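
As a quick sanity check, the headline figures in the excerpt are mutually consistent; a few lines of arithmetic using only the numbers quoted in the post reproduce them approximately:

  cost = 3.7e9        # average annual cost of federal statistics, dollars
  gdp = 17e12         # size of the U.S. economy, dollars
  population = 318e6  # U.S. population

  print(cost / gdp)                     # ~0.0002, i.e. roughly 0.02 percent of GDP
  print(cost / population / 365 * 100)  # ~3.2 cents per person per day
  print(24e9 / cost)                    # lower-bound revenues are ~6.5x the cost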

Forget The Wisdom of Crowds; Neurobiologists Reveal The Wisdom Of The Confident


Emerging Technology From the arXiv: “Way back in 1906, the English polymath Francis Galton visited a country fair in which 800 people took part in a contest to guess the weight of a slaughtered ox. After the fair, he collected the guesses and calculated their average, which turned out to be 1208 pounds. To Galton’s surprise, this was within 1 per cent of the true weight of 1198 pounds.
This is one of the earliest examples of a phenomenon that has come to be known as the wisdom of the crowd. The idea is that the collective opinion of a group of individuals can be better than a single expert opinion.
This phenomenon is commonplace today on websites such as Reddit in which users vote on the importance of particular stories and the most popular are given greater prominence.
However, anyone familiar with Reddit will know that the collective opinion isn’t always wise. In recent years, researchers have spent a significant amount of time and effort teasing apart the factors that make crowds stupid. One important factor turns out to be the way members of a crowd influence each other.
It turns out that if a crowd offers a wide range of independent estimates, then it is more likely to be wise. But if members of the crowd are influenced in the same way, for example by each other or by some external factor, then they tend to converge on a biased estimate. In this case, the crowd is likely to be stupid.
Today, Gabriel Madirolas and Gonzalo De Polavieja at the Cajal Institute in Madrid, Spain, say they found a way to analyse the answers from a crowd which allows them to remove this kind of bias and so settle on a wiser answer.
The theory behind their work is straightforward. Their idea is that some people are more strongly influenced by additional information than others who are confident in their own opinion. So identifying these more strongly influenced people and separating them from the independent thinkers creates two different groups. The group of independent thinkers is then more likely to give a wise estimate. Or put another way, ignore the wisdom of the crowd in favour of the wisdom of the confident.
So how to identify confident thinkers? Madirolas and De Polavieja began by studying the data from an earlier set of experiments in which groups of people were given tasks such as estimating the length of the border between Switzerland and Italy, the correct answer being 734 kilometres.
After one task, some groups were shown the combined estimates of other groups before beginning their second task. These experiments clearly showed how this information biased the answers from these groups in their second tasks.
Madirolas and De Polavieja then set about creating a mathematical model of how individuals incorporate this extra information. They assume that each person comes to a final estimate based on two pieces of information: first, their own independent estimate of the length of the border and second, the earlier combined estimate revealed to the group. Each individual decides on a final estimate depending on the weighting they give to each piece of information.
Those people who are heavily biased give a strong weighting to the additional information whereas people who are confident in their own estimate give a small or zero weighting to the additional information.
Madirolas and De Polavieja then take each person’s behaviour and fit it to this model to reveal how independent their thinking has been.
That allows them to divide the groups into independent thinkers and biased thinkers. Taking the collective opinion of the independent thinkers then gives a much more accurate estimate of the length of the border.
“Our results show that, while a simple operation like the mean, median or geometric mean of a group may not allow groups to make good estimations, a more complex operation taking into account individuality in the social dynamics can lead to a better collective intelligence,” they say.

Ref: arxiv.org/abs/1406.7578: Wisdom of the Confident: Using Social Interactions to Eliminate the Bias in Wisdom of the Crowds”
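
The weighting model described above lends itself to a compact sketch. The following is a rough illustration of the idea rather than the authors’ actual fitting procedure, and the numbers are invented:

  import numpy as np

  def social_weight(first, final, social):
      """Infer w in: final = (1 - w) * first + w * social, clipped to [0, 1]."""
      if social == first:
          return 0.0
      return float(np.clip((final - first) / (social - first), 0.0, 1.0))

  def wisdom_of_the_confident(records, threshold=0.2):
      """Geometric mean of final estimates from people who barely moved toward the group."""
      independent = [r["final"] for r in records
                     if social_weight(r["first"], r["final"], r["social"]) < threshold]
      return float(np.exp(np.mean(np.log(independent)))) if independent else None

  # Each record: a person's first guess, the group estimate shown to them, and their final guess.
  people = [
      {"first": 700, "social": 1000, "final": 720},  # barely moved: independent
      {"first": 500, "social": 1000, "final": 950},  # strongly pulled: biased
      {"first": 760, "social": 1000, "final": 770},  # independent
  ]
  print(wisdom_of_the_confident(people))

Filtering on the fitted weight before aggregating is what lets the approach discard the estimates most contaminated by social influence.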

Incentivizing Peer Review


In Wired, on “The Last Obstacle for Open Access Science”: “The Galapagos Islands’ Charles Darwin Foundation runs on an annual operating budget of about $3.5 million. With this money, the center conducts conservation research, enacts species-saving interventions, and provides educational resources about the fragile island ecosystems. As a science-based enterprise whose work would benefit greatly from the latest research findings on ecological management, evolution, and invasive species, there’s one glaring hole in the Foundation’s budget: the $800,000 it would cost per year for subscriptions to leading academic journals.
According to Richard Price, founder and CEO of Academia.edu, this episode is symptomatic of a larger problem. “A lot of research centers” – NGOs, academic institutions in the developing world – “are just out in the cold as far as access to top journals is concerned,” says Price. “Research is being commoditized, and it’s just another aspect of the digital divide between the haves and have-nots.”
 
Academia.edu is a key player in the movement toward open access scientific publishing, with over 11 million participants who have uploaded nearly 3 million scientific papers to the site. It’s easy to understand Price’s frustration with the current model, in which academics donate their time to review articles, pay for the right to publish articles, and pay for access to articles. According to Price, journals charge an average of $4000 per article: $1500 for production costs (reformatting, designing), $1500 to orchestrate peer review (labor costs for hiring editors, administrators), and $1000 of profit.
“If there were no legacy in the scientific publishing industry, and we were looking at the best way to disseminate and view scientific results,” proposes Price, “things would look very different. Our vision is to build a complete replacement for scientific publishing,” one that would allow budget-constrained organizations like the CDF full access to information that directly impacts their work.
But getting to a sustainable new world order requires a thorough overhaul of the academic publishing industry. The alternative vision – of “open science” – has two key properties: the uninhibited sharing of research findings, and a new peer review system that incorporates the best of the scientific community’s feedback. Several groups have made progress on the former, but the latter has proven particularly difficult given the current incentive structure. The currency of scientific research is the number of papers you’ve published and their citation counts – the number of times other researchers have referred to your work in their own publications. The emphasis is on the creation of new knowledge – a worthy goal, to be sure – but substantial contributions to the quality, packaging, and contextualization of that knowledge in the form of peer review go largely unrecognized. As a result, researchers view their role as reviewers as a chore, a time-consuming task required to sustain the ecosystem of research dissemination.
“Several experiments in this space have tried to incorporate online comment systems,” explains Price, “and the result is that putting a comment box online and expecting high quality comments to flood in is just unrealistic. My preference is to come up with a system where you’re just as motivated to share your feedback on a paper as you are to share your own findings.” In order to make this lofty aim a reality, reviewers’ contributions would need to be recognized. “You need something more nuanced, and more qualitative,” says Price. “For example, maybe you gather reputation points from your community online.” Translating such metrics into tangible benefits up the food chain – hirings, tenure decisions, awards – is a broader community shift that will no doubt take time.
A more iterative peer review process could allow the community to better police faulty methods by crowdsourcing their evaluation. “90% of scientific studies are not reproducible,” claims Price, a problem that is exacerbated by the strong bias toward positive results. Journals may be unlikely to publish methodological refutations, but a flurry of well-supported comments attached to a paper online could convince the researchers to marshal more convincing evidence. Typically, this sort of feedback cycle takes years….”

U.S. Secretary of Commerce Penny Pritzker Announces Expansion and Enhancement of Commerce Data Programs


Press Release from the U.S. Secretary of Commerce: “Department will hire first-ever Chief Data Officer

As “America’s Data Agency,” the Department of Commerce is prepared and well-positioned to foster the next phase in the open data revolution. In line with President Obama’s Year of Action, U.S. Secretary of Commerce Penny Pritzker today announced a series of steps taken to enhance and expand the data programs at the Department.
“Data is a key pillar of the Department’s “Open for Business Agenda,” and for the first time, we have made it a department-wide strategic priority,” said Secretary of Commerce Penny Pritzker. “No other department can rival the reach, depth and breadth of the Department of Commerce’s data programs. The Department of Commerce is working to unleash more of its data to strengthen the nation’s economic growth; make its data easier to access, understand, and use; and maximize the return of data investments for businesses, entrepreneurs, government, taxpayers, and communities.”
Secretary Pritzker made a number of major announcements today as a special guest speaker at the Environmental Systems Research Institute’s (Esri) User Conference in San Diego, California. She discussed the power and potential of open data, recognizing that data not only enable start-ups and entrepreneurs, move markets, and empower companies large and small, but also touch the lives of Americans every day.
In her remarks, Secretary Pritzker outlined new ways the Department of Commerce is working to unlock the potential of even more open data to make government smarter, including the following:
Chief Data Officer
Today, Secretary Pritzker announced the Commerce Department will hire its first-ever Chief Data Officer. This leader will be responsible for developing and implementing a vision for the future of the diverse data resources at Commerce.
The new Chief Data Officer will pull together a platform for all data sets; instigate and oversee improvements in data collection and dissemination; and ensure that data programs are coordinated, comprehensive, and strategic.
The Chief Data Officer will hold the key to unlocking more government data to help support a data-enabled Department and economy.
Trade Developer Portal
The International Trade Administration has launched its “Developer Portal,” an online toolkit that puts diverse sets of trade and investment data in a single place, making it easier for the business community to use and better tap into the 95 percent of potential customers who live overseas.
In creating this portal, the Commerce Department is making its data public to software developers, giving them access to authoritative information on U.S. exports and international trade to help U.S. businesses export and expand their operations in overseas markets. The developer community will be able to integrate the data into applications and mashups to help U.S. business owners compete abroad while also creating more jobs here at home.
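
The press release does not describe the portal’s endpoints, so the snippet below is only a generic illustration of how a developer might consume such a REST trade-data API; the base URL, path, parameters, and response fields are all placeholders rather than the actual ITA interface:

  import requests

  BASE_URL = "https://api.example.gov/trade"  # placeholder, not the real portal address

  response = requests.get(
      f"{BASE_URL}/export_markets",           # hypothetical endpoint
      params={"industry": "machinery", "market": "DE"},
      timeout=30,
  )
  response.raise_for_status()
  for row in response.json().get("results", []):
      print(row.get("market"), row.get("export_value"))

In practice the point of the portal is exactly this pattern: authoritative trade data pulled straight into applications and mashups instead of downloaded and re-keyed by hand.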
Data Advisory Council
Open data requires open dialogue. To facilitate this, the Commerce Department is creating a data advisory council, composed of 15 private-sector leaders who will advise the Department on the best use of government data.
This new advisory council will help Commerce maximize the value of its data by:

  • discovering how to deliver data in more usable, timely, and accessible ways;
  • improving how data is utilized and shared to make businesses and governments more responsive, cost-effective, and efficient;
  • better anticipating customers’ needs; and
  • collaborating with the private sector to develop new data products and services.

The council’s primary focus will be on the accessibility and usability of Commerce data, as well as the transformation of the Department’s supporting infrastructure and procedures for managing data.
These data leaders will represent a broad range of business interests—reflecting the wide range of scientific, statistical, and other data that the Department of Commerce produces. Members will serve two-year terms and will meet about four times a year. The advisory council will be housed within the Economics and Statistics Administration.
Commerce data inform decisions that help make government smarter, keep businesses more competitive and better inform citizens about their own communities – with the potential to guide up to $3.3 trillion in investments in the United States each year.”

Do We Choose Our Friends Because They Share Our Genes?


Rob Stein at NPR: “People often talk about how their friends feel like family. Well, there’s some new research out that suggests there’s more to that than just a feeling. People appear to be more like their friends genetically than they are to strangers, the research found.
“The striking thing here is that friends are actually significantly more similar to one another than we were expecting,” says James Fowler, a professor of medical genetics at the University of California, San Diego, who conducted the study with Nicholas A. Christakis, a social scientist at Yale University.
In fact, the study in Monday’s issue of the Proceedings of the National Academy of Sciences found that friends are as genetically similar as fourth cousins.
“It’s as if they shared a great- great- great-grandparent in common,” Fowler told Shots.
Some of the genes that friends were most likely to have in common involve smell. “We tend to smell things the same way that our friends do,” Fowler says. The study involved nearly 2,000 adults.
This suggests that as humans evolved, the ability to tolerate and be drawn to certain smells may have influenced where people hung out. Today we might call this the Starbucks effect.
“You may really love the smell of coffee. And you’re drawn to a place where other people have been drawn to who also love the smell of coffee,” Fowler says. “And so that might be the opportunity space for you to make friends. You’re all there together because you love coffee and you make friends because you all love coffee.”…”