Civilized Discourse Construction Kit


Jeff Atwood at “Coding Horror”: “Forum software? Maybe. Let’s see, it’s 2013, has forum software advanced at all in the last ten years? I’m thinking no.
Forums are the dark matter of the web, the B-movies of the Internet. But they matter. To this day I regularly get excellent search results on forum pages for stuff I’m interested in. Rarely a day goes by that I don’t end up on some forum, somewhere, looking for some obscure bit of information. And more often than not, I find it there….

At Stack Exchange, one of the tricky things we learned about Q&A is that if your goal is to have an excellent signal to noise ratio, you must suppress discussion. Stack Exchange only supports the absolute minimum amount of discussion necessary to produce great questions and great answers. That’s why answers get constantly re-ordered by votes, that’s why comments have limited formatting and length and only a few display, and so forth….

Today we announce the launch of Discourse, a next-generation, 100% open source discussion platform built for the next decade of the Internet.

The goal of the company we formed, Civilized Discourse Construction Kit, Inc., is exactly that – to raise the standard of civilized discourse on the Internet through seeding it with better discussion software:

  • 100% open source and free to the world, now and forever.
  • Feels great to use. It’s fun.
  • Designed for hi-resolution tablets and advanced web browsers.
  • Built-in moderation and governance systems that let discussion communities protect themselves from trolls, spammers, and bad actors – even without official moderators.”

New NAS Report: Copyright in the Digital Era: Building Evidence for Policy


National Academies of Sciences: “Over the course of several decades, copyright protection has been expanded and extended through legislative changes occasioned by national and international developments. The content and technology industries affected by copyright and its exceptions, and in some cases balancing the two, have become increasingly important as sources of economic growth, relatively high-paying jobs, and exports. Since the expansion of digital technology in the mid-1990s, they have undergone a technological revolution that has disrupted long-established modes of creating, distributing, and using works ranging from literature and news to film and music to scientific publications and computer software.

In the United States and internationally, these disruptive changes have given rise to a strident debate over copyright’s proper scope and terms and means of its enforcement–a debate between those who believe the digital revolution is progressively undermining the copyright protection essential to encourage the funding, creation, and distribution of new works and those who believe that enhancements to copyright are inhibiting technological innovation and free expression.

Copyright in the Digital Era: Building Evidence for Policy examines a range of questions regarding copyright policy by using a variety of methods, such as case studies, international and sectoral comparisons, and experiments and surveys. This report is especially critical in light of digital age developments that may, for example, change the incentive calculus for various actors in the copyright system, impact the costs of voluntary copyright transactions, pose new enforcement challenges, and change the optimal balance between copyright protection and exceptions.”

Is Privacy Algorithmically Impossible?


MIT Technology Review: “In 1995, the European Union introduced privacy legislation that defined “personal data” as any information that could identify a person, directly or indirectly. The legislators were apparently thinking of things like documents with an identification number, and they wanted them protected just as if they carried your name.
Today, that definition encompasses far more information than those European legislators could ever have imagined—easily more than all the bits and bytes in the entire world when they wrote their law 18 years ago.
Here’s what happened. First, the amount of data created each year has grown exponentially (see figure)…
Much of this data is invisible to people and seems impersonal. But it’s not. What modern data science is finding is that nearly any type of data can be used, much like a fingerprint, to identify the person who created it: your choice of movies on Netflix, the location signals emitted by your cell phone, even your pattern of walking as recorded by a surveillance camera. In effect, the more data there is, the less any of it can be said to be private. We are coming to the point that if the commercial incentives to mine the data are in place, anonymity of any kind may be “algorithmically impossible,” says Princeton University computer scientist Arvind Narayanan.”
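
The scale effect Narayanan describes is easy to see in miniature: the more attributes a record carries, the more likely their combination is unique. Below is a toy sketch in Python; the records and quasi-identifier columns are invented for illustration (published studies report similar uniqueness for real combinations such as ZIP code, birth date, and sex).

```python
# Toy illustration of re-identification risk: count how many
# combinations of a few coarse, seemingly impersonal attributes
# occur exactly once. The data below is invented.
from collections import Counter

population = [
    # (zip_code, birth_year, gender) -- hypothetical quasi-identifiers
    ("02139", 1975, "F"),
    ("02139", 1975, "F"),
    ("02139", 1981, "M"),
    ("94105", 1962, "F"),
    ("94105", 1990, "M"),
]

counts = Counter(population)
unique = [rec for rec, n in counts.items() if n == 1]
print(f"{len(unique)} of {len(counts)} attribute combinations "
      f"match exactly one person")
```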

6 Things You May Not Know About Open Data


GovTech: “On Friday, May 3, Palo Alto, Calif., CIO Jonathan Reichental …said that when it comes to making data more open, “The invisible becomes visible,” and he outlined six major points that identify and define what open data really is:

1.  It’s the liberation of peoples’ data

The public sector collects data that pertains to government, such as employee salaries, trees or street information, and government entities are therefore responsible for liberating that data so constituents can view it in an accessible format. Though this practice has become more commonplace in recent years, Reichental said government should have been doing this all along.

2.  Data has to be consumable by a machine

Piecing data together from a spreadsheet to a website, or containing it in a PDF, isn’t the easiest way to retrieve data. To make data more open, it needs to be in a machine-readable format so users don’t have to go through the additional trouble of finding or reading it (a minimal sketch of consuming such a file follows this list).

3.  Data has a derivative value

When data is made available to the public, people like app developers, architects or others are able to analyze the data. In some cases, data can be used in city planning to understand what’s happening at the city scale.

4.  It eliminates the middleman

For many states, public records laws require them to provide data when a public records request is made. But oftentimes, complying with such requests involves long and cumbersome processes. Lawyers and other government officials must process paperwork, and it can take weeks to complete a request. By having data readily available, these processes can be eliminated, thus also eliminating the middleman responsible for processing the requests. Direct access to the data saves time and resources.

5.  Data creates deeper accountability

Since government is expected to provide accessible data, it is being watched and is more accountable for its actions; everything from emails to salaries to city council minutes can be viewed by the public.

6.  Open Data builds trust

When the community can see what’s going on in its government through access to data, Reichental said, individuals begin to build more trust in their government and feel less like the government is hiding information.”
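
As promised under point 2, here is a minimal sketch of what “consumable by a machine” buys: once a dataset is published as CSV rather than trapped in a PDF, a few lines of Python suffice to use it. The file name and column names are hypothetical.

```python
# Minimal sketch of consuming a machine-readable open dataset.
# "employee_salaries.csv" and its columns are hypothetical examples.
import csv

with open("employee_salaries.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Structured rows make analysis immediate, e.g. a total payroll figure.
total = sum(float(row["salary"]) for row in rows)
print(f"{len(rows)} records, total payroll ${total:,.2f}")
```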

Linking open data to augmented intelligence and the economy


Open Data Institute and Professor Nigel Shadbolt (@Nigel_Shadbolt) interviewed by (@digiphile): “…there are some clear learnings. One that I’ve been banging on about recently has been that yes, it really does matter to turn the dial so that governments have a presumption to publish non-personal public data. If you would publish it anyway, under a Freedom of Information request or whatever your local legislative equivalent is, why aren’t you publishing it anyway as open data? That, as a behavioral change, is a big one for many administrations where either the existing workflow or culture is, “Okay, we collect it. We sit on it. We do some analysis on it, and we might give it away piecemeal if people ask for it.” We should construct the publication process from the outset to presume to publish openly. That’s still something that we are two or three years away from, working hard with the public sector to work out how to do it and how to do it properly.
We’ve also learned that in many jurisdictions, the amount of [open data] expertise within administrations and within departments is slight. There just isn’t really the skillset, in many cases, for people to know what it is to publish using technology platforms. So there’s a capability-building piece, too.
One of the most important things is it’s not enough to just put lots and lots of datasets out there. It would be great if the “presumption to publish” meant they were all out there anyway — but when you haven’t got any datasets out there and you’re thinking about where to start, the tough question is to say, “How can I publish data that matters to people?”
The data that matters is revealed when we look at the download stats on these various UK, US and other [open data] sites: there’s a very, very distinctive parallel curve. Some datasets are very, very heavily utilized. You suspect they have high utility to many, many people. Many of the others, if they can be found at all, aren’t being used particularly much. That’s not to say that, under that long tail, there aren’t large amounts of use. A particularly arcane open dataset may have exquisite use to a small number of people.
The real truth is that it’s easy to republish your national statistics. It’s much harder to do a serious job on publishing your spending data in detail, publishing police and crime data, publishing educational data, publishing actual overall health performance indicators. These are tough datasets to release. As people are fond of saying, it holds politicians’ feet to the fire. It’s easy to build a site that’s full of stuff — but does the stuff actually matter? And does it have any economic utility?”
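
The download-stats pattern Shadbolt describes is a heavy-tailed distribution: a few datasets dominate, while a long tail sees light but real use. A toy sketch with invented counts makes the shape visible:

```python
# Toy rank/usage curve for open datasets. The download counts are
# invented; the point is the shape: a few heavy hitters, a long tail.
downloads = sorted(
    [98_000, 41_000, 12_500, 900, 430, 210, 88, 41, 17, 9],
    reverse=True,
)

total = sum(downloads)
top_share = sum(downloads[:2]) / total
print(f"Top 2 of {len(downloads)} datasets draw {top_share:.0%} of downloads")
for rank, n in enumerate(downloads, 1):
    bar = "#" * max(1, n * 40 // downloads[0])
    print(f"{rank:>2}  {bar}  {n:,}")
```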

The Big Data Debate: Correlation vs. Causation


Gil Press: “In the first quarter of 2013, the stock of big data has experienced sudden declines followed by sporadic bouts of enthusiasm. The volatility—a new big data “V”—continues and Ted Cuzzillo summed up the recent negative sentiment in “Big data, big hype, big danger” on SmartDataCollective:
“A remarkable thing happened in Big Data last week. One of Big Data’s best friends poked fun at one of its cornerstones: the Three V’s. The well-networked and alert observer Shawn Rogers, vice president of research at Enterprise Management Associates, tweeted his eight V’s: ‘…Vast, Volumes of Vigorously, Verified, Vexingly Variable Verbose yet Valuable Visualized high Velocity Data.’ He was quick to explain to me that this is no comment on Gartner analyst Doug Laney’s three-V definition. Shawn’s just tired of people getting stuck on V’s.”…
Cuzzillo is joined by a growing chorus of critics that challenge some of the breathless pronouncements of big data enthusiasts. Specifically, it looks like the backlash theme-of-the-month is correlation vs. causation, possibly in reaction to the success of Viktor Mayer-Schönberger and Kenneth Cukier’s recent big data book in which they argued for dispensing “with a reliance on causation in favor of correlation”…
In “Steamrolled by Big Data,” The New Yorker’s Gary Marcus declares that “Big Data isn’t nearly the boundless miracle that many people seem to think it is.”…
Matti Keltanen at The Guardian agrees, explaining “Why ‘lean data’ beats big data.” Writes Keltanen: “…the lightest, simplest way to achieve your data analysis goals is the best one…The dirty secret of big data is that no algorithm can tell you what’s significant, or what it means. Data then becomes another problem for you to solve. A lean data approach suggests starting with questions relevant to your business and finding ways to answer them through data, rather than sifting through countless data sets. Furthermore, purely algorithmic extraction of rules from data is prone to creating spurious connections, such as false correlations… today’s big data hype seems more concerned with indiscriminate hoarding than helping businesses make the right decisions.”
In “Data Skepticism,” O’Reilly Radar’s Mike Loukides adds this gem to the discussion: “The idea that there are limitations to data, even very big data, doesn’t contradict Google’s mantra that more data is better than smarter algorithms; it does mean that even when you have unlimited data, you have to be very careful about the conclusions you draw from that data. It is in conflict with the all-too-common idea that, if you have lots and lots of data, correlation is as good as causation.”
Isn’t more-data-is-better the same as correlation-is-as-good-as-causation? Or, in the words of Chris Anderson, “with enough data, the numbers speak for themselves.”
“Can numbers actually speak for themselves?” non-believer Kate Crawford asks in “The Hidden Biases in Big Data” on the Harvard Business Review blog and answers: “Sadly, they can’t. Data and data sets are not objective; they are creations of human design…
And David Brooks in The New York Times, while probing the limits of “the big data revolution,” takes the discussion to yet another level: “One limit is that correlations are actually not all that clear. A zillion things can correlate with each other, depending on how you structure the data and what you compare. To discern meaningful correlations from meaningless ones, you often have to rely on some causal hypothesis about what is leading to what. You wind up back in the land of human theorizing…”
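
Brooks’s caution is easy to demonstrate: with enough variables, pure noise yields impressive-looking correlations. A small self-contained sketch (all data random by construction, so every “strong” pair is spurious):

```python
# Spurious correlation demo: correlate 50 columns of pure noise and
# count pairs that look "strong". Any hits are false by construction.
import random
import statistics

random.seed(1)
n_vars, n_obs = 50, 30
data = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((len(x) - 1) * sx * sy)

pairs = [(i, j) for i in range(n_vars) for j in range(i + 1, n_vars)]
strong = sum(1 for i, j in pairs if abs(corr(data[i], data[j])) > 0.4)
print(f"{strong} of {len(pairs)} noise pairs correlate above |r| = 0.4")
```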

The Next Great Internet Disruption: Authority and Governance


An essay by David Bollier and John Clippinger as part of their ongoing work at ID3, the Institute for Data-Driven Design: “As the Internet and digital technologies have proliferated over the past twenty years, incumbent enterprises nearly always resist open network dynamics with fierce determination and a narrow ingenuity…. But the inevitable rearguard actions to defend old forms are invariably overwhelmed by the new, network-based ones. The old business models, organizational structures, professional sinecures, cultural norms, etc., ultimately yield to open platforms.
When we look back on the past twenty years of Internet history, we can more fully appreciate the prescience of David P. Reed’s seminal 1999 paper on “Group Forming Networks” (GFNs). “Reed’s Law” posits that value in networks increases exponentially as interactions move from a broadcasting model that offers “best content” (in which value is described by n, the number of consumers) to a network of peer-to-peer transactions (where the network’s value is based on “most members” and mathematically described by n²). But by far the most valuable networks are based on those that facilitate group affiliations, Reed concluded. When users have tools for “free and responsible association for common purposes,” he found, the value of the network soars exponentially to 2ⁿ – a fantastically large number. This is the Group Forming Network. Reed predicted that “the dominant value in a typical network tends to shift from one category to another as the scale of the network increases….”
What is really interesting about Reed’s analysis is that today’s world of GFNs, as embodied by Facebook, Twitter, Wikipedia and other Web 2.0 technologies, remains highly rudimentary. It is based on proprietary platforms (as opposed to open source, user-controlled platforms), and therefore provides only limited tools for members of groups to develop trust and confidence in each other. This suggests a huge, unmet opportunity to actualize greater value from open networks. Citing Francis Fukuyama’s book Trust, Reed points out that “there is a strong correlation between the prosperity of national economies and social capital, which [Fukuyama] defines culturally as the ease with which people in a particular culture can form new associations.”
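
The arithmetic behind Reed’s claim is worth making concrete: a network of n members offers n broadcast relationships, roughly n² pairwise connections, and 2ⁿ possible subgroups. A short Python check shows how quickly the regimes diverge:

```python
# Reed's Law in miniature: broadcast value ~ n, peer-to-peer value ~ n^2,
# and group-forming value ~ 2^n (a set of n members has 2^n subsets).
for n in (10, 20, 30, 40):
    print(f"n={n:>2}  broadcast={n:>2}  pairs={n**2:>5}  groups={2**n:>17,}")
```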

An API for "We the People"


The White House Blog: “We can’t talk about We the People without getting into the numbers — more than 8 million users, more than 200,000 petitions, more than 13 million signatures. The sheer volume of participation is, to us, a sign of success.
And there’s a lot we can learn from a set of data that rich and complex, but we shouldn’t be the only people drawing from its lessons.
So starting today, we’re making it easier for anyone to do their own analysis or build their own apps on top of the We the People platform. We’re introducing the first version of our API, and we’re inviting you to use it.
Get started here: petitions.whitehouse.gov/developers
This API provides read-only access to data on all petitions that passed the 150-signature threshold required to become publicly available on the We the People site. For those who don’t need real-time data, we plan to add the option of a bulk data download in the near future. Until that’s ready, an incomplete sample data set is available for download here.”
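
For those who want to try the API, here is a minimal sketch of fetching a page of petitions. The endpoint URL and response field names are assumptions modeled on the v1 documentation at petitions.whitehouse.gov/developers; consult that page for the authoritative details.

```python
# Hedged sketch of a read-only call to the We the People petitions API.
# The URL and field names below are assumptions; see
# petitions.whitehouse.gov/developers for the real v1 reference.
import json
import urllib.request

url = "https://api.whitehouse.gov/v1/petitions.json?limit=5"  # assumed endpoint
with urllib.request.urlopen(url) as resp:
    payload = json.load(resp)

for petition in payload.get("results", []):  # field names assumed
    print(petition.get("title"), "-", petition.get("signatureCount"))
```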

Frameworks for a Location-Enabled Society


Annual CGA Conference: “Location-enabled devices are weaving “smart grids” and building “smart cities”; they allow people to discover a friend in a shopping mall, catch a bus at its next stop, check surrounding air quality while walking down a street, or avoid a rain storm on a tourist route – now or in the near future. And increasingly they allow those who provide services to track us, whether we are walking past stores on the street or seeking help in a natural disaster.
The Centre for Spatial Law and Policy based in Washington, DC, the Center for Geographic Analysis, the Belfer Center for Science and International Affairs and the Berkman Center for Internet and Society at Harvard University are co-hosting a two-day program examining the legal and policy issues that will impact geospatial technologies and the development of location-enabled societies. The event will take place at Harvard University on May 2-3, 2013…The goal is to explore the different dimensions of policy and legal concerns in geospatial technology applications, and to begin creating a policy and legal framework for a location-enabled society. Download the conference program brochure.
Live webcast: stream videos at Ustream.

Cities and Data


The Economist: “Many cities around the country find themselves in a similar position: they are accumulating data faster than they know what to do with. One approach is to give them to the public. For example, San Francisco, New York, Philadelphia, Boston and Chicago are or soon will be sharing the grades that health inspectors give to restaurants with an online restaurant directory.
Another way of doing it is simply to publish the raw data and hope that others will figure out how to use them. This has been particularly successful in Chicago, where computer nerds have used open data to create many entirely new services. Applications are now available that show which streets have been cleared after a snowfall, what time a bus or train will arrive and how requests to fix potholes are progressing.
New York and Chicago are bringing together data from departments across their respective cities in order to improve decision-making. When a city holds a parade it can combine data on street closures, bus routes, weather patterns, rubbish trucks and emergency calls in real time.”
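
Chicago’s portal (data.cityofchicago.org) runs on Socrata, which exposes every dataset as plain JSON over HTTP via its SODA API; this is what makes the third-party snow, transit, and pothole apps possible. A hedged sketch with a placeholder dataset ID:

```python
# Sketch of pulling raw open data from Chicago's Socrata portal.
# The dataset ID is a placeholder; browse data.cityofchicago.org to
# find a real one (e.g., a 311 pothole-repair dataset).
import json
import urllib.request

dataset_id = "xxxx-xxxx"  # hypothetical Socrata dataset identifier
url = f"https://data.cityofchicago.org/resource/{dataset_id}.json?$limit=3"
with urllib.request.urlopen(url) as resp:
    records = json.load(resp)

for record in records:
    print(record)  # each record is a dict of column name -> value
```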