Using Crowdsourcing In Government


Daren C. Brabham, writing for the IBM Center for The Business of Government: “The growing interest in “engaging the crowd” to identify or develop innovative solutions to public problems has been inspired by similar efforts in the commercial world. There, crowdsourcing has been used successfully to design innovative consumer products and to solve complex scientific problems, ranging from custom-designed T-shirts to mapping genetic DNA strands.
The Obama administration, as well as many state and local governments, has been adapting these crowdsourcing techniques with some success. This report provides a strategic view of crowdsourcing and identifies four specific types:

  • Type 1:  Knowledge Discovery and Management. Collecting knowledge reported by an on-line community, such as the reporting of earth tremors or potholes to a central source.
  • Type 2:  Distributed Human Intelligence Tasking. Distributing “micro-tasks” that require human intelligence to solve, such as transcribing handwritten historical documents into electronic files.
  • Type 3:  Broadcast Search. Broadcasting a problem-solving challenge widely on the internet and offering an award for the solution, such as NASA’s prize for an algorithm to predict solar flares.
  • Type 4:  Peer-Vetted Creative Production. Creating peer-vetted solutions, where an on-line community both proposes possible solutions and is empowered to collectively choose among the solutions.

By understanding the different types, which require different approaches, public managers will have a better chance of success.  Dr. Brabham focuses on the strategic design process rather than on the specific technical tools that can be used for crowdsourcing.  He sets forth ten emerging best practices for implementing a crowdsourcing initiative.”
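
Brabham’s Type 2 pattern is the most mechanical of the four, and a small sketch makes it concrete: a large job is split into micro-tasks, each task is handed to several contributors for redundancy, and the redundant answers are reconciled by majority vote. This is a minimal illustration under assumed data shapes, not code from the report; the function names and the sample page are hypothetical.

```python
# Minimal sketch of Type 2, "Distributed Human Intelligence Tasking":
# split a big job into micro-tasks, collect redundant answers per task,
# and reconcile them by majority vote. All names here are hypothetical.
from collections import Counter

def make_microtasks(document_pages):
    """Each page of a scanned document becomes one transcription micro-task."""
    return [{"task_id": i, "page": page} for i, page in enumerate(document_pages)]

def aggregate(answers_by_task):
    """Reconcile redundant worker answers for each task by majority vote."""
    results = {}
    for task_id, answers in answers_by_task.items():
        winner, votes = Counter(answers).most_common(1)[0]
        results[task_id] = {"text": winner, "agreement": votes / len(answers)}
    return results

# Three workers transcribe the same page; two of the three agree.
answers = {0: ["12 Main St.", "12 Main St.", "12 Main Street"]}
print(aggregate(answers))  # '12 Main St.' wins with 2/3 agreement
```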

Five myths about big data


Samuel Arbesman, senior scholar at the Ewing Marion Kauffman Foundation and author of “The Half-Life of Facts,” writing in the Washington Post: “Big data holds the promise of harnessing huge amounts of information to help us better understand the world. But when talking about big data, there’s a tendency to fall into hyperbole. It is what compels contrarians to write such tweets as “Big Data, n.: the belief that any sufficiently large pile of s— contains a pony.” Let’s deflate the hype.
1. “Big data” has a clear definition.
The term “big data” has been in circulation since at least the 1990s, when it is believed to have originated in Silicon Valley. IBM offers a seemingly simple definition: Big data is characterized by the four V’s of volume, variety, velocity and veracity. But the term is thrown around so often, in so many contexts — science, marketing, politics, sports — that its meaning has become vague and ambiguous….
2. Big data is new.
By many accounts, big data exploded onto the scene quite recently. “If wonks were fashionistas, big data would be this season’s hot new color,” a Reuters report quipped last year. In a May 2011 report, the McKinsey Global Institute declared big data “the next frontier for innovation, competition, and productivity.”
It’s true that today we can mine massive amounts of data — textual, social, scientific and otherwise — using complex algorithms and computer power. But big data has been around for a long time. It’s just that exhaustive datasets were more exhausting to compile and study in the days when “computer” meant a person who performed calculations….
3. Big data is revolutionary.
In their new book, “Big Data: A Revolution That Will Transform How We Live, Work, and Think,” Viktor Mayer-Schönberger and Kenneth Cukier compare “the current data deluge” to the transformation brought about by the Gutenberg printing press.
If you want more precise advertising directed toward you, then yes, big data is revolutionary. Generally, though, it’s likely to have a modest and gradual impact on our lives….
4. Bigger data is better.
In science, some admittedly mind-blowing big-data analyses are being done. In business, companies are being told to “embrace big data before your competitors do.” But big data is not automatically better.
Really big datasets can be a mess. Unless researchers and analysts can reduce the number of variables and make the data more manageable, they get quantity without a whole lot of quality. Give me some quality medium data over bad big data any day…
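
What “reducing the number of variables” looks like in practice varies, but principal component analysis is the classic move. The sketch below is a hedged illustration using numpy rather than anything from Arbesman’s essay: it compresses fifty noisy variables down to the three components that carry most of the variance.

```python
# Illustrative only: PCA via numpy's SVD, turning wide "big" data into
# manageable "medium" data by keeping a few high-variance components.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 50))             # 1000 rows, 50 noisy variables
X[:, :3] += 5 * rng.normal(size=(1000, 1))  # three variables share a real signal

Xc = X - X.mean(axis=0)                     # center the data before PCA
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = (S**2) / (S**2).sum()

k = 3                                       # keep the top k components
X_reduced = Xc @ Vt[:k].T                   # now 1000 x 3
print(f"top {k} components explain {explained[:k].sum():.0%} of the variance")
```
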
5. Big data means the end of scientific theories.
Chris Anderson argued in a 2008 Wired essay that big data renders the scientific method obsolete: Throw enough data at an advanced machine-learning technique, and all the correlations and relationships will simply jump out. We’ll understand everything.
But you can’t just go fishing for correlations and hope they will explain the world. If you’re not careful, you’ll end up with spurious correlations. Even more important, to contend with the “why” of things, we still need ideas, hypotheses and theories. If you don’t have good questions, your results can be silly and meaningless.
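
Arbesman’s warning about fishing for correlations is easy to demonstrate. The sketch below is a hedged illustration using numpy, not anything from the essay: it generates hundreds of unrelated random variables and counts how many pairs look impressively correlated purely by chance.

```python
# Pure noise in, "significant" correlations out: the multiple-comparisons trap.
import numpy as np

rng = np.random.default_rng(0)
n_vars, n_obs = 200, 30
data = rng.normal(size=(n_vars, n_obs))     # 200 unrelated random variables

corr = np.corrcoef(data)                    # all pairwise correlation coefficients
upper = corr[np.triu_indices(n_vars, k=1)]  # each pair counted once
strong = np.abs(upper) > 0.5                # "impressive-looking" correlations

print(f"pairs tested: {upper.size}, with |r| > 0.5 by chance: {strong.sum()}")
# With ~20,000 pairs and only 30 observations each, scores of spurious
# correlations clear the 0.5 bar despite there being no real signal at all.
```
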
Having more data won’t substitute for thinking hard, recognizing anomalies and exploring deep truths.”

OpenCounter


Code for America: “OpenCounter’s mission is to empower entrepreneurs and foster local economic development by simplifying the process of registering a business.
Economic development happens in many forms, from projects like the revitalization of the Brooklyn Navy Yard or Hudson Rail Yards in New York City, to campaigns to encourage residents to shop at local merchants. While the majority of headlines focus on a city’s effort to secure a major new employer (think Apple’s 1,000,000 square foot expansion in Austin, Texas), most economic development and job creation happens on a much smaller scale, as individuals stake their financial futures on creating a new product, store, service or firm.
But these new businesses aren’t in a position to accept tax breaks on capital equipment or enter into complex development and disposition agreements to build new offices or stores. Many new businesses can’t even meet the underwriting criteria of SBA-backed revolving-loan programs. Competition for local grants for facade improvements or signage assistance can be fierce….
Despite many cities’ genuine efforts to be “business-friendly,” their default user interface consists of fluorescent-lit Formica, waiting lines, and stacks of forms. Online resources often remind one of a phone book, with little interactivity or specialization based on either a business’s function or its location within a jurisdiction.
That’s why we built OpenCounter….See what we’re up to at opencounter.us or visit a live version of our software at http://opencounter.cityofsantacruz.com.”

Defense Against National Vulnerabilities in Public Data


DOD/DARPA Notice (See also Foreign Policy article): “OBJECTIVE: Investigate the national security threat posed by public data available either for purchase or through open sources. Based on principles of data science, develop tools to characterize and assess the nature, persistence, and quality of the data. Develop tools for the rapid anonymization and de-anonymization of data sources. Develop a framework and tools to measure the national security impact of public data and to defend against the malicious use of public data against national interests.
DESCRIPTION: The vulnerabilities to individuals from a data compromise are now well known and documented as “identity theft.” Stories published regularly in the news and in research journals document the loss of personally identifiable information by corporations and governments around the world. Current trends in social media and commerce, with their voluntary disclosure of personal information, create further potential vulnerabilities for individuals who participate heavily in the digital world. The Netflix Challenge was launched in 2009 with the goal of creating better customer preference-prediction algorithms for the movie service [1]. An unintended consequence of the Netflix Challenge was the discovery that it was possible to de-anonymize the entire contest data set with very little additional data. This de-anonymization led to a federal lawsuit and the cancellation of the sequel challenge [2]. The purpose of this topic is to understand the national-level vulnerabilities that may be exploited through the use of public data available in the open or for purchase.
Could a modestly funded group deliver nation-state type effects using only public data?…”
The official link for this solicitation is: www.acq.osd.mil/osbp/sbir/solicitations/sbir20133.
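
The Netflix result the notice cites was a linkage attack: an “anonymized” record is matched to a named public profile by overlap on quasi-identifiers, such as (movie, approximate rating date) pairs. The sketch below is a toy version of that idea with invented data and scoring, not the published Narayanan–Shmatikov algorithm.

```python
# Toy linkage attack: score each public profile by how well it explains an
# "anonymized" rating history, then pick the best match. Data are invented.
def overlap_score(anon_record, public_record, date_tolerance_days=3):
    """Count (movie, approximate-date) pairs shared by two rating histories."""
    score = 0
    for movie, day in anon_record:
        for p_movie, p_day in public_record:
            if movie == p_movie and abs(day - p_day) <= date_tolerance_days:
                score += 1
                break
    return score

def best_match(anon_record, public_profiles):
    """Return the named profile whose history best explains the anonymous one."""
    return max(public_profiles,
               key=lambda name: overlap_score(anon_record, public_profiles[name]))

# Days are integers for simplicity; the anonymous user rated three films.
anon = [("Brazil", 100), ("Alphaville", 230), ("Stalker", 231)]
public = {
    "alice": [("Brazil", 101), ("Alphaville", 230), ("Stalker", 233)],
    "bob":   [("Brazil", 400), ("Solaris", 12)],
}
print(best_match(anon, public))  # "alice" -- a handful of ratings can suffice
```
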

Data is Inert — It’s What You Do With It That Counts


Kevin Merritt, CEO and Founder, Socrata, in NextGov: “In its infancy, the open data movement was mostly about offering catalogs of government data online that concerned citizens and civic activists could download. But now, a wide variety of external stakeholders are using open data to deliver new applications and services. At the same time, governments themselves are harnessing open data to drive better decision-making.
In a relatively short period of time, open data has evolved from serving as fodder for data publishing to fuel for open innovation.
One of the keys to making this transformation truly work, however, is our ability to re-instrument or re-tool the underlying business systems and processes so that managers can receive open data in consumable forms on a regular, continuous, real-time basis….”
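
One plausible reading of that re-instrumenting, sketched below with invented details: the line-of-business system pushes each new record to an open data endpoint the moment it is created, instead of exporting a snapshot every quarter. The endpoint URL, token, and record layout are placeholders, not any specific vendor’s API.

```python
# Hedged sketch: publish each new record to an open data endpoint as it is
# created, keeping the public dataset continuously current. All identifiers
# below are hypothetical placeholders.
import json
import urllib.request

ENDPOINT = "https://data.example.gov/resource/permits.json"  # placeholder URL
TOKEN = "app-token-goes-here"                                # placeholder token

def publish_record(record):
    """POST one newly created record to the open data endpoint."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps([record]).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-App-Token": TOKEN},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # e.g. 200 on success

# Called from the permitting system's save hook, for example:
# publish_record({"permit_id": "2013-0412", "type": "sign", "issued": "2013-08-01"})
```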

I Flirt and Tweet. Follow Me at #Socialbot.


Ian Urbina in The New York Times: “From the earliest days of the Internet, robotic programs, or bots, have been trying to pass themselves off as human. Chatbots greet users when they enter an online chat room, for example, or kick them out when they get obnoxious….

Now come socialbots. These automated charlatans are programmed to tweet and retweet. They have quirks, life histories and the gift of gab. Many of them have built-in databases of current events, so they can piece together phrases that seem relevant to their target audience. They have sleep-wake cycles so their fakery is more convincing, making them less prone to repetitive patterns that flag them as mere programs. Some have even been souped up by so-called persona management software, which makes them seem more real by adding matching Facebook, Reddit or Foursquare accounts, giving them an online footprint over time as they amass friends and like-minded followers.
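
The sleep-wake trick the article mentions is simple to picture: rather than posting at uniform intervals (an easy giveaway), a bot weights its posting probability by hour of day. The sketch below is purely illustrative; the diurnal profile is invented and nothing is actually posted.

```python
# Illustrative only: sample posting hours from a made-up human-like
# diurnal activity profile instead of a uniform schedule.
import random

# Relative likelihood of posting for each hour 0-23: low overnight,
# peaks mid-morning and in the evening. Invented numbers, not measured data.
DIURNAL_WEIGHTS = [1, 1, 1, 1, 2, 4, 8, 12, 14, 13, 12, 11,
                   11, 10, 10, 11, 12, 14, 15, 14, 10, 6, 3, 2]

def sample_post_hours(posts_per_day, rng=random.Random(42)):
    """Draw one day's posting hours, weighted by the diurnal profile."""
    return sorted(rng.choices(range(24), weights=DIURNAL_WEIGHTS, k=posts_per_day))

print(sample_post_hours(8))  # e.g. [7, 9, 10, 13, 17, 18, 19, 21] -- waking hours
```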

Researchers say this new breed of bots is being designed not just with greater sophistication but also with grander goals: to sway elections, to influence the stock market, to attack governments, even to flirt with people and one another.

…Socialbots are tapping into an ever-expanding universe of social media. Last year, the number of Twitter accounts topped 500 million. Some researchers estimate that only 35 percent of the average Twitter user’s followers are real people. In fact, more than half of Internet traffic already comes from nonhuman sources like bots or other types of algorithms. Within two years, about 10 percent of the activity occurring on online social networks will be masquerading bots, according to technology researchers….

Much of social media remains unregulated by campaign finance and transparency laws. So far, the Federal Election Commission has been reluctant to venture into this realm.

But the bots are likely to venture into ours, said Tim Hwang, chief scientist at the Pacific Social Architecting Corporation, which creates bots and technologies that can shape social behavior. “Our vision is that in the near future automatons will eventually be able to rally crowds, open up bank accounts, write letters,” he said, “all through human surrogates.”

The Shame Game: U.S. Department of Labor Smartphone App Will Allow Public to Effortlessly Scrutinize Business Employment Practices


Charles B. Palmer in National Law Review: “The United States Department of Labor (DOL) recently launched a contest to find a new smartphone app that will allow the general public to effortlessly search for and scrutinize businesses and employers that have faced DOL citations. Dubbed the DOL Fair Labor Data Challenge, the contest seeks app entries that integrate information from consumer ratings websites, location tracking services, DOL Wage & Hour Division (WHD) citation data, and Occupational Safety & Health Administration (OSHA) citation data into one software platform. The contest also encourages app developers to include other features in their entries, such as information from state health boards and various licensing agencies.
The DOL Fair Labor Data Challenge is part of the DOL’s plan to amplify its enforcement efforts through increased public awareness and ease of access to citation data. Consumers and job applicants will soon be able to search for and publicly shame employers that hold one or more citations in the DOL database, all just by using their smartphones.”
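
The integration the challenge describes is, at bottom, a join across datasets keyed on employer identity, which is messier than it sounds because names are written inconsistently. The sketch below is a toy illustration with invented record layouts; the real WHD and OSHA datasets have their own schemas.

```python
# Toy join of WHD and OSHA citation records to a consumer listing via a
# normalized employer name. Record layouts are invented for illustration.
def norm(name):
    """Crude employer-name normalization so datasets can be matched."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())

whd = [{"employer": "Acme Diner, Inc.", "violations": 4}]
osha = [{"employer": "ACME DINER INC", "citations": 2}]

def citations_for(listing_name):
    """Total citation counts for one consumer-facing business listing."""
    key = norm(listing_name)
    hits = {"whd": 0, "osha": 0}
    for r in whd:
        if key in norm(r["employer"]):
            hits["whd"] += r["violations"]
    for r in osha:
        if key in norm(r["employer"]):
            hits["osha"] += r["citations"]
    return hits

print(citations_for("Acme Diner"))  # {'whd': 4, 'osha': 2}
```
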

For OpenBlock, Big Improvements From Small Newsrooms


From Ideas Lab: “A little more than five months after NBC News shut down its hyperlocal product, EveryBlock.com, the original open-source application has been resurrected in Columbia, Mo. But although both products were born of the same Django codebase and Knight Foundation funding, visitors to The Columbia Daily Tribune’s new Neighborhoods site will see a different emphasis and a new hope for a project that has slowed under the weight of hard-to-get government data and technical complexity….The user interface is clean and smart, and the government data — which is the most difficult kind of data to mine — appears to be more current and complete than in any OpenBlock installation since the very early days, before its code was made public.
That kind of commitment is needed for OpenBlock to succeed, because pulling digital records out of all but the very most efficient and transparent government agencies is a tremendous drag on the expense side of the news business. That difficulty, though, can also create an opportunity for outsized revenue.
Chris Gubbels, the Web developer who’s been overseeing the project for The Tribune, said that unlike many jurisdictions, Columbia’s police and fire data were “pretty simple” to pull into OpenBlock. The police even provided The Tribune with an RSS feed of geocoded 911 response calls.”
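
A geocoded RSS feed like the one the Columbia police provided is about the easiest input an OpenBlock-style scraper can hope for. The sketch below, using only the Python standard library, shows one plausible way to pull location-tagged items out of such a feed; the feed URL and the assumption that items carry GeoRSS point elements are illustrative details, not facts from the article.

```python
# Hedged sketch: read an RSS feed whose <item> entries carry a
# <georss:point> element, yielding (title, published, lat, lon) tuples.
import urllib.request
import xml.etree.ElementTree as ET

NS = {"georss": "http://www.georss.org/georss"}

def fetch_geocoded_items(feed_url):
    """Yield (title, pub_date, lat, lon) for each geocoded item in the feed."""
    with urllib.request.urlopen(feed_url) as resp:
        root = ET.parse(resp).getroot()
    for item in root.iter("item"):
        point = item.findtext("georss:point", default=None, namespaces=NS)
        if point is None:
            continue  # skip items the source failed to geocode
        lat, lon = map(float, point.split())
        yield item.findtext("title"), item.findtext("pubDate"), lat, lon

# Hypothetical usage:
# for title, when, lat, lon in fetch_geocoded_items("https://example.org/911.rss"):
#     print(title, when, lat, lon)
```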

New Book: Untangling the Web


By Aleks Krotoski: “The World Wide Web is the most revolutionary innovation of our time. In the last decade, it has utterly transformed our lives. But what real effects is it having on our social world? What does it mean to be a modern family when dinner table conversations take place over smartphones? What happens to privacy when we readily share our personal lives with friends and corporations? Are our Facebook updates and Twitterings inspiring revolution or are they just a symptom of our global narcissism? What counts as celebrity, when everyone can have a following or be a paparazzo? And what happens to relationships when love, sex and hate can be mediated by a computer? Social psychologist Aleks Krotoski has spent a decade untangling the effects of the Web on how we work, live and play. In this groundbreaking book, she uncovers how much humanity has – and hasn’t – changed because of our increasingly co-dependent relationship with the computer. In Untangling the Web, she tells the story of how the network became woven into our lives, and what it means to be alive in the age of the Internet.” Blog: http://untanglingtheweb.tumblr.com/

Orwell is drowning in data: the volume problem


Dom Shaw in OpenDemocracy: “During World War II, whilst Bletchley Park laboured in the front line of code breaking, the British Government was employing vast numbers of female operatives to monitor and report on telephone, mail and telegraph communications in and out of the country.
The biggest problem, of course, was volume. Lacking even the most primitive algorithm to detect key phrases (the kind that later caused such paranoia among the sixties and seventies counterculture, driving a whole generation of drug users to adopt a wholly unnecessary set of telephone synonyms for their desired substances), the army of women stationed in exchanges around the country was driven to report everything and pass it on up to those whose job it was to analyse such content for significance.
Orwell’s vision of Big Brother’s omniscience was based upon the same model – vast armies of Winston Smiths monitoring data to ensure discipline and control. He saw a culture of betrayal where every citizen was held accountable for their fellow citizens’ political and moral conformity.
Up until the US Government’s Big Data Research and Development Initiative [12] and the NSA’s development of the Prism programme [13], the fault lines always lay in the technology used to collect and collate, and in the inefficiency or competing interests of the corporate systems and processes that interpreted the information. Not for the first time, the bureaucracy was the citizen’s best bulwark against intrusion.
Now that the algorithms have become more complex and the technology tilted towards passive surveillance through automation, the volume problem becomes less of an obstacle….
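
Shaw’s point is that the filter the wartime listeners lacked is now trivial. Even the crudest key-phrase detector, sketched below with invented watch phrases, turns an unmanageable stream into a short queue for an analyst, and modern systems go far beyond this.

```python
# A deliberately primitive key-phrase filter: only flagged messages
# reach a human analyst. Watch phrases are invented for illustration.
WATCH_PHRASES = {"meet at the dock", "package arrived"}

def flag(messages):
    """Yield only the messages that contain a watch phrase."""
    for msg in messages:
        text = msg.lower()
        if any(phrase in text for phrase in WATCH_PHRASES):
            yield msg

traffic = ["Lovely weather today.", "The package arrived this morning."]
print(list(flag(traffic)))  # only the second message needs human attention
```
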
The technology for obtaining this information, and indeed the administration of it, is handled by corporations. The Government, driven by the creed that suggests private companies are better administrators than civil servants, has auctioned off the job to a dozen or more favoured corporate giants who are, as always, beholden not only to their shareholders, but to their patrons within the government itself….
The only problem the state had was managing the scale of the information gleaned from so many people in so many forms. Not any more. The volume problem has been overcome.”