Weather Channel Now Also Forecasts What You'll Buy


Katie Rosman in the Wall Street Journal: “The Weather Channel knows the chance for rain in St. Louis on Friday, what the heat index could reach in Santa Fe on Saturday and how humid Baltimore may get on Sunday.
It also knows when you’re most likely to buy bug spray.
The enterprise is transforming from a cable network viewers flip to during hurricane season into an operation that forecasts consumer behavior by analyzing when, where and how often people check the weather. Last fall the Weather Channel Cos. renamed itself the Weather Co. to reflect the growth of its digital-data business.

The Atlanta-based company has amassed more than 75 years’ worth of information: temperatures, dew points, cloud-cover percentages and much more, across North America and elsewhere.
The company supplies information for many major smartphone weather apps and has invested in data-crunching algorithms. It uses this analysis to appeal to advertisers who want to fine-tune their pitches to consumers….
Weather Co. researchers are now diving into weather-sentiment analysis—how local weather makes people feel, and then act—in different regions of the country. To cull this data, Mr. Walsh’s weather-analytics team directly polls visitors to the Weather.com website, asking them about their moods and purchases on specific days.
In a series of polls conducted between June 3 and Nov. 4 last year, residents of the Northeast region responded to the question, “Yesterday, what was your mood for most of the day?”
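
The article doesn’t detail the Weather Co.’s actual models, but the underlying analysis, joining daily weather observations to same-day poll responses and looking for co-variation, is easy to sketch. Everything below (file names, column names) is hypothetical, a minimal illustration rather than the company’s method.

```python
# Hypothetical sketch: join daily weather observations to same-day
# poll responses and see which conditions co-vary with mood.
# File and column names are illustrative, not the Weather Co.'s.
import pandas as pd

weather = pd.read_csv("daily_weather.csv", parse_dates=["date"])  # date, region, temp_f, humidity, precip_in
moods = pd.read_csv("mood_polls.csv", parse_dates=["date"])       # date, region, mood_score (1-5)

merged = weather.merge(moods, on=["date", "region"])

# Average the poll responses per day and region, then correlate
# each weather variable with the mean mood score.
daily = merged.groupby(["date", "region"]).agg(
    mood=("mood_score", "mean"),
    temp=("temp_f", "first"),
    humidity=("humidity", "first"),
    precip=("precip_in", "first"),
).reset_index()

print(daily[["mood", "temp", "humidity", "precip"]].corr()["mood"])
```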

Index: The Data Universe


The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on the data universe and was originally published in 2013.

  • How much data exists in the digital universe as of 2012: 2.7 zettabytes*
  • Increase in the quantity of Internet data from 2005 to 2012: +1,696%
  • Percent of the world’s data created in the last two years: 90
  • Number of exabytes (=1 billion gigabytes) created every day in 2012: 2.5; that number doubles every month
  • Percent of the digital universe in 2005 created by the U.S. and western Europe vs. emerging markets: 48 vs. 20
  • Percent of the digital universe in 2012 created by emerging markets: 36
  • Percent of the digital universe in 2020 predicted to be created by China alone: 21
  • Percent of information in the digital universe created and consumed by consumers (video, social media, photos, etc.) in 2012: 68
  • Percent of that consumer-created information for which enterprises have some liability or responsibility (copyright, privacy, compliance with regulations, etc.): 80
  • Amount included in the Obama Administration’s 2012 Big Data initiative: over $200 million
  • Amount the Department of Defense is investing annually on Big Data projects as of 2012: over $250 million
  • Data created per day in 2012: 2.5 quintillion bytes
  • How many terabytes* of data collected by the U.S. Library of Congress as of April 2011: 235
  • How many terabytes of data collected by Walmart per hour as of 2012: 2,560, or 2.5 petabytes*
  • Projected growth in global data generated per year, as of 2011: 40%
  • Number of IT jobs created globally by 2015 to support big data: 4.4 million (1.9 million in the U.S.)
  • Potential shortage of data scientists in the U.S. alone predicted for 2018: 140,000-190,000, in addition to 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions
  • Time needed to sequence the complete human genome (analyzing 3 billion base pairs) in 2003: ten years
  • Time needed in 2013: one week
  • The world’s annual effective capacity to exchange information through telecommunication networks in 1986, 2007, and (predicted) 2013: 281 petabytes, 65 exabytes, 667 exabytes
  • Projected share of digital information created annually that will either live in or pass through the cloud: one-third
  • Increase in data collection volume year-over-year in 2012: 400%
  • Increase in number of individual data collectors from 2011 to 2012: nearly double (over 300 data collection parties in 2012)

*1 zettabyte = 1 billion terabytes | 1 petabyte = 1,000 terabytes | 1 terabyte = 1,000 gigabytes | 1 gigabyte = 1 billion bytes
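
For readers keeping the footnote’s units straight, the prefixes compound in powers of 1,000, and a few lines of code can sanity-check the conversions (note that the Walmart entry’s 2,560 TB per 2.5 PB reflects binary multiples of 1,024 rather than 1,000):

```python
# Decimal (SI) storage prefixes, as used in the footnote above.
BYTES_PER = {
    "gigabyte":  10**9,
    "terabyte":  10**12,
    "petabyte":  10**15,
    "exabyte":   10**18,
    "zettabyte": 10**21,
}

# Sanity-check the footnote's conversions.
assert BYTES_PER["zettabyte"] == 10**9 * BYTES_PER["terabyte"]  # 1 ZB = 1 billion TB
assert BYTES_PER["petabyte"] == 1_000 * BYTES_PER["terabyte"]   # 1 PB = 1,000 TB
assert BYTES_PER["gigabyte"] == 10**9                           # 1 GB = 1 billion bytes

# Example: the 2012 digital universe (2.7 zettabytes) in terabytes.
print(f"{2.7 * BYTES_PER['zettabyte'] / BYTES_PER['terabyte']:,.0f} TB")
```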

Using Crowdsourcing In Government


Daren C. Brabham for IBM Center for The Business of Government: “The growing interest in “engaging the crowd” to identify or develop innovative solutions to public problems has been inspired by similar efforts in the commercial world.  There, crowdsourcing has been successfully used to design innovative consumer products or solve complex scientific problems, ranging from custom-designed T-shirts to mapping genetic DNA strands.
The Obama administration, as well as many state and local governments, has been adapting these crowdsourcing techniques with some success.  This report provides a strategic view of crowdsourcing and identifies four specific types:

  • Type 1:  Knowledge Discovery and Management. Collecting knowledge reported by an on-line community, such as the reporting of earth tremors or potholes to a central source.
  • Type 2:  Distributed Human Intelligence Tasking. Distributing “micro-tasks” that require human intelligence to solve, such as transcribing handwritten historical documents into electronic files.
  • Type 3:  Broadcast Search. Broadcasting a problem-solving challenge widely on the internet and providing an award for the solution, such as NASA’s prize for an algorithm to predict solar flares.
  • Type 4:  Peer-Vetted Creative Production. Creating peer-vetted solutions, where an on-line community both proposes possible solutions and is empowered to collectively choose among the solutions.

By understanding the different types, which require different approaches, public managers will have a better chance of success.  Dr. Brabham focuses on the strategic design process rather than on the specific technical tools that can be used for crowdsourcing.  He sets forth ten emerging best practices for implementing a crowdsourcing initiative.”
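
As a concrete illustration of the Type 2 pattern, here is a minimal sketch of a micro-task queue that distributes a transcription job to several workers and keeps the majority answer; the class and method names are invented, not drawn from Dr. Brabham’s report.

```python
# Minimal sketch of Type 2 crowdsourcing (distributed human
# intelligence tasking): split a job into micro-tasks, collect
# several answers per task, and accept the majority answer.
# All names are illustrative, not from Dr. Brabham's report.
from collections import Counter

class MicroTaskQueue:
    def __init__(self, items, redundancy=3):
        self.redundancy = redundancy            # answers needed per item
        self.answers = {item: [] for item in items}

    def next_task(self):
        """Return an item still needing answers, or None when done."""
        for item, got in self.answers.items():
            if len(got) < self.redundancy:
                return item
        return None

    def submit(self, item, answer):
        self.answers[item].append(answer)

    def results(self):
        """Majority vote per item, once enough answers are in."""
        return {item: Counter(got).most_common(1)[0][0]
                for item, got in self.answers.items()
                if len(got) >= self.redundancy}

# Three volunteers transcribe the same scanned page; agreement wins.
q = MicroTaskQueue(["page-001"])
for transcript in ["13 May 1863", "13 May 1863", "18 May 1863"]:
    q.submit("page-001", transcript)
print(q.results())  # {'page-001': '13 May 1863'}
```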

Five myths about big data


Samuel Arbesman, senior scholar at the Ewing Marion Kauffman Foundation and the author of “The Half-Life of Facts,” in the Washington Post: “Big data holds the promise of harnessing huge amounts of information to help us better understand the world. But when talking about big data, there’s a tendency to fall into hyperbole. It is what compels contrarians to write such tweets as “Big Data, n.: the belief that any sufficiently large pile of s— contains a pony.” Let’s deflate the hype.
1. “Big data” has a clear definition.
The term “big data” has been in circulation since at least the 1990s, when it is believed to have originated in Silicon Valley. IBM offers a seemingly simple definition: Big data is characterized by the four V’s of volume, variety, velocity and veracity. But the term is thrown around so often, in so many contexts — science, marketing, politics, sports — that its meaning has become vague and ambiguous….
2. Big data is new.
By many accounts, big data exploded onto the scene quite recently. “If wonks were fashionistas, big data would be this season’s hot new color,” a Reuters report quipped last year. In a May 2011 report, the McKinsey Global Institute declared big data “the next frontier for innovation, competition, and productivity.”
It’s true that today we can mine massive amounts of data — textual, social, scientific and otherwise — using complex algorithms and computer power. But big data has been around for a long time. It’s just that exhaustive datasets were more exhausting to compile and study in the days when “computer” meant a person who performed calculations….
3. Big data is revolutionary.
In their new book, “Big Data: A Revolution That Will Transform How We Live, Work, and Think,” Viktor Mayer-Schonberger and Kenneth Cukier compare “the current data deluge” to the transformation brought about by the Gutenberg printing press.
If you want more precise advertising directed toward you, then yes, big data is revolutionary. Generally, though, it’s likely to have a modest and gradual impact on our lives….
4. Bigger data is better.
In science, some admittedly mind-blowing big-data analyses are being done. In business, companies are being told to “embrace big data before your competitors do.” But big data is not automatically better.
Really big datasets can be a mess. Unless researchers and analysts can reduce the number of variables and make the data more manageable, they get quantity without a whole lot of quality. Give me some quality medium data over bad big data any day…
5. Big data means the end of scientific theories.
Chris Anderson argued in a 2008 Wired essay that big data renders the scientific method obsolete: Throw enough data at an advanced machine-learning technique, and all the correlations and relationships will simply jump out. We’ll understand everything.
But you can’t just go fishing for correlations and hope they will explain the world. If you’re not careful, you’ll end up with spurious correlations. Even more important, to contend with the “why” of things, we still need ideas, hypotheses and theories. If you don’t have good questions, your results can be silly and meaningless.
Having more data won’t substitute for thinking hard, recognizing anomalies and exploring deep truths.”
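
Arbesman’s warning about fishing for correlations is easy to demonstrate: screen enough random variables against a random target and a seemingly strong correlation will turn up by chance alone. A minimal sketch:

```python
# Screen 200 columns of pure noise against a random target; the
# strongest correlation found looks meaningful but is pure chance.
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars = 100, 200
noise = rng.normal(size=(n_obs, n_vars))  # no real structure at all
target = rng.normal(size=n_obs)

corrs = [np.corrcoef(noise[:, j], target)[0, 1] for j in range(n_vars)]
best = max(corrs, key=abs)
print(f"strongest 'discovered' correlation: r = {best:.2f}")
# With this many comparisons, |r| around 0.3 routinely appears,
# which a naive analysis would happily report as a finding.
```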

OpenCounter


Code for America: “OpenCounter’s mission is to empower entrepreneurs and foster local economic development by simplifying the process of registering a business.
Economic development happens in many forms, from projects like the revitalization of the Brooklyn Navy Yard or Hudson Rail Yards in New York City, to campaigns to encourage residents to shop at local merchants. While the majority of headlines will focus on a city’s effort to secure a major new employer (think Apple’s 1,000,000 square foot expansion in Austin, Texas), most economic development and job creation happens on a much smaller scale, as individuals stake their financial futures on creating a new product, store, service or firm.
But these new businesses aren’t in a position to accept tax breaks on capital equipment or enter into complex development and disposition agreements to build new offices or stores. Many new businesses can’t even meet the underwriting criteria of SBA-backed revolving-loan programs. Competition for local grants for facade improvements or signage assistance can be fierce….
Despite many cities’ genuine efforts to be “business-friendly,” their default user interface consists of fluorescent-lit Formica, waiting lines, and stacks of forms. Online resources often remind one of a phone book, with little interactivity or specialization based on either the business’s function or location within a jurisdiction.
That’s why we built OpenCounter….See what we’re up to at opencounter.us or visit a live version of our software at http://opencounter.cityofsantacruz.com.”
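
OpenCounter’s core interaction, asking an applicant a few questions and returning only the permits and fees that apply, can be sketched as a simple rules lookup. The rules and fees below are invented for illustration; they are not Santa Cruz’s.

```python
# Sketch of the interaction OpenCounter streamlines: given a business
# type and location, return only the permits and fees that apply.
# The rules table is invented for illustration, not Santa Cruz's.
RULES = [
    {"permit": "Business license", "fee": 150,
     "applies": lambda b: True},
    {"permit": "Health permit", "fee": 400,
     "applies": lambda b: b["type"] == "restaurant"},
    {"permit": "Sidewalk sign permit", "fee": 75,
     "applies": lambda b: b["zone"] == "downtown"},
]

def requirements(business):
    """Return (permit, fee) pairs applicable to this business."""
    return [(r["permit"], r["fee"]) for r in RULES if r["applies"](business)]

print(requirements({"type": "restaurant", "zone": "downtown"}))
# [('Business license', 150), ('Health permit', 400), ('Sidewalk sign permit', 75)]
```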

Defense Against National Vulnerabilities in Public Data


DOD/DARPA Notice (See also Foreign Policy article): “OBJECTIVE: Investigate the national security threat posed by public data available either for purchase or through open sources. Based on principles of data science, develop tools to characterize and assess the nature, persistence, and quality of the data. Develop tools for the rapid anonymization and de-anonymization of data sources. Develop framework and tools to measure the national security impact of public data and to defend against the malicious use of public data against national interests.
DESCRIPTION: The vulnerabilities to individuals from a data compromise are well known and documented now as “identity theft.” These include regular stories published in the news and research journals documenting the loss of personally identifiable information by corporations and governments around the world. Current trends in social media and commerce, with voluntary disclosure of personal information, create other potential vulnerabilities for individuals participating heavily in the digital world. The Netflix Challenge in 2009 was launched with the goal of creating better customer pick prediction algorithms for the movie service [1]. An unintended consequence of the Netflix Challenge was the discovery that it was possible to de-anonymize the entire contest data set with very little additional data. This de-anonymization led to a federal lawsuit and the cancellation of the sequel challenge [2]. The purpose of this topic is to understand the national level vulnerabilities that may be exploited through the use of public data available in the open or for purchase.
Could a modestly funded group deliver nation-state type effects using only public data?…”
The official link for this solicitation is: www.acq.osd.mil/osbp/sbir/solicitations/sbir20133.
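
The Netflix episode cited in the solicitation rested on a simple mechanism: in sparse datasets, a handful of publicly known (item, rating) pairs can uniquely match an “anonymous” record. The toy linkage scoring below illustrates the idea; it is not the published Narayanan–Shmatikov algorithm or anything from the DARPA solicitation.

```python
# Toy linkage attack in the spirit of the Netflix de-anonymization:
# score each "anonymous" record against auxiliary public data and
# flag the uniquely best match. Data and scoring are illustrative.

def similarity(anon_record, public_record):
    """Fraction of the public (item, rating) pairs also present,
    with the same rating, in the anonymized record."""
    shared = sum(1 for item, rating in public_record.items()
                 if anon_record.get(item) == rating)
    return shared / len(public_record)

anonymized = {
    "user_483": {"Movie A": 5, "Movie B": 2, "Movie C": 4},
    "user_791": {"Movie A": 1, "Movie D": 3},
}
# Auxiliary data: ratings the target posted publicly elsewhere.
public = {"Movie A": 5, "Movie C": 4}

scores = {uid: similarity(rec, public) for uid, rec in anonymized.items()}
best = max(scores, key=scores.get)
print(best, scores)  # user_483 scores 1.0: the "anonymous" row is re-identified
```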
 

Data is Inert — It’s What You Do With It That Counts


Kevin Merritt, CEO and Founder, Socrata, in NextGov: “In its infancy, the open data movement was mostly about offering catalogs of government data online that concerned citizens and civic activists could download. But now, a wide variety of external stakeholders are using open data to deliver new applications and services. At the same time, governments themselves are harnessing open data to drive better decision-making.
In a relatively short period of time, open data has evolved from serving as fodder for data publishing to fuel for open innovation.
One of the keys to making this transformation truly work, however, is our ability to re-instrument or re-tool underlying business systems and processes so managers can receive open data in consumable forms on a regular, continuous basis in real-time….”
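
Socrata’s SODA API is one concrete example of the “consumable, continuous” delivery Merritt describes: published datasets become JSON endpoints that downstream systems can poll. A minimal sketch follows; the portal host and dataset ID are placeholders, and filtering on the :updated_at system field assumes a dataset version that supports it.

```python
# Minimal sketch of consuming an open dataset as a continuous feed
# via Socrata's SODA API. Host and dataset ID are placeholders;
# Socrata portals expose datasets at /resource/<dataset-id>.json.
import requests

ENDPOINT = "https://data.example.gov/resource/abcd-1234.json"  # hypothetical

def rows_updated_since(since_iso, limit=100):
    """Fetch rows updated after a timestamp, newest first.
    Filtering on the :updated_at system field is assumed to be
    supported by this dataset's API version."""
    params = {
        "$where": f":updated_at > '{since_iso}'",
        "$order": ":updated_at DESC",
        "$limit": limit,
    }
    resp = requests.get(ENDPOINT, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()

for row in rows_updated_since("2013-06-01T00:00:00"):
    print(row)
```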

I Flirt and Tweet. Follow Me at #Socialbot.


In The New York Times: “From the earliest days of the Internet, robotic programs, or bots, have been trying to pass themselves off as human. Chatbots greet users when they enter an online chat room, for example, or kick them out when they get obnoxious….

Now come socialbots. These automated charlatans are programmed to tweet and retweet. They have quirks, life histories and the gift of gab. Many of them have built-in databases of current events, so they can piece together phrases that seem relevant to their target audience. They have sleep-wake cycles so their fakery is more convincing, making them less prone to repetitive patterns that flag them as mere programs. Some have even been souped up by so-called persona management software, which makes them seem more real by adding matching Facebook, Reddit or Foursquare accounts, giving them an online footprint over time as they amass friends and like-minded followers.
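
The “sleep-wake cycle” trick mentioned above is simple to illustrate: rather than posting at machine-regular intervals, a bot confines activity to plausible waking hours and jitters the gaps. A toy sketch, not any real bot’s code:

```python
# Toy sketch of a "sleep-wake cycle": post only during plausible
# waking hours, with jittered gaps, so activity lacks the
# metronomic regularity that flags simple bots.
import random

def next_post_delay_minutes(hour_of_day):
    """Minutes to wait before the next post, or None while 'asleep'."""
    if hour_of_day < 7 or hour_of_day >= 23:     # overnight: silent
        return None
    mean_gap = 45 if 9 <= hour_of_day <= 18 else 90  # chattier by day
    return random.expovariate(1 / mean_gap)          # jittered, not periodic

random.seed(1)
for hour in (3, 10, 21):
    print(hour, next_post_delay_minutes(hour))
```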

Researchers say this new breed of bots is being designed not just with greater sophistication but also with grander goals: to sway elections, to influence the stock market, to attack governments, even to flirt with people and one another.

…Socialbots are tapping into an ever-expanding universe of social media. Last year, the number of Twitter accounts topped 500 million. Some researchers estimate that only 35 percent of the average Twitter user’s followers are real people. In fact, more than half of Internet traffic already comes from nonhuman sources like bots or other types of algorithms. Within two years, about 10 percent of the activity occurring on online social networks will be masquerading bots, according to technology researchers….

Much of social media remains unregulated by campaign finance and transparency laws. So far, the Federal Election Commission has been reluctant to venture into this realm.

But the bots are likely to venture into ours, said Tim Hwang, chief scientist at the Pacific Social Architecting Corporation, which creates bots and technologies that can shape social behavior. “Our vision is that in the near future automatons will eventually be able to rally crowds, open up bank accounts, write letters,” he said, “all through human surrogates.”

The Shame Game: U.S. Department of Labor Smartphone App Will Allow Public to Effortlessly Scrutinize Business Employment Practices


Charles B. Palmer in National Law Review: “The United States Department of Labor (DOL) recently launched a contest to find a new smartphone app that will allow the general public to effortlessly search for and scrutinize businesses and employers that have faced DOL citations. Dubbed the DOL Fair Labor Data Challenge, the contest seeks app entries that integrate information from consumer ratings websites, location tracking services, DOL Wage & Hour Division (WHD) citation data, and Occupational Safety & Health Administration (OSHA) citation data, into one software platform. The contest also encourages app developers to include other features in their respective app entries, such as information from state health boards and various licensing agencies.
The DOL Fair Labor Data Challenge is part of the DOL’s plan to amplify its enforcement efforts through increased public awareness and ease of access to citation data. Consumers and job applicants will soon be able to search for and publicly shame employers that hold one or more citations in the DOL database, all by just using their smartphones.”
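
At its core, the contest’s data task, combining WHD and OSHA citations into one employer profile, is a record-matching problem. A simplified sketch, with invented field names and records:

```python
# Simplified sketch of the challenge's data-integration core: merge
# WHD and OSHA citations under a normalized employer name.
# Records and field names are invented for illustration.
from collections import defaultdict

def norm(name):
    """Crude employer-name normalization for matching."""
    return " ".join(name.lower().replace(",", "").replace(".", "").split())

whd = [{"employer": "Acme Diner, Inc.", "violation": "overtime", "year": 2012}]
osha = [{"employer": "ACME DINER INC", "violation": "fall hazard", "year": 2013}]

profiles = defaultdict(list)
for record in whd:
    profiles[norm(record["employer"])].append(("WHD", record))
for record in osha:
    profiles[norm(record["employer"])].append(("OSHA", record))

for employer, citations in profiles.items():
    print(employer, "->", [(src, r["violation"]) for src, r in citations])
# acme diner inc -> [('WHD', 'overtime'), ('OSHA', 'fall hazard')]
```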

For OpenBlock, Big Improvements From Small Newsrooms


At Ideas Lab: “A little more than five months after NBC News shut down its hyperlocal product, EveryBlock.com, the original open-source application has been resurrected in Columbia, Mo. But although both products were born of the same Django codebase and Knight Foundation funding, visitors to The Columbia Daily Tribune’s new Neighborhoods site will see a different emphasis and a new hope for a project that has slowed under the weight of government data problems and technical complexity….The user interface is clean and smart, and the government data — which is the most difficult of any kind of data to mine — appears to be more current and complete than any OpenBlock installation has seen since the very early days before its code was made public.
That kind of commitment is needed for OpenBlock to succeed, because pulling digital records out of all but the very most efficient and transparent government agencies is a tremendous drag on the expense side of the news business. That difficulty, though, can also create an opportunity for outsized revenue.
Chris Gubbels, the Web developer who’s been overseeing the project for The Tribune, said that unlike many jurisdictions, Columbia’s police and fire data were “pretty simple” to pull into OpenBlock. The police even provided The Tribune with an RSS feed of geocoded 911 response calls.”
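
A geocoded-incident RSS feed like the one Columbia’s police provided is among the easiest civic data sources to ingest. The sketch below uses the feedparser library; the feed URL is a placeholder, and the geo:lat/geo:long elements are one common convention, not necessarily what Columbia’s feed used.

```python
# Sketch of ingesting a geocoded 911-call RSS feed. The URL is a
# placeholder; geo:lat / geo:long elements (exposed by feedparser
# as geo_lat / geo_long) are one common convention for coordinates.
import feedparser

FEED_URL = "https://example.gov/police/911-calls.rss"  # hypothetical

def fetch_incidents(url):
    feed = feedparser.parse(url)
    incidents = []
    for entry in feed.entries:
        incidents.append({
            "title": entry.get("title", ""),
            "when": entry.get("published", ""),
            "lat": float(entry["geo_lat"]) if "geo_lat" in entry else None,
            "lon": float(entry["geo_long"]) if "geo_long" in entry else None,
        })
    return incidents

for incident in fetch_incidents(FEED_URL):
    print(incident)
```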