Open for Business: How Open Data Can Help Achieve the G20 Growth Target


New Report commissioned by Omidyar Network on the Business Case for Open Data: “Economic analysis has confirmed the significant contribution to economic growth and productivity achievable through an open data agenda. Governments, the private sector, individuals and communities all stand to benefit from the innovation and information that will inform investment, drive the creation of new industries, and inform decision making and research. To mark a step change in the way valuable information is created and reused, the G20 should release information as open data.
In May 2014, Omidyar Network commissioned Lateral Economics to undertake economic analysis on the potential of open data to support the G20’s 2% growth target and illustrate how an open data agenda can make a significant contribution to economic growth and productivity. Combining all G20 economies, output could increase by USD 13 trillion cumulatively over the next five years. Implementation of open data policies would thus boost cumulative G20 GDP by around 1.1 percentage points over five years, delivering almost 55% of the G20’s 2% growth target.
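How those figures fit together can be checked with back-of-the-envelope arithmetic; a minimal Python sketch using only the numbers quoted above (nothing else is assumed):

```python
# Sanity-check of the quoted headline figures.
cumulative_boost_usd_tn = 13.0   # extra G20 output over five years, USD trillion
boost_pct_points = 1.1           # boost to cumulative G20 GDP, percentage points
target_pct_points = 2.0          # the G20's growth target, percentage points

share_of_target = boost_pct_points / target_pct_points
print(f"Share of the 2% target met by open data: {share_of_target:.0%}")  # 55%

avg_per_year_usd_tn = cumulative_boost_usd_tn / 5
print(f"Average extra output per year: USD {avg_per_year_usd_tn:.1f} trillion")  # 2.6
```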
Recommendations
Importantly, open data cuts across a number of this year’s G20 priorities: attracting private infrastructure investment, creating jobs and lifting participation, strengthening tax systems and fighting corruption. This memo suggests an open data thread that runs across all G20 priorities. The more data is opened, the more it can be used, reused, repurposed and built on—in combination with other data—for everyone’s benefit.
We call on G20 economies to sign up to the Open Data Charter.
The G20 should ensure that data released by G20 working groups and themes is in line with agreed open data standards. This will lead to more accountable, efficient and effective governments that go further in exposing inadequacy, fighting corruption and spurring innovation.
Data is a national resource, and open data is a ‘win-win’ policy: it makes more of existing resources. We know that the cost of opening data is smaller than the economic returns, which could be significant. Privacy concerns must be respected and addressed. If this is done, then as the share of information opened by the public and private sectors grows, the positive returns will increase.
The G20 opportunity
This November, leaders of the G20 Member States will meet in Australia to drive forward commitments made in the St Petersburg G20 Leaders Declaration last September and to make firm progress on stimulating growth. Actions across the G20 will include increasing investment, lifting employment and participation, enhancing trade and promoting competition.
The resulting ‘Brisbane Action Plan’ will encapsulate all of these commitments with the aim of raising the level of G20 output by at least 2% above the currently projected level over the next five years. There are major opportunities for cooperative and collective action by G20 governments.
Governments should intensify the release of existing public sector data – both government and publicly funded research data. But much more can be done to promote open data than simply releasing more government data. In appropriate circumstances, governments can mandate public disclosure of private sector data (e.g. in corporate financial reporting).
Recommendations for action

  • G20 governments should adopt the principles of the Open Data Charter to encourage the building of stronger, more interconnected societies that better meet the needs of our citizens and allow innovation and prosperity to flourish.
  • G20 governments should adopt specific open data targets under each G20 theme, as illustrated below, such as releasing open data related to beneficial owners of companies as well as revenues from extractive industries.
  • G20 governments should consider harmonizing licensing regimes across the G20.
  • G20 governments should adopt metrics for measuring the quantity and quality of open data publication, e.g. using the Open Data Institute’s Open Data Certificates as a bottom-up mechanism for driving the adoption of common standards.

Illustrative G20 examples
Fiscal and monetary policy
Governments possess rich real-time data that is neither open nor accessed by government macro-economic managers. G20 governments should:

  • Open up models that lie behind economic forecasts and help assess alternative policy settings;
  • Publish spending and contract data to enable governments to comparison-shop among their suppliers.

Anti-corruption
Open data may directly contribute to reduced corruption by increasing the likelihood that corruption will be detected. G20 governments should:

  • Release open data related to beneficial owners of companies as well as revenues from extractive industries;
  • Collaborate on harmonised technical standards that permit the tracing of international money flows – including the tracing of beneficial owners of commercial entities, and the comparison and reconciliation of transactions across borders.

Trade
Obtaining and using trade data from multiple jurisdictions is difficult. Access fees, jurisdiction-specific licenses, and non-machine-readable formats all impose large transaction costs. G20 governments should:

  • Harmonise open data policies related to trade data.
  • Use standard trade schema and formats.

Employment
Higher quality information on employment conditions would facilitate better matching of employees to organizations, producing greater job-satisfaction and improved productivity. G20 governments should:

  • Open up centralised job vacancy registers to provide new mechanisms for people to find jobs.
  • Provide open statistical information about the demand for skills in particular areas to help those supporting training and education to hone their offerings.

Energy
Open data will help reduce the cost of energy supply and improve energy efficiency. G20 governments should:

  • Provide incentives for energy companies to publish open data from consumers and suppliers to enable cost savings through optimizing energy plans.
  • Release energy performance certifications for buildings.
  • Publish real-time energy consumption for government buildings.

Infrastructure
Information on current infrastructure assets is fragmented and inefficient to use. Exposing that asset data would be a significant first step toward understanding gaps and generating new insights. G20 governments should:

  • Publish open data on governments’ infrastructure assets and plans, to better understand infrastructure gaps, enable greater efficiency and insight in infrastructure development and use, and support cost-benefit analysis.
  • Publish open infrastructure data, including contracts via Open Contracting Partnership, in a consistent and harmonised way across G20 countries…”

App pays commuters to take routes that ease congestion


Springwise: “Congestion at peak hours is a major problem in the world’s busiest city centres. We’ve recently seen Gothenburg in Sweden offering free bicycles to ease the burden on public transport services, but now a new app is looking to take a different approach to the same problem. Urban Engines uses algorithms to help cities determine key congestion choke points and times, and can then reward commuters for avoiding them.
The Urban Engines system is based on the smart commuter cards already found in many major cities. The company tracks journeys made with those cards and uses the data to identify the main areas of congestion and the times at which it occurs. The system has already been deployed in Washington, D.C., and São Paulo, Brazil, providing valuable data for work with city planners.
It’s in Singapore, however, where the most interesting work has been achieved so far. There, commuters who have signed up and registered their commuter cards can earn rewards when they travel. They will earn one point for every kilometre travelled during peak hours, or triple that when travelling off-peak. The points earned can then be converted into discounts on future journeys, or put towards an in-app raffle game, where they have the opportunity to win sums of money. Urban Engines claims there’s been a 7 to 13 percent reduction in journeys made during peak hours, with 200,000 commuters taking part.
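The incentive arithmetic is simple enough to sketch in code; a minimal illustration of the scheme as described (the function name and trip format are invented for illustration, not Urban Engines’ actual system):

```python
# Points scheme as described: 1 point per km at peak, triple that off-peak.
PEAK_RATE = 1      # points per km during peak hours
OFF_PEAK_RATE = 3  # points per km off-peak ("triple that")

def trip_points(distance_km: float, is_peak: bool) -> float:
    """Points earned for a single journey."""
    return distance_km * (PEAK_RATE if is_peak else OFF_PEAK_RATE)

# A commuter who shifts a daily 10 km trip off-peak triples their points:
print(trip_points(10, is_peak=True))   # 10
print(trip_points(10, is_peak=False))  # 30
```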
The company grew out of an experiment originally carried out in Bangalore. The rewards program there, run among 20,000 employees of the Indian company Infosys, led to 17 percent of traffic shifting to off-peak travel times within six months. A similarly successful experiment has also been carried out on the Stanford University campus, and the plan is now to expand to other major cities…”

Poetica


At TechCrunch: “The ability to collaborate on the draft of a document is actually fiendishly tedious online. Many people might be used to Microsoft Word ‘Track Changes’ (ugh) despite the fact it looks awful and takes some getting used to. Nor does Google Docs really create a collaboration experience that mere mortals can get into. Step in Poetica, a brand new startup co-founded by Blaine Cook, formerly Twitter’s founding lead engineer.
Cook has now raised an angel round of funding for the London-based company which is hoping to change how teams create, share and edit work on the web, across any devices and mediums.
Poetica, which opens its doors to new signups today, is a browser-based editor and Chrome extension that portrays a more traditional view of text collaboration – in the same way you might see someone scribble on a piece of paper….
Cook says the goal is to “bring rich collaboration tools based on cutting-edge technology and design to everyone” who wants to communicate online. In other words, they are going for a fairly big play here. And he reckons he can do it from London, over the Valley, where he worked at Twitter: “London has an incredible community of brilliant software engineers and designers, and a growing and supportive investor base.”

How collective intelligence emerges: knowledge creation process in Wikipedia from microscopic viewpoint


Kyungho Lee for the 2014 International Working Conference on Advanced Visual Interfaces: “Wikipedia, one of the richest human knowledge repositories on the Internet, has been developed by collective intelligence. To gain insight into Wikipedia, one may ask how initial ideas emerge and develop into a concrete article through the online collaborative process. Led by this question, the author performed a microscopic observation of the knowledge creation process for the recent article, “Fukushima Daiichi nuclear disaster.” The author not only collected the revision history of the article but also investigated interactions between collaborators, constructing a user-paragraph network to reveal the intellectual interventions of multiple authors. The knowledge creation process on the Wikipedia article was categorized into 4 major steps and 6 phases, from the beginning to the intellectual balance point where only revisions were made. To represent this phenomenon, the author developed a visaphor (digital visual metaphor) to digitally represent the article’s evolving concepts and characteristics. The author then created a dynamic digital information visualization using particle effects and network graph structures. The visaphor reveals the interaction between users and their collaborative efforts as they created and revised paragraphs and debated aspects of the article.”
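The user-paragraph network at the heart of the study is straightforward to prototype; a minimal Python sketch using networkx, with invented revision records (the paper’s actual data and pipeline are not reproduced here):

```python
# Build a bipartite user-paragraph network from simplified revision records.
import networkx as nx

revisions = [
    {"user": "alice", "paragraph": "p1"},
    {"user": "bob",   "paragraph": "p1"},
    {"user": "alice", "paragraph": "p2"},
    {"user": "carol", "paragraph": "p3"},
]

G = nx.Graph()
for rev in revisions:
    G.add_node(rev["user"], kind="user")
    G.add_node(rev["paragraph"], kind="paragraph")
    # An edge records an intellectual intervention: this user edited this paragraph.
    G.add_edge(rev["user"], rev["paragraph"])

# Paragraphs touched by several users are sites of collaboration or debate:
contested = [n for n, d in G.nodes(data=True)
             if d["kind"] == "paragraph" and G.degree(n) > 1]
print(contested)  # ['p1']
```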

Crowdsourcing moving beyond the fringe


Bob Brown in Network World: “Depending upon how you look at it, crowdsourcing is all the rage these days — think Wikipedia, X Prize and Kickstarter — or, at the other extreme, greatly underused.
To the team behind the new “insight network” Yegii, crowdsourcing has not nearly reached its potential, despite having roots as far back as the early 1700s and the famous case of the British Government seeking a solution to “The Longitude Problem” to make sailing less life-threatening. (I get the impression that mention of this example is obligatory at any crowdsourcing event.)
This angel-funded startup, headed by an MIT Sloan School of Management senior lecturer and operating from a Boston suburb, is looking to exploit crowdsourcing’s potential through a service that connects financial, healthcare, technology and other organizations seeking knowledge with experts who can provide it – and fairly fast. To CEO Trond Undheim, crowdsourcing is “no longer for fringe freelance work,” and the goal is to get more organizations and smart individuals involved.
“Yegii is essentially a network of networks, connecting people, organizations, and knowledge in new ways,” says Undheim, who explains that the name Yegii is Korean for “talk” or “discussion”. “Our focus is laser sharp: we only rank and rate knowledge that says something essential about what I see as the four forces of industry disruption: technology, policy, user dynamics and business models.  We tackle challenging business issues across domains, from life sciences to energy to finance.  The point is that today’s industry classification is falling apart. We need more specific insight than in-house strategizing or generalist consulting advice.”
Undheim attempted to drum up interest in the new business last week at an event at Babson College during which a handful of crowdsourcing experts spoke. Harvard Business School adjunct professor Alan MacCormack discussed the X Prize, Netflix Prize and other examples of spurring competition through crowdsourcing. MIT’s Peter Gloor extolled the virtue of collaborative and smart swarms of people vs. stupid crowds (such as football hooligans). A couple of advertising/marketing execs shared stories of how clients and other brands are increasingly tapping into their customer base and the general public for new ideas from slogans to products, figuring that potential new customers are more likely to trust their peers than corporate ads. Another speaker dove into more details about how to run a crowdsourcing challenge, which includes identifying motivation that goes beyond money.
All of this was to frame Yegii’s crowdsourcing plan, which is at the beta stage with about a dozen clients (including Akamai and Santander bank) and is slated for full launch later this year. Yegii’s team consists of five part-timers, plus a few interns, who are building a web-based platform of “knowledge assets”: market research, news reports and datasets from free and paid sources. That content — on topics that range from Bitcoin’s impact on banks to telecom bandwidth costs — is reviewed and ranked through a combination of machine learning and human peers. Information seekers would pay Yegii up to hundreds of dollars per month, or up to tens of thousands of dollars per project, and multidisciplinary teams would then accept the challenge of answering their questions via customized reports within staged deadlines.
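What “reviewed and ranked through a combination of machine learning and human peers” might look like in practice can be sketched; a hedged illustration in which a model score and peer ratings are blended with an assumed weighting (the fields, weights and scores are invented; Yegii’s actual method is not public):

```python
# Blend a machine-learned relevance score with human peer ratings.
from dataclasses import dataclass

@dataclass
class KnowledgeAsset:
    title: str
    model_score: float   # relevance estimated by a trained model, 0..1
    peer_rating: float   # mean rating from expert reviewers, 0..1

def blended_score(asset: KnowledgeAsset, model_weight: float = 0.5) -> float:
    """Equal weighting is an assumption, not Yegii's formula."""
    return model_weight * asset.model_score + (1 - model_weight) * asset.peer_rating

assets = [
    KnowledgeAsset("Bitcoin's impact on banks", model_score=0.82, peer_rating=0.90),
    KnowledgeAsset("Telecom bandwidth costs", model_score=0.74, peer_rating=0.60),
]
for a in sorted(assets, key=blended_score, reverse=True):
    print(f"{blended_score(a):.2f}  {a.title}")
```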
“We are focused on building partnerships with other expert networks and associations that have access to smart people with spare capacity, wherever they are,” Undheim says.
One reason organizations can benefit from crowdsourcing, Undheim says, is because of the “ephemeral nature of expertise in today’s society.” In other words, people within your organization might think of themselves as experts in this or that, but when they really think about it, they might realize their level of expertise has faded. Yegii will strive to narrow down the best sources of information for those looking to come up to speed on a subject over a weekend, whereas hunting for that information across a vast search engine would not be nearly as efficient….”

Lawsuit Would Force IRS to Release Nonprofit Tax Forms Digitally


Suzanne Perry at the Chronicle of Philanthropy on how “Open Data Could Shine a Light on Pay and Lobbying”: “Nonprofits that want to find out what their peers are doing can find a wealth of information in the forms the groups must file each year with the Internal Revenue Service—how much they pay their chief executives, how much they spend on fundraising, who is on their boards, where they offer services.
But the way the IRS makes those data available harkens back to the digital dark ages, and critics who want to overhaul the system have been shaking up the generally polite nonprofit world with legal challenges, charges of monopoly, and talk of “disrupting” the status quo.
The issue will take center stage in a courtroom this week when a federal district judge in San Francisco is scheduled to consider arguments about whether to approve the IRS’s move to dismiss a lawsuit filed by an open-records group.
The group wants to obtain specific Form 990s, the informational tax documents filed by nonprofits, in a format that can be read by computers.
In theory, that shouldn’t be difficult, since the nine nonprofits involved—including the American National Standards Institute, the New Horizons Foundation, and the International Code Council—submitted the forms electronically. But the IRS converts all 990s, no matter how they were filed, into images, rendering them useless for digital operations like searching multiple forms for information.
That means watchdog groups and those that provide information on charities, like Charity Navigator, GuideStar, and the Urban Institute, have to spend money to manually enter the data they get from the IRS before making it available to the public, even if it has previously been digitized.
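The format problem is easy to make concrete: an e-filed 990 is structured data that a few lines of code can query, whereas an image of the same form has to be re-keyed by hand or run through OCR. A minimal sketch (the XML below is a simplified stand-in, not the actual IRS e-file schema):

```python
# Querying a structured (e-filed) 990 takes a few lines; an image does not.
import xml.etree.ElementTree as ET

form_990 = """
<Return>
  <Filer><Name>Example Nonprofit</Name></Filer>
  <OfficerCompensation>185000</OfficerCompensation>
  <FundraisingExpenses>42000</FundraisingExpenses>
</Return>
"""

root = ET.fromstring(form_990)
print(root.findtext("Filer/Name"))           # Example Nonprofit
print(root.findtext("OfficerCompensation"))  # 185000
```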
The lawsuit against the IRS, filed by Public.Resource.Org, aims to end that practice.
Carl Malamud, who heads the group, is a longtime activist who successfully pushed the Securities and Exchange Commission to post corporate filings free online in the 1990s, among other projects.
He wants to do the same with the IRS, arguing that data about a sector that represents more than 1.5 million tax-exempt organizations and more than $1.5-trillion in revenue should be readily available at no cost…”

Putting Open Data to Work for Communities


Report by Kathryn L.S. Pettit, Leah Hendey, Brianna Losoya, and G. Thomas Kingsley at the Urban Institute: “The National Neighborhood Indicators Partnership (NNIP) is a network of local organizations that collect, organize, and use neighborhood data to tackle issues in their communities. As the movement for government transparency has spread at the local level, more NNIP partners are participating in the call for governments to release data and are using open data to provide information for decisionmaking and community engagement. Local NNIP partners and open data advocates have complementary strengths and should work together to more effectively advance open government data that benefits all residents.”

A New Way to Look at Law, With Data Viz and Machine Learning


In Wired:

Ravel displays search results as an interactive visualization. Image: Ravel
“On TV, being a lawyer is all about dazzling jurors with verbal pyrotechnics. But for many lawyers–especially young ones–the job is about research. Long, dry, tedious research.
It’s that less glamorous side of the profession that Daniel Lewis and Nik Reed are trying to upend with Ravel. Using data visualization, language analysis, and machine learning, the Stanford Law grads are aiming to reinvent legal research–and perhaps give young lawyers a deeper understanding of their field in the process.
Lawyers have long relied on subscription services like LexisNexis and WestLaw to do their jobs. These services offer indispensable access to vast databases of case documents. Lewis remembers seeing the software on the computers at his dad’s law firm when he used to hang out there as a kid. You’d put in a keyword, say “securities fraud,” and get back a long, rank-ordered list of results relevant to that topic.
Years later, when Lewis was embarking on his own legal career as a first year at Stanford Law, he was struck by how little had changed. “The tools and technologies were the same,” he says. “It was surprising and disconcerting.” Reed, his classmate there, was also perplexed, especially having spent some time in the finance industry working with its high-powered tools. “There was all this cool stuff that everyone else was using in every other field, and it just wasn’t coming to lawyers,” he says.

Early users have reported that Ravel cut their overall research time by up to two thirds….

Ravel’s most ambitious features, however, are intended to help with the analysis of cases. These tools, reserved for premium subscribers, are designed to automatically surface the key passages in whatever case you happen to be looking at, sussing out instances when they’ve been cited or reinterpreted in cases that followed.
To do this, Ravel effectively has to map the law, an undertaking that involves both human insight and technical firepower. The process, roughly: Lewis and Reed will look at a particular case, pinpoint the case it’s referencing, and then figure out what ties them together. It could be a direct reference, or a glancing one. It might show up as three paragraphs in that later ruling, or just a sentence.
Once those connections have been made, they’re handed off to Ravel’s engineers. The engineers, who make up more than half of the company’s ten-person team, are tasked with building models that can identify the same sorts of linkages in other cases, using natural language processing. In effect, Ravel is trying to uncover the subtle linguistic patterns undergirding decades of legal rulings.
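A rough sketch of the kind of supervised pipeline this describes: human-labeled passages train a text classifier that can then flag similar linkages in new opinions. TF-IDF with logistic regression is a stand-in chosen for illustration; Ravel’s actual models are not public, and the passages and labels below are invented:

```python
# Train a toy classifier to spot passages that engage with a cited case.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

passages = [
    "We decline to extend the holding of Smith v. Jones to these facts.",
    "As the court held in Smith v. Jones, the statute requires notice.",
    "The weather on the day of the incident was clear.",
    "Smith v. Jones is distinguishable because no contract existed here.",
]
labels = [1, 1, 0, 1]  # 1 = passage engages with a cited case, 0 = it does not

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(passages, labels)

new_passage = "Under Smith v. Jones, the agency's interpretation controls."
print(model.predict([new_passage]))  # expected: [1]
```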
That all goes well beyond visual search, and the idea of future generations of lawyers learning from an algorithmic analysis of the law seems quietly dangerous in its own way (though a sterling conceit for a near-future short story!).
Still, compared with the primitive tools that still dominate the field today, Lewis and Reed see Ravel as a promising resource for young lawyers and law students. “It’s about helping them research more confidently,” Lewis says. “It’s about making sure they understand the story in the right way.” And, of course, about making all that research a little less tedious, too.”

Big Data, My Data


Jane Sarasohn-Kahn at iHealthBeat: “The routine operation of modern health care systems produces an abundance of electronically stored data on an ongoing basis,” Sebastian Schneeweiss writes in a recent New England Journal of Medicine Perspective.
Is this abundance of data a treasure trove for improving patient care and growing knowledge about effective treatments? Is that data trove a Pandora’s black box that can be mined by obscure third parties to benefit for-profit companies without rewarding those whose data are said to be the new currency of the economy? That is, patients themselves?
In this emerging world of data analytics in health care, there’s Big Data and there’s My Data (“small data”). Who most benefits from the use of My Data may not actually be the consumer.
Big focus on Big Data. Several reports published in the first half of 2014 talk about the promise and perils of Big Data in health care. The Federal Trade Commission’s study, titled “Data Brokers: A Call for Transparency and Accountability,” analyzed the business practices of nine “data brokers,” companies that buy and sell consumers’ personal information from a broad array of sources. Data brokers sell consumers’ information to buyers looking to use those data for marketing, managing financial risk or identifying people. There are health implications in all of these activities, and the use of such data generally is not covered by HIPAA. The report discusses the example of a data segment called “Smoker in Household,” which a company selling a new air filter for the home could use to target-market to an individual who might seek such a product. On the downside, without the consumers’ knowledge, the information could be used by a financial services company to identify the consumer as a bad health insurance risk.
“Big Data and Privacy: A Technological Perspective,” a report from the President’s Office of Science and Technology Policy, considers the growth of Big Data’s role in helping inform new ways to treat diseases and presents two scenarios of the “near future” of health care. The first, on personalized medicine, recognizes that not all patients are alike or respond identically to treatments. Data collected from a large number of similar patients (such as digital images, genomic information and granular responses to clinical trials) can be mined to develop a treatment with an optimal outcome for the patients. In this case, patients may have provided their data based on the promise of anonymity but would like to be informed if a useful treatment has been found. In the second scenario, detecting symptoms via mobile devices, people wishing to detect early signs of Alzheimer’s Disease in themselves use a mobile device connecting to a personal coach in the Internet cloud that supports and records activities of daily living: say, gait when walking, notes on conversations and physical navigation instructions. For both of these scenarios, the authors ask, “Can the information about individuals’ health be sold, without additional consent, to third parties? What if this is a stated condition of use of the app? Should information go to the individual’s personal physicians with their initial consent but not a subsequent confirmation?”
The World Privacy Foundation’s report, titled “The Scoring of America: How Secret Consumer Scores Threaten Your Privacy and Your Future,” describes the growing market for developing indices on consumer behavior, identifying over a dozen health-related scores. Health scores include the Affordable Care Act Individual Health Risk Score, the FICO Medication Adherence Score, various frailty scores, personal health scores (from WebMD and OneHealth, whose default sharing setting is based on the user’s sharing setting with the RunKeeper mobile health app), Medicaid Resource Utilization Group Scores, the SF-36 survey on physical and mental health and complexity scores (such as the Aristotle score for congenital heart surgery). WPF presents a history of consumer scoring beginning with the FICO score for personal creditworthiness and recommends regulatory scrutiny on the new consumer scores for fairness, transparency and accessibility to consumers.
At the same time these three reports went to press, scores of news stories emerged discussing the Big Opportunities Big Data present. The June issue of CFO Magazine published a piece called “Big Data: Where the Money Is.” InformationWeek published “Health Care Dives Into Big Data,” Motley Fool wrote about “Big Data’s Big Future in Health Care” and WIRED called “Cloud Computing, Big Data and Health Care” the “trifecta.”
Well-timed on June 5, the Office of the National Coordinator for Health IT’s Roadmap for Interoperability was detailed in a white paper, titled “Connecting Health and Care for the Nation: A 10-Year Vision to Achieve an Interoperable Health IT Infrastructure.” The document envisions the long view for the U.S. health IT ecosystem enabling people to share and access health information, ensuring quality and safety in care delivery, managing population health, and leveraging Big Data and analytics. Notably, “Building Block #3” in this vision is ensuring privacy and security protections for health information. ONC will “support developers creating health tools for consumers to encourage responsible privacy and security practices and greater transparency about how they use personal health information.” Looking forward, ONC notes the need for “scaling trust across communities.”
Consumer trust: going, going, gone? In the stakeholder community of U.S. consumers, there is declining trust between people and the companies and government agencies with whom people deal. Only 47% of U.S. adults trust companies with whom they regularly do business to keep their personal information secure, according to a June 6 Gallup poll. Furthermore, 37% of people say this trust has decreased in the past year. Who’s most trusted to keep information secure? Banks and credit card companies come in first place, trusted by 39% of people, and health insurance companies come in second, trusted by 26% of people.
Trust is a basic requirement for health engagement. Health researchers need patients to share personal data to drive insights, knowledge and treatments back to the people who need them. PatientsLikeMe, the online social network, launched the Data for Good project to inspire people to share personal health information imploring people to “Donate your data for You. For Others. For Good.” For 10 years, patients have been sharing personal health information on the PatientsLikeMe site, which has developed trusted relationships with more than 250,000 community members…”