The Impact of Open: Keeping you healthy


of Sunlight: “In healthcare, the goal-set shared widely throughout the field is known as “the Triple Aim”: improving individual experience of care, improving population health, and reducing the cost of care. Across the wide array of initiatives undertaken by health care data users, the great majority seem to fall within the scope of at least one aspect of the Triple Aim. Below is a set of examples that reveal how data — both open and not — is being used to achieve its elements.

The use of open data to reduce costs:

The use of open data to improve quality of care:

  • Using open data on a substantial series of individual hospital quality measures, CMS created a hospital comparison tool that allows consumers to compare average quality of care outcomes across their local hospitals.

  • Non-profit organizations survey hospitals and have used this data to provide another national measure of hospital quality that consumers can use to select a high-quality hospital.

  • In New York state, widely-shared data on cardiac surgery outcomes associated with individual providers has led to improved outcomes and better understanding of successful techniques.

  • In the UK, the National Health Service is actively working towards defining concrete metrics to evaluate how the system as a whole is moving towards improved quality. …

  • The broad cultural shift towards data-sharing in healthcare appears to have facilitated additional secured sharing in order to achieve the joint goal of improving healthcare quality and effectiveness. The current effort to securely network millions of patient data records through the federal PCORI system has the potential to advance understanding of disease treatment at an unprecedented pace.

  • Through third-party tools, people are able to use the products of aggregated patient data in order to begin diagnosing their own symptoms more accurately, giving them a head start in understanding how to optimize their visit to a provider.

The use of open data to improve population health:

  • Of the three elements of the Triple Aim, population health may have the longest and deepest relationship with open data. Public datasets like those collected by the Centers for Disease Control and the US Census have for decades been used to monitor disease prevalence, verify access to health insurance, and track mortality and morbidity statistics.

  • Population health improvement has been a major focus for newer developments as well. Health data has been a regular feature in tech efforts to improve the ways that governments — including local health departments — reach their constituencies. The use of data in new communication tools improves population health by increasing population awareness of local health trends and disease prevention opportunities. Two examples of this work in action include the Chicago Health Atlas, which combines health data and healthcare consumer problem-solving, and Philadelphia’s map interface to city data about available flu vaccines.

One final observation for open data advocates to take from health data concerns the way that the sector encourages the two-way information flow: it embraces the notion that data users can also be data producers. Open data ecosystems are properly characterized by multi-directional relationships among governmental and non-governmental actors, with opportunities for feedback, correction and augmentation of open datasets. That this happens at the scale of health data is important and meaningful for open data advocates who can face push-back when they ask their governments to ingest externally-generated data….”

Microsoft Unveils Machine Learning for the Masses


The service, called Microsoft Azure Machine Learning, was announced Monday but won’t be available until July. It combines Microsoft’s own software with publicly available open source software, packaged in a way that is easier to use than most of the arcane strategies currently in use.
“This is drag-and-drop software,” said Joseph Sirosh, vice president for machine learning at Microsoft. “My high schooler is using this.”
That would be a big step forward in popularizing what is currently a difficult process in increasingly high demand. It would also further the ambitions of Satya Nadella, Microsoft’s chief executive, of making Azure the center of Microsoft’s future.
Users of Azure Machine Learning will have to keep their data in Azure, and Microsoft will provide ways to move data from competing services, like Amazon Web Services. Pricing has not yet been finalized, Mr. Sirosh said, but will be based on a premium to Azure’s standard computing and transmission charges.
Machine learning computers examine historical data through different algorithms and programming languages to make predictions. The process is commonly used in Internet search, fraud detection, product recommendations and digital personal assistants, among other things.
As more data is automatically stored online, there are opportunities to use machine learning for performing maintenance, scheduling hospital services, and anticipating disease outbreaks and crime, among other things. The methods have to become easier and cheaper to be popular, however.
That is the goal of Azure Machine Learning. “This is, as far as I know, the first comprehensive machine learning service in the cloud,” Mr. Sirosh said. “I’m leveraging every asset in Microsoft for this.” He is also using ways of accessing an open source version of R, a standard statistical language, while in Azure.
Microsoft is likely to face competition from rival cloud companies, including Google and Amazon. Both Google and Amazon have things like data frameworks used in building machine learning algorithms, as well as their own analysis services. IBM is eager to make use of its predictive software in its cloud business. Visualization companies like Tableau specialize in presenting the results so they can be acted on easily…”
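To make "examining historical data to make predictions" concrete, here is a minimal sketch in plain Python. It is not Azure Machine Learning and says nothing about how that service works: it fits a straight line to made-up historical observations by ordinary least squares and uses the fit to forecast the next value. The data and the names `fit_line` and `predict` are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: weekly demand for some service over five weeks.
history_x = [1, 2, 3, 4, 5]
history_y = [12.0, 14.0, 16.0, 18.0, 20.0]

model = fit_line(history_x, history_y)
forecast = predict(model, 6)  # prediction for week 6
print(forecast)
```

Real services differ mainly in scale and in the range of algorithms on offer; the basic loop of learning parameters from past data and applying them to new inputs is the same.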

15 Ways to bring Civic Innovation to your City


Chris Moore at AcuitasGov: “In my previous blog post I wrote about a desire to see our Governments transform to be part of the  21st century.  I saw a recent reference to how governments across Canada have lost their global leadership, how government in Canada at all levels is providing analog services to a digital society.  I couldn’t agree more.  I have been thinking lately about some practical ways that Mayors and City Managers could innovate in their communities.  I realize that there are a number of municipal elections happening this fall across Canada, a time when leadership changes and new ideas emerge.  So this blog is also for Mayoral candidates who have a sense that technology and innovation have a role to play in their city and in their administration.
I thought I would identify 15 initiatives that cities could pursue as part of their Civic Innovation Strategy.   For the last 50 years technology in local government in Canada has been viewed as an expense, as a necessary evil, not always understood by elected officials and senior administrators.  Information and Technology is part of every aspect of a city, it is critical in delivering services.  It is time to not just think of this as an expense but as an investment, as a way to innovate, reduce costs, enhance citizen service delivery and transform government operations.
Here are my top 15 ways to bring Civic Innovation to your city:
1. Build 21st Century Digital Infrastructure like the Chattanooga Gig City Project.
2. Build WiFi networks like the City of Edmonton on your own and in partnership with others.
3. Provide technology and internet to children and youth in need like the City of Toronto.
4. Connect to a national Education and Research network like Cybera in Alberta and CANARIE.
5. Create a Mayor’s Task Force on Innovation and Technology leveraging your city’s resources.
6. Run a hackathon or two or three like the City of Glasgow or maybe host a hacking health event like the City of Vancouver.
7. Launch a Startup incubator like Startup Edmonton or take it to the next level and create a civic lab like the City of Barcelona.
8. Develop an Open Government Strategy; I like the Open City Strategy from Edmonton.
9. If Open Government is too much then just start with Open Data; Edmonton has one of the best.
10. Build a Citizen Dashboard to showcase your city’s services and commitment to the public.
11. Put your Crime data online like the Edmonton Police Service.
12. Consider a pilot project with sensor technology for parking like the City of Nice or for waste management like the City of Barcelona.
13. Embrace Car2Go, Modo and UBER as ways to move people in your city.
14. Consider turning your IT department into the Innovation and Technology Department like they did at the City of Chicago.
15. Partner with other nearby local governments to create a shared Innovation and Technology agency.
Now more than ever before cities need to find ways to innovate, to transform and to create a foundation that is sustainable.  Now is the time for both courage and innovations in government.  What is your city doing to move into the 21st Century?”

Privacy and Open Government


Paper by Teresa Scassa in Future Internet: “The public-oriented goals of the open government movement promise increased transparency and accountability of governments, enhanced citizen engagement and participation, improved service delivery, economic development and the stimulation of innovation. In part, these goals are to be achieved by making more and more government information public in reusable formats and under open licences. This paper identifies three broad privacy challenges raised by open government. The first is how to balance privacy with transparency and accountability in the context of “public” personal information. The second challenge flows from the disruption of traditional approaches to privacy based on a collapse of the distinctions between public and private sector actors. The third challenge is that of the potential for open government data—even if anonymized—to contribute to the big data environment in which citizens and their activities are increasingly monitored and profiled.”

LifeLogging: personal big data


Paper by Gurrin, Cathal and Smeaton, Alan F. and Doherty, Aiden R. at Foundations and Trends in Information Retrieval: “We have recently observed a convergence of technologies to foster the emergence of lifelogging as a mainstream activity. Computer storage has become significantly cheaper, and advancements in sensing technology allow for the efficient sensing of personal activities, locations and the environment. This is best seen in the growing popularity of the quantified self movement, in which life activities are tracked using wearable sensors in the hope of better understanding human performance in a variety of tasks. This review aims to provide a comprehensive summary of lifelogging, to cover its research history, current technologies, and applications. Thus far, most of the lifelogging research has focused predominantly on visual lifelogging in order to capture details of life activities, hence we maintain this focus in this review. However, we also reflect on the challenges lifelogging poses to an information retrieval scientist. This review is a suitable reference for those seeking an information retrieval scientist’s perspective on lifelogging and the quantified self.”

How Crowdsourced Astrophotographs on the Web Are Revolutionizing Astronomy


Emerging Technology From the arXiv: “Astrophotography is currently undergoing a revolution thanks to the increased availability of high quality digital cameras and the software available to process the pictures after they have been taken.
Since photographs of the night sky are almost always better with long exposures that capture more light, this processing usually involves combining several images of the same part of the sky to produce one with a much longer effective exposure.
That’s all straightforward if you’ve taken the pictures yourself with the same gear under the same circumstances. But astronomers want to do better.
“The astrophotography group on Flickr alone has over 68,000 images,” say Dustin Lang at Carnegie Mellon University in Pittsburgh and a couple of pals. These and other images represent a vast source of untapped data for astronomers.
The problem is that it’s hard to combine images accurately when little is known about how they were taken. Astronomers take great care to use imaging equipment in which the pixels produce a signal that is proportional to the number of photons that hit.
But the same cannot be said of the digital cameras widely used by amateurs. All kinds of processes can end up influencing the final image.
So any algorithm that combines them has to cope with these variations. “We want to do this without having to infer the (possibly highly nonlinear) processing that has been applied to each individual image, each of which has been wrecked in its own loving way by its creator,” say Lang and co.
Now, these guys say they’ve cracked it. They’ve developed a system that automatically combines images from the same part of the sky to increase the effective exposure time of the resulting picture. And they say the combined images can rival those from much larger professional telescopes.
They’ve tested this approach by downloading images of two well-known astrophysical objects: the NGC 5907 Galaxy and the colliding pair of galaxies—Messier 51a and 51b.
For NGC 5907, they ended up with 4,000 images from Flickr, 1,000 from Bing and 100 from Google. They used an online system called astrometry.net that automatically aligns and registers images of the night sky and then combined the images using their new algorithm, which they call Enhance.
The results are impressive. They say that the combined images of NGC 5907 (bottom three images) show some of the same faint features that were revealed by a single image taken over 11 hours of exposure using a 50 cm telescope (the top left image). All the images reveal the same kind of fine detail such as a faint stellar stream around the galaxy.
The combined image for the M51 galaxies is just as impressive, taking only 40 minutes to produce on a single processor. It reveals extended structures around both galaxies, which astronomers know to be debris from their gravitational interaction as they collide.
Lang and co say these faint features are hugely important because they allow astronomers to measure the age, mass ratios, and orbital configurations of the galaxies involved. Interestingly, many of these faint features are not visible in any of the input images taken from the Web. They emerge only once images have been combined.
One potential problem with algorithms like this is that they need to perform well as the number of images they combine increases. It’s no good if they grind to a halt as soon as a substantial amount of data becomes available.
On this score, Lang and co say astronomers can rest easy. The performance of their new Enhance algorithm scales linearly with the number of images it has to combine. That means it should perform well on large datasets.
The bottom line is that this kind of crowd-sourced astronomy has the potential to make a big impact, given that the resulting images rival those from large telescopes.
And it could also be used for historical images, say Lang and co. The Harvard Plate Archives, for example, contain half a million images dating back to the 1880s. These were all taken using different emulsions, with different exposures and developed using different processes. So the plates all have different responses to light, making them hard to compare.
That’s exactly the problem that Lang and co have solved for digital images on the Web. So it’s not hard to imagine how they could easily combine the data from the Harvard archives as well….”
Ref: arxiv.org/abs/1406.1528 : Towards building a Crowd-Sourced Sky Map
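The basic idea of combining several images into one with a longer effective exposure can be sketched in a few lines. The example below is plain mean stacking of simulated, already aligned exposures whose pixel values are proportional to the light received. It is not the Enhance algorithm from the paper, which has to cope with unknown and possibly nonlinear processing in each image. All names and numbers here (`stack`, `rms_error`, the 100-pixel strip, the noise level) are assumptions for illustration.

```python
import random


def stack(images):
    """Average aligned images pixel by pixel.

    Every image is visited exactly once, so the cost grows linearly
    with the number of images.
    """
    n = len(images)
    total = [0.0] * len(images[0])
    for image in images:
        for i, value in enumerate(image):
            total[i] += value
    return [t / n for t in total]


def rms_error(image, truth):
    """Root-mean-square difference between an image and the true scene."""
    return (sum((a - b) ** 2 for a, b in zip(image, truth)) / len(truth)) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated 100-pixel strip: dark sky plus a faint feature at half the noise level.
truth = [0.0] * 50 + [0.5] * 50


def noisy_exposure():
    """One short exposure: the true scene plus unit-variance Gaussian noise."""
    return [pixel + rng.gauss(0.0, 1.0) for pixel in truth]


exposures = [noisy_exposure() for _ in range(400)]

single_error = rms_error(exposures[0], truth)
stacked_error = rms_error(stack(exposures), truth)
print(single_error, stacked_error)
```

In this idealised setting the noise in the stacked image falls roughly with the square root of the number of exposures, which is why a feature buried in every single frame can become visible only in the combination.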

Opening Public Transportation Data in Germany


Thesis by Kaufmann, Stefan: “Open data has been recognized as a valuable resource, and public institutions have taken to publishing their data under open licenses, also in Germany. However, German public transit agencies are still reluctant to publish their schedules as open data. Also, two data exchange formats widely used in German transit planning are proprietary, with no documentation publicly available. Through this work, one of the proprietary formats was reverse-engineered, and a transformation process into the open GTFS schedule format was developed. This process allowed a partnering transit operator to publish their schedule as open data. Also, through a survey taken with German transit authorities and operators, the prevalence of transit data exchange formats, and reservations concerning open transit data were evaluated. The survey brought a series of issues to light which serve as obstacles for opening up transit data. Addressing the issues found through this work, and partnering with open-minded transit authorities to further develop transit data publishing processes can serve as a foundation for wider adoption of publishing open transit data in Germany.”
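As a rough illustration of what a transformation into GTFS involves, the sketch below converts a small stop list into a GTFS `stops.txt` file. The input layout (semicolon-separated fields with decimal commas) is a hypothetical stand-in, not the proprietary format reverse-engineered in the thesis, and the stop names and coordinates are made up. Only the output column names `stop_id`, `stop_name`, `stop_lat` and `stop_lon` come from the GTFS schedule format.

```python
import csv
import io

# Hypothetical source export: id;name;latitude;longitude, with decimal commas.
SOURCE = """\
1001;Hauptbahnhof;48,3994;9,9829
1002;Rathaus;48,3969;9,9922
"""


def to_gtfs_stops(source_text):
    """Convert the hypothetical stop export into the text of a GTFS stops.txt."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["stop_id", "stop_name", "stop_lat", "stop_lon"])
    for row in csv.reader(io.StringIO(source_text), delimiter=";"):
        if not row:
            continue  # skip blank lines
        stop_id, name, lat, lon = row
        # GTFS expects decimal degrees written with a decimal point.
        writer.writerow([
            stop_id,
            name,
            float(lat.replace(",", ".")),
            float(lon.replace(",", ".")),
        ])
    return out.getvalue()


stops_txt = to_gtfs_stops(SOURCE)
print(stops_txt)
```

A full feed needs further files such as `routes.txt`, `trips.txt` and `stop_times.txt`; the same read, normalise and write pattern would apply to each.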

Big Data, Big Questions


Special Issue by the International Journal of Communication on Big Data, Big Questions:

Critiquing Big Data: Politics, Ethics, Epistemology | Special Section Introduction
Kate Crawford, Mary L. Gray, Kate Miltner 10 pgs.
The Big Data Divide
Mark Andrejevic 17 pgs.
Metaphors of Big Data
Cornelius Puschmann, Jean Burgess 20 pgs.
Advertising, Big Data and the Clearance of the Public Realm: Marketers’ New Approaches to the Content Subsidy
Nick Couldry, Joseph Turow 17 pgs.
A Dozen Ways to Get Lost in Translation: Inherent Challenges in Large Scale Data Sets
Lawrence Busch 18 pgs.
Working Within a Black Box: Transparency in the Collection and Production of Big Twitter Data
Kevin Driscoll, Shawn Walker 20 pgs.
Living on Fumes: Digital Footprints, Data Fumes, and the Limitations of Spatial Big Data
Jim Thatcher 19 pgs.
This One Does Not Go Up To 11: The Quantified Self Movement as an Alternative Big Data Practice
Dawn Nafus, Jamie Sherman 11 pgs.
The Theory/Data Thing
Geoffrey C. Bowker 5 pgs.

Finding Mr. Smith or why anti-corruption needs open data


Martin Tisne: “Anti-corruption groups have been rightly advocating for the release of information on the beneficial or real owners of companies and trusts. The idea is to crack down on tax evasion and corruption by identifying the actual individuals hiding behind several layers of shell companies.
But knowing that “Mr. Smith” is the owner of company X is of no interest, unless you know who Mr. Smith is.
The real interest lies in figuring out that Mr. Smith is linked to company Y, that has been illegally exporting timber from country Z, and that Mr. Smith is the son-in-law of the mining minister of yet another country, who has been accused of embezzling mining industry revenues.
For that, investigative journalists, prosecution authorities, civil society groups like Global Witness and Transparency International will need access not just to public registries of beneficial ownership but also contract data, politically exposed persons databases (“PEPs” databases), project by project extractive industry data, and trade export/import data.
Unless those datasets are accessible, comparable, linked, it won’t be possible. We are talking about millions of datasets – no problem for computers to crunch, but impossible to go through manually.
This is what is different in the anti-corruption landscape today, compared to 10 years ago. Technology makes it possible. Don’t get me wrong – there are still huge, thorny political obstacles to getting the data even publicly available in the first place. But unless it is open data, I fear those battles will have been in vain.
That’s why we need open data as a topic on the G20 anti-corruption working group.”
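The "Mr. Smith" scenario is, in data terms, a join across datasets. The sketch below shows why the datasets have to be linked: the connection surfaces only because all three toy tables share identifiers. Every record, field name and identifier scheme here is invented for illustration; real registries rarely share keys this cleanly, which is exactly the obstacle described above.

```python
# Hypothetical beneficial ownership registry.
ownership = [
    {"company": "Company X", "owner_id": "P-001"},
    {"company": "Company Y", "owner_id": "P-001"},
    {"company": "Company W", "owner_id": "P-002"},
]

# Hypothetical politically exposed persons (PEP) database.
pep_relations = [
    {"person_id": "P-001", "relation": "son-in-law", "official": "mining minister"},
]

# Hypothetical trade export data with flagged shipments.
export_flags = [
    {"company": "Company Y", "commodity": "timber", "flag": "suspected illegal export"},
]


def link(ownership, pep_relations, export_flags):
    """Find owners who are politically exposed and whose companies are flagged."""
    peps = {record["person_id"]: record for record in pep_relations}
    flags_by_company = {}
    for flag in export_flags:
        flags_by_company.setdefault(flag["company"], []).append(flag)

    findings = []
    for record in ownership:
        pep = peps.get(record["owner_id"])
        if pep is None:
            continue  # owner does not appear in the PEP database
        for flag in flags_by_company.get(record["company"], []):
            findings.append({
                "owner_id": record["owner_id"],
                "company": record["company"],
                "relation": pep["relation"],
                "official": pep["official"],
                "commodity": flag["commodity"],
            })
    return findings


findings = link(ownership, pep_relations, export_flags)
for finding in findings:
    print(finding)
```

The lookups are dictionary-based, so the work grows with the number of records rather than with the number of possible pairs, which is what makes this kind of cross-referencing feasible for computers and impractical by hand.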

Index: The Networked Public


The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on the networked public and was originally published in 2014.

Global Overview

  • The proportion of global population who use the Internet in 2013: 38.8%, up 3 percentage points from 2012
  • Increase in average global broadband speeds from 2012 to 2013: 17%
  • Percent of internet users surveyed globally that access the internet at least once a day in 2012: 96
  • Hours spent online in 2012 each month across the globe: 35 billion
  • Country with the highest online population, as a percent of total population in 2012: United Kingdom (85%)
  • Country with the lowest online population, as a percent of total population in 2012: India (8%)
  • Trend with the highest growth rate in 2012: Location-based services (27%)
  • Years to reach 50 million users: telephone (75), radio (38), TV (13), internet (4)

Growth Rates in 2014

  • Rate at which the total number of Internet users is growing: less than 10% a year
  • Worldwide annual smartphone growth: 20%
  • Tablet growth: 52%
  • Mobile data traffic growth: 81%
  • Percentage of all mobile users who are now smartphone users: 30%
  • Amount of all web usage in 2013 accounted for by mobile: 14%
  • Amount of all web usage in 2014 accounted for by mobile: 25%
  • Percentage of money spent on mobile used for app purchases: 68%
  • Growth in the number of Bitcoin wallets between 2013 and 2014: eightfold increase
  • Number of listings on Airbnb in 2014: 550k, 83% growth year on year
  • How many buyers are on Alibaba in 2014: 231MM buyers, 44% growth year on year

Social Media

  • Number of WhatsApp messages on average sent per day: 50 billion
  • Number sent per day on Snapchat: 1.2 billion
  • How many restaurants are registered on GrubHub in 2014: 29,000
  • Amount the sale of digital songs fell in 2013: 6%
  • How much song streaming grew in 2013: 32%
  • Number of photos uploaded and shared every day on Flickr, Snapchat, Instagram, Facebook and WhatsApp combined in 2014: 1.8 billion
  • How many online adults in the U.S. use a social networking site of some kind: 73%
  • Those who use multiple social networking sites: 42%
  • Dominant social networking platform: Facebook, with 71% of online adults
  • Number of Facebook users in 2004, its founding year: 1 million
  • Number of monthly active users on Facebook in September 2013: 1.19 billion, an 18% increase year-over-year
  • How many Facebook users log in to the site daily: 63%
  • Instagram users who log into the service daily: 57%
  • Twitter users who are daily visitors: 46%
  • Number of photos uploaded to Facebook every minute: over 243,000, up 16% from 2012
  • How much of the global internet population is actively using Twitter every month: 21%
  • Number of tweets per minute: 350,000, up 250% from 2012
  • Fastest growing demographic on Twitter: 55-64 year age bracket, up 79% from 2012
  • Fastest growing demographic on Facebook: 45-54 year age bracket, up 46% from 2012
  • How many LinkedIn accounts are created every minute: 120, up 20% from 2012
  • The number of Google searches per minute in 2013: 3.5 million, up 75% from 2012
  • Percent of internet users surveyed globally that use social media in 2012: 90
  • Percent of internet users surveyed globally that use social media daily: 60
  • Time spent social networking, the most popular online activity: 22%, followed by searches (21%), reading content (20%), and emails/communication (19%)
  • The average age at which a child acquires an online presence through their parents in 10 mostly Western countries: six months
  • Number of children in those countries who have a digital footprint by age 2: 81%
  • How many new American marriages between 2005-2012 began by meeting online, according to a nationally representative study: more than one-third 
  • How many of the world’s 505 leaders are on Twitter: 3/4
  • Combined Twitter followers of 505 world leaders: 106 million
  • Combined Twitter followers of Justin Bieber, Katy Perry, and Lady Gaga: 122 million
  • How many times all Wikipedias are viewed per month: nearly 22 billion times
  • How many hits per second: more than 8,000 
  • English Wikipedia’s share of total page views: 47%
  • Number of articles in the English Wikipedia in December 2013: over 4,395,320 
  • Platform that reaches more U.S. adults between ages 18-34 than any cable network: YouTube
  • Number of unique users who visit YouTube each month: more than 1 billion
  • How many hours of video are watched on YouTube each month: over 6 billion, 50% more than 2012
  • Proportion of YouTube traffic that comes from outside the U.S.: 80%
  • Most common activity online, based on an analysis of over 10 million web users: social media
  • People on Twitter who recommend products in their tweets: 53%
  • People who trust online recommendations from people they know: 90%

Mobile and the Internet of Things

  • Number of global smartphone users in 2013: 1.5 billion
  • Number of global mobile phone users in 2013: over 5 billion
  • Percent of U.S. adults that have a cell phone in 2013: 91
  • Share of those that are smartphones: almost two thirds
  • Mobile Facebook users in March 2013: 751 million, 54% increase since 2012
  • Global mobile traffic as a percentage of global internet traffic as of May 2013: 15%, up from 0.9% in 2009
  • How many smartphone owners ages 18–44 “keep their phone with them for all but two hours of their waking day”: 79%
  • Those who reach for their smartphone immediately upon waking up: 62%
  • Those who couldn’t recall a time their phone wasn’t within reach or in the same room: 1 in 4
  • Facebook users who access the service via a mobile device: 73.44%
  • Those who are “mobile only”: 189 million
  • Amount of YouTube’s global watch time that is on mobile devices: almost 40%
  • Number of objects connected globally in the “internet of things” in 2012: 8.7 billion
  • Number of connected objects so far in 2013: over 10 billion
  • Years from tablet introduction for tablets to surpass desktop PC and notebook shipments: less than 3 (over 55 million global units shipped in 2013, vs. 45 million notebooks and 35 million desktop PCs)
  • Number of wearable devices estimated to have been shipped worldwide in 2011: 14 million
  • Projected number of wearable devices in 2016: between 39-171 million
  • How much of the wearable technology market is in the healthcare and medical sector in 2012: 35.1%
  • How many devices in the wearable tech market are fitness or activity trackers: 61%
  • The value of the global wearable technology market in 2012: $750 million
  • The forecasted value of the market in 2018: $5.8 billion
  • How many Americans are aware of wearable tech devices in 2013: 52%
  • Devices that have the highest level of awareness: wearable fitness trackers
  • Level of awareness for wearable fitness trackers amongst American consumers: 1 in 3 consumers
  • Value of digital fitness category in 2013: $330 million
  • How many American consumers surveyed are aware of smart glasses: 29%
  • Smart watch awareness amongst those surveyed: 36%

Access

  • How much of the developed world has mobile broadband subscriptions in 2013: 3/4
  • How much of the developing world has broadband subscription in 2013: 1/5
  • Percent of U.S. adults that had a laptop in 2012: 57
  • How many American adults did not use the internet at home, at work, or via mobile device in 2013: one in five
  • Amount President Obama initiated spending in 2009 in an effort to expand access: $7 billion
  • Number of Americans potentially shut off from jobs, government services, health care and education, among other opportunities due to digital inequality: 60 million
  • American adults with a high-speed broadband connection at home as of May 2013: 7 out of 10
  • Americans aged 18-29 vs. 65+ with a high-speed broadband connection at home as of May 2013: 80% vs. 43%
  • American adults with college education (or more) vs. adults with no high school diploma that have a high-speed broadband connection at home as of May 2013: 89% vs. 37%
  • Percent of U.S. adults with college education (or more) that use the internet in 2011: 94
  • Those with no high school diploma that used the internet in 2011: 43
  • Percent of white American households that used the internet in 2013: 67
  • Black American households that used the internet in 2013: 57
  • States with lowest internet use rates in 2013: Mississippi, Alabama and Arkansas
  • How many American households have only wireless telephones as of the second half of 2012: nearly two in five
  • States with the highest prevalence of wireless-only adults according to predictive modeling estimates: Idaho (52.3%), Mississippi (49.4%), Arkansas (49%)
  • Those with the lowest prevalence of wireless-only adults: New Jersey (19.4%), Connecticut (20.6%), Delaware (23.3%) and New York (23.5%)

Sources