Crowdsourced transit app shows what time the bus will really come


Springwise: “The problem with most transport apps is that they rely on fixed data from transport company schedules and don’t truly reflect exactly what’s going on with the city’s trains and buses at any given moment. Operating like a Waze for public transport, Israel’s Ototo app crowdsources real-time information from passengers to give users the best suggestions for their commute.
The app relies on a community of ‘Riders’, who allow anonymous location data to be sent from their smartphones whenever they’re using public transport. By collating this data, Ototo offers more realistic information about bus and train routes. While a bus may be due in five minutes, a Rider currently on that bus might be located more than five minutes away, indicating that the bus isn’t on time. Ototo can then suggest a quicker route for users. According to Fast Company, the service currently has a 12,000-strong global Riders community that powers its travel recommendations. On top of this, the app is designed in an easy-to-use infographic format that quickly and efficiently tells users where they need to go and how long it will take. The app is free to download from the App Store, and the video below offers a demonstration:


Ototo faces competition from similar services such as New York City’s Moovit, which also details how crowded buses are.”
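
To make the crowdsourcing idea concrete, here is a minimal sketch in Python of the kind of check described above: the timetable says the bus is five minutes away, but a Rider reporting from that bus is further out, so a quicker alternative wins. The RiderReport data model and the best_route helper are hypothetical illustrations, not Ototo’s actual data model or API.

```python
from dataclasses import dataclass

@dataclass
class RiderReport:
    route: str
    eta_to_stop_min: float   # estimated minutes until this rider's bus reaches the user's stop

def best_route(schedule_eta_min, rider_reports, route, alternative_eta_min):
    """Prefer crowdsourced ETAs over the timetable when riders on the bus indicate a delay."""
    on_route = [r.eta_to_stop_min for r in rider_reports if r.route == route]
    realistic_eta = max([schedule_eta_min] + on_route)   # the bus can't arrive before its riders do
    if alternative_eta_min < realistic_eta:
        return "alternative", alternative_eta_min
    return route, realistic_eta

# The timetable promises bus 18 in 5 minutes, but a Rider on that bus is ~9 minutes away.
reports = [RiderReport(route="18", eta_to_stop_min=9.0)]
print(best_route(schedule_eta_min=5.0, rider_reports=reports,
                 route="18", alternative_eta_min=7.0))   # -> suggests the quicker alternative
```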

The data gold rush


Neelie Kroes (European Commission): “Nearly 200 years ago, the industrial revolution saw new networks take over. Not just a new form of transport, the railways connected industries, connected people, energised the economy, transformed society.
Now we stand facing a new industrial revolution: a digital one.
With cloud computing its new engine, big data its new fuel. Transporting the amazing innovations of the internet, and the internet of things. Running on broadband rails: fast, reliable, pervasive.
My dream is that Europe takes its full part. With European industry able to supply, European citizens and businesses able to benefit, European governments able and willing to support. But we must get all those components right.
What does it mean to say we’re in the big data era?
First, it means more data than ever at our disposal. Take all the information of humanity from the dawn of civilisation until 2003 – nowadays that is produced in just two days. We are also acting to have more and more of it become available as open data, for science, for experimentation, for new products and services.
Second, we have ever more ways – not just to collect that data – but to manage it, manipulate it, use it. That is the magic to find value amid the mass of data. The right infrastructure, the right networks, the right computing capacity and, last but not least, the right analysis methods and algorithms help us break through the mountains of rock to find the gold within.
Third, this is not just some niche product for tech-lovers. The impact and difference to people’s lives are huge: in so many fields.
Transforming healthcare, using data to develop new drugs, and save lives. Greener cities with fewer traffic jams, and smarter use of public money.
A business boost: like retailers who communicate smarter with customers, for more personalisation, more productivity, a better bottom line.
No wonder big data is growing 40% a year. No wonder data jobs grow fast. No wonder skills and profiles that didn’t exist a few years ago are now hot property: and we need them all, from data cleaner to data manager to data scientist.
This can make a difference to people’s lives. Wherever you sit in the data ecosystem – never forget that. Never forget that real impact and real potential.
Politicians are starting to get this. The EU’s Presidents and Prime Ministers have recognised the boost to productivity, innovation and better services from big data and cloud computing.
But those technologies need the right environment. We can’t go on struggling with poor quality broadband. With each country trying on its own. With infrastructure and research that are individual and ineffective, separate and subscale. With different laws and practices shackling and shattering the single market. We can’t go on like that.
Nor can we continue in an atmosphere of insecurity and mistrust.
Recent revelations show what is possible online. They show implications for privacy, security, and rights.
You can react in two ways. One is to throw up your hands and surrender. To give up and put big data in the box marked “too difficult”. To turn away from this opportunity, and turn your back on problems that need to be solved, from cancer to climate change. Or – even worse – to simply accept that Europe won’t figure on this map but will be reduced to importing the results and products of others.
Alternatively: you can decide that we are going to master big data – and master all its dependencies, requirements and implications, including cloud and other infrastructures, Internet of things technologies as well as privacy and security. And do it on our own terms.
And by the way – privacy and security safeguards do not just have to be about protecting and limiting. Data generates value, and unlocks the door to new opportunities: you don’t need to “protect” people from their own assets. What you need is to empower people, give them control, give them a fair share of that value. Give them rights over their data – and responsibilities too, and the digital tools to exercise them. And ensure that the networks and systems they use are affordable, flexible, resilient, trustworthy, secure.
One thing is clear: the answer to greater security is not just to build walls. Many millennia ago, the Greek people realised that. They realised that you can build walls as high and as strong as you like – it won’t make a difference, not without the right awareness, the right risk management, the right security, at every link in the chain. If only the Trojans had realised that too! The same is true in the digital age: keep our data locked up in Europe, engage in an impossible dream of isolation, and we lose an opportunity; without gaining any security.
But master all these areas, and we would truly have mastered big data. Then we would have shown that technology can take account of democratic values; and that a dynamic democracy can cope with technology. Then we would have a boost to benefit every European.
So let’s turn this asset into gold. With the infrastructure to capture and process. Cloud capability that is efficient, affordable, on-demand. Let’s tackle the obstacles, from standards and certification, trust and security, to ownership and copyright. With the right skills, so our workforce can seize this opportunity. With new partnerships, getting all the right players together. And investing in research and innovation. Over the next two years, we are putting 90 million euros on the table for big data and 125 million for the cloud.
I want to respond to this economic imperative. And I want to respond to the call of the European Council – looking at all the aspects relevant to tomorrow’s digital economy.
You can help us build this future. All of you. Helping to bring about the digital data-driven economy of the future. Expanding and deepening the ecosystem around data. New players, new intermediaries, new solutions, new jobs, new growth….”

The Parable of Google Flu: Traps in Big Data Analysis


David Lazer: “…big data last winter had its “Dewey beats Truman” moment, when the poster child of big data (at least for behavioral data), Google Flu Trends (GFT), went way off the rails in “nowcasting” the flu–overshooting the peak last winter by 130% (and indeed, it has been systematically overshooting by wide margins for 3 years). Tomorrow we (Ryan Kennedy, Alessandro Vespignani, and Gary King) have a paper out in Science dissecting why GFT went off the rails, how that could have been prevented, and the broader lessons to be learned regarding big data.
[We are posting The Parable of Google Flu (WP-Final).pdf, the version we submitted before acceptance. We have also posted an SSRN paper evaluating GFT for 2013-14, since it was reworked in the Fall.] Key lessons that I’d highlight:
1) Big data are typically not scientifically calibrated. This goes back to my post last month regarding measurement. This does not make them useless from a scientific point of view, but you do need to build into the analysis that the “measures” of behavior are being affected by unseen things. In this case, the likely culprit was the Google search algorithm, which was modified in various ways that we believe were likely to have increased flu-related searches.
2) Big data + analytic code used in scientific venues with scientific claims need to be more transparent. This is a tricky issue, because there are both legitimate proprietary interests involved and privacy concerns, but much more can be done in this regard than has been done in the 3 GFT papers. [One of my aspirations over the next year is to work together with big data companies, researchers, and privacy advocates to figure out how this can be done.]
3) It’s about the questions, not the size of the data. In this particular case, one could have done a better job stating the likely flu prevalence today by ignoring GFT altogether and just projecting three-week-old CDC data to today (better still would have been to combine the two). That is, a synthesis would have been more effective than a pure “big data” approach. I think this is likely the general pattern.
4) More generally, I’d note that there is much more that the academy needs to do. First, the academy needs to build the foundation for collaborations around big data (e.g., secure infrastructures, legal understandings around data sharing, etc). Second, there needs to be MUCH more work done to build bridges between the computer scientists who work on big data and social scientists who think about deriving insights about human behavior from data more generally. We have moved perhaps 5% of the way that we need to in this regard.”
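
As an illustration of lesson 3 above, here is a toy Python sketch of the lagged-projection baseline Lazer describes: extrapolate three-week-old CDC data to today, and optionally blend it with a GFT-style estimate. The numbers and the 50/50 weighting are invented for illustration and are not taken from the paper.

```python
# Toy "nowcast" baseline: project lagged CDC data forward, then optionally
# blend it with a search-based (GFT-style) estimate. All values are illustrative.

def project_cdc(cdc_history, lag_weeks=3):
    """Naively project flu prevalence lag_weeks ahead by extrapolating the recent trend."""
    last, prev = cdc_history[-1], cdc_history[-2]
    weekly_change = last - prev              # simple linear trend from the last two observations
    return last + weekly_change * lag_weeks  # carry the trend forward to "today"

def combined_nowcast(cdc_projection, gft_estimate, weight=0.5):
    """Blend the lagged-data projection with a search-based estimate (illustrative weight)."""
    return weight * cdc_projection + (1 - weight) * gft_estimate

cdc_history = [1.2, 1.5, 1.9, 2.4]    # hypothetical weekly ILI rates, three weeks out of date
baseline = project_cdc(cdc_history)   # baseline that ignores GFT entirely
print(baseline)
print(combined_nowcast(baseline, gft_estimate=4.1))   # synthesis of the two signals
```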

How government can engage with citizens online – expert views


The Guardian: In our livechat on 28 February the experts discussed how to connect up government and citizens online. Digital public services are not just for ‘techno wizzy people’, so government should make them easier for everyone… Read the livechat in full
Michael Sanders, head of research for the Behavioural Insights Team, @mike_t_sanders
It’s important that government is a part of people’s lives: when people interact with government it shouldn’t be a weird and alienating experience, but one that feels part of their everyday lives.
Online services are still too often difficult to use: most people who use the HMRC website will do so infrequently, and will forget its many nuances between visits. This is getting better but there’s a long way to go.
Digital by default keeps things simple: one of our main findings from our research on improving public services is that we should do all we can to “make it easy”.
There is always a risk of exclusion: we should avoid “digital by default” becoming “digital only”.
Ben Matthews, head of communications at FutureGov, @benrmatthews
We prefer digital by design to digital by default: sometimes people can use technology badly, under the guise of ‘digital by default’. We should take a more thoughtful approach to technology, using it as a means to an end – to help us be open, accountable and human.
Leadership is important: you can get enthusiasm from the frontline or younger workers who are comfortable with digital tools, but until they’re empowered by the top of the organisation to use them actively and effectively, we’ll see little progress.
Jargon scares people off: ‘big data’ or ‘open data’, for example….”

Predicting Individual Behavior with Social Networks


Article by Sharad Goel and Daniel Goldstein (Microsoft Research): “With the availability of social network data, it has become possible to relate the behavior of individuals to that of their acquaintances on a large scale. Although the similarity of connected individuals is well established, it is unclear whether behavioral predictions based on social data are more accurate than those arising from current marketing practices. We employ a communications network of over 100 million people to forecast highly diverse behaviors, from patronizing an off-line department store to responding to advertising to joining a recreational league. Across all domains, we find that social data are informative in identifying individuals who are most likely to undertake various actions, and moreover, such data improve on both demographic and behavioral models. There are, however, limits to the utility of social data. In particular, when rich transactional data were available, social data did little to improve prediction.”
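
A rough sketch of the comparison the abstract describes, run on synthetic data: a logistic-regression model using only demographic and behavioral features versus the same model with an added social feature (how many of a person’s contacts have already adopted). The feature names, data-generating process, and model choice are assumptions for illustration, not the authors’ actual setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000

# Hypothetical features: age, past purchases, and number of contacts who already adopted.
age = rng.integers(18, 70, n)
past_purchases = rng.poisson(2, n)
friend_adopters = rng.poisson(1, n)

# Synthetic adoption outcome that depends partly on the social signal.
logit = -2.0 + 0.3 * past_purchases + 0.8 * friend_adopters + 0.01 * (age - 40)
adopted = rng.random(n) < 1 / (1 + np.exp(-logit))

baseline = np.column_stack([age, past_purchases])                   # demographic + behavioral
with_social = np.column_stack([age, past_purchases, friend_adopters])  # + social-network feature

for name, X in [("baseline", baseline), ("with social data", with_social)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, adopted, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(name, roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```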

Overcoming 'Tragedies of the Commons' with a Self-Regulating, Participatory Market Society


Paper by Dirk Helbing: “Our society is fundamentally changing. These days, almost nothing works without a computer chip. Processing power doubles every 18 months and will exceed the capabilities of human brains in about ten years from now. Some time ago, IBM’s Deep Blue computer already beat the best chess player. Meanwhile, computers perform about 70 percent of all financial transactions, and IBM’s Watson advises customers better than human telephone hotlines. Will computers and robots soon replace skilled labor? In many European countries, unemployment is reaching historic highs. The forthcoming economic and social impact of future information and communication technologies (ICT) will be huge – probably more significant than that caused by the steam engine, or by nano- or biotechnology.
The storage capacity for data is growing even faster than computational capacity. Soon we will generate more data in a single year than in the entire previous history of humankind. The “Internet of Things” will network trillions of sensors. Unimaginable amounts of data will be collected. Big Data is already being praised as the “oil of the 21st century”. What opportunities and risks does this create for our society, economy, and environment?”

Open Government – Opportunities and Challenges for Public Governance


New volume of the Public Administration and Information Technology series: “Given this global context, and taking into account the needs of both academicians and practitioners, it is the intention of this book to shed light on the open government concept and, in particular:
• To provide comprehensive knowledge of recent major developments of open government around the world.
• To analyze the importance of open government efforts for public governance.
• To provide insightful analysis about those factors that are critical when designing, implementing and evaluating open government initiatives.
• To discuss how contextual factors affect open government initiatives’ success or failure.
• To explore the existence of theoretical models of open government.
• To propose strategies to move forward and to address future challenges in an international context.”

Big Data, Big New Businesses


Nigel Shadbolt and Michael Chui: “Many people have long believed that if government and the private sector agreed to share their data more freely, and allow it to be processed using the right analytics, previously unimaginable solutions to countless social, economic, and commercial problems would emerge. They may have no idea how right they are.

Even the most vocal proponents of open data appear to have underestimated how many profitable ideas and businesses stand to be created. More than 40 governments worldwide have committed to opening up their electronic data – including weather records, crime statistics, transport information, and much more – to businesses, consumers, and the general public. The McKinsey Global Institute estimates that the annual value of open data in education, transportation, consumer products, electricity, oil and gas, health care, and consumer finance could reach $3 trillion.

These benefits come in the form of new and better goods and services, as well as efficiency savings for businesses, consumers, and citizens. The range is vast. For example, drawing on data from various government agencies, the Climate Corporation (recently bought for $1 billion) has taken 30 years of weather data, 60 years of data on crop yields, and 14 terabytes of information on soil types to create customized insurance products.

Similarly, real-time traffic and transit information can be accessed on smartphone apps to inform users when the next bus is coming or how to avoid traffic congestion. And, by analyzing online comments about their products, manufacturers can identify which features consumers are most willing to pay for, and develop their business and investment strategies accordingly.

Opportunities are everywhere. A raft of open-data start-ups are now being incubated at the London-based Open Data Institute (ODI), which focuses on improving our understanding of corporate ownership, health-care delivery, energy, finance, transport, and many other areas of public interest.

Consumers are the main beneficiaries, especially in the household-goods market. It is estimated that consumers making better-informed buying decisions across sectors could capture $1.1 trillion in value annually. Third-party data aggregators are already allowing customers to compare prices across online and brick-and-mortar shops. Many also permit customers to compare quality ratings, safety data (drawn, for example, from official injury reports), information about the provenance of food, and producers’ environmental and labor practices.

Consider the book industry. Bookstores once regarded their inventory as a trade secret. Customers, competitors, and even suppliers seldom knew what stock bookstores held. Nowadays, by contrast, bookstores not only report what stock they carry but also when customers’ orders will arrive. If they did not, they would be excluded from the product-aggregation sites that have come to determine so many buying decisions.

The health-care sector is a prime target for achieving new efficiencies. By sharing the treatment data of a large patient population, for example, care providers can better identify practices that could save $180 billion annually.

The Open Data Institute-backed start-up Mastodon C uses open data on doctors’ prescriptions to differentiate between expensive patent medicines and cheaper “off-patent” varieties; when applied to just one class of drug, that could save around $400 million in one year for the British National Health Service. Meanwhile, open data on infections acquired in British hospitals has led to the publication of hospital-performance tables, a major factor in the 85% drop in reported infections.

There are also opportunities to prevent lifestyle-related diseases and improve treatment by enabling patients to compare their own data with aggregated data on similar patients. This has been shown to motivate patients to improve their diet, exercise more often, and take their medicines regularly. Similarly, letting people compare their energy use with that of their peers could prompt them to save hundreds of billions of dollars in electricity costs each year, to say nothing of reducing carbon emissions.

Such benchmarking is even more valuable for businesses seeking to improve their operational efficiency. The oil and gas industry, for example, could save $450 billion annually by sharing anonymized and aggregated data on the management of upstream and downstream facilities.

Finally, the move toward open data serves a variety of socially desirable ends, ranging from the reuse of publicly funded research to support work on poverty, inclusion, or discrimination, to the disclosure by corporations such as Nike of their supply-chain data and environmental impact.

There are, of course, challenges arising from the proliferation and systematic use of open data. Companies fear for their intellectual property; ordinary citizens worry about how their private information might be used and abused. Last year, Telefónica, the world’s fifth-largest mobile-network provider, tried to allay such fears by launching a digital confidence program to reassure customers that innovations in transparency would be implemented responsibly and without compromising users’ personal information.

The sensitive handling of these issues will be essential if we are to reap the potential $3 trillion in value that usage of open data could deliver each year. Consumers, policymakers, and companies must work together, not just to agree on common standards of analysis, but also to set the ground rules for the protection of privacy and property.”
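
As a toy illustration of the Mastodon C example mentioned above (comparing branded prescriptions with their generic equivalents), the sketch below sums the potential saving if each branded item were swapped for a generic. The drug names, prices, and volumes are invented; the real analysis works from published NHS prescribing data.

```python
# Toy version of the branded-vs-generic comparison. All figures are illustrative assumptions.

GENERIC_COST_PER_ITEM = {
    "BrandedStatinX": 1.50,   # assumed cost per item of the generic equivalent
    "BrandedStatinY": 1.50,
}

prescriptions = [
    {"drug": "BrandedStatinX", "items": 120, "cost_per_item": 18.00},
    {"drug": "generic_statin", "items": 400, "cost_per_item": 1.50},
    {"drug": "BrandedStatinY", "items": 60,  "cost_per_item": 22.00},
]

# Sum the price gap over every prescription that has a cheaper generic equivalent.
potential_saving = sum(
    p["items"] * (p["cost_per_item"] - GENERIC_COST_PER_ITEM[p["drug"]])
    for p in prescriptions
    if p["drug"] in GENERIC_COST_PER_ITEM
)
print(f"Potential saving: £{potential_saving:,.2f}")   # 120*16.50 + 60*20.50 = £3,210.00
```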

Disinformation Visualization: How to lie with datavis


Mushon Zer-Aviv at School of Data: “Seeing is believing. When working with raw data we’re often encouraged to present it differently, to give it a form, to map it or visualize it. But all maps lie. In fact, maps have to lie, otherwise they wouldn’t be useful. Some are transparent and obvious lies, such as when a tree icon on a map represents more than one tree. Others are white lies – rounding numbers and prioritising details to create a more legible representation. And then there’s the third type of lie: lies that convey a bias, be it deliberate or subconscious. A bias that misrepresents the data and skews it towards a certain reading.

It all sounds very sinister, and indeed sometimes it is. It’s hard to see through a lie unless you stare it right in the face, and what better way to do that than to get our minds dirty and look at some examples of creative and mischievous visual manipulation.
Over the past year I’ve had a few opportunities to run Disinformation Visualization workshops, encouraging activists, designers, statisticians, analysts, researchers, technologists and artists to visualize lies. During these sessions I have used the DIKW pyramid (Data > Information > Knowledge > Wisdom), a framework for thinking about how data gains context and meaning and becomes information. This information needs to be consumed and understood to become knowledge. And finally when knowledge influences our insights and our decision making about the future it becomes wisdom. Data visualization is one of the ways to push data up the pyramid towards wisdom in order to affect our actions and decisions. It would be wise then to look at visualizations suspiciously.
Centuries before big data, computer graphics and social media collided and gave us the datavis explosion, visualization was mostly a scientific tool for inquiry and documentation. This history gave the art form its authority as an integral part of the scientific process. Being a product of human brains and hands, a certain degree of bias was always there, no matter how scientific the process was. The effects of these early off-white lies are still felt today, as even our most celebrated interactive maps still echo the biases of the Mercator map projection, placing Europe and North America at the top of the world and overemphasizing their size and perceived importance over the Global South. Our contemporary practices of programmatically data-driven visualization hide both the human eyes and hands that produce them behind data sets, algorithms and computer graphics, but the same biases are still there, only they’re harder to decipher…”
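
A small illustration of the kind of “lie” described above: the same two nearly identical values plotted twice, once with an honest axis and once with a truncated one that makes a 0.3-point gap look like a landslide. The data are made up; the point is the axis, not the numbers.

```python
import matplotlib.pyplot as plt

values = [50.1, 50.4]            # hypothetical survey results, nearly identical
labels = ["Option A", "Option B"]

fig, (honest, misleading) = plt.subplots(1, 2, figsize=(8, 3))

honest.bar(labels, values)
honest.set_ylim(0, 60)           # full axis: the bars correctly look almost equal
honest.set_title("Honest axis")

misleading.bar(labels, values)
misleading.set_ylim(50.0, 50.5)  # truncated axis: the tiny difference is wildly exaggerated
misleading.set_title("Truncated axis")

plt.tight_layout()
plt.show()
```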