Brian Merchant at Motherboard: “Technology should probably be transforming public transit a lot faster than it is. Yes, apps like Hopstop have made finding stops easier and I’ve started riding the bus in unfamiliar parts of town a bit more often thanks to Google Maps’ route info. But these are relatively small steps, and it’s all limited to making scheduling information more widely available. Where’s the innovation on the other side? Where’s the Uber-like interactivity, the bus that comes to you after a tap on the iPhone?
In Finland, actually. The Kutsuplus is Helsinki’s groundbreaking mass transit hybrid program that lets riders choose their own routes, pay for fares on their phones, and summon their own buses. It’s a pretty interesting concept. With a ten-minute lead time, you summon a Kutsuplus bus to a stop using the official app, just as you’d call a livery cab on Uber. Each minibus in the fleet seats at least nine people, and there’s room for baby carriages and bikes.
You can call your own private Kutsuplus, but if you share the ride, you share the costs—it’s about half the price of a cab fare, and a dollar or two more expensive than old school bus transit. You can then pick your own stop, also using the app.
The interesting part is the scheduling, which is entirely automated. If you’re sharing the ride, an algorithm determines the most direct route, and you only get charged as though you were riding solo. You can pay with a Kutsuplus wallet on the app, or, eventually, bill the charge to your phone bill.”
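The article does not spell out how the scheduling works under the hood, but on-demand pooling systems of this kind are often built around a greedy-insertion heuristic: when a new request arrives, try every position for its pickup and dropoff in each vehicle’s existing stop sequence and keep the cheapest result. The sketch below is purely illustrative of that idea, not the Kutsuplus dispatcher; the coordinates and the straight-line distance stand-in are assumptions.

```python
# Illustrative only: a greedy-insertion heuristic of the kind often used for
# on-demand ride pooling. This is NOT the actual Kutsuplus dispatcher; stop
# coordinates and straight-line distances are stand-ins for real travel times.
from math import hypot


def distance(a, b):
    """Straight-line distance between two (x, y) stops."""
    return hypot(a[0] - b[0], a[1] - b[1])


def route_length(stops):
    """Total length of a stop sequence."""
    return sum(distance(stops[i], stops[i + 1]) for i in range(len(stops) - 1))


def insert_request(route, pickup, dropoff):
    """Try every way of slotting a pickup/dropoff pair into an existing stop
    sequence (pickup before dropoff, never before the vehicle's current
    position at index 0) and return the cheapest resulting route."""
    best_route, best_cost = None, float("inf")
    for i in range(1, len(route) + 1):           # candidate pickup positions
        for j in range(i + 1, len(route) + 2):   # candidate dropoff positions
            candidate = route[:i] + [pickup] + route[i:j - 1] + [dropoff] + route[j - 1:]
            cost = route_length(candidate)
            if cost < best_cost:
                best_route, best_cost = candidate, cost
    return best_route


# A minibus already routed through three stops accepts one more rider.
current_route = [(0, 0), (2, 1), (4, 0)]
print(insert_request(current_route, pickup=(1, 2), dropoff=(3, 2)))
```

A production dispatcher would additionally have to respect seat capacity, promised pickup windows, and real road travel times.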
NEW Publication: “Reimagining Governance in Practice: Benchmarking British Columbia’s Citizen Engagement Efforts”
Over the last few years, the Government of British Columbia (BC), Canada, has initiated a variety of practices and policies aimed at providing more legitimate and effective governance. Leveraging advances in technology, the BC Government has focused on changing how it engages with its citizens, with the goal of optimizing the way it seeks input and develops and implements policy. The efforts are part of a broader trend among a wide variety of democratic governments to re-imagine public service and governance.
At the beginning of 2013, BC’s Ministry of Citizens’ Services and Open Government, now the Ministry of Technology, Innovation and Citizens’ Services, partnered with the GovLab to produce “Reimagining Governance in Practice: Benchmarking British Columbia’s Citizen Engagement Efforts.” The GovLab’s May 2013 report, made public today, makes clear that BC’s current practices to create a more open government, leverage citizen engagement to inform policy decisions, create new innovations, and provide improved public monitoring—though in many cases relatively new—are consistently among the strongest examples at either the provincial or national level.
According to Stefaan Verhulst, Chief of Research at the GovLab: “Our benchmarking study found that British Columbia’s various initiatives and experiments to create a more open and participatory governance culture have made it a leader in how to re-imagine governance. Leadership, along with the elimination of imperatives that may limit further experimentation, will be critical moving forward. And perhaps even more important, as with all initiatives to re-imagine governance worldwide, much more evaluation of what works, and why, will be needed to keep strengthening the value proposition behind the new practices and policies and provide proof-of-concept.”
See also the TheGovLab Blog.
The Value of Personal Data
The Digital Enlightenment Yearbook 2013 is dedicated this year to Personal Data: “The value of personal data has traditionally been understood in ethical terms as a safeguard for personality rights such as human dignity and privacy. However, we have entered an era where personal data are mined, traded and monetized in the process of creating added value – often in terms of free services including efficient search, support for social networking and personalized communications. This volume investigates whether the economic value of personal data can be realized without compromising privacy, fairness and contextual integrity. It brings scholars and scientists from the disciplines of computer science, law and social science together with policymakers, engineers and entrepreneurs with practical experience of implementing personal data management.
The resulting collection will be of interest to anyone concerned about privacy in our digital age, especially those working in the field of personal information management, whether academics, policymakers, or those working in the private sector.”
Using Big Data to Ask Big Questions
Chase Davis in the SOURCE: “First, let’s dispense with the buzzwords. Big Data isn’t what you think it is: Every federal campaign contribution over the last 30-plus years amounts to several tens of millions of records. That’s not Big. Neither is a dataset of 50 million Medicare records. Or even 260 gigabytes of files related to offshore tax havens—at least not when Google counts its data in exabytes. No, the stuff we analyze in pursuit of journalism and app-building is downright tiny by comparison.
But you know what? That’s ok. Because while super-smart Silicon Valley PhDs are busy helping Facebook crunch through petabytes of user data, they’re also throwing off intellectual exhaust that we can benefit from in the journalism and civic data communities. Most notably: the ability to ask Big Questions.
Most of us who analyze public data for fun and profit are familiar with small questions. They’re focused, incisive, and often have the kind of black-and-white, definitive answers that end up in news stories: How much money did Barack Obama raise in 2012? Is the murder rate in my town going up or down?
Big Questions, on the other hand, are speculative, exploratory, and systemic. As the name implies, they are also answered at scale: Rather than distilling a small slice of a dataset into a concrete answer, Big Questions look at entire datasets and reveal small questions you wouldn’t have thought to ask.
Can we track individual campaign donor behavior over decades, and what does that tell us about their influence in politics? Which neighborhoods in my city are experiencing spikes in crime this week, and are police changing patrols accordingly?
Or, by way of example, how often do interest groups propose cookie-cutter bills in state legislatures?
Looking at Legislation
Even if you don’t follow politics, you probably won’t be shocked to learn that lawmakers don’t always write their own bills. In fact, interest groups sometimes write them word-for-word.
Sometimes those groups even try to push their bills in multiple states. The conservative American Legislative Exchange Council has gotten some press, but liberal groups, social and business interests, and even sororities and fraternities have done it too.
On its face, something about elected officials signing their names to cookie-cutter bills runs head-first against people’s ideal of deliberative democracy—hence, it tends to make news. Those can be great stories, but they’re often limited in scope to a particular bill, politician, or interest group. They’re based on small questions.
Data science lets us expand our scope. Rather than focusing on one bill, or one interest group, or one state, why not ask: How many model bills were introduced in all 50 states, period, by anyone, during the last legislative session? No matter what they’re about. No matter who introduced them. No matter where they were introduced.
Now that’s a Big Question. And with some basic data science, it’s not particularly hard to answer—at least at a superficial level.
Analyze All the Things!
Just for kicks, I tried building a system to answer this question earlier this year. It was intended as an example, so I tried to choose methods that would make intuitive sense. But it also makes liberal use of techniques applied often to Big Data analysis: k-means clustering, matrices, graphs, and the like.
If you want to follow along, the code is here….
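As a rough illustration of the pipeline Davis describes (not his linked code), bill titles can be turned into a TF-IDF term matrix and grouped with k-means; the sample titles, cluster count, and vectorizer settings below are assumptions made for the sketch.

```python
# Rough illustration of the clustering step described above -- not Chase
# Davis's linked code. The sample titles and the cluster count are assumptions.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

bill_titles = [
    "An act relating to limited lines travel insurance",
    "Limited lines travel insurance model act",
    "An act relating to unclaimed life insurance benefits",
    "Unclaimed life insurance benefits model act",
    "An act concerning campaign finance reform",
]

# Represent each bill title as a row of a TF-IDF term matrix.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(bill_titles)

# Group similar titles; k is picked by hand purely for the example.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(matrix)

for label, title in sorted(zip(labels, bill_titles)):
    print(label, title)
```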
To make exploration a little easier, my code represents similar bills in graph space, shown at the top of this article. Each dot (known as a node) represents a bill. And a line connecting two bills (known as an edge) means they were sufficiently similar, according to my criteria (a cosine similarity of 0.75 or above). Thrown into visualization software like Gephi, it’s easy to click around the clusters and see what pops out. So what do we find?
There are 375 clusters in total. Because of the limitations of our data, many of them represent vague, subject-specific bills that just happen to have similar titles even though the legislation itself is probably very different (think things like “Budget Bill” and “Campaign Finance Reform”). This is where having full bill text would come in handy.
But mixed in with those bills are a handful of interesting nuggets. Several bills that appear to be modeled after legislation by the National Conference of Insurance Legislators appear in multiple states, among them: a bill related to limited lines travel insurance; another related to unclaimed insurance benefits; and one related to certificates of insurance.”
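The graph construction Davis describes, with bills as nodes and an edge wherever two bills reach a cosine similarity of 0.75 or above, can be sketched along the following lines. This is again an illustration under assumed inputs rather than the project’s code; networkx writes a GEXF file that Gephi opens directly.

```python
# Illustrative sketch of the bill-similarity graph described above: nodes are
# bills, and an edge connects any pair with cosine similarity of 0.75 or more.
# Not the project's code; the sample titles are assumptions.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

bill_titles = [
    "An act relating to limited lines travel insurance",
    "Limited lines travel insurance model act",
    "An act relating to certificates of insurance",
    "Certificates of insurance model act",
]

matrix = TfidfVectorizer(stop_words="english").fit_transform(bill_titles)
similarity = cosine_similarity(matrix)

graph = nx.Graph()
graph.add_nodes_from(range(len(bill_titles)))
for i in range(len(bill_titles)):
    for j in range(i + 1, len(bill_titles)):
        if similarity[i, j] >= 0.75:
            graph.add_edge(i, j, weight=float(similarity[i, j]))

# Gephi opens GEXF files directly, so the clusters can be explored visually.
nx.write_gexf(graph, "bill_similarity.gexf")
print(graph.number_of_nodes(), "bills,", graph.number_of_edges(), "edges")
```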
Commons at the Intersection of Peer Production, Citizen Science, and Big Data: Galaxy Zoo
Defining Open Data
As the open data movement grows, and even more governments and organisations sign up to open data, it becomes ever more important that there is a clear and agreed definition for what “open data” means if we are to realise the full benefits of openness, and avoid the risks of creating incompatibility between projects and splintering the community.
Open can apply to information from any source and about any topic. Anyone can release their data under an open licence for free use by and benefit to the public. Although we may think mostly about government and public sector bodies releasing public information such as budgets or maps, or researchers sharing their results data and publications, any organisation can open information (corporations, universities, NGOs, startups, charities, community groups and individuals).
Read more about different kinds of data in our one-page introduction to open data.
There is open information in transport, science, products, education, sustainability, maps, legislation, libraries, economics, culture, development, business, design, finance …. So the explanation of what open means applies to all of these information sources and types. Open may also apply both to data – big data and small data – and to content, like images, text and music!
So here we set out clearly what open means, and why this agreed definition is vital for us to collaborate, share and scale as open data and open content grow and reach new communities.
What is Open?
The full Open Definition provides a precise definition of what open data is. There are 2 important elements to openness:
- Legal openness: you must be allowed to get the data legally, to build on it, and to share it. Legal openness is usually provided by applying an appropriate (open) license which allows for free access to and reuse of the data, or by placing data into the public domain.
- Technical openness: there should be no technical barriers to using that data. For example, providing data as printouts on paper (or as tables in PDF documents) makes the information extremely difficult to work with. So the Open Definition has various requirements for “technical openness,” such as requiring that data be machine readable and available in bulk…
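As a concrete illustration of why that technical requirement matters: a machine-readable CSV released in bulk can be summarised in a few lines of code, whereas the same table published as a PDF printout would first have to be scraped or re-keyed. The file name and column names in this sketch are hypothetical.

```python
# Why "technical openness" matters in practice: a machine-readable CSV released
# in bulk can be summarised in a few lines, whereas the same figures locked in
# a PDF table would first need scraping. File and column names are hypothetical.
import csv
from collections import defaultdict

totals = defaultdict(float)
with open("budget.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):        # assumed columns: department, amount
        totals[row["department"]] += float(row["amount"])

for department, amount in sorted(totals.items()):
    print(f"{department}: {amount:,.2f}")
```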
The Science Behind Using Online Communities To Change Behavior
Sean D. Young in TechCrunch: “Although social media and online communities might have been developed for people to connect and share information, recent research shows that these technologies are really helpful in changing behaviors. My colleagues and I in the medical school, for instance, created online communities designed to improve health by getting people to do things, such as test for HIV, stop using methamphetamines, and just de-stress and relax. We don’t handpick people to join because we think they’ll love the technology; that’s not how science works. We invite them because the technology is relevant to them — they’re engaging in drugs, sex and other behaviors that might put themselves and others at risk. It’s our job to create the communities in a way that engages them enough to want to stay and participate. Yes, we do offer to pay them $30 to complete an hour-long survey, but then they are free to collect their money and never talk to us again. But for some reason, they stay in the group and decide to be actively engaged with strangers.
So how do we create online communities that keep people engaged and change their behaviors? Our starting point is to understand and address their psychological needs….
Throughout our research, we find that newly created online communities can change people’s behaviors by addressing the following psychological needs:
The Need to Trust. Sharing our thoughts, experiences, and difficulties with others makes us feel closer to others and increases our trust. When we trust people, we’re more open-minded, more willing to learn, and more willing to change our behavior. In our studies, we found that sharing personal information (even something as small as describing what you did today) can help increase trust and change behavior.
The Need to Fit In. Most of us inherently strive to fit in. Social norms, or other people’s attitudes and behaviors, heavily influence our own attitudes and behaviors. Each time a new online community or group forms, it creates its own set of social norms and expectations for how people should behave. Most people are willing to change their attitudes and/or behavior to fit these group norms and fit in with the community.
The Need for Self-Worth. When people feel good about themselves, they are more open to change and feel empowered to be able to change their behavior. When an online community is designed to have people support and care for each other, they can help to increase self-esteem.
The Need to Be Rewarded for Good Behavior. Anyone who has trained a puppy knows that you can get him to keep sitting as long as you keep the treats flowing to reward him, but if you want to wean him off the treats and really train him, then you’ll need to begin spacing out the treats to make them less predictable. Well, people aren’t that different from animals in that way and can be trained with reinforcements too. For example, “liking” people’s communications when they immediately join a network, and then progressively spacing out the time that their posts are liked (psychologists call this variable reinforcement), can be incorporated into social network platforms to encourage them to keep posting content. Eventually, these behaviors become habits.
The Need to Feel Empowered. While increasing self-esteem makes people feel good about themselves, increasing empowerment helps them know they have the ability to change. Creating a sense of empowerment is one of the most powerful predictors of whether people will change their behavior. Belonging to a network of people who are changing their own behaviors, who support our needs, and who are confident in our ability to change empowers us and gives us the ability to change our behavior.”
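The “variable reinforcement” pattern described under “The Need to Be Rewarded for Good Behavior”, rewarding early posts reliably and then spacing rewards out unpredictably, can be caricatured in a few lines. This is an illustration of the concept only, not code or parameters from the study.

```python
# Caricature of the "variable reinforcement" idea described above: reward a new
# member's first posts reliably, then space rewards out on an unpredictable,
# thinning schedule. Not code from the study; every parameter is an assumption.
import random

random.seed(42)


def should_reward(post_number, warmup=3, base_rate=0.8, decay=0.85):
    """Always reward the first few posts, then reward with a probability that
    shrinks as the member settles in, so reinforcement becomes unpredictable."""
    if post_number <= warmup:
        return True
    probability = base_rate * decay ** (post_number - warmup)
    return random.random() < probability


for post in range(1, 13):
    print(f"post {post:2d}: {'like' if should_reward(post) else 'skip'}")
```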
Imagining Data Without Division
Thomas Lin in Quanta Magazine: “As science dives into an ocean of data, the demands of large-scale interdisciplinary collaborations are growing increasingly acute…Seven years ago, when David Schimel was asked to design an ambitious data project called the National Ecological Observatory Network, it was little more than a National Science Foundation grant. There was no formal organization, no employees, no detailed science plan. Emboldened by advances in remote sensing, data storage and computing power, NEON sought answers to the biggest question in ecology: How do global climate change, land use and biodiversity influence natural and managed ecosystems and the biosphere as a whole?…
For projects like NEON, interpreting the data is a complicated business. Early on, the team realized that its data, while mid-size compared with the largest physics and biology projects, would be big in complexity. “NEON’s contribution to big data is not in its volume,” said Steve Berukoff, the project’s assistant director for data products. “It’s in the heterogeneity and spatial and temporal distribution of data.”
Unlike the roughly 20 critical measurements in climate science or the vast but relatively structured data in particle physics, NEON will have more than 500 quantities to keep track of, from temperature, soil and water measurements to insect, bird, mammal and microbial samples to remote sensing and aerial imaging. Much of the data is highly unstructured and difficult to parse — for example, taxonomic names and behavioral observations, which are sometimes subject to debate and revision.
And, as daunting as the looming data crush appears from a technical perspective, some of the greatest challenges are wholly nontechnical. Many researchers say the big science projects and analytical tools of the future can succeed only with the right mix of science, statistics, computer science, pure mathematics and deft leadership. In the big data age of distributed computing — in which enormously complex tasks are divided across a network of computers — the question remains: How should distributed science be conducted across a network of researchers?
Part of the adjustment involves embracing “open science” practices, including open-source platforms and data analysis tools, data sharing and open access to scientific publications, said Chris Mattmann, 32, who helped develop a precursor to Hadoop, a popular open-source data analysis framework that is used by tech giants like Yahoo, Amazon and Apple and that NEON is exploring. Without developing shared tools to analyze big, messy data sets, Mattmann said, each new project or lab will squander precious time and resources reinventing the same tools. Likewise, sharing data and published results will obviate redundant research.
To this end, international representatives from the newly formed Research Data Alliance met this month in Washington to map out their plans for a global open data infrastructure.”
Using Participatory Crowdsourcing in South Africa to Create a Safer Living Environment
The study illustrates how participatory crowdsourcing (specifically humans as sensors) can be used as a Smart City initiative focused on public safety, by showing what is required to contribute to the Smart City and by developing a roadmap in the form of a model to assist decision-making when selecting an optimal crowdsourcing initiative. Public safety data quality criteria were developed to assess and identify the problems affecting data quality.
This study is guided by design science methodology and applies three driving theories: the Data Information Knowledge Action Result (DIKAR) model, the characteristics of a Smart City, and a credible Data Quality Framework. Four critical success factors were developed to ensure high-quality public safety data is collected through participatory crowdsourcing utilising voice technologies.
Mobile phone data are a treasure-trove for development
Paul van der Boor and Amy Wesolowski in SciDevNet: “Each of us generates streams of digital information — a digital ‘exhaust trail’ that provides real-time information to guide decisions that affect our lives. For example, Google informs us about traffic by using both its ‘My Location’ feature on mobile phones and third-party databases to aggregate location data. BBVA, one of Spain’s largest banks, analyses transactions such as credit card payments as well as ATM withdrawals to find out when and where peak spending occurs. This type of data harvest is of great value. But, often, there is so much data that its owners lack the know-how to process it and fail to realise its potential value to policymakers.
Meanwhile, many countries, particularly in the developing world, have a dearth of information. In resource-poor nations, the public sector often lives in an analogue world where piles of paper impede operations and policymakers are hindered by uncertainty about their own strengths and capabilities. Nonetheless, mobile phones have quickly pervaded the lives of even the poorest: 75 per cent of the world’s 5.5 billion mobile subscriptions are in emerging markets. These people are also generating digital trails of anything from their movements to mobile phone top-up patterns. It may seem that putting this information to use would take vast analytical capacity. But using relatively simple methods, researchers can analyse existing mobile phone data, especially in poor countries, to improve decision-making.
Think of existing, available data as low-hanging fruit that we — two graduate students — could analyse in less than a month. This is not a test of data-scientist prowess, but more a way of saying that anyone could do it.
There are three areas that should be ‘low-hanging fruit’ in terms of their potential to dramatically improve decision-making in information-poor countries: coupling healthcare data with mobile phone data to predict disease outbreaks; using mobile phone money transactions and top-up data to assess economic growth; and predicting travel patterns after a natural disaster using historical movement patterns from mobile phone data to design robust response programmes.
Another possibility is using call-data records to analyse urban movement to identify traffic congestion points. Nationally, this can be used to prioritise infrastructure projects such as road expansion and bridge building.
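A minimal sketch of the kind of simple aggregation the authors have in mind, counting call-detail records per cell tower and hour and flagging unusually busy tower-hours as candidate congestion points, might look like this. The field names, input file, and threshold are assumptions rather than anything from a cited study.

```python
# Minimal sketch of the simple aggregation described above: count call-detail
# records per cell tower and hour, then flag tower-hours far above that tower's
# typical load as candidate congestion points. Field names, the input file, and
# the threshold are assumptions, not part of any cited study.
import csv
from collections import Counter, defaultdict

counts = Counter()
with open("call_records.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):          # assumed columns: tower_id, timestamp
        hour = row["timestamp"][:13]        # e.g. "2013-06-14T08"
        counts[(row["tower_id"], hour)] += 1

hourly_by_tower = defaultdict(list)
for (tower, hour), n in counts.items():
    hourly_by_tower[tower].append(n)

for tower, volumes in hourly_by_tower.items():
    volumes.sort()
    median, peak = volumes[len(volumes) // 2], volumes[-1]
    if peak > 3 * median:                   # crude "unusually busy" rule
        print(f"{tower}: peak hourly volume {peak} vs. median {median}")
```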
The information that these analyses could provide would be lifesaving — not just informative or revenue-increasing, like much of this work currently performed in developed countries.
But some work of high social value is being done. For example, different teams of European and US researchers are trying to estimate the links between mobile phone use and regional economic development. They are using various techniques, such as merging night-time satellite imagery from NASA with mobile phone data to create behavioural fingerprints. They have found that this may be a cost-effective way to understand a country’s economic activity and, potentially, guide government spending.
Another example is given by researchers (including one of this article’s authors) who have analysed call-data records from subscribers in Kenya to understand malaria transmission within the country and design better strategies for its elimination. [1]
In this study, published in Science, the location data of the mobile phones of more than 14 million Kenyan subscribers was combined with national malaria prevalence data. After identifying the sources and sinks of malaria parasites and overlaying these with phone movements, analysis was used to identify likely transmission corridors. UK scientists later used similar methods to create different epidemic scenarios for the Côte d’Ivoire.”
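The corridor-ranking idea in that study, overlaying where subscribers travel with how prevalent malaria is where they start, can be caricatured by weighting each origin-to-destination flow by prevalence at the origin. This is a drastic simplification of the published method, and every name and number below is invented purely for illustration.

```python
# Drastic simplification of the corridor-ranking idea described above: weight
# each origin->destination traveller flow by malaria prevalence at the origin,
# so heavily travelled routes out of high-prevalence areas rank highest.
# All region names and numbers below are invented purely for illustration.
trips = {                     # (origin, destination): subscriber movements
    ("lake_region", "capital"): 5000,
    ("lake_region", "coast"): 1200,
    ("capital", "coast"): 8000,
}
prevalence = {"lake_region": 0.38, "capital": 0.05, "coast": 0.12}  # assumed

corridor_pressure = {
    (origin, dest): count * prevalence[origin]
    for (origin, dest), count in trips.items()
}

for (origin, dest), pressure in sorted(corridor_pressure.items(),
                                       key=lambda kv: -kv[1]):
    print(f"{origin} -> {dest}: relative importation pressure {pressure:.0f}")
```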