Effect of Government Data Openness on a Knowledge-based Economy


Jae-Nam Lee, Juyeon Ham and Byounggu Choi at Procedia Computer Science: “Many governments have recently begun to adopt the concept of open innovation. However, studies on the openness of government data and its effect on global competitiveness have not received much attention. Therefore, this study aims to investigate the effects of government data openness on a knowledge-based economy at the government level. The proposed model was analyzed using secondary data collected from three different reports. The findings indicate that government data openness positively affects the formation of knowledge bases in a country, and that the level of a country’s knowledge base positively affects its global competitiveness….(More)”

 

Taking a More Sophisticated Look at Human Beings


Nathan Collins at Pacific Standard: “Are people fundamentally selfish, or are they cooperators? Actually, it’s kind of an odd question—after all, why are those the only options? The answer is that those options are derived in large part from philosophy and classical economic theory, rather than data. In a new paper, researchers have flipped the script, using observations of simple social situations to show that optimism, pessimism, envy, and trust, rather than selfishness and sacrifice, are the basic ingredients of our behavior.

That conclusion advances wider “efforts toward the identification of basic behavioral phenotypes,” or categories of behavior, and the results could be usefully applied in social science, policy, and business, Julia Poncela-Casasnovas and her colleagues write in Science Advances.

Classical economic theory has something of a bad reputation these days, and not without reason. For one thing, most economic theory assumes people are rational, in the sense that they are strategic and maximize their payoffs in all that they do. The list of objections to that approach is long and well-documented, but there’s a counter objection—amid a slew of objections and anecdotes, there’s little in the way of a cohesive alternative theory.

Optimism, pessimism, envy, and trust are the basic ingredients of our behavior.

Poncela-Casasnovas and her colleagues’ experiments are, they hope, a step toward such a theory. Their idea was to put ordinary people in simple social situations with economic tradeoffs, observe how those people act, and then construct a data-driven classification of their behavior…. Using standard statistical methods, the researchers identified four such player types: optimists (20 percent), who always go for the highest payoff, hoping the other player will coordinate to achieve that goal; pessimists (30 percent), who act according to the opposite assumption; the envious (21 percent), who try to score more points than their partners; and the trustful (17 percent), who always cooperate. The remaining 12 percent appeared to make their choices completely at random.
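
To make the classification step concrete, here is a minimal sketch of how behavioral types might be recovered from game data by clustering. The excerpt says only that “standard statistical methods” were used, so the feature encoding, the algorithm (k-means) and the cluster count below are illustrative assumptions, not the authors’ actual pipeline.

```python
# A sketch of behavioral-type discovery via clustering; data and parameters
# are hypothetical, not the study's actual method.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical data: each row is one participant; each column is that
# participant's cooperation rate in one of four game settings.
behavior = rng.random((240, 4))

# Group participants into five candidate types (four strategies plus noise).
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(behavior)

for label in range(5):
    members = behavior[kmeans.labels_ == label]
    share = 100 * len(members) / len(behavior)
    print(f"type {label}: {share:.0f}% of players, "
          f"mean cooperation per game = {members.mean(axis=0).round(2)}")
```

On real responses, clusters like these would then be inspected and named (optimist, pessimist, envious, trustful) according to the strategy each one's members consistently play.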

Those results don’t yet add up to anything like a theory of human behavior, but they do “open the door to making relevant advances in a number of directions,” the authors write. In particular, the researchers hope their results will help explain behavior in other simple games, and aid those hoping to understand how people may respond to new policy initiatives….(More)”

An investigation of unpaid crowdsourcing


Chapter by Ria Mae Borromeo and Motomichi Toyama in Human-centric Computing and Information Sciences: “The continual advancement of internet technologies has led to the evolution of how individuals and organizations operate. For example, through the internet, we can now tap a remote workforce to help us accomplish certain tasks, a phenomenon called crowdsourcing. Crowdsourcing is an approach that relies on people to perform activities that are costly or time-consuming using traditional methods. Depending on the incentive given to the crowd workers, crowdsourcing can be classified as paid or unpaid. In paid crowdsourcing, the workers are incentivized financially, enabling the formation of a robust workforce, which allows fast completion of tasks. Conversely, in unpaid crowdsourcing, the lack of financial incentive potentially leads to an unpredictable workforce and indeterminable task completion time. However, since payment to workers is not necessary, it can be an economical alternative for individuals and organizations who are more concerned about the budget than the task turnaround time. In this study, we explore unpaid crowdsourcing by reviewing crowdsourcing applications where the crowd comes from a pool of volunteers. We also evaluate its performance in sentiment analysis and data extraction projects. Our findings suggest that for such tasks, unpaid crowdsourcing completes slower but yields results of similar or higher quality compared to its paid counterpart…(More)”
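
The excerpt does not say how crowd answers were combined, but a common baseline for labeling tasks such as sentiment analysis is simple majority voting across workers. A minimal, self-contained sketch with hypothetical labels:

```python
# Majority-vote aggregation of crowd labels; the items and votes are invented.
from collections import Counter

labels_per_item = {
    "review_1": ["pos", "pos", "neg"],
    "review_2": ["neg", "neg", "neg"],
    "review_3": ["pos", "neutral", "pos"],
}

def majority_vote(labels):
    """Return the most common label; ties resolve to the first seen."""
    return Counter(labels).most_common(1)[0][0]

consensus = {item: majority_vote(votes) for item, votes in labels_per_item.items()}
print(consensus)  # {'review_1': 'pos', 'review_2': 'neg', 'review_3': 'pos'}
```

Comparing such consensus labels against a gold standard is one straightforward way to put paid and unpaid crowds on the same quality scale.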

 

The Ethics of Biomedical Big Data


Book edited by Brent Daniel Mittelstadt and Luciano Floridi: “This book presents cutting-edge research on the new ethical challenges posed by biomedical Big Data technologies and practices. ‘Biomedical Big Data’ refers to the analysis of aggregated, very large datasets to improve medical knowledge and clinical care. The book describes the ethical problems posed by aggregation of biomedical datasets and re-use/re-purposing of data, in areas such as privacy, consent, professionalism, power relationships, and ethical governance of Big Data platforms. Approaches and methods are discussed that can be used to address these problems to achieve the appropriate balance between the social goods of biomedical Big Data research and the safety and privacy of individuals. Seventeen original contributions analyse the ethical, social and related policy implications of the analysis and curation of biomedical Big Data, written by leading experts in the areas of biomedical research, medical and technology ethics, privacy, governance and data protection. The book advances our understanding of the ethical conundrums posed by biomedical Big Data, and shows how practitioners and policy-makers can address these issues going forward….(More)”

Exploring Online Engagement in Public Policy Consultation: The Crowd or the Few?


Helen K. Liu in Australian Journal of Public Administration: “Governments are increasingly adopting online platforms to engage the public and allow a broad and diverse group of citizens to participate in the planning of government policies. To understand the role of crowds in the online public policy process, we analyse participant contributions over time in two crowd-based policy processes, the Future Melbourne wiki and the Open Government Dialogue. Although past evaluations have shown the significance of public consultations by expanding the engaged population within a short period of time, our empirical case studies suggest that a small number of participants contribute a disproportionate share of ideas and opinions. We discuss the implications of our initial examination for the future design of engagement platforms….(More)”
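
The case studies’ core finding, that a small number of participants contribute a disproportionate share, can be quantified directly from contribution counts, for example via the share produced by the most active participants or a Gini coefficient. A sketch on invented counts (not data from the Future Melbourne wiki or the Open Government Dialogue):

```python
# Measuring how concentrated participation is; the counts are hypothetical.
import numpy as np

# One contribution count per participant.
contributions = np.array([120, 85, 60, 12, 9, 7, 5, 3, 2, 2, 1, 1, 1, 1, 1])

def gini(values):
    """Gini coefficient: 0 = perfectly even, 1 = maximally concentrated."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

top = np.sort(contributions)[::-1][: max(1, len(contributions) // 10)]
print(f"top 10% of participants -> {top.sum() / contributions.sum():.0%} of contributions")
print(f"Gini coefficient: {gini(contributions):.2f}")
```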

5 Crowdsourced News Platforms Shaping The Future of Journalism and Reporting


At Crowdsourcing Week: “We are exposed to a myriad of news and updates worldwide. As the crowd becomes more involved in providing information, adopting that ‘upload mindset’ coined by Will Merritt of Zooppa, access to all kinds of data is a few taps and clicks away….

Google News Lab – Better reporting and insightful storytelling


Last week, Google announced its own crowdsourced news platform, dubbed News Lab, as part of its efforts “to empower innovation at the intersection of technology and media.”

Scouting for real-time stories, updates, and breaking news is now much easier and more systematic for journalists worldwide. Google is tackling this initiative in three ways: tools for better reporting, data for insightful storytelling, and programs focused on the future of media.

“There’s a revolution in data journalism happening in newsrooms today, as more data sets and more tools for analysis are allowing journalists to create insights that were never before possible,” Google said.

Grasswire – first-hand information in real-time


The design looks bleak and simple, but the site itself is rich with content: first-hand information crowdsourced from Twitter users in real time and verified. Austen Allred, co-founder of Grasswire, was inspired to develop the platform after a “minor slipup,” as the American Journalism Review (AJR) puts it: he missed his train out of Shanghai, which actually saved his life.

“The bullet train Allred was supposed to be on collided with another train in the Wenzhou area of China’s Zhejiang province,” AJR wrote. “Of the 1,630 passengers, 40 died, and another 210 were injured.” The accident happened in 2011. Unfortunately, the Chinese government covered up parts of the incident, which frustrated Allred’s attempts to find first-hand information.

Almost four years later, Grasswire was launched: a website that collects real-time information from users on breaking news, built around a crowdsourcing model. “It’s since grown into a more complex interface, allowing users to curate selected news tweets by voting and verifying information with a fact-checking system,” AJR wrote, which made the verification of data open and systematized.

Rappler – Project Agos: a technology for disaster risk reduction


The Philippines lies directly in the path of many typhoons, and the aftermath of typhoon Haiyan was exceedingly disastrous. But the crowds were steadfast in uploading and sharing information, and crowdsourcing became mainstream during the relief operations. Maria Ressa said that for years they had to educate netizens to use the appropriate hashtags for typhoons (#nameoftyphoonPH, e.g. #YolandaPH) so that data could easily be collected from social media channels.
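
The hashtag convention Ressa describes is what makes machine collection possible: if everyone tags reports the same way, harvesting them reduces to a simple filter. A toy sketch (the posts are invented, and real collection would go through a social media API rather than a list):

```python
# Toy hashtag-based collection in the spirit of #YolandaPH; posts are hypothetical.
posts = [
    "Roads flooded in Tacloban #YolandaPH",
    "Concert tonight! #music",
    "Need rescue boats near the coast #YolandaPH",
]

typhoon_tag = "#yolandaph"
reports = [p for p in posts if typhoon_tag in p.lower()]
print(reports)  # only the two disaster reports survive the filter
```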

Education and preparation can mitigate the risks and save lives if we utilize the right technology and act accordingly. In her blog, After Haiyan: Crisis management and beyond, Maria wrote, “We need to educate not just the first responders and local government officials, but more importantly, the people in the path of the storms.” …

China’s CCDI app – Crowdsourcing political reports to crack down on corrupt practices


In China, if you want to mitigate or, if possible, eradicate corrupt practices, there’s an app for that. China launched its own anti-corruption app, the Central Commission for Discipline Inspection Website App, which allows the public to upload text messages, photos and videos of corrupt practices by Chinese officials.

The platform was released by the government agency, the Central Commission for Discipline Inspection. Nervous you might be tracked as a whistleblower? Interestingly, anyone can report anonymously. China Daily said “the anti-corruption authorities received more than 1,000 public reports, and nearly 70 percent were communicated via snapshots, text messages or videos uploaded” since its release. Kenya has its own version too, Ushahidi, which uses crowdmapping, and India has I Paid a Bribe.

Newzulu – share news, publish and get paid


While journalists can get fresh insights from Google News Lab, the crowd can get real-time verified news from Grasswire, and the CCDI app is open to the public, Newzulu’s crowdsourced news platform doesn’t just invite the crowd to share news: contributors can also publish and get paid.

It’s “a community of over 150,000 professional and citizen journalists who share and break news to the world as it happens,” originally based in Sydney. Anyone can submit stories, photos, videos, and even stream live….(More)”

Crowdfunding for Sustainable Entrepreneurship and Innovation


Book edited by Walter Vassallo: “Crowdfunding for Sustainable Entrepreneurship and Innovation is a pivotal reference source for the latest scholarly research and business practices on the opportunities and benefits gained from the use of crowdfunding in modern society, discussing its socio-economic impact, in addition to its business implications. Featuring current trends and future directions for crowdfunding initiatives, this book is ideally designed for students, researchers, practitioners, entrepreneurs, and policy makers.

New financing models such as crowdfunding are democratizing access to credit, offering individuals and communities the opportunity to support, co-create, contribute and invest in public and private initiatives. This book relates to innovation in its essence: anticipating future needs and creating new business models without losing revenue. There are tremendous unexplored opportunities in crowdsourcing and crowdfunding, two sides of the same coin that can lead to a revolution in current social and economic models. Reading this book will provide insight into the changes taking place in crowdfunding, and offer strategic opportunities and advantages….(More)”

Open Data for Social Change and Sustainable Development


Special issue of the Journal of Community Informatics edited by Raed M. Sharif and Francois Van Schalkwyk: “As the second phase of the Emerging Impacts of Open Data in Developing Countries (ODDC) drew to a close, discussions started on a possible venue for publishing some of the papers that emerged from the research conducted by the project partners. In 2012 the Journal of Community Informatics published a special issue titled ‘Community Informatics and Open Government Data’. Given the journal’s previous interest in the field of open data, its established reputation and the fact that it is a peer-reviewed open access journal, the Journal of Community Informatics was approached and agreed to a second special issue with a focus on open data. A closed call for papers was sent out to the project research partners. Shortly afterwards, the first Open Data Research Symposium was held ahead of the International Open Data Conference 2015 in Ottawa, Canada. For the first time, a forum was provided to academics and researchers to present papers specifically on open data. Again there were discussions about an appropriate venue to publish selected papers from the Symposium. The decision was taken by the Symposium Programme Committee to invite the twenty plus presenters to submit full papers for consideration in the special issue.

The seven papers published in this special issue are those that were selected through a double-blind peer review process. Researchers are often given a rough ride by open data advocates – the research community is accused of taking too long, not being relevant enough and of speaking in tongues unintelligible to social movements and policy-makers. And yet nine years after the ground-breaking meeting in Sebastopol at which the eight principles of open government data were penned, seven after President Obama injected political legitimacy into a movement, and five after eleven nation states formed the global Open Government Partnership (OGP), which has grown six-fold in membership; an email crosses our path in which the authors of a high-level report commit to developing a comprehensive understanding of a continental open data ecosystem through an examination of open data supply. Needless to say, a single example is not necessarily representative of global trends in thinking about open data. Yet, the focus on government and on the supply of open data by open data advocates – with little consideration of open data use, the differentiation of users, intermediaries, power structures or the incentives that propel the evolution of ecosystems – is still all too common. Empirical research has already revealed the limitations of ‘supply it and they will use it’ open data practices, and has started to fill critical knowledge gaps to develop a more holistic understanding of the determinants of effective open data policy and practice.

As open data policies and practices evolve, the need to capture the dynamics of this evolution and to trace unfolding outcomes becomes critical to advance a more efficient and progressive field of research and practice. The trajectory of the existing body of literature on open data and the role of public authorities, both local and national, in the provision of open data is logical and needed in light of the central role of government in producing a wide range of types and volumes of data. At the same time, the complexity of the open data ecosystem and the plethora of actors (local, regional and global suppliers, intermediaries and users) make a compelling case for opening avenues for more diverse discussion and research beyond the supply of open data. The research presented in this special issue of the Journal of Community Informatics touches on many of these issues, sets the pace and contributes to the much-needed knowledge base required to promote the likelihood of open data living up to its promise. … (More)”

Legal confusion threatens to slow data science


Simon Oxenham in Nature: “Knowledge from millions of biological studies encoded into one network — that is Daniel Himmelstein’s alluring description of Hetionet, a free online resource that melds data from 28 public sources on links between drugs, genes and diseases. But for a product built on public information, obtaining legal permissions has been surprisingly tough.

One of the researchers whose data Himmelstein asked permission to reuse, Jörg Menche, rapidly gave consent — but not everyone was so helpful. One research group never replied to Himmelstein, and three replied without clearing up the legal confusion. Ultimately, Himmelstein published the final version of Hetionet in July — minus one data set whose licence forbids redistribution, but including the three that he still lacks clear permission to republish. The tangle shows that many researchers don’t understand that simply posting a data set publicly doesn’t mean others can legally republish it, says Himmelstein.
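
The audit Himmelstein ran by hand can be pictured as a licence check over every source dataset: anything without a clearly redistributable licence needs explicit permission before it goes into a compilation like Hetionet. A minimal sketch; the dataset names, licences and whitelist below are hypothetical, not Hetionet’s actual sources:

```python
# A sketch of the licence audit described above; all names are invented.
REDISTRIBUTABLE = {"CC0-1.0", "CC-BY-4.0", "ODbL-1.0"}

datasets = {
    "gene_disease_links": "CC-BY-4.0",
    "drug_targets": "CC0-1.0",
    "expression_atlas": "no licence stated",  # posting publicly != permission
    "pathway_annotations": "CC-BY-NC-4.0",    # non-commercial terms restrict reuse
}

for name, licence in datasets.items():
    if licence in REDISTRIBUTABLE:
        print(f"OK      {name}: {licence}")
    else:
        print(f"REVIEW  {name}: {licence} -> ask the provider before republishing")
```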

The confusion has the power to slow down science, he says, because researchers will be discouraged from combining data sets into more useful resources. It will also become increasingly problematic as scientists publish more information online. “Science is becoming more and more dependent on reusing data,” Himmelstein says….

Himmelstein is not convinced that he is legally in the clear — and feels that such uncertainty may deter other scientists from reproducing academic data. If a researcher launches a commercial product that is based on public data sets, he adds, the stakes of not having clear licensing are likely to rise. “I think these are largely untested waters, and most academics aren’t in the position to risk setting off a legal battle that will help clarify these issues,” he says….(More)”

Revealing Algorithmic Rankers


Julia Stoyanovich and Ellen P. Goodman in the Freedom to Tinker Blog: “ProPublica’s story on “machine bias” in an algorithm used for sentencing defendants amplified calls to make algorithms more transparent and accountable. It has never been more clear that algorithms are political (Gillespie) and embody contested choices (Crawford), and that these choices are largely obscured from public scrutiny (Pasquale and Citron). We see it in controversies over Facebook’s newsfeed, or Google’s search results, or Twitter’s trending topics. Policymakers are considering how to operationalize “algorithmic ethics” and scholars are calling for accountable algorithms (Kroll, et al.).

One kind of algorithm that is at once especially obscure, powerful, and common is the ranking algorithm (Diakopoulos). Algorithms rank individuals to determine credit worthiness, desirability for college admissions and employment, and compatibility as dating partners. They encode ideas of what counts as the best schools, neighborhoods, and technologies. Despite their importance, we can actually know very little about why this person was ranked higher than another in a dating app, or why this school has a better rank than that one. This is true even if we have access to the ranking algorithm, for example, if we have complete knowledge about the factors used by the ranker and their relative weights, as is the case for the US News ranking of colleges. In this blog post, we argue that syntactic transparency, wherein the rules of operation of an algorithm are more or less apparent, or even fully disclosed, still leaves stakeholders in the dark: those who are ranked, those who use the rankings, and the public whose world the rankings may shape.
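
Rankers of the US News variety are typically just a weighted sum of normalized factors. A small sketch (the factors, weights and scores below are invented, not US News’s) shows why syntactic transparency alone is thin: even with the full formula in view, it takes effort to see why one item edges out another.

```python
# A score-based ranker as a weighted sum; all inputs are illustrative.
import numpy as np

factors = ["graduation_rate", "faculty_resources", "reputation"]
weights = np.array([0.4, 0.3, 0.3])  # hypothetical disclosed weights

items = {
    "School A": np.array([0.92, 0.55, 0.80]),
    "School B": np.array([0.88, 0.70, 0.75]),
    "School C": np.array([0.95, 0.40, 0.85]),
}

scores = {name: float(weights @ vals) for name, vals in items.items()}
for rank, (name, score) in enumerate(sorted(scores.items(), key=lambda kv: -kv[1]), 1):
    print(rank, name, round(score, 3))
```

Here School B outranks School A even though A leads on graduation rate and reputation, because the weights let faculty resources tip the balance; nothing in the disclosed formula makes that trade-off salient to a ranked subject.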

Using algorithmic rankers as an example, we argue that syntactic transparency alone will not lead to true algorithmic accountability (Angwin). This is true even if the complete input data is publicly available. We advocate instead for interpretability, which rests on making explicit the interactions between the program and the data on which it acts. An interpretable algorithm allows stakeholders to understand the outcomes, not merely the process by which outcomes were produced….

Opacity in algorithmic rankers can lead to four types of harms:

(1) Due process / fairness. The subjects of the ranking cannot have confidence that their ranking is meaningful or correct, or that they have been treated like similarly situated subjects. Syntactic transparency helps with this but it will not solve the problem entirely, especially when people cannot interpret how weighted factors have impacted the outcome (Source 2 above).

(2) Hidden normative commitments. A ranking formula implements some vision of the “good.” Unless the public knows what factors were chosen and why, and with what weights assigned to each, it cannot assess the compatibility of this vision with other norms. Even where the formula is disclosed, real public accountability requires information about whether the outcomes are stable, whether the attribute weights are meaningful, and whether the outcomes are ultimately validated against the chosen norms. Did the vendor evaluate the actual effect of the features that are postulated as important by the scoring / ranking model? Did the vendor take steps to compensate for mutually-reinforcing correlated inputs, and for possibly discriminatory inputs? Was the stability of the ranker interrogated on real or realistic inputs? This kind of transparency around validation is important both for learning algorithms, which operate according to rules that are constantly in flux and responsive to shifting data inputs, and for simpler score-based rankers that are likewise sensitive to the data.
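
One of the validation steps called for above, interrogating stability on realistic inputs, can be sketched by perturbing the input data slightly and measuring how much the ranking moves, for example with Kendall’s tau. The ranker, data and noise scale below are assumptions for illustration:

```python
# Stability check for a score-based ranker via input perturbation.
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(1)

weights = np.array([0.4, 0.3, 0.3])  # an illustrative score-based ranker
data = rng.random((50, 3))           # 50 hypothetical items, 3 factors
base_scores = data @ weights

# Re-score under small random measurement noise and compare orderings.
taus = []
for _ in range(200):
    noisy_scores = (data + rng.normal(scale=0.02, size=data.shape)) @ weights
    tau, _ = kendalltau(base_scores, noisy_scores)
    taus.append(tau)

# Values near 1 mean the ranking is stable; values well below 1 mean small
# measurement error is enough to reshuffle the list.
print(f"mean Kendall tau under noise: {np.mean(taus):.3f}")
```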

(3) Interpretability. Especially where ranking algorithms are performing a public function (e.g., allocation of public resources or organ donations) or directly shaping the public sphere (e.g., ranking politicians), political legitimacy requires that the public be able to interpret algorithmic outcomes in a meaningful way. At the very least, they should know the degree to which the algorithm has produced robust results that improve upon a random ordering of the items (a ranking-specific confidence measure). In the absence of interpretability, there is a threat to public trust and to democratic participation, raising the dangers of an algocracy (Danaher) – rule by incontestable algorithms.

(4) Meta-methodological assessment. Following on from the interpretability concerns is a meta question about whether a ranking algorithm is the appropriate method for shaping decisions. There are simply some domains, and some instances of datasets, in which rank order is not appropriate. For example, if there are very many ties or near-ties induced by the scoring function, or if the ranking is too unstable, it may be better to present data through an alternative mechanism such as clustering. More fundamentally, we should question the use of an algorithmic process if its effects are not meaningful or if it cannot be explained. In order to understand whether the ranking methodology is valid, as a first order question, the algorithmic process needs to be interpretable….
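
The near-tie problem flagged here is easy to test for mechanically: sort the scores and count adjacent gaps smaller than some materiality threshold. The scores and threshold below are invented for illustration:

```python
# Detecting near-ties that make a strict rank order misleading.
import numpy as np

scores = np.array([0.91, 0.90, 0.89, 0.72, 0.71, 0.70, 0.69, 0.40])  # sorted, hypothetical
threshold = 0.02  # an assumed materiality cutoff

gaps = -np.diff(scores)  # positive gaps between adjacent ranked items
near_ties = int(np.sum(gaps < threshold))
print(f"{near_ties} of {len(gaps)} adjacent pairs are within {threshold} of each other")
if near_ties > len(gaps) / 2:
    print("rank order is mostly noise here; consider presenting clusters instead")
```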

The Ranking Facts label shows how the properties of the 10 highest-ranked items compare to the entire dataset (Relativity), making explicit cases where the ranges of values, and the median value, differ at the top 10 versus overall (the median is marked with red triangles for faculty size and average publication count). The label lists the attributes that have the most impact on the ranking (Impact), presents the scoring formula (if known), and explains which attributes correlate with the computed score. Finally, the label graphically shows the distribution of scores (Stability), explaining that scores differ significantly up to the top 10 but are nearly indistinguishable in later positions.
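
The Relativity panel described here boils down to comparing summary statistics of the top-k against the whole dataset. A sketch on invented department data (the attributes and the scoring formula are assumptions, not the actual Ranking Facts implementation):

```python
# Comparing top-10 medians to overall medians, as in a Relativity panel.
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical department data: faculty size and average publication count.
faculty = rng.integers(10, 80, size=100)
pubs = rng.normal(25, 8, size=100)
score = 0.5 * faculty / faculty.max() + 0.5 * pubs / pubs.max()  # assumed formula

top10 = np.argsort(-score)[:10]
for name, col in [("faculty size", faculty), ("avg publications", pubs)]:
    print(f"{name}: top-10 median = {np.median(col[top10]):.1f}, "
          f"overall median = {np.median(col):.1f}")
```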

Something like the Ranking Facts label makes the process and outcome of algorithmic ranking interpretable for consumers, and reduces the likelihood of the opacity harms discussed above. Beyond Ranking Facts, it is important to develop interpretability tools that enable vendors to design fair, meaningful and stable ranking processes, and that support external auditing. Promising technical directions include, e.g., quantifying the influence of various features on the outcome under different assumptions about availability of data and code, and investigating whether provenance techniques can be used to generate explanations….(More)”