CrowdFlower Launches Open Data Project


Anthony Ha at TechCrunch: “Crowdsourcing company CrowdFlower allows businesses to tap into a distributed workforce of 5 million contributors for basic tasks like sentiment analysis. Today it’s releasing some of that data to the public through its new Data for Everyone initiative…. The hope is to turn CrowdFlower into a central repository where open data can be found by researchers and entrepreneurs. (Factual was another startup trying to become a hub for open data, though in recent years, it’s become more focused on gathering location data to power mobile ads.)…

As for the data that’s available now, …There’s a lot of Twitter sentiment analysis covering attitudes towards brands and products, yogurt (?), and climate change. Among the more recent data sets, I was particularly taken with the gender breakdown of who’s been on the cover of Time magazine and, yes, the analysis of who thought the dress (you know the one) was gold and white versus blue and black…. (More)”
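As a hedged illustration of what can be done with a release like the Twitter sentiment sets, the sketch below tallies sentiment labels from a small CSV in roughly the shape such a data set might take; the column names ("text", "sentiment") and labels are assumptions for illustration, not the actual Data for Everyone schema.

```python
import csv
import io
from collections import Counter

# Toy CSV mimicking a crowdsourced sentiment data set; column names
# and labels are assumed, not the real schema.
sample = """text,sentiment
"Love this yogurt",positive
"Climate change is real",neutral
"Worst brand ever",negative
"Great product",positive
"""

def sentiment_breakdown(csv_text):
    """Tally how many rows carry each sentiment label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["sentiment"] for row in reader)

print(sentiment_breakdown(sample))  # counts per label
```

The same two-line tally scales to a full downloaded file by swapping `io.StringIO(csv_text)` for an `open(...)` call.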

Crowdsourcing America’s cybersecurity is an idea so crazy it might just work


At the Washington Post: “One idea that’s starting to bubble up from Silicon Valley is the concept of crowdsourcing cybersecurity. As Silicon Valley venture capitalist Robert R. Ackerman, Jr. has pointed out, due to “the interconnectedness of our society in cyberspace,” cyber networks are best viewed as an asset that we all have a shared responsibility to protect. Push on that concept hard enough and you can see how many of the core ideas from Silicon Valley – crowdsourcing, open source software, social networking, and the creative commons – can all be applied to cybersecurity.

Silicon Valley venture capitalists are already starting to fund companies that describe themselves as crowdsourcing cybersecurity. For example, take Synack, a “crowd security intelligence” company that received $7.5 million in funding from Kleiner Perkins (one of Silicon Valley’s heavyweight venture capital firms), Allegis Ventures, and Google Ventures in 2014. Synack’s two founders are ex-NSA employees, and they are using that experience to inform an entirely new type of business model. Synack recruits and vets a global network of “white hat hackers,” and then offers their services to companies worried about their cyber networks. For a fee, these hackers are able to find and repair security vulnerabilities.

So how would crowdsourced national cybersecurity work in practice?

For one, there would be free and transparent sharing of computer code used to detect cyber threats between the government and private sector. In December, the U.S. Army Research Lab added a bit of open-source code, a “network forensic analysis framework” known as Dshell, to the mega-popular code sharing site GitHub. Already, there have been 100 downloads and more than 2,000 unique visitors. The goal, says William Glodek of the U.S. Army Research Laboratory, is for this shared code to “help facilitate the transition of knowledge and understanding to our partners in academia and industry who face the same problems.”

This open sourcing of cyber defense would be enhanced with a scaled-up program of recruiting “white hat hackers” to become officially part of the government’s cybersecurity efforts. Popular annual events such as the DEF CON hacking conference could be used to recruit talented cyber sleuths to work alongside the government.

There have already been examples of communities where people facing a common cyber threat gather together to share intelligence. Perhaps the best-known example is the Conficker Working Group, a security coalition that was formed in late 2008 to share intelligence about malicious Conficker malware. Another example is the Financial Services Information Sharing and Analysis Center, which was created by presidential mandate in 1998 to share intelligence about cyber threats to the nation’s financial system.

Of course, there are some drawbacks to this crowdsourcing idea. For one, such a collaborative approach to cybersecurity might open the door to government cyber defenses being infiltrated by the enemy. Ackerman makes the point that you never really know who’s contributing to any community. Even on a site such as GitHub, it’s theoretically possible that an ISIS hacker or someone like Edward Snowden could download the code, reverse engineer it, and then use it to insert “Trojan horses” aimed at military targets into the code….  (More)

If Data Sharing is the Answer, What is the Question?


Christine L. Borgman at ERCIM News: “Data sharing has become policy enforced by governments, funding agencies, journals, and other stakeholders. Arguments in favor include leveraging investments in research, reducing the need to collect new data, addressing new research questions by reusing or combining extant data, and reproducing research, which would lead to greater accountability, transparency, and less fraud. Arguments against data sharing rarely are expressed in public fora, so popular is the idea. Much of the scholarship on data practices attempts to understand the socio-technical barriers to sharing, with goals to design infrastructures, policies, and cultural interventions that will overcome these barriers.

However, data sharing and reuse are common practice in only a few fields. Astronomy and genomics in the sciences, survey research in the social sciences, and archaeology in the humanities are the typical exemplars, which remain the exceptions rather than the rule. The lack of success of data sharing policies, despite accelerating enforcement over the last decade, indicates the need not just for a much deeper understanding of the roles of data in contemporary science but also for developing new models of scientific practice. Science progressed for centuries without data sharing policies. Why is data sharing deemed so important to scientific progress now? How might scientific practice be different if these policies had been in place several generations ago?

Enthusiasm for “big data” and for data sharing is obscuring the complexity of data in scholarship and the challenges for stewardship. Data practices are local, varying from field to field, individual to individual, and country to country. Studying data is a means to observe how rapidly the landscape of scholarly work in the sciences, social sciences, and the humanities is changing. Inside the black box of data is a plethora of research, technology, and policy issues. Data are best understood as representations of observations, objects, or other entities used as evidence of phenomena for the purposes of research or scholarship. Rarely do they stand alone, separable from software, protocols, lab and field conditions, and other context. The lack of agreement on what constitutes data underlies the difficulties in sharing, releasing, or reusing research data.

Concerns for data sharing and open access raise broader questions about what data to keep, what to share, when, how, and with whom. Open data is sometimes viewed simply as releasing data without payment of fees. In research contexts, open data may pose complex issues of licensing, ownership, responsibility, standards, interoperability, and legal harmonization. To scholars, data can be assets, liabilities, or both. Data have utilitarian value as evidence, but they also serve social and symbolic purposes for control, barter, credit, and prestige. Incentives for scientific advancement often run counter to those for sharing data.

….

Rather than assume that data sharing is almost always a “good thing” and that doing so will promote the progress of science, more critical questions should be asked: What are the data? What is the utility of sharing or releasing data, and to whom? Who invests the resources in releasing those data and in making them useful to others? When, how, why, and how often are those data reused? Who benefits from what kinds of data transfer, when, and how? What resources must potential re-users invest in discovering, interpreting, processing, and analyzing data to make them reusable? Which data are most important to release, when, by what criteria, to whom, and why? What investments must be made in knowledge infrastructures, including people, institutions, technologies, and repositories, to sustain access to data that are released? Who will make those investments, and for whose benefit?

Only when these questions are addressed by scientists, scholars, data professionals, librarians, archivists, funding agencies, repositories, publishers, policy makers, and other stakeholders in research will satisfactory answers arise to the problems of data sharing…(More)”.

Breaking Public Administrations’ Data Silos. The Case of Open-DAI, and a Comparison between Open Data Platforms.


Paper by Raimondo Iemma, Federico Morando, and Michele Osella: “Open reuse of public data and tools can turn the government into a powerful ‘platform’ that also involves external innovators. However, the typical information system of a public agency is not open by design. Several public administrations have started adopting technical solutions to overcome this issue, typically in the form of middleware layers operating as ‘buses’ between data centres and the outside world. Open-DAI is an open source platform designed to expose data as services, directly pulling from legacy databases of the data holder. The platform is the result of an ongoing project funded under the EU ICT PSP call 2011. We present the rationale and features of Open-DAI, also through a comparison with three other open data platforms: the Socrata Open Data portal, CKAN, and ENGAGE….(More)”

US government and private sector developing ‘precrime’ system to anticipate cyber-attacks


Martin Anderson at The Stack: “The USA’s Office of the Director of National Intelligence (ODNI) is soliciting the involvement of the private and academic sectors in developing a new ‘precrime’ computer system capable of predicting cyber-incursions before they happen, based on the processing of ‘massive data streams from diverse data sets’ – including social media and possibly deanonymised Bitcoin transactions….
At its core, the predictive technologies to be developed in association with the private sector and academia over 3-5 years are charged with the mission ‘to invest in high-risk/high-payoff research that has the potential to provide the U.S. with an overwhelming intelligence advantage over our future adversaries’.
The R&D program is intended to generate completely automated, human-free prediction systems for four categories of event: unauthorised access, Denial of Service (DoS), malicious code, and scans and probes seeking access to systems.
The CAUSE project is an unclassified program, and participating companies and organisations will not be granted access to NSA intercepts. The scope of the project, in any case, seems focused on the analysis of publicly available Big Data, including web searches, social media exchanges and trawling ungovernable avalanches of information in which clues to future maleficent actions are believed to be discernible.
Program manager Robert Rahmer says: “It is anticipated that teams will be multidisciplinary and might include computer scientists, data scientists, social and behavioral scientists, mathematicians, statisticians, content extraction experts, information theorists, and cyber-security subject matter experts having applied experience with cyber capabilities.”
Battelle, one of the organisations interested in participating in CAUSE, proposes employing Hadoop and Apache Spark as an approach to the data mountain, and includes in its preliminary proposal an intent to ‘de-anonymize Bitcoin sale/purchase activity to capture communication exchanges more accurately within threat-actor forums…’.
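Battelle’s proposal gives no technical detail beyond naming Hadoop and Spark, but one widely used first step in Bitcoin de-anonymization is the ‘common input ownership’ heuristic: addresses that co-sign inputs of the same transaction are assumed to belong to one owner. The sketch below is illustrative only, with toy transactions and plain Python standing in for a distributed pipeline.

```python
# Sketch of the common-input-ownership heuristic for clustering Bitcoin
# addresses. Transactions are toy data: each is just a list of input addresses.

class DisjointSet:
    """Union-find structure for merging address clusters."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def cluster_addresses(transactions):
    """Group input addresses that ever appear together in one transaction."""
    ds = DisjointSet()
    for inputs in transactions:
        for addr in inputs:
            ds.find(addr)                # register every input address
        for addr in inputs[1:]:
            ds.union(inputs[0], addr)    # co-spent inputs share an owner
    clusters = {}
    for addr in ds.parent:
        clusters.setdefault(ds.find(addr), set()).add(addr)
    return list(clusters.values())

txs = [["addr1", "addr2"], ["addr2", "addr3"], ["addr4"]]
print(cluster_addresses(txs))  # addr1-3 merge into one cluster; addr4 stands alone
```

Because the clusters chain transitively (addr2 links addr1 to addr3), one address linked to a forum identity can unmask an entire cluster, which is presumably what makes the technique attractive to CAUSE participants.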
Identifying and categorising quality signal in the ‘white noise’ of Big Data is a central plank in CAUSE, and IARPA maintains several offices to deal with different aspects of it. Its pointedly named ‘Office for Anticipating Surprise’ frames the CAUSE project best, since it initiated it. The OAS is occupied with ‘Detecting and forecasting the emergence of new technical capabilities’, ‘Early warning of social and economic crises, disease outbreaks, insider threats, and cyber attacks’ and ‘Probabilistic forecasts of major geopolitical trends and rare events’.
Another concerned department is the Office of Incisive Analysis, which is attempting to break down the ‘data static’ problem into manageable mission stages:
1) Large data volumes and varieties – “Providing powerful new sources of information from massive, noisy data that currently overwhelm analysts”
2) Social-Cultural and Linguistic Factors – “Analyzing language and speech to produce insights into groups and organizations.”
3) Improving Analytic Processes – “Dramatic enhancements to the analytic process at the individual and group level.”
The Office of Smart Collection develops ‘new sensor and transmission technologies’, with the pursuit of ‘Innovative approaches to gain access to denied environments’ as part of its core mission, while the Office of Safe and Secure Operations concerns itself with ‘Revolutionary advances in science and engineering to solve problems intractable with today’s computers’.
The CAUSE program, which attracted 150 developers, organisations, academics and private companies to the initial event, will announce specific figures about funding later in the year, and practice ‘predictions’ from participants will begin in the summer, in an accelerating and stage-managed program over five years….(More)”

The Power of Heuristics


ideas42: “People are presented with many choices throughout their day, from what to have for lunch to where to go on vacation to how much money to save for emergencies. In many situations, this ability to choose enhances our lives. However, having too many choices can sometimes feel like a burden, especially if the choices are complex or the decisions we’re making are important. In these instances, we often make poor decisions, or sometimes even fail to choose at all. This can create real problems, for example when people fail to save enough for retirement or don’t make the right choices when it comes to staying healthy.
So why is it that so much effort has been spent trying to improve decision-making by giving people even more information about the choices available – often complicating the choice even further?
In a new ideas42 paper, co-founder Antoinette Schoar of MIT’s Sloan School of Management and ideas42’s Saugato Datta argue that this approach of providing more information to help individuals make better decisions is flawed, “since it does not take into account the psychological or behavioral barriers that prevent people from making better decisions.” The solution, they propose, is using effective rules of thumb, or ‘heuristics’, to “enable people to make ‘reasonably good’ decisions without needing to understand all the complex nuances of the situation.” The paper explores the effectiveness of heuristics as a tool to simplify information during decision-making and help people follow through on their intentions. The authors offer powerful examples of effective heuristics-based methods in three domains: financial education, agriculture, and medicine….(More)”
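To make the shape of the idea concrete, here is a minimal sketch of two financial rules of thumb. These particular rules (save a fixed fraction of pay; keep a few months’ expenses in reserve) are common folk heuristics chosen for illustration, not heuristics taken from the ideas42 paper.

```python
# Illustrative rules of thumb: each takes one or two inputs and returns a
# "reasonably good" answer, with no model of the full decision problem.

def monthly_savings_heuristic(take_home_pay, rate=0.10):
    """Rule of thumb: save a fixed fraction of take-home pay each month."""
    return take_home_pay * rate

def emergency_fund_target(monthly_expenses, months=3):
    """Rule of thumb: hold a buffer of a few months' expenses."""
    return monthly_expenses * months

print(monthly_savings_heuristic(3000))  # 300.0
print(emergency_fund_target(2000))      # 6000
```

A full optimization would need income paths, interest rates, and risk preferences; the heuristic trades that precision for something a person can actually remember and act on, which is exactly the paper’s point.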

Netpolitik: What the Emergence of Networks Means for Diplomacy and Statecraft


Charlie Firestone and Leshuo Dong at the Aspen Journal of Ideas: “…The network is emerging as a dominant form of organization for our age of complexity. This is supported by technological and economic trends. Furthermore, enemies are networks, players are networks, even governments are becoming networks. It makes sense to understand network principles and apply them for use in the world of diplomacy. Accordingly, governments, organizations and individuals should heed these recommendations:

  • Understand and apply two-way communications and network principles to all forms of diplomacy with the aim of earning the sympathy, empathy and where applicable, the loyalty of future generations. This is a mindset shift for governments, diplomats and citizens around the world.
  • This means engaging the world’s populations to communicate with each other. That will entail physical connections to the global common medium, an ability to have what you send be received by others in the form you send it, end to end, and literacy in the communications methods of the day. The world’s population should have a meaningful right to connect.
  • Of course, if there is to be a global communications network, it needs to be safe, so governments remain in the role of protector of the environment needed for users to trust in their networks. States have a role to protect against cyberwar, cybercrimes, and loss of a person’s identity, i.e., security and privacy online. But these protections cannot be a screen for illegitimate governmental controls over or unwarranted surveillance of its citizens. Nor can governments be expected to shoulder that burden alone. Everyone will need to practice a basic level of Net hygiene and literacy as an element of their digital citizenship.

As networks proliferate, principles of netpolitik will emerge. Governments, businesses, non-governmental organizations, and every citizen would be well advised to be thinking in these terms in the years ahead….(More).”

Reclaiming Accountability: Transparency, Executive Power, and the U.S. Constitution


New book by Heidi Kitrosser: “Americans tend to believe in government that is transparent and accountable. Those who govern us work for us, and therefore they must also answer to us. But how do we reconcile calls for greater accountability with the competing need for secrecy, especially in matters of national security? Those two imperatives are usually taken to be antithetical, but Heidi Kitrosser argues convincingly that this is not the case—and that our concern ought to lie not with secrecy, but with the sort of unchecked secrecy that can result from “presidentialism,” or constitutional arguments for broad executive control of information.
In Reclaiming Accountability, Kitrosser traces presidentialism from its start as part of a decades-old legal movement through its appearance during the Bush and Obama administrations, demonstrating its effects on secrecy throughout. Taking readers through the key presidentialist arguments—including “supremacy” and “unitary executive theory”—she explains how these arguments misread the Constitution in a way that is profoundly at odds with democratic principles. Kitrosser’s own reading offers a powerful corrective, showing how the Constitution provides myriad tools, including the power of Congress and the courts to enforce checks on presidential power, through which we could reclaim government accountability….(More)”

Open data could turn Europe’s digital desert into a digital rainforest


Joanna Roberts interviews Dirk Helbing, Professor of Computational Social Science at ETH Zurich, at Horizon: “…‘If we want to be competitive, Europe needs to find its own way. How can we differentiate ourselves and make things better? I believe Europe should not engage in the locked data strategy that we see in all these huge IT giants. Instead, Europe should engage in open data, open innovation, and value-sensitive design, particularly approaches that support informational self-determination. So everyone can use this data, generate new kinds of data, and build applications on top. This is going to create ever more possibilities for everyone else, so in a sense that will turn a digital desert into a digital rainforest full of opportunities for everyone, with a rich information ecosystem.’…
The Internet of Things is the next big emerging information communication technology. It’s based on sensors. In smartphones there are about 15 sensors; for light, for noise, for location, for all sorts of things. You could also buy additional external sensors for humidity, for chemical substances and almost anything that comes to your mind. So basically this allows us to measure the environment and all the features of our physical, biological, economic, social and technological environment.
‘Imagine if there was one company in the world controlling all the sensors and collecting all the information. I think that might potentially be a dystopian surveillance nightmare, because you couldn’t take a single step or speak a single word without it being recorded. Therefore, if we want the Internet of Things to be consistent with a stable democracy then I believe we need to run it as a citizen web, which means to create and manage the planetary nervous system together. The citizens themselves would buy the sensors and activate them or not, would decide themselves what sensor data they would share with whom and for what purpose, so informational self-determination would be at the heart, and everyone would be in control of their own data.’….
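Helbing’s ‘citizen web’ is an architectural idea rather than a specification, but one hedged way to picture informational self-determination is a per-sensor sharing policy held by the data owner. Every name and data structure below is invented for illustration.

```python
# Hypothetical sketch: each citizen's policy says which sensor streams may be
# shared with which recipients; anything not explicitly allowed stays private.

def share_readings(readings, policy, recipient):
    """Return only the sensor readings the owner's policy allows this recipient to see."""
    allowed = {sensor for sensor, recipients in policy.items() if recipient in recipients}
    return {sensor: value for sensor, value in readings.items() if sensor in allowed}

readings = {"location": (47.4, 8.5), "noise_db": 42, "humidity": 0.55}
policy = {
    "noise_db": {"city_noise_map"},                  # noise goes to the city project
    "humidity": {"weather_coop", "city_noise_map"},  # humidity to two recipients
    # "location" appears in no policy entry, so it is never shared
}

print(share_readings(readings, policy, "city_noise_map"))  # noise_db and humidity only
print(share_readings(readings, policy, "ad_network"))      # {} (nothing authorized)
```

The design choice worth noting is the default: a sensor absent from the policy is withheld, which is what keeps control with the citizen rather than the platform.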
A lot of exciting things will become possible. We would have a real-time picture of the world and we could use this data to be more aware of what the implications of our decisions and actions are. We could avoid mistakes and discover opportunities we would otherwise have missed. We will also be able to measure what’s going on in our society and economy and why. In this way, we will eventually identify the hidden forces that determine the success or failure of a company, of our economy or even our society….(More)”

Action-Packed Signs Could Mean Fewer Pedestrian Accidents


Marielle Mondon at Next City: “Action-packed road signs could mean less unfortunate action for pedestrians. More than a year after New York and San Francisco implemented Vision Zero campaigns to increase pedestrian safety, new research shows that warning signs depicting greater movement — think running stick figures, not walking ones — cause fewer pedestrian accidents.
“A sign that evokes more perceived movement increases the observer’s perception of risk, which in turn brings about earlier attention and earlier stopping,” said Ryan Elder, co-author of the new Journal of Consumer Research report. “If you want to grab attention, you need signs that are more dynamic.”

The real U.S. pedestrian sign on the left represents what almost seems to be a casual stroll, while the example on the far right amps up the speed of the walkers.

The study argues that drivers react faster to signs showing greater movement because the threat of a last-minute accident seems more real — and often, a quicker reaction, even by a few seconds, can make a major difference….
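The excerpt doesn’t reproduce the report’s figures, so as a hedged back-of-the-envelope check (speeds and times chosen for illustration, not taken from the study): the distance a car covers during the ‘saved’ reaction time is simply speed multiplied by that time, which shows why even a fraction of a second matters.

```python
# Back-of-the-envelope: stopping margin gained by an earlier reaction.

MPH_TO_MPS = 0.44704  # metres per second per mile-per-hour

def distance_saved(speed_mps, reaction_gain_s):
    """Extra stopping margin, in metres, from reacting sooner."""
    return speed_mps * reaction_gain_s

speed = 30 * MPH_TO_MPS  # ~13.4 m/s at an urban 30 mph
print(round(distance_saved(speed, 0.5), 1))  # metres gained by reacting 0.5 s earlier
```

At 30 mph, a half-second head start buys roughly a car length and a half of extra stopping distance, easily the difference between a near-miss and a collision.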
Another important point in a world where pedestrians can play games with walk signals: Elder’s suggestions seem more noteworthy than whimsical — and not necessarily a contribution to urban cutesification that annoys some city-dwellers….(More)”