Data scientists rejoice! There’s an online marketplace selling algorithms from academics


SiliconRepublic: “Algorithmia, an online marketplace that connects computer science researchers’ algorithms with developers who may have uses for them, has exited its private beta.

Algorithms are essential to our online experience. Google uses them to determine which search results are the most relevant. Facebook uses them to decide what should appear in your news feed. Netflix uses them to make movie recommendations.

Founded in 2013, Algorithmia could be described as an app store for algorithms, with over 800 of them available in its library. These algorithms provide the means of completing various tasks in the fields of machine learning, audio and visual processing, and computer vision.

Algorithmia found a way to monetise algorithms by creating a platform where academics can share their creations and charge a royalty fee per use, while developers and data scientists can request specific algorithms in return for a monetary reward. One such suggestion is for ‘punctuation prediction’, which would insert correct punctuation and capitalisation in speech-to-text translation.

While it’s not the first algorithm marketplace online, Algorithmia will accept and sell any type of algorithm and host them on its servers. What this means is that developers need only add a simple piece of code to their software in order to send a query to Algorithmia’s servers, so the algorithm itself doesn’t have to be integrated in its entirety….

Computer science researchers can spend years developing algorithms, only for them to be published in a scientific journal never to be read by software engineers.

Algorithmia intends to create a community space where academics and engineers can meet to discuss and refine these algorithms for practical use. A voting and commenting system on the site will allow users to engage and even share insights on how contributions can be improved.

To that end, Algorithmia’s ultimate goal is to advance the development of algorithms as well as their discovery and use….(More)”

Encyclopedia of Social Network Analysis and Mining


“The Encyclopedia of Social Network Analysis and Mining (ESNAM) is the first major reference work to integrate fundamental concepts and research directions in the areas of social networks and  applications to data mining. While ESNAM  reflects the state-of-the-art in  social network research, the field  had its start in the 1930s when fundamental issues in social network research were broadly defined. These communities were limited to relatively small numbers of nodes (actors) and links. More recently the advent of electronic communication, and in particular on-line communities, have created social networks of hitherto unimaginable sizes. People around the world are directly or indirectly connected by popular social networks established using web-based platforms rather than by physical proximity.

Reflecting the interdisciplinary nature of this unique field, the essential contributions of diverse disciplines, from computer science, mathematics, and statistics to sociology and behavioral science, are described among the 300 authoritative yet highly readable entries. Students will find a world of information and insight behind the familiar façade of the social networks in which they participate. Researchers and practitioners will benefit from a comprehensive perspective on the methodologies for analysis of constructed networks, and the data mining and machine learning techniques that have proved attractive for sophisticated knowledge discovery in complex applications. Also addressed is the application of social network methodologies to other domains, such as web networks and biological networks….(More)”

Making emotive games from open data


Katie Collins at WIRED: “Microsoft researcher Kati London’s aim is “to try to get people to think of data in terms of personalities, relationships and emotions”, she tells the audience at the Story Festival in London. Through Project Sentient Data, she uses her background in games development to create fun but meaningful experiences that bridge online interactions and things that are happening in the real world.
One such experience invited children to play against the real-time flow of London traffic through an online game called the Code of Everand. The aim was to test the road safety knowledge of 9-11 year olds and “make alertness something that kids valued”.
The core mechanic of the game was that of a normal world populated by little people, containing spirit channels that only kids could see and go through. Within these spirit channels, everything from lorries and cars from the streets became monsters. The children had to assess what kind of dangers the monsters posed and use their tools to dispel them.
“Games are great ways to blur and observe the ways people interact with real-world data,” says London.
In one of her earlier projects back in 2005, London used her knowledge of horticulture to bring artificial intelligence to plants. “Almost every workspace I go into has a half dead plant in it, so we gave plants the ability to tell us what they need.” It was, she says, an exercise in “humanising data” that led to further projects that saw her create self aware street signs and a dynamic city map that expressed shame neighbourhood by neighbourhood depending on the open dataset of public complaints in New York.
A further project turned complaint data into cartoons on Instagram every week. London praised the open data initiative in New York, but added that for people to access it, they had to know it existed and know where to find it. The cartoons were a “lightweight” form of “civic engagement” that helped to integrate hyperlocal issues into everyday conversation.
London also gamified community engagement through a project commissioned by the Knight Foundation called Macon Money….(More)”.

Cultures of Code


Brian Hayes in the American Scientist: “Kim studies parallel algorithms, designed for computers with thousands of processors. Chris builds computer simulations of fluids in motion, such as ocean currents. Dana creates software for visualizing geographic data. These three people have much in common. Computing is an essential part of their professional lives; they all spend time writing, testing, and debugging computer programs. They probably rely on many of the same tools, such as software for editing program text. If you were to look over their shoulders as they worked on their code, you might not be able to tell who was who.
Despite the similarities, however, Kim, Chris, and Dana were trained in different disciplines, and they belong to  different intellectual traditions and communities. Kim, the parallel algorithms specialist, is a professor in a university department of computer science. Chris, the fluids modeler, also lives in the academic world, but she is a physicist by training; sometimes she describes herself as a computational scientist (which is not the same thing as a computer scientist). Dana has been programming since junior high school but didn’t study computing in college; at the startup company where he works, his title is software developer.
These factional divisions run deeper than mere specializations. Kim, Chris, and Dana belong to different professional societies, go to different conferences, read different publications; their paths seldom cross. They represent different cultures. The resulting Balkanization of computing seems unwise and unhealthy, a recipe for reinventing wheels and making the same mistake three times over. Calls for unification go back at least 45 years, but the estrangement continues. As a student and admirer of all three fields, I find the standoff deeply frustrating.
Certain areas of computation are going through a period of extraordinary vigor and innovation. Machine learning, data analysis, and programming for the web have all made huge strides. Problems that stumped earlier generations, such as image recognition, finally seem to be yielding to new efforts. The successes have drawn more young people into the field; suddenly, everyone is “learning to code.” I am cheered by (and I cheer for) all these events, but I also want to whisper a question: Will the wave of excitement ever reach other corners of the computing universe?…
What’s the difference between computer science, computational science, and software development?…(More)”

Big Data Now


at Radar – O’Reilly: “In the four years we’ve been producing Big Data Now, our wrap-up of important developments in the big data field, we’ve seen tools and applications mature, multiply, and coalesce into new categories. This year’s free wrap-up of Radar coverage is organized around seven themes:

  • Cognitive augmentation: As data processing and data analytics become more accessible, jobs that can be automated will go away. But to be clear, there are still many tasks where the combination of humans and machines produce superior results.
  • Intelligence matters: Artificial intelligence is now playing a bigger and bigger role in everyone’s lives, from sorting our email to rerouting our morning commutes, from detecting fraud in financial markets to predicting dangerous chemical spills. The computing power and algorithmic building blocks to put AI to work have never been more accessible.
  • The convergence of cheap sensors, fast networks, and distributed computation: The amount of quantified data available is increasing exponentially — and aside from tools for centrally handling huge volumes of time-series data as it arrives, devices and software are getting smarter about placing their own data accurately in context, extrapolating without needing to ‘check in’ constantly.
  • Reproducing, managing, and maintaining data pipelines: The coordination of processes and personnel within organizations to gather, store, analyze, and make use of data.
  • The evolving, maturing marketplace of big data components: Open-source components like Spark, Kafka, Cassandra, and ElasticSearch are reducing the need for companies to build in-house proprietary systems. On the other hand, vendors are developing industry-specific suites and applications optimized for the unique needs and data sources in a field.
  • The value of applying techniques from design and social science: While data science knows human behavior in the aggregate, design works in the particular, where A/B testing won’t apply — you only get one shot to communicate your proposal to a CEO, for example. Similarly, social science enables extrapolation from sparse data. Both sets of tools enable you to ask the right questions, and scope your problems and solutions realistically.
  • The importance of building a data culture: An organization that is comfortable with gathering data, curious about its significance, and willing to act on its results will perform demonstrably better than one that doesn’t. These priorities must be shared throughout the business.
  • The perils of big data: From poor analysis (driven by false correlation or lack of domain expertise) to intrusiveness (privacy invasion, price profiling, self-fulfilling predictions), big data has negative potential.

Download our free snapshot of big data in 2014, and follow the story this year on Radar.”

Computer-based personality judgments are more accurate than those made by humans


Paper by Wu Youyou, Michal Kosinski and David Stillwell at PNAS (Proceedings of the National Academy of Sciences): “Judging others’ personalities is an essential skill in successful social living, as personality is a key driver behind people’s interactions, behaviors, and emotions. Although accurate personality judgments stem from social-cognitive skills, developments in machine learning show that computer models can also make valid judgments. This study compares the accuracy of human and computer-based personality judgments, using a sample of 86,220 volunteers who completed a 100-item personality questionnaire. We show that (i) computer predictions based on a generic digital footprint (Facebook Likes) are more accurate (r = 0.56) than those made by the participants’ Facebook friends using a personality questionnaire (r = 0.49); (ii) computer models show higher interjudge agreement; and (iii) computer personality judgments have higher external validity when predicting life outcomes such as substance use, political attitudes, and physical health; for some outcomes, they even outperform the self-rated personality scores. Computers outpacing humans in personality judgment presents significant opportunities and challenges in the areas of psychological assessment, marketing, and privacy…(More)”.

Businesses dig for treasure in open data


Lindsay Clark in ComputerWeekly: “Open data, a movement which promises access to vast swaths of information held by public bodies, has started getting its hands dirty, or rather its feet.
Before a spade goes in the ground, construction and civil engineering projects face a great unknown: what is down there? In the UK, should someone discover anything of archaeological importance, a project can be halted – sometimes for months – while researchers study the site and remove artefacts….
During an open innovation day hosted by the Science and Technologies Facilities Council (STFC), open data services and technology firm Democrata proposed analytics could predict the likelihood of unearthing an archaeological find in any given location. This would help developers understand the likely risks to construction and would assist archaeologists in targeting digs more accurately. The idea was inspired by a presentation from the Archaeological Data Service in the UK at the event in June 2014.
The proposal won support from the STFC which, together with IBM, provided a nine-strong development team and access to the Hartree Centre’s supercomputer – a 131,000 core high-performance facility. For natural language processing of historic documents, the system uses two components of IBM’s Watson – the AI service which famously won the US TV quiz show Jeopardy. The system uses SPSS modelling software, the language R for algorithm development and Hadoop data repositories….
The proof of concept draws together data from the University of York’s archaeological data, the Department of the Environment, English Heritage, Scottish Natural Heritage, Ordnance Survey, Forestry Commission, Office for National Statistics, the Land Registry and others….The system analyses sets of indicators of archaeology, including historic population dispersal trends, specific geology, flora and fauna considerations, as well as proximity to a water source, a trail or road, standing stones and other archaeological sites. Earlier studies created a list of 45 indicators which was whittled down to seven for the proof of concept. The team used logistic regression to assess the relationship between input variables and come up with its prediction….”

Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency


at Medium: “…So why, then, does granular, social data make people uncomfortable? Well, ultimately—and at the risk of stating the obvious—it’s because data of this sort brings up issues regarding ethics, privacy, bias, fairness, and inclusion. In turn, these issues make people uncomfortable because, at least as the popular narrative goes, these are new issues that fall outside the expertise of those those aggregating and analyzing big data. But the thing is, these issues aren’t actually new. Sure, they may be new to computer scientists and software engineers, but they’re not new to social scientists.

This is why I think the world of big data and those working in it — ranging from the machine learning researchers developing new analysis tools all the way up to the end-users and decision-makers in government and industry — can learn something from computational social science….

So, if technology companies and government organizations — the biggest players in the big data game — are going to take issues like bias, fairness, and inclusion seriously, they need to hire social scientists — the people with the best training in thinking about important societal issues. Moreover, it’s important that this hiring is done not just in a token, “hire one social scientist for every hundred computer scientists” kind of way, but in a serious, “creating interdisciplinary teams” kind of kind of way.


Thanks to Moritz Hardt for the picture!

While preparing for my talk, I read an article by Moritz Hardt, entitled “How Big Data is Unfair.” In this article, Moritz notes that even in supposedly large data sets, there is always proportionally less data available about minorities. Moreover, statistical patterns that hold for the majority may be invalid for a given minority group. He gives, as an example, the task of classifying user names as “real” or “fake.” In one culture — comprising the majority of the training data — real names might be short and common, while in another they might be long and unique. As a result, the classic machine learning objective of “good performance on average,” may actually be detrimental to those in the minority group….

As an alternative, I would advocate prioritizing vital social questions over data availability — an approach more common in the social sciences. Moreover, if we’re prioritizing social questions, perhaps we should take this as an opportunity to prioritize those questions explicitly related to minorities and bias, fairness, and inclusion. Of course, putting questions first — especially questions about minorities, for whom there may not be much available data — means that we’ll need to go beyond standard convenience data sets and general-purpose “hammer” methods. Instead we’ll need to think hard about how best to instrument data aggregation and curation mechanisms that, when combined with precise, targeted models and tools, are capable of elucidating fine-grained, hard-to-see patterns….(More).”

Big video data could change how we do everything — from catching bad guys to tracking shoppers


Sean Varah at VentureBeat: “Everyone takes pictures and video with their devices. Parents record their kids’ soccer games, companies record employee training, police surveillance cameras at busy intersections run 24/7, and drones monitor pipelines in the desert.
With vast amounts of video growing vaster at a rate faster than the day before, and the hottest devices like drones decreasing in price and size until everyone has one (OK, not in their pocket quite yet) it’s time to start talking about mining this mass of valuable video data for useful purposes.
Julian Mann, the cofounder of Skybox Imaging — a company in the business of commercial satellite imagery and the developer advocate for Google Earth outreach — says that the new “Skybox for Good” program will provide “a constantly updated model of change of the entire planet” with the potential to “save lives, protect the environment, promote education, and positively impact humanity.”…
Mining video data through “man + machine” artificial intelligence is new technology in search of unsolved problems. Could this be the next chapter in the ever-evolving technology revolution?
For the past 50 years, satellite imagery has only been available to the U.S. intelligence community and those countries with technology to launch their own. Digital Globe was one of the first companies to make satellite imagery available commercially, and now Skybox and a few others have joined them. Drones are even newer, having been used by the U.S. military since the ‘90s for surveillance over battlefields or, in this age of counter-terrorism, playing the role of aerial detectives finding bad guys in the middle of nowhere. Before drones, the same tasks required thousands of troops on the ground, putting many young men and women in harm’s way. Today, hundreds of trained “eyes” safely located here in the U.S. watch hours of video from a single drone to assess current situations in countries far away….”

Smarter Than Us: The Rise of Machine Intelligence


 

Book by Stuart Armstrong at the Machine Intelligence Research Institute: “What happens when machines become smarter than humans? Forget lumbering Terminators. The power of an artificial intelligence (AI) comes from its intelligence, not physical strength and laser guns. Humans steer the future not because we’re the strongest or the fastest but because we’re the smartest. When machines become smarter than humans, we’ll be handing them the steering wheel. What promises—and perils—will these powerful machines present? Stuart Armstrong’s new book navigates these questions with clarity and wit.
Can we instruct AIs to steer the future as we desire? What goals should we program into them? It turns out this question is difficult to answer! Philosophers have tried for thousands of years to define an ideal world, but there remains no consensus. The prospect of goal-driven, smarter-than-human AI gives moral philosophy a new urgency. The future could be filled with joy, art, compassion, and beings living worthwhile and wonderful lives—but only if we’re able to precisely define what a “good” world is, and skilled enough to describe it perfectly to a computer program.
AIs, like computers, will do what we say—which is not necessarily what we mean. Such precision requires encoding the entire system of human values for an AI: explaining them to a mind that is alien to us, defining every ambiguous term, clarifying every edge case. Moreover, our values are fragile: in some cases, if we mis-define a single piece of the puzzle—say, consciousness—we end up with roughly 0% of the value we intended to reap, instead of 99% of the value.
Though an understanding of the problem is only beginning to spread, researchers from fields ranging from philosophy to computer science to economics are working together to conceive and test solutions. Are we up to the challenge?
A mathematician by training, Armstrong is a Research Fellow at the Future of Humanity Institute (FHI) at Oxford University. His research focuses on formal decision theory, the risks and possibilities of AI, the long term potential for intelligent life (and the difficulties of predicting this), and anthropic (self-locating) probability. Armstrong wrote Smarter Than Us at the request of the Machine Intelligence Research Institute, a non-profit organization studying the theoretical underpinnings of artificial superintelligence.”