Data Mining Reveals How Social Coding Succeeds (And Fails)


Emerging Technology From the arXiv: “Collaborative software development can be hugely successful or fail spectacularly. An analysis of the metadata associated with these projects is teasing apart the difference….
The process of developing software has undergone a huge transformation in the last decade or so. One of the key changes has been the evolution of social coding websites, such as GitHub and BitBucket.
These allow anyone to start a collaborative software project that other developers can contribute to on a voluntary basis. Millions of people have used these sites to build software, sometimes with extraordinary success.
Of course, some projects are more successful than others. And that raises an interesting question: what are the differences between successful and unsuccessful projects on these sites?
Today, we get an answer from Yuya Yoshikawa at the Nara Institute of Science and Technology in Japan and a couple of pals at the NTT Laboratories, also in Japan. These guys have analysed the characteristics of over 300,000 collaborative software projects on GitHub to tease apart the factors that contribute to success. Their results provide the first insights into social coding success from this kind of data mining.
A social coding project begins when a group of developers outline a project and begin work on it. These are the “internal developers”, who have the power to update the software in a process known as a “commit”. The number of commits is a measure of the activity on the project.
External developers can follow the progress of the project by “starring” it, a form of bookmarking on GitHub. The number of stars is a measure of the project’s popularity. These external developers can also request changes, such as additional features and so on, in a process known as a pull request.
Yoshikawa and co begin by downloading the data associated with over 300,000 projects from the GitHub website. This includes the number of internal developers, the number of stars a project receives over time and the number of pull requests it gets.
The team then analyse the effectiveness of each project by calculating factors such as the number of commits per internal team member, the popularity of the project over time, the number of pull requests that are fulfilled and so on.
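To make measures like these concrete, here is a minimal sketch of how such per-project metrics could be computed with pandas. The table layout, field names and figures below are hypothetical illustrations, not the authors’ actual dataset or pipeline.

```python
# A minimal sketch (not the authors' pipeline) of deriving per-project
# metrics; the field names and numbers below are hypothetical.
import pandas as pd

# One row per project: internal member count, commit count,
# star count, and pull-request totals.
projects = pd.DataFrame([
    {"repo": "alice/widget", "members": 4, "commits": 523,
     "stars": 210, "pulls_opened": 40, "pulls_merged": 31},
    {"repo": "bob/gadget", "members": 1, "commits": 87,
     "stars": 12, "pulls_opened": 5, "pulls_merged": 2},
])

# Activity: commits per internal team member.
projects["commits_per_member"] = projects["commits"] / projects["members"]

# Sociality: fraction of pull requests that were fulfilled (merged).
projects["pull_fulfilment"] = (
    projects["pulls_merged"] / projects["pulls_opened"]
).fillna(0)

print(projects[["repo", "commits_per_member", "stars", "pull_fulfilment"]])
```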
The results provide a fascinating insight into the nature of social coding. Yoshikawa and co say the number of internal developers on a project plays a significant role in its success. “Projects with larger numbers of internal members have higher activity, popularity and sociality,” they say….
Ref: arxiv.org/abs/1408.6012: “Collaboration on Social Media: Analyzing Successful Projects on Social Coding”

Using Crowds for Evaluation Tasks: Validity by Numbers vs. Validity by Expertise


Paper by Christoph Hienerth and Frederik Riar: “Developing and commercializing novel ideas is central to innovation processes. As the outcome of such ideas cannot fully be foreseen, evaluating them is crucial. With the rise of the internet and ICT, more and new kinds of evaluations are done by crowds. This raises the question of whether individuals in crowds possess the necessary capabilities to evaluate and whether their outcomes are valid. As empirical insights are not yet available, this paper examines evaluation processes and general evaluation components, and discusses the underlying characteristics and mechanisms of these components that affect evaluation outcomes (i.e. evaluation validity). We further investigate differences between firm- and crowd-based evaluation using different cases of applications, and develop a theoretical framework for evaluation validity, i.e. validity by numbers vs. validity by expertise. The identified factors that influence the validity of evaluations are: (1) the number of evaluation tasks, (2) complexity, (3) expertise, (4) costs, and (5) time to outcome. For each of these factors, hypotheses are developed based on theoretical arguments. We conclude with implications, proposing a model of evaluation validity.”

A Few Useful Things to Know about Machine Learning


A new research paper by Pedro Domingos: “Machine learning algorithms can figure out how to perform important tasks by generalizing from examples. This is often feasible and cost-effective where manual programming is not. As more data becomes available, more ambitious problems can be tackled. As a result, machine learning is widely used in computer science and other fields. However, developing successful machine learning applications requires a substantial amount of “black art” that is hard to find in textbooks. This article summarizes twelve key lessons that machine learning researchers and practitioners have learned. These include pitfalls to avoid, important issues to focus on, and answers to common questions.”
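As a toy illustration of “generalizing from examples” (not drawn from the paper itself), here is a minimal scikit-learn sketch: a classifier is fitted to a handful of labelled examples and then applied to inputs it never saw during training. The spam-style features and labels are made up for illustration.

```python
# A toy illustration of learning by generalizing from examples.
from sklearn.tree import DecisionTreeClassifier

# Each example: [hour_of_day, message_length]; label: 1 = spam, 0 = not spam.
X_train = [[2, 800], [3, 950], [14, 120], [15, 90], [4, 700], [13, 150]]
y_train = [1, 1, 0, 0, 1, 0]

model = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)

# The learned rule is applied to inputs it has never seen.
print(model.predict([[5, 870], [16, 100]]))
```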
 

The wisest choices depend on instinct and careful analysis


John Kay in the Financial Times: “Moneyball, Michael Lewis’s 2003 book on the science of picking baseball teams, was perhaps written to distract its author from his usual work of attacking the financial services industry. Even after downloading the rules of baseball, I still could not fully understand what was going on. But I caught the drift: sabermetrics, the statistical analysis of the records of players, proved a better guide than the accumulated wisdom of experienced coaches.
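For readers as unfamiliar with baseball as Kay, a concrete example of the kind of statistic sabermetrics relies on is on-base percentage, one of the measures Moneyball made famous. The season line below is hypothetical; only the formula is standard.

```python
# On-base percentage (OBP), a standard sabermetric statistic, computed
# from a made-up season line.
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hit_by_pitch) / (
        at_bats + walks + hit_by_pitch + sacrifice_flies
    )

# Hypothetical season: 150 hits, 70 walks, 5 HBP, 500 at-bats, 5 sac flies.
print(round(on_base_percentage(150, 70, 5, 500, 5), 3))  # 0.388
```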

Another lesson, important for business strategy, was the brevity of the benefits gained by the Oakland A’s, Lewis’s sporting heroes. If the only source of competitive advantage is better quantitative analysis – whether in baseball or quant strategies in the financial sector – such an advantage can be rapidly and accurately imitated.

At the same time, another genre of books proclaims the virtues of instinctive decision-making. Malcolm Gladwell’s Blink (2005) begins with accounts of how experts could identify the Getty kouros – a statue of a naked youth purported to be of ancient Greek provenance and purchased in 1985 for $9m – as fake immediately, even though it had supposedly been authenticated through extended scientific tests.

Gary Klein, a cognitive psychologist, has for many years monitored the capabilities of experienced practical decision makers – firefighters, nurses and military personnel – who make immediate judgments that are vindicated by the more elaborate assessments possible only with hindsight.
Of course, there is no real inconsistency between the two propositions. The experienced coaches disparaged by sabermetrics enthusiasts were right to believe they knew a lot about spotting baseball talent; they just did not know as much as they thought they did. The art experts and firefighters who made instantaneous, but accurate, judgments were not hearing voices in the air. But no expert can compete with chemical analysis and carbon dating in assessing the age of a work of art.
There are two ways of reconciling expertise with analysis. One takes the worst of both worlds, combining the overconfidence of experience with the naive ignorance of the quant. The resulting bogus rationality seeks to objectivise expertise by fitting it into a template.
It is exemplified in the processes by which interviewers for jobs, and managers who make personnel assessments, are required to complete checklists explaining how they reached their conclusion using prescribed criteria….”

Values at Play in Digital Games


New book by Mary Flanagan and Helen Nissenbaum: “All games express and embody human values, providing a compelling arena in which we play out beliefs and ideas. “Big ideas” such as justice, equity, honesty, and cooperation—as well as other kinds of ideas, including violence, exploitation, and greed—may emerge in games whether designers intend them or not. In this book, Mary Flanagan and Helen Nissenbaum present Values at Play, a theoretical and practical framework for identifying socially recognized moral and political values in digital games. Values at Play can also serve as a guide to designers who seek to implement values in the conception and design of their games.
After developing a theoretical foundation for their proposal, Flanagan and Nissenbaum provide detailed examinations of selected games, demonstrating the many ways in which values are embedded in them. They introduce the Values at Play heuristic, a systematic approach for incorporating values into the game design process. Interspersed among the book’s chapters are texts by designers who have put Values at Play into practice by accepting values as a design constraint like any other, offering a real-world perspective on the design challenges involved.”

Riding the Second Wave of Civic Innovation


Jeremy Goldberg at Governing: “Innovation and entrepreneurship in local government increasingly require mobilizing talent from many sectors and skill sets. Fortunately, the opportunities for nurturing cross-pollination between the public and private sectors have never been greater, thanks in large part to the growing role of organizations such as Bayes Impact, Code for America, Data Science for Social Good and Fuse Corps.
Indeed, there’s reason to believe that we might be entering an even more exciting period of public-private collaboration. As one local-government leader recently put it to me when talking about the critical mass of pro-bono civic-innovation efforts taking place across the San Francisco Bay area, “We’re now riding the second wave of civic pro-bono and civic innovation.”
As an alumnus of Fuse Corps’ executive fellows program, I’m convinced that the opportunities initiated by it and similar organizations are integral to civic innovation. Fuse Corps brings civic entrepreneurs with experience across the public, private and nonprofit sectors to work closely with government employees to help them negotiate project design, facilitation and management hurdles. The organization’s leadership training emphasizes “smallifying” — building innovation capacity by breaking big challenges down into smaller tasks in a shorter timeframe — and making “little bets” — low-risk actions aimed at developing and testing an idea.
Since 2012, I have managed programs and cross-sector networks for the Silicon Valley Talent Partnership. I’ve witnessed a groundswell of civic entrepreneurs from across the region stepping up to participate in discussions and launch rapid-prototyping labs focused on civic innovation.
Cities across the nation are creating new roles and programs to engage these civic start-ups. They’re learning that what makes these projects, and specifically civic pro-bono programs, work best is a process of designing, building, operationalizing and bringing them to scale. If you’re setting out to create such a program, here’s a short list of best practices:
Assets: Explore existing internal resources and knowledge to understand the history, departmental relationships and overall functions of the relevant agencies or departments. Develop a compendium of current service/volunteer programs.
City policies/legal framework: Determine what the city charter, city attorney’s office or employee-relations rules and policies say about procurement, collective bargaining and public-private partnerships.
Leadership: The support of the city’s top leadership is especially important during the formative stages of a civic-innovation program, so it’s important to understand how the city’s form of government will impact the program. For example, in a “strong mayor” government, a definitive decision on a public-private collaboration may not face the same scrutiny that it would under a “council/mayor” government.
Cross-departmental collaboration: This is essential. Without the support of city staff across departments, innovation projects are unlikely to take off. Convening a “tiger team” of individuals who are early adopters of such initiatives is an important step. Ultimately, city staffers best understand the needs and demands of their departments or agencies.
Partners from corporations and philanthropy: Leveraging existing partnerships will help to bring together an advisory group of cross-sector leaders and executives to participate in the early stages of program development.
Business and member associations: For the Silicon Valley Talent Partnership, the Silicon Valley Leadership Group has been instrumental in advocating for pro-bono volunteerism with the cities of Fremont, San Jose and Santa Clara….”

Enchanted Objects


Book by David Rose: “Some believe the future will look like more of the same—more smartphones, tablets, screens embedded in every conceivable surface. David Rose has a different vision: technology that atomizes, combining itself with the objects that make up the very fabric of daily living. Such technology will be woven into the background of our environment, enhancing human relationships and channeling desires for omniscience, long life, and creative expression. The enchanted objects of fairy tales and science fiction will enter real life.
Groundbreaking, timely, and provocative, Enchanted Objects is a blueprint for a better future, where efficient solutions come hand in hand with technology that delights our senses. It is essential reading for designers, technologists, entrepreneurs, business leaders, and anyone who wishes to understand the future and stay relevant in the Internet of Things.”

For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights


In the New York Times: “Technology revolutions come in measured, sometimes foot-dragging steps. The lab science and marketing enthusiasm tend to underestimate the bottlenecks to progress that must be overcome with hard work and practical engineering.

The field known as “big data” offers a contemporary case study. The catchphrase stands for the modern abundance of digital data from many sources — the web, sensors, smartphones and corporate databases — that can be mined with clever software for discoveries and insights. Its promise is smarter, data-driven decision-making in every field. That is why data scientist is the economy’s hot new job.

Yet far too much handcrafted work — what data scientists call “data wrangling,” “data munging” and “data janitor work” — is still required. Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets….”
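A minimal sketch of what that “janitor work” typically looks like in practice, using pandas on a made-up table; the column names and values are hypothetical.

```python
# A small, hypothetical example of data "janitor work":
# raw records rarely arrive analysis-ready.
import pandas as pd

raw = pd.DataFrame({
    "city": [" New York", "new york", "NYC", None],
    "sales": ["1,200", "950", "n/a", "1100"],
})

cleaned = raw.copy()
# Normalize inconsistent spellings of the same entity.
cleaned["city"] = (
    cleaned["city"].str.strip().str.lower().replace({"nyc": "new york"})
)
# Coerce numeric strings (and junk like "n/a") into numbers.
cleaned["sales"] = pd.to_numeric(
    cleaned["sales"].str.replace(",", "", regex=False), errors="coerce"
)
# Drop rows that are still unusable after cleaning.
cleaned = cleaned.dropna()

print(cleaned)
```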

Technology’s Crucial Role in the Fight Against Hunger


Crowdsourcing, predictive analytics and other new tools could go far toward finding innovative solutions for America’s food insecurity.

National Geographic recently sent three photographers to explore hunger in the United States. It was an effort to give a face to a very troubling statistic: Even today, one-sixth of Americans do not have enough food to eat. Fifty million people in this country are “food insecure” — having to make daily trade-offs among paying for food, housing or medical care — and 17 million of them skip at least one meal a day to get by. When choosing what to eat, many of these individuals must make choices between lesser quantities of higher-quality food and larger quantities of less-nutritious processed foods, the consumption of which often leads to expensive health problems down the road.
This is an extremely serious, but not easily visible, social problem. Nor does the challenge it poses become any easier when poorly designed public-assistance programs continue to count the sauce on a pizza as a vegetable. The deficiencies caused by hunger increase the likelihood that a child will drop out of school, lowering her lifetime earning potential. In 2010 alone, food insecurity cost America $167.5 billion, a figure that includes lost economic productivity, avoidable health-care expenses and social-services programs.
As much as we need specific policy innovations if we are to eliminate hunger in America, food insecurity is just one of many extraordinarily complex and interdependent “systemic” problems facing us that would benefit from the application of technology, not just to identify innovative solutions but to implement them as well. In addition to laudable policy initiatives by such states as Illinois and Nevada, which have made hunger a priority, or Arkansas, which suffers the greatest level of food insecurity but which is making great strides at providing breakfast to schoolchildren, we can — we must — bring technology to bear to create a sustained conversation between government and citizens to engage more Americans in the fight against hunger.

Identifying who is genuinely in need cannot be done as well by a centralized government bureaucracy — even one with regional offices — as it can through a distributed network of individuals and organizations able to pinpoint with on-the-ground accuracy where the demand is greatest. Just as Ushahidi uses crowdsourcing to help locate and identify disaster victims, it should be possible to leverage the crowd to spot victims of hunger. As it stands, attempts to eradicate so-called food deserts are often built around developing solutions for residents rather than with residents. Strategies to date tend to focus on the introduction of new grocery stores or farmers’ markets but with little input from or involvement of the citizens actually affected.

Applying predictive analytics to newly available sources of public as well as private data, such as that regularly gathered by supermarkets and other vendors, could also make it easier to offer coupons and discounts to those most in need. In addition, analyzing nonprofits’ tax returns, which are legally open and available to all, could help map where the organizations serving those in need leave gaps that need to be closed by other efforts. The Governance Lab recently brought together U.S. Department of Agriculture officials with companies that use USDA data in an effort to focus on strategies supporting a White House initiative to use climate-change and other open data to improve food production.
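A hedged sketch of the kind of predictive analytics the passage imagines: ranking areas by food-insecurity risk so outreach can be targeted. The features, figures and model choice here are illustrative assumptions, not drawn from actual USDA, supermarket or nonprofit data.

```python
# Illustrative only: score census tracts by food-insecurity risk.
# All features and figures below are hypothetical.
from sklearn.linear_model import LogisticRegression

# Features per tract: [median_income_k, miles_to_grocery, pct_snap_enrolled]
X_train = [[28, 4.5, 0.32], [75, 0.8, 0.04], [31, 3.9, 0.28],
           [62, 1.2, 0.07], [24, 5.1, 0.35], [80, 0.5, 0.03]]
y_train = [1, 0, 1, 0, 1, 0]  # 1 = historically high food insecurity

model = LogisticRegression().fit(X_train, y_train)

# Rank new tracts by predicted risk so outreach (coupons, discounts,
# mobile markets) can go where the need is likely greatest.
new_tracts = [[45, 2.5, 0.15], [27, 4.0, 0.30]]
for tract, risk in zip(new_tracts, model.predict_proba(new_tracts)[:, 1]):
    print(tract, round(risk, 2))
```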

Such innovative uses of technology, which put citizens at the center of the service-delivery process and streamline the delivery of government support, could also speed the delivery of benefits, thus reducing both costs and, every bit as important, the indignity of applying for assistance.

Being open to new and creative ideas from outside government through brainstorming and crowdsourcing exercises using social media can go beyond simply improving the quality of the services delivered. Some of these ideas, such as those arising from exciting new social-science experiments involving the use of incentives for “nudging” people to change their behaviors, might even lead them to purchase more healthful food.

Further, new kinds of public-private collaborative partnerships could create the means for people to produce their own food. Both new kinds of financing arrangements and new apps for managing the shared use of common real estate could make more community gardens possible. Similarly, with the kind of attention, convening and funding that government can bring to an issue, new neighbor-helping-neighbor programs — where, for example, people take turns shopping and cooking for one another to alleviate time away from work — could be scaled up.

Then, too, advances in citizen engagement and oversight could make it more difficult for lawmakers to cave to the pressures of lobbying groups that push for subsidies for those crops, such as white potatoes and corn, that result in our current large-scale reliance on less-nutritious foods. At the same time, citizen scientists reporting data through an app would be able to do a much better job than government inspectors in reporting what is and is not working in local communities.

As a society, we may not yet be able to banish hunger entirely. But if we commit to using new technologies and mechanisms of citizen engagement widely and wisely, we could vastly reduce its power to do harm.