The Tricky Ethics of Using YouTube Videos for Academic Research


Jane C. Hu in P/S Magazine: “…But just because something is legal doesn’t mean it’s ethical. That doesn’t mean it’s necessarily unethical, either, but it’s worth asking questions about how and why researchers use social media posts, and whether those uses could be harmful. I was once a researcher who had to obtain human-subjects approval from a university institutional review board, and I know it can be a painstaking application process with long wait times. Collecting data from individuals takes a long time too. If you could just sub in YouTube videos in place of collecting your own data, that saves time, money, and effort. But that could be at the expense of the people whose data you’re scraping.

But, you might say, if people don’t want to be studied online, then they shouldn’t post anything. But most people don’t fully understand what “publicly available” really means or its ramifications. “You might know intellectually that technically anyone can see a tweet, but you still conceptualize your audience as being your 200 Twitter followers,” Fiesler says. In her research, she’s found that the majority of people she’s polled have no clue that researchers study public tweets.

Some may disagree that it’s researchers’ responsibility to work around social media users’ ignorance, but Fiesler and others are calling for their colleagues to be more mindful about any work that uses publicly available data. For instance, Ashley Patterson, an assistant professor of language and literacy at Penn State University, ultimately decided to use YouTube videos in her dissertation work on biracial individuals’ educational experiences. That’s a decision she arrived at after carefully considering her options each step of the way. “I had to set my own levels of ethical standards and hold myself to it, because I knew no one else would,” she says. One of Patterson’s first steps was to ask herself what YouTube videos would add to her work, and whether there were any other ways to collect her data. “It’s not a matter of whether it makes my life easier, or whether it’s ‘just data out there’ that would otherwise go to waste. The nature of my question and the response I was looking for made this an appropriate piece [of my work],” she says.

Researchers may also want to consider qualitative, hard-to-quantify contextual cues when weighing ethical decisions. What kind of data is being used? Fiesler points out that tweets about, say, a television show are way less personal than ones about a sensitive medical condition. Anonymized written materials, like Facebook posts, could be less invasive than using someone’s face and voice from a YouTube video. And the potential consequences of the research project are worth considering too. For instance, Fiesler and other critics have pointed out that researchers who used YouTube videos of people documenting their experience undergoing hormone replacement therapy to train an artificial intelligence to identify trans people could be putting their unwitting participants in danger. It’s not obvious how the results of Speech2Face will be used, and, when asked for comment, the paper’s researchers said they’d prefer to quote from their paper, which pointed to a helpful purpose: providing a “representative face” based on the speaker’s voice on a phone call. But one can also imagine dangerous applications, like doxing anonymous YouTubers.

One way to get ahead of this, perhaps, is to take steps to explicitly inform participants their data is being used. Fiesler says that, when her team asked people how they’d feel after learning their tweets had been used for research, “not everyone was necessarily super upset, but most people were surprised.” They also seemed curious; 85 percent of participants said that, if their tweet were included in research, they’d want to read the resulting paper. “In human-subjects research, the ethical standard is informed consent, but inform and consent can be pulled apart; you could potentially inform people without getting their consent,” Fiesler suggests….(More)”.

Inaccurate Statistical Discrimination


NBER paper by J. Aislinn Bohren, Kareem Haggag, Alex Imas, Devin G. Pope: “Discrimination has been widely studied in economics and other disciplines. In addition to identifying evidence of discrimination, economists often categorize the source of discrimination as either taste-based or statistical. Categorizing discrimination in this way can be valuable for policy design and welfare analysis. We argue that a further categorization is important and needed. Specifically, in many situations economic agents may have inaccurate beliefs about the expected productivity or performance of a social group. This motivates our proposed distinction between accurate (based on correct beliefs) and inaccurate (based on incorrect beliefs) statistical discrimination. We do a thorough review of the discrimination literature and argue that this distinction is rarely discussed. Using an online experiment, we illustrate how to identify accurate versus inaccurate statistical discrimination. We show that ignoring this distinction – as is often the case in the discrimination literature – can lead to erroneous interpretations of the motives and implications of discriminatory behavior. In particular, when not explicitly accounted for, inaccurate statistical discrimination can be mistaken for taste-based discrimination, accurate statistical discrimination, or a combination of the two….(More)”.
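The paper’s central distinction can be sketched with toy numbers (all hypothetical, not drawn from the paper’s experiment): two groups with identical true average productivity, and an evaluator who pays each worker the believed mean of their group. When beliefs are wrong, the resulting gap can be misread as taste-based discrimination.

```python
# Toy illustration (hypothetical numbers) of accurate vs. inaccurate
# statistical discrimination. Two groups with identical true average
# productivity; the evaluator offers each worker a wage equal to the
# believed mean of that worker's group.
true_means = {"A": 10.0, "B": 10.0}           # equal true productivity
accurate_beliefs = {"A": 10.0, "B": 10.0}     # beliefs match reality
inaccurate_beliefs = {"A": 10.0, "B": 8.0}    # group B underestimated

def wage_gap(beliefs):
    # Wage offered = believed group mean, so the A-minus-B wage gap
    # equals the gap in beliefs.
    return beliefs["A"] - beliefs["B"]

def inferred_taste_penalty(observed_gap, assumed_beliefs):
    # An analyst who assumes beliefs are accurate attributes any
    # residual gap to "taste".
    return observed_gap - (assumed_beliefs["A"] - assumed_beliefs["B"])

gap = wage_gap(inaccurate_beliefs)  # 2.0, despite identical groups
# If the analyst wrongly assumes beliefs are accurate, the entire gap
# is misattributed to taste-based discrimination:
misread = inferred_taste_penalty(gap, accurate_beliefs)
```

The point of the sketch is the last line: the behaviour produced by incorrect beliefs is observationally identical to a taste penalty unless beliefs are measured directly, which is what the paper’s experimental design aims to do.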

How to use data for good — 5 priorities and a roadmap


Stefaan Verhulst at apolitical: “…While the overarching message emerging from these case studies was promising, several barriers were identified that, if not addressed systematically, could undermine the potential of data science to address critical public needs and limit the opportunity to scale the practice more broadly.

Below we summarise the five priorities that emerged through the workshop for the field moving forward.

1. Become People-Centric

Much of the data currently used for drawing insights involves or is generated by people.

These insights have the potential to impact people’s lives in many positive and negative ways. Yet, the people and the communities represented in this data are largely absent when practitioners design and develop data for social good initiatives.

To ensure data is a force for positive social transformation (i.e., that it addresses real people’s needs and impacts lives in a beneficial way), we need to experiment with new ways to engage people at the design, implementation, and review stages of data initiatives beyond simply asking for their consent.

As we explain in our People-Led Innovation methodology, different segments of people can play multiple roles ranging from co-creation to commenting, reviewing and providing additional datasets.

The key is to ensure their needs are front and center, and that data science for social good initiatives seek to address questions related to real problems that matter to society-at-large (a key concern that led The GovLab to instigate the 100 Questions Initiative).

2. Establish Data About the Use of Data (for Social Good)

Many data for social good initiatives remain fledgling.

As currently designed, the field often struggles with translating sound data projects into positive change. As a result, many potential stakeholders—private sector and government “owners” of data as well as public beneficiaries—remain unsure about the value of using data for social good, especially against the background of high risks and transaction costs.

The field needs to overcome such limitations if data insights and their benefits are to spread. For that, we need hard evidence about data’s positive impact. Ironically, the field is held back by an absence of good data on the use of data—a lack of reliable empirical evidence that could guide new initiatives.

The field needs to prioritise developing a far more solid evidence base and “business case” to move data for social good from a good idea to reality.

3. Develop End-to-End Data Initiatives

Too often, data for social good initiatives focus on the “data-to-knowledge” pipeline without considering how to move “knowledge into action.”

As such, the impact remains limited and many efforts never reach an audience that can actually act upon the insights generated. Without becoming more sophisticated in our efforts to deliver end-to-end projects that take “data from knowledge to action,” the positive impact of data will be limited….

4. Invest in Common Trust and Data Steward Mechanisms 

For data for social good initiatives (including data collaboratives) to flourish and scale, there must be substantial trust between all parties involved, and with the public at large.

Establishing such a platform of trust requires each actor to invest in developing essential trust mechanisms such as data governance structures, contracts, and dispute resolution methods. Today, designing and establishing these mechanisms take tremendous time, energy, and expertise. These high transaction costs result from the lack of common templates and the need to design governance structures from scratch each time…

5. Build Bridges Across Cultures

As C.P. Snow famously described in his lecture on “Two Cultures and the Scientific Revolution,” we must bridge the “two cultures” of science and humanism if we are to solve the world’s problems….

To implement these five priorities we will need experimentation at the operational as well as the institutional level. This involves the establishment of “data stewards” within organisations that can accelerate data for social good initiatives in a responsible manner, integrating the five priorities above….(More)”

We should extend EU bank data sharing to all sectors


Carlos Torres Vila in the Financial Times: “Data is now driving the global economy — just look at the list of the world’s most valuable companies. They collect and exploit the information that users generate through billions of online interactions taking place every day. 


But companies are hoarding data too, preventing others, including the users to whom the data relates, from accessing and using it. This is true of traditional groups such as banks, telcos and utilities, as well as the large digital enterprises that rely on “proprietary” data.

Global and national regulators must address this problem by forcing companies to give users an easy way to share their own data, if they so choose. This is the logical consequence of personal data belonging to users. There is also the potential for enormous socio-economic benefits if we can create consent-based free data flows.

We need data-sharing across companies in all sectors in a real-time, standardised way — not at a speed and in a format dictated by the companies that stockpile user data. These new rules should apply to all electronic data generated by users, whether provided directly or observed during their online interactions with any provider, across geographic borders and in any sector. This could include everything from geolocation history and electricity consumption to recent web searches, pension information or even most recently played songs.

This won’t be easy to achieve in practice, but the good news is that we already have a framework that could be the model for a broader solution. The UK’s Open Banking system provides a tantalising glimpse of what may be possible. In Europe, the regulation known as the Payment Services Directive 2 allows banking customers to share data about their transactions with multiple providers via secure, structured IT interfaces. We are already seeing this unlock new business models and drive competition in digital financial services. But these rules do not go far enough — they only apply to payments history, and that isn’t enough to push forward a data-driven economic revolution across other sectors of the economy. 

We need a global framework with common rules across regions and sectors. This has already happened in financial services: after the 2008 financial crisis, the G20 strengthened global banking standards and created the Financial Stability Board. The rules, while not perfect, have delivered uniformity which has strengthened the system. 

We need a similar global push for common rules on the use of data. While it will be difficult to achieve consensus on data, and undoubtedly more difficult still to implement and enforce it, I believe that now is the time to decide what we want. The involvement of the G20 in setting up global standards will be essential to realising the potential that data has to deliver a better world for all of us. There will be complaints about the cost of implementation. I know first hand how expensive it can be to simultaneously open up and protect sensitive core systems. 

The alternative is siloed data that holds back innovation. There will also be justified concerns that easier data sharing could lead to new user risks. Security must be a non-negotiable principle in designing intercompany interfaces and protecting access to sensitive data. But Open Banking shows that these challenges are resolvable. …(More)”.

France Bans Judge Analytics, 5 Years In Prison For Rule Breakers


Artificial Lawyer: “In a startling intervention that seeks to limit the emerging litigation analytics and prediction sector, the French Government has banned the publication of statistical information about judges’ decisions – with a five year prison sentence set as the maximum punishment for anyone who breaks the new law.

Owners of legal tech companies focused on litigation analytics are the most likely to suffer from this new measure.

The new law, encoded in Article 33 of the Justice Reform Act, is aimed at preventing anyone – but especially legal tech companies focused on litigation prediction and analytics – from publicly revealing the pattern of judges’ behaviour in relation to court decisions.

A key passage of the new law states:

‘The identity data of magistrates and members of the judiciary cannot be reused with the purpose or effect of evaluating, analysing, comparing or predicting their actual or alleged professional practices.’ *

As far as Artificial Lawyer understands, this is the very first example of such a ban anywhere in the world.

Insiders in France told Artificial Lawyer that the new law is a direct result of an earlier effort to make all case law easily accessible to the general public, which was seen at the time as improving access to justice and a big step forward for transparency in the justice sector.

However, judges in France had not reckoned on NLP and machine learning companies taking the public data and using it to model how certain judges behave in relation to particular types of legal matter or argument, or how they compare to other judges.

In short, they didn’t like how the pattern of their decisions – now relatively easy to model – was potentially open for all to see.

Unlike in the US and the UK, where judges appear to have accepted the fait accompli of legal AI companies analysing their decisions in extreme detail and then creating models as to how they may behave in the future, French judges have decided to stamp it out….(More)”.

The Landscape of Open Data Policies


Apograf: “Open Access (OA) publishing has a long history, going back to the early 1990s, and was born with the explicit intention of improving access to scholarly literature. The internet has played a pivotal role in garnering support for free and reusable research publications, as well as stronger and more democratic peer-review systems — ones that are not bogged down by the restrictions of influential publishing platforms….

Looking back, looking forward

Launched in 1991, arXiv.org was a pioneering platform in this regard, a telling example of how researchers could cooperate to publish academic papers for free and in full view of the public. Though it has limitations — papers are curated by moderators and are not peer-reviewed — arXiv is a demonstration of how technology can be used to overcome some of the incentive and distribution problems that scientific research had long been subjected to.

The scientific community has itself assumed the mantle to this end: the Budapest Open Access Initiative (BOAI) and the Berlin Declaration on Open Access Initiative, launched in 2002 and 2003 respectively, are considered landmark movements in the push for unrestricted access to scientific research. While mostly symbolic, the effort highlighted the growing desire to solve the problems plaguing the space through technology.

The BOAI manifesto begins with a statement that is an encapsulation of the movement’s purpose,

“An old tradition and a new technology have converged to make possible an unprecedented public good. The old tradition is the willingness of scientists and scholars to publish the fruits of their research in scholarly journals without payment, for the sake of inquiry and knowledge. The new technology is the internet. The public good they make possible is the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it by all scientists, scholars, teachers, students, and other curious minds.”

Plan S is a more recent attempt to make publicly funded research available to all. Launched by Science Europe in September 2018, Plan S — short for ‘Shock’ — has energized the research community with its resolution to make access to publicly funded knowledge a right to everyone and dissolve the profit-driven ecosystem of research publication. Members of the European Union have vowed to achieve this by 2020.

Plan S has been supported by governments outside Europe as well. China has thrown its weight behind it, and the state of California has enacted a law that requires open access to research one year after publishing. It is, of course, not without its challenges: advocacy and ensuring that publishing is not restricted to a few venues are two such obstacles. However, the organization behind the guidelines, cOAlition S, has agreed to make them more flexible.

The emergence of this trend is not without its difficulties, however, and numerous obstacles continue to hinder the dissemination of information in a manner that is truly transparent and public. Chief among these are the many gates that keep research somewhat of an exclusive property, besides the fact that the infrastructure and development for such systems are short on funding and staff….(More)”.

How Can We Overcome the Challenge of Biased and Incomplete Data?


Knowledge@Wharton: “Data analytics and artificial intelligence are transforming our lives. Be it in health care, in banking and financial services, or in times of humanitarian crises — data determine the way decisions are made. But often, the way data is collected and measured can result in biased and incomplete information, and this can significantly impact outcomes.  

In a conversation with Knowledge@Wharton at the SWIFT Institute Conference on the Impact of Artificial Intelligence and Machine Learning in the Financial Services Industry, Alexandra Olteanu, a post-doctoral researcher at Microsoft Research, U.S. and Canada, discussed the ethical and people considerations in data collection and artificial intelligence and how we can work towards removing the biases….

….Knowledge@Wharton: Bias is a big issue when you’re dealing with humanitarian crises, because it can influence who gets help and who doesn’t. When you translate that into the business world, especially in financial services, what implications do you see for algorithmic bias? What might be some of the consequences?

Olteanu: A good example comes from a new law in New York state, under which insurance companies can now use social media to decide the level of your premiums. But they could in fact end up using incomplete information. For instance, you might be buying your vegetables from the supermarket or a farmer’s market, but these retailers might not be tracking you on social media. So nobody knows that you are eating vegetables. On the other hand, a bakery that you visit might post something when you buy from there. Based on this, the insurance companies may conclude that you only eat cookies all the time. This shows how even incomplete data can affect you….(More)”.
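The selection bias in that anecdote is easy to make concrete. In the sketch below (all purchases and numbers invented for illustration), only purchases from retailers that leave a social media trace are observable, so the inferred diet bears little resemblance to the true one:

```python
# Hypothetical purchase history: (item, retailer_posts_on_social_media).
# Only purchases with a social-media trace are visible to an observer
# such as an insurer.
purchases = [
    ("vegetables", False),  # farmer's market doesn't post about customers
    ("vegetables", False),
    ("vegetables", False),
    ("cookies", True),      # the bakery posts when you buy there
]

def observed_diet(purchases):
    # The observer only sees purchases that were posted about.
    seen = [item for item, posted in purchases if posted]
    return {item: seen.count(item) / len(seen) for item in set(seen)}

def true_diet(purchases):
    items = [item for item, _ in purchases]
    return {item: items.count(item) / len(items) for item in set(items)}

# The true diet is 75% vegetables, but the observed diet is 100% cookies.
```

The gap between `true_diet` and `observed_diet` is exactly the kind of incomplete-data distortion described in the interview: the sampling mechanism, not the person’s behaviour, drives the conclusion.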

EU countries and car manufacturers to share information to improve road safety


Press Release: “EU member states, car manufacturers and navigation systems suppliers will share information on road conditions with the aim of improving road safety. Cora van Nieuwenhuizen, Minister of Infrastructure and Water Management, agreed this today with four other EU countries during the ITS European Congress in Eindhoven. These agreements mean that millions of motorists in the Netherlands will have access to more information on unsafe road conditions along their route.

The data on road conditions registered by modern cars is steadily improving — for instance, information on iciness, wrong-way drivers and breakdowns in emergency lanes. This kind of data can be instantly shared with road authorities and other vehicles following the same route. Drivers can then adapt their driving behaviour appropriately so that accidents and delays are prevented….

The partnership was announced today at the ITS European Congress, the largest European event in the fields of smart mobility and the digitalisation of transport. Among other things, various demonstrations were given on how sharing this type of data contributes to road safety. In the year ahead, the car manufacturers BMW, Volvo, Ford and Daimler, the EU member states Germany, the Netherlands, Finland, Spain and Luxembourg, and navigation system suppliers TomTom and HERE will be sharing data. This means that millions of motorists across the whole of Europe will receive road safety information in their car. Talks on participating in the partnership are also being conducted with other European countries and companies.

ADAS

At the ITS congress, Minister Van Nieuwenhuizen and several dozen parties today also signed an agreement on raising awareness of advanced driver assistance systems (ADAS) and their safe use. Examples of ADAS include automatic braking systems, and blind spot detection and lane keeping systems. Using these driver assistance systems correctly makes driving a car safer and more sustainable. The agreement therefore also includes the launch of the online platform “slimonderweg.nl” where road users can share information on the benefits and risks of ADAS.

Minister Van Nieuwenhuizen: “Motorists are often unaware of all the capabilities modern cars offer. Yet correctly using driver assistance systems really can increase road safety. From today, dozens of parties are going to start working on raising awareness of ADAS and improving and encouraging the safe use of such systems so that more motorists can benefit from them.”

Connected Transport Corridors

Today at the congress, progress was also made regarding the transport of goods. For example, at the end of this year lorries on three transport corridors in our country will be sharing logistics data. This involves more than just information on environmental zones, availability of parking, recommended speeds and predicted arrival times at terminals. Other new technologies will be used in practice on a large scale, including prioritisation at smart traffic lights and driving in convoy. Preparatory work on the corridors around Amsterdam and Rotterdam and in the southern Netherlands has started…..(More)”.

The Right to the Datafied City: Interfacing the Urban Data Commons


Chapter by Michiel de Lange in The Right to the Smart City: “The current datafication of cities raises questions about what Lefebvre and many after him have called “the right to the city.” In this contribution, I investigate how the use of data for civic purposes may strengthen the “right to the datafied city,” that is, the degree to which different people engage and participate in shaping urban life and culture, and experience a sense of ownership. The notion of the commons acts as the prism to see how data may serve to foster this participatory “smart citizenship” around collective issues. This contribution critically engages with recent attempts to theorize the city as a commons. Instead of seeing the city as a whole as a commons, it proposes a more fine-grained perspective of the “commons-as-interface.” The “commons-as-interface,” it is argued, productively connects urban data to the human-level political agency implied by “the right to the city” through processes of translation and collectivization. The term is applied to three short case studies, to analyze how these processes engender a “right to the datafied city.” The contribution ends by considering the connections between two seemingly opposed discourses about the role of data in the smart city – the cybernetic view versus a humanist view. It is suggested that the commons-as-interface allows for more detailed investigations of mediation processes between data, human actors, and urban issues….(More)”.

107 Years Later, The Titanic Sinking Helps Train Problem-Solving AI


Kiona N. Smith at Forbes: “What could the 107-year-old tragedy of the Titanic possibly have to do with modern problems like sustainable agriculture, human trafficking, or health insurance premiums? Data turns out to be the common thread. The modern world, for better or worse, increasingly turns to algorithms to look for patterns in the data and make predictions based on those patterns. And the basic methods are the same whether the question they’re trying to answer is “Would this person survive the Titanic sinking?” or “What are the most likely routes for human trafficking?”

An Enduring Problem

Predicting survival at sea based on the Titanic dataset is a standard practice problem for aspiring data scientists and programmers. Here’s the basic challenge: feed your algorithm a portion of the Titanic passenger list, which includes some basic variables describing each passenger and their fate. From that data, the algorithm (if you’ve programmed it well) should be able to draw some conclusions about which variables made a person more likely to live or die on that cold April night in 1912. To test its success, you then give the algorithm the rest of the passenger list (minus the outcomes) and see how well it predicts their fates.

Online communities like Kaggle.com have held competitions to see who can develop the algorithm that predicts survival most accurately, and it’s also a common problem presented to university classes. The passenger list is big enough to be useful, but small enough to be manageable for beginners. There’s a simple set of outcomes — life or death — and around a dozen variables to work with, so the problem is simple enough for beginners to tackle but just complex enough to be interesting. And because the Titanic’s story is so famous, even more than a century later, the problem still resonates.
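A minimal version of the exercise can be sketched in a few lines of Python. The records below are invented stand-ins for the real Kaggle dataset, and the “model” is just a per-group majority vote rather than a proper learner, but the train-on-one-portion, test-on-the-rest structure is the same one the challenge describes:

```python
# Toy Titanic-style survival prediction: learn the majority outcome for
# each (sex, passenger class) group from a training split, then score
# predictions on a held-out test split. All records are synthetic.
from collections import defaultdict

# (sex, passenger_class, survived) -- invented illustration, not real data
train = [
    ("female", 1, 1), ("female", 1, 1), ("female", 3, 1), ("female", 3, 0),
    ("male", 1, 0), ("male", 1, 1), ("male", 3, 0), ("male", 3, 0),
]
test = [("female", 1, 1), ("male", 3, 0), ("female", 3, 0), ("male", 1, 1)]

def fit(rows):
    counts = defaultdict(lambda: [0, 0])  # group -> [died, survived]
    for sex, pclass, survived in rows:
        counts[(sex, pclass)][survived] += 1
    # Predict the strict-majority outcome per group; ties and unseen
    # groups default to "died", the more common outcome overall.
    return {group: int(c[1] > c[0]) for group, c in counts.items()}

def predict(model, sex, pclass):
    return model.get((sex, pclass), 0)

model = fit(train)
correct = sum(predict(model, s, p) == y for s, p, y in test)
accuracy = correct / len(test)  # how well the learned rule generalises
```

On this toy split the rule gets three of four test passengers right; real submissions replace the majority-vote lookup with a trained classifier, but are evaluated in exactly the same held-out fashion.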

“It’s interesting to see that even in such a simple problem as the Titanic, there are nuggets,” said Sagie Davidovich, Co-Founder & CEO of SparkBeyond, who used the Titanic problem as an early test for SparkBeyond’s AI platform and still uses it as a way to demonstrate the technology to prospective customers….(More)”.