Response to further critique on our paper “Algorithmic Extremism: Examining YouTube’s Rabbit Hole of Radicalization”
We would like to thank Manoel Horta Ribeiro, Savvas Zannettou, Emiliano De Cristofaro, Gianluca Stringhini and Jeremy Blackburn for a detailed critique of our preprint paper, “Algorithmic Extremism: Examining YouTube’s Rabbit Hole of Radicalization.” Two papers authored by members of this research team, Ribeiro et al. (2019), “Auditing radicalization pathways on YouTube”, and Ottoni et al. (2018), “Analyzing Right-wing YouTube Channels: Hate, Violence and Discrimination”, were among the papers we referenced and critiqued as part of our literature review on YouTube radicalization.
It is great to see a critique that engages with the content of the paper. We are grateful for the feedback and commentary. However, we do not agree with most of the critique posted on the iDramaLab site and would like to address it in this post.
Before getting into the details of the critique, we would like to remind everyone of the topic and scope of our paper. We are specifically interested in the recommendation algorithm and what kind of recommendations YouTube favors. Our main finding was that YouTube’s recommendation algorithm favors partisan mainstream media over independent content creators.
Anonymous Data and Misreading of YouTube’s Algorithm
We have been critiqued for misunderstanding Google’s literature on recommendation systems and for not accounting for viewing history or personalized accounts. However, we would like to clarify that we do understand that personalized recommendations differ from the recommendations presented to an anonymous account. This difference is clearly listed as our study’s primary limitation in the section “Limitations and Conclusions”, where we write:
“There are several limitations to our study that must be considered for the future. First, the main limitation is the anonymity of the data set and the recommendations. The recommendations the algorithm provided were not based on videos watched over extensive periods. We expect and have anecdotally observed that the recommendation algorithm gets more fine-tuned and context-specific after each video that is watched.”
Why We Don’t Think Anonymity of the Data is a Major Flaw
Our data shows that users are directed towards mainstream content and more popular videos. It might be the case that this behavior only describes recommendations for anonymous viewers or for brand new accounts. However, our anonymous data also supports the inter-channel recommendation stream. This feature of the recommendations seems to be commonly observed by people who are logged in and have a watch history: the recommendations presented are based on the channels the users are subscribed to AND on channels adjacent to these seed channels. This was the first claim we analyzed in our paper, and we think it is partly supported by the anonymous data. The behavior is likely to be even more strongly supported by personalized data.
Nevertheless, this does not mean that YouTube will only ever recommend channels that one has subscribed to on the front page or in the sidebar. This issue is specifically addressed in the Zhao et al. (2019) paper, where they discuss methods to reduce selection bias. The premise of the paper is to find a way to create recommendations that are not just based on an existing feedback loop but that surface material the users would like and share, i.e., material that generates more traffic to YouTube.
They very explicitly state that their goal is to enhance engagement. This does not necessarily go against the radicalization narrative, since radicalizing videos might also generate engagement. However, on a purely business rationale, this route seems unlikely. First, political YouTube is very niche. Out of 31 million channels, only a handful can be classified as political, and only a few of this handful can be classified as potentially radicalizing. None of the top subscribed channels are primarily political. Driving traffic towards niche channels that bring bad publicity for the company seems like an ill-advised business decision when one could direct the traffic towards more mainstream and more popular content that gets more engagement in the form of likes and shares.
After this, Zhao et al. (2019) state that “one algorithm generates candidates by matching topics of query video. Another algorithm retrieves candidate videos based on how often the video has been watched together with the query video”. This explanation seems to reinforce the notion that similar videos, and similar videos with more views, are being pushed to the top of the queue.
In other words, the description in the Zhao et al. paper supports the claim that recommended videos are going to be similar and more popular rather than more extreme and less popular. As the numbers of channels and views in Figure 2 of our paper (reposted below) show, extreme content is not popular.
According to the paper, the user history is applied only after this initial matchup, that is, after the recommendation candidates have been generated based on popularity. Zhao et al. (2019) also write: “Our ranking system learns from two types of user feedback: 1) engagement behaviors, such as clicks and watches; 2) satisfaction behaviors, such as likes and dismissals.” Mainstream content, even mainstream political content, fulfills these categories better than niche political channels ever could.
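To make the two-stage description quoted above more concrete, here is a minimal Python sketch of candidate generation followed by ranking. It is a toy illustration and not YouTube's implementation: the `Video` fields, the co-watch counts, the candidate limit, and the scoring weights are all hypothetical, and personalization by user history is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    video_id: str
    topic: str
    clicks: int      # engagement signal (hypothetical field)
    likes: int       # satisfaction signal (hypothetical field)
    dismissals: int  # negative satisfaction signal (hypothetical field)


def generate_candidates(query, catalog, co_watch, limit=3):
    """Stage 1: union of topic matches and the most co-watched videos."""
    by_topic = [v for v in catalog if v.topic == query.topic and v != query]
    # co_watch maps (query_id, other_id) -> number of sessions containing both
    by_co_watch = sorted(
        (v for v in catalog if co_watch.get((query.video_id, v.video_id), 0) > 0),
        key=lambda v: co_watch[(query.video_id, v.video_id)],
        reverse=True,
    )[:limit]
    # dict.fromkeys removes duplicates while preserving order
    return list(dict.fromkeys(by_topic + by_co_watch))


def rank(candidates, w_engagement=1.0, w_satisfaction=1.0):
    """Stage 2: order candidates by a weighted sum of the two feedback types."""
    def score(v):
        return w_engagement * v.clicks + w_satisfaction * (v.likes - v.dismissals)
    return sorted(candidates, key=score, reverse=True)


# Invented catalog and co-watch counts, for illustration only
catalog = [
    Video("q", "politics", 100, 10, 1),
    Video("mainstream", "politics", 5000, 400, 20),
    Video("niche", "politics", 50, 5, 2),
    Video("music", "music", 9000, 900, 10),
]
co_watch = {("q", "music"): 12, ("q", "mainstream"): 30}

query = catalog[0]
recommended = rank(generate_candidates(query, catalog, co_watch))
print([v.video_id for v in recommended])
```

In this toy setup the higher-engagement videos outrank the niche one purely because of the scoring rule chosen here. The sketch illustrates the mechanism described in the quotes; it is not evidence about the real system.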
The technical papers are not the only source of information we have on YouTube’s recommendation algorithm. After all, there is no way to know if all of YouTube’s recommendation algorithms even work according to this paper. We do know, however, that YouTube has partner programs, monetization schemes, and other ways in which it prioritizes content. Content favored by YouTube is likely to receive preferential treatment. Why else would YouTube promise its content creators anything if these partner programs had no effect? We have also learned about the p-scores applied to channels as well as a slew of other anecdotes on the measures that YouTube has applied to boost or to de-rank content. However, we do not have enough data about these other methods, and thus we decided to leave them out of the scope of this study at this point in time. Nevertheless, all of these anecdotes seem to support our data rather than refute it.
Besides, the majority of the studies conducted on YouTube, including the Ribeiro et al. paper “Auditing radicalization pathways on YouTube”, whose authors’ critique we are responding to here, rely on this same kind of anonymous data without major pushback. Other studies that are often cited as authoritative also apply the same anonymous data: those from Pew Research Center, AlgoTransparency, and Albright.
Data Collection Time Period
The persistent critique of our research is linked to the timing of the data collection. We have already addressed this issue in our previous response. We would like to stress again that our study is looking at the current claims of radicalization, which are still the prevalent narrative. Based on the response we received on our paper, it seems that this narrative is still strong, which is why responding to the claims of radicalization that exist right now does not disqualify our paper in any way.
We have also received some critique of the methods and clarity of the manuscript. We disagree with this critique. The methods are explained at great length, from categorization to data collection. If there are any unclear parts, we would like to hear about them in more detail. Besides, as already stated earlier, the limitations are very clearly stated in their own section, called “Limitations and Conclusions.”
We were also critiqued for not providing any statistical measures of the significance of our results. Our paper is refuting the claims not by statistical correlation but by presenting the direction of algorithmic recommendations. There is no correlation to be measured here since we are only investigating the flow of recommendations and not user activities that would correspond to these recommendations. This is why we are not stating that users are going to do one thing or another; the focus of the study is on the algorithm and the directions it recommends. If we are misunderstanding what was meant by statistical measures, we are happy to receive clarifications and work to improve the paper. Also, we can provide more information on the ICC algorithm that we used to validate our internal labeler agreement if it was not stated clearly enough in the paper.
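For readers unfamiliar with this kind of agreement check, the following Python sketch computes one common variant, assuming that ICC here refers to the intraclass correlation coefficient. The variant shown is the one-way random-effects, single-rater form, often written ICC(1,1); it was picked for simplicity and is not necessarily the variant used in the paper. The function name and the ratings are invented for the example.

```python
from statistics import mean


def icc_one_way(ratings):
    """ICC(1,1): one-way random-effects, single-rater intraclass correlation.

    ratings: list of rows, one row per rated item (e.g. a channel),
    each row holding the k scores given by the k labelers.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-item and within-item mean squares from a one-way ANOVA
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores: three channels, each rated by two labelers
example_ratings = [[1, 2], [3, 4], [5, 6]]
print(icc_one_way(example_ratings))
```

Values close to 1 indicate that labelers largely agree on how each item should be scored, while values near or below 0 indicate that disagreement between labelers is as large as the differences between items.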
We have published all our data, including the original channel tags and the scraped data, on GitHub for anyone to peruse and analyze. If anyone is interested in this research effort, please get in touch and let’s find ways to collaborate.
Our claims have been called bold. The researchers critical of our paper, Ribeiro et al., wrote in their 2019 paper “Auditing Radicalization Pathways on YouTube”:
“We argue that this finding comprises significant evidence that there has been, and there continues to be, user radicalization on YouTube, and our analyses of the activity of these communities (Sec. 4) is consistent with the theory that more extreme content “piggybacked” the surge in popularity of I.D.W. and Alt-lite content [cit].”
In comparison, our claims are rather tame.
However, our data set is also more complete and investigates all ends of the political spectrum. This is the reason why we can see the algorithm recommending content that falls closer to the center rather than the fringes.
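One way to picture how the direction of recommendations can be compared across the political spectrum is to aggregate channel-to-channel recommendation counts into category-to-category flows. The Python sketch below does this on invented data: the channel names, the `center` and `fringe` labels, the counts, and the function names are all hypothetical, and this is not the analysis code behind the paper.

```python
from collections import Counter

# (from_channel, to_channel, number of recommendations seen): made-up data
recommendations = [
    ("FringeA", "CenterX", 120),
    ("CenterX", "FringeA", 30),
    ("FringeB", "CenterX", 80),
    ("CenterX", "CenterY", 200),
    ("FringeA", "FringeB", 40),
]

# Hypothetical category label for each channel
category = {
    "FringeA": "fringe",
    "FringeB": "fringe",
    "CenterX": "center",
    "CenterY": "center",
}


def category_flows(recs, labels):
    """Sum recommendation counts for each (source category, target category)."""
    flows = Counter()
    for src, dst, n in recs:
        flows[(labels[src], labels[dst])] += n
    return flows


def net_flow(flows, a, b):
    """Recommendations from category a to b minus those from b to a."""
    return flows[(a, b)] - flows[(b, a)]


flows = category_flows(recommendations, category)
print(net_flow(flows, "fringe", "center"))
```

A positive net flow from one category to another means that, in the collected recommendations, more suggestions point in that direction than back. Note that this describes what the algorithm offers, not what users go on to watch.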
Let’s continue the dialogue on this issue and further address the critique of our paper.
Mark Ledwich & Anna Zaitsev
Ottoni, Raphael, Evandro Cunha, Gabriel Magno, Pedro Bernardina, Wagner Meira Jr, and Virgilio Almeida. “Analyzing right-wing YouTube channels: Hate, violence and discrimination.” In Proceedings of the 10th ACM Conference on Web Science, pp. 323–332. ACM, 2018.
Ribeiro, Manoel Horta, Raphael Ottoni, Robert West, Virgílio AF Almeida, and Wagner Meira. “Auditing radicalization pathways on YouTube.” arXiv preprint arXiv:1908.08313 (2019).
Zhao, Zhe, Lichan Hong, Li Wei, Jilin Chen, Aniruddh Nath, Shawn Andrews, Aditee Kumthekar, Maheswaran Sathiamoorthy, Xinyang Yi, and Ed Chi. “Recommending what video to watch next: A multitask ranking system.” In Proceedings of the 13th ACM Conference on Recommender Systems (RecSys ’19), pp. 43–51. ACM, 2019. https://doi.org/10.1145/3298689.3346997