Why you cannot classify political YouTube as just left and right
A new paper titled “Evaluating the scale, growth, and origins of right-wing echo chambers on YouTube” was published as a pre-print earlier this week by Hosseinmardi et al. The paper continues the recent stream of studies investigating the effects of YouTube and its recommendation algorithm on political discourse. It also references the paper “Algorithmic extremism: Examining YouTube’s rabbit hole of radicalization”, authored by Mark Ledwich and myself, and our work on the classification of YouTube channels. The new paper by Hosseinmardi et al. has become controversial because of the far-left to far-right classification used as the basis of its analysis of political echo chambers.
As someone who has been tackling the classification issue as well, I would like to offer some thoughts on the matter. Our team is behind the study “Algorithmic extremism”, the website recfluence.net, which illustrated the flow of the recommendation algorithm, and the subsequent website transparency.tube, which visualizes a vastly more complete set of political YouTube data. Throughout this work, we have grappled with the complexity of both manual and algorithmic classification of data.
What makes the classification of political YouTube content extremely difficult is both the changing political landscape and the multiple categories into which one could classify the content. Furthermore, analysis of political content is prone to one’s personal bias. Our research and the websites recfluence.net and transparency.tube attempt to analyze the direction of the recommendation algorithm and currently map out over eight thousand political channels using a multi-label analysis that attempts to capture both the political leaning and the content type.
To generate the data for the initial research, we conducted a manual classification of the first eight hundred channels. First, to reduce bias, the classification was done by three people separately. The three labellers, who were politically rather diverse, watched content from every channel we labelled before assigning “soft tags” that would capture the content as closely as possible.
Second, we attempted to tackle the complexity of the political landscape by creating a multi-label representation of online politics. We conducted a high-level left-centre-right classification to generate an overview of the content. However, because this is very simplistic, we also applied 18 different labels and allowed a single content creator to be “tagged” with multiple labels. Are these 18 labels enough to classify all political thought on YouTube accurately and without any room for error? Certainly not. However, to study the flow of the recommendation algorithm, some classification has to be created. Showing only the flow of recommendations between individual channels would be incomprehensible.
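To make the idea of multi-label “soft tags” from several labellers concrete, here is a minimal sketch in Python. It is not the procedure used in our study: how the three labellers’ tags were combined is not described above, so the majority-vote rule, the function name `aggregate_soft_tags`, the `min_votes` parameter and the example tag names are all assumptions made purely for illustration.

```python
from collections import Counter


def aggregate_soft_tags(labels_by_reviewer, min_votes=2):
    """Return the tags assigned by at least `min_votes` reviewers.

    Majority voting is an illustrative assumption, not the study's
    documented method. Each reviewer contributes a set of tags, so a
    channel can end up with zero, one, or several labels.
    """
    # Count each tag once per reviewer, even if a reviewer repeats it.
    votes = Counter(tag for tags in labels_by_reviewer for tag in set(tags))
    return sorted(tag for tag, count in votes.items() if count >= min_votes)


# Hypothetical tags from three independent reviewers for one channel.
reviews = [
    {"AntiWoke", "Conservative"},
    {"AntiWoke"},
    {"AntiWoke", "Libertarian"},
]

channel_tags = aggregate_soft_tags(reviews)
print(channel_tags)
```

In this sketch, a tag survives only when at least two of the three reviewers agree on it, which is one simple way to dampen the personal bias of any single labeller while still allowing a channel to carry several labels.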
Other things also complicated the political classification work. I do believe that political thought is not a linear spectrum, but is better represented by the four quadrants of the political compass. However, such a granular breakdown of one’s political perspectives, even if ideal, would only apply to YouTube channels where a single content creator clearly positions their content within the four quadrants. This framework cannot capture channels which have multiple creators, let alone mainstream news channels such as CNN or Fox News, i.e. we cannot administer a political compass test to each contributor and then aggregate the results. An even more ambitious effort would be to apply Jonathan Haidt’s theory of moral foundations, which captures six different dimensions that might determine one’s political leaning. However, this is another framework that only works well at the individual level.
The political polarization plaguing our times means that some content creators defy simplistic labels. For example, we found a significant portion of channels which focus mainly on resistance to “woke” ideas and which all share a similar audience base, to whom this critique is appealing. However, based on the content alone, it is challenging and sometimes impossible to get a sense of the political leaning of the content creators themselves. This is why our classification avoids guessing at the creators’ personal politics, and why the transparency.tube labels are based on the content itself. This leads to a rough mapping that designates anti-woke channels as right-wing, because much of their audience is shared with channels where the political leaning is much more explicit. Does this mean that all the content creators whom one might think of as anti-woke are on the right? Definitely not. However, their content corresponds to what is at this point seen as a right-wing position. This difficulty further leads to a discussion in which the “traditional” left-centre-right mapping loses all meaning.
Finally, a question which has been raised in the online chatter is whether one should attempt to classify channels at all, and whether any classification can damage the reputation of the content creators. I am sympathetic to these concerns, which is why I have always specifically advocated for classification that attempts both to capture the complexity and to provide an accurate description of the content. Our study intentionally rejected nebulous and now all but meaningless terms such as “alt-right” in favour of more nuanced categories. However, without some classification, even a rough one, presenting the results of such studies would become impossible. Misleading news stories and studies that wrongfully claim that YouTube is the great radicalizer will continue to spread if no studies counter them.
I believe it is important for academics who venture into the midst of political YouTube and attempt to understand its content creators to take feedback from those creators into consideration. When mistakes and misclassifications in the algorithmically classified data on transparency.tube were pointed out, the research team responded by removing data, by reclassifying channels or, in some cases, by renaming the labels and clarifying their meaning. The site is intended to provide information on the current state of political YouTube and analysis of its discourse, not hit lists or material for political witch hunts.
As a researcher, my greatest goal is to find the truth about YouTube algorithms, radicalization and political bubbles. I am confident that our prior study and the websites recfluence.net and transparency.tube currently offer the most thorough, accurate, nuanced and technically advanced representation of the political spheres on the platform. I would welcome more discussion on this difficult topic, and I am always happy to offer assistance to researchers who are tackling the same issues. The data and algorithms that have been used to create the datasets are accessible and open to everyone.