Automated Content Moderation: A technology that is temporarily unreliable

Figure 1. “Blue Sky Twitter”. Image: mkhmarketing, CC BY 2.0


In this essay, I will first give a brief history of automated content moderation, explaining how the rise of Web 2.0 increased the need for content moderation and why automated systems are being called on to replace paid content reviewers. I will then argue that only paid human moderators benefit from this technology, because the traditional task of content moderation is repetitive, traumatic, and exploitative (Roberts, 2019). By contrast, platforms do not see automated content moderation as beneficial, for two reasons. On the one hand, de-contextualised algorithms fail to grasp the complexity of human language and image-based material, which poses risks to platforms’ reputations (Gillespie, 2018; Gorwa et al., 2020; Roberts, 2019). On the other hand, drawing the line between the acceptable and the unacceptable is politically difficult. I conclude that automated content moderation will become reliable for platforms only if two conditions are met: technical innovation in algorithms and government intervention.

A brief history of Automated Content Moderation

The need for content moderation emerged in parallel with the rise of Web 2.0, which changed the role of internet users from passive receivers to active content contributors (Campbell, 2009). Despite the early utopian fantasy that the internet would create a world free of prejudice (Barlow, 1996), this transformation also allowed users’ diverse offline political, social, and cultural attitudes, including existing prejudices and anti-social ideologies, to be reproduced online (Noble, 2018, p. 32). In other words, content moderation is required because users can freely express their thoughts, and sometimes these thoughts are “toxic.”

Figure 2. An image illustrating the differences between Web 1.0 and Web 2.0. Image: Seoweb, all rights reserved.

In the early days of online discussion, when conversations mainly took place on bulletin boards and forums such as USENET, content management was predominantly undertaken by dedicated administrators and volunteers who enforced local rules based on community norms and users’ behaviour (Roberts, 2019; Gorwa et al., 2020).

As the internet became more accessible in the 1990s, the rapid growth in the number of users led to a proliferation of spam. Manual intervention could no longer cope with the volume of harmful content, and this “scale problem”, as Gorwa et al. (2020) term it, prompted experiments with automated filters such as “Automated Retroactive Minimal Moderation” (ARMM), launched in 1993 (Gorwa et al., 2020, p. 3).

Later, the emergence of social media platforms at the beginning of the 21st century gave rise to a new model of content moderation, which Roberts (2019) dubbed “commercial content moderation” (CCM). The essence of CCM is a combination of human labor and automated filters (Gerrard, 2018; Roberts, 2019). On the one hand, the scale problem is addressed by automated filters that match uploaded content against existing “blacklists” and flag potentially dangerous content for human reviewers. On the other hand, human reviewers, equipped with cultural competence and the ability to recognise complex linguistic manipulation, can identify toxic content that machines cannot (Gerrard, 2018). In other words, CCM combines efficiency with accuracy in content moderation.

Figure 3. A flowchart explaining how CCM works. Image: Dixongl, all rights reserved.
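The CCM workflow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not any platform's actual system: the blacklist terms, the `ModerationQueue` class, and the `automated_filter` and `triage` functions are all hypothetical names invented for this example, and simple substring matching is assumed in place of whatever matching real filters use.

```python
from dataclasses import dataclass, field

# Hypothetical blacklist; real platforms maintain far larger, private lists.
BLACKLIST = {"spamlink.example", "buy followers"}


@dataclass
class ModerationQueue:
    """Holds uploads that the automated filter flagged for human review."""
    pending: list[str] = field(default_factory=list)


def automated_filter(post: str, blacklist: set[str]) -> bool:
    """Return True if the post contains any blacklisted term (case-insensitive)."""
    text = post.lower()
    return any(term in text for term in blacklist)


def triage(posts: list[str], queue: ModerationQueue, blacklist: set[str]) -> list[str]:
    """Publish posts that pass the filter; route flagged posts to human reviewers."""
    published = []
    for post in posts:
        if automated_filter(post, blacklist):
            queue.pending.append(post)  # a human reviewer makes the final call
        else:
            published.append(post)
    return published


queue = ModerationQueue()
published = triage(
    ["Hello world", "Buy followers now!", "b.u.y f0llowers"], queue, BLACKLIST
)
```

In this sketch the second post is flagged for review, while the deliberately misspelled third post passes the filter untouched. That gap is the reason the CCM model keeps human reviewers in the loop.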

However, this does not mean that CCM is ethical. The practice of content moderation and its reliance on human labor were long hidden from the public, until the internal Facebook oDesk document leaked in 2012 (Roberts, 2019). Gillespie (2018), Noble (2018), and Suzor (2019) argue that social media platforms kept these processes out of users’ sight because platforms feed on user-generated content to sustain their operations. They wish to present themselves as open spaces for expression rather than places where users’ uploads are under 24/7 surveillance. Nor do they want the public to find out that they outsource moderation, since having underpaid human moderators repeatedly review the worst of humanity might damage their reputation (Roberts, 2019). Regardless, the leak shocked the public, especially when people realised that paid reviewers were exposed directly to toxic content without psychological protection (Roberts, 2019). In recent years, under increasing public and government pressure, platforms have been forced to devise more intelligent approaches to content moderation, particularly automated decision-making systems based on machine learning, to gradually replace the traditional approach that relies on human moderators.

Figure 4. A bar graph showing a dramatic increase in the use of Facebook’s machine-learning content moderation. Image: The Economics, some rights reserved. Source: Facebook

Watch “Field of Vision: The Moderators”, directed by Ciaran Cassidy & Adrian Chen, to explore the backstage work of paid content reviewers in India.

Figure 5. “Field of Vision: The Moderators”. Video: Field of Vision, YouTube, all rights reserved.

Who benefits (socially): Human moderators

Automated decision-making moderation systems have significantly reduced the psychological burden borne by human moderators. As mentioned above, human labor is a crucial part of the traditional CCM model because human moderators are needed to identify contextualised content and manually update the “blacklist” of toxic content (Gorwa et al., 2020). Roberts (2019, p. 65) outlined four ways in which human moderators are exploited: (1) nearly all paid content reviewers are outsourced contractors rather than official employees of platforms; (2) their wages are disproportionate to the psychological hazards they face; (3) they work around the clock; and (4) they are given no psychological counselling. However, machine-learning systems identify harmful content more accurately as human reviewers add to their training data, so the algorithms have become smarter than they were at launch. Human reviewers now play a less crucial role in deciding what content stays or goes under company policies. This has significantly reduced the frequency of human exposure to harmful content, although it will inevitably raise the societal issue of unemployment.

Figure 6. The featured image of the film “The Cleaners”. Image: PBS, all rights reserved. Source:

Who does not benefit (economically and politically): Platforms

It is widely acknowledged that platforms directly control the key business of automated content moderation: they draw the line between the acceptable and the unacceptable, and they unilaterally exercise the power to remove, hide, or block content deemed toxic. They design their algorithmic moderation systems with little or no government or public oversight (Gorwa et al., 2020). This is not to say that platforms do not have to appease individual users and governments; rather, they care about these actors very much (Gillespie, 2018). There are two reasons why platforms do not see the benefits of automated content moderation. First, algorithmic decision-making systems make incorrect decisions that might damage platforms’ reputations. Second, it is politically challenging to decide whether to be “harsh” on harmful content or to use “softer” ways of neutralising it.

As for the former, algorithmic moderation systems have been widely criticised as unreliable when vetting material that involves complicated use of language, images, or context-dependent meaning, or content with previously unseen features (Gillespie, 2018; Gorwa et al., 2020; Roberts, 2019). In March 2019, a gunman in Christchurch, New Zealand, killed 51 people while live-streaming on Facebook; the platform’s algorithm initially failed to detect the video because it was shot from a first-person viewpoint, so its pixel patterns did not match anything in the blacklist (Dickson, 2019). Not only was Facebook’s moderation system subsequently criticised by the public, but the reliability of automated content moderation as a technology was also widely questioned. As Gillespie (2018) argued,

“One failure may incur considerable public outrage to overshadow a million previous success in content moderation” (Gillespie, 2018, p. 9).

Platforms, therefore, do not consider automated moderation systems beneficial, because incorrect decisions made by algorithms might seriously damage their reputations. Platforms value reputation because a bad reputation leads to a loss of users, which in turn diminishes the user-generated content that platforms can commodify.
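The detection failure described above, where new footage matched nothing in the blacklist, can be illustrated with a minimal sketch. It assumes exact SHA-256 fingerprints purely for simplicity; this is a swapped-in technique for illustration, not a description of Facebook's actual matching method, and the byte strings standing in for video files, along with the `fingerprint` and `is_known_harmful` names, are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Exact fingerprint of a file's bytes (SHA-256, used here only for illustration)."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical blacklist built from material that was previously removed.
known_harmful = [b"previously removed clip A", b"previously removed clip B"]
blacklist = {fingerprint(item) for item in known_harmful}


def is_known_harmful(upload: bytes) -> bool:
    """True only if the upload matches something already on the blacklist."""
    return fingerprint(upload) in blacklist


# A re-upload of known material is caught; never-before-seen footage is not.
caught = is_known_harmful(b"previously removed clip A")
missed = is_known_harmful(b"never-before-seen footage")
```

Whatever matching method is used, a blacklist can only recognise content that resembles something already on it, which is why entirely new material can pass unnoticed until a human flags it.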

Figure 7. A screenshot of the New Zealand massacre video, shot 7th Oct.

Furthermore, automated content moderation poses a new political challenge for platforms: drawing the line between the appropriate and the inappropriate. This involves the deeper socio-political problem of free expression. Platforms must consider to what extent this freedom should be regulated so that it is not used to victimise others. Gillespie (2018) argued that it is the platform’s values and interests that determine what is appropriate and what is not. However, if a platform’s algorithm defines unacceptable content too restrictively and removes it too harshly, a backlash may occur: in reassuring its advertisers, the platform risks losing users, who would regard the removal of content as a serious violation of their political right to free expression (Noble, 2018). By contrast, if a platform such as Twitter emphasises the US-based value of free speech, its algorithm might be more likely to ignore potentially harmful content. This would also lead to reputational damage and, potentially, government pressure (Gillespie, 2018).

Figure 8. “Facebook Jail Rules”. Image: Makeover Arena. All rights reserved.

In conclusion, this essay has outlined the genesis of automated content moderation. The need for moderation originated in Web 2.0 culture, and the later proliferation of spam overwhelmed the capacity of manual removal. The rise of social media platforms created CCM, a model of moderation combining algorithms and human labor. In recent years, platforms have invested heavily in developing machine-learning moderation systems. I have argued that human moderators benefit from this change because it has greatly reduced their exposure to harmful content. Platforms, on the other hand, will not see the benefits (at least temporarily) until significant technical innovation enables algorithms to handle the complexity of human language and contextualised content, and until government intervention sets a universal agenda for what is appropriate and what is not.




Barlow, J. P. (1996, February 8). A Declaration of the Independence of Cyberspace. Retrieved 27 February 2017, from Electronic Frontier Foundation website:

Campbell, S. (2009). Effects of mobile communication. In Media effects: Advances in theory and research (pp. 592–606). Routledge.

Dickson, B. (2019, July 10). Human help wanted: Why AI is terrible at content moderation. Retrieved from:

Gerrard, Y. (2018). Beyond the hashtag: Circumventing content moderation on social media. New Media & Society, 20(12), 4492–4511.

Gillespie, T. (2018). All platforms moderate. In Custodians of the internet: Platforms, content moderation, and the hidden decisions that shape social media (pp. 1–23). New Haven: Yale University Press. ISBN: 030023502X, 9780300235029

Gorwa, R., Binns, R., & Katzenbach, C. (2020). Algorithmic content moderation: Technical and political challenges in the automation of platform governance. Big Data & Society, 7(1).

Noble, S. U. (2018). A society, searching. In Algorithms of oppression: How search engines reinforce racism (pp. 15–63). New York University Press. ISBN: 9781479837243

Roberts, S. T. (2019). Behind the screen: Content moderation in the shadows of social media (pp. 1–69). Yale University Press. ISBN: 0300245319, 9780300245318

Suzor, N. P. (2019). Lawless: The secret rules that govern our digital lives. Cambridge, UK: Cambridge University Press.



About Ziang Chai
In 2019, I began to study how media and digital technologies have shaped people's day-to-day activities, and life has just gotten better from there. A second-year BA student from the University of Sydney. Majored in Digital culture.