Automated content moderation: there is still a long way to go

"File:AI cartoon for assisting with content moderation on Wikipedia.jpg" by FauxNeme, CC BY-SA 4.0


Automated content moderation is an efficient technology that has emerged from the rapid growth of social platforms and the limitations of human moderators (Gorwa, Binns, & Katzenbach, 2020). Although it is a necessary trend in the development of communications media, whether the technology is mature enough remains a question worth studying. In this essay, I will briefly introduce the genesis of this innovation by analyzing the background of online society, and discuss how different social and cultural groups are affected by this networked change. Through this analysis, I argue that automated content moderation cannot yet fully take on the task of content moderation by itself.

The brief history of automated content moderation

Unlike traditional publics, networked publics gather users from all over the world into the same network community and allow them to create and share content and have their say in that community (Boyd, 2010). Since most social media platforms allow identity construction (Marwick, 2013), users can easily behave casually, or even badly, online without exposing their identity or taking responsibility. Toxic technocultures (Massanari, 2017) are a product of this excessive freedom. Their spread on social platforms has made it inevitable that users will encounter “offensive, vile, or illegal” content (Gillespie, 2018, p. 5). Such content seriously inhibits participation in online communities, making it necessary to moderate content on the platform. However, platform content moderation did not attract much attention until its importance was highlighted in the 2016 US election (Gorwa, Binns, & Katzenbach, 2020). Since then, content management has drawn public attention and become one of the major media concerns.

Although platforms have employed moderators to manage media content, this does not seem to be an ideal approach. Platforms now operate at a far larger scale than traditional media, which undermines the feasibility of human moderation (Gorwa, Binns, & Katzenbach, 2020). Considering that human moderation is constrained by scale (Gorwa, Binns, & Katzenbach, 2020) and by the expense of human labor (Cambridge Consultants, 2019), more intelligent approaches to content moderation are needed. In addition, moderators’ complaints about their work have disrupted companies’ content moderation programs. Professor Jennifer Beckett of the University of Melbourne, who has also worked as a moderator, pointed out the negative impact of the most violent and disturbing content on human moderators. She said:

“You have to read all the comments – even the ones no-one else sees because they’re so bad you’ve removed them.”

“It was almost impossible to keep up with the flow of horribly racist, religiously intolerant rants in the comments.”

Harmful content and heavy stress inevitably put human moderators’ physical and mental health at risk. In fact, moderators have claimed that malicious content has left them suffering from post-traumatic stress disorder (PTSD). In response, Facebook agreed to pay them $52m (BBC, 2020) and sought to reassure moderators by introducing more automated content moderation systems.

Video: Google and YouTube moderators’ work experiences and feelings


Automated content moderation, which refers to systems that identify users’ online behaviors and then sort and remove content (Gorwa, Binns, & Katzenbach, 2020), therefore seems to be a viable solution. In support of this, Cambridge Consultants (2019) outlined a workflow in which AI and humans cooperate to moderate content, and pointed out the positive impact of automated systems on the effectiveness of content regulation. As Figure 1 shows, this technology can be used for large-scale moderation, save labor costs, and replace human moderators in dealing with negative content. These advantages make it an indispensable assistant in content moderation, and suggest that automated content moderation is an inevitable trend in the development of information management.

Figure 1:

Three key ways in which AI can improve the effectiveness of the typical online content moderation workflow
Image: Cambridge Consultants, all rights reserved
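
The division of labor described above, in which an automated system screens content before human moderators see it, can be illustrated with a minimal Python sketch. It is not the workflow of Cambridge Consultants or of any real platform: the harm scores, the two thresholds, and the names `triage` and `review_queue` are all hypothetical, and a real system would obtain its scores from a trained classifier rather than receive them ready-made.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
REMOVE_THRESHOLD = 0.9   # at or above this score, content is removed automatically
REVIEW_THRESHOLD = 0.5   # at or above this score, content goes to a human moderator


@dataclass
class Decision:
    action: str   # "remove", "human_review", or "publish"
    score: float  # the harm score the decision was based on


def triage(harm_score: float) -> Decision:
    """Route one piece of content based on an assumed harm score in [0, 1]."""
    if not 0.0 <= harm_score <= 1.0:
        raise ValueError("harm_score must be between 0 and 1")
    if harm_score >= REMOVE_THRESHOLD:
        return Decision("remove", harm_score)
    if harm_score >= REVIEW_THRESHOLD:
        return Decision("human_review", harm_score)
    return Decision("publish", harm_score)


def review_queue(scored_posts: dict[str, float]) -> list[str]:
    """Return the ids of posts that need human review, highest score first."""
    flagged = [
        (post_id, score)
        for post_id, score in scored_posts.items()
        if triage(score).action == "human_review"
    ]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [post_id for post_id, _ in flagged]
```

In this sketch only the uncertain middle band reaches a human, which is how automation could reduce both the volume and the severity of what moderators have to read. Where the thresholds are set is a policy choice rather than a technical one, which is part of why the question of who benefits matters.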

As with any technology, automated content moderation is unlikely to be perfect. Despite its considerable advantages over human moderation, it does not benefit everyone equally.

Who are the beneficiaries?

Human moderators

As shown in the figure above, the pre-recognition capability of automated content moderation filters out a large amount of disturbing content, reducing the workload of human moderators. This technology would be the front line of defense against a large volume of harmful media content. Thus, it acts as both a helper and a shield for human moderators, relieving their stress and protecting their physical and mental health.

Investors and corporations

As mentioned above, automated moderation improves the company’s efficiency in processing information and reduces the cost of human labor. It can also be a tool for the company to make profits.

Matamoros-Fernandez (2017) argues that all of a platform’s infrastructure is built for the purpose of economic profit. In addition, the basic business model of the platform is to make profits by collecting user data and selling consumers’ attention (ACCC, 2019). These facts suggest that a platform’s automated systems may also be designed with profit in mind. Beyond this, Section 230 of the US Communications Decency Act (CDA) has given social platforms the right to monitor content posted by users, allowing them to handle content according to their own moderation policies without accountability (Matamoros-Fernandez, 2017). This further indulges profit-oriented companies, and may encourage them to ignore fairness, legitimacy, and potential harm, and instead decide what the automated content moderation system should delete according to users’ preferences, so as to obtain greater profits. In fact, few users are aware that platform content moderation is imposed on them (Gillespie, 2018). Although social media platforms insist that they are liberal and democratic, they never disclose the amount and kind of content they delete (Gillespie, 2018). The reasons behind this silence may be ones they would rather not make public.

Who does not benefit?

Vulnerable groups, including those with racial identities and different values

Referring to the makeup of the media industry’s workforce, Gillespie (2018) claims that most employees of media platforms are educated white males who are liberal or libertarian. This means that these workers’ values may be embedded in the design of automated content moderation, leading to bias against groups who do not share those values. One example is Twitter’s ‘Quality Filter’ (Gorwa, Binns, & Katzenbach, 2020), which aims to identify and block potentially harmful content. Yet its apparent liberal bias infringes on conservatives’ right to free expression.

Tweets protesting double standards and the censorship of conservatives
Image: J_Patriot, all rights reserved

The lack of transparency (Gillespie, 2018) of automated content moderation is also worth considering. It means that what users see may not represent mainstream ideas, but rather the ideas that the platform is trying to instil in them. Moreover, users do not know what content has been removed, or how much. Even if people with marginalized identities notice that something is wrong, they have too little power to make themselves heard.

The limitations of automated content moderation technology are also worrying. The system has to be updated constantly, or serious adverse consequences may result. For example, the Nation of Islam leader Louis Farrakhan compared Jewish people to “termites” in an anti-Semitic attack. For a while, the content could not be cleaned up because the automated system had not been updated. As a result, Twitter was flooded with hate speech against Jews. Although the platform has since reacted, the damage to Jews was irreparable. These limitations are also reflected in the inability of automated moderation systems to accurately recognize humor and tropes (Matamoros-Fernandez, 2017). Although human moderators can compensate to some extent, their limited understanding of context and their own biases allow some hate speech to slip through the net. Overall, hate speech can still run amok on the Internet, threatening the mental health of vulnerable groups.



In conclusion, human moderators are the main beneficiaries of this Internet revolution, because automated content moderation reduces their stress and helps them to maintain their physical and mental health. Investors and corporations also benefit from it: because of these companies’ immense power, automated systems can be used as a means of making economic profits. However, the situation is less optimistic for vulnerable groups, including those with racial identities and different values, since the existing biases and limitations of the automated technology discourage their participation in the online community. Although this technology has undeniably advanced information management, it remains immature because of these disadvantages. Perhaps one day it will be able to understand the context of content and completely replace human moderators.


ACCC. (2019). Digital platforms inquiry. Final Report, 1-14.

BBC. (2020, May 12). Facebook to pay $52m to content moderators over PTSD. Retrieved from:

Beckett, J. (2014, December 8). ‘Haters gonna hate’ is no consolation for online moderators. The Conversation. Retrieved from:

Boyd, d. (2010). Social Network Sites as Networked Publics: Affordances, Dynamics, and Implications. In Networked Self: Identity, Community and Culture on Social Network Sites (pp. 39–58). Routledge.

Cambridge Consultants. (2019). Use of AI in Online Content Moderation. Retrieved from

Gillespie, T. (2018). All platforms moderate. In Custodians of the internet: platforms, content moderation, and the hidden decisions that shape social media (pp. 1-23). New Haven: Yale University Press. ISBN: 030023503X,9780300235029

Gorwa, R., Binns, R., & Katzenbach, C. (2020). Algorithmic content moderation: Technical and political challenges in the automation of platform governance. Big Data & Society.

Grygiel, J. (2018, November 1). Hate speech is still easy to find on social media. The Conversation. Retrieved from:

Marwick, A. (2013). Online Identity. Chapter 23 in John Hartley, Jean Burgess, and Axel Bruns, A Companion to New Media Dynamics (pp. 355-364). Hoboken, NJ: Wiley-Blackwell.

Massanari, A. (2017). Gamergate and The Fappening: How Reddit’s algorithm, governance, and culture support toxic technocultures. New Media & Society, 19(3), 329-346. DOI:

Matamoros-Fernandez, A. (2017). Platformed racism: the mediation and circulation of an Australian race-based controversy on Twitter, Facebook and YouTube. Information, Communication & Society, 20(6), 930-946.







About Yuki Shen
USYD, digital cultures student