Platform Safety and Freedom of Expression: A Long Road Ahead for Content Moderation

Changes to content policies on social media platforms regularly provoke vigorous public debate. In early 2025, Meta quietly rebranded its approach to hate speech moderation under the slogan “More expression, fewer mistakes.” The well-crafted PR phrase revives an old worry: is it really feasible for platforms to balance free expression with protecting their users? This paper critically examines the contradictions of governance in digital spaces. Looking through the lens of Meta’s 2025 policy change, it highlights a central dilemma: although moderation is framed as a set of neutral tools meant to strike a balance between liberty and protection, moderation practices are rarely free of deliberate or unconscious influence shaped by political pressures and ideological interests.

Neutrality in content moderation is difficult to achieve because moderation processes are shaped by cultural and legal contexts. The First Amendment gives the United States broad protections for free speech, leading platforms to treat political material as especially sensitive and to exercise particular caution before deleting it. Partly as a result, in the lead-up to the January 6, 2021, Capitol insurrection, a large volume of conspiracy theories, inflammatory rhetoric, and aggressive speech circulated widely through Facebook groups and on Twitter (now X). These narratives ultimately escalated into a large-scale violent event, as thousands of supporters of former President Donald Trump attempted to disrupt the certification of the presidential election results. The attack resulted in multiple deaths and numerous injuries, and it sparked widespread concern over the role of social media platforms in amplifying violence and facilitating the spread of conspiratorial content. It serves as a paradigmatic example of how the absence of effective moderation can lead to real-world crises (Frenkel, Conger, & Isaac, 2021).

In contrast, platforms in China are required to proactively remove content deemed threatening to social stability or national unity. Widespread discussion of the shortcomings of political policies is rare on Chinese social media, as such discourse is heavily regulated by the state. Platforms like Weibo and Douyin, for instance, censor content related to ethnic separatism or social unrest. While such efforts may have curbed the spread of hate speech to some extent, they have also resulted in the frequent deletion or silencing of posts related to feminist activism, LGBTQ+ rights, and ethnic minority advocacy. This illustrates how state-driven moderation can easily coincide with the suppression of dissenting voices (The Guardian, 2024). These dynamics highlight the inherent difficulty of striking a balance between protecting freedom of expression and ensuring user safety, as content moderation tends to reflect political biases.

It thus becomes evident that the classification of hate speech is closely intertwined with political considerations. There is no universally accepted definition of “hate speech”; rather, its interpretation varies across media platforms and national jurisdictions. In liberal democracies, to avoid infringing upon freedom of expression, only a limited range of content is typically designated as hate speech. For instance, former President Donald Trump posted racially charged remarks on the social media platform X (formerly Twitter), yet neither U.S. law nor the platform’s own moderation standards formally classified such statements as hate speech. This may reflect the politics of the platform or its support for Trump; Twitter suspended Trump’s account, itself becoming a subject of controversy, only after his posts incited the rioting at the Capitol. In China, by contrast, the definition of hate speech is much broader: content that challenges the state or dominant ideological narratives, such as posts about Xinjiang, gender equality, or Taiwan, is often removed or censored. Relevant videos are swiftly taken down, and the associated accounts are usually banned.
Examining how media and government actors in China and the United States construct different understandings of freedom of expression and hate speech ultimately sheds light on the two nations’ distinctly different political systems. It also shows that, in practice, whether something counts as hate speech is often closely correlated with the interests of current political authority. Content is frequently moderated not because it is hateful in itself but because it goes against the prevailing social or political grain.

Secondly, algorithms are far from neutral tools. Social media platforms not only decide what content to remove but also determine what content is algorithmically promoted and given greater visibility. In pursuit of traffic, political influence, or user retention, these platforms often prioritize content that provokes emotional reactions, especially extreme or divisive views. When content is predicted by algorithms to generate high engagement, it is more likely to be supported and amplified by the platform. This amplification contributes to toxic online environments, where users are incentivized to attack one another through polarizing discourse.

For instance, YouTube has been shown to guide users from innocuous videos toward conspiracy-driven content in order to increase discussion and viewer retention. As Kim and Chen (2024) observe, “Conspiracy theories often exploit negative emotions to persuade audiences, and such videos tend to receive more recommendations due to their high traffic and profitability.” Massanari (2017) similarly found that Reddit’s design features, such as anonymity, upvote-based ranking, and minimal moderation, interact with a culture of “geek masculinity” to foster toxic subcommunities like r/TheFappening and r/The_Donald. While Reddit publicly promotes itself as a neutral platform that encourages open discussion, its design often silences victims while protecting aggressors. The hate and harassment in these communities exploit the platform’s architecture, and because of the traffic they bring in, the platform tends to either tolerate such behaviours or implicitly support them. Hate and violence are thereby normalized under the guise of “satire” or “humor,” turning the platform into a breeding ground for extremism.

This dynamic echoes Matamoros-Fernández’s (2017) concept of “platformed racism,” which describes how hateful content is not only circulated on platforms but also amplified by their built-in functionalities. Features such as likes, shares, and recommendation systems, while framed as tools for enhancing “user engagement,” often push harmful and controversial content to wider audiences. These algorithms allow hate-filled topics to snowball in visibility and influence. Such content is not rewarded by accident; its amplification follows from the commercial imperatives built into platforms’ data-driven logic.
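To make this engagement logic concrete, the sketch below is a deliberately simplified, hypothetical illustration in Python, not any platform’s actual ranking system: the post data are invented and the `predicted_engagement` heuristic is a made-up stand-in for a real engagement-prediction model. It shows how a feed ordered purely by predicted engagement tends to surface emotionally charged content, because nothing in the objective accounts for harm or informational value.

```python
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    emotional_arousal: float      # 0-1, how provocative or outrage-inducing the post is
    informational_value: float    # 0-1, ignored entirely by the ranker below

def predicted_engagement(post: Post) -> float:
    """Toy stand-in for an engagement-prediction model: more provocative
    posts are predicted to attract more clicks, comments and shares."""
    return 0.2 + 0.8 * post.emotional_arousal

def rank_feed(posts: list[Post]) -> list[Post]:
    # The feed is ordered purely by predicted engagement; the objective
    # contains no term for harm, accuracy or informational value.
    return sorted(posts, key=predicted_engagement, reverse=True)

if __name__ == "__main__":
    feed = rank_feed([
        Post("Calm explainer on the local council budget", 0.1, 0.9),
        Post("Outrage bait targeting a minority group", 0.9, 0.1),
        Post("Conspiracy video with an emotive thumbnail", 0.8, 0.2),
    ])
    for post in feed:
        print(f"{predicted_engagement(post):.2f}  {post.text}")
```

In this toy setup the outrage-bait and conspiracy posts dominate the top of the feed simply because the optimisation target is engagement; no explicit editorial decision to promote hateful content is required, which is consistent with the point above about built-in functionalities doing the amplifying.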

In addition, the governance of hate speech on media platforms is increasingly bound up with user dynamics and intersects with the corporate governance of moderation itself. In an era of platform capitalism and engagement-driven interaction, users themselves increasingly produce and spread hate speech, encouraged especially by platform designs that incentivize emotional expression. As Carlson and Frazer (2018) conceptualize through their framework of “emotional governance,” platforms are structurally configured to encourage affective engagement while outsourcing the responsibility for maintaining safety to users. This arrangement hurts marginalised communities disproportionately.

Their research reflects the experiences of Indigenous users in particular, who face systemic backlash online. The extended abuse directed at Indigenous Australian footballer Adam Goodes is a compelling example. In 2013, during an AFL match, a 13-year-old white girl shouted “ape” at Goodes. Goodes did not let the incident pass: he reported it, and the girl was ejected from the stadium. The ensuing public controversy continued to rage in traditional and social media, flaring again in 2015 when Goodes celebrated a goal with a symbolic war dance, a proud, unashamed display of his Aboriginality and a response to racism. However, many in the public and media deemed the act “provocative,” triggering an avalanche of online abuse. Social media users labeled Goodes with racist epithets, disseminated hashtags such as #GoodesMustGo, and organized coordinated harassment campaigns via Facebook and Twitter. These attacks continued for months until Goodes ended his career early, a tragic conclusion to a celebrated playing life.

This is not merely an anecdote; it exemplifies a structure of digital exclusion. Platform infrastructures that fail to intervene adequately in online emotional aggression allow such behaviour to be weaponized against minority communities, and in some cases they are the very tools that enable the abuse. As a result, the voices of minority users are silenced by dominant ones and rendered irrelevant. The regulation of hate speech therefore presents not only a content problem but also a problem of affect management, involving how users govern themselves emotionally within specific contexts, which makes the governance of affective expression a difficult and ongoing task in the digital environment.

Different governments have taken various policy approaches to regulating and limiting hate speech on social media platforms, yet each approach carries restrictions of its own. Australia’s Online Safety Act is one such example: it adopts a “duty of care” framework that requires platforms to proactively prevent foreseeable harms such as cyberbullying, hate speech, and threats to child and adolescent safety. Though the framework looks progressive in theory, critics note that several social media companies have already opposed it out of fear that it will restrict free speech, and the act’s actual effectiveness remains to be judged over time (Commonwealth of Australia, 2021). China’s regulatory model, by contrast, is highly efficient but far more authoritarian, reflecting much wider ideological differences in society and politics. Hateful content is usually removed quickly, but such regulatory efficiency comes at the cost of transparency. Voices expressing legitimate grievances can be suppressed, and in some cases people are even subjected to state surveillance or punishment because of overly rigid censorship practices. Matamoros-Fernández (2017) refers to this dynamic as an expression of “platform sovereignty,” where content governance is guided more by nationalist imperatives than by democratic values.

This paper argues that the most viable way forward in addressing the “prisoner’s dilemma” of hate speech governance lies in a tripartite model of co-regulation: first, platforms must commit to greater transparency in their moderation practices; second, governments must ensure oversight that is both accountable and effective; and third, civil society actors must be empowered to participate in content monitoring and ethical deliberation.

Overall, content moderation has never been a neutral process. Whether it is Reddit’s architectural tolerance of toxic communities (Massanari, 2017) or the emotional trauma experienced by Indigenous users on Australian platforms (Carlson & Frazer, 2018), these examples make clear that social media platforms are not merely intermediaries of speech; they are also active producers and enablers of hate speech. These platforms are far from innocent.

The case of Adam Goodes starkly illustrates how such hate speech can escalate into a form of digital violence. A widely respected Indigenous athlete, Goodes became the target of algorithmic amplification and platform inaction, leading to waves of abuse, mockery, and silencing that ultimately forced his premature retirement, not because he said something wrong, but because the platform’s architecture was never built to protect justice. Platforms do not simply “tolerate” racism; they help organize and amplify it.

We must recognize that platforms are not neutral canvases; they are political infrastructures. Content moderation should therefore not be reduced to a binary debate between “freedom of speech” and “censorship.” What is urgently needed is systemic accountability: for the persistent lack of transparency in decision-making, the inconsistent or absent sense of responsibility, and the culturally biased composition of many moderation teams.

The way forward does not lie solely in “more regulation” or “stricter policies,” but in confronting the underlying power structures shaped by political interests and algorithmic profit motives. Unless platforms are fundamentally restructured to recognize harm, repair injustice, and redistribute power, content moderation will remain a misnomer: a superficial solution that disguises deeper crises. Achieving a fair and effective standard for regulating hate speech while preserving freedom of expression on social media remains one of the most urgent and complex challenges of our time.

Reference list:

Carlson, B., & Frazer, R. (2018). Social media mob: Being Indigenous online. Macquarie University. https://research-management.mq.edu.au/ws/portalfiles/portal/85013179/MQU_SocialMediaMob_report_Carlson_Frazer.pdf

Commonwealth of Australia. (2021). Online Safety Act 2021 (No. 76, 2021). Federal Register of Legislation. https://www.legislation.gov.au/Details/C2021A00076

Frenkel, S., Conger, K., & Isaac, M. (2021, January 6). Violent Capitol mob was planned on social media. The New York Times. https://www.nytimes.com/2021/01/06/us/politics/protesters-storm-capitol-hill-building.html

The Guardian. (2024, October 23). China cracks down on ‘uncivilised’ online puns used to discuss sensitive topics. https://www.theguardian.com/world/2024/oct/23/china-meme-online-pun-crackdown-rules

Kim, S. J., & Chen, K. (2024). The use of emotions in conspiracy and debunking videos to engage publics on YouTube. New Media & Society, 26(7), 3854–3875.

Massanari, A. (2017). Gamergate and The Fappening: How Reddit’s algorithm, governance, and culture support toxic technocultures. New Media & Society, 19(3), 329–346. https://doi.org/10.1177/1461444815608807

Matamoros-Fernández, A. (2017). Platformed racism: The mediation and circulation of an Australian race-based controversy on Twitter, Facebook and YouTube. Information, Communication & Society, 20(6), 930–946. https://doi.org/10.1080/1369118X.2017.1293130
