The internet fosters instantaneous sharing of ideas and news and the building of communities, even across continents. Connections that once seemed impossible are now a click away thanks to advances in technology. Yet alongside these good developments comes harm. Online hate speech and harassment are two forms of online behavior that put not only individuals at risk, but society as a whole. The freedom of expression the internet provides is a double-edged sword: while expression is encouraged, some people misuse the opportunity to spread hate and content that is deeply prejudicial toward others on the basis of race, gender, sexual orientation, and other characteristics. As extremist ideologies become more prevalent online, the risk that someone will act on them offline rises. Women and minority groups are frequent targets of online abuse, and this mistreatment often silences them or leaves them in a state of emotional distress. Moderating hate speech and online bullying remains an unresolved challenge for policymakers and the general public. My aim in writing this blog is to provide insight into how social platforms strive to balance user privacy, freedom of speech, and the responsibility of shielding users from harm.
1 Defining Hate Speech and Online Harms
Hate speech refers to language, expressions, or images that disparage or endanger individuals and groups on the basis of their race, gender, ethnicity, religion, disability, or sexual orientation (European Parliament, 2018). Its precise boundaries differ from one society to another, depending on legal frameworks and cultural norms. It can incite violence, encourage the denial of essential services, and legitimize the degrading treatment of people who are singled out as targets and unable to defend themselves, all while perpetuating deeply rooted stereotypes.
Hate speech is also closely linked to a broader set of harmful online behaviors. Cyberbullying, doxxing, privacy violations, and targeted malicious attacks are just some examples. These drain users’ mental energy and make them reluctant to engage in civic or public debate. Left unchecked, such behavior can spill over into the offline sphere and sometimes provoke violence. Facebook, for instance, has been reported to have struggled to manage hate speech in conflict-prone regions, and commentators have argued that incendiary online commentary fuelled “real” violence offline (Facebook, 2021).
In the context of global policy frameworks, some content can be so damaging that it threatens the very purpose of dialogue by marginalizing or erasing vulnerable populations. Many platforms therefore use a mix of automated detection, human review, community-endorsed policies, and user flagging systems to limit the spread of hateful or harmful content. The challenge lies in balancing user liberties against restrictions strong enough to prevent real harm.
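To make this mix more concrete, here is a minimal, hypothetical sketch in Python of how automated scoring, user flags, and a human review queue could fit together. The thresholds, the toy scoring heuristic, and all names are illustrative assumptions rather than any platform’s documented workflow.

```python
from dataclasses import dataclass

@dataclass
class Post:
    post_id: str
    text: str
    user_flags: int  # number of times other users reported this post

def automated_score(post: Post) -> float:
    """Stand-in for an ML classifier returning a harm score in [0, 1].

    A toy heuristic so the example runs on its own.
    """
    toxic_markers = ("threat", "slur")  # illustrative keywords only
    hits = sum(marker in post.text.lower() for marker in toxic_markers)
    return min(1.0, 0.4 * hits + 0.1 * post.user_flags)

def triage(post: Post) -> str:
    """Route a post: remove automatically, send to human review, or keep."""
    score = automated_score(post)
    if score >= 0.9:
        return "remove_automatically"
    if score >= 0.5 or post.user_flags >= 3:
        return "human_review_queue"
    return "keep"

print(triage(Post("p1", "This is a direct threat and a slur.", user_flags=2)))  # remove_automatically
print(triage(Post("p2", "A borderline comment", user_flags=4)))                 # human_review_queue
print(triage(Post("p3", "A friendly discussion", user_flags=0)))                # keep
```

The design point is the middle branch: automation handles the clear-cut extremes, while ambiguous or heavily flagged content is routed to people rather than decided by the machine alone.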
Figure 1. Symbolic representation of free speech amidst online censorship and abuse (Source: Unsplash).
2 Privacy, Digital Rights, and the Burden of Moderation
Every discussion of hate speech moderation needs to consider individuals’ privacy and digital rights. Increased scrutiny or surveillance of user-generated content to root out hate speech can impinge on privacy. Problems also arise when governments or corporations collect huge volumes of data to monitor hateful behavior, creating risks of data misuse, excessive retention, or sharing without authorization.
Moderation sits at a powerful intersection of privacy, platform responsibility, and user rights. To monitor hate speech and harassment effectively, platforms may need in-depth knowledge of users’ posts and shared content, which comes into conflict with laws designed to protect personal information. Striking a balance is difficult: too much oversight can infringe upon legitimate free expression, while too little allows a toxic and abusive online environment to go unchallenged (European Parliament, 2018). Resolving these tensions requires policies and laws that are systematic, proactive, transparent, and user-friendly, and that avoid paternalistic contradictions.
There is also a tension between encryption and content moderation. End-to-end encryption on messaging platforms preserves users’ privacy but makes locating and removing dangerous content very difficult. Some lawmakers argue for “back doors” into encrypted systems so that law enforcement agencies can intervene when dangerous material is disseminated. Critics counter that weakening encryption exposes users to greater privacy breaches without strengthening overall digital security: it erodes existing defenses while doing little to curb the circulation of hate speech.
Figure 2. Increase in the percentage of U.S. adults reporting experiences with severe forms of online harassment between 2017 and 2020. Source: Pew Research Center (Vogels, 2021).
3 The Rise of AI and Algorithmic Moderation
Modern content moderation has automation at its core: artificial intelligence (AI) and sophisticated algorithms designed to filter, flag, and occasionally delete harmful content. These systems scan text posts for keywords linked with hate speech, images for hateful symbols, and user activity for behavioral patterns characteristic of harassment campaigns.
However, automated systems can flag legitimate conversations as hate speech simply because they contain certain words, while damaging content that uses disguised language or context-sensitive slurs often goes unaddressed. For example, a comedic post that reclaims an offensive term may be flagged, while subtle dog-whistle comments bypass scrutiny (European Parliament, 2018). Furthermore, training data may under-represent certain age, regional, or language groups, producing discriminatory outcomes such as higher rates of false positives or false negatives for those users.
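To see why simple keyword matching misfires, here is a minimal sketch of a naive keyword filter in Python; the blocklist and example posts are hypothetical placeholders, not any platform’s actual rules. It flags a post that reclaims a placeholder slur while missing a coded dog-whistle entirely.

```python
import re

# Hypothetical blocklist. Real systems rely on far larger, curated lexicons
# combined with machine-learned classifiers, not bare keyword matching.
BLOCKED_TERMS = {"slur_a", "slur_b"}

def naive_flag(post: str) -> bool:
    """Flag a post if it contains any blocked term as a whole token."""
    tokens = re.findall(r"[a-z_']+", post.lower())
    return any(token in BLOCKED_TERMS for token in tokens)

# False positive: a reclaimed or quoted term still trips the filter.
print(naive_flag("As a member of this community, I'm reclaiming slur_a."))  # True

# False negative: a coded dog-whistle contains no blocked term at all.
print(naive_flag("You know exactly what those people are like."))  # False
```

Even this tiny example captures the core trade-off: every term added to the blocklist reduces false negatives at the cost of more false positives like the first case, because the filter sees words but not intent or context.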
Figure 3. The principle of content-based filtering, where item similarity drives recommendations. (Source: Rosidi, 2023).
Due to these intricacies, AI moderation still needs human involvement. Even the largest platforms depend on thousands of human moderators who manually review content that automated systems have flagged. These moderators face problems of their own, including potential psychological harm from incessantly viewing disturbing content. Variations in cultural understanding, societal frameworks, and local laws also produce inconsistent moderation outcomes. For instance, a platform operating across several countries may find its single rulebook colliding with diverging legal systems, resulting in content decisions that seem skewed or taken out of context.
4 Balancing Platform Safety and Free Speech
The central tension in moderation sits at the boundary between platform safety and freedom. Most social media companies publish “community policies” that specify prohibited content such as hate speech, harassment, or incitement to violence. Content that violates these guidelines is subject to removal, and offending users may be suspended or banned outright. Ideally, these measures enhance the security of the community, and the idea clearly has merit. However, critics have flagged some removals as excessive, since many cases blur the line between offensive opinion and genuine hate speech (European Parliament, 2018).
A single, unified policy may not accommodate cultural customs or legal practices appropriately. What counts as “harmful speech” can vary from one part of the world to another. Context matters: a politically charged slogan viewed as hateful in one society might be considered legitimate protest in another. Moreover, some governments shift responsibility for removing harmful material onto the platforms and threaten non-compliance with considerable fines or legal action. This can encourage companies to take the safe route and censor more content than warranted.
Some view rule-making by private companies as escaping the institutions of public accountability, and therefore as a restriction on participatory democracy. On the other hand, unregulated spaces invite harassment, deception, and hate speech. Finding a compromise between the two extremes is never straightforward; it requires ongoing negotiation across competing values, resources, and social technologies.
5 Case Study: Gender-Based Hate Speech on Twitter
As a contemporary example of gendered harassment, consider how Twitter can function as a hostile environment for women in public life. Support and hostility are often meted out in almost equal measure, and within a decade this has fundamentally changed the lives of female journalists, politicians, and activists. While many women benefit from social media’s accessibility, many others face relentless online harassment in the form of violent threats, sexist slurs, and concerted efforts to destroy their credibility. Carlson and Frazer (2018) note that prominent women who face intersectional bigotry, where sexism combines with other forms of hatred, are attacked the most.
Figure 4. Illustration based on examples of abusive tweets directed at a US abortion rights activist, from Amnesty International (2018).
Abusive tweets aimed at women often pair harsh words with violent and misogynistic imagery. Hateful and abusive speech frequently circulates as spam, and some users coordinate in shifts to mass-report women’s accounts, triggering automated suspensions of the women themselves (Carlson and Frazer, 2018). The non-consensual sharing of intimate images and doxxing are further forms of severe psychological abuse that keep women from participating in public debate. As a result, many women face an extremely difficult choice: continue their online activity and risk severe abuse, or disconnect from the online world entirely.
To address violence, harassment, and hateful conduct targeting protected attributes such as gender, Twitter has developed policies that prohibit such content and aim to remove it. AI systems on the platform are trained to monitor for particular words and phrases considered threats or slurs, and they work alongside reporting mechanisms that allow users to flag tweets they believe violate the platform’s terms of service. Twitter has also adopted an escalating enforcement approach that issues account warnings and suspends repeat offenders. Still, much more can be done: by Twitter’s own admission, many tweets deemed dangerous are only reviewed after hours or days, and borderline cases are not always interpreted consistently.
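As a rough illustration of what such an escalating, strike-based enforcement policy might look like in code, the sketch below uses hypothetical thresholds and names; it is not Twitter’s actual system, which also weighs severity, context, and appeals.

```python
from dataclasses import dataclass

@dataclass
class UserRecord:
    """Tracks confirmed policy violations for one account (illustrative only)."""
    user_id: str
    strikes: int = 0

def enforce(record: UserRecord) -> str:
    """Apply an escalating action after a confirmed violation.

    Hypothetical thresholds: 1 strike -> warning, 2-3 strikes -> temporary
    suspension, 4 or more strikes -> permanent suspension.
    """
    record.strikes += 1
    if record.strikes == 1:
        return "warn"
    if record.strikes <= 3:
        return "temporary_suspension"
    return "permanent_suspension"

account = UserRecord(user_id="example_account")
for _ in range(4):
    print(enforce(account))
# warn, temporary_suspension, temporary_suspension, permanent_suspension
```

The appeal of an escalation ladder is predictability for users; its weakness, as the cases above show, is that it only works as well as the detection and reporting pipeline feeding it.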
Despite the strides Twitter has made to refine its moderation policies, noticeable gaps remain, especially in enforcement. Experts note that women targeted by online harassment are often placed in the position of having to prove beyond doubt why action should be taken against their aggressors. Offenders exploit slow reporting processes, mixing coded phrases and references subtle enough to slip under the radar with more direct insults. AI systems are ill-equipped to identify context-specific hate speech, and moderators who lack deep cultural and linguistic knowledge are equally unable to catch every harmful nuance. In such conditions hate speech can flourish, causing anxiety, self-censorship, and emotional distress to those who are most vulnerable.
Gendered harassment campaigns demonstrate the extent to which unfettered speech can erase women’s expression by chilling their readiness to engage in discussion or debate on controversial subjects. At the same time, critics of moderation raise concerns about censorship: some defenders of free speech worry that overzealous policies will silence edgy humor or legitimate criticism along with abuse. Privacy concerns also arise when Twitter and other platforms become more aggressive about surveilling private messages, user activity, and behavior for the purpose of abuse detection. These difficult trade-offs illustrate that moderation is not merely about removing “harmful” content, but about maintaining a balance in which all individuals feel secure enough to engage freely in discourse.
6 What Lies Ahead
Balancing people’s speech against online abuse is one of the hardest problems in today’s digital world. Hostile or violent speech undermines trust, drives vulnerable people away from public engagement, and has the potential to cause real damage. Reconciling free expression with user safety requires sophisticated and evolving approaches, from automated AI systems to community standards, human moderation, and clearly defined norms at every level of governance. The example of gendered hate speech on Twitter shows the personal consequences of online discourse, but also exposes the complexity of developing sound policy: without the right restrictions hateful content runs rampant, while too much control silences valid expression. The same applies to the growing issue of privacy: any content moderation system must avoid overreach, respect rights to privacy, remain transparent, and stay restrained in scope.
Over the next few years, tensions surrounding hate speech and speech policies, platform governance, and user rights will dominate the discussion on digital policy. We may witness the rise of “federated” or community-driven social networks where smaller groups self-organize and set localized governing and enforcement norms. In another hypothetical scenario, the platforms could apply more stringent policies with advanced detection systems, which may help in reducing hate speech but could lead to overreach. As a consequence, ongoing developments will likely produce a fragmented model of regulatory frameworks across the world, each uniquely tailored by culture, politics, and commerce. Whether this will one day result in more unified global action or a lack of cooperation will depend on international collaboration and how effectively the competing interests are managed.
7 References
Amnesty International. (2018). Toxic Twitter – A toxic place for women. https://www.amnesty.org/en/latest/research/2018/03/online-violence-against-women-chapter-1-1/
Carlson, B., & Frazer, R. (2018). Social media mob: Being Indigenous online. Macquarie University, Sydney.
European Parliament: Directorate-General for Internal Policies of the Union & Wilk, A. v. d. (2018). Cyber violence and hate speech online against women, European Parliament. https://data.europa.eu/doi/10.2861/738618
Facebook. (2021). Facebook: Regulating hate speech in the Asia Pacific. Final Report to Facebook under the auspices of its Content.
Rosidi, N. (2023). Step-by-step guide to building content-based filtering. StrataScratch Blog. https://www.stratascratch.com/blog/step-by-step-guide-to-building-content-based-filtering/
Vogels, E. A. (2021). The state of online harassment. Pew Research Center. https://www.pewresearch.org/internet/2021/01/13/the-state-of-online-harassment/