
Have you ever shared a well-shot photo on social media, or uploaded a hand-drawn artwork? Have you ever thought about how those digital footprints – even your everyday searches – could be used to train artificial intelligence without your knowledge?
In a world where AI is becoming increasingly central, one basic question demands an answer: who owns the data used to train AI?
Introduction
In the digital world, everyone is generating data all the time: a photo you post, a blog entry you write, even a comment you leave on a news story. With the rapid growth of AI technology across every sector, this personal data has become a major resource for AI development. Businesses collect, store, and use it to build smarter AI systems.
The World Economic Forum described personal data as "the new 'oil' – a valuable resource of the twenty-first century … a new type of raw material that is on a par with capital and labour" (World Economic Forum, 2011, p. 5, as cited in Flew, 2021).
The right to privacy has long been regarded as a basic human right. As Article 12 of the Universal Declaration of Human Rights declares: "No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks" (United Nations General Assembly, 1949).
But in reality, when this data is used to train AI, users are often unaware that it is happening and unable to control it. Privacy and data sovereignty are gradually eroded in the process, leaving a series of legal and ethical issues that need to be addressed sooner rather than later.
How has privacy changed in the digital age?
The modern legal concept of privacy began to take shape in 1890, when American lawyers Samuel Warren and Louis Brandeis defined privacy as "the right to be let alone" (Matyáš et al., 2009).
In 1967, Alan Westin published Privacy and Freedom, which defined privacy in terms of self-determination: "privacy is the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others" (Westin, 1970).
These definitions have formed the foundation of most modern data protection frameworks.
Thanks to modern technology, privacy now covers not just personally identifiable information but also behavioral data, creative output, and information from every area of life.
Alexandra Rengel (2013) pointed out that privacy involves several aspects: the right not to be disturbed, the right to prevent unwanted access by others, the right to control personal information and keep it private, the right to maintain one's personality, individuality, and dignity, and the right to authority over close relationships and aspects of one's personal life.
Today's digital environment also demands a further evolution of the concept of privacy: the "principle of respect for context".
As Nissenbaum (2018) puts it, "respect for context means consumers have a right to expect that companies will collect, use, and disclose personal data in ways that are consistent with the [social] context in which consumers provide the data".
In the digital age, this principle faces unprecedented challenges. Businesses mine user data to develop AI, but that development often proceeds without transparency or user consent.
Case Study 1: The Dispute Over Stability AI and Artworks
In recent years, controversies over AI training data have multiplied. In early 2023, the well-known artist Sarah Andersen and several other artists filed a major lawsuit against Stability AI, Midjourney, and DeviantArt.

The case highlights serious ethical issues in AI development: the plaintiffs allege that these companies scraped millions of copyright-protected images from the internet without the artists' consent and used them to train their AI models (Kinsella, 2023).
The lawsuit centers on the LAION dataset, which contains some 5 billion images said to be scraped from the internet. Stability AI and Runway used these images to build an AI model called Stable Diffusion. The lawsuit also claims that Midjourney used this model to train its own AI system, and that DeviantArt used it in DreamUp, an image-generation tool.
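To make concrete how artists can even discover that their work sits inside such a dataset, here is a minimal Python sketch that scans a LAION-style metadata file of image URLs and captions for an artist's name. The file name, column names, and CSV format are assumptions for illustration; real LAION metadata is distributed as large Parquet files with a comparable URL-plus-caption structure.

```python
import csv

def find_artist_mentions(metadata_path: str, artist: str) -> list[dict]:
    """Return rows of a scraped image-caption metadata file (hypothetical
    CSV with 'url' and 'caption' columns) that mention the artist."""
    needle = artist.lower()
    matches = []
    with open(metadata_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            text = (row.get("caption", "") + " " + row.get("url", "")).lower()
            if needle in text:
                matches.append(row)
    return matches

if __name__ == "__main__":
    # 'laion_sample.csv' is a placeholder for a local metadata extract.
    hits = find_artist_mentions("laion_sample.csv", "Sarah Andersen")
    print(f"{len(hits)} scraped entries mention the artist")
```

Searches of this kind over the public LAION metadata are reportedly how some artists first learned that their images were in the training set.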
The case raises fundamental issues of creative autonomy and identity. When AI systems are trained on artists' work and generate images in very similar styles without attribution or compensation, artists can suffer both financial loss and serious copyright violations.
The AI companies' supporters argue that all the images were taken from the open web and that their use falls under the "fair use" doctrine, which they say is necessary for technological progress. The artists counter that public availability does not mean they have surrendered their copyright, and that using their work without permission is a serious violation of their rights.
From the perspective of the EU's General Data Protection Regulation (GDPR), the fact that data is public does not mean it can be used freely. The GDPR requires a lawful basis, typically explicit consent, before personal data is processed, yet these artists were never notified, let alone asked for authorization. Such practices not only infringe artists' intellectual property rights but also strip them of control over their personal data and creations.
U.S. District Judge William Orrick allowed some of the claims to proceed while dismissing others, finding the artists' allegations that the AI models copy and distribute their work plausible but in need of further evidence (Schor, 2024).
Case Study 2: Meta scrapes Australian users' data
Another highly publicized controversy involves Facebook. In a tense Senate inquiry in Canberra, Meta admitted that it had scraped the public photos and posts of Australian adult users since 2007 to train its AI systems, without offering an opt-out option (Taylor, 2024).
This large-scale collection immediately sparked a debate about consent and respect for context. Meta argues that its terms of service permitted such use and claims technical compliance with existing regulations. However, users who posted personal photos had no reasonable expectation that their content would be used to train commercial AI systems.
The incident reveals a fundamental disconnect between legal compliance and user expectations. Senator Tony Sheldon observed that millions of Australians who use Facebook and Instagram never agreed to have their photos, videos, or records of their lives and families used to train artificial intelligence models. The gap between technical consent and meaningful understanding exposes the inadequacy of current notice-and-consent models in the AI era.
Additionally, under the GDPR, Meta was forced to pause the use of European users' data to train its large language models and to give European users the option to opt out.
Company representatives said that, unlike in Europe, there were no plans to introduce a similar opt-out option in Australia because no comparable legal requirement exists there (Evans, 2024).
This exposes the imbalance in cross-border privacy protection: users of the same platform may enjoy very different rights depending on where they live. It also exposes a weakness in the Australian Privacy Act, which contains no express opt-out requirement.
That loophole allowed Meta to legally scrape users' digital footprints accumulated over more than 15 years. Meta's conduct has raised public concern about privacy violations and has alarmed regulators. The global debate over data protection continues to intensify, demanding that companies respect users' basic rights even as they use data to advance AI.
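To make the cross-jurisdiction imbalance concrete, the short sketch below models a platform policy in which an AI-training opt-out is honoured only where the law requires it. The policy table and function are illustrative assumptions, not a description of Meta's actual systems.

```python
# Hypothetical, simplified policy: whether an AI-training opt-out is
# honoured depends on the user's jurisdiction.
OPT_OUT_HONORED = {
    "EU": True,    # GDPR pressure forced a genuine opt-out
    "AU": False,   # Australia's Privacy Act has no express requirement
}

def may_use_for_training(region: str, user_opted_out: bool) -> bool:
    """Return True if a public post may enter the training set under
    this illustrative policy."""
    if OPT_OUT_HONORED.get(region, False) and user_opted_out:
        return False
    return True

# The same opted-out user receives different protection in each region.
print(may_use_for_training("EU", user_opted_out=True))   # False
print(may_use_for_training("AU", user_opted_out=True))   # True
```

The point of the sketch is that nothing technical prevents a uniform opt-out; the asymmetry is purely a product of where regulation bites.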
Legal Basis and Regulatory Framework
Both cases illustrate the clash between personal privacy, copyright, and the development of artificial intelligence. Traditional legal frameworks have struggled to address how AI systems acquire and learn from data. Various legal frameworks around the world attempt to regulate the use of personal data, but few directly address the unique challenges of AI training.
The European Union's General Data Protection Regulation (GDPR) is without doubt one of the most comprehensive data protection laws in the world. It sets out principles of lawfulness, fairness, transparency, purpose limitation, data minimization, accuracy, storage limitation, integrity, and confidentiality. Article 5 states that personal data must be "collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes" (European Union, 2016). Applied to Meta's Australian data scraping, this principle raises serious questions about whether reusing public social media posts for AI training, without specific notice and permission, constitutes "compatible" processing.
The California Consumer Privacy Act (CCPA) gives California residents the right to know what data a company collects about them, to request its deletion, and to refuse the sale of their data, and it bars companies from treating them unfairly for exercising these rights (State of California, 2018). However, as the artists' lawsuit against Stability AI illustrates, these protections may be inadequate when companies harvest publicly available creative works for AI training, raising the question of whether such works even count as "personal information" under the current definition.
Australia's Privacy Act 1988, especially Australian Privacy Principle 3 on the collection of personal information, became a focal point after Meta's data scraping was exposed (Australian Government, 1988).
The proposed EU Artificial Intelligence Act (AI Act) is the world's first comprehensive attempt to regulate AI systems by risk level, addressing the risks of AI and positioning Europe to play a leading role globally (European Commission, 2021). Yet although it targets high-risk AI applications, it says little about training-data rights. Cases like Andersen v. Stability AI highlight the regulatory gap facing creative professionals whose works form the training base for generative AI systems.
These laws limit how data can be used, and they show that people care about privacy and about who owns their data. But most of them were drafted before the rapid development of AI technology. As these cases show, they cannot fully address the challenges of large-scale data collection for AI training, especially in areas such as respect for context, proper attribution, and fair compensation.
Finding balance: towards ethical AI training practices
Training an advanced AI model like GPT-4 reportedly requires enormous financial resources, around US$63 million by one widely cited estimate. This creates strong incentives for companies to collect as much data as they can while keeping costs down. Companies like Meta and Stability AI prioritize scale and efficiency over user consent, using public data to train their systems. From users' perspective, however, personal data and creative output deserve protection, and in the age of AI that protection is becoming increasingly difficult to secure.
Addressing these contradictions requires a multi-pronged approach, and lessons can be drawn from cases such as Meta's data collection in Australia and the artists' lawsuit against Stability AI. Regulatory frameworks must evolve specifically for AI training data: explicit consent should be required before personal data is used in AI training, and the composition of training datasets should be disclosed so that they can be audited.
As the Meta case shows, current notice-and-consent models often fail to give users a meaningful understanding of how their data is used for AI training. Future regulations should require specific, explicit disclosure of AI training purposes, along with a genuine opt-out mechanism that carries no penalty or loss of service. The EU AI Act's risk-based approach offers a starting point, but more specific rules on training-data rights are needed (European Commission, 2021).
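As a concrete illustration of what such a mechanism could look like inside a data pipeline, the sketch below filters records on a per-user, purpose-specific consent flag before they ever reach the training set, treating silence as refusal. The record fields and the default-deny policy are assumptions for illustration only, not any company's actual design.

```python
from dataclasses import dataclass

@dataclass
class Record:
    user_id: str
    content: str
    # Purpose-specific consent flag (hypothetical field);
    # default-deny means no consent is ever assumed.
    ai_training_opt_in: bool = False

def build_training_set(records: list[Record]) -> list[str]:
    """Keep only records whose owners explicitly opted in to AI training."""
    return [r.content for r in records if r.ai_training_opt_in]

if __name__ == "__main__":
    records = [
        Record("u1", "holiday photo caption", ai_training_opt_in=True),
        Record("u2", "family post"),  # never asked, so never included
    ]
    print(build_training_set(records))  # ['holiday photo caption']
```

A default-deny design like this is the inverse of Meta's approach in Australia, where inclusion was automatic and no refusal was possible.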
The Andersen case against Stability AI highlights the need for a clearer framework governing the use of creative works in AI training. Future regulations should establish fair compensation mechanisms and enforce copyright protections for creators whose work contributes significantly to commercial AI systems. There must also be a clearer distinction between learning from creative works and extracting distinctive styles or content for reproduction.
Companies developing AI systems should adopt a code of ethics in addition to complying with the law. They need to transparently demonstrate how user data is used to train AI systems. Ongoing litigation against Stability AI suggests that failing to address these concerns could result in legal and reputational risks.
Individuals also need better tools and knowledge to protect their privacy. As the Meta case in Australia illustrates, many users are unaware that their content is being used for AI training until after the fact (Taylor, 2024). This information asymmetry undermines meaningful consent and highlights the need for more transparent data practices and user-friendly controls. Stronger regulatory oversight of personal information will support the responsible development of AI.
References
Australian Government. (1988). Privacy Act 1988 (Cth). https://www.legislation.gov.au/Details/C2021C00379
European Commission. (2021). Proposal for a regulation laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). https://digital-strategy.ec.europa.eu/en/library/proposal-regulation-laying-down-harmonised-rules-artificial-intelligence
European Union. (2016). Regulation (EU) 2016/679 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data. Official Journal of the European Union, L 119/1. https://eur-lex.europa.eu/eli/reg/2016/679/oj
Evans, J. (2024). Facebook admits to scraping every Australian adult user's public photos and posts to train AI, with no opt-out option. ABC News. https://www.abc.net.au/news/2024-09-11/facebook-scraping-photos-data-no-opt-out/104336170
Flew, T. (2021). Regulating platforms. Polity Press.
Kinsella, E. (2023). Artists and visual media company sue AI image generator for copyright breach. The Art Newspaper. https://www.theartnewspaper.com/2023/02/15/artists-and-visual-media-company-sue-ai-image-generator-for-copyright-breach
Lauterpacht, H. (1948). The Universal Declaration of Human Rights. British Yearbook of International Law, 25, 354.
Matyáš, V., Fischer-Hübner, S., Cvrček, D., & Švenda, P. (Eds.). (2009). The future of identity in the information society: 4th IFIP WG 9.2, 9.6/11.6, 11.7/FIDIS International Summer School, Brno, Czech Republic, September 1–7, 2008, revised selected papers. Springer. https://doi.org/10.1007/978-3-642-03315-5
Nissenbaum, H. (2018). Respecting Context to Protect Privacy: Why Meaning Matters. Science and Engineering Ethics, 24(3), 831–852. https://doi.org/10.1007/s11948-015-9674-9
Rengel, A. (2013). Privacy in the 21st Century. Martinus Nijhoff Publishers.
Schor, Z. (2024). Andersen v. Stability AI: The landmark case unpacking the copyright risks of AI image generators. NYU Journal of Intellectual Property & Entertainment Law. https://jipel.law.nyu.edu/andersen-v-stability-ai-the-landmark-case-unpacking-the-copyright-risks-of-ai-image-generators/
State of California. (2018). California Consumer Privacy Act, Cal. Civ. Code § 1798.100 et seq. https://leginfo.legislature.ca.gov/faces/codes_displayText.xhtml?division=3.&part=4.&lawCode=CIV&title=1.81.5
Taylor, J. (2024). Meta admits scraping Australian users' posts to train AI. The Guardian. https://www.theguardian.com/technology/article/2024/sep/11/meta-ai-post-scraping-security-opt-out-privacy-laws
United Nations General Assembly. (1949). Universal Declaration of Human Rights (Vol. 3381). Department of State, United States of America.
Westin, A. F. (1970). Privacy and freedom. Bodley Head.