By admin, January 10, 2025
The training of large language models (LLMs) stands out as one of the most important and rapidly advancing areas of modern artificial intelligence. By training on massive amounts of text data, these models perform remarkably well on natural language processing and comprehension tasks. Large language models offer innovative solutions in many language-based applications, making human-machine interaction more natural and effective. However, the sources and copyright status of the data sets used to train these models raise important ethical and legal questions.
Meta CEO Mark Zuckerberg is also in trouble on this issue. Allegations that Zuckerberg allowed the team behind the company’s LLAMA (Large Language Model Meta AI) models to use a data set containing pirated e-books and articles are on the agenda. In Kadrey v. Meta, a lawsuit filed against the company, the plaintiffs state that Zuckerberg approved the use of a dataset called LibGen for training Llama. This case is just one of many in which major tech companies like Meta are accused of training their AI models on copyrighted works without permission.
LibGen, a platform that distributes copyrighted works without authorization, has been sued multiple times and fined millions of dollars. The plaintiffs’ attorneys allege that Meta employees knew LibGen was a “pirated data set,” which could weaken Meta’s position in negotiations with regulators. Meta’s defense rests on the US fair use doctrine, under which copyrighted works may be used to create something new in a “sufficiently transformative” way.
Meta’s internal correspondence states that the company received Zuckerberg’s approval to use this data set. Additionally, Meta engineers allegedly wrote a script to remove copyright and source information from the dataset, thereby attempting to conceal potential copyright infringements.
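The script itself is not public, so what it actually did is unknown. Purely as an illustration, a minimal sketch of this kind of metadata stripping might filter out lines matching a handful of hypothetical patterns (the pattern list below is invented for the example, not taken from the complaint):

```python
import re

# Hypothetical patterns for lines carrying rights or source metadata;
# the actual script referenced in the lawsuit is not public.
RIGHTS_PATTERNS = [
    re.compile(r"copyright\s*(\(c\)|©)?", re.IGNORECASE),
    re.compile(r"all rights reserved", re.IGNORECASE),
    re.compile(r"^\s*isbn[\s:]", re.IGNORECASE),
]

def strip_rights_lines(text: str) -> str:
    """Drop any line that matches one of the metadata patterns."""
    kept = [
        line for line in text.splitlines()
        if not any(p.search(line) for p in RIGHTS_PATTERNS)
    ]
    return "\n".join(kept)

sample = (
    "Chapter 1\n"
    "Copyright (c) 2020 Example Press. All rights reserved.\n"
    "It was a dark night."
)
print(strip_rights_lines(sample))  # the copyright line is removed
```

This is only a sketch of the general technique the allegation describes; any real pipeline over millions of documents would be considerably more involved.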
Meta is also allegedly involved in another copyright violation by torrenting LibGen. Because the BitTorrent protocol has downloaders simultaneously upload the files they fetch to other peers, Meta is alleged to have also distributed LibGen content in the process. It is further alleged that Ahmad Al-Dahle, one of Meta’s chief engineers, overrode the reservations of employees who were concerned about torrenting and paved the way for it.
Although the case concerns Meta’s earliest Llama models, it could end in Meta’s favor if the company’s fair use defense succeeds, TechCrunch reported. However, Judge Vince Chhabria’s rejection of Meta’s request to seal large portions of the case file allowed the plaintiffs’ claims to gain publicity. Judge Chhabria stated that Meta’s sealing requests were aimed at preventing negative publicity, not at protecting sensitive business information that its competitors could use.
LLAMA (Large Language Model Meta AI) is an artificial intelligence model developed by Meta that, by training on large amounts of text data, demonstrates high performance in language processing and comprehension tasks. LLAMA also offers many features for language-based tasks such as natural language processing, text generation, translation, and summarization. Meta emphasizes that it developed LLAMA to provide innovative solutions in both artificial intelligence research and practical applications. The model uses advanced algorithms and learning techniques to better understand and respond to human language, making it a powerful tool in many language-based applications.