AI and copyright - how does ChatGPT deal with copyright? | HÄRTING Rechtsanwälte

The chatbot ChatGPT (short for Chat Generative Pre-trained Transformer), developed by the US company OpenAI and released in November of last year, is currently attracting attention not only among AI experts but also among the general public. The object of that attention ranges from bizarre Bing chatbot conversations (keyword: “I’m a good bing, you have been a bad user”) to the implications for intellectual property law. In another post we have already examined the legal situation for AI image generators; today the subject is the written word.

8 March 2023

For those who have not yet tried the bot: ChatGPT promises to answer almost any question with an algorithmically generated, linguistically (largely) correct text. The range of applications is theoretically enormous: students’ take-home exams could practically write themselves, as could the next tedious brief at the office.

Such use of generative AI raises copyright issues, namely:

Are the texts generated by ChatGPT protected by copyright?
Can the generated texts infringe the copyrights of third parties? If so, who is liable?

First stop for orientation: the terms of use of ChatGPT’s provider, OpenAI. These address copyright in, among other places, section 3 a).

According to these, users are responsible for the input, from which ChatGPT then generates new texts (the output).
OpenAI also grants the user all rights arising from the text output generated by the AI. However, OpenAI reserves the right to use input and output to the extent necessary for the offering of the AI, legal compliance and conformity with OpenAI’s TOS.
Users are responsible for ensuring that the content is compliant.

This is all well and good, but it does not answer the question of the copyrightability of AI-generated texts, nor does it answer the question of whether the generated results can in turn violate the rights of other authors.

Copyright protection of AI products

The German Copyright Act (UrhG) clearly rejects the protectability of AI texts. According to Section 2 (2) UrhG, only personal intellectual creations are eligible for copyright protection. Therefore, only a human being can be the author within the meaning of Section 7 UrhG. Copyright only covers human intellectual creations, and nothing else applies even if the products of an AI have human characteristics. Consequently, no copyright arises in the AI’s output that could be granted to the user under OpenAI’s terms and conditions.

However, copyright could arise through human creativity if the AI is used only as a technical aid in creating the result, i.e. the actual creative activity is carried out by a human being, in this case the user of ChatGPT. The argument against this is that the AI creates the result independently on the basis of the user’s specifications, so the user can influence what comes out in the end only marginally. It is not impossible, however, that a user’s specifications to the AI are so detailed that they give rise to a plot that enjoys copyright protection in itself and clearly bears the user’s intellectual stamp.

In addition, copyright protection of an AI product is conceivable if the user modifies the text output by the AI in such a way that it shows the user’s own creative traits.

But what about infringement of existing copyrights by AI-generated texts? A few comments on this:

Verbatim copies of copyrighted texts constitute a reproduction within the meaning of Section 16 UrhG and therefore infringe copyright unless one of the limitations under Sections 44a et seq. UrhG applies.
The fact that the texts originate from an AI is irrelevant. Thus, if an AI text matches an existing copyrighted text (which is very unlikely), using the AI result would infringe the copyright in that text. The person who uses the text, i.e. the ChatGPT user, would be liable in the first instance. If the match between the AI product and the pre-existing work is a coincidence, which would seem to be the case where the user has not explicitly fed the AI information about the older work, there could also be a “penalty-free” parallel creation. However, this question is likely to remain purely academic because the scenario is so improbable, especially since the consequences can be avoided by running the AI texts through a plagiarism scanner after they have been generated.
If the user copies extensive copyrighted texts into the prompt, i.e. when feeding the AI, this would in itself constitute copyright infringement (by reproduction), although the risk of detection is likely to be low.
If, on the other hand, the chatbot is asked to modify a specific pre-existing text (e.g. song lyrics), this constitutes a human-controlled adaptation carried out by the AI (Section 23 (1) sentence 1 UrhG), which infringes copyright unless there is a free use, i.e. the new text keeps a sufficient distance from the original work, pursuant to Section 23 (1) sentence 2 UrhG. If the user then reworks the AI-modified text in a way that is sufficiently removed from the underlying work, an adapter’s copyright (dependent on the copyright in the adapted material) may arise in favour of the user pursuant to Section 3 UrhG.

Finally, we would like to give the chatbot the last word on its own copyright compliance. We therefore asked the chatbot whether, in its own view, it infringes copyright. The (translated) answer looked like this:

“As an AI language model, ChatGPT does not generate its own content. Instead, it uses machine learning algorithms to generate responses based on the patterns it has learned from the vast amount of data it has been trained with. This data includes a wide range of publicly available texts, such as books, articles and websites.

ChatGPT itself is not in a position to infringe copyright because it does not create its own content. However, if users provide copyrighted material to ChatGPT for editing or use, they are responsible for any copyright infringement.

It is always important to respect copyright and obtain the necessary permissions before using or sharing copyrighted material. If you have any specific concerns about copyright, it is best to consult a legal professional.”

However, this is not the whole truth. As explained in our post on AI art, applications like ChatGPT are trained using datasets compiled through data mining. Thus, a copyrighted literary work within the meaning of Section 2 (1) No. 1 UrhG may already have been used in the training of the AI.

Copyright infringements by accessing works for the purpose of training an AI

We therefore asked ChatGPT what the situation was with this training data. The answer is:

“The texts used to train ChatGPT come from a variety of sources, including books, articles, websites and other publicly available texts. Before using any text for training, ChatGPT’s developers make an effort to ensure that the text is either in the public domain or may be used under an appropriate open-source licence.

In cases where copyrighted material is used for training, ChatGPT developers will normally obtain permission from the copyright holder or use the material in accordance with fair use guidelines. Fair use is a legal doctrine that allows limited use of copyrighted material without having to obtain permission from the copyright holder. However, the exact limits of fair use are often determined on a case-by-case basis and may depend on factors such as the purpose and character of the use, the nature of the copyrighted work, the extent and substantiality of the portion used, and the effect on the potential market for or value of the copyrighted work.

While ChatGPT itself does not infringe copyright, ChatGPT’s developers take care to ensure that the training data used to create the model is obtained and used in accordance with applicable laws and regulations.”

ChatGPT talks about “making an effort to ensure” and “normally obtaining permission”. This is all rather vague. In addition, the fair use doctrine, which derives from US law, does not exist in European copyright law. German copyright law, which is harmonised by EU directives, does however restrict the author’s exclusive right through Sections 44a et seq. UrhG. Accordingly, reproducing copyright-protected content for the purposes of text and data mining is permitted under Section 44b (2) UrhG as long as the works used are lawfully accessible. The copies must, however, be deleted as soon as they are no longer required for that purpose. For more details, see also the article by Constantin Berlage.

Overall, as with image generators, the legal assessment of text generators is still in its infancy. At least the regulation of artificial intelligence is slowly getting under way: the EU Commission has already published a draft AI Regulation. Intellectual property hardly plays a role in it, however. Instead, the focus is on requirements for data quality and cyber security as well as the transparency of AI applications. The regulation is not expected to enter into force before 2025. To bridge the waiting time, you can listen to the episode of the härting.fm podcast on AI and the webinar by Martin Schirmbacher and Andreas Lewald.