
Copyright and AI: a key moment for rightsholders

29 March 2023

This is a critical moment for AI and rightsholders: AI is booming, and rightsholders need to act now if they want a world in which AI developers need a licence to use copyright works (e.g. images or text) to train AI models. In the first case of its kind in the UK, Getty Images earlier this year brought a claim for copyright infringement against Stability AI, the developer of an AI art generator, for the unlicensed use of Getty’s images in the dataset used to train the AI model. The case will test the extent to which UK copyright and database laws adequately protect rightsholders in relation to AI models, such as Stable Diffusion, that rely on large datasets of images scraped from the internet.

Stability AI

Stability AI is one of a number of AI developers that have developed a digital art generator that can create images based on text prompts from users. The underlying AI model, “Stable Diffusion”, like many other AI models, has been trained using a dataset of images that resulted from a general crawl of the internet. Getty Images claims that Stability AI has infringed its copyright by copying millions of Getty’s images, together with associated text and metadata, and using those images and the associated IP as part of the dataset used to train “Stable Diffusion”. Getty Images has brought similar claims against Stability AI in the US. A central aspect of the dispute will therefore be the extent to which it is lawful for the AI model to rely on a dataset of images that have been scraped from the internet, including from Getty’s website.

Copyright

Any reproduction of a copyright work on UK servers, including reproducing that work in a dataset of content that has been scraped from the internet, will potentially infringe copyright unless an exception applies. However, because the datasets storing copyright works are hidden in the background of an AI product or service, the use of specific copyright works in such datasets often goes undetected, or rightsholders find it difficult to prove that their works have been relied on to produce the AI’s output. In the case of Stable Diffusion, whilst the UK particulars are not yet public, the US claim includes multiple examples of images generated by Stable Diffusion which are clearly based on recognisable images from the Getty website and which include the Getty watermark. This could make it much easier for Getty to prove that its copyright works are included in the dataset. The focus will therefore be on the extent to which any exceptions to infringement apply.

Text and data mining exceptions

Unlike the US, which has a broadly applicable “fair use” exception, the UK has a defined list of specific exceptions to infringement. That list currently includes an exception to copyright infringement for text and data mining (TDM); however, the exception is limited to acts carried out for the purposes of non-commercial research only. Following a series of consultations, the UK government said it planned to broaden the existing exception to allow TDM for any purpose, in order to help foster AI innovation. The UK’s proposal went further than the exception included in the EU Copyright Directive, which came into force in June 2019 (before the arrival of Stable Diffusion or the chatbots currently making the headlines). The government hoped that a broader TDM exception would make the most of the “greater flexibilities” available following Brexit and make the UK more competitive as a location for firms carrying out AI development. However, following parliamentary debate, the government confirmed on 1 February this year that it is withdrawing the current proposals for a broader exception and that it will consult the creative industry before putting forward any further proposals. In March of this year, in response to Sir Patrick Vallance’s Pro-innovation Regulation of Technologies Review, the government said it will produce a Code of Practice by the summer to support AI firms in accessing copyright works. The government says that the involvement of both sides will be needed and that legislation may follow if there is no agreement. It therefore remains unclear at this stage how UK copyright law will develop on the use of datasets scraped from the internet. In any event, the UK exception as it currently stands is not broad enough to provide Stability AI with a defence.

Database rights

Any dataset of images scraped from the internet also potentially infringes UK database rights, in addition to copyright in the images. Under the EU Database Directive, two sets of rights subsist in UK databases: copyright protects the structure of a database, and the sui generis database right protects the contents of a database[1]. The owner of a database right can prevent third parties from extracting and re-utilising the whole or a substantial part of the contents of a database. Repeated and systematic extraction and re-utilisation of insubstantial parts of a database over time can also amount to use of a substantial part of the database and therefore infringe. Even where the use of scraped images is internal and the images are not surfaced, the reproduction of a substantial part of a database, such as the Getty Images database of images, could amount to infringement. Again, where the images are not surfaced, it is usually difficult for rightsholders to prove that a substantial part of a database has been extracted and re-utilised. However, Getty protects its images with watermarks and other metadata, which may mean that in this case it has evidence of the scale of the extraction and use of its images.

Striking the right balance

AI is growing at a fast pace. Although rightsholders such as Getty Images offer licences specifically for the purposes of developing AI and machine learning tools, many developers, such as Stability AI, are apparently proceeding without obtaining licences or seeking permission from rightsholders, and waiting to see which jurisdictions are most problematic. Governments, both in the UK and elsewhere, need to strike a balance between, on the one hand, encouraging AI developers to build leading, innovative AI tools (some of which have the potential to be hugely beneficial to society) and, on the other hand, adequately compensating rightsholders for the use of their IP in the building of such tools. If rightsholders want governments to get that balance right, they will need to get involved now. Rightsholders should consider taking action against unlicensed developers, as Getty Images has done, and also lobbying governments, especially the UK government, while it is debating this area of the law and potential changes post-Brexit. In addition to future debate on the scope of a TDM exception, the Copyright and Rights in Databases Regulations 1997 are, for example, currently in the cross-hairs of the Retained EU Law Bill and are therefore being reviewed, as part of a review of all retained EU law, to determine whether they should be retained (or retained in their current form) (see our earlier blog for more details). In the meantime, all eyes will be on the dispute between Getty Images and Stability AI, as a test case for the scope of the current law in this area.

Authored by Penny Thornton

[1] Databases created after 31 December 2020 will only be protected by a UK database right.