
Free UK AI Training Data Licence Template

An AI Training Data Licence is the contract under which a Data Licensor (a publisher, image library, news organisation, music label, research institute or aggregator) grants an AI Vendor (a foundation model provider, general-purpose AI provider or in-house AI team) a licence to use specified data for training, evaluating and retraining AI models. Use our free UK template to draft an AI Training Data Licence under English, Scots or Northern Irish law. It addresses the UK copyright (CDPA 1988) and database right framework; the UK risk landscape after Getty Images v Stability AI [2025] EWHC 2863 (Ch); the EU AI Act Article 53(1)(c) obligation, which has applied to GPAI providers since 2 August 2025, to comply with reservations of rights under DSM 2019/790 Article 4; the Article 53(1)(d) mandatory public training-data summary in the AI Office's Template format; the UK GDPR (as amended by the Data (Use and Access) Act 2025) lawful basis and special category data analysis; and the allocation of output ownership, model ownership and sub-licensing rights that turns a one-off data licence into a sustainable AI ecosystem partnership.

Free to use · Instant PDF · No account required

PDF (free) + editable Word (.docx) with Expert

AI TRAINING DATA LICENCE AGREEMENT
England and Wales  ·  8 September 2026
DATA LICENSOR
Calendulis Media Group Ltd
Calendulis House, 280 Bishopsgate, London, EC2M 4AG
AI VENDOR
Borealis AI Research Ltd
78 King's Cross Road, London, WC1X 9DH
~4.2 million articles, ~6.8 billion tokens, ~340 GB (compressed) of training data
Permitted use: training + retraining · England and Wales
This AI Training Data Licence Agreement (the "Agreement") is made on 8 September 2026 between Calendulis Media Group Ltd of Calendulis House, 280 Bishopsgate, London, EC2M 4AG (Companies House no. 06214837) (the "Licensor") and Borealis AI Research Ltd of 78 King's Cross Road, London, WC1X 9DH (Companies House no. 13948257) (the "Vendor"). The Licensor (as a news organisation) agrees to license to the Vendor (as a GPAI provider within EU AI Act Article 3(63)) specified data for the purpose of training, evaluating and (where permitted) retraining one or more artificial intelligence models. The licence is governed by the Copyright, Designs and Patents Act 1988, the Copyright and Rights in Databases Regulations 1997, the UK GDPR (Retained Reg (EU) 2016/679 as amended by the Data (Use and Access) Act 2025), and (where the Vendor markets the trained model in the European Union) the EU AI Act (Reg (EU) 2024/1689) Article 53. The parties acknowledge that UK case law on training-data infringement remains substantially unsettled following Getty Images (US) Inc v Stability AI Ltd [2025] EWHC 2863 (Ch) (4 November 2025), in which the UK High Court rejected the secondary copyright claim because the Stable Diffusion model does not store training data, and the primary claim was withdrawn because training occurred overseas.
1. DEFINITIONS
"Licensed Data" means the data described in clause 2 below.

"Permitted Use" means the use of the Licensed Data set out in clause 3.

"Trained Model" means any artificial intelligence model or system trained, retrained or evaluated using the Licensed Data (whether in whole or in part).

"Outputs" means content generated by the Trained Model when used in the ordinary course of operation.

"CDPA 1988" means the Copyright, Designs and Patents Act 1988.

"Database Regulations" means the Copyright and Rights in Databases Regulations 1997 (SI 1997/3032).

"DUAA 2025" means the Data (Use and Access) Act 2025, with Part 5 main data protection provisions in force from 5 February 2026.

"EU AI Act" means Regulation (EU) 2024/1689 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence.

"GPAI" means a general-purpose AI model within Article 3(63) EU AI Act.
2. LICENSED DATA
2.1 Description. The Licensor licenses to the Vendor the following data: Full text and metadata of Calendulis Media Group's news archive (national broadsheet and regional titles) from 1 January 2018 to 31 August 2026, including article body, headline, byline, dateline, section, tags, and structured factual data extracted from articles (named entities, quotes, statistics).

2.2 Categories. Data categories: News articles (text + metadata); editorial commentary; structured factual data; image captions (no images included).

2.3 Volume. Approximate volume: ~4.2 million articles, ~6.8 billion tokens, ~340 GB compressed

2.4 Format. Format: JSONL with one article per line; UTF-8 encoded; schema documented in Schedule 1

2.5 Delivery. Delivery mechanism: cloud bucket. The Licensor shall make the Licensed Data available to the Vendor at the times and in the manner agreed between the parties from time to time.
3. PERMITTED USE, TERRITORY AND DURATION
3.1 Permitted Use. The Vendor may use the Licensed Data for training and periodic retraining of the Trained Model during the licence term, to incorporate updates, corrections and quality improvements.

3.2 Territory. The licence is granted on a worldwide basis.

3.3 Duration. The licence runs for 36 months from the Effective Date.

3.4 Restrictions. Save as expressly permitted, the Vendor shall NOT (a) on-sell or sub-license the Licensed Data as raw data; (b) make the Licensed Data publicly accessible; (c) use the Licensed Data for any purpose other than training the Trained Model; or (d) combine the Licensed Data with data sets that would breach the warranties in clause 5 (where applicable).
4. GOVERNING LAW AND EXECUTION
4.1 Governing law. This Agreement and any dispute or claim arising out of or in connection with it (including non-contractual disputes) shall be governed by and construed in accordance with the laws of England and Wales. The parties irrevocably submit to the exclusive jurisdiction of the courts of England and Wales.

4.2 Execution. This Agreement is executed as a deed for the purposes of the Law of Property (Miscellaneous Provisions) Act 1989 and section 46 of the Companies Act 2006; the limitation period is twelve (12) years under section 8 of the Limitation Act 1980.
5. COPYRIGHT, DATABASE RIGHT AND DSM 2019/790 RESERVATION
5.1 Copyright warranty. The Licensor warrants that it owns, or has all necessary sub-licensing rights to, the copyright in the Licensed Data sufficient to grant the licence under this Agreement.

5.2 Database right. The Licensor warrants that, where the Licensed Data constitutes a database within the Database Regulations 1997 (SI 1997/3032), the Licensor is the maker of the database within Regulation 14 (or has rights from the maker) and grants to the Vendor the necessary database right to extract and re-utilise the Licensed Data for the Permitted Use.

5.3 Moral rights. To the fullest extent permitted by CDPA 1988, the authors of the Licensed Data waive their moral rights (right of paternity and right of integrity) in connection with the training and operation of the Trained Model and the production of Outputs.

5.4 DSM Directive 2019/790 Article 4 reservation compliance. The Licensor warrants that the Licensed Data does NOT include works in respect of which a rightholder has expressed a reservation of rights under Article 4 of the EU DSM Directive 2019/790 in machine-readable form (e.g. robots.txt, TDM reservation metadata) that would preclude use for text and data mining for AI training. Where the Licensed Data includes works subject to reservation, the Licensor warrants that the reservation has been respected by removal, or that the Licensor holds a specific licence overriding the reservation.

5.5 Third-party clearance. The Licensor warrants that all necessary third-party clearances (including but not limited to performer consents, image rights, location releases and any other consent or licence required for the lawful use of the Licensed Data) have been obtained.

5.6 CDPA s.29A acknowledgement. The parties acknowledge that section 29A CDPA 1988 (text and data mining exception) applies only to research for non-commercial purposes and does NOT cover commercial AI training. The licence granted in this Agreement is the operative basis for the Vendor's training activities and not s.29A.
6. PERSONAL DATA AND UK GDPR FRAMEWORK
6.1 Personal data scope. The Licensed Data is pseudonymised — personal identifiers have been replaced with codes, with the key kept separately by the Licensor. The data remains personal data within Article 4(1) UK GDPR and the full UK GDPR regime applies.

6.2 Lawful basis. Processing of personal data under this Agreement relies on Recognised Legitimate Interests under DUAA 2025 amendments to Article 6 UK GDPR (in force 5 February 2026), where the AI training activity falls within a category designated by the Secretary of State as having recognised legitimate interest status.

6.3 Special category carve-out. The Licensor shall scrub from the Licensed Data any special category data within Article 9 UK GDPR before delivery to the Vendor. Where scrubbing is not technically feasible, the Licensor warrants that a Schedule 1 DPA 2018 condition applies and shall maintain an appropriate policy document under Schedule 1 paragraph 39 DPA 2018.

6.4 International transfers. Transfers to the US shall be made under the UK-US Data Bridge (in force 12 October 2023) where the recipient is certified under the Data Privacy Framework (DPF); transfers to other non-adequate countries shall use the International Data Transfer Agreement (IDTA). Both parties acknowledge the PCLOB quorum risk to the Data Bridge identified in ICO January 2026 guidance.

6.5 Data subject rights. Data subject requests shall be handled jointly. The first-receiving party routes the request; deletion / rectification in the Licensed Data is implemented by the Licensor; deletion / rectification in the Trained Model is implemented by the Vendor where technically feasible.

6.6 ADM safeguards. Where the Trained Model is used to produce automated decisions producing legal or similarly significant effects on data subjects within Article 22 UK GDPR, the Vendor shall implement the safeguards in Articles 22A-22D UK GDPR as inserted by DUAA 2025 Part 5 (in force 5 February 2026).
7. MODEL OWNERSHIP, OUTPUTS, SUB-LICENSING AND RETRAINING
7.1 Model ownership. The Trained Model (and all weights, parameters and architecture) is the IP of the Vendor. The Licensor has no claim to ownership of the Trained Model save as expressly provided in clause 9 (royalty and audit).

7.2 Output ownership. Outputs of the Trained Model are the IP of the Vendor. The Vendor may grant end-customer terms in respect of Outputs.

7.3 Sub-licensing. The Vendor may sub-license the Trained Model and the Outputs only on the standard end-customer terms attached as Schedule 1 (or as varied with prior written consent of the Licensor, not to be unreasonably withheld).

7.4 Retraining. The Vendor may retrain the Trained Model with the Licensed Data during the licence term only. Retraining after term expiry requires a fresh licence.

7.5 Model updates and continuance. Model updates and incremental training are permitted within the scope of the Permitted Use (clause 3). The Vendor shall maintain version metadata recording which version of the Licensed Data was used for which version of the Trained Model, sufficient for the Article 53(1)(d) summary obligation (where applicable).
8. EU AI ACT COMPLIANCE (ARTICLE 53 GPAI)
8.1 Applicability. The Vendor confirms that it places (or intends to place) the Trained Model on the EU market and is accordingly a provider of a GPAI model within the meaning of the EU AI Act 2024/1689.

8.2 Article 53(1)(c) reservation-of-rights policy. The Vendor shall implement a policy to identify and comply with reservation of rights expressed pursuant to Article 4 of the DSM Directive 2019/790, including in respect of the Licensed Data. The Licensor warrants that the Licensed Data has been pre-screened for reservation (clause 5.4); the Vendor's policy operates on the Vendor's own data inputs and any additional sources.

8.3 Article 53(1)(d) training data summary. The Vendor shall publish a "sufficiently detailed summary" of the training content for each Trained Model, in accordance with the Template for the Public Summary of Training Content for general-purpose AI models published by the AI Office (July 2025; mandatory). The summary shall include a description of the Licensed Data sourced under this Agreement (in the categories required by the Template), without disclosing trade secrets or content that would breach this Agreement's confidentiality terms.

8.4 Summary publication deadline. The Vendor shall publish the Article 53(1)(d) summary within 60 days of first placing each Trained Model on the EU market and shall update the summary at least annually thereafter.

8.5 Code of Practice. The Vendor adheres to the GPAI Code of Practice published by the AI Office (July 2025) and shall comply with the Code's commitments in respect of copyright, training data summary and risk management.
9. ROYALTY, AUDIT, TERMINATION AND DATA DELETION
9.1 Royalty. The Vendor shall pay the Licensor an upfront payment of £750,000 plus an ongoing revenue share as agreed in Schedule 2.

9.2 Payment frequency. Payment is quarterly in arrears within 30 days of each quarter-end.

9.3 Audit rights. Each party may audit the other on at least 30 days' written notice, no more than once per calendar year, on a confidential basis.

9.4 Termination for breach. Either party may terminate this Agreement for material breach by the other on giving 30 days' written notice (subject to the breaching party's right to cure within that period). Either party may terminate for the other's insolvency without notice.

9.5 Termination for convenience. Either party may terminate this Agreement for convenience on giving 6 months' written notice, expiring on or after the first anniversary of the Effective Date.

9.6 Data deletion on termination. On termination, the Vendor may retain the Trained Model trained to date but shall not undertake any further training, retraining or evaluation using the Licensed Data. Licensed Data not yet used for training shall be deleted within 60 days.
10. INDEMNITY, INSURANCE AND POST-GETTY RISK ALLOCATION
10.1 Mutual cross-indemnity. Each party shall indemnify the other against claims arising out of the indemnifying party's breach of its warranties or obligations under this Agreement. The Licensor's indemnity covers IP and data protection claims (per Licensor warranties in clauses 5-6); the Vendor's indemnity covers misuse and Output-related claims (per clauses 3 and 7).

10.2 Indemnity cap. The aggregate cumulative liability of each party under the indemnity in clause 10.1 (and otherwise in connection with this Agreement) is capped at £5,000,000, save in respect of fraud or wilful default which are uncapped.

10.3 Insurance. Each party shall maintain professional indemnity insurance with a reputable UK insurer providing minimum cover of £10,000,000 per claim, naming the other party as additional insured for claims under this Agreement. Each party shall, on request, provide copies of insurance certificates.

10.4 Post-Getty UK IP risk. The parties shall share equally any cost, loss, expense and liability arising from a successful UK training-data infringement claim on the Licensed Data, recognising the post-Getty unsettled position.
11. EXECUTION
IN WITNESS WHEREOF this Agreement has been executed and delivered as a deed by the parties on the date set out at the start of this Agreement.
DATA LICENSOR
Calendulis Media Group Ltd
Date: ____________________
AI VENDOR
Borealis AI Research Ltd
Date: ____________________

Available as a print-ready PDF or an editable Microsoft Word (.docx) file.

What Is a UK AI Training Data Licence?

A UK AI Training Data Licence is the agreement by which a Data Licensor authorises an AI Vendor to use specified data — text corpora, image libraries, audio archives, video footage, scientific datasets, news feeds, code repositories, structured data — for the purpose of training, evaluating, fine-tuning, retraining or otherwise developing artificial intelligence models. It sits at the centre of the modern UK AI economy: foundation model providers, general-purpose AI (GPAI) providers, specialised model developers and in-house AI teams cannot lawfully train on third-party data in the UK without (a) an applicable copyright or database-right exception, (b) the data being out of copyright, or (c) a properly drafted training data licence from the rights holder. The third route — express licensing — is the only commercially sustainable basis for high-quality training data at scale.

Why does a licence matter even more in 2026? Because the UK's narrow text and data mining exception under section 29A of the Copyright, Designs and Patents Act 1988 covers only NON-COMMERCIAL research. Commercial scraping of UK copyright material for AI training is infringement absent a licence or rights-holder consent. The November 2025 High Court judgment in Getty Images (US) Inc v Stability AI Ltd [2025] EWHC 2863 (Ch) left this landscape unsettled in important respects: the Court REJECTED the secondary copyright claim (a trained model does not store the training images) and PARTIALLY upheld the trade mark claim (the Getty watermark appearing in Stable Diffusion outputs), while the primary CDPA infringement claim was effectively WITHDRAWN at trial because Stable Diffusion was trained overseas (UK acts of infringement could not be established). The UK Government's AI / Copyright Report (due March 2026) and the European Parliament's analytical work on AI training and copyright are the next signals; until they land, prudent UK Data Licensors and AI Vendors rely on express training-data licensing rather than contest the residual scope of s.29A.

Layered on top of UK copyright is the EU AI Act 2024/1689 Article 53 compliance regime for GPAI providers — which applies to UK-incorporated AI Vendors that place GPAI models on the EU market. Article 53(1)(c) requires every GPAI provider to put in place a policy to identify and comply with reservations of rights expressed under Article 4 of the Digital Single Market Directive 2019/790 (DSM 2019/790) — the EU-wide text and data mining opt-out. The obligation has applied since 2 August 2025. Article 53(1)(d) requires every GPAI provider to publish a "sufficiently detailed summary" of the training data used, in the mandatory Template format published by the AI Office in July 2025 and in force from 2 August 2025 (with a transition to 2 August 2027 for pre-existing models and AI Office verification powers from 2 August 2026). The AI Training Data Licence is the contractual instrument through which Data Licensors and AI Vendors evidence compliance with both obligations.

What's Covered in This Template

This UK AI Training Data Licence covers the full data-supply-side architecture across copyright, database right, personal data, EU AI Act compliance and output / model ownership allocation, with a Free baseline and an Expert tier for the compliance overlay.

Data Licensor Party Block

Publisher, image library, news organisation, music label, research institute, individual creator or aggregator — with Companies House number, registered office and named signatory.

AI Vendor Party Block

Foundation model provider, GPAI provider, specialised model team or in-house AI team — with Companies House number, registered office and named signatory.

Data Scope (Free)

Description of the data, categories, volume, format and delivery mechanism (API, SFTP, physical media, cloud bucket, streaming).

Permitted Use (Free)

Training only / training + evaluation / training + retraining / training + full lifecycle — calibrated to the AI Vendor's use case.

Geographic Territory + Duration (Free)

Territory of permitted use (UK only, EU + UK, global) and licence duration (fixed-term, perpetual or subscription).

Governing Law (Free)

England and Wales, Scotland or Northern Ireland with matching exclusive jurisdiction.

Execution Format (Free)

Deed (12 years under s.8 Limitation Act 1980) or simple agreement (6 years under s.5) — UK market practice depends on Licensor preference.

Copyright + Database Right Warranties (Expert)

Full / limited / as-is warranties on copyright ownership; sui generis database right under SI 1997/3032; moral rights waiver under CDPA 1988 s.87.

DSM Article 4 Reservation Compliance (Expert)

Licensor confirms whether the data is or is not subject to a DSM 2019/790 Article 4 reservation — gating the AI Act 53(1)(c) compliance flag for the Vendor.

Third-Party Clearance (Expert)

Confirmation that the Licensor has cleared third-party rights (subjects in photos, contributors to news content, sample clearances for music) for AI training use.

UK GDPR + DUAA 2025 (Expert)

Personal data scrub or Article 6 lawful basis (including DUAA 2025 "recognised legitimate interests" from 5 February 2026); Article 9 special category data analysis; international transfer mechanism for cross-border training.

Output Ownership (Expert)

Licensor owns outputs / Vendor owns outputs / joint ownership / Vendor owns with Licensor licence-back — calibrated to commercial deal.

Model Ownership and Sub-Licensing (Expert)

Vendor owns the trained model; Vendor sub-licensing rights to its customers; Licensor royalty trail or revenue share where used.

Retraining + Model Update Rights (Expert)

Whether the Vendor may use the data for subsequent retraining or model updates without further licence — usually yes within the licence term.

EU AI Act Art 53(1)(c) Reservation Policy (Expert)

Vendor commits to maintain a DSM 2019/790 Article 4 reservation identification and compliance policy — applies to GPAI providers placing models on the EU market since 2 August 2025.

EU AI Act Art 53(1)(d) Training Data Summary (Expert)

Vendor commits to include the licensed data in the AI Office Template public summary of training content — published July 2025, in force 2 August 2025.

Royalty Mechanics (Expert)

Upfront fee / per-use royalty / per-output royalty / revenue share — with audit rights and payment frequency.

Audit Rights (Expert)

Annual or for-cause audit by Licensor of Vendor compliance with the licence terms — usage volume, training scope, sub-licensing, royalty calculation.

Termination + Data Deletion (Expert)

Termination for breach, change of control, regulatory order; data deletion or retention overlay on termination — and the practical question of whether the trained model 'forgets' the training data (it generally does not).

Cross-Indemnity + Liability Cap (Expert)

Licensor indemnifies for IP and personal data warranties; Vendor indemnifies for outputs and downstream use; liability capped at fee paid or fixed amount.

How to Create an AI Training Data Licence

Follow these steps to draft a UK AI Training Data Licence Agreement between a Data Licensor and an AI Vendor.

  1. Identify the Parties

    Provide the Data Licensor (publisher / image library / news organisation / music label / research institute / individual creator / aggregator) and the AI Vendor (foundation model / GPAI provider / specialised model / in-house team). Add Companies House numbers, registered offices and named signatories.

  2. Describe the Data

    Insert the data description, categories, volume, format and delivery mechanism (API / SFTP / physical media / cloud bucket / streaming).

  3. Set Permitted Use, Territory and Duration

    Pick training only / training + evaluation / training + retraining / training + full lifecycle. Set geographic territory (UK / EU + UK / global) and duration (fixed-term / perpetual / subscription).

  4. Pick Governing Law and Execution Format

    England and Wales / Scotland / Northern Ireland. Pick deed (12-year durability) or simple agreement (6-year).

  5. Configure Copyright and Database Right Warranties (Expert)

    Pick full / limited / as-is warranty level; tick sui generis database right under SI 1997/3032; tick moral rights waiver under CDPA 1988 s.87.

  6. Confirm DSM Article 4 and Third-Party Clearance (Expert)

    Confirm whether the data is subject to a DSM 2019/790 Article 4 reservation; confirm third-party clearance (photo subjects, contributors, sample clearances) is in place.

  7. Address UK GDPR + DUAA 2025 (Expert)

    Tick personal data scrub or pick UK GDPR Article 6 lawful basis (including DUAA 2025 "recognised legitimate interests"). Tick Article 9 special category data analysis if relevant. Pick international transfer mechanism for cross-border training.

  8. Set Output and Model Ownership (Expert)

    Pick output ownership (Licensor / Vendor / joint / Vendor with Licensor licence-back). Confirm model ownership (Vendor) and sub-licensing rights to Vendor customers. Set royalty trail where used.

  9. Add EU AI Act Compliance Commitments (Expert)

    Tick Article 53(1)(c) DSM 2019/790 reservation policy commitment and Article 53(1)(d) AI Office Template public summary commitment — for GPAI Vendors marketing models on the EU market.

  10. Review and Download

    Preview the Licence and download as a free PDF or, with Expert, an editable Microsoft Word (.docx) for execution by both parties.

Why Doxuno documents are different

Four things that make our templates more thorough than AI-generated drafts and more current than static template libraries.

Accurate

Country-specific legal content

Drafted with legal expertise for each jurisdiction, far more thorough than AI-generated drafts that copy generic clauses across borders.

Always current

Always current with the law

Templates carrying statute references are continuously updated as the law changes. Your document always reflects the current legal framework.

Free PDF

Print-ready PDF

Free to download. Vector text, embedded fonts, statute citations baked in. Print, sign, file. Ready for any signing flow including electronic signature.

Word · .docx

Editable Word (.docx)

Continue editing in Word after download. Add custom clauses, reuse the template for similar agreements, or share with a colleague for collaborative review.

Requires Expert one-time unlock or any paid Doxuno subscription.

Legal Considerations

UK AI Training Data Licences engage the UK copyright and database right framework (CDPA 1988 + SI 1997/3032), the Trade Secrets Regulations 2018, the UK GDPR (as amended by DUAA 2025), the EU AI Act 2024/1689 Article 53 compliance regime for GPAI providers, and the unsettled post-Getty v Stability AI UK common law on training data infringement.

This template is for informational purposes only and does not constitute legal advice. UK AI training data licensing is highly specialised and the regulatory landscape is moving rapidly — for any licence above £100,000 in value, any GPAI training deployment, any licence with personal data at scale, any licence involving cross-border training, or any licence with downstream sub-licensing to multiple AI Vendor customers, professional advice from IP and AI counsel is strongly recommended.

Reviewed for England & Wales, Scotland and Northern Ireland copyright and AI law

UK Copyright and the Narrow s.29A TDM Exception

Under section 29A of the Copyright, Designs and Patents Act 1988, a person with lawful access to a work may copy it to carry out computational analysis (text and data mining, TDM) for the sole purpose of NON-COMMERCIAL RESEARCH without infringing copyright, provided the copy is accompanied by a sufficient acknowledgement. Critically, the exception does NOT extend to commercial AI training. Commercial scraping of UK copyright material to train a generative AI model is, on the face of CDPA 1988 sections 16-21, copyright infringement absent a licence or rights-holder consent. The UK Government consulted between December 2024 and February 2025 on a broader commercial TDM exception with an opt-out mechanism (mirroring DSM 2019/790 Article 4), a proposal that met strong stakeholder opposition; its AI / Copyright Report (due March 2026) is the next policy signal. Until policy clarity emerges, express training data licensing is the only commercially safe basis for high-quality UK training data.

Getty Images v Stability AI [2025] EWHC 2863 (Ch) — What It Settled and What It Did Not

The November 2025 High Court judgment in Getty Images (US) Inc v Stability AI Ltd left the UK position on AI training and copyright largely unsettled. The Court REJECTED the secondary copyright claim — a trained model does not STORE the training images in any meaningful sense, so importing the model into the UK does not constitute importing infringing copies under CDPA 1988 ss.22-23. The Court PARTIALLY upheld the trade mark claim — the Getty watermark appearing in Stable Diffusion outputs constituted trade mark use. The PRIMARY CDPA infringement claim (the actual scraping and training acts) was effectively WITHDRAWN at trial because Stable Diffusion was trained OVERSEAS and UK acts of infringement could not be established. The implication for UK Data Licensors and AI Vendors: cross-border training routes around the UK Court's primary infringement jurisdiction; the secondary infringement route is closed; trade mark in outputs survives. Express training data licensing remains the prudent path for both parties.

Database Right under SI 1997/3032 and Trade Secrets under SI 2018/597

Beyond copyright, two further UK IP layers protect training data. The Copyright and Rights in Databases Regulations 1997 (SI 1997/3032) confer a sui generis DATABASE RIGHT on a person who made a substantial investment in obtaining, verifying or presenting the contents of a database. The right runs for 15 years from the end of the calendar year in which the database was completed (or, if it is first made available to the public within that period, from the end of the calendar year of first publication). A substantial change to the contents that amounts to a substantial new investment qualifies the resulting database for its own fresh term. Extraction or re-utilisation of substantial parts of a protected database without consent is infringement. Training an AI on a substantial database without licence is likely infringement. Separately, the Trade Secrets (Enforcement etc.) Regulations 2018 (SI 2018/597) implement EU Directive 2016/943 and protect information that is secret, has commercial value because it is secret, and is subject to reasonable steps to keep it secret. Curated, proprietary training data sets may qualify as trade secrets, adding a third layer of protection for the Licensor.
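As a rough illustration of the database right term arithmetic, the sketch below computes an expiry date. It assumes the 15-year term is counted from the end of the calendar year of completion, or of first public availability if that falls within the term, which is our reading of regulation 17 rather than a substitute for advice. The function name database_right_expiry and its parameters are hypothetical.

```python
from datetime import date
from typing import Optional

TERM_YEARS = 15  # term of the sui generis database right


def database_right_expiry(completed: date, first_public: Optional[date] = None) -> date:
    """Return the last day of protection under the simplified rule assumed here."""
    # Assumption: the term runs to 31 December of the 15th year after completion.
    base_expiry = date(completed.year + TERM_YEARS, 12, 31)
    # Assumption: publication restarts the clock only if it falls within the base term.
    if first_public is not None and completed <= first_public <= base_expiry:
        return date(first_public.year + TERM_YEARS, 12, 31)
    return base_expiry


print(database_right_expiry(date(2018, 3, 1)))                    # 2033-12-31
print(database_right_expiry(date(2018, 3, 1), date(2020, 6, 1)))  # 2035-12-31
```

A substantial further investment could be modelled by calling the function again with the completion date of the substantially changed database. Whether a given change is "substantial" is a legal question the code cannot answer.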

UK GDPR (DUAA 2025-Amended) and Training Data

Where training data contains personal data, the UK GDPR applies. The lawful basis must be identified at the Article 6 level — for commercial AI training, the most common bases are Article 6(1)(f) legitimate interests (subject to balancing) and (from 5 February 2026, when DUAA 2025 Part 5 came into force) the new "recognised legitimate interests" basis. Article 9 special category data (health, biometric, ethnic origin) requires a Schedule 1 DPA 2018 condition in addition. The Article 5(1)(b) purpose limitation principle is the operational chokepoint: data collected for one purpose (publication, transactional services) cannot lawfully be repurposed for AI training without a fresh lawful basis or a compatible-purpose analysis. The template's Expert tier surfaces all three issues with explicit configuration. International training transfers engage Articles 44-49 and the post-DUAA 'data protection test'.

EU AI Act Article 53(1)(c) DSM Reservation Compliance

Article 53(1)(c) of the EU AI Act 2024/1689 requires every general-purpose AI (GPAI) provider to put in place a policy to identify and comply with reservations of rights expressed under Article 4 of the Digital Single Market Directive 2019/790. Article 4 DSM lets rights holders opt out of text and data mining of their content by expressly reserving their rights in an appropriate manner (by machine-readable means for content made publicly available online). The Art 53(1)(c) obligation applied to GPAI providers from 2 August 2025 (the GPAI compliance commencement date under the AI Act). UK-incorporated AI Vendors that place GPAI models on the EU market are within scope. The template's Expert tier embeds the Vendor's reservation policy commitment so the Licensor has contractual visibility into the Vendor's Art 53(1)(c) compliance posture.
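One machine-readable signal a reservation policy might inspect is a site's robots.txt file, which the template's clause 5.4 names as an example. The sketch below uses Python's standard urllib.robotparser to test whether a crawler is disallowed. The user agent ExampleAIBot, the sample rules and the helper name crawler_allowed are hypothetical. A disallow rule is not necessarily a valid Article 4 reservation, and the absence of one is not clearance, so treat this as one input to a policy rather than the policy itself.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse rules from memory; a real crawler would fetch the live file first.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: one named AI crawler is blocked, all others are allowed.
SAMPLE_ROBOTS = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

print(crawler_allowed(SAMPLE_ROBOTS, "ExampleAIBot", "https://example.com/news/1"))  # False
print(crawler_allowed(SAMPLE_ROBOTS, "OtherBot", "https://example.com/news/1"))      # True
```

In practice a Vendor would probably log each check alongside the source URL and date, so the result can be evidenced if the Licensor exercises its audit rights.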

EU AI Act Article 53(1)(d) Mandatory Training Data Summary

Article 53(1)(d) of the EU AI Act 2024/1689 requires every GPAI provider to draw up and make publicly available a sufficiently detailed summary about the content used for training of the GPAI model, in the mandatory Template format published by the AI Office in July 2025 and in force from 2 August 2025. Pre-existing models have until 2 August 2027 to comply; AI Office verification of compliance starts from 2 August 2026. The Template requires structured disclosure across data modality, data source categories, licensing arrangements, scraping methodology and rights-holder opt-out compliance. A UK AI Vendor that publishes a Template summary will need to identify the data licensed under this agreement; the template's Expert tier embeds the Vendor's commitment to include the licensed data in its public summary, giving the Licensor contractual visibility.
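To identify licensed data in a public summary, a Vendor needs a record of which dataset version fed which model version (the template's clause 7.5 asks for this version metadata). The sketch below is one minimal way to keep and query such a record. The TrainingRun fields, the licence reference format and the helper datasets_by_model are illustrative assumptions and do not reproduce the AI Office Template's schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TrainingRun:
    """One use of a licensed dataset version in a model version (hypothetical fields)."""
    model_version: str
    dataset_name: str
    dataset_version: str
    licence_ref: str


def datasets_by_model(runs: List[TrainingRun]) -> Dict[str, List[str]]:
    """Group dataset labels by model version, dropping duplicate entries."""
    summary: Dict[str, List[str]] = {}
    for run in runs:
        label = f"{run.dataset_name} {run.dataset_version} ({run.licence_ref})"
        entries = summary.setdefault(run.model_version, [])
        if label not in entries:
            entries.append(label)
    return summary


runs = [
    TrainingRun("model-1.0", "news-archive", "2026-08", "LIC-001"),
    TrainingRun("model-1.1", "news-archive", "2026-08", "LIC-001"),
    TrainingRun("model-1.1", "news-archive", "2026-11", "LIC-001"),
]
print(datasets_by_model(runs)["model-1.1"])
```

Keeping the licence reference on every record means the same log can support both the public summary and any royalty or audit query from the Licensor.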

Create Your AI Training Data Licence Now

Draft a UK AI Training Data Licence Agreement under English law with copyright and database right warranties, UK GDPR + DUAA 2025 lawful basis, EU AI Act Art 53(1)(c) reservation policy and Art 53(1)(d) public summary commitments, output and model ownership allocation and full post-Getty v Stability AI UK risk framework. Fill in the details, preview and download in minutes.

Free PDF · Editable Word with Expert · No account required