An AI Training Data Licence is the contract under which a Data Licensor (a publisher, image library, news organisation, music label, research institute or aggregator) grants an AI Vendor (a foundation model provider, general-purpose AI provider or in-house AI team) a licence to use specified data for training, evaluating and retraining AI models. Use our free UK template to draft an AI Training Data Licence under English, Scots or Northern Irish law. The template addresses: the UK copyright (CDPA 1988) and database right framework; the UK risk landscape after Getty Images v Stability AI [2025] EWHC 2863 (Ch); the EU AI Act Article 53(1)(c) obligation to comply with reservations of rights under DSM 2019/790 Article 4, which has applied to GPAI providers since 2 August 2025; the Article 53(1)(d) mandatory public training-data summary in the AI Office's Template format; the UK GDPR (as amended by the Data (Use and Access) Act 2025) lawful basis and special category data analysis; and the allocation of output ownership, model ownership and sub-licensing rights that turns a one-off data licence into a sustainable partnership.
PDF (free) + editable Word (.docx) with Expert
Available as a print-ready PDF or an editable Microsoft Word (.docx) file.
A UK AI Training Data Licence is the agreement by which a Data Licensor authorises an AI Vendor to use specified data — text corpora, image libraries, audio archives, video footage, scientific datasets, news feeds, code repositories, structured data — for the purpose of training, evaluating, fine-tuning, retraining or otherwise developing artificial intelligence models. It sits at the centre of the modern UK AI economy: foundation model providers, general-purpose AI (GPAI) providers, specialised model developers and in-house AI teams cannot lawfully train on third-party data in the UK without (a) an applicable copyright or database-right exception, (b) the data being out of copyright, or (c) a properly drafted training data licence from the rights holder. The third route — express licensing — is the only commercially sustainable basis for high-quality training data at scale.
Why does a licence matter even more in 2026? Because the UK's narrow text and data mining exception under section 29A of the Copyright, Designs and Patents Act 1988 covers only NON-COMMERCIAL research. Absent a licence or rights-holder consent, commercial scraping of UK copyright material for AI training is infringement. The November 2025 High Court judgment in Getty Images (US) Inc v Stability AI Ltd [2025] EWHC 2863 (Ch) left important questions unsettled: the Court REJECTED the secondary copyright claim (a trained model does not store the training images); it PARTIALLY upheld the trade mark claim (the Getty watermark appearing in Stable Diffusion outputs); and the primary CDPA infringement claim was effectively WITHDRAWN at trial because Stable Diffusion was trained overseas, so UK acts of infringement could not be established. The UK Government's AI / Copyright Report (due March 2026) and the European Parliament's analytical work on AI training and copyright are the next signals. Until they land, prudent UK Data Licensors and AI Vendors rely on express training-data licensing rather than contest the residual scope of s.29A.
Layered on top of UK copyright is the EU AI Act 2024/1689 Article 53 compliance regime for GPAI providers — which applies to UK-incorporated AI Vendors that place GPAI models on the EU market. Article 53(1)(c) requires every GPAI provider to put in place a policy to identify and comply with reservations of rights expressed under Article 4 of the Digital Single Market Directive 2019/790 (DSM 2019/790) — the EU-wide text and data mining opt-out. The obligation has applied since 2 August 2025. Article 53(1)(d) requires every GPAI provider to publish a "sufficiently detailed summary" of the training data used, in the mandatory Template format published by the AI Office in July 2025 and in force from 2 August 2025 (with a transition to 2 August 2027 for pre-existing models and AI Office verification powers from 2 August 2026). The AI Training Data Licence is the contractual instrument through which Data Licensors and AI Vendors evidence compliance with both obligations.
This UK AI Training Data Licence covers the full data-supply-side architecture across copyright, database right, personal data, EU AI Act compliance and output / model ownership allocation, with a Free baseline and an Expert tier for the compliance overlay.
Publisher, image library, news organisation, music label, research institute, individual creator or aggregator — with Companies House number, registered office and named signatory.
Foundation model provider, GPAI provider, specialised model team or in-house AI team — with Companies House number, registered office and named signatory.
Description of the data, categories, volume, format and delivery mechanism (API, SFTP, physical media, cloud bucket, streaming).
Training only / training + evaluation / training + retraining / training + full lifecycle — calibrated to the AI Vendor's use case.
Territory of permitted use (UK only, EU + UK, global) and licence duration (fixed-term, perpetual or subscription).
England and Wales, Scotland or Northern Ireland with matching exclusive jurisdiction.
Deed (12 years under s.8 Limitation Act 1980) or simple agreement (6 years under s.5) — UK market practice depends on Licensor preference.
Full / limited / as-is warranties on copyright ownership; sui generis database right under SI 1997/3032; moral rights waiver under CDPA 1988 s.87.
Licensor confirms whether the data is or is not subject to a DSM 2019/790 Article 4 reservation, which determines the Vendor's compliance position under AI Act Article 53(1)(c).
Confirmation that the Licensor has cleared third-party rights (subjects in photos, contributors to news content, sample clearances for music) for AI training use.
Personal data scrub or Article 6 lawful basis (including DUAA 2025 "recognised legitimate interests" from 5 February 2026); Article 9 special category data analysis; international transfer mechanism for cross-border training.
Licensor owns outputs / Vendor owns outputs / joint ownership / Vendor owns with Licensor licence-back — calibrated to commercial deal.
Vendor owns the trained model; Vendor sub-licensing rights to its customers; Licensor royalty trail or revenue share where used.
Whether the Vendor may use the data for subsequent retraining or model updates without further licence — usually yes within the licence term.
Vendor commits to maintain a DSM 2019/790 Article 4 reservation identification and compliance policy — applies to GPAI providers placing models on the EU market since 2 August 2025.
Vendor commits to include the licensed data in the AI Office Template public summary of training content — published July 2025, in force 2 August 2025.
Upfront fee / per-use royalty / per-output royalty / revenue share — with audit rights and payment frequency.
Annual or for-cause audit by Licensor of Vendor compliance with the licence terms — usage volume, training scope, sub-licensing, royalty calculation.
Termination for breach, change of control, regulatory order; data deletion or retention overlay on termination — and the practical question of whether the trained model 'forgets' the training data (it generally does not).
Licensor indemnifies for IP and personal data warranties; Vendor indemnifies for outputs and downstream use; liability capped at fee paid or fixed amount.
Follow these steps to draft a UK AI Training Data Licence Agreement between a Data Licensor and an AI Vendor.
Provide the Data Licensor (publisher / image library / news organisation / music label / research institute / individual creator / aggregator) and the AI Vendor (foundation model / GPAI provider / specialised model / in-house team). Add Companies House numbers, registered offices and named signatories.
Insert the data description, categories, volume, format and delivery mechanism (API / SFTP / physical media / cloud bucket / streaming).
Pick training only / training + evaluation / training + retraining / training + full lifecycle. Set geographic territory (UK / EU + UK / global) and duration (fixed-term / perpetual / subscription).
England and Wales / Scotland / Northern Ireland. Pick deed (12-year limitation period) or simple agreement (6-year limitation period).
Pick full / limited / as-is warranty level; tick sui generis database right under SI 1997/3032; tick moral rights waiver under CDPA 1988 s.87.
Confirm whether the data is subject to a DSM 2019/790 Article 4 reservation; confirm third-party clearance (photo subjects, contributors, sample clearances) is in place.
Tick personal data scrub or pick UK GDPR Article 6 lawful basis (including DUAA 2025 "recognised legitimate interests"). Tick Article 9 special category data analysis if relevant. Pick international transfer mechanism for cross-border training.
Pick output ownership (Licensor / Vendor / joint / Vendor with Licensor licence-back). Confirm model ownership (Vendor) and sub-licensing rights to Vendor customers. Set royalty trail where used.
Tick the Article 53(1)(c) DSM 2019/790 reservation policy commitment and the Article 53(1)(d) AI Office Template public summary commitment. Both are relevant to GPAI Vendors placing models on the EU market.
Preview the Licence and download as a free PDF or, with Expert, an editable Microsoft Word (.docx) for execution by both parties.
Four things that make our templates more thorough than AI-generated drafts and more current than static template libraries.
Drafted with legal expertise for each jurisdiction, far more thorough than AI-generated drafts that copy generic clauses across borders.
Templates carrying statute references are continuously updated as the law changes. Your document always reflects the current legal framework.
Free to download. Vector text, embedded fonts, statute citations baked in. Print, sign, file. Ready for any signing flow including electronic signature.
Continue editing in Word after download. Add custom clauses, reuse the template for similar agreements, or share with a colleague for collaborative review.
Requires Expert one-time unlock or any paid Doxuno subscription.
UK AI Training Data Licences engage the UK copyright and database right framework (CDPA 1988 + SI 1997/3032), the Trade Secrets (Enforcement etc.) Regulations 2018, the UK GDPR (as amended by DUAA 2025), the EU AI Act 2024/1689 Article 53 compliance regime for GPAI providers, and the unsettled UK case law on training data infringement after Getty v Stability AI.
This template is for informational purposes only and does not constitute legal advice. UK AI training data licensing is highly specialised and the regulatory landscape is moving rapidly — for any licence above £100,000 in value, any GPAI training deployment, any licence with personal data at scale, any licence involving cross-border training, or any licence with downstream sub-licensing to multiple AI Vendor customers, professional advice from IP and AI counsel is strongly recommended.
Reviewed for England & Wales, Scotland and Northern Ireland copyright and AI law
Under section 29A of the Copyright, Designs and Patents Act 1988, making a copy of a work for text and data analysis (TDM) for the sole purpose of NON-COMMERCIAL RESEARCH does not infringe copyright, provided the user has lawful access and the source is acknowledged. Critically, the exception does NOT extend to commercial AI training. Commercial scraping of UK copyright material to train a generative AI model is, on the face of CDPA 1988 sections 16-21, copyright infringement absent a licence or rights-holder consent. The UK Government's consultation, opened in December 2024, proposed a broader commercial TDM exception with an opt-out mechanism (mirroring DSM 2019/790 Article 4) and met strong stakeholder opposition; the Government's AI / Copyright Report, due March 2026, is the next policy signal. Until policy clarity emerges, express training data licensing is the only commercially safe basis for high-quality UK training data.
The November 2025 High Court judgment in Getty Images (US) Inc v Stability AI Ltd left the UK position on AI training and copyright largely unsettled. The Court REJECTED the secondary copyright claim — a trained model does not STORE the training images in any meaningful sense, so importing the model into the UK does not constitute importing infringing copies under CDPA 1988 ss.22-23. The Court PARTIALLY upheld the trade mark claim — the Getty watermark appearing in Stable Diffusion outputs constituted trade mark use. The PRIMARY CDPA infringement claim (the actual scraping and training acts) was effectively WITHDRAWN at trial because Stable Diffusion was trained OVERSEAS and UK acts of infringement could not be established. The implication for UK Data Licensors and AI Vendors: cross-border training routes around the UK Court's primary infringement jurisdiction; the secondary infringement route is closed; trade mark in outputs survives. Express training data licensing remains the prudent path for both parties.
Beyond copyright, two further UK IP layers protect training data. The Copyright and Rights in Databases Regulations 1997 (SI 1997/3032) confer a sui generis DATABASE RIGHT on a person who made a substantial investment in obtaining, verifying or presenting the contents of a database. The right runs for 15 years from the end of the calendar year in which the database was completed (or, if it is made public within that period, from the end of the calendar year in which it was first made available to the public). A substantial change that amounts to a substantial new investment starts a FRESH term for the resulting database. Extraction or re-utilisation of all or a substantial part of a protected database without consent is infringement, so training an AI on a substantial part of a database without a licence is likely to infringe. Separately, the Trade Secrets (Enforcement etc.) Regulations 2018 (SI 2018/597) implement Directive (EU) 2016/943 and protect information that is secret, has commercial value because it is secret, and is subject to reasonable steps to keep it secret. Curated, proprietary training data sets may qualify as trade secrets, adding a third layer of protection for the Licensor.
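The 15-year database right term can be sketched as a small date calculation. This is an illustrative sketch only, not legal advice: the function name `database_right_expiry` and its parameters are hypothetical, and it assumes the term is counted from the end of the relevant calendar year and that first publication only resets the clock if it happens before the original term runs out.

```python
from datetime import date
from typing import Optional

DATABASE_RIGHT_TERM_YEARS = 15  # term stated in the text above


def database_right_expiry(year_completed: int,
                          year_first_made_public: Optional[int] = None) -> date:
    """Hypothetical helper: last day of protection for a database.

    Assumption: the term runs from the end of the calendar year of
    completion, or from the end of the calendar year of first
    publication if that happens before the original term expires.
    """
    start_year = year_completed
    original_last_year = year_completed + DATABASE_RIGHT_TERM_YEARS
    if (year_first_made_public is not None
            and year_completed <= year_first_made_public <= original_last_year):
        start_year = year_first_made_public
    return date(start_year + DATABASE_RIGHT_TERM_YEARS, 12, 31)


# A database completed in 2020 and never published
unpublished = database_right_expiry(2020)
# The same database first made public in 2024
published = database_right_expiry(2020, 2024)
```

A substantial new investment would start a fresh term for the resulting database, which this sketch models simply by calling the function again with the later completion year.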
Where training data contains personal data, the UK GDPR applies. The lawful basis must be identified at the Article 6 level — for commercial AI training, the most common bases are Article 6(1)(f) legitimate interests (subject to balancing) and (from 5 February 2026, when DUAA 2025 Part 5 came into force) the new "recognised legitimate interests" basis. Article 9 special category data (health, biometric, ethnic origin) requires a Schedule 1 DPA 2018 condition in addition. The Article 5(1)(b) purpose limitation principle is the operational chokepoint: data collected for one purpose (publication, transactional services) cannot lawfully be repurposed for AI training without a fresh lawful basis or a compatible-purpose analysis. The template's Expert tier surfaces all three issues with explicit configuration. International training transfers engage Articles 44-49 and the post-DUAA 'data protection test'.
Article 53(1)(c) of the EU AI Act 2024/1689 requires every general-purpose AI (GPAI) provider to put in place a policy to identify and comply with reservations of rights expressed under Article 4 of the Digital Single Market Directive 2019/790. Article 4 DSM lets rights holders opt out of text and data mining of their content, including for commercial AI training, by expressly reserving their rights in an appropriate manner, such as machine-readable means for content made publicly available online. The Art 53(1)(c) obligation has applied to GPAI providers since 2 August 2025 (the date on which the AI Act's GPAI obligations began to apply). UK-incorporated AI Vendors that place GPAI models on the EU market are within scope. The template's Expert tier embeds the Vendor's reservation policy commitment so the Licensor has contractual visibility into the Vendor's Art 53(1)(c) compliance posture.
Article 53(1)(d) of the EU AI Act 2024/1689 requires every GPAI provider to draw up and make publicly available a sufficiently detailed summary of the content used to train the GPAI model, in the mandatory Template format published by the AI Office in July 2025 and in force from 2 August 2025. Pre-existing models have until 2 August 2027 to comply; AI Office verification of compliance starts from 2 August 2026. The Template requires structured disclosure across data modality, data source categories, licensing arrangements, scraping methodology and rights-holder opt-out compliance. A UK AI Vendor that publishes a Template summary will need to identify the data licensed under this agreement. The template's Expert tier embeds the Vendor's commitment to include the licensed data in its public summary, giving the Licensor contractual visibility.
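The dates above can be captured in a short lookup that a Vendor's compliance team might adapt. This is a minimal sketch under stated assumptions, not legal advice: the function name `summary_due_date` is hypothetical, and "pre-existing model" is assumed here to mean a model placed on the EU market before 2 August 2025.

```python
from datetime import date

# Dates taken from the text above
GPAI_OBLIGATIONS_START = date(2025, 8, 2)
PRE_EXISTING_MODEL_DEADLINE = date(2027, 8, 2)
AI_OFFICE_VERIFICATION_START = date(2026, 8, 2)


def summary_due_date(placed_on_eu_market: date) -> date:
    """Hypothetical helper: date by which the public summary is due.

    Assumption: a model placed on the EU market before 2 August 2025
    counts as pre-existing and gets the 2 August 2027 transition date;
    any later model needs its summary when it is placed on the market.
    """
    if placed_on_eu_market < GPAI_OBLIGATIONS_START:
        return PRE_EXISTING_MODEL_DEADLINE
    return placed_on_eu_market


legacy_due = summary_due_date(date(2025, 3, 1))   # pre-existing model
new_due = summary_due_date(date(2025, 9, 15))     # model released after the start date
```

Whether a specific model qualifies for the transition period is a legal question that the sketch does not answer; it only encodes the dates as stated.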
Draft a UK AI Training Data Licence Agreement under English, Scots or Northern Irish law with copyright and database right warranties, UK GDPR + DUAA 2025 lawful basis, EU AI Act Art 53(1)(c) reservation policy and Art 53(1)(d) public summary commitments, output and model ownership allocation, and risk allocation that reflects Getty v Stability AI. Fill in the details, preview and download in minutes.
Free PDF · Editable Word with Expert · No account required