Text and Data Mining

TDM in relation to the right to research

Published on Nov 01, 2021

This document is the workspace for exploring the working group’s position on the topic of “Text and Data Mining”.

Text and data mining and the right to research

  • Text and data mining (TDM) activities are pivotal in supporting research and innovation and in the training of AI systems. 

  • TDM activities are non-consumptive and non-expressive uses of works. TDM does not compete with original markets for works, and may indeed enhance them by increasing demand for a wider range of works. 

  • TDM should not be made subject to additional authorisations or payments once access is legitimate. Generally, TDM activities should not be considered copyright infringement and should not be restricted by copyright. 

  • TDM should be allowed and supported pursuant to exceptions and limitations, in particular to enable a proper exercise of the right to research and AI training-related activities.


  • We encourage the use of larger and more diverse sets of data in order to avoid bias in outputs. 

  • Broad and open exceptions and limitations should apply to support the most extensive possible use of copyrighted works as AI input, in order to help minimise or eliminate bias. Placing barriers around copyright material that could otherwise be freely mined risks increasing the likelihood of AI bias, unfairness and exclusion. 

  • Beyond ensuring that the algorithm itself is not biased, one way to reduce bias, unfairness and exclusion in AI systems is to make the maximum volume and widest diversity of content available for training purposes. This requires both minimising unnecessary barriers to TDM and facilitating uses across borders. 

  • A careful balance must be struck between a push to reduce bias, unfairness and exclusion in AI on the one hand and privacy rights and ethical and human rights considerations on the other. 

Digital rights management/Technological protection measures

  • There should be no digital rights management (DRM) or technological protection measures (TPM) to restrict or prevent (otherwise legal) access to the data. There should be ethical requirements for transparency in the modalities of use of data; however, these should be established outside the boundaries of the copyright system.

Database rights

  • Database rights pose a potential harm to the development of AI, especially given the exclusion of data and other mere facts from copyright protection under international law.

  • The sui generis protection in the EU Database Directive should be repealed in light of studies that demonstrate its lack of effectiveness in achieving its objectives and its unnecessary and complex implications for copyright law.

Notes from 1st meeting:

Arcadia project to promote the international right to research in copyright law - TDM exception will be very limited for many countries
