Legal framework of textual data processing for Machine Translation and Language Technology research and development activities/Public Sector Information (PSI) Case Studies
Case #2: Uploading-copying Public data (normally under PSI directive) to a repository
Do I upload the dataset and also provide a link to the original site, or do I just describe it with metadata, add attribution info, and link to the original site for downloading?
Suggested legal solution
Legal position
The licensing information is slightly confusing: it treats the AC corpus as being in the public domain, but then imposes conditions on access to it (attribution and non-endorsement). One possible reading is that the individual legislative documents are in the public domain (PD), while the complete database (corpus) is protected by copyright (or the sui generis database right), so that the licence applies only to the protected parts of the corpus. It is also debatable whether the corpus falls under Decision 2011/833/EU. In any case, the licensing terms satisfy the conditions of the re-use decision. Note that the Eurovoc Thesaurus does not fall under the re-use decision, and special permission is required for its re-use.
Suggested course of action
Keep the attribution notice and metadata together with the non-endorsement note. You may upload and share the material through your repository as long as you adhere to these conditions.
Type of Terms and Conditions
Attribution, non-endorsement, copyright notices
Legal basis
Copyright Law, 2011/833/EU
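The suggested course of action above (keeping the attribution notice, the non-endorsement note and the metadata together) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `DatasetRecord` class, its field names, the `missing_conditions` helper and the example URL are hypothetical and do not correspond to any particular repository's schema. The notice texts are placeholders and should be copied verbatim from the data provider.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetRecord:
    """Hypothetical repository record; field names are illustrative only."""
    name: str
    source_url: str            # link back to the original site
    attribution_notice: str    # to be copied verbatim from the provider
    non_endorsement_note: str  # to be copied verbatim from the provider
    legal_basis: str


def missing_conditions(record: DatasetRecord) -> list[str]:
    """Return the names of required fields that are empty or whitespace."""
    required = ("source_url", "attribution_notice", "non_endorsement_note")
    return [field for field in required if not getattr(record, field).strip()]


record = DatasetRecord(
    name="Example corpus",
    source_url="https://example.org/corpus",  # placeholder URL
    attribution_notice="<attribution text as supplied by the provider>",
    non_endorsement_note="<non-endorsement text as supplied by the provider>",
    legal_basis="Copyright Law, 2011/833/EU",
)

# Refuse to publish a record that has lost one of the required notices.
problems = missing_conditions(record)
if problems:
    raise ValueError(f"Record is missing required notices: {problems}")

print(json.dumps(asdict(record), indent=2))
```

Storing the notices in the same record as the descriptive metadata means they travel with the dataset whenever the record is exported or copied, which is the point of the recommendation.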
Case #3: Uploading-copying "Open" data to a repository
I want to copy it from the http://opus.lingfil.uu.se/OpenSubtitles.php site, which says: "IMPORTANT: If you use the OpenSubtitle corpus, please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data! I got the data under this condition!"
Or should I just describe it with metadata, add attribution info, and link to the original site for downloading?
Suggested legal solution
Legal position
This is a case of a work (the subtitles and the subtitles database) that is protected by copyright and licensed under a custom-made open licence. Custom-made open licences are licences with minimal conditions (i.e. attribution and copyleft) that were written for a specific work or set of works but could potentially interoperate with other open licences. This particular licence requires only a reference to the original site, i.e. http://www.opensubtitles.org/, in order to allow all uses of the work. No further attribution or notices are required, since attribution through the URL is meant to cover them all.
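Since the only condition described here is a link to the original site, a repository could run a simple check before publishing a description page or report. The following is a minimal sketch under stated assumptions: the function name `has_required_link` is hypothetical, and the matching rule (case-insensitive, trailing slash optional) is an illustrative choice rather than anything the licence text specifies.

```python
# The link named in the condition quoted from the OPUS page.
REQUIRED_LINK = "http://www.opensubtitles.org/"


def has_required_link(text: str, link: str = REQUIRED_LINK) -> bool:
    """Return True if the text mentions the link.

    Matching ignores case and an optional trailing slash; this leniency
    is an assumption of the sketch, not a rule taken from the licence.
    """
    return link.rstrip("/").lower() in text.lower()


description = (
    "Subtitle corpus obtained via OPUS. "
    "Original data: http://www.opensubtitles.org/"
)

if not has_required_link(description):
    raise ValueError("Description does not link to the original site")
```

A check like this only confirms that the link is present in the text; whether the link is placed prominently enough on a website or in a publication remains a matter for the person preparing it.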