Data Licenses#

Like a software license, a data license governs what someone else can do with data that you create or own and that you make accessible to others through, for example, a data repository. Data licenses vary based on different criteria, such as:

  • Attribution to original owner

  • Permission to redistribute or modify original

  • Inclusion of the same license with derivatives or redistributions

As a result, the data license you choose affects how accessible your data is to others.

Creative Commons Licenses#

Creative Commons or CC provides a number of licenses that can be used with a wide variety of creations that might otherwise fall under copyright restrictions, including music, art, books and photographs. Although not tailored for data, CC licenses can be used as data licenses because they are easy to understand. The Creative Commons website includes a summary page [Com20a] outlining all the available licenses, explained with simple visual symbols.

Permission Levels#

The permission level provided by a Creative Commons data license can be understood from its name, which is a combination of two-letter “permission marks”. The only exception to this naming scheme is CC0, which will be introduced in the next section.

Permission Mark    What can I do with the data?
BY                 Creator must be credited
SA                 Derivatives or redistributions must have identical license
NC                 Only non-commercial uses are allowed
ND                 No derivatives are allowed

For example, the CC BY-ND license specifies that users must credit the creator of the data and cannot create any derivatives.
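The naming scheme can be illustrated with a short Python sketch that splits a license name into its two-letter marks (BY, SA, NC, ND) and looks up what each one means. The function name `describe_cc_license` and the `PERMISSION_MARKS` dictionary are hypothetical and are not part of any Creative Commons tooling. The sketch only decodes names and does not check whether a given combination of marks is actually offered as a license.

```python
from typing import List

# Hypothetical lookup table: the standard two-letter permission marks
# and the condition each one attaches to the data.
PERMISSION_MARKS = {
    "BY": "Creator must be credited",
    "SA": "Derivatives or redistributions must have identical license",
    "NC": "Only non-commercial uses are allowed",
    "ND": "No derivatives are allowed",
}


def describe_cc_license(name: str) -> List[str]:
    """Return the conditions implied by a CC license name such as 'CC BY-ND'."""
    normalized = name.strip().upper()
    if normalized == "CC0":
        # CC0 is the exception to the naming scheme: it attaches no conditions.
        return []
    if not normalized.startswith("CC "):
        raise ValueError(f"Not a Creative Commons license name: {name!r}")
    # Everything after the 'CC ' prefix is a hyphen-separated list of marks.
    marks = normalized[3:].split("-")
    unknown = [mark for mark in marks if mark not in PERMISSION_MARKS]
    if unknown:
        raise ValueError(f"Unknown permission mark(s): {unknown}")
    return [PERMISSION_MARKS[mark] for mark in marks]


if __name__ == "__main__":
    for condition in describe_cc_license("CC BY-ND"):
        print(condition)
```

Calling `describe_cc_license("CC BY-ND")` returns the two conditions described in the example above: credit the creator, and no derivatives.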

Dedicating Your Work to the Public with CC0#

CC0 serves as a public dedication mechanism, through which you relinquish all copyright in your data. This means that anyone can modify, redistribute or build on your work. Further, by using CC0, you forfeit the right to attribution. Instead, you have to rely on norms such as good citation practices in academic communities to be recognized as the creator. Several organizations, such as museums, governmental bodies and scientific publishers, have chosen CC0 for access to at least part of their data. In many instances, data repositories maintained by universities, such as the 4TU.Centre for Research Data, recommend CC0 as the default option.

Open Data Commons#

Open Data Commons provides three licenses that can be applied specifically to data. The webpages [Com20b] of each of these licenses include human-readable summaries, with the ramifications of the legalese explained in a concise format.

The Public Domain Dedication and License or PDDL#

The PDDL is analogous to CC0 in that you waive all your rights to the data you are putting into the public domain. It comes with a set of recommended community norms. These norms are not mandatory to include and do not form a legal contract, but they can be a useful guide for encouraging fair, open sharing of data. It is also possible to put together a customized set of norms that serve your data-sharing community better.

The Attribution or ODC-BY License#

This license protects your attribution rights as a data owner or creator, just like the BY permission mark of CC licenses. Any use or distribution of your database must also include information on the license used with the original.

The Open Database License or ODbL#

The ODbL adds two more restrictions to the ODC-BY license. The first is that any public uses of your data must be shared with the same license, similar to the CC SA permission mark. The second is that if any version of your data is redistributed in a ‘closed’ format (for example, with Technological Protection Measures), it is mandatory for this redistribution to also be available in a version that is free of such closure measures.
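The relationship between the three Open Data Commons licenses can be summarised as a small decision helper. This is a simplified sketch of the descriptions in this section, not a substitute for reading the licenses themselves. The function name `suggest_odc_license` and its parameters are hypothetical.

```python
def suggest_odc_license(require_attribution: bool, require_share_alike: bool) -> str:
    """Suggest an Open Data Commons license from two yes/no requirements.

    Simplified reading of the three licenses:
    - PDDL: all rights waived, no conditions.
    - ODC-BY: attribution required.
    - ODbL: ODC-BY plus share-alike and the keep-open requirement.
    """
    if require_share_alike:
        # ODbL builds on ODC-BY, so attribution is required as well.
        return "ODbL"
    if require_attribution:
        return "ODC-BY"
    return "PDDL"


if __name__ == "__main__":
    print(suggest_odc_license(require_attribution=True, require_share_alike=False))
```

Because the ODbL adds its restrictions on top of ODC-BY, asking for share-alike in this sketch always implies attribution too.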

A note on the differences between CC and ODC Licenses#

Although it can seem like the licensing options offered by Creative Commons and Open Data Commons are exactly the same, there are some important differences.

One difference is the scope of rights that are covered by the license, which is nicely explained here. The ODC licenses were made specifically to be applied to data, and typically cover only database rights. The CC licenses, on the other hand, are more general-purpose: they can be applied to other materials, and they also cover copyright and other neighbouring rights.

Another difference is the availability of a standardised Community Norms document with the PDDL. The lack of such a document with CC0 means that you have to rely on community norms, which may often be unspoken or unwritten and can vary from community to community, to ensure fair attribution. A comparison between the PDDL and CC0 is provided here.

Other Licensing Options#

It is also possible to choose other data licenses that may have been developed with a specific use case or community in mind or that are not in widespread global use. These include licenses that were developed by national governments, such as the Norwegian License for Open Government Data [Age20]. Often, such licenses are the recommended data licensing option within the corresponding country, especially for data created or owned by their public bodies. Another example is the Open Government Licence or OGL, which was developed by The National Archives, UK.

The Digital Curation Centre (DCC) guide [Bal20] on how to license research data elaborates on the licenses discussed in this chapter and gives more information about Prepared Licenses, Bespoke Licenses, Multiple Licensing and Mechanisms for Licensing Data.

If you would like to read more about the challenges and finer points of licensing, this article is a great resource to get you started.

Chapter Tags: This chapter is curated for the Turing Data Study Group (turing-dsg).