Opening up the research enterprise through data publication at UCT
23 Oct 2018 - 16:15
Today, technology offers new ways of not only acquiring but also sharing and storing research data, allowing for greater collaboration among researchers as well as more rigorous scrutiny – both of which are in the best interests of research. The result is a global movement towards greater openness in science. As part of this, UCT, in March 2018, implemented its research data management (RDM) policy to support effective data sharing and to address the need for data to be findable, accessible, interoperable and reusable (FAIR) to specific quality standards.
“Open science – in particular, making the data on which the science is based freely available – is a response to the notion that university research is a public good and should be publicly available,” says Dr Dale Peters, UCT eResearch director. “In addition, funders are mandating data publication so they don’t repeat-fund research, and journals are mandating it so that results of publications can be verified.”
The purpose of UCT’s RDM policy is to “transform the way research is conducted at UCT by accelerating discovery, increasing the value of research decision-making and catalysing changes throughout the economy and society that are of value to its citizens.”
Beyond the public good, there are other enticing reasons why researchers should want to publish their data openly. The first is for citations, says Niklas Zimmer, head of UCT Libraries’ Digital Library Services (DLS).
“Data is now another thing you can be cited for. Open-access publishing of the data means it will be found and reused by other researchers in your field, and you will be credited for this.”
The open publication of data does require changes in behaviour around research data. Researchers now need to think through issues of data management they may not have considered before, and spend time on data curation. However, these improved data management practices may impact positively on overall research outcomes, both now and in the future.
“Data curation is a very important part of data publication, but it does not necessarily have to result in publication – open or otherwise,” says Thomas King, data curation officer at DLS.
Curation is the process of organising data according to logical standards. High-quality curation allows for the long-term preservation and accessibility of data. It can include activities such as regular, secure backup and archiving, or using open formats that will survive software and technological changes and so remain accessible in the long term.
“The advantages of properly curated data are twofold. First, it means your data won’t be lost. Second, if it is organised and described in an understandable and logical way, it can be reused by colleagues and students,” King says.
According to King, once the data has been properly curated, the majority of the work for data publication is done. The researcher can then upload the data to a platform such as ZivaHub.
An additional advantage of the open publishing of data for reuse is the added value, says Lynn Woolfrey, operations manager at DataFirst, an open data repository at UCT. It has become common to refer to data as the new oil, but, says Woolfrey, this comparison is not accurate.
“To quote Adam Schlosser of the World Economic Forum,” she says, “data is not the new oil, because the attitude of scarcity does not apply to it. No one wants to share oil wealth, but data gains value the more it is shared.”
Woolfrey says she has seen the value of the “virtuous cycle of reuse” in sharing research data as a public good through DataFirst, as the quality of South African government data has improved over the years thanks to feedback from data users in academia.
Open science support at UCT
UCT offers software and support to researchers throughout their research projects.
Increasingly, funders require the submission of a data management plan (DMP) in the early stages of a research project. DLS hosts DMPonline, a tool developed by the Digital Curation Centre (UK) to enable researchers, data managers and principal investigators to complete their DMPs with appropriate user guidance provided via the platform.
The Open Science Framework is a project management repository – created by the Center for Open Science – which serves as a collaboration tool. It allows researchers either to work on their projects privately, with a limited number of collaborators, or to make their projects completely open. DLS has set up a UCT instance of the Open Science Framework that researchers can use freely and securely with their UCT credentials.
ZivaHub, the institutional data repository of UCT, runs on the cloud-based Figshare platform. Its mission is not to replace existing or discipline-specific data repositories, but rather to offer a service for any staff member or student who needs to openly publish data.
When you publish data on ZivaHub, a persistent identifier – a digital object identifier (DOI) – is created. This makes the data recognisable as belonging to UCT and can be used to identify the data, irrespective of where it sits. The platform also offers metrics, allowing researchers to know how often and where their data has been viewed, downloaded and cited.
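As a rough illustration of why a DOI identifies data "irrespective of where it sits", the sketch below turns a DOI into a link through the public doi.org resolver, which redirects to wherever the dataset is currently hosted. The function name and the example DOI are hypothetical and chosen only for illustration; neither is a real ZivaHub record or part of any UCT tool.

```python
from urllib.parse import quote

# Public DOI resolver: it redirects to the dataset's current landing page.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI written in one of its common forms.

    Hypothetical helper for illustration; not part of ZivaHub or Figshare.
    """
    cleaned = doi.strip()
    # Strip the prefixes people commonly put in front of a DOI.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Every DOI starts with "10." and has a "/" between prefix and suffix.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the "/" separator.
    return DOI_RESOLVER + quote(cleaned, safe="/")


# Hypothetical DOI, used only to show the shape of the result.
example_doi = "10.1234/example.5678"
print(doi_to_url(example_doi))
```

Because a citation records the DOI rather than the address of the hosting platform, the link continues to work if the data is later moved, provided the DOI record is kept up to date.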
An added bonus of publishing on ZivaHub is the institutional support from data curation officers. While this team cannot edit or make changes to the microdata – the data provided by the researcher – they can assist in describing the data (metadata) to ensure it is correctly categorised and labelled for maximum discoverability and likelihood of reuse.
DataFirst is an open research data repository based at UCT which also holds socioeconomic data from a number of African governments and research institutions. The data is easily accessible to researchers and policy analysts worldwide.
DataFirst:
offers subject-specialist support with the anonymisation and special preparation required for microdata sharing
works with large-scale university projects and government agencies to encourage them to deposit their raw data for further use
quality-checks and anonymises data
provides an open data site where researchers can read about and download data
in the case of sensitive data, allows for the use of the data in their secure centre at UCT
supports data users
trains African researchers in data analysis.
Other data repositories
There is also a range of discipline-specific data repositories and alternatives to ZivaHub, for example Zenodo. DLS has set up a UCT community on Zenodo with which UCT-related data publications can be associated.
Researchers are encouraged to use any data repository that best suits their needs.