Avoiding the pitfalls of data lifecycle

20 Dec 2019

A multi-institutional, multi-country study must tackle many issues dealing with the storage, security and compliance of the data collected.

Professor Catherine Ward of UCT’s Department of Psychology is working on the South African piece of an eight-country longitudinal birth-cohort study led by Cambridge University which seeks to understand violence against children. UCT eResearch worked with Ward to help her anticipate and resolve the many research data problems that are likely to crop up during the course of the study.

End all forms of violence against children – so states the United Nations Sustainable Development Goal 16.2. To achieve this critical goal, policy-makers need to understand the complex factors at play. This is the objective of the eight-country birth-cohort study coordinated by a team at Cambridge with research collaborators across the globe. The study aims to track a total of 12 000 pregnant women in eight low-income countries (see map of global distribution) through their pregnancies, and then to follow the children of the birth cohort into adulthood.

The eight countries included in the study are: Ghana, Jamaica, Pakistan, Philippines, Romania, South Africa, Sri Lanka and Vietnam.

“Trying to think through the various implications of data management in a research study of this level of complexity was daunting,” says Ward. “Without the proper support there are data issues a researcher can stumble into very naively, with devastating consequences ten or even 20 years down the line.”

Ward thus turned to eResearch analyst Renate Meyer to support her through the various steps of the research lifecycle. Ward used UCT Digital Libraries Services’ data management planning tool, UCT DMP, which she says was invaluable in the process.

Data collection

“These days this kind of data is collected electronically through a tablet,” explains Ward. This has a number of advantages: participants in the study can answer sensitive questions completely confidentially, and the data is uploaded directly to the server.

 

Meyer and Ward discussed various data collection software options and the pros and cons of each. “While the decision as to what software to use will be made at Cambridge, the advice from UCT’s Information and Communication Technology Services (ICTS) allows me to contribute meaningfully to the decision-making process,” says Ward.

Secure (African) storage versus ease of collaboration?

Once the data has been collected and uploaded to a server, many of the real data management challenges begin.

“We are asking questions about highly sensitive issues, including substance abuse, intimate partner violence and HIV status, among others,” says Ward. “In addition, because we are tracking these women over decades, we need to keep records of personal information: names, addresses, telephone numbers, telephone numbers of family members, etc.”

The project therefore cannot risk a security breach, but at the same time needs to be able to share anonymised data with collaborators all over the world. It is also extremely important to keep the data in Africa, under local jurisdiction.

Meyer worked with Ward to develop a plan to keep the sensitive data highly secure and offline – accessible only to the South African team – while the anonymised data for collaboration is available to research partners but protected as UCT intellectual property.

Contracts and compliance

There are also a number of legal and contractual hoops a researcher needs to jump through in a multi-institutional, multi-country study such as this. It was an awareness of the complexity of these compliance issues that first prompted Ward to contact eResearch.

Meyer and eResearch Director Dr Dale Peters advised Ward to work with Research Contracts and Innovation (RC&I) on a data sharing agreement between sites, to prevent battles further down the line over who owns the data, who has access to it and who is responsible for its long-term preservation.

This, according to Andrew Bailey, senior manager at RC&I, needs to be dealt with right at the beginning, at the research contract or consortium agreement stage of the project.

 

“These contracts will need to take funder requirements into account, different university policies and also different geographical locations as the legislation in different countries will differ,” says Bailey.

Long-term data management

Ward says that working with UCT eResearch also allowed her to plan for eventualities she would never have considered herself, particularly the issue of data decay.

“We will be collecting data over a 20 to 30-year period, at least, so we need to be sure the software we are using in 2045 can read the data collected in 2021.”

This means the project needs to budget for software upgrades, for translating data into newer formats and for a dedicated data steward position to ensure the data remains useable for the full length of the study.

Offering a competitive edge

For Ward, the eResearch support she received in working through these complex data issues at the proposal stage was a huge relief.

“I am a psychologist with no insight into many of the complexities and technicalities we have to grapple with around data management and compliance,” she says.

Ward says she is grateful to UCT eResearch not only for the technical guidance and advice, but also for the legwork and time they put into helping her with her proposal.

“Having this kind of support,” she says, “has given us a real competitive edge.”

Story: Natalie Simon

Photo: Jonny Lindner, Pixabay

For more stories like this, take a look at our 2018-2019 UCT eResearch Report