1. Data Protection, Data Management & Data Validation

1. Purpose

This Standard Operating Procedure (SOP) describes the processes for handling personal data collected for clinical research purposes.

2. Application

Required reading for all personnel working on clinical research studies within Starship Child Health.

3. Introduction

3.1 Confidentiality
All staff have a duty of confidence in relation to personal data where it is processed for the purposes of clinical research. Participant confidentiality must be maintained and data must not be disclosed to anyone without authorisation.

In the New Zealand context, data is seen as taonga (something sacred, precious, or significant) (Whaanga et al. 2017). A taonga should be actively cared for in a manner that preserves its integrity and value. Health data is used in most health and disability research studies, as well as QI projects. Some of this data is prospectively collected for the purpose of research, but a growing proportion of data is collected through routine processes, for example through healthcare procedures or interaction with health agencies.

Personal confidential data is any data relating to an identified or identifiable person.
Examples of personal confidential data include, but are not limited to:

  • NHI

  • Name

  • Street address

  • Phone number

  • Online identity (e.g., email, Twitter name)

  • Identification numbers (e.g., community services card, driver’s licence)

  • Date of birth

  • Identification of relatives

  • Identification of employers

  • Clinical notes

  • Any other direct or indirect identifiers that carry significant risk of re-identification

3.2 Data Protection
During the entire data management process it is essential that all study data are kept in a secure location and in accordance with the terms of the Health Information Privacy Code 1994. All study records should be kept in pseudonymised form identifying participants by their study code rather than name, initials or hospital number.

3.2.1 Health Information Privacy Code 1994
Where a health agency collects health information directly from the individual concerned, or from the individual’s representative, the health agency must take such steps as are, in the circumstances, reasonable to ensure that the individual concerned (and the representative if collection is from the representative) is aware of:

(a) the fact that the information is being collected;

(b) the purpose for which the information is being collected;

(c) the intended recipients of the information;

(d) the name and address of: (i) the health agency that is collecting the information; and (ii) the agency that will hold the information;

(e) whether or not the supply of the information is voluntary or mandatory and if mandatory the particular law under which it is required;

(f) the consequences (if any) for that individual if all or any part of the requested information is not provided; and

(g) the rights of access to, and correction of, health information provided by rules 6 and 7.



3.3 NEAC Guidelines for Observational and Interventional Studies 2019 (Section 12 Health Data)


3.4 Data Types

3.4.1 Anonymised data
Data in a form that does not identify individuals, directly or indirectly, even when combined with other data.

3.4.2 Pseudonymised data
Data that carries a unique identifier which is not part of the clinical information, i.e. not the NHI number or hospital number but a generated unique value that is linked to the individual.
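
As an illustration only, the generated unique value described above could be produced along the following lines. This is a minimal sketch, not a prescribed method: the function names, the "STUDY" prefix, the six-digit format and the sample identifiers are all assumptions made for the example. The linkage key that maps each identifier to its study code would need to be stored separately and securely from the coded study data.

```python
import secrets


def new_study_code(existing, prefix="STUDY", digits=6):
    """Return a random study code that is not derived from any clinical identifier.

    The prefix and number of digits are illustrative assumptions.
    """
    while True:
        code = f"{prefix}-{secrets.randbelow(10 ** digits):0{digits}d}"
        if code not in existing:
            return code


def pseudonymise(records, id_field="nhi"):
    """Replace the identifying field in each record with a generated study code.

    Returns the coded records and the linkage key (identifier -> study code).
    """
    linkage_key = {}
    coded = []
    for rec in records:
        ident = rec[id_field]
        if ident not in linkage_key:
            linkage_key[ident] = new_study_code(set(linkage_key.values()))
        # Copy every field except the identifier, then attach the study code.
        row = {k: v for k, v in rec.items() if k != id_field}
        row["study_code"] = linkage_key[ident]
        coded.append(row)
    return coded, linkage_key


# Made-up placeholder identifiers, not real participant data.
sample = [
    {"nhi": "ZZZ0016", "weight_kg": 12.4},
    {"nhi": "ZZZ0024", "weight_kg": 30.1},
    {"nhi": "ZZZ0016", "weight_kg": 12.9},
]
coded_rows, key = pseudonymise(sample)
```

The same participant always receives the same study code, so repeat visits remain linked without the identifier appearing in the study records.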

3.5 Data Transfer
Appropriate measures should be taken to ensure data is not lost and does not fall into the wrong hands during transfer. Confidential data should never be sent through regular mail or via unencrypted email messages. All data transfers should be recorded in a log. Data transfer agreements should be used to clearly define each party's responsibilities; these will need to be reviewed by legal and signed by the Service General Manager.

3.5.1 By post
Sealed or tamper-evident envelopes should be used to send information, and these should be provided to participants if they are expected to send confidential details back. Envelopes should be clearly addressed to indicate who the recipient is, and should carry the sender’s details in case the item cannot be delivered. Documents containing sensitive personal data and/or audio or video recordings of consultations or interviews should be labelled with only the unique study identifier and be sent only via registered post or courier.

3.5.2 Electronic & email
Any data transferred on removable media should be both password protected and encrypted and care should be taken not to lose devices.

Information that is used for secondary purposes must be subject to the pseudonymisation procedure. Information in email pertaining to research should be pseudonymised.

District Health Board email addresses are secure as data stays behind the firewall, but other email accounts are not secure as data passes outside it. Email sent to external accounts should therefore be encrypted.

Where files are being transferred, these must be password protected. A record of the time and date, location and purpose of each transfer should be kept for the audit trail.
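
One possible shape for such a transfer record is sketched below. It is an assumption-laden example rather than a required format: the column names, the `log_transfer` function and the sample values are hypothetical, and a study may equally keep the log in its data management system.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical column set; adapt to the study's own transfer log.
LOG_FIELDS = ["timestamp_utc", "sender", "recipient", "location", "purpose", "method"]


def log_transfer(log_file, sender, recipient, location, purpose, method, when=None):
    """Append one transfer entry to an open, writable CSV log."""
    when = when or datetime.now(timezone.utc)
    entry = {
        "timestamp_utc": when.isoformat(timespec="seconds"),
        "sender": sender,
        "recipient": recipient,
        "location": location,
        "purpose": purpose,
        "method": method,
    }
    writer = csv.DictWriter(log_file, fieldnames=LOG_FIELDS)
    # Write the header row only when the log is still empty.
    if log_file.tell() == 0:
        writer.writeheader()
    writer.writerow(entry)
    return entry


# In-memory demonstration with made-up values.
demo_log = io.StringIO()
log_transfer(demo_log, "Site A", "Coordinating centre", "Auckland",
             "Monthly CRF upload", "encrypted email")
```

Entries are only ever appended, so the log keeps a chronological record that can be reviewed at audit.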

4. Procedure

Only data that is essential for the purposes of the study should be collected, as stated in the clinical trial protocol. The need for directly identifiable data should also be considered; for example, half the postcode could be collected instead of the full postcode, or age instead of date of birth. ICH GCP Guidelines section 5.5.1 specifies that appropriately qualified individuals should supervise the study data handling, verify the data and conduct the statistical analyses.
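
The two examples of reduced identifiability given above (age instead of date of birth, half the postcode instead of the full postcode) could be derived as in the following sketch. The field names and helper functions are hypothetical, and the sketch assumes a four-character postcode of which the first two characters are kept.

```python
from datetime import date


def age_in_years(date_of_birth, on_date):
    """Whole years of age on a given date."""
    years = on_date.year - date_of_birth.year
    # Subtract one if the birthday has not yet occurred in the year of on_date.
    if (on_date.month, on_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def partial_postcode(postcode, keep=2):
    """Keep only the leading characters of the postcode."""
    return postcode.strip()[:keep]


def minimise(record, on_date):
    """Return a copy of the record with date of birth and full postcode removed."""
    reduced = {k: v for k, v in record.items() if k not in ("date_of_birth", "postcode")}
    reduced["age_years"] = age_in_years(record["date_of_birth"], on_date)
    reduced["postcode_prefix"] = partial_postcode(record["postcode"])
    return reduced
```

Deriving these values at the point of collection means the full date of birth and postcode never need to enter the study database.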

4.1 Data Processing SOP
Before the study starts, it is essential that an SOP for data processing, management and validation is put together and updated as necessary throughout the study. The SOP should contain information on the following:

  • Contact details for all study staff

  • Details of the flow of data from the investigator site to archiving

  • Procedures on how to complete the CRFs

  • Monitoring plans e.g. frequency, how Source Data Verification will be done, expected ranges for data values

  • Data Entry

    • How to use the data entry system

    • Double or single entry

    • Roles and responsibilities of study staff with regard to data

    • Procedures in case of discrepancies

  • Details of edit checks

  • Description of Post Data Entry Validation System

    • Who checks the consistency of the data?

    • Who queries the Investigator?

    • What is the format of the query form?

    • How many days are allocated to answer a query?

    • Who decides that a query is resolved?

  • Data Protection procedures, including a back-up system

Although the above list is not exhaustive it provides a basis for the Data Management SOP that can be adapted and expanded as necessary.

4.2 Data Management
Data management is the process of converting the collected research data, usually from Case Report Forms (CRFs), into electronic data that can be statistically analysed. Once the CRF has been designed in accordance with the study protocol, a database in which the collected information can be stored should be designed.

4.2.1 Data Management Software
Depending on the size and type of study, a standard spreadsheet or a secure web application for building and managing online surveys and databases (e.g. REDCap) may suffice, or a Data Management System (DMS) may be necessary. It is important to consider the following when setting up a database:

  • user friendliness and the ease of being able to train people on how to use the system

  • password-protected access for users

  • ability for more than one user to access the system at any one time

  • ability to know who has entered specific data

  • ability for certain users to make changes in a documented manner, with no data being deleted

  • ability to store and retrieve all data required for the study efficiently

  • ability for alerts to be created, for instance if values entered are outside an expected range, if a text value is entered instead of a numeric value, or if required data is missing

  • ability to maintain an audit trail for data entered (ICH GCP 5.5.3).

Under ICH GCP there should be a specific SOP for managing the study database in place. There should also be adequate backup for the data and if there is blinding involved in the study, the data entry processing systems should allow this to be maintained (ICH GCP 5.5.3).
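
To make the requirements for documented changes and an audit trail more concrete, the sketch below shows one way a record could keep every previous value alongside who changed it, when and why. It is illustrative only: the class and field names are hypothetical, and an established data management system would normally provide this functionality itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Change:
    """One immutable audit trail entry."""
    field_name: str
    old_value: object
    new_value: object
    user: str
    reason: str
    timestamp: datetime


@dataclass
class AuditedRecord:
    """Current values plus an append-only history of every change."""
    values: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def update(self, field_name, new_value, user, reason):
        change = Change(
            field_name=field_name,
            old_value=self.values.get(field_name),  # None on first entry
            new_value=new_value,
            user=user,
            reason=reason,
            timestamp=datetime.now(timezone.utc),
        )
        # The old value is preserved in the history rather than deleted.
        self.history.append(change)
        self.values[field_name] = new_value
        return change
```

Because earlier values are retained in the history, a correction never overwrites the original entry without trace.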

4.3 Data Validation
During data entry by trained staff, an average error rate of 5% is expected. Data validation is therefore an integral part of data management to ensure the most accurate data is available for analysis. Validation can be completed as part of ongoing trial monitoring either by members of the research team or by independent monitors.

The study team should put together an Edit Check Specification (ECS) document giving full details of the data entry checks that have been set up. All checks should be tested before the study begins. Data validation continues until all missing values and inconsistencies are corrected or clarified.

4.3.1 Source Data Verification (SDV)
Source Data Verification is performed as part of validation via monitoring. This involves checking the data entered into the CRFs for accuracy against the original source records, e.g. the patient’s hospital records.

4.3.2 Single data entry checks
Suitable for smaller single centre studies with fewer staff available for data entry and/or less sophisticated software. Once data has been entered into the database, a visual check is performed comparing what is on the paper CRF to what was entered on screen.

4.3.3 Double data entry checks
If the software is available, two people enter the same CRF data into the database independently of each other. If the entries do not match, the database will flag this up.
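
The comparison behind double data entry can be sketched as follows. This is a simplified illustration with hypothetical field names; real data entry software would perform the comparison itself and route discrepancies into the query process.

```python
def compare_entries(first, second):
    """Return (field, first_value, second_value) for every field that differs.

    A field present in only one of the two entries is reported with None
    for the entry that lacks it.
    """
    discrepancies = []
    for name in sorted(set(first) | set(second)):
        a, b = first.get(name), second.get(name)
        if a != b:
            discrepancies.append((name, a, b))
    return discrepancies


# Made-up example: the two operators disagree on the weight.
entry_one = {"heart_rate_bpm": "88", "weight_kg": "12.4"}
entry_two = {"heart_rate_bpm": "88", "weight_kg": "14.2"}
flagged = compare_entries(entry_one, entry_two)
```

Each flagged field would then be checked against the paper CRF to decide which entry is correct.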

4.3.4 Final data entry checks
When total data entry is complete for the study, systematic computer tests should be run to find any missing values or values outside of range. Logical checks should also be made to ensure consistent reporting between fields.
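
A minimal sketch of such systematic checks is given below, covering missing values, non-numeric entries, out-of-range values and one logical check between fields. The field names, the expected ranges and the rule that a first dose cannot precede consent are assumptions made for the example; the actual checks would come from the study's own specification.

```python
from datetime import date

# Hypothetical required fields and expected ranges.
REQUIRED = ["study_code", "heart_rate_bpm", "weight_kg"]
RANGES = {"heart_rate_bpm": (40, 220), "weight_kg": (0.5, 150.0)}


def check_record(rec):
    """Return a list of human-readable issues found in one record."""
    issues = []

    # Missing value checks.
    for name in REQUIRED:
        if rec.get(name) in (None, ""):
            issues.append(f"{name}: missing")

    # Type and range checks (missing values were reported above).
    for name, (low, high) in RANGES.items():
        value = rec.get(name)
        if value in (None, ""):
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            issues.append(f"{name}: not numeric")
            continue
        if not low <= number <= high:
            issues.append(f"{name}: outside {low}-{high}")

    # Logical check between fields: dosing should not precede consent.
    consent, dose = rec.get("consent_date"), rec.get("first_dose_date")
    if consent and dose and dose < consent:
        issues.append("first_dose_date: before consent_date")

    return issues
```

Running the function over every record in the database gives a list of outstanding issues to resolve or query before the data is locked for analysis.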

4.4 Independent Data Monitoring Committees (IDMCs)
For large, complex trials it is recommended that an Independent Data Monitoring Committee (IDMC) is set up to carry out reviews of trial data at staged intervals during the study. The role of the IDMC is to review interim results and determine whether there are any safety issues or any other reason why the study should not continue, e.g. if interim results are showing strong evidence that the treatment/intervention is superior or inferior to the control.

The data reviewed by the monitoring committee should be current and should be validated up to the point of interim analysis to ensure it is of sufficient quality.
The membership of the committee should include experienced trial investigators, statisticians and clinicians, all of whom must be independent of the research team. The results should be reviewed at regular intervals as sufficient data accumulates. If there is a Trial Steering Committee for the study, the IDMC would normally make its recommendations for action through that committee.

4.5 Data Backup Systems
Whatever database software is used to manage the study data, there should always be a back-up system in place to guard against the loss of data due to software or environmental issues.

5. Grafton Medical Records

For access to patient clinical data related to research, particularly where there may be an ethics waiver, permission should be sought by application to Grafton Medical Records.

  • For urgent requests, phone: (09) 307 4949 ext 22288

  • For non-urgent requests:

    • email: GROI@adhb.govt.nz

    • mail: Release of Information Team, Clinical Records Department, Building 21, Auckland City Hospital, Private Bag 92024, Auckland 1023

Business hours for the Release of Information Team are Monday to Friday, 9.00am to 3.00pm.

6. References & Acknowledgements

  • International Conference on Harmonisation (ICH) Guideline for Good Clinical Practice

  • Health & Disability Research Ethics Committee

  • National Ethics Advisory Committee

  • Privacy Act 1993

  • Protection of Personal and Property Rights Act 1988

  • Grafton Medical Records (ADHB)

  • HISO 10064:2017 Health Information Governance Guidelines