This is an example workflow to show the use of trial data and package ctrdata.
Safety data analysis across registers
Registers of clinical trials include safety data that is collected during the trial.
The safety data collection is part of pharmacovigilance activities and employs a highly structured approach to the definition and detection of adverse events, to their assessment including causality and severity, and to the categorisation of the affected organs, functioning or other observations.
The overall approach is to obtain details of allocation or reporting groups, to merge this with details (counts) of all recorded types of serious adverse events SAEs per group, and finally to merge EUCTR and CTGOV data into a single data frame.
To start, necessary packages are loaded and a database collection is prepared:
Get trials of interest
Here, the research question is to tabulate safety data and events of special interest, for a class of medicinal products.
Safety data are part of the results-related data in registers such as EUCTR and CTGOV. Only EUCTR and CTGOV provide structured results-related information.
This use case looks into antibody drug conjugates, which can be associated with specific immunological inflammatory phenomena.
Trials for such medicinal products with result-related data in the therapeutic area oncology can be found with queries proposed by ctrdata, which are then loaded from these registers into the database collection.
queries <- ctrGenerateQueries(
intervention = "antibody drug conjugate",
onlyWithResults = TRUE,
condition = "cancer"
)
result <- lapply(
queries[c("EUCTR", "CTGOV2")],
ctrLoadQueryIntoDb,
con = dbc,
euctrresults = TRUE
)Several hundred records are found and stored.
Get data fields of interest
As a next step, from all retrieved trials, those of phase 2, 3 or 4 trials are selected, as they can be assumed to have a sufficiently long observation of the potential occurrences and larger participant numbers. Furthermore, the trials should have an internal control group and not be an extension or follow-up of a previous trial.
This selection is done by calculating trial concepts that are pre-defined in ctrdata and using them to filter relevant from all trials. In the same steps, fields are obtained from the database collection that hold the safety data of interest. The fields were identified beforehand (e.g., using https://regulatorysciencedata.eu/shiny/clinicaltrials/).
trialData <- dbGetFieldsIntoDf(
fields = c(
# This example only looks at SAEs
# but can be extended to all AEs
# by adding relevant fields and
# adapting subsequent code chunks
#
"adverseEvents.reportingGroups", # EUCTR
"adverseEvents.seriousAdverseEvents.seriousAdverseEvent", # EUCTR
#
"resultsSection.adverseEventsModule.seriousEvents", # CTGOV2
"resultsSection.adverseEventsModule.eventGroups" # CTGOV2
),
calculate = c(
"f.isUniqueTrial",
"f.trialPhase",
"f.trialTitle",
"f.controlType"
),
con = dbc
) %>%
filter(.isUniqueTrial)
# Overview cross-tabulation
with(trialData, table(
.trialPhase, .controlType)) .controlType
.trialPhase none no-treatment placebo active placebo+active other
phase 1 0 0 0 19 0 0
phase 1+2 1 0 0 27 0 0
phase 2 4 0 0 64 0 2
phase 2+3 0 0 0 1 0 0
phase 2+4 0 0 0 0 0 0
phase 3 0 0 4 24 0 1
phase 3+4 0 0 0 0 0 0
phase 1+2+3 0 0 0 0 0 0
phase 4 0 0 0 5 0 0
phase 1+2+3+4 0 0 0 0 0 0
# A tibble: 5 × 9
`_id` adverseEvents.seriousAdverseEvents.seriou…¹ resultsSection.adver…²
<chr> <list> <list>
1 NCT02001623 <lgl [1]> <df [87 × 5]>
2 NCT02980341 <lgl [1]> <df [52 × 5]>
3 NCT01476410 <lgl [1]> <df [40 × 5]>
4 NCT02988817 <lgl [1]> <df [103 × 5]>
5 NCT04014075 <lgl [1]> <df [40 × 5]>
# ℹ abbreviated names: ¹adverseEvents.seriousAdverseEvents.seriousAdverseEvent,
# ²resultsSection.adverseEventsModule.seriousEvents
# ℹ 6 more variables: resultsSection.adverseEventsModule.eventGroups <list>,
# adverseEvents.reportingGroups.reportingGroup <list>, .isUniqueTrial <lgl>,
# .trialPhase <ord>, .trialTitle <chr>, .controlType <fct>
The above shows that the safety data are represented in a nested data structure, that is, in a hierarchy of several levels (shown as lists or data frames within the trialData data frame).
To simplify handling nested data, ctrdata provides the function dfTrials2Long(), which generates for each scalar a data item (field) a single row in a long and narrow data frame.
A unique identifier is created for fields that belong together in a trial, e.g. reporting groups, endpoint definition and results.
trialDataLong <- trialData %>%
dfTrials2Long()
# Total 139712 rows, 45 unique names of variables
trialDataLong %>%
slice_sample(n = 10L)# A tibble: 10 × 4
`_id` identifier name value
<chr> <chr> <chr> <chr>
1 NCT00947856 18.1 resultsSection.adverseEventsModule.… 3
2 NCT01777152 91.1 resultsSection.adverseEventsModule.… 223
3 NCT03907488 63.1 resultsSection.adverseEventsModule.… EG000
4 2015-001377-40-IT 11.1 adverseEvents.seriousAdverseEvents.… 0
5 NCT02573324 20.6 resultsSection.adverseEventsModule.… EG005
6 NCT02341625 63.9 resultsSection.adverseEventsModule.… 30
7 NCT02341625 42 resultsSection.adverseEventsModule.… Peritonitis
8 NCT02631876 6.2 resultsSection.adverseEventsModule.… 109
9 NCT03729596 3.10 resultsSection.adverseEventsModule.… 1
10 NCT03474107 184.1 resultsSection.adverseEventsModule.… 0
Mangle EUCTR data
First, EUCTR data are used to obtain the reporting groups details, including id’s which are references in results data.
# Obtain reporting group data for adverse events,
# (including group identifier and title as well as
# number of subjects with an AE, number with an SAE
# number of exposed, for each reporting group)
euctrGroups <- dfName2Value(
df = trialDataLong,
valuename = paste0(
"adverseEvents.reportingGroups.reportingGroup.",
"(id|title|subject)")
) %>%
# Remove empty reporting groups
na.omit() %>%
# Expand into one row per trial and reporting group
pivot_wider(id_cols = c(`_id`, identifier)) %>%
select(-identifier) %>%
# Shorten variable names
rename(
title =
adverseEvents.reportingGroups.reportingGroup.title,
reportingGroupId =
adverseEvents.reportingGroups.reportingGroup.id)
# Returning values for 24 out of 125 trials
euctrGroups %>%
slice_sample(n = 5L)# A tibble: 5 × 6
`_id` reportingGroupId adverseEvents.reportingGroups.report…¹
<chr> <chr> <chr>
1 2009-010000-28-FR ReportingGroup-21307 40
2 2017-002985-28-BE ReportingGroup-114825 6
3 2020-002818-41-HU ReportingGroup-167229 46
4 2012-004902-82-BE ReportingGroup-125107 902
5 2015-001377-40-IT ReportingGroup-70662 11
# ℹ abbreviated name:
# ¹adverseEvents.reportingGroups.reportingGroup.subjectsAffectedByNonSeriousAdverseEvents
# ℹ 3 more variables:
# adverseEvents.reportingGroups.reportingGroup.subjectsAffectedBySeriousAdverseEvents <chr>,
# adverseEvents.reportingGroups.reportingGroup.subjectsExposed <chr>,
# title <chr>
Next, serious adverse events (SAEs) details are obtained. From the long data frame obtained with dfTrials2Long(), a range of fields is extracted and pivoted into a wide format that has one row for each SAE for each group for each trial:
# Obtain details for SAEs, e.g. organ system,
# dictionary info, occurrences and fatalities
euctrSaesDetails <- dfName2Value(
df = trialDataLong,
valuename = paste0(
"adverseEvents.seriousAdverseEvents.seriousAdverseEvent.",
"(.*)")
)
# Returning values for 24 out of 125 trials
# For each trial and each AR,
# - first create a set without the values (counts)
# - merge both into a table where one row corresponds
# to one reporting group for one AR for one trial
euctrSaesDetails <- left_join(
euctrSaesDetails %>%
pivot_wider(id_cols = c(`_id`, identifier)) %>%
filter(!grepl("[.]", identifier)) %>%
select(-contains("values")),
# - then create a set of only the values (counts)
euctrSaesDetails %>%
filter(grepl("[.]", identifier)) %>%
pivot_wider(id_cols = c(`_id`, identifier)) %>%
mutate(identifier = sub("([0-9]+)[.]?.*", "\\1", identifier)),
by = c("_id", "identifier")
) %>%
# Shorten reportingGroupId
rename(
reportingGroupId =
adverseEvents.seriousAdverseEvents.seriousAdverseEvent.values.value.reportingGroupId
) %>%
# Remove empty reportingGroupId
filter(!is.na(reportingGroupId))
euctrSaesDetails %>%
slice_sample(n = 5L)# A tibble: 5 × 17
`_id` identifier adverseEvents.seriousAd…¹ adverseEvents.seriou…²
<chr> <chr> <chr> <chr>
1 2012-000660-22-BE 26 ADV_EVT_ASSESS_TYPE.non_… false
2 2011-000033-36-BE 22 ADV_EVT_ASSESS_TYPE.syst… true
3 2018-001321-68-3RD 33 ADV_EVT_ASSESS_TYPE.non_… false
4 2018-001321-68-3RD 65 ADV_EVT_ASSESS_TYPE.non_… false
5 2012-004879-38-BE 4 ADV_EVT_ASSESS_TYPE.syst… false
# ℹ abbreviated names:
# ¹adverseEvents.seriousAdverseEvents.seriousAdverseEvent.assessmentMethod.value,
# ²adverseEvents.seriousAdverseEvents.seriousAdverseEvent.dictionaryOverridden
# ℹ 13 more variables:
# adverseEvents.seriousAdverseEvents.seriousAdverseEvent.organSystem.eutctId <chr>,
# adverseEvents.seriousAdverseEvents.seriousAdverseEvent.organSystem.version <chr>,
# adverseEvents.seriousAdverseEvents.seriousAdverseEvent.term <chr>, …
To complete the EUCTR steps, the group details and SAE details are merged:
saesEuctr <- left_join(
x = euctrSaesDetails,
y = euctrGroups,
by = c("_id", "reportingGroupId")
) %>%
# Shorten variable names by removing leading parts
rename_with(~ sub(
"adverseEvents.seriousAdverseEvents.seriousAdverseEvent.(|values.value.)", "", .x),
starts_with("adverse")
) %>%
# Select variables of interest
select(
`_id`,
title,
term,
subjectsExposed,
subjectsAffected
) %>%
# Convert to integer (because dfTrials2Long
# extracts data always as character strings)
mutate(
subjectsExposed = as.integer(subjectsExposed),
subjectsAffected = as.integer(subjectsAffected)
)
# Show example rows
saesEuctr %>%
slice_sample(n = 5L)# A tibble: 5 × 5
`_id` title term
<chr> <chr> <chr>
1 2021-003160-27-BE Part A Haemolytic anaemia
2 2012-000660-22-BE Trastuzumab Emtansine 3.6 mg Haematemesis
3 2018-002816-29-DE Part1:BelantamabMafodotin3.4mg/kg+… Platelet count decreased
4 2012-004879-38-BE T-DM1 + P Epistaxis
5 2011-000033-36-BE Part A Arm 4 100mg/m2/20 MBq Endometrial cancer
# ℹ 2 more variables: subjectsExposed <int>, subjectsAffected <int>
Mangle CTGOV data
Second, CTGOV data are mangled, starting with details on reporting groups:
ctgovGroups <- trialData %>%
# Select variables of interest
select(`_id`, resultsSection.adverseEventsModule.eventGroups) %>%
# Expand into columns
unnest(resultsSection.adverseEventsModule.eventGroups) %>%
# Rename to indicate id of group
rename(groupId = id) %>%
# Remove empty event groups
filter(!is.na(groupId))
ctgovGroups %>%
slice_sample(n = 5L)# A tibble: 5 × 10
`_id` groupId title description
<chr> <chr> <chr> <chr>
1 NCT04296890 EG000 Mirvetuximab Soravtansine "Participants received s…
2 NCT04622774 EG006 Dose Escalation - Schedule A: I… "Participants received I…
3 NCT02988817 EG014 Expansion Part Cohort 5: Enapot… "Participants with advan…
4 NCT01703949 EG001 Arm B (Brentuximab Vedotin, Niv… "Patients receive brentu…
5 NCT02001623 EG010 Dose Expansion Part: Endometria… "Participants with endom…
# ℹ 6 more variables: seriousNumAffected <int>, seriousNumAtRisk <int>,
# otherNumAffected <int>, otherNumAtRisk <int>, deathsNumAffected <int>,
# deathsNumAtRisk <int>
The SAE details are obtained from a different field, and the stats field provides counts by allocation group, type of SAE and organ system.
# A tibble: 5 × 17
`_id` adverseEvents.seriou…¹ term organSystem
<chr> <list> <chr> <chr>
1 NCT02573324 <lgl [1]> ATELECTASIS Respiratory, thoracic an…
2 NCT03677596 <lgl [1]> Fungal sepsis Infections and infestati…
3 NCT01925612 <lgl [1]> Visual impairment Eye disorders
4 NCT04014075 <lgl [1]> Nausea Gastrointestinal disorde…
5 NCT03729596 <lgl [1]> Flank pain Musculoskeletal and conn…
# ℹ abbreviated name: ¹adverseEvents.seriousAdverseEvents.seriousAdverseEvent
# ℹ 13 more variables: sourceVocabulary <chr>, assessmentType <chr>,
# groupId <chr>, numAffected <int>, numAtRisk <int>, numEvents <int>,
# notes <chr>, resultsSection.adverseEventsModule.eventGroups <list>,
# adverseEvents.reportingGroups.reportingGroup <list>, .isUniqueTrial <lgl>,
# .trialPhase <ord>, .trialTitle <chr>, .controlType <fct>
To complete the CTGOV steps, the group and SAE details are merged:
# A tibble: 10 × 5
`_id` title term numAtRisk
<chr> <chr> <chr> <int>
1 NCT03729596 SCCHN Expansion Sepsis pasteurella 13
2 NCT02785900 Placebo + HMA Facial pain 128
3 NCT01461538 BV 1.8 mg/kg Q3Week Abdominal pain 46
4 NCT02341625 BMS-986148 0.2MG/KG Q3W Impaired gastric emptying 2
5 NCT02572167 Part 3: Same-day Dose Jugular vein thrombosis 30
6 NCT02573324 Open-Label Depatuxizumab Maf… WHITE BLOOD CELL COUNT D… 5
7 NCT03525678 PACT Phase: GSK2857916 2.5 m… Gastric fibrosis 1
8 NCT02001623 Dose Escalation Part: 1.8 mg… Stress fracture 3
9 NCT03149549 Part C1 6 mg/kg Q2W Ear pain 6
10 NCT01925612 Part 3: RCHOP Bronchiolitis 12
# ℹ 1 more variable: numAffected <int>
Merge register data
Finally, data from EUCTR and CTGOV (after adjusting column names) can directly be merged, representing SAEs from both registers for the selected types of trials.
[1] 26830 5
# Show sample rows with higher
# number of affected subjects
saesMerged %>%
arrange(desc(subjectsAffected)) %>%
slice_head(n = 50L) %>%
slice_tail(n = 10L)# A tibble: 10 × 5
`_id` title term
<chr> <chr> <chr>
1 NCT01979536 Arm CZ (Crizotinib, Combination… Mucositis oral
2 NCT03474107 Chemotherapy Febrile neutropenia
3 2018-001321-68-3RD Mylotarg: Monotherapy Febrile neutropenia
4 NCT01712490 ABVD Pneumonia
5 NCT01780662 Dose 1.8mg/kg White blood cell decreas…
6 NCT02363283 Treatment (Glembatumumab Vedoti… Peripheral Neuropathy
7 NCT02785900 Placebo + HMA Pneumonia
8 2018-001321-68-3RD Mylotarg with Standard Chemothe… Disease progression
9 NCT01631552 Overall Safety Population Pneumonia
10 NCT01712490 A+AVD Sepsis
# ℹ 2 more variables: subjectsExposed <int>, subjectsAffected <int>
The dataset saesMerged has 26830 rows of SAEs for 116 trials. It can be exported into a spreadsheet format as follows for further work or reference.
writexl::write_xlsx(
x = saesMerged,
path = "SAEs.xlsx")