Clinical trials analyses: ctrdata, a tool for leveraging register data

A readily-usable tool with a permissive licence for CTIS, EUCTR, CTGOV2 and ISRCTN with examples addressing research questions
tools
code
trials
Author

Ralf Herold

Published

2025-04-27

Modified

2025-10-18

Overview

Our package ctrdata has been continually developed and offered for the R system since 2015. It facilitates investigating and understanding trends in the design and conduct of trials and their availability for participants, and using their protocols and results for research and meta-analyses. The package can be used with information and documents available in the registers CTIS, EUCTR, CTGOV2 and ISRCTN.

Its features include:

  • Protocol- and results-related trial information is easily downloaded, including any trial documents available in registers.
  • Information is stored as JSON in a document-centric database (DuckDB, PostgreSQL, RSQLite or MongoDB), for fast offline access.
  • Functions help to find active substance synonyms, identify unique (de-duplicated) records across registers, merge and recode fields, and easily access deeply-nested fields.

Getting started

Package ctrdata is on CRAN: https://cran.r-project.org/package=ctrdata. Within R, package ctrdata can be installed with: install.packages("ctrdata"). See code below for how to install the latest working development version. The documentation shows a full simple workflow and has Articles with detailed trial analysis: https://rfhb.github.io/ctrdata/. Issues, feature requests and bug reports are most welcome at https://github.com/rfhb/ctrdata/issues and general comments at the bottom of this page.
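The development version can be installed from the GitHub repository that hosts the issue tracker above. The following is a minimal sketch; using package remotes as the installer is an assumption made here for illustration, and other installers for GitHub-hosted packages should work equally well.

```r
# Install the helper package 'remotes' if it is not yet available
# (assumption: 'remotes' is only one of several possible installers)
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# Install the development version from the repository
# referenced in the issue tracker URL above
remotes::install_github("rfhb/ctrdata")

# Check which version is now installed
utils::packageVersion("ctrdata")
```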

Milestones

  • July 2025, EUCTR: save documents into trial folder, optionally load just one protocol of multi-country trials.
  • Version 1.23.0 got a technology upgrade, a speed-up for EUCTR, the register with a large amount of structured results data.
  • From March 2025, new functions for easy analysis of trial concepts across registers and retrieval of trials from simple user input.
  • In November 2024, accelerated database and trial data functions through contributions to RSQLite 2.3.8 and nodbi 0.11.0.
  • Since 2024-06-30, support CTIS2 as relaunched on 2024-06-17, make queries to CTGOV (retired on 2024-06-25) work with CTGOV2.
  • Version 1.18.0 can retrieve historic versions of studies as structured data from CTGOV2 (example), published on 2024-05-13.
  • By November 2023, freed from all dependencies on command line tools of the operating system.
  • The European Union’s new Clinical Trials Information System (CTIS) has been supported since March 2023.
  • Since 2019, ctrdata has supported several databases (through package nodbi, now maintained by the same author).
  • Results data have been supported from the EU Clinical Trials Register since 2017 and from ClinicalTrials.gov since 2015.
  • On 15 September 2015, package ctrdata was first published on CRAN.

Disclaimer

When using package ctrdata, the registers’ terms and conditions need to be respected; they are shown with ctrOpenSearchPagesInBrowser(copyright = TRUE). Please cite package ctrdata in any publication as: “Ralf Herold (2025). ctrdata: Retrieve and Analyze Clinical Trials in Public Registers. R package version 1.23.0, https://cran.r-project.org/package=ctrdata”.

References

Package ctrdata has been used for unpublished work and for:

  • Jong et al. (2025) Experiences with Low-Intervention Clinical Trials—the New Category under the European Union Clinical Trials Regulation. Clinical Trials https://doi.org/10.1177/17407745241309293
  • Lopez-Rey et al. (2025) Use of Bayesian Approaches in Oncology Clinical Trials: A Cross-Sectional Analysis. Frontiers in Pharmacology https://doi.org/10.3389/fphar.2025.1548997
  • Russek et al. (2025) Supplementing Single-Arm Trials with External Control Arms—Evaluation of German Real-World Data. Clinical Pharmacology & Therapeutics https://doi.org/10.1002/cpt.3684
  • Alzheimer’s disease Horizon Scanning Report (2024)
  • Kundu et al. (2024) Analysis of Factors Influencing Enrollment Success in Hematology Malignancy Cancer Clinical Trials (2008-2023). Blood Meeting Abstracts https://doi.org/10.1182/blood-2024-207446
  • Sood et al. (2022) Managing the evidence infodemic: Automation approaches used for developing NICE COVID-19 living guidelines. https://doi.org/10.1101/2022.06.13.22276242
  • Lasch et al. (2022) The Impact of COVID‐19 on the Initiation of Clinical Trials in Europe and the United States. https://doi.org/10.1002/cpt.2534
  • Blogging on Innovation coming to paediatric research
  • Cancer Research UK (2017) The impact of collaboration: The value of UK medical research to EU science and health

Code example

This example shows how to obtain information on trials of interest from all supported registers, for plotting their start and completion over time. For more sophisticated examples, see the Articles under Documentation above.

# Install our package ctrdata into the library
install.packages("ctrdata")

# Connect to (or newly create) an SQLite database.
# See vignette("nodbi-overview") for how to store in
# the file system and for connecting other databases
db <- nodbi::src_sqlite(collection = "some_collection_name")

# Define queries
qs <- ctrdata::ctrGenerateQueries(
  condition = "neuroblastoma",
  phase = "phase 2")

# Load trials from all 4 registers in one shot
lt <- lapply(
  qs[c("CTIS", "EUCTR", "CTGOV2", "ISRCTN")], 
  ctrdata::ctrLoadQueryIntoDb, 
  con = db)

# Calculate concepts across registers
ct <- ctrdata::dbGetFieldsIntoDf(
  fields = "ctrname",
  calculate = c(
    "f.startDate",
    "f.statusRecruitment",
    "f.isUniqueTrial"),
  con = db)

# Select unique trials
ct <- ct[ct$.isUniqueTrial, ]

# Inspect data frame
names(ct)
# [1] "_id"
# [2] "ctrname"
# [3] ".startDate"
# [4] ".statusRecruitment"
# [5] ".isUniqueTrial"

(tt <- table(ct$ctrname))
# CTGOV2  CTIS  EUCTR
#    243     2     53

# Example plot
library(ggplot2)
ggplot(ct) +
  stat_ecdf(
    aes(
      x = .startDate,
      colour = .statusRecruitment)) +
  labs(
    title = "Conduct of therapeutic-exploratory neuroblastoma trials",
    subtitle = paste0(
      "Data from ", paste0(names(tt), collapse = ", ")),
    x = "Date of start (proposed or realised)",
    y = "Cumulative proportion of trials",
    colour = "Current status",
    caption = Sys.Date()
  )
ggsave("ctrdata-codeexample.svg")

Graph from ctrdata code example above
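As a complement to the plot, the same data frame can be tabulated by year of start and register with base R alone. The sketch below uses a small stand-in for ct with the same column names as in the example above; all identifiers, dates and status values in it are made up for illustration and are not results from any register.

```r
# Stand-in for the data frame `ct` from the example above;
# all values are made up for illustration only
ct <- data.frame(
  `_id` = c("A-1", "A-2", "B-1", "B-2", "C-1"),
  ctrname = c("CTGOV2", "CTGOV2", "EUCTR", "EUCTR", "CTIS"),
  .startDate = as.Date(c(
    "2019-03-01", "2021-07-15", "2019-11-30", NA, "2023-02-01")),
  .statusRecruitment = c(
    "completed", "ongoing", "completed", "ongoing", "ongoing"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Trials without a start date cannot be placed on a time axis
ct_dated <- ct[!is.na(ct$.startDate), ]

# Extract the year of start as an integer
ct_dated$startYear <- as.integer(format(ct_dated$.startDate, "%Y"))

# Cross-tabulate year of start against register
trials_per_year <- table(ct_dated$startYear, ct_dated$ctrname)
print(trials_per_year)
```

With real data, the stand-in definition of ct would simply be dropped and the remaining lines applied to the data frame obtained from dbGetFieldsIntoDf().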

Data models

Package ctrdata uses the data models that are implicit in data retrieved from the different registers. For EUCTR, CTGOV2 and ISRCTN, the trial information retrieved by ctrdata corresponds to a documented data model of the respective register, see table row “Definitions of fields” on help("ctrdata-registers"). However, for CTIS, the data retrieved is the structured information with which the CTIS webapp is populated, and this information is merged by ctrdata into a data model that is reasonable and useful but does not (cannot) correspond to the (unknown) data model of this register. The approach is further explained here. A possible future development is to provide a mapping to a canonical data model, which however does not exist at the moment and will require an international approach and collaboration. At this time, ctrdata already provides functions for easy analysis of trial concepts across registers and their data models.
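The trial concepts mentioned above can be tried out without downloading anything. This is a hedged sketch: it assumes that the installed ctrdata ships a small demo database at extdata/demo.sqlite with a collection named "my_trials", and that the concept functions are exported with names starting with "f." as in the code example earlier on this page. Both are assumptions to verify against the package documentation.

```r
# Assumption: ctrdata includes a demo database at this path;
# stop early with a clear error if it is not found
demo_path <- system.file("extdata", "demo.sqlite", package = "ctrdata")
stopifnot(nzchar(demo_path))

# Open the demo database read-only
# (assumption: the collection is named "my_trials")
db_demo <- nodbi::src_sqlite(
  dbname = demo_path,
  collection = "my_trials",
  flags = RSQLite::SQLITE_RO)

# List exported functions whose names start with "f.",
# which is how trial concepts are named in the example above
concepts <- sort(grep(
  "^f[.]", getNamespaceExports("ctrdata"), value = TRUE))
print(concepts)

# Calculate two concepts across the registers in the collection
df <- ctrdata::dbGetFieldsIntoDf(
  fields = "ctrname",
  calculate = c("f.startDate", "f.statusRecruitment"),
  con = db_demo)

# Tabulate the harmonised recruitment status by register
table(df$ctrname, df$.statusRecruitment, useNA = "ifany")
```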

To explore the structure, field names and values of a single trial from one of the registers, use function ctrShowOneTrial() in package ctrdata and this post, which provides an interactive widget such as below. To find any fields of interest in all trials in the database collection, use dbFindFields().
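Both functions can be sketched as follows. As before, the demo database path and the collection name "my_trials" are assumptions about what the installed package provides; with your own data, the connection object db from the code example would be used instead. The widget is only opened in an interactive session.

```r
# Assumption: ctrdata includes a demo database at this path
demo_path <- system.file("extdata", "demo.sqlite", package = "ctrdata")
stopifnot(nzchar(demo_path))

# Open the demo database read-only
# (assumption: the collection is named "my_trials")
db_demo <- nodbi::src_sqlite(
  dbname = demo_path,
  collection = "my_trials",
  flags = RSQLite::SQLITE_RO)

# Find fields whose names contain "date" in any trial of the collection
date_fields <- ctrdata::dbFindFields(namepart = "date", con = db_demo)
head(date_fields)

# Get identifiers of de-duplicated trials and pick the first one
ids <- ctrdata::dbFindIdsUniqueTrials(con = db_demo)
one_id <- ids[[1]]

# Explore structure, field names and values of this single trial
if (interactive()) {
  ctrdata::ctrShowOneTrial(identifier = one_id, con = db_demo)
}
```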

ctrdata function ctrShowOneTrial