Clinical trials analyses: ctrdata, a tool for leveraging register data

A readily-usable tool with a permissive licence for CTIS, EUCTR, CTGOV and ISRCTN with examples addressing research questions
tools
code
trials
Author

Ralf Herold

Published

2025-04-27

Modified

2026-01-26

Overview

Our package ctrdata has been continually developed and offered for the R system since 2015. It facilitates investigating and understanding trends in the design and conduct of trials and their availability for participants, and it supports using their protocols and results for research and meta-analyses. The package can be used with information and documents available in the four registers CTIS, EUCTR, CTGOV and ISRCTN.

Its features include:

  • Trials and studies can easily be searched and found across the four registers in one go
  • Structured protocol- and results-related trial information is readily downloaded, including documents available from registers
  • Information is stored as JSON in a document database (DuckDB, PostgreSQL, RSQLite or MongoDB), for fast offline access
  • Trial concepts are pre-defined for readily analysing them across different registers; unique (de-duplicated) records can be identified; deeply-nested fields such as in results can easily be accessed
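
As a minimal sketch of the database feature listed above, the following connects to a file-based SQLite database so that downloaded trial records remain available offline between R sessions. The file path and the collection name are hypothetical. The argument names shown in comments for the other backends are given as understood from the nodbi documentation and should be checked against vignette("nodbi-overview").

```r
# Hypothetical file path and collection name, chosen for illustration
db_path <- file.path(tempdir(), "trials.sqlite")

# File-based SQLite database: records persist for offline access
db_file <- nodbi::src_sqlite(
  dbname = db_path,
  collection = "my_trials")

# Other backends follow the same pattern
# (assumed argument names, not run here):
# nodbi::src_duckdb(dbdir = "trials.duckdb", collection = "my_trials")
# nodbi::src_mongo(db = "trials", collection = "my_trials")
# For PostgreSQL, the collection is set after connecting:
# db_pg <- nodbi::src_postgres(dbname = "trials")
# db_pg[["collection"]] <- "my_trials"

# The connection object is passed as `con` to ctrdata functions
class(db_file)
```

Whichever backend is chosen, the rest of a workflow stays the same, because ctrdata functions only need the connection object.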

An introduction together with worked examples and technical explanations has recently been published:

  • Herold R. Aggregating and analysing clinical trials data from multiple public registers using R package ctrdata. Research Synthesis Methods. Published online 2025:1-33 doi:10.1017/rsm.2025.10061 (code)

Getting started

Package ctrdata is on CRAN: https://cran.r-project.org/package=ctrdata. Within R, package ctrdata can be installed with install.packages("ctrdata"). The latest working development version is available from https://github.com/rfhb/ctrdata. The documentation shows a simple, complete workflow and has articles with detailed trial analyses; see https://rfhb.github.io/ctrdata/. Issues, feature requests and bug reports are most welcome at https://github.com/rfhb/ctrdata/issues or through links in the top right corner of this page.

Milestones

  • January 2026: more robust throttling and retrying of data and document downloads
  • July 2025: save EUCTR documents into folders by trial, and optionally load just one protocol of multi-country trials
  • Version 1.23.0 got a technology upgrade with speed-ups, including for EUCTR with its highly structured results data
  • March 2025: new functions for easy analyses of trial concepts across registers and finding trials from simple input
  • In November 2024, accelerated database and trial data functions by contributions to RSQLite 2.3.8 and nodbi 0.11.0
  • Since 2024-06-30, support for CTIS as relaunched on 2024-06-17 and translation of queries for classic CTGOV (retired 2024-06-25) to the current CTGOV2
  • Version 1.18.0 of 2024-05-13 retrieves historic versions of studies as structured data from CTGOV2 (example)
  • By November 2023, removed all dependencies on command line tools of the operating system
  • The European Union’s new Clinical Trials Information System (CTIS) is supported since March 2023
  • Since 2019, ctrdata supports several databases (through package nodbi, now maintained by the same author)
  • Results data are supported from the EU Clinical Trials Register since 2017, from ClinicalTrials.gov since 2015
  • On 15 September 2015, package ctrdata was first published on CRAN.

Disclaimer

When using package ctrdata, the registers’ terms and conditions need to be respected; they are shown with ctrOpenSearchPagesInBrowser(copyright = TRUE). Please cite the package (print(citation("ctrdata"), style = "text")) in any publication as:

  • Herold R (2025). “Aggregating and analysing clinical trials data from multiple public registers using R package ctrdata.” Research Synthesis Methods, 1–33. https://doi.org/10.1017/rsm.2025.10061, or
  • Herold R (2025). ctrdata: Retrieve and Analyze Clinical Trials Data from Public Registers. doi:10.32614/CRAN.package.ctrdata, R package version 1.25.1, https://CRAN.R-project.org/package=ctrdata

References

Package ctrdata has been used for unpublished work and for:

Code example

This example shows how to obtain information on trials of interest from all supported registers and how to plot their start dates by recruitment status over time. For more sophisticated examples, see the articles above.

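# Install our package ctrdata into the library
install.packages("ctrdata")
# Alternatively, install the latest development version from GitHub
# (assumes that package remotes is installed):
# remotes::install_github("rfhb/ctrdata")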

# Connect to (or newly create) an SQLite database.
# See vignette("nodbi-overview") for how to store in
# the file system and for connecting other databases
db <- nodbi::src_sqlite(collection = "some_collection_name")

# Define queries
qs <- ctrdata::ctrGenerateQueries(
  condition = "neuroblastoma",
  phase = "phase 2")

# Load trials from all 4 registers in one go
lt <- lapply(
  qs[c("CTIS", "EUCTR", "CTGOV2", "ISRCTN")], 
  ctrdata::ctrLoadQueryIntoDb, 
  con = db)

# Calculate concepts across registers
ct <- ctrdata::dbGetFieldsIntoDf(
  fields = "ctrname",
  calculate = c(
    "f.startDate",
    "f.statusRecruitment",
    "f.isUniqueTrial"),
  con = db)

# Select unique trials
ct <- ct[ct$.isUniqueTrial, ]

# Inspect data frame
names(ct)
# [1] "_id"
# [2] "ctrname"
# [3] ".startDate"
# [4] ".statusRecruitment"
# [5] ".isUniqueTrial"

table(ct$ctrname)
# CTGOV2  CTIS  EUCTR
#    243     2     53

# Example plot
library(ggplot2)
ggplot(ct) +
  stat_ecdf(
    aes(
      x = .startDate,
      colour = .statusRecruitment)) +
  labs(
    title = "Conduct of therapeutic-exploratory neuroblastoma trials",
    subtitle = paste0(
      # Name the registers that contribute unique trials to the plot
      "Data from ", paste0(names(table(ct$ctrname)), collapse = ", ")),
    x = "Date of start (proposed or realised)",
    y = "Cumulative proportion of trials",
    colour = "Current status",
    caption = Sys.Date()
  )
ggsave("ctrdata-codeexample.svg")

Graph from ctrdata code example above

Data models

Package ctrdata uses the data models that are implicit in data retrieved from the different registers.

For EUCTR, CTGOV and ISRCTN, the trial information retrieved by ctrdata corresponds to a documented data model of the respective register; see table row “Definitions of fields” in help("ctrdata-registers"). For CTIS, however, the data retrieved are the structured information with which the CTIS web application is populated. ctrdata merges this information into a data model that is reasonable and useful but does not, and cannot be expected to, correspond to the (unknown) data model of this register. The approach is further explained here.

A possible future development is to provide a mapping to a canonical data model. Such a model does not exist at the moment and will require an international approach and collaboration. At this time, ctrdata already provides functions for easy analysis of trial concepts across registers and their data models.

To explore the data structure, field names and field values of a single trial from one of the registers, use function ctrShowOneTrial() in package ctrdata or see this post, which provides an interactive widget such as the one below. To find all fields or specific fields of interest in all trials in the database collection, use dbFindFields().
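
The two ways of exploring fields can be sketched as follows. The sketch assumes that the connection `db` and the data frame `ct` from the code example exist and that trials have been loaded into the collection. The search term "date" is only an illustration.

```r
# Assumes `db` and `ct` were created as in the code example
# and that trials have been loaded into the collection

# Find fields whose name contains "date"; by default this
# may inspect only a sample of records, see help("dbFindFields")
date_fields <- ctrdata::dbFindFields(namepart = "date", con = db)
head(date_fields)

# Show structure, field names and field values of one trial,
# here the first trial identifier in the data frame
ctrdata::ctrShowOneTrial(identifier = ct[["_id"]][1], con = db)
```

Field names found in this way can then be passed to the `fields` argument of dbGetFieldsIntoDf(), as done with "ctrname" in the code example.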

ctrdata function ctrShowOneTrial