Relational Dataset Archive


An archive of standard, versioned benchmark relational datasets.


Basic Usage

This is a collection of datasets that pass every check in the “Relational Data Linter.”

Many are split into standard cross-validation folds for benchmarking relational learning and inference algorithms.

One of these libraries can be used to manage these datasets locally: relational_datasets (Python) or RelationalDatasets (Julia).

Screenshot of the GitHub assets for datasets 0.0.5. It shows a table of dataset names, version numbers, and their size in bytes.

Contributing a Dataset

I would love more datasets, and I would love feedback on whether this is useful to your research!

I drew quite a bit of inspiration for this from Jonas Schouterden’s RelationalDatasets repository.

Data Versioning and Downloading

Specific Version: Each version of a data archive can be downloaded by requesting a URL with the following pattern, where {VERSION} is a tag and {NAME} is the name of a dataset: {VERSION}/{NAME}_{VERSION}.zip
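As a rough illustration of this pattern, the sketch below fills in {VERSION} and {NAME} with Python string formatting. The function name archive_url and its prefix argument are hypothetical: prefix stands for the part of the URL that comes before {VERSION}, which is not given here, and joining it with a single slash is an assumption. The example.invalid host is a placeholder, not the archive's real address.

```python
def archive_url(prefix: str, name: str, version: str) -> str:
    """Build a download URL following {VERSION}/{NAME}_{VERSION}.zip.

    `prefix` is a hypothetical stand-in for whatever precedes {VERSION}
    in the real URL; it is not specified in this document.
    """
    # Assumption: the prefix and the pattern are joined by a single slash.
    return f"{prefix.rstrip('/')}/{version}/{name}_{version}.zip"


# Placeholder prefix for illustration only (not the archive's real host).
url = archive_url("https://example.invalid/archive", "toy_cancer", "v0.0.4")
print(url)
```

Keeping the version in both the directory and the file name means a downloaded file still records which release it came from after it is moved.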



Download version v0.0.4 of toy_cancer:

curl -L >

Download version v0.0.4 of webkb:

curl -L >
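Where shelling out to curl is inconvenient, the same download can be sketched with the Python standard library. This is a minimal sketch and not part of either library: download_and_extract is a hypothetical helper, and the caller supplies the full URL following the {VERSION}/{NAME}_{VERSION}.zip pattern.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url: str, dest_dir: str) -> list[str]:
    """Download a zip archive from `url` and unpack it into `dest_dir`.

    Hypothetical helper; returns the names of the archive members.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # urlopen follows HTTP redirects by default, much like `curl -L`.
    with tempfile.NamedTemporaryFile(suffix=".zip", delete=False) as tmp:
        with urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, tmp)
        tmp_path = Path(tmp.name)

    try:
        with zipfile.ZipFile(tmp_path) as archive:
            archive.extractall(dest)
            return archive.namelist()
    finally:
        # Remove the temporary zip once its contents are extracted.
        tmp_path.unlink()
```

Calling download_and_extract(url, "data/toy_cancer") with a valid archive URL would leave the unpacked files under data/toy_cancer; the directory name is only an example.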


Load version v0.0.4 of toy_cancer in Python:

from relational_datasets import load

train, test = load("toy_cancer", "v0.0.4")


Load version v0.0.4 of toy_cancer in Julia:

using RelationalDatasets

train, test = load("toy_cancer", "v0.0.4")