Relational Dataset Archive

Overview

Relational Dataset Archive: An archive of standard, versioned benchmark relational datasets.


Source Code: https://github.com/srlearn/datasets
Documentation (Latest): https://srlearn.github.io/relational-datasets/downloads/

Basic Usage

This is a collection of datasets that have passed all checks set in the “Relational Data Linter.”

Many are split into standard cross-validation folds for benchmarking relational learning and inference algorithms.

One of these libraries can be used to manage these datasets locally:

Python: relational-datasets
Julia: RelationalDatasets.jl

Contributing a Dataset

I would love more datasets, and I would love any feedback on whether this is useful to your research!

Email me at hayesall@iu.edu
or open an issue on GitHub here: https://github.com/srlearn/datasets/issues

I drew quite a bit of inspiration for this from Jonas Schouterden’s RelationalDatasets repository.

Data Versioning and Downloading

Specific Version: Each version of a data archive may be downloaded by sending a request to a URL with the following pattern, where {VERSION} is a release tag and {NAME} is the name of a dataset:

https://github.com/srlearn/datasets/releases/download/{VERSION}/{NAME}_{VERSION}.zip
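As a minimal sketch of how the pattern expands, the helper below fills in {NAME} and {VERSION} with plain string formatting. The function name `release_url` is hypothetical and is not part of either library listed above.

```python
def release_url(name: str, version: str) -> str:
    """Build the download URL for one dataset archive at one release tag.

    `release_url` is an illustrative helper, not an API of relational-datasets.
    """
    base = "https://github.com/srlearn/datasets/releases/download"
    # {VERSION} appears twice: once as the tag directory, once in the file name.
    return f"{base}/{version}/{name}_{version}.zip"


print(release_url("toy_cancer", "v0.0.4"))
```

Note that the version tag appears both in the path and in the archive file name, so both must be updated together when requesting a different release.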

Examples

curl

Download version v0.0.4 of toy_cancer:

curl -L https://github.com/srlearn/datasets/releases/download/v0.0.4/toy_cancer_v0.0.4.zip > toy_cancer_v0.0.4.zip

Download version v0.0.4 of webkb:

curl -L https://github.com/srlearn/datasets/releases/download/v0.0.4/webkb_v0.0.4.zip > webkb_v0.0.4.zip
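The same download can be done without curl. The sketch below uses only the Python standard library; `download_archive` and `list_members` are hypothetical helper names, and nothing here is assumed about the layout of files inside an archive.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def download_archive(url: str, dest: Path) -> Path:
    """Stream the archive at `url` to `dest` and return the destination path."""
    # urlopen follows HTTP redirects by default, similar to `curl -L`.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def list_members(archive: Path) -> list[str]:
    """Return the file names stored in a downloaded zip archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()
```

For example, calling `download_archive("https://github.com/srlearn/datasets/releases/download/v0.0.4/toy_cancer_v0.0.4.zip", Path("toy_cancer_v0.0.4.zip"))` mirrors the first curl command, and `list_members` can then be used to inspect what the archive contains.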

relational-datasets

Load version v0.0.4 of toy_cancer:

from relational_datasets import load

train, test = load("toy_cancer", "v0.0.4")

RelationalDatasets.jl

Load version v0.0.4 of toy_cancer:

using RelationalDatasets

train, test = load("toy_cancer", "v0.0.4")