Overview
Relational Dataset Archive: An archive of standard, versioned benchmark relational datasets.
- Source Code: https://github.com/srlearn/datasets
- Documentation (Latest): https://srlearn.github.io/relational-datasets/downloads/
Basic Usage
This is a collection of datasets that have passed all checks set in the “Relational Data Linter.”
Many are split into standard cross-validation folds for benchmarking relational learning and inference algorithms.
One of these libraries can be used to manage these datasets locally:
- Python: relational-datasets
- Julia: RelationalDatasets.jl
Contributing a Dataset
I would love more datasets, and I would love any feedback on whether this is useful to your research!
- Email me at hayesall@iu.edu
- or open an issue on GitHub here: https://github.com/srlearn/datasets/issues
I drew quite a bit of inspiration for this from Jonas Schouterden’s RelationalDatasets repository.
Data Versioning and Downloading
Specific Version: Versions of each data archive may be downloaded by sending requests to a URL with the following pattern, where {VERSION} represents a tag and {NAME} is the name for a dataset:
https://github.com/srlearn/datasets/releases/download/{VERSION}/{NAME}_{VERSION}.zip
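The pattern above can be filled in programmatically. The following is a minimal sketch in Python; the helper name `archive_url` and the constant `RELEASE_URL` are illustrative assumptions and are not part of the relational-datasets package.

```python
# Template copied from the URL pattern above; the placeholder names are lowercase
# here only to suit Python's str.format.
RELEASE_URL = (
    "https://github.com/srlearn/datasets/releases/download/"
    "{version}/{name}_{version}.zip"
)


def archive_url(name: str, version: str) -> str:
    """Return the release URL for one dataset archive (hypothetical helper)."""
    return RELEASE_URL.format(name=name, version=version)


print(archive_url("toy_cancer", "v0.0.4"))
# https://github.com/srlearn/datasets/releases/download/v0.0.4/toy_cancer_v0.0.4.zip
```

Note that the version tag appears twice in the URL: once as the release path and once in the file name.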
Examples
curl
Download version v0.0.4 of toy_cancer:
curl -L https://github.com/srlearn/datasets/releases/download/v0.0.4/toy_cancer_v0.0.4.zip > toy_cancer_v0.0.4.zip
Download version v0.0.4 of webkb:
curl -L https://github.com/srlearn/datasets/releases/download/v0.0.4/webkb_v0.0.4.zip > webkb_v0.0.4.zip
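Where curl is not available, the same download can be sketched with Python's standard library alone. The helpers `fetch_archive` and `list_members` below are hypothetical names introduced for illustration; they are not part of any srlearn package, and nothing is assumed here about the files inside each archive.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def fetch_archive(url: str, dest_dir: str = ".") -> Path:
    """Download a zip archive from `url` into `dest_dir` and return its local path.

    Hypothetical helper: the local file name is taken from the last URL segment.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / url.rsplit("/", 1)[-1]
    # urlopen follows HTTP redirects by default, similar to `curl -L`.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


def list_members(archive: Path) -> list:
    """Return the file names stored in a downloaded zip archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


# Example (requires network access):
# path = fetch_archive(
#     "https://github.com/srlearn/datasets/releases/download/v0.0.4/toy_cancer_v0.0.4.zip"
# )
# print(list_members(path))
```

Listing the members before extracting is a cheap way to confirm that the download completed and that the archive is a valid zip file.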
relational-datasets
Load version v0.0.4 of toy_cancer:
from relational_datasets import load
train, test = load("toy_cancer", "v0.0.4")
RelationalDatasets.jl
Load version v0.0.4 of toy_cancer:
using RelationalDatasets
train, test = load("toy_cancer", "v0.0.4")