Overview
relational-datasets is a library to load benchmark datasets for relational learning.
- Source Code: https://github.com/srlearn/relational-datasets
- Documentation (Latest): https://srlearn.github.io/relational-datasets/
Basic Usage
The main use is loading training and test folds. For example, we could load fold 2 of webkb:
from relational_datasets import load
train, test = load("webkb", fold=2)
The library also helps bridge the gap with vector-structured data by providing functions that convert standard datasets into a relational representation:
from relational_datasets.convert import from_numpy
import numpy as np
data, modes = from_numpy(
np.array([[0, 1, 1], [0, 1, 2], [1, 2, 2]]),
np.array([0, 0, 1]),
)
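To build intuition for what such a conversion involves, here is a minimal, self-contained sketch that turns an integer feature matrix and binary labels into predicate-style strings. The function name `to_relational_sketch`, the predicate names (`v0`, `v1`, ..., `target`), and the `id1`-style identifiers are assumptions made for illustration only. The actual output of `from_numpy` may be structured differently, so treat this as a conceptual picture rather than a description of the library's behavior.

```python
from typing import List, Sequence, Tuple


def to_relational_sketch(
    X: Sequence[Sequence[int]], y: Sequence[int]
) -> Tuple[List[str], List[str], List[str]]:
    """Hypothetical illustration, not the library's implementation.

    Each row becomes an entity (id1, id2, ...), each column value becomes
    a fact about that entity, and each binary label becomes a positive or
    negative example.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")

    pos: List[str] = []
    neg: List[str] = []
    facts: List[str] = []
    for row_index, (row, label) in enumerate(zip(X, y), start=1):
        entity = f"id{row_index}"
        # One fact per (entity, column) pair, carrying the discrete value.
        for col_index, value in enumerate(row):
            facts.append(f"v{col_index}({entity},{value}).")
        # Label 1 is treated as a positive example, anything else as negative.
        example = f"target({entity})."
        (pos if label == 1 else neg).append(example)
    return pos, neg, facts


# Same toy data as in the from_numpy call above, written as plain lists.
pos, neg, facts = to_relational_sketch(
    [[0, 1, 1], [0, 1, 2], [1, 2, 2]],
    [0, 0, 1],
)
```

In this sketch the third row is the only positive example, the first two rows are negatives, and the nine matrix cells each yield one fact.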
As a more realistic example, the same function can convert the “Breast Cancer Wisconsin” dataset from scikit-learn once its continuous features have been discretized:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer
from relational_datasets.convert import from_numpy
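# Load the features and binary labels as NumPy arrays.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
# Bin each continuous feature into 5 ordinal categories, fitting on the training split only.
disc = KBinsDiscretizer(n_bins=5, encode="ordinal")
X_train = disc.fit_transform(X_train).astype(int)
X_test = disc.transform(X_test).astype(int)
# Convert each split to relational form; the modes are taken from the training split.
train, modes = from_numpy(X_train, y_train)
test, _ = from_numpy(X_test, y_test)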
Installation
The latest stable version can be installed from PyPI using pip:
pip install relational-datasets
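To confirm that the installation succeeded, one option is to query the installed distribution's version with the standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical and not part of relational-datasets, and it only assumes the distribution name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("relational-datasets")
print(found if found is not None else "relational-datasets is not installed")
```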