relational-datasets

Overview

relational-datasets is a library to load benchmark datasets for relational learning.


Basic Usage

The main use case is loading the training and test sets for a given fold. For example, to load fold 2 of the webkb dataset:

from relational_datasets import load

train, test = load("webkb", fold=2)

The library also helps bridge the gap with vector-structured data by providing functions that convert standard datasets, such as NumPy arrays:

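from relational_datasets.convert import from_numpy
import numpy as np

data, modes = from_numpy(
  # Feature matrix: three examples (rows), three integer-valued features (columns)
  np.array([[0, 1, 1], [0, 1, 2], [1, 2, 2]]),
  # One binary label per example
  np.array([0, 0, 1]),
)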
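To give a feel for what such a conversion involves, here is a minimal, pure-Python sketch of one possible encoding. It is not the library's implementation: the `rows_to_facts` helper, the `v1`, `v2`, ... predicate names, the `id1`, `id2`, ... entity names, and the `target` predicate are all hypothetical, and the output of `from_numpy` may be laid out differently.

```python
def rows_to_facts(X, y):
    """Illustrative only: encode tabular rows as relational facts.

    Each row becomes an entity (id1, id2, ...), each column becomes a
    binary predicate (v1, v2, ...), and each label becomes a positive
    or negative example of a hypothetical `target` predicate.
    """
    facts, pos, neg = [], [], []
    for i, (row, label) in enumerate(zip(X, y), start=1):
        entity = f"id{i}"
        for j, value in enumerate(row, start=1):
            # One fact per (example, feature) pair, e.g. v2(id1,1).
            facts.append(f"v{j}({entity},{value}).")
        example = f"target({entity})."
        # Label 1 is treated as a positive example, anything else as negative.
        (pos if label == 1 else neg).append(example)
    return facts, pos, neg


facts, pos, neg = rows_to_facts(
    [[0, 1, 1], [0, 1, 2], [1, 2, 2]],
    [0, 0, 1],
)
```

With the same toy data as above, this sketch yields nine facts (three features for each of three examples), one positive example, and two negative examples.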

As a more realistic example, the same function can convert the “Breast Cancer Wisconsin” dataset from scikit-learn once its continuous features have been discretized:

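from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer
from relational_datasets.convert import from_numpy

# Load the features and binary labels, then hold out a test split
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Bin each continuous feature into 5 ordinal categories.
# The bin edges are fit on the training split only and reused for the test split.
disc = KBinsDiscretizer(n_bins=5, encode="ordinal")
X_train = disc.fit_transform(X_train).astype(int)
X_test = disc.transform(X_test).astype(int)

# Convert both splits; the modes from the training split are kept
train, modes = from_numpy(X_train, y_train)
test, _ = from_numpy(X_test, y_test)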
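The discretization step above maps each continuous value to a small integer bin index before conversion. As a rough illustration of the idea, the sketch below bins a single feature with equal-width bins in plain Python. Note the swap in technique: `KBinsDiscretizer` uses a quantile strategy by default, so this is a simpler stand-in rather than an equivalent, and `equal_width_bins` is a hypothetical helper that is not part of either library.

```python
def equal_width_bins(values, n_bins=5):
    """Illustrative only: map each value to an integer bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; put everything in bin 0.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


bins = equal_width_bins([0.0, 2.5, 5.0, 7.5, 10.0], n_bins=5)
```

In a real pipeline the bin edges would be computed from the training split and reused on the test split, as the scikit-learn example does with `fit_transform` followed by `transform`.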

Installation

The latest stable version can be installed from PyPI using pip:

pip install relational-datasets