SRLBoost

Overview

SRLBoost: ⚡ Fast implementations of boosted relational dependency networks and Markov logic networks.


Getting Started

SRLBoost is built with Maven. For example, on Windows:

git clone https://github.com/srlearn/SRLBoost.git
cd .\SRLBoost\
mvn package

Learning should then feel familiar if you’ve used other distributions like BoostSRL. After replacing X.Y.Z with the latest version:

java -jar .\target\srlboost-X.Y.Z-jar-with-dependencies.jar -l -train .\data\Toy-Cancer\train\ -target cancer
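
I believe SRLBoost keeps BoostSRL’s inference interface, where -i runs inference against a -test directory using the model written under the training directory’s models/ folder, but treat the flags and paths below as assumptions rather than documented behavior. Scoring the held-out Toy-Cancer test set would then look something like:

java -jar .\target\srlboost-X.Y.Z-jar-with-dependencies.jar -i -model .\data\Toy-Cancer\train\models\ -test .\data\Toy-Cancer\test\ -target cancer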

Full notes are available with the repository: https://github.com/srlearn/SRLBoost#getting-started


Motivation

  • I was one of the main people behind releasing “BoostSRL,” but wanted to go in a different direction with the software.
  • At one point there was some discussion about developing a “BoostSRL-Lite” implementation, but it didn’t really go anywhere (and, as you’ll see in the benchmarks, it wasn’t especially lite).

SRLBoost aims to be a small and fast core—not to implement every possible feature.


Benchmarks

Size Comparison

“BoostSRL-Lite” cut around 6,000 lines of Java out of “BoostSRL.”

“SRLBoost” cut close to 50,000 lines of code.

This graph was made at SRLBoost commit cb952a4, BoostSRL-Lite commit e198b76, and BoostSRL v1.1.1 commit 4f0ad2b. Lines of code were measured with cloc-1.84, and the number of Java files in each source directory was also counted.
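
For anyone repeating the measurement: cloc’s --include-lang option restricts the count to Java, so running something like the following from each repository’s root should come close (the .\src\ path is an assumption about each project’s layout):

cloc --include-lang=Java .\src\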

Time Comparison

  • “BoostSRL” and “BoostSRL-Lite” are nearly indistinguishable in runtime.
  • “SRLBoost” is at least twice as fast.

The following diagram compares learning time (in seconds) for the three implementations on three benchmark datasets. On larger datasets like imdb, SRLBoost took an average of 5 seconds, while the other two implementations took close to 20 seconds:

Each of these datasets has 4-5 cross-validation folds. For each fold, I measured the time it took to learn a boosted relational dependency network. This procedure was repeated 10 times, and the 40-50 runs for each method were averaged to estimate the mean learning time. The box-and-whisker plots show the median runtime and the interquartile range of the total learning time, with diamonds marking outliers. SRLBoost (blue) is always the fastest.
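
As a rough sketch of that procedure (this is not the actual benchmark harness; the fold layout, dataset paths, target predicate, and output file below are made up for illustration), the timing loop in a POSIX shell could look like:

# time 10 repeats of learning on every fold and append the results to a CSV
for run in 1 2 3 4 5 6 7 8 9 10; do
  for fold in fold1 fold2 fold3 fold4 fold5; do
    start=$(date +%s)
    java -jar target/srlboost-X.Y.Z-jar-with-dependencies.jar -l \
      -train data/imdb/"$fold"/train/ -target workedunder
    end=$(date +%s)
    echo "imdb,$fold,$((end - start))" >> learning-times.csv
  done
done

Averaging the recorded seconds for each implementation then gives the mean learning times plotted above.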

On large datasets with lots of relations (like cora), this difference is even more pronounced. SRLBoost is so much faster that it’s difficult to visualize the difference on a linear scale:

The tiny bar on the left shows that the average SRLBoost time for cora is around 17 seconds, compared to 270 seconds for BoostSRL and BoostSRL-Lite.

Are there any downsides?

Metrics are indistinguishable on the first three datasets. But on the cora benchmark, the roughly 15x speedup also led to differences in some key metrics: AUC-ROC decreased by 0.04 and AUC-PR decreased by 0.01.

BoostSRL v1.1.1 appeared to have significantly worse F1 than the other two implementations, but it’s unclear why.[1]

Implementation     cora mean AUC ROC   cora mean AUC PR   cora mean CLL   cora mean F1
SRLBoost           0.61                0.93               -0.27           0.96
BoostSRL-Lite      0.65                0.94               -0.29           0.96
BoostSRL v1.1.1    0.65                0.94               -0.29           0.78

Conclusion

I’m implementing this as the core for srlearn, so most of the user-facing interfaces to SRLBoost are documented there.

  [1] My best guess is that this is a bug introduced when thresholding changed between v1.0 and v1.1 (see commit 5a91ba0). If that’s the case, there might be a threshold setting that makes the two behave identically.