Overview
SRLBoost: ⚡ Fast implementations of boosted relational dependency networks and Markov logic networks.
- Source Code: https://github.com/srlearn/SRLBoost
Getting Started
SRLBoost can be built as a Maven package. For example, on Windows:
git clone https://github.com/srlearn/SRLBoost.git
cd .\SRLBoost\
mvn package
Once the build finishes, learning should feel familiar if you have used other
distributions like BoostSRL. Replace X.Y.Z with the latest version:
java -jar .\target\srlboost-X.Y.Z-jar-with-dependencies.jar -l -train .\data\Toy-Cancer\train\ -target cancer
Full notes are available with the repository: https://github.com/srlearn/SRLBoost#getting-started
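If you would rather script the learning step than retype it, here is a minimal Python sketch that assembles the same command from a version string. The helper name `build_learn_command` and its default arguments are hypothetical and not part of SRLBoost; only the jar naming pattern and the `-l`, `-train`, and `-target` flags come from the command above.

```python
import os
from pathlib import Path


def build_learn_command(version, repo_root=".",
                        train_dir="data/Toy-Cancer/train", target="cancer"):
    """Assemble the learning command shown above as an argument list.

    Hypothetical helper: it only builds the list and does not run Java.
    """
    root = Path(repo_root)
    # Jar name follows the pattern produced by `mvn package` in the post.
    jar = root / "target" / f"srlboost-{version}-jar-with-dependencies.jar"
    # Keep the trailing separator, mirroring the original command.
    train = str(root / train_dir) + os.sep
    return ["java", "-jar", str(jar), "-l", "-train", train, "-target", target]


# X.Y.Z is a placeholder, exactly as in the post.
print(" ".join(build_learn_command("X.Y.Z")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the repository root should launch the same learning run, assuming `java` is on your path and the package has already been built.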
Motivation
- I was one of the main people behind releasing “BoostSRL,” but wanted to go in a different direction with the software.
- At one point there was discussion around developing a “BoostSRL-Lite” implementation. But this didn’t really go anywhere (and as you’ll see in the benchmark, it wasn’t especially lite).
SRLBoost aims to be a small and fast core—not to implement every possible feature.
Benchmarks
Size Comparison
“BoostSRL-Lite” cut around 6,000 lines of Java out of “BoostSRL.”
“SRLBoost” cut close to 50,000 lines of code.
Time Comparison
- “BoostSRL” and “BoostSRL-Lite” are nearly indistinguishable in terms of runtime
- “SRLBoost” is at least twice as fast
The following diagram compares the learning time (in seconds)
for the three implementations on three benchmark datasets.
On larger datasets like imdb, SRLBoost took an average of 5 seconds
while the other two implementations took close to 20 seconds.
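To put the imdb numbers in the same terms as the "at least twice as fast" claim, here is a quick sketch of the arithmetic. It uses only the rounded averages quoted here (about 20 seconds versus about 5 seconds), so the result is approximate; the function name `speedup` is just for illustration.

```python
def speedup(baseline_seconds, new_seconds):
    """Return how many times faster the new runtime is than the baseline."""
    return baseline_seconds / new_seconds


# Rounded averages from the post: ~20 s for BoostSRL / BoostSRL-Lite, ~5 s for SRLBoost.
imdb_speedup = speedup(20.0, 5.0)
print(f"imdb speedup: about {imdb_speedup:.0f}x")
```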
On large datasets with lots of relations (like cora), this difference is even
more pronounced. SRLBoost is so much faster that it’s difficult to
visualize the difference on a linear scale.
Are there any downsides?
Metrics are indistinguishable on the first three datasets.
But on the cora benchmark, being 15x faster also came with
differences in some key metrics. Specifically, compared to the other two
implementations, AUC-ROC decreased by 0.04 and AUC-PR decreased by 0.01.
BoostSRL-v1.1.1 appeared to have significantly worse F1 compared to the other two implementations, but it’s unclear why.1
Implementation | cora mean AUC ROC | cora mean AUC PR | cora mean CLL | cora mean F1 |
---|---|---|---|---|
SRLBoost | 0.61 | 0.93 | -0.27 | 0.96 |
BoostSRL-Lite | 0.65 | 0.94 | -0.29 | 0.96 |
BoostSRL-v1.1.1 | 0.65 | 0.94 | -0.29 | 0.78 |
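The differences quoted above can be read straight off the table. Here is a small sketch that recomputes them, with the values copied from the table; the dictionary layout and the `delta` helper are only for illustration, and results are rounded to two decimals because the table itself is.

```python
# Mean cora metrics, copied from the table above.
metrics = {
    "SRLBoost": {"auc_roc": 0.61, "auc_pr": 0.93, "cll": -0.27, "f1": 0.96},
    "BoostSRL-Lite": {"auc_roc": 0.65, "auc_pr": 0.94, "cll": -0.29, "f1": 0.96},
    "BoostSRL-v1.1.1": {"auc_roc": 0.65, "auc_pr": 0.94, "cll": -0.29, "f1": 0.78},
}


def delta(metric, new="SRLBoost", baseline="BoostSRL-Lite"):
    """Difference (new minus baseline) for one metric, rounded to 2 decimals."""
    return round(metrics[new][metric] - metrics[baseline][metric], 2)


for name in ("auc_roc", "auc_pr", "cll", "f1"):
    print(f"{name}: {delta(name):+.2f}")
```

Against BoostSRL-Lite, this gives -0.04 for AUC-ROC and -0.01 for AUC-PR, matching the text, while CLL moves by +0.02 and F1 is unchanged.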
Conclusion
I’m implementing this as the core for srlearn, so most of the user interfaces
for using SRLBoost are documented there.