Relational Data Linter

Overview

Relational Data Linter: A grammar and linter to check that relational or inductive logic programming datasets meet standards.

Source Code: https://github.com/srlearn/linter/
Documentation (Latest): https://srlearn.github.io/linter/

Download

Precompiled binaries are listed on the GitHub Releases page.

The latest version can be downloaded with these links:

Platform	Link
Linux/amd64	Download
macOS/amd64	Download
Windows/amd64	Download

Usage

Example 1: No Errors

When the dataset is well-formatted, nothing is returned.

Here are the contents of pos1.txt:

smokes(person1).
friends(person1,person2).
friends(person2,person1).

Running the linter produces no output—no issues are found.

$ ./linter -tokens -file=examples/pos/pos1.txt
$ ./linter -file=examples/pos/pos1.txt
# (No output for either case)
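For scripted checks, the same commands can be wrapped in a few lines of Python. This is a minimal sketch, not part of the linter: the helper names build_command and lint are hypothetical, the binary is assumed to sit at ./linter as in the commands above, and the only behavior relied on is the one this README describes (problems go to stderr, clean files produce no output). No assumption is made about the exit code.

```python
import subprocess


def build_command(path, linter="./linter", tokens=False):
    """Build the argument list for one linter run (hypothetical helper)."""
    command = [linter]
    if tokens:
        # Same -tokens flag as in the shell examples above.
        command.append("-tokens")
    command.append(f"-file={path}")
    return command


def lint(path, linter="./linter", tokens=False):
    """Run the linter and return the lines it wrote to stderr.

    An empty list is read as "no issues found", matching the README's
    description; the exit code is deliberately not inspected.
    """
    result = subprocess.run(
        build_command(path, linter, tokens),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stderr.splitlines()
```

Calling lint("examples/pos/pos1.txt") should then return an empty list for the file above, provided the binary is present in the working directory.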

Example 2: Bad Data

When the data contains something that cannot be recognized, the problems are written to stderr.

Here’s a file called neg1.txt:

friends(person1,person2).
Bad Data.

This file cannot be properly tokenized or parsed.

$ ./linter -tokens -file=examples/neg/neg1.txt
line 2:0 token recognition error at: 'B'
line 2:3 token recognition error at: ' '
line 2:4 token recognition error at: 'D'

$ ./linter -file=examples/neg/neg1.txt
line 2:0 token recognition error at: 'B'
line 2:3 token recognition error at: ' '
line 2:4 token recognition error at: 'D'
line 2:5 missing '(' at 'ata'
line 2:8 mismatched input '.' expecting {')', ','}
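The messages above share a "line ROW:COLUMN MESSAGE" shape, which makes them easy to post-process. The sketch below is one way to turn captured stderr text into structured records. It assumes every message follows the shape seen in this example, and the names Problem and parse_problems are hypothetical rather than anything the linter provides.

```python
import re
from typing import NamedTuple


class Problem(NamedTuple):
    line: int
    column: int
    message: str


# Matches the "line <row>:<column> <message>" shape seen in the output above.
PROBLEM_PATTERN = re.compile(r"^line (\d+):(\d+) (.*)$")


def parse_problems(stderr_text):
    """Convert linter stderr text into a list of Problem records.

    Lines that do not match the assumed shape are skipped.
    """
    problems = []
    for raw in stderr_text.splitlines():
        match = PROBLEM_PATTERN.match(raw)
        if match:
            row, column, message = match.groups()
            problems.append(Problem(int(row), int(column), message))
    return problems


# Output copied from the neg1.txt run without -tokens.
SAMPLE = """\
line 2:0 token recognition error at: 'B'
line 2:3 token recognition error at: ' '
line 2:4 token recognition error at: 'D'
line 2:5 missing '(' at 'ata'
line 2:8 mismatched input '.' expecting {')', ','}
"""

for problem in parse_problems(SAMPLE):
    print(problem.line, problem.column, problem.message)
```

Grouping the records by line number gives a quick per-line summary, which can help when a large file contains many unrecognized characters.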

Example 3: Regression Examples

The parser can also look for regressionExample values, used in regression data sets.

The parser does not check whether an entire dataset is correct (regressionExample entries in the file labeled as positive, empty negative examples, and facts), but that check could be implemented fairly easily elsewhere.

regressionExample(medv(id100),33.2).
regressionExample(medv(id101),27.5).
regressionExample(medv(id10),18.9).
regressionExample(medv(id102),26.5).
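A dataset-level check of this kind could be sketched outside the linter as follows. This is an illustration under stated assumptions, not the project's method: the regular expression only covers the shape of the four lines above (one lowercase literal followed by a number, with an optional minus sign assumed), the function names are hypothetical, and "positive file holds only regressionExample entries, negative file is empty" is one reading of the convention described here.

```python
import re

# Assumed shape: regressionExample(<predicate>(<args>),<number>).
REGRESSION_PATTERN = re.compile(
    r"^regressionExample\(([a-z0-9_]+\([a-z0-9_,]+\)),(-?\d+(?:\.\d+)?)\)\.$"
)


def read_regression_examples(lines):
    """Return (literal, value) pairs and the 1-based numbers of lines that do not fit."""
    examples, rejected = [], []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are ignored
        match = REGRESSION_PATTERN.match(text)
        if match:
            examples.append((match.group(1), float(match.group(2))))
        else:
            rejected.append(number)
    return examples, rejected


def looks_like_regression_dataset(pos_lines, neg_lines):
    """True when every positive line is a regressionExample and the negatives are empty."""
    examples, rejected = read_regression_examples(pos_lines)
    neg_is_empty = not any(line.strip() for line in neg_lines)
    return bool(examples) and not rejected and neg_is_empty


POSITIVE = [
    "regressionExample(medv(id100),33.2).",
    "regressionExample(medv(id101),27.5).",
    "regressionExample(medv(id10),18.9).",
    "regressionExample(medv(id102),26.5).",
]

examples, rejected = read_regression_examples(POSITIVE)
print(len(examples), "examples,", len(rejected), "rejected lines")
```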

Build from Source

Building requires a Go compiler.

cd cmd
go build

Copies of the generated ANTLR parser files are committed to the repository. Rebuilding them requires an ANTLR parser generator.

make clean
make linter

Limitations

The grammar is currently extremely conservative: the only tokens allowed are lowercase letters, integers, and underscores.

a(x_1,y_1).
b(x_1).
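To get a feel for how narrow this format is, the restriction can be approximated with a regular expression. The pattern below is an assumption-laden sketch and not the actual ANTLR grammar: it treats every name as a run of lowercase letters, digits, and underscores, allows no whitespace inside a fact, and does not cover regressionExample lines. The name fits_conservative_format is hypothetical.

```python
import re

# A name here is any run of lowercase letters, digits, and underscores.
NAME = r"[a-z0-9_]+"

# predicate(arg) or predicate(arg,arg,...) followed by a period.
FACT_PATTERN = re.compile(rf"^{NAME}\({NAME}(?:,{NAME})*\)\.$")


def fits_conservative_format(line):
    """Approximate check that one line is a fact in the restricted format."""
    return FACT_PATTERN.match(line.strip()) is not None


for candidate in ["a(x_1,y_1).", "b(x_1).", "Bad Data."]:
    print(candidate, fits_conservative_format(candidate))
```

A quick pre-filter like this can flag obviously unsupported lines (capital letters, stray spaces) before the linter is run, but the linter's own output remains the reference for what the grammar accepts.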

Contributions

Alexander L. Hayes - Indiana University, Bloomington

Some ideas were taken from the FOPC_MLN_ILP_Parser developed by Jude Shavlik and Trevor Walker (and possibly contributed to by many others who went unnamed in the source code). There are a few versions of their Tokenizers (StreamTokenizerJWS and StreamTokenizerTAW) and Parser currently used in other projects.