Overview
Relational Data Linter: A grammar and linter to check that relational or inductive logic programming datasets meet standards.
- Source Code: https://github.com/srlearn/linter/
- Documentation (Latest): https://srlearn.github.io/linter/
Download
Precompiled binaries are listed on the GitHub Releases page.
The latest version can be downloaded with these links:
Platform | Link |
---|---|
Linux/amd64 | Download |
macOS/amd64 | Download |
Windows/amd64 | Download |
Usage
Example 1: No Errors
When the dataset is well-formatted, nothing is returned.
Here are the contents of pos1.txt:
smokes(person1).
friends(person1,person2).
friends(person2,person1).
Running the linter produces no output because no issues are found.
./linter -tokens -file=examples/pos/pos1.txt
./linter -file=examples/pos/pos1.txt
# (No output for either case)
Example 2: Bad Data
When there is something in the data that cannot be recognized, problems are directed to stderr.
Here’s a file called neg1.txt:
friends(person1,person2).
Bad Data.
This file cannot be properly tokenized or parsed.
$ ./linter -tokens -file=examples/neg/neg1.txt
line 2:0 token recognition error at: 'B'
line 2:3 token recognition error at: ' '
line 2:4 token recognition error at: 'D'
$ ./linter -file=examples/neg/neg1.txt
line 2:0 token recognition error at: 'B'
line 2:3 token recognition error at: ' '
line 2:4 token recognition error at: 'D'
line 2:5 missing '(' at 'ata'
line 2:8 mismatched input '.' expecting {')', ','}
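The messages above share a `line <line>:<column> <message>` shape, so a calling program could turn captured stderr into structured records. The following is a minimal sketch, not part of the linter: the `Diagnostic` type and `ParseDiagnostics` function are hypothetical names, and the pattern is inferred only from the sample output shown in this example.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Diagnostic is a hypothetical container for one reported problem.
type Diagnostic struct {
	Line    int
	Column  int
	Message string
}

// diagPattern matches lines such as "line 2:0 token recognition error at: 'B'".
// The shape is assumed from the sample output in this README.
var diagPattern = regexp.MustCompile(`^line (\d+):(\d+) (.*)$`)

// ParseDiagnostics extracts diagnostics from captured stderr text.
// Lines that do not match the pattern are skipped.
func ParseDiagnostics(stderr string) []Diagnostic {
	var out []Diagnostic
	scanner := bufio.NewScanner(strings.NewReader(stderr))
	for scanner.Scan() {
		m := diagPattern.FindStringSubmatch(scanner.Text())
		if m == nil {
			continue
		}
		// The pattern guarantees digits; conversion errors (overflow) are ignored here.
		line, _ := strconv.Atoi(m[1])
		col, _ := strconv.Atoi(m[2])
		out = append(out, Diagnostic{Line: line, Column: col, Message: m[3]})
	}
	return out
}

func main() {
	stderr := "line 2:0 token recognition error at: 'B'\n" +
		"line 2:5 missing '(' at 'ata'\n"
	for _, d := range ParseDiagnostics(stderr) {
		fmt.Printf("%d:%d %s\n", d.Line, d.Column, d.Message)
	}
}
```

Keeping the line and column as integers makes it straightforward to sort or group problems by location before reporting them.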
Example 3: Regression Examples
The parser can also recognize regressionExample values, which are used in regression data sets.
The parser does not check whether an entire dataset is correct (regressionExample values labeled as positive, empty negative examples, and facts), but that check could be accomplished fairly easily elsewhere.
regressionExample(medv(id100),33.2).
regressionExample(medv(id101),27.5).
regressionExample(medv(id10),18.9).
regressionExample(medv(id102),26.5).
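The dataset-level check that the linter leaves out could look roughly like the sketch below. It is an illustration only: `CheckRegressionDataset` is a hypothetical helper, the three inputs are assumed to be the raw text of the positive, negative, and facts files, and the prefix test is a plain string comparison rather than a real parse.

```go
package main

import (
	"fmt"
	"strings"
)

// nonEmptyLines returns the trimmed, non-blank lines of text.
func nonEmptyLines(text string) []string {
	var lines []string
	for _, l := range strings.Split(text, "\n") {
		if t := strings.TrimSpace(l); t != "" {
			lines = append(lines, t)
		}
	}
	return lines
}

// CheckRegressionDataset is a hypothetical helper that reports problems when
// the positives are not all regressionExample entries, the negatives are not
// empty, or the facts are missing. An empty result means no problems found.
func CheckRegressionDataset(pos, neg, facts string) []string {
	var problems []string
	posLines := nonEmptyLines(pos)
	if len(posLines) == 0 {
		problems = append(problems, "positive examples are empty")
	}
	for i, l := range posLines {
		if !strings.HasPrefix(l, "regressionExample(") {
			problems = append(problems,
				fmt.Sprintf("positive example %d is not a regressionExample: %s", i+1, l))
		}
	}
	if len(nonEmptyLines(neg)) != 0 {
		problems = append(problems, "negative examples should be empty")
	}
	if len(nonEmptyLines(facts)) == 0 {
		problems = append(problems, "facts are empty")
	}
	return problems
}

func main() {
	pos := "regressionExample(medv(id100),33.2).\nregressionExample(medv(id101),27.5).\n"
	neg := ""
	// Placeholder fact borrowed from Example 1; a real dataset would have its own.
	facts := "friends(person1,person2).\n"
	for _, p := range CheckRegressionDataset(pos, neg, facts) {
		fmt.Println(p)
	}
}
```

A check like this would run alongside the linter, which still validates the syntax of each individual file.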
Build from Source
Building requires a Go compiler.
cd cmd
go build
A copy of the generated ANTLR parser files is committed to the repository. Rebuilding them requires an ANTLR Parser Generator.
make clean
make linter
Limitations
This grammar is currently extremely conservative: the only tokens allowed are lowercase letters, integers, and underscores.
a(x_1,y_1).
b(x_1).
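To get a feel for this restriction without running the linter, the sketch below approximates it with a regular expression. This is an assumption-laden stand-in, not the ANTLR grammar: `LooksLikeSimpleAtom` is a hypothetical name, the pattern is inferred from the examples in this README, and it does not cover regressionExample entries.

```go
package main

import (
	"fmt"
	"regexp"
)

// simpleAtom approximates "name(arg,arg,...)." where every name and argument
// is built only from lowercase letters, digits, and underscores.
// It is a rough stand-in for the real grammar, which may differ in detail.
var simpleAtom = regexp.MustCompile(`^[a-z0-9_]+\([a-z0-9_]+(,[a-z0-9_]+)*\)\.$`)

// LooksLikeSimpleAtom reports whether line fits the approximate pattern.
func LooksLikeSimpleAtom(line string) bool {
	return simpleAtom.MatchString(line)
}

func main() {
	lines := []string{
		"a(x_1,y_1).",
		"b(x_1).",
		"Bad Data.",
		"friends(Person1,person2).",
	}
	for _, line := range lines {
		fmt.Printf("%q -> %v\n", line, LooksLikeSimpleAtom(line))
	}
}
```

The last two inputs fail the pattern because of the uppercase letters and the space, which mirrors the kind of input the linter reports as a token recognition error.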
Contributions
- Alexander L. Hayes - Indiana University, Bloomington
Some ideas were taken from the FOPC_MLN_ILP_Parser developed by Jude Shavlik and Trevor Walker (and possibly contributed to by many others who went unnamed in the source code). There are a few versions of their Tokenizers (StreamTokenizerJWS and StreamTokenizerTAW) and Parser currently used in other projects.