rnlp

Overview

rnlp: Relational Natural Language Processing. Lifting raw text into a relational representation.

Source Code: https://github.com/srlearn/rnlp
Documentation (Latest): https://rnlp.readthedocs.io/en/latest/

Motivation

If you’re reasoning about documents, you can ask questions at several levels:

Do documents contain a specific phrase?
Do some sentences contain a phrase while others don’t?

rnlp builds a representation of sentences, words, and properties of those words. This hierarchy of concepts can then be passed to a relational learner to determine how to separate the concept classes.
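To make the idea concrete, here is a minimal sketch of lifting text into relational facts. It is not rnlp's implementation: the function `lift_to_facts` and the predicate names (`sentenceInDocument`, `wordInSentence`, `wordString`, `wordPosition`) are hypothetical, chosen only to illustrate the sentence/word/property hierarchy described above, and the sentence splitter is deliberately naive.

```python
import re


def lift_to_facts(text):
    """Return a list of relational facts (as strings) describing `text`.

    Illustrative only: predicate names are assumptions, not rnlp's output.
    """
    facts = []
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    # Abbreviations such as "U.S." would be split incorrectly.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    word_id = 0
    for s_idx, sentence in enumerate(sentences, start=1):
        facts.append(f"sentenceInDocument(s{s_idx}, d1).")
        # Words are runs of ASCII letters; punctuation and digits are dropped.
        for position, word in enumerate(re.findall(r"[A-Za-z]+", sentence), start=1):
            word_id += 1
            facts.append(f"wordInSentence(w{word_id}, s{s_idx}).")
            facts.append(f'wordString(w{word_id}, "{word.lower()}").')
            facts.append(f"wordPosition(w{word_id}, {position}).")
    return facts


if __name__ == "__main__":
    for fact in lift_to_facts("He has refused. For quartering troops."):
        print(fact)
```

Each word gets its own identifier, so a relational learner can refer to the word itself, the sentence it belongs to, and its properties as separate objects.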

Consider the U.S. Declaration of Independence. Its middle section, sometimes called the “list of grievances,” is where the writers spelled out their problems with colonial rule:

In Congress, July 4, 1776. The unanimous Declaration of the thirteen united
States of America, When in the Course of human events, it becomes necessary
for one people to dissolve the political bands which have connected them
with another, and to assume among the powers of the earth, the separate and
equal station to which the Laws of Nature and of Nature's God entitle them,
a decent respect to the opinions of mankind requires that they should
declare the causes which impel them to the separation.
...
He has refused his Assent to Laws, the most wholesome and necessary for the
public good.

Is there a rule that determines whether a sentence is one of the grievances? When we pose this as a learning problem, we get the following model:

A tree model learned with BoostSRL, showing that a sentence is likely to be part of the list of grievances if it begins with the words He or For.
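The rule described in the figure can be approximated by hand as a short check. This is a sketch of the rule only, not the learned BoostSRL model: the real tree assigns a likelihood rather than a yes/no answer, and the function name `looks_like_grievance` is hypothetical.

```python
def looks_like_grievance(sentence):
    """Return True if the sentence begins with the word "He" or "For".

    A hand-written stand-in for the learned rule, not the model itself.
    """
    words = sentence.split()
    if not words:
        return False
    # Strip trailing punctuation so that "For," still matches "For".
    first_word = words[0].strip(",.;:")
    return first_word in {"He", "For"}


if __name__ == "__main__":
    print(looks_like_grievance("He has refused his Assent to Laws."))  # True
    print(looks_like_grievance("In Congress, July 4, 1776."))          # False
```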

Some Historical Notes

rnlp evolved out of a project with Kaushik Roy on information extraction from financial documents.

Some of these notes are still on the starling-lab BoostSRL wiki: https://starling.utdallas.edu/software/boostsrl/wiki/natural-language-processing/