Title: A Probabilistic Approach to Extract Qualitative Knowledge for Early Prediction of Gestational Diabetes. Authors: Athresh Karanam, Alexander L. Hayes, Harsha Kokel, David M. Haas, Predrag Radivojac, and Sriraam Natarajan.
Risk of Gestational Diabetes increases as Body Mass Index (BMI) increases.
Same content as the previous slide, but now a rule appears showing that the risk of gestational diabetes increases as BMI increases.
Motivations for working with qualitative influence statements include: they are human interpretable, they align with how people think about risk, they concisely express a trend over all variable-and-value combinations, and they can be used to refine or repair models when data is noisy or sparse.
A data matrix is on the left, showing one dependent variable and four independent variables. On the right are four monotonic influence statements.
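To make the notion concrete, here is a minimal sketch of how a monotonic influence statement could be checked empirically against a data matrix: the statement "the outcome increases as the variable increases" holds if the empirical outcome rate is nondecreasing across the variable's ordered values. The function name and toy data are illustrative, not from the paper.

```python
import numpy as np

def is_monotone_increasing(x, y, tol=0.0):
    """Check whether the empirical rate of y == 1 is nondecreasing
    in the ordered, discrete value of x -- a simple empirical proxy
    for the statement 'y increases as x increases'."""
    x, y = np.asarray(x), np.asarray(y)
    values = np.unique(x)  # unique values, sorted ascending
    rates = [y[x == v].mean() for v in values]
    return all(b >= a - tol for a, b in zip(rates, rates[1:]))

# Toy data: the outcome rate rises with the discretized risk factor.
x = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y = [0, 0, 1, 0, 1, 1, 1, 1, 1]
print(is_monotone_increasing(x, y))  # True: rates are 1/3, 2/3, 3/3
```

On sparse data such empirical rates are noisy, which is one reason the prior work below treats QI statements as constraints during learning rather than testing them directly on counts.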
The data matrix and monotonic influence statements now point to a Bayesian Network. This illustrates prior work, and cites two papers (Altendorf et al. 2005 and Yang and Natarajan 2013) where data and QI statements were used as inductive bias to guide learning.
- Eric E. Altendorf, Angelo C. Restificar, Thomas G. Dietterich (2005). "*[Learning from Sparse Data by Exploiting Monotonicity Constraints](https://arxiv.org/abs/1207.1364)*." Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence, pp. 18-26.
- Shuo Yang, Sriraam Natarajan (2013). "*[Knowledge Intensive Learning: Combining Qualitative Constraints with Causal Independence for Parameter Learning in Probabilistic Models](https://starling.utdallas.edu/assets/pdfs/KIL_ECML13.pdf)*." European Conference on Machine Learning (ECML-PKDD).
The arrow from the rules to the Bayes net is now reversed: the Bayes net points to the rules, illustrating that we want to extract QI statements from a probabilistic model.
A red arrow now points from the data matrix to the rules, suggesting that going directly from data to rules without using a probabilistic model may be less effective.
The toy example from the previous slides is replaced with a small data matrix representing patient health data and gestational diabetes diagnoses. An arrow labeled "PC Algorithm" points to a Bayesian network, and an arrow labeled "QuaKE" points from the Bayes net to a list of rules.
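The second arrow in this pipeline, extracting a qualitative influence statement from a learned Bayesian network, can be illustrated by inspecting a conditional probability table. The sketch below is a simplified illustration, not the QuaKE procedure itself, and the CPT values are hypothetical.

```python
def monotone_influence(cpt):
    """Given a CPT mapping an ordered parent value -> P(outcome = 1 | parent),
    return '+' if the probability is nondecreasing in the parent value,
    '-' if it is nonincreasing, and None if neither holds.
    Simplified illustration; not the QuaKE algorithm from the paper."""
    probs = [cpt[v] for v in sorted(cpt)]
    pairs = list(zip(probs, probs[1:]))
    if all(b >= a for a, b in pairs):
        return "+"
    if all(b <= a for a, b in pairs):
        return "-"
    return None

# Hypothetical CPT: P(gestational diabetes | BMI bucket).
bmi_cpt = {0: 0.05, 1: 0.09, 2: 0.18, 3: 0.30}
print(monotone_influence(bmi_cpt))  # "+": risk rises with BMI bucket
```

Reading the trend off the model's CPTs, rather than off raw counts, is what lets the probabilistic model smooth over noisy or sparse data before rules are extracted.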
A table compares the performance of the QuaKE algorithm against a baseline that only uses the data. QuaKE achieves a precision of 0.923, whereas the baseline only achieves 0.636. This answers the question: does QuaKE help uncover rules that align with prior knowledge?
A table shows a monotonic influence statement (BMI and gestational diabetes) and a synergistic influence statement (education, smoking in the previous three months, and their effect on gestational diabetes). This illustrates a case where we recovered a rule that the expert expected, but could also provide an answer where prior knowledge was uncertain.
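One simple way to formalize a synergistic influence of two binary variables is super-additivity on the probability scale: the lift from both risk factors together exceeds the sum of their individual lifts. The check below is an illustrative reading of "synergistic", with made-up probabilities; the paper's formal definition may differ.

```python
def is_synergistic(p00, p01, p10, p11):
    """Super-additive interaction of two binary risk factors on P(outcome):
    p_ab = P(outcome | factor1 = a, factor2 = b). Illustrative definition."""
    return (p11 - p00) > (p01 - p00) + (p10 - p00)

# Hypothetical probabilities of gestational diabetes by education level
# and recent smoking status; the joint lift (0.20) exceeds 0.03 + 0.05.
print(is_synergistic(0.05, 0.08, 0.10, 0.25))  # True
```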
The conclusion slide shows the title and authors again, and has a link to the project webpage. https://starling.utdallas.edu/papers/QuaKE/