Curriculum Vitae
Alexander L. Hayes
hayesall@iu.edu
Indiana University Bloomington
Luddy School of Informatics, Computing, and Engineering
ProHealth Lab: Informatics East 255
918 E. 10th Street
Bloomington, IN 47401
Technical Skills
Languages: Python, Shell Scripting, Java, C/C++, JavaScript, Racket, Julia
Libraries: NumPy, SciPy, scikit-learn, Pandas, NetworkX, pytest
Tools: Git, GitHub, GitHub Actions, JIRA, ReadTheDocs, Travis-CI, CircleCI, AppVeyor, CodeCov, PyPI
Development Platforms: Linux/UNIX, Jekyll, Android, Arduino, Google Cloud Platform
Documentation Tools: LaTeX, Sphinx, Javadoc, Doxygen, Markdown, reStructuredText
Workflows: Continuous Integration (CI), Gitflow
Education
Doctor of Philosophy (Ph.D.) Health Informatics (in progress)
2019–present
Luddy School of Informatics, Computing, and Engineering
Indiana University, Bloomington, IN
Master of Science (M.S.) Health Informatics
2023
Luddy School of Informatics, Computing, and Engineering
Indiana University, Bloomington, IN
Bachelor of Science (B.S.) Computer Science
Security Informatics Minor, Class of 2017, GPA: 3.5 Cumulative
School of Informatics, Computing, and Engineering
Indiana University, Bloomington, IN
Experience
Indiana University, Bloomington
Luddy School of Informatics, Computing, and Engineering
- Research Assistant – Computer Vision Lab (January 2022 – Present)
- Investigated explainability in time series problems
- Implemented a Bayesian network (BN) explainability technique as a Python package targeting the pomegranate library
- Extended the technique toward handling time series problems where the sequence is represented by a dynamic Bayesian network (DBN)
- Mentor – Research Experience for Undergraduates – ProHealth Lab (May 2022 – July 2022)
- Mentored undergraduates on a project analyzing smartwatch data alongside clinical data
- Extended prior work for infrastructure development in the Hoosier Moms Cohort
- Wrote course material for exploratory data analysis, scientific programming, and git
- Research Assistant – ProHealth Lab – Precision Health Initiative (January 2019 – December 2021)
- Secondary analysis on incidence of gestational diabetes
- Developed tools for data cleaning and pre-processing for creating reproducible data partitions: numom2b.org.
- Solved the binary class imbalance problem (imbalance of 1 to 32).
- Reduced features (original feature space ~7000 variables)
- Explained predictions for a clinical decision support setting.
- Infrastructure development for Hoosier Moms Cohort
- Implemented caching to work with snapshots of the database
- Decreased analysis time from >72 hours to <5 minutes
- Prototyped a dashboard for exploratory visualization (hmc-dashboard)
CareBand Inc.
222 West Merchandise Mart Plaza #1230, Chicago, IL
- Developer and Machine Learning Research Consultant (February 2020 – August 2020)
- Implemented solutions for indoor location tracking.
- Developed models to analyze trends in user behavior.
The University of Texas at Dallas
Department of Computer Science, Richardson, TX
- Teaching Assistant (August 2018 – December 2018)
- Fall 2018 – Automata Theory – CS 4384.001
Led two lectures on finite automata minimization. Graded assignments and exams, prepared and verified automata examples prior to lectures, and held four hours of office hours per week to answer questions.
- Research Assistant – StARLinG Lab (May 2018 – August 2018)
- Extended the lab’s open source tool for converting raw text into relational facts. Rewrote the software so it could be used as a command-line tool or as an imported Python package. Released the software as rnlp.
- Documented, unit tested, and ensured correctness of a Python port of Relational Functional Gradient Boosting (rfgb).
- Teaching Assistant (August 2017 – May 2018)
- Spring 2018 – C Programming in a UNIX Environment – CS 3377.501
Provided feedback on C++ programming assignments and bash scripts in terms of documentation, style, and functionality of code.
- Fall 2017 – Automata Theory – CS 4384.001
Graded assignments and exams, prepared and verified automata examples prior to lectures, and provided additional support to students outside of class.
Indiana University, Bloomington
Department of Informatics and Computer Science
- Undergraduate Researcher, STARAI Lab (July 2017 — July 2018)
- Explored methods combining Natural Language Processing and Statistical Relational Learning for information extraction on SEC Form S-1 Documents.
- Facilitated the public release of the lab’s source code onto GitHub, distributed as BoostSRL. Maintained the BoostSRL wiki and tutorials.
- Undergraduate Researcher, ProHealth Research Experience for Undergraduates (May 2016 — August 2016)
- Built on prior research that inferred adverse drug side effects from text data mined from the web. Our work focused on predicting drug-drug interactions from data mined from OpenFDA, PubMed, and a variety of blogs.
- Camp Counselor, SICE Summer Camp (2014, 2015, 2016, 2017)
- Led sessions on intermediate Python programming, Scratch, Raspberry Pi, information security, and data analytics.
- Introduced high school students to Indiana University’s campus and guided them between sessions where they learned about computer science and informatics.
Software
srlearn
: A Python Library for Gradient-Boosted Statistical Relational Models
SRLBoost
: A Java library for learning and inference with SRL models: up to 15x faster than existing libraries
relational-datasets
: Python/Julia libraries for working with benchmark datasets for statistical relational learning
rnlp
: Converting text to relational facts
Publications and Poster Presentations
- Alexander L. Hayes, Lucas Newman-Johnson, David Crandall. 2022. Dynamic Bayesian Rule Learning for Interpretable Time Series Prediction. Poster Presentation. April 29, 2022. Innovation Hall, IUPUI, Indianapolis, IN, USA. — Poster: https://hayesall.com/posters/dynamic_rule_extraction_poster_v1.pdf
- Athresh Karanam, Alexander L. Hayes, Harsha Kokel, David M. Haas, Predrag Radivojac, and Sriraam Natarajan. 2021. A Probabilistic Approach to Extract Qualitative Knowledge for Early Prediction of Gestational Diabetes. Nineteenth International Conference on Artificial Intelligence in Medicine. June 15-18, 2021. Online (Hosted in Porto, Portugal). https://doi.org/10.1007/978-3-030-77211-6_59 — .pdf: https://hayesall.com/publications/quake-gdm.pdf
- Alexander L. Hayes. 2020. srlearn: A Python Library for Gradient-Boosted Statistical Relational Models. Ninth International Workshop on Statistical Relational AI. February 7, 2020. New York City, NY, USA. — Code: https://github.com/hayesall/srlearn-StarAI-2020-workshop/ — .pdf: https://hayesall.com/publications/srlearn-python-library.pdf — Poster: https://hayesall.com/posters/srlearn-2020-workshop-poster.png
- Alexander L. Hayes. 2019. srlearn: A Python Library for Gradient-Boosted Statistical Relational Models. HCI Fest Poster Presentation. December 5, 2019. Bloomington, IN, USA. — Code: https://github.com/hayesall/srlearn-StarAI-2020-workshop/ — .pdf: https://hayesall.com/publications/srlearn-python-library.pdf
- Alexander L. Hayes, Mayukh Das, Phillip Odom, and Sriraam Natarajan. 2017. User Friendly Automatic Construction of Background Knowledge: Mode Construction from ER Diagrams. Knowledge Capture Conference (K-CAP '17). December 4-6, 2017. Austin, TX, USA. https://doi.org/10.1145/3148011.3148027 — Code: https://github.com/hayesall/Walk-ER/ — .pdf: https://hayesall.com/publications/construction-background-knowledge.pdf
- Alexander Hayes, Savannah Smith, Ciabhan Connelly, Devendra Dhami, and Sriraam Natarajan. 2016. Predicting Drug-Drug Interactions: Combining Machine Learning and Natural Language Processing. ProHealth REU, School of Informatics and Computing, Indiana University. July 2016. Bloomington, IN, USA. — Code: https://github.com/hayesall/DrugInteractionDiscovery — Poster: https://hayesall.com/posters/ddi-2016-prohealth-reu.png
- Aaron Porter and Alexander Hayes. 2016. Stress-Induced Video Capture: Forensic Capture for People with Visual Impairments. School of Informatics and Computer Science Spring Research Symposium. April 2016. Bloomington, IN, USA.
Conference Attendance
- International Conference on Artificial Intelligence in Medicine (AIME) 2021: Online, Hosted in Porto, Portugal. Spotlight Paper Presentation. (2021-06-15, 2021-06-18)
- Association for the Advancement of Artificial Intelligence (AAAI) 2020: Hilton New York Midtown, New York, New York, USA. Workshop Poster Presentation. (2020-02-06, 2020-02-08)
- Ninth International Workshop on Statistical Relational AI (StarAI 2020)
- International Conference on Machine Learning (ICML) 2019: Long Beach Convention Center, Long Beach, California, USA. Attendee. (2019-06-14, 2019-06-15)
- 2019 Workshop on Human-in-the-Loop Learning (HILL)
- The Third Workshop on Tractable Probabilistic Modeling (TPM)
Service - Open Source Contributions
scikit-learn-contrib / imbalanced-learn
imbalanced-learn
“A Python package to Tackle the Curse of Imbalanced Datasets in Machine Learning”
Changes proposed:
- Fixed typos in specificity_score
- Fixed a bug caused by external changes in the scikit-learn package
- Implemented a method for showing system information to assist in bug reporting
Code review:
Community questions I helped resolve:
- Why doesn’t SMOTE+Tomek-Links accept an SVM-SMOTE during oversampling?
- Why do datasets fail for imblearn==0.4.3?
- Why do Tomek Links take so long?
- Why does parallelism fail?
SPFlow / SPFlow
SPFlow
“An easy and extensible library for sum-product networks.”
Changes proposed:
- Automatically build and deploy documentation to a webserver when changes occur
- Rework the documentation. Instead of writing everything in a README file, write documentation in a series of files that can be exported as web pages
Community questions I helped resolve:
microsoft / LightGBM
LightGBM
“A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.”
Changes proposed:
Code review:
Community questions I helped resolve:
- How to optimize for prediction speed for real-time application?
- How to adapt the test build to use GitHub Actions?
- Why are pages not found when JavaScript is disabled?
google-research / arxiv-latex-cleaner
arxiv-latex-cleaner
“arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv”
Changes proposed: