Curriculum Vitae
Alexander L. Hayes
hayesall@iu.edu
Indiana University Bloomington
Luddy School of Informatics, Computing, and Engineering
ProHealth Lab: Informatics East 255
918 E. 10th Street
Bloomington, IN 47401
Technical Skills
Languages: Python, Shell Scripting, Java, C/C++, JavaScript, Racket, Julia
Libraries: NumPy, SciPy, scikit-learn, Pandas, NetworkX, pytest
Tools: Git, GitHub, GitHub Actions, JIRA, ReadTheDocs, Travis-CI, CircleCI, AppVeyor, CodeCov, PyPI
Development Platforms: Linux/UNIX, Jekyll, Android, Arduino, Google Cloud Platform
Documentation Tools: LaTeX, Sphinx, Javadoc, Doxygen, Markdown, reStructuredText
Workflows: Continuous Integration (CI), Gitflow
Education
Doctor of Philosophy (Ph.D.) Health Informatics (in progress)
Mathematics Minor
2019–present, expected graduation date: 2026
Luddy School of Informatics, Computing, and Engineering
Indiana University, Bloomington, IN
Master of Science (M.S.) Health Informatics
Class of 2023
Luddy School of Informatics, Computing, and Engineering
Indiana University, Bloomington, IN
Bachelor of Science (B.S.) Computer Science
Security Informatics Minor, Class of 2017, GPA: 3.5 Cumulative
School of Informatics, Computing, and Engineering
Indiana University, Bloomington, IN
Experience
Indiana University, Bloomington
Luddy School of Informatics, Computing, and Engineering
Instructor of Record (August 2024 – Present)
- Fall 2024 – Information Infrastructure II – INFO-I 211
Taught from the book Erika Lee and I wrote over the summer, adapting it from an asynchronous online course to a synchronous in-person course.
Associate Instructor (January 2023 – July 2024)
- Summer 2024 – Information Infrastructure II – INFO-I 211 (asynchronous, online)
Supervisor: Erika Lee
Piloted GitHub Actions to finish putting the “auto” in autograding, and wrote the first draft of a book with Erika Lee for teaching an asynchronous web app development course.
- Spring 2024 – Information Infrastructure II – INFO-I 211
Supervisor: Matt Hottell
A large cohort (200+ students) stress-tested previous workflows. Worked on autograding infrastructure capable of orchestrating copies of GitHub repositories, testing them automatically, or bootstrapping a Flask application.
- Fall 2023 – Information Infrastructure II – INFO-I 211
Supervisor: Matt Hottell
Served as course manager, writer, and grader. The key objective was to minimize the number of people touching the gradebook.
- Summer 2023 – Information Infrastructure II – INFO-I 211
Supervisor: Matt Hottell
Moved all course assignments into GitHub repositories and introduced “unit testing” to show students how we know when programs meet expectations.
- Summer 2023 – Information Infrastructure I – INFO-I 210
Supervisor: Shabnam Kavousian
Worked with Shabnam Kavousian to develop and teach an alternate “intro Python programming” curriculum with permission from the department chair and the director of undergraduate studies. Collected data and wrote an internal pilot study, finding that students who took the alternate I210 with Shabnam and me tended to outperform their peers in later courses.
- Spring 2023 – Information Infrastructure II – INFO-I 211
Supervisors: Matt Hottell and Erika Lee
Led two weekly lab sessions, hosted evening help sessions, guided students during lectures, and learned how the course worked.
Research Assistant, Computer Vision Lab (January 2022 – January 2023)
- Investigated explainability in time series problems
- Implemented a Bayesian network (BN) explainability technique as a Python package targeting the pomegranate library
- Extended the technique toward handling time series problems where the sequence is represented by a dynamic Bayesian network (DBN)
Graduate Mentor, Research Experience for Undergraduates, ProHealth Lab (May 2022 – July 2022)
- Mentored students on a project analyzing smartwatch data alongside clinical data
- Extended prior work for infrastructure development in the Hoosier Moms Cohort
- Wrote course material for exploratory data analysis, scientific programming, and Git
Research Assistant, ProHealth Lab, Precision Health Initiative (January 2019 – December 2021)
- Secondary analysis of gestational diabetes incidence
- Developed data-cleaning and preprocessing tools for creating reproducible data partitions (numom2b.org).
- Addressed a binary class imbalance of 1 to 32.
- Reduced the feature space (originally ~7,000 variables).
- Explained predictions for a clinical decision support setting.
- Infrastructure development for the Hoosier Moms Cohort
- Implemented caching to work with snapshots of the database
- Decreased analysis time from >72 hours to <5 minutes
- Prototyped a dashboard for exploratory visualization (hmc-dashboard)
CareBand Inc.
222 West Merchandise Mart Plaza #1230, Chicago, IL
Developer and Machine Learning Research Consultant (February 2020 – August 2020)
- Implemented solutions for indoor location tracking.
- Developed models to analyze trends in user behavior.
The University of Texas at Dallas
Department of Computer Science, Richardson, TX
Teaching Assistant (August 2018 – December 2018)
- Fall 2018 – Automata Theory – CS 4384.001
Led two lectures on finite automata minimization. Graded assignments and exams, prepared and verified automata examples prior to lectures, and held four office hours per week to answer questions.
Research Assistant, StARLinG Lab (May 2018 – August 2018)
- Extended the lab’s open source tool for converting raw text into relational facts. Rewrote the software so it could be used as a command-line tool or as an imported Python package. Released the software as rnlp.
- Documented, unit tested, and ensured correctness of a Python port of Relational Functional Gradient Boosting (rfgb).
Teaching Assistant (August 2017 – May 2018)
- Spring 2018 – C Programming in a UNIX Environment – CS 3377.501
Provided feedback on the documentation, style, and functionality of students' C++ programming assignments and Bash scripts.
- Fall 2017 – Automata Theory – CS 4384.001
Graded assignments and exams, prepared and verified automata examples prior to lectures, and provided additional support to students outside of class.
Indiana University, Bloomington
Department of Informatics and Computer Science
- Undergraduate Researcher, STARAI Lab (July 2017 — July 2018)
- Explored methods combining Natural Language Processing and Statistical Relational Learning for information extraction on SEC Form S-1 documents.
- Facilitated the public release of the lab’s source code onto GitHub, distributed as BoostSRL. Maintained the BoostSRL wiki and tutorials.
- Undergraduate Researcher, ProHealth Research Experience for Undergraduates (May 2016 — August 2016)
- Built on prior research that inferred adverse drug side effects from text data mined from the web. Our work focused on predicting drug-drug interactions from data mined from OpenFDA, PubMed, and a variety of blogs.
- Camp Counselor, SICE Summer Camp (2014, 2015, 2016, 2017)
- Led sessions on intermediate Python programming, Scratch, Raspberry Pi, information security, and data analytics.
- Introduced high school students to Indiana University’s campus and guided them between sessions where they learned about computer science and informatics.
Software
- srlearn: A Python Library for Gradient-Boosted Statistical Relational Models
- SRLBoost: A Java library for learning and inference with SRL models, up to 15x faster than existing libraries
- relational-datasets: Python/Julia libraries for working with benchmark datasets for statistical relational learning
- rnlp: Converting text to relational facts
Publications and Poster Presentations
- Alexander L. Hayes, Lucas Newman-Johnson, David Crandall. 2022. Dynamic Bayesian Rule Learning for Interpretable Time Series Prediction. Poster Presentation. April 29, 2022. Innovation Hall, IUPUI, Indianapolis, IN, USA. — Poster: https://hayesall.com/posters/dynamic_rule_extraction_poster_v1.pdf
- Athresh Karanam, Alexander L. Hayes, Harsha Kokel, David M. Haas, Predrag Radivojac, and Sriraam Natarajan. 2021. A Probabilistic Approach to Extract Qualitative Knowledge for Early Prediction of Gestational Diabetes. Nineteenth International Conference on Artificial Intelligence in Medicine. June 15-18, 2021. Online (Hosted in Porto, Portugal). https://doi.org/10.1007/978-3-030-77211-6_59 — .pdf: https://hayesall.com/publications/quake-gdm.pdf
- Alexander L. Hayes. 2020. srlearn: A Python Library for Gradient-Boosted Statistical Relational Models. Ninth International Workshop on Statistical Relational AI. February 7, 2020. New York City, NY, USA. — Code: https://github.com/hayesall/srlearn-StarAI-2020-workshop/ — .pdf: https://hayesall.com/publications/srlearn-python-library.pdf — Poster: https://hayesall.com/posters/srlearn-2020-workshop-poster.png
- Alexander L. Hayes. 2019. srlearn: A Python Library for Gradient-Boosted Statistical Relational Models. HCI Fest Poster Presentation. December 5, 2019. Bloomington, IN, USA. — Code: https://github.com/hayesall/srlearn-StarAI-2020-workshop/ — .pdf: https://hayesall.com/publications/srlearn-python-library.pdf
- Alexander L. Hayes, Mayukh Das, Phillip Odom, and Sriraam Natarajan. 2017. User Friendly Automatic Construction of Background Knowledge: Mode Construction from ER Diagrams. Knowledge Capture Conference (K-CAP '17). December 4-6, 2017. Austin, TX, USA. https://doi.org/10.1145/3148011.3148027 — Code: https://github.com/hayesall/Walk-ER/ — .pdf: https://hayesall.com/publications/construction-background-knowledge.pdf
- Alexander Hayes, Savannah Smith, Ciabhan Connelly, Devendra Dhami, and Sriraam Natarajan. 2016. Predicting Drug-Drug Interactions: Combining Machine Learning and Natural Language Processing. ProHealth REU, School of Informatics and Computing, Indiana University. July 2016. Bloomington, IN, USA. — Code: https://github.com/hayesall/DrugInteractionDiscovery — Poster: https://hayesall.com/posters/ddi-2016-prohealth-reu.png
- Aaron Porter and Alexander Hayes. 2016. Stress-Induced Video Capture: Forensic Capture for People with Visual Impairments. School of Informatics and Computer Science Spring Research Symposium. April 2016. Bloomington, IN, USA.
Conference Attendance
- The 2024 Decoding the Disciplines Conference: Adapting Decoding for the Next Generation 2024: Indiana University School of Education, Bloomington, Indiana. (2024-10-31, 2024-11-02)
- International Conference on Artificial Intelligence in Medicine (AIME) 2021: Online, Hosted in Porto, Portugal. Spotlight Paper Presentation. (2021-06-15, 2021-06-18)
- Association for the Advancement of Artificial Intelligence (AAAI) 2020: Hilton New York Midtown, New York, New York, USA. Workshop Poster Presentation. (2020-02-06, 2020-02-08)
- Ninth International Workshop on Statistical Relational AI (StarAI 2020)
- International Conference on Machine Learning (ICML) 2019: Long Beach Convention Center, Long Beach, California, USA. Attendee. (2019-06-14, 2019-06-15)
- 2019 Workshop on Human-in-the-Loop Learning (HILL)
- The Third Workshop on Tractable Probabilistic Modeling (TPM)
Service - Open Source Contributions
scikit-learn-contrib / imbalanced-learn
“A Python package to Tackle the Curse of Imbalanced Datasets in Machine Learning”
Changes proposed:
- Fixed typos in specificity_score
- Fixed a bug caused by external changes in the scikit-learn package
- Implemented a method for showing system information to assist in bug reporting
Code review:
Community questions I helped resolve:
- Why doesn’t SMOTE+Tomek-Links accept an SVM-SMOTE during oversampling?
- Why do datasets fail for imblearn==0.4.3?
- Why do Tomek Links take so long?
- Why does parallelism fail?
SPFlow / SPFlow
“An easy and extensible library for sum-product networks.”
Changes proposed:
- Automated building and deploying the documentation to a web server whenever changes occur
- Reworked the documentation: instead of keeping everything in a single README file, wrote it as a series of files that can be exported as web pages
Community questions I helped resolve:
microsoft / LightGBM
“A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.”
Changes proposed:
Code review:
Community questions I helped resolve:
- How to optimize for prediction speed for real-time application?
- How to adapt the test build to use GitHub Actions?
- Why are pages not found when JavaScript is disabled?
google-research / arxiv-latex-cleaner
“arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv”
Changes proposed: