Hi.
[2026: Actively seeking opportunities in industry positions.]
I'm a PhD student studying Computer Science at the University of Chicago, advised by Sanjay Krishnan as part of the ChiData Group. I did my undergraduate degree at Princeton University, and am originally from Ottawa, Canada (eh).
My research focuses on building systems for semi-structured machine learning data. Currently, I am particularly interested in projects that coordinate data at scale (e.g., across distributed environments) or enable safe, automatic feedback loops between data generation and use (e.g., agentic pipelines). My research projects include Jupyter Notebook tracking (GitHub), data lineage and provenance (GitHub), and machine learning error classification (GitHub).
I'm currently working on the TableVault project, which aims to answer the question: how can we query ML-generated data across distributed AI experiments?
You can contact me at: j2zhao@uchicago.edu