Home | Jean Michel A. Sarr

I build systems for LLM research: infrastructure, synthetic data pipelines, and evaluation loops that make model iteration faster and more reliable.

As a Research Engineer, I specialize in designing systems that eliminate bottlenecks in LLM development. My work spans two complementary domains: infrastructure engineering and synthetic data research.

On the infrastructure side, I architect systems that decouple experimental logic from execution, making it easier to run more experiments with less operational overhead. I build tools for model iteration, data loading, evaluation, and multimodal workflows, designed so that adding a new experiment, dataset, or model costs a constant amount of effort rather than effort that grows linearly with what is already supported.

On the research side, my expertise is synthetic data for post-training, specifically how it scales to new domains when rigorous evaluation infrastructure enables tight iteration loops. I cover generation for Supervised Fine-Tuning (SFT) and preference learning, and I know synthetic alignment methods in depth, including the limitations of RLHF and RLAIF and the alternatives to them. I synthesize this material in my research series.

This dual expertise is grounded in my PhD research at Sorbonne University, where I developed methods that use synthetic data to predict model behavior under distribution shift. I now apply those principles to designing robust, production-scale systems that unblock researchers at frontier labs.

My writing here is personal and reflects my own views, not those of my employer.

I write about infrastructure design and the shift from human to synthetic labeling at jmamath.github.io.