I Kept Starting Over

May 24, 2026·
Jean Michel A. Sarr
Jean Michel A. Sarr
· 4 min read

Why I kept abandoning writing projects, and why I am building Distill to create a sustainable loop for research, memory, and public thinking.

I had a blog. Then a YouTube channel.

I stopped the blog to finish my PhD thesis. I stopped the YouTube channel when I moved to Ghana for work. Both times, when I came back, the gap felt too wide to close. So I didn’t.

These were pre-AI years. No tools that could help carry the load. Just me, a quality bar I could not lower, and not enough hours. I am a perfectionist by nature, and that made it worse: I would over-polish everything until the effort became unsustainable. Motivation was never the problem. Cost was.

I’ve been thinking about that pattern for a while.


Why This Matters to Me

If you follow AI research, you know that three things drive model performance: compute, model size, and data. That is what the scaling laws tell us. Frontier labs have spent the last few years pushing hard on compute and model size. Those levers are capital-intensive. The organizations best positioned to exploit them are the ones with billions to spend on infrastructure and talent.

Data is different.

It is still expensive to collect and curate at scale. But it is less purely capital-gated. Quality, curation strategy, domain specificity, and timing matter as much as money. That makes it the lever most worth watching for anyone trying to find an edge, the equivalent of the trader looking for alpha in a market where the big players have already claimed the obvious positions.

I am drawn to this question for personal reasons. During my PhD, I worked on distribution shifts and synthetic data, I saw firsthand how the right data can dramatically improve model performance. After academia, I moved into industry and designed data pipelines that feed frontier models. I have watched from the inside how data decisions shape what models can and cannot do. I know this is a domain of real competitive advantage.

Keeping up with how that thesis is actually evolving is hard with a full-time job.

The field moves fast. Every month brings new papers, new benchmark results, new engineering choices from frontier labs that quietly change what the thesis implies. A belief that felt solid six months ago may have already weakened. Or strengthened. It is hard to tell if you are only reading in the present tense.

I kept missing the updates. Not because I wasn’t reading. But because reading without memory is just re-reading. Nothing accumulated. Every time I sat down to think about the data advantage question, I was starting from the same place.

I tried newsletters. RSS feeds. Notion pages. They help with collection. None of them help with accumulation. They don’t tell you when a new paper weakens something you believed last month. They don’t notice when three independent signals are converging on the same conclusion. They produce more input, not better understanding.


What I Started Building

So I started building something.

The core idea: a system that monitors research continuously, builds a living knowledge base around a topic, and updates that knowledge base as new evidence arrives. Not a one-off summary. Not a search engine. Something that remembers, and revises what it remembers over time.

The domain I am starting with is the data advantage thesis in AI. But the system is designed so that the topic it tracks and the audience it briefs are configuration, not code. The machinery is general. The intelligence about what matters lives in the config.

I call it Distill.


The Bet

Here is what I am actually betting on.

Not just that the system is useful for research. But that building it forces me to think clearly about hard problems, and posting about those problems gives me something real to say, without starting from scratch each time.

The tool is both the subject and the solution to my old consistency problem. If I am shipping commits and making decisions, I have material. The quality comes from the thinking behind the decisions, not from hours of production work.

That is the loop I am trying to close.


Building in Public

I am going to build this in public.

I will post regularly about the problems I hit and the decisions I make. Not polished retrospectives, dispatches from inside the work. When I link to a commit, it is because something there is worth thinking about, not because the code needs to be read.

If you are interested in how data advantages in AI are actually evolving, or in how to build systems that accumulate knowledge rather than just retrieve it, or if you have ever abandoned something because your own quality bar made it unsustainable, follow along.

I don’t know yet if this works. That’s the point.