A developer named Cristi Constantin published a complete build log on June 10 detailing how they trained a custom small language model from scratch for roughly $80 in rented GPU compute, with a local PC handling data processing. The model is modest by design. The point is not the capability. The point is that the full stack is still accessible to one person with eighty dollars.
The timing offers a useful contrast. This week Oracle disclosed $55.7 billion in capital expenditure commitments, and OpenAI formalized a lease for 10 gigawatts of data center capacity. The frontier is compressing into a small number of organizations with balance sheets that dwarf mid-size national economies. Constantin’s build log sits at the other end of that spectrum, and it demonstrates something the scale stories tend to obscure: the fundamentals of LLM training are not locked behind the frontier.
Constantin wrote every component from scratch. That includes the base-training scripts, the fine-tuning scripts, the data-processing pipeline, and the custom dataset assembly. The choice to avoid off-the-shelf frameworks was deliberate and pedagogical. The goal was to understand every layer by building it, not to produce the most capable model.
The author frames the result as a “vintage” LLM: intentionally small, intentionally simple, fully owned. Data processing ran on a local machine the author already owned, so the roughly $80 figure covers cloud GPU rental alone.
All code and the model itself are available through the build log at crlf.link. That reproducibility is what makes this worth covering. A build log with no artifacts is a story about what someone claims to have done. A build log with published code and a trained model is a teaching artifact that any developer can clone, run, and modify.
The pedagogical value here is specific. Training a small model from scratch, rather than fine-tuning a pretrained base or calling an API, forces contact with the parts of the pipeline that are usually abstracted away: tokenizer design choices, data deduplication, the difference between base training and instruction tuning, learning rate schedules, loss curves that do not behave the way the documentation implies. These are the same fundamentals that govern the models running in production at frontier labs. The scale is different. The concepts are not.
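To make one of those abstracted-away pieces concrete, here is a minimal sketch of a learning rate schedule of the kind a from-scratch training loop has to implement by hand. It assumes a common pattern, linear warmup followed by cosine decay to a floor; it is not taken from Constantin's code, and the function name `lr_at_step` and all default values (peak rate, floor, warmup length, total steps) are hypothetical placeholders chosen for illustration.

```python
import math


def lr_at_step(step: int, max_lr: float = 3e-4, min_lr: float = 3e-5,
               warmup_steps: int = 100, total_steps: int = 1000) -> float:
    """Hypothetical schedule: linear warmup, then cosine decay to min_lr.

    All defaults are illustrative assumptions, not values from the build log.
    """
    if step < warmup_steps:
        # Linear ramp from near zero up to max_lr over the warmup period.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        # Past the scheduled horizon: hold the learning rate at the floor.
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    # Cosine factor falls smoothly from 1 (start of decay) toward 0 (end).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


if __name__ == "__main__":
    # Print the schedule at a few representative steps.
    for s in (0, 50, 99, 100, 550, 999, 1000):
        print(s, f"{lr_at_step(s):.2e}")
```

The warmup phase keeps early updates small while the randomly initialized weights are least stable, and the cosine tail shrinks the step size as the loss flattens. Getting either piece wrong is one common reason a loss curve misbehaves in the way the paragraph above describes.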
The counterweight framing matters because the industry narrative has shifted toward inevitability. If the future of AI requires 10-gigawatt campuses and $55 billion capex cycles, then individual builders are spectators. Constantin’s build log pushes back on that framing without making a grandiose argument. It simply demonstrates, with code and a receipt, that you can still build the thing.
The $80 number will age. GPU prices move, dataset hosting costs change, and the definition of a “useful” small model shifts as the frontier advances. But the principle the project illustrates does not age at the same rate: comprehension and control over your own models are worth more than capability you rent from someone else’s infrastructure. That trade-off only sharpens as frontier models grow more closed and more expensive.
For developers currently working with fine-tuned open-weight models or API-dependent products, Constantin’s build log is worth reading as a calibration exercise. Understanding what is actually happening inside the training loop changes the quality of decisions made at every level above it, from dataset curation to evaluation design to knowing which model failures are systematic and which are fixable.
Source: Cristi Constantin’s build log published at crlf.link on 2026-06-10.