The Allen Institute for AI (AllenAI) shipped OlmoEarth v1.1 on May 15, cutting compute costs by up to 3x while matching v1.0 performance on a mix of research benchmarks and partner-defined tasks. The release, announced on the Hugging Face blog, matters because compute is by far the largest cost over the full OlmoEarth lifecycle, so cheaper inference directly affects how many organizations can afford to run planet-scale satellite analysis.
OlmoEarth is a family of transformer-based foundation models built to process satellite imagery. Remote sensing inputs arrive as multidimensional tensors of spatial pixels, spectral channels, and time steps; the model converts those arrays into token sequences, much as a language model converts text into tokens. The self-attention component of a transformer scales quadratically with sequence length, and the remaining layers scale linearly, so a shorter sequence is not just faster but substantially cheaper.
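To make the tensor-to-token step concrete, here is a minimal sketch of generic ViT-style patch tokenization using NumPy. It is not OlmoEarth's actual tokenizer: the function name `patchify`, the patch size of 8, and the 12-band, two-timestep input shape are all assumptions chosen for illustration.

```python
import numpy as np


def patchify(cube: np.ndarray, patch: int) -> np.ndarray:
    """Split a (time, height, width, channels) cube into flat patch tokens.

    Generic illustration only; the real OlmoEarth tokenizer may differ.
    """
    t, h, w, c = cube.shape
    if h % patch or w % patch:
        raise ValueError("height and width must be divisible by the patch size")
    gh, gw = h // patch, w // patch
    # Expose the patch grid, then group each patch's pixels together:
    # (t, gh, patch, gw, patch, c) -> (t, gh, gw, patch, patch, c)
    tiles = cube.reshape(t, gh, patch, gw, patch, c).transpose(0, 1, 3, 2, 4, 5)
    # One token per (time step, patch row, patch column).
    return tiles.reshape(t * gh * gw, patch * patch * c)


# Hypothetical input: 2 time steps, a 64x64 pixel tile, 12 spectral bands.
cube = np.arange(2 * 64 * 64 * 12, dtype=np.float32).reshape(2, 64, 64, 12)
tokens = patchify(cube, patch=8)
print(tokens.shape)  # (128, 768)
```

Each row of `tokens` is one sequence element; in a real model a learned projection would map it to the embedding width before attention is applied.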
The central engineering decision in v1.1 was rethinking what one token should represent. The v1.0 models processed Sentinel-2 satellite data by creating a separate token for each spatial patch at each resolution (10 m, 20 m, and 60 m) and each time step. A two-timestep Sentinel-2 image therefore produced six tokens per patch: three resolutions times two time steps. OlmoEarth v1.1 collapses the three resolution-based tokens into one, producing one third as many tokens per patch and a corresponding reduction in multiply-accumulate operations (MACs), the standard proxy for inference cost.
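The token arithmetic in this paragraph can be written out as a short sketch. The function and parameter names are hypothetical, and the count covers only Sentinel-2 patch tokens as described here, not any other tokens the model may add.

```python
def tokens_per_patch(timesteps: int, resolution_groups: int) -> int:
    """Tokens emitted for one spatial patch across all time steps."""
    return timesteps * resolution_groups


def sequence_length(patches: int, timesteps: int, resolution_groups: int) -> int:
    """Total tokens for a tile made of `patches` spatial patches."""
    return patches * tokens_per_patch(timesteps, resolution_groups)


# v1.0 as described: one token per resolution (10 m, 20 m, 60 m) per time step.
v10 = tokens_per_patch(timesteps=2, resolution_groups=3)
# v1.1 as described: the three resolution tokens collapsed into one.
v11 = tokens_per_patch(timesteps=2, resolution_groups=1)

print(v10, v11, v10 / v11)  # 6 2 3.0
```

The ratio stays at 3 regardless of how many patches or time steps a tile has, because only the per-resolution factor changes.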
The naive version of this approach failed. Collapsing resolution tokens without changing training produced a 10 percentage-point drop on m-eurosat kNN, a standard remote-sensing classification benchmark. AllenAI’s fix required modifying the pre-training regimen, details of which are in the technical report. The final result maintains benchmark parity with v1.0 despite the compression. AllenAI notes that some regressions remain on specific tasks and recommends consulting the technical report before switching workloads.
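For readers unfamiliar with kNN evaluation, the following is a minimal sketch of how a k-nearest-neighbor probe scores frozen embeddings. The choice of cosine similarity, the value of `k`, and the toy embeddings are assumptions; the exact protocol behind the m-eurosat kNN figure is not described in this article.

```python
import numpy as np


def knn_accuracy(train_x, train_y, test_x, test_y, k=5):
    """Classify test embeddings by majority vote of their k nearest neighbors."""
    # L2-normalize so the dot product equals cosine similarity.
    tr = train_x / np.linalg.norm(train_x, axis=1, keepdims=True)
    te = test_x / np.linalg.norm(test_x, axis=1, keepdims=True)
    sims = te @ tr.T                             # (n_test, n_train)
    nearest = np.argsort(-sims, axis=1)[:, :k]   # indices of the k most similar
    votes = train_y[nearest]                     # neighbor labels, (n_test, k)
    n_classes = int(train_y.max()) + 1
    preds = np.array(
        [np.bincount(row, minlength=n_classes).argmax() for row in votes]
    )
    return float((preds == test_y).mean())


# Toy embeddings standing in for frozen encoder outputs (not real data).
train_x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
train_y = np.array([0, 0, 1, 1])
test_x = np.array([[1.0, 0.05], [0.05, 1.0]])
test_y = np.array([0, 1])

accuracy = knn_accuracy(train_x, train_y, test_x, test_y, k=3)
print(accuracy)  # 1.0
```

Because no classifier is trained on top of the embeddings, a kNN score reflects the quality of the representation itself, which is why a drop on it signals that pre-training, not fine-tuning, needed to change.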
The release includes Base, Tiny, and Nano model sizes, so teams can match model scale to their compute budget. All weights and training code are published on the Hugging Face model hub.
Why this matters beyond the benchmark line: Earth-observation foundation models serve a distinct and underreported set of use cases. Since OlmoEarth v1 shipped in November 2025, partners have used it to track mangrove forest loss, classify drivers of deforestation, and produce country-scale crop-type maps within days rather than months. These workloads cover agriculture supply chains, carbon credit verification, compliance monitoring under emerging deforestation due-diligence regulations, and military logistics intelligence. Most of this analysis currently runs on closed API services from satellite imagery vendors, where per-scene pricing limits refresh frequency.
A 3x compute reduction changes the refresh economics. Where compute dominates the budget, a team that previously could afford weekly continental-scale analysis can now run it about three times as often, roughly every two to three days, for the same spend. That shift is practically significant for time-sensitive applications: tracking flood extent after a storm, monitoring port activity across a region, or detecting crop stress before harvest. It also changes the build-vs-buy calculation for organizations that want to run models on-premises for data-sovereignty reasons.
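The refresh arithmetic can be checked with a short sketch. It assumes compute is the only cost that scales with refresh frequency and that the full 3x reduction is realized; both are simplifications, and the function names are hypothetical.

```python
def refresh_interval_days(baseline_interval_days: float, cost_reduction: float) -> float:
    """Interval between runs affordable on the same budget after a cost cut."""
    return baseline_interval_days / cost_reduction


def reduction_needed(baseline_interval_days: float, target_interval_days: float) -> float:
    """Cost reduction factor required to reach a target refresh interval."""
    return baseline_interval_days / target_interval_days


new_interval = refresh_interval_days(baseline_interval_days=7, cost_reduction=3)
daily_factor = reduction_needed(baseline_interval_days=7, target_interval_days=1)

print(round(new_interval, 2))  # 2.33
print(daily_factor)            # 7.0
```

Under these assumptions a weekly cadence becomes a run every 2.33 days, while moving from weekly to daily on a fixed budget would take a 7x reduction.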
The v1.1 release is also a deliberate scientific control. AllenAI trained v1.1 on the same dataset as v1.0, meaning any performance difference between the two isolates the effect of architectural and training changes rather than data changes. That makes v1.1 useful as a benchmark reference for researchers studying pre-training methodology in remote sensing, not just a product upgrade.
The release announcement does not include independent benchmark results from parties outside AllenAI, and parity claims reference a benchmark mix that includes tasks the team constructed with partners. That is standard for domain-specific models, but teams evaluating the switch should run their own fine-tuning comparisons on held-out tasks before committing inference budgets.
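One way a team might structure its own comparison is a per-task regression check with an explicit tolerance. This is only a sketch: the task names, scores, and 0.01 tolerance below are made-up placeholders, not results from AllenAI or anyone else.

```python
def find_regressions(v10_scores: dict, v11_scores: dict, tolerance: float = 0.01) -> dict:
    """Return tasks where v1.1 trails v1.0 by more than `tolerance`."""
    regressions = {}
    for task, old in v10_scores.items():
        new = v11_scores[task]
        if new < old - tolerance:
            regressions[task] = new - old
    return regressions


# Placeholder held-out scores from a team's own fine-tuning runs (hypothetical).
v10_scores = {"crop_type": 0.90, "mangrove_loss": 0.80, "flood_extent": 0.75}
v11_scores = {"crop_type": 0.85, "mangrove_loss": 0.80, "flood_extent": 0.78}

flagged = find_regressions(v10_scores, v11_scores)
print(sorted(flagged))  # ['crop_type']
```

Any flagged task is a reason to weigh the accuracy loss against the compute savings before moving that workload.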
Teams currently running OlmoEarth v1.0 at continental or global scale should benchmark v1.1 fine-tuning on their specific task within the next month. A confirmed 3x cost reduction at equivalent accuracy would alter the business case for nearly every organization that has deferred planet-scale deployments on budget grounds.
Sourced from the Allen Institute for AI post by Kyle Wiggers published on the Hugging Face blog on May 15, 2026.