On May 29, Prism ML released Bonsai Image 4B, a family of diffusion models designed to run entirely on-device, including on an iPhone, with no API call or cloud dependency. The release marks a concrete step in moving generative image inference from research demo to deployable product.

Bonsai Image 4B comes in two variants targeting different constraint profiles. The 1-bit variant uses extreme quantization, storing each weight as one of two values to compress the 4-billion-parameter model into the strictest mobile memory budgets. Prism ML positions it for embedded and edge applications where deployment footprint and bandwidth are the binding constraints, accepting lower visual quality as the tradeoff. The ternary variant uses three-state quantization, in which each weight takes one of three values (about 1.58 bits of information per weight). That provides more representational headroom: better visual quality and improved prompt fidelity, while remaining compact enough to run on consumer hardware.
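To make the footprint difference between the variants concrete, here is a back-of-the-envelope sketch of weight storage at different bit widths. It is an approximation under stated assumptions, not Prism ML's published numbers: it assumes all 4 billion parameters are quantized, it ignores scale factors, any text encoder or VAE, and runtime activations, and the fp16 baseline and the two ternary storage layouts are illustrative choices rather than the release's actual format.

```python
import math

PARAMS = 4_000_000_000  # "4B" from the model name; assumed to be fully quantized


def weight_gib(params: int, bits_per_weight: float) -> float:
    """Size of the weights alone, in GiB, at a given storage width."""
    return params * bits_per_weight / 8 / 2**30


# Storage widths are illustrative assumptions, not Prism ML's documented layout.
schemes = {
    "fp16 baseline": 16.0,
    "ternary, stored at 2 bits": 2.0,
    "ternary, ideal packing": math.log2(3),  # information content of 3 states
    "1-bit": 1.0,
}

for name, bits in schemes.items():
    print(f"{name:28s} {weight_gib(PARAMS, bits):5.2f} GiB")
```

Under these assumptions the weights drop from about 7.5 GiB at 16 bits to under 1 GiB for ternary and under 0.5 GiB at 1 bit, which is the gap that separates a desktop GPU workload from something a phone can hold in memory.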

The comparison Prism ML draws is against SDXL-Turbo and Stable Diffusion 3 Medium. The claim is that Bonsai trades some output quality for a deployment footprint that neither of those models can match on a phone. That advantage comes from quantization rather than from parameter count: at 4 billion parameters, Bonsai is larger than Stable Diffusion 3 Medium, which sits at roughly 2 billion parameters but was not designed for mobile inference. SDXL-Turbo requires a GPU with several gigabytes of VRAM to run usably. Prism ML’s argument is that “good enough” on-device beats “better but API-only” for a specific class of product use cases.

The release announcement does not include independent benchmark numbers comparing Bonsai against SDXL-Turbo or Stable Diffusion 3 medium on standard image-quality metrics such as FID or CLIP score. The visual examples in the release materials are curated. The “iPhone-runnable at usable quality” claim is the headline; whether that quality holds up in production workflows against API-served alternatives is not answered by anything Prism ML has published.

For product teams building on iOS, the relevance extends beyond Bonsai itself. Apple’s Foundation Models framework, announced at WWDC 2025 alongside iOS 26 and expanded in subsequent developer releases, already supports on-device text generation for apps that want to ship language features without a cloud round-trip. Bonsai extends that pattern to image generation. A developer who adopts both can ship an iOS app with multimodal generative capabilities, text and image, without any external API dependency. That changes the unit economics and the privacy posture of the product simultaneously.

The use cases Prism ML names are mobile creative tools, on-device avatar generation, content moderation, AR/VR overlays, and accessibility applications that require real-time visual generation. These are not new categories. What is new is that the deployable parameter budget now reaches 4 billion with quantized diffusion. A product that previously required a call to a hosted Stable Diffusion endpoint for every image can, in principle, run the full inference loop in the user’s pocket. The cost model, the latency model, and the data-handling model all change at once.
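The shift in the cost model can be sketched as a break-even calculation: hosted inference is billed per image, while on-device inference is closer to a one-time integration cost with no per-image fee to the vendor. The function names and every figure below are hypothetical placeholders, not prices from Prism ML or any hosting provider, and the sketch ignores app-size, battery, and support costs.

```python
def break_even_images(fixed_on_device_cost: float,
                      hosted_price_per_image: float) -> float:
    """Number of generated images at which a one-time on-device integration
    cost equals cumulative hosted per-image fees."""
    if hosted_price_per_image <= 0:
        raise ValueError("hosted_price_per_image must be positive")
    return fixed_on_device_cost / hosted_price_per_image


def cheaper_option(images: int,
                   fixed_on_device_cost: float,
                   hosted_price_per_image: float) -> str:
    """Compare total cost at a given image volume."""
    hosted_total = images * hosted_price_per_image
    return "on-device" if fixed_on_device_cost < hosted_total else "hosted"


# Hypothetical inputs: 20,000 of integration work vs. 0.01 per hosted image.
print(break_even_images(20_000.0, 0.01))
print(cheaper_option(5_000_000, 20_000.0, 0.01))
```

With these placeholder inputs the break-even point is 2 million images. A team would substitute its own contract price and engineering estimate.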

Prism ML has made the weights and a sample iOS application available on the Bonsai Image 4B release page, which lowers the friction for teams evaluating the model against their specific quality threshold. The practical test is whether Bonsai’s ternary variant clears the visual quality bar for a given product’s users, and that test requires running it against actual content rather than trusting the release materials.
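One way to structure that test is to have reviewers score outputs from both the on-device model and the current hosted model on the same prompts, then compare pass rates against the team's own threshold. The sketch below is a minimal harness under assumptions: the `Rating` record, the 1-to-5 scale, the source labels, and the `max_gap` tolerance are all hypothetical, and the ratings in the usage example are made-up placeholders, not measurements of Bonsai.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    prompt: str
    source: str  # hypothetical labels, e.g. "bonsai-ternary" or "hosted-api"
    score: int   # reviewer score on the team's own rubric, assumed 1..5


def pass_rate(ratings: list[Rating], source: str, threshold: int) -> float:
    """Fraction of a source's outputs scoring at or above the threshold."""
    scores = [r.score for r in ratings if r.source == source]
    if not scores:
        raise ValueError(f"no ratings for source {source!r}")
    return sum(s >= threshold for s in scores) / len(scores)


def clears_bar(ratings: list[Rating], candidate: str, baseline: str,
               threshold: int, max_gap: float) -> bool:
    """True if the candidate's pass rate is within max_gap of the baseline's."""
    gap = (pass_rate(ratings, baseline, threshold)
           - pass_rate(ratings, candidate, threshold))
    return gap <= max_gap


# Placeholder ratings for illustration only.
ratings = [
    Rating("avatar, studio light", "bonsai-ternary", 4),
    Rating("avatar, studio light", "hosted-api", 5),
    Rating("city street at dusk", "bonsai-ternary", 3),
    Rating("city street at dusk", "hosted-api", 4),
]
print(clears_bar(ratings, "bonsai-ternary", "hosted-api",
                 threshold=4, max_gap=0.25))
```

The tolerance is the product decision: it encodes how much quality a team is willing to give up in exchange for removing the hosted dependency.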

Teams currently evaluating API-based image generation for mobile products should run the Bonsai ternary variant against their own quality rubric before renewing or expanding a hosted inference contract.

Source: Prism ML release announcement for Bonsai Image 4B, published May 29, 2026, at prismml.com.