Using DiffusionGemma in Corporate LLM

DiffusionGemma, Google's new open text diffusion model, can now be used in Corporate LLM: you connect it as your own endpoint via Bring Your Own Model, the model computes on your hardware, and requests never leave your infrastructure. Fast, openly licensed, in your team workspace.

Overview: DiffusionGemma is not built into Corporate LLM as a hosted model; it is connected via Bring Your Own Model. You serve the model via an OpenAI-compatible endpoint, for example with vLLM, and register it in the BYOM tab. After that, DiffusionGemma is available in the model picker, complete with Spaces, Agents, and team management. BYOM is available on the Free plan and all paid tiers, secured with IP pinning. That gives you a fast, local model in a shared interface, without sending your data outside the building.

What is DiffusionGemma and why is it so fast?

DiffusionGemma generates text via diffusion: not token by token, but an entire 256-token block in parallel. According to Google, that makes it up to four times faster — over 700 tokens per second on a single RTX 5090 — and quantized, it fits in 18 GB of VRAM. It is Apache 2.0-licensed, so it runs license-free on your own hardware. For how the architecture works in detail and what it changes, read the DiffusionGemma guide.

For everyday work, one thing matters from all of this: a fast model now runs offline on a consumer GPU. Exactly this combination — high speed plus full data sovereignty — was the long-standing problem with local AI. DiffusionGemma takes the speed argument off the table.

Using DiffusionGemma in Corporate LLM: what works starting today

Corporate LLM does not host DiffusionGemma itself. Instead, you connect your own DiffusionGemma endpoint via Bring Your Own Model. The model runs wherever you operate it: on your own workstation, in your own data center, or at your hosting provider of choice. Corporate LLM puts the working environment on top: model picker, Spaces, shared Agents, and team management.

The decisive point is freedom of choice per use case. For each Space or Agent, you decide whether a task goes to your local DiffusionGemma, to another model connected via BYOM, or to the EU-hosted standard selection with a data processing agreement (DPA). The tooling stack stays the same; only the model behind the request changes. BYOM is available on the Free plan and all paid tiers, with no surcharge for the model itself.

Connecting DiffusionGemma via BYOM: 3 steps

Three steps — that is all it takes.

Serve the model. Run DiffusionGemma behind an OpenAI-compatible endpoint. vLLM is the standard for this and delivers the high token rates the model is known for. Quantized, it fits in 18 GB of VRAM, so it runs on a high-end consumer GPU.
Register the endpoint. In the BYOM tab, enter the endpoint URL and your key. Every outbound connection goes through an IP-pinning dispatcher that verifies the target IP in advance and blocks internal ranges. Credentials are stored encrypted.
Use it as a team. DiffusionGemma then appears in the model picker and can be selected per Space or Agent. Token consumption for the connection is reported separately, without Corporate LLM counting your raw numbers against an invoice.

If you want to understand the BYOM mechanism from the ground up, you will find the details in our overview of custom models in Corporate LLM.

Why local DiffusionGemma matters for mid-sized companies

In everyday use, local AI rarely failed on quality — it almost always failed on two practical hurdles: too slow, and too cumbersome for teams. DiffusionGemma addresses the first, BYOM the second. A fast, openly licensed model runs on a single workstation, and through Corporate LLM the whole team uses it with permission management and shared Agents, instead of it fizzling out as a single-seat experiment.

A law firm summarizes client correspondence locally, an HR department pre-sorts applications, a support team answers internal knowledge questions. In all cases, the inputs stay on your own hardware, and the answer arrives in seconds. For the broad middle of such tasks, a fast local model is often the calmer choice than a frontier model behind someone else's API.

As long as DiffusionGemma computes on your own hardware, inputs never leave your infrastructure. That eliminates the transfer under Art. 44 GDPR and the subsequent articles on third-country transfers, and the technical and organizational measures under Art. 32 GDPR become easier to fulfill, because the data stays in-house. Corporate LLM acts as a routing layer that keeps the connection credentials secure and writes audit logs.

Two points remain, as always: if the hardware sits with an external hosting provider, you need a data processing agreement (DPA) under Art. 28 GDPR with them. And the GDPR obligations — purpose limitation, deletion policy, and documentation — continue to apply unchanged. The fact that the model is openly licensed changes nothing here: the license governs the use of the model, not the handling of your data.

For the bigger picture — how to set up AI in your company in a GDPR-compliant way overall, from the right provider plan through EU hosting to local operation — see the guide Using Claude in a GDPR-compliant way.

Testing DiffusionGemma: your next steps

If you want to test DiffusionGemma, serve it via vLLM and hook the endpoint into the BYOM tab — on the Free plan just as in any paid plan. If you first want to understand what is behind the speed and when local AI is worth it at all, read the DiffusionGemma guide and the guide to local AI models. That way, you have the fastest open model in an interface the whole team uses, and the data stays where it belongs.

Frequently asked questions

Can I use DiffusionGemma in Corporate LLM?

Yes. You connect DiffusionGemma as your own endpoint via Bring Your Own Model. The model runs on your hardware or at your hosting provider; Corporate LLM provides the interface, Spaces, Agents, and team management on top.

Which plan includes Bring Your Own Model?

BYOM is available on the Free plan and all paid tiers. You connect your own endpoint at no extra charge; model usage runs on your own hardware or under your contract with the respective provider.

How do I connect DiffusionGemma technically?

Serve DiffusionGemma via an OpenAI-compatible endpoint, for example with vLLM, and enter the endpoint URL plus your key in the BYOM tab. Every outbound connection is secured through an IP-pinning dispatcher.

Does my data stay in-house with DiffusionGemma via BYOM?

Yes, if the endpoint runs on your own hardware or at your EU hosting provider. Requests go directly to your endpoint; Corporate LLM only keeps the connection credentials secure and writes audit logs.

Does Corporate LLM host DiffusionGemma itself?

No. You connect DiffusionGemma as your own model via BYOM. For ready-hosted models, Corporate LLM offers the EU-hosted model selection with a data processing agreement (DPA).

DiffusionGemma Now Free and Unlimited in Corporate LLM