What hardware do I need for a 70B model?

As a guideline, around 40–48 GB of memory at 4-bit quantization (rule of thumb: roughly 0.5–0.6 GB per billion parameters plus context). In practice that means, for example, two RTX 3090s (48 GB combined), one RTX PRO 6000, or a Mac Studio with 64–128 GB.
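The rule of thumb above can be turned into a quick estimate. The following is a minimal sketch in Python; the function name and the 5 GB context allowance are illustrative assumptions, not values taken from the calculator on this page.

```python
def estimate_memory_gb(params_billion, gb_per_billion=(0.5, 0.6), context_gb=5.0):
    """Return a (low, high) memory estimate in GB for a 4-bit quantized model.

    gb_per_billion is the rule-of-thumb range quoted in the text.
    context_gb is an ASSUMED flat allowance for the context; the real
    figure grows with context length and depends on the model.
    """
    low, high = gb_per_billion
    return (params_billion * low + context_gb,
            params_billion * high + context_gb)


low, high = estimate_memory_gb(70)
print(f"70B at 4-bit: roughly {low:.0f} to {high:.0f} GB")
```

With the assumed 5 GB allowance, a 70B model lands in the same range as the guideline figure; a longer context pushes the estimate up.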

Is my laptop enough for local AI?

For models up to about 8 billion parameters, a recent laptop with 16 GB RAM is often enough. From around 14B, a device with a dedicated GPU or Apple Silicon with plenty of unified memory is recommended. (Guideline)

Does the graphics card or the memory decide whether a model runs?

Both, but separately: memory decides WHETHER a model fits at all; memory bandwidth decides HOW FAST it responds. Token generation is almost always bandwidth-limited, not compute-limited.

What does quantization do?

Quantization such as Q4_K_M shrinks a model to roughly a quarter of its 16-bit memory footprint with only a small loss in quality. That is why 4-bit is the de facto standard for running models locally.
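The "quarter of the memory" figure follows from the number of bits stored per weight. A small Python sketch of that arithmetic, using idealized bit widths: formats like Q4_K_M average somewhat more than 4 bits per weight, so real files come out a little larger than the 4-bit result shown here. The function name is hypothetical.

```python
def weights_size_gb(params_billion, bits_per_weight):
    """Size of the weights alone in GB (1 GB = 1e9 bytes), ignoring context."""
    return params_billion * bits_per_weight / 8


fp16 = weights_size_gb(14, 16)  # unquantized 16-bit weights
q4 = weights_size_gb(14, 4)     # idealized 4-bit weights
print(f"14B model: {fp16:.0f} GB at 16-bit, {q4:.0f} GB at 4-bit "
      f"({q4 / fp16:.0%} of the original)")
```

The ratio depends only on the bit widths, so it holds for any model size.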

Local AI · Hardware Calculator

Which local AI model runs on your hardware?

The biggest hurdle with local AI is not the installation – it is the question: which model actually runs well on my device? Pick your hardware and we recommend the best open-source model you can run.

Prefer to compare all models? Go to the model database · Background in the in-depth guide

Your hardware

Apple Macs share memory between the system and AI (unified memory). For dedicated GPUs, only the VRAM counts. Usable memory is always slightly less than installed.

Recommendation · ~11 GB usable

Qwen3 14B

14B parameters

German: strong · ~9 tokens/s

The sweet spot: strong, versatile, runs on mid-range GPUs.

Memory: ~11 GB (Q4)
Context: 32K tokens
License: Apache 2.0

Tokens/s is a rough reference value (memory bandwidth ÷ model size); real-world numbers depend on quantization, context length, and software.
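The reference value can be reproduced with one division. A minimal Python sketch, assuming a hypothetical device with 100 GB/s of memory bandwidth (that figure is an illustration, not a value from this page); the function name is likewise made up for the example.

```python
def estimate_tokens_per_second(bandwidth_gb_s, model_size_gb):
    """Rough upper bound on generation speed: memory bandwidth / model size.

    Assumes generation is bandwidth-limited, i.e. the weights are read
    from memory roughly once per generated token.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# ASSUMED bandwidth of 100 GB/s with the ~11 GB model from the recommendation.
print(f"~{estimate_tokens_per_second(100, 11):.0f} tokens/s")
```

Because the estimate ignores context length and software overhead, measured speeds are usually lower.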

Reasoning · Tool calling

Little headroom for very long contexts. A smaller quantization (e.g. Q4 instead of Q5) or the next-smaller model frees up room.

Also runs

Gemma 4 12B · ~9.5 GB
Phi-4 14B · ~11 GB
Gemma 3 12B · ~9.5 GB

With more memory

Mistral Small 3.1 24B · from ~16 GB
Gemma 3 27B · from ~18 GB

How to get started: With Ollama (one command) or LM Studio (app), a local model is up and running in a few minutes.
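Once a model is running, it can also be queried from a script. The sketch below uses only the Python standard library and assumes an Ollama server on its default local address with a model already pulled; the model tag `qwen3:14b` and the helper names are assumptions for illustration and may differ on your setup.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="qwen3:14b"):
    """Build a non-streaming generate request (the model tag is an assumption)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="qwen3:14b"):
    """Send the prompt and return the generated text; needs a running server."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("Explain unified memory in one sentence.")` returns the model's answer as a string; the request never leaves the local machine.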

More than just the model? Corporate LLM turns it into a production system: RAG, agent system, skills, and connectors.

Why local AI?

Full data sovereignty

Prompts and documents never leave your device. GDPR-compliant without having to trust a cloud provider.

No token costs

Buy the hardware once, then use it without limits. No per-request billing.

Offline & without limits

Works without internet, without rate limits, and without provider outages.

More than just the model

Run your model locally and Corporate LLM turns it into a production system: RAG, agent system, skills, and connectors. 100% GDPR-compliant.
