Question 1

Does "sovereign AI" really mean no data goes to OpenAI or Gemini?

Accepted Answer

Yes — the default setup runs the entire LLM on your server (or on one of my GPU servers in Germany). There is no fallback to external APIs unless you explicitly configure that for low-sensitivity use cases.

Question 2

Does Mistral Small 3.1 reach GPT-4 quality?

Accepted Answer

For structured domain tasks (document extraction, summarization, RAG answers) — yes, sometimes better with fine-tuning. For long-form creative writing: slightly behind. We evaluate in project context. For code-specific workflows I use Codestral, for voice-to-text Voxtral.

Question 3

Do I need my own hardware?

Accepted Answer

No. Dedicated GPU servers in Germany from ~€200/month are the standard path. If you prefer running it in-house: my AI-workshop packages deliver ready-made on-prem systems starting at €3,499. Own hardware only for very high load or specific compliance requirements.

Question 4

What are operating costs after launch?

Accepted Answer

GPU hosting €150-500/month depending on model size and load, plus monitoring and updates. Typically 20-40% cheaper than equivalent OpenAI bills — and predictable.

Question 5

How does it integrate with my existing stack?

Accepted Answer

Via REST, GraphQL or WebSocket. Standard patterns: chat widget, document upload, batch processing, webhooks. Also as an MCP server (Model Context Protocol).

Question 6

What about the EU AI Act?

Accepted Answer

On-prem LLMs are easier to document w.r.t. transparency. For high-risk applications I refer AI lawyers — legal assessments aren’t my trade.

AI is a tool. Not an OpenAI subscription.

Why not just OpenAI or Gemini?

Your documents. Your model. Your server.

Common questions

Let’s talk.