stackschmiede.de
Sovereign AI · On-prem · RAG

AI is a tool. Not an OpenAI subscription.

Most "AI integrations" are thin wrappers around ChatGPT. It works — but it ships every prompt, every document, every customer conversation to a US provider. For legal, medical, public-sector or R&D contexts, that's not an option.

Why not just OpenAI or Anthropic?

Because you hand yourself over to a provider who can unilaterally change prices, terms of use, API behavior and regions, without your input. In the last two years, every major LLM vendor has pushed through price increases, model deprecations and rate-limit changes that customers could only absorb, not negotiate.

On-prem LLMs on your server are insurance: predictable cost (hosting instead of token roulette), data stays in-house, features can't be cancelled overnight. That's not ideology — that's business continuity management.

02 / Sovereign AI

Your own ChatGPT. On your server.

Most "AI features" are API wrappers: your data flows to OpenAI, your costs flow to AWS, your GDPR compliance becomes someone else’s problem. There is another way. Local-first, fully sovereign, with control over model and data.

US-Cloud           | EU / On-Prem
OpenAI / Anthropic | Llama 3.3 or Mistral on your server
Pinecone           | Qdrant, self-hosted
ChatGPT plugin     | RAG over your docs
AWS Bedrock        | vLLM on Hetzner GPU
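The "RAG over your docs" entry above can be sketched in a few lines. This is a toy illustration, not the production pipeline: it swaps the real embedding model and Qdrant for a bag-of-words retriever, and the example documents are invented.

```python
from collections import Counter
import math

# Toy document store; in production these would be chunks stored in Qdrant.
DOCS = [
    "Invoices are archived for ten years on the internal file server.",
    "GPU servers are hosted at Hetzner in Falkenstein, Germany.",
    "Support tickets are answered within one business day.",
]

def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, k: int = 1) -> list[str]:
    # Rank documents by similarity to the question, return the top k.
    q = embed(question)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str) -> str:
    # The retrieved context plus the question is what the local LLM sees.
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("Where are the GPU servers hosted?")
```

In the real stack, `embed` is an embedding model, `DOCS` lives in Qdrant, and `prompt` goes to the Llama/Mistral model served by vLLM; the pattern itself stays this simple.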
demo.stackschmiede.de/ausmalbild
Live

Stable Diffusion XL · Line-Art LoRA · Hetzner GPU · text-prompt based

Common questions

Does "sovereign AI" really mean no data goes to OpenAI / Anthropic?

Yes — the default setup runs the entire LLM on your server (or my Hetzner GPU in Falkenstein). There is no fallback to external APIs unless you explicitly configure that for low-sensitivity use cases.

Does Llama 3.3 reach GPT-4 quality?

For structured domain tasks (document extraction, summarization, RAG answers) — yes, sometimes better with fine-tuning. For long-form creative writing: slightly behind. We evaluate in project context.

Do I need my own hardware?

No. Hetzner GPU dedicated servers from roughly €200/month are the standard path; your own hardware only makes sense for very high load or specific compliance requirements.

What are operating costs after launch?

GPU hosting €150-500/month depending on model size and load, plus monitoring and updates. Typically 20-40% cheaper than equivalent OpenAI bills — and predictable.
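The break-even behind that comparison is simple arithmetic. The figures below are illustrative assumptions (token volume, a blended API price of $10 per million tokens, €300/month flat hosting, currencies treated as roughly equal), not a quote:

```python
# Illustrative assumptions -- adjust to your own volume and prices.
tokens_per_month = 50_000_000   # assumed monthly workload
api_price_per_1m = 10.0         # assumed blended price per 1M tokens
flat_gpu_hosting = 300.0        # assumed flat monthly hosting cost

# Per-token billing scales with usage; flat hosting does not.
api_bill = tokens_per_month / 1_000_000 * api_price_per_1m
savings = 1 - flat_gpu_hosting / api_bill
```

With these assumed numbers the API bill lands at 500 per month against 300 flat, a 40% difference; the real crossover depends entirely on your token volume, which is why it gets evaluated per project.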

How does it integrate with my existing stack?

Via REST, GraphQL or WebSocket. Standard patterns: chat widget, document upload, batch processing, webhooks. Also as an MCP server (Model Context Protocol).
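For the REST path: vLLM exposes an OpenAI-compatible endpoint, so existing client code usually needs only a new base URL. A minimal sketch; the host `gpu.example.internal` and the model name are placeholders for your own deployment:

```python
import json
import urllib.request

# Placeholder host -- point this at your own vLLM server (default port 8000).
BASE_URL = "http://gpu.example.internal:8000/v1/chat/completions"

def chat_payload(user_message: str,
                 model: str = "meta-llama/Llama-3.3-70B-Instruct") -> dict:
    # Standard OpenAI-style chat completion body; vLLM accepts the same schema.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.2,
    }

def build_request(payload: dict) -> urllib.request.Request:
    # Builds (but does not send) the HTTP request object.
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request(chat_payload("Summarize invoice 4711."))
# urllib.request.urlopen(req) would send it to the on-prem server.
```

Because the schema matches OpenAI's, most SDKs and chat widgets can be repointed at the on-prem server without code changes beyond the base URL.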

What about the EU AI Act?

On-prem LLMs are easier to document with respect to transparency obligations. For high-risk applications I refer you to specialized AI lawyers; legal assessments aren't my trade.

06 / Contact

Let’s talk.

Three channels, one contact. Reply within 24 hours on business days.

  • Phone (on request via email)
    Number shared after a short email pre-clarification.
  • Form
Include a short note on your project context.
Response: < 24h on business days
Data transfer: encrypted (TLS 1.3)
Spam protection: Cloudflare Turnstile (no reCAPTCHA)