AI is a tool. Not an OpenAI subscription.
Most "AI integrations" are thin wrappers around ChatGPT. It works — but it ships every prompt, every document, every customer conversation to a US provider. For legal, medical, public-sector or R&D contexts, that's not an option.
Why not just OpenAI or Anthropic?
Because you hand yourself over to a provider who can unilaterally change prices, terms of use, API behavior and regions — without your input. In the last two years, every major LLM vendor has pushed through price increases, model deprecations and rate-limit changes that customers could only absorb, not negotiate.
On-prem LLMs on your server are insurance: predictable cost (hosting instead of token roulette), data stays in-house, features can't be cancelled overnight. That's not ideology — that's business continuity management.
Your own ChatGPT. On your server.
Most "AI features" are API wrappers: your data flows to OpenAI, your costs flow to AWS, your GDPR compliance becomes someone else’s problem. There is another way. Local-first, fully sovereign, with control over model and data.


[Illustration: Stable Diffusion XL · Line-Art LoRA · generated on a Hetzner GPU from a text prompt]
Common questions
Does "sovereign AI" really mean no data goes to OpenAI / Anthropic?
Yes — the default setup runs the entire LLM on your server (or my Hetzner GPU in Falkenstein). There is no fallback to external APIs unless you explicitly configure that for low-sensitivity use cases.
Does Llama 3.3 reach GPT-4 quality?
For structured domain tasks (document extraction, summarization, RAG answers) — yes, sometimes better with fine-tuning. For long-form creative writing: slightly behind. We evaluate in project context.
Do I need my own hardware?
No. Hetzner GPU dedicated servers from ~€200/month are the standard path. Own hardware only for very high load or specific compliance requirements.
What are operating costs after launch?
GPU hosting €150-500/month depending on model size and load, plus monitoring and updates. Typically 20-40% cheaper than equivalent OpenAI bills — and predictable.
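The economics can be sketched as a simple break-even calculation. All figures below are illustrative assumptions, not vendor quotes: a flat-rate GPU server wins once your monthly token volume passes the point where metered API billing would cost the same.

```python
# Break-even sketch: flat GPU hosting vs. per-token API billing.
# All prices here are illustrative assumptions, not quotes.

def api_cost_eur(tokens_per_month: float, eur_per_million_tokens: float) -> float:
    """Monthly cost on a metered API at a given per-million-token price."""
    return tokens_per_month / 1_000_000 * eur_per_million_tokens

def breakeven_tokens(hosting_eur_per_month: float, eur_per_million_tokens: float) -> float:
    """Monthly token volume above which flat hosting is cheaper than the API."""
    return hosting_eur_per_month / eur_per_million_tokens * 1_000_000

hosting = 300.0    # assumed flat GPU server, EUR/month
api_price = 10.0   # assumed blended API price, EUR per million tokens

# Above this volume, the flat server is the cheaper option:
print(f"Break-even: {breakeven_tokens(hosting, api_price):,.0f} tokens/month")
```

Below the break-even volume the metered API is cheaper per month; the flat server buys you predictability and data locality regardless of volume.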
How does it integrate with my existing stack?
Via REST, GraphQL or WebSocket. Standard patterns: chat widget, document upload, batch processing, webhooks. Also as an MCP server (Model Context Protocol).
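A minimal sketch of the REST path, assuming the on-prem server exposes an OpenAI-compatible `/v1/chat/completions` endpoint (vLLM, Ollama and llama.cpp's server all do); the URL and model name are placeholders for your deployment:

```python
import json
import urllib.request

BASE_URL = "http://llm.internal:8000"    # placeholder: your on-prem server
MODEL = "llama-3.3-70b-instruct"         # placeholder: whichever model you deploy

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build the HTTP request; nothing leaves your own network."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(prompt: str) -> str:
    """Send the prompt to the local model and return the answer text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the wire format matches the OpenAI API, most existing SDKs and chat widgets can be pointed at the local server by changing only the base URL.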
What about the EU AI Act?
On-prem LLMs are easier to document with respect to transparency obligations. For high-risk applications I refer you to specialized AI lawyers — legal assessments aren't my trade.
Let’s talk.
Three channels, one contact. Reply within 24 hours on business days.
- E-mail: kontakt@stackschmiede.de
- Phone: on request — number shared after a short email pre-clarification.
- Form: on the right — include your project context.