Zum Inhalt springen
stackschmiede.de
DE
#01 4-8 weeks

Local AI for your business

I bring AI into your business — tailored to your needs, on your server, not at OpenAI. Sort mail, search files, transcribe dictations, draft routine replies. Patient records, client files and personnel data stay in the house.

Range
€4,900–€12,000
ex VAT
Duration
4-8 weeks
Terms
14 days
50% upfront

What I do for you

I bring AI into your business — from the first question “where does this actually pay off?” to a fully running system. In four concrete steps:

  • Consulting — we walk through your processes and find the spots where AI can take over routine work. And the spots where it adds nothing.
  • Selection — the right model (Mistral, Llama 3, Qwen, Codestral, Voxtral) and the right tools (Ollama for simple setups, vLLM for high load) for your hardware and your tasks.
  • Build — set up the server, install the model, connect it to your data, build the interface, sort out permissions.
  • Handover — training, operations handbook, optional ongoing maintenance via retainer.

Entirely on your server (or on mine in Germany), not at OpenAI, Google or Anthropic in the US. Patient records, client files, personnel data and trade secrets stay in the house.

Where AI takes over work today

  • Tame the mail flood — incoming emails sorted, summarized, draft replies in your team’s voice. Saves ~60-90 min per day per case worker.
  • Make documents and knowledge searchable — manuals, contracts, minutes, Confluence/Sharepoint as a searchable knowledge base. Answers cite the source file — no fabricated content.
  • Transcribe dictations and meetings — Voxtral converts speech to text. On the device or on your server.
  • Code help for developers — like GitHub Copilot, but in-house. Codestral as an internal helper, no code leaves the building — not even as training data.
  • Query a database or shift plan in plain language — “Who’s free Tuesday afternoon?” instead of three clicks in the HR software.
  • Pre-draft standard customer replies — recognize incoming requests, draft a fitting reply, human sends. Works for service requests, appointment confirmations, status updates.

Why local — and not just ChatGPT?

If you use ChatGPT, Gemini or Claude directly from the app, every question and every uploaded document goes to the US. For personal notes that’s fine. For client files, patient records or employee evaluations it is not legally permitted — and for many other data sets simply unwise.

The good news: local models (Mistral, Llama 3, Qwen) today reach 80-95 % of ChatGPT’s quality — for routine tasks like summarizing, sorting, translating or drafting simple replies they are on par. With tools like Ollama such a system can be operated on a mid-sized server without a deep IT team.

Cloud AI (OpenAI, Google, Anthropic)Local AI (your server)
GDPR straight awayno, processing in the USyes, data does not leave the house
Professional secrecy (§ 203 StGB)problematic to impermissiblepreserved
Costsper-request, hard to planfixed price, predictable
Vendor dependencehigh (prices, model changes)low (model still runs in 5 years)
Adapt to your language, your documentslimitedfine-tuning possible
Internet requiredyes, alwaysno

Honest cost comparison

ChatGPT Enterprise costs ~€20/user/month. For a 30-person company = €600/month = €7,200/year. Plus: your inputs, emails, documents flow to OpenAI.

My AI Workshop M (€8,999 one-off, RTX 4090 · 128 GB RAM, optional €199/mo maintenance) pays back against ChatGPT Enterprise after 15-18 months. From year 2 onward it is pure savings — plus full data sovereignty and no exposure to OpenAI price hikes.

Alternative path with no hardware investment — a dedicated GPU slot on my servers, from €200/month including setup, updates, server monitoring. Started only on demand — under normal load often €30-50/month effective.

My flagship

This is my flagship topic. I combine two strengths here:

For your first AI step I am the right mix of “knows the craft” and “is actively in the new wave”. Not a buzzword consultant, but someone who builds and runs this himself.

When does this make sense?

  • Patient records / medical reports → outsourcing to OpenAI not legally permissible.
  • Case files / contracts → attorney-client privilege beats convenience.
  • R&D docs / patents → competitive secrets stay in-house.
  • Public sector → BSI baseline protection, no data transfers to third countries.
  • Heavy usage → from ~50,000 queries/month, your own server becomes cheaper than pay-per-query in the cloud.

Tech stack (for IT leads)

Inference: vLLM (performance), Ollama (simple, my favorite for mid-market setups), llama.cpp (minimal) — depending on server size. Models: Mistral Small 3.1, Llama 3, Qwen, Codestral, Voxtral — chosen by task. Embeddings: intfloat/multilingual-e5-large, BAAI/bge-m3, nomic-embed-text-v2 — German-optimized. Vector DB: Qdrant (recommended), Weaviate, pgvector (if you already run Postgres). GPU hosting: my GPU servers in Germany (from ~€200/mo, with on-demand activation down to ~€30-50/mo) or your own data center. Alternative: AI-Workshop full packages with own hardware from €3,499.

Process

  1. Week 1 — clarification: walk through your processes together, find sensible use cases, set test criteria.
  2. Week 2 — first attempt: small dataset, compare 2-3 models, pick one.
  3. Weeks 3-6 — build: set up server, install models, connect to your data, interface, permissions, monitoring.
  4. Week 7 — tests + tuning: feedback loop, prompt tuning, optional fine-tuning on your domain.
  5. Week 8 — rollout: setup, training, handover or retainer.

Pricing

Fixed price recommended, based on a scope note. For unclear data situations: 2-week clarification at €2,900, then scope lock-in for the main project.

Hardware full packages (AI Workshop) separately: S €3,499 · M €8,999 · L €17,999, each with optional maintenance (€99/€199/€399 per month).

Includes

  • Consulting: walk through your processes together, find where AI actually saves time
  • Setup in-house or on your server (Mistral, Llama, Qwen, Codestral, Voxtral — matched to the task)
  • Integration with your data: documents, mail, databases, existing software
  • Web interface or API for your existing systems
  • Test set with your real use cases, so you can see how well the AI works in your situation
  • Setup documentation, training, handbook for model updates
  • Optional: Fine-tuning on your domain (with Unsloth)