Open position
Applied AI Engineer
About Us
We build software that makes governance, risk, and compliance (GRC) approachable: turning dense regulatory frameworks and internal policy into something organizations can work with. AI is central to how we do that: surfacing the right information, making sense of complex documents, and keeping outputs trustworthy in a domain where accuracy matters.
We're a small team, early in the journey. The core architecture is in place and proving itself, but there's plenty left to shape. You'll have real ownership over both what you build and how it evolves. We're based in Stockholm and work hybrid.
Role Overview
You'll own the AI layer end-to-end: how LLMs are prompted, orchestrated, and evaluated, and how documents are ingested, understood, and retrieved. There's real influence over where it goes next.
We're looking for a strong mid-level to senior engineer who has built LLM-powered systems that real users depend on (not just prototypes) and who wants meaningful ownership from early on. You'll work closely with the backend team. You don't need a background in GRC, but you need to be genuinely curious about the problem.
What You'll Work On
- Owning LLM prompt design, orchestration, and evaluation across our Azure AI Foundry deployments (Claude and GPT family): questioning what exists, improving what matters, proposing what's next
- Driving retrieval quality across the full RAG pipeline: chunking, embedding, indexing (Milvus / HNSW), and re-ranking, with eval sets that prove a change is an improvement, not just a variation
- Building the eval and safety layer: hallucination detection, citation faithfulness, contradiction detection, and regression suites in CI — in a regulated domain, evaluation is a first-class engineering concern
- Extending and hardening the Celery-based document processing pipeline (extraction → graph): improving reliability, observability, and cost efficiency
- Shipping to production via AKS and Terraform, instrumenting what you build, and staying accountable after launch
- Challenging the architecture and ways of working when you see a better approach, and making the case
What We're Looking For
- 5+ years of software engineering experience, with at least 2 years building LLM-powered systems in production
- Strong Python; comfort in a typed, tested codebase (we use mypy --strict, ruff, and pytest)
- Hands-on experience with RAG: vector databases (Milvus, pgvector, Pinecone, or similar), embedding model selection, chunking strategies, re-ranking
- A practical approach to evals: you can talk concretely about how you've measured retrieval quality or output reliability on a real system
- Experience with prompt engineering for structured output and multi-step pipelines, including handling failures and partial results
- Understanding of where LLMs fail and how to defend against those failure modes: prompt injection, hallucination, citation drift, model version changes
- Product instinct: you connect technical decisions to user outcomes
Helpful But Not Required
- Experience in a regulated domain (GRC, finance, legal, healthcare) where audit trails and source attribution matter
- Celery, Redis, or similar distributed task-queue experience
- Apache AGE or other graph database experience
- Microsoft Teams bot or Adaptive Card development
How to Apply
Send your resume to careers@nooga.net, ideally with a short cover letter describing your relevant experience and what excites you about this role.