Role Roadmap * 8 Stages * 4-8 Months to Job-Ready
Your path to becoming a GenAI Engineer
From Python basics to deploying production LLM systems -- covering RAG, agents, fine-tuning, and system design. Built around what top GenAI teams actually expect on day one.
GenAI Engineer
Design, build, and ship production LLM systems -- from prompt to deployment
Avg US Salary: $140k
Job Demand: Explosive
Time to Job-ready: 4-9mo
Primary Language: Python
Skills You'll Build
Essential: Python, OpenAI / Anthropic API, LangChain / LlamaIndex, Vector DBs, RAG Pipelines
Strongly Recommended: HuggingFace, FastAPI, Docker, AWS Bedrock, Weights & Biases
Nice to Have: LoRA / QLoRA, vLLM, Kubernetes, Guardrails AI
Salary Range (US)
Junior (0-2 years): $95-120k
Mid-level (2-4 years): $125-155k
Senior (4+ years): $160-220k+
Which Roles Does This Roadmap Prepare You For?
Legend: ✓ Directly prepares you | ↗ Strong overlap -- skills transfer | → Not covered here
Generative AI Engineer: ✓ Primary target role
AI Engineer: ✓ Strong fit -- ~80% overlap
Prompt Engineer: ↗ Stages 2–4 directly apply
AI Agent Engineer: ↗ See Agentic AI Roadmap
AI Safety Engineer: ↗ Stage 8 (Evaluation & Safety) applies
ML / DL Engineer: → See ML Engineer Roadmap
MLOps Engineer: → Not covered here
AI Research Scientist: → Needs separate PhD-track path
8 Stages * ~26h Total
01
Foundations
Solid foundations before you touch any LLM. You'll cover modern Python patterns, REST API design, async programming, and just enough cloud to start shipping. Skip if you're already comfortable with these.
Modern Python (3.10+) -- dataclasses, type hints, the walrus operator
REST API consumption with httpx / requests
Async Python -- asyncio, aiohttp
JSON, environment variables & secrets management
Cloud basics -- S3, Lambda, IAM roles
Docker fundamentals -- images, containers, Compose
Git workflows for AI projects
Mini Project: Async API Aggregator
Build an async Python script that fetches data from 3 public APIs concurrently, transforms the JSON, and writes results to S3. Containerise it with Docker. Estimated: 4-6 hours.
Common interview question
You're building a GenAI app making 1M API calls per day and hitting rate limits. Walk through how you'd architect async batching, request queuing, and exponential backoff to stay within provider limits.
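One way to approach the backoff and concurrency parts of this question is sketched below. It is a minimal illustration, not a provider-specific implementation: RateLimitError, call_with_backoff, and run_all are hypothetical names, and the retry limits and delays are assumed values you would tune against your provider's documented limits.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 response."""


async def call_with_backoff(request, *, max_retries=5, base_delay=0.5, max_delay=30.0):
    """Await request(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return await request()
        except RateLimitError:
            if attempt == max_retries:
                raise
            # The delay ceiling doubles on each attempt, capped at max_delay.
            # Sleeping a random amount below the ceiling spreads retries out.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, ceiling))


async def run_all(requests, *, concurrency=10, **backoff_kwargs):
    """Run zero-argument coroutine functions with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(request):
        async with semaphore:
            return await call_with_backoff(request, **backoff_kwargs)

    return await asyncio.gather(*(guarded(r) for r in requests))
```

The semaphore bounds in-flight requests, while the jittered backoff keeps many clients from retrying at the same instant. A durable request queue would sit in front of run_all in a real system.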
Practice this question →
Ready to practice? Test your Foundations knowledge in the Interview Simulator.
Start Mock Interview →
02
LLM Fundamentals
Understand what LLMs actually are -- not just how to call them. You'll learn how tokenisation works, what embeddings represent geometrically, how autoregressive inference happens, and the basics of fine-tuning. This mental model separates strong from average engineers.
Transformer architecture -- attention, keys, queries, values
Tokenisation -- BPE, SentencePiece, token counting
Embeddings -- semantic space, cosine similarity
Autoregressive inference & sampling strategies (temp, top-p, top-k)
Context window mechanics & KV cache
Pre-training vs. instruction tuning vs. RLHF
LoRA & QLoRA fine-tuning intuition
Model families -- GPT-4o, Claude, Gemini, LLaMA, Mistral
Calling APIs -- OpenAI, Anthropic, HuggingFace Inference
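To make the geometric view of embeddings concrete, here is a small sketch of cosine similarity in plain Python. The function name is an illustrative choice, and the vectors in the usage note are toy values, not real model embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors pointing the same way score close to 1 regardless of length, for example cosine_similarity([1, 2], [2, 4]), while orthogonal vectors such as [1, 0] and [0, 1] score 0.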
Project: Token Counter & Embedding Explorer
Build a CLI tool that tokenises any text, shows token IDs, estimates cost, and visualises embedding similarity between sentence pairs using OpenAI's text-embedding-3-small. Plot a UMAP projection of 50 sentence embeddings. Estimated: 5-8 hours.
Common interview question
What is the difference between temperature and top-p sampling? When would you set temperature to 0 vs 0.7 in a production GenAI application, and why?
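A toy sketch can help when answering this. The code below assumes a tiny vocabulary with hand-written logits and shows temperature scaling followed by top-p (nucleus) filtering; the function names are illustrative, and real inference stacks implement this on tensors rather than Python lists.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (treat 0 as greedy argmax)")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalised over kept tokens


def sample(logits, temperature=0.7, top_p=0.9, rng=random):
    """Sample a token index; temperature 0 falls back to greedy decoding."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    candidates = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    ids = list(candidates)
    return rng.choices(ids, weights=[candidates[i] for i in ids], k=1)[0]
```

Temperature reshapes the whole distribution, whereas top-p truncates its tail. Temperature 0 suits extraction or classification where repeatable output matters, and a value around 0.7 suits open-ended drafting.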
03
Prompt Engineering
Prompt engineering is not just "write clear instructions." It's a systematic discipline with measurable outputs. Learn the techniques used by production teams at Anthropic, OpenAI, and Google -- and how to test them rigorously.
System vs. user vs. assistant roles
Zero-shot, one-shot, few-shot prompting
Chain-of-thought (CoT) & step-back prompting
Structured output -- JSON mode, function calling
Prompt chaining & decomposition
Meta-prompting & self-critique loops
Prompt injection attacks & defence
Prompt versioning with LangSmith / PromptLayer
Project: Structured Data Extractor
Build an extraction pipeline that takes unstructured job descriptions and outputs clean JSON (role, skills, salary, location) using function calling / JSON mode. Add a test harness that scores accuracy against 50 golden examples. Estimated: 6-10 hours.
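The test harness in this project could start from something like the sketch below. It assumes predictions and golden examples are lists of dicts with the four fields named above. The normalisation rules (case-insensitive, order-insensitive skill lists) are an assumption, and you may want stricter or fuzzier matching.

```python
FIELDS = ("role", "skills", "salary", "location")


def normalise(value):
    """Lower-case and trim strings; sort lists so that order does not matter."""
    if value is None:
        return None
    if isinstance(value, list):
        return sorted(str(v).strip().lower() for v in value)
    return str(value).strip().lower()


def score(predictions, golden, fields=FIELDS):
    """Return per-field accuracy and the share of examples with every field correct."""
    if not golden or len(predictions) != len(golden):
        raise ValueError("need equal-length, non-empty prediction and golden lists")
    correct = {f: 0 for f in fields}
    exact = 0
    for pred, gold in zip(predictions, golden):
        hits = [normalise(pred.get(f)) == normalise(gold.get(f)) for f in fields]
        for f, hit in zip(fields, hits):
            correct[f] += hit
        exact += all(hits)
    n = len(golden)
    return {
        "per_field": {f: correct[f] / n for f in fields},
        "exact_match": exact / n,
    }
```

Reporting per-field accuracy next to exact match shows which field a prompt change helped or hurt, which a single aggregate number hides.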
Common interview question
Your RAG chatbot returns accurate facts but users say the answers feel unhelpful and robotic. How would you improve it using prompt engineering alone — without touching the retrieval system?
04
RAG Systems
Retrieval-Augmented Generation is the most deployed GenAI pattern in production. Build RAG from scratch, understand every failure mode, and learn to evaluate pipelines rigorously. This stage alone can get you hired.
RAG architecture -- naive, advanced, modular
Document loaders -- PDF, HTML, Notion, Confluence
Chunking strategies -- fixed, recursive, semantic, RAPTOR
Embedding models -- choice, dimensions, speed
Vector databases -- Pinecone, Weaviate, Chroma, pgvector
Retrieval -- dense, sparse (BM25), hybrid
Re-ranking with cross-encoders (Cohere, FlashRank)
Query rewriting, HyDE, step-back
RAG evaluation -- RAGAS, faithfulness, relevance, answer correctness
Common failures -- hallucination, retrieval drift, context bleed
Metadata filtering & multi-index routing
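As a baseline for the chunking strategies listed above, fixed-size chunking with overlap can be sketched in a few lines. This version counts characters rather than tokens, which is a simplifying assumption, and the function name and default sizes are illustrative.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end, to avoid a trailing fragment
        # that is fully contained in the previous chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk. Recursive and semantic strategies replace the fixed step with boundaries derived from the text itself.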
Project: PDF Q&A with RAG Evaluation
Build a production-quality PDF chatbot using LangChain + Chroma. Implement chunking experiments (fixed vs. semantic), hybrid retrieval, and re-ranking. Score your pipeline with RAGAS across 3 chunking strategies. Deploy as a FastAPI endpoint. Estimated: 10-15 hours.
Common interview question
Design a RAG system for a legal firm with 500,000 documents and strict accuracy requirements. Cover chunking strategy, retrieval approach, hybrid search, re-ranking, and how you would evaluate it.
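For the hybrid search part of this question, one common way to merge dense and sparse results is reciprocal rank fusion (RRF). The sketch below assumes each retriever returns an ordered list of document IDs. The constant k=60 is a conventional default, not a value this roadmap prescribes.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

RRF uses only ranks, so it needs no calibration between BM25 scores and cosine similarities. A cross-encoder re-ranker can then reorder the fused top results.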
05
Agents
Agents are LLMs that can take actions in the world. Learn to design reliable agentic systems -- from simple tool use to multi-agent workflows with memory. The hardest part isn't making them work; it's making them work reliably.
ReAct pattern -- Reasoning + Acting loops
Tool / function calling -- design & schema
Agent frameworks -- LangGraph, AutoGen, CrewAI
Planning strategies -- linear, DAG, tree-of-thought
Memory systems -- episodic, semantic, procedural
Long-term memory with vector stores & graph DBs
Multi-agent orchestration & delegation
Human-in-the-loop checkpointing
Debugging agent failures -- tracing, replay
Project: Research Agent with Memory
Build a research agent using LangGraph that searches the web, reads articles, deduplicates findings, and writes a structured report. Add episodic memory so it recalls previous research sessions. Implement human-in-the-loop approval before writing output. Estimated: 12-16 hours.
Common interview question
Your ReAct agent is looping — it calls the same tool repeatedly without making progress. How do you diagnose this in production and what safeguards would you add to prevent it?
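One possible safeguard is sketched below: a framework-agnostic guard that is called before every tool execution. LoopGuard and AgentLoopError are hypothetical names, and the step and repeat limits are assumed defaults. Agent frameworks typically offer their own step limits as well.

```python
import json
from collections import Counter


class AgentLoopError(RuntimeError):
    """Raised when the agent appears to be stuck."""


class LoopGuard:
    """Tracks tool calls in one agent run and stops runaway loops."""

    def __init__(self, max_steps=20, max_repeats=3):
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen = Counter()

    def check(self, tool_name, arguments):
        """Call before executing a tool; raises AgentLoopError if a limit is exceeded."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise AgentLoopError(f"exceeded {self.max_steps} steps")
        # Identical tool name plus identical arguments counts as a repeat.
        key = (tool_name, json.dumps(arguments, sort_keys=True, default=str))
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            raise AgentLoopError(
                f"{tool_name} called {self.seen[key]} times with identical arguments"
            )
```

When the guard trips, the run can be stopped and escalated to a human, and the recorded call counts give a starting point when reading the trace.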
06
GenAI System Design
Designing GenAI systems at scale requires different thinking from traditional software. Learn how to trade off latency, cost, and quality; architect for LLM fallbacks; and design systems that stay reliable when the model surprises you.
GenAI architecture patterns -- gateway, router, fallback
Latency optimisation -- streaming, caching, batching
Prompt caching & semantic caching (GPTCache)
Model routing -- cost vs. quality trade-off
LLM observability -- tokens, latency, cost dashboards
Guardrail layers -- input sanitisation, output validation
Multi-tenant LLM architecture & rate limit management
Design Challenge: Multi-tenant LLM Gateway
Design (and partially implement) a multi-tenant LLM gateway that routes requests between GPT-4o and Claude based on cost budget, serves semantically similar prompts from a cache, and emits per-tenant cost dashboards. Write an architectural decision record (ADR). Estimated: 8-12 hours.
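The budget-based routing piece of this challenge might begin as the sketch below. The model names and prices in the usage note are placeholders, not real provider pricing. ModelOption, estimate_cost, and route are hypothetical helpers for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    usd_per_1k_tokens: float  # placeholder price, not real provider pricing


def estimate_cost(model, estimated_tokens):
    """Estimated cost in USD of one request on the given model."""
    return model.usd_per_1k_tokens * estimated_tokens / 1000


def route(models, remaining_budget_usd, estimated_tokens):
    """Return the first model, in preference order, that fits the tenant's budget.

    `models` is ordered from most to least preferred. Returns None when no
    model is affordable, so the caller can reject or queue the request.
    """
    for model in models:
        if estimate_cost(model, estimated_tokens) <= remaining_budget_usd:
            return model
    return None
```

For example, with models = [ModelOption("premium-model", 0.01), ModelOption("budget-model", 0.001)], a tenant with plenty of budget gets the premium model and a tenant near its limit is downgraded. A fuller gateway would also route on quality needs and fall back when a provider errors.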
Common interview question
Design a production customer support chatbot handling 10,000 conversations per day. Cover the full stack: retrieval, model selection, caching, latency, monitoring, and human escalation.
07
Deployment
Shipping GenAI to production has unique constraints: model size, GPU availability, cold starts, and inference cost. Learn to deploy across managed APIs, serverless containers, and self-hosted GPU infrastructure.
Managed inference APIs -- OpenAI, Bedrock, Vertex AI
FastAPI + streaming responses (SSE)
Serverless containers -- AWS Lambda, Cloud Run
GPU inference -- Modal, RunPod, Replicate
Self-hosted models -- vLLM, Ollama, llama.cpp
CI/CD for GenAI -- GitHub Actions + model registry
Scaling with Kubernetes & horizontal pod autoscaling
Cost monitoring & budget alerts
Project: Production LLM API with CI/CD
Deploy a FastAPI app that streams responses from Claude/GPT-4o, with a fallback to a self-hosted Mistral on Modal. Add GitHub Actions CI that runs integration tests and deploys on merge. Measure p50/p99 latency and track cost per request. Estimated: 10-14 hours.
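The latency and cost measurements in this project can be computed with small helpers like the ones sketched here. The nearest-rank percentile definition is one of several in use. The per-million-token prices are parameters you supply, since no real pricing is assumed.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def cost_per_request(input_tokens, output_tokens, usd_per_1m_input, usd_per_1m_output):
    """Cost in USD of one request, given token counts and per-million-token prices."""
    return (input_tokens * usd_per_1m_input + output_tokens * usd_per_1m_output) / 1_000_000
```

Tracking p99 next to p50 matters because a slow tail often comes from cold starts or long generations that the median hides.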
Common interview question
Your LLM API costs tripled this month with no traffic increase. Walk through how you would diagnose the cause and reduce spend by 50 percent without degrading response quality.
08
Evaluation & Safety
You can't improve what you can't measure. Evaluation and safety are production requirements, not afterthoughts. Learn to build robust eval suites, run red-teaming exercises, and implement output guardrails that don't destroy UX.
LLM evaluation frameworks -- Evals (OpenAI), HELM, DeepEval
Reference-based vs. LLM-as-judge evaluation
Hallucination detection -- faithfulness, groundedness
Bias & toxicity measurement
Red-teaming techniques -- jailbreaks, prompt injection
Input guardrails -- intent classification, PII detection
Output guardrails -- Guardrails AI, NeMo Guardrails
Responsible AI frameworks -- EU AI Act, NIST AI RMF basics
Capstone: GenAI Eval & Safety Suite
Build an automated evaluation suite for your RAG chatbot from Stage 4. Implement LLM-as-judge scoring, a red-team test set with 30 adversarial prompts, a PII detection guardrail, and a Streamlit dashboard showing pass/fail trends over time. Estimated: 12-18 hours.
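A first version of the PII guardrail could be regex-based, as in the sketch below. The two patterns are deliberately simple assumptions that will miss many formats and produce false positives. Production systems usually combine patterns with a trained entity recogniser.

```python
import re

# Simplified illustrative patterns; not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text):
    """Replace emails and phone-like numbers with placeholders.

    Returns the redacted text and the list of PII types that were found.
    """
    findings = []
    for label, pattern in (("EMAIL", EMAIL_RE), ("PHONE", PHONE_RE)):
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            findings.append(label)
    return text, findings
```

Returning the findings alongside the redacted text lets the eval suite count how often the guardrail fires, which feeds the pass/fail dashboard.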
Common interview question
How do you evaluate a RAG pipeline systematically? What metrics matter, how do you build a golden test set, and how do you catch hallucinations before they reach users?
Career Outcomes
Jobs this roadmap opens
Roles you’ll qualify for after completing all stages
GenAI Engineer (Explosive Growth): $130k–$190k US · £70k–£120k UK
LLM Engineer (Explosive Growth): $140k–$210k US · £75k–£130k UK
Full-Stack AI Engineer (Very High Demand): $130k–$200k US · £70k–£120k UK
AI Product Engineer (Very High Demand): $120k–$180k US · £65k–£110k UK
AI Solutions Architect (High Demand): $150k–$220k US · £80k–£140k UK
Applied AI Researcher (High Demand): $140k–$200k US · £75k–£130k UK
Want the full career picture first?
This roadmap covers the skills — our complete guide covers the job market, salary data, and 3 paths to get hired as a GenAI Engineer.
How to Become a GenAI Engineer →