Role Roadmap * 8 Stages * 4-8 Months to Job-Ready

Your path to becoming a GenAI Engineer

From Python basics to deploying production LLM systems -- covering RAG, agents, fine-tuning, and system design. Built around what top GenAI teams actually expect on day one.

8
Stages
~26h
Total Content
4-9
Months to Job-ready
Free
To Start
Your Progress
0%
0 of 8 stages complete
🤖
GenAI Engineer
Design, build, and ship production LLM systems -- from prompt to deployment
$140k
Avg US Salary
Explosive
Job Demand
4-9mo
Time to Job-ready
Python
Primary Language
Skills You'll Build
✓ Python ✓ OpenAI / Anthropic API ✓ LangChain / LlamaIndex ✓ Vector DBs ✓ RAG Pipelines + HuggingFace + FastAPI + Docker + AWS Bedrock + Weights & Biases LoRA / QLoRA vLLM Kubernetes Guardrails AI
Essential Strongly Recommended Nice to Have
Salary Range (US)
$95-120k
Junior
0-2 years
$125-155k
Mid-level
2-4 years
$160-220k+
Senior
4+ years
Which Roles Does This Roadmap Prepare You For?
See all 15 AI roles ->
Directly prepares you Strong overlap -- skills transfer Not covered here
🤖
Generative AI Engineer
✓ Primary target role
AI Engineer
✓ Strong fit — ~80% overlap
🧩
Prompt Engineer
↗ Stages 2–4 directly apply
🤖
AI Agent Engineer
↗ See Agentic AI Roadmap
🛡️
AI Safety Engineer
↗ Stage 7 (Eval & Safety) applies
🔬
ML / DL Engineer
→ See ML Engineer Roadmap
📦
MLOps Engineer
→ Not covered here
🔭
AI Research Scientist
→ Needs separate PhD-track path
8 Stages * ~26h Total
01
Foundations
Free Python · APIs · Cloud ~3h * 7 lessons
Solid foundations before you touch any LLM. You'll cover modern Python patterns, REST API design, async programming, and just enough cloud to start shipping. Skip if you're already comfortable with these.
Python 3.10+ features (dataclasses, typing, walrus)
REST API consumption with httpx / requests
Async Python -- asyncio, aiohttp
JSON, environment variables & secrets management
Cloud basics -- S3, Lambda, IAM roles
Docker fundamentals -- images, containers, Compose
Git workflows for AI projects
🛠️
Mini Project: Async API Aggregator
Build an async Python script that fetches data from 3 public APIs concurrently, transforms the JSON, and writes results to S3. Containerise it with Docker. Estimated: 4-6 hours.
Common interview question
You're building a GenAI app making 1M API calls per day and hitting rate limits. Walk through how you'd architect async batching, request queuing, and exponential backoff to stay within provider limits.
Practice this question →
Ready to practice? Test your Foundations knowledge in the Interview Simulator.
Start Mock Interview →
02
LLM Fundamentals
Free Tokenisation · Embeddings · Inference ~4h * 9 lessons
Understand what LLMs actually are -- not just how to call them. You'll learn how tokenisation works, what embeddings represent geometrically, how autoregressive inference happens, and the basics of fine-tuning. This mental model separates strong from average engineers.
Transformer architecture -- attention, keys, queries, values
Tokenisation -- BPE, SentencePiece, token counting
Embeddings -- semantic space, cosine similarity
Autoregressive inference & sampling strategies (temp, top-p, top-k)
Context window mechanics & KV cache
Pre-training vs. instruction tuning vs. RLHF
LoRA & QLoRA fine-tuning intuition
Model families -- GPT-4o, Claude, Gemini, LLaMA, Mistral
Calling APIs -- OpenAI, Anthropic, HuggingFace Inference
🛠️
Project: Token Counter & Embedding Explorer
Build a CLI tool that tokenises any text, shows token IDs, counts cost, and visualises embedding similarity between sentence pairs using OpenAI's text-embedding-3-small. Plot a UMAP cluster of 50 sentences. Estimated: 5-8 hours.
Common interview question
What is the difference between temperature and top-p sampling? When would you set temperature to 0 vs 0.7 in a production GenAI application, and why?
Practice this question →
Ready to practice? Test your LLM Fundamentals knowledge in the Interview Simulator.
Start Mock Interview →
03
Prompt Engineering
Free Few-shot · CoT · Structured Output ~3h * 8 lessons
Prompt engineering is not just "write clear instructions." It's a systematic discipline with measurable outputs. Learn the techniques used by production teams at Anthropic, OpenAI, and Google -- and how to test them rigorously.
System vs. user vs. assistant roles
Zero-shot, one-shot, few-shot prompting
Chain-of-thought (CoT) & step-back prompting
Structured output -- JSON mode, function calling
Prompt chaining & decomposition
Meta-prompting & self-critique loops
Prompt injection attacks & defence
Prompt versioning with LangSmith / PromptLayer
🛠️
Project: Structured Data Extractor
Build an extraction pipeline that takes unstructured job descriptions and outputs clean JSON (role, skills, salary, location) using function calling / JSON mode. Add a test harness that scores accuracy against 50 golden examples. Estimated: 6-10 hours.
Common interview question
Your RAG chatbot returns accurate facts but users say the answers feel unhelpful and robotic. How would you improve it using prompt engineering alone — without touching the retrieval system?
Practice this question →
Ready to practice? Test your Prompt Engineering knowledge in the Interview Simulator.
Start Mock Interview →
04
RAG Systems
Free Chunking · Retrieval · Vector DBs · Eval ~5h * 11 lessons
Retrieval-Augmented Generation is the most deployed GenAI pattern in production. Build RAG from scratch, understand every failure mode, and learn to evaluate pipelines rigorously. This stage alone can get you hired.
RAG architecture -- naive, advanced, modular
Document loaders -- PDF, HTML, Notion, Confluence
Chunking strategies -- fixed, recursive, semantic, RAPTOR
Embedding models -- choice, dimensions, speed
Vector databases -- Pinecone, Weaviate, Chroma, pgvector
Retrieval -- dense, sparse (BM25), hybrid
Re-ranking with cross-encoders (Cohere, FlashRank)
Query rewriting, HyDE, step-back
RAG evaluation -- RAGAS, faithfulness, relevance, answer correctness
Common failures -- hallucination, retrieval drift, context bleed
Metadata filtering & multi-index routing
🛠️
Project: PDF Q&A with RAG Evaluation
Build a production-quality PDF chatbot using LangChain + Chroma. Implement chunking experiments (fixed vs. semantic), hybrid retrieval, and re-ranking. Score your pipeline with RAGAS across 3 chunking strategies. Deploy as a FastAPI endpoint. Estimated: 10-15 hours.
Common interview question
Design a RAG system for a legal firm with 500,000 documents and strict accuracy requirements. Cover chunking strategy, retrieval approach, hybrid search, re-ranking, and how you would evaluate it.
Practice this question →
Ready to practice? Test your RAG Systems knowledge in the Interview Simulator.
Start Mock Interview →
05
Agents
Pro Tools · Planning · Memory · Workflows ~4h * 9 lessons
Agents are LLMs that can take actions in the world. Learn to design reliable agentic systems -- from simple tool use to multi-agent workflows with memory. The hardest part isn't making them work; it's making them work reliably.
ReAct pattern -- Reasoning + Acting loops
Tool / function calling -- design & schema
Agent frameworks -- LangGraph, AutoGen, CrewAI
Planning strategies -- linear, DAG, tree-of-thought
Memory systems -- episodic, semantic, procedural
Long-term memory with vector stores & graph DBs
Multi-agent orchestration & delegation
Human-in-the-loop checkpointing
Debugging agent failures -- tracing, replay
🛠️
Project: Research Agent with Memory
Build a research agent using LangGraph that searches the web, reads articles, deduplicates findings, and writes a structured report. Add episode memory so it recalls previous research sessions. Implement human-in-the-loop for approval before writing output. Estimated: 12-16 hours.
Common interview question
Your ReAct agent is looping — it calls the same tool repeatedly without making progress. How do you diagnose this in production and what safeguards would you add to prevent it?
Practice this question →
Ready to practice? Test your Agents knowledge in the Interview Simulator.
Start Mock Interview →
06
GenAI System Design
Pro Architecture · Latency · Cost · Scale ~3h * 7 lessons
Designing GenAI systems at scale requires different thinking from traditional software. Learn how to trade off latency, cost, and quality; architect for LLM fallbacks; and design systems that stay reliable when the model surprises you.
GenAI architecture patterns -- gateway, router, fallback
Latency optimisation -- streaming, caching, batching
Prompt caching & semantic caching (GPTCache)
Model routing -- cost vs. quality trade-off
LLM observability -- tokens, latency, cost dashboards
Guardrail layers -- input sanitisation, output validation
Multi-tenant LLM architecture & rate limit management
🛠️
Design Challenge: Multi-tenant LLM Gateway
Design (and partially implement) a multi-tenant LLM gateway that routes requests between GPT-4o and Claude based on cost budget, caches identical prompts semantically, and emits per-tenant cost dashboards. Write an architectural decision record (ADR). Estimated: 8-12 hours.
Common interview question
Design a production customer support chatbot handling 10,000 conversations per day. Cover the full stack: retrieval, model selection, caching, latency, monitoring, and human escalation.
Practice this question →
Ready to practice? Test your GenAI System Design knowledge in the Interview Simulator.
Start Mock Interview →
07
Deployment
Pro Serverless · GPUs · Inference APIs ~3h * 8 lessons
Shipping GenAI to production has unique constraints: model size, GPU availability, cold starts, and inference cost. Learn to deploy across managed APIs, serverless containers, and self-hosted GPU infrastructure.
Managed inference APIs -- OpenAI, Bedrock, Vertex AI
FastAPI + streaming responses (SSE)
Serverless containers -- AWS Lambda, Cloud Run
GPU inference -- Modal, RunPod, Replicate
Self-hosted models -- vLLM, Ollama, llama.cpp
CI/CD for GenAI -- GitHub Actions + model registry
Scaling with Kubernetes & horizontal pod autoscaling
Cost monitoring & budget alerts
🛠️
Project: Production LLM API with CI/CD
Deploy a FastAPI app that streams responses from Claude/GPT-4o, with a fallback to a self-hosted Mistral on Modal. Add GitHub Actions CI that runs integration tests and deploys on merge. Measure p50/p99 latency and track cost per request. Estimated: 10-14 hours.
Common interview question
Your LLM API costs tripled this month with no traffic increase. Walk through how you would diagnose the cause and reduce spend by 50 percent without degrading response quality.
Practice this question →
Ready to practice? Test your Deployment knowledge in the Interview Simulator.
Start Mock Interview →
08
Evaluation & Safety
Free Evals · Red-teaming · Guardrails ~3h * 8 lessons
You can't improve what you can't measure. Evaluation and safety are production requirements, not afterthoughts. Learn to build robust eval suites, run red-teaming exercises, and implement output guardrails that don't destroy UX.
LLM evaluation frameworks -- Evals (OpenAI), HELM, DeepEval
Reference-based vs. LLM-as-judge evaluation
Hallucination detection -- faithfulness, groundedness
Bias & toxicity measurement
Red-teaming techniques -- jailbreaks, prompt injection
Input guardrails -- intent classification, PII detection
Output guardrails -- Guardrails AI, NeMo Guardrails
Responsible AI frameworks -- EU AI Act, NIST RMF basics
🛠️
Capstone: GenAI Eval & Safety Suite
Build an automated evaluation suite for your RAG chatbot from Stage 4. Implement LLM-as-judge scoring, a red-team test set with 30 adversarial prompts, PII detection guardrail, and a Streamlit dashboard showing pass/fail trends over time. Estimated: 12-18 hours.
Common interview question
How do you evaluate a RAG pipeline systematically? What metrics matter, how do you build a golden test set, and how do you catch hallucinations before they reach users?
Practice this question →
Ready to practice? Test your Evaluation & Safety knowledge in the Interview Simulator.
Start Mock Interview →
🚀
Ready to Start?
Generate a personalised GenAI roadmap based on your current skills and target role. Takes 2 minutes.
Career Outcomes
Jobs this roadmap opens
Roles you’ll qualify for after completing all stages
🤖
GenAI Engineer
● Explosive Growth
$130k–$190k US · £70k–£120k UK
LLM Engineer
● Explosive Growth
$140k–$210k US · £75k–£130k UK
🌐
Full-Stack AI Engineer
● Very High Demand
$130k–$200k US · £70k–£120k UK
🌟
AI Product Engineer
● Very High Demand
$120k–$180k US · £65k–£110k UK
📈
AI Solutions Architect
● High Demand
$150k–$220k US · £80k–£140k UK
🔮
Applied AI Researcher
● High Demand
$140k–$200k US · £75k–£130k UK
Practice for these interviews now →
Career Planning
Ready to build your personalized AI career plan?
Start Skill Gap Analysis →
Want the full career picture first?
This roadmap covers the skills — our complete guide covers the job market, salary data, and 3 paths to get hired as a GenAI Engineer.
How to Become a GenAI Engineer →