Interview Prep * ML * Deep Learning * MLOps

Crack Your Next ML Interview

400+ questions that actually get asked at top AI/ML companies -- with model answers, follow-ups, and a self-score rubric. Practice like the role is already yours.

Machine Learning Fundamentals
Core ML concepts, metrics, generalization * beginner -> intermediate
10+ questions
Q1 * Explain the bias-variance tradeoff.
Level: Beginner
Expected answer
Bias-variance tradeoff describes how model complexity affects generalization:
  • High bias -> underfitting (model too simple, misses patterns).
  • High variance -> overfitting (model too complex, memorizes noise).
  • Goal is to find a balance that minimizes total error on unseen data.
Regularization, model choice, and data size all influence this tradeoff.
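A quick way to see the tradeoff empirically is to fit models of increasing flexibility and compare train vs validation error. A minimal sketch, assuming scikit-learn and synthetic data:

```python
# Minimal sketch: under- vs overfitting with polynomial degree (assumes scikit-learn).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=200)  # noisy sine wave
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # train error
          mean_squared_error(y_va, model.predict(X_va)))   # validation error
```

Degree 1 shows high bias (both errors high); degree 15 shows high variance (low train error, higher validation error).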
Follow‑up questions
  • How does regularization affect bias and variance?
  • Give an example of a high‑bias model and a high‑variance model.
  • How would you detect overfitting in practice?
Evaluation rubric
Strong
Clearly defines bias and variance, explains under/overfitting, and connects to regularization.
OK
Mentions under/overfitting but not how to control the tradeoff.
Weak
Vague explanation; confuses bias with data bias or fairness only.
Q2 * What is regularization and why is it used?
Level: Beginner
Expected answer
Regularization adds a penalty term to the loss function to discourage overly complex models:
  • L2 (Ridge): penalizes squared weights, encourages small but non‑zero weights.
  • L1 (Lasso): penalizes absolute weights, encourages sparsity and feature selection.
  • Reduces overfitting by controlling model capacity.
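To make the L1-vs-L2 contrast tangible, a minimal sketch (assuming scikit-learn; the data and alpha values are arbitrary):

```python
# Minimal sketch of L2 vs L1 shrinkage (assumes scikit-learn).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)
ridge = Ridge(alpha=1.0).fit(X, y)  # L2: shrinks all weights, rarely to exactly zero
lasso = Lasso(alpha=5.0).fit(X, y)  # L1: drives uninformative weights to exactly zero
print("ridge nonzero:", np.sum(ridge.coef_ != 0))  # typically all 10
print("lasso nonzero:", np.sum(lasso.coef_ != 0))  # typically close to 3
```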
Follow‑up questions
  • When would you prefer L1 over L2?
  • How does regularization interact with feature scaling?
  • What happens if the regularization strength is too high?
Evaluation rubric
Strong
Explains L1 vs L2, connects to overfitting and model complexity, mentions trade‑offs.
OK
Knows it "prevents overfitting" but not how or why different types matter.
Weak
Treats regularization as a generic "tuning trick" with no detail.
Q3 * What is cross‑validation and why is it useful?
Level: Beginner
Expected answer
Cross‑validation splits data into multiple folds to estimate how a model generalizes:
  • Train on k‑1 folds, validate on the remaining fold; repeat for all folds.
  • Gives a lower-variance estimate of generalization performance than a single train/validation split.
  • Stratified CV is important for classification with imbalanced classes.
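A minimal sketch of stratified k-fold in practice (assuming scikit-learn):

```python
# Stratified 5-fold CV on an imbalanced dataset (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # preserves class ratios per fold
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="f1")
print(scores.mean(), scores.std())  # mean performance and spread across folds
```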
Follow‑up questions
  • When would you avoid k‑fold CV (e.g., time series)?
  • How do you adapt CV for time‑dependent data?
  • How does CV interact with hyperparameter tuning?
Evaluation rubric
Strong
Describes k‑fold clearly, mentions stratification and limitations (e.g., time series).
OK
Knows the basic idea but not when to use different variants.
Weak
Confuses CV with simple train/test split.
Q4 * What is the curse of dimensionality?
Level: Intermediate
Expected answer
The curse of dimensionality refers to phenomena that arise in high‑dimensional spaces:
  • Distances between points become less meaningful.
  • Data becomes sparse; models need exponentially more data.
  • Many algorithms (KNN, clustering) degrade as dimensions grow.
Dimensionality reduction (PCA, feature selection) helps mitigate this.
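The distance-concentration effect is easy to demonstrate (a NumPy-only sketch):

```python
# Sketch of distance concentration: in high dimensions, the nearest and farthest
# neighbors become almost equally far away (NumPy only).
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 100, 10_000):
    points = rng.uniform(size=(500, d))
    dists = np.linalg.norm(points - points[0], axis=1)[1:]  # distances from one point
    print(d, dists.min() / dists.max())  # ratio approaches 1 as d grows
```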
Follow‑up questions
  • How does PCA help with high‑dimensional data?
  • When would you prefer feature selection over PCA?
  • How does the curse of dimensionality affect KNN?
Evaluation rubric
Strong
Explains sparsity, distance issues, and connects to specific algorithms and mitigation methods.
OK
Gives a high‑level definition without concrete implications or examples.
Weak
Very vague; no connection to model performance.
Q5 * Why can accuracy be misleading for imbalanced datasets?
Level: Intermediate
Expected answer
In imbalanced datasets, a model can achieve high accuracy by predicting the majority class only:
  • Accuracy ignores class distribution and costs of different errors.
  • Metrics like precision, recall, F1, and ROC/PR curves are more informative.
  • For rare events (fraud, disease), recall and precision are critical.
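A toy demonstration of the majority-class trap (assuming scikit-learn):

```python
# A majority-class "model" looks great on accuracy, terrible on recall/F1.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, f1_score

y_true = np.array([0] * 990 + [1] * 10)  # 1% positive class (e.g., fraud)
y_pred = np.zeros_like(y_true)           # always predict the majority class
print(accuracy_score(y_true, y_pred))                 # 0.99 -- looks impressive
print(recall_score(y_true, y_pred))                   # 0.0 -- misses every fraud case
print(f1_score(y_true, y_pred, zero_division=0))      # 0.0
```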
Follow‑up questions
  • When would you optimize for recall over precision?
  • How do you choose a decision threshold?
  • How do ROC and PR curves differ in interpretation?
Evaluation rubric
Strong
Explains majority‑class issue, suggests better metrics, and ties to real‑world examples.
OK
Knows accuracy is "not good" but doesn't articulate why or what to use instead.
Weak
Treats accuracy as always sufficient.
Deep Learning
Neural networks, training dynamics, architectures * intermediate -> advanced
8+ questions
Q1 * Explain backpropagation in neural networks.
Level: Intermediate
Expected answer
Backpropagation computes gradients of the loss with respect to weights using the chain rule:
  • Forward pass: compute outputs and loss.
  • Backward pass: propagate gradients layer by layer.
  • Optimizer (SGD, Adam) updates weights using these gradients.
It enables efficient training of deep networks by reusing intermediate activations.
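A strong candidate can sketch the mechanics by hand. A teaching-only example for a one-hidden-layer network (NumPy; real code would use an autograd framework):

```python
# Hand-rolled forward/backward pass for a tiny 1-hidden-layer network (NumPy only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4)); y = rng.normal(size=(32, 1))
W1 = rng.normal(size=(4, 8)) * 0.1; W2 = rng.normal(size=(8, 1)) * 0.1

for step in range(100):
    # Forward pass: cache intermediate activations for reuse in the backward pass.
    h = np.maximum(0, X @ W1)           # ReLU hidden layer
    y_hat = h @ W2
    loss = np.mean((y_hat - y) ** 2)    # MSE loss
    # Backward pass: chain rule, layer by layer.
    d_yhat = 2 * (y_hat - y) / len(X)
    dW2 = h.T @ d_yhat
    dh = d_yhat @ W2.T
    dW1 = X.T @ (dh * (h > 0))          # ReLU gradient mask
    # SGD update.
    W1 -= 0.1 * dW1; W2 -= 0.1 * dW2
print(loss)  # should have decreased from the first iteration
```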
Follow‑up questions
  • Why do we need non‑linear activation functions?
  • What happens if activations saturate?
  • How does batch size affect gradient estimates?
Evaluation rubric
Strong
Describes forward/backward passes, gradients, and optimizer roles clearly.
OK
High‑level idea only; lacks detail on chain rule or gradient flow.
Weak
Confuses backprop with generic "feedback" or trial‑and‑error.
Q2 * What causes vanishing and exploding gradients?
Level: Intermediate
Expected answer
In deep networks, repeated multiplication of gradients through layers can:
  • Shrink towards zero (vanishing) with certain activations (sigmoid, tanh).
  • Grow very large (exploding) with poor initialization or deep stacks.
Mitigations include ReLU‑like activations, careful initialization, residual connections, and normalization.
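The core intuition fits in a few lines (NumPy-only sketch):

```python
# Multiplying many small Jacobian factors shrinks the gradient exponentially.
import numpy as np

def sigmoid_grad(x):
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)  # maxes out at 0.25

grad = 1.0
for layer in range(30):
    grad *= sigmoid_grad(0.0)  # best case: 0.25 per layer
print(grad)  # ~8.7e-19 after 30 layers -- effectively zero
```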
Follow‑up questions
  • How do residual connections help?
  • Why are LSTMs more robust than vanilla RNNs?
  • What role does gradient clipping play?
Evaluation rubric
Strong
Explains the math intuition and lists multiple mitigation strategies with examples.
OK
Knows it "happens in deep networks" but not why or how to fix it properly.
Weak
No clear understanding of gradient behavior in deep nets.
Q3 * Compare CNNs, RNNs, and Transformers.
Level: Intermediate
Expected answer
  • CNNs: local receptive fields, weight sharing; great for images and spatial data.
  • RNNs: sequential processing with hidden state; good for sequences but hard to parallelize.
  • Transformers: self‑attention, fully parallel, capture long‑range dependencies efficiently.
Transformers have largely replaced RNNs in NLP due to better scaling and performance.
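To ground the comparison, a minimal single-head self-attention in NumPy: every position attends to every other position in one parallel step, unlike an RNN's sequential loop.

```python
# Minimal single-head self-attention (NumPy only; no masking or multi-head).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V                               # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d = 5, 8
X = rng.normal(size=(seq_len, d))
out = self_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (5, 8): one context-aware vector per position
```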
Follow‑up questions
  • Why is self‑attention more flexible than fixed convolution kernels?
  • When might you still use CNNs today?
  • How do Transformers handle very long sequences?
Evaluation rubric
Strong
Clear comparison with use cases and trade‑offs; mentions parallelism and long‑range context.
OK
Knows basic differences but not why Transformers dominate modern NLP/CV tasks.
Weak
Confuses architectures or gives very shallow distinctions.
Q4 * What is batch normalization and why is it used?
Level: Intermediate
Expected answer
Batch normalization normalizes activations within a mini‑batch:
  • Originally motivated as reducing internal covariate shift (the exact mechanism is still debated).
  • Stabilizes and speeds up training.
  • Allows higher learning rates and can have a regularization effect.
In Transformers, LayerNorm is more common because it normalizes each token independently and does not depend on batch statistics.
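A sketch of the training-time forward pass (NumPy only; at inference, running statistics replace the per-batch statistics):

```python
# Batch-norm forward pass at training time (NumPy only).
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                    # per-feature mean over the mini-batch
    var = x.var(axis=0)                    # per-feature variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize to zero mean, unit variance
    return gamma * x_hat + beta            # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))  # shifted, scaled activations
out = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 and ~1
```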
Follow‑up questions
  • Why is LayerNorm preferred in Transformers?
  • What issues arise with very small batch sizes?
  • How does batch norm interact with dropout?
Evaluation rubric
Strong
Explains normalization, training stability, and mentions alternatives like LayerNorm/GroupNorm.
OK
Knows it "helps training" but not the mechanics or trade‑offs.
Weak
No clear understanding of normalization layers.
MLOps
Production ML, pipelines, monitoring, CI/CD * intermediate -> advanced
10+ questions
Q1 * What is MLOps and how is it different from traditional ML?
Level: Intermediate
Expected answer
MLOps focuses on operationalizing ML models:
  • End‑to‑end lifecycle: data, training, deployment, monitoring, retraining.
  • Emphasizes reliability, reproducibility, automation, and collaboration.
  • Bridges ML with DevOps practices (CI/CD, infra as code, observability).
Traditional ML often stops at model training and offline evaluation.
Follow‑up questions
  • What are the main challenges when moving from notebook to production?
  • How do you structure teams around MLOps?
  • What tools have you used for MLOps?
Evaluation rubric
Strong
Clear distinction between experimentation and production; mentions lifecycle, automation, and tooling.
OK
High‑level definition without concrete practices or examples.
Weak
Treats MLOps as just "deploying models with Docker".
Q2 * What is a feature store and why is it important?
Level: Intermediate
Expected answer
A feature store is a centralized system for managing ML features:
  • Stores feature definitions, values, and metadata.
  • Serves features consistently for training and online inference.
  • Helps prevent training/serving skew and duplicate feature logic.
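The skew-prevention idea can be illustrated without any specific product. A hypothetical sketch (`days_since_signup` is an invented feature): a feature store centralizes definitions like this so the offline and online paths cannot diverge.

```python
# Hypothetical sketch, not a real feature-store API: training and serving
# call the *same* feature definition, so the logic cannot drift apart.
from datetime import datetime, timezone

def days_since_signup(signup_ts: datetime, now: datetime) -> float:
    """Single source of truth for this feature."""
    return (now - signup_ts).total_seconds() / 86_400

# Offline: build training rows from historical timestamps (no leakage from "now").
train_row = {"days_since_signup": days_since_signup(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 6, 1, tzinfo=timezone.utc))}

# Online: the serving path reuses the identical function at request time.
serve_row = {"days_since_signup": days_since_signup(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime.now(timezone.utc))}
```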
Follow‑up questions
  • How does a feature store integrate with batch and streaming data?
  • What problems arise without a feature store?
  • Have you used any feature store tools (Feast, Tecton, etc.)?
Evaluation rubric
Strong
Explains consistency, reuse, and skew prevention with concrete examples of usage.
OK
Knows it "stores features" but not why it matters for production ML.
Weak
No clear understanding of feature management challenges.
Q3 * Explain data drift vs model drift. How do you detect them?
Level: Intermediate
Expected answer
  • Data drift: input distribution changes over time (e.g., new user behavior).
  • Model drift: model performance degrades, even if inputs look similar.
  • Detection: monitor feature distributions, PSI, performance metrics, and business KPIs.
Retraining, recalibration, or model replacement may be needed depending on the cause.
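A minimal PSI check for a single numeric feature (NumPy-only sketch; the 0.2 threshold is a common rule of thumb, not a standard):

```python
# Population Stability Index (PSI) for one feature (NumPy only).
import numpy as np

def psi(expected, actual, bins=10, eps=1e-4):
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))  # bin on reference data
    edges[0], edges[-1] = -np.inf, np.inf   # catch values outside the reference range
    e_pct = np.histogram(expected, edges)[0] / len(expected) + eps
    a_pct = np.histogram(actual, edges)[0] / len(actual) + eps
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0, 1, 10_000)     # distribution at training time
live_feature = rng.normal(0.5, 1.2, 10_000)  # shifted production traffic
print(psi(train_feature, live_feature))  # rule of thumb: > 0.2 suggests significant drift
```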
Follow‑up questions
  • How would you set thresholds for drift alerts?
  • What's your strategy for safe retraining?
  • How do you handle concept drift in streaming systems?
Evaluation rubric
Strong
Distinguishes data vs model drift clearly and proposes concrete monitoring strategies.
OK
Knows drift exists but not how to detect or respond systematically.
Weak
No clear concept of drift or its impact on production systems.
Q4 * How would you design CI/CD for ML models?
Level: Advanced
Expected answer
CI/CD for ML extends software CI/CD with ML‑specific steps:
  • CI: unit tests, data validation, training pipeline tests, reproducibility checks.
  • CD: automated deployment to staging, canary or shadow deployments, rollback strategies.
  • Model registry, versioning, and approval workflows for promotion to production.
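Illustrative examples of ML-specific CI gates (the names, data, and thresholds are hypothetical; a real pipeline would run these before promoting a model):

```python
# Hypothetical promotion gates for an ML CI pipeline (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def check_data_quality(X, y):
    assert not np.isnan(X).any(), "nulls in training data"        # data validation gate
    assert set(np.unique(y)) == {0, 1}, "unexpected label values"

def check_model_quality(model, X_val, y_val, min_auc=0.75):
    auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])  # quality gate vs baseline
    assert auc >= min_auc, f"AUC {auc:.3f} below promotion threshold"

def check_reproducibility(X, y):
    m1 = LogisticRegression(random_state=42).fit(X, y)  # same seed twice
    m2 = LogisticRegression(random_state=42).fit(X, y)  # should give identical weights
    assert np.allclose(m1.coef_, m2.coef_)

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5)); y = (X[:, 0] + rng.normal(0, 0.5, 400) > 0).astype(int)
check_data_quality(X, y)
model = LogisticRegression().fit(X[:300], y[:300])
check_model_quality(model, X[300:], y[300:])
check_reproducibility(X, y)
print("all promotion gates passed")
```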
Follow‑up questions
  • What tests are unique to ML compared to standard software?
  • How do you handle model rollbacks?
  • How would you integrate data quality checks into the pipeline?
Evaluation rubric
Strong
Describes a full pipeline with tests, staging, canary/shadow, registry, and monitoring hooks.
OK
Talks about "deploying models with CI/CD" but lacks ML‑specific steps or safeguards.
Weak
No understanding of how CI/CD changes for ML workloads.
ML Coding Tasks (Python)
Hands‑on ML and DL exercises * intermediate -> advanced
6+ tasks
Q1 * Implement logistic regression from scratch.
Level: Intermediate
Expected answer
Candidate should outline:
  • Sigmoid function for probabilities.
  • Binary cross‑entropy loss.
  • Gradient descent or mini‑batch gradient descent updates.
  • Convergence criteria and evaluation on a validation set.
Exact syntax is less important than a correct mathematical and implementation flow.
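One possible reference solution (NumPy only; `fit_logreg` and its defaults are illustrative):

```python
# Logistic regression via gradient descent (NumPy only).
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -30, 30)))  # clip to avoid overflow

def fit_logreg(X, y, lr=0.1, epochs=500, l2=0.0):
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)                      # predicted probabilities
        grad_w = X.T @ (p - y) / len(y) + l2 * w    # gradient of BCE (+ optional L2)
        grad_b = np.mean(p - y)
        w -= lr * grad_w; b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.5, 500) > 0).astype(float)
w, b = fit_logreg(X, y)
print(np.mean((sigmoid(X @ w + b) > 0.5) == y))  # accuracy, should be well above chance
```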
Follow‑up questions
  • How would you add L2 regularization?
  • How do you handle numerical stability in the sigmoid?
  • How would you extend this to multi‑class classification?
Evaluation rubric
Strong
Correct loss, gradients, and update loop; mentions regularization and stability concerns.
OK
Understands high‑level idea but struggles with gradient derivation or implementation details.
Weak
Cannot outline a working training loop or loss function.
Q2 * Write a function to compute F1 score given y_true and y_pred.
Level: Intermediate
Expected answer
Candidate should:
  • Compute TP, FP, FN from predictions and labels.
  • Compute precision = TP / (TP + FP), recall = TP / (TP + FN).
  • Compute F1 = 2 * precision * recall / (precision + recall).
  • Handle edge cases (division by zero).
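One possible reference solution (NumPy for convenience; pure Python works just as well):

```python
# Binary F1 from scratch, with division-by-zero guards (NumPy only).
import numpy as np

def f1_score(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # guard: no positive predictions
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # guard: no positive labels
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))  # 2 TP, 1 FP, 1 FN -> F1 ≈ 0.667
```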
Follow‑up questions
  • How would you extend this to multi‑class (macro vs micro F1)?
  • When is F1 more useful than accuracy?
  • How would you choose a threshold to maximize F1?
Evaluation rubric
Strong
Correct implementation and understanding of precision/recall trade‑offs and edge cases.
OK
Knows formula but may miss edge cases or multi‑class extensions.
Weak
Confuses F1 with accuracy or other metrics.