🌿 Machine Learning Track

ML Track Beginner ⏱ 45 min

Build a Fraud Detection Classifier

Build a transaction fraud detector modelled on the fraud engines banks run in production. Train classifiers on synthetic bank-style transaction data, handle severe class imbalance, and evaluate with the precision-recall metrics that matter in banking.

scikit-learn pandas matplotlib Google Colab

Open Lab 1 in Google Colab

Opens a blank notebook — copy the code from the Code Snippets tab to practice

Learning Outcomes

Generate and explore a synthetic bank-style payment transaction dataset
Handle extreme class imbalance (fraud is only about 1% of transactions in this dataset)
Train a production-style Random Forest fraud classifier
Interpret precision, recall, F1 and why accuracy is misleading for fraud
Tune the decision threshold to minimise false negatives (missed fraud)
Save and reload the trained fraud model with joblib
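Why accuracy is misleading for fraud can be seen with a toy example. The sketch below assumes 1,000 hypothetical transactions with 10 frauds; the helper `binary_metrics` is illustrative and not part of scikit-learn.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive (fraud = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical data: 990 legitimate (0) and 10 fraudulent (1) transactions
y_true = [0] * 990 + [1] * 10

always_legit = [0] * 1000                      # naive model: never flags fraud
print(binary_metrics(y_true, always_legit))    # accuracy 0.99, recall 0.0

# A model that catches 8 of 10 frauds at the cost of 20 false alerts
catches_most = [0] * 970 + [1] * 20 + [1] * 8 + [0] * 2
print(binary_metrics(y_true, catches_most))    # accuracy 0.978 (lower!), recall 0.8
```

The useful model scores lower on accuracy than the useless one, which is why the lab reports precision, recall and F1 for the fraud class instead.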

Prerequisites

Python: Basic syntax, loops, functions

Google Account: For Colab access

Math: High-school statistics

No API key needed

This lab uses only open-source libraries. No external API keys required.

1. Title & Introduction

Markdown cell explaining what a classifier is, the synthetic bank transaction dataset used, and what the learner will build by the end of the notebook.

2. Prerequisites & Setup

Pip installs for scikit-learn, pandas, matplotlib, seaborn. Import block. Runtime check (CPU is sufficient).

3. Load & Explore Data

Generate the shared synthetic customer table with NumPy and derive the transactions from it. Convert to a DataFrame. df.head(), df.describe(), class distribution bar chart.

4. Preprocess & Split

Handle missing values. Define features X and target y. train_test_split with stratify. Discuss why test data must stay unseen.
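The idea behind `stratify=y` is to split each class separately so that both sides keep the same fraud ratio. A minimal pure-Python sketch; `stratified_split` is a hypothetical helper, and in the lab itself you should keep using `train_test_split(..., stratify=y)`:

```python
import random

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)   # per-class number of test rows
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def fraud_rate(idx, labels):
    """Share of fraud labels among the given row indices."""
    return sum(labels[i] for i in idx) / len(idx)

labels = [0] * 990 + [1] * 10                  # 1% fraud
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx),
      fraud_rate(train_idx, labels), fraud_rate(test_idx, labels))
```

With a plain random split, a 200-row test set could easily end up with 0 or 5 frauds; stratifying guarantees 2, so test metrics stay comparable across runs.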

5. Train Classifiers

Train Decision Tree, then Random Forest. Print training vs validation accuracy. Explain overfitting via the gap.

6. Evaluate & Visualise

Classification report. Confusion matrix heatmap. Feature importance bar chart. ROC and precision-recall curves.

7. Save the Model

Joblib dump/load. Predict on one new sample. Discuss model versioning basics.

8. TODO, Challenge & Wrap-up

Student tasks + stretch challenges + summary of what was learned.

Setup & Data Loading

# ── SETUP ──────────────────────────────────────────
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, precision_recall_curve
import joblib

# ════════════════════════════════════════════════════════════════
#  STEP 1 — SHARED BANK DATA GENERATOR  (run this cell first)
#  Labs 1, 2 and 3 all use the same 10,000 synthetic banking
#  customers so your work is consistent across the ML track.
#  seed=42 → same results every run, no downloads needed.
# ════════════════════════════════════════════════════════════════
np.random.seed(42)
N = 10_000

customer_id    = [f"CUS_{i:05d}" for i in range(N)]
age            = np.random.randint(22, 72, N)
annual_income  = np.random.lognormal(11, 0.55, N).clip(18_000, 480_000)
years_employed = np.clip(np.random.exponential(6, N), 0, 40)
dti_ratio      = np.random.uniform(0.05, 0.65, N)  # customer debt-to-income burden
num_accounts   = np.random.randint(1, 9, N)
total_debt     = annual_income * dti_ratio * np.random.uniform(3, 8, N)

# Credit score: income + tenure + low DTI → higher score (spans 300–850)
income_pct  = np.log1p(annual_income - 18_000) / np.log1p(462_000)
employ_pct  = np.minimum(years_employed / 25, 1.0)
credit_score = (
    300
    + income_pct  * 180
    + employ_pct  * 120
    + (1 - np.minimum(dti_ratio / 0.65, 1.0)) * 200
    + np.random.normal(0, 25, N)
).clip(300, 850).astype(int)

customers = pd.DataFrame({
    'customer_id':    customer_id,
    'age':            age,
    'annual_income':  annual_income.round(2),
    'years_employed': years_employed.round(1),
    'dti_ratio':      dti_ratio.round(4),
    'num_accounts':   num_accounts,
    'total_debt':     total_debt.round(2),
    'credit_score':   credit_score,
})
print(f"OK {len(customers):,} customers | income ${customers.annual_income.median():,.0f} median | score {customers.credit_score.mean():.0f} avg (range {customers.credit_score.min()}-{customers.credit_score.max()})")

# ── STEP 2 — LAB 1 VIEW: derive payment transactions from shared customers ──
# Each customer makes 1-5 transactions; low credit score skews toward fraud patterns
tx_rows, tx_types_pool = [], ['CASH_OUT','PAYMENT','CASH_IN','TRANSFER','DEBIT']
for _, cust in customers.iterrows():
    for _ in range(np.random.randint(1, 6)):
        tx_type  = np.random.choice(tx_types_pool, p=[0.35,0.34,0.22,0.06,0.03])
        amount   = float(np.random.lognormal(5.5, 1.8))
        old_orig = cust['annual_income'] / 12 * np.random.uniform(0.5, 3)
        new_orig = max(0, old_orig - amount)
        old_dest = float(np.random.lognormal(6, 2))
        risk_score = ((1 - cust['credit_score'] / 850) * 0.5
                      + (0.3 if tx_type in ['TRANSFER','CASH_OUT'] else 0)
                      + (0.2 if new_orig < old_orig * 0.05 else 0))
        fraud    = int(np.random.rand() < risk_score * 0.04)  # ~1.3% fraud rate
        tx_rows.append({'customer_id': cust['customer_id'], 'type': tx_type,
                        'step': int(np.random.randint(1, 745)),  # simulated hour of a 31-day month (PaySim-style time index)
                        'amount': round(amount,2), 'oldbalanceOrg': round(old_orig,2),
                        'newbalanceOrig': round(new_orig,2), 'oldbalanceDest': round(old_dest,2),
                        'newbalanceDest': round(old_dest+amount,2), 'isFraud': fraud})

df = pd.DataFrame(tx_rows)
print(f"Transactions: {len(df):,} | Fraud rate: {df['isFraud'].mean():.4%}")
df['type'].value_counts().plot(kind='bar', title='Transaction Types — retail-bank style (same customers as Labs 2 & 3)')
plt.show()

Train / Test Split

# Features used in real fraud scoring systems
features = ['amount', 'oldbalanceOrg', 'newbalanceOrig',
            'oldbalanceDest', 'newbalanceDest', 'step']

# Encode transaction type
df['type_encoded'] = df['type'].map({'CASH_OUT':0,'PAYMENT':1,'CASH_IN':2,'TRANSFER':3,'DEBIT':4})
features.append('type_encoded')

X = df[features]; y = df['isFraud']

# Stratify to keep fraud ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(f"Train fraud cases: {y_train.sum()} / {len(y_train):,}")

Train & Evaluate Fraud Models

# Baseline: always predict "not fraud" (deceptively high, ~99% accuracy)
baseline_acc = 1 - y_test.mean()
print(f"Naive baseline accuracy: {baseline_acc:.4%}")

# Random Forest with class_weight to penalise missed fraud
rf = RandomForestClassifier(
    n_estimators=100,
    class_weight='balanced',   # critical for imbalanced data
    random_state=42, n_jobs=-1
)
rf.fit(X_train, y_train)

preds = rf.predict(X_test)
print(classification_report(y_test, preds,
      target_names=['Legit', 'Fraud']))
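`class_weight='balanced'` reweights each class by `n_samples / (n_classes * class_count)`, the formula given in the scikit-learn documentation. A quick stdlib-only calculation on hypothetical labels shows how strongly the rare fraud class gets up-weighted; `balanced_weights` is an illustrative helper, not a library function:

```python
from collections import Counter

def balanced_weights(labels):
    """Replicate scikit-learn's 'balanced' heuristic: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

labels = [0] * 990 + [1] * 10                  # 1% fraud
weights = balanced_weights(labels)
print(weights)
print(f"One fraud example weighs {weights[1] / weights[0]:.0f}x a legit one")
```

Misclassifying a fraud case therefore costs the forest 99 times more than misclassifying a legitimate one, which pushes splits toward isolating fraud.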

Threshold Tuning + Feature Importance

# Default threshold (0.5) vs lower threshold (catches more fraud)
probs = rf.predict_proba(X_test)[:, 1]
for threshold in [0.5, 0.3, 0.2]:
    preds_t = (probs >= threshold).astype(int)
    cm = confusion_matrix(y_test, preds_t)
    fn = cm[1,0]  # false negatives = missed fraud
    print(f"Threshold {threshold} → Missed fraud: {fn} | False alerts: {cm[0,1]}")

# What signals matter most to the fraud model?
feat_imp = pd.Series(rf.feature_importances_, index=features)
feat_imp.sort_values().plot(kind='barh', title='Fraud Signal Importance (bank-style)')
plt.tight_layout(); plt.show()
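The threshold loop above counts missed fraud and false alerts. The same sweep can report precision and recall directly. This stdlib sketch uses made-up probabilities (the `y` and `probs` lists below are illustrative, not lab output) to show the trade-off as the threshold drops:

```python
def precision_recall_at(y_true, probs, threshold):
    """Precision and recall when every score >= threshold is flagged as fraud."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 1.0   # convention for this sketch: no alerts, no false alerts
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative scores: 4 frauds followed by 6 legit transactions
y     = [1,    1,    1,    1,    0,    0,    0,    0,    0,    0]
probs = [0.92, 0.61, 0.35, 0.18, 0.55, 0.27, 0.22, 0.10, 0.05, 0.02]

for threshold in [0.5, 0.3, 0.2, 0.1]:
    p, r = precision_recall_at(y, probs, threshold)
    print(f"threshold {threshold}: precision {p:.2f} | recall {r:.2f}")
```

Lowering the threshold never reduces recall, but each extra fraud caught tends to bring more false alerts with it; where to stop is a business decision about review-team capacity versus fraud losses.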

Save & Reload Model

joblib.dump(rf, 'fraud_detector.joblib')

# Score a new transaction (like a bank's real-time fraud API)
loaded_model = joblib.load('fraud_detector.joblib')
new_txn = pd.DataFrame([{
    'amount': 9999.99, 'oldbalanceOrg': 10000, 'newbalanceOrig': 0,
    'oldbalanceDest': 0, 'newbalanceDest': 9999.99,
    'step': 1, 'type_encoded': 3  # TRANSFER
}])
prob = loaded_model.predict_proba(new_txn)[0, 1]
print(f"Fraud probability: {prob:.2%} → {'🚨 FLAG FOR REVIEW' if prob > 0.3 else '✅ APPROVE'}")

TODO Tasks (Complete before moving on)

Filter the dataset to only TRANSFER and CASH_OUT transactions (where most fraud occurs). How does recall change?
Try thresholds of 0.1, 0.2, 0.3, 0.4, 0.5. Plot precision vs recall at each. Which threshold would a bank use to minimise chargebacks?
Add two engineered features: balance_diff_orig and balance_diff_dest. Does fraud F1 improve?
Use cross_val_score with scoring='f1' (5 folds). Why is accuracy the wrong metric here?

Challenge Section

Apply SMOTE (imblearn.over_sampling) to oversample fraud cases. Compare F1 vs class_weight='balanced'.
Train an XGBoost classifier (xgboost). XGBoost is widely used by major banks for production fraud models — compare its recall to Random Forest.
Build a precision-recall AUC comparison chart across Logistic Regression, Random Forest, and XGBoost.
Advanced: Simulate a real-time scoring API — wrap the model in a function that takes a transaction dict, returns risk score + recommended action (approve/review/decline).

Wrap-up

You've completed the full supervised classification loop: generate data, split correctly, train multiple models, evaluate with proper metrics, tune the decision threshold, and persist the best model. Stratified splits, classification reports, and feature importance come up regularly in ML job interviews. Next lab: predict loan default risk with Logistic Regression.


🗺️ This lab maps to Step 3 — Model Training Fundamentals on the ML Engineer Roadmap.

ML Track Beginner ⏱ 50 min

Build a Loan Default Risk Predictor

Predict the probability a borrower defaults on a loan — the core model behind retail-bank credit risk decisioning. Master logistic regression, credit feature engineering, and the metrics lenders actually use: KS statistic and Gini coefficient.

scikit-learn pandas seaborn matplotlib

Open Lab 2 in Google Colab

Opens a blank notebook — copy the code from the Code Snippets tab to practice

Learning Outcomes

Generate and explore a synthetic bank-style loan application dataset
Engineer credit features: debt-to-income, credit utilisation, income per person, years employed
Train a Logistic Regression default predictor like a real underwriting model
Evaluate with ROC-AUC, KS statistic, and Gini coefficient (industry standard)
Build a scorecard: convert model output to a 300-850 credit score range
Interpret feature coefficients — what drives default risk?

Prerequisites

Lab 1: Completed or equivalent

Math: Basic algebra, mean/variance

Dataset

Synthetic bank-style loan application data generated with NumPy from the same 10,000 customers used in Lab 1: income, credit amount, annuity, employment history, and credit bureau scores. No download or Kaggle login required.

1. Introduction: Credit Risk in Banking

How major banks decide who gets a loan. The cost of false negatives (approving bad loans) vs false positives (rejecting good customers). Regulatory context: Fair Lending, ECOA.

2. Setup & Imports

Pip installs, imports, seed for reproducibility.

3. Generate & Explore Loan Data

Generate 10,000 synthetic bank-style loan applications with NumPy (one per shared customer). Default rate calculation. Distribution of loan amounts, income, and credit bureau scores. Correlation heatmap, with no CSV download needed.

4. Credit Feature Engineering

Build industry features: debt_to_income, credit_utilisation, income_per_person, employed_years. Correlation heatmap with the default target.

5. Train Logistic Regression

Scale features. Fit LogisticRegression. Interpret coefficients as risk multipliers. Compare against a Random Forest benchmark.
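With standardised inputs, each logistic-regression coefficient is the change in log-odds of default per one standard deviation of that feature, so `exp(coef)` is an odds multiplier. A worked sketch with hypothetical coefficient values (not output from the lab's model):

```python
import math

# Hypothetical standardised coefficients (log-odds change per +1 SD)
coefs = {"debt_to_income": 0.45, "EXT_SOURCE_2": -0.80, "employed_years": -0.20}

odds_ratios = {feature: math.exp(coef) for feature, coef in coefs.items()}
for feature, ratio in odds_ratios.items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{feature:>15}: +1 SD {direction} default odds x{ratio:.2f}")
```

Reading coefficients this way ("one SD more debt-to-income multiplies the odds of default by about 1.6") is what makes logistic regression easy to explain to model validators.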

6. Lender Metrics: AUC, KS, Gini

ROC curve and AUC. KS statistic: how well the model separates defaulters from non-defaulters. Gini coefficient (2×AUC−1). A common rule-of-thumb benchmark for production: AUC > 0.70.
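Both metrics can be computed from scores and labels without plotting. A stdlib-only sketch on illustrative data (function names are hypothetical): AUC via the Mann-Whitney pairwise formulation, Gini as 2×AUC−1, and KS as the largest gap between the cumulative score distributions of defaulters and non-defaulters.

```python
def auc_mann_whitney(y_true, scores):
    """AUC = P(random defaulter scores higher than random non-defaulter); ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks_statistic(y_true, scores):
    """Max |CDF_defaulters(t) - CDF_non_defaulters(t)| over all score thresholds t."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best

# Illustrative predicted default probabilities (1 = defaulted)
y      = [0,   0,   0,   0,    1,   0,   1,   1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.7, 0.9]

auc = auc_mann_whitney(y, scores)
print(f"AUC {auc:.3f} | Gini {2 * auc - 1:.3f} | KS {ks_statistic(y, scores):.3f}")
```

The pairwise AUC is O(n²) and only suitable for small samples; on the full test set, `roc_auc_score` and `roc_curve` (where KS equals the maximum of TPR − FPR) are the practical choice.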

7. Build a Credit Scorecard

Convert model probability to a 300–850 score using log-odds scaling. Assign score bands (Poor to Exceptional). Show how banks map scores to interest rate bands.
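The log-odds scaling follows the points-to-double-the-odds (PDO) convention: score = offset + factor × ln(good:bad odds), with factor = PDO / ln 2 and offset = base score − factor × ln(base odds). Using the lab's parameters (PDO = 20, base score 600 at 50:1 odds), a short check confirms that doubling the odds adds exactly PDO points:

```python
import math

PDO, BASE_SCORE, BASE_ODDS = 20, 600, 50          # same parameters as the lab's scorecard cell
factor = PDO / math.log(2)
offset = BASE_SCORE - factor * math.log(BASE_ODDS)

def prob_to_score(p_default):
    """Map a default probability to a scorecard score via good:bad log-odds."""
    odds_good_bad = (1 - p_default) / p_default
    return offset + factor * math.log(odds_good_bad)

print(round(prob_to_score(1 / 51)))    # odds 50:1  -> 600 (the base score)
print(round(prob_to_score(1 / 101)))   # odds 100:1 -> 620 (base score + PDO)
```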

8. TODO, Challenge & Wrap-up

Student exercises, stretch goals, and key takeaways.

Load & Explore

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score, roc_curve
import numpy as np

# ── LAB 2 VIEW: derive loan applications from shared customers ──
# Run the shared generator cell first (from Lab 1), then continue here.
# Same 10k customers — each now applies for a loan.
N = len(customers)
credit_amt   = customers['annual_income'].values * np.random.uniform(1.5, 5.5, N)
annuity      = credit_amt / np.random.uniform(36, 120, N)
goods_price  = credit_amt * np.random.uniform(0.85, 1.05, N)
fam_members  = np.random.choice([1,2,3,4,5], N, p=[0.2,0.35,0.25,0.15,0.05])
# EXT_SOURCE scores derived from the same credit_score — consistent across labs
ext_source_1 = (customers['credit_score'].values / 850 * np.random.uniform(0.8, 1.1, N)).clip(0, 1)
ext_source_2 = (customers['credit_score'].values / 850 * np.random.uniform(0.7, 1.05, N)).clip(0, 1)
ext_source_3 = (customers['credit_score'].values / 850 * np.random.uniform(0.75, 1.0, N)).clip(0, 1)
# Logistic default probability driven by credit score + DTI (same signal as Lab 3)
cs2      = customers['credit_score'].values.astype(float)
dti2     = customers['dti_ratio'].values
logit2   = ((cs2 - 560) / 120) * (-3.5) + ((dti2 - 0.35) / 0.20) * 2.0 - 3.0
TARGET   = (np.random.rand(N) < 1 / (1 + np.exp(-logit2))).astype(int)

df = customers[['customer_id']].copy()
df['AMT_INCOME_TOTAL']  = customers['annual_income'].values
df['AMT_CREDIT']        = credit_amt.round(2)
df['AMT_ANNUITY']       = annuity.round(2)
df['AMT_GOODS_PRICE']   = goods_price.round(2)
df['CNT_FAM_MEMBERS']   = fam_members
df['DAYS_EMPLOYED']     = (-customers['years_employed'].values * 365).astype(int)
df['EXT_SOURCE_1']      = ext_source_1.round(4)
df['EXT_SOURCE_2']      = ext_source_2.round(4)
df['EXT_SOURCE_3']      = ext_source_3.round(4)
df['TARGET']            = TARGET

print(f"Applications: {len(df):,} | Default rate: {TARGET.mean():.2%} | Same customers as Lab 1")
df['AMT_INCOME_TOTAL'].hist(bins=50, log=True)
plt.title('Applicant Income Distribution — retail-bank style (same customers as Lab 1)'); plt.show()

Feature Engineering

# ── CREDIT FEATURE ENGINEERING (industry standard) ──
df['debt_to_income']      = df['AMT_ANNUITY'] / df['AMT_INCOME_TOTAL'].clip(lower=1)
df['credit_utilisation']  = df['AMT_CREDIT']  / df['AMT_GOODS_PRICE'].clip(lower=1)
df['income_per_person']   = df['AMT_INCOME_TOTAL'] / df['CNT_FAM_MEMBERS'].clip(lower=1)
df['employed_years']      = (-df['DAYS_EMPLOYED'].clip(upper=0)) / 365

# Correlation with default
eng_features = ['debt_to_income','credit_utilisation','income_per_person','employed_years']
corr = df[eng_features + ['TARGET']].corr()
sns.heatmap(corr[['TARGET']].sort_values('TARGET'), annot=True, cmap='coolwarm')
plt.title('Credit Feature Correlation with Default'); plt.show()

Train & Metrics

features = ['debt_to_income', 'credit_utilisation', 'income_per_person',
            'employed_years', 'EXT_SOURCE_1', 'EXT_SOURCE_2', 'EXT_SOURCE_3']

X = df[features].fillna(0); y = df['TARGET']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s  = scaler.transform(X_test)

# Logistic Regression — interpretable, used in regulated banking (ECOA compliance)
model = LogisticRegression(max_iter=1000, class_weight='balanced')
model.fit(X_train_s, y_train)

probs = model.predict_proba(X_test_s)[:, 1]
auc = roc_auc_score(y_test, probs)
gini = 2 * auc - 1
print(f"ROC-AUC: {auc:.4f} | Gini: {gini:.4f}")
print("Rule-of-thumb production benchmark: AUC > 0.70")

Credit Scorecard

# Credit Scorecard: convert probability → 300-850 score
# Standard formula used in FICO-style scorecards
PDO = 20   # Points to Double the Odds
BASE_SCORE = 600
BASE_ODDS  = 50   # 50:1 good:bad at base score
factor = PDO / np.log(2)
offset = BASE_SCORE - factor * np.log(BASE_ODDS)

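p        = probs.clip(1e-9, 1 - 1e-9)          # avoid log(0) at either extreme
log_odds = np.log((1 - p) / p)                  # good:bad log-odds
scores   = (offset + factor * log_odds).clip(300, 850)

# include_lowest=True keeps scores of exactly 300 in the 'Poor' band
bands = pd.cut(scores, bins=[300,579,669,739,799,850], include_lowest=True,
               labels=['Poor','Fair','Good','Very Good','Exceptional'])
pd.Series(bands).value_counts().sort_index().plot(kind='bar')   # bars in band order
plt.title('Credit Score Distribution (Bank Scorecard)')
plt.show()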

TODO Tasks

Add AMT_CREDIT / AMT_INCOME_TOTAL as a loan-to-income feature. Does AUC improve?
Plot the ROC curve. Mark the operating point where TPR ≥ 0.70 — a common industry minimum recall for defaults.
Compare Logistic Regression vs Random Forest AUC. Which is better? Which is more explainable for regulators?
Calculate the KS statistic manually: max separation between cumulative default and non-default distributions.

Challenge Section

Build a full scorecard table: feature, coefficient, odds contribution, score points. Format it like a bank model validation report.
Use SHAP (shap library) to explain why a specific applicant was declined. This is the technique banks use for adverse action notices.
Advanced: Test for disparate impact — does the model approve/decline at different rates across gender or race proxy features? Calculate the 4/5ths rule threshold.

Wrap-up

You've built a full credit-risk classification workflow: data generation → feature engineering → scaling → training → AUC/KS/Gini evaluation → scorecard. Turning model probabilities into an explainable score is a skill many junior engineers skip, and you didn't. Next: combine everything into a reusable Pipeline with hyperparameter tuning.


🗺️ This lab maps to Step 3 — Regression & Feature Engineering on the ML Engineer Roadmap.

ML Track Intermediate ⏱ 60 min

Build a Credit Scoring Pipeline

Build an automated credit scoring pipeline modelled on real bank underwriting workflows. Package mixed feature preprocessing, model training, and hyperparameter tuning into a single deployable artifact, a common MLOps pattern for production models in banking.

scikit-learn Pipeline GridSearchCV ColumnTransformer joblib

Open Lab 3 in Google Colab

Opens a blank notebook — copy the code from the Code Snippets tab to practice

Learning Outcomes

Build an end-to-end credit scoring Pipeline for bank-style underwriting
Handle mixed applicant data: numeric income/debt + categorical employment/loan type
Run GridSearchCV to tune the scoring model across 5-fold CV
Prevent data leakage — a critical compliance issue in bank ML systems
Serialize the full pipeline as one artifact for model registry deployment
Score new loan applications with a single pipeline.predict_proba() call

Prerequisites

Lab 1 & 2: Completed

Python: Intermediate (classes, dicts)

Why Pipelines in Banking?

Regulatory model validation requires reproducible, auditable preprocessing. Pipelines apply identical transformations at training and scoring time and fit preprocessing only on the training folds, preventing the data leakage that inflates validation metrics and undermines model risk review.
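Leakage from preprocessing is easy to see with a scaler: if the mean and standard deviation are computed on all rows, the test rows influence how the training data is transformed. The stdlib sketch below (the `fit_scaler` helper and the income values are illustrative) contrasts fitting on all data with fitting on training rows only, which is what a Pipeline does inside each CV fold:

```python
import statistics

def fit_scaler(values):
    """Learn mean and population std-dev, like StandardScaler.fit."""
    return statistics.fmean(values), statistics.pstdev(values)

def transform(values, params):
    """Standardise values with previously learned (mean, std)."""
    mean, std = params
    return [(v - mean) / std for v in values]

train_income = [40_000, 52_000, 61_000, 48_000]
test_income  = [250_000, 39_000]                  # one unusual applicant in the holdout

leaky  = fit_scaler(train_income + test_income)   # WRONG: statistics see the test rows
proper = fit_scaler(train_income)                 # RIGHT: fit on training rows only

print("leaky mean: ", leaky[0])
print("proper mean:", proper[0])
print("test z-scores (proper):", [round(z, 2) for z in transform(test_income, proper)])
```

In the leaky version the outlier in the holdout shifts the mean from 50,250 to about 81,667, so the "unseen" test data has already shaped the model's inputs and holdout AUC becomes optimistic.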

1. Bank Underwriting Context

How a bank credit decisioning pipeline works end-to-end. Why data leakage is a model risk problem that regulators and validators look for. The model validation process banks follow (SR 11-7 guidance).

2. Setup & Generate Loan Data

Imports. Generate synthetic bank-style loan applications with NumPy — income, credit amount, annuity, employment history, credit bureau scores, contract type, and housing type. No Kaggle download required.

3. ColumnTransformer for Credit Data

Numeric branch: SimpleImputer(median) → StandardScaler for income/debt features. Categorical branch: SimpleImputer(most_frequent) → OrdinalEncoder for contract type, gender, income type, and housing. Combine into one transformer.

4. Build the Scoring Pipeline

Attach GradientBoostingClassifier to ColumnTransformer inside Pipeline. Fit entire pipeline in one line. Score an application with raw data — no manual preprocessing.

5. GridSearchCV: Tune the Scorer

Param grid over classifier__ prefix. Optimise for ROC-AUC (bank standard). 5-fold stratified CV. Best params + holdout AUC. Log results for model validation report.
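Two mechanics in this step are worth making concrete: the `classifier__` prefix routes each parameter to the Pipeline step named `classifier`, and grid size times fold count sets the number of fits. A stdlib sketch that only mimics the naming convention (it does not call scikit-learn):

```python
from itertools import product

param_grid = {
    "classifier__n_estimators": [100, 200],
    "classifier__max_depth": [3, 5],
    "classifier__learning_rate": [0.05, 0.1],
}
n_splits = 5

# Every combination GridSearchCV would evaluate
combos = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

# "step__param" -> (step name, parameter name), the convention Pipeline.set_params uses
step, param = "classifier__max_depth".split("__", 1)

# With refit=True (the default), GridSearchCV adds one final fit on the full training set
print(f"{len(combos)} candidates x {n_splits} folds = {len(combos) * n_splits} CV fits")
print(step, param)
```

Each extra value in the grid multiplies the cost, which is one reason to compare against RandomizedSearchCV in the TODO tasks.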

6. Model Validation Report

Format CV results as a validation table for SR 11-7-style model documentation. Population Stability Index (PSI) across time splits. Gini coefficient across demographic segments for a fair lending check.
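PSI compares the share of applicants in each score band between a baseline (for example, the development sample) and a later period: PSI = Σ (actual% − expected%) × ln(actual% / expected%). A stdlib sketch with illustrative band shares; the 0.1 / 0.25 cut-offs in the comment are common rules of thumb, not a regulatory requirement:

```python
import math

def psi(expected_shares, actual_shares, eps=1e-6):
    """Population Stability Index over pre-binned proportions (each list sums to 1)."""
    total = 0.0
    for e, a in zip(expected_shares, actual_shares):
        e, a = max(e, eps), max(a, eps)          # guard against empty bins
        total += (a - e) * math.log(a / e)
    return total

baseline = [0.10, 0.20, 0.40, 0.20, 0.10]   # score-band shares at development time
recent   = [0.15, 0.25, 0.35, 0.15, 0.10]   # shares in a later month

# Rule of thumb: < 0.1 stable, 0.1-0.25 monitor, > 0.25 significant shift
print(f"PSI = {psi(baseline, recent):.4f}")
```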

7. Deploy to Model Registry

joblib.dump the best pipeline. Simulate a model registry: version tag, metadata JSON, champion/challenger framework. Score new applications in batch and real-time modes.
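A champion/challenger setup can be sketched as versioned metadata plus a promotion rule. Everything below is hypothetical: `promote_if_better`, the `min_gain` margin, and the AUC values are illustrative choices. The challenger replaces the champion only if it beats it on the same holdout by a margin:

```python
import json

def promote_if_better(champion, challenger, min_gain=0.005):
    """Return the model card that should serve traffic, plus a reason for the audit log."""
    gain = challenger["auc"] - champion["auc"]
    if gain >= min_gain:
        return challenger, f"promoted {challenger['version']} (+{gain:.4f} AUC)"
    return champion, f"kept {champion['version']} (challenger gain {gain:+.4f} < {min_gain})"

champion   = {"model_name": "credit_scoring_pipeline", "version": "1.0.0", "auc": 0.781}
challenger = {"model_name": "credit_scoring_pipeline", "version": "1.1.0", "auc": 0.789}

winner, reason = promote_if_better(champion, challenger)
print(reason)
print(json.dumps({**winner, "status": "champion"}, indent=2))
```

Requiring a minimum gain avoids swapping production models over differences that are within noise of the holdout estimate.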

8. TODO, Challenge & Wrap-up

Tasks + advanced challenge.

ColumnTransformer + Pipeline

from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OrdinalEncoder
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# ── LAB 3 VIEW: derive underwriting pipeline input from shared customers ──
# Run the shared generator cell first (from Lab 1), then continue here.
# Same 10k customers — now structured for the sklearn Pipeline.
N = len(customers)
credit   = customers['annual_income'].values * np.random.uniform(2, 5, N)
annuity  = credit / np.random.uniform(36, 120, N)
ext2     = (customers['credit_score'].values / 850 * np.random.uniform(0.8, 1.1, N)).clip(0, 1)
ext3     = (customers['credit_score'].values / 850 * np.random.uniform(0.75, 1.0, N)).clip(0, 1)
contract = np.random.choice(['Cash loans','Revolving loans'], N, p=[0.9,0.1])
gender   = np.random.choice(['M','F'], N, p=[0.44,0.56])
income_type = np.random.choice(['Working','Commercial','Pensioner','State servant'], N, p=[0.52,0.23,0.18,0.07])
housing  = np.random.choice(['House / apartment','Rented','With parents'], N, p=[0.72,0.16,0.12])
# Logistic default probability — same signal as Lab 2 for consistency
cs3    = customers['credit_score'].values.astype(float)
dti3   = customers['dti_ratio'].values
logit3 = ((cs3 - 560) / 120) * (-3.5) + ((dti3 - 0.35) / 0.20) * 2.0 - 3.0
target = (np.random.rand(N) < 1 / (1 + np.exp(-logit3))).astype(int)

df = pd.DataFrame({
    'AMT_INCOME_TOTAL': customers['annual_income'].values,
    'AMT_CREDIT': credit.round(2),        'AMT_ANNUITY': annuity.round(2),
    'DAYS_BIRTH': (-customers['age'].values * 365),
    'DAYS_EMPLOYED': (-customers['years_employed'].values * 365).astype(int),
    'CREDIT_SCORE': cs3,                   'DTI_RATIO': dti3,
    'EXT_SOURCE_2': ext2.round(4),         'EXT_SOURCE_3': ext3.round(4),
    'NAME_CONTRACT_TYPE': contract,         'CODE_GENDER': gender,
    'NAME_INCOME_TYPE': income_type,        'NAME_HOUSING_TYPE': housing,
    'TARGET': target
})
X = df.drop(columns=['TARGET']); y = df['TARGET']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
print(f"Underwriting pipeline data: {len(df):,} | Default rate: {target.mean():.2%} | Same customers as Labs 1 & 2")

# Underwriting feature split
numeric_features = ['AMT_INCOME_TOTAL','AMT_CREDIT','AMT_ANNUITY',
                    'DAYS_BIRTH','DAYS_EMPLOYED','CREDIT_SCORE','DTI_RATIO',
                    'EXT_SOURCE_2','EXT_SOURCE_3']
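# CODE_GENDER is kept here for illustration only; in a real US credit model,
# ECOA / Regulation B generally prohibits using sex as a decision input.
cat_features     = ['NAME_CONTRACT_TYPE','CODE_GENDER',
                    'NAME_INCOME_TYPE','NAME_HOUSING_TYPE']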

numeric_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler',  StandardScaler()),
])
cat_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('encoder', OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)),
])
preprocessor = ColumnTransformer([
    ('num', numeric_transformer, numeric_features),
    ('cat', cat_transformer,     cat_features),
])
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier',   GradientBoostingClassifier(random_state=42)),
])
pipeline.fit(X_train, y_train)
auc = roc_auc_score(y_test, pipeline.predict_proba(X_test)[:,1])
print(f"Credit Scoring Pipeline AUC: {auc:.4f}")

GridSearchCV over Pipeline params

from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Banks tune on AUC — the industry-standard metric
param_grid = {
    'classifier__n_estimators':   [100, 200],
    'classifier__max_depth':      [3, 5],
    'classifier__learning_rate':  [0.05, 0.1],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(pipeline, param_grid,
                      cv=cv, scoring='roc_auc',
                      n_jobs=-1, verbose=1)
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print(f"Best CV AUC: {search.best_score_:.4f}")
print(f"Holdout AUC: {roc_auc_score(y_test, search.best_estimator_.predict_proba(X_test)[:,1]):.4f}")

Save Full Pipeline

# Save full pipeline (model registry pattern)
import json, datetime
best_pipeline = search.best_estimator_
holdout_auc   = roc_auc_score(y_test, best_pipeline.predict_proba(X_test)[:, 1])
joblib.dump(best_pipeline, 'credit_pipeline_v1.joblib')

metadata = {
    "model_name": "credit_scoring_pipeline",
    "version": "1.0.0",
    "auc": round(float(holdout_auc), 4),
    "trained_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),  # timezone-aware (utcnow() is deprecated)
    "features": numeric_features + cat_features
}
with open('model_card.json', 'w') as f:
    json.dump(metadata, f, indent=2)

# Score a new raw loan application: no manual preprocessing needed
loaded = joblib.load('credit_pipeline_v1.joblib')
new_application = X_test.iloc[[0]]
default_prob = loaded.predict_proba(new_application)[0, 1]
print(f"Default probability: {default_prob:.2%} → {'DECLINE' if default_prob > 0.15 else 'APPROVE'}")

TODO Tasks

Add a SelectKBest(k=10, score_func=f_classif) step between preprocessor and classifier. Which 10 features drive credit risk most?
Switch from GradientBoosting to XGBoost (XGBClassifier). Update param grid prefixes. Does AUC improve?
Replace GridSearchCV with RandomizedSearchCV(n_iter=20). Compare wall-clock time and best AUC. Which would a bank prefer in production?
Print cv_results_ sorted by mean_test_score (the key GridSearchCV uses for a single scoring metric such as scoring='roc_auc'). Build a model selection table showing the top 5 configurations.

Challenge Section

Write a custom sklearn transformer (BaseEstimator + TransformerMixin) that computes debt_to_income and credit_utilisation from Lab 2. Plug it into the Pipeline before the ColumnTransformer.
Simulate a champion/challenger framework: train two pipelines with different classifiers, compare AUC on a holdout, log winner to model_card.json.
Advanced: Wrap the pipeline in a FastAPI /score endpoint that accepts a raw loan application JSON and returns default probability + decision. This is the real-time underwriting API pattern banks use.

Wrap-up

You've built a production-grade ML pipeline: mixed preprocessing, automated tuning, and single-artifact serialization. This workflow is widely used by ML engineers across industry. The ML Track is complete, and you're ready for GenAI.


🗺️ This lab maps to Step 5 — ML Pipelines & MLOps Foundations on the ML Engineer Roadmap.

✦ GenAI Track

GenAI Track Beginner ⏱ 40 min

Build a Banking FAQ Chatbot

Build a multi-turn virtual banking assistant like the virtual assistants major banks ship. Call the OpenAI API, craft banking-specific system prompts, maintain conversation history, and handle account/product queries — all from a Colab notebook with your own API key.

OpenAI API openai Python SDK getpass Google Colab

Open Lab 4 in Google Colab

Opens a blank notebook — copy the code from the Code Snippets tab to practice

Learning Outcomes

Securely input an OpenAI API key in Colab using getpass
Call chat.completions.create with banking system prompts
Maintain multi-turn conversation history for account queries
Write system prompts that constrain the bot to banking topics only
Handle streaming responses like production assistant UIs
Detect out-of-scope queries and route to human agents

Prerequisites

API Key: OpenAI account with API billing enabled (see the prepaid balance note below)

Python: Basic loops & functions

Cost Estimate

~$0.01–$0.05 (≈ ₹1–₹4) for the full lab using gpt-4o-mini — only a few cents of usage.

🔑 How to get your OpenAI API key (about 2 minutes)

Go to platform.openai.com/signup and create an account (or log in).
Open platform.openai.com/api-keys.
Click Create new secret key, give it a name, and copy it right away — OpenAI shows the key only once.
Paste it when the notebook prompts. It's read securely via getpass and never saved in the notebook.

💡 New OpenAI accounts usually need a small prepaid balance to activate API access (minimum about $5 ≈ ₹430) — free trial credits are limited and often expired. Actual usage for this lab is only a few cents. Keep your key private and never commit it to GitHub. You'll reuse this same key in Labs 5, 7, 8, 9 and 10.

1. Banking Chatbot Context

How real banking virtual assistants work. Roles: system (bank persona), user (customer), assistant (bot). Why conversation history is the session state. Guardrails for regulated industries.

2. Setup & API Key

!pip install openai. Secure key entry with getpass.getpass(). Never hardcode keys — explain why.

3. First API Call

Single-turn: send one user message, print the assistant reply. Inspect the full response object — choices, usage, finish_reason.

4. Multi-turn Chat Loop

Build a messages list. Append user input and assistant reply each turn. Run a while True loop with a quit command.
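Because the API is stateless, the `messages` list is the bot's only memory, and it grows every turn. One common way to bound cost is to keep the system prompt plus the most recent turns. A minimal sketch; `trim_history` and `max_history` are hypothetical names, not OpenAI SDK parameters:

```python
def trim_history(messages, max_history=8):
    """Keep system message(s) plus the last `max_history` user/assistant messages."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    recent = dialogue[-max_history:] if max_history > 0 else []
    return system + recent

messages = [{"role": "system", "content": "You are an Acme Bank virtual assistant."}]
for turn in range(1, 11):                       # simulate 10 customer turns
    messages.append({"role": "user", "content": f"question {turn}"})
    messages.append({"role": "assistant", "content": f"answer {turn}"})

trimmed = trim_history(messages)
print(len(messages), "->", len(trimmed))        # 21 -> 9: system prompt + last 8 messages
print(trimmed[1]["content"])                    # oldest message still in context
```

Passing `trim_history(messages)` instead of `messages` to `create()` keeps the guardrail prompt in place while dropping early turns; anything the customer said before the window is forgotten.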

5. Banking System Prompts

Write a banking-constrained system prompt: respond only to account, product, and fee questions; for anything else say "I'll connect you with an agent." Compare different bank persona styles.

6. Streaming Responses

stream=True + iterate chunks. Print tokens as they arrive. Compare UX vs waiting for full response.
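When streaming inside the multi-turn loop, the printed deltas also need to be joined into one string so the assistant reply can be appended to `messages`. The sketch below uses fake chunk objects built with `types.SimpleNamespace` to mimic the `chunk.choices[0].delta.content` shape, so it runs without an API key; with the real client you would pass the object returned by `create(..., stream=True)` instead of the fake list:

```python
from types import SimpleNamespace

def fake_chunk(text):
    """Mimic the shape of a streamed chat-completion chunk (illustrative only)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

def consume_stream(stream, echo=True):
    """Print tokens as they arrive and return the full reply text."""
    parts = []
    for chunk in stream:
        if not chunk.choices:                          # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content or ""   # the final chunk's content is typically None
        if echo:
            print(delta, end="", flush=True)
        parts.append(delta)
    if echo:
        print()
    return "".join(parts)

stream = [fake_chunk("Our standard "), fake_chunk("checking fee is "),
          fake_chunk("listed online."), fake_chunk(None)]
reply = consume_stream(stream)
messages = [{"role": "assistant", "content": reply}]   # ready to append to the chat history
```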

7. Guardrails & Cost Control

Detect out-of-scope queries (investments, legal advice) and refuse with a handoff message. Token counting with tiktoken. Estimate cost per customer session — banks run millions of sessions daily.
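Out-of-scope detection can start as a cheap keyword pre-check before any API call, and session cost is simple arithmetic on token counts. Both pieces are illustrative: the keyword list is hypothetical, and the per-million-token prices are assumptions to replace with OpenAI's current pricing (real token counts would come from `response.usage` or tiktoken):

```python
OUT_OF_SCOPE = ("stock", "invest", "crypto", "lawyer", "lawsuit", "legal advice")  # hypothetical list
HANDOFF = "I'll connect you with a bank specialist."

def route(user_text):
    """Return a handoff message for out-of-scope queries, else None (send to the LLM)."""
    text = user_text.lower()
    return HANDOFF if any(term in text for term in OUT_OF_SCOPE) else None

# Assumed USD prices per 1M tokens; check the provider's pricing page before relying on them
PRICE_IN, PRICE_OUT = 0.15, 0.60

def session_cost(prompt_tokens, completion_tokens):
    """Estimated USD cost of one session from its token counts."""
    return (prompt_tokens * PRICE_IN + completion_tokens * PRICE_OUT) / 1_000_000

print(route("Which stocks should I buy?"))      # handoff message
print(route("What is the overdraft fee?"))      # None: answer with the LLM
cost = session_cost(prompt_tokens=4_000, completion_tokens=1_000)
print(f"${cost:.4f} per session, ${cost * 1_000_000:,.0f} per million sessions")
```

Keyword matching is crude (it misses paraphrases and can over-match), which is why the challenge section moves on to intent classification; the cost arithmetic, though, scales directly to the millions of daily sessions mentioned above.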

8. TODO, Challenge & Wrap-up

Student tasks + stretch goals.

Secure API Key Setup

!pip install openai tiktoken -q

import getpass
from openai import OpenAI

api_key = getpass.getpass("Enter your OpenAI API key: ")
client  = OpenAI(api_key=api_key)

print("✓ Client ready")

Single-turn Call

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",  "content": "You are an Acme Bank virtual assistant. Help with account balances, transfers, fees, and products only. For anything else, say: I'll connect you with a specialist."},
        {"role": "user",    "content": "What is the monthly fee for a standard checking account?"},
    ],
    temperature=0.7,
    max_tokens=300,
)

reply = response.choices[0].message.content
print(f"Assistant: {reply}")
print(f"\nTokens used: {response.usage.total_tokens}")

Multi-turn Chat Loop

SYSTEM_PROMPT = """You are an Acme Bank virtual assistant. You help customers with:
- Account balances and transaction history
- Wire transfers and payment questions
- Credit card benefits and fees
- Branch and ATM locations
For investment advice, legal questions, or anything outside banking, say:
"I'll connect you with a bank specialist." Never guess account details."""
messages = [{"role": "system", "content": SYSTEM_PROMPT}]

print('Chat started. Type "quit" to exit.\n')
while True:
    user_input = input("You: ").strip()
    if user_input.lower() in ["quit", "exit"]:
        print("Goodbye!"); break

    messages.append({"role": "user", "content": user_input})

    resp = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, temperature=0.7
    )
    reply = resp.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print(f"Bot: {reply}\n")

Streaming Response

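stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},   # keep the bank persona and guardrails
        {"role": "user",   "content": "What are the current savings account interest rates?"},
    ],
    stream=True,
)

print("Bot: ", end="", flush=True)
for chunk in stream:
    if not chunk.choices:                               # some chunks carry no choices
        continue
    delta = chunk.choices[0].delta.content or ""        # last chunk's content is None
    print(delta, end="", flush=True)
print()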

TODO Tasks

Rewrite the system prompt for a different bank persona. Change tone and product names. Test with 3 identical questions — how do answers differ?
Add a guardrail: if the user asks about stock tips or legal advice, the bot must respond with a human handoff message. Test with 5 edge cases.
Add a max_history=8 setting that trims old turns, keeping the system prompt plus the 8 most recent messages. Simulate a 20-turn customer session. Does the bot still answer correctly without the early context?
Try temperature=0.0 vs temperature=0.9 for "explain overdraft fees". Which would a bank prefer in production and why?
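One way to approach the max_history task is sketched below. It makes two assumptions. First, max_history counts individual user/assistant messages, not question-answer pairs. Second, messages[0] is always the system prompt, as in the chat loop above. trim_history is a hypothetical helper, not part of the OpenAI SDK.

```python
def trim_history(messages, max_history=8):
    """Return the system prompt plus the most recent max_history messages.

    Assumes messages[0] is the system prompt and that max_history counts
    individual user/assistant messages (not question-answer pairs).
    """
    if not messages:
        return []
    system, rest = messages[:1], messages[1:]
    return system + rest[-max_history:] if max_history > 0 else system


# Simulate a 20-turn session: 20 user + 20 assistant messages after the system prompt
history = [{"role": "system", "content": "You are an Acme Bank virtual assistant."}]
for turn in range(1, 21):
    history.append({"role": "user", "content": f"question {turn}"})
    history.append({"role": "assistant", "content": f"answer {turn}"})

trimmed = trim_history(history, max_history=8)
print(len(history), len(trimmed))   # 41 9
print(trimmed[1]["content"])        # question 17
```

In the loop, you would send trim_history(messages) to client.chat.completions.create. Keep appending every turn to the full messages list, so the complete transcript is still available for logging.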

Challenge Section

Add intent classification: before passing to the LLM, classify the user query as one of [balance_inquiry, transfer, complaint, out_of_scope]. Route accordingly.
Add a /escalate command that flags the session for human review (print a ticket ID + summary). Banks use this when sentiment turns negative.
Advanced: Wrap in Gradio ChatInterface with a custom bank-branded header. Add a "Speak to an agent" button that clears the chat and prints a case number.

Wrap-up

You've built a real multi-turn chatbot using the same API that powers production AI products. Key insight: conversation history IS the memory — manage it carefully. Next: make the bot answer questions about your own documents using RAG.


🗺️ This lab maps to Step 2 — LLM APIs & Prompt Engineering on the GenAI Engineer Roadmap.

GenAI Track Intermediate ⏱ 75 min

Build a Compliance Doc Q&A Bot

Build a RAG chatbot that answers questions from bank policy PDFs, CFPB regulations, and loan agreements, the kind of system banks use for compliance Q&A. Chunk documents, embed them into FAISS, and retrieve answers with source citations for regulatory auditability.

LangChain FAISS OpenAI Embeddings pypdf

Open Lab 5 in Google Colab

Opens a blank notebook — copy the code from the Code Snippets tab to practice

Learning Outcomes

Explain why RAG beats fine-tuning for private banking documents
Parse and chunk CFPB/bank policy PDFs with overlapping windows
Generate and store OpenAI embeddings of compliance text in FAISS
Build a RetrievalQA chain with page citations for regulatory audit trails
Tune chunk size for dense compliance text vs long loan agreements
Detect when a question cannot be answered from the retrieved context

Prerequisites

Lab 4: Completed (OpenAI API)

API Key: OpenAI, same key as Lab 4 (embeddings ~$0.01 ≈ ₹1)

PDF: CFPB complaint guide or any bank terms PDF

1. Compliance RAG Architecture

Why banks use RAG for compliance: CFPB regulations change frequently, and re-fine-tuning a model after every change is slow and costly, while a RAG index can simply be rebuilt. Diagram: Policy PDF → chunks → FAISS → retrieval → LLM (GPT-4o-mini in the code) → cited answer. Auditability requirement: every answer must cite its source page.

2. Setup

!pip install langchain langchain-openai langchain-community faiss-cpu pypdf. API key via getpass.

3. Load Compliance PDFs

Load CFPB's "What is a HELOC?" guide + a sample mortgage agreement PDF. Merge into one document corpus. Print page count, identify dense legal sections.

4. Chunk the Document

RecursiveCharacterTextSplitter with chunk_size=1000, chunk_overlap=200. Print chunk count. Visualise chunk lengths as a histogram.
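To see what chunk_size and chunk_overlap control, here is a simplified fixed-window splitter over characters. Treat it as an illustration only. RecursiveCharacterTextSplitter also tries to break on separators such as paragraphs, newlines, and spaces, so its real chunk boundaries and counts will differ. window_chunks is a hypothetical name.

```python
def window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so the last chunk_overlap characters of one chunk are
    repeated at the start of the next.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Stand-in for ~2,600 characters of policy text (varied letters so overlap is visible)
doc = "".join(chr(65 + i % 26) for i in range(2600))
chunks = window_chunks(doc)
print(len(chunks), [len(c) for c in chunks])  # 3 [1000, 1000, 1000]
print(chunks[0][-200:] == chunks[1][:200])    # True: 200 shared characters
```

A larger chunk_overlap shrinks the step between windows. The same text then yields more chunks, which means more embedding calls.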

5. Embed & Index

OpenAIEmbeddings. FAISS.from_documents. Save index to disk. Print embedding dimension.

6. Build Compliance Q&A Chain

RetrievalQA.from_chain_type with return_source_documents=True. Ask: "What is the APR cap on a HELOC?" Print answer + source page + document name. Refuse if not in context.

7. Evaluate Compliance Accuracy

Test 10 compliance questions. Score each answer: correct / partially correct / hallucinated. Compare chunk sizes 500 vs 1000 on dense legal text. Discuss why hallucinations are a regulatory liability in banking.

8. TODO, Challenge & Wrap-up

Student tasks + stretch goals.

Install & Download PDF

!pip install langchain langchain-openai langchain-community faiss-cpu pypdf -q

# ── Download the CFPB "Your Money, Your Goals" financial toolkit PDF ──
# Public domain document published by the Consumer Financial Protection Bureau.
# No login required — downloads directly in Colab.
!wget -q -O cfpb_your_money_your_goals.pdf \
  "https://files.consumerfinance.gov/f/documents/cfpb_your-money-your-goals_toolkit_2023-04.pdf"
!ls -lh cfpb_your_money_your_goals.pdf

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
import getpass, os

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")

loader    = PyPDFLoader("cfpb_your_money_your_goals.pdf")
all_pages = loader.load()
print(f"Loaded {len(all_pages)} pages from CFPB compliance document")

Chunk + Embed + Index

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
)
chunks = splitter.split_documents(all_pages)   # all_pages comes from PyPDFLoader above
print(f"Total chunks: {len(chunks)}")

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("pdf_faiss_index")
print("✓ Vector store saved")

Build & Query RAG Chain

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,
    chain_type="stuff",
)

question = "What are the main conclusions of this document?"
result   = qa_chain.invoke({"query": question})

print("Answer:", result["result"])
print("\nSources:")
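for doc in result["source_documents"]:
    page = doc.metadata.get("page")              # PyPDFLoader stores 0-based page numbers
    label = page + 1 if isinstance(page, int) else "?"
    print(f"  Page {label}: {doc.page_content[:120]}...")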

TODO Tasks

Test chunk sizes 300, 700, and 1500 on a dense CFPB regulation PDF. Which produces the most precise compliance answers?
Add a custom prompt template: "Answer only using the retrieved context. If the answer is not in the documents, respond: This information is not available in the provided compliance documents."
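Load the FAISS index from disk with FAISS.load_local("pdf_faiss_index", embeddings, allow_dangerous_deserialization=True) instead of re-embedding. Confirm the answers are identical. This is how production systems avoid re-indexing large document sets every time the app restarts.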
Change chain_type to "map_reduce". When does this improve compliance answers vs "stuff"?

Challenge Section

Load the CFPB complaint database CSV alongside your PDF. Build a hybrid corpus: regulatory text + real complaint examples. Test if retrieval improves for edge-case questions.
Add a hallucination detector: after the LLM answers, check if each sentence is grounded in the retrieved chunks using cosine similarity. Flag answers with <0.7 similarity as "unverified."
Advanced: Build a Gradio UI where compliance officers upload policy PDFs, ask questions, and see answers with highlighted source passages — a bank-style internal compliance assistant.
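The hallucination-detector challenge can be prototyped before you wire in real embeddings. The sketch below is a stand-in that rests on three assumptions. It scores each answer sentence against the retrieved chunks using bag-of-words cosine similarity instead of OpenAI embeddings. The 0.7 threshold comes from the challenge text, not from any calibration. The chunk and answer strings are toy examples, not quotes from the CFPB PDF.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lower-cased bag-of-words vector (a stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def flag_unverified(answer, chunks, threshold=0.7):
    """Return (sentence, best_score, verified) for each sentence of the answer."""
    chunk_vecs = [bow(c) for c in chunks]
    report = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        vec = bow(sentence)
        best = max((cosine(vec, cv) for cv in chunk_vecs), default=0.0)
        report.append((sentence, round(best, 3), best >= threshold))
    return report


# Toy retrieved chunks and a toy LLM answer
chunks = ["A HELOC is a line of credit secured by your home.",
          "Lenders must disclose the APR before you sign."]
answer = "A HELOC is a line of credit secured by your home. It is guaranteed by the federal government."
for sentence, score, ok in flag_unverified(answer, chunks):
    print("OK        " if ok else "UNVERIFIED", score, sentence)
```

Swapping bow() for embedding vectors keeps the same structure. Expect very different score distributions, though, so re-tune the threshold.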

Wrap-up

You've built a working RAG pipeline, the architecture behind most "chat with your docs" products. The key levers are chunk strategy, embedding model, retriever k, and chain type. Next: scale vector search to a full SEC 10-K filing in Lab 6.


🗺️ This lab maps to Step 4 — RAG & Vector Search on the GenAI Engineer Roadmap.

GenAI Track Intermediate ⏱ 60 min

Build a Financial Document Search Engine

Scale up from a small PDF corpus to vector search over long SEC filings, still using an in-memory FAISS index. Index a public bank 10-K (and, as an extension, more filings, earnings reports, and prospectuses). Run semantic search, filter by ticker or filing type, and measure retrieval latency. These skills are used in bank research and compliance systems.

FAISS (in-memory) sentence-transformers LangChain HuggingFace

Open Lab 6 in Google Colab

Opens a blank notebook — copy the code from the Code Snippets tab to practice

Learning Outcomes

Index a public U.S. bank 10-K filing (auto-downloaded from SEC EDGAR) into a FAISS vector store
Generate sentence embeddings with a free HuggingFace model
Run semantic search across financial documents (e.g. "credit risk exposure")
Filter by ticker symbol, filing type, and fiscal year as metadata
Compare semantic vs keyword search for financial jargon
Measure query latency at scale; a common production target is sub-100ms retrieval

Prerequisites

Lab 5: Completed (RAG/FAISS)

FAISS: Runs locally, no account needed

Cost

100% free. FAISS runs in-memory on a Colab CPU with no account needed. The 10-K is a public SEC filing auto-downloaded from SEC EDGAR, and sentence-transformers embeddings run locally.

1. Financial Search Context

How investment banks use vector search to retrieve relevant sections from thousands of SEC filings instantly. Why keyword search fails for financial jargon ("net interest margin" vs "NIM"). ANN vs exact search at bank scale.

2. Setup

!pip install sentence-transformers faiss-cpu pypdf. No API keys needed. Auto-download the JPM 10-K from SEC EDGAR via wget, sending a User-Agent header that identifies you. Build a FAISS IndexFlatIP in memory.

3. Download & Parse 10-K

Auto-download a public bank 10-K directly from SEC EDGAR with wget — public, no login. Parse with pypdf. Chunk page-by-page with sliding window. Each chunk tagged with ticker, filing year, and page number.

4. Generate Embeddings

sentence-transformers/all-MiniLM-L6-v2. Batch encode all documents. Print shape, timing. Visualise 2D UMAP projection of embeddings.

5. Build FAISS Index

Normalise embeddings (L2). Add to IndexFlatIP for cosine similarity. Print vector count and index size. Query latency benchmark at 100, 1k, and 10k vectors.

6. Query Financial Documents

Run analyst-style queries: "credit risk exposure to commercial real estate", "liquidity risk management strategy", "regulatory capital requirements". Print the top-5 results with ticker, page, and score.

7. Filter by Bank & Filing Type

Filter by ticker="JPM" to search only that bank's filings. Filter by fiscal_year=2023. Compare retrieval with and without filters. Benchmark latency against a typical production target of <100ms.

8. TODO, Challenge & Wrap-up

Student tasks + stretch goals.

Install & Download SEC 10-K

!pip install sentence-transformers faiss-cpu pypdf -q

# ── Download a public bank annual report (10-K) from SEC EDGAR ──
# Publicly accessible — no login or API key required.
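# SEC EDGAR may reject automated requests without a descriptive User-Agent: replace the placeholder with your name and email.
!wget -q --user-agent="Your Name your.email@example.com" -O jpm_2023_10k.pdf \
  "https://www.sec.gov/Archives/edgar/data/19617/000001961724000285/jpm-20231231.pdf"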
!ls -lh jpm_2023_10k.pdf

Load, Chunk & Embed

from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
import faiss, numpy as np, time

# Extract text page-by-page, keep page number as metadata
reader = PdfReader("jpm_2023_10k.pdf")
raw_pages = [(str(i+1), page.extract_text() or "")
             for i, page in enumerate(reader.pages)]
print(f"Extracted {len(raw_pages)} pages from the 10-K")

# Chunk: sliding window 400 words, 50-word overlap
def chunk_page(page_num, text, size=400, overlap=50):
    words = text.split()
    chunks = []
    for i in range(0, len(words), size - overlap):
        chunk = " ".join(words[i : i + size])
        if chunk.strip():
            chunks.append({"page": page_num, "text": chunk, "ticker": "JPM",
                           "filing": "10-K 2023", "fiscal_year": 2023})
    return chunks

corpus = [c for pg, txt in raw_pages for c in chunk_page(pg, txt)]
print(f"Total chunks: {len(corpus)}")

# Embed with sentence-transformers (free, no API key)
model      = SentenceTransformer("all-MiniLM-L6-v2")
texts      = [c["text"] for c in corpus]
embeddings = model.encode(texts, batch_size=64, show_progress_bar=True)

# Build FAISS index (in-memory, no server needed)
dim   = embeddings.shape[1]
index = faiss.IndexFlatIP(dim)          # inner-product = cosine on normalised vecs
faiss.normalize_L2(embeddings)
index.add(embeddings)
print(f"✓ FAISS index: {index.ntotal} vectors | dim={dim}")
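IndexFlatIP performs exact (brute-force) inner-product search. On L2-normalised vectors, the inner product equals cosine similarity. The NumPy sketch below reproduces that computation on random vectors rather than real filing embeddings. It is useful for sanity-checking FAISS scores, or as a baseline for the latency benchmark at 100, 1k, and 10k vectors. The helper names are hypothetical, and timings depend entirely on your hardware.

```python
import time

import numpy as np


def normalise(x):
    """L2-normalise rows, like faiss.normalize_L2 (but returns a new array)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def exact_ip_search(index_vecs, query_vecs, k=5):
    """Brute-force inner-product top-k, the same scoring IndexFlatIP uses."""
    scores = query_vecs @ index_vecs.T                  # (n_queries, n_vectors)
    top = np.argsort(-scores, axis=1)[:, :k]            # highest scores first
    return np.take_along_axis(scores, top, axis=1), top


rng = np.random.default_rng(0)
dim = 384                                               # all-MiniLM-L6-v2 output size
for n in (100, 1_000, 10_000):
    vecs = normalise(rng.standard_normal((n, dim)).astype("float32"))
    query = vecs[:1]                                    # query = first stored vector
    t0 = time.perf_counter()
    scores, idxs = exact_ip_search(vecs, query, k=5)
    ms = (time.perf_counter() - t0) * 1000
    print(f"n={n:>6}  top-1 idx={idxs[0][0]}  score={scores[0][0]:.3f}  {ms:.2f} ms")
```

Cost grows linearly with the number of vectors. That growth is why approximate (ANN) indexes become attractive at larger scales.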

Semantic Query + Metadata Filter

def search_filings(query, top_k=5, ticker_filter=None):
    t0    = time.perf_counter()
    q_vec = model.encode([query])
    faiss.normalize_L2(q_vec)
    scores, idxs = index.search(q_vec, top_k * 3)  # over-fetch for filtering
    latency = (time.perf_counter() - t0) * 1000    # encode + search, in ms
    # FAISS pads results with index -1 when fewer than k vectors exist; skip those
    results = [(s, corpus[i]) for s, i in zip(scores[0], idxs[0]) if i != -1]
    if ticker_filter:
        results = [(s, d) for s, d in results if d["ticker"] == ticker_filter]
    print(f"\nQuery: '{query}'  [{latency:.0f}ms]")
    for score, doc in results[:top_k]:
        print(f"  [{score:.3f}] p.{doc['page']} ({doc['ticker']} {doc['filing']}) — {doc['text'][:110]}...")

# Analyst-style queries against the real JPM 10-K
search_filings("credit risk exposure commercial real estate")
search_filings("Basel III capital requirements CET1 ratio")
search_filings("liquidity risk management strategy", ticker_filter="JPM")

TODO Tasks

Add more banks' 10-K PDFs (any public tickers): download from SEC EDGAR, chunk, embed, and add to the same FAISS index with their respective ticker metadata. Then filter by ticker to search a single bank's disclosures.
Run the same 5 analyst queries with both FAISS semantic search and keyword search (str.find). Which finds more relevant risk disclosures — and why does semantic win on paraphrased questions?
Plot a 2D UMAP of 200 SEC filing embeddings, coloured by ticker. Do different banks' risk disclosures cluster together or separately?
Benchmark FAISS query latency at 100, 1k, and 10k vectors. At what scale would you graduate from local FAISS to a managed vector DB like Pinecone or Weaviate?

Challenge Section

Implement hybrid search: BM25 keyword scores (rank_bm25) + cosine similarity (weighted sum). Financial jargon like "NIM" or "CET1" often needs keyword matching — measure the improvement.
Build a Gradio UI for bank analysts: query input → top-5 SEC filing chunks shown as cards with ticker, year, and similarity score.
Advanced: Embed chunks with a finance-specific model (e.g. a FinBERT variant) instead of MiniLM. Compare retrieval quality on financial queries. Domain-specific models can outperform general ones on SEC filings, but measure this on your own queries rather than assuming it.
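For the hybrid-search challenge, a minimal sketch is below. It is built on several assumptions. BM25 is hand-rolled with the common defaults k1=1.5 and b=0.75 and a Lucene-style IDF; this is not rank_bm25's exact implementation. The dense cosine scores are passed in as a list you would take from FAISS. Both score lists are min-max normalised before the weighted sum. alpha=0.5 is an arbitrary starting point to tune, and the documents and dense scores are placeholders.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of the query against each doc (hand-rolled sketch)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores


def min_max(xs):
    lo, hi = min(xs), max(xs)
    return [0.0] * len(xs) if hi == lo else [(x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query, docs, dense_scores, alpha=0.5):
    """Weighted sum alpha * dense + (1 - alpha) * BM25, both min-max normalised."""
    fused = [alpha * d + (1 - alpha) * s
             for d, s in zip(min_max(dense_scores), min_max(bm25_scores(query, docs)))]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True), fused


docs = ["Our CET1 ratio remained above regulatory minimums.",
        "Net interest margin (NIM) expanded during the year.",
        "Commercial real estate exposure is monitored quarterly."]
dense = [0.31, 0.29, 0.35]   # hypothetical FAISS cosine scores for the query below
order, fused = hybrid_rank("CET1 ratio", docs, dense)
print(order, [round(f, 2) for f in fused])   # [0, 2, 1] [0.67, 0.0, 0.5]
```

In this toy case, dense search alone would rank the real-estate chunk first. The exact keyword match on "CET1" pulls the capital-ratio chunk to the top.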

Wrap-up

You've run a vector search pipeline end-to-end: chunking, batch embedding, index building, semantic querying, metadata filtering, and latency measurement. These are daily tasks for GenAI engineers, whether the index lives in local FAISS or a managed vector database. Next: deploy a chatbot UI to HuggingFace Spaces.


🗺️ This lab maps to Step 4 — Vector Databases & Retrieval on the GenAI Engineer Roadmap.

GenAI Track Intermediate ⏱ 50 min

Deploy a Loan Advisor Chatbot

Build and deploy a bank-style mortgage eligibility advisor using Gradio and HuggingFace Spaces. Customers enter income, debt, and credit score — the bot explains eligibility, rates, and next steps. Your first live, publicly shareable banking AI app.

Gradio HuggingFace Spaces OpenAI API huggingface_hub

Open Lab 7 in Google Colab

Opens a blank notebook — copy the code from the Code Snippets tab to practice

Learning Outcomes

Build a Gradio form-based loan advisor (income, debt, credit score inputs)
Write a mortgage eligibility system prompt with realistic product constraints
Store OpenAI API key securely as a HuggingFace Space Secret
Deploy the loan advisor to HuggingFace Spaces with a live public URL
Add input validation: reject out-of-range DTI or credit score values
Style the UI to match a bank's brand (logo, colors, disclaimer footer)

Prerequisites

Lab 4: OpenAI API chatbot

HF Account: Free at huggingface.co

HF Token: Write token from settings

1. Loan Advisor Context

How bank mortgage pre-qualification tools work. Inputs: income, monthly debts, credit score, loan amount. Output: eligibility verdict + rate band + next steps. Why Gradio + HF Spaces is the fastest way to prototype banking AI tools.

2. Setup

!pip install gradio openai huggingface_hub. API keys via getpass. HF token login.

3. Build Loan Advisor Locally

Define advise_fn(income, monthly_debt, credit_score, loan_amount). Compute DTI ratio. Call GPT-4o-mini with a bank mortgage system prompt. Return eligibility + recommended products.
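Step 3 computes DTI before calling the model. The sketch below does that arithmetic deterministically and applies the tiers from the system prompt. This is one way to sanity-check the model's verdict and to cover the input-validation outcome. It rests on three assumptions. check_eligibility is a hypothetical helper. The 300-850 bounds assume the FICO score range. When tiers overlap (e.g. DTI 20% with a score of 650), the stricter tier wins.

```python
def dti_percent(annual_income, monthly_debt):
    """Debt-to-income ratio: monthly debt payments / gross monthly income, in %."""
    return monthly_debt / (annual_income / 12) * 100


def check_eligibility(annual_income, monthly_debt, credit_score):
    """Apply the advisor's system-prompt tiers (stricter tier wins on overlap)."""
    if annual_income <= 0 or monthly_debt < 0:
        raise ValueError("income must be positive and debt non-negative")
    if not 300 <= credit_score <= 850:
        raise ValueError("credit score must be between 300 and 850")
    dti = dti_percent(annual_income, monthly_debt)
    if dti > 43 or credit_score < 620:
        tier = "not pre-qualified"
    elif dti < 36 and credit_score >= 680:
        tier = "conventional"
    else:
        tier = "FHA"
    return round(dti, 1), tier


print(check_eligibility(90_000, 1_500, 700))   # (20.0, 'conventional')
print(check_eligibility(60_000, 2_000, 700))   # (40.0, 'FHA')
print(check_eligibility(60_000, 2_500, 700))   # (50.0, 'not pre-qualified')
```

If the model's verdict ever disagrees with this rule-based tier, treat the gap as a prompt bug worth logging.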

4. Build the Gradio Form

gr.Interface with numeric inputs (income, debt, credit score, loan amount). Output: eligibility verdict textbox + rate band. Add a disclaimer: "This is an AI estimate, not a formal approval."

5. Prepare Space Files

Write app.py (loan advisor logic + Gradio UI) and requirements.txt. Add OPENAI_API_KEY as a Space Secret. Add a README with sample bank branding and a regulatory disclaimer.

6. Deploy to HF Spaces

Use huggingface_hub to create and push the Space. Visit the live URL. Share with teammates.

7. TODO, Challenge & Wrap-up

Student tasks + stretch goals.

Local Gradio Loan Advisor

!pip install gradio openai -q
import gradio as gr
from openai import OpenAI
import getpass, os

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI key: ")
client = OpenAI()

SYSTEM = """You are an Acme Bank mortgage pre-qualification advisor.
Given DTI, credit score, and loan amount:
- DTI < 36% and credit score >= 680: likely eligible, suggest conventional loan
- DTI 36-43% or score 620-679: suggest FHA loan, flag higher rate
- DTI > 43% or score < 620: not pre-qualified, suggest credit improvement steps
Always add: "This is an AI estimate only, not a formal loan approval."
"""

def advise_fn(annual_income, monthly_debt, credit_score, loan_amount):
    # Reject empty or impossible inputs before spending an API call
    if None in (annual_income, monthly_debt, credit_score, loan_amount):
        return "Please fill in all four fields."
    if annual_income <= 0:
        return "Please enter a positive annual income."
    dti = (monthly_debt / (annual_income / 12)) * 100   # debt-to-income ratio, %
    user_msg = (f"Income: ${annual_income:,.0f}/yr | Monthly debts: ${monthly_debt:,.0f} | "
                f"Credit score: {credit_score:.0f} | Loan: ${loan_amount:,.0f} | DTI: {dti:.1f}%")
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": user_msg}],
        temperature=0.2,   # low temperature for consistent, rule-following answers
    )
    return resp.choices[0].message.content

# Form-based UI: four numeric inputs -> one text verdict
demo = gr.Interface(
    fn=advise_fn,
    inputs=[
        gr.Number(label="Annual income ($)"),
        gr.Number(label="Monthly debt payments ($)"),
        gr.Number(label="Credit score"),
        gr.Number(label="Loan amount ($)"),
    ],
    outputs=gr.Textbox(label="Pre-qualification estimate"),
    title="Acme Bank Loan Advisor",
    description="This is an AI estimate, not a formal approval. Powered by GPT-4o-mini.",
)
demo.launch(share=True)

Deploy to HuggingFace Spaces

from huggingface_hub import HfApi, login
import getpass

login(token=getpass.getpass("HuggingFace write token: "))
api = HfApi()

# Create the Space repo
api.create_repo(
    repo_id="YOUR_USERNAME/careerstack-chatbot",
    repo_type="space",
    space_sdk="gradio",
    exist_ok=True,
)

# Upload app files
api.upload_file(path_or_fileobj="app.py",
                path_in_repo="app.py",
                repo_id="YOUR_USERNAME/careerstack-chatbot",
                repo_type="space")
api.upload_file(path_or_fileobj="requirements.txt",
                path_in_repo="requirements.txt",
                repo_id="YOUR_USERNAME/careerstack-chatbot",
                repo_type="space")
print("✓ Deployed: https://huggingface.co/spaces/YOUR_USERNAME/careerstack-chatbot")

requirements.txt

gradio>=4.0.0
openai>=1.0.0

TODO Tasks

Add a model selector dropdown (gpt-4o-mini, gpt-4o) to the Gradio UI. Make sure it's passed to the API call.
Add a "Clear" button that resets the conversation history. Test that the system prompt is preserved after clearing.
Add a token counter that displays running total tokens used in the session. Use tiktoken.
Set OPENAI_API_KEY as a Space Secret in the HF UI (not in code). Verify the deployed Space works without the key in any file.

Challenge Section

Replace the OpenAI backend with a free HuggingFace Inference API model (e.g. mistralai/Mistral-7B-Instruct-v0.2). Your Space becomes fully free.
Add a PDF upload component. When a PDF is uploaded, chunk it and inject the top-3 retrieved chunks into the system context automatically (mini-RAG in the UI).
Advanced: Add usage analytics: log every query + response to a Google Sheet using the gspread library. Monitor your Space usage from Sheets.

Wrap-up

You shipped a live AI app with a public URL. That URL is a portfolio piece. The GenAI Track is complete — you can build, run, and deploy LLM-powered apps. The Agentic AI Track takes you from apps to autonomous agents.


🗺️ This lab maps to Step 6 — Deployment & Productionisation on the GenAI Engineer Roadmap.

⚡ Agentic AI Track

Agentic AI Track Intermediate ⏱ 60 min

Build a Financial Research Agent

Give an LLM tools to pull live stock data, search financial news, and calculate financial ratios — building the research agent workflow used by analysts at major banks. Watch the agent chain tool calls to answer complex market questions.

LangChain Agents OpenAI Tool Calling Custom Tools AgentExecutor

Open Lab 8 in Google Colab

Opens a blank notebook — copy the code from the Code Snippets tab to practice

Learning Outcomes

Understand the ReAct loop: Thought → Action → Observation in a finance context
Build a stock data tool using yfinance (free, no API key)
Build a financial news search tool for market context
Build a financial ratio calculator (P/E, ROE, debt-to-equity)
Chain tool calls: fetch stock → calculate ratio → find news → synthesise report
Handle tool errors when tickers are invalid or data is unavailable

Prerequisites

Lab 4: OpenAI API basics

Python: Decorators & functions

API Key: OpenAI, same key as Lab 4

1. Financial Research Agents

How banks' internal AI research assistants work: LLM + financial data tools + ReAct loop. Why bank analysts use agents: 10-K filings, earnings calls, and market data are too much for a single prompt.

2. Setup

!pip install langchain langchain-openai yfinance. API key. Verbose mode to see agent reasoning.

3. Define Financial Tools

Stock data tool: yfinance.Ticker → price, market cap, P/E, 52-week range. Financial calculator: P/E ratio, ROE, debt-to-equity from raw numbers. News tool: mock financial headlines keyed by ticker. Docstrings must be precise — the LLM reads them to choose tools.
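The financial-calculator tool in step 3 boils down to three textbook ratios. The plain-Python sketch below assumes the simple definitions: price over EPS, net income over shareholders' equity, and total debt over shareholders' equity. It does not use any data vendor's adjusted variants. The function names and toy inputs are illustrative.

```python
def pe_ratio(price, eps):
    """Price-to-earnings: share price / earnings per share."""
    if eps <= 0:
        raise ValueError("P/E is not meaningful for zero or negative EPS")
    return price / eps


def roe(net_income, shareholders_equity):
    """Return on equity, in %: net income / shareholders' equity."""
    return net_income / shareholders_equity * 100


def debt_to_equity(total_debt, shareholders_equity):
    """Debt-to-equity: total debt / shareholders' equity."""
    return total_debt / shareholders_equity


# Toy numbers, not real company data
print(round(pe_ratio(150.0, 6.0), 2))          # 25.0
print(round(roe(12.0, 80.0), 2))               # 15.0
print(round(debt_to_equity(50.0, 100.0), 2))   # 0.5
```

Each function could be wrapped with @tool. Their docstrings would then double as the tool descriptions the LLM reads.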

4. OpenAI Tool Calling (native)

Define financial tools as JSON schema. Call with tools= param. Parse tool_calls. Execute: call yfinance, run calculation, or return mock news. Feed results back. Loop until agent has enough data to answer.
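The native loop in step 4 can be shown without network access. It works like this: call the model with tools=, execute any tool_calls, append the results as role="tool" messages, and repeat until the model answers in plain text. In the sketch below, fake_model is a hypothetical stand-in that returns dicts shaped like Chat Completions assistant messages. In the notebook, you would replace it with client.chat.completions.create(model=..., messages=..., tools=...) and read response.choices[0].message instead. The toy pe_ratio tool and the max_steps cap are assumptions.

```python
import json

# Toy tool registry: name -> Python function (illustrative only)
TOOLS = {"pe_ratio": lambda price, eps: round(price / eps, 2)}


def fake_model(messages):
    """Stand-in for the LLM: requests one tool call, then answers in text."""
    if messages[-1]["role"] == "user":
        return {"role": "assistant", "content": None, "tool_calls": [{
            "id": "call_1", "type": "function",
            "function": {"name": "pe_ratio",
                         "arguments": json.dumps({"price": 150.0, "eps": 6.0})}}]}
    return {"role": "assistant", "content": f"The P/E ratio is {messages[-1]['content']}."}


def run_agent(question, model=fake_model, max_steps=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)
        if not reply.get("tool_calls"):          # plain text means final answer
            return reply["content"], messages
        for call in reply["tool_calls"]:
            name = call["function"]["name"]
            fn = TOOLS.get(name)
            try:
                args = json.loads(call["function"]["arguments"])
                result = fn(**args) if fn else f"Error: unknown tool {name}"
            except Exception as e:               # report tool errors back to the model
                result = f"Error: {e}"
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": str(result)})
    return "Stopped: max_steps reached", messages


answer, trace = run_agent("What is the P/E if price is 150 and EPS is 6?")
print(answer)                       # The P/E ratio is 25.0.
print([m["role"] for m in trace])   # ['user', 'assistant', 'tool', 'assistant']
```

Returning tool errors as tool messages, instead of raising them, gives the model a chance to retry with corrected arguments.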

5. LangChain AgentExecutor

Same agent using LangChain's create_openai_tools_agent. Wrap in AgentExecutor. Run with verbose=True to see full trace.

6. Test with Complex Queries

Queries that require both tools in sequence. Analyse the agent's decision trace. Identify when it gets confused.

7. TODO, Challenge & Wrap-up

Student tasks + stretch goals.

Define Tools with @tool

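!pip install langchain langchain-openai yfinance -q
from langchain_core.tools import tool
import math
import yfinance as yf

@tool
def get_stock_data(ticker: str) -> str:
    """Fetch current stock data for a ticker symbol (e.g. 'AAPL', 'JPM') from Yahoo Finance.
    Input: a ticker symbol string.
    Returns: price, market cap, trailing P/E, and 52-week range, or an error message."""
    symbol = ticker.strip().upper()
    try:
        info  = yf.Ticker(symbol).info
        price = info.get("currentPrice") or info.get("regularMarketPrice")
        if price is None:   # invalid tickers usually come back without a price
            return f"Error: no market data found for ticker '{symbol}'."
        return (f"{symbol}: price={price}, market_cap={info.get('marketCap')}, "
                f"trailing_PE={info.get('trailingPE')}, "
                f"52w_range={info.get('fiftyTwoWeekLow')}-{info.get('fiftyTwoWeekHigh')}")
    except Exception as e:
        return f"Error fetching data for '{symbol}': {e}"

@tool
def calculator(expression: str) -> str:
    """Evaluate a mathematical expression.
    Input: a math expression string like '2 * (3 + 4)' or 'sqrt(16)'.
    Use it for financial ratios, e.g. P/E = price / earnings per share.
    Returns: the numeric result as a string."""
    try:
        # Restricted eval: no builtins, only names from the math module.
        # This limits casual misuse but is NOT a security sandbox.
        allowed = {k: v for k, v in vars(math).items() if not k.startswith('_')}
        result  = eval(expression, {"__builtins__": {}}, allowed)
        return str(result)
    except Exception as e:
        return f"Error: {e}"

@tool
def mock_search(query: str) -> str:
    """Search the web for factual information.
    Input: a search query string.
    Returns: a short factual answer."""
    knowledge = {
        "openai":   "OpenAI was founded in 2015. GPT-4o is their latest flagship model.",
        "python":   "Python 3.12 is the latest stable release as of 2024.",
        "langchain":"LangChain is an open-source framework for building LLM applications.",
    }
    for key, val in knowledge.items():
        if key in query.lower():
            return val
    return "No results found for that query."

tools = [get_stock_data, calculator, mock_search]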
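The calculator above uses eval with builtins removed. That blocks the obvious attacks, but it is not a real sandbox: attribute-chain tricks can still escape it. Because the tool receives LLM-generated input, a stricter option is to walk the expression's syntax tree and allow only numbers, arithmetic operators, and math functions or constants. The sketch below is one such whitelist evaluator. safe_eval is a hypothetical name, and the 1000 exponent cap is an arbitrary guard against expressions like 9 ** 9 ** 9.

```python
import ast
import math
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
            ast.Div: operator.truediv, ast.FloorDiv: operator.floordiv,
            ast.Mod: operator.mod, ast.Pow: operator.pow}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_MATH = {name: getattr(math, name) for name in dir(math) if not name.startswith("_")}


def safe_eval(expression):
    """Evaluate arithmetic and math.* calls by walking the AST; reject anything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            left, right = walk(node.left), walk(node.right)
            if isinstance(node.op, ast.Pow) and abs(right) > 1000:
                raise ValueError("exponent too large")
            return _BIN_OPS[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        if isinstance(node, ast.Name) and isinstance(_MATH.get(node.id), float):
            return _MATH[node.id]                        # constants: pi, e, tau, inf, nan
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and callable(_MATH.get(node.func.id)) and not node.keywords):
            return _MATH[node.func.id](*(walk(arg) for arg in node.args))
        raise ValueError(f"disallowed expression: {type(node).__name__}")
    return walk(ast.parse(expression, mode="eval"))


print(safe_eval("2 * (3 + 4)"))             # 14
print(safe_eval("sqrt(16)"))                # 4.0
print(round(safe_eval("sqrt(2) * pi"), 4))  # 4.4429
```

Inside the calculator tool, you could return str(safe_eval(expression)) from the try block instead of calling eval.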

LangChain AgentExecutor

from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a financial research analyst at a top bank. Use tools to fetch stock data, calculate financial ratios, and find news. Always cite the data source. Never guess financial figures."),
    MessagesPlaceholder("chat_history", optional=True),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),
])

agent    = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({
    "input": "Analyse AAPL: get the current stock data, interpret its P/E ratio, and find recent news. Give me a buy/hold/sell recommendation."
})
print("\nFinal Answer:", result["output"])

TODO Tasks

Run the agent on three tickers of your choice (e.g., AAPL, MSFT, GOOGL). Build a comparison table: ticker, price, P/E ratio, dividend yield, and agent buy/hold/sell verdict.
Add a get_sector_average tool that returns average P/E for the banking sector (hardcoded at 11x). Have the agent compare each bank vs sector average.
Set max_iterations=3. Test a 4-step analysis. What happens? Increase to 6 and observe the difference in answer completeness.
Force a tool error by passing an invalid ticker (e.g. "XYZABC"). Add handle_parsing_errors=True and confirm the agent recovers gracefully.

Challenge Section

Add ConversationBufferMemory so the agent remembers previous tickers discussed. Ask "Compare it to MSFT" after asking about AAPL — the agent should know "it" = AAPL.
Replace mock news with real yfinance news: yf.Ticker("AAPL").news returns recent articles. Parse and format the top 3 headlines.
Advanced: Implement the ReAct loop manually without LangChain: parse tool_calls from the OpenAI response, dispatch them to your financial tools, and loop until the model issues a final answer. Some production teams do this to avoid framework overhead and keep full control of the loop.

Wrap-up

You've built an agent that thinks, acts, and observes — the core loop of every AI agent. The critical insight: tool docstrings are prompts. Write them clearly and the agent uses tools correctly. Next: give the agent more tools and more complex decision logic.


🗺️ This lab maps to Step 2 — AI Agents & Tool Use on the Agentic AI Roadmap.

Agentic AI Track Advanced ⏱ 75 min

Build a Portfolio Risk Agent

Scale up to 4 tools — weather, math, search, and a Python REPL — and teach the agent to route queries intelligently. Build conditional logic, tool chaining, and structured output parsing.

LangChain Python REPL Tool Structured Output PydanticOutputParser

Open Lab 9 in Google Colab

Opens a blank notebook — copy the code from the Code Snippets tab to practice

Learning Outcomes

Build a 4-tool agent with weather, math, search, and code execution
Implement structured output with Pydantic models
Parse and validate tool results before feeding back to LLM
Handle multi-step reasoning chains across tools
Add a router layer to pre-classify query type
Evaluate agent accuracy with a test suite of 10 queries

Prerequisites

Lab 8: Single-tool agent

Pydantic: Basic models

API Key: OpenAI, same key as Lab 4

1. Multi-Tool Design

How does an LLM choose between 4 tools? Tool description quality. Overlap and ambiguity between tools. Design principles for tool APIs.

2. Setup

Pip installs. API key. Import block.

3. Define 4 Tools

Weather tool (mock API, returns temp + conditions for any city). Math tool (safe eval). Search tool (mock + optional real DuckDuckGo). Python REPL tool (execute arbitrary Python, return stdout).

4. Structured Output

Define a Pydantic AgentResponse model with fields: answer, tools_used, confidence, reasoning. Use with_structured_output.

5. Build & Run Agent

AgentExecutor with all 4 tools. Run 10 diverse test queries. Log tool usage per query. Build a summary table.

6. Decision Logic Analysis

Count how often each tool is called. Identify misrouted queries. Improve tool descriptions based on failures.
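For step 6, tool counts are easiest to collect from AgentExecutor's intermediate steps. With return_intermediate_steps=True, each step is an (action, observation) pair, and action.tool holds the tool name. The sketch below assumes you have already reduced each run to a record of the query, the tools called, and the tool you expected. The records shown are made-up placeholders.

```python
from collections import Counter

# Hypothetical per-query log. In the notebook, "called" could be built with:
#   [action.tool for action, _ in r["intermediate_steps"]]
runs = [
    {"query": "Weather in Tokyo?",          "called": ["get_weather"],                "expected": "get_weather"},
    {"query": "sqrt(2) * pi to 4 dp?",      "called": ["run_python"],                 "expected": "calculator"},
    {"query": "Who created LangChain?",     "called": ["mock_search"],                "expected": "mock_search"},
    {"query": "First 10 Fibonacci numbers", "called": ["run_python"],                 "expected": "run_python"},
    {"query": "London vs Mumbai weather",   "called": ["get_weather", "get_weather"], "expected": "get_weather"},
]

usage = Counter(tool for run in runs for tool in run["called"])
misrouted = [run["query"] for run in runs if run["expected"] not in run["called"]]

print(usage.most_common())   # [('get_weather', 3), ('run_python', 2), ('mock_search', 1)]
print(misrouted)             # ['sqrt(2) * pi to 4 dp?']
```

The misrouted list is a shortlist of tool docstrings to rewrite first.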

7. TODO, Challenge & Wrap-up

Student tasks + stretch goals.

4 Tools Definition

from langchain_core.tools import tool
import math, subprocess, sys

@tool
def get_weather(city: str) -> str:
    """Get current weather for a city.
    Input: city name (e.g. 'London', 'Tokyo').
    Returns: temperature in Celsius and weather conditions."""
    mock_weather = {
        "london":  "12°C, Cloudy with light rain",
        "tokyo":   "24°C, Sunny and clear",
        "new york":"18°C, Partly cloudy",
        "mumbai":  "31°C, Hot and humid",
    }
    return mock_weather.get(city.lower(), f"Weather data unavailable for {city}")

@tool
def run_python(code: str) -> str:
    """Execute Python code and return the output.
    Input: valid Python code as a string.
    Use for data analysis, calculations, or string manipulation.
    Returns: stdout output of the code."""
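    # Runs in a separate process with a 10 s timeout. This is NOT a sandbox:
    # the code can still read and write files and use the network.
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=10
        )
        return result.stdout or result.stderr or "(no output)"
    except Exception as e:  # includes subprocess.TimeoutExpired
        return f"Error: {e}"

# calculator and mock_search are the @tool functions defined in Lab 8
tools = [get_weather, calculator, mock_search, run_python]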

Structured Output with Pydantic

from pydantic import BaseModel, Field
from typing import List
from langchain_openai import ChatOpenAI

class AgentResponse(BaseModel):
    answer:      str       = Field(description="The final answer to the user's question")
    tools_used:  List[str] = Field(description="List of tool names used")
    confidence:  float     = Field(description="Confidence score 0.0-1.0")
    reasoning:   str       = Field(description="Brief explanation of reasoning")

# Build structured-output chain (separate from the agent for final answer parsing)
structured_llm = ChatOpenAI(model="gpt-4o-mini").with_structured_output(AgentResponse)

Test Suite & Analysis

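import pandas as pd
from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Rebuild the agent so it sees all 4 tools (Lab 8's executor only had its own tools)
llm    = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the available tools whenever they help."),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),
])
executor = AgentExecutor(agent=create_openai_tools_agent(llm, tools, prompt),
                         tools=tools, verbose=False)

test_queries = [
    "What's the weather in Tokyo and convert 24C to Fahrenheit?",
    "Write Python to generate the first 10 Fibonacci numbers.",
    "What is sqrt(2) * pi to 4 decimal places?",
    "Who created LangChain and what is 100/7?",
    "Compare the weather in London and Mumbai.",
]

results = []
for q in test_queries:
    r = executor.invoke({"input": q})
    results.append({"query": q, "answer": r["output"]})
    print(f"Q: {q[:60]}...\nA: {r['output'][:100]}\n")

pd.DataFrame(results).to_csv("agent_results.csv", index=False)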

TODO Tasks

Add a 5th tool: get_time(timezone: str) that returns the current time in a given timezone using datetime + pytz.
Run the 10-query test suite and build a pandas DataFrame showing query, tools_used, and answer. Which tool is called most?
Find 2 queries where the agent picks the wrong tool. Rewrite the tool docstring to fix the misrouting.
Add return_intermediate_steps=True to AgentExecutor. Print the full chain of tool calls for a complex query.

Challenge Section

Add a pre-router: call the LLM first to classify the query as "math", "weather", "search", or "code". Then only pass the relevant tool(s) to the agent. Measure whether this improves accuracy.
Implement a retry loop: if the agent's structured output confidence < 0.7, re-run with a modified prompt asking it to be more careful.
Advanced: Build a Gradio UI that shows the agent's reasoning trace in a collapsible panel alongside the final answer. Users should see every tool call.
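The pre-router challenge asks for an LLM classifier. Before you spend tokens on that, a keyword baseline gives you something to measure against. The sketch below uses that swapped-in technique: keyword matching, not an LLM. The keyword lists, the route-to-tool mapping, and the "unknown" fallback that exposes every tool are assumptions to adapt.

```python
import re

ROUTES = {  # route -> (trigger keywords, tool names to expose for that route)
    "weather": ({"weather", "temperature", "rain", "forecast"}, ["get_weather"]),
    "math":    ({"sqrt", "pi", "calculate", "convert", "percent"}, ["calculator"]),
    "code":    ({"python", "code", "script"}, ["run_python"]),
    "search":  ({"who", "founded", "created", "latest"}, ["mock_search"]),
}
ALL_TOOLS = ["calculator", "get_weather", "mock_search", "run_python"]


def route(query):
    """Return (matched routes, tool names to expose); fall back to every tool."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    hits = [name for name, (keywords, _) in ROUTES.items() if words & keywords]
    if not hits:
        return ["unknown"], ALL_TOOLS
    return hits, sorted({t for name in hits for t in ROUTES[name][1]})


print(route("What's the weather in Tokyo and convert 24C to Fahrenheit?"))
# (['weather', 'math'], ['calculator', 'get_weather'])
print(route("Tell me a joke"))
# (['unknown'], ['calculator', 'get_weather', 'mock_search', 'run_python'])
```

To use it, you would filter the tool list with [t for t in tools if t.name in names] before building the agent. Then compare accuracy against the unrouted agent on the same test suite. A baseline like this misses queries such as "what is 100/7?", which contain no keywords, so such cases are where an LLM router should earn its cost.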

Wrap-up

A 4-tool agent with structured output and a test suite is production-ready thinking. You measured performance, caught misrouting, and improved it — that's the engineering discipline that separates hobby projects from shipped products. One lab left: system design.


🗺️ This lab maps to Step 3 — Multi-Tool Agents on the Agentic AI Roadmap.

Agentic AI Track Advanced ⏱ 90 min

Build a Regulatory Change Monitor

Design and build a production-grade search system: query rewriting, vector retrieval, cross-encoder re-ranking, and structured response synthesis. The capstone lab — connects every concept from all 9 previous labs.

LangChain FAISS CrossEncoder OpenAI Gradio

Open Lab 10 in Google Colab

Opens a blank notebook — copy the code from the Code Snippets tab to practice

Learning Outcomes

Design a multi-stage RAG + ranking pipeline
Implement query rewriting to expand search recall
Re-rank retrieved chunks with a cross-encoder model
Synthesise a cited, structured response from ranked chunks
Measure and compare pipeline quality at each stage
Build a complete system diagram and explain trade-offs

Prerequisites

Labs 5, 6, 8, 9: all completed

API Key: OpenAI (same key as Lab 4)

Concepts: RAG, embeddings, agents

1. System Design Overview

Full pipeline diagram: Query → Rewriter → Retriever → Re-ranker → Synthesiser → Answer. Why each stage exists. Where production systems fail without each stage.

2. Setup & Corpus

Install all dependencies. Load a 200-document corpus (AI/ML Wikipedia articles). Build a FAISS index using the approach from Labs 5 and 6.
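Stable chunk ids are useful for the deduplication step later in the pipeline. Here is a plain-Python sketch of fixed-size character chunking with overlap. The 500/100 sizes and the "<doc_id>-<n>" id format are assumptions for illustration. If Labs 5 and 6 used a LangChain text splitter, that splitter does the same job with smarter split points, and you would store each chunk_id in the Document metadata before building the FAISS index.

```python
def chunk_documents(docs: dict[str, str], size: int = 500, overlap: int = 100) -> list[dict[str, str]]:
    """Split each document into overlapping character windows with stable ids.

    Ids look like "<doc_id>-<n>" (an illustrative convention), so the same
    chunk gets the same id whenever the index is rebuilt from the same text.
    """
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for doc_id, text in docs.items():
        if not text:
            continue  # skip empty documents instead of emitting empty chunks
        # Window starts stop below len(text) - overlap, which still guarantees
        # that the final window reaches the end of the text.
        starts = range(0, max(len(text) - overlap, 1), step)
        for n, start in enumerate(starts):
            chunks.append({"chunk_id": f"{doc_id}-{n}", "text": text[start:start + size]})
    return chunks


corpus = {"transformer": "x" * 1000, "bert": "y" * 300}  # placeholder text
for c in chunk_documents(corpus):
    print(c["chunk_id"], len(c["text"]))
```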

3. Stage 1 — Query Rewriting

Use an LLM to expand a user query into 3 alternative phrasings. Retrieve candidates for the original query and all 3 variants, then union the result sets. Measure the recall improvement over retrieving with the original query alone.
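To measure the recall improvement, you need a set of chunk ids you have judged relevant for each test query. Below is a minimal sketch that assumes retrieval returns chunk ids. The toy index and relevance labels are invented to show the calculation; they are not results from the corpus.

```python
from typing import Callable, Iterable


def recall(retrieved: Iterable[str], relevant: set[str]) -> float:
    """Fraction of the relevant chunk ids that appear in the retrieved set."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def union_retrieve(queries: list[str], retrieve: Callable[[str], list[str]]) -> list[str]:
    """Retrieve for every query and union the chunk ids, keeping first-seen order."""
    seen: dict[str, None] = {}
    for q in queries:
        for chunk_id in retrieve(q):
            seen.setdefault(chunk_id, None)
    return list(seen)


# Toy data: query -> retrieved chunk ids (invented for illustration).
toy_index = {
    "how do transformers work in NLP": ["c1", "c7"],
    "explain the transformer architecture": ["c1", "c3"],
    "what is self-attention in language models": ["c3", "c4"],
    "how does attention power modern NLP models": ["c4", "c9"],
}
relevant = {"c1", "c3", "c4"}  # hand-labelled relevant chunks (invented)
queries = list(toy_index)
single = toy_index[queries[0]]
pooled = union_retrieve(queries, toy_index.__getitem__)
print(f"single-query recall:   {recall(single, relevant):.2f}")
print(f"rewritten-union recall: {recall(pooled, relevant):.2f}")
```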

4. Stage 2 — Dense Retrieval

Retrieve the top-20 FAISS candidates for each query (the full_pipeline snippet uses k=8, which keeps the re-ranking pool smaller). Deduplicate by chunk id and print the candidate pool size.
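The full_pipeline snippet deduplicates by chunk text. To deduplicate by chunk id instead, as this stage describes, you can use the sketch below. It assumes each candidate is a LangChain-style Document and that a "chunk_id" was stored in its metadata at indexing time. That metadata key is an assumption; candidates without one fall back to their text as the key.

```python
from types import SimpleNamespace
from typing import Any


def dedupe_by_chunk_id(candidates: list[Any]) -> list[Any]:
    """Drop repeated chunks from a multi-query candidate pool.

    Uses metadata["chunk_id"] as the key when present, otherwise page_content.
    The first occurrence wins, so earlier queries' hits keep their position.
    """
    seen: set[Any] = set()
    unique = []
    for doc in candidates:
        key = doc.metadata.get("chunk_id", doc.page_content)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# SimpleNamespace stands in for langchain_core.documents.Document here.
pool = [
    SimpleNamespace(page_content="Transformers use self-attention ...", metadata={"chunk_id": "wiki-12-0"}),
    SimpleNamespace(page_content="Transformers use self-attention ...", metadata={"chunk_id": "wiki-12-0"}),
    SimpleNamespace(page_content="BERT is an encoder-only transformer ...", metadata={"chunk_id": "wiki-40-2"}),
]
unique = dedupe_by_chunk_id(pool)
print(f"Candidate pool: {len(pool)} raw -> {len(unique)} unique")
```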

5. Stage 3 — Cross-Encoder Re-ranking

cross-encoder/ms-marco-MiniLM-L-6-v2 scores each candidate against the original query. Sort by score. Keep top-5. Compare order before vs after re-ranking.
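To compare the order before and after re-ranking, one option is to record three things: whether the top-1 chunk changed, how far each kept chunk moved, and which chunks entered the top-k. The sketch below uses hypothetical chunk ids and made-up scores. Only the relative order of the scores matters.

```python
from typing import Any


def rerank_delta(before: list[str], scores: dict[str, float], top_k: int = 5) -> dict[str, Any]:
    """Summarise how re-ranking changed the order of retrieved chunks.

    before: chunk ids in dense-retrieval order.
    scores: cross-encoder score per chunk id (higher = more relevant).
    """
    # sorted() is stable even with reverse=True, so ties keep dense order
    after = sorted(before, key=lambda cid: scores[cid], reverse=True)
    kept = after[:top_k]
    return {
        "after": kept,
        "top1_changed": before[0] != after[0],
        # positive = moved up after re-ranking, negative = moved down
        "moves": {cid: before.index(cid) - after.index(cid) for cid in kept},
        "new_in_top_k": [cid for cid in kept if cid not in before[:top_k]],
    }


before = ["c7", "c1", "c9", "c3", "c4", "c2"]  # hypothetical dense order
scores = {"c7": -2.1, "c1": 6.3, "c9": -4.0, "c3": 4.8, "c4": 3.5, "c2": 5.0}  # made-up scores
delta = rerank_delta(before, scores)
print(delta["after"], delta["top1_changed"], delta["new_in_top_k"])
```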

6. Stage 4 — Response Synthesis

Feed top-5 chunks to GPT-4o-mini with a synthesis prompt. Require inline citations ([1], [2]…). Return structured JSON: answer + sources + confidence.
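Because the synthesis step asks the model for JSON, it is worth checking the result before showing it. Below is a sketch of one possible validator; the function name and rules are assumptions. It checks three things: every [n] citation in the answer points to one of the chunks passed in, every cited number also appears in "sources", and confidence is a number between 0 and 1.

```python
import re
from typing import Any

CITATION = re.compile(r"\[(\d+)\]")


def check_synthesis(result: dict[str, Any], n_chunks: int) -> list[str]:
    """Return a list of problems with a synthesis result (empty list = OK)."""
    problems = []
    answer = result.get("answer")
    if not isinstance(answer, str) or not answer.strip():
        problems.append("missing answer")
        answer = ""
    cited = {int(n) for n in CITATION.findall(answer)}
    try:
        listed = {int(s) for s in result.get("sources", [])}
    except (TypeError, ValueError):
        problems.append("sources is not a list of chunk numbers")
        listed = set()
    bad = sorted(n for n in cited | listed if not 1 <= n <= n_chunks)
    if bad:
        problems.append(f"citations outside 1..{n_chunks}: {bad}")
    if cited - listed:
        problems.append(f"cited in text but not listed in sources: {sorted(cited - listed)}")
    conf = result.get("confidence")
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        problems.append("confidence must be a number in [0, 1]")
    return problems


good = {"answer": "Transformers rely on self-attention [1] rather than recurrence [2].",
        "sources": [1, 2], "confidence": 0.9}
bad = {"answer": "See [7].", "sources": [1], "confidence": "high"}
print(check_synthesis(good, n_chunks=5))
print(check_synthesis(bad, n_chunks=5))
```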

7. End-to-End Evaluation

Run 10 queries. For each, record retrieval recall, re-ranking delta, and answer quality (manual 1-5 rating). Build a metrics table.
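Here is a stdlib-only sketch of the metrics table, assuming you have already collected one row per query. In this sketch, "re-ranking delta" is reduced to whether the top-1 chunk changed. That is only one possible definition. The demo numbers are placeholders, not measured results. With pandas, pd.DataFrame(rows).mean(numeric_only=True) gives similar averages.

```python
from statistics import mean


def metrics_table(rows: list[dict]) -> str:
    """Format per-query evaluation rows as a fixed-width table with a MEAN row."""
    header = f"{'query':<40} {'recall':>6} {'top1_moved':>10} {'rating':>6}"
    lines = [header, "-" * len(header)]
    for r in rows:
        lines.append(
            f"{r['query'][:40]:<40} {r['recall']:>6.2f} {str(r['top1_changed']):>10} {r['rating']:>6}"
        )
    share_changed = sum(r["top1_changed"] for r in rows) / len(rows)
    lines.append("-" * len(header))
    lines.append(
        f"{'MEAN':<40} {mean(r['recall'] for r in rows):>6.2f} "
        f"{share_changed:>10.0%} {mean(r['rating'] for r in rows):>6.1f}"
    )
    return "\n".join(lines)


# Placeholder numbers for illustration only.
rows = [
    {"query": "how do transformers work in NLP", "recall": 1.0, "top1_changed": True, "rating": 4},
    {"query": "what is RAG", "recall": 0.5, "top1_changed": False, "rating": 3},
]
print(metrics_table(rows))
```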

8. Gradio UI + TODO + Wrap-up

Full Gradio UI. Student tasks. Capstone challenge. Final wrap-up for the entire Labs curriculum.

Stage 1 — Query Rewriting

!pip install langchain-openai -q
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import JsonOutputParser

# Reads OPENAI_API_KEY from the environment (same key as Lab 4)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)

REWRITE_PROMPT = """You are a search query optimizer.
Given a user query, generate 3 alternative phrasings that capture
the same intent but use different vocabulary.
Return JSON: {{"queries": ["q1", "q2", "q3"]}}

User query: {query}"""

def rewrite_query(query: str) -> list[str]:
    """Return the original query plus up to 3 LLM-generated rephrasings."""
    response = llm.invoke(REWRITE_PROMPT.format(query=query))
    parsed   = JsonOutputParser().parse(response.content)  # JSON text -> dict
    return [query] + parsed["queries"][:3]  # original + 3 variants

queries = rewrite_query("how do transformers work in NLP")
print("\n".join(queries))

Stage 3 — Cross-Encoder Re-ranking

!pip install sentence-transformers -q
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

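def rerank(query: str, candidates: list, top_k: int = 5) -> list:
    """Score each (query, chunk) pair with the cross-encoder; keep the top_k chunks."""
    if not candidates:
        return []
    pairs  = [(query, c.page_content) for c in candidates]
    scores = reranker.predict(pairs)  # one relevance score per pair, higher = more relevant
    ranked = sorted(zip(scores, candidates), key=lambda x: x[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]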

Stage 4 — Synthesis with Citations

SYNTHESIS_PROMPT = """Answer the question using ONLY the provided context.
Cite sources inline as [1], [2], etc.
If the answer cannot be found, say "I don't know."

Question: {question}

Context:
{context}

Return JSON: {{"answer": "...", "sources": [1,2], "confidence": 0.95}}"""

def synthesise(question: str, chunks: list) -> dict:
    context = "\n\n".join(
        [f"[{i+1}] {c.page_content}" for i, c in enumerate(chunks)]
    )
    resp   = llm.invoke(SYNTHESIS_PROMPT.format(question=question, context=context))
    return JsonOutputParser().parse(resp.content)

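def full_pipeline(query: str) -> dict:
    # `vectorstore` is the FAISS index built in the Setup & Corpus step
    queries    = rewrite_query(query)          # Stage 1: original + 3 rewrites
    candidates = []
    for q in queries:                          # Stage 2: dense retrieval per query
        candidates += vectorstore.similarity_search(q, k=8)
    # Deduplicate by chunk text (several rewrites often hit the same chunk)
    candidates = list({c.page_content: c for c in candidates}.values())
    print(f"Candidate pool: {len(candidates)} unique chunks")
    top_chunks = rerank(query, candidates)     # Stage 3: score against the ORIGINAL query
    return synthesise(query, top_chunks)       # Stage 4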

result = full_pipeline("how do transformers work in NLP")
print(result["answer"])

Gradio UI

!pip install gradio -q
import gradio as gr

def search_ui(query):
    result = full_pipeline(query)
    sources_text = ", ".join(f"[{s}]" for s in result.get("sources", []))
    confidence   = float(result.get("confidence", 0))  # LLM may return it as a string
    return (
        result.get("answer", "No answer returned."),
        f"Sources used: {sources_text} | Confidence: {confidence:.0%}"
    )

gr.Interface(
    fn=search_ui,
    inputs=gr.Textbox(label="Search Query", placeholder="Ask anything about AI..."),
    outputs=[
        gr.Textbox(label="Answer", lines=6),
        gr.Textbox(label="Metadata"),
    ],
    title="CareerStack AI Search",
    description="Query → Rewrite → Retrieve → Re-rank → Synthesise",
).launch(share=True)

TODO Tasks

Build a metrics table: for 5 queries, record (a) top-1 doc before re-ranking and (b) top-1 doc after re-ranking. How often did the order change?
Add a "query expansion" step that extracts synonyms for the key noun in the query and appends them as additional sub-queries.
Add a fallback: if confidence < 0.6, return "I don't have enough information" instead of a low-confidence answer.
Deploy the Gradio UI to Hugging Face Spaces (using the skills from Lab 7). Share your live search system URL.
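For the confidence fallback task, one approach is a small gate applied to the output of synthesise. The threshold mirrors the 0.6 in the task. The "gated" flag and treating unparseable confidence as low are assumptions. The confidence value is self-reported by the LLM, so it is not guaranteed to be calibrated.

```python
from typing import Any

FALLBACK = "I don't have enough information"


def apply_confidence_gate(result: dict[str, Any], threshold: float = 0.6) -> dict[str, Any]:
    """Replace low-confidence answers with a fixed fallback message."""
    try:
        confidence = float(result.get("confidence", 0))
    except (TypeError, ValueError):
        confidence = 0.0  # unparseable confidence is treated as low
    if confidence < threshold:
        return {**result, "answer": FALLBACK, "sources": [], "gated": True}
    return {**result, "gated": False}


print(apply_confidence_gate({"answer": "RAG retrieves context first [1].", "sources": [1], "confidence": 0.3}))
```

In the pipeline, this gate would wrap the last stage, for example return apply_confidence_gate(synthesise(query, top_chunks)).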

Capstone Challenge

Replace the Wikipedia corpus with CareerStack's own blog articles. Build a search engine over all the HTML files in this project directory.
Add an LLM-as-judge evaluation: after each synthesis, ask a second LLM call to rate the answer quality 1-5. Log and surface low-quality answers for human review.
Final Boss: Add a conversational memory layer so follow-up questions are resolved in context. For example, if the user asks "What is RAG?" and then "How does it compare to fine-tuning?", the second query should be interpreted as RAG vs fine-tuning.
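For the LLM-as-judge challenge, here is a sketch of the grading and flagging logic. It assumes the judge is passed in as a function from prompt text to reply text; in the notebook, this could be lambda p: llm.invoke(p).content. The prompt wording, the "Rating: n" reply format, and the review threshold of 3 are assumptions. Replies with no parseable rating are also flagged for review.

```python
import re
from typing import Any, Callable

JUDGE_PROMPT = (
    "Rate the answer to the question on a 1-5 scale for correctness and use of "
    "the cited context. Reply with 'Rating: <n>' and one sentence of reasoning.\n\n"
    "Question: {question}\nAnswer: {answer}"
)
RATING = re.compile(r"Rating:\s*([1-5])\b")


def judge_answer(question: str, answer: str, judge: Callable[[str], str],
                 review_below: int = 3) -> dict[str, Any]:
    """Ask a second model to grade an answer and flag low scores for human review."""
    reply = judge(JUDGE_PROMPT.format(question=question, answer=answer))
    match = RATING.search(reply)
    rating = int(match.group(1)) if match else None
    return {
        "question": question,
        "rating": rating,
        # No parseable rating is treated as a failure worth a human look
        "needs_review": rating is None or rating < review_below,
        "judge_reply": reply,
    }


# Stub judge so the logic runs offline; swap in a real LLM call in the notebook.
print(judge_answer("What is RAG?", "RAG retrieves documents before generating [1].",
                   lambda p: "Rating: 2. The answer is too thin."))
```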

Curriculum Wrap-up

You've completed all 10 CareerStack labs — 3 ML, 4 GenAI, 3 Agentic AI. You can classify data, predict prices, build pipelines, call LLM APIs, build RAG systems, operate vector databases, deploy chatbots, build agents with tools, and design production search systems. That is the full AI engineering skill stack. You're ready for your first AI engineering role.


🗺️ This lab maps to Step 5 — Production AI Systems on the Agentic AI Roadmap. Capstone lab — unlocks the CareerStack completion certificate.
