01 CI Fundamentals: What Good CI Looks Like
The CI Pipeline Stages
- Trigger: push, pull_request, schedule, manual
- Checkout: fetch code at specific commit SHA
- Build: compile, generate artifacts
- Test: unit → integration → e2e (fast to slow)
- Lint / Static Analysis: code quality, security scan
- Package: Docker build, binary, SBOM
- Publish: push to registry with immutable tag
- Notify: Slack, GitHub PR comment, PagerDuty
CI Best Practices
- Fast feedback: CI should finish in under 10 minutes, because engineers are blocked while they wait
- Fail fast: run cheapest checks first (lint before integration tests)
- Deterministic: same commit = same result every time
- Immutable artifacts: tag images with git SHA, never "latest"
- Cache aggressively: deps, build outputs, Docker layers
- Parallelize: run independent jobs concurrently
- Test in isolation: no shared state between test runs
- Don't skip tests: every merge to main must be green
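The immutable-artifact rule above can be sketched as a small helper that builds an image reference from the commit SHA and refuses anything else. This is a minimal illustration, not part of any CI tool: the function name image_ref and the registry/name values are hypothetical, and it assumes SHA-1 commit IDs (7 to 40 hex characters).

```python
import re

# Abbreviated or full SHA-1 commit IDs (assumption: the repo uses SHA-1).
_SHA_RE = re.compile(r"^[0-9a-f]{7,40}$")


def image_ref(registry: str, name: str, commit_sha: str) -> str:
    """Build an image reference whose tag is the git commit SHA.

    Raises ValueError for mutable tags such as "latest", so the same
    reference always points at the same build.
    """
    sha = commit_sha.strip().lower()
    if not _SHA_RE.match(sha):
        raise ValueError(f"not a git SHA: {commit_sha!r}")
    return f"{registry}/{name}:{sha}"
```

Because the tag is derived from the commit, a rollback is just a redeploy of an older reference, and two environments running the same tag are guaranteed to run the same build.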
Trunk-Based vs Feature Branch
- Trunk-Based: everyone merges to main at least daily; unfinished work is hidden behind feature flags
- Forces: small PRs, fast review, and integration that is actually continuous (not just continuous builds)
- Used by: Google, Meta, Netflix at scale
- Feature Branch: long-lived branches per feature
- Risk: merge conflicts, integration surprises, big bang merges
- Better default for teams: short-lived feature branches (≤2 days) + PR
- GitFlow: adds release/hotfix branches; good for versioned software
What Makes a Good Test Suite
- Test pyramid: many unit tests, fewer integration, few e2e
- Unit tests: fast (<1ms each), isolated, no I/O, no network
- Integration tests: exercise a real DB/cache, e.g. with Testcontainers
- e2e tests: cover full user flows; slow, so run them in staging
- Contract tests: verify API contracts between services (Pact)
- Code coverage: 80% is a reasonable target; chasing 100% gives diminishing returns
- Mutation testing: injects small faults into the code and checks that the tests fail, i.e. that they actually catch real bugs
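To make the mutation-testing idea concrete, here is a hand-rolled sketch. Real mutation tools generate mutants automatically; here the mutant is written by hand, and all names (is_adult, weak_suite, strong_suite, mutant_killed) are hypothetical.

```python
def is_adult(age: int) -> bool:
    """Code under test."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """A mutant: the operator >= was changed to >."""
    return age > 18


def weak_suite(fn) -> bool:
    """Only checks values far from the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn) -> bool:
    """Also checks both sides of the boundary."""
    return weak_suite(fn) and fn(18) is True and fn(17) is False


def mutant_killed(suite) -> bool:
    """A mutant is killed when the suite passes on the original
    code but fails on the mutated code."""
    return suite(is_adult) and not suite(is_adult_mutant)
```

Both suites give 100% line coverage of is_adult, but only strong_suite kills the mutant. That gap is what mutation testing measures and coverage does not.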
Quiz: CI Fundamentals
Q1. Why should Docker images be tagged with git SHA instead of "latest"?
Q2. The test pyramid recommends:
02 GitHub Actions: Deep Dive
Complete Production Pipeline
Key Actions Concepts
- Workflow: YAML file under .github/workflows/, triggered by events
- Job: runs on a runner (ubuntu-latest, self-hosted)
- Step: individual task within a job; either uses: (an action) or run: (a shell command)
- needs: declare job dependencies for ordering
- outputs: pass data between jobs
- matrix: run job for multiple values (Go versions, OS)
- environment: deployment target with required reviewers
- concurrency: group runs and cancel in-progress ones on a new push (cancel-in-progress: true)
Secrets & Variables
- Secrets: encrypted and masked in logs; referenced as ${{ secrets.MY_SECRET }}
- GITHUB_TOKEN: auto-generated per run; use for registry login, PR comments
- Variables: non-sensitive config; referenced as ${{ vars.REGION }}
- Environments: scope secrets to staging/production environments
- Required reviewers on environments gate production deploys
- Never echo secrets: GitHub masks them, but avoid it anyway
- Rotate secrets periodically; use short-lived OIDC tokens where possible
Caching Strategies
- actions/cache: key/restore-keys pattern for any directory
- Docker layer cache: cache-from: type=gha reuses unchanged layers
- Go modules: cache ~/go/pkg/mod keyed by go.sum hash
- Node modules: cache node_modules keyed by package-lock.json hash
- Cache key strategy: broad restore-key for partial hits, exact key for full hits
- Cache size limit: 10GB per repo in GitHub Actions
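The exact-key / restore-key strategy can be sketched as follows. This models the lookup logic only; it is not how actions/cache is implemented, and the function names and the "-go-" key layout are assumptions chosen for illustration.

```python
import hashlib
from typing import List, Optional, Tuple


def cache_key(os_name: str, lockfile_bytes: bytes) -> Tuple[str, str]:
    """Return (exact_key, restore_prefix) for a dependency cache.

    The exact key changes whenever the lockfile (e.g. go.sum) changes;
    the prefix stays stable so an older cache can still be reused.
    """
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    prefix = f"{os_name}-go-"
    return prefix + digest, prefix


def lookup(saved: List[str], key: str, restore_prefix: str) -> Optional[str]:
    """Pick a cache entry: the exact key if present (full hit), else the
    newest entry that starts with the prefix (partial hit).

    `saved` is assumed to be ordered oldest to newest.
    """
    if key in saved:
        return key
    for candidate in reversed(saved):
        if candidate.startswith(restore_prefix):
            return candidate
    return None
```

A partial hit restores most dependencies, so only the modules that changed need to be downloaded before the new exact key is saved.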
Matrix Builds
- Test across multiple Go/Node versions simultaneously
- Test across multiple OS (ubuntu, macos, windows)
- All matrix combinations run in parallel
- fail-fast: false (don't cancel others if one fails)
- include: add extra variables to specific combinations
- exclude: remove specific combinations
- Max 256 matrix jobs per workflow run
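How include and exclude shape the job list can be sketched with a simplified model. GitHub's real include rules have more edge cases; this version assumes excludes are applied first, an include entry extends every combination whose original axis values it matches, and an entry that matches nothing becomes a new job. The function name expand_matrix is hypothetical.

```python
from itertools import product


def expand_matrix(axes, include=(), exclude=()):
    """Expand a build matrix into a list of job variable dicts (simplified)."""
    keys = list(axes)
    # 1. Cartesian product of all axis values.
    combos = [dict(zip(keys, values))
              for values in product(*(axes[k] for k in keys))]
    # 2. exclude: drop combinations that match every key of an exclude entry.
    combos = [c for c in combos
              if not any(all(c.get(k) == v for k, v in ex.items())
                         for ex in exclude)]
    # 3. include: extend matching combinations, or append a new one.
    extra = []
    for inc in include:
        axis_part = {k: v for k, v in inc.items() if k in axes}
        matched = [c for c in combos
                   if all(c[k] == v for k, v in axis_part.items())]
        if matched:
            for c in matched:
                c.update(inc)
        else:
            extra.append(dict(inc))
    return combos + extra
```

For example, two OS values and two Go versions give four jobs; excluding one pair and including an extra macos entry still yields four jobs, with one of them carrying an additional variable.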
Quiz: GitHub Actions
Q1. In GitHub Actions, what does needs: [lint-and-test] on a job mean?
Q2. What is the best cache key strategy for Go module dependencies?
03 Deployment Strategies
Blue-Green Deployment
Canary Deployment
Rolling Update (Kubernetes Default)
- Replace pods one by one (or in batches); this is the default K8s strategy
- maxSurge: extra pods allowed during the update (percentage or count, default 25%)
- maxUnavailable: pods that may be unavailable during the update (percentage or count, default 25%)
- Zero downtime if readiness probes are correct
- Rollback: kubectl rollout undo deployment/myapp
- Issue: both versions run simultaneously, so the API must be backward compatible
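The effect of percentage values for maxSurge and maxUnavailable can be worked out with a few lines of arithmetic. Kubernetes rounds a maxSurge percentage up and a maxUnavailable percentage down; the helper below (a hypothetical name, not a Kubernetes API) applies that rule.

```python
def rolling_bounds(replicas: int, max_surge_pct: int = 25,
                   max_unavailable_pct: int = 25) -> dict:
    """Pod-count bounds during a rolling update with percentage settings."""
    surge = -(-replicas * max_surge_pct // 100)          # rounded up
    unavailable = replicas * max_unavailable_pct // 100  # rounded down
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }
```

With 10 replicas and the defaults, at most 13 pods exist and at least 8 stay available. With a single replica the bounds are 2 and 1, so the old pod is kept until the new one passes its readiness probe.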
Deployment Strategy Comparison
- Rolling: the default; simple, zero downtime, mixed versions
- Blue-Green: instant cutover, full resource cost (2x), simple rollback
- Canary: real traffic testing, gradual confidence, complex routing
- Recreate: downtime but no mixed versions; useful when a DB migration cannot support both versions at once
- A/B: canary by user segment; needs user ID routing
- Choose canary for user-facing changes; rolling for internal services
Feature Flags
- Decouple deployment from release: deploy dark, enable gradually
- Target by: user ID, percentage, region, beta users, employee
- Tools: LaunchDarkly, Unleash (self-hosted), Flagsmith
- Trunk-based development: feature flags allow unfinished code on main
- Kill switch: disable broken feature instantly without redeploy
- Lifecycle: create → test in prod → full rollout → clean up (delete flag!)
- Tech debt: stale flags are a liability, so set expiry dates
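Percentage targeting and the kill switch can be sketched with stable hashing. This is a minimal illustration of the idea, not the algorithm of LaunchDarkly, Unleash, or Flagsmith; the function names and the flag/user values are hypothetical.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: str, user_id: str, rollout_pct: int,
               kill_switch: bool = False) -> bool:
    """Enable the flag for the first `rollout_pct` buckets.

    The kill switch overrides everything, so a broken feature can be
    turned off without a redeploy.
    """
    if kill_switch:
        return False
    return bucket(flag, user_id) < rollout_pct
```

Because the bucket depends only on the flag and user ID, a user who sees the feature at 10% keeps seeing it as the rollout grows to 50% and 100%. Including the flag name in the hash keeps different flags from always targeting the same users.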
Database Migrations with Deploys
- Never break backward compatibility in a migration
- Expand-contract pattern: add new column (expand) → migrate data → remove old (contract), spread across 3 deploys
- Run migrations before deploying new code; the code must handle both old and new schema
- Tooling: Flyway, Liquibase, golang-migrate
- Zero-downtime: avoid locks on large tables; add new columns as nullable (ADD COLUMN ... DEFAULT NULL) rather than NOT NULL
- Always test migration rollback: does migrate down work?
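The expand and migrate steps can be tried end to end against an in-memory SQLite database. This is a sketch under stated assumptions: the users table and the full_name/display_name columns are hypothetical, and SQLite stands in for whatever database a real deployment uses.

```python
import sqlite3


def expand(conn: sqlite3.Connection) -> None:
    """Deploy 1: add the new column as nullable so old code keeps working."""
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT DEFAULT NULL")


def backfill(conn: sqlite3.Connection) -> None:
    """Deploy 2: copy data into the new column for rows that lack it."""
    conn.execute(
        "UPDATE users SET display_name = full_name WHERE display_name IS NULL"
    )


def read_name(conn: sqlite3.Connection, user_id: int):
    """New code tolerates both schemas' data: prefer the new column,
    fall back to the old one."""
    row = conn.execute(
        "SELECT COALESCE(display_name, full_name) FROM users WHERE id = ?",
        (user_id,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)"
)
conn.execute("INSERT INTO users (full_name) VALUES ('Ada Lovelace')")
expand(conn)
# Old code is still running during the rollout and only knows full_name.
conn.execute("INSERT INTO users (full_name) VALUES ('Alan Turing')")
backfill(conn)
# Deploy 3 (contract, not shown): once nothing reads or writes full_name,
# drop the old column.
```

The insert issued by old code after the expand step succeeds because the new column is nullable, which is exactly why NOT NULL is avoided at that stage.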
Quiz: Deployment Strategies
Q1. Blue-green deployment's main advantage over rolling update is:
Q2. Feature flags enable what deployment practice?
Q3. The expand-contract pattern for database migrations means:
04 GitOps & ArgoCD
GitOps Principles
GitOps = Git as the single source of truth for declarative infrastructure and application state. Four core principles:
1. Declarative: the entire system is described declaratively (Kubernetes manifests, Helm charts).
2. Versioned & immutable: desired state is stored in Git, so every change is a commit and fully auditable.
3. Pulled automatically: approved changes are applied automatically by a software agent (ArgoCD, Flux).
4. Continuously reconciled: agent detects drift between desired (Git) and actual (cluster) state and corrects it.
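The reconciliation principle can be sketched as a diff between desired and actual state followed by corrective actions. This is a toy model, not ArgoCD or Flux code: state is represented as plain dicts keyed by resource name, and the names plan and reconcile are hypothetical.

```python
def plan(desired: dict, actual: dict) -> list:
    """Compute the actions needed to make `actual` match `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))  # drift detected
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))  # pruning
    return actions


def reconcile(desired: dict, actual: dict) -> dict:
    """Apply the plan to `actual` in place and return it."""
    for action, name in plan(desired, actual):
        if action == "delete":
            del actual[name]
        else:
            actual[name] = desired[name]
    return actual
```

An agent runs this loop continuously, so a manual change in the cluster shows up as an update action on the next pass and is reverted to what Git declares.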
ArgoCD Configuration
ArgoCD vs Flux
- ArgoCD: UI-driven, explicit Application resources, multi-cluster, RBAC
- Flux: CLI-first, GitRepository + Kustomization CRDs, Helm controller, simpler
- ArgoCD: better for teams that want visibility and multi-cluster management
- Flux: better for fully automated, code-first GitOps
- Both: pull-based (agent inside the cluster pulls from Git, so CI needs no cluster access)
- Pull vs push: pull is safer because cluster credentials never leave the cluster
Helm Deep Dive
- Helm = Kubernetes package manager: templates + values
- values.yaml: default values; override per environment
- helm upgrade --install: idempotent; creates or updates
- --atomic: roll back automatically on failure
- --wait: wait for all resources to be ready
- Chart hooks: pre-install, post-install, pre-upgrade; run jobs at lifecycle points
- Subcharts: dependencies in Chart.yaml; include postgres, redis as deps
Quiz: GitOps
Q1. What is the "pull" model in GitOps (ArgoCD/Flux)?
Q2. ArgoCD selfHeal means:
05 Secrets Management: Vault & Beyond
HashiCorp Vault
External Secrets Operator (ESO)
- Kubernetes operator that syncs secrets from external providers into K8s Secrets
- Supports: Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault
- ExternalSecret resource: define what to sync and where
- Refresh interval: secrets auto-updated when source changes
- Better than Vault Agent sidecar: no sidecar per pod, centrally managed
- Use with Sealed Secrets (Bitnami) to encrypt secrets in Git
Secrets Anti-Patterns
- ✗ Hardcode secrets in code or Dockerfiles
- ✗ Store secrets in Git (even in private repos)
- ✗ Long-lived static credentials: prefer short-lived with rotation
- ✗ Share secrets between environments (prod key in dev)
- ✗ Echo secrets in CI logs: GitHub masks them, but avoid it
- ✓ Use OIDC for CI: no static credentials at all
- ✓ Rotate secrets regularly, audit who accessed what
OIDC for CI/CD
- GitHub Actions can authenticate to AWS/GCP/Azure without static credentials
- GitHub issues short-lived OIDC token per workflow run
- Cloud provider validates the token, then issues short-lived cloud credentials
- No AWS_ACCESS_KEY_ID stored in GitHub Secrets
- Configure: id-token: write permission + cloud provider trust policy
- This is the currently recommended approach for CI cloud access, since no long-lived secret exists to leak
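The trust-policy side of the OIDC flow comes down to matching claims in the token. The sketch below shows only that matching step and assumes the token's signature and expiry have already been verified, which a real cloud provider always does first. The function name claims_allowed and the acme/api repository are hypothetical; the issuer URL and the repo:OWNER/REPO:ref:... subject format are what GitHub Actions tokens use.

```python
from fnmatch import fnmatchcase

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"


def claims_allowed(claims: dict, audience: str, subject_pattern: str) -> bool:
    """Check already-verified token claims against a trust policy.

    ASSUMPTION: signature and expiry were validated before this call.
    `subject_pattern` may use shell-style wildcards, e.g. "repo:acme/api:*".
    """
    return (
        claims.get("iss") == GITHUB_ISSUER
        and claims.get("aud") == audience
        and fnmatchcase(claims.get("sub", ""), subject_pattern)
    )
```

Pinning the subject to a specific repository and branch is what stops a workflow in another repository, or on an unreviewed branch, from assuming the same cloud role.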
Secret Rotation
- Vault dynamic secrets: generate DB creds on demand, expire after TTL
- Database secrets engine: creates temporary Postgres users per request
- Revocation: Vault can revoke all leases on a path instantly
- AWS IAM: use IAM roles with STS AssumeRole; no long-lived keys
- Rotation triggers: time-based (90 days), on breach, on engineer departure
- Zero-downtime rotation: ensure app can re-read secrets without restart
Quiz: Secrets Management
Q1. OIDC for CI/CD (e.g., GitHub Actions to AWS) is better than static credentials because:
Q2. Vault dynamic secrets for a database means:
06 Pipeline Optimization & Observability
Making CI Faster
- Parallelize: run lint, test, security scan as parallel jobs
- Cache layers: Docker BuildKit cache mounts for apt-get, go modules
- Test sharding: split tests across multiple runners (pytest-split, go test -run)
- Incremental builds: only rebuild changed packages (Bazel, Nx, Turborepo)
- Self-hosted runners: larger machines, pre-installed deps, no cold starts
- Fail fast: lint first, the cheapest check before expensive ones
- Target: main branch CI under 5 minutes, PR CI under 10 minutes
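Test sharding works best when shards finish at about the same time. One common heuristic, sketched below, is to assign the slowest tests first, each to the currently lightest shard. This is an illustration of the idea rather than how pytest-split or any specific tool works; shard_tests and the timing data are hypothetical.

```python
import heapq


def shard_tests(durations: dict, shards: int) -> list:
    """Split tests into `shards` groups with roughly equal total duration.

    `durations` maps test name to seconds from a previous run.
    Greedy rule: longest test first, always onto the lightest shard.
    """
    heap = [(0.0, i) for i in range(shards)]  # (total seconds, shard index)
    heapq.heapify(heap)
    buckets = [[] for _ in range(shards)]
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (total + secs, idx))
    return buckets
```

Wall-clock time for the test stage is the duration of the slowest shard, so balancing by recorded duration usually beats splitting by file count.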
Optimized Dockerfile
- Multi-stage build: build stage → minimal runtime image
- Layer order: COPY dependencies first, then source code
- Base image: distroless or alpine for minimal attack surface
- Non-root user: always run as non-root in production
- .dockerignore: exclude .git, node_modules, test files
- Pin base image with digest: FROM golang:1.22@sha256:abc123
- BuildKit cache mounts: RUN --mount=type=cache,target=/root/.cache go build
Pipeline Observability
- Track: pipeline duration, failure rate, flaky test rate
- Alert on: CI failure rate > 10%, build time > 10 min
- Flaky tests: auto-retry 2x, then report; fix or quarantine, never ignore
- DORA metrics: deployment frequency, lead time, MTTR, change failure rate
- Tools: Datadog CI Visibility, Buildkite, GitHub Actions insights
- Mean time to restore (MTTR): how fast can you detect and fix a broken deploy?
DORA Metrics: Elite Team Targets
- Deployment Frequency: multiple times per day (elite)
- Lead Time for Changes: < 1 hour from commit to prod (elite)
- Change Failure Rate: < 5% of deployments cause incidents (elite)
- MTTR: < 1 hour to restore after failure (elite)
- These metrics predict software delivery performance and business outcomes
- Source: the annual State of DevOps report from Google's DORA research team
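The four metrics can be computed from a simple deployment log. The sketch below assumes a hypothetical record shape (committed_at, deployed_at, failed, and restored_at for failed deploys); real data would come from the CI system and the incident tracker.

```python
from datetime import datetime, timedelta
from statistics import median


def dora_summary(deploys: list, window_days: int) -> dict:
    """Summarise DORA metrics for deployments within a time window."""
    if not deploys:
        raise ValueError("no deployments in window")
    lead_times = [d["deployed_at"] - d["committed_at"] for d in deploys]
    failures = [d for d in deploys if d["failed"]]
    restores = [d["restored_at"] - d["deployed_at"] for d in failures]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore": (
            sum(restores, timedelta()) / len(restores) if restores else None
        ),
    }
```

Tracking these per week makes trends visible: a rising lead time or change failure rate is an early sign that the pipeline or the test suite needs attention.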
Optimized Multi-Stage Dockerfile (Go)
07 Best Resources
Official docs: start with Quickstart, then Workflow syntax reference. Most complete source.
Best YouTube channel for CI/CD fundamentals. Watch: GitHub Actions, ArgoCD, Helm courses. Clear and structured.
Read: Getting Started, Application CRD, App of Apps pattern, Sync Waves. Takes ~3 hours.
Annual Google research report. The definitive data on what makes high-performing engineering teams. Free.
Start with: Getting Started, Kubernetes auth method, KV secrets engine, dynamic database secrets.
How to eliminate AWS_ACCESS_KEY from GitHub Secrets using OIDC. 20 minutes. Essential for secure CI.
Quiz: Optimization
Q1. In a Dockerfile, why should you COPY go.mod/go.sum before COPY . .?
Q2. Which DORA metric measures how quickly a team recovers from a production incident?