CI/CD in a Day

01 CI Fundamentals: What Good CI Looks Like

The CI Pipeline Stages

Trigger: push, pull_request, schedule, manual
Checkout: fetch code at specific commit SHA
Build: compile, generate artifacts
Test: unit → integration → e2e (fast to slow)
Lint / Static Analysis: code quality, security scan
Package: Docker build, binary, SBOM
Publish: push to registry with immutable tag
Notify: Slack, GitHub PR comment, PagerDuty

CI Best Practices

Fast feedback: CI should finish in under 10 min; engineers are blocked while they wait
Fail fast: run cheapest checks first (lint before integration tests)
Deterministic: same commit = same result every time
Immutable artifacts: tag images with git SHA, never "latest"
Cache aggressively: deps, build outputs, Docker layers
Parallelize: run independent jobs concurrently
Test in isolation: no shared state between test runs
Don't skip tests: every merge to main must be green

Trunk-Based vs Feature Branch

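Trunk-Based: everyone merges to main at least daily; unfinished work stays hidden behind feature flags
Forces: small PRs, fast review, and real continuous integration (code actually integrated daily, not just a CI server running on long-lived branches)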
Used by: Google, Meta, Netflix at scale
Feature Branch: long-lived branches per feature
Risk: merge conflicts, integration surprises, big bang merges
Better default for teams: short-lived feature branches (≤2 days) + PR
GitFlow: adds release/hotfix branches — good for versioned software

What Makes a Good Test Suite

Test pyramid: many unit tests, fewer integration, few e2e
Unit tests: fast (<1ms each), isolated, no I/O, no network
Integration tests: exercise a real DB/cache, e.g. spun up with Testcontainers
e2e tests: test full user flows — slow, run in staging
Contract tests: verify API contracts between services (Pact)
Code coverage: 80% is a reasonable target; chasing 100% gives diminishing returns
Mutation testing: injects small code changes (mutants) and checks that the tests fail, verifying they actually catch bugs
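Mutation testing is easiest to see on a tiny example. The Go sketch below hand-writes one mutant (the kind of `>=` to `>` flip a mutation tool generates) for a hypothetical `isAdult` function and runs two test tables against it. Both tables execute every line, so coverage cannot tell them apart; only the table that probes the boundary value "kills" the mutant. Real mutation tools generate and run many mutants automatically; this is an illustration of the idea, not any tool's implementation.

```go
package main

import "fmt"

// isAdult is the code under test (a hypothetical example).
func isAdult(age int) bool { return age >= 18 }

// isAdultMutant is the kind of mutant a tool generates: ">=" flipped to ">".
func isAdultMutant(age int) bool { return age > 18 }

type testCase struct {
	age  int
	want bool
}

// suitePasses runs every case in the table against one implementation.
func suitePasses(impl func(int) bool, cases []testCase) bool {
	for _, c := range cases {
		if impl(c.age) != c.want {
			return false
		}
	}
	return true
}

// mutantKilled is true when the suite passes on the real code but fails on
// the mutant, i.e. the tests would notice this bug.
func mutantKilled(cases []testCase) bool {
	return suitePasses(isAdult, cases) && !suitePasses(isAdultMutant, cases)
}

func main() {
	weak := []testCase{{10, false}, {30, true}}               // 100% line coverage, no boundary
	strong := []testCase{{10, false}, {18, true}, {30, true}} // also probes age == 18
	fmt.Println("weak suite kills mutant:", mutantKilled(weak))     // false
	fmt.Println("strong suite kills mutant:", mutantKilled(strong)) // true
}
```

The surviving mutant against the weak table is exactly the signal a mutation report gives you: a missing boundary test.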

Quiz — CI Fundamentals

Q1. Why should Docker images be tagged with git SHA instead of "latest"?

SHA tags are shorter and easier to type
"latest" is mutable; SHA tags are immutable, giving reproducible and traceable deployments
Docker registries don't support the "latest" tag
SHA tags are required for Kubernetes deployments

Q2. The test pyramid recommends:

Equal numbers of unit, integration, and e2e tests
More e2e tests since they cover the most code
Many fast unit tests, fewer integration tests, very few e2e tests
Skipping unit tests; only integration tests matter

02 GitHub Actions: Deep Dive

Complete Production Pipeline

# .github/workflows/ci.yml
name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # Job 1: Fast checks — run first to fail fast
  lint-and-test:
    runs-on: ubuntu-latest
    services:
      postgres:                          # sidecar container for integration tests
        image: postgres:15
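        env:
          POSTGRES_PASSWORD: test
          POSTGRES_DB: testdb              # database named in DATABASE_URL below
        ports:
          - 5432:5432                      # map to the runner host so tests can use localhost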
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4

      - name: Setup Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22'
          cache: true                    # caches module downloads and the build cache, keyed by go.sum

      - name: Cache build artifacts
        uses: actions/cache@v4
        with:
          path: ~/.cache/go-build
          key: ${{ runner.os }}-gobuild-${{ hashFiles('**/*.go') }}
          restore-keys: ${{ runner.os }}-gobuild-

      - name: Lint
        uses: golangci/golangci-lint-action@v4
        with:
          version: latest

      - name: Run tests
        run: go test -race -coverprofile=coverage.out ./...
        env:
          DATABASE_URL: postgres://postgres:test@localhost:5432/testdb?sslmode=disable

      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          files: coverage.out
          token: ${{ secrets.CODECOV_TOKEN }}   # v4 expects an upload token for most repos

  # Job 2: Build and push image (only on main)
  build-push:
    needs: lint-and-test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    outputs:
      image-tag: sha-${{ github.sha }}   # one immutable tag; steps.meta.outputs.tags is a newline-separated list of full image refs
      digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3   # enables BuildKit

      - name: Login to registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          # sha-<full commit SHA> (immutable, matches image-tag above) and main (mutable convenience tag)
          tags: |
            type=sha,prefix=sha-,format=long
            type=ref,event=branch

      - name: Build and push
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha            # GitHub Actions cache for Docker layers
          cache-to: type=gha,mode=max
          sbom: true                      # generate software bill of materials
          provenance: true               # SLSA provenance attestation

  # Job 3: Deploy (only after image built)
  deploy-staging:
    needs: build-push
    runs-on: ubuntu-latest
    environment: staging                 # requires manual approval if configured
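    steps:
      - uses: actions/checkout@v4          # ./charts/myapp comes from this repo
      # assumes cluster credentials (e.g. a kubeconfig) are already configured for this job
      - name: Deploy to staging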
        run: |
          # Update Helm values with new image tag
          helm upgrade --install myapp ./charts/myapp \
            --set image.tag=${{ needs.build-push.outputs.image-tag }} \
            --namespace staging \
            --wait \
            --timeout 5m

Key Actions Concepts

Workflow: YAML file — triggered by events
Job: runs on a runner (ubuntu-latest, self-hosted)
Step: individual task within a job — uses: or run:
needs: declare job dependencies for ordering
outputs: pass data between jobs
matrix: run job for multiple values (Go versions, OS)
environment: deployment target, optionally gated by required reviewers
concurrency: limits runs in a group to one at a time; cancel-in-progress: true cancels the older run on a new push

Secrets & Variables

Secrets: encrypted, never logged — ${{ secrets.MY_SECRET }}
GITHUB_TOKEN: auto-generated per run — use for registry login, PR comments
Variables: non-sensitive config — ${{ vars.REGION }}
Environments: scope secrets to staging/production environments
Required reviewers on environments gate production deploys
Never echo secrets — GitHub masks them but avoid it anyway
Rotate secrets periodically — use short-lived OIDC tokens where possible

Caching Strategies

actions/cache: key/restore-keys pattern for any directory
Docker layer cache: cache-from: type=gha — reuses unchanged layers
Go modules: cache ~/go/pkg/mod keyed by go.sum hash
Node modules: cache ~/.npm (npm ci deletes node_modules) keyed by package-lock.json hash
Cache key strategy: broad restore-key for partial hits, exact key for full hits
Cache size limit: 10GB per repo in GitHub Actions

Matrix Builds

Test across multiple Go/Node versions simultaneously
Test across multiple OS (ubuntu, macos, windows)
All matrix combinations run in parallel
fail-fast: false — don't cancel others if one fails
include: add extra variables to specific combinations
exclude: remove specific combinations
Max 256 matrix jobs per workflow run

Quiz — GitHub Actions

Q1. In GitHub Actions, what does needs: [lint-and-test] on a job mean?

The job will skip if lint-and-test fails
The job won't start until lint-and-test completes successfully
The job runs in parallel with lint-and-test
The job will reuse lint-and-test's runner

Q2. What is the best cache key strategy for Go module dependencies?

Use the git commit SHA: always fresh
Use a hash of go.sum: cache is valid as long as dependencies don't change
Use the current date: cache refreshes daily
Never cache: dependencies change too often

03 Deployment Strategies

Blue-Green Deployment

# Two identical environments: blue (live) and green (new version)
# Switch traffic atomically — instant rollback by switching back

## Kubernetes implementation with two Deployments:
# Blue (currently serving traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
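  replicas: 5
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:                  # required; pod labels must match the selector
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: myapp
        image: ghcr.io/org/myapp:v1   # placeholder image (current release)
        ports:
        - containerPort: 8080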

---
# Green (new version, not yet receiving traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
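  replicas: 5
  selector:
    matchLabels:
      app: myapp
      version: green
  template:                  # required; pod labels must match the selector
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: myapp
        image: ghcr.io/org/myapp:v2   # placeholder image (new release)
        ports:
        - containerPort: 8080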

---
# Service selects by version label — change this to switch traffic
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue   # change to "green" to switch traffic with zero downtime
  ports:
  - port: 80
    targetPort: 8080

# Switch:  kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'
# Rollback: kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'
# After validation: delete blue deployment to save resources
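The key property of the selector patch is that cutover and rollback are a single atomic switch. The Go sketch below is an in-process analogy, not how kube-proxy works: a router holds a pointer to the live backend, and swapping that pointer moves all new requests at once. The `Switch` type and the `version` helper are hypothetical names for illustration; `atomic.Pointer` needs Go 1.19+.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Switch routes every request to whichever backend is currently live.
// Call Set before serving; swapping the pointer is one atomic store.
type Switch struct {
	live atomic.Pointer[http.Handler]
}

func (s *Switch) Set(h http.Handler) { s.live.Store(&h) }

func (s *Switch) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	(*s.live.Load()).ServeHTTP(w, r)
}

// version is a stand-in backend that answers with its own name.
func version(name string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, name)
	})
}

// get sends one request through h and returns the response body.
func get(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	body, _ := io.ReadAll(rec.Result().Body)
	return string(body)
}

func main() {
	blue, green := version("blue"), version("green")
	var sw Switch
	sw.Set(blue)
	fmt.Println(get(&sw)) // blue
	sw.Set(green)         // cutover
	fmt.Println(get(&sw)) // green
	sw.Set(blue)          // rollback
	fmt.Println(get(&sw)) // blue
}
```

Keeping the old backend around after cutover is what makes rollback instant, which is also why blue-green costs roughly 2x resources until the old side is deleted.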

Canary Deployment

# Canary: route small % of traffic to new version first
# Gradually increase % as confidence grows
# Rollback by setting canary weight to 0

## Nginx Ingress canary annotation approach (needs a primary, non-canary Ingress for the same host/path that points at the stable Service):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"  # 10% of traffic
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-v2
            port: { number: 80 }

## Canary with Argo Rollouts (automated, metric-based):
# Defines a progressive rollout with automatic pause and analysis
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  # replicas, selector and pod template omitted: same fields as a Deployment
  strategy:
    canary:
      steps:
      - setWeight: 5     # 5% traffic
      - pause: {duration: 5m}
      - setWeight: 20    # 20% traffic
      - analysis:        # check error rate before proceeding
          templates:
          - templateName: error-rate
      - setWeight: 50
      - pause: {duration: 10m}
      - setWeight: 100

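# The error-rate AnalysisTemplate (not shown) holds the threshold, e.g. fail if error rate > 1%; a failed analysis aborts the rollout and sends traffic back to the stable version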
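The promote-or-abort logic of an automated canary can be sketched as a loop over weight steps. In this Go sketch, error rates come from an injected function (a real controller would query a metrics backend), pause steps are left out, and the 1% threshold mirrors the example above; `Step`, `ErrorRate` and `Rollout` are hypothetical names, not Argo Rollouts' API.

```go
package main

import "fmt"

// Step shifts Weight percent of traffic to the canary.
type Step struct{ Weight int }

// ErrorRate reports the canary's observed error rate (0..1) at a weight.
type ErrorRate func(weight int) float64

// Rollout walks the steps in order. If any step's error rate exceeds the
// threshold it aborts and returns weight 0 (all traffic back to stable);
// if every step passes, the canary is promoted to 100%.
func Rollout(steps []Step, observe ErrorRate, threshold float64) (finalWeight int, promoted bool) {
	for _, s := range steps {
		rate := observe(s.Weight)
		if rate > threshold {
			fmt.Printf("weight %d%%: error rate %.3f > %.3f, aborting\n", s.Weight, rate, threshold)
			return 0, false
		}
		fmt.Printf("weight %d%%: error rate %.3f ok\n", s.Weight, rate)
	}
	return 100, true
}

func main() {
	steps := []Step{{5}, {20}, {50}, {100}} // setWeight values from the Rollout above
	healthy := func(int) float64 { return 0.002 }
	regressing := func(weight int) float64 {
		if weight >= 20 {
			return 0.05 // errors appear once the canary sees more traffic
		}
		return 0.002
	}
	fmt.Println(Rollout(steps, healthy, 0.01))    // ... 100 true
	fmt.Println(Rollout(steps, regressing, 0.01)) // ... 0 false
}
```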

Rolling Update (Kubernetes Default)

Replace pods one by one (or in batches) — default K8s strategy
maxSurge: extra pods during update (25% or count)
maxUnavailable: pods that can be unavailable (25% or count)
Zero downtime if readiness probes are correct
Rollback: kubectl rollout undo deployment/myapp
Issue: both versions run simultaneously — API must be backward compatible
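Percentage values for maxSurge and maxUnavailable become pod counts by rounding: the Kubernetes docs specify that maxSurge rounds up and maxUnavailable rounds down. This small Go function (a hypothetical helper, not a Kubernetes API) shows what the 25%/25% defaults mean for a few replica counts.

```go
package main

import (
	"fmt"
	"math"
)

// rollingBounds converts percentage maxSurge / maxUnavailable into pod counts:
// maxSurge rounds up, maxUnavailable rounds down. It returns the most pods that
// may exist during the update and the fewest that must stay available.
func rollingBounds(replicas int, surgePct, unavailablePct float64) (maxPods, minAvailable int) {
	surge := int(math.Ceil(float64(replicas) * surgePct / 100))
	unavailable := int(math.Floor(float64(replicas) * unavailablePct / 100))
	return replicas + surge, replicas - unavailable
}

func main() {
	for _, r := range []int{4, 5, 10} {
		maxPods, minAvail := rollingBounds(r, 25, 25) // the default 25%/25%
		fmt.Printf("replicas=%d: at most %d pods, at least %d available\n", r, maxPods, minAvail)
	}
}
```

For 10 replicas this gives at most 13 pods and at least 8 ready, which is the capacity headroom (and the mixed-version window) you plan for.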

Deployment Strategy Comparison

Rolling: default — simple, zero downtime, mixed versions
Blue-Green: instant cutover, full resource cost (2x), simple rollback
Canary: real traffic testing, gradual confidence, complex routing
Recreate: brief downtime but never mixed versions; use when old and new versions cannot run side by side (e.g. incompatible DB migrations)
A/B: canary by user segment — needs user ID routing
Choose canary for user-facing changes; rolling for internal services

Feature Flags

Decouple deployment from release — deploy dark, enable gradually
Target by: user ID, percentage, region, beta users, employee
Tools: LaunchDarkly, Unleash (self-hosted), Flagsmith
Trunk-based development: feature flags allow unfinished code on main
Kill switch: disable broken feature instantly without redeploy
Lifecycle: create → test in prod → full rollout → clean up (delete flag!)
Tech debt: stale flags are a liability — set expiry dates

Database Migrations with Deploys

Never break backward compatibility in a migration
Expand-contract pattern: add new column (expand) → migrate data → remove old (contract) across 3 deploys
Run migrations before deploying new code — code must handle both old and new schema
Tooling: Flyway, Liquibase, golang-migrate
Zero-downtime: avoid long locks and table rewrites on large tables; add new columns as nullable (ADD COLUMN ... NULL) and enforce NOT NULL only after the backfill
Always test migration rollback — does migrate down work?

04 GitOps & ArgoCD

GitOps Principles

GitOps = Git as the single source of truth for declarative infrastructure and application state. Four core principles:

1. Declarative: the entire system is described declaratively (Kubernetes manifests, Helm charts).
2. Versioned & immutable: desired state stored in Git — every change is a commit, fully auditable.
3. Pulled automatically: approved changes are applied automatically by a software agent (ArgoCD, Flux).
4. Continuously reconciled: agent detects drift between desired (Git) and actual (cluster) state and corrects it.

# GitOps workflow:
# 1. Developer pushes code → CI builds image, tags with git SHA
# 2. CI updates image tag in the GitOps repo (Helm values or kustomize)
# 3. PR created in GitOps repo — reviewed and merged
# 4. ArgoCD detects difference between Git and cluster state
# 5. ArgoCD applies the diff — new image deployed
# 6. Rollback = git revert → ArgoCD reconciles back to old state

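# CI updates image tag automatically:
- name: Update image tag in GitOps repo
  env:
    GITOPS_TOKEN: ${{ secrets.GITOPS_TOKEN }}   # hypothetical secret: token with push access to gitops-repo
  run: |
    git clone "https://x-access-token:${GITOPS_TOKEN}@github.com/org/gitops-repo.git"
    cd gitops-repo
    git config user.name "ci-bot"
    git config user.email "ci-bot@users.noreply.github.com"
    # values.yaml nests the key (image: {tag: ...}); yq, preinstalled on GitHub-hosted Ubuntu runners, edits it in place
    yq -i '.image.tag = strenv(IMAGE_TAG)' apps/myapp/values.yaml
    git commit -am "chore: update myapp to ${IMAGE_TAG}"
    git push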

ArgoCD Configuration

# ArgoCD Application — declares what to deploy and where
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-production
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/gitops-repo
    targetRevision: main
    path: apps/myapp
    helm:
      valueFiles:
        - values-production.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual kubectl changes (drift correction)
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 3
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

# App of Apps pattern — one root Application manages all other Applications
# Allows managing multiple services from a single ArgoCD root

# Sync Waves — control order of resource creation:
# metadata.annotations:
#   argocd.argoproj.io/sync-wave: "-1"  # apply first (migrations, CRDs)
#   argocd.argoproj.io/sync-wave: "0"   # default
#   argocd.argoproj.io/sync-wave: "1"   # apply after wave 0

ArgoCD vs Flux

ArgoCD: UI-driven, explicit Application resources, multi-cluster, RBAC
Flux: CLI-first, GitRepository + Kustomization CRDs, Helm via HelmRelease (helm-controller), simpler
ArgoCD: better for teams that want visibility and multi-cluster management
Flux: better for fully automated, code-first GitOps
Both: pull-based (agent inside cluster pulls from Git — no cluster access needed from CI)
Pull vs push: pull is safer — cluster credentials never leave the cluster

Helm Deep Dive

Helm = Kubernetes package manager — templates + values
values.yaml: default values — override per environment
helm upgrade --install: idempotent — create or update
--atomic: rollback automatically on failure
--wait: wait for all resources to be ready
Chart hooks: pre-install, post-install, pre-upgrade — run jobs at lifecycle points
Subcharts: dependencies in Chart.yaml — include postgres, redis as deps
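Per-environment values files layer on top of values.yaml. In the common case, nested maps merge key by key, scalars and lists in the override replace the default, and setting a key to `null` removes the default (Helm documents this last rule). The Go sketch below models those rules only; it is a simplification, not Helm's coalescing code, and the values shown are illustrative.

```go
package main

import "fmt"

// mergeValues overlays override onto base: nested maps merge recursively,
// scalars and lists are replaced wholesale, and nil (YAML null) deletes a key.
func mergeValues(base, override map[string]any) map[string]any {
	out := make(map[string]any, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, ov := range override {
		if ov == nil {
			delete(out, k) // `key: null` removes a default
			continue
		}
		bm, baseIsMap := out[k].(map[string]any)
		om, overIsMap := ov.(map[string]any)
		if baseIsMap && overIsMap {
			out[k] = mergeValues(bm, om) // recurse into nested maps
		} else {
			out[k] = ov
		}
	}
	return out
}

func main() {
	defaults := map[string]any{ // values.yaml
		"replicaCount": 1,
		"image":        map[string]any{"repository": "ghcr.io/org/myapp", "tag": "main"},
		"debug":        true,
	}
	production := map[string]any{ // values-production.yaml
		"replicaCount": 5,
		"image":        map[string]any{"tag": "sha-abc1234"},
		"debug":        nil,
	}
	fmt.Println(mergeValues(defaults, production))
	// map[image:map[repository:ghcr.io/org/myapp tag:sha-abc1234] replicaCount:5]
}
```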

Quiz — GitOps

Q1. What is the "pull" model in GitOps (ArgoCD/Flux)?

CI pushes manifests directly to the cluster via kubectl
An agent running inside the cluster pulls desired state from Git and applies it; cluster credentials never leave the cluster
Git pulls from the cluster to detect state changes
Developers manually pull and apply changes

Q2. ArgoCD selfHeal means:

ArgoCD automatically fixes security vulnerabilities
ArgoCD automatically reverts manual kubectl changes that drift from the Git-declared state
ArgoCD automatically scales up when load increases
ArgoCD retries failed deployments

05 Secrets Management: Vault & Beyond

HashiCorp Vault

# Vault is a widely used, self-hostable secrets manager
# Core concepts: secrets engines, auth methods, policies, leases

## Enable the KV secrets engine
vault secrets enable -path=secret kv-v2

## Store a secret
vault kv put secret/myapp/production \
  DATABASE_URL="postgres://user:pass@db:5432/prod" \
  API_KEY="sk_prod_abc123"

## Kubernetes auth method — pods authenticate using ServiceAccount JWT
vault auth enable kubernetes
vault write auth/kubernetes/config \
  kubernetes_host="https://kubernetes.default.svc"

## Policy — what a service account can access
vault policy write myapp-policy - <<EOF
path "secret/data/myapp/production" {
  capabilities = ["read"]
}
EOF

## Bind K8s service account to policy
vault write auth/kubernetes/role/myapp \
  bound_service_account_names=myapp \
  bound_service_account_namespaces=production \
  policies=myapp-policy \
  ttl=1h   # lease expires in 1h — auto-renew or re-auth

## In the pod: use Vault Agent Sidecar or ESO to inject secrets
## Vault Agent runs as a sidecar, authenticates, and renders secrets to a shared in-memory volume (files under /vault/secrets/)
annotations:
  vault.hashicorp.com/agent-inject: "true"
  vault.hashicorp.com/role: "myapp"
  vault.hashicorp.com/agent-inject-secret-config: "secret/data/myapp/production"

External Secrets Operator (ESO)

Kubernetes operator — syncs secrets from external providers into K8s Secrets
Supports: Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault
ExternalSecret resource: define what to sync and where
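Refresh interval: ESO re-reads the provider every refreshInterval and updates the K8s Secret when the source changes
Trade-off vs Vault Agent sidecar: no sidecar per pod and central management, but secret values end up in Kubernetes Secrets (stored in etcd)
Alternative for GitOps repos: Sealed Secrets (Bitnami) encrypts Secrets so the ciphertext can be committed to Git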

Secrets Anti-Patterns

❌ Hardcode secrets in code or Dockerfiles
❌ Store secrets in Git (even in private repos)
❌ Long-lived static credentials — prefer short-lived with rotation
❌ Share secrets between environments (prod key in dev)
❌ Echo secrets in CI logs — GitHub masks them but avoid it
✅ Use OIDC for CI — no static credentials at all
✅ Rotate secrets regularly, audit who accessed what

OIDC for CI/CD

GitHub Actions can authenticate to AWS/GCP/Azure without static credentials
GitHub issues short-lived OIDC token per workflow run
Cloud provider validates token → issues short-lived cloud credentials
No AWS_ACCESS_KEY_ID stored in GitHub Secrets
Configure: id-token: write permission + cloud provider trust policy
This is the modern, most secure approach for CI cloud access

Secret Rotation

Vault dynamic secrets: generate DB creds on demand, expire after TTL
Database secrets engine: creates temporary Postgres users per request
Revocation: Vault can revoke all leases on a path instantly
AWS IAM: use IAM roles with STS AssumeRole — no long-lived keys
Rotation triggers: time-based (90 days), on breach, on engineer departure
Zero-downtime rotation: ensure app can re-read secrets without restart

Quiz — Secrets Management

Q1. OIDC for CI/CD (e.g., GitHub Actions to AWS) is better than static credentials because:

OIDC is faster to configure
No long-lived static credentials are stored; GitHub issues short-lived tokens per run that expire automatically
OIDC gives broader permissions than IAM keys
OIDC works without any cloud provider configuration

Q2. Vault dynamic secrets for a database means:

Vault stores a fixed username/password that rotates every 90 days
Vault generates unique, temporary database credentials per request that expire after a TTL
Vault encrypts the database at rest
The database manages its own credentials

06 Pipeline Optimization & Observability

Making CI Faster

Parallelize: run lint, test, security scan as parallel jobs
Cache layers: Docker BuildKit cache mounts for apt-get, go modules
Test sharding: split tests across multiple runners (pytest-split, go test -run)
Incremental builds: only rebuild changed packages (Bazel, Nx, Turborepo)
Self-hosted runners: larger machines, pre-installed deps, no cold starts
Fail fast: lint first — cheapest check before expensive ones
Target: main branch CI under 5 minutes, PR CI under 10 minutes

Optimized Dockerfile

Multi-stage build: build stage → minimal runtime image
Layer order: COPY dependencies first, then source code
Base image: distroless or alpine for minimal attack surface
Non-root user: always run as non-root in production
.dockerignore: exclude .git, node_modules, test files
Pin base image with digest: FROM golang:1.22@sha256:abc123
BuildKit cache mounts: RUN --mount=type=cache,target=/root/.cache go build

Pipeline Observability

Track: pipeline duration, failure rate, flaky test rate
Alert on: CI failure rate > 10%, build time > 10 min
Flaky tests: auto-retry 2x then report — fix or quarantine, never ignore
DORA metrics: deployment frequency, lead time, MTTR, change failure rate
Tools: Datadog CI Visibility, Buildkite, GitHub Actions insights
Mean time to restore (MTTR): how fast can you detect and fix a broken deploy?
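The "auto-retry 2x then report" rule needs a clear verdict per test. This Go sketch reads "2x" as two retries after the first attempt (three runs total): a first-try pass is clean, a later pass is flagged flaky (report it, then fix or quarantine), and failing every attempt is a real failure. `runWithRetries` and the scripted `sequence` helper are hypothetical names for illustration.

```go
package main

import "fmt"

// Verdict classifies a test after retries.
type Verdict string

const (
	Pass   Verdict = "pass"
	Flaky  Verdict = "flaky"  // failed, then passed on retry: report, fix or quarantine
	Broken Verdict = "broken" // failed every attempt: a real failure
)

// runWithRetries runs a test once plus up to `retries` more times.
func runWithRetries(run func() bool, retries int) Verdict {
	for attempt := 0; attempt <= retries; attempt++ {
		if run() {
			if attempt == 0 {
				return Pass
			}
			return Flaky
		}
	}
	return Broken
}

// sequence returns a fake test whose results follow outcomes, cycling if needed.
func sequence(outcomes ...bool) func() bool {
	i := 0
	return func() bool {
		ok := outcomes[i%len(outcomes)]
		i++
		return ok
	}
}

func main() {
	fmt.Println(runWithRetries(sequence(true), 2))        // pass
	fmt.Println(runWithRetries(sequence(false, true), 2)) // flaky
	fmt.Println(runWithRetries(sequence(false), 2))       // broken
}
```

Tracking the share of `flaky` verdicts over time gives the flaky test rate listed above.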

DORA Metrics — Elite Team Targets

Deployment Frequency: multiple times per day (elite)
Lead Time for Changes: < 1 hour from commit to prod (elite)
Change Failure Rate: < 5% of deployments cause incidents (elite)
MTTR: < 1 hour to restore after failure (elite)
These metrics predict software delivery performance and business outcomes
Source: DORA's annual Accelerate State of DevOps report (Google Cloud); elite thresholds shift between report years

Optimized Multi-Stage Dockerfile (Go)

# syntax=docker/dockerfile:1

# Stage 1: Build
FROM golang:1.22-alpine AS builder
WORKDIR /app

# Copy and cache dependencies FIRST (changes less often)
COPY go.mod go.sum ./
# BuildKit cache mount: keeps the Go module cache between builds
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download

# Copy source (changes more often — invalidates cache here, not above)
COPY . .

RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o /app/server ./cmd/server

# Stage 2: Minimal runtime image
FROM gcr.io/distroless/static-debian12
# distroless: no shell, no package manager, minimal CVE surface

COPY --from=builder /app/server /server

# Non-root user — distroless has nonroot:65532 built in
USER nonroot:nonroot

EXPOSE 8080
ENTRYPOINT ["/server"]

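# Result: roughly the size of the static binary plus a few MB of distroless base,
# vs hundreds of MB for an image built on a golang base
# Build time with warm cache mounts: e.g. ~10s vs ~60s cold (varies by project size)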

07 Best Resources

DOCS

GitHub Actions Documentation

Official docs — start with Quickstart, then Workflow syntax reference. Most complete source.

VIDEO

TechWorld with Nana — CI/CD & DevOps

Best YouTube channel for CI/CD fundamentals. Watch: GitHub Actions, ArgoCD, Helm courses. Clear and structured.

DOCS

ArgoCD Official Documentation

Read: Getting Started, Application CRD, App of Apps pattern, Sync Waves. Takes ~3 hours.

REPORT

DORA State of DevOps Research

Annual Google research report. The definitive data on what makes high-performing engineering teams. Free.

DOCS

HashiCorp Vault Documentation

Start with: Getting Started, Kubernetes auth method, KV secrets engine, dynamic database secrets.

VIDEO

GitHub OIDC with AWS — No Static Credentials

How to eliminate AWS_ACCESS_KEY from GitHub Secrets using OIDC. 20 minutes. Essential for secure CI.

Quiz — Optimization

Q1. In a Dockerfile, why should you COPY go.mod/go.sum before COPY . .?

go.mod must be processed before source files by the compiler
Docker layer caching: dependencies change less often than source code. This way, unchanged deps don't re-download on every code change.
Security requirement: dependencies must be verified first
It has no effect on build time

Q2. Which DORA metric measures how quickly a team recovers from a production incident?

Deployment Frequency
Lead Time for Changes
Change Failure Rate
Mean Time to Restore (MTTR)
