PIPELINES · GITOPS · DEPLOYMENTS · SECRETS · OBSERVABILITY

CI/CD in a Day

Everything from writing a great GitHub Actions pipeline to GitOps with ArgoCD, deployment strategies (canary, blue-green), feature flags, secrets management, and pipeline optimization. Production-grade patterns used at scale.

H 0–1 · CI Fundamentals
H 1–2 · GitHub Actions Deep Dive
H 2–3 · Deployment Strategies
H 3–4 · GitOps & ArgoCD
H 4–5 · Secrets & Vault
H 5–6 · Optimization & Observability

01 · CI Fundamentals — What Good CI Looks Like

The CI Pipeline Stages

  • Trigger: push, pull_request, schedule, manual
  • Checkout: fetch code at specific commit SHA
  • Build: compile, generate artifacts
  • Test: unit → integration → e2e (fast to slow)
  • Lint / Static Analysis: code quality, security scan
  • Package: Docker build, binary, SBOM
  • Publish: push to registry with immutable tag
  • Notify: Slack, GitHub PR comment, PagerDuty
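The stages above map naturally onto jobs in a workflow file. A minimal GitHub Actions sketch, assuming hypothetical `make lint` / `make test` targets and a Dockerfile in the repo root:

```yaml
# .github/workflows/minimal-ci.yml — illustrative stage layout
name: minimal-ci
on: [push, pull_request]

jobs:
  lint:                      # cheapest check runs first
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make lint       # assumed Makefile target

  test:
    needs: lint              # fail fast: skip tests if lint fails
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test       # assumed Makefile target

  package:
    needs: test
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t myapp:${{ github.sha }} .   # immutable SHA tag
```

The full production pipeline in section 02 builds on exactly this job-dependency shape.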

CI Best Practices

  • Fast feedback: CI must complete in <10 min — engineers wait
  • Fail fast: run cheapest checks first (lint before integration tests)
  • Deterministic: same commit = same result every time
  • Immutable artifacts: tag images with git SHA, never "latest"
  • Cache aggressively: deps, build outputs, Docker layers
  • Parallelize: run independent jobs concurrently
  • Test in isolation: no shared state between test runs
  • Don't skip tests: every merge to main must be green

Trunk-Based vs Feature Branch

  • Trunk-Based: everyone merges to main daily — requires feature flags
  • Forces: small PRs, fast review, and genuine continuous integration (code actually integrates daily, not just a CI server running)
  • Used by: Google, Meta, Netflix at scale
  • Feature Branch: long-lived branches per feature
  • Risk: merge conflicts, integration surprises, big bang merges
  • Better default for teams: short-lived feature branches (≤2 days) + PR
  • GitFlow: adds release/hotfix branches — good for versioned software

What Makes a Good Test Suite

  • Test pyramid: many unit tests, fewer integration, few e2e
  • Unit tests: fast (<1ms each), isolated, no I/O, no network
  • Integration tests: test real DB/cache — use testcontainers
  • e2e tests: test full user flows — slow, run in staging
  • Contract tests: verify API contracts between services (Pact)
  • Code coverage: 80% is a reasonable target — 100% is diminishing returns
  • Mutation testing: verifies tests actually catch real bugs

Quiz — CI Fundamentals

Q1. Why should Docker images be tagged with git SHA instead of "latest"?

Q2. The test pyramid recommends:

02 · GitHub Actions — Deep Dive

Complete Production Pipeline

# .github/workflows/ci.yml
name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # Job 1: Fast checks — run first to fail fast
  lint-and-test:
    runs-on: ubuntu-latest
    services:
      postgres:              # sidecar container for integration tests
        image: postgres:15
        env:
          POSTGRES_PASSWORD: test
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - name: Setup Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22'
          cache: true        # caches go module downloads
      - name: Cache build artifacts
        uses: actions/cache@v4
        with:
          path: ~/.cache/go-build
          key: ${{ runner.os }}-gobuild-${{ hashFiles('**/*.go') }}
          restore-keys: ${{ runner.os }}-gobuild-
      - name: Lint
        uses: golangci/golangci-lint-action@v4
        with:
          version: latest
      - name: Run tests
        run: go test -race -coverprofile=coverage.out ./...
        env:
          DATABASE_URL: postgres://postgres:test@localhost/testdb
      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          file: coverage.out

  # Job 2: Build and push image (only on main)
  build-push:
    needs: lint-and-test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
      digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3   # enables BuildKit
      - name: Login to registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=sha-    # sha-abc1234
            type=ref,event=branch   # main
      - name: Build and push
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha        # GitHub Actions cache for Docker layers
          cache-to: type=gha,mode=max
          sbom: true                  # generate software bill of materials
          provenance: true            # SLSA provenance attestation

  # Job 3: Deploy (only after image built)
  deploy-staging:
    needs: build-push
    runs-on: ubuntu-latest
    environment: staging   # requires manual approval if configured
    steps:
      - name: Deploy to staging
        run: |
          # Update Helm values with new image tag
          helm upgrade --install myapp ./charts/myapp \
            --set image.tag=${{ needs.build-push.outputs.image-tag }} \
            --namespace staging \
            --wait \
            --timeout 5m

Key Actions Concepts

  • Workflow: YAML file — triggered by events
  • Job: runs on a runner (ubuntu-latest, self-hosted)
  • Step: individual task within a job — uses: or run:
  • needs: declare job dependencies for ordering
  • outputs: pass data between jobs
  • matrix: run job for multiple values (Go versions, OS)
  • environment: deployment target with required reviewers
  • concurrency: cancel in-progress runs on new push
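For instance, the concurrency setting above can be sketched like this (the group name is an arbitrary choice):

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true   # a new push cancels the superseded run
```

Grouping by workflow and ref means pushes to the same branch supersede each other without affecting other branches.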

Secrets & Variables

  • Secrets: encrypted, never logged — ${{ secrets.MY_SECRET }}
  • GITHUB_TOKEN: auto-generated per run — use for registry login, PR comments
  • Variables: non-sensitive config — ${{ vars.REGION }}
  • Environments: scope secrets to staging/production environments
  • Required reviewers on environments gate production deploys
  • Never echo secrets — GitHub masks them but avoid it anyway
  • Rotate secrets periodically — use short-lived OIDC tokens where possible

Caching Strategies

  • actions/cache: key/restore-keys pattern for any directory
  • Docker layer cache: cache-from: type=gha — reuses unchanged layers
  • Go modules: cache ~/go/pkg/mod keyed by go.sum hash
  • Node modules: cache node_modules keyed by package-lock.json hash
  • Cache key strategy: broad restore-key for partial hits, exact key for full hits
  • Cache size limit: 10GB per repo in GitHub Actions
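A sketch of the key/restore-keys pattern for Node dependencies, assuming npm's default download cache location:

```yaml
- name: Cache npm downloads
  uses: actions/cache@v4
  with:
    path: ~/.npm                    # npm's download cache, not node_modules
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-npm-
```

The exact key only hits when the lockfile is unchanged; the broad restore-key gives a partial hit (stale but mostly useful cache) when it changes.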

Matrix Builds

  • Test across multiple Go/Node versions simultaneously
  • Test across multiple OS (ubuntu, macos, windows)
  • All matrix combinations run in parallel
  • fail-fast: false — don't cancel others if one fails
  • include: add extra variables to specific combinations
  • exclude: remove specific combinations
  • Max 256 matrix jobs per workflow run
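The matrix features above combine roughly as follows (the version and OS lists are examples):

```yaml
strategy:
  fail-fast: false              # let other combinations finish
  matrix:
    go: ['1.21', '1.22']
    os: [ubuntu-latest, macos-latest]
    exclude:
      - go: '1.21'              # drop one combination
        os: macos-latest
    include:
      - go: '1.22'              # extra variable on one combination
        os: ubuntu-latest
        coverage: true
runs-on: ${{ matrix.os }}
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-go@v5
    with:
      go-version: ${{ matrix.go }}
```

This yields 4 combinations minus 1 exclusion, so 3 parallel jobs, with `coverage: true` visible only on the included one.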

Quiz — GitHub Actions

Q1. In GitHub Actions, what does needs: [lint-and-test] on a job mean?

Q2. What is the best cache key strategy for Go module dependencies?

03 · Deployment Strategies

Blue-Green Deployment

# Two identical environments: blue (live) and green (new version)
# Switch traffic atomically — instant rollback by switching back

## Kubernetes implementation with two Deployments:

# Blue (currently serving traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 5
  selector:
    matchLabels:
      app: myapp
      version: blue
---
# Green (new version, not yet receiving traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 5
  selector:
    matchLabels:
      app: myapp
      version: green
---
# Service selects by version label — change this to switch traffic
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue   # change to "green" to switch traffic — zero downtime

# Switch:   kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'
# Rollback: kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'
# After validation: delete blue deployment to save resources

Canary Deployment

# Canary: route small % of traffic to new version first
# Gradually increase % as confidence grows
# Rollback by setting canary weight to 0

## Nginx Ingress canary annotation approach:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"   # 10% of traffic
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-v2
                port: { number: 80 }

## Canary with Argo Rollouts (automated, metric-based):
# Defines a progressive rollout with automatic pause and analysis
apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      steps:
        - setWeight: 5            # 5% traffic
        - pause: {duration: 5m}
        - setWeight: 20           # 20% traffic
        - analysis:               # check error rate before proceeding
            templates:
              - templateName: error-rate
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100

# Analysis: if error rate > 1%, automatically rollback the canary

Rolling Update (Kubernetes Default)

  • Replace pods one by one (or in batches) — default K8s strategy
  • maxSurge: extra pods during update (25% or count)
  • maxUnavailable: pods that can be unavailable (25% or count)
  • Zero downtime if readiness probes are correct
  • Rollback: kubectl rollout undo deployment/myapp
  • Issue: both versions run simultaneously — API must be backward compatible
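As a sketch, the knobs above sit in the Deployment's strategy block (the probe path and image tag are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 5
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%          # up to 2 extra pods (25% of 5, rounded up)
      maxUnavailable: 0      # never drop below 5 ready pods
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:sha-abc1234
          readinessProbe:    # honest readiness probes are what make this zero-downtime
            httpGet:
              path: /healthz
              port: 8080
```

With maxUnavailable: 0, Kubernetes only removes an old pod after a new one passes its readiness probe.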

Deployment Strategy Comparison

  • Rolling: default — simple, zero downtime, mixed versions
  • Blue-Green: instant cutover, full resource cost (2x), simple rollback
  • Canary: real traffic testing, gradual confidence, complex routing
  • Recreate: downtime but no mixed versions — suits breaking DB migrations
  • A/B: canary by user segment — needs user ID routing
  • Choose canary for user-facing changes; rolling for internal services

Feature Flags

  • Decouple deployment from release — deploy dark, enable gradually
  • Target by: user ID, percentage, region, beta users, employee
  • Tools: LaunchDarkly, Unleash (self-hosted), Flagsmith
  • Trunk-based development: feature flags allow unfinished code on main
  • Kill switch: disable broken feature instantly without redeploy
  • Lifecycle: create → test in prod → full rollout → clean up (delete flag!)
  • Tech debt: stale flags are a liability — set expiry dates

Database Migrations with Deploys

  • Never break backward compatibility in a migration
  • Expand-contract pattern: add new column (expand) → migrate data → remove old (contract) across 3 deploys
  • Run migrations before deploying new code — code must handle both old and new schema
  • Tooling: Flyway, Liquibase, golang-migrate
  • Zero-downtime: avoid locks on large tables — add new columns as nullable (ADD COLUMN ... NULL), not NOT NULL
  • Always test migration rollback — does migrate down work?
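The expand-contract sequence can be sketched in SQL across three releases; the table and column names here are hypothetical:

```sql
-- Deploy 1 (expand): add the new column as nullable; old code ignores it
ALTER TABLE users ADD COLUMN email_normalized TEXT NULL;

-- Deploy 2 (migrate): new code writes both columns; backfill in small batches
UPDATE users SET email_normalized = lower(email)
WHERE email_normalized IS NULL AND id BETWEEN 1 AND 10000;  -- repeat per batch

-- Deploy 3 (contract): once nothing reads the old column, drop it
ALTER TABLE users DROP COLUMN email;
```

Each step is individually backward compatible, so any single deploy can be rolled back without breaking the running version.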

Quiz — Deployment Strategies

Q1. Blue-green deployment's main advantage over rolling update is:

Q2. Feature flags enable what deployment practice?

Q3. The expand-contract pattern for database migrations means:

04 · GitOps & ArgoCD

GitOps Principles

GitOps = Git as the single source of truth for declarative infrastructure and application state. Four core principles:

1. Declarative: the entire system is described declaratively (Kubernetes manifests, Helm charts).
2. Versioned & immutable: desired state stored in Git — every change is a commit, fully auditable.
3. Pulled automatically: approved changes are applied automatically by a software agent (ArgoCD, Flux).
4. Continuously reconciled: agent detects drift between desired (Git) and actual (cluster) state and corrects it.

# GitOps workflow:
# 1. Developer pushes code → CI builds image, tags with git SHA
# 2. CI updates image tag in the GitOps repo (Helm values or kustomize)
# 3. PR created in GitOps repo — reviewed and merged
# 4. ArgoCD detects difference between Git and cluster state
# 5. ArgoCD applies the diff — new image deployed
# 6. Rollback = git revert → ArgoCD reconciles back to old state

# CI updates image tag automatically:
- name: Update image tag in GitOps repo
  run: |
    git clone https://github.com/org/gitops-repo.git
    cd gitops-repo
    sed -i "s|image.tag:.*|image.tag: ${{ env.IMAGE_TAG }}|" apps/myapp/values.yaml
    git commit -am "chore: update myapp to ${{ env.IMAGE_TAG }}"
    git push

ArgoCD Configuration

# ArgoCD Application — declares what to deploy and where
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-production
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/gitops-repo
    targetRevision: main
    path: apps/myapp
    helm:
      valueFiles:
        - values-production.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual kubectl changes (drift correction)
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 3
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

# App of Apps pattern — one root Application manages all other Applications
# Allows managing multiple services from a single ArgoCD root

# Sync Waves — control order of resource creation:
# metadata.annotations:
#   argocd.argoproj.io/sync-wave: "-1"   # apply first (migrations, CRDs)
#   argocd.argoproj.io/sync-wave: "0"    # default
#   argocd.argoproj.io/sync-wave: "1"    # apply after wave 0

ArgoCD vs Flux

  • ArgoCD: UI-driven, explicit Application resources, multi-cluster, RBAC
  • Flux: CLI-first, GitRepository + Kustomization CRDs, Helm operator, simpler
  • ArgoCD: better for teams that want visibility and multi-cluster management
  • Flux: better for fully automated, code-first GitOps
  • Both: pull-based (agent inside cluster pulls from Git — no cluster access needed from CI)
  • Pull vs push: pull is safer — cluster credentials never leave the cluster

Helm Deep Dive

  • Helm = Kubernetes package manager — templates + values
  • values.yaml: default values — override per environment
  • helm upgrade --install: idempotent — create or update
  • --atomic: rollback automatically on failure
  • --wait: wait for all resources to be ready
  • Chart hooks: pre-install, post-install, pre-upgrade — run jobs at lifecycle points
  • Subcharts: dependencies in Chart.yaml — include postgres, redis as deps

Quiz — GitOps

Q1. What is the "pull" model in GitOps (ArgoCD/Flux)?

Q2. ArgoCD selfHeal means:

05 · Secrets Management — Vault & Beyond

HashiCorp Vault

# Vault is the industry standard for secrets management
# Core concepts: secrets engines, auth methods, policies, leases

## Enable the KV secrets engine
vault secrets enable -path=secret kv-v2

## Store a secret
vault kv put secret/myapp/production \
  DATABASE_URL="postgres://user:pass@db:5432/prod" \
  API_KEY="sk_prod_abc123"

## Kubernetes auth method — pods authenticate using ServiceAccount JWT
vault auth enable kubernetes
vault write auth/kubernetes/config \
  kubernetes_host="https://kubernetes.default.svc"

## Policy — what a service account can access
## (minimal read-only policy for the path stored above)
vault policy write myapp-policy - <<EOF
path "secret/data/myapp/production" {
  capabilities = ["read"]
}
EOF

## Bind K8s service account to policy
vault write auth/kubernetes/role/myapp \
  bound_service_account_names=myapp \
  bound_service_account_namespaces=production \
  policies=myapp-policy \
  ttl=1h   # lease expires in 1h — auto-renew or re-auth

## In the pod: use Vault Agent Sidecar or ESO to inject secrets
## Vault Agent runs as sidecar, authenticates, writes secrets to shared volume
annotations:
  vault.hashicorp.com/agent-inject: "true"
  vault.hashicorp.com/role: "myapp"
  vault.hashicorp.com/agent-inject-secret-config: "secret/data/myapp/production"

External Secrets Operator (ESO)

  • Kubernetes operator — syncs secrets from external providers into K8s Secrets
  • Supports: Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault
  • ExternalSecret resource: define what to sync and where
  • Refresh interval: secrets auto-updated when source changes
  • Better than Vault Agent sidecar: no sidecar per pod, centrally managed
  • Use with Sealed Secrets (Bitnami) to encrypt secrets in Git
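An ExternalSecret sketch pulling from Vault into a K8s Secret. The store name, target name, and the exact remoteRef key format are assumptions; the key path in particular depends on how the SecretStore's Vault provider is configured:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-secrets
  namespace: production
spec:
  refreshInterval: 1h              # re-sync from the provider every hour
  secretStoreRef:
    name: vault-backend            # a SecretStore/ClusterSecretStore configured separately
    kind: ClusterSecretStore
  target:
    name: myapp-env                # the K8s Secret that gets created
  data:
    - secretKey: DATABASE_URL      # key inside the K8s Secret
      remoteRef:
        key: myapp/production      # path within the store's configured mount
        property: DATABASE_URL     # field inside the Vault secret
```

Pods then consume `myapp-env` as an ordinary Secret via env or volume mounts, with no Vault awareness in the app.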

Secrets Anti-Patterns

  • โŒ Hardcode secrets in code or Dockerfiles
  • โŒ Store secrets in Git (even in private repos)
  • โŒ Long-lived static credentials โ€” prefer short-lived with rotation
  • โŒ Share secrets between environments (prod key in dev)
  • โŒ Echo secrets in CI logs โ€” GitHub masks them but avoid it
  • โœ… Use OIDC for CI โ€” no static credentials at all
  • โœ… Rotate secrets regularly, audit who accessed what

OIDC for CI/CD

  • GitHub Actions can authenticate to AWS/GCP/Azure without static credentials
  • GitHub issues short-lived OIDC token per workflow run
  • Cloud provider validates token → issues short-lived cloud credentials
  • No AWS_ACCESS_KEY_ID stored in GitHub Secrets
  • Configure: id-token: write permission + cloud provider trust policy
  • This is the modern, most secure approach for CI cloud access
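In GitHub Actions this looks roughly like the following; the role ARN is a placeholder, and the IAM role's trust policy on the AWS side must trust GitHub's OIDC issuer for this repo:

```yaml
permissions:
  id-token: write      # allows the job to request an OIDC token
  contents: read

steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy
      aws-region: us-east-1
  # Subsequent steps get short-lived STS credentials; no stored keys
  - run: aws s3 ls
```

Nothing secret lives in GitHub: the workflow exchanges its per-run OIDC token for temporary AWS credentials via STS.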

Secret Rotation

  • Vault dynamic secrets: generate DB creds on demand, expire after TTL
  • Database secrets engine: creates temporary Postgres users per request
  • Revocation: Vault can revoke all leases on a path instantly
  • AWS IAM: use IAM roles with STS AssumeRole — no long-lived keys
  • Rotation triggers: time-based (90 days), on breach, on engineer departure
  • Zero-downtime rotation: ensure app can re-read secrets without restart

Quiz — Secrets Management

Q1. OIDC for CI/CD (e.g., GitHub Actions to AWS) is better than static credentials because:

Q2. Vault dynamic secrets for a database means:

06 · Pipeline Optimization & Observability

Making CI Faster

  • Parallelize: run lint, test, security scan as parallel jobs
  • Cache layers: Docker BuildKit cache mounts for apt-get, go modules
  • Test sharding: split tests across multiple runners (pytest-split; in Go, partition the package list across jobs)
  • Incremental builds: only rebuild changed packages (Bazel, Nx, Turborepo)
  • Self-hosted runners: larger machines, pre-installed deps, no cold starts
  • Fail fast: lint first — cheapest check before expensive ones
  • Target: main branch CI under 5 minutes, PR CI under 10 minutes
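Sharding in Go can be sketched with a matrix that partitions the package list. The round-robin awk split below is illustrative; dedicated sharding tools split by historical timing data instead:

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [0, 1, 2, 3]            # 4 parallel runners
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with: {go-version: '1.22'}
      - name: Run shard ${{ matrix.shard }}
        run: |
          # every 4th package goes to this shard
          go test $(go list ./... | awk -v s=${{ matrix.shard }} 'NR % 4 == s')
```

Wall-clock time approaches the slowest shard, so a naive split helps most when packages have similar test durations.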

Optimized Dockerfile

  • Multi-stage build: build stage → minimal runtime image
  • Layer order: COPY dependencies first, then source code
  • Base image: distroless or alpine for minimal attack surface
  • Non-root user: always run as non-root in production
  • .dockerignore: exclude .git, node_modules, test files
  • Pin base image with digest: FROM golang:1.22@sha256:abc123
  • BuildKit cache mounts: RUN --mount=type=cache,target=/root/.cache go build

Pipeline Observability

  • Track: pipeline duration, failure rate, flaky test rate
  • Alert on: CI failure rate > 10%, build time > 10 min
  • Flaky tests: auto-retry 2x then report — fix or quarantine, never ignore
  • DORA metrics: deployment frequency, lead time, MTTR, change failure rate
  • Tools: Datadog CI Visibility, BuildKite, GitHub Actions insights
  • Mean time to restore (MTTR): how fast can you detect and fix a broken deploy?

DORA Metrics โ€” Elite Team Targets

  • Deployment Frequency: multiple times per day (elite)
  • Lead Time for Changes: < 1 hour from commit to prod (elite)
  • Change Failure Rate: < 5% of deployments cause incidents (elite)
  • MTTR: < 1 hour to restore after failure (elite)
  • These metrics predict software delivery performance and business outcomes
  • Source: Google's State of DevOps report (annual)

Optimized Multi-Stage Dockerfile (Go)

# syntax=docker/dockerfile:1

# Stage 1: Build
FROM golang:1.22-alpine AS builder
WORKDIR /app

# Copy and cache dependencies FIRST (changes less often)
COPY go.mod go.sum ./
# BuildKit cache mount keeps module downloads across builds
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download

# Copy source (changes more often — invalidates cache here, not above)
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o /app/server ./cmd/server

# Stage 2: Minimal runtime image
FROM gcr.io/distroless/static-debian12
# distroless: no shell, no package manager, minimal CVE surface
COPY --from=builder /app/server /server

# Non-root user — distroless has nonroot:65532 built in
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/server"]

# Result: ~5MB image vs ~300MB for golang:latest
# Build time with cache: ~10s vs ~60s without

07 · Best Resources

DOCS
GitHub Actions Documentation

Official docs — start with Quickstart, then Workflow syntax reference. Most complete source.

VIDEO
TechWorld with Nana — CI/CD & DevOps

Best YouTube channel for CI/CD fundamentals. Watch: GitHub Actions, ArgoCD, Helm courses. Clear and structured.

DOCS
ArgoCD Official Documentation

Read: Getting Started, Application CRD, App of Apps pattern, Sync Waves. Takes ~3 hours.

BOOK
DORA State of DevOps Research

Annual Google research report. The definitive data on what makes high-performing engineering teams. Free.

DOCS
HashiCorp Vault Documentation

Start with: Getting Started, Kubernetes auth method, KV secrets engine, dynamic database secrets.

VIDEO
GitHub OIDC with AWS — No Static Credentials

How to eliminate AWS_ACCESS_KEY from GitHub Secrets using OIDC. 20 minutes. Essential for secure CI.

Quiz — Optimization

Q1. In a Dockerfile, why should you COPY go.mod/go.sum before COPY . .?

Q2. Which DORA metric measures how quickly a team recovers from a production incident?