Last updated June 2026

Case Study: One CI/CD Pipeline Library for 10+ Projects — Reusable GitHub Actions Workflows and Self-Hosted Runners

The Short Version

One versioned workflow library now powers CI/CD for 10+ client projects
Bootstrapping a pipeline for a new project went from about a day of YAML copy-paste to under an hour
GitHub-hosted minutes spend cut roughly 60% by moving Docker-heavy builds to self-hosted EC2 runners
Typical build time down about 40% (12 minutes to 7) thanks to persistent Docker layer and dependency caches
Long-lived AWS access keys removed from every repository, replaced with OIDC

Ten Repos, Ten Slightly Different Pipelines

Every client project I managed had its own copy of the workflow YAML. They had all started from the same template at some point and then drifted as each repo collected its own tweaks. When I fixed a caching bug in one pipeline, every other repo still had it. Rolling a fix out meant opening near-identical pull requests across 10+ repositories and chasing a review on each one.

The drift caused worse problems than the duplication. A couple of repos had lost their test gate during unrelated edits, so merges deployed without tests blocking them. Docker builds ran on GitHub-hosted runners, where every job starts on a blank machine, so image layers rebuilt from scratch on every run and the paid minutes added up. And each repository stored its own long-lived AWS access keys as secrets: ten-plus sets of credentials, none of them rotating on any real schedule.

One .github Repository, Five Reusable Workflows

Whatever replaced the copy-paste had to cover a mixed set of stacks: Node and React frontends, Django and Node APIs, some projects deploying to ECS and others to EKS. It also had to stay simple, because I am the library's only maintainer and could not afford an abstraction that needed constant care.

GitHub Actions lets a workflow in one repository be called from another through workflow_call. I built the shared pipelines into the organization's central .github repository, which works out to a monorepo of workflows that every project consumes:

build-and-test — installs dependencies, runs lint and the test suite, with version inputs for Node and Python
docker-build-push — builds the image and pushes it to the client's own ECR registry
deploy-ecs — registers a new task definition revision and updates the service
deploy-eks — commits the image tag bump to the GitOps repo, where Argo CD takes over (covered in the EKS GitOps case study)
terraform-plan-apply — plan on pull request, apply on merge
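
For readers who have not written the called side of workflow_call, here is a minimal sketch of what a build-and-test workflow in the shared repository could look like. It is an illustration under assumptions, not the library's actual file: the npm script names, the default Node version, and the choice of a GitHub-hosted runner are all assumed, and the Python path is left out.

```yaml
# .github/workflows/build-and-test.yml in the shared .github repository (sketch)
name: build-and-test

on:
  workflow_call:
    inputs:
      node-version:
        description: Node.js version to install
        type: string
        required: false
        default: "20"        # assumed default

jobs:
  test:
    runs-on: ubuntu-latest   # assumption: tests stay on hosted runners
    steps:
      # In a reusable workflow, checkout fetches the calling repository.
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
          cache: npm         # restores the npm cache, keyed on the lockfile
      - run: npm ci
      - run: npm run lint    # assumes a "lint" script in package.json
      - run: npm test
```

Each key in a caller's with: block maps onto one of these inputs, which is why a consuming repo only has to state its node-version.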

A consuming repository's entire pipeline is a caller workflow of about thirty lines:

# .github/workflows/pipeline.yml — the whole CI/CD config in a consuming repo
name: pipeline

on:
  push:
    branches: [develop, main]
  pull_request:

permissions:
  id-token: write   # mint the OIDC token for AWS
  contents: read

jobs:
  test:
    uses: org/.github/.github/workflows/build-and-test.yml@v2
    with:
      node-version: "20"

  image:
    needs: test
    if: github.event_name == 'push'
    uses: org/.github/.github/workflows/docker-build-push.yml@v2
    with:
      ecr-repository: client-api
      runner-labels: "self-hosted,docker"
    secrets:
      aws-role-arn: ${{ secrets.AWS_DEPLOY_ROLE_ARN }}

  deploy:
    needs: image
    uses: org/.github/.github/workflows/deploy-ecs.yml@v2
    with:
      cluster: api-cluster
      service: api
      environment: ${{ github.ref_name == 'main' && 'staging' || 'dev' }}
    secrets:
      aws-role-arn: ${{ secrets.AWS_DEPLOY_ROLE_ARN }}

All the actual logic — caching strategy, ECR login, task definition templating, rollout checks — lives behind the uses: lines. The caller declares what to build and where it deploys.

I migrated the projects one repo at a time, with no deploy freeze. Every project was active, so each one switched to the caller workflow without pausing its releases.

Tags and Major Versions, Like Any Other Dependency

The library is versioned with tags, and consuming repos pin a major version: @v1, @v2. Upgrades are deliberate. A breaking change gets a new major version and a heads-up to the teams consuming it, and each repo moves over when it suits them. Nothing forces an upgrade on a project mid-sprint, and when a pipeline behaves oddly the pinned tag tells me exactly which workflow code ran.

OIDC to AWS Instead of Stored Keys

Each client runs in their own AWS account with strict isolation, so the library never shares credentials or registries across clients. Every workflow that touches AWS authenticates through GitHub Actions OIDC using aws-actions/configure-aws-credentials with a role-to-assume per project. GitHub mints a short-lived token for the job, AWS validates it against the identity provider, and the job receives temporary credentials scoped to that project's role in that client's account.
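
A sketch of the authentication step as it might appear inside one of the reusable workflows. The secret name aws-role-arn matches the caller shown earlier; the aws-region input, its default, and the job name are assumptions made for illustration.

```yaml
# Fragment of a reusable workflow that assumes a per-project role via OIDC (sketch)
on:
  workflow_call:
    inputs:
      aws-region:
        type: string
        default: us-east-1          # hypothetical default
    secrets:
      aws-role-arn:
        required: true

jobs:
  verify-identity:                  # hypothetical job name
    runs-on: ubuntu-latest
    permissions:
      id-token: write               # the caller must grant this too
      contents: read
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.aws-role-arn }}
          aws-region: ${{ inputs.aws-region }}
      # Prints the assumed role's ARN; no stored access key is involved.
      - run: aws sts get-caller-identity
```

Permissions in a called workflow can only match or narrow what the caller grants, so id-token: write has to appear in the consuming repo's workflow as well.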

The trust policy on each role is scoped to a specific repository and branch, so a token minted in one repo cannot assume another project's role:

"Condition": {
  "StringEquals": {
    "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
  },
  "StringLike": {
    "token.actions.githubusercontent.com:sub": "repo:org/client-api:ref:refs/heads/main"
  }
}
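
For context, the condition block sits inside the role's trust policy. The sketch below shows one way the whole role could be declared, written as a CloudFormation template only to stay in YAML; the source does not say how the roles are provisioned. The role name is hypothetical, and the template assumes the GitHub OIDC identity provider already exists in the client account.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Sketch of a per-project deploy role trusted through GitHub OIDC

Resources:
  DeployRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: client-api-deploy        # hypothetical name
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              # Assumes the OIDC provider was created in this account beforehand
              Federated: !Sub "arn:aws:iam::${AWS::AccountId}:oidc-provider/token.actions.githubusercontent.com"
            Action: sts:AssumeRoleWithWebIdentity
            Condition:
              StringEquals:
                "token.actions.githubusercontent.com:aud": sts.amazonaws.com
              StringLike:
                "token.actions.githubusercontent.com:sub": "repo:org/client-api:ref:refs/heads/main"
      # Permission policies (ECR push, ECS deploy) are omitted from this sketch.

Outputs:
  RoleArn:
    Value: !GetAtt DeployRole.Arn        # the value stored as AWS_DEPLOY_ROLE_ARN
```

With the subject pinned to refs/heads/main, runs from any other branch, such as develop, would need their own subject pattern or a separate role.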

Once every pipeline was on OIDC, I deleted the long-lived access keys from every repository. That continued the direction set by the pipeline security work in an earlier engagement: credentials that expire in minutes and never sit in repo settings.

Self-Hosted EC2 Runners for the Docker-Heavy Jobs

GitHub-hosted runners give you a clean machine per job, which is good for isolation and bad for Docker builds, since no layer cache survives between runs. I moved the Docker-heavy jobs to self-hosted runners on EC2, where the Docker layer cache and the npm and pip caches persist on the instance disk. A typical build dropped from about 12 minutes to 7.

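Runners carry capability labels such as docker and terraform, and the reusable workflows request what they need. A consuming repo passes labels at most, through the runner-labels input shown earlier; it never needs to know which instances exist or how the fleet is provisioned.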
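
The source does not show how the library turns the runner-labels string into a runs-on value. GitHub's expression language has no split function, so one possible approach, sketched here under that caveat, is a small job that converts the comma-separated string into a JSON array for a later job to read with fromJSON. The job names and the image tag are hypothetical.

```yaml
on:
  workflow_call:
    inputs:
      runner-labels:
        type: string
        default: "self-hosted,docker"

jobs:
  labels:
    runs-on: ubuntu-latest
    outputs:
      json: ${{ steps.split.outputs.json }}
    steps:
      - id: split
        env:
          LABELS: ${{ inputs.runner-labels }}
        # "self-hosted,docker" becomes ["self-hosted","docker"]
        run: |
          echo "json=$(jq -cn --arg l "$LABELS" '$l | split(",")')" >> "$GITHUB_OUTPUT"

  build:
    needs: labels
    # Only a runner carrying every listed label picks up the job
    runs-on: ${{ fromJSON(needs.labels.outputs.json) }}
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t app:${{ github.sha }} .
```

Accepting a JSON array string as the input would remove the extra job, at the cost of a slightly noisier caller.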

Self-hosted runners have sharp edges around untrusted code, so a few rules are fixed:

Runners serve private repositories only
Pull requests from forks never execute on self-hosted runners
The runner AMI is re-imaged on a regular cycle rather than patched in place
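
The fork rule can also be enforced in the workflow itself as a second layer on top of repository settings. A minimal sketch, assuming a job that targets the docker-labelled runners; the job name and build command are illustrative.

```yaml
on:
  pull_request:
  push:
    branches: [develop, main]

jobs:
  image:
    # Skip the job when the pull request head lives in a fork,
    # so fork code is never scheduled onto a self-hosted runner.
    if: >-
      github.event_name != 'pull_request' ||
      github.event.pull_request.head.repo.full_name == github.repository
    runs-on: [self-hosted, docker]
    steps:
      - uses: actions/checkout@v4
      - run: docker build .
```

A job-level if is evaluated before the job is handed to a runner, so a skipped job never reaches the fleet.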

The cost side held up too. A t3.large on demand costs around eight cents an hour, so keeping one alive through business hours comes to roughly $20 a month. GitHub's larger hosted runners bill per minute for every minute of every build, and a 12-minute Docker build with zero cache reuse keeps the meter running the whole time, across every push, across 10+ projects. Moving those jobs onto the EC2 runners cut the hosted-minutes spend by about 60%.
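
As a rough check of the runner figure, assuming about 11 hours of uptime per working day and 22 working days a month (both numbers are assumptions; the text above only says business hours):

```latex
\[
0.08\ \tfrac{\$}{\text{h}} \times 11\ \tfrac{\text{h}}{\text{day}} \times 22\ \tfrac{\text{days}}{\text{month}} \approx 19.4\ \tfrac{\$}{\text{month}}
\]
```

An 8-hour day under the same assumptions comes to about $14, and disk storage is not included in either figure.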

Promotion Tied to Branches and GitHub Environments

The promotion path lives in the library, so every project inherits it. Merging to develop deploys the dev environment, merging to main deploys staging, and production deploys go through a GitHub Environment with required reviewers and protection rules. No project can skip staging, and every production deploy goes through the same gated workflow.

Deployments to ECS and EKS roll out gradually and are gated on health checks, so a bad image stops its own rollout instead of replacing every healthy task.
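
A sketch of how the environment gate and a health-gated rollout could fit together in a deploy workflow. It is not the library's deploy-ecs file: task definition registration is left out, the aws-region input is an assumption, and waiting on service stability is only one possible way to make a bad rollout fail the job.

```yaml
on:
  workflow_call:
    inputs:
      cluster:
        type: string
        required: true
      service:
        type: string
        required: true
      environment:
        type: string
        required: true
      aws-region:
        type: string
        default: us-east-1            # hypothetical default
    secrets:
      aws-role-arn:
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    # Required reviewers configured on the environment pause the job here
    environment: ${{ inputs.environment }}
    permissions:
      id-token: write
      contents: read
    env:
      CLUSTER: ${{ inputs.cluster }}
      SERVICE: ${{ inputs.service }}
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.aws-role-arn }}
          aws-region: ${{ inputs.aws-region }}
      - name: Start a rolling deployment
        run: aws ecs update-service --cluster "$CLUSTER" --service "$SERVICE" --force-new-deployment
      - name: Wait for the service to stabilize
        # The waiter polls the service and exits non-zero if it times out
        run: aws ecs wait services-stable --cluster "$CLUSTER" --services "$SERVICE"
```

One detail worth checking: when a job references an environment, the default subject claim of GitHub's OIDC token has the form repo:ORG/REPO:environment:NAME rather than the ref form, so the role's trust policy has to match whichever form the deploy job produces.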

What Changed

A pipeline fix is now one tagged release instead of 10+ pull requests. When the next caching improvement came along, I rolled it out everywhere by cutting a new tag and posting a note to the consuming teams.

CI spend on hosted minutes is down about 60%, and typical builds are about 40% faster. The failed-deploy rate dropped as well, because every project runs the same tested gates instead of its own drifted copy.

Onboarding a new client repository now means copying the thirty-line caller workflow, creating an OIDC role in the client's AWS account, and adding the role ARN as a secret. That takes under an hour, where it used to take about a day of copy-paste and debugging.

Looking Back

If I were starting over, I would tag the library from the very first commit. Early on I edited shared workflows in place and broke consuming repos on their next run, which is the same lesson the Terraform module work taught me about anything other people depend on. I would also autoscale the runners sooner, since I paid for idle EC2 longer than I should have and actions-runner-controller or even simple scheduled scaling would have covered the usage pattern. And I would set a cache eviction policy before the runner disks fill, because mine filled mid-deploy, which is the worst possible moment to learn that docker system prune needs a schedule.
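
A scheduled prune could be as small as the sketch below. The cron time, the seven-day cutoff, and the label set are assumptions. With more than one runner, a workflow job lands on only one instance per run, so a cron entry or systemd timer baked into the AMI may be the more reliable home for the same command.

```yaml
name: runner-cache-prune

on:
  schedule:
    - cron: "0 3 * * 0"      # Sundays at 03:00 UTC (assumed quiet window)
  workflow_dispatch:

jobs:
  prune:
    runs-on: [self-hosted, docker]
    steps:
      - name: Remove unused images, stopped containers, networks and build cache older than 7 days
        run: docker system prune --all --force --filter "until=168h"
      - name: Report remaining disk space
        run: df -h /
```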

I build and maintain CI/CD pipelines like this for client teams.

If your team is patching the same workflow YAML across a dozen repos, I can fold it into one versioned reusable-workflow library, wire up OIDC to AWS, and migrate repo by repo without freezing your deploys.
