Cloud & DevOps
Multi-cloud, containers, delivery
The cloud is a commodity layer I assemble — not a vendor I am locked into.
I deploy across GCP, Azure and AWS, with edge runtimes and managed backends, choosing the strongest service for each role and orchestrating across providers. Containers, CI/CD and infrastructure as code sit at the centre; Linux runs underneath at administrator level.
I do not pick a cloud and bend every problem toward it.
Multi-cloud, for me, is not a buzzword — it is a deliberate refusal to let one provider own the architecture. GCP, Azure and AWS each do some things better than the others, and edge runtimes and managed backends fill roles none of the big three cover cleanly. The job is to choose well per role and to keep the whole thing portable.
What makes that possible is the layer underneath: containers and infrastructure-as-code at the centre, so a workload defined once can target whichever provider an engagement already lives in. The cloud becomes a commodity I can swap, rather than a dependency I have to plan around.
And under the orchestrator there is still a host. I run Linux at administrator level — kernel tuning, sysctl hardening, systemd, container runtimes and the networking beneath the services — because the parts most people inherit as defaults are the parts I would rather run deliberately.
3 primary clouds I deploy to (GCP, Azure and AWS), chosen per role, not per habit
0 providers I let the architecture depend on; the cloud is a commodity layer I can swap
2 edge runtimes I run at the network boundary: Cloudflare Workers and Vercel Edge
4 managed-database backends I keep ready: Supabase, Firebase, PlanetScale, Neon
One application, three clouds, no lock-in.
A workload I keep portable: containers and infrastructure-as-code at the centre, deployable to whichever provider an engagement already runs on. Each provider is wired in for the role it does best, and the edge sits in front of all of them.
Service selection — strongest tool per role
- Model training & serving: Vertex AI (GCP)
- Analytical queries at scale: BigQuery (GCP)
- Event bus / fan-out: Pub/Sub (GCP) · SQS/SNS (AWS)
- Scale-to-zero containers: Cloud Run (GCP) · Container Apps (Azure)
- Full Kubernetes: GKE · AKS · EKS
- Event-driven functions: Cloud Functions · Azure Functions · Lambda
- Object storage: Cloud Storage (GCP) · S3 (AWS)
- Relational database: RDS (AWS) · Neon · PlanetScale
- Edge compute: Cloudflare Workers · Vercel Edge
- CDN: CloudFront (AWS)
For each role I pick the provider that does it best, then orchestrate across them — the cloud should be a commodity I can swap, never a dependency that owns the product.
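One way to picture the role-to-service table above is as plain data with a lookup. This is a minimal Go sketch, not a description of any real tooling: the names `Option`, `catalogue` and `pick` are hypothetical, only a few rows are included, and the fallback rule (first listed option) is an assumption made for illustration.

```go
package main

import "fmt"

// Option is one service that can fill a role (service names come from the table above).
type Option struct {
	Provider string
	Service  string
}

// catalogue maps a role to the services listed for it. Only a few rows are shown.
var catalogue = map[string][]Option{
	"scale-to-zero containers": {{"GCP", "Cloud Run"}, {"Azure", "Container Apps"}},
	"full kubernetes":          {{"GCP", "GKE"}, {"Azure", "AKS"}, {"AWS", "EKS"}},
	"object storage":           {{"GCP", "Cloud Storage"}, {"AWS", "S3"}},
}

// pick returns the option for a role on the provider an engagement already
// uses, falling back to the first listed option when that provider has none.
func pick(role, preferred string) (Option, bool) {
	opts, ok := catalogue[role]
	if !ok || len(opts) == 0 {
		return Option{}, false
	}
	for _, o := range opts {
		if o.Provider == preferred {
			return o, true
		}
	}
	return opts[0], true
}

func main() {
	for _, role := range []string{"full kubernetes", "object storage"} {
		o, _ := pick(role, "Azure")
		fmt.Printf("%s -> %s (%s)\n", role, o.Service, o.Provider)
	}
}
```

Keeping the choice as data rather than as scattered conditionals means a role can be re-pointed at another provider by editing one row.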
What each cloud is actually for.
The four sections below are not a ranking. Each is a set of roles a given provider does well, and the discipline is matching a workload to the one that fits rather than forcing everything onto a single account. GCP for data and models, Azure inside the Microsoft estate, AWS as the broad default, and the edge and managed backends for the fast path.
Google Cloud — where the data and model work tends to live
GCP is where I put workloads that touch data and models. Vertex AI for training and serving, BigQuery for analytical queries over large tables, and Pub/Sub as the message bus when services need to fan out events without knowing about each other.
For compute I reach for Cloud Run when a container should scale to zero between requests, Cloud Functions for small event-driven handlers, and GKE when a workload needs the full Kubernetes surface. Firestore and Cloud Storage cover document state and objects.
- Vertex AI for model training and serving
- Cloud Run · Cloud Functions · GKE for compute across the scaling spectrum
- Pub/Sub · BigQuery · Firestore · Cloud Storage for messaging, analytics, state and objects
Azure — where an engagement already lives in the Microsoft estate
When a client already runs on Microsoft identity and tooling, fighting that is wasted effort. Azure Kubernetes Service gives the same Kubernetes contract I use elsewhere, so a workload defined as containers and manifests moves across with little change.
Azure Functions covers event-driven compute, Container Apps handles the scale-to-zero container case, and Cognitive Services is a pragmatic route to vision, speech and language when building the model in-house is not the point of the project.
- AKS for managed Kubernetes inside the Microsoft estate
- Functions and Container Apps for event-driven and scale-to-zero compute
- Cognitive Services for vision, speech and language as a managed capability
AWS — the broad default with the deepest service catalogue
AWS is the broad default: when a team is already there, or when a specific managed service is the cleanest answer, I deploy to it directly. Lambda for event-driven functions, ECS or EKS for containers depending on how much Kubernetes the team wants to own.
S3 for durable object storage, RDS for managed relational databases, and SQS and SNS for queues and pub/sub. CloudFront sits in front as the CDN. The same containers and infrastructure-as-code that target GCP also target this.
- Lambda for functions · ECS / EKS for containers
- S3 · RDS for object and relational storage
- SQS · SNS for queuing and pub/sub · CloudFront at the edge
Edge runtimes and managed backends — the fast path
At the network boundary I run Cloudflare Workers and Vercel Edge: code that executes close to the user, with cold starts measured in milliseconds, for routing, auth checks and lightweight transforms before a request ever reaches a region.
For products that need to move quickly, a managed backend earns its keep. Supabase and Firebase give auth, storage and a database without standing up servers; PlanetScale and Neon give managed, branchable SQL. I choose them when speed to a working product matters more than owning the infrastructure.
- Cloudflare Workers and Vercel Edge at the boundary
- Supabase and Firebase as full managed backends
- PlanetScale and Neon for managed, branchable SQL
The unit of deployment is the same wherever it lands.
Docker · Kubernetes · container security
An immutable image, scheduled by Kubernetes, identical across providers.
Everything ships as a container. A workload is packaged as an immutable Docker image, built once, and that exact image is what runs in every environment — there is no rebuild that might quietly differ between staging and production.
Kubernetes schedules it the same way whether the cluster is GKE, AKS or EKS, so the workload is portable by construction. Security is built into the image rather than added later: minimal base images, non-root users, read-only filesystems, dropped Linux capabilities, and a scan before anything is pushed.
- One immutable image, built once, run everywhere
- Scheduled identically on GKE, AKS or EKS
- Minimal base images, non-root, read-only, dropped capabilities
- Scanned before it reaches a registry
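The hardening rules in the list above can be checked mechanically before a workload is admitted. The sketch below is an illustration under assumptions: `SecurityContext` is a local stand-in that mirrors a few Kubernetes `securityContext` fields, not the real API type, and `violations` is a hypothetical helper.

```go
package main

import "fmt"

// SecurityContext mirrors a few fields of a Kubernetes container
// securityContext. It is a local stand-in, not the real API type.
type SecurityContext struct {
	RunAsNonRoot           bool
	ReadOnlyRootFilesystem bool
	DropCapabilities       []string
}

// violations lists the hardening rules from the text that sc does not meet.
func violations(sc SecurityContext) []string {
	var out []string
	if !sc.RunAsNonRoot {
		out = append(out, "container may run as root")
	}
	if !sc.ReadOnlyRootFilesystem {
		out = append(out, "root filesystem is writable")
	}
	dropsAll := false
	for _, c := range sc.DropCapabilities {
		if c == "ALL" {
			dropsAll = true
		}
	}
	if !dropsAll {
		out = append(out, "capabilities are not dropped")
	}
	return out
}

func main() {
	// Only one of the three rules is satisfied here.
	weak := SecurityContext{RunAsNonRoot: true}
	for _, v := range violations(weak) {
		fmt.Println("violation:", v)
	}
}
```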
Kubernetes workload — operating shape
- Orchestrator: Kubernetes (GKE, AKS or EKS)
- Unit of deploy: Immutable container image, single build
- Scaling: Horizontal pod autoscaling on metrics
- Config & secrets: ConfigMaps and Secrets, mounted at runtime
- Ingress: Managed load balancer · CDN in front
- Rollout: Rolling, canary or blue-green
- Image source: Registry with immutable, signed tags
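The scaling entry above refers to horizontal pod autoscaling on metrics. As a rough illustration, the core ratio the Kubernetes documentation gives for the autoscaler is `ceil(currentReplicas * currentMetric / targetMetric)`. The Go sketch below applies only that ratio plus a min/max clamp; the function name and parameters are hypothetical, and the real controller also applies a tolerance band and stabilisation behaviour that are left out here.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the documented autoscaling ratio
// ceil(current * currentMetric / targetMetric), clamped to [minReplicas, maxReplicas].
func desiredReplicas(current int, currentMetric, targetMetric float64, minReplicas, maxReplicas int) int {
	if targetMetric <= 0 {
		return current // no usable target, leave the replica count alone
	}
	want := int(math.Ceil(float64(current) * currentMetric / targetMetric))
	if want < minReplicas {
		want = minReplicas
	}
	if want > maxReplicas {
		want = maxReplicas
	}
	return want
}

func main() {
	// 4 pods averaging 90% CPU against a 60% target.
	fmt.Println(desiredReplicas(4, 90, 60, 2, 10))
}
```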
The image is built once and promoted unchanged.
A pipeline I treat as non-negotiable infrastructure. From a commit, the image is built and tested once, scanned, pushed with an immutable tag, and promoted through every gate — so what runs in production is byte-for-byte what passed the tests.
Commit to production — GitHub Actions / GitLab CI
- 01 Commit: A push to the repository is the only trigger; nothing is built by hand.
- 02 Build + test: GitHub Actions or GitLab CI builds the container image once and runs the test suite against it.
- 03 Scan: The image is scanned for known vulnerabilities and the dependency tree is checked before it can proceed.
- 04 Push: The signed image is pushed to a registry with an immutable tag that is never overwritten.
- 05 Apply IaC: Infrastructure-as-code plans the change, shows the diff, then applies it so the environment matches the repository.
- 06 Promote: The same image is rolled out behind a canary or blue-green switch, with the previous version one command away.
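Two ideas in the steps above lend themselves to a small model: gates that run in order against one image and stop at the first failure, and a registry that refuses to overwrite a tag. The Go sketch below is an in-memory illustration only. `Registry`, `Gate`, `runGates` and the tag and digest strings are hypothetical, and no real registry API is involved.

```go
package main

import (
	"errors"
	"fmt"
)

// Registry is an in-memory stand-in for an image registry with immutable tags.
type Registry struct {
	tags map[string]string // tag -> image digest
}

// ErrTagExists is returned when a push would overwrite an existing tag.
var ErrTagExists = errors.New("tag already exists and is immutable")

// Push stores digest under tag and refuses to overwrite an existing tag.
func (r *Registry) Push(tag, digest string) error {
	if _, taken := r.tags[tag]; taken {
		return ErrTagExists
	}
	r.tags[tag] = digest
	return nil
}

// Gate is one pipeline stage; it inspects the image digest and may reject it.
type Gate struct {
	Name  string
	Check func(digest string) error
}

// runGates runs the gates in order against the same digest and stops at the first failure.
func runGates(digest string, gates []Gate) error {
	for _, g := range gates {
		if err := g.Check(digest); err != nil {
			return fmt.Errorf("%s: %w", g.Name, err)
		}
	}
	return nil
}

func main() {
	reg := &Registry{tags: map[string]string{}}
	digest := "sha256:example" // placeholder, not a real digest
	gates := []Gate{
		{"test", func(string) error { return nil }},
		{"scan", func(string) error { return nil }},
		{"push", func(d string) error { return reg.Push("app:abc123", d) }},
	}
	fmt.Println("first run:", runGates(digest, gates))
	fmt.Println("second run:", runGates(digest, gates)) // rejected: the tag is already taken
}
```

Because every gate receives the same digest, nothing downstream can quietly substitute a rebuilt image.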
The pipeline, as a file.
A trimmed GitHub Actions workflow — build the image once, test it, scan it, then push it with an immutable tag. The same image is later promoted to each environment; nothing is rebuilt downstream.
name: build-and-deploy
on:
  push:
    branches: [ main ]
jobs:
  ship:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    env:
      IMAGE: ghcr.io/${{ github.repository }}:${{ github.sha }}
    steps:
      - uses: actions/checkout@v4
      - name: Build image (once)
        run: docker build -t "$IMAGE" .
      - name: Test
        # The runtime image has no Go toolchain, so tests run in the cached build stage.
        run: |
          docker build --target build -t app-build .
          docker run --rm app-build go test ./...
      - name: Scan for vulnerabilities
        # Pin this action to a release tag or commit SHA in real use.
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.IMAGE }}
          exit-code: '1'
          severity: HIGH,CRITICAL
      - name: Push immutable tag
        run: |
          echo "${{ secrets.REGISTRY_TOKEN }}" | docker login ghcr.io -u "${{ github.actor }}" --password-stdin
          docker push "$IMAGE"
The image, defined the way it ships.
A multi-stage Dockerfile — compile in a full build image, then copy only the binary into a minimal runtime that runs as a non-root user. Small surface, nothing in the image that the program does not need.
# --- build stage: full toolchain, thrown away after compile ---
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -trimpath -ldflags='-s -w' -o /out/app ./cmd/app
# --- runtime stage: minimal, non-root, only the binary ---
FROM gcr.io/distroless/static:nonroot
COPY --from=build /out/app /app
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/app"]
Declared, planned, reviewed, applied.
The environment lives in the repository
Nobody clicks production into existence.
Every piece of infrastructure — networks, clusters, queues, databases — is declared as code and lives in version control next to the application. A change is proposed as a diff, planned as a dry run, reviewed like any other code, then applied.
The point is that the repository is the single source of truth. A periodic re-plan catches any manual drift, so the live environment is always reconciled back to what the code says it should be. An environment becomes reproducible rather than the product of remembered console clicks.
- Networks, clusters, queues and databases as code
- Plan shows the diff before anything changes
- Reviewed like code, not clicked in a console
- Drift checks keep the repository authoritative
Infrastructure as code — declare to reconcile
- 01 Write: Infrastructure (networks, clusters, queues, databases) is declared as code in version control alongside the application.
- 02 Plan: A dry run computes the diff between declared state and live state, so every change is visible before it happens.
- 03 Review: The plan is reviewed like any other code change; nobody clicks in a console to mutate production.
- 04 Apply: The plan is applied; the live environment now matches the repository exactly.
- 05 Drift check: Periodic re-plans catch manual changes, so the declared state stays the single source of truth.
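The plan and drift-check steps above both come down to comparing declared state with live state. The Go sketch below shows that comparison in its simplest form. It is an illustration, not how any particular infrastructure-as-code tool works: `State`, `plan` and the resource names are hypothetical, and real tools track far richer resource graphs.

```go
package main

import (
	"fmt"
	"sort"
)

// State maps a resource name to its configuration, flattened to a string.
type State map[string]string

// plan returns the changes needed to make live match declared, as sorted
// lines prefixed with + (create), ~ (update) or - (delete).
func plan(declared, live State) []string {
	var changes []string
	for name, want := range declared {
		got, exists := live[name]
		switch {
		case !exists:
			changes = append(changes, "+ "+name)
		case got != want:
			changes = append(changes, "~ "+name)
		}
	}
	for name := range live {
		if _, ok := declared[name]; !ok {
			changes = append(changes, "- "+name)
		}
	}
	sort.Strings(changes)
	return changes
}

func main() {
	declared := State{"network": "10.0.0.0/16", "queue": "retention=7d"}
	// The live environment has drifted: one edited setting, one hand-made VM.
	live := State{"network": "10.0.0.0/16", "queue": "retention=1d", "debug-vm": "manual"}
	for _, c := range plan(declared, live) {
		fmt.Println(c)
	}
}
```

An empty plan is the signal that the repository and the environment agree, which is what a periodic drift check is looking for.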
Linux at administrator level.
Kubernetes is the orchestrator, but below it there is still a Linux host, and that is a layer I run deliberately rather than inherit.
Containers do not remove the operating system — they sit on it. I run Linux at administrator level: kernel tuning for the workload, sysctl hardening to tighten the runtime kernel surface, systemd to define services with restart policies and resource limits, the container runtimes themselves, and the networking that carries every packet from the edge to a pod.
This is the same instinct that runs through the rest of my work. The parts most people accept as defaults — the kernel parameters, the firewall rules, the base image a container is built from — are the parts I would rather understand and set on purpose, because that is where reliability and security quietly come from.
Kernel tuning
Adjusting kernel parameters for the workload — file-descriptor limits, network buffers, scheduler behaviour — rather than accepting the distribution defaults.
sysctl hardening
Tightening the runtime kernel surface through sysctl: network stack settings, address-space protections, and disabling what a server has no reason to expose.
systemd
Services defined as systemd units with restart policies, resource limits and dependency ordering, so the host behaves predictably across reboots.
Container runtimes
Working at the runtime level — Docker and the OCI layer underneath — including namespaces, cgroups and the image internals, not just the high-level commands.
Networking
The networking underneath the services: routing, firewall rules, DNS, TLS termination and the path a packet takes from the edge to a pod.
Container security
Minimal base images, non-root users, read-only filesystems, dropped capabilities and image scanning — reducing what a compromised container can reach.
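Setting kernel parameters on purpose implies being able to tell when a host no longer matches the intended values. The Go sketch below parses `sysctl.conf`-style text and reports keys whose live value differs. It is an illustration under assumptions: the function names are hypothetical, the live values are passed in as a map rather than read from `/proc/sys`, and the sample values are examples, not a recommended baseline.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// parseSysctl reads "key = value" lines, skipping blanks and # or ; comments.
func parseSysctl(conf string) map[string]string {
	out := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue // not a key/value line
		}
		out[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return out
}

// drift returns the sorted keys whose live value differs from the desired one.
func drift(desired, live map[string]string) []string {
	var keys []string
	for k, want := range desired {
		if live[k] != want {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	desired := parseSysctl(`
# example values only, not a recommended baseline
net.ipv4.tcp_syncookies = 1
kernel.kptr_restrict = 2
net.ipv4.conf.all.rp_filter = 1
`)
	live := map[string]string{
		"net.ipv4.tcp_syncookies":     "1",
		"kernel.kptr_restrict":        "0",
		"net.ipv4.conf.all.rp_filter": "1",
	}
	fmt.Println("drifted keys:", drift(desired, live))
}
```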
A system you cannot see into is one you cannot operate.
Once a workload is spread across functions, containers and queues on more than one provider, you cannot operate it by intuition. Observability is the part that turns a distributed system back into something you can reason about — metrics for trends, logs for detail, traces for the path a request took, and alerts that page on symptoms a user would actually feel.
I treat the observability stack as part of the build, not as something bolted on after the first incident. The four layers below are what I instrument, and the order matters: a metric points at the problem, a trace narrows it to a hop, and the logs explain what happened on that exact request.
Metrics: numbers over time, so trends are visible before they become incidents
Metrics are the cheap, always-on signal: request rates, error rates, latency percentiles, resource saturation. They are what an autoscaler reads and what an alert fires on, because they are numeric and continuous.
I instrument the things that map to a user experience — the latency a request actually sees, the error rate a client actually hits — rather than only host-level counters that look healthy while the product is failing.
- Request rate, error rate, latency percentiles
- Resource saturation that drives autoscaling
- Signals tied to user experience, not only host counters
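To make the user-facing signals in the list above concrete, the Go sketch below computes an error rate and a latency percentile from a batch of requests. It is a minimal illustration: the `Request` type and function names are hypothetical, the percentile uses the simple nearest-rank method, and a production metrics system would normally use histograms rather than sorting raw samples.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Request is one observed request, as a client would have experienced it.
type Request struct {
	LatencyMs float64
	Failed    bool
}

// errorRate is the fraction of requests that failed.
func errorRate(reqs []Request) float64 {
	if len(reqs) == 0 {
		return 0
	}
	failed := 0
	for _, r := range reqs {
		if r.Failed {
			failed++
		}
	}
	return float64(failed) / float64(len(reqs))
}

// percentile returns the p-th latency percentile using the nearest-rank method.
func percentile(reqs []Request, p float64) float64 {
	if len(reqs) == 0 {
		return 0
	}
	lat := make([]float64, len(reqs))
	for i, r := range reqs {
		lat[i] = r.LatencyMs
	}
	sort.Float64s(lat)
	rank := int(math.Ceil(p / 100 * float64(len(lat))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(lat) {
		rank = len(lat)
	}
	return lat[rank-1]
}

func main() {
	reqs := []Request{{10, false}, {20, false}, {30, false}, {40, false}, {500, true}}
	fmt.Printf("error rate: %.2f\n", errorRate(reqs))
	fmt.Printf("p50: %.0f ms, p95: %.0f ms\n", percentile(reqs, 50), percentile(reqs, 95))
}
```

The gap between the median and the tail is the point: an average over these samples would hide the one request a user actually felt.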
Logs: the detail you reach for once a metric has told you where to look
Logs are structured and centralised so a single request can be followed across the services it touched. A metric tells me something is wrong; the logs tell me what, on which request, with which input.
Structured fields matter more than free text — a log you can query and aggregate is worth far more than one you can only read line by line during an incident.
- Structured, queryable, centralised
- A single request traceable across services
- Queryable fields over free-text lines
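The difference between structured and free-text logs is easiest to see in code. The Go sketch below uses the standard library's `log/slog` JSON handler to attach a request ID to every record, then filters by that field. The helper names `logLine` and `field`, the service names and the request IDs are hypothetical, and a real setup would ship records to a central store rather than hold them in memory.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log/slog"
)

// logLine emits one structured record for a request and returns the JSON text.
func logLine(requestID, service string, status int) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil)).With("request_id", requestID)
	logger.Info("request handled", "service", service, "status", status)
	return buf.String()
}

// field decodes a JSON log line and returns the named field as text.
func field(line, key string) string {
	var rec map[string]any
	if err := json.Unmarshal([]byte(line), &rec); err != nil {
		return ""
	}
	return fmt.Sprint(rec[key])
}

func main() {
	lines := []string{
		logLine("req-42", "gateway", 200),
		logLine("req-42", "orders", 500),
		logLine("req-77", "gateway", 200),
	}
	// Query by field instead of reading line by line.
	for _, l := range lines {
		if field(l, "request_id") == "req-42" {
			fmt.Print(l)
		}
	}
}
```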
Traces: the shape of a request as it crosses service boundaries
Distributed tracing follows one request through every hop (gateway, service, database, queue) and shows where the time actually went. In a system spread across functions, containers and queues, this is the only honest answer to "where is it slow?"
A trace turns a vague "it feels slow" into a specific span that owns most of the latency, which is the difference between guessing and fixing.
- One request followed across every hop
- Latency attributed to the span that owns it
- The honest answer to where the time went
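Attributing latency to the span that owns it usually means looking at self time: a span's duration minus the time spent in its children. The Go sketch below does that for a flat list of spans. It is a simplified illustration: the `Span` type, the span names and the durations are made up, and it assumes children run one after another, which real traces with concurrent spans do not guarantee.

```go
package main

import "fmt"

// Span is one hop in a trace. Parent is empty for the root span.
type Span struct {
	Name       string
	Parent     string
	DurationMs float64
}

// slowest returns the span with the largest self time, meaning its own
// duration minus the durations of its direct children.
func slowest(spans []Span) (string, float64) {
	childTotal := map[string]float64{}
	for _, s := range spans {
		if s.Parent != "" {
			childTotal[s.Parent] += s.DurationMs
		}
	}
	name, best := "", -1.0
	for _, s := range spans {
		self := s.DurationMs - childTotal[s.Name]
		if self > best {
			name, best = s.Name, self
		}
	}
	return name, best
}

func main() {
	trace := []Span{
		{"gateway", "", 480},
		{"service", "gateway", 450},
		{"database", "service", 400},
		{"queue", "service", 20},
	}
	name, self := slowest(trace)
	fmt.Printf("%s owns %.0f ms of the request\n", name, self)
}
```

The gateway span is the longest in wall-clock terms, but almost all of its time is spent waiting on children, which is why total duration alone points at the wrong hop.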
Alerts: pages tied to symptoms a user would feel, not to noise
Alerts fire on symptoms — elevated error rate, latency past a threshold, a saturating resource — not on every transient blip. An alert that pages someone at 3 a.m. has to correspond to something a user would actually notice.
The aim is a small number of high-signal alerts. Too many low-signal pages train people to ignore them, which is worse than having none.
- Symptom-based, tied to user-visible impact
- High-signal over high-volume
- Thresholds chosen so a page means something
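One common way to keep a page from firing on a transient blip is to require the symptom to persist across several consecutive evaluation windows. The Go sketch below shows that rule in isolation. The `Alert` type, the 5% threshold and the three-window requirement are illustrative assumptions, not values taken from any real alerting configuration.

```go
package main

import "fmt"

// Alert pages only after the error rate has stayed above Threshold
// for For consecutive evaluation windows.
type Alert struct {
	Threshold float64
	For       int
	breaches  int // consecutive windows above the threshold so far
}

// Observe records one window's error rate and reports whether to page.
func (a *Alert) Observe(errorRate float64) bool {
	if errorRate > a.Threshold {
		a.breaches++
	} else {
		a.breaches = 0 // a healthy window resets the streak
	}
	return a.breaches >= a.For
}

func main() {
	a := &Alert{Threshold: 0.05, For: 3}
	for i, rate := range []float64{0.10, 0.01, 0.10, 0.10, 0.10} {
		fmt.Printf("window %d: rate=%.2f page=%v\n", i, rate, a.Observe(rate))
	}
}
```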
From the kernel to the cloud, one stack.
Host, container, pipeline, cloud — read bottom to top, the work is one continuous stack rather than four separate concerns.
Each layer rests on the one below it. A hardened Linux host carries a container runtime; an immutable image runs on Kubernetes; a CI/CD pipeline and infrastructure-as-code make the whole thing reproducible; and a multi-cloud deployment distributes it without locking into any one provider.
Pulled apart, these look like separate specialities. Run together, they are a single discipline: deliberate at every layer, portable across providers, and reproducible from a repository rather than from memory.
- Host: Linux at administrator level. Kernel tuning, sysctl hardening, systemd units, container runtimes and the networking underneath; the layer most people inherit, run deliberately.
- Container: Docker and Kubernetes. Workloads packaged as immutable container images and run on Kubernetes (GKE, AKS or EKS), so the unit of deployment is the same wherever it lands.
- Pipeline: CI/CD and infrastructure as code. GitHub Actions and GitLab CI build and promote the image; infrastructure declared as code makes the whole environment reproducible rather than hand-built.
- Cloud: Multi-cloud, by role. GCP, Azure and AWS, plus edge runtimes and managed backends, selected per role and orchestrated together, with no single provider owning the architecture.
The principles underneath the platform.
The providers and tools change with the engagement; the principles do not. These are the rules I apply whether the target is GCP, Azure, AWS, the edge or a managed backend — the part that makes the platform reproducible rather than incidental.
Pick the strongest service per role
For each role — model serving, the event bus, the database, the edge — I pick the provider that does it best, then orchestrate across them. The result is a system assembled from the right parts, not the convenient ones.
Never depend on a single cloud
Containers and infrastructure-as-code sit at the centre so the same workload can target GCP, Azure or AWS. The cloud is a commodity I can swap, not a dependency that owns the product.
Build the image once, promote it unchanged
An image is built a single time and moved through every gate to production byte-for-byte. What runs in production is exactly what passed the tests, not a rebuild that might differ.
Declare infrastructure, never click it
Networks, clusters, queues and databases are declared as code, planned, reviewed and applied. Nobody mutates production in a console, so the repository stays the single source of truth.
Harden the host underneath
Running Linux at administrator level — kernel tuning, sysctl hardening, systemd, container runtimes, networking — means the layer under the orchestrator is deliberate, not left at defaults.
Make the system observable
Metrics, logs and traces are part of the build, not bolted on after an incident. A system you cannot see into is a system you cannot operate.
Pick the strongest service per role, keep the workload portable with containers and code, build the image once, and harden the host underneath — everything else is detail.
Open to the right work
If you need a platform that runs across clouds without belonging to any one of them, that is the work I do.
If you are holding a problem that doesn't fit inside one field, that is the conversation I want.