Cloud & DevOps · Julian Contreras

— The stance

I do not pick a cloud and bend every problem toward it. I pick the strongest service for each role and orchestrate across providers.

Multi-cloud, for me, is not a buzzword — it is a deliberate refusal to let one provider own the architecture. GCP, Azure and AWS each do some things better than the others, and edge runtimes and managed backends fill roles none of the big three cover cleanly. The job is to choose well per role and to keep the whole thing portable.

What makes that possible is the layer underneath: containers and infrastructure-as-code at the centre, so a workload defined once can target whichever provider an engagement already lives in. The cloud becomes a commodity I can swap, rather than a dependency I have to plan around.

And under the orchestrator there is still a host. I run Linux at administrator level — kernel tuning, sysctl hardening, systemd, container runtimes and the networking beneath the services — because the parts most people inherit as defaults are the parts I would rather run deliberately.

3 primary clouds I deploy to (GCP, Azure and AWS), chosen per role, not per habit

0 providers I let the architecture depend on; the cloud is a commodity layer I can swap

2 edge runtimes I run at the network boundary: Cloudflare Workers and Vercel Edge

4 managed-database backends I keep ready: Supabase, Firebase, PlanetScale, Neon

— Multi-cloud topology

One application, three clouds, no lock-in.

A workload I keep portable: containers and infrastructure-as-code at the centre, deployable to whichever provider an engagement already runs on. Each provider is wired in for the role it does best, and the edge sits in front of all of them.

Service selection — strongest tool per role

Model training & serving: Vertex AI (GCP)
Analytical queries at scale: BigQuery (GCP)
Event bus / fan-out: Pub/Sub (GCP) · SQS/SNS (AWS)
Scale-to-zero containers: Cloud Run (GCP) · Container Apps (Azure)
Full Kubernetes: GKE · AKS · EKS
Event-driven functions: Cloud Functions · Azure Functions · Lambda
Object storage: Cloud Storage (GCP) · S3 (AWS)
Relational database: RDS (AWS) · Neon · PlanetScale
Edge compute: Cloudflare Workers · Vercel Edge
CDN: CloudFront (AWS)

— Provider by provider

What each cloud is actually for.

This is not a ranking. Each provider covers a set of roles it does well, and the discipline is matching a workload to the one that fits rather than forcing everything onto a single account: GCP for data and models, Azure inside the Microsoft estate, AWS as the broad default with the deepest service catalogue, and the edge and managed backends for the fast path.

Google Cloud — where the data and model work tends to live

GCP is where I put workloads that touch data and models. Vertex AI for training and serving, BigQuery for analytical queries over large tables, and Pub/Sub as the message bus when services need to fan out events without knowing about each other.

For compute I reach for Cloud Run when a container should scale to zero between requests, Cloud Functions for small event-driven handlers, and GKE when a workload needs the full Kubernetes surface. Firestore and Cloud Storage cover document state and objects.

Vertex AI for model training and serving
Cloud Run · Cloud Functions · GKE for compute across the scaling spectrum
Pub/Sub · BigQuery · Firestore · Cloud Storage for messaging, analytics, state and objects
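
As a rough illustration of the scale-to-zero case, a Cloud Run service can be described declaratively in the Knative-style YAML that Cloud Run accepts. This is a sketch under stated assumptions: the service name, region, project and image path are hypothetical, and the concurrency and resource values are placeholders rather than recommendations.

```yaml
# service.yaml: illustrative only; every name and path here is hypothetical
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-api
spec:
  template:
    metadata:
      annotations:
        # 0 allows the service to scale to zero between requests
        autoscaling.knative.dev/minScale: "0"
        # upper bound so a traffic spike cannot scale without limit
        autoscaling.knative.dev/maxScale: "10"
    spec:
      # requests one instance handles in parallel before another is started
      containerConcurrency: 80
      containers:
        - image: europe-west1-docker.pkg.dev/example-project/apps/example-api:1.0.0
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: "1"
              memory: 512Mi
```

A file like this would typically be applied with `gcloud run services replace service.yaml --region europe-west1`, which keeps the service definition in the repository rather than in console settings.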

— Containers & Kubernetes

The unit of deployment is the same wherever it lands.

Docker · Kubernetes · container security

An immutable image, scheduled by Kubernetes, identical across providers.

Everything ships as a container. A workload is packaged as an immutable Docker image, built once, and that exact image is what runs in every environment — there is no rebuild that might quietly differ between staging and production.

Kubernetes schedules it the same way whether the cluster is GKE, AKS or EKS, so the workload is portable by construction. Security is built into the image rather than added later: minimal base images, non-root users, read-only filesystems, dropped Linux capabilities, and a scan before anything is pushed.

One immutable image, built once, run everywhere
Scheduled identically on GKE, AKS or EKS
Minimal base images, non-root, read-only, dropped capabilities
Scanned before it reaches a registry
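
The hardening points above map fairly directly onto Kubernetes security-context fields. The manifest below is a minimal sketch, not a definitive baseline: the Deployment name, labels and image reference are hypothetical, and UID 65532 is assumed because it is the `nonroot` user in distroless images.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app                      # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      securityContext:
        runAsNonRoot: true       # refuse to start if the image would run as root
        runAsUser: 65532         # assumed: the distroless "nonroot" UID
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: app
          image: ghcr.io/example-org/app:0123abc   # hypothetical immutable tag
          ports:
            - containerPort: 8080
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]      # drop every Linux capability
          volumeMounts:
            - name: tmp
              mountPath: /tmp    # writable scratch space despite the read-only root
      volumes:
        - name: tmp
          emptyDir: {}
```

Because these are plain Kubernetes fields, the same manifest should apply unchanged to GKE, AKS or EKS.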

Kubernetes workload — operating shape

Orchestrator: Kubernetes — GKE, AKS or EKS
Unit of deploy: Immutable container image, single build
Scaling: Horizontal pod autoscaling on metrics
Config & secrets: ConfigMaps and Secrets, mounted at runtime
Ingress: Managed load balancer · CDN in front
Rollout: Rolling, canary or blue-green
Image source: Registry with immutable tags and signed images
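
The scaling row can be sketched as a HorizontalPodAutoscaler. This is an illustrative fragment only: it assumes a Deployment named `app` (hypothetical), CPU requests set on its containers, and a metrics source such as metrics-server in the cluster. The thresholds are placeholders.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app                    # hypothetical target Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # add pods when average CPU passes 70% of requests
  behavior:
    scaleDown:
      # wait five minutes before scaling down, to avoid flapping
      stabilizationWindowSeconds: 300
```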

— CI/CD

The image is built once and promoted unchanged.

A pipeline I treat as non-negotiable infrastructure. From a commit, the image is built and tested once, scanned, pushed with an immutable tag, and promoted through every gate — so what runs in production is byte-for-byte what passed the tests.

Commit to production — GitHub Actions / GitLab CI

01 Commit: A push to the repository is the only trigger; nothing is built by hand.
02 Build + test: GitHub Actions or GitLab CI builds the container image once and runs the test suite against it.
03 Scan: The image is scanned for known vulnerabilities and the dependency tree is checked before it can proceed.
04 Push: The signed image is pushed to a registry with an immutable tag that is never overwritten.
05 Apply IaC: Infrastructure-as-code plans the change, shows the diff, then applies it so the environment matches the repository.
06 Promote: The same image is rolled out behind a canary or blue-green switch, with the previous version one command away.

— A small piece of the real thing

The pipeline, as a file.

A trimmed GitHub Actions workflow — build the image once, test it, scan it, then push it with an immutable tag. The same image is later promoted to each environment; nothing is rebuilt downstream.

name: build-and-deploy
on:
  push:
    branches: [ main ]

jobs:
  ship:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    env:
      IMAGE: ghcr.io/${{ github.repository }}:${{ github.sha }}
    steps:
      - uses: actions/checkout@v4

      - name: Build image (once)
        run: docker build -t "$IMAGE" .

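      - name: Test (in the build stage, which still has the Go toolchain)
        run: |
          # the distroless runtime image has no `go` binary, so tests run
          # against the cached `build` stage of the same Dockerfile
          docker build --target build -t app-build .
          docker run --rm app-build go test ./...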

      - name: Scan for vulnerabilities
        # assumes the trivy CLI is already installed on the runner
        run: trivy image --exit-code 1 --severity HIGH,CRITICAL "$IMAGE"

      - name: Push immutable tag
        run: |
          echo "${{ secrets.REGISTRY_TOKEN }}" | docker login ghcr.io -u "${{ github.actor }}" --password-stdin
          docker push "$IMAGE"
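
The promotion step that follows the push could look roughly like the workflow below. It is a sketch under stated assumptions, not the pipeline itself: the `KUBECONFIG_B64` secret, the Deployment and container name `app`, and the manual trigger are all hypothetical, and it shows a plain rolling update rather than a canary or blue-green switch.

```yaml
name: promote
on:
  workflow_dispatch:
    inputs:
      sha:
        description: Commit SHA whose already-built image should be promoted
        required: true
      environment:
        description: Target environment
        required: true
        default: staging

jobs:
  promote:
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    env:
      # same tag scheme as the build workflow; nothing is rebuilt here
      IMAGE: ghcr.io/${{ github.repository }}:${{ inputs.sha }}
    steps:
      - name: Configure cluster access
        run: |
          # KUBECONFIG_B64 is a hypothetical secret holding a base64 kubeconfig
          mkdir -p "$HOME/.kube"
          echo "${{ secrets.KUBECONFIG_B64 }}" | base64 -d > "$HOME/.kube/config"
          chmod 600 "$HOME/.kube/config"

      - name: Roll out the existing image
        run: |
          kubectl set image deployment/app app="$IMAGE"
          kubectl rollout status deployment/app --timeout=120s

      - name: Roll back on failure
        if: failure()
        run: kubectl rollout undo deployment/app
```

The design point is that the workflow takes a commit SHA as input and only changes which image the Deployment references, so the previous version stays one `kubectl rollout undo` away.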

— A container, as a file

The image, defined the way it ships.

A multi-stage Dockerfile — compile in a full build image, then copy only the binary into a minimal runtime that runs as a non-root user. Small surface, nothing in the image that the program does not need.

# --- build stage: full toolchain, thrown away after compile ---
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -trimpath -ldflags='-s -w' -o /out/app ./cmd/app

# --- runtime stage: minimal, non-root, only the binary ---
FROM gcr.io/distroless/static:nonroot
COPY --from=build /out/app /app
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/app"]

— Infrastructure as code

Declared, planned, reviewed, applied.

The environment lives in the repository

Nobody clicks production into existence.

Every piece of infrastructure — networks, clusters, queues, databases — is declared as code and lives in version control next to the application. A change is proposed as a diff, planned as a dry run, reviewed like any other code, then applied.

The point is that the repository is the single source of truth. A periodic re-plan surfaces any manual drift, which is then reconciled back to what the code says the environment should be. An environment becomes reproducible rather than the product of remembered console clicks.

Networks, clusters, queues and databases as code
Plan shows the diff before anything changes
Reviewed like code, not clicked in a console
Drift checks keep the repository authoritative

IaC · Reproducible · Version-controlled

Infrastructure as code — declare to reconcile

01 Write: Infrastructure is declared as code (networks, clusters, queues, databases) in version control alongside the application.
02 Plan: A dry run computes the diff between declared state and live state, so every change is visible before it happens.
03 Review: The plan is reviewed like any other code change; nobody clicks in a console to mutate production.
04 Apply: The plan is applied; the live environment now matches the repository exactly.
05 Drift check: Periodic re-plans catch manual changes, so the declared state stays the single source of truth.
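
The plan and drift-check steps can be sketched as a scheduled workflow. The text does not name an IaC tool, so Terraform is an assumption here, as are the `infra/` directory and the daily schedule. Provider and backend credentials are omitted.

```yaml
name: infra-plan
on:
  pull_request:
    paths: [ "infra/**" ]
  schedule:
    - cron: "0 6 * * *"          # daily drift check, 06:00 UTC

jobs:
  plan:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra # hypothetical location of the IaC code
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          # call the raw binary so its exit code reaches the step unchanged
          terraform_wrapper: false

      - name: Init
        run: terraform init -input=false

      - name: Plan for review (pull requests)
        if: github.event_name == 'pull_request'
        run: terraform plan -input=false -lock=false

      - name: Drift check (scheduled)
        if: github.event_name == 'schedule'
        # -detailed-exitcode returns 2 when live state differs from the code,
        # which fails this step and so flags the drift
        run: terraform plan -input=false -lock=false -detailed-exitcode
```

On a pull request the plan output is what reviewers read; on the schedule, a non-empty plan means someone or something changed the environment outside the repository.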

— The host underneath

Linux at administrator level.

Kubernetes sits above as the orchestrator; below it there is still a Linux host, and that is a layer I run deliberately rather than inherit.

Containers do not remove the operating system — they sit on it. I run Linux at administrator level: kernel tuning for the workload, sysctl hardening to tighten the runtime kernel surface, systemd to define services with restart policies and resource limits, the container runtimes themselves, and the networking that carries every packet from the edge to a pod.

This is the same instinct that runs through the rest of my work. The parts most people accept as defaults — the kernel parameters, the firewall rules, the base image a container is built from — are the parts I would rather understand and set on purpose, because that is where reliability and security quietly come from.
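
To make "set on purpose" concrete, the fragment below sketches how a handful of kernel parameters might be declared at first boot. It assumes the host is provisioned with cloud-init, which the text does not state. The file name is hypothetical and the tuning values are placeholders, not recommendations.

```yaml
#cloud-config
write_files:
  - path: /etc/sysctl.d/99-hardening.conf   # hypothetical file name
    owner: root:root
    permissions: "0644"
    content: |
      # network stack: ignore and do not send ICMP redirects, resist SYN floods
      net.ipv4.conf.all.accept_redirects = 0
      net.ipv4.conf.default.accept_redirects = 0
      net.ipv4.conf.all.send_redirects = 0
      net.ipv4.tcp_syncookies = 1
      # address-space protections
      kernel.randomize_va_space = 2
      kernel.kptr_restrict = 2
      kernel.dmesg_restrict = 1
      # workload tuning (placeholder values)
      fs.file-max = 1048576
      net.core.somaxconn = 4096
runcmd:
  # load every file under /etc/sysctl.d without a reboot
  - sysctl --system
```

IP forwarding is deliberately left alone in this sketch, since container networking on a Kubernetes node generally depends on it.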

01 Kernel tuning

Adjusting kernel parameters for the workload (file-descriptor limits, network buffers, scheduler behaviour) rather than accepting the distribution defaults.

02 sysctl hardening

Tightening the runtime kernel surface through sysctl: network stack settings, address-space protections, and disabling what a server has no reason to expose.

03 systemd

Services defined as systemd units with restart policies, resource limits and dependency ordering, so the host behaves predictably across reboots.

04 Container runtimes

Working at the runtime level, with Docker and the OCI layer underneath, including namespaces, cgroups and the image internals, not just the high-level commands.

05 Networking

The networking underneath the services: routing, firewall rules, DNS, TLS termination and the path a packet takes from the edge to a pod.

06 Container security

Minimal base images, non-root users, read-only filesystems, dropped capabilities and image scanning, all of which reduce what a compromised container can reach.
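
The systemd item above (restart policies, resource limits, dependency ordering) might translate into a unit like the one embedded below. This is a sketch assuming cloud-init provisioning. The `example-agent` service and its binary path are hypothetical, and the limits are placeholders.

```yaml
#cloud-config
write_files:
  - path: /etc/systemd/system/example-agent.service
    owner: root:root
    permissions: "0644"
    content: |
      [Unit]
      Description=Example agent (hypothetical service)
      # dependency ordering: start only once the network is up
      Wants=network-online.target
      After=network-online.target

      [Service]
      ExecStart=/usr/local/bin/example-agent
      # restart policy
      Restart=on-failure
      RestartSec=5
      # resource limits, enforced through cgroups
      MemoryMax=512M
      CPUQuota=50%
      LimitNOFILE=65536
      # basic sandboxing
      NoNewPrivileges=true
      ProtectSystem=strict

      [Install]
      WantedBy=multi-user.target
runcmd:
  - systemctl daemon-reload
  - systemctl enable --now example-agent.service
```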

— Observability

A system you cannot see into is one you cannot operate.

Once a workload is spread across functions, containers and queues on more than one provider, you cannot operate it by intuition. Observability is the part that turns a distributed system back into something you can reason about — metrics for trends, logs for detail, traces for the path a request took, and alerts that page on symptoms a user would actually feel.

I treat the observability stack as part of the build, not as something bolted on after the first incident. I instrument four layers (metrics, logs, traces and alerts), and the order matters: a metric points at the problem, a trace narrows it to a hop, and the logs explain what happened on that exact request. Alerts page on symptoms a user would feel, not on noise.

Numbers over time, so trends are visible before they become incidents

Metrics are the cheap, always-on signal: request rates, error rates, latency percentiles, resource saturation. They are what an autoscaler reads and what an alert fires on, because they are numeric and continuous.

I instrument the things that map to a user experience — the latency a request actually sees, the error rate a client actually hits — rather than only host-level counters that look healthy while the product is failing.

Request rate, error rate, latency percentiles
Resource saturation that drives autoscaling
Signals tied to user experience, not only host counters
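
Alerts on user-facing signals could be written as rules like the ones below. The text does not name a monitoring system, so Prometheus is an assumption, and the metric names `http_requests_total` and `http_request_duration_seconds_bucket` are conventional but hypothetical for this service. The thresholds are placeholders.

```yaml
groups:
  - name: user-facing-symptoms
    rules:
      - alert: HighErrorRate
        # share of requests answered with a 5xx over the last five minutes
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: More than 5% of requests are failing

      - alert: HighLatencyP99
        # 99th-percentile latency computed from histogram buckets
        expr: |
          histogram_quantile(
            0.99,
            sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
          ) > 1
        for: 10m
        labels:
          severity: page
        annotations:
          summary: p99 request latency is above one second
```

Both rules key off what a client experiences rather than host counters, and the `for` clause keeps a brief spike from paging anyone.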

— The layers, bottom to top

From the kernel to the cloud, one stack.

Host, container, pipeline, cloud — read bottom to top, the work is one continuous stack rather than four separate concerns.

Each layer rests on the one below it. A hardened Linux host carries a container runtime; an immutable image runs on Kubernetes; a CI/CD pipeline and infrastructure-as-code make the whole thing reproducible; and a multi-cloud deployment distributes it without locking into any one provider.

Pulled apart, these look like separate specialities. Run together, they are a single discipline: deliberate at every layer, portable across providers, and reproducible from a repository rather than from memory.

Host: Linux at administrator level. Kernel tuning, sysctl hardening, systemd units, container runtimes and the networking underneath; the layer most people inherit, run deliberately.
Container: Docker and Kubernetes. Workloads packaged as immutable container images and run on Kubernetes (GKE, AKS or EKS), so the unit of deployment is the same wherever it lands.
Pipeline: CI/CD and infrastructure as code. GitHub Actions and GitLab CI build and promote the image; infrastructure declared as code makes the whole environment reproducible rather than hand-built.
Cloud: Multi-cloud, by role. GCP, Azure and AWS, plus edge runtimes and managed backends, selected per role and orchestrated together, with no single provider owning the architecture.

— How I work

The principles underneath the platform.

The providers and tools change with the engagement; the principles do not. These are the rules I apply whether the target is GCP, Azure, AWS, the edge or a managed backend — the part that makes the platform reproducible rather than incidental.

01 Pick the strongest service per role

For each role (model serving, the event bus, the database, the edge) I pick the provider that does it best, then orchestrate across them. The result is a system assembled from the right parts, not the convenient ones.

02 Never depend on a single cloud

Containers and infrastructure-as-code sit at the centre so the same workload can target GCP, Azure or AWS. The cloud is a commodity I can swap, not a dependency that owns the product.

03 Build the image once, promote it unchanged

An image is built a single time and moved through every gate to production byte-for-byte. What runs in production is exactly what passed the tests, not a rebuild that might differ.

04 Declare infrastructure, never click it

Networks, clusters, queues and databases are declared as code, planned, reviewed and applied. Nobody mutates production in a console, so the repository stays the single source of truth.

05 Harden the host underneath

Running Linux at administrator level (kernel tuning, sysctl hardening, systemd, container runtimes, networking) means the layer under the orchestrator is deliberate, not left at defaults.

06 Make the system observable

Metrics, logs and traces are part of the build, not bolted on after an incident. A system you cannot see into is a system you cannot operate.

Pick the strongest service per role, keep the workload portable with containers and code, build the image once, and harden the host underneath — everything else is detail.

The cloud is a commodity layer I assemble — not a vendor I am locked into.

If you need a platform that runs across clouds without belonging to any one of them, that is the work I do.