Blog


Here you’ll find articles about cloud infrastructure, DevOps/automation, and personal projects, all written from my hands-on experience.

To get started, check out my latest articles:

Waking a Harman Kardon Aura from the Dead -- Bluetooth, PipeWire, and Too Many Rabbit Holes

Man… what a journey. It’s Sunday, March 1st, and I can see the sunrise from my window. This wasn’t some quick one-night hack; I’ve actually been wrestling with this project for months. But this weekend, I hit a point of no return. I was determined to finally cross the finish line. I previously swore to myself that I would stop these late-night projects, but here I am, crazy happy because I finally made it…

Building a Self-Healing k3s Homelab (Part 1): Foundation

Over the past few years I’ve been accumulating Raspberry Pis and single-board computers like some people accumulate unfinished side projects. It started back in school in France: breaking Linux installations, running a Pi-hole server, building a Magic Mirror. The nice thing about Pis is they’re small enough to throw in a bag. Through multiple moves and transatlantic travel they came with me, always finding some new use. Now I’ve finally settled in Calgary, and I have a pile of these devices sitting around.

Building a Self-Healing k3s Homelab (Part 2): Multi-Node, GitOps, and Growing Pains

Part 1 covered the hardware, k3s, GitOps setup with ArgoCD and Gitea, Longhorn storage, and the monitoring stack. On paper, everything was clean. In practice, the first few weeks were messier.

This is Part 2: the story of expanding from one node to two, migrating workloads, hardening resources, and accidentally wiping the entire ArgoCD control plane.


Starting Point: Why Two Nodes

The original cluster was single-node. Jarvis ran everything: Home Assistant, GitOps, monitoring, storage, all of it. This works fine until you try to schedule any memory-heavy workload. Prometheus needs 700MB+ at minimum. Grafana takes another 300MB. Add Gitea and its PostgreSQL instance, and you’re staring at 2GB of non-home-automation workloads on a node that also has to run the k3s API server, Longhorn, and every IoT integration.

Building a Self-Healing k3s Homelab (Part 3): The NFS Nightmare

Part 2 ended with a functioning two-node cluster, all workloads placed correctly, resource limits set, monitoring green. Late October 2025, everything looked good.

Early November brought five outages in nine days, all rooted in the same mistake: putting a kernel-level NFS mount on the control-plane node. This is the story of those nine days, in all their embarrassing detail.


The Backup Plan

Longhorn provides volume snapshots natively. A snapshot is taken locally, stored in Longhorn’s own format, and can be restored. That’s fine for recovering from a bad deployment. For a node failure followed by a disk failure, you need something off-node.

Building a Self-Healing k3s Homelab (Part 4): Containerd's Ghost Sandboxes

Part 3 covered the NFS outage series and the fixes: soft mounts, then SSH rsync, then full NFS removal. By November 9, no kernel NFS mount existed anywhere in the cluster.

The cluster kept going down.

This is Part 4: the containerd sandbox leak pattern that appeared after every hard reboot, why it happens, what it does to the cluster, and how I finally automated the fix.


The Pattern

Every time Jarvis suffered a hard, unclean shutdown (a power cycle to recover from a freeze, or a hardware watchdog reboot), the same thing happened when k3s came back up:

Building a Self-Healing k3s Homelab (Part 5): RCU Stalls, Watchdogs, and Actually Healing

Part 4 covered the containerd sandbox leak problem and the ExecStartPre fix. The sandbox leaks were solved. But there was still an underlying issue that kept forcing those hard reboots in the first place.

This is Part 5: the kernel RCU stall problem, why it’s dangerous on a Raspberry Pi running k3s, the mitigations I layered on, and the moment the cluster finally handled an outage without me.


What Is an RCU Stall?

RCU stands for Read-Copy-Update. It’s a synchronization mechanism built into the Linux kernel for situations where reads are very frequent and writes are rare. The basic idea: readers don’t take locks. Writers make a copy of the data, update it, and wait until all current readers are done before pointing the system at the new copy. The window in which those pre-existing readers finish is called a “grace period.”
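An RCU stall, roughly, is the kernel warning that a grace period has dragged on far too long because some CPU never reached a point where its readers were done. To make the pattern itself concrete, here’s a toy sketch in Python; the real mechanism lives inside the kernel with per-CPU bookkeeping, so treat this strictly as an analogy:

```python
import threading
import time

# Shared pointer to the current version of the data. Readers never lock;
# they just grab whatever version is current and keep using it.
current = {"threshold": 10, "mode": "normal"}

def reader(name: str) -> None:
    # Analogue of rcu_read_lock(): take a local reference to the
    # current version and work against that consistent view.
    snapshot = current
    time.sleep(0.1)  # pretend to do real work
    print(f"{name} saw {snapshot}")
    # Returning drops `snapshot` -- the analogue of rcu_read_unlock().

def writer() -> None:
    global current
    # Read-Copy-Update: copy the data, modify the copy...
    updated = dict(current)
    updated["mode"] = "degraded"
    # ...then publish the new version in one atomic rebind. Readers that
    # started earlier keep their old snapshot; new readers see the update.
    current = updated
    # The kernel now waits out the grace period before freeing the old
    # copy; in this toy, CPython's reference counting plays that role.

threads = [threading.Thread(target=reader, args=(f"reader-{i}",)) for i in range(3)]
for t in threads:
    t.start()
writer()
for t in threads:
    t.join()
```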

Reverse Engineering the iStrip+ App for Local Control – Part 2

In Part 1, I reverse engineered the iStrip+ app’s native AES encryption and discovered the encryption key hidden in libAES.so. With my coffee cup finally refilled, it was time to put that knowledge into practice: replicate the encryption in Python, generate valid BLE payloads, and build a Home Assistant integration.


Step 4: First Python Tests

With the AES key extracted and the protocol structure mapped out from Part 1, I was eager to test if everything actually worked. Time to write some Python and see if I could make the lamp do something. Anything.
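The excerpt stops here, but the shape of that first test is roughly the following. This is a minimal sketch assuming pycryptodome (pip install pycryptodome); the key bytes, the ECB mode, the zero padding, and the opcode are all stand-ins, not the lamp’s actual protocol:

```python
# Hypothetical first test: encrypt a one-byte command into a BLE payload.
# KEY is a placeholder; the real key was extracted from libAES.so in Part 1.
from Crypto.Cipher import AES

KEY = bytes.fromhex("00112233445566778899aabbccddeeff")  # placeholder key

def build_payload(command: bytes) -> bytes:
    """Pad a command to one AES block and encrypt it."""
    assert len(command) <= 16
    block = command.ljust(16, b"\x00")    # zero-pad to a 16-byte block (assumption)
    cipher = AES.new(KEY, AES.MODE_ECB)   # mode is an assumption as well
    return cipher.encrypt(block)

payload = build_payload(bytes([0x01]))    # 0x01 standing in for "power on"
print(payload.hex())
```

From there, getting bytes to the lamp is a matter of writing the payload to the right GATT characteristic, for example with bleak’s write_gatt_char.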

Reverse Engineering the iStrip+ App for Local Control – Part 1

To add some ambiance to my apartment at the end of 2024, I bought several smart lights. Among them was a sunset lamp, controllable via Bluetooth. While the official app worked fine, no Home Assistant integration existed.

I wanted local control: secure, fast, and fully integrated with Home Assistant. Armed with my iPhone, a Raspberry Pi 4, an ESP32, and a healthy dose of determination, I set out to reverse engineer the sunset lamp and build a Home Assistant integration from scratch.

Creating a Home Assistant Integration for the Harman Kardon Aura Speaker

After finally settling into my apartment in 2024, I wanted to automate my place as much as possible. One of the devices I own is a Harman Kardon Aura Plus speaker, which has great sound but limited smart features.

The Aura Plus is a high-end wireless speaker with 360° sound and ambient lighting. Unfortunately, while it sounds great, it lacks an open API. The official app talks over Bluetooth/Wi-Fi, and that’s where I saw an opportunity.

How B2B Sales Did Not Teach Me About CloudFront Functions

You’ve probably seen the posts:

  • “How B2B sales helped me run a marathon”
  • “How cold calling made me a better engineer”

This isn’t that. Unfortunately.


Redirects, DNS, and Terraform

This one started simple: I wanted to redirect the apex domain (vakintosh.com) to the www subdomain.

```mermaid
flowchart TD
    Start([Start]) --> A[User types vakintosh.com]
    A --> B[/Browser sends HTTP request/]
    B --> C[DNS resolves apex domain to CloudFront edge node]
    C --> D[CloudFront Function fires on Viewer Request event]
    D --> E{Is host vakintosh.com?}

    E -->|Yes| F[Return 301 Redirect Location: www.vakintosh.com]
    F --> G[/301 Response sent to browser/]
    G --> H[Browser follows redirect to www.vakintosh.com]
    H --> I[CloudFront forwards request to S3 origin]

    E -->|No| J[Rewrite URI e.g. /blog → /blog/index.html]
    J --> I

    I --> K[/Static content served from S3/]
    K --> Finish([Finish])
```

  • The user’s browser sends a request to vakintosh.com, which DNS resolves to a CloudFront edge node.
  • A CloudFront Function fires on the Viewer Request event, before the request ever reaches the S3 origin.
  • If the host is the apex domain, the function returns a 301 redirect to www.vakintosh.com directly from the edge.
  • If the host is already www, the function rewrites pretty URLs (e.g. /blog → /blog/index.html) before forwarding to S3.
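The real function is JavaScript running at the CloudFront edge, but the branching is simple enough to mirror in a few lines of Python, purely as an illustration; the “has a file extension” check is my assumption about how pretty URLs get distinguished from asset requests:

```python
# Conceptual sketch of the viewer-request logic (the deployed version is
# a JavaScript CloudFront Function; this just mirrors the decision tree).

APEX = "vakintosh.com"

def handle_viewer_request(host: str, uri: str) -> dict:
    if host == APEX:
        # Apex hit: answer with a 301 straight from the edge,
        # without ever touching the S3 origin.
        return {"status": 301, "location": f"https://www.{APEX}{uri}"}
    # Already on www: rewrite pretty URLs so S3 finds a real object.
    if uri.endswith("/"):
        uri += "index.html"
    elif "." not in uri.rsplit("/", 1)[-1]:  # no file extension -> directory
        uri += "/index.html"
    return {"forward_uri": uri}

print(handle_viewer_request("vakintosh.com", "/blog"))
# {'status': 301, 'location': 'https://www.vakintosh.com/blog'}
print(handle_viewer_request("www.vakintosh.com", "/blog"))
# {'forward_uri': '/blog/index.html'}
```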

I figured I’d just do it manually in the Porkbun DNS console. Bad idea.

GitHub OIDC + AWS IAM + Terraform: A Practical Guide (and Pain Log)

I wanted to deploy my Hugo website using Terraform and GitHub Actions: securely, with least privilege, without Route 53, using my domain on Porkbun, and leveraging AWS Free Tier services.

Day 1 – AWS Account Setup + Role Plumbing

Started from scratch.

  • Created the AWS account
  • Set up MFA, secure root, all that
  • Made a single Admin IAM user (for CLI/debug, not daily use)

Then I created a role: GitHubAction-AssumeRoleWithAction.