Stop Stacking HATs: The AS/400 Was Right and Your Cyberdeck Is Wrong
- Patrick Duggan
- 8 min read
The Pi community learned the wrong lesson from modularity. We looked at the 40-pin GPIO header, saw HATs clicking into place like Lego bricks, and decided that stacking four of them was the path to performance. UPS HAT on the bottom. M.2 HAT on top. Camera HAT above that. SDR HAT on top of that. Tower of power, cables stuffed between the layers, heat trapped in the middle, I/O fighting for attention on a shared bus.
It works. Barely. And it's the exact architectural mistake the PC industry made in the 1990s and spent the next twenty years unwinding.
Modern laptops don't ship with a modem card, a sound card, a graphics card, and a Wi-Fi card on separate PCI slots anymore. They ship with a single SoC that integrates all of those functions, a PMIC that orchestrates power across the whole package, engineered thermal paths that carry heat where it actually needs to go, and on-die fabric handling the I/O arbitration that once took a separate north and south bridge. The integration is the performance.
The Pi community is rediscovering this slowly, and the rediscovery has a name. It's called the carrier board. And it's the AS/400 architecture coming back to edge computing forty years late.
The Unbundled Pi Stack
Take a real cyberdeck requirement: Pi 5 main compute, battery backup, NVMe storage, a camera with AI acceleration, and an SDR module. The conventional approach stacks four HATs:
- A Geekworm or SupTronics UPS HAT for battery
- A Raspberry Pi M.2 HAT+ for the NVMe drive
- A Pi AI Kit with a Hailo-8L M.2 module for acceleration
- An SDR HAT or USB-C dongle for radio
Each HAT does one thing. Each HAT makes independent assumptions about power delivery. Each HAT fights with the others for the 40-pin header's shared I²C and SPI buses. The Pi 5 exposes a single PCIe lane, so the M.2 slot and the Hailo can't coexist without a switch chip. The stacked boards trap heat in the middle of the pile. And the total power draw approaches the Pi 5's 5V 5A envelope before you add a single peripheral.
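The power math alone is damning. A back-of-the-envelope budget, with per-board draw figures that are illustrative assumptions rather than datasheet values, shows how quickly the stack eats the Pi 5's 25-watt envelope:

```python
# Back-of-the-envelope power budget for the stacked-HAT build.
# All draw figures below are illustrative assumptions, not measured values.
ENVELOPE_W = 5.0 * 5.0  # Pi 5 official supply: 5 V at 5 A = 25 W

draws_w = {
    "Pi 5 under load":  8.0,
    "UPS HAT overhead": 1.0,
    "NVMe drive":       4.0,
    "Hailo-8L module":  3.0,
    "SDR module":       3.0,
    "display + misc":   4.0,
}

total = sum(draws_w.values())
headroom = ENVELOPE_W - total
print(f"budget {ENVELOPE_W:.0f} W, drawing {total:.0f} W, headroom {headroom:.0f} W")
```

Two watts of headroom on paper, and that's before any transient spikes; the carrier-board approach budgets this once, at design time, instead of hoping four vendors' assumptions happen to sum correctly.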
This is the "separate sound card and modem" era of Pi design. It works the same way Windows 95 worked with three expansion cards. The integration cost is paid by you.
What Laptops Know That HATs Don't
Open a modern ThinkPad or a MacBook and you will not find stacked PCBs with 40-pin connectors between them. You will find a single main board with a unified power distribution network, a Power Management IC orchestrating CPU states, GPU states, display brightness, battery charging, and USB-C PD negotiation. You will find thermal design that routes heat through vapor chambers or heat pipes from the silicon to dedicated fin stacks, sized for the specific power envelope. You will find a unified firmware that knows about every component because they were designed together.
That engineering is why a fifteen-inch laptop with a 100-watt chip and a high-DPI screen runs cooler, quieter, and longer than a stack of expansion cards doing the same work.
The laptop doesn't win because its CPU is faster. It wins because the architecture eliminates the friction between components. One power plane instead of four daisy-chained. One thermal solution instead of stacked boards. One firmware instead of four bootloaders pretending the others don't exist.
The AS/400 Did It Forty Years Ago
Before laptops existed, IBM was already forty years ahead of this conversation. The AS/400 shipped in 1988 with an architecture that the rest of the industry is still catching up to. Three ideas that matter here:
Dedicated coprocessors for specialized work. The AS/400 didn't run the database on the same silicon as the I/O. It shipped with a dedicated database coprocessor. Dedicated I/O coprocessors. Dedicated cryptographic coprocessors. The service processor orchestrated; it didn't do the heavy lifting. That philosophy is why an AS/400 from 1995 could outrun a similarly-priced Unix server doing the same OLTP workload.
The Technology Independent Machine Interface. TIMI was the contract. Applications were compiled to the TIMI abstraction, which sat above the actual hardware. When IBM moved from CISC to PowerPC RISC, not a single customer application needed to be rewritten. The interface was the durable thing. The silicon underneath was a swappable implementation detail. This is what we call "platform abstraction" today, but IBM shipped it with a working business on top of it four decades ago.
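The TIMI idea translates directly into modern terms. A minimal sketch, with hypothetical names standing in for IBM's actual interface: applications code against a stable abstraction, and the backend underneath can be swapped without touching them.

```python
from abc import ABC, abstractmethod

class MachineInterface(ABC):
    """The durable contract applications target -- the TIMI role."""

    @abstractmethod
    def execute(self, instruction: str) -> str: ...

class CISCBackend(MachineInterface):
    """The original IMPI-era silicon, as a swappable implementation."""
    def execute(self, instruction: str) -> str:
        return f"IMPI microcode ran {instruction}"

class PowerPCBackend(MachineInterface):
    """The 1995 RISC replacement -- same contract, different silicon."""
    def execute(self, instruction: str) -> str:
        return f"PowerPC AS ran {instruction}"

def application(machine: MachineInterface) -> str:
    # Application code never names the backend; swapping CISC for RISC
    # requires zero changes here. That is the whole trick.
    return machine.execute("OPNQRYF")

print(application(CISCBackend()))
print(application(PowerPCBackend()))
```

The application function is the customer code that survived the 1995 migration untouched; only the class handed to it changed.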
Object-based operating system. Every entity in the AS/400 had a type, attributes, and a defined set of permitted operations. Files were objects. Database records were objects. Jobs were objects. The OS enforced type safety at runtime, not at source-compile time. This is what modern typed document stores like Meilisearch and object storage APIs do; we just re-invented it with more syllables.
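The object model is just as easy to sketch. A hypothetical miniature of the idea, with made-up type names in the AS/400 style: every entity carries a type, and the system, not the application, decides which operations that type permits, at runtime.

```python
# Hypothetical sketch of an object-based OS contract: every entity has a
# type and a closed set of permitted operations, enforced at runtime.
PERMITTED_OPS = {
    "*FILE": {"read", "write", "delete"},
    "*JOB":  {"hold", "release", "end"},
}

class OSObject:
    def __init__(self, name: str, obj_type: str):
        if obj_type not in PERMITTED_OPS:
            raise TypeError(f"unknown object type {obj_type}")
        self.name = name
        self.obj_type = obj_type

    def perform(self, op: str) -> str:
        # The OS, not the caller, decides whether this operation is
        # legal for this object's type -- type safety at runtime.
        if op not in PERMITTED_OPS[self.obj_type]:
            raise PermissionError(f"{op} not permitted on {self.obj_type}")
        return f"{op} {self.name}"

payroll = OSObject("PAYROLL", "*FILE")
print(payroll.perform("read"))   # legal: *FILE permits read
# payroll.perform("hold")        # would raise PermissionError: jobs hold, files don't
```

You cannot "hold" a file or "read" a job, no matter what the calling program thinks it is doing.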
Put the three together and you get an architecture where each job runs on silicon specifically designed for that job, applications don't care what generation of silicon underlies them, and the whole system composes because everything follows a uniform contract.
The modern Pi+HATs stack has none of these properties. There is no coprocessor specialization; everything fights for the main CPU. There is no stable machine interface; upgrade from Pi 4 to Pi 5 and half the HATs need kernel driver updates. There is no object model; HATs are fire-and-forget GPIO with vendor-specific daemons talking to arbitrary userspace tools.
The Carrier Board Is The Answer
The fix is not a better HAT. The fix is to stop stacking HATs.
Every serious industrial Pi deployment starts with the Raspberry Pi Compute Module 5. The CM5 is a Pi 5 without the connectors, reduced to a compact module that mates with a custom carrier board through high-density board-to-board connectors. The carrier board is where the integration lives. Power management, thermal design, peripheral selection, I/O routing, connector placement — all decided as one coherent system instead of a stack of independent vendors.
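Even the software side of the carrier reflects this. On Raspberry Pi OS, bringing up a carrier's PCIe link is typically a couple of lines in `config.txt`; the fragment below is the Pi 5-family form, and your carrier's documentation is the authority on what its routing actually supports:

```ini
# /boot/firmware/config.txt -- enable the PCIe lane the carrier exposes
dtparam=pciex1
# optionally force Gen 3 signalling if the carrier's trace routing supports it
dtparam=pciex1_gen=3
```

One config file, decided by whoever designed the carrier, instead of four HAT vendors each shipping their own overlay.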
This is why the ClockworkPi uConsole exists. It's a CM5 (or CM4) on a custom carrier with a 5-inch display, a mechanical keyboard, a battery sled, speakers, and a modular expansion bay all designed together. No 40-pin stack. No trapped heat. No I/O arbitration between vendors. One PCB, one thermal solution, one firmware. The result fits in a bag, boots in seconds, runs Kali Linux, and is up and running in 30 minutes, while the Pi-plus-four-HATs build takes a weekend to assemble and still hasn't solved its thermal problem.
The uConsole is the obvious example. The less obvious examples are everywhere. Every serious edge AI product — from Seeed's reTerminal to Radxa's Rock 5 ITX carriers to the carrier boards Luxonis ships for their OAK-SoM — is doing the same thing. The compute module becomes the CPU complex, and the carrier becomes the motherboard. It's a PC architecture at Pi scale, which is what the AS/400 told us the architecture should be in the first place.
Applied To Edge AI In 2026
The interesting thing about the current edge AI hardware market is that almost every accelerator has its own silicon doing non-trivial orchestration. You don't have to go to a full custom carrier to get the AS/400 philosophy; you can buy it in discrete components and let the main compute act as the service processor.
A Luxonis OAK-D Lite has a Movidius Myriad X VPU inside. The camera does object detection, depth estimation, pose tracking, and face identification on its own silicon. The Pi never sees a raw frame; it receives structured detection results. That's a dedicated vision coprocessor in the AS/400 sense, delivered as a USB-C peripheral.
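The host-side contract is worth seeing. This is a hypothetical sketch, not the real DepthAI API, modeling the coprocessor's output as a JSON envelope for illustration: the host parses labels and boxes, never pixels.

```python
import json

def parse_detections(payload: str, min_conf: float = 0.5) -> list[dict]:
    """The host's whole job: filter structured results. No frames, no inference."""
    msg = json.loads(payload)
    return [d for d in msg["detections"] if d["confidence"] >= min_conf]

# A message as the VPU might emit it (illustrative values, hypothetical schema).
wire = json.dumps({
    "detections": [
        {"label": "person", "confidence": 0.91, "bbox": [120, 40, 310, 400]},
        {"label": "chair",  "confidence": 0.32, "bbox": [400, 200, 520, 380]},
    ]
})

for det in parse_detections(wire):
    print(det["label"], det["confidence"])
```

The Pi's CPU cost here is a JSON parse and a list comprehension. The Myriad X paid for everything else.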
A Hailo-10H M.2 module runs transformer inference at 40 TOPS and pulls under 5 watts. It's a dedicated NPU coprocessor that fits in the same M.2 slot that would otherwise just hold storage. Drop it in a carrier that has the slot, and the main CPU becomes an orchestrator instead of a bottleneck.
A ReSpeaker 6-Mic Array has an XMOS XU316 DSP doing beamforming, noise cancellation, and keyword spotting. The host CPU never touches the audio pipeline. When the keyword fires, the DSP wakes the host. Otherwise the host sleeps.
A Proxmark3 RDV4 has a Spartan-6 FPGA doing all the RFID heavy lifting. A QMK mechanical keyboard has an RP2040 or STM32 running macro scripts, FIDO2 authentication, and custom layers. A u-blox ZED-F9P GPS module does multi-constellation RTK positioning internally and hands the host centimeter-accurate coordinates.
Each one of those is a dedicated coprocessor optimized for its domain. Each one takes workload off the main CPU. Each one has its own power envelope, its own firmware, and its own contract with the host. The host is the service processor; the peripherals are the AS/400 coprocessors, forty years later, at edge scale.
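The service-processor pattern this implies can be sketched in a few lines. Everything here is hypothetical and illustrative: the host blocks on an event queue, and a coprocessor (here, the mic array's DSP firing on a keyword) wakes it only when there is real work to route.

```python
import queue

# The host's inbox. Coprocessors push events; the host sleeps until one arrives.
events: "queue.Queue[tuple[str, dict]]" = queue.Queue()

def dsp_keyword_fired(word: str) -> None:
    # Stand-in for the mic array's DSP: it detected the wake word on its
    # own silicon, so now -- and only now -- it wakes the host.
    events.put(("keyword", {"word": word}))

def host_loop(max_events: int) -> list[str]:
    """The service processor: route work to coprocessors, never do it."""
    handled = []
    for _ in range(max_events):
        source, payload = events.get()  # host blocks (sleeps) here
        if source == "keyword":
            handled.append(f"route audio to NPU after '{payload['word']}'")
    return handled

dsp_keyword_fired("deck")
print(host_loop(1))
```

The host's duty cycle is the queue's duty cycle: zero CPU between events, which is exactly how the AS/400's service processor treated its coprocessors.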
The result is a "cyberdeck" — or a threat intelligence appliance, or a medical device, or a wearable vision augmentation rig — that outperforms a similarly-priced laptop at specialized workloads because it's not fighting for shared resources. Twelve pieces of specialized silicon, each doing one thing supremely well, coordinated by a main compute that doesn't drown.
The Wearable Analog
The same philosophy applies to any system where the main compute is constrained. Take a neck-worn Android device driving a Viture XR headset. The Android SoC has a modest NPU — enough for light inference, not enough to run a vision pipeline and a speech pipeline and a scene-understanding model in parallel. Stack more work on the main CPU and the battery dies in two hours.
The answer is not a better phone. The answer is USB-C peripherals that each carry their own silicon, playing the HAT role without the header. An OAK-D Lite handling camera-side inference. A Coral or Hailo dongle running Whisper locally. A DSP-backed mic array doing beamforming before the audio hits the phone. The neck-Android orchestrates; the peripherals do the work. Same AS/400 architecture, scaled down to something you wear.
This is how a patent-filed wearable medical device for visual distortion correction actually becomes clinically viable. Without specialized peripherals, the phone-class SoC can't keep up with the perception pipeline and still have battery life for a day in the real world. With specialized peripherals, the phone sleeps 80% of the time, the peripherals carry their own thermal and power budgets, and the device actually works.
The Infrastructure Consequence
The interesting part for anyone building platforms, not just hardware, is that the AS/400 philosophy applies to software architecture too. If you believe specialized silicon is the answer at the component level, the same logic says your cloud infrastructure should be a constellation of specialized services — not a monolith pretending to be microservices on identical VMs.
A threat intelligence platform doesn't want one giant container running the STIX feed, the search index, the behavioral scorer, the AI council, the blog, and the customer dashboard. It wants a STIX feed appliance with read-optimized caching silicon. A search appliance with dedicated NVMe and index-specific compute. A behavioral scoring service running on an NPU that handles bloom-filter correlation in microseconds. An AI council with its own inference hardware. Each one is a coprocessor. The orchestration layer is the service processor. The uniform contract between them is TIMI, delivered as a stable REST API.
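"TIMI delivered as a stable REST API" reduces to one idea: every service speaks the same envelope, so the orchestrator never grows service-specific logic. A minimal sketch, with hypothetical service names and fields:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Envelope:
    """The uniform contract every specialized service answers with."""
    service: str
    status: str
    body: dict

# Each "coprocessor" service registers a handler behind the same shape.
# The handlers are stubs; real ones would front Meilisearch, the STIX
# feed, the scorer, and so on.
REGISTRY: dict[str, Callable[[dict], dict]] = {
    "search": lambda req: {"hits": []},
    "scorer": lambda req: {"score": 0.0},
    "stix":   lambda req: {"objects": []},
}

def dispatch(service: str, request: dict) -> Envelope:
    # The orchestration layer is the service processor: it routes
    # requests, it does not do the heavy lifting.
    if service not in REGISTRY:
        return Envelope(service, "unknown-service", {})
    return Envelope(service, "ok", REGISTRY[service](request))

print(dispatch("search", {"q": "apt29"}).status)
```

Swap Meilisearch out from behind the `search` handler and nothing upstream notices; that's the durable-interface, swappable-silicon property in software.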
This is exactly how DugganUSA is architected, whether we described it that way or not. Meilisearch is the database coprocessor. The STIX feed endpoints are the I/O coprocessor. The behavioral scorer is the crypto-signature coprocessor in spirit. The uniform API is TIMI. The 275+ consumers pulling our feeds across 46 countries don't know or care that the implementation moved from a single Azure Container App to a constellation of edge appliances. The contract is stable. The silicon underneath is swappable.
The Point
Stop stacking HATs. Start thinking about carriers. Every component in the system, whether software or silicon, should carry its own specialized silicon or code path for its domain. The main compute exists to orchestrate, not to do the heavy lifting. Uniform interfaces should be the durable thing, and the implementation underneath should be a swappable detail.
IBM shipped this in 1988 and the industry spent the next forty years rediscovering it by accident. The maker community is about to discover it again as the first generation of CM5 carrier boards and Hailo-10H M.2 modules and OAK-D Lite cameras start to replace the HAT stack with a real bundled-compute architecture.
The laptop was right. The AS/400 was righter. And the edge AI market in 2026 is finally building hardware that admits it.
— Patrick
