The Complete Software Architecture Walkthrough

Published: at 07:00 AM (18 min read)

Most systems that fail in production don’t fail because of bad code. They fail because nobody asked the right questions before writing the first line. Software architecture is the discipline of asking those questions first, and letting the answers drive every technology and design decision that follows.

This article walks through the full architecture design process for Chatterly — a real-time team messaging and collaboration platform — using the methodology taught in Memi Lavi’s “The Complete Guide to Becoming a Great Software Architect” course. By the end, you’ll have a clear picture of how to move from a blank page to a production-ready system design, and more importantly, why each decision is made the way it is.


Why Architecture Matters Before a Single Line of Code

Skipping architecture is one of the most expensive mistakes a development team can make. When you jump straight to code, you’re making architectural decisions anyway — you’re just making them blindly, one feature at a time, with no coherent picture of the whole system. The cost of changing a foundational decision at month six is orders of magnitude higher than making it correctly at month zero.

The software architect’s job is not to write code. It’s to make the decisions that shape the system before code gets written — and to make those decisions based on evidence, not preference. That means starting with one thing above everything else: requirements.

The architect’s mindset: understand the problem completely before proposing a solution. Technology choices are outputs of requirements, not inputs.

What Should the System Do? Functional Requirements

Functional requirements describe the system’s capabilities — the concrete things users can do with it. They are the what. Before designing anything, you need a complete and agreed-upon list of these.

To identify functional requirements, ask questions like:

For Chatterly, the functional requirements are:

This list tells you what the system needs to do. But it tells you almost nothing about how hard it needs to work — and that’s where the second type of requirements comes in.

What Should the System Handle? Non-Functional Requirements

Non-functional requirements (NFRs) define the system’s operational characteristics — performance, reliability, scale, and availability. They are the how well. And unlike functional requirements, NFRs directly drive the architecture. They’re what force you to choose Redis over a database, or partitioned tables over a single flat one.

To identify non-functional requirements, ask questions like:

For Chatterly, the NFRs are defined by the product’s scale goals:

Requirement          Target
-------------------  -------------------------------------------------------
REST API reads       < 50ms (Redis cache hit)
REST API writes      < 100ms (PostgreSQL write + Redis pub/sub)
WebSocket delivery   < 30ms (Redis O(1) fan-out)
Auth check           < 5ms (Redis GET for JWT denylist)
Throughput           ~50,000 requests/second (horizontal Puma scaling)
Concurrent WS        ~1,000,000 connections (Redis pub/sub, Puma threads)
Message loss         0% (PostgreSQL ACID transactions)
SLA                  Platinum — high availability, zero data loss
Data volume          Hundreds of millions of rows (partitioned table)

Never start designing the architecture without a solid understanding of both requirement types. FRs tell you what to build. NFRs tell you how hard it has to work. The NFRs are what determine whether you need Redis or a simple database cache, whether you need WebSockets or polling, and whether you need table partitioning or a flat schema. Get these wrong and the architecture will fail under real-world conditions no matter how clean the code is.

Mapping the Architecture: From Requirements to Components

Once the requirements are locked, the architecture emerges from them — not from personal preference or familiarity with a specific tool. Each component in Chatterly’s architecture exists because a specific requirement demanded it.

The Big Picture

Here is the overall system architecture:

                     ┌─────────────────────────┐
                     │   Browser (Next.js)     │
                     │ REST + WebSocket Client │
                     └──────────┬──────────────┘

            REST API / HTTP     │     WebSocket
           ┌────────────────────┴──────────────────────┐
           │                                           │
           ▼                                           ▼
┌───────────────────────┐               ┌───────────────────────┐
│      Rails API        │               │     Action Cable      │
│  (REST, Port 3001)    │──────────────►│  (WebSocket Server)   │
└──────────┬────────────┘    Pub/Sub    └──────────┬────────────┘
           │                Broadcast              │
           │ SQL                                   │ Pub/Sub
           ▼                                       ▼
┌───────────────────────┐               ┌───────────────────────┐
│      PostgreSQL       │               │         Redis         │
│  (Primary Data Store) │               │  (Pub/Sub, JWT,       │
│                       │               │   Presence, Cache,    │
└──────────▲────────────┘               │   Rate Limiting)      │
           │                            └──────────┬────────────┘
           │ SQL                                   │ Queue
           │                                       │
┌──────────┴────────────┐                          │
│        Sidekiq        │◄─────────────────────────┘
│  (Background Jobs)    │
└───────────────────────┘

All services ──► Sentry (Error Tracking & Logging)

Three architectural principles are visible immediately from this diagram:

Loose Coupling — The Rails API never directly contacts WebSocket clients. It writes to Redis, Action Cable reads from Redis, and each component only knows about the message format, not about the other component. Swap out the Rails API, and Action Cable is unaffected. This is what loose coupling looks like in practice.

Stateless Services — No service holds session state in memory. JWT tokens are stored in Redis with a TTL denylist, user presence is tracked in Redis with auto-expiry, and all persistent data lives in PostgreSQL. This means any process can be terminated or replaced without data loss, which is the prerequisite for horizontal scaling.
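The JWT denylist lookup can be sketched in plain Ruby. This is an illustrative stand-in, not Chatterly's actual code: `FakeRedis` mimics only the `SETEX` and `GET` commands the pattern relies on (real code would use the `redis` gem against a live server), and the `denylist:<jti>` key format is an assumed convention.

```ruby
# Minimal in-memory stand-in for the two Redis commands the denylist
# pattern needs: SETEX (write with TTL) and GET (read with auto-expiry).
class FakeRedis
  def initialize
    @store = {} # key => [value, expires_at]
  end

  def setex(key, ttl_seconds, value)
    @store[key] = [value, Time.now + ttl_seconds]
  end

  def get(key)
    value, expires_at = @store[key]
    return nil if value.nil? || Time.now >= expires_at # expired keys vanish
    value
  end
end

REDIS = FakeRedis.new

# On logout: deny the token's ID (jti) for the rest of its lifetime.
# Once the JWT itself expires, the denylist entry expires with it.
def revoke_token(jti, seconds_until_expiry)
  REDIS.setex("denylist:#{jti}", seconds_until_expiry, "1")
end

# On every request: one Redis GET decides whether the JWT is still valid.
def token_allowed?(jti)
  REDIS.get("denylist:#{jti}").nil?
end

revoke_token("abc123", 3600)
puts token_allowed?("abc123") # false (revoked)
puts token_allowed?("xyz789") # true  (never revoked)
```

Because the denylist lives in Redis rather than in any one process's memory, every Puma worker gives the same answer, which is exactly what statelessness requires.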

Separation of Concerns — Each component has a single, clearly bounded job: Rails API handles data mutations, Action Cable handles push delivery, Sidekiq handles async work, PostgreSQL stores data, Redis handles ephemeral state. No component reaches outside its boundary.

Rails API — The REST Service

Role

The Rails API is the single entry point for all HTTP requests. It handles authentication, conversation management, message creation, reactions, and call session lifecycle. It is the only component that writes to the database. After every write, it publishes an event to Redis — it doesn’t push to clients directly, it just fires the signal and moves on.

This separation is intentional: the Rails API must remain fast and predictable. Coupling it to WebSocket delivery would mean its response time depends on the health of client connections, which is unacceptable.

Technology Stack

Architecture

┌───────────────────────────────────────┐ ┌─────────────┐
│          Service Interface            │ │             │
│  (Controllers, Routes, Rack::Cors,    │ │             │
│   Devise-JWT middleware)              │ │             │
├───────────────────────────────────────┤ │   Logging   │
│           Business Logic              │ │   (Sentry)  │
│  (ActiveRecord Models, Pundit         │ │             │
│   Policies, Callbacks, Validations)   │ │             │
├───────────────────────────────────────┤ │             │
│            Data Access                │ │             │
│  (ActiveRecord queries, Blueprinter   │ │             │
│   Serializers, Redis cache layer)     │ │             │
└───────────────────────────────────────┘ └─────────────┘

              [PostgreSQL]

Every HTTP request flows top to bottom. The Service Interface layer receives the request, checks CORS headers, decodes the JWT, and routes to the correct controller — which contains as little logic as possible. The Business Logic layer is where validations, Pundit policy checks, and after_create_commit callbacks live — the callback is what fires the Redis broadcast after a write. The Data Access layer runs the ActiveRecord query and passes results through a Blueprinter serializer before returning them.

Logging (Sentry) is a cross-cutting concern — it’s accessible by all layers and captures every exception automatically.

Principle in focus — Caching: The Data Access layer includes a Redis cache. Read operations that would otherwise hit PostgreSQL are served from Redis at < 50ms. The cache is populated on write and invalidated when the underlying data changes. This is what makes the < 50ms read SLA achievable without a read replica.
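The read path above can be sketched as a cache-aside variant (populate on miss, invalidate on write); the article describes populating on write, but the invariant is the same: reads never serve stale data after a change. `FakeCache`, the `DB` hash, and the key names are in-memory stand-ins invented for this example.

```ruby
# Illustrative cache-aside read/write path. FakeCache stands in for
# Redis; the DB hash stands in for the ActiveRecord-backed table.
class FakeCache
  def initialize; @store = {}; end
  def get(key); @store[key]; end
  def set(key, value); @store[key] = value; end
  def del(key); @store.delete(key); end
end

CACHE = FakeCache.new
DB    = { 42 => { id: 42, title: "architecture-chat" } } # stand-in table

def fetch_conversation_from_db(id)
  DB[id]
end

# Read: serve from cache when possible; on a miss, hit the database
# and populate the cache for the next reader.
def read_conversation(id)
  key = "conversation:#{id}"
  cached = CACHE.get(key)
  return cached if cached

  record = fetch_conversation_from_db(id)
  CACHE.set(key, record)
  record
end

# Write: update the database, then invalidate the now-stale cache
# entry so the next read repopulates it with fresh data.
def update_conversation(id, title)
  DB[id][:title] = title
  CACHE.del("conversation:#{id}")
end

read_conversation(42)              # miss: populates the cache
update_conversation(42, "renamed") # write: invalidates the entry
puts read_conversation(42)[:title] # "renamed", from a fresh read
```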

Development Instructions


Action Cable — The Real-Time Push Layer

Role

Action Cable’s job is delivery, nothing more. It maintains persistent WebSocket connections with clients and pushes events to them the moment they occur. It never writes to the database. It never makes business logic decisions.

When the Rails API publishes an event to Redis after a write, Action Cable receives it via pub/sub and forwards it to all subscribed clients on the appropriate channel. This architecture enables real-time fan-out across thousands of connected clients without any polling. The client never has to ask — the server just tells it.

Technology Stack

Architecture

┌───────────────────────────────────────┐ ┌─────────────┐
│             Connection                │ │             │
│  (JWT auth, current_user identity,    │ │             │
│   reject_unauthorized_connection)     │ │             │
├───────────────────────────────────────┤ │   Logging   │
│              Channels                 │ │   (Sentry)  │
│  (ConversationChannel — messages,     │ │             │
│   PresenceChannel — online/offline,   │ │             │
│   CallChannel — WebRTC signaling)     │ │             │
├───────────────────────────────────────┤ │             │
│        Redis Pub/Sub Handler          │ │             │
│  (stream_from, broadcast_to,          │ │             │
│   Redis INCR rate limiting)           │ │             │
└───────────────────────────────────────┘ └─────────────┘

                 [Redis]

The Connection layer authenticates every WebSocket handshake — decoding the JWT, checking the Redis denylist, setting current_user. If auth fails, the connection is rejected before any subscription is allowed. The Channels layer is where the three channel classes live: ConversationChannel streams message events per conversation, PresenceChannel broadcasts online/offline state globally, CallChannel relays WebRTC signaling (SDP offers, answers, ICE candidates) between specific users. The Redis Pub/Sub Handler manages stream subscriptions and rate limiting via Redis INCR + TTL.
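The INCR + TTL rate limiting mentioned above follows the fixed-window pattern. Here is a minimal sketch under stated assumptions: `FakeRedis` stands in for the two Redis commands involved, and the limit of 5 actions per 60-second window is an invented example value, not Chatterly's configured limit.

```ruby
# Fixed-window rate limiter using the INCR + EXPIRE pattern.
# FakeRedis mimics just those two commands in memory.
class FakeRedis
  def initialize
    @counters = {} # key => [count, window_expires_at]
  end

  # INCR with lazy expiry: once the window has passed, the counter
  # resets, exactly as a real Redis key would after its TTL.
  def incr(key)
    count, expires_at = @counters[key]
    if count.nil? || (expires_at && Time.now >= expires_at)
      @counters[key] = [1, nil]
      1
    else
      @counters[key][0] += 1
    end
  end

  def expire(key, ttl_seconds)
    @counters[key][1] = Time.now + ttl_seconds
  end
end

REDIS = FakeRedis.new
LIMIT = 5 # assumed: 5 actions per 60-second window

def within_rate_limit?(user_id)
  key = "rate:#{user_id}"
  count = REDIS.incr(key)
  REDIS.expire(key, 60) if count == 1 # first hit starts the window
  count <= LIMIT
end

results = (1..6).map { within_rate_limit?(9) }
puts results.inspect # first five are true, the sixth is false
```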

Principle in focus — Pub/Sub Messaging: The Rails API and Action Cable are completely decoupled. When Rails API publishes to a Redis channel, it doesn’t know Action Cable exists. When Action Cable subscribes to that channel, it doesn’t know Rails API exists. They communicate only through the message format they’ve agreed on. This means either side can be redeployed, restarted, or replaced independently.
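The decoupling can be demonstrated with a toy in-process message bus. In production the bus is Redis pub/sub crossing process boundaries; here an in-memory class plays that role, and the channel name and event shape are invented for illustration.

```ruby
# Toy message bus: publisher and subscriber share only a channel
# name and a payload format. Neither holds a reference to the other.
class MessageBus
  def initialize
    @subscribers = Hash.new { |hash, channel| hash[channel] = [] }
  end

  def subscribe(channel, &handler)
    @subscribers[channel] << handler
  end

  def publish(channel, payload)
    @subscribers[channel].each { |handler| handler.call(payload) }
  end
end

BUS = MessageBus.new
delivered = []

# "Action Cable" side: knows only the channel name and payload shape.
BUS.subscribe("conversation:7") { |event| delivered << event }

# "Rails API" side: fires the event after a write and moves on,
# with no knowledge of who (if anyone) is listening.
BUS.publish("conversation:7", { type: "message.created", body: "hi" })

puts delivered.length # 1: the event reached the subscriber
```

Replace either side and the other keeps working unchanged, which is the property that lets the two services be deployed and restarted independently.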

Development Instructions


Sidekiq — The Background Processor

Role

Sidekiq handles work that cannot happen synchronously in the request-response cycle. It polls a Redis-backed job queue and processes jobs asynchronously. It does not expose an API — it listens.

The primary use case in Chatterly is the missed call job. When a call is initiated and the callee doesn’t respond within 30 seconds, Sidekiq transitions the call session from ringing to missed. This logic is time-delayed — it can’t live in an HTTP request cycle because no request triggered it. It must be scheduled at call creation time and executed independently.

This pattern keeps the Rails API non-blocking while ensuring time-sensitive state transitions happen reliably.

Technology Stack

Architecture

┌───────────────────────────────────────┐ ┌─────────────┐
│              Polling                  │ │             │
│  (Sidekiq process polls Redis queue   │ │             │
│   for enqueued jobs continuously)     │ │             │
├───────────────────────────────────────┤ │   Logging   │
│           Business Logic              │ │   (Sentry)  │
│  (Job classes: MissedCallJob,         │ │             │
│   future notification jobs, etc.)     │ │             │
├───────────────────────────────────────┤ │             │
│            Data Access                │ │             │
│  (ActiveRecord — reads and updates    │ │             │
│   CallSession and related models)     │ │             │
└───────────────────────────────────────┘ └─────────────┘

              [PostgreSQL]

The Polling layer is managed by Sidekiq itself — it continuously dequeues jobs and dispatches them to the appropriate worker class. Sidekiq handles concurrency, automatic retries on failure, and dead-letter queuing for jobs that exhaust their retries. The Business Logic layer contains the job classes, each with a single perform method. The Data Access layer uses ActiveRecord to read and write to PostgreSQL.

Principle in focus — Async Messaging via Queue: The Rails API enqueues the MissedCallJob with perform_in(30.seconds, call_session_id) and immediately returns a response to the client. The job sits in the Redis queue until 30 seconds pass, then Sidekiq picks it up. If Sidekiq restarts before the job runs, the job survives in Redis and is processed when Sidekiq comes back up. The queue provides durability that a simple sleep or in-memory timer cannot.
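The guard logic of that job can be sketched as follows. `CallSession` and `SESSIONS` are in-memory stand-ins for the ActiveRecord model; only the conditional transition mirrors what a real `MissedCallJob#perform` would do when Sidekiq runs it 30 seconds after call creation.

```ruby
# Stand-in for the CallSession model and its table.
CallSession = Struct.new(:id, :status)

SESSIONS = {
  1 => CallSession.new(1, "ringing"),  # callee never answered
  2 => CallSession.new(2, "answered"), # callee picked up in time
}

class MissedCallJob
  # In Sidekiq this method would be invoked ~30 seconds after
  # MissedCallJob.perform_in(30.seconds, call_session_id).
  def perform(call_session_id)
    session = SESSIONS[call_session_id]
    # Transition only if the call is *still* ringing. If it was
    # answered (or ended) in the meantime, the job is a safe no-op.
    session.status = "missed" if session.status == "ringing"
  end
end

job = MissedCallJob.new
job.perform(1)
job.perform(2)
puts SESSIONS[1].status # "missed"
puts SESSIONS[2].status # "answered" (untouched)
```

The conditional check is what makes the delayed job idempotent and safe: it is always enqueued, but it only has an effect in the one state where a transition is actually needed.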

Development Instructions


Next.js Client — The Frontend

Role

The Next.js Client is the only component the end user directly interacts with. It renders the UI, manages client-side state, communicates with the Rails API over HTTP, maintains a real-time WebSocket connection to Action Cable, and handles peer-to-peer WebRTC connections for voice and video calls.

The client holds no persistent business data. Its state is ephemeral — rebuilt from the Rails API on each session. This is the client-side expression of the stateless principle.

Technology Stack

Architecture

┌────────────────────────────────────────────────────────┐
│                      UI Layer                          │
│  (Next.js pages, React components: MessageList,        │
│   Sidebar, CallOverlay, IncomingCallModal, Avatar,     │
│   PresenceIndicator, etc.)                             │
├────────────────────────────────────────────────────────┤
│                    State Layer                         │
│  (Zustand stores: authStore, conversationStore,        │
│   callStore — custom hooks: useConversation,           │
│   usePresence, useCall)                                │
├────────────────────────────────────────────────────────┤
│                  API Client Layer                      │
│  (lib/api.ts — Axios instance with JWT interceptor,    │
│   lib/cable.ts — Action Cable consumer singleton,      │
│   lib/webrtc.ts — RTCPeerConnection helpers)           │
└────────────────────────────────────────────────────────┘
         │ REST / HTTP              │ WebSocket + WebRTC
    [Rails API]             [Action Cable + Browser P2P]

The UI Layer contains all pages and components. Components are pure rendering — they read from Zustand stores and call store actions, with no direct API calls. The State Layer contains the stores and custom hooks that bridge the store to the API Client Layer, subscribing to Action Cable channels and updating state when WebSocket events arrive. The API Client Layer holds the low-level utilities: the Axios instance, the Action Cable consumer, and WebRTC helpers.

Principle in focus — Stateless Client: The client holds no persistent state between sessions. authStore, conversationStore, and callStore are all in-memory Zustand stores that start empty on page load and are populated entirely from API responses and WebSocket events. This means any tab refresh, any network interruption, any re-login starts from a clean, consistent server state — no stale or conflicting local data.

Development Instructions


Architecture Principles in Practice

Looking across the entire system, the same set of principles appears again and again. Here’s where each one shows up concretely in Chatterly:

Loose Coupling — The Rails API publishes to a Redis channel. Action Cable subscribes to that channel. Neither knows the other exists. Swap either component and the other is unaffected. This is the single most important principle in distributed systems design.

Stateless Services — Every Puma worker process is interchangeable. JWT verification uses Redis, not in-memory sessions. Presence tracking uses Redis TTL, not in-process timers. Any process can be killed and replaced without data loss or session interruption.

Caching — Redis sits between the Rails API and PostgreSQL for read operations. Hot data (conversations, user profiles, presence) is served from memory at < 50ms without touching the database. The cache is populated on write and invalidated on update.

Async Messaging via Queue — The MissedCallJob pattern shows why queues exist: when you need something to happen later, reliably, without blocking the current request, a queue is the answer. The Rails API doesn’t wait. Sidekiq processes it when the time comes. If Sidekiq goes down, the job survives.

Separation of Concerns — Each component has one job, one technology choice, and one scaling axis. Rails API scales horizontally by adding Puma processes. Action Cable scales by adding Redis pub/sub subscribers. Sidekiq scales by adding worker processes. PostgreSQL scales with a read replica. No component’s scaling affects another.

ACID Guarantees — PostgreSQL enforces transactional integrity on every write. A message creation either fully succeeds or fully rolls back — no partial state, no lost records. This is what the 0% message loss SLA is built on.

Time-based Partitioning — The messages table is partitioned by month at the PostgreSQL level. Queries against recent messages only scan the current month’s partition, not hundreds of millions of historical rows. This is the difference between a 2ms query and a 2-second query at scale.
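The monthly-partition scheme can be sketched like this. The `messages_YYYY_MM` naming convention and the generated DDL are illustrative assumptions for the sketch; the actual partition definitions live in PostgreSQL (`PARTITION BY RANGE (created_at)`), not in application code.

```ruby
require "date"

# Which monthly partition a row with this timestamp lands in,
# under an assumed messages_YYYY_MM naming convention.
def partition_for(created_at)
  format("messages_%04d_%02d", created_at.year, created_at.month)
end

# The DDL a new monthly partition would need: a half-open range
# [first of month, first of next month).
def create_partition_sql(year, month)
  from = Date.new(year, month, 1)
  to   = from.next_month
  <<~SQL
    CREATE TABLE #{format('messages_%04d_%02d', year, month)}
      PARTITION OF messages
      FOR VALUES FROM ('#{from}') TO ('#{to}');
  SQL
end

puts partition_for(Date.new(2024, 5, 17)) # messages_2024_05
puts create_partition_sql(2024, 5)
```

Because queries for recent messages carry a `created_at` predicate, the planner prunes every partition outside the requested range and scans only the current month's rows.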

Download the Architecture Document

The full Chatterly architecture document — formatted as a standalone, professional reference — is available as a PDF below. Use it as a template for your own system architecture documentation.

Download: Chatterly Architecture Document (PDF)

Conclusion

Software architecture is not about choosing the most interesting technology. It’s about understanding what the system needs to do and how hard it needs to work, then making the minimum set of decisions that satisfy those constraints reliably. Every component in Chatterly exists because a specific requirement demanded it. Every technology choice has a documented reason. Every architectural principle has a concrete expression in the codebase.

That’s what good architecture looks like — not clever, just clear.

Thank you for reading, and happy coding!