Most systems that fail in production don’t fail because of bad code. They fail because nobody asked the right questions before writing the first line. Software architecture is the discipline of asking those questions first, and letting the answers drive every technology and design decision that follows.
This article walks through the full architecture design process for Chatterly — a real-time team messaging and collaboration platform — using the methodology taught in Memi Lavi’s “The Complete Guide to Becoming a Great Software Architect” course. By the end, you’ll have a clear picture of how to move from a blank page to a production-ready system design, and more importantly, why each decision is made the way it is.
Table of contents
- Why Architecture Matters Before a Single Line of Code
- What Should the System Do? Functional Requirements
- What Should the System Handle? Non-Functional Requirements
- Mapping the Architecture: From Requirements to Components
- Architecture Principles in Practice
- Download the Architecture Document
- Conclusion
Why Architecture Matters Before a Single Line of Code
Skipping architecture is one of the most expensive mistakes a development team can make. When you jump straight to code, you’re making architectural decisions anyway — you’re just making them blindly, one feature at a time, with no coherent picture of the whole system. The cost of changing a foundational decision at month six is orders of magnitude higher than making it correctly at month zero.
The software architect’s job is not to write code. It’s to make the decisions that shape the system before code gets written — and to make those decisions based on evidence, not preference. That means starting with one thing above everything else: requirements.
The architect’s mindset: understand the problem completely before proposing a solution. Technology choices are outputs of requirements, not inputs.
What Should the System Do? Functional Requirements
Functional requirements describe the system’s capabilities — the concrete things users can do with it. They are the what. Before designing anything, you need a complete and agreed-upon list of these.
To identify functional requirements, ask questions like:
- What are the core user actions in the system?
- What data does the system create, read, update, and delete?
- What events does the system need to react to in real time?
- Are there different user roles with different permissions?
- What external integrations or third-party services are required?
For Chatterly, the functional requirements are:
- Register and authenticate users securely using JWT-based authentication
- Organize conversations into three types: channels (public), groups (private), and direct messages
- Send and receive real-time messages within conversations
- Support message threading via parent-message references
- Support soft deletion of messages (deleted content hidden, record preserved)
- Allow users to react to messages with emoji reactions
- Track and broadcast user online/offline presence in real time
- Initiate and manage peer-to-peer voice and video calls between users
- Support a full call lifecycle: calling, ringing, active, ended, declined, and missed states
- Expose a REST API for all data mutations and queries
- Broadcast real-time events (messages, presence, call signals) over WebSocket connections
This list tells you what the system needs to do. But it tells you almost nothing about how hard it needs to work — and that’s where the second type of requirements comes in.
What Should the System Handle? Non-Functional Requirements
Non-functional requirements (NFRs) define the system’s operational characteristics — performance, reliability, scale, and availability. They are the how well. And unlike functional requirements, NFRs directly drive the architecture. They’re what force you to choose Redis over a database, or partitioned tables over a single flat one.
To identify non-functional requirements, ask questions like:
- What’s the expected data growth rate in the first year?
- How many concurrent users will the system handle at peak?
- What’s the acceptable response latency for each type of operation?
- What’s the message or transaction loss tolerance?
- What’s the SLA — can the system afford minutes of downtime or must it be near-zero?
- Does the system need to scale horizontally, vertically, or both?
For Chatterly, the NFRs were defined based on the product’s scale goals:
| Requirement | Target |
|---|---|
| REST API reads | < 50ms (Redis cache hit) |
| REST API writes | < 100ms (PostgreSQL write + Redis pub/sub) |
| WebSocket delivery | < 30ms (Redis O(1) fan-out) |
| Auth check | < 5ms (Redis GET for JWT denylist) |
| Throughput | ~50,000 requests/second (horizontal Puma scaling) |
| Concurrent WebSocket connections | ~1,000,000 (Redis pub/sub, Puma threads) |
| Message loss | 0% (PostgreSQL ACID transactions) |
| SLA | Platinum — high availability, zero data loss |
| Data volume | Hundreds of millions of rows (partitioned table) |
Never start designing the architecture without a solid understanding of both requirement types. FRs tell you what to build. NFRs tell you how hard it has to work. The NFRs are what determine whether you need Redis or a simple database cache, whether you need WebSockets or polling, and whether you need table partitioning or a flat schema. Get these wrong and the architecture will fail under real-world conditions no matter how clean the code is.
Mapping the Architecture: From Requirements to Components
Once the requirements are locked, the architecture emerges from them — not from personal preference or familiarity with a specific tool. Each component in Chatterly’s architecture exists because a specific requirement demanded it.
The Big Picture
Here is the overall system architecture:
┌─────────────────────────┐
│ Browser (Next.js) │
│ REST + WebSocket Client │
└──────────┬──────────────┘
│
REST API / HTTP │ WebSocket
┌────────────────────┴──────────────────────┐
│ │
▼ ▼
┌───────────────────────┐ ┌───────────────────────┐
│ Rails API │ │ Action Cable │
│ (REST, Port 3001) │──────────────►│ (WebSocket Server) │
└──────────┬────────────┘ Pub/Sub └──────────┬────────────┘
│ Broadcast │
│ SQL │ Pub/Sub
▼ ▼
┌───────────────────────┐ ┌───────────────────────┐
│ PostgreSQL │ │ Redis │
│ (Primary Data Store) │ │ (Pub/Sub, JWT, │
│ │ │ Presence, Cache, │
└──────────▲────────────┘ │ Rate Limiting) │
│ └──────────┬────────────┘
│ SQL │ Queue
│ │
┌──────────┴────────────┐ │
│ Sidekiq │◄─────────────────────────┘
│ (Background Jobs) │
└───────────────────────┘
All services ──► Sentry (Error Tracking & Logging)
Three architectural principles are visible immediately from this diagram:
Loose Coupling — The Rails API never directly contacts WebSocket clients. It writes to Redis, Action Cable reads from Redis, and each component only knows about the message format, not about the other component. Swap out the Rails API, and Action Cable is unaffected. This is what loose coupling looks like in practice.
Stateless Services — No service holds session state in memory. JWT tokens are stored in Redis with a TTL denylist, user presence is tracked in Redis with auto-expiry, and all persistent data lives in PostgreSQL. This means any process can be terminated or replaced without data loss, which is the prerequisite for horizontal scaling.
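The denylist half of that statement can be sketched in a few lines. This is a minimal, illustrative model of a JWT denylist with TTL semantics, standing in for the Redis SETEX/GET check; the `TokenDenylist` name and its Hash-backed store are assumptions for the sketch, not code from Chatterly.

```ruby
# Hash-backed stand-in for a Redis-held JWT denylist with TTL.
class TokenDenylist
  def initialize(clock: -> { Time.now.to_f })
    @entries = {} # jti claim => expiry timestamp
    @clock = clock
  end

  # Deny the token until its own exp claim passes; after that the JWT is
  # expired anyway, so the entry can lapse (Redis TTL does this for free).
  def revoke(jti, ttl_seconds)
    @entries[jti] = @clock.call + ttl_seconds
  end

  def revoked?(jti)
    expiry = @entries[jti]
    return false unless expiry
    if @clock.call >= expiry
      @entries.delete(jti) # lazy expiry, like a Redis key timing out
      false
    else
      true
    end
  end
end
```

Because the denylist lives outside the process, any Puma worker can answer the "is this token still valid?" question, which is exactly what makes the workers interchangeable.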
Separation of Concerns — Each component has a single, clearly bounded job: Rails API handles data mutations, Action Cable handles push delivery, Sidekiq handles async work, PostgreSQL stores data, Redis handles ephemeral state. No component reaches outside its boundary.
Rails API — The REST Service
Role
The Rails API is the single entry point for all HTTP requests. It handles authentication, conversation management, message creation, reactions, and call session lifecycle. It is the only component that writes to the database. After every write, it publishes an event to Redis — it doesn’t push to clients directly, it just fires the signal and moves on.
This separation is intentional: the Rails API must remain fast and predictable. Coupling it to WebSocket delivery would mean its response time depends on the health of client connections, which is unacceptable.
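The fire-and-forget shape of that write path can be sketched as plain Ruby. The in-memory `Bus` here is an illustrative stand-in for Redis pub/sub, and `MessageWriter` is a hypothetical name, not a Chatterly class; the point is only the ordering: persist, publish, return.

```ruby
# In-memory stand-in for Redis pub/sub.
class Bus
  def initialize
    @subscribers = Hash.new { |h, k| h[k] = [] }
  end

  def subscribe(channel, &handler)
    @subscribers[channel] << handler
  end

  def publish(channel, payload)
    @subscribers[channel].each { |h| h.call(payload) }
  end
end

class MessageWriter
  def initialize(store, bus)
    @store = store # stands in for PostgreSQL
    @bus = bus     # stands in for Redis
  end

  def create(conversation_id, body)
    record = { id: @store.size + 1, conversation_id: conversation_id, body: body }
    @store << record                                        # durable write first
    @bus.publish("conversation:#{conversation_id}", record) # fire the signal and move on
    record                                                  # respond; delivery is someone else's job
  end
end
```

Note that `create` never inspects the subscribers: whether zero or a million clients are listening, its response time is the same.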
Technology Stack
- Rails 8.1 (API mode) + Ruby 3.4.4 — Convention over configuration, mature ActiveRecord ORM, built-in ecosystem for auth and serialization
- PostgreSQL 17 — ACID compliance, native time-based table partitioning for the messages table, CP system in CAP theorem terms (zero message loss)
- Devise 4.9 + devise-jwt 0.13 — JWT tokens in headers, stateless auth, compatible with cross-origin Next.js frontend
- Pundit 2.5 — Policy Objects pattern, one policy class per model (`ConversationPolicy`, `MessagePolicy`, `CallSessionPolicy`), authorization logic fully out of controllers
- Blueprinter — Serializer that controls exactly what fields are exposed to the client; the same serializer is reused in both HTTP responses and WebSocket broadcasts
- PgBouncer (via Supabase) — Connection pooler, keeps live PostgreSQL connections bounded under horizontal Puma scaling
Architecture
┌───────────────────────────────────────┐ ┌─────────────┐
│ Service Interface │ │ │
│ (Controllers, Routes, Rack::Cors, │ │ │
│ Devise-JWT middleware) │ │ │
├───────────────────────────────────────┤ │ Logging │
│ Business Logic │ │ (Sentry) │
│ (ActiveRecord Models, Pundit │ │ │
│ Policies, Callbacks, Validations) │ │ │
├───────────────────────────────────────┤ │ │
│ Data Access │ │ │
│ (ActiveRecord queries, Blueprinter │ │ │
│ Serializers, Redis cache layer) │ │ │
└───────────────────────────────────────┘ └─────────────┘
│
[PostgreSQL]
Every HTTP request flows top to bottom. The Service Interface layer receives the request, checks CORS headers, decodes the JWT, and routes to the correct controller — which contains as little logic as possible. The Business Logic layer is where validations, Pundit policy checks, and `after_create_commit` callbacks live — the callback is what fires the Redis broadcast after a write. The Data Access layer runs the ActiveRecord query and passes results through a Blueprinter serializer before returning them.
Logging (Sentry) is a cross-cutting concern — it’s accessible by all layers and captures every exception automatically.
Principle in focus — Caching: The Data Access layer includes a Redis cache. Read operations that would otherwise hit PostgreSQL are served from Redis at < 50ms. The cache is populated on write and invalidated when the underlying data changes. This is what makes the < 50ms read SLA achievable without a read replica.
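The cache-aside read path described above can be sketched like this. The Hash-backed cache stands in for Redis GET/SET/DEL, the lambda stands in for an ActiveRecord query, and the class and key names are illustrative, not from the real codebase.

```ruby
require "json"

# Cache-aside read path: check the cache, fall back to the database on a
# miss, populate the cache, and delete the key when the row changes.
class CachedConversationReader
  def initialize(cache, db)
    @cache = cache # Redis in production
    @db = db       # ActiveRecord in production
  end

  def fetch(id)
    key = "conversation:#{id}"
    if (hit = @cache[key])
      JSON.parse(hit)                  # cache hit: no database round-trip
    else
      row = @db.call(id)               # cache miss: query the primary store
      @cache[key] = JSON.generate(row) # populate so the next read is fast
      row
    end
  end

  # Called after any write that changes the row.
  def invalidate(id)
    @cache.delete("conversation:#{id}")
  end
end
```

The invalidate-on-write discipline is what keeps the cache from serving stale data without needing TTL guesswork on hot rows.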
Development Instructions
- Controllers must contain no business logic — their only jobs are authenticate, authorize, call the model, and render the serialized result
- Use `authorize @resource` (Pundit) in every action that touches a resource — no exceptions
- Use Blueprinter serializers consistently — never call `.to_json` directly on an ActiveRecord object
- Trigger broadcasts from model callbacks (`after_create_commit`), not from controllers
- Use `prepared_statements: false` and `advisory_locks: false` when connecting through PgBouncer (transaction pooling mode does not support these features)
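The thin-controller contract reduces to a four-step pipeline: authenticate, authorize, call the model, render the serialized result, and nothing else. The sketch below makes that shape explicit with injected lambdas; in Rails these collaborators would be Devise, a Pundit policy, an ActiveRecord model, and a Blueprinter serializer, and the class name here is purely illustrative.

```ruby
# Illustrative thin controller: every collaborator is injected, and the
# action body is nothing but the four mandated steps.
class ThinMessagesController
  def initialize(authenticator:, policy:, model:, serializer:)
    @authenticator = authenticator
    @policy = policy
    @model = model
    @serializer = serializer
  end

  def create(token, params)
    user = @authenticator.call(token)
    return [401, nil] unless user                       # authenticate
    return [403, nil] unless @policy.call(user, params) # authorize
    message = @model.call(user, params)                 # business logic lives elsewhere
    [201, @serializer.call(message)]                    # render via the serializer
  end
end
```

A controller shaped like this is trivial to test and trivial to read, because every interesting decision has been pushed into a component whose single job is to make it.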
Action Cable — The Real-Time Push Layer
Role
Action Cable’s job is delivery, nothing more. It maintains persistent WebSocket connections with clients and pushes events to them the moment they occur. It never writes to the database. It never makes business logic decisions.
When the Rails API publishes an event to Redis after a write, Action Cable receives it via pub/sub and forwards it to all subscribed clients on the appropriate channel. This architecture enables real-time fan-out across thousands of connected clients without any polling. The client never has to ask — the server just tells it.
Technology Stack
- Action Cable (built into Rails) — no third-party WebSocket library needed
- Redis adapter (logical database /0) — chosen over Solid Cable (PostgreSQL-based) because Redis pub/sub fan-out is O(1) regardless of subscriber count; Solid Cable uses polling and becomes a bottleneck above ~10,000 concurrent WebSocket connections. At the 1M concurrent connections target, there is no alternative.
- devise-jwt — the same JWT auth used by the REST API is reused here; only authenticated users can establish a WebSocket connection
Architecture
┌───────────────────────────────────────┐ ┌─────────────┐
│ Connection │ │ │
│ (JWT auth, current_user identity, │ │ │
│ reject_unauthorized_connection) │ │ │
├───────────────────────────────────────┤ │ Logging │
│ Channels │ │ (Sentry) │
│ (ConversationChannel — messages, │ │ │
│ PresenceChannel — online/offline, │ │ │
│ CallChannel — WebRTC signaling) │ │ │
├───────────────────────────────────────┤ │ │
│ Redis Pub/Sub Handler │ │ │
│ (stream_from, broadcast_to, │ │ │
│ Redis INCR rate limiting) │ │ │
└───────────────────────────────────────┘ └─────────────┘
│
[Redis]
The Connection layer authenticates every WebSocket handshake — decoding the JWT, checking the Redis denylist, setting `current_user`. If auth fails, the connection is rejected before any subscription is allowed. The Channels layer is where the three channel classes live: `ConversationChannel` streams message events per conversation, `PresenceChannel` broadcasts online/offline state globally, and `CallChannel` relays WebRTC signaling (SDP offers, answers, ICE candidates) between specific users. The Redis Pub/Sub Handler manages stream subscriptions and rate limiting via Redis INCR + TTL.
Principle in focus — Pub/Sub Messaging: The Rails API and Action Cable are completely decoupled. When Rails API publishes to a Redis channel, it doesn’t know Action Cable exists. When Action Cable subscribes to that channel, it doesn’t know Rails API exists. They communicate only through the message format they’ve agreed on. This means either side can be redeployed, restarted, or replaced independently.
Development Instructions
- Every channel action must call `rate_limit!` before processing — this prevents clients from flooding channels
- Membership/authorization must be checked in `subscribed` before `stream_from` — a non-member must never receive another conversation's events
- Do not query the database inside relay actions (e.g., `send_signal` in `CallChannel`) — these are hot paths; relay data directly from client to target stream without a DB round-trip
- Refactor channel membership guards to delegate to Pundit policies once they are built: `ConversationChannel#member?` → `ConversationPolicy`, `CallChannel#participant?` → `CallSessionPolicy`
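The INCR + TTL rate-limit idea mentioned above is a fixed-window counter, and it can be sketched without Redis at all. The Hash-backed counters, the limit and window values, and the `RateLimiter` name are illustrative stand-ins; in production the counter would be `INCR` on a per-client key with `EXPIRE` set on the first hit of each window.

```ruby
# Fixed-window rate limiter with INCR + TTL semantics.
class RateLimiter
  def initialize(limit: 10, window: 1.0, clock: -> { Time.now.to_f })
    @limit = limit
    @window = window
    @clock = clock
    @counters = {} # client key => [count, window expiry]
  end

  # Returns true if the action is allowed in the current window.
  def allow?(client_id)
    now = @clock.call
    count, expiry = @counters[client_id]
    if expiry.nil? || now >= expiry
      @counters[client_id] = [1, now + @window] # new window starts
      true
    else
      @counters[client_id] = [count + 1, expiry]
      count + 1 <= @limit
    end
  end
end
```

A `rate_limit!` guard would call `allow?` with the connection's user id and reject the action when it returns false, so a flooding client burns its own budget without touching anyone else's.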
Sidekiq — The Background Processor
Role
Sidekiq handles work that cannot happen synchronously in the request-response cycle. It polls a Redis-backed job queue and processes jobs asynchronously. It does not expose an API — it listens.
The primary use case in Chatterly is the missed call job. When a call is initiated and the callee doesn’t respond within 30 seconds, Sidekiq transitions the call session from ringing to missed. This logic is time-delayed — it can’t live in an HTTP request cycle because no request triggered it. It must be scheduled at call creation time and executed independently.
This pattern keeps the Rails API non-blocking while ensuring time-sensitive state transitions happen reliably.
Technology Stack
- Sidekiq 7 — Redis-backed job queue; chosen over Rails 8’s built-in Solid Queue because it uses the same Redis infrastructure already in the stack, and provides better throughput at scale
- Redis logical database /1 — dedicated to Sidekiq job queues, isolated from Action Cable pub/sub (/0), cache (/2), and JWT+presence (/3)
- Runs as a separate process (`bundle exec sidekiq`), independent from the Rails API process
Architecture
┌───────────────────────────────────────┐ ┌─────────────┐
│ Polling │ │ │
│ (Sidekiq process polls Redis queue │ │ │
│ for enqueued jobs continuously) │ │ │
├───────────────────────────────────────┤ │ Logging │
│ Business Logic │ │ (Sentry) │
│ (Job classes: MissedCallJob, │ │ │
│ future notification jobs, etc.) │ │ │
├───────────────────────────────────────┤ │ │
│ Data Access │ │ │
│ (ActiveRecord — reads and updates │ │ │
│ CallSession and related models) │ │ │
└───────────────────────────────────────┘ └─────────────┘
│
[PostgreSQL]
The Polling layer is managed by Sidekiq itself — it continuously dequeues jobs and dispatches them to the appropriate worker class. Sidekiq handles concurrency, automatic retries on failure, and dead-letter queuing for jobs that exhaust their retries. The Business Logic layer contains the job classes, each with a single perform method. The Data Access layer uses ActiveRecord to read and write to PostgreSQL.
Principle in focus — Async Messaging via Queue: The Rails API enqueues the `MissedCallJob` with `perform_in(30.seconds, call_session_id)` and immediately returns a response to the client. The job sits in the Redis queue until 30 seconds pass, then Sidekiq picks it up. If Sidekiq restarts before the job runs, the job survives in Redis and is processed when Sidekiq comes back up. The queue provides durability that a simple `sleep` or in-memory timer cannot.
Development Instructions
- Job arguments must be scalar types (strings, integers) — never pass ActiveRecord objects; serialize to an ID and reload inside `perform`
- Always guard state before mutating — `MissedCallJob` must confirm the call session is still `ringing` before transitioning; the call may have already been answered or declined by the time the job runs
- Schedule with `perform_in(30.seconds, call_session_id)` at the moment the call session is created with `ringing` status
- Log every step — there is no request context or UI for background jobs; logging is the only way to trace what happened
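The first two instructions combine into one small pattern: a scalar id argument, a reload, and a state guard. The sketch below shows that shape in plain Ruby; the Hash-backed store stands in for the CallSession table, and in real Sidekiq the lookup inside `perform` would be an ActiveRecord `find_by`.

```ruby
# Illustrative missed-call job: scalar argument in, reload, guard, mutate.
class MissedCallJob
  def initialize(store)
    @store = store # stands in for the CallSession table
  end

  def perform(call_session_id) # scalar argument, never a record object
    call = @store[call_session_id]
    return if call.nil?                      # session no longer exists
    return unless call[:status] == "ringing" # guard: may already be answered
    call[:status] = "missed"
  end
end
```

The guard is what makes the job safe to run late, twice, or after a retry: if the call was answered in the intervening 30 seconds, the job quietly does nothing.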
Next.js Client — The Frontend
Role
The Next.js Client is the only component the end user directly interacts with. It renders the UI, manages client-side state, communicates with the Rails API over HTTP, maintains a real-time WebSocket connection to Action Cable, and handles peer-to-peer WebRTC connections for voice and video calls.
The client holds no persistent business data. Its state is ephemeral — rebuilt from the Rails API on each session. This is the client-side expression of the stateless principle.
Technology Stack
- Next.js (App Router) — file-based routing, server components for fast initial loads, TypeScript out of the box
- Zustand — minimal state management; no Provider wrappers, no boilerplate; components access the store with a single hook
- Tailwind CSS — utility-first styling directly in JSX; no context-switching between component and stylesheet files
- Axios — configured HTTP client with a JWT interceptor that attaches the token to every request
- Action Cable JS — official Rails WebSocket client; a singleton consumer created once in `lib/cable.ts` and shared across all channel subscriptions
- WebRTC (browser-native) — peer-to-peer audio and video; Action Cable handles signaling, but the media streams flow directly between browsers, never through the server
Architecture
┌────────────────────────────────────────────────────────┐
│ UI Layer │
│ (Next.js pages, React components: MessageList, │
│ Sidebar, CallOverlay, IncomingCallModal, Avatar, │
│ PresenceIndicator, etc.) │
├────────────────────────────────────────────────────────┤
│ State Layer │
│ (Zustand stores: authStore, conversationStore, │
│ callStore — custom hooks: useConversation, │
│ usePresence, useCall) │
├────────────────────────────────────────────────────────┤
│ API Client Layer │
│ (lib/api.ts — Axios instance with JWT interceptor, │
│ lib/cable.ts — Action Cable consumer singleton, │
│ lib/webrtc.ts — RTCPeerConnection helpers) │
└────────────────────────────────────────────────────────┘
│ REST / HTTP │ WebSocket + WebRTC
[Rails API] [Action Cable + Browser P2P]
The UI Layer contains all pages and components. Components are pure rendering — they read from Zustand stores and call store actions, with no direct API calls. The State Layer contains the stores and custom hooks that bridge the store to the API Client Layer, subscribing to Action Cable channels and updating state when WebSocket events arrive. The API Client Layer holds the low-level utilities: the Axios instance, the Action Cable consumer, and WebRTC helpers.
Principle in focus — Stateless Client: The client holds no persistent state between sessions. `authStore`, `conversationStore`, and `callStore` are all in-memory Zustand stores that start empty on page load and are populated entirely from API responses and WebSocket events. This means any tab refresh, any network interruption, any re-login starts from a clean, consistent server state — no stale or conflicting local data.
Development Instructions
- UI components must never call the API Client Layer directly — all data operations go through the State Layer (hooks and stores)
- The Action Cable consumer must be a singleton — one consumer, one connection, all channel subscriptions sharing it
- WebRTC peer connections must be created and managed entirely inside the `useCall` hook, not inside components
- JWT tokens must be stored in memory (Zustand) or `httpOnly` cookies — never in `localStorage`, which is accessible to XSS attacks
Architecture Principles in Practice
Looking across the entire system, the same set of principles appears again and again. Here’s where each one shows up concretely in Chatterly:
Loose Coupling — The Rails API publishes to a Redis channel. Action Cable subscribes to that channel. Neither knows the other exists. Swap either component and the other is unaffected. This is the single most important principle in distributed systems design.
Stateless Services — Every Puma worker process is interchangeable. JWT verification uses Redis, not in-memory sessions. Presence tracking uses Redis TTL, not in-process timers. Any process can be killed and replaced without data loss or session interruption.
Caching — Redis sits between the Rails API and PostgreSQL for read operations. Hot data (conversations, user profiles, presence) is served from memory at < 50ms without touching the database. The cache is populated on write and invalidated on update.
Async Messaging via Queue — The `MissedCallJob` pattern shows why queues exist: when you need something to happen later, reliably, without blocking the current request, a queue is the answer. The Rails API doesn’t wait. Sidekiq processes it when the time comes. If Sidekiq goes down, the job survives.
Separation of Concerns — Each component has one job, one technology choice, and one scaling axis. Rails API scales horizontally by adding Puma processes. Action Cable scales by adding Redis pub/sub subscribers. Sidekiq scales by adding worker processes. PostgreSQL scales with a read replica. No component’s scaling affects another.
ACID Guarantees — PostgreSQL enforces transactional integrity on every write. A message creation either fully succeeds or fully rolls back — no partial state, no lost records. This is what the 0% message loss SLA is built on.
Time-based Partitioning — The messages table is partitioned by month at the PostgreSQL level. Queries against recent messages only scan the current month’s partition, not hundreds of millions of historical rows. This is the difference between a 2ms query and a 2-second query at scale.
Download the Architecture Document
The full Chatterly architecture document — formatted as a standalone, professional reference — is available as a PDF below. Use it as a template for your own system architecture documentation.
Download: Chatterly Architecture Document (PDF)
Conclusion
Software architecture is not about choosing the most interesting technology. It’s about understanding what the system needs to do and how hard it needs to work, then making the minimum set of decisions that satisfy those constraints reliably. Every component in Chatterly exists because a specific requirement demanded it. Every technology choice has a documented reason. Every architectural principle has a concrete expression in the codebase.
That’s what good architecture looks like — not clever, just clear.
Thank you for reading, and happy coding!