
Apr 24, 2025
Written by: Jaisal Friedman
Since Counsel’s inception in 2023, asynchronous chat has always been the backbone of our product and care model. By November 2024, message volumes had surged and it was clear that we had to re-architect our infrastructure for scale, reliability, and performance.
This blog post is a behind-the-scenes look at how we rebuilt our chat platform. If you’re building anything chat-related, especially in healthcare, we hope you can learn from our experience.
The Problem
We built our original chat system on Twilio Conversations, mainly for its cross-platform SMS support, which is critical for reaching our patients. But the more we scaled, the more issues we found:
Dropped messages: Users and physicians missed messages when the Twilio SDK silently failed to connect.
No retries or alerts: Failed messages weren’t retried or flagged, leaving users thinking they’d responded when they hadn’t.
Sluggish startup experience: The app took up to 2 minutes to load conversations because we had no control over how Twilio’s SDK loaded data.
In short, our chat infrastructure met the demand of our early users, but it didn’t reflect the quality of care we always strive to provide.
The New Architecture
We rebuilt our chat system around three key principles: control, observability, and fault tolerance.
API-first design: We wrapped all write actions in our own APIs. Twilio is now just an implementation detail. These APIs are idempotent and transactional, so we never end up with inconsistent data.
Startup optimization: Instead of loading all messages on launch, we fetch only paginated, recent data. This alone cut physician app startup time from as much as 2 minutes to under 1 second.
Resilient offline support: When the SDK fails to connect, we don’t leave users stranded. Messages queue locally, retry automatically, and notify users if delivery ultimately fails.
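To make the queue-and-retry idea concrete, here is a minimal sketch of a local outbox. It is an illustration under assumptions, not our production code: `Outbox`, `SendFn`, `maxAttempts`, and `onFailure` are hypothetical names, and the send function stands in for whatever transport actually delivers the message.

```typescript
// Hypothetical types: none of these names come from the Twilio SDK.
type SendFn = (body: string) => Promise<void>;
type Status = "pending" | "sent" | "failed";

interface OutboxItem {
  id: number;
  body: string;
  status: Status;
  attempts: number;
}

class Outbox {
  private items: OutboxItem[] = [];
  private nextId = 1;
  private send: SendFn;
  private maxAttempts: number;
  private onFailure: (item: OutboxItem) => void;

  constructor(send: SendFn, maxAttempts: number, onFailure: (item: OutboxItem) => void) {
    this.send = send;
    this.maxAttempts = maxAttempts;
    this.onFailure = onFailure;
  }

  // Queue locally first so the message survives a failed connection.
  enqueue(body: string): OutboxItem {
    const item: OutboxItem = { id: this.nextId++, body, status: "pending", attempts: 0 };
    this.items.push(item);
    return item;
  }

  // Try every pending message once; call again (for example on a timer) to retry.
  async flush(): Promise<void> {
    for (const item of this.items) {
      if (item.status !== "pending") continue;
      item.attempts += 1;
      try {
        await this.send(item.body);
        item.status = "sent";
      } catch {
        if (item.attempts >= this.maxAttempts) {
          item.status = "failed";
          this.onFailure(item); // surface the failure to the user
        }
      }
    }
  }
}
```

Calling `flush()` on reconnect or on an interval retries whatever is still pending, and `onFailure` is where a "message not delivered" notice would be triggered once the attempts run out.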
Tackling Message Reconciliation
One of our trickiest bugs was handling messages that were “sent” locally, but never made it to Twilio. To fix this, we introduced a reconciliation loop between local and remote messages using Redux sagas. If a message isn’t confirmed by the server (via HTTP or WebSocket), it stays flagged as unconfirmed, and the UI reflects that state.
We implemented the reconciliation logic itself with sagas.
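As an illustration only (a sketch under stated assumptions, not our production saga), the core reconciliation step can be written as a pure function that a saga could call whenever an HTTP response or WebSocket event delivers server-side messages. The `LocalMessage` and `RemoteMessage` shapes, and the idea that the server echoes back a client-generated `clientId`, are assumptions made for the example.

```typescript
// Hypothetical message shapes for illustration.
interface LocalMessage {
  clientId: string;   // generated on the device when the user hits send
  body: string;
  confirmed: boolean; // false until the server acknowledges the message
}

interface RemoteMessage {
  clientId: string;   // assumed to be echoed back by the server
  body: string;
}

// Mark local messages as confirmed when the server reports the same clientId.
// Anything the server has not acknowledged stays unconfirmed for the UI to flag.
function reconcile(local: LocalMessage[], remote: RemoteMessage[]): LocalMessage[] {
  const acknowledged = new Set(remote.map((m) => m.clientId));
  return local.map((m) =>
    acknowledged.has(m.clientId) ? { ...m, confirmed: true } : m,
  );
}
```

Keeping this step pure means the saga only has to wire it to incoming events and dispatch the result, which also makes the logic easy to unit test.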
Transactional APIs & Idempotency
Chat operations now go through an internal API layer that keeps Twilio in sync with our own database. To make this safe and repeatable, we implemented a higher-order rollback function and, inspired by Stripe, made every POST request idempotent by attaching a unique idempotency key to it.
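One way to read "higher-order rollback function" is a function that takes a list of steps, each paired with a compensating action, and returns a single operation that undoes the completed steps if a later one fails. The sketch below follows that reading; `Step` and `withRollback` are hypothetical names, and the real implementation may differ.

```typescript
// A step pairs an action with the compensating action that undoes it.
interface Step {
  run: () => Promise<void>;
  undo: () => Promise<void>;
}

// Higher-order function: takes a list of steps and returns one operation that
// either completes all of them or undoes the completed ones in reverse order.
function withRollback(steps: Step[]): () => Promise<void> {
  return async () => {
    const done: Step[] = [];
    try {
      for (const step of steps) {
        await step.run();
        done.push(step);
      }
    } catch (err) {
      for (const step of done.reverse()) {
        await step.undo();
      }
      throw err; // let the caller see the original failure
    }
  };
}
```

For a chat write, the first step might insert a row in the database and the second might write to Twilio, so a Twilio failure removes the row instead of leaving the two stores out of sync.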
When a message is submitted twice concurrently, we check our database and lock the row for update in the same step:
If it’s already there, we return the same response.
If not, we insert it, write to Twilio, and log it.
This guarantees consistency and shields our users from bugs in connectivity or retry logic.
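The check-then-insert flow above can be sketched in memory. This is a simplified stand-in, not our actual API layer: the `Map` plays the role of the database table, holding the in-flight promise plays the role of the row lock, and `IdempotentMessageApi`, `post`, and `providerWrites` are hypothetical names.

```typescript
interface StoredResponse {
  messageId: string;
  body: string;
}

class IdempotentMessageApi {
  // Stand-in for a database table keyed by idempotency key. Storing the
  // in-flight promise plays the role of the row lock: a concurrent duplicate
  // waits on the first request instead of doing the work again.
  private byKey = new Map<string, Promise<StoredResponse>>();
  public providerWrites = 0; // counts simulated writes to the chat provider

  post(idempotencyKey: string, body: string): Promise<StoredResponse> {
    const existing = this.byKey.get(idempotencyKey);
    if (existing) return existing; // already there: return the same response
    const pending = this.insertAndForward(idempotencyKey, body);
    this.byKey.set(idempotencyKey, pending);
    return pending;
  }

  private async insertAndForward(key: string, body: string): Promise<StoredResponse> {
    // Placeholder for "insert it, write to Twilio, and log it".
    await Promise.resolve();
    this.providerWrites += 1;
    return { messageId: `msg-${key}`, body };
  }
}
```

Two concurrent `post` calls with the same key resolve to the same response and trigger a single provider write. A real service would lean on the database's own row locking rather than process memory, and would need to decide how a failed first attempt is stored and retried.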
Resilient Offline Support
Several parts of our client-side app rely on a WebSocket-based SDK to fetch data. Because WebSockets can be unreliable on mobile devices, we wanted to decouple the usability of the app from the connectivity state of the SDK.
First, we implemented a class that wraps all calls to the underlying WebSocket server.
Second, we made sure that every critical piece of data has an HTTP-based API, which our app falls back to once the SDK has been stuck connecting for longer than 10 seconds.
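A minimal sketch of that fallback, assuming a per-call timeout rather than our actual connection-state tracking: `withHttpFallback`, `viaSdk`, and `viaHttp` are hypothetical names, and the 10-second default simply mirrors the threshold mentioned above.

```typescript
// Resolve with the SDK result if it arrives in time; otherwise use HTTP.
function withHttpFallback<T>(
  viaSdk: () => Promise<T>,
  viaHttp: () => Promise<T>,
  timeoutMs: number = 10000,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let settled = false;

    // Switch to the HTTP path exactly once, whether triggered by the
    // timer or by an outright SDK failure.
    const useHttp = () => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      viaHttp().then(resolve, reject);
    };

    const timer = setTimeout(useHttp, timeoutMs);

    viaSdk().then(
      (value) => {
        if (settled) return; // HTTP already took over; ignore the late result
        settled = true;
        clearTimeout(timer);
        resolve(value);
      },
      useHttp, // SDK failed outright: do not wait for the timer
    );
  });
}
```

A wrapper class around the SDK could route every read through a helper like this, so callers never need to know which transport answered.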
Why We Stayed on Twilio
We also debated moving off Twilio entirely. Options like Ably or a homegrown WebSocket solution looked appealing, but supporting SMS remains a core requirement for us. In the meantime, we’ve architected our system so that swapping out Twilio later should be straightforward.
The Outcomes
Since rolling out this new infrastructure:
Message delivery failures have dropped to near-zero
Uptime has stayed above 99.9%
Most importantly, our platform is now designed to deliver reliable, high-quality care at scale. Infrastructure may not be flashy, but it’s what makes our mission of multiplying the world’s clinical capacity possible.
If you’re building in health tech and struggling with real-time communication, we’ve been there, so feel free to reach out.
If you’re an engineer who likes working on hard, meaningful problems, we’re hiring.