Do Simpler Systems Fail Better?
I was recently reading Greg Kogan’s blog post, Simple Systems Have Less Downtime.
It really caught my attention.
As a professional services consultant, I’ve worked with almost 200 firms over the years. Many of those engagements required unraveling complex systems, often ones that were no longer well understood because the first wave of builders had long since gone.
So, this topic resonates strongly for me.
I believe that if firms had adopted this advice, I would have had a lot less work over the years. Seriously.
Redundancy
Redundancy means backup systems. If your laptop fails, do you have a second one with all your up-to-date data? If us-east-1 fails, do you have a backup or live copy of your database in another region?
Redundancy isn’t just backup systems; it is also backup people. If Jane, who manages Salesforce, gets in an accident, what will the business do to support the sales teams? If a system gets hacked and compromised, how can you restore the most recent data?
Complex systems fail in surprising ways. Having a plan B, a plan C, and, for really essential services, a plan D will save the day.
Take Greg’s example of a container ship:
- if the automatic system fails you can steer the thing manually. Wow!
- if other electronics fail, you can control the damn rudder by hand!
Incredible to think a ship that big is basically a giant sailboat when you disengage the powered systems. That is truly a lesson for all of us startup engineers.
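For the software version of steering by hand, here is a minimal sketch of a plan A, plan B, plan C chain in Python. Everything in it is hypothetical and only for illustration: the function names (first_success, read_primary, read_replica, read_backup), the error messages, and the returned strings are my own stand-ins, not anything from Greg’s post or a real library.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllPlansFailed(Exception):
    """Raised when every plan in the chain has failed."""


def first_success(plans: Sequence[tuple[str, Callable[[], T]]]) -> tuple[str, T]:
    """Try each (name, plan) pair in order; return the first one that works."""
    errors: list[str] = []
    for name, plan in plans:
        try:
            return name, plan()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllPlansFailed("; ".join(errors))


# Hypothetical stand-ins for real data sources.
def read_primary() -> str:
    # Simulates the primary region being down.
    raise ConnectionError("us-east-1 unreachable")


def read_replica() -> str:
    return "rows from replica in another region"


def read_backup() -> str:
    return "rows from last night's backup"


PLANS = [
    ("plan A: primary", read_primary),
    ("plan B: replica", read_replica),
    ("plan C: backup", read_backup),
]

used, data = first_success(PLANS)
print(f"served by {used}: {data}")
```

The point of keeping the chain this plain is the same as the ship’s: each fallback is simple enough that someone can reason about it at 3 a.m., and the order of the plans is written down in one place rather than living in somebody’s head.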
Overlapping Skillsets
If you only have one guy who knows how to use the database platform, that’s a problem. If you have only one woman who knows how to program in Rust, that’s a problem. If there’s only one person who knows how the reporting system works and can make changes, that’s a problem.
Better to have overlapping job roles and skillsets. If you have a chance to adopt a new technology, make sure it’s a rock-solid one that is mainstream and easy to hire for.
Beware Of Technical Debt
We’ve all heard the reasons.
- We don’t have the luxury to fix that now.
- We can’t afford the downtime.
- We have pressing features to ship.
But as technical debt piles on, so does complexity. And you’ll quickly end up carrying a larger burden than you realized.
As advocated by Kogan, rip and replace is often a more serious solution, and better for the firm. Yes, you’ll have some downtime. Yes, you’ll redirect team members temporarily. But you’ll solve the real problem and will bring more simplicity to your architecture.
What’s more, the pain of paying down the debt will make you think twice about borrowing in the future!
Conclusion
Complex systems fail in unexpected ways, so it’s important to have backup plans and redundancy in place. Overlapping job roles and skillsets keep any one person from becoming a single point of failure. And while technical debt may seem easy to ignore, it leads to even more complexity down the line, so keep an eye on it.