Improve this architecture

Aaron,

Someone asked how, in about 10 minutes, I would improve the architecture in the Rev1 image, with a one-week delivery deadline for the solution. Sounds simple enough, right? The first thing to clarify was the use case for such an architecture: how many people are using it, and why. There are about six thousand tenants, each with multiple accounts, accessing this architecture throughout the day. There is burst activity in the early hours and sporadic access in the evenings, but the workload mainly follows the sun for a single region and is not globally available. We don't know why they are using it or what the service actually does.

The Diagram (Rev1)

[Image: Rev1 architecture diagram]

Initial thoughts

Rev1 is not scalable, highly available, or reliable. It also violates the single-responsibility principle. What do I mean by that, and why?

Two potential solutions

The simple answer I might give as a Senior Manager (Rev2):

This is a time-critical ask, so in a situation where the team is asking me directly for a potential solution, I'd suggest:

How long do you think this would take with a team of 4 engineers? Reply in the comments on my LinkedIn post.

The answer I would give as an Engineering Lead (Rev3):

We have a week and a handful of engineers to get this done. This is a priority, and whatever we build, we need to be happy that the solution will stand on its own two feet. Here are some action items:

How long do you think this would take with a team of 4 engineers? Reply in the comments on my LinkedIn post.

So what does this look like?

Rev2 Diagram

Highlighting my approach as a Senior Manager

[Image: Rev2 architecture diagram]

Rev3 Diagram

Highlighting my approach as an Engineering Lead

[Image: Rev3 architecture diagram]

Final thoughts

It goes without saying that Rev3 is more complex to implement, both in the knowledge required of the team and in the initial deployment; however, with the correct planning it can be achieved relatively quickly.

Is it my preferred solution? Maybe, depending on cost, team size, and so on.

I'd estimate that four mid-level engineers with some Kubernetes experience could implement Rev3 for a single environment before the required delivery date.

Notes - Out of scope for the one-week delivery

I'd speak to the cloud provider to secure a committed use discount for all the worker nodes we're using, to bring the cost down to something comparable to spot instances, and I'd hibernate any on-demand instances.
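To make that comparison concrete, here is a rough sketch of the arithmetic I'd run before that conversation. Everything in it is an assumption for illustration: the `NodePricing` fields, the 730-hour month, and especially the prices and discount in the usage example, which are placeholders rather than real quotes from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePricing:
    """Hourly prices for one worker node. All values are placeholders."""
    on_demand: float           # list price per node-hour
    committed_discount: float  # fraction off on-demand, e.g. 0.40 for 40%
    spot: float                # assumed typical spot price per node-hour


def monthly_cost(hourly_rate: float, nodes: int, hours: float = 730.0) -> float:
    """Cost of running `nodes` nodes continuously for one (730-hour) month."""
    return hourly_rate * nodes * hours


def compare(pricing: NodePricing, nodes: int) -> dict[str, float]:
    """Monthly cost of the same node count under each pricing model."""
    committed_rate = pricing.on_demand * (1.0 - pricing.committed_discount)
    return {
        "on_demand": monthly_cost(pricing.on_demand, nodes),
        "committed": monthly_cost(committed_rate, nodes),
        "spot": monthly_cost(pricing.spot, nodes),
    }


if __name__ == "__main__":
    # Hypothetical numbers only; substitute the provider's actual quotes.
    pricing = NodePricing(on_demand=0.10, committed_discount=0.40, spot=0.05)
    for plan, cost in compare(pricing, nodes=10).items():
        print(f"{plan:>10}: ${cost:,.2f}/month")
```

The point of the exercise is simply to see how large a committed use discount would need to be before the always-on worker nodes cost roughly what spot capacity would, without taking on spot's interruption risk.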

Stay tuned & follow

Stay tuned and follow me for my next post on architecture, where we discuss how to get the most out of a single Kubernetes control plane. It's going to be an interesting one.