New content is up on Infrastructure Engineer. Share your thoughts on Twitter at @lethain, or reply to this email.
In my early career roles, I worked at companies that never worried about their infrastructure costs at all. The costs were simply too low and growing too slowly for the Finance team to pay much attention to them. This “ignore it until it’s too large to ignore” approach served me well.
Until it didn’t.
Working at Uber, I was caught off guard when a new Director joined and, overnight, infrastructure costs were recategorized from insignificant to requiring urgent, detailed review every month. Adding the instrumentation and accountability for these costs retroactively was a difficult retrofit. Although I was surprised that time, I’ve come to appreciate that all successful companies go through the transition from ignoring to setting goals on infrastructure costs, and an early focus during my time at Stripe was ensuring we were ready ahead of that shift.
Your job as an infrastructure leader is diagnosing the right mode of operation for your company’s infrastructure costs today, understanding when you’re likely to switch modes, and ensuring you’ve done the prework to make the transition relatively painless.
We’ll explore this topic by digging into:
When you finish reading this, you won’t have your entire efficiency plan worked out, but you will have the high-level pieces, know where you need to dig in, and have a clear approach to communicate to anyone who has been pushing you for a documented approach around infrastructure costs.
Before diving into the mechanics of managing infrastructure costs, the first question to answer is whether it’s a valuable use of organizational time to make your current infrastructure spend more efficient. How you think about this will vary a bit depending on whether your company is early-stage, prioritizing growth, or focused on profitability in late-stage.
Generally speaking, very early-stage companies shouldn’t spend much time thinking about infrastructure costs. You should instead be focused on finding product-market fit for your first product.
Here are two checks you can run to determine if it’s worth reducing your infrastructure costs:
If you’re not violating either of those checks, then keep on ignoring infrastructure spend. If you are exceeding one, and infrastructure costs are a significant part of your overall burn, then invest a sprint into reducing spend, and then resume ignoring it once these checks resume passing.
The one notable exception is if you’re building a low-margin product or a product where cost efficiency is a pillar of your long-term strategy. For example, if you’re operating a metrics collection and dashboarding product like Datadog, then efficiency probably is worth considering earlier than usual.
When you’re prioritizing growth, the primary focus of the engineering organization in a technology company is creating, operating and advancing the products that support the business. Managing costs is important, but even immaculate cost management won’t make your company a success if enough energy isn’t being invested in your product.
The fundamental question to ask is whether infrastructure’s share of cost of goods sold (COGS) is increasing as a percentage of revenue. (The simplest way to think about COGS is as all your non-headcount costs, although a slightly better definition would be all costs to operate your software.)
Start answering this question by plotting revenue and infrastructure costs on a chart to get a sense of how these two numbers are moving. Although logarithmic scales often generate more confusion than they’re worth, in this case it’s usually the only way to see both lines closely enough to understand their slopes within a single chart. You particularly want to understand if either line has experienced an inflection over the past few quarters. If costs have started accelerating without corresponding acceleration of revenue, that’s worth digging into.
Once you’ve looked at the two lines independently to understand their movement, simplify your first chart into a chart showing infrastructure costs as a percentage of revenue. This chart hides some detail but is easier to parse for folks further away from the details. As long as the ratio is going down and your company is focused on growth, then this data should be sufficient to justify your current level of investment into efficiency: if growth is key, and infrastructure costs are not getting in the way, why should you slow down growth to reduce them?
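As a rough illustration of the two views described above, here is a minimal Python sketch that computes quarter-over-quarter growth for each line (the slope you would read off a log-scale chart) and then infrastructure cost as a fraction of revenue. The `Quarter` type, the function names, and every figure are hypothetical placeholders, not real data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Quarter:
    label: str
    revenue: float      # hypothetical revenue for the quarter
    infra_cost: float   # hypothetical infrastructure spend for the quarter


def growth_rates(values: list[float]) -> list[float]:
    """Quarter-over-quarter growth; a change here shows up as a bend on a log-scale chart."""
    return [curr / prev - 1.0 for prev, curr in zip(values, values[1:])]


def cost_ratio(q: Quarter) -> float:
    """Infrastructure cost as a fraction of revenue."""
    return q.infra_cost / q.revenue


def ratio_is_declining(quarters: list[Quarter]) -> bool:
    """True only if the cost-to-revenue ratio fell in every consecutive quarter."""
    ratios = [cost_ratio(q) for q in quarters]
    return all(later < earlier for earlier, later in zip(ratios, ratios[1:]))


# Illustrative numbers only.
quarters = [
    Quarter("Q1", 1_000_000, 200_000),
    Quarter("Q2", 1_500_000, 270_000),
    Quarter("Q3", 2_250_000, 360_000),
    Quarter("Q4", 3_000_000, 540_000),
]

revenue_growth = growth_rates([q.revenue for q in quarters])
cost_growth = growth_rates([q.infra_cost for q in quarters])

for q in quarters:
    print(f"{q.label}: infra is {cost_ratio(q):.1%} of revenue")
print("ratio declining every quarter:", ratio_is_declining(quarters))
```

In this made-up series, the ratio falls for three quarters and then ticks back up in Q4, when cost growth accelerates while revenue growth slows. That is the kind of inflection worth digging into.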
Even the best business lines stop growing at some point. Facebook is one of the most valuable businesses in the world, but even they at some point ran out of new users to attract to their platform. Once growth slows, a business naturally starts focusing more on costs, including infrastructure spend.
In those scenarios, the easiest approach is to work with the business to align on two numbers:
In both, the key thing is moving away from anchoring on a percentage of revenue and instead setting a target against the fundamental operations that you support. Thinking of costs as a percentage of revenue works well when you’re growing, but is too abstract and hides too many details once you’re focused on reducing costs.
If you find yourself exceeding those targets, then it’s time to dive into reducing them.
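One way to picture a target set against a fundamental operation is as a unit cost: total infrastructure spend divided by the number of operations served. The sketch below assumes a single hypothetical operation type and invented numbers; the operation you measure and the target itself would come from your own alignment with the business.

```python
def unit_cost(total_cost: float, operations: int) -> float:
    """Infrastructure cost per operation over some period."""
    if operations <= 0:
        raise ValueError("operations must be positive")
    return total_cost / operations


def exceeds_target(total_cost: float, operations: int, target_unit_cost: float) -> bool:
    """True if the observed cost per operation is above the agreed target."""
    return unit_cost(total_cost, operations) > target_unit_cost


# Hypothetical quarter: $540,000 of spend supporting 90 million operations,
# checked against an assumed target of $0.005 per operation.
quarterly_cost = 540_000.0
quarterly_operations = 90_000_000
target = 0.005

print(f"cost per operation: ${unit_cost(quarterly_cost, quarterly_operations):.4f}")
print("over target:", exceeds_target(quarterly_cost, quarterly_operations, target))
```

Unlike a percentage of revenue, this number moves only when spend or usage moves, which makes it easier to tie to specific engineering work.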
What I’ll introduce here is the fairly common playbook for managing infrastructure costs. As you work through these approaches, your goal is to do as few of them as possible while meeting your efficiency goals. I’ve prefixed a few particularly high return-on-investment tools with a “⭐”; if you’re debating where to start, consider starting with them.
If doing all of these sounds overwhelming, it should! Few companies do all of these, and those that do either operate in a business that is unusually margin sensitive or are spending many millions a year on their infrastructure costs.
Generally, the way I think through spinning out any given area into a dedicated team is described in Trunk and Branches Model, and that applies to efficiency as well. That said, let me add a few caveats to that general approach as it applies here.
Much like managing technical quality, efficiency is an area where you can make significant progress with one-off initiatives. Improving how you use AWS Reserved Instances or renegotiating your vendor contracts can reduce your spend by 30-40% in a week or two. Product-level improvements to your architecture can reduce your spend even more, although they’ll probably take a bit longer.
Because you can make significant progress through one-off initiatives, the default is to wait until late into a company’s growth to spin out a dedicated team, and in most cases that’s the right decision.
The three factors to consider as you think through whether postponing a dedicated team is the best solution for you are:
If you answer yes to any of those, then you may want to spin out a team earlier than the Trunk and Branches Model suggests. As you start sourcing candidates, it’ll become apparent that this is a bit of a custom role, suited to folks who specifically enjoy working on the problem. Recruiting one or two folks with significant preexisting experience will save you years!