At my last job, the company was using NewRelic (for the two environments we had at the time), which had an OK cost, and "suddenly" we were forced to use Datadog, which was way over our budget. After the person responsible for the change and integration saw the estimated costs, they started cutting everything possible to keep the bill low. So our tools degraded and we weren't able to test things on staging and collect metrics like we could with NewRelic. FinOps is certainly a good approach, but we need it from the start!
sda2 34 days ago [-]
ha, we use NewRelic for our application but the company is so cheap, they won’t even buy the infra team a license!
brunoarueira 33 days ago [-]
Haha, it seems some managers only change their address while the behavior stays the same. I once worked at a place where we had to use free Heroku add-ons apart from the paid PostgreSQL and the dynos.
jsiepkes 34 days ago [-]
One of the advantages of self-hosting is that you don't need this level of FinOps. You also don't have to live in fear of bill-mageddon.
danpalmer 34 days ago [-]
And one of the disadvantages is that you can't solve problems by just spending more. It's a real trade-off, and too often is simplified to one option being obviously better.
m1keil 34 days ago [-]
You can't spend more fast, but you can always spend more.
danpalmer 34 days ago [-]
Why would you spend more if it's not solving problems?
You can certainly spend more, but on-prem/self-hosted, time is usually the limiting factor, either directly or through opportunity cost. Contrary to popular belief, time does not always equal money: if you need more storage and the blocker is that you need to build out an S3 equivalent (rather than just paying for S3), then you'll be blocked by hiring, by hardware lead times, etc.
m1keil 33 days ago [-]
Maybe I'm misreading what you mean, but I don't understand why it's not solving problems. Lack of capacity is solved by buying more hardware or replacing existing hardware with more powerful hardware. In the datacenter you need to have capacity planning pretty early on, while in the cloud you can get by until you reach a very large cloud bill.
I also don't think that every organisation that needs file storage must build a storage solution that competes with the reliability and features of S3. Most of the time you can get by just fine at a fraction of the cost.
danpalmer 33 days ago [-]
I may not be communicating it clearly.
Take file storage for example. Going from one server, to one with backups, to N, to big-N, are all points of inflection where significant engineering is required for on-prem/self-hosted file storage. With a cloud solution none of these are inflection points, none require additional work, they only require additional money.
Assuming you have infinite time, you can just funnel money into things like hardware upgrades and hiring engineers to build these things, but if you don't assume infinite time, time is often as strong or even stronger a factor than money.
At my last company we had a bunch of servers in colo, and could not throw money at solving problems there. Getting a new machine took 2+ days and a bunch of emails, not an API call. We moved to the cloud mostly because the opportunity cost, i.e. the time spent by engineers on toil scaling things on physical machines, was higher than the monetary cost that we could pay on a cloud provider.
This won't be the same for everyone, but the point is that money is roughly the only consideration in cloud, but not the only consideration on-prem, at least when you discount common factors between the two.
m1keil 33 days ago [-]
If you are at the point of having to deal with individual servers and have a very fast-paced development (startup), or have to deal with very burstable traffic spikes (say e-commerce), cloud is probably your friend.
But sometimes you just need more compute and you are the type of organization that buys compute by the floor space and power consumption...
Regardless of the nuances of each situation, I think jsiepkes's comment meant to say that in the data center you can buy pretty killer hardware that will be totally overkill for the moment and won't require you to count active timeseries in order to not pay $300k a month for your metrics, and at the same time will last you for the next couple of years.
Also, for most companies, the next point of inflection will never come and this server will probably last them for a very, very long time.
I'm sharing my point of view as someone who works at an organization that took money as the only consideration and managed to grow over the years to now having to start taking both time and money into consideration because taking only money into consideration proves to be too expensive.
patmorgan23 33 days ago [-]
And you don't have to pick just cloud or on-prem, you can utilize both. Use cloud for your bursty workloads, or for its CDN/edge, and then your on-prem for consistent workloads. As long as you're not using cloud-specific services you can run open source versions on-prem (such as MinIO for S3, or your own Postgres cluster, or Kubernetes + whatever operator).
AznHisoka 33 days ago [-]
And you don’t have to operate your own servers or rent out space either. There’s things like dedicated servers and VPS’s…
jterrys 33 days ago [-]
lol.
We recently switched from Grafana to Prometheus. Reason being that a license refresh took longer to process on their end. What happens when a license expires on Grafana? They fucking shut down all your shit cold turkey. Don't care if you're in prod or have a dedicated guy on their end for support or whatever. So you're happily churning along and then suddenly you're blind. Nice FinOps. With Prometheus there's a grace period where they'll happily overcharge you. But we've never had a product absurdly blow up on us like this before. It's truly mind boggling that they're out here talking about 'FinOps' now.
arccy 33 days ago [-]
prometheus isn't a company though? maybe you're on some other vendor that runs it for you?
valyala 28 days ago [-]
I suppose they switched from Grafana Cloud to self-hosted Prometheus or a Prometheus-like solution such as Mimir, Thanos or VictoriaMetrics.
m1keil 34 days ago [-]
So next time someone says "in cloud you don't need anyone to manage it for you" you can link them this article.
NotGMan 34 days ago [-]
You know you have a problem when you're afraid to add a few more metrics because the bill might get too high.
danpalmer 34 days ago [-]
This was the reason that we ended up choosing Datadog over Grafana Cloud at my previous place. Most metrics came from "integrations", and Datadog doesn't charge extra for any of those (they just curate them so that the cardinality can't get too high), whereas Grafana charges (charged?) for each metric and didn't provide anything to reduce cardinality. Their solution was to suggest we did more engineering and ran more infrastructure to aggregate before sending to them, not something we wanted to invest in given that the whole point was to not self-host Grafana anymore.
Datadog is expensive, but at least we were only making these decisions for the ~hundreds of custom business metrics, and not the ~tens of thousands of metrics from our infrastructure.
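To make the "aggregate before sending" suggestion concrete, here is a minimal Python sketch of that kind of pre-aggregation. It is not Grafana's or Datadog's API: the function name `aggregate_series`, the sample data, and the choice of `pod` as the high-cardinality label are all assumptions for illustration. It drops the chosen labels and sums the samples that collapse into the same series.

```python
from collections import defaultdict


def aggregate_series(samples, drop_labels):
    """Sum sample values after removing the given labels from each series.

    `samples` is an iterable of (labels_dict, value) pairs. Hypothetical
    helper for illustration; not part of any vendor SDK.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        # Series identity is the sorted tuple of the labels we keep.
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in drop_labels))
        totals[kept] += value
    return dict(totals)


# Made-up samples: three per-pod series for two services.
samples = [
    ({"service": "api", "pod": "api-1"}, 3.0),
    ({"service": "api", "pod": "api-2"}, 5.0),
    ({"service": "web", "pod": "web-1"}, 2.0),
]

# Dropping the per-pod label leaves one series per service.
aggregated = aggregate_series(samples, drop_labels={"pod"})
```

Summing only makes sense for counter-like values; gauges would need a different reduction (max, average, last value), which is part of the engineering effort being described.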
kozikow 34 days ago [-]
I am a big fan of "cost monitoring".
At my previous company I had a good setup for cost monitoring, including release-to-release comparisons, drill-downs, statistics, etc.
After each release I looked at this data. It saved a lot of $ through simple fixes like "why are we calling this API twice?".
It also caught quite a few issues that weren't strictly cost related but weren't apparent from other types of data (you will always have some "unknown unknowns" in your monitoring, and cost data seems to be a pretty wide net to catch some of those).
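A release-to-release comparison of this kind can be sketched in a few lines of Python. This is only an illustration under assumptions, not the setup described above: the function name `cost_regressions`, the 20% threshold, and the line items and amounts are all made up.

```python
def cost_regressions(previous, current, threshold=0.2):
    """Return {line_item: relative_change} for items whose cost grew by more
    than `threshold` between two releases. Hypothetical helper."""
    flagged = {}
    for item, cost in current.items():
        baseline = previous.get(item)
        if not baseline:
            # No usable baseline (new or previously free item): always flag it.
            flagged[item] = float("inf")
            continue
        change = (cost - baseline) / baseline
        if change > threshold:
            flagged[item] = change
    return flagged


# Made-up per-release costs, e.g. pulled from a billing export.
previous = {"geocoding_api": 100.0, "storage": 50.0}
current = {"geocoding_api": 200.0, "storage": 52.0}

# The doubled API cost is flagged; the 4% storage increase is not.
regressions = cost_regressions(previous, current)
```

A doubled line item after a release is exactly the kind of signal that leads to a question like "why are we calling this API twice?".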
m1keil 34 days ago [-]
What levels of observability did you have for costs of data transfer and how did you do it?
lukaslalinsky 32 days ago [-]
I'm a fan of Grafana, both the main tool and also the infrastructure projects they have been working on, they did a great service to the IT world. However, the pricing of their cloud service, that's some shady business. It goes from reasonable pricing to oh-my-god pricing so fast. I really wish they would introduce limits on their paid plans, but that's against their business practice.
floating-io 34 days ago [-]
Seriously? Not everything needs to be xxxOps. And if you need a "FinOps" team to manage your cloud cost, I would argue that there's something wrong with your whole damn paradigm.
I kind of agree that I don't love the term. That being said, it has become the de facto way that people refer to the space and practice. The FinOps Foundation hyped up the term and space quite a bit, which they deserve credit for, but I do wish there was a better name :)
mvdtnz 34 days ago [-]
> if you need a "FinOps" team to manage your cloud cost, I would argue that there's something wrong with your whole damn paradigm.
How would you manage your cloud costs if you ran a company of, say, 4,000 engineers? Balancing the needs of delivery teams to build their technology with the needs of the business to manage costs. Do you think every single team should directly report their cloud costs to the CFO? Or at that scale does it make more sense to report costs to another individual? And when that needs to scale, maybe we give that individual a team?
moandcompany 34 days ago [-]
The _Ops team needs an _Ops' team.
The game here is defining parts of the job away to be someone else's job or responsibility.
arccy 34 days ago [-]
we invent new functions to use to decorate our CVs with
j0rdans 34 days ago [-]
I think for most smaller orgs you can get away with an off the shelf product to surface some more basic cost stuff. In relatively large engineering orgs, you're looking at optimising on stuff like cross-region calls to save millions a year so yeah there's a good reason to invest in cloud cost management.
sebazzz 33 days ago [-]
> And if you need a "FinOps" team to manage your cloud cost, I would argue that there's something wrong with your whole damn paradigm.
FinOps isn't a dedicated job but something a cloud engineer can do as part of their job. In the same way that DevOps doesn't need to be a dedicated function itself.
And as for the cloud... Yes, that turned out to be a whole lot more expensive for companies than predicted - and you share compute with anyone, so that dual-core CPU isn't always that fast. But cloud is also flexible and that is where FinOps comes in.
svilen_dobrev 34 days ago [-]
> And if you need a "FinOps" team to manage your cloud cost,
somehow i misread that as "massage your cloud costs", and it.. sticks..
m1keil 34 days ago [-]
FinOps isn't even new (my definition of new is "once you have an O'Reilly book on the topic it is not new anymore").
aledalgrande 34 days ago [-]
I had the same thought. Why do we need to keep coming up with weird names/acronyms/portmanteaus?
MortyWaves 34 days ago [-]
Sadly it’s one of the only ways of getting through the thick skulls of dumb middle management that seem to always be leading software projects despite having no technical background.
If it’s not got a familiar marketing thing going on, they’ll refuse to acknowledge it even if their devs are practically begging for it.
aledalgrande 34 days ago [-]
> dumb middle management that seem to always be leading software projects despite having no technical background
Or you change to a company with capable & technical middle management :)
It always blows my mind when people say "management does not have to be technical".
senko 34 days ago [-]
Wait until you hear about "RevOps".
At this point I'm waiting for someone to start flogging "CodeOps".
aqueueaqueue 34 days ago [-]
What about a 5k engineer org. Having 10 finops people would make it 0.2% of the workforce. Those people help surface the information for other teams to act.
s2l 34 days ago [-]
Start with educating architects and developers on how to measure and optimize costs. They are the first point (and probably cheapest) to optimize costs.
Later, the FinOps role can evolve, but expectations will mostly be reactive.
everforward 33 days ago [-]
I don't care for this paradigm because the architects and developers likely have very little context upon which to base decisions. There are tradeoffs between time to develop, cost to run and reliability that it runs at. Where you want to make those tradeoffs is more of a business decision than a technical one.
A high margin product might want to launch quickly and reliably, costs be damned. Another product may need to run as cheap as possible, even if it's unreliable and takes a long time to develop.
FinOps translates business goals into technical requirements. It's not just cutting everything down to as cheap as it can be.
teitoklien 33 days ago [-]
At even $70/hr of employee cost + $200 per employee in training cost + 20% annual employee churn + the added salary needed to hire devs who are cost conscious,
let's see how long that stays cheaper than just hiring 2-3 FinOps folks, putting them in every room where the software architecture for new services is being designed, and having them drill hard into the team on what to avoid.
Not to mention it’s a better fit for the Single Responsibility Principle that most great teams follow:
if everyone is responsible for cutting costs and optimizing.
Then no one is….
EfficientDude 34 days ago [-]
What's Grafana? I see it mentioned here a lot, nowhere else though. Is it a YC property?
brunoarueira 34 days ago [-]
Grafana is mostly known as the most used interface to query Prometheus and create dashboards for collected metrics.
MortyWaves 34 days ago [-]
And more recently inventing and reinventing Prometheus alternatives. It’s a bit much trying to keep up.
Nothing to do with YC AFAIK.