> I've felt like it's a little unfair to judge the uptime of company platforms like this; by saying "if any feature at all is down, it's all down" and then translating that into 9s for the platform.

This is definitely true.

At the same time, none of the individual services has hit 3x9 uptime in the last 90 days [0], which is their Enterprise SLA [1] ...

> "Uptime" is the percentage of total possible minutes the applicable GitHub service was available in a given calendar quarter. GitHub commits to maintain at least 99.9% Uptime for the applicable GitHub service.
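To make that definition concrete, here is a rough sketch of the arithmetic. The function names and the 90-day quarter are my own assumptions for illustration; the SLA text only gives the "percentage of total possible minutes" wording and the 99.9% figure.

```python
def quarterly_uptime(downtime_minutes: float, days_in_quarter: int = 90) -> float:
    """Uptime as a percentage of total possible minutes in the quarter."""
    total_minutes = days_in_quarter * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def max_downtime_minutes(target_pct: float = 99.9, days_in_quarter: int = 90) -> float:
    """Downtime budget (in minutes) that still meets the target uptime."""
    total_minutes = days_in_quarter * 24 * 60
    return total_minutes * (100.0 - target_pct) / 100.0


# Assumed 90-day quarter: 129,600 possible minutes in total.
budget = max_downtime_minutes()
print(f"99.9% allows about {budget:.1f} minutes of downtime per quarter")
print(f"400 minutes down gives {quarterly_uptime(400):.2f}% uptime")
```

Under those assumptions the 99.9% commitment leaves a budget of a little over two hours per quarter per service.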

[0]: https://mrshu.github.io/github-statuses/

[1]: https://github.com/customer-terms/github-online-services-sla

(may have edited to add links and stuff, can't remember, one of those days)

reply
So what happens for those enterprise customers now? Is there a meaningful fallout when these services fail to meet their SLAs?
reply
> If GitHub does not meet the SLA, Customer will be entitled to service credit to Customer's account ("Service Credits") based on the calculation below ("Service Credits Calculation").

The linked document in my previous comment has more detail.

reply
It's worth adding that big (BIG!) business clients will usually negotiate the terms for going below the SLA threshold. The goal is less to be compensated if it happens, and more to incentivize the provider to never let it happen.
reply
Right. Basically, they give you a coupon to lower your cost of future consumption. So, you have to keep consuming the service. If you just leave, you get no rebate. Obviously, very large customers get special deals.
reply
You're right that labelling any outage as "GitHub is down" is an overgeneralisation, & we should focus on bottlenecks that impact teams in a time-sensitive manner, but that isn't the case here. Their most stable service (API) has only two 9s (99.69%).

They're not even struggling to get their average to three 9s, they're struggling to get ANY service to three 9s. They're struggling to get many services to two 9s.

Copilot may be the least stable at one 9, but the services I would consider most critical (Git & Actions) are also at one 9.

reply
I love multiple 9s as much as the next guy but that's only 27 hours per year of downtime. For a mostly free (for me) service, I'm thankful.
reply
Most people complaining about uptime aren't free users or open-source developers. It's people whose companies are enterprise GitHub customers. It's a real problem and affects productivity.
reply
GitHub going down during office hours in a large enterprise has knock-on effects for hours as well. Especially if you are in a monorepo.
reply
I'm happy to report that my one-person sysops has successfully hit nine-fives for the 20th year in a row!
reply
If there's only one 9 in the availability figure, it's below 99%, so they've got a minimum downtime of 87.6 hours per year (best case: 98.99999999999999999%).
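For anyone who wants to check the numbers in this subthread, a quick sketch of the conversion. It assumes a 365-day year (8,760 hours) and 52 weeks; the function name is just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours of downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (100.0 - availability_pct) / 100.0


# 99.69% is the API figure quoted upthread; the others are round "nines".
for pct in (90.0, 99.0, 99.69, 99.9):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% -> {hours:.1f} h/year, {hours / 52:.2f} h/week")
```

So 99.69% is roughly 27 hours a year, while anything under 99% is at least 87.6 hours.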
reply
Honestly, you're right - 2̶7̵ 87+ (correction from sibling) hours per year is absolutely fine & normal for me & anything I want to run. I personally think it should be fine for everybody.

On the other hand, the baseline minimal GitHub Enterprise plan with no features (no Copilot, GHAS, etc.) runs a medium-sized company $1m+ per annum, not including pay-per-use extras like CI minutes. As an individual I'm not the target audience for that invoice, but I can envisage whoever is wanting a couple of 9s to go with it. As a treat.

reply
87 hours a year is about 1.7 hours a week. If that window falls when you need to use it, it matters a hell of a lot more than if it’s 4am on a Sunday.
reply
Nine nines is too hard; my target is eight eights.
reply
ONLY TWO NINES! Meanwhile vital government services here have a whopping 25% availability.
reply
Two things can be bad.
reply
Lemme guess, those government services are run by the lowest bidder?
reply
This company is part of the portfolio of a $trillion+ transnational corporation. The idea that we can't judge them, when they clearly have more resources than 99% of other companies on this planet, doesn't hold up to any scrutiny.

Why defend a company that clearly doesn't care about its customers and sees them as a money spigot to suck dry?

reply
The OP clearly never says we can't judge them. He was speaking to how the uptime is measured. I'm not saying I agree or disagree with the OP, but at least address the argument he's making.
reply
There's a completely reasonable comment by jamiemallers on this thread which is marked as 'dead' even after vouching. Not sure what's going on there.
reply
Presumably what's going on is https://news.ycombinator.com/item?id=47340079 . It's been quite an issue lately.
reply
Take a look at his comment history.
reply
It doesn't help that almost all of the big tech companies talking about 5 9s are lying about it; "Does it respond to the API at all, even with errors? It's up!" and so on. If you spend a lot of time analyzing browser traces you see errors and failures constantly from everyone, even huge companies that brag a lot about their prowess. But it's "up" even if a shard is completely down.
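A toy illustration of how much the definition of "up" matters. The request log below is entirely made up, and the two predicates are my own stand-ins for "responded at all" versus "responded without a server error"; no real provider's status logic is being quoted here.

```python
# Hypothetical request log: (minute, HTTP status). Made-up data for illustration.
requests = [
    (0, 200), (0, 500),
    (1, 503), (1, 200),
    (2, 200), (2, 200),
    (3, 502), (3, 500),
]


def availability(log, is_ok):
    """Percentage of logged requests that count as 'up' under a given rule."""
    good = sum(1 for _, status in log if is_ok(status))
    return 100.0 * good / len(log)


# Rule 1: any response at all counts as up, errors included.
responded = availability(requests, lambda status: True)
# Rule 2: only non-5xx responses count as up.
succeeded = availability(requests, lambda status: status < 500)

print(f"responded at all: {responded:.1f}%")
print(f"responded without a 5xx: {succeeded:.1f}%")
```

Same traffic, same period, and the first rule reports a perfect score while the second reports half the requests failing.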

The five nines tech people usually talk about is a fiction. The only place where the measure is really real is in networking, specifically service provider networking; otherwise it's often just various ways of cleverly slicing the data to keep the status screen green. A dead giveaway is a gander at the SLAs and all the ways the SLAs are basically worthless for almost everyone in the space.

See also all of the "1 hour response time" SLAs from open source wrapper companies. Yes, in one hour they will create a case and give you case ID. But that's not how they describe it.

reply
That's the rub.

Once you dig into the details, what does it mean to have 5 9s? Some systems have a huge surface area of calls and views. If the main web page is down but the entire backend API is still responding fine, is that a 'down'? Well, sorta. Or what if one misc API that some users only call during onboarding is down, does that count? Well, technically yes.

It depends on your users, which paths they take, and which path is the common one.
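One way to put numbers on that is to weight each endpoint by how much user traffic goes through it. This is only a sketch: the endpoint names, traffic shares, and availability figures are all invented, and weighting by traffic share is one possible choice among several.

```python
# Hypothetical endpoints: name -> (share of user traffic, availability %).
# All numbers are invented for illustration.
endpoints = {
    "web_frontend": (0.60, 99.0),
    "backend_api": (0.35, 99.95),
    "onboarding_api": (0.05, 90.0),
}


def traffic_weighted(eps):
    """Availability averaged by the share of traffic on each endpoint."""
    return sum(share * pct for share, pct in eps.values())


def worst_endpoint(eps):
    """Availability of the single least available endpoint."""
    return min(pct for _, pct in eps.values())


print(f"traffic-weighted: {traffic_weighted(endpoints):.2f}%")
print(f"worst endpoint:   {worst_endpoint(endpoints):.2f}%")
```

The rarely used onboarding API barely moves the weighted figure but dominates the worst-endpoint one, which is exactly the "well, technically yes" situation.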

Then add in response times to those down items. Those are usually made up too.

reply