Failure rates tend to follow a bathtub curve, so if you burn in the hardware before launch you'd expect low failure rates for a long period. It's quite likely it'd be cheaper not to replace components at all: ensure enough redundancy in the key systems (power, cooling, networking) that you can simply shut down and disable any dead servers, then replace the whole unit once enough parts have failed.
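For intuition, here's a toy Python sketch of that bathtub shape and what burn-in buys you; every number in it (the phase lengths, per-month failure probabilities, fleet size) is invented purely for illustration, not real server data:

```python
# Toy bathtub-curve model of server failures: the monthly failure probability
# is high at first (infant mortality), low for years (useful life), then
# rises again (wear-out). Every number here is made up for illustration.

def monthly_failure_prob(month):
    if month < 3:         # infant mortality (what pre-launch burn-in weeds out)
        return 0.02
    elif month < 60:      # useful life
        return 0.001
    else:                 # wear-out
        return 0.005 * 1.05 ** (month - 60)

# The bathtub shape:
for m in (0, 2, 6, 24, 60, 72):
    print(f"month {m:>2}: {monthly_failure_prob(m):.4f} failures/server/month")

# Expected survivors out of 1000 burned-in servers over a 5-year mission
# with no in-orbit repair: just shut off and route around the dead ones.
alive = 1000.0
for m in range(3, 3 + 60):
    alive *= 1.0 - monthly_failure_prob(m)
print(f"expected survivors after 5 years: {alive:.0f} of 1000")
```

With toy numbers like these the fleet loses well under 10% over five years once the infant-mortality phase has been burned through on the ground, which is the sense in which "don't repair, just route around failures" can pencil out.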
Side note: thanks for sharing the "bathtub curve". TIL, and I'm surprised I hadn't heard of it before, especially since it's central to reliability engineering (from searching HN via Algolia, it seems no post about the bathtub curve has ever crossed 9 points).
And then, there is of course radiation trouble.
So those two kinds of burn-in require a launch to space anyway.
Programming and CS people somehow rarely look at that.
Redundancy is a small issue on Earth, but it completely changes the calculations for space: you need more of everything, which makes the already-unfavourable space and mass requirements even less plausible.
Without backup cooling and power, one small failure could take the entire facility offline.
And active cooling - which is a given at these power densities - requires complex pumps and plumbing which have to survive a launch.
The whole idea is bonkers.
IMO you'd be better off thinking about a swarm of cheaper, simpler, individual serversats or racksats connected by a radio or microwave comms mesh.
I have no idea if that's any more economical, but at least it solves the most obvious redundancy and deployment issues.
The analysis is a third-party analysis that, among other things, presumes they'll launch unmodified Nvidia racks, which would make no sense. It might be that this means Starcloud are bonkers, but it might also mean the analysis is based on flawed assumptions about what they're planning to do. Or a bit of both.
> IMO you'd be better off thinking about a swarm of cheaper, simpler, individual serversats or racksats connected by a radio or microwave comms mesh.
Other than against physical strikes, this would get you significantly less redundancy than putting the same hardware in a single unit and controlling what feeds what, the same way we have smart, redundant power supplies and cooling in every data center (and in the racks they're talking about using as the basis).
If power and cooling die faster than the servers, you'd either need to overprovision or shut down servers to compensate, but it's certainly not all or nothing.
The more satellites you put up there, the more that happens, and the greater the risk that the immediate orbital zone around Earth devolves into an impenetrable whirlwind of space trash, aka Kessler Syndrome.
On one hand, I imagine you'd rack things up so the whole rack/etc. moves into space as one unit; OTOH there's still movement and things "shaking loose", plus the vibration and acceleration of the flight and the loss of gravity...
Perhaps the server could be immersed in a thermally conductive resin to stop parts shaking loose? And if the thermals are handled by fixed heat pipes and external radiators, a non-thermally-conductive resin could be used instead.
And at sufficient scale, once you plan for that, it means you can massively simplify the servers. The amount of waste a server case suitable for hot-swapping drives adds, if you're not actually going to use that capability, is massive.
> The company only lost six of the 855 submerged servers versus the eight servers that needed replacement (from the total of 135) on the parallel experiment Microsoft ran on land. It equates to a 0.7% loss in the sea versus 5.9% on land.
6/855 servers over 6 years is nothing. You'd simply re-launch the whole thing in 6 years (with advances in hardware anyways) and you'd call it a day. Just route around the bad servers. Add a bit more redundancy in your scheme. Plan for 10% to fail.
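Quick arithmetic on those quoted figures, plus what "plan for 10% to fail" implies for overprovisioning (the 10,000-server capacity target below is just a made-up example):

```python
import math

# Sanity-check the Project Natick figures quoted above.
sea_failures, sea_total = 6, 855
land_failures, land_total = 8, 135
print(f"sea : {sea_failures / sea_total:.1%} failed")    # ~0.7%
print(f"land: {land_failures / land_total:.1%} failed")  # ~5.9%

# If up to 10% of servers may die over the mission and you want
# `needed_usable` still working at end of life, launch needed_usable / 0.9.
needed_usable = 10_000                       # assumed capacity target, illustrative
launched = math.ceil(needed_usable / 0.9)
print(f"launch {launched} servers to still have {needed_usable} after 10% attrition")
```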
That being said, it's a completely bonkers proposal until they figure out the big problems, like cooling, power, and so on.
Underwater pods are the polar opposite of space in terms of failure risks. They don't require a rocket launch to get there, and they further insulate the servers from radiation compared to operating on the surface of the Earth, rather than increasing exposure.
(Also, much easier to cool.)
In this case, I see no reason to perform any replacements of any kind. Proper networked serial port and power controls would allow maintenance for firmware/software issues.
My feeling is that, a bit like Starlink, you would just deprecate failed hardware rather than bother with all the moving parts needed to replace faulty RAM.
Does mean your comms and OOB tools need to be better than the average American colo provider's, but I would hope that would be a given.
And once you remove all the moving parts, you just fill the whole thing with oil rather than air and let heat transfer more smoothly to the radiators.
Not sure this is such a great idea.
Repair robots
Enough air between servers to allow robots to access and replace componentry.
Spare componentry.
An eject/return system.
Heatpipes from every server to the radiators.
Second: you still need radiators to somehow dissipate the heat that ends up in the oil.
On Earth we have skeleton crews maintain large datacenters. If the cost of mass to orbit is 100x cheaper, it’s not that absurd to have an on-call rotation of humans to maintain the space datacenter and install parts shipped on space FedEx or whatever we have in the future.
Treat each maintenance trip like an EVA (extra vehicular activity) and bring your life support with you.
Consider that, for quite some time now, we've been at the point where layers of monitoring & lockout systems are required to ensure no humans get caught in hot spots, which can surpass 100°C.
It's all contingent on a factor of 100-1000x reduction in launch costs, and a lot of the objections to the idea don't really engage with that concept. That's a cost comparable to air travel (both air freight and passenger travel).
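Rough per-kilogram arithmetic to make "comparable to air travel" concrete; the current-cost baseline is an order-of-magnitude assumption, not a quoted price:

```python
# What a 100-1000x reduction in launch cost means per kilogram.
# Falcon 9 pricing to LEO today works out to very roughly $2,000-4,000/kg;
# treat the baseline below as an order-of-magnitude assumption.
baseline_per_kg = 3_000.0     # USD/kg, approximate current cost to LEO
for factor in (100, 1_000):
    print(f"{factor:>5}x cheaper: ~${baseline_per_kg / factor:,.2f}/kg")
# Long-haul air freight commonly runs a few dollars per kg, which is the
# sense in which launch costs would become "comparable to air travel".
```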
(Especially irritating is the continued assertion that thermal radiation is really hard, and not like something that every satellite already seems to deal with just fine, with a radiator surface much smaller than the solar array.)
It is really fucking hard when you have 40 MW of heat being generated that you somehow have to get rid of.
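Back-of-the-envelope radiator sizing via the Stefan-Boltzmann law; the emissivity and radiating temperature below are assumptions, and absorbed sunlight/Earthshine is ignored:

```python
# How much radiator area does 40 MW of waste heat need?
# P = emissivity * sigma * A * T^4, solved for A. All inputs are
# illustrative assumptions, not anyone's actual design numbers.
SIGMA = 5.670e-8          # Stefan-Boltzmann constant, W / (m^2 K^4)
emissivity = 0.9          # assumed
T_radiator = 300.0        # K, assumed radiating temperature (~27 C coolant)
P_waste = 40e6            # W of heat to reject

flux = emissivity * SIGMA * T_radiator ** 4   # W radiated per m^2 of surface
area = P_waste / flux
print(f"flux ~ {flux:.0f} W/m^2, radiator area ~ {area:,.0f} m^2")
```

With those assumptions it comes out to around 10 hectares of radiating surface (roughly half that if the panels radiate from both sides, and less if you can run the coolant hotter, since it scales as T^4), which is a very different problem from the radiator on a typical comms satellite.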
(And of course, the mostly reusable Falcon 9 launches far more mass to orbit than the rest of the world combined, flying about 150 times per year. No one has yet managed to field a similarly highly reusable orbital rocket booster in the roughly 10 years since Falcon 9 was first recovered in 2015.)
I won't say it's a good idea, but it's a fun way to get rid of e-waste (I envision this as a sort of old folks' home for parted-out supercomputers).
It just seems funny; I recall that when servers started getting more energy dense, it was a revelation to many computer folks that safe operating temps in a datacenter can be quite high.
I’d imagine operating in space has lots of revelations in store. It’s a fascinating idea with big potential impact… but I wouldn’t expect this investment to pay out!
Also, making something suitable for humans means having lots of empty space where the human can walk around (or float around, rather, since we're talking about space).
I agree that it may be best to avoid needing the space and facilities for a human being in the satellite. Fire and forget. Launch it further into space when it's decommissioned instead of bringing it back to Earth. People can salvage the materials later.
This effect can be partly offset by exercising while in space, but it's not perfect even with the insane amount of medical monitoring the guys up there receive.
It's theoretically possible for sure, but we've never done that in practice and it's far from trivial.
Are there any unique use-cases waiting to be unleashed?
Keep in mind economics is all about allocation of scarce resources with alternative uses.