Failure rates tend to follow a bathtub curve, so if you burn in the hardware before launch you'd expect low failure rates for a long period. It's quite likely it'd be cheaper not to replace components at all: ensure enough redundancy in the key systems (power, cooling, networking) that you can simply shut down and disable any dead servers, then replace the whole unit once enough parts have failed.
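For intuition, here's a toy Python sketch of that bathtub shape and what burn-in buys you; every number in it (the phase lengths, per-month failure probabilities, fleet size) is invented purely for illustration, not real server data:

```python
# Toy bathtub-curve model of server failures: the monthly failure probability
# is high at first (infant mortality), low for years (useful life), then
# rises again (wear-out). Every number here is made up for illustration.

def monthly_failure_prob(month):
    if month < 3:         # infant mortality (what pre-launch burn-in weeds out)
        return 0.02
    elif month < 60:      # useful life
        return 0.001
    else:                 # wear-out
        return 0.005 * 1.05 ** (month - 60)

# The bathtub shape:
for m in (0, 2, 6, 24, 60, 72):
    print(f"month {m:>2}: {monthly_failure_prob(m):.4f} failures/server/month")

# Expected survivors out of 1000 burned-in servers over a 5-year mission
# with no in-orbit repair: just shut off and route around the dead ones.
alive = 1000.0
for m in range(3, 3 + 60):
    alive *= 1.0 - monthly_failure_prob(m)
print(f"expected survivors after 5 years: {alive:.0f} of 1000")
```

With toy numbers like these the fleet loses well under 10% over five years once the infant-mortality phase has been burned through on the ground, which is the sense in which "don't repair, just route around failures" can pencil out.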
Side note: thanks for sharing the "bathtub curve". TIL, and I'm surprised I hadn't heard of it before, especially since it's central to reliability engineering (from searching HN via Algolia, it seems no post about the bathtub curve has ever crossed 9 points).
And then, there is of course radiation trouble.
So those two kinds of burn-in require a launch to space anyway.
Programming and CS people somehow rarely look at that.
Redundancy is a small issue on Earth, but it completely changes the calculations for space: you need more of everything, which makes the already-unfavourable space and mass requirements even less plausible.
Without backup cooling and power, one small failure could take the entire facility offline.
And active cooling - which is a given at these power densities - requires complex pumps and plumbing which have to survive a launch.
The whole idea is bonkers.
IMO you'd be better off thinking about a swarm of cheaper, simpler, individual serversats or racksats connected by a radio or microwave comms mesh.
I have no idea if that's any more economical, but at least it solves the most obvious redundancy and deployment issues.
The analysis is a third-party analysis that, among other things, presumes they'll launch unmodified Nvidia racks, which would make no sense. It might be that this means Starcloud are bonkers, but it might also mean the analysis is based on flawed assumptions about what they're planning to do. Or a bit of both.
> IMO you'd be better off thinking about a swarm of cheaper, simpler, individual serversats or racksats connected by a radio or microwave comms mesh.
Other than against physical strikes, this would get you significantly less redundancy than putting the same hardware in a single unit and controlling what feeds what, the same way we have smart, redundant power supplies and cooling in every data center (and in the racks they're talking about using as the basis).
If power and cooling die faster than the servers, you'd either need to overprovision or shut down servers to compensate, but it's certainly not all or nothing.
The more satellites you put up there, the more that happens, and the greater the risk that the immediate orbital zone around Earth devolves into an impenetrable whirlwind of space trash, aka Kessler Syndrome.
On one hand, I imagine you'd rack things up so the whole rack/etc. moves into space as one unit; OTOH there's still movement and things "shaking loose", plus the vibration and acceleration of the flight and the loss of gravity...
Perhaps the server could be immersed in a thermally conductive resin to stop parts shaking loose? And if the thermals are handled by fixed heat pipes and external radiators, a non-thermally-conductive resin could be used instead.
And at sufficient scale, once you plan for that, it means you can massively simplify the servers. The amount of waste a server case suitable for hot-swapping drives adds, if you're not actually going to use that capability, is massive.
> The company only lost six of the 855 submerged servers versus the eight servers that needed replacement (from the total of 135) on the parallel experiment Microsoft ran on land. It equates to a 0.7% loss in the sea versus 5.9% on land.
6/855 servers over 6 years is nothing. You'd simply re-launch the whole thing in 6 years (with advances in hardware anyways) and you'd call it a day. Just route around the bad servers. Add a bit more redundancy in your scheme. Plan for 10% to fail.
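Quick arithmetic on those quoted figures, plus what "plan for 10% to fail" implies for overprovisioning (the 10,000-server capacity target below is just a made-up example):

```python
import math

# Sanity-check the Project Natick figures quoted above.
sea_failures, sea_total = 6, 855
land_failures, land_total = 8, 135
print(f"sea : {sea_failures / sea_total:.1%} failed")    # ~0.7%
print(f"land: {land_failures / land_total:.1%} failed")  # ~5.9%

# If up to 10% of servers may die over the mission and you want
# `needed_usable` still working at end of life, launch needed_usable / 0.9.
needed_usable = 10_000                       # assumed capacity target, illustrative
launched = math.ceil(needed_usable / 0.9)
print(f"launch {launched} servers to still have {needed_usable} after 10% attrition")
```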
That being said, it's a completely bonkers proposal until they figure out the big problems, like cooling, power, and so on.
Underwater pods are the polar opposite of space in terms of failure risks. They don't require a rocket launch to get there, and they further insulate the servers from radiation compared to operating on the surface of the Earth, rather than increasing exposure.
(Also, much easier to cool.)
In this case, I see no reason to perform any replacements of any kind. Proper networked serial port and power controls would allow maintenance for firmware/software issues.
My feeling is that, a bit like Starlink, you would just deprecate failed hardware rather than bother with all the moving parts needed to replace faulty RAM.
Does mean your comms and OOB tools need to be better than the average American colo provider's, but I would hope that would be a given.
And once you remove all the moving parts, you just fill the whole thing with oil rather than air and let heat transfer more smoothly to the radiators.
Not sure this is such a great idea.
Repair robots
Enough air between servers to allow robots to access and replace componentry.
Spare componentry.
An eject/return system.
Heatpipes from every server to the radiators.
Second: you still need radiators to somehow dissipate the heat that ends up in the oil.
On Earth we have skeleton crews maintain large datacenters. If the cost of mass to orbit is 100x cheaper, it’s not that absurd to have an on-call rotation of humans to maintain the space datacenter and install parts shipped on space FedEx or whatever we have in the future.
Treat each maintenance trip like an EVA (extra vehicular activity) and bring your life support with you.
Consider that, for quite some time now, we've been at the point where layers of monitoring & lockout systems are required to ensure no humans get caught in hot spots, which can surpass 100°C.
It's all contingent on a factor of 100-1000x reduction in launch costs, and a lot of the objections to the idea don't really engage with that concept. That's a cost comparable to air travel (both air freight and passenger travel).
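Rough per-kilogram arithmetic to make "comparable to air travel" concrete; the current-cost baseline is an order-of-magnitude assumption, not a quoted price:

```python
# What a 100-1000x reduction in launch cost means per kilogram.
# Falcon 9 pricing to LEO today works out to very roughly $2,000-4,000/kg;
# treat the baseline below as an order-of-magnitude assumption.
baseline_per_kg = 3_000.0     # USD/kg, approximate current cost to LEO
for factor in (100, 1_000):
    print(f"{factor:>5}x cheaper: ~${baseline_per_kg / factor:,.2f}/kg")
# Long-haul air freight commonly runs a few dollars per kg, which is the
# sense in which launch costs would become "comparable to air travel".
```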
(Especially irritating is the continued assertion that thermal radiation is really hard, and not like something that every satellite already seems to deal with just fine, with a radiator surface much smaller than the solar array.)
It is really fucking hard when you have 40 MW of heat being generated that you somehow have to get rid of.
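Back-of-the-envelope radiator sizing via the Stefan-Boltzmann law; the emissivity and radiating temperature below are assumptions, and absorbed sunlight/Earthshine is ignored:

```python
# How much radiator area does 40 MW of waste heat need?
# P = emissivity * sigma * A * T^4, solved for A. All inputs are
# illustrative assumptions, not anyone's actual design numbers.
SIGMA = 5.670e-8          # Stefan-Boltzmann constant, W / (m^2 K^4)
emissivity = 0.9          # assumed
T_radiator = 300.0        # K, assumed radiating temperature (~27 C coolant)
P_waste = 40e6            # W of heat to reject

flux = emissivity * SIGMA * T_radiator ** 4   # W radiated per m^2 of surface
area = P_waste / flux
print(f"flux ~ {flux:.0f} W/m^2, radiator area ~ {area:,.0f} m^2")
```

With those assumptions it comes out to around 10 hectares of radiating surface (roughly half that if the panels radiate from both sides, and less if you can run the coolant hotter, since it scales as T^4), which is a very different problem from the radiator on a typical comms satellite.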
(And of course, the mostly reusable Falcon 9 launches far more mass to orbit than the rest of the world combined, flying about 150 times per year. No one has yet managed to field a similarly highly reusable orbital rocket booster in the roughly 10 years since Falcon 9 was first recovered in 2015.)
I won't say it's a good idea, but it's a fun way to get rid of e-waste (I envision this as a sort of old folks' home for parted-out supercomputers).
It just seems funny; I recall that when servers started getting more energy dense, it was a revelation to many computer folks that safe operating temps in a datacenter can be quite high.
I’d imagine operating in space has lots of revelations in store. It’s a fascinating idea with big potential impact… but I wouldn’t expect this investment to pay out!
Also, making something suitable for humans means having lots of empty space where the human can walk around (or float around, rather, since we're talking about space).
I agree that it may be best to avoid needing the space and facilities for a human being in the satellite. Fire and forget. Launch it further into space when it's decommissioned instead of bringing it back to Earth. People can salvage the materials later.
This effect can be partly offset by exercising while in space, but it's not perfect even with the insane amount of medical monitoring the guys up there receive.
It's theoretically possible for sure, but we've never done that in practice and it's far from trivial.
Are there any unique use-cases waiting to be unleashed?
Keep in mind economics is all about allocation of scarce resources with alternative uses.