- report an error
- ignore
- materialize
Regardless, if you make it back up software that doesn’t give this level of control to users, and you make a change about which files you’re going to back up, you should probably be a lot more vocal with your users about the change. Vanishingly few people read release notes.
If it's not open-source, but the protocol is documented, see above.
If it's not open-source, and the protocol isn't documented, well... that makes the decision easy, doesn't it?
I've used enough Claude coded applications that I wouldn't trust that with a backup, unless it had extensive tests along with it.
I can audit and verify Claude's output. Code running at BackBlaze, not so much. Take some responsibility for your data. Rest assured, nobody else will.
What time I do have, I've been using to try and figure out photo libraries. Nothing is working the way I need it to. The providers are a mess of security restrictions and buggy software.
You're forgetting the third option:
You can remain blissfully unaware of it.
And you can read many accounts of the outcome of that strategy in this very thread.
In the last panel, Charlie Brown tells him, "You have to move your feet, too."
https://featureassets.gocomics.com/assets/5ee4a050f894013014...
Making the change without making it clear though, that's just awful. A clear recipe for catastrophic loss & drip drip drip of news in the vein of "How Backblaze Lost my Stuff"
To have this thing you’re not supposed to need to worry about affect whether your files got backed up is exactly the problem here. The goal is to back up your files, whether they’re in the cloud or not.
I sympathize with Backblaze’s problem with their file change monitor, but then they should considee implementing connectors for OneDrive, Dropbox, etc. and back up files directly from the cloud.
Or maybe just do what they do now, but WARN about that in HUGE RED LETTERS, in the website and the app, instead of burying it in an update note like weasels!
Hiding the network always ends in pain. But never goes out of style.
This particular example is a useful one for me to think about, because it's a version of hiding complexity in order to present a simple interface that I actually hate. (WYSIWYG editors is another one, for similar reasons: it always ends up being buggy and unpredictable.)
It would be reasonable to say that if you run the file sync in a mode that keeps everything locally, then Backblaze should be backing it up. Arguably they should even when not in that mode, but it'll churn files repeatedly as you stream files in and out of local storage with the cloud provider.
When you have a couple terabytes of data in that drive, is it acceptable to cycle all that data and use all that bandwidth and wear down your SSD at the same time?
Also, high number of small files is a problem for these services. I have a large font collection in my cloud account and oh boy, if I want to sync that thing, the whole thing proverbially overheats from all the queries it's sending.
I assume you don’t think that, so I’m curious, what would you propose positively?
Yes, I didn't technically said that.
> It sounds like you are arguing it is impossible to backup files in Dropbox in any reasonable way, and therefore nobody should backup their cloud files.
I don't argue neither, either.
What I said is with "on demand file download", traditional backup software faces a hard problem. However, there are better ways to do that, primary candidate being rclone.
You can register a new application ID for your rclone installation for your Google Drive and Dropbox accounts, and use rclone as a very efficient, rsync-like tool to backup your cloud storage. That's what I do.
I'm currently backing up my cloud storages to a local TrueNAS installation. rclone automatically hash-checks everything and downloads the changed ones. If you can mount Backblaze via FUSE or something similar, you can use rclone as an intelligent MITM agent to smartly pull from cloud and push to Backblaze.
Also, using RESTIC or Borg as a backup container is a good idea since they can deduplicate and/or only store the differences between the snapshots, saving tons of space in the process, plus encrypting things for good measure.
Use tools with straightforward, predictable semantics, like rclone, or synching, or restic/Borg. (Deduplication rules, too.)
But generally speaking, I'd agree with your sentiment.
[0]: https://www.backblaze.com/computer-backup/docs/supported-bac...
[1]: https://www.backblaze.com/docs/cloud-storage-about-backblaze...
So, in practice, you shouldn't have to download the whole remote drive when you do an incremental backup.
Interestingly, rclone supports that on many providers, but to be able to backblaze support that, it needs to integrate rclone, connect to the providers via that channel and request checks, which is messy, complicated, and computationally expensive. Even if we consider that you won't be hitting API rate limits on the cloud provider.
Sometimes modification time of a file which is not downloaded on computer A, but modified by computer B is not reflected immediately to computer A.
Henceforth, backup software running on computer A will think that the file has not been modified. This is a known problem in file synchronization. Also, some applications modifying the files revert or protect the mtime of the file for reasons. They are rare, but they're there.
The filesystem is a black box for these software since they don't know where a file resides. If you want control, you need to talk with every party, incl. the cloud provider, a-la rclone style.
And, as a separate note, they shouldn't be balking at the amount of data in a virtualized onedrive or dropbox either considering the user could get a many-terabyte hard drive for significantly less money.
The moment you call read() (or fopen() or your favorite function), the download will be triggered. It's a hook sitting between you and the file. You can't ignore it.
The only way to bypass it is to remount it over rclone or something and use "ls" and "lsd" functions to query filenames. Otherwise it'll download, and it's how it's expected to work.
And no, my server isn't behind cloudflare, primarily because I don't have $200 to throw at them to allow me to proxy arbitrary TCP/UDP ports through their network, and I don't know how to tell CF "Hey, only proxy this traffick but let me handle everything else" (assuming that's even possible given that the usual flow is to put your entire domain behind them).
Dropbox and onedrive can handle backblaze zipping through and opening many files. The risk is getting too many gigabytes at once, but that shouldn't happen because backblaze should only open enough for immediate upload. If it does happen it's very easily fixed.
If it overloads nextcloud by hitting too many files too fast, that's a legitimate issue but it's not what OP was worried about.
It shouldn't stress things to spend a couple weeks relaying a terabyte in small chunks. The most likely strain is on my upload bandwidth and yeah that's the cost of cloud backup, more ISPs need to improve upload.
> more ISPs need to improve upload.
I was yelling the same things to the void for the longest time, then I had a brilliant idea of reading the technical specs of the technology coming to my home.
Lo and behold, the numbers I got were the technical limits of the technology that I had at home (PON for the time being), and going higher would need a very large and expensive rewiring with new hardware and technology.
> the technical limits of the technology that I had at home (PON for the time being)
Isn't that usually symmetrical? Is yours not?
Depends on your device capacity and how much is in actual use. Wear leveling things also wear things while it moves things around.
> For something you'll need to do once or twice ever?
I don't know you, but my cloud storage is living, and even if it's not living, if the software can't smartly ignore files, it'll pull everything in, compare and pass without uploading, causing churns in every backup cycle.
> Isn't that usually symmetrical? Is yours not?
GPON (Gigabit PON) is asymmetric. Theoretical limits is 2.4Gbps down, 1.2Gbps up. I have 1000Mbit/75Mbit at home.
But you're probably changing less than 1% each day. And new changes are likely already in the cache, no need to download them.
> if the software can't smartly ignore files, it'll
Backblaze checks the modification date.
> GPON (Gigabit PON) is asymmetric. Theoretical limits is 2.4Gbps down, 1.2Gbps up. I have 1000Mbit/75Mbit at home.
2:1 is fine. If you're getting worse than 10:1 then that does sound like your ISP failed you?
Of course I'm not modifying 4TB on a cloud drive, every day.
Hell, if I open a directory of photos and my OS tries to pull exif data for each one, it would be wild if that caused those files to be fully downloaded and consume disk space.
After a backup, you’d go out to a coffee shop or on a plane only to find that the files in the synced folder you used yesterday, and expected to still be there, were not - but photos from ten years ago were available!
Today's Dropbox is a network file system with inscrutable cache behavior that seeks to hide from the users the information about which files are actually present. That makes it impossible for normal users to correctly reason about its behavior, to have correct expectations for what will be available offline or what the side effects of opening a file will be, and Backblaze is stuck trying to cope with a situation where there is no right answer.
Seems simple enough to do for Backblaze, no?
What i want is restores. The ability to restore anything from ideally any point back in time.
How that is achieved is not my concern.
Obviously Backblaze does not achieve that, today.
You're dodging the question. Wanting to ignore the side effects does not mean they won't affect you.
It would still happen with the first backup - or first connection of the cloud drive - though, which isn’t a great post-setup new user experience. It probably drove complaints and cancellations.
I feel like I’ve accidentally started defending the concept of not backing up these folders, which I didn’t really intend to. I’d also want these backed up. I’m just thinking out loud about the reasons the decision was made.
I am not aware of any evidence supporting this.
Your backup solution is not something you ever want to be the source of surprises!