Parquet is optimized for storage and compresses well (=> smaller files).

Feather is optimized for fast reading.

Given that storage keeps getting cheaper, wouldn't most firms want to use Feather for analytic performance? Yet everyone uses Parquet.
Storage getting cheaper hasn't really reached the cloud providers, and for self-hosting it has recently gotten even more expensive due to AI bs.
You can still gain a lot of performance by doing less I/O, and smaller files mean less I/O.
What people have done in the face of cheaper storage is store more data.
And now there's Lance! https://lance.org/
I read that, but AFAIK the Feather format is stable now, hence my confusion. I use Parquet a lot at work, where we store a lot of financial time-series data, and we like it. Creating the Parquet data is a pain, though, since it's not appendable.
Generally, Parquet files are combined LSM-style, compacting smaller files into larger ones. Parquet isn't really meant for the level-0 "journal" of append-one-record-at-a-time storage; it's meant for the levels that follow.
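A toy sketch of that layering in plain Python; `ToyLSM` and its thresholds are hypothetical and not any real library. Level 0 takes cheap row-oriented appends, and compaction merges those small segments into one larger column-oriented segment, the role a Parquet file plays in the later levels.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLSM:
    """Hypothetical two-level store: row-oriented level 0, columnar level 1."""
    flush_rows: int = 4     # rows buffered before a level-0 segment is cut
    compact_after: int = 3  # number of level-0 segments that triggers compaction
    buffer: list = field(default_factory=list)
    level0: list = field(default_factory=list)  # each segment: list of row dicts
    level1: list = field(default_factory=list)  # each segment: {column: values}

    def append(self, row: dict) -> None:
        self.buffer.append(row)
        if len(self.buffer) >= self.flush_rows:
            self.level0.append(self.buffer)  # cheap, append-only flush
            self.buffer = []
            if len(self.level0) >= self.compact_after:
                self.compact()

    def compact(self) -> None:
        """Merge all level-0 segments into one columnar level-1 segment.

        Assumes every row has the same keys as the first row.
        """
        rows = [row for segment in self.level0 for row in segment]
        if not rows:
            return
        self.level1.append({name: [row[name] for row in rows] for name in rows[0]})
        self.level0 = []


if __name__ == "__main__":
    store = ToyLSM()
    for i in range(13):
        store.append({"ts": i, "price": 100 + i})
    print(len(store.buffer), len(store.level0), len(store.level1))  # 1 0 1
```

Real systems compact into more than two levels and write actual files, but the shape is the same: small appendable pieces first, big columnar files later.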
So Feather for journaling and Parquet for long-term processing?
You basically can't do row-by-row appends to any columnar format stored in a single file. You could kludge around it by allocating per-column arenas inside the file, but that's still huge write amplification: instead of writing a row to a single block, you'd have to write a block per column.
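A back-of-the-envelope sketch of that write amplification, under a hypothetical model: storage is rewritten in whole blocks (4 KiB here), the row fits in the current tail block, and a columnar file keeps one tail block per column arena. The function name and the column widths are illustrative only.

```python
def append_write_amplification(column_widths, block_size=4096, layout="columnar"):
    """Bytes physically written per byte of row data when appending ONE row.

    Hypothetical model: whole blocks are rewritten; a row layout touches a
    single tail block, a columnar layout touches one tail block per column.
    """
    row_bytes = sum(column_widths)
    if layout == "row":
        blocks_written = 1  # the whole row lands in one block
    elif layout == "columnar":
        blocks_written = len(column_widths)  # one block per column arena
    else:
        raise ValueError(f"unknown layout: {layout!r}")
    return blocks_written * block_size / row_bytes


if __name__ == "__main__":
    widths = [8] * 8  # e.g. eight fixed-width 8-byte columns (hypothetical)
    print(append_write_amplification(widths, layout="row"))       # 64.0
    print(append_write_amplification(widths, layout="columnar"))  # 512.0
```

In this model the columnar penalty over a row layout is simply the number of columns, so wide tables make single-row appends proportionally worse.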
Have you considered something like Apache Iceberg tables?