It depends on your task. For analytics, where you need to scan lots of data points within a few columns, columnar storage is very much the best. But for transactional workloads, where you have to deal with specific entities, row-based storage is more advantageous. There are hybrid systems that try to be both at the same time, but in my experience they end up not doing either very well.
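To make the trade-off concrete, here is a toy sketch in plain Python. The table, its column names, and the helper functions are all made up for illustration; a real engine stores pages on disk, not lists in memory.

```python
# Hypothetical toy table: (order_id, customer, amount).
rows = [
    (1, "ann", 30.0),
    (2, "bob", 12.5),
    (3, "ann", 7.5),
]

# Column-oriented copy of the same data: one list per column.
columns = {
    "order_id": [r[0] for r in rows],
    "customer": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def get_order_row_store(rows, idx):
    """Transactional access: the whole entity is one lookup."""
    return rows[idx]


def get_order_column_store(columns, idx):
    """Same lookup on the columnar copy: one access per column."""
    return tuple(col[idx] for col in columns.values())


def total_amount_row_store(rows):
    """Analytical scan: every row is visited even though only one field is used."""
    return sum(r[2] for r in rows)


def total_amount_column_store(columns):
    """Analytical scan: only the 'amount' column is touched."""
    return sum(columns["amount"])
```

Both layouts return the same answers; what changes is how much unrelated data each access has to walk past.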
Some day we'll get CREATE TABLE ... ( ... STORAGE ORDER COLUMN MAJOR) to have our transactional cake on the tables that need it and eat our analytics cake on the tables that need that.

But until then, separate tools for separate purposes isn't a bad place to be when those tools are both fantastic.

This often used to be referred to as HTAP. And yeah, most data engineering is moving things from OLTP to OLAP forms, and OLAP pretty much always benefits from columnar compression for aggregations and rollups.
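As a rough illustration of why columnar compression helps rollups, here is a minimal run-length encoding sketch. RLE is just one scheme picked for the example, and the `region` column is invented; the point is that a count-by-value rollup can run on the compressed runs without expanding them.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def count_by_value(runs):
    """Roll up counts directly from the runs, without decompressing."""
    counts = {}
    for value, length in runs:
        counts[value] = counts.get(value, 0) + length
    return counts


# Hypothetical 'region' column with long runs of repeated values.
region = ["eu"] * 4 + ["us"] * 3 + ["eu"] * 2
runs = rle_encode(region)
```

Here nine column values become three runs, and the rollup loops over three items instead of nine. A row layout interleaves other fields between those values, so the runs would not exist in the first place.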
BTW, columnar is very similar to struct of arrays (SoA), and some of the reasons it works well overlap with the reasons SoA does.
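For anyone who hasn't met the term, a small sketch of array of structs versus struct of arrays. The `Particle` type and its fields are hypothetical, and Python hides most of the cache effects that make SoA attractive in lower-level languages, so treat this as a picture of the layout rather than a benchmark.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Particle:
    x: float
    y: float
    mass: float


# Array of structs (AoS): one object per entity, like a row store.
aos = [Particle(0.0, 1.0, 2.0), Particle(3.0, 4.0, 5.0)]


class ParticlesSoA:
    """Struct of arrays (SoA): one contiguous array per field, like a column store."""

    def __init__(self, particles):
        self.x = array("d", (p.x for p in particles))
        self.y = array("d", (p.y for p in particles))
        self.mass = array("d", (p.mass for p in particles))


soa = ParticlesSoA(aos)

# Same aggregate, two layouts: AoS visits every object,
# SoA reads a single packed array of doubles.
total_mass_aos = sum(p.mass for p in aos)
total_mass_soa = sum(soa.mass)
```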
Compression is a side effect, not really the goal. To simplify: analytical queries often filter on a specific column's values, and if those values are laid out contiguously, disk-level reads are much faster than with rows, which involve a read-skip-read pattern. In transactional systems data is typically written as whole rows, though, so writes are likely slower in a columnar system. As a general rule, read-heavy workloads with known access patterns are going to benefit from a columnar layout.
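A back-of-the-envelope sketch of the read-skip-read point, using the `struct` module to build both layouts as raw bytes. The record format (8-byte id, 16-byte name, 8-byte amount) and the filter threshold are assumptions for the example; real systems read in pages, so the effect shows up as pages fetched rather than exact byte counts.

```python
import struct

# Hypothetical fixed-width row: int64 id, 16-byte name, float64 amount = 32 bytes.
ROW = struct.Struct("<q16sd")

records = [(1, "ann", 30.0), (2, "bob", 12.5), (3, "ann", 7.5)]

# Row layout: whole records back to back.
row_buf = b"".join(ROW.pack(i, name.encode(), amt) for i, name, amt in records)

# Column layout: only the 'amount' values, packed contiguously.
amount_buf = struct.pack(f"<{len(records)}d", *(amt for _, _, amt in records))


def count_over_row_layout(buf, threshold):
    """Filter on amount; every 32-byte row is read to reach 8 useful bytes."""
    matches = sum(1 for _, _, amt in ROW.iter_unpack(buf) if amt > threshold)
    return matches, len(buf)


def count_over_column_layout(buf, threshold):
    """Same filter; only the contiguous amount column is read."""
    amounts = struct.unpack(f"<{len(buf) // 8}d", buf)
    matches = sum(1 for amt in amounts if amt > threshold)
    return matches, len(buf)
```

With these three records, the row layout scans 96 bytes and the column layout scans 24 to answer the same filter. The flip side is the write path: appending one record touches a single spot in `row_buf`, but one spot per column in a columnar store.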
Those three things you mentioned kind of live in the same niche - offline data storage and querying. In that world yes everything has become columnar since it’s just better. Row-oriented is still the solution for online streaming use cases.