For Lakebase and Neon, our architecture needs a caching layer regardless (we call it the Pageservers). Reading from S3 directly is too slow, so we reconstruct pages and keep them on an NVMe server for faster querying. Changing the format on S3 to Parquet effectively introduces no additional copies over our existing architecture.
I'll give the article another read... Maybe I missed something. Thank you for the response! Really nice to be able to get info straight from people who work on the product
Historical data, when pushed to S3, is in Parquet. This happens asynchronously, not on the transaction hot path.
So older data below a certain LSN is on S3 in Parquet, available to all analytics processing. Hot data is on the Pageservers in page format for OLTP.
You can be smart about querying both representations for real-time analytical queries.
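
To make the "query both representations" idea concrete, here is a rough sketch of how I picture it, not how Lakebase or Neon actually implements it. Everything in it is hypothetical: the `Row` type, the `scan_both` and `total_by_key` helpers, and the `flushed_lsn` cutoff are names invented for illustration, and plain Python lists stand in for the Parquet files on S3 and the pages on the hot tier. The only point is that an LSN boundary lets each row be read from exactly one tier, so data that exists in both places is not double counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    lsn: int     # log sequence number at which this row was written
    key: str
    amount: int


def scan_both(cold_rows, hot_rows, flushed_lsn):
    """Yield rows from both tiers, reading each LSN range from one tier only.

    Assumption: everything at or below flushed_lsn is fully present in the
    cold (Parquet-like) tier; anything newer lives only in the hot tier.
    """
    # Cold tier: trust only rows at or below the flush boundary.
    for row in cold_rows:
        if row.lsn <= flushed_lsn:
            yield row
    # Hot tier: take only rows above the boundary, so rows that are still
    # cached there after being flushed are not counted twice.
    for row in hot_rows:
        if row.lsn > flushed_lsn:
            yield row


def total_by_key(rows):
    """A toy analytical query: sum of amount per key."""
    totals = {}
    for row in rows:
        totals[row.key] = totals.get(row.key, 0) + row.amount
    return totals


cold = [Row(10, "a", 5), Row(20, "b", 7)]
hot = [Row(20, "b", 7), Row(30, "a", 1)]  # LSN 20 is still cached on the hot tier

result = total_by_key(scan_both(cold, hot, flushed_lsn=20))
print(result)
```

In a real system the cutoff would have to come from whatever tracks how far the async push to S3 has progressed, and the hot side would need to be read at a consistent snapshot. I have left both out of the sketch.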