Ten years of ClickHouse in open source

[-]

I had the same experience recently. It turned out ClickHouse would reduce our DB operations by 60%, remove the need for a TSDB, and reduce query times from ~300-500ms (and sometimes ~3s) to roughly 75ms. Lastly, and most impressively, we were already seeing a ridiculous level of compression, and our storage cost benchmarks dropped to the cost of S3. This took a $2-3M storage layer down to one measured in the single thousands per month.

ClickHouse is no panacea, but if you understand how your data is accessed, and thus how to arrange it, you will get a lot of mileage out of it.

by ashu14617 hours ago|

[-]

Same here. We are also stuck with ES; we wish we could migrate to ClickHouse, but we can't because of the legacy load.

by cloudie784 hours ago|

[-]

What do you not like about ES?

by arunmu4 hours ago|

[-]

Were you using it for simple grep-style search, or did you actually require advanced search, e.g. BM25? From what I understand, ClickHouse will only help you with grep-like search.

by drchaim2 hours ago|

[-]

Actually, there was no search, only on-the-fly aggregations/filtering over "big data". ES was kind of famous at the time, although not the best tool for that job.

AFAIK CH introduced FTS recently.

by KebabCase4 hours ago|

[-]

Off topic: IMHO, everything that's been happening over the past few years is a self-fulfilling prophecy in no small part due to attitudes like this. Der Fuhrer did not have to put in much effort to convince the population when even those exposed to the outside world have met with enough suspicion and contempt to "know" (whether it's true or not) that most westerners have never seen us as equals, or even any sort of positive force.

Most probably don't even realize it. I see it as something similar to what racial minorities in the US go through: ask a random stranger on the street if he's racist, and he will honestly say no, even if he actually simply does not realize it, while it deeply affects how he sees the world.

I've also been seeing similar attitudes in relation to the Chinese. People avoiding excellent projects because they were written by some Chinese guy, including things where supply chain security is of no concern. Again apparently not realizing that these days a large part of the work on the Linux kernel is committed by paid employees of several large Chinese companies, all of them tightly intertwined with the government. Forget talking about who is building the hardware we all use.

Whatever, the internet is fracturing and balkanizing at full speed anyway, and the borders are slowly closing. It won't be long before we aren't able to exchange anything non-destructive anymore. It was good while it lasted.

by budsniffer9524 hours ago|

[-]

>ask a random stranger on the street if he's racist, and he will honestly say no, even if he actually simply does not realize it

My lord you people are beyond patronizing.

When people refer to "the Chinese" or "the Russians", we are talking about the nation state, not the people. And there are legitimate security concerns. Whether we should be adversarial is another question. But we are.

by leoqa4 hours ago|

[-]

I am wary of any supply chain attack and more so if the project is maintained by people with relationships in adversarial countries. The risk of exploitation outweighs the convenience.

by throw-the-towel3 hours ago|

[-]

I agree about the legitimate security concerns, but not with "we're talking about the nation state, not the people". If life has taught me anything in the last few years, it's that normies are incapable of making this distinction, at least in the Old World.

by throw9393kddif2 hours ago|

[-]

Google was created by some Russian guys. Current american president was Russian agent (that is why he won 2016 elections).

I think US is very tolerant when it comes to people from Russia.

by dionian1 hours ago|

[-]

yes and the proof was the spam email phoning home to russia. or, whatever other hoaxes they cooked up along the way. strangely most of them didnt make it into the trial where he was acquitted.

by goodmythical4 hours ago|

[-]

Given that american ignorance is a cultural thing (with many people deliberately electing the way grandpa did it) is it not kind of racist to generalize americans as unknowingly racist?

You said, "ask a random stranger...and he will honestly say no" not "ask a random stranger...and he will probably honestly say no".

Most of most people are racist, it's just different groups. Americans obviously have less distrust of americans, but then I am just as certain that there are many many humans who would proudly share their "dumb american" stories as if that is not every bit as prejudicial to those of us who do not fit the description as any other "weak french" or "commie russian" or "sister fucking indian" or whatever else.

by aleph_minus_one3 hours ago|

[-]

> but then I am just as certain that there are many many humans who would proudly share their "dumb american" stories as if that is not every bit as prejudicial to those of us who do not fit the description as any other "weak french" or "commie russian" or "sister fucking indian" or whatever else.

Racism is about race (i.e. phenotypical or genotypical properties), while being US-American/French/Russian/Indian/... is about nationality. So, these stories are not about racism (since they are not about race), but about prejudices against other nations/nationalities.

[-]

Can ClickHouse do search? If not, why did you seek to replace Elastic with it?

by sdairs4 hours ago|

Yes https://clickhouse.com/blog/clickhouse-full-text-search-obje...

[-]

Thanks

by ksajadi1 hours ago|

[-]

For our metrics and autoscaling engine at Cloud 66, we went through 5 iterations before settling on Clickhouse:

1. Redis
2. Cassandra
3. Handrolled: Ruby + RabbitMQ
4. Handrolled: Go + RabbitMQ
5. Clickhouse

Each time, we hit some limit or an optimization burden that was infeasible. Clickhouse has been rock solid for the past 4 years.

by himata41138 hours ago|

[-]

ClickHouse has recently been a breath of fresh air after using timescaledb for a long time. psql is the greatest there is, and I really enjoyed being able to rely on a single database system to run everything. But migration, maintenance and deployment are a real pain, and development on timescaledb feels a bit wishy-washy, with structural changes from version to version; sometimes it really feels like an alpha product.

by k_bx6 hours ago|

[-]

I used TimescaleDB a very long time ago; things have changed quite a lot since (it's now even named differently).

In my current setup I was thinking of doing both: upgrading postgresql to timescaledb (to archive old data etc.) and deploying ClickHouse in parallel. I'm still considering whether to go big on PeerDB to get a ClickHouse mirror or just deploy it separately without an additional fragility layer.

Would you not recommend using timescaledb at all? I definitely want to avoid alpha-quality software pain, since PostgreSQL is one of the most rock-solid parts of the stack at the moment.

by wkrp2 hours ago|

[-]

In my (minor) experience Timescale works fine. The developer experience is good and it is very convenient to be able to JOIN against your hypertables. My only real complaints are operational (no logical replication, normal postgres update complaints), but man Clickhouse is really slick. I wrote some small reviews of the two in my submission history if you want a bit more detail.

by himata41135 hours ago|

[-]

I would just run both and decommission the old one when a) all data is migrated, or b) old data is no longer relevant and can be archived.

by __s6 hours ago|

[-]

Worked on peerdb. If you're able to batch changes on your end & push to both postgres & clickhouse, do that. Only move to peerdb when you know you need cdc
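A rough sketch of what "batch on your end and push to both" could look like, in Python. Everything here is hypothetical: `DualWriteBatcher` and the sink callables are illustration names, and each sink is assumed to wrap whatever real Postgres or ClickHouse insert call you already use.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
# Hypothetical sink type: a callable wrapping a real Postgres or ClickHouse batch insert.
Sink = Callable[[List[Row]], None]


class DualWriteBatcher:
    """Buffers rows and pushes each full batch to every sink, in order."""

    def __init__(self, sinks: List[Sink], batch_size: int = 1000) -> None:
        self.sinks = sinks
        self.batch_size = batch_size
        self.buffer: List[Row] = []

    def add(self, row: Row) -> None:
        """Queue one row; flush automatically once the batch is full."""
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send the buffered rows to all sinks and clear the buffer."""
        if not self.buffer:
            return
        batch, self.buffer = self.buffer, []
        for sink in self.sinks:
            # Each sink receives its own list so one sink cannot reorder
            # or truncate the batch seen by the next one.
            sink(list(batch))
```

Note that this gives no atomicity across the two stores: if the second sink raises, the first has already been written, which is roughly the point where CDC starts to look attractive.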

[-]

Just looked up PeerDB expecting a Db as per its name.

But it's an ETL tool. Stupid naming.

by saisrirampur3 hours ago|

[-]

I know I know. Some people have loved it as it captures what it does (peering dbs) and some haven't because of the exact reason you called out. So we get it! :)

by rozenmd28 minutes ago|

[-]

I used to keep all of OnlineOrNot's timeseries data entirely in a hot postgres db with the rest of the relational data.

Used to take a few seconds to get a week's uptime data and do some useful analysis.

Since moving to Clickhouse I think I can grab a full year's data in around 200ms (probably less if I try optimising it). Still completely blows my mind every day.

by adsharma3 hours ago|

[-]

It's interesting that the blog post places SQLite and Ladybird on the spectrum, but omits its chief open source rival: DuckDB.

Agree that Level 3 is what inspires confidence. But we need to invent new business models to sustain in the era of vibe-coded databases.

by aaronblohowiak1 hours ago|

[-]

While ClickHouse can scale down to compete with DuckDB, I don't believe (but am happy to be corrected) that DuckDB can scale up like ClickHouse can.

Most people don't have those scale problems, but when you do...

by lazyasciiart7 hours ago|

[-]

> You can open a pull request as an experiment, without aiming for it to be merged - it will be tested with the same level of scrutiny as production releases. Found a new memory allocator, a new compression library, a new hash table, a data format, or a sorting algorithm? - bring it to ClickHouse, and it will expose it inside-out

Wow

by benjamkovi6 hours ago|

[-]

ClickHouse dev here, and this is true. ClickHouse has contributed to finding several bugs in our third-party libs (jemalloc and librdkafka for sure; there are many more, but I only worked on these), in the Linux kernel, and basically everywhere. We have very rigorous fuzzers (yes, multiple fuzzers on multiple levels) and run tests in an insane number of configurations. I think the last number I heard, a year ago, was around 400 hours for a complete CI run for a single commit (not PR, but commit). So yeah, pretty insane, in a good way.

by tarun_anand20 minutes ago|

[-]

How does CH compare with the recent announcements made by Databricks Reyden...

by jaysh8 hours ago|

[-]

ClickHouse replacing Loki finally made our observability stack feel 'right'. It really is a powerhouse for logs and general analytical queries.

by oulipo27 hours ago|

[-]

How do you use it for visualization? Do you use ClickStack? or something else?

by jaysh5 hours ago|

[-]

Still via Grafana. I ran it side-by-side with Loki, and despite trying to optimise Loki and using ClickHouse out of the box, it really was shocking how much faster ClickHouse was for every single query (e.g. "in the last 12 hours, give me the frequency of logs with a particular JSON event", or even "find this log entry, then join back and find the number of times a different entry appears within the same correlation_id").

by CubsFan10604 hours ago|

[-]

What does the layout in ClickHouse look like? Do the input logs need to have a very defined structure?

by jaysh4 hours ago|

[-]

Not really, ClickHouse is super forgiving so you can do something like:

    CREATE TABLE default.events (
      `timestamp` DateTime,
      `event` String,   -- e.g. 'product.updated', or empty
      `message` String, -- human readable message
      `raw` String      -- the raw message - this is very useful when pushing logs that aren't JSON - you just leave `event` empty and dump the entire message here
    )
    ENGINE = MergeTree
    PARTITION BY toDate(timestamp)
    ORDER BY (timestamp, event)
    TTL timestamp + toIntervalMonth(6)

ClickHouse is extremely performant even in the cases of e.g.: SELECT count(*) FROM `events` WHERE `raw` LIKE '%hello world%'

Of course, the more columns you splat out (e.g. correlation_id, user_id, order_id, etc.) the better you can index and expect those queries to perform, but in general I don't bother outside the obvious core domain ones (shown above). The performance is so good that unindexed queries are significantly faster than indexed queries in Loki. I have reached the point where I JSON extract on-the-fly in the WHERE clause of very large queries with no meaningful performance issues.
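For anyone wondering what "JSON extract on-the-fly in the WHERE clause" might look like from application code, here is a small Python sketch. It assumes the `events` table above with JSON stored in `raw`; the helper name `json_filter_count_sql` is made up for illustration. It uses ClickHouse's `JSONExtractString` function and leaves the compared value as a `{value:String}` query parameter for the client to bind.

```python
import re

# Conservative whitelist for table, column and JSON key names.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def json_filter_count_sql(table: str, json_column: str, key: str, hours: int = 12) -> str:
    """Build a count query that filters on a key extracted from a JSON string column.

    The compared value stays a ClickHouse query parameter ({value:String}),
    so it is bound by the client instead of being pasted into the SQL text.
    """
    for ident in (table, json_column, key):
        if not _IDENT.fullmatch(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    return (
        f"SELECT count() FROM {table} "
        f"WHERE timestamp >= now() - INTERVAL {int(hours)} HOUR "
        f"AND JSONExtractString({json_column}, '{key}') = {{value:String}}"
    )
```

This only handles top-level keys; nested paths would need extra arguments to the extract function, and a key you filter on constantly is probably better promoted to a real column.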

by oulipo22 hours ago|

[-]

Interesting, so you can bind a Clickhouse table as an extension to Grafana? Would you make a little Gist / post about it to show?

by jaysh1 hours ago|

[-]

You only need the plugin: https://clickhouse.com/docs/observability/grafana - then you get basically everything natively.

by aleks_me24 hours ago|

[-]

I have used SigNoz https://signoz.io/ for that

by jauntywundrkind2 hours ago|

[-]

Worth noting both hyperdx and maple too for other observability on clickhouse options. https://www.hyperdx.io/ https://maple.dev/

by usrme5 hours ago|

[-]

Same question here!

by jaysh4 hours ago|

[-]

Just replied to that question! Let me know if you have other questions.

by brunojppb6 hours ago|

[-]

Clickhouse has been a game changer for some of the companies I have worked at in the past. This reminds me of this podcast episode (1) from the Rust in Production pod about their Rust adoption.

1. https://open.spotify.com/episode/0TBKDUhO0KihBxEzZqnQx1

by tdiff2 hours ago|

[-]

It is sad they are afraid to mention on the page that "data processing for a web analytics system ... similar to Google Analytics" was actually something used in Yandex.

by corentin881 hours ago|

[-]

Elsewhere on the page, they avoid mentioning Yandex. In fact, do they ever mention Yandex?

That’s probably not to advertise for that company. I don’t see why it’s sad?

by orta8 hours ago|

[-]

I've been using clickhouse for the last year for in-house analytics and found it a really pleasant experience, thanks for all the progress you've made

by dandellion8 hours ago|

[-]

Same. We replicated some data from Postgres, it was easy to set up, similar enough that the transition was trivial, and really good performance out of the box. One of those good "use the right tool for the job" experiences.

by dmix1 hours ago|

[-]

We use Clickhouse in a rails app for our customer facing dashboard analytics, logging, and datalake type stuff where Postgres is too heavy and expensive. The web admin panel they built is great and we’ve had solid performance.

by baq8 hours ago|

[-]

clickhouse is the low key amazing tech people are busy using instead of posting about. keep it up!

by spprashant6 hours ago|

[-]

If your data is too big for postgres, it seems like moving straight to Clickhouse is the best option. We have been through a whole array of distributed database technologies, and Clickhouse might be the first one that doesn't have too many compromises.

[-]

What do you mean?

PostgreSQL is a relational, row-based DB; ClickHouse is columnar.

ClickHouse doesn't replace PostgreSQL.

by saisrirampur3 hours ago|

[-]

Sai from ClickHouse here. Totally with you here, ClickHouse isn't a replacement for Postgres. Most use-cases are co-existence - Postgres for OLTP and ClickHouse for OLAP, basically a right-tool-for-the-right-job situation. Both are purpose-built technologies with a similar OSS ethos/story. Btw, as an interesting coincidence, Postgres turned 30 this year and ClickHouse turned 10.

The above is exactly why we are embracing the Postgres + ClickHouse stack and investing heavily to make workflows across both DBs very easy for developers - PeerDB for native CDC, the pg_clickhouse extension for querying CH from PG, pg_stat_ch for PG query observability from ClickHouse, and more are planned for the future. Recently we also announced ClickHouse Managed Postgres, which packages this entire stack as a fully managed service: https://clickhouse.com/cloud/postgres

by spprashant3 hours ago|

[-]

This is an extremely common issue that happens in growing firms.

You start off with everything in Postgres; it makes the most sense. Soon you realize some tables are growing really huge - usually some sort of time-series or log data reaching 10TB+. You can no longer fit it in one node. You can try your luck with some sharding extensions, but they add complexity to upgrades.

In that case it makes total sense to move these large tables off Postgres, and I think Clickhouse is a straight up replacement here. You can still keep your relational heavy tables in Postgres.

Yes, it affects your ability to cleanly join data and guarantee 100% consistency. With some smart application code and schema design, you can replace parts of Postgres with Clickhouse for the big data problem.

by gempir3 hours ago|

[-]

You can keep "columnar" data in a row based database like postgres, it's just more expensive. But with little data that's fine and reduces infrastructural complexity. When you reach too much data it gets to a point where you then actually want to use the correct database for your usecase.

by eklavya4 hours ago|

[-]

Not to mention ACID and CAP and all that. I use clickhouse AND postgres. Clickhouse is not a replacement for postgres at all.

by Talpur17 hours ago|

[-]

10 years! Quite a long journey. The observability part especially is the need of the hour.

by ddorian438 hours ago|

[-]

Clickhouse is *really* gatekeeping the "zero copy replication" where you store data on object-storage and have high availability from the open source version.

by orian5 hours ago|

[-]

This is the main driver for their cloud ;-)

by pepperoni_pizza5 hours ago|

[-]

I think that is just the nature of the open core business - but like most such businesses, they're not very clear that this is what they are, and present themselves as an open source business instead.

https://en.wikipedia.org/wiki/ClickHouse

[-]

VC-funded with recent rounds, so 10 years hasn't been enough time to make money.

by nvartolomei6 hours ago|

[-]

How? Have you tried contributing a reasonable implementation with test coverage and it was rejected?

by zuzululu2 hours ago|

[-]

what are you guys using it for other than collecting analytics?

by haeseong6 hours ago|

[-]

The query speed deserves the praise, but the JSON ingestion path has quiet footguns nobody mentions here. Depending on the server version and the output_format_json_quote_64bit_integers setting, 64-bit integer columns can come back as strings over JSONEachRow, so a forgotten Number() cast silently turns arithmetic into string concatenation, and with input_format_skip_unknown_fields enabled a single typo in a column name drops that field with no error at all. It is worth wiring an assertion into CI that inserts a row and reads it back before trusting the dashboards.
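A minimal Python sketch of the comparison half of such a round-trip check. It does not talk to ClickHouse itself; it only compares the row you sent with the JSONEachRow line you read back. The `Schema` mapping and both function names are hypothetical, and the schema is assumed to be maintained by hand in the app.

```python
import json
from typing import Callable, Dict, List

# Hypothetical app-side schema: column name -> converter for the value read back.
Schema = Dict[str, Callable[[object], object]]


def coerce_row(line: str, schema: Schema) -> Dict[str, object]:
    """Parse one JSONEachRow line and convert every column to its declared type."""
    raw = json.loads(line)
    return {name: convert(raw[name]) for name, convert in schema.items()}


def roundtrip_problems(sent: Dict[str, object], read_back_line: str, schema: Schema) -> List[str]:
    """Return readable mismatches between an inserted row and what was read back."""
    # A sent field that is not a known column is exactly what gets skipped silently.
    problems = [f"unknown field: {name}" for name in sent if name not in schema]
    got = coerce_row(read_back_line, schema)
    for name, value in got.items():
        if name in sent and sent[name] != value:
            problems.append(f"{name}: sent {sent[name]!r}, got {value!r}")
        elif name not in sent:
            problems.append(f"{name}: not sent, got default {value!r}")
    return problems
```

In CI you would insert `sent`, select the row back as JSONEachRow, and fail the build if `roundtrip_problems(...)` returns anything.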

by charrondev5 hours ago|

[-]

We’ve done our JSON ingestion by keeping a schema in the app for all the types we expect, and injecting the types into the query builder.

Then as needed we have materialized columns on our different tables.
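As a guess at what that might look like (not the actual setup described above), here is a Python sketch that turns an app-side schema into `ALTER TABLE ... ADD COLUMN ... MATERIALIZED` statements. The function name, the `raw` JSON column and the type-to-function mapping are assumptions; the `JSONExtract*` functions are ClickHouse's own.

```python
from typing import Dict, List

# ClickHouse column type -> JSON extraction function used to populate it.
_EXTRACTORS = {
    "String": "JSONExtractString",
    "Int64": "JSONExtractInt",
    "UInt64": "JSONExtractUInt",
    "Float64": "JSONExtractFloat",
}


def materialized_column_ddl(table: str, json_column: str, schema: Dict[str, str]) -> List[str]:
    """Emit one ALTER TABLE statement per schema field.

    `schema` maps a JSON key (also used as the column name) to a ClickHouse type.
    Names are assumed to come from trusted app code, not user input.
    An unsupported type raises KeyError rather than guessing an extractor.
    """
    statements = []
    for name, ch_type in schema.items():
        extractor = _EXTRACTORS[ch_type]
        statements.append(
            f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {name} {ch_type} "
            f"MATERIALIZED {extractor}({json_column}, '{name}')"
        )
    return statements
```

Keeping the mapping in one place means the query builder and the DDL cannot disagree about which extractor goes with which type.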

by ignoramous5 hours ago|