I’m not saying it doesn’t have flaws, but the tidyverse is still the most coherent and functional ML/stat computing ecosystem I’ve ever used. R packages outside of the tidyverse can get pretty gnarly. Even the R stdlib is usually considered to be inconsistent and riddled with legacy cruft.
reply
I may be in the minority, but I don't like the tidyverse ecosystem. I prefer data.table for most of my uses.
reply
data.table is just so much faster, and the SQL-like syntax is easier to understand.
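
For readers who haven't seen it, here is a minimal sketch of the SQL-like shape being described. It assumes the data.table package is installed; the table `dt` and its columns `grp` and `val` are made up for illustration. The general form is `DT[i, j, by]`, which maps loosely onto WHERE, SELECT, and GROUP BY.

```r
# Guarded so the sketch is a no-op when data.table is not installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Hypothetical toy table; names are illustrative only.
  dt <- data.table(grp = c("a", "b", "a", "b"), val = c(1, 2, 3, 4))

  # Roughly: SELECT grp, SUM(val) AS total FROM dt WHERE val > 1 GROUP BY grp
  #   i  = val > 1               (row filter, like WHERE)
  #   j  = .(total = sum(val))   (what to compute, like SELECT)
  #   by = grp                   (grouping, like GROUP BY)
  res <- dt[val > 1, .(total = sum(val)), by = grp]
  print(res)
}
```

Whether that reads as "SQL-like" is a matter of taste; the mapping is an analogy, not a translation layer.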
reply
I can never remember the data.table syntax; every time I use it I have to re-learn it. It doesn't feel very SQL to me either. There is an interface that lets you use tidy syntax on data.tables and gets you 90% of the speed.
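
The comment doesn't name the interface. One package that fits the description is dtplyr, so the sketch below assumes that is what's meant and that data.table, dplyr, and dtplyr are all installed. The table and column names are hypothetical, and the sketch makes no attempt to check the 90% figure.

```r
pkgs <- c("data.table", "dplyr", "dtplyr")

# Guarded so the sketch is a no-op when any of the packages is missing.
if (all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  # Hypothetical toy table; names are illustrative only.
  dt <- data.table::data.table(grp = c("a", "b", "a", "b"), val = c(1, 2, 3, 4))

  # lazy_dt() wraps the data.table; the dplyr verbs are recorded, not run yet.
  query <- dtplyr::lazy_dt(dt) |>
    dplyr::filter(val > 1) |>
    dplyr::group_by(grp) |>
    dplyr::summarise(total = sum(val))

  # Show the data.table expression generated from the dplyr pipeline.
  dplyr::show_query(query)

  # Converting to a data frame forces evaluation.
  res <- as.data.frame(query)
}
```

The native pipe `|>` needs R 4.1 or later.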
reply
The core of the problem is that the tidyverse is trying to turn R into a user-friendly real-time calculator, rather than a tool for stable, deterministic, and literate data analysis.
reply
"The real problem with this package is that it makes things easy, instead of hard"
reply
That’s a rather glib interpretation of what I said.

I’m being rather charitable when I call the tidyverse “user-friendly”. That might be the goal of tidyverse devs, but it hasn’t been the reality since inception.

The point is even if we assume it is or will eventually become user-friendly, are we willing to accept the trade-offs that come with that?

reply
I think the users have clearly delivered the verdict that it is friendlier than base R. Admittedly a low bar. Non R users, which do you prefer:

    foo[foo$bar == "baz",]

    foo |> filter(bar == "baz")

?
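
To make the comparison concrete, here is a runnable sketch on a made-up data frame (`foo`, `bar`, and `n` are illustrative names, and the NA value is added on purpose). It assumes R 4.1 or later for `|>`, and the dplyr part only runs if dplyr is installed. One difference worth knowing beyond looks: the two forms treat missing values differently.

```r
# Hypothetical toy data with one missing value in `bar`.
foo <- data.frame(bar = c("baz", "qux", NA, "baz"), n = 1:4)

# Base R logical indexing: the NA comparison yields an all-NA row in the result.
base_rows <- foo[foo$bar == "baz", ]

# Base R subset(): rows where the condition is NA are dropped.
subset_rows <- subset(foo, bar == "baz")

# dplyr filter(): also drops rows where the condition is NA.
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_rows <- foo |> dplyr::filter(bar == "baz")
}

nrow(base_rows)    # 3: the two "baz" rows plus one NA row
nrow(subset_rows)  # 2
```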
reply
It's certainly quite pleasant to work with... but I would rather use SQL for ETL and let the backend be whatever it needs to be...

Real-world data transformations can get gnarly very quickly, and SQL is the perfect common denominator compared to dplyr, which is still niche...

How do you feel about polars?

reply
I’m a big fan of Polars. It’s really fast and memory efficient. With the lazy streaming functionality, I’ve been able to easily process 1 TB+ of data on a single machine (you do have to be careful not to do any operation that would cause the whole DF to materialize in that case).

It’s certainly miles better than Pandas, which has a terrible API in addition to being comically inefficient. In my group, we generally use it for any new work, and have also swapped out pandas for polars in critical spots of our existing code - the latter giving a huge benefit relative to the amount of work it took.

I largely agree with you on SQL being the common denominator, but there are some things that are just awkward in SQL, and much easier to do in Python or other general purpose language.

reply
I agree. There has never been anything in tidyverse I couldn't do in base R. Usually a lot cleaner in base R. People complain about plotting in base R, really? It is just a function call with supplied arguments. It is super straightforward. Follow the vignette.

People who say Python is better don't realize that R is basically like having pandas in the standard library. I don't think there is a better language for wrangling tabular data, to be honest.
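
As a rough illustration of the "function call with supplied arguments" point, here is a sketch using only base R and the built-in `mtcars` dataset. The axis labels and title are my own choices, and the null device is there only so the snippet also runs in a non-interactive session.

```r
# Fit a simple linear model to overlay on the scatter plot.
fit <- lm(mpg ~ wt, data = mtcars)

# Open a null PDF device so nothing is written to disk;
# drop pdf(NULL) and dev.off() to see the plot on screen instead.
pdf(NULL)

# One call, with everything passed as arguments.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "mtcars: mpg vs. weight",
     pch  = 19)

# Add the fitted regression line to the existing plot.
abline(fit, col = "red")

invisible(dev.off())
```

Layering in base graphics works by calling further functions such as `abline()`, `points()`, or `legend()` after `plot()`.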

reply
I couldn’t disagree more. The base packages are a complete mess. If R had been pared down to only the tidyverse 5 years ago, it wouldn’t have lost so much ground to Python in nearly all fields.

Posit is obviously the only organization with the pull to do that, and I feel like they got pulled in 10 directions during the move to AI and trying to also support Python. R Shiny is dead too which sucks because reflex.dev just copied them and ate their lunch in 3 months.

reply
The proof is in the pudding. Every single grad student of mine that was brought up on the tidyverse produces gigantic R markdown files with 20 imports to accomplish something that would be shorter and much much easier to understand (and review!) with a base package or with one of a small number of packages (box, data.table) designed by people who understand programming.

Not to mention the ridiculous styling/formatting of most tidyverse users, which Wickham and others seem to promote. One of the reasons R has lost ground to other languages recently is that most R code these days is ugly

reply
Data.table is a masterclass in bad API design. Its lack of success despite its technical merits is entirely of their own doing.
reply
That was always my struggle with tidyverse vs. base mastery. Think of the Looney Tunes cartoons with the Road Runner and the Coyote: the Coyote used tidyverse and the Road Runner used base R.
reply
> The proof is in the pudding. Every single grad student of mine that was brought up on the tidyverse produces gigantic R markdown files with 20 imports to accomplish something that would be shorter and much much easier to understand (and review!) with a base package or with one of a small number of packages (box, data.table) designed by people who understand programming.

The fact that young people are producing sub-optimal code (in terms of whatever optimization criteria you are choosing--here, it sounds like terseness) is not strong evidence that a particular software ecosystem (tidyverse) is flawed. Young people producing bad code is not surprising. They're your grad students, mentor them, and maybe they'll adapt to your ways of thinking. Or not.

> One of the reasons R has lost ground to other languages recently is that most R code these days is ugly

Citation needed, surely. The fact that this article is about an increase in the number of CRAN submissions, and that pseudo-quantitative indices like the TIOBE index show R's slice of the pie growing, is evidence to the contrary.

reply
> Young people producing bad code is not surprising. They're your grad students, mentor them, and maybe they'll adapt to your ways of thinking. Or not.

You’re right, mentorship is key and I do my best to suggest better practices. They are often quite happy to find out they can do more with less and can forget having to remember multiple additional syntaxes (looking at you “ggplot2”).

I somewhat understand why R instructors lean towards the tidyverse - Wickham’s group produces a ton of tutorials and workbooks, so it’s easy to just point students there - but it has led to entire cohorts of people producing poor code

reply
For doing "more with less" in graphics, I would rather learn a unique syntax for a package that is based on the grammar of graphics (ggplot2) than use a package with standard syntax and some other foundation.
reply
Good that you find value in that framework, but it doesn’t seem like a useful starting point for first-time R learners interested in plotting and exploring their data. I have a colleague who integrates ggplot2 and other tidyverse packages into their undergraduate classes, and the students struggle quite a bit with creating basic plots since they now have to learn two things instead of one.
reply
Python is just such a good Swiss army knife and it's never a waste to learn: you can do data science and you can do almost anything else. It's the BASIC of the 21st century.
reply
Swiss army knife with nothing actually attached to it, just the empty frame of the knife. You need to import so many aspects.

I mean one example is people routinely reaching for pandas. Pandas is basically just replicating base R data wrangling syntax.
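
A sketch of what that overlap looks like, using only base R and the built-in `mtcars` data. The pandas lines in the comments are the rough equivalents I have in mind (with `df` as a hypothetical DataFrame), and `kpl` is a made-up column name.

```r
# Built-in dataset; no packages needed.
cars <- mtcars

# Boolean-mask row selection.
# pandas analogue: df[df["mpg"] > 25]
efficient <- cars[cars$mpg > 25, ]

# Column assignment from a vectorized expression.
# pandas analogue: df["kpl"] = df["mpg"] * 0.425
cars$kpl <- cars$mpg * 0.425

# Grouped aggregation.
# pandas analogue: df.groupby("cyl")["mpg"].mean()
mean_mpg <- aggregate(mpg ~ cyl, data = cars, FUN = mean)

print(mean_mpg)
```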

reply