The pandas API is awful, but it's kind of interesting why. It started as a financial time series manipulation library at a hedge fund (the name comes from 'panel data'), and a lot of the quirks come from that. For example, the unique obsession with the 'index': functions seemingly randomly returning dataframes with column data as the index, or having to write index=False every single time you write to disk, or it appending the index to the Series numpy data leading to incredibly confusing bugs. That comes from the assumption that there is almost always a meaningful index (timestamps).

by gwerbin 1 hour ago

> The pandas API is awful

I hate to be the "you're holding it wrong" guy but 90% of "Pandas bad!" posts I find are either outright misinformed or mischaracterizing one person's particular opinion as some kind of common truth. This one is both!

> That comes from the assumption that there is almost always a meaningful index (timestamps)

The index can be literally any unique row label or ID. It's idiosyncratic among "data frames" (SQL has no equivalent concept, and the R community has disowned theirs), but it's really not such a crazy thing to have row labels built into your data table. Excel supports this in several different ways (frozen columns, VLOOKUP) and users expect it in just about any table-oriented GUI tool.
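As a minimal sketch of the "row labels" idea (the user IDs, names, and scores below are invented for illustration), an index made of string IDs lets you look rows up by label rather than by position:

```python
import pandas as pd

# Hypothetical data: the index holds a row ID, not a timestamp.
df = pd.DataFrame(
    {"name": ["Ada", "Grace", "Linus"], "score": [91, 88, 79]},
    index=["u001", "u002", "u003"],
)

# Label-based lookup: a whole row, then a single cell.
row = df.loc["u002"]
score = df.loc["u003", "score"]

print(row["name"], score)
```

This is roughly what a VLOOKUP against a key column does in a spreadsheet, except the key is part of the table itself.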

> having to write index=False every single time you write to disk

If you're actually using the index as it's meant to be used, you'd see why this isn't the default setting.
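To illustrate the trade-off (a sketch only; the `user_id` label and the scores are made up for the example): when the index carries real row labels, writing with `index=False` silently drops them, while the default keeps them so they can be restored on read.

```python
import io

import pandas as pd

# Hypothetical frame whose index is a meaningful, named row label.
df = pd.DataFrame(
    {"score": [91, 88]},
    index=pd.Index(["u001", "u002"], name="user_id"),
)

with_index = df.to_csv()                # default: the index is written as the first column
without_index = df.to_csv(index=False)  # the row labels are not written at all

# Round trip: tell read_csv which column holds the row labels.
restored = pd.read_csv(io.StringIO(with_index), index_col="user_id")

print(with_index)
print(without_index)
```

If the index is just the default 0..n-1 counter, `index=False` is the sensible choice; the friction comes from data where nothing meaningful was ever put in the index.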

> functions seemingly randomly returning dataframes with column data as the index

I assume you're talking about the behavior of .groupby() and .rolling()? It's never been random. Under-documented, with hard-to-reason-about group_keys= and related options, yes. But not random.
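A small sketch of the behavior in question, assuming pandas 2.0 or later (the `team` and `points` columns are invented for the example). `as_index` controls whether the grouping column ends up as the index of an aggregation, and `group_keys` controls whether `.apply()` prepends the group key to the index of its result:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"team": ["a", "a", "b"], "points": [1, 2, 5]})

# Default: the grouping column becomes the index of the result.
indexed = df.groupby("team")["points"].sum()

# as_index=False: the grouping column stays an ordinary column.
flat = df.groupby("team", as_index=False)["points"].sum()

# group_keys=True (the default): apply() adds the group key as an outer index level.
keyed = df.groupby("team", group_keys=True)["points"].apply(lambda s: s.cumsum())

# group_keys=False: the result keeps the original row index only.
unkeyed = df.groupby("team", group_keys=False)["points"].apply(lambda s: s.cumsum())

print(indexed, flat, keyed, unkeyed, sep="\n\n")
```

The rules are deterministic, but which option applies depends on whether the call is an aggregation or an `apply`, which is where much of the confusion seems to come from.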

> appending the index to the Series numpy data leading to incredibly confusing bugs

I've been using Pandas professionally almost daily since 2015 and I have no idea what this means.

by _diyar 9 minutes ago

I think the commenter you are replying to might well understand these nuances. The point is not that Pandas is inscrutable, but that it's annoying to use in many common use cases.

by bbkane 8 hours ago

Check out Polars. I find it much more intuitive than pandas because it looks closer to SQL (and I learned SQL first). Maybe you'll feel the same way!

by Lyngbakr 6 hours ago

Agreed, I much prefer Polars, too. IIRC the latest major version of pandas even introduced some Polars-style syntax.

by Patient0 6 hours ago

which makes sense given the name (panda and polar are both bears), though as far as I know the two projects have different authors: pandas was started by Wes McKinney and Polars by Ritchie Vink

by rich_sasha 6 hours ago

I've looked at Polars. My sense is that Pandas is an interactive data analysis library poorly suited to production uses, and Polars is the other way around. Polars seemed quite verbose, for example. Sometimes doing `series["2026"]` is exactly the right thing to type.
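For readers unfamiliar with that shorthand, here is a minimal sketch of what `series["2026"]` does in pandas, assuming a Series with a `DatetimeIndex` (the dates and values are invented for illustration). A year string selects every row whose timestamp falls in that year:

```python
import pandas as pd

# Hypothetical daily series straddling the 2025/2026 boundary.
idx = pd.date_range("2025-12-30", periods=5, freq="D")
series = pd.Series(range(5), index=idx)

# Partial string indexing: all rows dated in 2026.
in_2026 = series["2026"]

# The more explicit spelling of the same selection.
in_2026_loc = series.loc["2026"]

print(in_2026)
```

This is terse precisely because the index is assumed to be meaningful, which is the same design decision the rest of the thread is arguing about.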

by entropicdrifter 5 hours ago

You can do that in Polars, too