You can generally reduce the problem to a finite alphabet by taking the finite subset that actually appears in the input.
If you have unbounded lengths then you can make sorting O(l n) where `l` is a bound on the lengths of your input. It's still linear in n, and also better than the O(l n log n) you would get with traditional comparison-based algorithms once you factor in the O(l) complexity of the comparison function for such elements.
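For concreteness, here's a minimal LSD radix sort sketch for equal-length strings (the function name and the equal-length assumption are mine, just for illustration); one stable bucketing pass per character position gives O(l n) up to the alphabet-size constant:

```python
from collections import defaultdict

def lsd_radix_sort(strings, length):
    # One stable bucketing pass per character position, least
    # significant (rightmost) position first: O(length * n) total
    # for a fixed, finite alphabet.
    for pos in range(length - 1, -1, -1):
        buckets = defaultdict(list)
        for s in strings:
            buckets[s[pos]].append(s)
        # Sorting the bucket keys costs O(sigma log sigma), a
        # constant once the alphabet is fixed.
        strings = [s for ch in sorted(buckets) for s in buckets[ch]]
    return strings
```

The stability of each pass is the whole trick: the later (more significant) passes preserve the order established by the earlier ones.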
If you don’t have large numbers of repeats of each element, then l needs to grow at least like log n, so O(l * n) is at least O(n log n).
Fundamentally, what’s going on here is that switching between computation models can easily add and remove log factors.
In order to have n/10 unique elements, you need to be able to construct n/10 different strings, which means that L needs to be at least log_(base = how many distinct symbols you have)(n/10), which is O(log n). So you have L * n = O(n log n) symbols to write down, and even reading the input takes time O(n log n).
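To put numbers on that bound, here's a quick back-of-the-envelope check (the alphabet size k = 26 is an arbitrary choice for illustration):

```python
import math

k = 26  # number of distinct symbols; arbitrary illustrative choice
for n in (10**3, 10**6, 10**9):
    # You need k**L >= n/10 to have n/10 distinct strings of length L,
    # so L must be at least log base k of (n/10).
    L_min = math.ceil(math.log(n / 10, k))
    print(f"n = {n:>10}: L must be at least {L_min}")
```

Each thousandfold increase in n only bumps L_min by a couple of symbols, which is the log n growth in action.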
As a programmer, it's very very easy to think "64-bit integers can encode numbers up to 2^64, and 2^64 is HUUUUGE, so I'll imagine that my variables can store any integer". But asymptotic complexity is all about what happens when inputs get arbitrarily large, and your favorite computer's registers and memory cells cannot store arbitrarily large values, and you end up with extra factors of log n that you need to deal with.
P.S. For fun, you can try to extend the above analysis to the case where the number of unique elements is sublinear in the number of elements. The argument almost carries straight through if there are at least n^c unique elements for 0 < c < 1 (as the c turns into a constant factor when you take the log), but there's room to quibble: if the number of unique elements is sublinear in n, one might argue that one could write down a complete representation of the input and especially the sorted output in space that is sublinear in L * n. So then the problem would need to be defined a bit more carefully, for example by specifying that the input format is literally just a delimited list of the element values in input order.
I guess you mean "at least O(n*log(n))".
You can generally sort any array in constant time by taking that constant to be the time it takes to sort the array using bubble sort.
That said, the finite-alphabet and bounded-length requirements can be softened a bit, even for general sorting algorithms.
I mean, for the kind of lexicographically sortable data we're talking about, you can basically pick a convenient alphabet size without cost.
And unbounded length is not that big an obstruction. Sure, you are going to need O(n log(n)) comparisons. But you can't compare data of unbounded length in constant time anyway. In the end you take an amount of time that is at least proportional to the total amount of data, which is optimal up to a constant factor. And if you fiddle with radix sort enough, you can get it down to something similar.
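As a sketch of that fiddling, an MSD radix sort handles variable-length strings in time roughly proportional to the total number of characters, since each character gets inspected a bounded number of times (the name and structure here are my own illustration, not a specific published variant):

```python
def msd_radix_sort(strings, pos=0):
    # Bucket by the character at `pos`, most significant first;
    # strings exhausted at this depth sort before all longer ones.
    # Each character of each string is inspected O(1) times overall,
    # so total work is proportional to the total input size (times
    # an alphabet factor for scanning the bucket keys).
    if len(strings) <= 1:
        return list(strings)
    done, buckets = [], {}
    for s in strings:
        if len(s) == pos:
            done.append(s)
        else:
            buckets.setdefault(s[pos], []).append(s)
    for ch in sorted(buckets):
        done.extend(msd_radix_sort(buckets[ch], pos + 1))
    return done
```

Recursion depth tracks the longest common prefix, so a production version would want an explicit stack; this is just to show where the "proportional to the amount of data" bound comes from.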
Basic ASCII strings and tuples aren't that big an obstruction. Fractions are more complicated.
Really, the O(n log(n)) for comparison-based sorts and the O(N) for radix sort mean something different. One is the number of comparisons as a function of the number of elements; the other is closer to the number of operations per amount of data. Though that assumes O(1) swaps, which is technically incorrect for data that doesn't fit in a 64-bit word.
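One way to see the two cost measures side by side is to count element comparisons explicitly (`CountedStr` is a throwaway helper of mine, not anything standard):

```python
import random

calls = 0

class CountedStr(str):
    # Each `<` between two elements is one "comparison" in the
    # O(n log n) sense, but it may inspect up to l characters.
    def __lt__(self, other):
        global calls
        calls += 1
        return str.__lt__(self, other)

random.seed(0)
n, l = 1024, 32
data = [CountedStr("".join(random.choice("ab") for _ in range(l)))
        for _ in range(n)]
sorted(data)
# `calls` lands roughly in the n to n*log2(n) range; the total
# character-level work can be up to l times that, which is where
# the hidden factor lives.
print(calls)
```

So the comparison count looks like n log n while the data-touching count looks like l * n log n, and radix sort's O(N) is measured in the latter currency.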
The transdichotomous model looks interesting.
The unstated implication is that the theory tracks with real-world behavior. This is more or less the case for simple problems: your O(n^2) algorithm won't scale, your database query without an index will take forever, and so on; it's definitely a useful, high-signal way of thinking about computational problems.
Obviously modern hardware isn't much like a 70s machine and things like "all memory accesses are O(1)" are so wrong it's not even funny.
It's a pure thought experiment with no possible counterfactual, but I think if you tried to develop basic CS theory from a purely mathematical angle (e.g. consider a machine defined so and so with basic operations costing 1, and run with that without caring much about the real world) you'd naturally end up with some (but not all) of the concepts we're used to. For example, arrays and buffers are very natural: we need more space than our basic unit of memory can accommodate, so let's use several in a row. Pointers also follow very nicely, and with them structures like linked lists. Other stuff like B-trees, and to some extent hash tables and friends, very much less so; they're definitely "imported" from real-world usage.