undefined

points

[-]

To rephrase crudely: "inline everything".

This is infeasible in most languages, but if your language and concise and expressive enough, it becomes possible again to a large degree.

I always think about how Arthur Whitney just really hates scrolling. Let alone 20 open files and chains of "jump to definition". When the whole program fits on page, all that vanishes. You navigate with eye movements.

by dayvigo2 days ago|

parent|

[-]

> To rephrase crudely: "inline everything".

Sounds a lot like what Forth does.

by DHRicoF3 days ago|

prev|

[-]

> k v⍪←↓⍉↑(2 0.33)(2 0.01)(3 0.92) ⍝ insert values > k{str[⍺] ⍵}⌸v ⍝ pretty print > k v⌿⍨←⊂k≠str⍳⊂'buggy' ⍝ deletion

I like your funny words. No, really, I should expend some time learning APL.

But your idea deeply resonate with my last weeks struggle.

I have a legacy python code with too much coupling, and every prior attempt to "improve things" went adding more abstraction over a plain wrong data model.

You can't infer, reading the code linearly, what methods mutate their input objects. Some do, some don't. Sometimes the same input argument is returned even without mutation.

I would prefer some magic string that could be analyzed and understood than this sea of indirection with factories returning different calculators that in some instances they don't even share the same interface.

Sorry for the rant.

by rak15072 days ago|

prev|

[-]

The problem is that this isn't a "hashmap" in any meaningful sense, because all the operations are O(n).

by xelxebar2 days ago|

parent|

[-]

Hey, I know you.

> all the operations are O(n)

Not true, fortunately!

The cat-gets pattern for insertion is clearly just an O(1) append. Similar for the replicate-gets in the deletion.

Finding the deletion mask might be O(n) or O(log n), depending[0] on the size of your search space.

Key is effectively just a radix sort, which is indeed O(n) on the keys, but a trad hashmap doesn't get any better in this case.

[0]:https://help.dyalog.com/19.0/#Language/Defined%20Functions%2...

by rak15072 days ago|

parent|

[-]

The append is the only thing that is O(1), finding the deletion mask is linear (≠ is linear, isn't it?) and the actual deletion is also linear (⌿ is also linear).

       ]runtime -repeat=1s 'v←k1 ⋄ v⌿⍨←v≠2'

 \* Benchmarking "v←k1 ⋄ v⌿⍨←v≠2", repeat=1s
 ┌──────────┬──────────────┐
 │          │(ms)          │
 ├──────────┼──────────────┤
 │CPU (avg):│0.008491847826│
 ├──────────┼──────────────┤
 │Elapsed:  │0.008466372283│
 └──────────┴──────────────┘
       ]runtime -repeat=1s 'v←k2 ⋄ v⌿⍨←v≠2'

 \* Benchmarking "v←k2 ⋄ v⌿⍨←v≠2", repeat=1s
 ┌──────────┬────────────┐
 │          │(ms)        │
 ├──────────┼────────────┤
 │CPU (avg):│0.8333333333│
 ├──────────┼────────────┤
 │Elapsed:  │0.83        │
 └──────────┴────────────┘

by TuringTest2 days ago|

prev|

[-]

> APL makes it really convenient to take a different tack. Instead of designing abstractions, we can carefully design our data to be easily operated on with simple expressions. Where you would normally see a library function or DSL term, this approach just uses primitives directly

But in doing that, aren't you simply moving the complexity of the abstractions from the functions code into the data structure? Sure you can use generic operators now, but then you need to carefully understand what the value pairs represent in terms of domain logic and how to properly maintain the correct structure in every operation. Someone reading the program for the same time will have just the same difficulties understanding what the code does, not in terms of primitives, but in terms of business domain meanings.

I mean there is an improvement in your approach, but I don't think it comes from putting the complexity at a different place, but because this way you get to see the code and the actual actual values at the same time.

I have the insight that what makes complex programming easy is this juxtaposition of runtime data and code operations visible together. That's why IDE tools have building better and better debuggers and inspectors to let you see what the program is doing at each step.

In that context, creating new good concise abstractions is a good thing, wether you abstract parts of the operations or parts of the data structure.

by xelxebar2 days ago|

parent|

[-]

You're absolutely right that this approach encodes domain complexity in the data, and I agree that's not the magic sauce per se.

Honestly, it's just really hard to convey how simple code can be. Imagine a nerual net inference engine in 30 straightforward lines of code and no libraries! [0] On one page you really can read off the overall algorithm, tradeoffs made, algorithmic complexity, and intended highlights.

Abstractions encourage hiding of information whereas subordination encourages keeping it implicit.

Are you familiar with Observability 2 and wide events? In this data-first model we effectively encode all program behavior into a queryable data structure. I'll often breakpoint something I'm working on and iteratively drill down into the data to get to the root of a problem. That's just so friggin cool, IMHO.

Abstraction almost always manages complexity by introducing some control flow barrier like an API call. Observing said control flow is a lot more indirect, though, either via statistical sampling or in flight observation. In contrast, imagine if you could simply throw SQL queries at your application and you get an idea of what I'm stumbling to convey here.

[0]:https://doi.org/10.1145/3589246.3595371

by skydhash2 days ago|

parent|

prev|

[-]

> But in doing that, aren't you simply moving the complexity of the abstractions from the functions code into the data structure?

The core idea behind most of functional programming and APL is that data don't change. Policies do. So you design a set of primitives that can represent any data types by combination, then build a standard library that combine and split them, as well as transform them into each other. Then you have everything you need for expressing policies.

Let's take music notation as an analogy. You only need the staff for pitch and the note-head for duration. And with those (plus other extra symbols), you can note down most music. You can move higher up with chords, but you don't really need to go much higher than that.

Data are always external to the business domains. Only policies are internal. Those policies need semantics on data, so we have additional policies that takes rough data and transform them into meaningful one. What Java, C#, and other do is fixing those semantics in the code and have the code revolve around them. But those labels are in fact transient in either the short or long term. What you really need is a good way to write policies and reasons about them.

I like Go because it lets you define data semantics well, but don't confine your policies.

> Someone reading the program for the same time will have just the same difficulties understanding what the code does, not in terms of primitives, but in terms of business domain meanings.

That's what naming and namespacing is there for. Also docstrings.

by jodrellblank2 days ago|

prev|

[-]

> "immediately accessible"

Most of your 'insert values' code is wrangling the data from what the interpreter lets you type into the form you need it; ⍪← do the appending which is what you want, ↓⍉↑()()() are input parsing and transform which you don't care about but have to do to get around APL limitations and the interpreter's input parsing limitations. 9/11ths of the symbols in that line are APL boilerplate. 81% noise.

Deletion with k v⌿⍨←⊂k≠str⍳⊂'buggy' too; enclosing 'buggy' for index to search for it as a single element in a nested array, keeping in mind that (⍴'a')≠(⍴'aa') in case you had a single character key. Making a Boolean array which is nothing to do with your problem domain - it's what you have to do to deal with APL not offering you the thing you want.

Saying "we can create a hash map of vector values" is misleading because there isn't any hashing going on. Your code doesn't check for duplicate keys. You can't choose the hash, can't tune the hash for speed vs. random distribution. The keys are appended in order which makes searching slower than a sorted array (and you can't sort array k without messing up its links to array v, same with using nub ∪ on it to make it a set) and at scale the array search becomes slow - even with Dyalog APL's bit-packing and SIMD accelerated searches behind the scenes - so there is a magic interpreter command (I-Beam 1500)[1] which marks an array for internal hashing for faster lookups. But remember the leaky internal abstractions because ⍪← isn't really appending; APL's model is nearly immutable, the array is streamed through and updated into a new array, so the hash has to work with that and you need to understand it and keep it in mind for the times it won't work and some code will invalidate the hash and end up slower.

----

Good ideas of language and tooling design such as "pit of success", "there's only one way to do it", "the first way that comes to mind should be the right way to do it", "different work should look different" are all missing from APL; code that looks workable often has some small internal detail why it isn't; there's no obvious way to do common things. Compare it with Python or C# or similar languages which have syntax for kv={'a':1, 'b':2} and it "just works", if you miss parens or miss a colon it looks clearly wrong, throws editor squigglie underlines, and gives compile time errors. There's many fewer trivial typos which look plausible and run error-free but screw things up. Make a mistake and get an error, and the error won't be LENGTH ERROR or DOMAIN ERROR it will be something mostly helpful. The map behaves how one would expect - deleting doesn't stream the entire dictionary through memory and make a new copy; there's no need to manually link indices between different arrays; failed key lookup gives null or helpful error, the key can be any reasonable data type in the language, etc.

APL implementations are reliant on magic interpreter functions for IO (⎕NGET, ⎕CSV, ⎕JSON and friends). Error handling is bad, logging is bad, debugging is bad, because an entire expression is executed as one thing - and with hooks and forks, one thing which can't easily be split into smaller pieces. e.g. if ↓⍉↑ is a valid transform that returns some data but it's not the transform you want, how do you find out why not? How do you find out what it is doing instead of what you thought it was doing? How do you tell that k v⌿⍨←⊂... needed that enclose on the boolean matrix? Trying without the enclose gets LENGTH ERROR, then what? The language and tooling give you minimal to zero help with any of this, you're left with experimentation - and that sucks because you can't actually type in the 2D arrays you want to work with and the code doesn't work for some mismatch of shape/nesting/enclosing/primitives that you don't understand, so experiments to understand what's happening don't work for the exact same reasons that you can't do the experiments for reasons you don't understand.

One needs a precise feel for all the different array shapes, why ⊂3 doesn't seem to do anything, why ⊆ and ⊂ are different, what things are a 1D array and what are a 2D array with one row or a boxed 1D array or a nested 1D array as a scalar inside a 2D array, or which things are working by scalar extension[3] before one can do much of anything including experimenting and learning how to do much of anything.

Where is the 'immediately accessible' pattern "if a key is in the dictionary, update it, if it's not there, add it"? In Python it's if/else and "key in map". In C# it's if/else and "map.Contains(key)". In your APL design we start with the deletion code and enclose the key, search for it in str to get an index-of and hope it only comes up once, then immediately we find branching is difficult. Do we use ⍣ with a boolean arg to choose whether it runs or not? Do we always append but do a boolean compress on the data to append something-or-nothing? Do we use dfns and guards? Do we switch to a trad-fn and use :if/:else? Do we make an array of functions and index which one to execute, or an array of names and index which one to eval ⍎? There's no immediate answer so instead of skipping over this trivial non-issue in any other language and on with the problem, we descend into "how to reimplement basic thing in APL" for branching.

> "What I find really nice about this approach is that each expression is no longer a black box, making it really natural to customize expressions for specific needs."

It's an argument Aaron Hsu has made, but it's like using Up-Goer 5 or speaking Toki Pona where you can't say "fire engine" but can say "car of man with job to stop fire".[4]

[1] https://docs.dyalog.com/latest/CheatSheet%20-%20I-Beams.pdf

[3] https://aplwiki.com/wiki/Scalar_extension

[4] https://xkcd.com/1133/

by xelxebar2 days ago|

parent|

[-]

Man, you bring up so many good, juicy points. I wish we could sit down and chat about them!

First off, just to be clear, I totally understand the friction points you bring up. APL's learning curve is crazy steep, and learning materials beyond the beginner level are nigh on nonexistent. It sucks, I agree.

That said, if you're able to push through the abyss, I can assure you that most of that friction drops away. It's clear that you're still thinking in terms of individual primitive operations, but with familiarity thing like k v⌿⍨← will just read as atomic operations in the same way that you read "laser" not "l-a-s-e-r".

More than that, though, the if/else struggle you mention really brings back memories. In practice it's really not an issue. We just ⍪←∪ to insert potentially new keys, str←str[x←⍋str] ⋄ k v⊇←⊂x if a sort is needed, etc. But we only do those when the operations on the data demand it. Otherwise there's no point in encoding irrelevant extra structure in the data.

It took me a good couple years of regular APL hacking for things to click, but I'm not really sure how to get across the feeling, so you'll just have to believe me or not, I guess.