I suspect my use cases are less complex than yours. Or maybe jq just fits the way I think for some reason.
I dream of a world in which all CLI tools produce and consume JSON and we use jq to glue them together. Sounds like that would be a nightmare for you.
Here's an example of my white whale, converting JSON arrays to TSV.
cat input.json | jq -S '(first|keys | map({key: ., value: .}) | from_entries), (.[])' | jq -r '[.[]] | @tsv' > out.tsv
<input.json jq -S -r '(first | keys) , (.[]| [.[]]) | @tsv'
<input.json # redir
jq
-S # sort
-r # raw string out
'
(first | keys) # header
, # comma is generator
(.[] | # loop input array and bind to .
[ # construct array
.[] # with items being the array of values of the bound object
])
| @tsv' # generator binds the above array to . and renders to tsv

I knew cat was an anti-pattern, but I always thought it was so unreadable to redirect at the end.
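For comparison, here's the same transformation sketched in plain Python, stdlib only, with a small hypothetical input: take the sorted keys of the first object as the header (mirroring what -S plus first | keys does in the jq version), then emit one tab-joined row per object.

```python
import json

# Hypothetical input mirroring the jq example: a JSON array of flat objects.
records = json.loads('[{"b": 2, "a": 1}, {"a": 3, "b": 4}]')

# Sorted keys of the first object serve as the header, like jq -S + (first | keys).
cols = sorted(records[0])

lines = ["\t".join(cols)]
for obj in records:
    # Emit each object's values in header order, like the jq row construction.
    lines.append("\t".join(str(obj[c]) for c in cols))

tsv = "\n".join(lines)
print(tsv)
```

Roughly fifteen lines instead of one, which is the usual trade: the jq one-liner is denser, the Python version spells out where the header and the row order come from.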
cat input.json | jq -r '(first | keys) as $cols | $cols, (.[] | [.[$cols[]]]) | @tsv'
That whole map and from_entries step throws it off. It's not a good fit for what you're doing: @tsv expects a stream of arrays, whereas you're producing a stream of objects (with the header also being one) and then converting them to arrays. That's an unnecessary step and makes it a little harder to understand.

Looking at $cols | $cols, my brain says: hmm, that's a typo; clearly they meant ; instead of |, because nothing is getting piped, we just have two separate statements. Surely the assignment "exhausts the pipeline" and we're only passing null downstream.
The pipelining has some implicit contextual behavior that I have to rediscover by trial and error each time, since it doesn't fit my worldview while I'm doing other shell stuff.
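One thing the $cols binding buys you beyond readability: indexing each row by the header keys keeps columns aligned even when objects list their keys in different orders. A small Python sketch (with hypothetical data) of the difference:

```python
import json

# Objects whose keys arrive in different orders (no jq -S in sight).
records = json.loads('[{"a": 1, "b": 2}, {"b": 20, "a": 10}]')

# Header order taken from the first object, like (first | keys) as $cols.
cols = list(records[0])

# Naive row extraction: each object's own value order -> misaligned columns.
naive = [list(obj.values()) for obj in records]

# Indexing every row by the bound header, like [.[$cols[]]] in the jq version.
aligned = [[obj[c] for c in cols] for obj in records]
```

Here naive yields [[1, 2], [20, 10]] (the second row is flipped relative to the header), while aligned yields [[1, 2], [10, 20]].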
That world exists and is mature: PowerShell.
It's not unique in that regard. 'sed' is Turing complete[1][2], but few people get farther than learning how to do a basic regex substitution.
[1] https://catonmat.net/proof-that-sed-is-turing-complete
[1] And arguably a Turing tarpit.
Closest I've come, if you're willing to overlook its verbosity and (lack of) speed, is actually PowerShell, if only because it's a bit nicer than Python or JavaScript for interactive use.
jq is the CLI I like the most, but sometimes even I struggled to understand the queries I wrote in the past. celq uses a more familiar language (CEL)
# Common Expression Language
The Common Expression Language (CEL) implements common
semantics for expression evaluation, enabling different
applications to more easily interoperate.
## Key Applications
- Security policy: organizations have complex infrastructure
and need common tooling to reason about the system as a whole
- Protocols: expressions are a useful data type and require
interoperability across programming languages and platforms.

I think my personal preference for syntax would be Python’s. One day I want to try writing a query tool with https://github.com/pydantic/monty
$ cat package.json | dq 'Object.keys(data).slice(0, 5)'
[ "name", "type", "version", "scripts", "dependencies" ]
https://crespo.business/posts/dq-its-just-js/

No more fiddling around trying to figure out the damn selector by trying to track the indentation level across a huge file. Also easy to pipe into fzf, then split on "=", trim, then pass to jq.
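The dq call above is just Object.keys plus a slice. For what it's worth, the same idea in stdlib Python (with hypothetical package.json contents) is about as short, since dicts preserve insertion order:

```python
import json

# Stand-in for package.json (hypothetical contents).
data = json.loads(
    '{"name": "x", "type": "module", "version": "1.0.0",'
    ' "scripts": {}, "dependencies": {}, "extra": null}'
)

# Same idea as dq's Object.keys(data).slice(0, 5).
first_five = list(data)[:5]
print(first_five)  # ['name', 'type', 'version', 'scripts', 'dependencies']
```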
I was working a lot with Rego (the DSL for Open Policy Agent) and realized it actually has a pretty nice syntax for jq-type use cases.
Of course, this doesn't matter much now; I just ask an LLM to write the query for me if it's so complex that I can't do it by hand within seconds.
This, among other reasons, is why I built https://github.com/dhuan/dop
You don't have to use my implementation, you could easily write your own.
I’ve never seen AI “hallucinate” on basic data transformation tasks. If you tell it to convert JSON to YAML, that’s what you’re going to get. Most LLMs are probably using something like jq to do the conversion in the background anyway.
AI experts say AI models don’t hallucinate, they confabulate.
When I'm deciding what tool to use, my question is "does this need AI?", not "could AI solve this?" There are plenty of cases where it's hard to write a deterministic script to do something, but if a deterministic option exists, why would you choose something that might give you the wrong answer? It's also more expensive.
The jq script or other script that an LLM generates is way easier to spot check than the output if you ask it to transform the data directly, and you can reuse it.
Because the input might be huge.
Because there is a risk of getting hallucinations in the output.
Isn't this obvious?
It's an important idea in computer science. Go and learn.