Lately at work I've used C++ optimization tricks such as inplace_map and inplace_string, as well as placement new to inline map-like iterators inside a view adapter's iterators, with that byte buffer declared as the first member of the class so it doesn't incur std::max_align_t padding against the other members. At a higher architectural level, I wrote a data model binding library that can serialize JSON, YAML and CBOR documents to an output iterator one byte at a time, without heap allocation in most cases.
This is because I work on an embedded system with 640 KiB of SRAM, and given the sheer amount of run-time data it has to handle and produce, I'm wary not only of heap usage but also of heap fragmentation.
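A minimal sketch of the placement-new and buffer-first layout idea, assuming C++17. InlineStorage, BufferFirst and BufferAfterFlag are hypothetical names for illustration; this is not the actual inplace_map/inplace_string code or the view adapter described above, and it omits copy/move support that a real iterator would need.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical sketch: holds one object of type T (e.g. a map-like iterator)
// inside an inline byte buffer, constructed with placement new, so the
// enclosing iterator never touches the heap.
template <std::size_t Capacity>
class InlineStorage {
 public:
  InlineStorage() = default;
  // Copy/move omitted to keep the sketch short; a real iterator needs them.
  InlineStorage(const InlineStorage&) = delete;
  InlineStorage& operator=(const InlineStorage&) = delete;
  ~InlineStorage() { reset(); }

  // Destroys any current object, then constructs a T in place in buffer_.
  template <typename T, typename... Args>
  T& emplace(Args&&... args) {
    static_assert(sizeof(T) <= Capacity, "T does not fit in the inline buffer");
    static_assert(alignof(T) <= alignof(std::max_align_t), "T is over-aligned");
    reset();
    T* obj = ::new (static_cast<void*>(buffer_)) T(std::forward<Args>(args)...);
    // A captureless lambda converts to a plain function pointer: no allocation.
    destroy_ = [](void* p) { static_cast<T*>(p)->~T(); };
    return *obj;
  }

  // Caller must request the same T that was last emplaced.
  template <typename T>
  T& get() {
    assert(has_value());
    return *std::launder(reinterpret_cast<T*>(buffer_));
  }

  bool has_value() const { return destroy_ != nullptr; }

  void reset() {
    if (destroy_ != nullptr) {
      destroy_(buffer_);
      destroy_ = nullptr;
    }
  }

 private:
  // Declared first: the buffer carries the strictest alignment in the class,
  // so at offset 0 it needs no leading padding, and the smaller members
  // below pack in right after it.
  alignas(std::max_align_t) std::byte buffer_[Capacity];
  void (*destroy_)(void*) = nullptr;
};

// Layout comparison. Exact sizes are implementation-defined; with a 16-byte
// std::max_align_t this is typically 48 bytes vs 64 bytes.
struct BufferFirst {
  alignas(std::max_align_t) std::byte buf[32];
  bool flag;
  unsigned char kind;
};

struct BufferAfterFlag {
  bool flag;  // buf must start on a max_align_t boundary, so padding follows
  alignas(std::max_align_t) std::byte buf[32];
  unsigned char kind;
};
```

Usage would look like `InlineStorage<sizeof(It)> s; It& it = s.emplace<It>(m.cbegin());` with `It` being, say, `std::map<std::string, int>::const_iterator`. For an adapter that can wrap several iterator types, I'd assume the capacity is the largest `sizeof` among them; that sizing rule is my assumption, not something stated above.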
AI will readily identify such tricks and can even help implement them, but unless constrained otherwise, it will pick the most expedient solution that answers the question (note that I didn't say answers the problem).
With SOTA models it all depends on how you drive them.
The market is telling us that through increased hardware prices.
The fact that LLMs are so powerful means we need to start being smarter about how we allocate resources. Should chat apps really eat up gigabytes of RAM and be entitled to whole cores when that capacity could go to inference?
LLMs aren't yet anywhere close to a human's capacity for knowledge distillation.