Through this lens, you can view an organism as a hierarchical collection of models of its environment at different scales. As the organism specializes further, its model of the world can simplify, and it is free to simplify some internal structures so that others can become more optimized for the more dynamic parts of the model that may need realtime updates, such as recognizing and tracking fast-moving prey. Various positive feedback loops result and drive evolution.
It all comes back to the amount of free energy needed to predict the next state. In the information-theoretic sense, regularity lowers the uncertainty of the next relevant state given the system’s model. Specialization is what occurs when the system transforms that lower uncertainty into structure. This saves energy because the system has less ambiguity to resolve in realtime. But for most biological organisms, it takes anywhere from tens to millions of years for a lineage's structure to react to its environment, so the price of these metabolically "cheap reads" on the model is an extraordinarily long, expensive and nondeterministic write process to update it.
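A rough sketch of the information-theoretic part, in Python. The day/dusk/night/dawn cycle is a made-up stand-in for a regular environment, and `entropy` and `conditional_entropy` are illustrative helper names, not library functions. It only measures uncertainty in bits and says nothing about actual metabolic cost.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as a Counter of counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(seq):
    """H(next | current) in bits, estimated from adjacent pairs in seq."""
    pairs = Counter(zip(seq, seq[1:]))   # how often state a is followed by state b
    firsts = Counter(seq[:-1])           # how often state a has a successor at all
    total = sum(pairs.values())
    h = 0.0
    for (a, b), c in pairs.items():
        # weight by P(a, b), score by P(b | a)
        h -= (c / total) * math.log2(c / firsts[a])
    return h


# Hypothetical environment: a strictly repeating cycle of four states.
environment = ["day", "dusk", "night", "dawn"] * 250

h_no_model = entropy(Counter(environment))       # 2.0 bits: four equally likely states
h_with_model = conditional_entropy(environment)  # 0.0 bits: the next state is fully determined

print(f"no model:   {h_no_model:.3f} bits per step")
print(f"with model: {h_with_model:.3f} bits per step")
```

With no model, every step carries 2 bits of uncertainty. Once the transition structure is known, nothing is left to resolve in realtime, which is the saving that specialization locks in.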
If an organism allocates enough free energy to allow rapid mutation while still constraining bad evolutionary paths, the result, while metabolically expensive, can be organic evolution on the span of literally just decades:
https://www.nationalgeographic.com/animals/article/lizard-ev...
> The new habitat once had its own healthy population of lizards, which were less aggressive than the new implants, Irschick said. The new species wiped out the indigenous lizard populations
> Researchers found that the lizards developed cecal valves—muscles between the large and small intestine—that slowed down food digestion in fermenting chambers, which allowed their bodies to process the vegetation's cellulose into volatile fatty acids.
> The rapid physical evolution also sparked changes in the lizard's social and behavioral structure, he said. For one, the plentiful food sources allowed for easier reproduction and a denser population.
> The lizard also dropped some of its territorial defenses
The lizard not only developed a new organ to help it eat the local vegetation, but it also exploited the regularity of its own dominion to reduce the metabolic energy previously allocated to modelling an environment with territorial pressure from competing species.
To expand a bit on the technical side, with AI in mind:
Any regularity in an environment which an embedded system can detect but fails to exploit represents an amount of excess free energy in the organism, distributed over itself, its group, its species, etc. depending on what types of systems and scales you choose to model.
There are parallels in information theory: any recognizable patterns or relationships remaining within a compressed message represent redundancy, i.e. bits spent beyond the entropy (the average uncertainty) of the source, since that regularity was not exploited during compression and remained in the compressed structure. This means that a perfectly compressed message is functionally indistinguishable from random noise.
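The "compressed output looks like noise" point can be checked roughly with Python's standard `zlib` module. This is only a sketch: the word list and the seeded generator are made up for illustration, `byte_entropy` is a hypothetical helper, and order-0 byte entropy is a crude proxy for "looks random", not a proof of it.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Empirical order-0 Shannon entropy in bits per byte (8.0 is the maximum)."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical regular "message": words drawn from a small fixed vocabulary.
words = ("the lizard prey model energy entropy signal noise pattern organism "
         "habitat valve structure state predict compress weight learn "
         "environment scale cost read write update").split()
rng = random.Random(0)  # seeded so the run is repeatable
text = " ".join(rng.choice(words) for _ in range(20000)).encode("ascii")

once = zlib.compress(text, 9)   # regularity gets squeezed out here
twice = zlib.compress(once, 9)  # little regularity is left for a second pass

print(f"original:   {len(text)} bytes, {byte_entropy(text):.2f} bits/byte")
print(f"compressed: {len(once)} bytes, {byte_entropy(once):.2f} bits/byte")
print(f"recompressed: {len(twice)} bytes")
```

On a run like this the compressed stream should sit much closer to 8 bits per byte than the input, and a second pass over it should gain essentially nothing. zlib is far from a perfect compressor, so some structure (headers, block boundaries) still remains.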
You can view weights in an AI model through the same lens: the weights represent "knowledge" of the environment the model has been exposed to. The model is designed to correctly predict future states, and thus "learning" is effectively the compression of a full model of the environment, which is more efficient to traverse than the uncompressed model. A perfectly learned environment minimizes uncertainty and should translate to weights that have no discernible patterns and thus are also functionally indistinguishable from random noise, devoid of any regularity.
Some level of "compression" of the local environment is required for any stable embedded system, or else continually stabilizing the system would require as much energy as is present in the entire universe, because the system would have to become a perfect copy of the very environment it is embedded within. This is obviously thermodynamically prohibitive.
Hopefully this helps make the relationship between structure, knowledge, information and uncertainty a lot more intuitive.
As a bonus, consider Fabrice Bellard's ts_zip, a great showcase of how knowledge and compression are related.
ts_zip compresses text at record efficiency (at the cost of orders of magnitude more memory and compute, nothing is free).
Previous attempts at text compression relied almost entirely on surface-level patterns: character statistics, repeated substrings, syntactic structure, etc., maybe with some heuristic tweaking here and there.
That got us far, but LLMs do something never achieved before, which is to incorporate relationships beyond the surface: not just placement of characters, n-grams or words, but the actual meaning behind them, and large-scale correlations with other words or tokens across vast context windows.
The LLM actually becomes a world model with enough size and training, and thus we are able to use every fact we know about everything to compress text. If we're speaking about biology, for example, or if the context is constrained to a specific historical period, that narrows the probability distribution over which word is likely to come next after a given prefix.
All of these regularities can be leveraged, at the cost of a lot of energy, in order to create compressed text that gets arbitrarily close to looking like completely random noise (actually verifying this is not possible in general though, since Kolmogorov complexity is uncomputable).
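To make the link between prediction and compressed size concrete, here is a toy sketch in Python. It is not how ts_zip is implemented: it swaps the LLM for a tiny adaptive character model, and `code_length_bits`, the vocabulary and the seed are all made up for illustration. The idea it shows is general: a coder driven by a predictive model spends about -log2(p) bits on a symbol the model assigned probability p, so a model that knows more context pays fewer bits.

```python
import math
import random
from collections import defaultdict


def code_length_bits(text: str, order: int) -> float:
    """Total ideal code length in bits under an adaptive order-`order` character model.

    Each character is predicted from the previous `order` characters using
    add-one smoothed counts, then the counts are updated. An arithmetic coder
    driven by the same predictions would emit roughly this many bits.
    """
    alphabet_size = len(set(text))  # simplification: assumed known to the decoder
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]  # empty string when order == 0
        p = (counts[ctx][ch] + 1) / (totals[ctx] + alphabet_size)
        bits -= math.log2(p)             # cost of a symbol predicted with probability p
        counts[ctx][ch] += 1
        totals[ctx] += 1
    return bits


# Hypothetical "biology" text: words drawn from a small fixed vocabulary.
vocab = ("lizard cecal valve cellulose digestion vegetation habitat "
         "population territory predator prey metabolism").split()
rng = random.Random(0)  # seeded so the run is repeatable
text = " ".join(rng.choice(vocab) for _ in range(5000))

weak_bits = code_length_bits(text, order=0)    # knows only letter frequencies
strong_bits = code_length_bits(text, order=2)  # also knows the two preceding letters

print(f"order-0 model: {weak_bits / len(text):.2f} bits per character")
print(f"order-2 model: {strong_bits / len(text):.2f} bits per character")
```

The order-2 model needs noticeably fewer bits per character because the preceding letters constrain what can come next. An LLM pushes the same idea much further, at a much higher compute cost per predicted symbol.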
The catch is, such systems are specialized and depend on the regularity of cheap, widely-available energy networks and consumer access to cheap compute. Take that away, and ts_zip becomes a poor choice compared with just using bzip. I mean, even now, bzip is a better choice when considering energy tradeoffs. And ts_zip in particular is specialized to the point of only working with text and not arbitrary byte streams.