You can go another step - a FFN can be simulated on a Turing machine, thus it just exemplifies the incredible semantical power of the Turing machine model of computation. (in fact you don't even need a Turing machine, since there is no looping in one forward pass).
In theory you can run a huge FFN on the tiniest Turing machine, in practice it's much better to run a Transformer on the latest NVIDIA hardware. Or as they say "quantity (performance) has a quality all its own"
There is also the case for Markov chains being theoretically able to do these if tuned well. Or even SAT problem.