You specify properties of the design and the tooling tests the design in a variety of ways to see if it can violate those properties.
Let’s say you have a system that controls turn signals. You can specify properties that say things like lanes that cross each other can’t both have a green light at the same time or even within some period of time of each other.
The tooling can exhaustively check the design for this and surface code traces that violate it.
A big difference is that formal methods allow you to use the "for all" quantifier.
For example, you might write a unit test that says "foo('abc') returns a string with no trailing whitespace".
But with formal methods, you can prove that "for any input x, foo(x) returns a string with no trailing whitespace".
This is a trivial example, but you could imagine something more complicated, such as "for any program P, compile(P) has the same behavior as P".
Of course, you have to define what "has the same behavior as" means!
n = 15 341 178 777 673 149 429 167 740 440 969 249 338 310 889
I don't think you can catch it with any test suite.
Statically proven type systems let you do this in a way that each contributing piece is checked in advance against all the other pieces, guaranteeing not just "this test passes" but, in combination, "all these tests create a reasonable, coherent whole", and it disallows incoherent models from being implemented in the code. An example of this is like Rust's match, which requires complete coverage of all the possible branches, but writ large across an entire codebase.
You're right that it does nothing for you if you make a fundamental logic or theory error, it can't catch that kind of error, but you'd be surprised how much more robust it is than "merely" e.g. Haskell's type system + ad hoc functional testing + property testing - which I would consider a quite strong system overall, but formally proven e.g. cryptography is a much higher bar.