by ralferoo 17 hours ago

by throwa356262 14 hours ago

I think you are onto something here.

I tried creating an emulator for a CPU that is very well known but lacks working open source emulators.

Claude, Codex and Gemini were all very good at starting something that looked great, but all failed to reach a working product. They each ended up in a loop where fixing one issue caused something else to break, and they could never get out of it.

by stuaxo 10 hours ago

When they get stuck, I find that adding debug output the model can access helps. Sometimes you also need to add something to the prompt telling it to avoid a particular approach at that point.
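
A minimal sketch of what debug output for the model could look like in an emulator project, assuming (my assumption, not something stated in this thread) that a reference trace from a known-good source is available to compare against. The function names and the trace line format are made up for illustration. The idea is to emit one line per executed instruction and hand the model only the first line where the emulator diverges, rather than the whole log.

```python
def format_trace_line(pc, opcode, regs):
    """Render one executed instruction as a stable, diff-friendly line."""
    reg_text = " ".join(f"{name}={value:02X}" for name, value in sorted(regs.items()))
    return f"PC={pc:04X} OP={opcode:02X} {reg_text}"


def first_divergence(expected_lines, actual_lines):
    """Return (index, expected, actual) for the first differing line, or None."""
    for i, (expected, actual) in enumerate(zip(expected_lines, actual_lines)):
        if expected != actual:
            return i, expected, actual
    if len(expected_lines) != len(actual_lines):
        # One trace ended early; report the first line that has no partner.
        i = min(len(expected_lines), len(actual_lines))
        expected = expected_lines[i] if i < len(expected_lines) else None
        actual = actual_lines[i] if i < len(actual_lines) else None
        return i, expected, actual
    return None
```

Feeding the model a single "expected vs actual" pair tends to be a smaller, more concrete target than "the program crashes", which may help with the fix-one-break-another loop described above.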

by antirez 14 hours ago

Please tell me what CPU it is. I would give it a try. I strongly doubt that a very well documented CPU can't be emulated with code written by modern AIs.

by dboreham 5 hours ago

Interesting. When I had Claude write a language transpiler it always checked that tests passed before declaring a feature ready for PR. There was never a case where it gave up on achieving that goal.

by PontifexMinimus 16 hours ago

> try using obscure CPUs

Better still, invent a CPU instruction set and get it to write an emulator for that instruction set in C.

Then invent a C-like HLL and get it to write a compiler from your HLL to your instruction set.
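
To make the first step concrete, here is a rough sketch of how small such an invented instruction set can be. Everything in it (the four opcodes, their byte layout, the four registers) is made up for this example, and it is written in Python for brevity rather than the C suggested above. The point is that the specification lives entirely in the comment block, so the model cannot lean on any existing emulator.

```python
# Hypothetical 8-bit toy ISA, invented for this sketch:
#   0x01 rr ii  LDI  r, imm    ; r = imm
#   0x02 rd rs  ADD  rd, rs    ; rd = (rd + rs) & 0xFF
#   0x03 rr aa  JNZ  r, addr   ; if r != 0: pc = addr
#   0xFF        HALT           ; stop and return the registers


def run(program, max_steps=10_000):
    """Execute a toy-ISA program and return the four registers at HALT."""
    regs = [0, 0, 0, 0]
    pc = 0
    for _ in range(max_steps):
        op = program[pc]
        if op == 0xFF:
            return regs
        a, b = program[pc + 1], program[pc + 2]
        pc += 3
        if op == 0x01:
            regs[a] = b
        elif op == 0x02:
            regs[a] = (regs[a] + regs[b]) & 0xFF
        elif op == 0x03:
            if regs[a] != 0:
                pc = b
        else:
            raise ValueError(f"unknown opcode {op:#04x} at {pc - 3:#06x}")
    raise RuntimeError("step limit reached")


# Multiply 5 by 3 through repeated addition; r3 holds 0xFF, i.e. -1 mod 256.
PROGRAM = bytes([
    0x01, 1, 5,     # 0:  LDI r1, 5
    0x01, 2, 3,     # 3:  LDI r2, 3
    0x01, 3, 0xFF,  # 6:  LDI r3, 0xFF
    0x02, 0, 1,     # 9:  ADD r0, r1
    0x02, 2, 3,     # 12: ADD r2, r3   (decrement the counter)
    0x03, 2, 9,     # 15: JNZ r2, 9
    0xFF,           # 18: HALT
])

print(run(PROGRAM))  # → [15, 5, 0, 255]
```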

by abainbridge 15 hours ago

> try using obscure CPUs

I tried asking Gemini and ChatGPT, "What opcode has the value 0x3c on the Intel 8048?"

They were both wrong. The datasheet with the correct encodings is easily found online, and there are several correct open source emulators, e.g. MAME.

by bsoles 8 hours ago

Even on a specific STM microcontroller (STM32G031), the LLM tools invent non-existent registers and then apologize when I point it out. And conversely, they write code for an entire algorithm (CRC, for example) when hardware support already exists on the chip.

by stuaxo 10 hours ago

Think of the answer to "What opcode has the value 0x3c on the Intel 8048?" as a PNG image, and of the LLM as a heavily compressed JPEG of it: on its own it will only give a very approximate answer. But you can give it explicit tools to look things up.
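
As a rough illustration of an explicit lookup tool: the sketch below assumes you have transcribed the opcode table from the datasheet yourself. The function name, the table layout and the `toycpu` entry in the usage example are all hypothetical. Nothing here is a real 8048 encoding, and how the function gets registered as a tool depends on the LLM framework you use.

```python
def make_opcode_lookup(table):
    """Build a lookup tool from {(cpu_name_lowercase, opcode_byte): mnemonic}.

    The table is expected to be transcribed from a datasheet by the user;
    this sketch ships no real encodings.
    """
    def lookup(cpu, opcode):
        key = (cpu.lower(), opcode)
        if key not in table:
            # Say so explicitly, so the model is not tempted to fill the gap.
            return f"no entry for opcode {opcode:#04x} on {cpu}; do not guess"
        return table[key]
    return lookup


# Usage with a placeholder entry for a made-up CPU:
lookup = make_opcode_lookup({("toycpu", 0x01): "LDI r, imm"})
print(lookup("ToyCPU", 0x01))
print(lookup("ToyCPU", 0x02))
```

Returning an explicit "no entry" message instead of raising is a design choice for this sketch: the text goes straight back to the model, which then has a clear signal that the table, not its memory, is the source of truth.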

by yomismoaqui 14 hours ago

If the LLM doesn't have a web search tool, your test doesn't make any sense.

An LLM by itself is like a lossy image of all the text on the internet.

by deniska 14 hours ago

Just some more parameters, and it would overfit that specific PDF too.

by kamranjon 13 hours ago

I thought this part of the write-up was interesting:

"This is, I think, in contradiction with the idea that LLMs are memorizing the whole training set and uncompress what they have seen. LLMs can memorize certain over-represented documents and code, but while they can extract such verbatim parts of the code if prompted to do so, they don’t have a copy of everything they saw during the training set, nor they spontaneously emit copies of already seen code, in their normal operation."

Can't things basically get baked into the weights when a model is trained on them for enough iterations, and isn't this the basis for a lot of the plagiarism issues we have seen with code and literature? It seems like this is maybe downplaying the unattributed use of open source code in training these models.

by dist-epoch 17 hours ago

If you did that, comments would be "it's just a bit shuffle of the encodings, of course it can manage that, but how about we do totally random encodings..."
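
For anyone who wants to run both variants, here is a minimal sketch of the difference between a bit shuffle of the encodings and totally random encodings. The helper names and the mnemonic list in the usage note are placeholders, and both helpers assume a single-byte opcode, which is a simplification.

```python
import random


def bit_shuffle(opcode, perm):
    """Move bit i of an 8-bit opcode to bit position perm[i].

    perm must be a permutation of 0..7, so the mapping stays one-to-one
    and structure such as register fields survives, just relocated.
    """
    out = 0
    for src, dst in enumerate(perm):
        if opcode & (1 << src):
            out |= 1 << dst
    return out


def random_encoding(mnemonics, seed):
    """Assign each mnemonic a distinct random 8-bit opcode (no structure kept)."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    codes = rng.sample(range(256), len(mnemonics))
    return dict(zip(mnemonics, codes))
```

For example, `bit_shuffle(0x01, [7, 6, 5, 4, 3, 2, 1, 0])` reverses the bit order and yields `0x80`, while `random_encoding(["LDI", "ADD", "JNZ", "HALT"], seed=1)` gives every mnemonic an unrelated byte.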

by ralferoo 17 hours ago

That's true, but I still think it'd be an interesting experiment to see how much it actually follows the specification vs how much it hallucinates by plagiarising from existing code.

Probably bonus points for telling it that you're emulating the well-known ZX Spectrum and then describing something entirely different, to see whether it just treats that name as an arbitrary label or whether it significantly influences its code generation.

But you're right of course: instruction decoding is such a small part of a CPU that the differences would be quite limited if all the other details remained the same. That's why a completely hypothetical system is better.