See the later post testing a newer Mythos checkpoint, though: https://www.aisi.gov.uk/blog/how-fast-is-autonomous-ai-cyber...

by throwa356262 8 hours ago

Fair enough

by ACCount37 9 hours ago

That claim keeps getting contradicted hard by other parties, who say Mythos beats 5.5 resoundingly on both autonomous search and discovery and the creation of complex exploit chains.

There might be a harness difference, but also, this CTF-type benchmark might not capture the capability difference fully.

by nimchimpsky 5 hours ago

[dead]