Examples: AI really wants to use Project Panama (FFM), and while that can be significantly faster than traditional OO approaches, it is almost never the best choice. And I'm not talking about using deprecated Unsafe calls; I'm talking about primitive arrays being better for Vector/SIMD operations on large data sets, and NIO being better than FFM + mmap for file reading.
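To make that concrete, here's roughly what I mean, as a minimal sketch (the class, method, and array names are made up, and it assumes the incubator Vector API module is enabled): plain primitive arrays feed the Vector API directly, so the JIT lowers the loop onto SIMD lanes without any FFM segment bookkeeping.

```java
// A minimal sketch, not production code. Assumes the incubator module:
//   java --add-modules jdk.incubator.vector ...
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public final class SaxpyExample {
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // y[i] = a * x[i] + y[i] over plain primitive arrays: contiguous memory,
    // no MemorySegment bookkeeping, and the loop maps cleanly onto SIMD lanes.
    static void saxpy(float a, float[] x, float[] y) {
        int i = 0;
        int upper = SPECIES.loopBound(x.length);
        FloatVector va = FloatVector.broadcast(SPECIES, a);
        for (; i < upper; i += SPECIES.length()) {
            FloatVector vx = FloatVector.fromArray(SPECIES, x, i);
            FloatVector vy = FloatVector.fromArray(SPECIES, y, i);
            vx.fma(va, vy).intoArray(y, i);   // fused multiply-add: x*a + y
        }
        for (; i < x.length; i++) {           // scalar tail for the leftovers
            y[i] = a * x[i] + y[i];
        }
    }
}
```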
You can use AI to build something that is sometimes better than what someone without domain-specific knowledge would develop, but the gap between that and the industry-expected solution is much more than 100 hours.
That is clearly false. I’m only familiar with Opus, but it quite regularly tells me that, and/or decides it needs to do research before answering.
If I instruct it to answer regardless, it generally turns out that it indeed didn’t know.
> Please carefully review (whatever it is) and list out the parts that have the most risk and uncertainty. Also, for each major claim or assumption can you list a few questions that come to mind? Rank those questions and ambiguities as: minor, moderate, or critical.
> Afterwards, review the (plan / design / document / implementation) again thoroughly under this new light and present your analysis as well as your confidence about each aspect.
There's a million variations on patterns like this. It can work surprisingly well.
You can also inject 1-2 key insights to guide the process. E.g. "I don't think X is completely correct because of A and B. We need to look into that and also see how it affects the rest of (whatever you are working on)."
"Ok let's look at these issues 1 at a time. Can you walk me through each one and help me think through how to address it"
And then it will usually give a few options for what to do for each one as well as a recommendation. The recommendation is often fairly decent, in which case I can just say "sounds good". Or maybe provide a small bit of color like: "sounds good but make sure to consider X".
Often we will have a side discussion about that particular issue until I'm satisfied. This happens more when I'm doing design / architectural / planning sessions with the AI. It can be as short or as long as it needs to be. And then we move on to the next one.
My main goal with these strategies is to help the AI get the relevant knowledge and expertise from my brain with as little effort as possible on my part. :D
A few other tactics:
- You can address multiple items at once: "Items 3, 4, and 7 sound good, but let's work through the others together."
- Defer a discussion or issue until later: "Let's come back to item 2, or possibly save that for a later session."
- Save the review notes / analysis / design sketch to a markdown doc to use in a future session. Or just as a reference to remember why something was done a certain way when I'm coming back to it. Can be useful to give to the AI for future related work as well.
- Send the content to a sub-agent for a detailed review and then discuss with the main agent.
The only way to make LLMs useful for now is to restrain their hallucinations as much as possible with evals, and those evals need to be very clear about what goal you're optimizing for.
See Karpathy's work on the autoresearch agent and how it carries out experiments; it might be useful for what you're doing.
Man, I wish this was true. I know a bunch of non-tech people who just trust random shit that ChatGPT made up.
I had an architect tell me "ask chatgpt" when I asked her the difference between two industrial standard measures :)
We had politicians share LLM crap, researchers doing papers with hallucinated citations..
It's not just tech people.
It took a lot of back-and-forths with her to convince her that the numbers she uses every day are "Arabic numerals". Even the author of the spec could barely convince her -- it took a meeting with the Arabic translators (several different ones) to finally do it. Think about that for a minute. People won't believe subject matter experts over an LLM.
We're cooked.
It would help if you briefly specified the AI you are using here. There are wildly different results between using, say, an 8B open-weights LLM and Claude Opus 4.6.
You want low deterministic latency with sharp tails.
If all you care about is throughput then deep pipelines + lots of threads will get you there at the cost of latency.
You have to optimize your memory usage patterns to fit in CPU cache as much as possible, which is something typical Java developers don't consider. I have a background in assembly and C.
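For what it's worth, a sketch of what I mean by cache-aware layout in Java (the Orders fields are invented purely for illustration): keep the hot fields in parallel primitive arrays instead of an array of small heap objects, so a scan walks memory sequentially and stays in cache.

```java
// Hypothetical order-book rows, purely illustrative. Instead of Order[]
// (an array of references to small heap objects with no locality guarantee),
// keep the hot fields in parallel primitive arrays ("struct of arrays"),
// so a scan touches contiguous cache lines and prefetches well.
final class Orders {
    final long[] price;
    final int[]  qty;

    Orders(int capacity) {
        price = new long[capacity];
        qty   = new int[capacity];
    }

    // Sequential walk over contiguous memory: cache- and prefetch-friendly.
    long notional(int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += price[i] * qty[i];
        }
        return sum;
    }
}
```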
I'd say it's slightly harder since there is a little bit of abstraction, but most of the time the JIT will produce code as good as C compilers. It's also a niche that often considers any application running on a general-purpose CPU to be slow. If you want industry-leading speed you start building custom FPGAs.
How exactly are you passing data? You can pass some primitives without allocating them on the heap. You can use some tiny subset of Java plus the standard library to write high-performance code, but why would you do this instead of using Rust or C++?
Strangely this is one of the areas where I want to use Project Panama, so I might re-implement some of the ring buffer constructs.
You allocate off-heap memory and dump data into it. With modern Java classes like Arena, MemoryLayout, and VarHandle it's honestly a lot like C structs.
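Something roughly like this, as a minimal sketch (final FFM API from JDK 22+; the "tick" layout and field names are invented for the example): the layout gives you named fields with C-style offsets and alignment, and the VarHandles give you direct field access into the off-heap segment.

```java
// Minimal sketch using the final FFM API (JDK 22+); the "tick" layout and
// field names are invented for the example.
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;
import static java.lang.foreign.MemoryLayout.PathElement.groupElement;

public class OffHeapTick {
    // Roughly: struct tick { long price; int qty; /* 4 bytes padding */ };
    static final MemoryLayout TICK = MemoryLayout.structLayout(
            ValueLayout.JAVA_LONG.withName("price"),
            ValueLayout.JAVA_INT.withName("qty"),
            MemoryLayout.paddingLayout(4));

    // VarHandles bound to the named fields; in JDK 22+ they take an extra
    // base-offset coordinate after the segment.
    static final VarHandle PRICE = TICK.varHandle(groupElement("price"));
    static final VarHandle QTY   = TICK.varHandle(groupElement("qty"));

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {       // off-heap, freed on close
            MemorySegment tick = arena.allocate(TICK); // no GC-managed object here
            PRICE.set(tick, 0L, 101_250L);
            QTY.set(tick, 0L, 500);
            long notional = (long) PRICE.get(tick, 0L) * (int) QTY.get(tick, 0L);
            System.out.println(notional);
        }
    }
}
```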
I answered "why" in another post in this thread.
Then there are things like the JIT doing runtime profiling and adaptation by default.
In terms of speed, memory usage, runtime characteristics... sure, there are better options. But if Java is good enough, or can be made good enough by writing the code correctly, why add another toolchain?
"writing code correctly" here means stripping 95% of lang capabilities, and writing in some other language which looks like C without structs (because they will be heap allocated with cross thread synchronization and GC overhead) and standard lib.
It's good enough for some tiny algo, but not good enough for anything serious.
Those have a low bar for performance; they also mostly became popular because of investment driven by the Java hype, and Rust either didn't exist or had a weak ecosystem at the time.
It wasn't a matter of choosing Java for HFT, it was a matter of selecting a project that was a good fit for Java and my personal knowledge. I was a Java instructor for Sun for over a decade, I authored a chunk of their Java curriculum. I wrote many of the concurrency questions in the certification exams. It's in my wheelhouse :)
My C and assembly are rusty at this point, so I believe I can hit my performance goals with Java sooner than if I developed in more bare-metal languages.
I've worked at places where ~5us was considered the fast path and tails were acceptable.
In my current role it's less than a microsecond packet in, packet out (excluding time to cross the bus to the NIC).
But arguably it's not true HFT today unless you're using FPGA or ASIC somewhere in your stack.
So yeah, there's really no HFT anymore; it's just order execution, and some algo trades want more or less latency, which merits varying levels of squeezing latency out of your systems.
I don't work for a firm so don't get to play with FPGAs. I'm also not co-located in an exchange and using microwave towers for networking. I might never even have access to kernel networking bypass hardware (still hopeful about this one). Hardware optimization in my case will likely top out at CPU isolation for the hot path thread and a hosting provider in close proximity to the exchanges.
The real goal is a combination of eliminating as much slippage as possible, making some lower-timeframe strategies possible, and having best-in-class back-testing performance for parameter grid searching and strategy discovery. I expect to sit between industry-leading firms and typical retail systematic traders.
Even non-influencers are trying to exaggerate their LLM skills as a way to get hired or raise their status on LinkedIn. I rarely read the LinkedIn social feed but when I check mine it’s now filled with claims from people about going from idea to shipped product in N days (with a note at the bottom that they’re looking for a new job or available to consult with your company). Many of these posts come from people who were all in on crypto companies a few years ago.
The world really is changing but there’s a wave of influencers and trend followers trying to stake out their claims as leaders on this new frontier. They should be ignored if you want any realistic information.
I also think these exaggerated posts are causing a lot of people to miss out on the real progress that is happening. They see these obviously false exaggerations and think the opposite must be true, that LLMs don’t provide any benefit at all. This is creating a counter-wave of LLM deniers who think it’s just a fad that will be going away shortly. They’re diminishing in numbers but every LLM thread on HN attracts a few people who want to believe it’s all just temporary and we’re going back to the old ways in a couple years.
This always seems to be the pattern. "I vibe coded my product and shipped it in 96 hours!" OK, what's the product? Why haven't I heard of it? Why can't it replace the current software I'm using? So, you're looking for work? Why is nobody buying it?
Where is the Quicken replacement that was vibecoded and shipping today? Where are the vibecoded AAA games that are going to kill Fortnite? Where is the vibecoded Photoshop alternative? Heck, where is the vibecoded replacement for exim3 that I can deploy on my self hosted E-mail server? Where are all of the actual shipping vibecoded products that millions of users are using?
https://www.reddit.com/r/selfhosted/comments/1rckopd/huntarr...
One redditor security-reviews a vibe-coded project
The maintainer, instead of listening to the security researcher and accepting feedback about his development process:
1. Denied the problem
2. Censored discussion of the problem
3. Banned the people calling out the problem
...and then when the security issues were posted more publicly and got traction...
4. Made the subreddit private
5. Wiped and deleted his account
6. Wiped and deleted the GitHub repo
7. Took the project's web site off the web
Absolutely wild and unhinged behavior.
That said, I have had some good experiences getting a few features from zero to working via LLMs and it's helped me find lots of bugs far easier than my own looking.
I can imagine a vibe-coded todo app. I can also kind of imagine a vibe-coded GIMP/Photoshop, though it would still take several person-years, prompting through each and every feature.
Claude Code and OpenClaw are vibecoded, and I believe more are coming.
Also people are using CC for the cheap access to the model, otherwise they'd be using opencode.
I note that games are mostly art assets and things like level design, and players are already happy to instantly consign such products to the slop bin.
The whole thing is "market for lemons": app stores filling with dozens of indistinguishable clones of each product category will simply scare users off all of them.
This matches my observation. There seems to be a certain "type" of person like this. And it's not just people looking for work.
My guess is either they have super low critical thinking, a very cynical view of the world where lies and exaggeration are the only way to make it, or something more pathological (narcissism etc).
I have a relative who was late to crypto, late to drop shipping, late to carbon credits, but is now absolutely all-in on AI as his ticket out. It honestly depresses the hell out of me trying to talk to him because everything is about money and getting rich.
People like this don't care about underlying technologies or learning past the most basic surface level of understanding.
I was listening to an "expert" on a podcast earlier today up until the point where the interviewer asked how long his amazing new vibe-coded tooling has been in production, and the self-proclaimed expert replied "actually we have an all-hands meeting later today so I can brief the team and we will then start using the output..."
With all that said, today's LLMs do seem to provide a little more value than the blockchain thing did; for example, OCR/PDF parsing is, I'd say, a solved problem now thanks to LLMs, which is nice.
Actually, I had some terrible experiences when asking the agent to do something simple in our codebase (like renaming some files and fixing the build scripts and dependencies): it took much longer than a human would, because it kept running the full CI pipelines to check for problems after every attempted change.
A human would, for example, rely on the linter to detect basic issues, run a partial build on the affected targets, etc., to save time. But the agent probably doesn't have a sense of elapsed time.
Co-pilot said something about having too many rows returned and had some complex answer on how to reduce row count.
I just added a "LIMIT 100" which was more than adequate.
I can't say how many times the LLM-proposed solution to a jittery behavior is adding retries. At this point we have to be even more careful with controlling the implementation of things in the hot path.
I have to say though, giving Amp/Claude Code the Grafana MCP + read-only kubectl has saved me days worth of debugging. So there's definitely trade-offs!
That says something about how much some people care about this.
The test cases themselves become the focus: the LLM usually can't get them right.
No it is not.
There is no amount of testing that can fix a flawed design.
Consider the following: Unit, Integration, System, UAT, Smoke, Sanity, Regression, API Testing, Performance, Load, Stress, Soak, Scalability, Reliability, Recovery, Volume Testing, White Box Testing, Mutation Testing, SAST, Code Coverage, Control Flow, Penetration Testing, Vulnerability Scanning, DAST, Compliance (GDPR/HIPAA), Usability, Accessibility (a11y), Localization (L10n), Internationalization (i18n), A/B Testing, Chaos Engineering, Fault Injection, Disaster Recovery, Negative Testing, Fuzzing, Monkey Testing, Ad-hoc, Guerilla Testing, Error Guessing, Snapshot Testing, Pixel-Perfect Testing, Compatibility Testing, Canary Testing, Installation Testing, Alpha/Beta Testing...
...and I'm certain I've missed dozens of other test approaches.
Everyone thinks LLMs are good at the things they are bad at. In many cases they are still just giving “plausible” code that you don’t have the experience to accurately judge.
I have a lot of frontend app dev experience. Even modern tools (Claude w/Opus 4.6 and a decent Claude.md) will slip in unmaintainable slop in frontend changes. I catch cases multiple times a day in code review.
Not contradicting your broader point. Indeed, I think if you’ve spent years working on any topic, you quickly realize Claude needs human guidance for production quality code in that domain.
There’s also a big disconnect in terms of SDLC/workflow in some places. If we take at face value that writing code is now 10x faster, what about the other parts of the SDLC? Is your testing/PR process ready for 10x the velocity or is it going to fall apart?
What % of your SDLC was actually writing code? Maybe time to market is now ~18% faster because coding was previously 20% of the duration.
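(Back-of-the-envelope: if coding was 20% of the cycle and gets 10x faster, the cycle shrinks from 1.0 to 0.8 + 0.2/10 = 0.82 of its old length, i.e. about 18% faster overall. Classic Amdahl's law.)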
Absolutely. Tight feedback loops are essential to coding agents and you can’t run pipelines locally.
So instead of the 10x coder doing it, the 1x coder does it, but then that factor of 3x becomes 0.3x.
A lot of people are OCD pedants about stuff that could be solved with a linter (but can't be bothered to implement one), or they just "LGTM" everything. Neither provides value or feedback that helps develop other devs.
This may be one of the best quotes on HN in a while.
For me it is far more than 10x, but I go easy on the noobs by saying 10x instead of 20x or more.