by Tiberium16 hours ago |

by embedding-shape16 hours ago|

> I hope people realize that tools like caveman are mostly joke/prank projects

This seems to be a common thread in the LLM ecosystem: someone starts a project for shits and giggles and makes it public, most people get the joke, others think it's serious, the author eventually tries to turn the joke project into a VC-funded business, some people stand watching with their jaws open, and the world moves on.

by simonw16 hours ago|

I was convinced https://github.com/memvid/memvid was a joke until it turned out it wasn't.

by embedding-shape15 hours ago|

To be fair, most of us looked at GPT1 and GPT2 as fun and unserious jokes until they started putting together sentences that actually read like real text. I remember laughing with a group of friends about some early generated texts. Little did we know.

by Alifatisk15 hours ago|

Are there any public records of GPT1 and GPT2 output, and of how they were marketed?

by embedding-shape15 hours ago|

HN submissions have a bunch of examples in them, but it's worth remembering they were released as "Look at this somewhat cool and potentially useful stuff" rather than, as we see today, LLMs marketed as tools.

https://news.ycombinator.com/item?id=21454273 / https://news.ycombinator.com/item?id=19830042 - OpenAI Releases Largest GPT-2 Text Generation Model

HN search for GPT between 2018-2020, lots of results, lots of discussions: https://hn.algolia.com/?dateEnd=1577836800&dateRange=custom&...

by Den_VR3 hours ago|

I still think of The Unreasonable Effectiveness of Recurrent Neural Networks and related writings.

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

by dalemhurley3 hours ago|

Wild how many people were predicting the AI slop, but were dismissing it as unlikely beyond some trolls.

by mlsu14 hours ago|

I was first made aware of GPT2 from reading Gwern -- "huh, that sounds interesting" -- but didn't really start reading model output until I saw this subreddit:

https://www.reddit.com/r/SubSimulatorGPT2/

There is a companion Reddit, where real people discuss what the bots are posting:

https://www.reddit.com/r/SubSimulatorGPT2Meta/

You can dig around at some of the older posts in there.

by walthamstow15 hours ago|

I don't think they were marketed as such; they were research projects. GPT-3 was the first to be sold via API.

by maplethorpe15 hours ago|

From a 2019 news article:

> New AI fake text generator may be too dangerous to release, say creators

> The Elon Musk-backed nonprofit company OpenAI declines to release research publicly for fear of misuse.

> OpenAI, an nonprofit research company backed by Elon Musk, Reid Hoffman, Sam Altman, and others, says its new AI model, called GPT2 is so good and the risk of malicious use so high that it is breaking from its normal practice of releasing the full research to the public in order to allow more time to discuss the ramifications of the technological breakthrough.

https://www.theguardian.com/technology/2019/feb/14/elon-musk...

by ethbr114 hours ago|

Aka 'We cared about misuse right up until it became apparent that there was profit to be had'

OpenAI sure speed ran the Google and Facebook 'Don't be evil' -> 'Optimize money' transition.

by sfn4213 hours ago|

Or - making sensational statements gets attention. A dangerous tool is necessarily a powerful tool, so that statement is pretty much exactly what you'd say if you wanted to generate hype, make people excited and curious about your mysterious product that you won't let them use.

by eric_h13 hours ago|

Much like what Anthropic very recently did re: Mythos

by xpe9 hours ago|

Think about all the possible explanations carefully. Weight them based on the best information you have.

(I think the most likely explanation for Mythos is that it's asymmetrically a very big deal. Come to your own conclusions, but don't simply fall back on the "oh this fits the hype pattern" thought terminating cliché.)

Also be aware of what you want to see. If you want the world to fit your narrative, you're more likely to construct explanations for that. (In my friend group at least, I feel like most fall prey to this, at least some of the time, including myself. These people are successful and intelligent by most measures.)

Then make a plan to become more disciplined about thinking clearly and probabilistically. Make it a system, not just something you do sometimes. I recommend the book "the Scout Mindset".

Concretely, if one hasn't spent a couple of quality hours really studying AI safety I think one is probably missing out. Dan Hendrycks has a great book.

by wat1000015 hours ago|

You can run GPT2! Here's the medium model: https://huggingface.co/openai-community/gpt2-medium

I will now have it continue this comment:

I've been running gps for a long time, and I always liked that there was something in my pocket (and not just me). One day when driving to work on the highway with no GPS app installed, I noticed one of the drivers had gone out after 5 hours without looking. He never came back! What's up with this? So i thought it would be cool if a community can create an open source GPT2 application which will allow you not only to get around using your smartphone but also track how long you've been driving and use that data in the future for improving yourself...and I think everyone is pretty interested.

[Updated on July 20] I'll have this running from here, along with a few other features such as: - an update of my Google Maps app to take advantage it's GPS capabilities (it does not yet support driving directions) - GPT2 integration into your favorite web browser so you can access data straight from the dashboard without leaving any site! Here is what I got working.

[Updated on July 20]

by fancyfredbot10 hours ago|

Wow that is terrible. In my memory GPT 2 was more interesting than that. I remember thinking it could pass a Turing test but that output is barely better than a Markov chain.

I guess I was using the large model?

by sillysaurusx9 hours ago|

There’s an art to GPT sampling. You have to use temperature 0.7. People never believe it makes such a massive difference, but it does.
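
For anyone who hasn't seen what the temperature knob does mechanically, here is a minimal sketch in plain Python. The logits are made-up toy values and the helper names (`softmax_with_temperature`, `sample_token`) are hypothetical, not part of any GPT-2 tooling; 0.7 is simply the value suggested above.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Toy logits for a four-token vocabulary (illustrative values only).
toy_logits = [2.0, 1.0, 0.5, -1.0]

for t in (1.0, 0.7):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")

print("sampled index:", sample_token(toy_logits, 0.7, random.Random(0)))
```

Temperatures below 1.0 sharpen the distribution, so the likeliest token is picked more often and unlikely tokens less often, which tends to make the output read as more coherent.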

by wat1000010 hours ago|

Probably a much better prompt, too. I just literally pasted in the top part of my comment and let fly to see what would happen.

by daveguy10 hours ago|

Here is the XL model. Roughly 4x the size of the medium model (about 355M parameters). Still just 1.5B parameters, but on the bright side it was trained pre-wordslop.

https://huggingface.co/openai-community/gpt2-xl

by PufPufPuf12 hours ago|

I used GPT-2 (fine-tuned) to generate Peppa Pig cartoons, it was cutely incoherent https://youtu.be/B21EJQjWUeQ

by Bombthecat15 hours ago|

And now GPT is laughing, while it replaces coders lol

by MarcelOlsz15 hours ago|

Why? Doesn't have jokey copy. Any thoughts on claude-mem[0] + context-mode[1]?

[0] https://github.com/thedotmack/claude-mem

[1] https://github.com/mksglu/context-mode

by simonw15 hours ago|

The big idea with Memvid was to store embedding vector data as frames in a video file. That didn't seem like a serious idea to me.

by nico15 hours ago|

Very cool idea. Been playing with a similar concept: break down one image into smaller self-similar images, order them by data similarity, use them as frames for a video

You can then reconstruct the original image by doing the reverse, extracting frames from the video, then piecing them together to create the original bigger picture

Results seem to really depend on the data. Sometimes the video version is smaller than the big picture. Sometimes it’s the other way around. So you can technically compress some videos by extracting frames, composing a big picture with them and just compressing with jpeg

by jermaustin115 hours ago|

> embedding vector data as frames in a video file

Interesting. When I heard about it, I read the readme, and I didn't take that literally. I assumed it meant they used video frames as inspiration.

I've never used it or looked deeper than that. My LLM memory "project" is essentially a `dict<"about", list<"memory">>`. The keys and memories are all embeddings, so they're vector searchable. I'm sure it's naive and dumb, but it works for the tiny agents I write.
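
A rough idea of what such a dict-of-lists memory could look like, as a minimal sketch rather than the commenter's actual code. `toy_embed` is a hypothetical stand-in (letter counts) for a real embedding model, and the `TinyMemory` class and its method names are made up for illustration.

```python
import math


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: normalized letter counts."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


class TinyMemory:
    """Maps an 'about' topic to a list of memories; both are embedded."""

    def __init__(self):
        # topic -> (topic_vector, [(memory_text, memory_vector), ...])
        self._topics = {}

    def remember(self, about, memory):
        if about not in self._topics:
            self._topics[about] = (toy_embed(about), [])
        self._topics[about][1].append((memory, toy_embed(memory)))

    def recall(self, query, top_k=3):
        """Return the top_k (score, about, memory) tuples, best first."""
        q = toy_embed(query)
        scored = []
        for about, (topic_vec, memories) in self._topics.items():
            topic_score = cosine(q, topic_vec)
            for text, vec in memories:
                # A hit on either the topic key or the memory itself counts.
                scored.append((max(topic_score, cosine(q, vec)), about, text))
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


mem = TinyMemory()
mem.remember("coffee", "user likes espresso")
mem.remember("editor", "user prefers vim keybindings")
print(mem.recall("espresso", top_k=1))
```

A linear scan like this is fine for a handful of memories; a vector index only starts to matter once the lists grow large.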

by niuzeta15 hours ago|

Just read through the readme and I was fairly sure this was a well-written satire through "Smart Frames".

Honestly part of me still thinks this is a satire project but who knows.

by DiffTheEnder15 hours ago|

Is this... just one file acting as memory?

by paulddraper7 hours ago|

One video file

by combobyte14 hours ago|

> most people get the joke

I hope you're right, but from my own personal experience I think you're being way too generous.

by msikora9 hours ago|

This has been a thing since way before AI. Anyone remember Yo, the single-button social media app that raised $1M in 2014?

by dakolli13 hours ago|

It's the same as the crypto/NFT hype cycles, except this time one of the joke projects is going to crash the economy.

by imiric15 hours ago|

A major reason for that is that there's no way to objectively evaluate the performance of LLMs. So the meme projects are just as valid as the serious ones, since the merits of both are based entirely on anecdata.

It also doesn't help that projects and practices are promoted and adopted based on influencer clout. Karpathy's takes will drown out ones from "lesser" personas, whether they have any value or not.

by stingraycharles16 hours ago|

While the caveman stuff is obviously not serious, there is a lot of legit research in this area.

Which means yes, you can actually influence this quite a bit. Read the paper “Compressed Chain of Thought” for example, it shows it’s really easy to make significant reductions in reasoning tokens without affecting output quality.

There is not too much research into this (about 5 papers in total), but with that it’s possible to reduce output tokens by about 60%. Given that output is an incredibly significant part of the total costs, this is important.

https://arxiv.org/abs/2412.13171
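
To put rough numbers on the cost argument, here is a back-of-the-envelope sketch. The per-million-token prices and the token counts are hypothetical placeholders, not any provider's real pricing; only the 60% output reduction comes from the comment above.

```python
def request_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost in dollars of one request, given prices per million tokens."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000


# Hypothetical prices (dollars per million tokens); output priced above input.
IN_PRICE = 3.0
OUT_PRICE = 15.0

# Hypothetical request shape: a short prompt and a long reasoning-heavy reply.
input_tokens = 2_000
output_tokens = 4_000
reduction_pct = 60  # the output-token reduction claimed above

# Integer arithmetic keeps the reduced token count exact.
compressed_output = output_tokens * (100 - reduction_pct) // 100

baseline = request_cost(input_tokens, output_tokens, IN_PRICE, OUT_PRICE)
compressed = request_cost(input_tokens, compressed_output, IN_PRICE, OUT_PRICE)
saving = 1 - compressed / baseline

print(f"baseline:   ${baseline:.4f}")
print(f"compressed: ${compressed:.4f}")
print(f"total saving: {saving:.1%}")
```

Under these assumptions the total bill drops by a bit less than the 60% cut in output tokens, because the input tokens are untouched. The more output-heavy the workload, the closer the two numbers get.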

by altruios15 hours ago|

Who would suspect that the companies selling 'tokens' would (unintentionally) train their models to prefer longer answers, reaping a HIGHER ROI (the thing a publicly traded company is legally required to pursue: good thing these are all still private...)... because it's not like private companies want to make money...

by stingraycharles13 hours ago|

I don’t think this is a plausible argument, as they’re generally capacity constrained, and everyone would like shorter (= faster) responses.

I’m fairly certain that in a few more releases we’ll have models with shorter CoT chains. Whether they’ll still let us see those is another question, as it seems like Anthropic wants to start hiding their CoT, potentially because it reveals some secret sauce.

by Ifkaluva6 hours ago|

I guess mainly they don’t want you to distill on their CoT

by fancyfredbot10 hours ago|

Try setting up one laundry which charges by the hour and washes clothes really really slowly, and another which washes clothes at normal speed at cost plus some margin similar to your competitors.

The one which maximizes ROI will not be the one you rigged to cost more and take longer.

by sebastiennight8 hours ago|

I don't think the analogy is correct here.

Directionally, tokens are not equivalent to "time spent processing your query", but rather a measure of effort/resource expended to process your query.

So a more germane analogy would be:

What if you set up a laundry which charges you based on the amount of laundry detergent used to clean your clothes?

Sounds fair.

But then, what if the top engineers at the laundry offered an "auto-dispenser" that uses extremely advanced algorithms to apply just the right optimal amount of detergent for each wash?

Sounds like value-added for the customer.

... but now you end up with a system where the laundry management team has strong incentives to influence how liberally the auto-dispenser will "spend" to give you "best results"

by bombcar5 hours ago|

Shades of “repeat” in lather, rinse, repeat.

by gwern12 hours ago|

LLM APIs sell on value they deliver to the user, not the sheer number of tokens you can buy per $. The latter is roughly labor-theory-of-value levels of wrong.

by ACCount3716 hours ago|

Some labs do it internally because RLVR is very token-expensive. But it degrades CoT readability even more than normal RL pressure does.

It isn't free either - by default, models learn to offload some of their internal computation into the "filler" tokens. So reducing raw token count always cuts into reasoning capacity somewhat. Getting closer to "compute optimal" while reducing token use isn't an easy task.

by stingraycharles16 hours ago|

Yeah the readability suffers, but as long as the actual output (ie the non-CoT part) stays unaffected it’s reasonably fine.

I work on a few agentic open source tools and the interesting thing is that once I implemented these things, the overall feedback was a performance improvement rather than performance reduction, as the LLM would spend much less time on generating tokens.

I didn’t implement it fully; just a few basic instructions like “reduce prose while thinking, don’t repeat your thoughts” etc. already yielded massive improvements.

by AdamN16 hours ago|

Yeah, you could easily imagine stenography-like inputs and outputs for rapid iteration loops. It's also true that on social media people already want faster-to-read snippets that drop grammar, so the desire for density is already there for human authors/readers.

by ieie336616 hours ago|

All LLMs also effectively work by “larping” a role. You steer it towards larping a caveman and, well... let’s just say they weren’t known for their high IQ.

by roughly16 hours ago|

Fun fact: Neanderthals actually had larger brains than Homo sapiens! Modern humans are thought to have outcompeted them by working better together in larger groups, but in terms of actual individual intelligence, Neanderthals may have had us beat. Similarly, humans have been undergoing a process of self-domestication over the last couple of millennia that has resulted in physiological changes, including a smaller brain size. Again, our advantage over our wilder forebears remains that we're better in larger social groups than they were and are better at shared symbolic reasoning and synchronized activity, not necessarily that our brains are more capable.

(No, none of this changes that if you make an LLM larp a caveman it's gonna act stupid, you're right about that.)

by adwn15 hours ago|

I thought we were way past the "bigger brain means more intelligence" stage of neuroscience?

by seba_dos115 hours ago|

A bigger brain does not automatically mean more intelligence, but we have reasons other than bigger brains to suspect that Homo neanderthalensis may have been more intelligent than contemporary Homo sapiens.

by nomel15 hours ago|

All data shows there's a moderate correlation.

by dtech14 hours ago|

You can't draw conclusions on individuals, but at a species level bigger brain, especially compared to body size, strongly correlates with intelligence

by waffletower15 hours ago|

Even neuronal density is simplistic, and the dimension of size alone doesn't consider that.

by Hikikomori16 hours ago|

Modern humans were also cavemen.

by DiogenesKynikos16 hours ago|

This is why ancient Chinese scholar mode (also extremely terse) is better.

by bensyverson16 hours ago|

Exactly. The model is exquisitely sensitive to language. The idea that you would encourage it to think like a caveman to save a few tokens is hilarious but extremely counter-productive if you care about the quality of its reasoning.

by andai9 hours ago|

Does this imply that if you train it on Gwern style output, the quality will improve?

by gwern8 hours ago|

Unfortunately, that is an oversimplification for a highly RLed/chatbot trained LLM like Claude-4.7-opus. It may have started life as a base model (where prompting it with correctly spelled prompts, or text from 'gwern', would - and did with davinci GPT-3! - improve quality), but that was eons ago. The chatbots are largely invariant to that kind of prompt trickery, and just try to do their best every time. This is why those meme tricks about tips or bribery or my-grandmother-will-die stop working.

by reacharavindh15 hours ago|

This specific form may be a joke, but token-conscious work is becoming more and more relevant. Look at https://github.com/AgusRdz/chop

And

https://github.com/toon-format/toon

by alex7o14 hours ago|

Also https://github.com/rtk-ai/rtk but some people see that changing how commands output stuff can confuse some models

by SEJeff11 hours ago|

I believe tools like graphify cut down the tokens in thinking dramatically. It makes a knowledge graph and dumps it into markdown, which is honestly awesome. Then it has stubs that pretend to be tools like grep but read from the knowledge graph first, so it does less work. Easy to set up and use too. I like it.

https://graphify.net/

by xnx6 hours ago|

There's a tremendous amount of superstition around LLMs. Remember when "prompt engineering" "best practices" were to say you were offering a tip or some other nonsense?

by causal14 hours ago|

Output tokens are more expensive

by sidrag2213 hours ago|

I hesitated 100% when I saw caveman gaining steam. Changing something like this absolutely changes the behaviour of the model's responses; simply including a "lmao" or something casual in any reply will shift the tone entirely into a more relaxed, "ya whatever" style.

I think a lot of people echo my same criticism, I would assume that the major LLM providers are the actual winners of that repo getting popular as well, for the same reason you stated.

> you will barely save even 1% with such a tool

For the end user, this doesn't make a huge impact; in fact it potentially hurts if it means that you are getting less serious replies from the model itself. However, as with any minor change across a ton of users, this is a significant saving for the providers.

I still think keeping the model capable of easily finding what it needs, without having to comb through a lot of files for no reason, is the best current method to save tokens. It potentially takes some upfront tokens if you are delegating the work of keeping those navigation files up to date to the agent, but it pays dividends in future sessions, when your context window is smaller and only the proper portions of the project need to be loaded into it.

by sambellll10 hours ago|

Someone should make an MCP that parses every non-code file before it hits claude to turn it into caveman talk

by egorfine16 hours ago|

They are indeed impractical in agentic coding.

However, in deep-research-like products you can run an LLM pass to compress web page text into caveman speak, thus hugely reducing token count.

by claytongulick16 hours ago|

I don't understand how this would work without a huge loss in resolution or "cognitive" ability.

Prediction works based on the attention mechanism, and current humans don't speak like cavemen, so how could you expect a useful token chain from a model that wasn't trained on speech like that?

I get the concept of transformers, but this isn't doing a 1:1 transform from English to French or whatever; you're fundamentally unable to represent certain concepts effectively in caveman, etc. Or am I missing something?

by egorfine14 hours ago|

Good catch actually.

Okay, maybe not exactly caveman dialect, but using an LLM to compress text is definitely a possible way to save on tokens in deep research.

by Waterluvian16 hours ago|

Help me understand: I get that the file reading can be a lot. But I also expand the box to see its “reasoning” and there’s a ton of natural language going on there.

by addandsubtract14 hours ago|

We started out with oobabooga, so caveman is the next logical evolution on the road to AGI.

by make316 hours ago|

I wonder if you can have it reason in caveman

by 0123456789ABCDE16 hours ago|

would you be surprised if this is what happens when you ask it to write like one?

folks could have just asked for _austere reasoning notes_ instead of "write like you suffer from arrested development"

by Sohcahtoa8216 hours ago|

> "write like you suffer from arrested development"

My first thought was that this would mean that my life is being narrated by Ron Howard.

by micromacrofoot15 hours ago|

I mean we had a shoe company pivot to AI and raise their stock value by 300%, how can we even know anymore

by bombcar5 hours ago|

Lemonade and blockchain rides again!

Or was it ice tea?

by acedTrex16 hours ago|

You really think the 33k people that starred a 40 line markdown file realize that?

by andersa16 hours ago|

You mean the 33k bots that created a nearly linear stars/day graph? There's a dip in the middle, but it was very blatant at the start (and now)

by verdverm16 hours ago|

Stars are more akin to bookmarks and likes these days, as opposed to a show of support or "I use this"

by zbrozek16 hours ago|

I use them like bookmarks.

by giraffe_lady16 hours ago|

I intentionally throw some weird ones on there just in case anyone is actually ever checking them. Gotta keep interviewers guessing.

by LPisGood16 hours ago|

I use them as likes

by pdntspa16 hours ago|

[flagged]
