
Before the AI stuff, Google had those pop-up quick answers when googling. So I googled something about three years ago, saw the answer, and realized it was sourced from HN. Clicked the link, and lo and behold, I had answered my own question. Look mah! I'm on Google! So I am not surprised at all that Google crawls HN enough to have it in their LLM.

I did chuckle at the 100% Rust Linux kernel. I like Rust, but that felt like a clever joke by the AI.

by dotancohen 60 days ago

I laughed at the SQLite 4.0 release notes. They're on 3.51.x now. Another major release a decade from now sounds just about right.

by ryanisnan 60 days ago

That one got me as well - some pretty wild stuff about prompting the compiler, Starship on the moon, and then there's SQLite 4.0.

by ikerrin1 60 days ago

You can criticize it for many things but it seems to have comedic timing nailed.

by ncruces 59 days ago

The promise is backwards compatibility in the file format and C API until 2050.

https://sqlite.org/lts.html

by rtkwe 60 days ago

I wouldn't be surprised if it went towards the LaTeX model instead, where there's essentially never another major version release. There's only so much functionality you need in a local-only database engine; I bet they're getting close to complete.

by dotancohen 59 days ago

I'd love to see more ALTER TABLE functionality, and maybe MERGE, and definitely better JSON validation. None of that warrants a version bump, though.
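For context on the JSON point, here is a minimal sketch of what is available today, assuming a SQLite build that includes the JSON functions (the `docs` table and the `insert_doc` helper are made up for illustration). `json_valid()` in a CHECK constraint only verifies that the text is well-formed JSON; it does not check the document against any schema.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `docs` table."""
    conn = sqlite3.connect(":memory:")
    # json_valid() returns 1 for well-formed JSON, so the CHECK rejects anything else.
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT CHECK (json_valid(body)))"
    )
    return conn


def insert_doc(conn: sqlite3.Connection, body: str) -> bool:
    """Try to insert a document; return False if the CHECK constraint rejects it."""
    try:
        conn.execute("INSERT INTO docs (body) VALUES (?)", (body,))
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
print(insert_doc(conn, '{"name": "ok"}'))  # well-formed JSON
print(insert_doc(conn, "{not json"))       # malformed, rejected by the CHECK
```

Anything beyond well-formedness (required keys, value types) has to be expressed by hand with further CHECK expressions or in application code.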

You know what I'd really like, that would justify a version bump? CRDT. Automatically syncing local changes to a remote service, so e.g. an Android app could store data locally in SQLite, but the user could also log into a website on their desktop and all the data is right there. The remote service need not be SQLite - in fact I'd prefer Postgres. The service would also have to merge databases from all users into a single database... Or should I actually use Postgres for authorisation but open each user's data in a replicated SQLite file? This is such a common issue, I'm surprised there isn't a canonical solution yet.

by rtkwe 59 days ago

I think the unified syncing, while neat, is way beyond what SQLite is really meant for. You'd get into so many niche situations dealing with out-of-sync master and slave 'databases' that it's hard to make an automated solution that covers them effectively, unless you force the schema into a transactional design for everything just to sort out update conflicts. E.g.: your user has the app on two devices, uses one while it doesn't have an internet connection (altering the state), and then uses the app on another device before the original has had a chance to sync.
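To make the two-device scenario concrete, here is a minimal sketch of one of the simplest CRDT-style strategies, a last-writer-wins map. This is not anything SQLite provides; the `Versioned` type, the `merge` function, and the sample keys and timestamps are all assumptions made up for illustration, and it assumes timestamps are comparable across devices.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # time of the write; assumed comparable across devices
    device_id: str  # tie-breaker when two writes share a timestamp


def merge(a: Dict[str, Versioned], b: Dict[str, Versioned]) -> Dict[str, Versioned]:
    """Merge two replicas key by key, keeping the write with the highest (timestamp, device_id)."""
    merged = dict(a)
    for key, theirs in b.items():
        ours = merged.get(key)
        if ours is None or (theirs.timestamp, theirs.device_id) > (ours.timestamp, ours.device_id):
            merged[key] = theirs
    return merged


# The phone was edited while offline; the tablet was edited later, before the phone synced.
phone = {"note:1": Versioned("edited offline on phone", 100, "phone")}
tablet = {"note:1": Versioned("edited later on tablet", 105, "tablet")}

# Both replicas converge to the same state regardless of merge order.
print(merge(phone, tablet) == merge(tablet, phone))
```

The replicas converge, but the phone's offline edit is silently discarded. That is the kind of niche situation where an automated merge and the user's intent can disagree.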

by dotancohen 59 days ago

Yes, it's a difficult problem. That's why I'd like it to be wrapped in a nice package away from my application logic.

Even a product that does this behind the scenes, by wrapping SQLite and exposing SQLite's wrapped interface, would be great. I'd pay for that.

by 60 days ago

deleted

by Andrex 60 days ago

If it had been about GIMP I would have laughed harder.

by dotancohen 59 days ago

Be reasonable. It's only looking forward a single decade.

by schaum 59 days ago

Every few years I stumble across the same Java or MongoDB issue. I google for it, find it on Stack Overflow, and figure out that it was me who wrote that very answer. Always have a good laugh when it happens.

Usually my memory regarding such things is quite good, but this one I keep forgetting, so much so that I don't remember what the issue is actually about xD

by vidarh 60 days ago

I've run into my own comments or blog posts more often than I care to admit...

by james_marks 60 days ago

Several decades into this, I assume all documentation I write is for my future self.

Beautifully self-serving while being a benefit to others.

Same thing with picking nails up in the road to prevent my/everyone’s flat tire.

by QuantumNomad_ 60 days ago

ziggy42 is both a submitter of a story on the actual front page at the moment, and also in the AI generated future one.

See other comment where OP shared the prompt. They included a current copy of the front page for context. So it’s not so surprising that ziggy42 for example is in the generated page.

And for other usernames that are real but not currently on the home page, the LLM definitely has plenty of occurrences of HN comments and stories in its training data, so it’s not really surprising that it is able to include real usernames of people who post a lot. Their names will be occurring over and over in the training data.

by NooneAtAll3 60 days ago

one more reason to doubt that it's AI-generated

by joaogui1 60 days ago

HN has been used to train LLMs for a while now; I think it was even in the Pile.

by never_inline 60 days ago

It has also fetched the current page in the background, because the Jepsen post was recently on the front page.

by morkalork 60 days ago

I may die but my quips shall live forever

by atrus 60 days ago

So many underscores for usernames, and yet, other than a newly created account, there was 1 other username with an underscore.

by robocat 60 days ago

In 2032 new HN usernames must use underscores. It was part of the grandfathering process to help with moderating accounts generated after the AI singularity spammed too many new accounts.

by WorldPeas 60 days ago

My hypothesis is that they trained it to use snake case for lowercase names, and that obsession carried over from programming to other spheres. It can't bring itself to make a lowercaseunseparatedname.

by computably 60 days ago

Most LLMs, including Gemini (AFAIK), operate on tokens. lowercaseunseparatedname would be literally impossible for them to generate, unless they went out of their way to enhance the tokenizer. E.g. the LLM would need a special invisible separator token that it could output, and when preprocessing the training data the input would then be tokenized as "lowercase unseparated name" but with those invisible separators.

edit: It looks like it probably is a thing, given it does sometimes output names like that. So the pattern is probably just so rare in the training data that the LLM almost always prefers to use actual separators like underscores.

by fooofw 60 days ago

The tokenization can represent uncommon words with multiple tokens. Inputting your example on https://platform.openai.com/tokenizer (GPT-4o) gives me (tokens separated by "|"):

    lower|case|un|se|parated|name
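As a rough illustration of how a subword vocabulary can cover a word it has never seen whole, here is a toy greedy longest-match splitter. It is not how GPT-4o's tokenizer works internally (real BPE tokenizers apply learned merge rules over bytes), and the `VOCAB` set is a hypothetical vocabulary picked so that the toy reproduces the split above.

```python
from typing import List, Set


def tokenize(word: str, vocab: Set[str]) -> List[str]:
    """Split `word` greedily into the longest pieces found in `vocab`.

    Characters not covered by any vocabulary piece fall back to single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: emit one character and move on.
            tokens.append(word[i])
            i += 1
    return tokens


# Hypothetical vocabulary, chosen for illustration only.
VOCAB = {"lower", "case", "un", "se", "parated", "name"}

print("|".join(tokenize("lowercaseunseparatedname", VOCAB)))
# prints: lower|case|un|se|parated|name
```

The point is only that any string can be represented as a sequence of smaller known pieces, so an unseparated name is never impossible to emit, just made of more tokens.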

by maxglute 59 days ago

You can straight up ask Google to look up the post history of Reddit or Hacker News users. Some of it is probably just via search, because it's very recent, as in the last few days. Some of the older corpus includes deleted comments, so they must be scraping from Reddit archive APIs too, or using that deprecated Google history cache.

by never_inline 60 days ago

This is definitely based on a search or page fetch, because it includes these, which are all today's topics:

- IBM to acquire OpenAI (Rumor) (bloomberg.com)

- Jepsen: NATS 4.2 (Still losing messages?) (jepsen.io)

- AI progress is stalling. Human equivalence was a mirage (garymarcus.com)

by tempestn 60 days ago

The OP mentioned pasting the current frontpage into the prompt.

by 60 days ago

deleted

by DANmode 60 days ago

What % of today’s front page submissions are from users that have existed 5-10 years+?

(Especially in datasets before this year?)

I’d bet half or more - but I’m not checking.

by vitorgrs 59 days ago

It does memorize. But that's not actually news... I remember ChatGPT 3.5 or the old 4.0 remembering some users on some Reddit subreddits, even naming the top users for each subreddit.

The thing is, most of the models were heavily post-trained to limit this...

by skywhopper 60 days ago

That’s a lot more underscores than the actual distribution (I counted three users with underscores in their usernames among the first five pages of links atm).

by hurturue 60 days ago

either you only notice the xxx_yyy frequent posters or it's quite interesting that so many have this username format

by AceJohnny2 60 days ago

Aw, I was actually a bit disappointed at how on the nose the usernames were, relative to their postings. Like the "Rust Linux Kernel" by rust_evangelist, "Fixing Lactose Intolerance" by bio_hacker, fixing a 2024 Framework by retro_fix, etc...

by dang_fan0 59 days ago

I was here first

by dang_fan 60 days ago

[dead]

by bio_hacker 59 days ago

[dead]