You are wrong.

Text tokens are high-dimensional vectors, not 8 bits per character. Every token has a deep embedding, e.g. 1024 float values per text token.
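Here is a minimal sketch of the distinction being drawn. It assumes a toy 100-token vocabulary and the 1024-wide float32 embedding from the example. The model receives an integer token id and then looks up a row in an embedding table. The table below holds random placeholder values, not weights from any real model.

```python
import random
from array import array

EMBED_DIM = 1024   # example width from the comment above
VOCAB_SIZE = 100   # toy vocabulary (assumption), kept small so this runs fast

rng = random.Random(0)
# Placeholder embedding table: one row of EMBED_DIM C floats (float32) per token id.
embedding_table = [
    array("f", (rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)))
    for _ in range(VOCAB_SIZE)
]

def embed(token_ids):
    """Map each integer token id to its embedding row."""
    return [embedding_table[t] for t in token_ids]

if __name__ == "__main__":
    vectors = embed([17, 42, 99])
    print(len(vectors[0]))                        # values per token vector
    print(vectors[0].itemsize * len(vectors[0]))  # bytes per token vector
```

On typical platforms, where a C float is 4 bytes, the second line prints 4096. The id that selects the row is just a small integer.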

DeepSeek-OCR showed 10x+ compression from visual embeddings of text, which was a groundbreaking result. [1]

Very cool to see OP's project hacking on this principle. It's still not lossless, as noted on GitHub, but it's a promising research direction.

[1] https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSe...

by deburo 1 hour ago

A token is probably not a single char, and an image is probably decomposed into tokens as well (god knows how many), which probably map to similarly float-hungry vectors. Your counterargument could use a bit more flesh.

And we're talking about images of text, not images of complex imagery such as a very detailed scene or what have you.
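How many tokens an image becomes depends entirely on the vision encoder. This sketch assumes a ViT-style encoder that cuts the image into non-overlapping square patches and then, hypothetically, merges groups of patches into one token. `patch` and `downsample` are illustrative parameters, not DeepSeek-OCR's actual configuration. The ratio function shows the arithmetic behind a "10x compression" figure: the number of text tokens each vision token replaces.

```python
import math

def vision_token_count(height: int, width: int, patch: int = 16, downsample: int = 1) -> int:
    """Tokens emitted by a ViT-style encoder (assumption): the image is cut into
    non-overlapping patch x patch squares, and every `downsample` patches are
    then merged into a single token."""
    patches = math.ceil(height / patch) * math.ceil(width / patch)
    return math.ceil(patches / downsample)

def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each vision token stands in for."""
    return text_tokens / vision_tokens

if __name__ == "__main__":
    raw = vision_token_count(1024, 1024)                   # 64 * 64 patches
    merged = vision_token_count(1024, 1024, downsample=16)
    print(raw, merged)                                     # 4096 256
    # Hypothetical page whose text would tokenize to 2560 text tokens.
    print(compression_ratio(2560, merged))                 # 10.0
```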

by Groxx 48 minutes ago

I kinda wonder if it's extracting usable context from 2D proximity between lines? Normal text input wouldn't have that kind of information (though it could, and it's arguably just a lookahead/behind of N characters on average).
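One way to make the "lookahead/behind of N characters" intuition concrete is to wrap text to a fixed width and measure how far ahead, in the flat character stream, the character rendered directly below each character sits. This sketch assumes plain `textwrap` wrapping with lines joined by single newlines. A real renderer's layout, with proportional fonts or hyphenation, would differ.

```python
import textwrap

def below_distances(text: str, width: int) -> list[int]:
    """Wrap `text` to `width` columns, join the lines with single newlines, and
    for every character that has another character rendered directly beneath
    it, return how many positions ahead that neighbour sits in the flat stream."""
    lines = textwrap.wrap(text, width=width)
    starts, pos = [], 0
    for line in lines:
        starts.append(pos)
        pos += len(line) + 1  # +1 for the newline separator
    distances = []
    for r in range(len(lines) - 1):
        # Only columns that exist on both this line and the next have a neighbour below.
        for c in range(min(len(lines[r]), len(lines[r + 1]))):
            distances.append((starts[r + 1] + c) - (starts[r] + c))
    return distances

if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    print(textwrap.wrap(sample, width=10))
    print(sorted(set(below_distances(sample, width=10))))
```

Each distance is the length of the line above plus one. A vertical neighbour is therefore roughly one line-width of lookahead.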

by TZubiri 41 minutes ago

>Text tokens are high-dimensional vectors,

You are conflating tokens with embeddings.

Tokens fit in a single machine word: modern GPT models use a vocabulary of about 200k possible values, which fits into 18 bits.
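Here is a quick check of the arithmetic, plus a sketch of what "fits into 18 bits" means in practice: packing fixed-width token ids back to back. The 200,000 figure is the approximate vocabulary size from the comment. `pack_ids` and `unpack_ids` are hypothetical helpers, not part of any tokenizer library.

```python
VOCAB_SIZE = 200_000  # approximate vocabulary size from the comment

def bits_per_token(vocab_size: int) -> int:
    """Smallest bit width that can index ids 0 .. vocab_size - 1."""
    return max(1, (vocab_size - 1).bit_length())

def pack_ids(ids: list[int], bits: int) -> bytes:
    """Concatenate fixed-width token ids (first id most significant) into bytes."""
    acc = 0
    for t in ids:
        if not 0 <= t < (1 << bits):
            raise ValueError(f"token id {t} does not fit in {bits} bits")
        acc = (acc << bits) | t
    return acc.to_bytes((len(ids) * bits + 7) // 8, "big")

def unpack_ids(data: bytes, count: int, bits: int) -> list[int]:
    """Inverse of pack_ids: recover `count` ids of width `bits`."""
    acc = int.from_bytes(data, "big")
    mask = (1 << bits) - 1
    return [(acc >> (bits * (count - 1 - i))) & mask for i in range(count)]

if __name__ == "__main__":
    bits = bits_per_token(VOCAB_SIZE)
    ids = [0, 199_999, 12_345]
    packed = pack_ids(ids, bits)
    print(bits, len(packed))            # 18 7
    print(unpack_ids(packed, len(ids), bits))
```

Since 2^17 = 131,072 < 200,000 ≤ 2^18 = 262,144, 17 bits are too few and 18 suffice.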

Have a good one

by netsharc 1 hour ago

huh, what if the image encoding uses 8 bits for each of a pixel's R, G, and B values? Then one could encode the same amount of text in far fewer pixels (3 letters would need 1 pixel instead of three 12x12-pixel glyphs).

The top line could be an OCR-able instruction on how to decode the rest of the image, and the rest of the image would be a random-looking, colourful palette. It might not even need 8 bits per character, since ASCII is 7 bits per character.
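Here is a rough sketch of the idea, assuming plain ASCII input. Each 8-bit R, G, B channel holds one character, so one pixel carries three characters. A separate pair of helpers shows the 7-bit packing, which fits 8 ASCII characters in 7 bytes. The function names are hypothetical, and the sketch ignores the OCR-able header line entirely.

```python
def text_to_pixels(text: str) -> list[tuple[int, int, int]]:
    """Pack ASCII text into RGB pixels, one character per 8-bit channel.
    The tail is padded with NUL bytes to a multiple of three."""
    data = text.encode("ascii")            # raises on non-ASCII input
    data += b"\x00" * (-len(data) % 3)
    return [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]

def pixels_to_text(pixels) -> str:
    """Inverse of text_to_pixels; strips the NUL padding (so text that itself
    ends in NUL characters would not round-trip)."""
    raw = bytes(channel for px in pixels for channel in px)
    return raw.rstrip(b"\x00").decode("ascii")

def pack_7bit(text: str) -> bytes:
    """Concatenate 7-bit ASCII codes into a big-endian bit string."""
    acc, nbits = 0, 0
    for code in text.encode("ascii"):
        acc = (acc << 7) | code
        nbits += 7
    return acc.to_bytes((nbits + 7) // 8, "big")

def unpack_7bit(data: bytes, count: int) -> str:
    """Recover `count` characters packed by pack_7bit."""
    acc = int.from_bytes(data, "big")
    codes = [(acc >> (7 * (count - 1 - i))) & 0x7F for i in range(count)]
    return bytes(codes).decode("ascii")

if __name__ == "__main__":
    msg = "hello world"
    px = text_to_pixels(msg)
    print(len(px), px[0])                 # 4 (104, 101, 108)
    print(pixels_to_text(px))
    print(len(msg), len(pack_7bit(msg)))  # 11 10
```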

by TZubiri 39 minutes ago

Then it's no longer an image like the one in the GitHub repo; you would just be encoding the text as character bytes and sending them as an image.

You can achieve this by changing the extension of an image file from .bmp to .txt.

Guys, not to be mean, but maybe chill with the state of the art research and go back to studying fundamentals.

by vineyardmike 1 hour ago

[dead]