Full context and details: https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...
How can we describe OCR that wouldn't match this definition exactly?
_______ Optical character recognition:
1. You have a set of predefined patterns of interest which are well-known.
2. You're trying your best to find all occurrences of those patterns. If a letter appears only once, you still need to detect it.
3. You don't care much about visual similarity within a category. The letter "B" written in extremely different fonts is the same letter.
4. You care strongly about the boundaries between categories. For example, "B+" must resolve to two known characters in sequence.
5. You want to keep details of exactly where each thing was found, or at least the order in which things were found. You're creating a layer of new details, which may be added to the artifact.
_______ "Glyph compression":
1. You don't have a predefined set of patterns. The algorithm is probably trying to dynamically guess at patterns which are sufficiently similar and frequent.
2. You aren't trying to find all occurrences, only sufficiently similar and common ones, to maximize compression. If a letter appears only once, it can be ignored.
3. You do care strongly about visual similarity within a category. You don't want to mix-n-match fonts.
4. You don't care about clear category lines. If "B+" becomes its own glyph, that's no problem.
5. You're discarding detail from the artifact, to make it smaller.
If the image is actually text, both of them can end up finding things. Binning will identify "these things look almost the same", while OCR will identify "these look like the letter M".
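To make the two lists above concrete, here's a toy sketch in Python. To be clear, this is not how JBIG2 or any real OCR engine is implemented. The 3x5 bitmaps, the `max_diff` threshold, and the function names `bin_glyphs` and `ocr_classify` are all made up for illustration. The only point is that binning builds its dictionary from the page itself, while OCR maps every glyph onto a fixed, predefined set of labels.

```python
from typing import Dict, List, Tuple

Glyph = Tuple[str, ...]  # tiny bitmap, one string per row, '#' = ink

# Hypothetical 3x5 digit bitmaps; they differ in exactly one pixel.
SIX: Glyph = ("###", "#..", "###", "#.#", "###")
EIGHT: Glyph = ("###", "#.#", "###", "#.#", "###")


def hamming(a: Glyph, b: Glyph) -> int:
    """Count differing pixels between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def bin_glyphs(glyphs: List[Glyph], max_diff: int) -> Tuple[List[Glyph], List[int]]:
    """Binning: no predefined patterns, the dictionary comes from the page."""
    dictionary: List[Glyph] = []
    refs: List[int] = []
    for g in glyphs:
        for i, rep in enumerate(dictionary):
            if hamming(g, rep) <= max_diff:
                # "Looks almost the same": reuse the earlier bitmap.
                # The pixels of g itself are thrown away here.
                refs.append(i)
                break
        else:
            # Nothing similar seen so far, so store this glyph as a new entry.
            dictionary.append(g)
            refs.append(len(dictionary) - 1)
    return dictionary, refs


def ocr_classify(glyphs: List[Glyph], templates: Dict[str, Glyph]) -> str:
    """OCR: every glyph must resolve to a label from a known set."""
    return "".join(
        min(templates, key=lambda ch: hamming(g, templates[ch])) for g in glyphs
    )


page = [EIGHT, SIX, EIGHT]

# Too-loose threshold: the lone 6 is stored as a reference to the 8.
loose_dict, loose_refs = bin_glyphs(page, max_diff=1)
decoded = [loose_dict[i] for i in loose_refs]

# Exact matching: the 6 gets its own dictionary entry and survives.
strict_dict, strict_refs = bin_glyphs(page, max_diff=0)

# OCR never stores pixels, only labels from the predefined set.
text = ocr_classify(page, {"6": SIX, "8": EIGHT})
```

With `max_diff=1`, `decoded` is three copies of the 8 bitmap, so the output still looks like a clean scan but the 6 is gone. With `max_diff=0` the 6 is kept, and `text` comes out as "868" either way, because OCR is answering a different question.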
It also gives a false sense of security when it displays dirty pixels that still clearly show a specific digit, since you think you're basically looking at the original.
JBIG2 is an OCR algorithm that doesn't assume the document comes from a pre-existing alphabet.
Take another look at my comment.
Question: "How can we describe OCR that wouldn't match this definition exactly?"
Answer: This definition largely fits OCR, but "reference to a single instance" is a weird way to phrase it. A better definition of OCR would include how it uses built-in knowledge of glyphs and text structure, unlike JBIG2, which looks for examples dynamically. And that difference in technique gives you a significant difference in the end results.
Is that better?
The definition you quoted is not an "exact" fit to OCR. It's a mildly misleading fit to OCR, and clearing up the misleading part makes it no longer fit both.