If, for example, k=1, then N is likely small. On the other hand, if k=n, then N is likely large.
The most computer-sciencey way is to look at the n at which you first get a repeat: ah, a hash collision!
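A rough sketch of the collision idea, assuming every marble type is equally likely (the birthday-paradox setting). The expected draw at which the first repeat shows up is about sqrt(πN/2), so a mean first-repeat draw n inverts to N ≈ 2n²/π. A single run is very noisy, so the demo averages many simulated runs before inverting. The function names are made up for illustration.

```python
import math
import random

def simulate_first_repeat(num_types, rng):
    """Draw uniformly from num_types types; return the (1-based) draw index
    of the first draw that repeats an earlier one."""
    seen = set()
    draws = 0
    while True:
        draws += 1
        x = rng.randrange(num_types)
        if x in seen:
            return draws  # ah, a hash collision!
        seen.add(x)

def estimate_types(mean_draws_to_repeat):
    """Invert E[draws to first repeat] ~ sqrt(pi * N / 2), i.e. N ~ 2 n^2 / pi.
    Assumes equally likely types; skewed frequencies make repeats come sooner
    and so bias this estimate low."""
    return 2 * mean_draws_to_repeat ** 2 / math.pi

if __name__ == "__main__":
    rng = random.Random(0)
    true_n = 10_000
    runs = [simulate_first_repeat(true_n, rng) for _ in range(2000)]
    mean_n = sum(runs) / len(runs)
    print(f"mean draws to repeat: {mean_n:.1f}, estimated N: {estimate_types(mean_n):.0f}")
```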
One can make these ideas more quantitative under assumptions about the number of marbles of each type.
The math of hashing, the birthday paradox, coupon collecting and HyperLogLog are good places to start.
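Under the simplest assumption, that all N types are equally common and you draw with replacement, the expected number of distinct types in n draws is N(1 - (1 - 1/N)^n). A hypothetical sketch that inverts this by bisection to estimate N from the k distinct types seen in n draws. It reproduces the intuition above: k=1 points to N=1, and k=n gives no finite estimate at all.

```python
import math

def expected_distinct(num_types, n_draws):
    """Expected number of distinct types seen in n_draws uniform draws
    (with replacement) from num_types equally likely types."""
    return num_types * (1 - (1 - 1 / num_types) ** n_draws)

def estimate_num_types(k_distinct, n_draws, rel_tol=1e-9):
    """Find N >= 1 with expected_distinct(N, n) == k (method of moments).
    Assumes equal type frequencies, which real data rarely satisfy."""
    if not 1 <= k_distinct <= n_draws:
        raise ValueError("need 1 <= k <= n")
    if k_distinct == n_draws:
        return math.inf  # every draw was new: no finite estimate
    lo, hi = 1.0, 2.0
    # expected_distinct increases with N toward n, so grow hi until it brackets k
    while expected_distinct(hi, n_draws) < k_distinct:
        lo, hi = hi, hi * 2
    # plain bisection on N
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2
        if expected_distinct(mid, n_draws) < k_distinct:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    for k in (1, 50, 90, 100):
        print(k, estimate_num_types(k, 100))
```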
Then there are other ways. Say two of you independently count the typos in a tedious text. One finds N, the other finds n, and only k of them are common to both. From this you can estimate the likely total number of typos in the text.
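The typo example is capture-recapture; the classic Lincoln-Petersen estimate of the total is N·n/k. A small sketch, assuming the two readers work independently and every typo is equally likely to be spotted (real typos vary in obviousness, which biases the estimate low). Chapman's variant is included because it stays finite when k=0. The numbers in the demo are made up.

```python
def lincoln_petersen(found_a, found_b, found_both):
    """Estimated total = (found by A) * (found by B) / (found by both)."""
    if found_both == 0:
        raise ValueError("no overlap: the Lincoln-Petersen estimate is unbounded")
    return found_a * found_b / found_both

def chapman(found_a, found_b, found_both):
    """Chapman's less biased variant; defined even when there is no overlap."""
    return (found_a + 1) * (found_b + 1) / (found_both + 1) - 1

if __name__ == "__main__":
    a, b, both = 20, 15, 10  # hypothetical counts
    total = lincoln_petersen(a, b, both)
    found = a + b - both  # distinct typos actually spotted
    print(total, total - found)  # 30.0 5.0
    print(f"{chapman(a, b, both):.1f}")  # 29.5
```

So 25 distinct typos were found, and roughly 5 more are probably still lurking.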
But TFA's estimate is perplexing because it is NOT a contrived scenario. We don't have marbles; we have some territory to cover. What's in the territory isn't randomly distributed, and (presumably?) we can't sample it adequately at random.
It feels like the estimate could be wildly, wildly off, in which case, why estimate at all?
As for why estimate at all, knowing estimates can be wrong: estimates are very useful for planning. Sophisticated models also yield the probabilities of over- and underestimation, and these, combined with the costs of over- and underestimation errors, are very useful for decision making.
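See the German tank problem. As it turns out, the Allies' conventional intelligence badly overestimated German tank production, while the statistical estimate from captured serial numbers came much closer; either way, the estimates helped in planning.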
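For reference, the textbook frequentist answer to the German tank problem, assuming serial numbers run 1..N and the captured ones are a uniform sample without replacement, is m + m/k - 1, where m is the largest serial seen and k is the number captured. A minimal sketch with made-up serials:

```python
def german_tank_estimate(serials):
    """Minimum-variance unbiased estimate of N from serials sampled without
    replacement from 1..N: m + m/k - 1 (m = max serial, k = sample size)."""
    if not serials:
        raise ValueError("need at least one serial number")
    if len(set(serials)) != len(serials):
        raise ValueError("serials should be distinct (sampling without replacement)")
    m = max(serials)
    k = len(serials)
    return m + m / k - 1

if __name__ == "__main__":
    captured = [19, 40, 42, 60]  # hypothetical captured serial numbers
    print(german_tank_estimate(captured))  # 74.0
```

The m/k - 1 term corrects for the fact that the largest serial you happened to see is almost certainly below the true maximum.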
Basically, you bulk-sequence a sample, such as soil, and from there you can call certain taxa and estimate the number of unique species or unidentified sequences.
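The comment doesn't say which estimator is used, but one common choice for this kind of species-richness estimate is Chao1: observed taxa plus a correction driven by how many taxa were seen exactly once (singletons, f1) or twice (doubletons, f2). The intuition is that lots of singletons means lots of taxa you haven't seen yet. A sketch assuming you already have a read count per called taxon; the input numbers are made up.

```python
from collections import Counter

def chao1(counts):
    """Chao1 richness estimate from per-taxon abundance counts.
    counts: number of reads (or sightings) for each observed taxon."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]  # singletons and doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # bias-corrected form, used when there are no doubletons
    return s_obs + f1 * (f1 - 1) / 2

if __name__ == "__main__":
    reads_per_taxon = [1, 1, 1, 2, 2, 5, 10]  # hypothetical
    print(chao1(reads_per_taxon))  # 9.25
```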
Many branches of the taxonomic tree (insects, molds, bacteria, etc.) contain thousands of species, and, like fungi, each has many species that haven't even been identified yet.
Scientists estimate that something like 99% of the species that ever existed are extinct. I understand why people get upset when animals like elephants end up on the endangered list, but should we really care if some obscure species of dung beetle is endangered?
Species of beetles, fungi, or any other kind of living being may look very similar, yet they may differ in their ability to synthesize various chemical compounds, using enzymes that may have no equivalent in other living beings.
The popular literature is full of triumphalist b*s*t that makes it appear that the basic sciences, like physics, chemistry and biology, are solved, but this is extremely far from the truth. We are still a few decades away from understanding how a living being works well enough to replicate similar processes for making whatever we want.
Until then, every kind of living being that disappears is an irreversible loss of precious information. That information might have saved an unpredictable amount of future time, which will now be needed to rediscover results similar to those produced by natural evolution over millions of years.
Give it enough time and it could happen.
“Unreported” is usually short for “unreported to the police.”
I assume researchers ask people whether they’ve seen a crime and not told the police about it.