Consider a contrived scenario where an opaque jar contains N distinguishable marbles. You take one out, note its type, and put it back in. You repeat this n times. The number k of distinct marbles you saw across the n draws conveys information about N.

If, for example, k=1 then N is likely small. On the other hand if k=n then N is likely large.

The most computer-sciencey way is to look at the n at which you get your first repeat: ah, a hash collision! By the birthday paradox that tends to happen after roughly sqrt(N) draws.

One can make these ideas more quantitative under assumptions about how many marbles of each type there are.

The math of hashing, the birthday paradox, coupon collecting and HyperLogLog are good places to start.
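
To make the marble idea concrete, here is a rough sketch. It assumes every one of the N marbles is equally likely on each draw (a strong assumption), and it simply picks the smallest N whose expected number of distinct marbles, N * (1 - (1 - 1/N)^n), reaches the observed k. The function names are made up for illustration.

```python
def expected_distinct(N, n):
    """Expected number of distinct marbles seen in n draws with replacement
    from N equally likely marbles (assumption: uniform draws)."""
    return N * (1.0 - (1.0 - 1.0 / N) ** n)


def estimate_jar_size(n, k):
    """Method-of-moments style guess for N given k distinct marbles in n draws.

    Returns the smallest integer N with expected_distinct(N, n) >= k.
    If there were no repeats (k == n) the data only says "N is large",
    so infinity is returned.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if k == n:
        return float("inf")

    # expected_distinct is non-decreasing in N, so bracket then bisect.
    lo, hi = k, k
    while expected_distinct(hi, n) < k:
        hi *= 2
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_distinct(mid, n) >= k:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    # 50 draws, 39 distinct marbles seen
    print(estimate_jar_size(50, 39))
```

With k=1 this returns 1 and with k=n it returns infinity, matching the intuition above. A real analysis would also put an interval around the point estimate.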

Then there are other ways. Two of you independently count the typos in a tedious text. One finds N, the other finds n, and only k of them are common to both. From this you can estimate the likely total number of typos in the text.
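
The two-proofreader setup is the classic capture-recapture problem. A minimal sketch, assuming the two readers work independently and every typo is equally easy to spot: the Lincoln-Petersen estimate is N * n / k, and the Chapman variant is a common small-sample correction. Function and parameter names are my own.

```python
def lincoln_petersen(found_a, found_b, common):
    """Estimate total typos as found_a * found_b / common.

    Assumes independent readers and equally detectable typos.
    """
    if common <= 0:
        raise ValueError("no overlap: the estimate is unbounded")
    return found_a * found_b / common


def chapman(found_a, found_b, common):
    """Chapman's variant, which is less biased for small counts
    and stays finite when common == 0."""
    return (found_a + 1) * (found_b + 1) / (common + 1) - 1


if __name__ == "__main__":
    # Reader A finds 30 typos, reader B finds 20, and 10 are the same typos.
    print(lincoln_petersen(30, 20, 10))
    print(chapman(30, 20, 10))
```

The less the two lists overlap, the larger the estimated total, since little overlap suggests both readers are only seeing a small slice of what is there.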

reply
Right. That makes sense in the contrived scenario (although in that contrived scenario we know the probabilities with absolute surety).

But TFA's estimate is perplexing because it is NOT a contrived scenario. We don't have marbles, we have some territory to cover. The territory isn't randomly distributed, and we can't adequately sample it at random (presumably?).

It feels like the estimate could be wildly, wildly off, in which case why estimate?

reply
The contrived scenario is just a starting point. One can make more and more sophisticated ecological statistics models about the situation.

As for why estimate at all, knowing the estimate can be wrong: estimates are very useful for planning. Sophisticated models also yield probabilities of over- and underestimation; combined with the costs of over- and underestimation errors, these are very useful for decision making.

See the German tank problem. It turns out the Allied forces overestimated the number of tanks left, yet the estimate still helped in planning.
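
For anyone who hasn't seen it, the textbook version of the German tank problem goes like this. A sketch, assuming serial numbers run 1..N with no gaps and the captured tanks are a uniform random sample without replacement: with k observed serials and largest serial m, the standard estimate is m + m/k - 1. The function name is hypothetical.

```python
def german_tank_estimate(serials):
    """Estimate the total count N from observed serial numbers.

    Assumes serials are 1..N and were sampled uniformly without replacement.
    Uses the standard estimator m + m/k - 1 (m = max serial, k = sample size).
    """
    if not serials:
        raise ValueError("need at least one observed serial number")
    if len(set(serials)) != len(serials):
        raise ValueError("serials must be distinct (sampling without replacement)")
    m = max(serials)
    k = len(serials)
    return m + m / k - 1


if __name__ == "__main__":
    # Four captured tanks with these serial numbers
    print(german_tank_estimate([19, 40, 42, 60]))  # 74.0
```

The intuition is that the largest serial seen is always a lower bound, and the average gap between observed serials tells you how much further past it the true maximum probably lies.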

reply
It also makes sense in non-contrived scenarios ... the contrivance was just pedagogical.
reply
Great explanation, thanks!
reply
It's probably something like, here are the environments where we've done comprehensive surveys, here are the kind of different situations where we expect to find different species (decomposers of various types, mycorrhizal, within plants, within animals, on surfaces, specialists, generalists, climates, etc). Multiply the species from places where we've probably found most of them by the number of places where we've only found the most obvious fungi. However it works it's going to have big error bars, reflected in the fact that 12M species is the upper end of a range starting at 2.2M.
reply
We have better DNA sequencing technology today, so we can detect how many species live in a sample (soil, water, ...) and make an estimate from that. But if someone wants to formally describe these fungi, they have to culture each species in the lab and characterize its features; this is more expensive, harder and usually impossible.
reply
https://pmc.ncbi.nlm.nih.gov/articles/PMC5118932/

Basically, you bulk sequence some sample like some soil, and from there you can call certain taxa and make estimates of unique species or unidentified sequences.

reply
Also, how would they really know if a species is endangered? With millions of species that haven't even been identified, how would they know how common any of them are?

There are thousands of different species in many branches of the taxonomic tree (insects, molds, bacteria, etc.) and, like fungi, each has tons of species not even identified.

Scientists estimate that something like 99% of species that ever existed are extinct. I understand why people get upset when something like elephants hit the endangered list, but should we really care if some obscure species of dung beetle is endangered?

reply
For now, our science is not yet so advanced as to be able to appreciate what we will lose if an obscure species of dung beetle disappears.

Species of beetle or of fungi or of any other kind of living beings may look very similar, but nonetheless they may differ in their ability to synthesize various chemical compounds by using various enzymes that may not have equivalents in other living beings.

The popular literature is full of triumphalist b*s*t which makes it appear that most basic sciences, like physics, chemistry and biology are solved, but this is extremely far from the truth. We are still a few decades away from being able to understand well enough how a living being works, so that we would be able to replicate similar processes for making whatever we want.

Until then, every kind of living being that disappears is an irreversible loss of precious information, information that might have saved an unpredictable amount of time in the future, time that will otherwise be needed to rediscover results similar to those produced by natural evolution over millions of years.

reply
There's probably a really good answer using statistics, but it's beyond me.
reply
Inference.
reply
> 12 trillion species of fungi

Give it enough time, it could happen

reply
> (X number of crimes go "unreported"... if they're unreported how can we say that?).

“Unreported” is usually short for “unreported to police.”

I assume researchers ask people if they’ve seen a crime and not talked to the police about it.

reply