I've been thinking about similar systems for tissue cultures but I can't seem to find a way to generalize and still get good training data or effective results. Once you lose track of white balance, species, optical clarity, distortion from the vessel, etc., results decline quite a bit in my experience. That makes it a neat system that's fairly useless outside the setup it was trained on.
Granted, I have no idea what I'm doing and these could be solvable problems. Certainly much easier to solve by focusing on a single species.
I'm impressed with how well it classifies based on the image examples. A little over a million images is probably what makes it possible. My experiments have been much smaller. Maybe with more material I could overcome those limitations I mentioned, but I have a feeling the multi-species pipeline really drags it down.
Have you found that light temperature no longer skews feedback after so much training data? For me it really matters: the classifier confuses the light source with actual plant condition (hence the colour card for white balance helping so much).
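For anyone following along, the colour-card correction mentioned above boils down to something like this. It's only a rough sketch: it assumes a neutral gray patch whose pixel box is already known (`card_box` is a hypothetical input, finding the card is a separate problem) and float RGB values in [0, 1]. The function name is made up for illustration.

```python
import numpy as np

def white_balance_from_card(image, card_box):
    """Scale each channel so the gray-card patch becomes neutral.

    image: float array of shape (H, W, 3) with values in [0, 1].
    card_box: (top, left, bottom, right) pixel bounds of the gray patch
              (hypothetical input; locating the card is not handled here).
    """
    top, left, bottom, right = card_box
    patch = image[top:bottom, left:right].reshape(-1, 3)
    patch_mean = patch.mean(axis=0)        # average R, G, B seen on the card
    target = patch_mean.mean()             # neutral gray at the same brightness
    gains = target / np.maximum(patch_mean, 1e-6)  # avoid division by zero
    return np.clip(image * gains, 0.0, 1.0)

# Synthetic check: a flat gray scene photographed with a purple-ish cast.
scene = np.full((4, 4, 3), 0.5)
cast = np.array([1.2, 0.7, 1.3])           # made-up per-channel cast
photo = np.clip(scene * cast, 0.0, 1.0)
balanced = white_balance_from_card(photo, (0, 0, 2, 2))
```

After correction the three channels of `balanced` are equal again, so the cast is gone. On real photos the result depends heavily on how clean the card patch is.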
Early on the photography thing was a real problem. Training data was mostly decent shots, then inference would come in as some blurry phone photo under purple LEDs.
Confident misclassifications. The fix wasn't clever - just more data that looks like how people actually take photos of their plants. Messy, badly lit, half the leaf out of frame. Once there was enough of that in the training set the models stopped caring about white balance. About 1.1 million augmented images now and light temperature just isn't a factor. No color card needed.
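To give a rough idea of what "data that looks like how people actually take photos" can mean when generated synthetically, here is a minimal sketch. The specific augmentations (colour cast, blur, off-centre crop) and all the ranges are assumptions for illustration, not the actual pipeline described above, and `messy_phone_photo` is a made-up name.

```python
import numpy as np

def messy_phone_photo(image, rng):
    """Return a degraded copy of image (float array, shape (H, W, 3), values in [0, 1])."""
    h, w, _ = image.shape
    # 1. Random colour cast: independent per-channel gains mimic odd light sources.
    gains = rng.uniform(0.6, 1.4, size=3)
    out = image * gains
    # 2. Cheap blur half the time: average each pixel with its four neighbours.
    #    np.roll wraps around at the borders, which is fine for a sketch.
    if rng.random() < 0.5:
        out = (out
               + np.roll(out, 1, axis=0) + np.roll(out, -1, axis=0)
               + np.roll(out, 1, axis=1) + np.roll(out, -1, axis=1)) / 5.0
    # 3. Random crop to 60-100% of each side, so part of the leaf can fall out of frame.
    ch = int(h * rng.uniform(0.6, 1.0))
    cw = int(w * rng.uniform(0.6, 1.0))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    out = out[top:top + ch, left:left + cw]
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = rng.random((32, 32, 3))            # stand-in for a well-lit training image
augmented = [messy_phone_photo(clean, rng) for _ in range(4)]
```

Synthetic degradation like this is only a supplement. Real messy photos from users are still the thing the models need to see.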
For tissue culture - I'd bet the multi-species part is what's killing you. I'd pick the single highest-value species, collect a probably-uncomfortable amount of well-labeled data for just that one, and see if things change. Right now you might not be able to tell what's a data problem vs a fundamental limitation, because the generalization overhead masks both.
That never occurred to me. That's a great insight.
> I'd pick the single highest-value species, collect a probably-uncomfortable amount of well-labeled data for just that one
I think you're right. If I want to move forward with it I think it's the only feasible way to validate a proof of concept. Generalizing can't produce a useful tool at my scale.
Thank you! I think this was a helpful nudge. Narrow classifiers could make some things a lot easier. Do you know of any reading materials about routing like this? Is it just programmatic decision tree stuff, or is there something more clever I'm unaware of?
One crop at a time though. A so-so classifier across 50 species is way less useful than a really good one for the thing you're actually growing.
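On the routing question, the simplest version really is programmatic: a coarse model guesses the species, and the image is only passed to a narrow classifier if that guess is confident and a specialist exists. A minimal sketch, where every name, the stand-in models, and the 0.8 threshold are hypothetical:

```python
from typing import Callable, Dict, Tuple

Prediction = Tuple[str, float]             # (label, confidence in [0, 1])
Classifier = Callable[[object], Prediction]

def route(image, species_model: Classifier,
          specialists: Dict[str, Classifier],
          min_confidence: float = 0.8) -> Prediction:
    """Send the image to a per-species classifier, or refuse if unsure."""
    species, confidence = species_model(image)
    # Refuse rather than guess when the species is uncertain or unsupported.
    if confidence < min_confidence or species not in specialists:
        return ("unknown", confidence)
    return specialists[species](image)

# Stand-in models so the sketch runs; real ones would be trained networks.
def fake_species_model(image) -> Prediction:
    return ("tomato", 0.93)

def fake_tomato_model(image) -> Prediction:
    return ("early_blight", 0.88)

result = route("leaf.jpg", fake_species_model, {"tomato": fake_tomato_model})
```

The useful property is the refusal branch: an image the router can't place gets "unknown" instead of a confident answer from the wrong specialist.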