"But for the AI assistant to function, voice, text, image and sometimes video must be processed and may be shared onwards. This data processing is done automatically and cannot be turned off."
The distinction here comes down to where the data is processed, and it sounds as if the difference between using your video for labeling and processing it privately through an AI is deliberately obscured from the user by the way the terms of service are written. Once the video is uploaded, which is necessary for the basic function, it's unclear how, or whether, it can be separated from other streams that do go through labeling. This confusion also seems to be an intentional dark pattern.