Even though decoding the lengths must be serial (since there's no unambiguous way to tell a tag byte from a data byte), it's still doable within the wider SIMD registers, so there's some theoretical efficiency gain to be had, depending on the shape of the data.
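To make the serial dependency concrete, here is a minimal scalar sketch in C. It assumes a hypothetical format where each value is one prefix byte n followed by n payload bytes, with both allowed to take any value 0x00-0xFF; find_value_starts is an illustrative name, not from any library. Each iteration needs the byte the previous one landed on, so boundaries can only be found one value at a time:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format, for illustration only: one prefix byte n followed by
   n payload bytes, any byte value allowed in either position.
   Writes the offset of each prefix byte in buf[0..len) into starts[]
   (up to max_starts entries) and returns how many were found. */
static size_t find_value_starts(const uint8_t *buf, size_t len,
                                size_t *starts, size_t max_starts) {
    assert(buf != NULL || len == 0);
    size_t count = 0;
    size_t pos = 0;
    while (pos < len && count < max_starts) {
        starts[count++] = pos;
        /* The next boundary depends on the byte just read: this is the
           dependency chain that cannot be spread across lanes. */
        pos += 1 + (size_t)buf[pos];
    }
    return count;
}
```

For example, in {2, 0xAA, 0xBB, 0, 3, 1, 2, 3} the values start at offsets 0, 3 and 4. The payload bytes 1, 2, 3 look exactly like prefixes, which is why only the walk can tell them apart.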
On a general note, the continuation-bit and prefix-byte forms are equivalent: you just broadcast the prefix byte and compare it against an increasing index vector to convert it to a mask. Yeah, the SIMD probably gets more fiddly if there are multiple prefixes in the register, but it's doable. It's just not byte-parallel: e.g. you unroll the serial decode loop 8 times (or whatever your maximum output byte width is) and mask out.
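A rough sketch of that equivalence, assuming SSE2 and a LEB128-style continuation bit (high bit set on every byte except the last). Both helpers are hypothetical names; each returns a 16-bit lane mask with bit i set iff byte i belongs to the value. The continuation form gets it from _mm_movemask_epi8 plus a bit trick, while mask_from_prefix is the scalar equivalent of what _mm_movemask_epi8 returns for the broadcast-and-compare result:

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Continuation-bit form (LEB128-style, assumed): every byte but the last
   has its high bit set. Reads 16 bytes from p and assumes the value ends
   within them. Returns the lane mask covering the value's bytes. */
static unsigned mask_from_continuation(const uint8_t *p) {
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    /* Bit i of cont is the high bit of p[i]. */
    unsigned cont = (unsigned)_mm_movemask_epi8(v);
    /* cont + 1 clears the run of trailing ones and sets the bit after it;
       XOR with cont leaves that run plus the terminating byte set. */
    return (cont ^ (cont + 1u)) & 0xFFFFu;
}

/* Prefix form: the length is stated up front, so the mask is simply its
   low len bits (assumes a 32-bit unsigned so 1u << 16 is defined). */
static unsigned mask_from_prefix(unsigned len) {
    assert(len <= 16);
    return (1u << len) - 1u;
}
```

With bytes {0x85, 0x90, 0x03, ...} both routes agree on a three-byte value (mask 0x7), which is the sense in which the two encodings carry the same boundary information.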
Simplified:
#include <emmintrin.h> // SSE2
// Just maps a byte to its position in the register
__m128i idx = _mm_setr_epi8(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15);
// Broadcast the prefix
__m128i nn = _mm_set1_epi8((char)prefix_byte);
// Get applicable lanes: prefix_byte holds the length, so lane i is set to 0xFF iff i < len (signed compare, fine for any len <= 16)
__m128i m = _mm_cmpgt_epi8(nn, idx);
// cmpgt already sets matching lanes to 0xFF; if you *really* want only the high bit:
m = _mm_and_si128(m, _mm_set1_epi8((char)0x80));
Interleaved Bijou has no such signal (tag and payload bytes both span 0x00–0xFF), so finding the boundaries is a dependent per-value walk with no opportunities for parallelism.
With that, it's mostly byte-parallel (though data-dependent, as I mentioned).
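A sketch of the multi-prefix case, again assuming SSE2 and a hypothetical format of one length byte followed by that many payload bytes (payload_masks is an illustrative name). The prefix-to-prefix walk stays serial, bounded at 16 steps for a 16-byte register (a compiler may unroll it), while each value's mask is built lane-parallel from two broadcast compares. Lengths are clamped to the register so the signed _mm_cmpgt_epi8 never sees a value above 16:

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical prefix format: one length byte n, then n payload bytes.
   For each value starting inside the 16-byte block, writes a 16-bit lane
   mask covering its payload (truncated at the block edge). Returns the
   number of values found; at most 16, since each consumes its prefix. */
static int payload_masks(const uint8_t block[16], unsigned masks[16]) {
    const __m128i idx = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                                      8, 9, 10, 11, 12, 13, 14, 15);
    int count = 0;
    int pos = 0;
    /* Serial part: each start depends on the previous prefix byte. */
    while (pos < 16) {
        int len = block[pos];
        int lo = pos + 1;       /* first payload lane */
        int hi = pos + 1 + len; /* one past the last payload lane */
        if (hi > 16) hi = 16;   /* clamp so (char)hi stays a small positive */
        /* Parallel part: lane i is payload iff lo <= i < hi. */
        __m128i below = _mm_cmpgt_epi8(_mm_set1_epi8((char)hi), idx);
        __m128i above = _mm_cmpgt_epi8(idx, _mm_set1_epi8((char)(lo - 1)));
        masks[count++] =
            (unsigned)_mm_movemask_epi8(_mm_and_si128(below, above));
        pos += 1 + len;
    }
    assert(count <= 16);
    return count;
}
```

With the block {2, 0xAA, 0xBB, 0, 3, 1, 2, 3, 20, 0, ...} this finds four values with masks 0x0006, 0x0000, 0x00E0 and 0xFE00; the last payload runs past the block and is truncated, so a real decoder would carry the remainder into the next load.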