upvote
For a pretty small N I've found that less clever can be quite a bit faster. I'd try a linear search - possibly SIMD if you can change the data format to struct-of-arrays. An adaptive approach that uses linear search up to a certain N can also yield some benefit.
reply
The first implementation I encountered was a linear search, starting at the last-found field. Empirically it performed better to do a binary search with early exit and branchless bounds selection, I think due to branch predictor pressure. The data representation could be changed but it's tricky, as there are other traversals that want to go in sorted order, and there are lots of places that pass just one pointer for fields. But I agree any further improvement will probably have to come from that.

SIMD is tricky even with SoA because there is significant latency going between the general registers and the vector units, plus arm little cores can be configured to share a vector unit with another core.

reply
If you control the layout, eytzinger layout typically will give you the best of both worlds. As fast as a linear scan for small N, much faster than binary search over a sorted array for large N.
reply
I know protobuf code is extremely high quality, but I really can't stand the c-style naming conventions.

I know people train themselves into grokking this and reading and emitting this way, but it sounds like writing "bork bork bork bork" runes to me.

I'm glad Rust feels more like Ruby and Python and that method and field names are legible.

My eyes just glaze over:

    UPB_API_INLINE
    const struct upb_MiniTableField* upb_MiniTable_FindFieldByNumber(
        const struct upb_MiniTable* m, uint32_t number) {
      const uint32_t i = number - 1;  // 0 wraps to UINT32_MAX
    
      // Ideal case: index into dense fields
      if (i < m->UPB_PRIVATE(dense_below)) {
        UPB_ASSERT(m->UPB_ONLYBITS(fields)[i].UPB_ONLYBITS(number) == number);
        return &m->UPB_ONLYBITS(fields)[i];
      }
    
      // Early exit if the field number is out of range.
      uint32_t hi = m->UPB_ONLYBITS(field_count);
      uint32_t lo = m->UPB_PRIVATE(dense_below);
      UPB_ASSERT(hi >= lo);
      uint32_t search_len = hi - lo;
      if (search_len == 0 ||
          number > m->UPB_ONLYBITS(fields)[hi - 1].UPB_ONLYBITS(number)) {
        return NULL;
      }
    
      // Slow case: binary search
      const struct upb_MiniTableField* candidate;
    #ifndef NDEBUG
      candidate = UPB_PRIVATE(upb_MiniTable_ArmOptimizedLowerBound)(
          m, lo, search_len, number);
      UPB_ASSERT(candidate ==
                 UPB_PRIVATE(upb_MiniTable_LowerBound)(m, lo, search_len, number));
    #elif UPB_ARM64_ASM
      candidate = UPB_PRIVATE(upb_MiniTable_ArmOptimizedLowerBound)(
          m, lo, search_len, number);
    #else
      candidate = UPB_PRIVATE(upb_MiniTable_LowerBound)(m, lo, search_len, number);
    #endif
    
      return candidate->UPB_ONLYBITS(number) == number ? candidate : NULL;
    }
reply
> I really can't stand the c-style naming conventions.

Honestly I don't see much difference between

  upb_MiniTable_FindFieldByNumber
and

  upb::MiniTable::FindFieldByNumber
reply
Those are fairly indistinguishable. It's when they start removing letters from words to save... debug symbol bytes or something? That's when c-style naming annoys me.
reply
This is also my pet peeve with a lot of code as well as commands like

    npm -g i package-name 
Like why would you teach people to do this? I understand people needed to save precious bytes in the sixties so we have cat and ls but saving 192 bytes or whatever with shorter variable names is not a worthwhile tradeoff anymore.
reply
What exactly bothers you about this and what would you prefer to see?
reply
In their defence "hi" sounds very much like "high" in my mind's ear and "lo" like "low" :)
reply
Yeah namespaces and public/private would be quite nice, but C doesn't have them, so they get hacked on via macros and prefixing. The syntax was not the hard part of working or analyzing this code, though.
reply
I think this needs way more "upb" and "UPB" to make it clear that it is, in fact, dealing with UPBs. Whatever these are.
reply
> μpb (often written 'upb') is a small protobuf implementation written in C.
reply