Fives Times Better Binary Chop

I just accepted my new life pas­sion to be a mas­ter pro­gram­mer. Sud­denly… I feel… full of knowl­edge! Wait, I mean, full of MOTIVATION! That’s the one. Here I am, full of want-to-be-master-programmer, but there’s still this gap between me and that goal. I know how to pro­gram. I know all the fun­da­men­tal com­puter sci­ence the­o­ries, algo­rithms, and data struc­tures. I can object-orient any­thing, even your face. I even love cod­ing in C++! The next step, I sup­pose is to get more expe­ri­ence. So I did Code Kata #2. That’s the one where you recre­ate a binary chop algo­rithm 5 times. Sim­ple right? Here are my attempts:

PS the code gets bet­ter as you descend the list

1. Iter­a­tive attempt #1

Show Iter­a­tive #1 Code

// O(nlogn) -- if lucky search hit, can return before nlogn.
int BinaryChop::chop1(int to_find, 
                     const std::vector<int>& data){
  int len = static_cast<int>(data.size()),
  low = 0;
  if(len == 0) return NOT_FOUND;
  int i, curr_data;

  while( len > 0 ){
    i = len/2 + low;
    curr_data = data[i];
    if( curr_data == to_find ){
      return i;
    }else if( to_find > curr_data ){
      //Need to search farther down array
      low = i+1;
      len = (len-1)/2;
    }else{ //to_find < curr_data
      //Search again at a smaller index.
      len = div_ceil( len-1, 2 );
    }
  }
  return NOT_FOUND;
}

This approach I used a start­ing index with a cur­rent length to find the next posi­tion. I have to use both floor and ceil­ing to get my indices cor­rect. I really would not rec­om­mend this method. I had to over engi­neer the solu­tion to a sim­ple prob­lem. In gen­eral I pre­fer to use min and max index when chop­ping an array. More on that later.

How­ever, I learned a quick way to get the ceil­ing with all inte­ger math:

Show Ceil­ing Code

static int div_ceil_overflow(int dividend, int divisor){
 return (dividend + divisor - 1) / divisor;
}

int div_ceil(int dividend, int divisor){
 if(dividend == 0)
   return div_ceil_overflow(dividend,
                            divisor);
 return 1 + ((dividend - 1) / divisor);
}

The div_ceil won’t over­flow an inte­ger, but will fail if the divi­sor is 0. It falls back to div_ceil_overflow just in case which may over­flow. Either way, it’s a good all-case solution!

2. Recur­sive attempt #1, (Get­ting faster)

Show Recur­sive #1 Code

static int chop2_rec (int& to_find, 
                     const std::vector<int>& data, 
                     int& length, int& low){
    if(length == 0)
        return NOT_FOUND;
    int i = length/2 + low;
    int curr_data = data[i];
    if( curr_data == to_find ){
        return i;
    }else if( to_find > curr_data ){
        low = i+1;
        length = (length-1)/2;
    }else{
        length = div_ceil(length-1, 2);
    }
    return chop2_rec(to_find, data, length, low);
}

// O(nlogn)
int BinaryChop::chop2 (int to_find, 
                      const std::vector<int>& data){
    int length = static_cast<int>(data.size());
    int low = 0;
    return chop2_rec(to_find, data, 
                     length, low);
}

This is basi­cally the same logic as iter­a­tive #1 except in recur­sive form. I still wouldn’t rec­om­mend this method. It gets messy with all the floors and ceil­ings. The only thing I learned is that a com­piler can opti­mize a recur­sive func­tion to be faster than an iden­ti­cal iter­a­tive ver­sion. Despite all my brain sweat over these solu­tions, the com­piler wins.

3. Func­tional array

Here’s an inter­est­ing one. I’ve never pro­grammed in a func­tional lan­guage. I’m famil­iar with the con­cept how­ever. Here’s a go at mak­ing a sim­ple array class that can be ‘sliced’ very quickly and addressed at the new indices based on the slice. I wanted to recre­ate the idea of pass­ing around these slices into func­tions. It could use a lot more func­tion­al­ity and opti­miza­tion. But here’s my rough draft:

Show Func­tion­alVec­tor Code

template <class T>
class FunctionalVector : public std::vector<T> {
    size_t i_start, i_length, i_end;
public:
    FunctionalVector();
    
    FunctionalVector (const FunctionalVector<T>& rhs) :
               std::vector<T>(static_cast<std::vector<T> >(rhs)),
               i_start(rhs.i_start), i_end(rhs.i_end),
               i_length(rhs.i_length) {}
    
    //Copy data constructor
    FunctionalVector (const std::vector<T>& other) : 
               std::vector<T>(other),
               i_start(0), i_end(other.size()), 
               i_length(other.size()) {}
    
    FunctionalVector<T>& slice (size_t new_start,
                 size_t new_length){
        i_start += new_start;
        i_length = new_length;
        i_end = i_start +i_length;
        return *this;
    }
    
    //Override vector's bracket operator to place 
    //modified indices.
    T& operator[] (size_t idx){
        if(idx >= i_length && i_length > 0)
            idx %= i_length;
        return std::vector<T>::operator[](index_at(idx));
    }
    const T& operator[] (size_t idx)const{
        if(idx >= i_length && i_length > 0)
            idx %= i_length;
        return std::vector<T>::operator[](index_at(idx));
    }
    
    void reset (){
      i_start=0;
      i_length=std::vector<T>::size();
      i_end = i_length;
    }
    
    
    size_t index_at (size_t norm_idx) const{
        return i_start + norm_idx;
    }
    
    size_t size ()const{
        return i_length;
    }
};

Isn’t it fun to inherit from std con­tain­ers? At hind­sight I think a bet­ter choice to inher­it­ing from std::vector would be to have a mem­ber vec­tor vari­able. It would lead to much fewer errors because vec­tor is com­pletely con­tained within Func­tion­Vec­tor as a pri­vate mem­ber. Yea, just don’t use my Func­tion­Vec­tor class, it’s not actu­ally very func­tional. Ha! And now the chop­ping code:

Show Func­tional Chop Code

// O(nlogn)
int BinaryChop::chop3(int to_find, 
             const std::vector<int>& data){
    FunctionalVector<int> fun_data(data); //copy data
    size_t i;
    int curr_data;
    while( fun_data.size() > 0 ){
        i = fun_data.size()/2;
        curr_data = fun_data[i];
        if( curr_data == to_find ){
            return static_cast<int>(fun_data.index_at(i));
        }else if( to_find > curr_data ){
            //Need to search farther down array
            fun_data.slice(i+1, (fun_data.size()-1)/2);
        }else{ //to_find < curr_data
            //Search again at a smaller index.
            fun_data.slice(0, div_ceil( static_cast<int>
                               (fun_data.size()-1), 2) );
        }
    }
    return NOT_FOUND;
}

4. Threaded, or Oh God What Have I Created

Binary Chop isn’t meant to be mul­ti­threaded. But I did it to prac­tice C++ threads. I’m not proud of this. But here goes:

Show Threaded Chop Code

void chop4_thread(int to_find, const std::vector<int> &data, 
                  int low, int length, int* result){
    int i = length/2 + low;
    int curr_data = data[i]; //thread-safe read
    
    if(curr_data == to_find){
        *result = i;
        return;
    }else if( to_find > curr_data ){
        low = i+1;
        length = (length-1)/2;
    }else{
        length = div_ceil(length-1, 2);
    }
    
    if(length == 0)
        return;
    
    std::thread continue_search(chop4_thread, to_find,
                        data, low, length, result);
    continue_search.join();
}

void chop4_thread_spawn(int to_find, const std::vector<int> &data,
                        int low, int length, int* result){
    if(length == 0) return;
    if(to_find < data[low] || to_find > data[low+length-1]) return;
    std::thread continue_search(chop4_thread, to_find, data,
                                low, length, result);
    continue_search.join();
}

//O(nlogn)
int BinaryChop::chop4( int to_find, const std::vector<int> &data ){
    int result = NOT_FOUND;
    int half = static_cast<int>(data.size())/2;
    std::thread left(chop4_thread_spawn, to_find, data, 0, 
                     half, &result);
    std::thread right(chop4_thread_spawn, to_find, data, 
                      half, 
                      div_ceil(static_cast<int>(data.size()), 2),
                      &result);
    left.join();
    right.join();
    return result;
}

This chop takes an obscene amount of time to do the sim­ple task of nav­i­gat­ing through an array. It spawns two threads in the begin­ning, one to look through the left half, and another for the right. The inter­me­di­ary chop4_thread_spawn() will either con­tinue spawn­ing a new thread or not depend­ing on whether the object we’re look­ing for is pos­si­bly on the left or right.

5. Iter­a­tive attempt #2 (much, much better)

I rethought out my solu­tion on this attempt. This is a binary chop I can be proud of. This time I use min and max indices where max is inclu­sive of the last ele­ment in the data. This fact removes the need for my ceil­ing func­tion. It also makes the loop much sim­pler and easy to read. Addi­tion­ally, I used a con­cept called deferred equal­ity to only check to see if I’ve found the key after the loop ends. This removes nlogn equal­ity checks inside the loop which when you think about it, those would be lucky hits if they were ever found to be true. So I sep­a­rate the work of get­ting to the key and check­ing the key. Less cou­pling, bet­ter code.

Show Iter­a­tive #2 Code

int BinaryChop::chop5( int to_find, const std::vector<int>& data ){
    int imax = static_cast<int>( data.size()-1 ),
        imin = 0,
        imid;
    if( imax < 0 ) return NOT_FOUND;
    
    while( imin < imax ){
        imid = (imin + imax)/2;
        if( to_find > data[imid] ){
            imin = imid+1;
        }else{ //to_find < curr_data
            imax = imid;
        }
    }
    if( imin == imax && data[imin] == to_find ){
        return imin;
    }
    return NOT_FOUND;
}

6. Bonus Recur­sive Attempt #2

I’ve been doing some pro­fil­ing through­out these tests. So far, chop 5 is blaz­ing fast com­pared to the oth­ers. But I wanted to try to beat it using a com­piler opti­mized tail recur­sive func­tion using my deferred equal­ity logic.

Show Recur­sive #2 Code

static int chop6_rec( int& to_find, const std::vector<int>& data,
                     int& imin, int& imax ){
    if( imin == imax ){
        if( data[imin] == to_find ){
            return imin;
        } else {
            return NOT_FOUND;
        }
    }

    int imid = ( imin + imax )/2;
    
    if( to_find > data[imid] ){
        imin = imid+1;
    }else{
        imax = imid;
    }

    return chop6_rec( to_find, data, imin, imax );
}

// Θ(nlogn) -- bound above and below nlogn due to deferred equality.
int BinaryChop::chop6( int to_find, const std::vector<int>& data ){
    int imax = static_cast<int>( data.size()-1 );
    int imin = 0;
    if( imax < 0 )
        return NOT_FOUND;
    return chop6_rec( to_find, data, imin, imax );
}

This is fast, but sur­pris­ingly, it’s actu­ally slightly slower than chop #5. YES! Here’s to learning!

That’s it for me. Good job if you made it down this far.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>