Using a Single-Threaded Functor in Multiple Threads with Futures in C++

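Multithreaded programming requires a paradigm shift when it comes to the return values of functions: a thread runs a function, but it has no built-in way to hand that function's return value back to the caller. C++11 provides `std::async <http://en.cppreference.com/w/cpp/thread/async>`__ to run functions asynchronously and retrieve their results, but it is not available in older versions of the language.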
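For comparison, here is a minimal C++11 sketch of what `std::async` offers: each call returns a `std::future`, and `get()` hands back the function's return value once it is ready. The `square` and `square_all` functions are hypothetical placeholders for real work, not part of the project described below.

```cpp
#include <future>
#include <vector>

// Hypothetical stand-in for an expensive computation.
int square(int x) { return x * x; }

// Launches one asynchronous task per input and collects the results in input order.
std::vector<int> square_all(const std::vector<int>& inputs)
{
    std::vector<std::future<int>> futures;
    futures.reserve(inputs.size());
    for (int x : inputs)
    {
        // std::launch::async requests that each call run on its own thread.
        futures.push_back(std::async(std::launch::async, square, x));
    }

    std::vector<int> results;
    results.reserve(futures.size());
    for (auto& f : futures)
    {
        results.push_back(f.get());  // blocks until that task has finished
    }
    return results;
}
```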

My current project on word spotting in historical documents is fairly complete in functionality, but I decided that searching for word images in page images concurrently is necessary for speed. I'm already using Boost for much of the functionality, so instead of creating a dependency on the not-yet-mature C++11 support in various compilers, I decided to use =boost::thread=.

Suppose we have a functor like

class Search_t
{
   public: 
       Search_t(Document d) { ... };
       SearchResult operator()(SearchItem i) { ... };
};

and we want to use this functor in multiple threads. We can't simply do

std::vector<SearchResult> results;
Search_t search(document);
// search_items is vector<SearchItem> and si is an iterator on this. 
for (si = search_items.begin(); si != search_items.end(); ++si)
{
   boost::thread task(boost::bind(search, *si));
   results.push_back(task); //ERROR!
}

because task is a boost::thread object, not a SearchResult: a thread discards the return value of the function it runs.

Instead, we need to store the results within an object and retrieve them after they have been generated.
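To make "storing results within an object" concrete without futures, here is a minimal sketch of the manual approach, written with C++11 `std::thread` rather than `boost::thread` so that it builds without Boost. Each thread writes into its own preallocated slot, and the caller reads the slots only after `join()`. `SquareInto` and `square_all_threads` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical work item: writes its result into a caller-provided slot
// instead of returning it, because a thread discards return values.
struct SquareInto
{
    int input;
    int* out;  // each thread gets its own slot, so no locking is needed
    void operator()() const { *out = input * input; }
};

std::vector<int> square_all_threads(const std::vector<int>& inputs)
{
    // Allocate every slot before starting any thread so the vector never
    // reallocates while threads hold pointers into it.
    std::vector<int> results(inputs.size());
    std::vector<std::thread> threads;
    threads.reserve(inputs.size());
    for (std::size_t i = 0; i < inputs.size(); ++i)
    {
        SquareInto task = { inputs[i], &results[i] };
        threads.push_back(std::thread(task));
    }
    // join() guarantees that each write has completed before the slots are read.
    for (std::size_t i = 0; i < threads.size(); ++i)
    {
        threads[i].join();
    }
    return results;
}
```

This works, but it forces the worker to know where its output goes; futures remove that coupling, which is why the wrapper below uses them.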

I didn't want to change the interface of Search_t, because multithreading should be optional and other parts of the program may depend on this interface. Instead, a wrapper class that runs these threads behind a similar interface looked like a better solution.

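#include <cstddef>
#include <vector>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/thread.hpp>
#include <boost/thread/future.hpp>

class SearchMT_t
{
   /* Declared before futures_ so that members are initialized in the same
      order as they appear in the constructor's initializer list. */
   Document document_;

   /* Held through a shared_ptr so that copies of this functor share the
      same list of futures. */
   boost::shared_ptr<std::vector<boost::unique_future<SearchResult> > > futures_;

   public: 
   SearchMT_t(Document d) : 
   /* The most important assumption here is that Search_t does not alter the
      Document object's state in any way. Otherwise we would need mutexes to
      ensure that document_ is accessed by only one thread at a time. */
   document_(d),
   futures_(new std::vector<boost::unique_future<SearchResult> >)
   {}

   void operator()(SearchItem si) 
   {
       /* If you are sure that there won't be any race conditions between
          Search_t objects running in different threads, you can move the
          following line to the constructor and use a single object for all
          searches. */
       Search_t search(document_);

       /* boost::bind (std::bind would require C++11) copies search and si
          into the task. The explicit <SearchResult> is needed because
          Search_t does not define a result_type typedef. */
       boost::packaged_task<SearchResult> search_task(boost::bind<SearchResult>(search, si));
       futures_->push_back(search_task.get_future());

       /* The result comes back through the future, not through the thread
          handle, so the thread can be detached. */
       boost::thread task(boost::move(search_task));
       task.detach();
   }

   std::vector<SearchResult> results()
   {
      std::vector<SearchResult> results;

      /* Wait for all threads to complete their work. */
      boost::wait_for_all(futures_->begin(), futures_->end());

      for (std::size_t i = 0; i < futures_->size(); ++i)
      {
          results.push_back((*futures_)[i].get());
      }

      return results;
   }
};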

This way, it becomes much more straightforward to use multithreading in a loop:

// search_items is a std::vector<SearchItem>.
SearchMT_t search(document);
for (std::vector<SearchItem>::const_iterator si = search_items.begin();
     si != search_items.end(); ++si)
{
    search(*si);
}

// Blocks until every search has finished.
std::vector<SearchResult> results = search.results();

We kept the Search_t class intact, and the loop looks almost the same as its single-threaded counterpart; the only difference is that the results are collected at the end.
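On a compiler with working C++11 support, the same wrapper can be written with the standard library alone. The sketch below is a hedged translation, not the project's code: `Document`, `SearchItem`, `SearchResult` and the substring-counting `Search_t` are hypothetical stand-ins, `std::packaged_task` and `std::thread` replace their Boost counterparts, and the threads are kept and joined rather than detached so that none of them outlives the wrapper.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the project's types.
struct Document { std::string text; };
typedef std::string SearchItem;
typedef std::size_t SearchResult;  // here: number of occurrences found

// Hypothetical single-threaded functor with the same shape as Search_t.
class Search_t
{
public:
    explicit Search_t(const Document& d) : document_(d) {}
    SearchResult operator()(const SearchItem& item) const
    {
        // Counts (possibly overlapping) occurrences of item in the text.
        SearchResult count = 0;
        if (item.empty()) return count;
        std::string::size_type pos = document_.text.find(item);
        while (pos != std::string::npos)
        {
            ++count;
            pos = document_.text.find(item, pos + 1);
        }
        return count;
    }
private:
    Document document_;
};

// Multithreaded wrapper: the SearchMT_t idea with std:: in place of boost::.
class SearchMT_t
{
public:
    explicit SearchMT_t(const Document& d) : document_(d) {}
    SearchMT_t(const SearchMT_t&) = delete;             // owns its threads
    SearchMT_t& operator=(const SearchMT_t&) = delete;
    ~SearchMT_t() { join_all(); }  // a joinable std::thread must not be destroyed

    void operator()(const SearchItem& item)
    {
        Search_t search(document_);
        // std::bind copies search and item into a task taking no arguments.
        std::packaged_task<SearchResult()> task(std::bind(search, item));
        futures_.push_back(task.get_future());
        threads_.push_back(std::thread(std::move(task)));
    }

    std::vector<SearchResult> results()
    {
        std::vector<SearchResult> out;
        out.reserve(futures_.size());
        for (std::size_t i = 0; i < futures_.size(); ++i)
        {
            out.push_back(futures_[i].get());  // blocks until task i is done
        }
        join_all();
        futures_.clear();  // a std::future can be read only once
        return out;
    }

private:
    void join_all()
    {
        for (std::size_t i = 0; i < threads_.size(); ++i)
        {
            if (threads_[i].joinable()) threads_[i].join();
        }
        threads_.clear();
    }

    Document document_;
    std::vector<std::future<SearchResult>> futures_;
    std::vector<std::thread> threads_;
};
```

Unlike the Boost version, this wrapper is deliberately non-copyable because it owns its threads; pass it by reference if the loop lives in another function.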