I have to do keyed reductions of arrays with many different keys that repeat only once in a while:
keys =  {1,2,3,3,4,5,6,7,7, 8, 9, 9,10,11,...}
array = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,...}
// after reduction
result = {1,2,7,5,6,7,17,10,23,13,14}
Using thrust::reduce_by_key (or any other segmented reduction method) is not the fastest option here as most operations are in fact just copies from one array to another.
What would be a better approach to this problem?
Actually, reduce_by_key is the appropriate algorithm to use here.  It's just that the current implementation in Thrust is not as fast as it could be.  To elaborate, there's nothing to prevent reduce_by_key from executing at memcpy speed, and I believe other implementations already achieve that rate.  Our tentative plan for Thrust v1.7 includes making performance improvements to reduce_by_key and other scan-based algorithms using the codes in the related back40computing project.
Note that when the segments are either (1) long or of (2) uniform length then it's possible to do better than reduce_by_key.  For example, at some point it's more economical to use an offset-based segment descriptor than keys or head flags.  However, when the segments are short (as in your case) or of highly variable length, then an optimal reduce_by_key implementation is really the best tool for the job.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With