Accelerate the mean operations without axis#589
Accelerate the mean operations without axis#589ThreeMonth03 wants to merge 1 commit intosolvcon:masterfrom
Conversation
ThreeMonth03
left a comment
There was a problem hiding this comment.
@yungyuc , could you please review this pull request when you are available?
| template <typename A, typename T> | ||
| class SimpleArrayMixinSum | ||
| { | ||
|
|
There was a problem hiding this comment.
Move sum operation to a seperate class because of complex optimization.
| value_type sum_contiguous() const | ||
| { | ||
| auto athis = static_cast<A const *>(this); | ||
| value_type result; | ||
| if constexpr (is_complex_v<value_type>) | ||
| { | ||
| result = value_type{}; | ||
| } | ||
| else | ||
| { | ||
| result = 0; | ||
| } | ||
| sum_unrolled_generic(athis->data(), athis->size(), 1, result); | ||
| return result; | ||
| } |
There was a problem hiding this comment.
I forget to implement simd for common data type. Would it become a seperate pull request?
| return total; | ||
| } | ||
|
|
||
| void sum_unrolled_generic(const value_type * data_ptr, size_t size, size_t stride, value_type & result) const |
There was a problem hiding this comment.
I'm not sure whether it is really unroll the loop.
There was a problem hiding this comment.
It's hard to tell. If you are not sure about it, why adding it?
yungyuc
left a comment
There was a problem hiding this comment.
Good progress. Points to address:
- Do not use threads at the time being. We need a system for controlling threading from outside the computing kernel and it is outside the scope of speeding up one operation.
- Make functions static when you can.
- Clarify why adding seemingly unrolled loop that you are not sure about.
| private: | ||
| void check_c_contiguous(small_vector<size_t> const & shape, | ||
| small_vector<size_t> const & stride) const | ||
| bool is_c_contiguous(small_vector<size_t> const & shape, |
| return true; | ||
| } | ||
|
|
||
| void check_c_contiguous(small_vector<size_t> const & shape, |
| const size_t prefix_len = ndim - 1; | ||
| const size_t total_combinations = calculate_total_combinations(shape, prefix_len); | ||
|
|
||
| const size_t num_threads = static_cast<size_t>(std::thread::hardware_concurrency()); |
There was a problem hiding this comment.
We are not ready for using threads. Without a system to control how to use threads from outside the computing kernel here, the performance and resource consumption are not predictable.
| return total; | ||
| } | ||
|
|
||
| void sum_unrolled_generic(const value_type * data_ptr, size_t size, size_t stride, value_type & result) const |
There was a problem hiding this comment.
It's hard to tell. If you are not sure about it, why adding it?
In this pull request, I accelerate the mean operations.

For 1D contiguous array(unrolling, wo simd):
For 3D non contiguous array(unrolling + multithread):
