Commit c3154a2

draft Rcpp gallery article for SIMD sum

1 parent f781ff4 commit c3154a2

2 files changed: +109 -0 lines changed

.Rbuildignore

Lines changed: 1 addition & 0 deletions

@@ -10,3 +10,4 @@
 ^doc$
 ^src/.*\.o$
 ^examples/
+^gallery/

gallery/simd-vector-sum.cpp

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
/**
 * @title Using SIMD Instructions to Sum a Vector with RcppParallel
 * @author Kevin Ushey
 * @license GPL (>= 2)
 * @tags parallel
 * @summary Demonstrates how RcppParallel could be used to implement a SIMD-aware sum.
 */

/**
 * Newer releases of [RcppParallel](http://rcppcore.github.io/RcppParallel/)
 * now bundle the [Boost.SIMD](https://www.lri.fr/~falcou/pub/pact-2012.pdf)
 * library, providing a framework for ensuring that the algorithms you write
 * take advantage of the vectorized instructions offered by modern CPUs. This
 * article illustrates how you might use `Boost.SIMD` to sum all of the
 * elements in a vector.
 *
 * First, let's review how we might use `std::accumulate()` to sum a vector of
 * numbers. We explicitly pass in the `std::plus<double>()` functor, just to
 * make it clear that `std::accumulate()` expects a binary functor when
 * accumulating values.
 */

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double vectorSum(NumericVector x) {
    // accumulate applies the binary functor to a running total and
    // each element in turn, starting from the initial value 0.0
    return std::accumulate(x.begin(), x.end(), 0.0, std::plus<double>());
}

/**
 * Now, let's rewrite this to take advantage of `Boost.SIMD`. At a high level,
 * there are two main steps required:
 *
 * 1. Write a functor with a templated call operator, with the implementation
 *    written in a '`Boost.SIMD`-aware' way;
 *
 * 2. Provide the functor as an argument to `boost::simd::transform()` or
 *    `boost::simd::accumulate()`, as appropriate.
 *
 * Let's follow these steps in implementing our SIMD sum; afterwards, we'll
 * briefly sketch how the same functor could be used with `transform()`.
 */

// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>

// a generic 'plus' functor: when invoked with Boost.SIMD 'pack' types,
// the '+' below compiles down to vectorized addition instructions
struct simd_plus {
    template <typename T>
    T operator()(const T& lhs, const T& rhs) {
        return lhs + rhs;
    }
};

// [[Rcpp::export]]
double vectorSumSimd(NumericVector x) {
    return boost::simd::accumulate(x.begin(), x.end(), 0.0, simd_plus());
}
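
/**
 * (Aside: step 2 above also mentions `boost::simd::transform()`. Although this
 * article focuses on `accumulate()`, the same functor could in principle be
 * reused to, say, add two vectors element-wise. The sketch below is
 * hypothetical: it assumes the bundled headers provide a binary
 * `boost::simd::transform()` overload mirroring `std::transform()`, and the
 * name `vectorAddSimd()` is our own invention.)
 */

// [[Rcpp::export]]
NumericVector vectorAddSimd(NumericVector x, NumericVector y) {
    // assumes x and y have the same length
    NumericVector result(x.size());
    boost::simd::transform(x.begin(), x.end(), y.begin(), result.begin(),
                           simd_plus());
    return result;
}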

/**
 * As you can see, it's quite simple to take advantage of `Boost.SIMD`.
 * Now, let's compare the performance of the two sum implementations.
 */

/*** R
library(microbenchmark)

# allocate a large vector
set.seed(123)
v <- rnorm(1E6)

# ensure they compute the same values
stopifnot(all.equal(vectorSum(v), vectorSumSimd(v)))

# compare performance
res <- microbenchmark(vectorSum(v), vectorSumSimd(v))
print(res)
*/
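
/**
 * (A quick note on the check above: we compare results with `all.equal()`,
 * which permits a small numerical tolerance, rather than `identical()`.
 * Because the SIMD version effectively re-associates the additions, the two
 * sums need not be bitwise identical; the chunk below is a hypothetical check
 * of exactly that.)
 */

/*** R
# the results may agree only up to a small tolerance
identical(vectorSum(v), vectorSumSimd(v))
*/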

/**
 * Perhaps surprisingly, the `Boost.SIMD` solution is much faster -- the gains
 * are similar to what we might have seen when computing the sum in parallel.
 * However, we're still using only a single core; we're simply taking advantage
 * of the vectorized instructions provided by the CPU. In this particular case,
 * on Intel CPUs, `Boost.SIMD` will ensure that we use the `addpd` instruction,
 * which is documented in the Intel Software Developer's Manual
 * [[PDF](http://www.intel.com/Assets/en_US/PDF/manual/253666.pdf)].
 *
 * Note that, for the naive serial sum, the compiler would likely generate
 * similarly efficient code when the `-ffast-math` optimization flag is set.
 * By default, the compiler is somewhat 'pessimistic' about the set of
 * optimizations it can perform around floating point arithmetic, because it
 * must respect the [IEEE floating
 * point](https://en.wikipedia.org/wiki/IEEE_floating_point) standard. This
 * means respecting the fact that, for example, floating point operations are
 * not associative:
 */

/*** R
((0.1 + 0.2) + 0.3) - (0.1 + (0.2 + 0.3))
*/

/**
 * Surprisingly, the above computation does not evaluate to zero!
 *
 * In practice, you're likely safe to take advantage of the `-ffast-math`
 * optimizations, or `Boost.SIMD`, in your own work. However, be sure to test
 * and verify!
 */
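
/**
 * (Final aside: if you want to experiment with `-ffast-math` yourself, one
 * way is to set the `PKG_CXXFLAGS` environment variable before compiling with
 * `Rcpp::sourceCpp()`, which honors it. A minimal sketch, assuming a compiler
 * that accepts the flag; the file name below is hypothetical.)
 */

/*** R
# opt in to fast-math optimizations for subsequently compiled code,
# then restore the previous setting
old <- Sys.getenv("PKG_CXXFLAGS")
Sys.setenv(PKG_CXXFLAGS = "-ffast-math")
# Rcpp::sourceCpp("simd-vector-sum.cpp")  # recompile with the flag enabled
Sys.setenv(PKG_CXXFLAGS = old)
*/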
