These functions are trivial to write; but the author found himself writing them repeatedly, so he decided to write them just once. The fun was keeping the library simple for use with built-in types while making it usable for user-defined types with only a little bit of extra work. |
A precondition is that the container be non-empty and already sorted ascending. |
The code is distributed under the Boost Software License. (This is not part of Boost. The author just likes their open-source license.)
The library is just a single header file. That header, this document, and the Boost license are available here.
My goal was to write a library that could be compiled with any C++ implementation that conforms to C++11 or later; but that goal was not entirely achieved. |
All but minimum() and maximum() have additional template parameters shown below, but they all default, and ContainerT is deduced.namespace simpstat { template<typename ContainerT> auto minimum(const ContainerT&); template<typename ContainerT> auto maximum(const ContainerT&); template<typename ContainerT> auto median(const ContainerT&); template<typename ContainerT> auto mean(const ContainerT&); template<typename ContainerT> auto variance(const ContainerT&); template<typename ContainerT> auto stddev(const ContainerT&); template<typename ContainerT> auto meanvardev(const ContainerT&); } // namespace simpstat
The first six templates all return ContainerT::value_type;
variance() calls mean(), and stddev() calls variance();
so to keep from doing the same work multiple times when you want more that one of those, the
The actual work when dealing with built-in types is done in a long double in the hope of avoiding overflow
in the variance calculation; and the result is cast to ContainerT::value_type, first rounding to nearest using
The library doesn’t call begin() or end() or use the container’s iterators explicitly;
but it has a couple of range-based for loops, which is why
It’s not clear why you’d want to write your own container, but you can do it if you really need to provided that it works like a vector or a deque in the few ways specified above. |
The complications are all in subnamespaces that users don’t really need to care about (but see the actual source file if you need to do anything fancier than what’s described below).
The general structure of the header is:
Let’s say that you have a rational number with at least the following public API:namespace simpstat { namespace dflt { // defaults, mostly for built-in types } namespace frac { // the function templates that do the actual work using some // “fractional type” which will be long double for built-in // value types, which we hope is big enough to mitigate the // overflow problem given the sum-of-squares in the variance // calculation, and which, for integer value types, can hold // values between two adjacent value type values } // the function templates described above } // namespace simpstat
You might also need a “fractional type” if you don’t want it to default to long double; and if your value type and fractional type are different types, you’ll need explicit conversions between the value type and the fractional type. (In this example, we’ll just use rational as both the value type and the fractional type, and so no conversions are required.)class rational { public: explicit rational(int); // the canonical special member functions rational& operator+=(const rational&); rational& operator-=(const rational&); rational& operator*=(const rational&); rational& operator/=(const rational&); bool operator<(const rational&) const; // ... }; rational operator-(const rational&); rational operator+(const rational&, const rational&); rational operator-(const rational&, const rational&); rational operator*(const rational&, const rational&); rational operator/(const rational&, const rational&);
And if you want a standard deviation, you’ll also need to extract the
square root of the variance while it still has your fractional type. If that’s a built-in
floating point type, the square root operation will happily default to
which is what you’ll get, with the second and third template arguments defaulting as shown, if you don’t do anything else.template<typename FracT, int ErrV = 1000, double FixedErrV = 1.0> class NewtonSqrt { // // Note that we require an explicit conversion from int to FracT. // We hope that it’s a constant expression but can’t count on it. // static const FracT zero = static_cast<FracT>(0); static const FracT one = static_cast<FracT>(1); static const FracT two = static_cast<FracT>(2); static const FracT err = static_cast<FracT>(ErrV); static FracT absval(FracT val) { return val < zero ? -val : val; } public: static FracT sqrt(FracT val) { FracT error = INT_MIN == ErrV ? FracT(FixedErrV) : val / err; FracT root = one; while (error < absval(val - root * root)) { // Newton’s method: root = (val / root + root) / two; } return root; } };
It computes a square root by successive approximation using Newton’s method and terminates when the difference between the value and the square of the trial root is not greater than some error. The allowable error will default to the passed value divided by the second template parameter, which parameter in turn defaults to 1000, thus giving an answer within ±0.1% of the passed value. You can specify other values for the second template argument if you want some other accuracy; but note that the template argument will be a divisor, not a multiplier (so a bigger argument yields a smaller error); and since it’s a non-type template argument, it must be a constant expression.
If that’s not good enough for you, you’ll need to write your own; but that might not be as daunting as it sounds. Let’s say that you’re happy with Newton’s method but you want one additional decimal digit of accuracy:
struct ratsqrt { static rational sqrt(const rational& val) { // Just specify a 2nd template argument 10x the default: return simpstat::dflt:NewtonSqrt<rational,10000>::sqrt(val); } };
Or maybe you want some fixed error that doesn’t depend on what the passed value is, in which case you can use INT_MIN as the second template argument and specify the fixed error as the third. Let’s say that your ultimate value type is an integer, so you’ll be happy with an error of ±½:
struct my_sqrt { static my_frac sqrt(const my_frac& val) { return simpstat::dflt:NewtonSqrt<my_frac,INT_MIN,0.5>::sqrt(val); } };
Other than the square root (needed only for standard deviations) and the fractional type (needed for medians and standard deviations), using the library should be only a matter of specifying the actual template arguments. The most “interesting” function to call is:
Given:template<typename ContainerT, typename FracT = dflt::DefaultFrac, typename RootT = dflt::DefaultRoot<FracT>, typename CastT = dflt::DefaultCaster<typename ContainerT::value_type,FracT>> /* std::tuple<typename ContainerT::value_type, typename ContainerT::value_type, typename ContainerT::value_type> */ auto meanvardev(const ContainerT&);
you can use the library with only a little more work:std::vector<rational> data; //load data with at least one value and sort it
The minimum() and maximum() templates have only a single template parameter which can be deduced. For all the others, you’ll need to specify most or all of your template arguments explicitly. In our example, since the value type and fractional type are the same type (both rational), the CastT parameter can default. (It will assume that it’s dealing with a user-defined type and default to just a copy or a move as appropriate.)using namespace simpstat; // for simplicity herein rational min = minimum(data); rational max = maximum(data); //explicitly specify < ContainerT , FracT > rational med = median<std::vector<rational>,rational>(data); // < ContainerT , FracT , RootT > const auto others = meanvardev<std::vector<rational>,rational,ratsqrt>(data); rational avg = std::get<0>(others); rational var = std::get<1>(others); rational dev = std::get<2>(others);
Only the stddev() and meanvardev() templates have the RootT parameter.
Note that the RootT and CasT parameters are expected to be class templates
with static
They’re not just function templates since they’d need to be non-type template parameters with pointer-to-function type. That’s doable, but ugh! |
namespace simpstat { template<typename ContainerT> auto minimum(const ContainerT&); template<typename ContainerT> auto maximum(const ContainerT&); template<typename ContainerT, typename FracT = dflt::DefaultFrac, typename CastT = dflt::DefaultCaster<typename ContainerT::value_type,FracT>> auto median(const ContainerT&); template<typename ContainerT, typename FracT = dflt::DefaultFrac, typename CastT = dflt::DefaultCaster<typename ContainerT::value_type,FracT>> auto mean(const ContainerT&); template<typename ContainerT, typename FracT = dflt::DefaultFrac, typename CastT = dflt::DefaultCaster<typename ContainerT::value_type,FracT>> auto variance(const ContainerT&); template<typename ContainerT, typename FracT = dflt::DefaultFrac, typename RootT = dflt::DefaultRoot<FracT>, typename CastT = dflt::DefaultCaster<typename ContainerT::value_type,FracT>> auto stddev(const ContainerT&); template<typename ContainerT, typename FracT = dflt::DefaultFrac, typename RootT = dflt::DefaultRoot<FracT>, typename CastT = dflt::DefaultCaster<typename ContainerT::value_type,FracT>> auto meanvardev(const ContainerT&); } // namespace simpstat