[JKQTPlotterBasicJKQTPDatastoreStatisticsGroupedStat]: @ref JKQTPlotterBasicJKQTPDatastoreStatisticsGroupedStat "1-Dimensional Group Statistics with JKQTPDatastore"
This tutorial project (see `./examples/datastore_statistics/`) explains several advanced functions of JKQTPDatastore in combination with the [[statisticslibrary]] conatined in JKQTPlotter.
The source code of the main application can be found in [`datastore_statistics.cpp`](https://github.com/jkriege2/JKQtPlotter/tree/master/examples/datastore_statistics/datastore_statistics.cpp).
The code segments below will fill four instances of JKQTPlotter with different statistical plots. All these plots are based on three sets of random numbers generated as shown here:
```.cpp
size_t randomdatacol1=datastore1->addColumn("random data 1");
size_t randomdatacol2=datastore1->addColumn("random data 2");
size_t randomdatacol3=datastore1->addColumn("random data 3");
std::random_device rd; // random number generators:
std::mt19937 gen{rd()};
std::uniform_int_distribution<> ddecide(0,1);
std::normal_distribution<> d1{0,1};
std::normal_distribution<> d2{6,1.2};
for (size_t i=0; i<150;i++){
double v=0;
const int decide=ddecide(gen);
if (decide==0) v=d1(gen);
else v=d2(gen);
datastore1->appendToColumn(randomdatacol1, v);
if (decide==0) datastore1->appendToColumn(randomdatacol2, v);
The column `randomdatacol1` will contain 150 random numbers. Each one is drawn either from a normal dirstribution N(0,1) (`d1`) or N(6,1.2) (`d2`). the decision, which of the two to use is based on the result of a third random distribution ddecide, which only returns 0 or 1. The two columns `randomdatacol2` and `randomdatacol3` only collect the random numbers drawn from `d1` or `d2` respectively.
The three columns are generated empyt by calling `JKQTPDatastore::addColumn()` with only a name. Then the actual values are added by calling `JKQTPDatastore::appendToColumn()`.
Based on the raw data we can now use JKQTPlotter's [statisticslibrary] to calculate some basic properties, like the average (`jkqtpstatAverage()`) or the standard deviation (`jkqtpstatStdDev()`):
Both statistics functions (the same as all statistics functions in the library) use an iterator-based interface, comparable to the interface of the algorithms in the C++ standard template library. To this end, the class `JKQTPDatastore` provides an iterator interface to its columns, using the functions `JKQTPDatastore::begin()` and `JKQTPDatastore::end()`. Both functions simply receive the column ID as parameter and exist in a const and a mutable variant. the latter allows to also edit the data. In addition the function `JKQTPDatastore::backInserter()` returns a back-inserter iterator (like generated for STL containers with `std::back_inserter(container)`) that also allows to append to the column.
note that the iterator interface allows to use these functions with any container that provides such iterators (e.g. `std::vector<double>`, `std::list<int>`, `std::set<float>`, `QVector<double>`...).
The output of these functions is shown in the image above in the plot legend/key.
Of course, several other functions exist that calculate basic statistics from a column, e.g.:
- 5-Number Summary (e.g. for boxplots): `jkqtpstat5NumberStatistics()`, `jkqtpstat5NumberStatisticsAndOutliers()`, `jkqtpstat5NumberStatisticsOfSortedVector()`, `jkqtpstat5NumberStatisticsAndOutliersOfSortedVector()`
All these functions use all values in the given range and convert each value to a `double`, using `jkqtp_todouble()`. The return values is always a dohble. Therefore you can use these functions to calculate statistics of ranges of any type that can be converted to `double`. Values that do not result in a valid `double`are not used in calculating the statistics. Therefore you can exclude values by setting them `JKQTP_DOUBLE_NAN` (i.e. "not a number").
As mentioned above and shown in several other examples, JKQTPlotter supports [Boxplots](https://en.wikipedia.org/wiki/Box_plot) with the classes `JKQTPBoxplotHorizontalElement`, `JKQTPBoxplotVerticalElement`, as well as `JKQTPBoxplotHorizontal` and `JKQTPBoxplotVertical`. You can then use the 5-Number Summray functions from the [statisticslibrary] to calculate the data for such a boxplot (e.g. `jkqtpstat5NumberStatistics()`) and set it up by hand. Code would look roughly like this:
In order to save you the work of writing out this code, the [statisticslibrary] provides "adaptors", such as `jkqtpstatAddVBoxplot()`, which basically implements the code above. Then drawing a boxplot is reduced to:
Here `-0.25`indicates the location (on the y-axis) of the boxplot. and the plot is calculated for the data in the `JKQTPDatastore` column `randomdatacol2`.
Usually the boxplot draws its whiskers at the minimum and maximum value of the dataset. But if your data contains a lot of outliers, it may make sense to draw them e.g. at the 3% and 97% quantiles and the draw the outliers as additional data points. This can also be done with `jkqtpstat5NumberStatistics()`, as you can specify the minimum and maximum quantile (default is 0 and 1, i.e. the true minimum and maximum) and the resulting object contains a vector with the outlier values. Then you could add them to the JKQTPDatastore and add a scatter plot that displays them. Also this task is sped up by an "adaptor". Simply call
0.25, 0.75, // 1. and 3. Quartile for the boxplot box
0.03, 0.97 // Quantiles for the boxplot box whiskers' ends
);
```
As you can see this restuns the `JKQTPBoxplotHorizontalElement` and in addition a `JKQTPSingleColumnSymbolsGraph` for the display of the outliers. The result looks like this:
Calculating 1D-Histograms is supported by several functions from the [statisticslibrary], e.g. `jkqtpstatHistogram1DAutoranged()`. You can use the result to fill new columns in a `JKQTPDatastore`, which can then be used to draw the histogram (here wit 15 bins, spanning the full data range):
Especially when only few samples from a distribution are available, histograms are not good at representing the underlying data distribution. In such cases, [Kernel Density Estimates (KDE)](https://en.wikipedia.org/wiki/Kernel_density_estimation) can help, which are basically a smoothed variant of a histogram. The [statisticslibrary] supports calculating them via e.g. `jkqtpstatKDE1D()`:
Both histograms and KDEs support a parameter `bool cummulative`, which allows to accumulate the data after calculation and drawing cummulative histograms/KDEs:
The output of the full test program [`datastore_statistics.cpp`](https://github.com/jkriege2/JKQtPlotter/tree/master/examples/datastore_statistics/datastore_statistics.cpp) looks like this: