[JKQTPlotterBasicJKQTPDatastoreStatisticsGroupedStat]: @ref JKQTPlotterBasicJKQTPDatastoreStatisticsGroupedStat "1-Dimensional Group Statistics with JKQTPDatastore"
This tutorial project (see `./examples/datastore_statistics_2d/`) explains several advanced functions of JKQTPDatastore in combination with the [[statisticslibrary]] conatined in JKQTPlotter.
The source code of the main application can be found in [`datastore_statistics_2d.cpp`](https://github.com/jkriege2/JKQtPlotter/tree/master/examples/datastore_statistics_2d/datastore_statistics_2d.cpp).
This tutorial cites only parts of this code to demonstrate different ways of working with data for the graphs.
# Generating different sets of random numbers
The code segments below will fill different instances of JKQTPlotter with different statistical plots. All these plots are based on either of two sets of random number pairs (i.e. x/y-locations) generated as shown here:
The columns `randomdatacolx` and `randomdatacoly` will contain 500 random numbers. Each one is drawn either from the normal distributions `d1x`/`d1y`, or `d2x`/`d2y`. The decision, which of the two to use is based on the result of a third random distribution `ddecide`, which only returns 0, 1 or 2 (`d1x`/`d1y` is chosen for `ddecide==0`). The column `randomdatacoldist` finally encodes which pair of distributions was chosen. The three columns `randomdatacolx_small`, `randomdatacoly_small` and `randomdatacoldist_small` represent a subset of the first 150 numbers from the full dataset.
The columns are generated empty by calling `JKQTPDatastore::addColumn()` with only a name. Then the actual values are added by calling `JKQTPDatastore::appendToColumn()`.
The datasets generated above can be plotted using a simple scatterplot (here shown for the full dataset only):
To explore the statistical properties of a 2D dataset, it is often useful to draw the marginal histograms, i.e. the histograms of one of the two variables. To do so, we use the function `jkqtpstatAddHHistogram1DAutoranged()` and `jkqtpstatAddVHistogram1DAutoranged()` that were introduced in [JKQTPlotterBasicJKQTPDatastoreStatistics] and are explained in detail there:
Note that the plots are put into `JKQTPlotter`-instances `plothistBottom` and `plothistLeft`. These are put into a global `QGridSizer` just below and left of the main histogram plot `plothist`. The axes of these plots are linked to the main plot, so zooming one affects the other plot too and their axes are inverted, so the drawn histogram elongates to the left and to the bottom (starting with 0 at the main plot `plothist`):
Calculating 2D-Histograms is implemented in `jkqtpstatHistogram2D()` from the [statisticslibrary]. This functions calculates a histogram on a given grid with the first bin at `xmin,ymin` and the last bin at `xmax,ymax`, containing `Nx,Ny` bins in x-/y-direction. You can use the result to fill new image column (with `Nx` columns and `Ny` rows) in a `JKQTPDatastore`, which can then be used to draw the histogram.
First we have to determine the x- and y-range of data and calculate the number of bins from these:
These differ in the type of plot that is added `jkqtpstatAddHistogram2DImage()` adds an image graph (`JKQTPColumnMathImage`) and `jkqtpstatAddHistogram2DContour()` adds a contour plot (`JKQTPColumnContourPlot`). A contour plot may look like this:
Especially when only few samples from a distribution are available, histograms are not good at representing the underlying data distribution. In such cases, [Kernel Density Estimates (KDE)](https://en.wikipedia.org/wiki/Multivariate_kernel_density_estimation) can help, which are basically a smoothed variant of a histogram. The [statisticslibrary] supports calculating them via e.g. `jkqtpstatKDE2D()`:
This functions calculates a histogram on a given grid with the first bin at `xmin,ymin` and the last bin at `xmax,ymax`, containing `Nx,Ny` bins in x-/y-direction. You can use the result to fill new image column (with `Nx` columns and `Ny` rows) in a `JKQTPDatastore`, which can then be used to draw the histogram.
First we have to determine the x- and y-range of data and calculate the number of bins from these:
Note that calculating a KDE is a rather time-consuming task. The KDE in the image above took about 8s to calculate (from 150 datapairs and on a 193*267 pixel grid), whereas the corresponding histogram took only 1ms to calculate!
There is also an "adaptor" which significanty reduce the amount of coude you have to type (i.e. it replaces all the code above):
These differ in the type of plot that is added `jkqtpstatAddKDE2DImage()` adds an image graph (`JKQTPColumnMathImage`) and `jkqtpstatAddKDE2DContour()` adds a contour plot (`JKQTPColumnContourPlot`). the first is shown in the screenshot above, the latter looks like this:
The output of the full test program [`datastore_statistics_2d.cpp`](https://github.com/jkriege2/JKQtPlotter/tree/master/examples/datastore_statistics_2d/datastore_statistics_2d.cpp) looks like this: