(about_stats)=
# About the stats module
The stats module is composed of wrappers for statistical distributions
and wrappers for other statistical functions and summaries. These two groups
follow completely different approaches, each of which is discussed in
its own section below.

This page assumes you have already read {ref}`stats_tutorial`.

## About distribution wrappers
There are only two classes that serve as wrappers for _all_ statistical
distributions in {mod}`scipy.stats`: one for continuous distributions
and another for discrete ones. This has some drawbacks related to the preservation
of named dimensions and coordinates, but we believe its simplicity and ease of
maintenance outweigh these drawbacks.

The two wrappers {class}`~xarray_einstats.stats.XrContinuousRV` and
{class}`~xarray_einstats.stats.XrDiscreteRV` take the scipy distribution to be wrapped
as their first argument, followed by optional args and kwargs. Methods also accept a
mixture of args and kwargs, mimicking the behaviour of scipy distributions, where
the scale, for example, can be defined either at creation time or when calling a method,
and can be passed either as a positional or as a keyword argument.
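For reference, the scipy behaviour being mimicked can be checked with plain scipy. The snippet below uses {func}`scipy.stats.norm` with `loc=0` and `scale=2` as arbitrary example values; all four calls compute the same density:

```python
from scipy import stats

x = 1.5
# Parameters given at creation time (frozen distribution), positionally or by keyword
frozen_pos = stats.norm(0, 2).pdf(x)
frozen_kw = stats.norm(loc=0, scale=2).pdf(x)
# Parameters given when calling the method on the unfrozen distribution
method_pos = stats.norm.pdf(x, 0, 2)
method_kw = stats.norm.pdf(x, loc=0, scale=2)
print(frozen_pos, frozen_kw, method_pos, method_kw)  # four identical values
```

Note that in plain scipy these options are exclusive: the methods of a frozen distribution only take the evaluation points, not additional distribution parameters.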

However, instead of initializing the distribution at creation time, the xarray-einstats
wrapper classes store the distribution and the initialization args and kwargs. Then,
whenever a method is called, the args and kwargs provided to the method
and the ones provided at initialization are combined and broadcast.
This happens in the `_broadcast_args` method of {class}`xarray_einstats.stats.XrRV`.
The combined and broadcast arguments are then used to call the scipy distribution via
{func}`xarray.apply_ufunc`, so the underlying arrays have compatible shapes.
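The combine-then-broadcast step can be sketched as follows. `MiniRV` is a hypothetical, heavily simplified class written only for illustration: it is not the code of `XrRV`, it only implements `pdf`, it only broadcasts `DataArray` inputs, and it assumes that keyword arguments given to a method override the ones given at creation time.

```python
import numpy as np
import xarray as xr
from scipy import stats


class MiniRV:
    """Hypothetical, simplified distribution wrapper (illustration only)."""

    def __init__(self, dist, *args, **kwargs):
        # Nothing is initialized here: only the distribution and its parameters are stored
        self.dist = dist
        self.args = args
        self.kwargs = kwargs

    def _broadcast_args(self, args, kwargs):
        # Combine method-time and creation-time arguments (assumption: method kwargs win)
        all_args = [*args, *self.args]
        all_kwargs = {**self.kwargs, **kwargs}
        values = [*all_args, *all_kwargs.values()]
        # Broadcast every DataArray against all the others so they share dims and dim order
        da_idx = [i for i, value in enumerate(values) if isinstance(value, xr.DataArray)]
        if da_idx:
            broadcast = xr.broadcast(*(values[i] for i in da_idx))
            for i, da in zip(da_idx, broadcast):
                values[i] = da
        n_args = len(all_args)
        return values[:n_args], dict(zip(all_kwargs, values[n_args:]))

    def pdf(self, x, *args, **kwargs):
        args, kwargs = self._broadcast_args([x, *args], kwargs)
        # Positional args stay positional and kwargs stay kwargs; apply_ufunc
        # labels the output based on the positional ones only
        return xr.apply_ufunc(self.dist.pdf, *args, kwargs=kwargs)


mu = xr.DataArray([0.0, 1.0], dims=["chain"])
sigma = xr.DataArray([1.0, 2.0, 3.0], dims=["draw"])
dist = MiniRV(stats.norm, loc=mu, scale=sigma)
print(dist.pdf(xr.DataArray(0.0)).dims)  # ('chain', 'draw')
```

Because every `DataArray` is broadcast before the call, a single positional `DataArray` (here `x`) already carries all the dimensions of the keyword parameters.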

As the same wrappers are used for all distributions, positional arguments are passed to
`apply_ufunc` as positional arguments and keyword arguments as keyword arguments,
even though both are broadcast. The main drawback of this approach is that `apply_ufunc`
only preserves the dimensions and coordinates of _positional_ arguments. Therefore, given
two equivalent wrappers, one using positional and the other using keyword arguments,
there are some edge cases where the one using keyword arguments returns numpy
arrays instead of `DataArray`s. The values are the same in both cases, but in one
of them all information about named dimensions and coordinates has been lost.
The most common, and arguably most annoying, case of such behaviour is with the `.rvs` method.

```{jupyter-execute}
import numpy as np
from scipy import stats
from xarray_einstats.tutorial import generate_mcmc_like_dataset
from xarray_einstats import stats as xtats
ds = generate_mcmc_like_dataset(3)

dist_pos = xtats.XrContinuousRV(stats.norm, ds["mu"], ds["sigma"])
dist_kw = xtats.XrContinuousRV(stats.norm, loc=ds["mu"], scale=ds["sigma"])

rvs_pos = dist_pos.rvs(size=5, random_state=7)
rvs_kw = dist_kw.rvs(size=5, random_state=7)
allclose = np.allclose(rvs_pos, rvs_kw)

print(f"Output type of rvs_pos: {type(rvs_pos)}")
print(f"Output type of rvs_kw:  {type(rvs_kw)}")
print(f"\nCheck all values are indeed equal in both cases: {allclose}")
```
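The root cause can be reproduced with {func}`xarray.apply_ufunc` alone. In this sketch `scaled_noise` is a hypothetical plain-numpy function, and the only difference between the two calls is whether the `DataArray` parameters are passed positionally or through `kwargs`:

```python
import numpy as np
import xarray as xr


def scaled_noise(loc, scale):
    """Hypothetical plain-numpy function used only for this illustration."""
    rng = np.random.default_rng(7)
    loc, scale = np.asarray(loc), np.asarray(scale)
    return loc + scale * rng.standard_normal(loc.shape)


chain = [0, 1, 2]
loc = xr.DataArray(np.zeros(3), dims=["chain"], coords={"chain": chain})
scale = xr.DataArray(np.ones(3), dims=["chain"], coords={"chain": chain})

# Parameters as positional arguments: apply_ufunc labels the output
out_pos = xr.apply_ufunc(scaled_noise, loc, scale)
# Same parameters as keyword arguments: no positional DataArray, so no labels
out_kw = xr.apply_ufunc(scaled_noise, kwargs={"loc": loc, "scale": scale})

print(type(out_pos).__name__, type(out_kw).__name__)  # DataArray ndarray
```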

In the other methods this is harder to trigger, because a single positional
xarray argument is enough to preserve _all_ the information. As these methods
convert array input to xarray under the hood, the following code doesn't lose any
labels:

```{jupyter-execute}
dist_kw.pdf(np.linspace(-3, 3))
```

## About statistical function and summary wrappers
Most wrappers here are minimal: they generally spend more time handling argument defaults
than anything else. The general pattern of these wrappers is the following:

1. Handle arguments. Arguments whose information is not needed by the wrapper generally default
   to `None` and are not included in the `dict` passed to `apply_ufunc` as its `kwargs` argument.
   This spares us from having to track changes in scipy and update our argument defaults.
2. (optional) Take care of arguments that accept array values, which in `xarray_einstats`
   take `DataArray`s. They are broadcast and aligned so the computation works.
3. Stack/reshape _if necessary_. `xarray_einstats` uses a `dims` argument that differs from
   the scipy `axis` argument because it takes strings and also accepts sequences of strings.
   When multiple dimensions are provided via `dims`, they are stacked before calling scipy,
   as scipy only takes an integer `axis`.
4. Call the scipy function.
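The steps above can be sketched for a reducing function such as {func}`scipy.stats.skew`. `hypothetical_skew` and the `__stacked__` dimension name are made up for this illustration and are not the actual `xarray_einstats` wrapper; step 2 is skipped because `skew` takes no array-valued arguments besides the data.

```python
import numpy as np
import xarray as xr
from scipy import stats


def hypothetical_skew(da, dims=None, *, bias=None, nan_policy=None):
    """Illustrative wrapper around scipy.stats.skew (not the xarray_einstats code)."""
    # 1. Only forward the arguments the user set, so scipy keeps its own defaults
    kwargs = {"bias": bias, "nan_policy": nan_policy}
    kwargs = {key: value for key, value in kwargs.items() if value is not None}
    # 3. dims may be a string or a sequence of strings; stack several into one
    if dims is None:
        dims = list(da.dims)
    elif isinstance(dims, str):
        dims = [dims]
    if len(dims) > 1:
        da = da.stack({"__stacked__": list(dims)})
        core_dim = "__stacked__"
    else:
        core_dim = dims[0]
    # 4. Call scipy; apply_ufunc moves the core dim to the end, hence axis=-1
    return xr.apply_ufunc(
        stats.skew, da, input_core_dims=[[core_dim]], kwargs={"axis": -1, **kwargs}
    )


rng = np.random.default_rng(1)
da = xr.DataArray(rng.normal(size=(2, 50, 3)), dims=["chain", "draw", "obs"])
print(hypothetical_skew(da, ["chain", "draw"]).dims)  # ('obs',)
```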

Steps 3 and 4 are generally done via {func}`xarray_einstats.stats._apply_reduce_func` and
{func}`xarray_einstats.stats._apply_nonreduce_func`. However, when necessary, they are
done manually (for now this only happens with {func}`~xarray_einstats.stats.median_abs_deviation`).
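For functions that do not reduce the dimension, the stacking also has to be undone after the scipy call. The helper below is a hypothetical sketch of what such a non-reducing helper could look like, using {func}`scipy.stats.zscore` as an example; `apply_nonreduce_sketch` and the `__stacked__` name are made up, and this is not the code of `_apply_nonreduce_func`.

```python
import numpy as np
import xarray as xr
from scipy import stats


def apply_nonreduce_sketch(func, da, dims, **kwargs):
    """Hypothetical helper: apply a scipy function that keeps its core dimension."""
    original_dims = da.dims
    dims = [dims] if isinstance(dims, str) else list(dims)
    stacked = len(dims) > 1
    if stacked:
        # Merge the requested dims into one, as scipy only takes a single integer axis
        da = da.stack({"__stacked__": dims})
        core_dim = "__stacked__"
    else:
        core_dim = dims[0]
    # apply_ufunc moves the core dim to the end and keeps it in the output
    out = xr.apply_ufunc(
        func,
        da,
        input_core_dims=[[core_dim]],
        output_core_dims=[[core_dim]],
        kwargs={"axis": -1, **kwargs},
    )
    if stacked:
        # Split the stacked dim back into the original ones
        out = out.unstack(core_dim)
    return out.transpose(*original_dims)


rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.normal(size=(2, 4, 3)),
    dims=["chain", "draw", "obs"],
    coords={"chain": [0, 1], "draw": np.arange(4), "obs": ["a", "b", "c"]},
)
z = apply_nonreduce_sketch(stats.zscore, da, ["chain", "draw"])
print(z.dims)  # ('chain', 'draw', 'obs')
```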