.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_plot_activity_recognition.py>` to download the full example code
    .. rst-class:: sphx-glr-example-title

    .. _sphx_glr_auto_examples_plot_activity_recognition.py:


Activity recognition from accelerometer data
============================================

This demo shows how the **sklearn-xarray** package works with the ``Pipeline``
and ``GridSearchCV`` classes from scikit-learn, providing a metadata-aware,
grid-searchable pipeline mechanism.

The package combines the metadata-handling capabilities of xarray with the
machine-learning framework of sklearn. It enables the user to apply
preprocessing steps group by group, use transformers that change the number
of samples, use metadata directly as labels for classification tasks and more.

The example performs activity recognition from raw accelerometer data with a
Gaussian naive Bayes classifier. It uses the
`WISDM`_ activity prediction dataset, which contains recordings of the
activities walking, jogging, walking upstairs, walking downstairs, sitting and
standing from 36 different subjects.

.. _WISDM: http://www.cis.fordham.edu/wisdm/dataset.php


.. code-block:: default


    from __future__ import print_function

    import numpy as np

    from sklearn_xarray import wrap, Target
    from sklearn_xarray.preprocessing import Splitter, Sanitizer, Featurizer
    from sklearn_xarray.model_selection import CrossValidatorWrapper
    from sklearn_xarray.datasets import load_wisdm_dataarray

    from sklearn.preprocessing import StandardScaler, LabelEncoder
    from sklearn.decomposition import PCA
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import GroupShuffleSplit, GridSearchCV
    from sklearn.pipeline import Pipeline

    import matplotlib.pyplot as plt


First, we load the dataset and plot an example of one subject performing
the 'Walking' activity.

.. tip::

    In the Jupyter notebook version, change the first cell to
    ``%matplotlib notebook`` to get an interactive plot that you can zoom and pan.


.. code-block:: default


    X = load_wisdm_dataarray()

    X_plot = X[np.logical_and(X.activity == "Walking", X.subject == 1)]
    X_plot = X_plot[:500] / 9.81
    X_plot["sample"] = (X_plot.sample - X_plot.sample[0]) / np.timedelta64(1, "s")

    f, axarr = plt.subplots(3, 1, sharex=True)

    axarr[0].plot(X_plot.sample, X_plot.sel(axis="x"), color="#1f77b4")
    axarr[0].set_title("Acceleration along x-axis")

    axarr[1].plot(X_plot.sample, X_plot.sel(axis="y"), color="#ff7f0e")
    axarr[1].set_ylabel("Acceleration [g]")
    axarr[1].set_title("Acceleration along y-axis")

    axarr[2].plot(X_plot.sample, X_plot.sel(axis="z"), color="#2ca02c")
    axarr[2].set_xlabel("Time [s]")
    axarr[2].set_title("Acceleration along z-axis")


.. image:: /auto_examples/images/sphx_glr_plot_activity_recognition_001.png
    :alt: Acceleration along x-axis, Acceleration along y-axis, Acceleration along z-axis
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none


    Text(0.5, 1.0, 'Acceleration along z-axis')


Then we define a pipeline with various preprocessing steps and a classifier.

The preprocessing consists of splitting the data into segments, removing
segments with ``nan`` values and standardizing. Since the accelerometer data is
three-dimensional but the standardizer and classifier expect a
one-dimensional feature vector, we have to vectorize the samples.

Finally, we use PCA and a naive Bayes classifier for classification.


.. code-block:: default


    pl = Pipeline(
        [
            (
                "splitter",
                Splitter(
                    groupby=["subject", "activity"],
                    new_dim="timepoint",
                    new_len=30,
                ),
            ),
            ("sanitizer", Sanitizer()),
            ("featurizer", Featurizer()),
            ("scaler", wrap(StandardScaler)),
            ("pca", wrap(PCA, reshapes="feature")),
            ("cls", wrap(GaussianNB, reshapes="feature")),
        ]
    )
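
To make the preprocessing concrete, the following plain-Python sketch mimics
the splitting, sanitizing and vectorizing steps on a tiny made-up recording.
It is an illustration only, not how ``Splitter``, ``Sanitizer`` or
``Featurizer`` are implemented: the helper names and sample values are
hypothetical, and the sketch assumes that samples left over after the last
full segment are simply dropped.

.. code-block:: python

    import math


    def split_into_segments(samples, new_len):
        """Cut samples into consecutive, non-overlapping segments.

        Assumption of this sketch: leftover samples that do not fill a
        whole segment are dropped.
        """
        n_segments = len(samples) // new_len
        return [samples[i * new_len:(i + 1) * new_len] for i in range(n_segments)]


    def has_nan(segment):
        """Return True if any axis value in the segment is NaN."""
        return any(math.isnan(value) for sample in segment for value in sample)


    def flatten(segment):
        """Turn a (timepoint, axis) segment into one flat feature vector."""
        return [value for sample in segment for value in sample]


    # Hypothetical recording: 7 samples with 3 axes (x, y, z) each.
    recording = [
        (0.1, 9.8, 0.3),
        (0.2, 9.7, 0.1),
        (0.0, 9.9, float("nan")),
        (0.3, 9.6, 0.2),
        (0.1, 9.8, 0.4),
        (0.2, 9.7, 0.0),
        (0.4, 9.5, 0.1),
    ]

    segments = split_into_segments(recording, new_len=2)  # 3 segments of 2 samples
    clean = [seg for seg in segments if not has_nan(seg)]  # drops the NaN segment
    features = [flatten(seg) for seg in clean]             # 2 vectors of length 6

In the pipeline above, the splitter is applied separately to each
subject/activity group via ``groupby``; the sketch handles a single such group.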


Since we want to use cross-validated grid search to find the best model
parameters, we define a cross-validator. To make sure the model performs
subject-independent recognition, we use a ``GroupShuffleSplit``
cross-validator, which ensures that the same subject does not appear in both
the training and the validation set.


.. code-block:: default


    cv = CrossValidatorWrapper(
        GroupShuffleSplit(n_splits=2, test_size=0.5), groupby=["subject"]
    )
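
The idea behind a group-wise split can be sketched in a few lines of plain
Python. This is a simplified illustration, not scikit-learn's implementation:
the function ``group_shuffle_split``, its rounding rule and the subject IDs
are assumptions made for the example.

.. code-block:: python

    import random


    def group_shuffle_split(groups, test_size=0.5, seed=0):
        """Split sample indices so that every group lands entirely on one side."""
        unique_groups = sorted(set(groups))
        rng = random.Random(seed)
        rng.shuffle(unique_groups)
        # Assumption of this sketch: round to the nearest number of groups,
        # but always keep at least one group for validation.
        n_test = max(1, int(round(test_size * len(unique_groups))))
        test_groups = set(unique_groups[:n_test])
        train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
        test_idx = [i for i, g in enumerate(groups) if g in test_groups]
        return train_idx, test_idx


    # Hypothetical subject IDs, one per segment.
    subjects = [1, 1, 2, 2, 2, 3, 4, 4]
    train_idx, test_idx = group_shuffle_split(subjects, test_size=0.5)

    train_subjects = {subjects[i] for i in train_idx}
    test_subjects = {subjects[i] for i in test_idx}

    # No subject contributes segments to both sides of the split.
    assert train_subjects.isdisjoint(test_subjects)

Because whole subjects are held out, the validation score estimates how well
the model generalizes to people it has not seen during training.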


The grid search will try different numbers of PCA components to find the best
parameters for this task.

.. tip::

    To use multi-processing, set ``n_jobs=-1``.


.. code-block:: default


    gs = GridSearchCV(
        pl, cv=cv, n_jobs=1, verbose=1, param_grid={"pca__n_components": [10, 20]}
    )
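
The amount of work the search performs follows from the grid and the
cross-validator. The sketch below enumerates the candidates by hand, assuming
the ``n_splits=2`` setting of the cross-validator defined earlier; it only
illustrates the counting and does not call scikit-learn.

.. code-block:: python

    from itertools import product

    param_grid = {"pca__n_components": [10, 20]}
    n_splits = 2  # taken from the GroupShuffleSplit configuration above

    # Every combination of parameter values is one candidate.
    names = sorted(param_grid)
    candidates = [
        dict(zip(names, values))
        for values in product(*(param_grid[name] for name in names))
    ]

    # Each candidate is fitted once per cross-validation split.
    n_fits = len(candidates) * n_splits

With two candidates and two splits this gives four fits, which matches the
count reported in the grid search output further down.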


The label to classify is the activity, which we convert to an integer
representation for the classification.


.. code-block:: default


    y = Target(
        coord="activity", transform_func=LabelEncoder().fit_transform, dim="sample"
    )(X)
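
What the label encoding does can be shown with a short plain-Python sketch.
Like ``LabelEncoder``, it assigns integer codes according to the sorted order
of the distinct labels; the helper ``encode_labels`` and the example labels
are hypothetical.

.. code-block:: python

    def encode_labels(labels):
        """Map each distinct label to an integer, in sorted label order."""
        classes = sorted(set(labels))
        index = {label: code for code, label in enumerate(classes)}
        return classes, [index[label] for label in labels]


    # Hypothetical activity labels, one per sample.
    activities = ["Walking", "Jogging", "Sitting", "Walking", "Standing"]
    classes, codes = encode_labels(activities)

    # classes: ['Jogging', 'Sitting', 'Standing', 'Walking']
    # codes:   [3, 0, 1, 3, 2]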


Finally, we run the grid search and print out the best parameter combination.


.. code-block:: default


    if __name__ == "__main__":  # in order for n_jobs=-1 to work on Windows
        gs.fit(X, y)
        print("Best parameters: {0}".format(gs.best_params_))
        print("Accuracy: {0}".format(gs.best_score_))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Fitting 2 folds for each of 2 candidates, totalling 4 fits
    [Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
    [Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:   12.2s finished
    Best parameters: {'pca__n_components': 10}
    Accuracy: 0.6746431870478216


.. note::

    The accuracy of this classifier (about 0.67) is rather low;
    the model was chosen for execution speed, not accuracy.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  17.490 seconds)


.. _sphx_glr_download_auto_examples_plot_activity_recognition.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_activity_recognition.py <plot_activity_recognition.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_activity_recognition.ipynb <plot_activity_recognition.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_