Preprocessing

The sklearn_xarray.preprocessing module contains various preprocessing methods that work on xarray DataArrays and Datasets.

class BaseTransformer[source]

Base class for transformers.

Methods

fit(self, X[, y])

Fit estimator to data.

fit_transform(self, X[, y])

Fit to data, then transform it.

get_params(self[, deep])

Get parameters for this estimator.

inverse_transform(self, X)

Reverse the transformation.

set_params(self, **params)

Set the parameters of this estimator.

transform(self, X)

Transform input data.

fit(self, X, y=None, **fit_params)[source]

Fit estimator to data.

Parameters
X : xarray DataArray or Dataset

Training set.

y : xarray DataArray or Dataset

Target values.

Returns
self

The estimator itself.

inverse_transform(self, X)[source]

Reverse the transformation.

Parameters
X : xarray DataArray or Dataset

The input data.

Returns
Xt : xarray DataArray or Dataset

The transformed data.

transform(self, X)[source]

Transform input data.

Parameters
X : xarray DataArray or Dataset

The input data.

Returns
Xt : xarray DataArray or Dataset

The transformed data.
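The fit / transform / inverse_transform contract described above can be illustrated with a toy transformer. This is only a sketch: the class name ShiftTransformer and its offset_ attribute are hypothetical, it works on plain Python lists rather than xarray objects, and it does not derive from BaseTransformer.

```python
class ShiftTransformer:
    """Hypothetical transformer, not part of sklearn_xarray."""

    def fit(self, X, y=None, **fit_params):
        # Learn the mean of the training data and return self,
        # so that calls can be chained.
        self.offset_ = sum(X) / len(X)
        return self

    def transform(self, X):
        # Subtract the learned offset from every value.
        return [x - self.offset_ for x in X]

    def inverse_transform(self, X):
        # Undo transform() by adding the offset back.
        return [x + self.offset_ for x in X]

    def fit_transform(self, X, y=None, **fit_params):
        # Fit to data, then transform it.
        return self.fit(X, y, **fit_params).transform(X)


data = [1.0, 2.0, 3.0]
est = ShiftTransformer()
Xt = est.fit_transform(data)
restored = est.inverse_transform(Xt)
```

Here inverse_transform applied to the output of transform gives back the original values, which is the round trip the method descriptions imply.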

class Concatenator(dim='feature', new_dim=None, variables=None, new_var='Feature', new_index_func=None, return_array=False, groupby=None, group_dim='sample')[source]

Concatenate variables along a dimension.

Parameters
dim : str

Name of the dimension along which to concatenate the Dataset.

new_dim : str

New name of the dimension, if desired.

variables : list or tuple

Names of the variables to concatenate, default all.

new_var : str

Name of the new variable created by the concatenation.

new_index_func : function

A function that takes the length of the concatenated dimension as a parameter and returns a vector of this length to be used as the index for that dimension.

return_array : bool

Whether to return a DataArray when a Dataset was passed.

groupby : str or list, optional

Name of coordinate or list of coordinates by which the groups are determined.

group_dim : str, optional

Name of dimension along which the groups are indexed.

Methods

fit(self, X[, y])

Fit estimator to data.

fit_transform(self, X[, y])

Fit to data, then transform it.

get_params(self[, deep])

Get parameters for this estimator.

inverse_transform(self, X)

Reverse the transformation.

set_params(self, **params)

Set the parameters of this estimator.

transform(self, X)

Transform input data.

fit(self, X, y=None, **fit_params)[source]

Fit estimator to data.

Parameters
X : xarray DataArray or Dataset

Training set.

y : xarray DataArray or Dataset

Target values.

Returns
self

The estimator itself.
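To make the effect of concatenating variables along a feature dimension concrete, here is a minimal sketch. It models a Dataset as a dict of (sample, feature) nested lists, which is an assumption made for illustration; the helper name concatenate_features is hypothetical and does not call the library.

```python
def concatenate_features(variables, new_index_func=None):
    """Join the feature columns of several variables, sample by sample."""
    names = list(variables)
    n_samples = len(variables[names[0]])
    rows = []
    for i in range(n_samples):
        row = []
        for name in names:
            # Append this variable's features for sample i.
            row.extend(variables[name][i])
        rows.append(row)
    n_features = len(rows[0]) if rows else 0
    # new_index_func receives the length of the concatenated dimension
    # and returns the index to use for that dimension.
    index = list(new_index_func(n_features)) if new_index_func else None
    return rows, index


ds = {"a": [[1, 2], [3, 4]], "b": [[5], [6]]}
combined, feature_index = concatenate_features(ds, new_index_func=range)
```

With two features in "a" and one in "b", each sample ends up with three features, and new_index_func is called with 3.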

class Featurizer(sample_dim='sample', feature_dim='feature', var_name='Features', order=None, return_array=False, groupby=None, group_dim='sample')[source]

Stack all dimensions and variables except for sample dimension.

Parameters
sample_dim : str

Name of the sample dimension.

feature_dim : str

Name of the feature dimension.

var_name : str

Name of the new variable (for Datasets).

order : list or tuple

Order of dimension stacking.

return_array : bool

Whether to return a DataArray when a Dataset was passed.

groupby : str or list, optional

Name of coordinate or list of coordinates by which the groups are determined.

group_dim : str, optional

Name of dimension along which the groups are indexed.

Methods

fit(self, X[, y])

Fit estimator to data.

fit_transform(self, X[, y])

Fit to data, then transform it.

get_params(self[, deep])

Get parameters for this estimator.

inverse_transform(self, X)

Reverse the transformation.

set_params(self, **params)

Set the parameters of this estimator.

transform(self, X)

Transform input data.
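Stacking every dimension except the sample dimension amounts to flattening each sample into a single feature vector. The sketch below shows this on nested Python lists; the helper names are hypothetical, the first axis is assumed to be the sample dimension, and the order parameter is not modelled.

```python
def flatten(item):
    """Recursively flatten nested lists into one flat list."""
    if isinstance(item, list):
        out = []
        for sub in item:
            out.extend(flatten(sub))
        return out
    return [item]


def featurize_samples(X):
    """Keep the first (sample) axis and flatten everything below it."""
    return [flatten(sample) for sample in X]


# Two samples, each of shape (2, 2).
X = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
Xt = featurize_samples(X)
```

The result has one row per sample and four features per row.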

class Reducer(dim='feature', func=<function norm>, groupby=None, group_dim='sample')[source]

Reduce data along some dimension.

Parameters
dim : str

Name of the dimension.

func : function

Reduction function.

groupby : str or list, optional

Name of coordinate or list of coordinates by which the groups are determined.

group_dim : str, optional

Name of dimension along which the groups are indexed.

Methods

fit(self, X[, y])

Fit estimator to data.

fit_transform(self, X[, y])

Fit to data, then transform it.

get_params(self[, deep])

Get parameters for this estimator.

inverse_transform(self, X)

Reverse the transformation.

set_params(self, **params)

Set the parameters of this estimator.

transform(self, X)

Transform input data.
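As an illustration of reducing along a feature dimension, the following sketch collapses each sample's feature vector to a single number. The signature only shows that the default func is a function named norm; the Euclidean norm used here is an assumption, and the helper names are hypothetical.

```python
import math


def euclidean_norm(values):
    """Euclidean (L2) norm of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values))


def reduce_features(X, func=euclidean_norm):
    """Apply func to every sample's feature vector."""
    return [func(row) for row in X]


X = [[3.0, 4.0], [6.0, 8.0]]
Xt = reduce_features(X)
```

The feature dimension disappears from the result, leaving one value per sample.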

class Resampler(freq=None, dim='sample', groupby=None, group_dim='sample')[source]

Resample along some dimension.

Parameters
freq : str

Frequency after resampling.

dim : str

Name of the dimension along which to resample.

groupby : str or list, optional

Name of coordinate or list of coordinates by which the groups are determined.

group_dim : str, optional

Name of dimension along which the groups are indexed.

Methods

fit(self, X[, y])

Fit the estimator.

fit_transform(self, X[, y])

Fit to data, then transform it.

get_params(self[, deep])

Get parameters for this estimator.

inverse_transform(self, X)

Reverse the transformation.

set_params(self, **params)

Set the parameters of this estimator.

transform(self, X)

Transform input data.

fit(self, X, y=None, **fit_params)[source]

Fit the estimator.

Parameters
X : xarray DataArray or Dataset

The input data.

y : None

For compatibility.

Returns
self

The estimator itself.

class Sanitizer(dim='sample', groupby=None, group_dim='sample')[source]

Remove elements containing NaNs.

Parameters
dim : str

Name of the sample dimension.

groupby : str or list, optional

Name of coordinate or list of coordinates by which the groups are determined.

group_dim : str, optional

Name of dimension along which the groups are indexed.

Methods

fit(self, X[, y])

Fit estimator to data.

fit_transform(self, X[, y])

Fit to data, then transform it.

get_params(self[, deep])

Get parameters for this estimator.

inverse_transform(self, X)

Reverse the transformation.

set_params(self, **params)

Set the parameters of this estimator.

transform(self, X)

Transform input data.
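Removing elements that contain NaNs can be pictured as dropping every sample with at least one missing value. This is a minimal sketch on (sample, feature) nested lists with a hypothetical helper name; it does not call the library.

```python
import math


def sanitize_samples(X):
    """Keep only the samples (rows) that contain no NaN."""
    return [row for row in X if not any(math.isnan(v) for v in row)]


X = [[1.0, 2.0], [float("nan"), 3.0], [4.0, 5.0]]
Xt = sanitize_samples(X)
```

Only the middle sample is dropped, so the sample dimension shrinks from three to two.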

class Segmenter(dim='sample', new_dim=None, new_len=None, step=None, axis=None, reduce_index='subsample', new_index_func=<built-in function arange>, keep_coords_as=None, groupby=None, group_dim='sample', return_view=False)[source]

Split into segments along some dimension.

Parameters
dim : str

Name of the dimension along which to split.

new_dim : str

Name of the newly added dimension.

new_len : int

Length of the newly added dimension.

step : int

Number of values between the start of a segment and the next one.

axis : int

Axis position where the new dimension is to be inserted. If None, the dimension will be inserted at the end.

reduce_index : str

How to reduce the index of the split dimension.

  • 'head' : Take the first n values where n is the length of the dimension after segmenting.

  • 'subsample' : Take the values corresponding to the first element of every segment.

new_index_func : callable

A function that takes new_len as a parameter and returns a vector of length new_len to be used as the indices for the new dimension.

keep_coords_as : str or None

If set, the coordinate of the split dimension will be kept as a separate coordinate with this name. This allows inverse_transform to reconstruct the original coordinate.

return_view : bool, default False

If True, return a view instead of a copy of the segmented array.

groupby : str or list, optional

Name of coordinate or list of coordinates by which the groups are determined.

group_dim : str, optional

Name of dimension along which the groups are indexed.

Methods

fit(self, X[, y])

Fit estimator to data.

fit_transform(self, X[, y])

Fit to data, then transform it.

get_params(self[, deep])

Get parameters for this estimator.

inverse_transform(self, X)

Reverse the transformation.

set_params(self, **params)

Set the parameters of this estimator.

transform(self, X)

Transform input data.
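The interplay of new_len, step and reduce_index is easiest to see on a one-dimensional example. The sketch below builds sliding windows from a plain list and reduces the index in the two documented ways. The helper segment_1d is hypothetical, and dropping an incomplete trailing window is an assumption rather than documented behavior.

```python
def segment_1d(values, index, new_len, step, reduce_index="subsample"):
    """Cut values into windows of length new_len that start every step values."""
    # Assumption: a trailing window shorter than new_len is dropped.
    n_segments = max(0, (len(values) - new_len) // step + 1)
    starts = [i * step for i in range(n_segments)]
    segments = [values[s:s + new_len] for s in starts]
    if reduce_index == "subsample":
        # Index value of the first element of every segment.
        new_index = [index[s] for s in starts]
    elif reduce_index == "head":
        # First n index values, n being the number of segments.
        new_index = list(index[:n_segments])
    else:
        raise ValueError("reduce_index must be 'head' or 'subsample'")
    return segments, new_index


values = [10, 11, 12, 13, 14, 15, 16]
index = list("abcdefg")
segments, sub_index = segment_1d(values, index, new_len=3, step=2)
_, head_index = segment_1d(values, index, new_len=3, step=2, reduce_index="head")
```

With step smaller than new_len the windows overlap: 12 and 14 each appear in two segments.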

class Selector(dim='sample', coord=None, groupby=None, group_dim='sample')[source]

Selects a subset of the samples.

Parameters
dim : str

Name of the sample dimension.

coord : str

The name of the coordinate that acts as the selector.

groupby : str or list, optional

Name of coordinate or list of coordinates by which the groups are determined.

group_dim : str, optional

Name of dimension along which the groups are indexed.

Methods

fit(self, X[, y])

Fit estimator to data.

fit_transform(self, X[, y])

Fit to data, then transform it.

get_params(self[, deep])

Get parameters for this estimator.

inverse_transform(self, X)

Reverse the transformation.

set_params(self, **params)

Set the parameters of this estimator.

transform(self, X)

Transform input data.
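The description does not say what values the selector coordinate holds. One plausible reading, assumed here, is one boolean flag per sample, with True meaning that the sample is kept. The sketch below illustrates that reading only; select_samples is a hypothetical helper, not the library's implementation.

```python
def select_samples(X, selector):
    """Keep the samples whose selector flag is truthy."""
    # Assumption: selector has one boolean entry per sample.
    return [row for row, keep in zip(X, selector) if keep]


X = [[1, 2], [3, 4], [5, 6]]
flags = [True, False, True]
Xt = select_samples(X, flags)
```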

class Splitter(dim='sample', new_dim=None, new_len=None, axis=None, reduce_index='subsample', new_index_func=<built-in function arange>, keep_coords_as=None, groupby=None, group_dim='sample')[source]

Split along some dimension.

Parameters
dim : str

Name of the dimension along which to split.

new_dim : str

Name of the newly added dimension.

new_len : int

Length of the newly added dimension.

axis : int

Axis position where the new dimension is to be inserted. If None, the dimension will be inserted at the end.

reduce_index : str

How to reduce the index of the split dimension.

  • 'head' : Take the first n values where n is the length of the dimension after splitting.

  • 'subsample' : Take every new_len-th value.

new_index_func : callable

A function that takes new_len as a parameter and returns a vector of length new_len to be used as the indices for the new dimension.

keep_coords_as : str or None

If set, the coordinate of the split dimension will be kept as a separate coordinate with this name. This allows inverse_transform to reconstruct the original coordinate.

groupby : str or list, optional

Name of coordinate or list of coordinates by which the groups are determined.

group_dim : str, optional

Name of dimension along which the groups are indexed.

Methods

fit(self, X[, y])

Fit estimator to data.

fit_transform(self, X[, y])

Fit to data, then transform it.

get_params(self[, deep])

Get parameters for this estimator.

inverse_transform(self, X)

Reverse the transformation.

set_params(self, **params)

Set the parameters of this estimator.

transform(self, X)

Transform input data.
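Unlike segmenting, splitting has no step parameter, so the chunks do not overlap. The following sketch shows the idea on a plain list together with both reduce_index options. The helper split_1d is hypothetical, and discarding trailing values that do not fill a whole chunk is an assumption.

```python
def split_1d(values, index, new_len, reduce_index="subsample"):
    """Cut values into consecutive, non-overlapping chunks of length new_len."""
    # Assumption: leftover values that do not fill a chunk are discarded.
    n_chunks = len(values) // new_len
    chunks = [values[i * new_len:(i + 1) * new_len] for i in range(n_chunks)]
    if reduce_index == "subsample":
        # Every new_len-th index value.
        new_index = list(index[::new_len][:n_chunks])
    elif reduce_index == "head":
        # First n index values, n being the number of chunks.
        new_index = list(index[:n_chunks])
    else:
        raise ValueError("reduce_index must be 'head' or 'subsample'")
    return chunks, new_index


values = [0, 1, 2, 3, 4, 5]
index = list("abcdef")
chunks, sub_index = split_1d(values, index, new_len=2)
_, head_index = split_1d(values, index, new_len=2, reduce_index="head")
```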

class Transposer(order=None, groupby=None, group_dim='sample')[source]

Reorder data dimensions.

Parameters
order : list or tuple

The new order of the dimensions.

groupby : str or list, optional

Name of coordinate or list of coordinates by which the groups are determined.

group_dim : str, optional

Name of dimension along which the groups are indexed.

Methods

fit(self, X[, y])

Fit the estimator.

fit_transform(self, X[, y])

Fit to data, then transform it.

get_params(self[, deep])

Get parameters for this estimator.

inverse_transform(self, X)

Reverse the transformation.

set_params(self, **params)

Set the parameters of this estimator.

transform(self, X)

Transform input data.

fit(self, X, y=None, **fit_params)[source]

Fit the estimator.

Parameters
X : xarray DataArray or Dataset

The input data.

y : None

For compatibility.

Returns
self

The estimator itself.
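Reordering dimensions by name can be sketched for the two-dimensional case as follows. The helper transpose_2d is hypothetical and operates on nested lists with an explicit tuple of dimension names; it only illustrates what order expresses.

```python
def transpose_2d(rows, dims, order):
    """Reorder the two named dimensions of a nested list."""
    if sorted(order) != sorted(dims):
        raise ValueError("order must contain exactly the names in dims")
    if tuple(order) == tuple(dims):
        # Same order: return a copy.
        return [list(r) for r in rows]
    # Swapped order: rows become columns.
    return [list(col) for col in zip(*rows)]


X = [[1, 2, 3], [4, 5, 6]]
dims = ("sample", "feature")
order = ("feature", "sample")
Xt = transpose_2d(X, dims, order)
# Transposing back with the original order restores the input.
restored = transpose_2d(Xt, order, dims)
```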

concatenate(X, return_estimator=False, **fit_params)[source]

Concatenates variables along a dimension.

Parameters
X : xarray DataArray or Dataset

The input data.

return_estimator : bool

Whether to return the fitted estimator along with the transformed data.

Returns
Xt : xarray DataArray or Dataset

The transformed data.
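The functional wrappers in this module share the return_estimator flag. The pattern can be sketched with a stand-in estimator. Everything here is hypothetical: the Doubler class, the double function, the forwarding of **fit_params to the estimator's constructor, and the (data, estimator) ordering of the returned pair are assumptions for illustration, not confirmed library behavior.

```python
class Doubler:
    """Hypothetical estimator standing in for a preprocessing class."""

    def __init__(self, factor=2):
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [self.factor * x for x in X]


def double(X, return_estimator=False, **fit_params):
    # Assumption: keyword arguments configure the estimator.
    estimator = Doubler(**fit_params)
    Xt = estimator.fit(X).transform(X)
    if return_estimator:
        # Assumption: transformed data first, fitted estimator second.
        return Xt, estimator
    return Xt


only_data = double([1, 2, 3])
data, est = double([1, 2, 3], return_estimator=True, factor=3)
```

Getting the fitted estimator back is useful when the same transformation, or its inverse, has to be applied to other data later.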

featurize(X, return_estimator=False, **fit_params)[source]

Stacks all dimensions and variables except for sample dimension.

Parameters
X : xarray DataArray or Dataset

The input data.

return_estimator : bool

Whether to return the fitted estimator along with the transformed data.

Returns
Xt : xarray DataArray or Dataset

The transformed data.

preprocess(X, function, groupby=None, group_dim='sample', **fit_params)[source]

Wraps preprocessing functions from sklearn for use with xarray types.

Parameters
X : xarray DataArray or Dataset

The input data.

function : callable

The function to apply to the data. Note that this function cannot change the shape of the data.

groupby : str or list, optional

Name of coordinate or list of coordinates by which the groups are determined.

group_dim : str, optional

Name of dimension along which the groups are indexed.

Returns
Xt : xarray DataArray or Dataset

The transformed data.
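The groupby option and the shape restriction on function can be illustrated with a small sketch: the function is applied to each group separately and the results are written back to the original positions, which only works if the function preserves the length of its input. The helpers preprocess_grouped and center are hypothetical and operate on plain lists rather than xarray objects.

```python
def preprocess_grouped(values, groups, function):
    """Apply function to each group and keep the original ordering."""
    result = [None] * len(values)
    # dict.fromkeys gives the unique labels in first-seen order.
    for label in dict.fromkeys(groups):
        positions = [i for i, g in enumerate(groups) if g == label]
        transformed = function([values[i] for i in positions])
        if len(transformed) != len(positions):
            raise ValueError("function must not change the shape of the data")
        for i, v in zip(positions, transformed):
            result[i] = v
    return result


def center(block):
    """Subtract the mean of the block from each of its values."""
    mean = sum(block) / len(block)
    return [v - mean for v in block]


values = [1.0, 3.0, 10.0, 20.0]
groups = ["a", "a", "b", "b"]
centered = preprocess_grouped(values, groups, center)
```

Each group is centered around its own mean instead of the global mean.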

reduce(X, return_estimator=False, **fit_params)[source]

Reduces data along some dimension.

Parameters
X : xarray DataArray or Dataset

The input data.

return_estimator : bool

Whether to return the fitted estimator along with the transformed data.

Returns
Xt : xarray DataArray or Dataset

The transformed data.

resample(X, return_estimator=False, **fit_params)[source]

Resamples along some dimension.

Parameters
X : xarray DataArray or Dataset

The input data.

return_estimator : bool

Whether to return the fitted estimator along with the transformed data.

Returns
Xt : xarray DataArray or Dataset

The transformed data.

sanitize(X, return_estimator=False, **fit_params)[source]

Removes elements containing NaNs.

Parameters
X : xarray DataArray or Dataset

The input data.

return_estimator : bool

Whether to return the fitted estimator along with the transformed data.

Returns
Xt : xarray DataArray or Dataset

The transformed data.

segment(X, return_estimator=False, **fit_params)[source]

Segments X along some dimension.

Parameters
X : xarray DataArray or Dataset

The input data.

return_estimator : bool

Whether to return the fitted estimator along with the transformed data.

Returns
Xt : xarray DataArray or Dataset

The transformed data.

select(X, return_estimator=False, **fit_params)[source]

Selects a subset of the samples.

Parameters
X : xarray DataArray or Dataset

The input data.

return_estimator : bool

Whether to return the fitted estimator along with the transformed data.

Returns
Xt : xarray DataArray or Dataset

The transformed data.

split(X, return_estimator=False, **fit_params)[source]

Splits X along some dimension.

Parameters
X : xarray DataArray or Dataset

The input data.

return_estimator : bool

Whether to return the fitted estimator along with the transformed data.

Returns
Xt : xarray DataArray or Dataset

The transformed data.

transpose(X, return_estimator=False, **fit_params)[source]

Reorders data dimensions.

Parameters
X : xarray DataArray or Dataset

The input data.

return_estimator : bool

Whether to return the fitted estimator along with the transformed data.

Returns
Xt : xarray DataArray or Dataset

The transformed data.