Preprocessing
The sklearn_xarray.preprocessing module contains various preprocessing methods that work on xarray DataArrays and Datasets.

class BaseTransformer
Base class for transformers.
Methods
- fit(self, X[, y]): Fit estimator to data.
- fit_transform(self, X[, y]): Fit to data, then transform it.
- get_params(self[, deep]): Get parameters for this estimator.
- inverse_transform(self, X): Reverse the transformation.
- set_params(self, **params): Set the parameters of this estimator.
- transform(self, X): Transform input data.

fit(self, X, y=None, **fit_params)
Fit estimator to data.
- Parameters
- X : xarray DataArray or Dataset
Training set.
- y : xarray DataArray or Dataset, optional
Target values.
- Returns
- self
The estimator itself.

class Concatenator(dim='feature', new_dim=None, variables=None, new_var='Feature', new_index_func=None, return_array=False, groupby=None, group_dim='sample')
Concatenate variables along a dimension.
- Parameters
- dim : str
Name of the dimension along which to concatenate the Dataset.
- new_dim : str
New name for the concatenated dimension, if desired.
- variables : list or tuple
Names of the variables to concatenate. By default, all variables are concatenated.
- new_var : str
Name of the new variable created by the concatenation.
- new_index_func : function
A function that takes the length of the concatenated dimension as a parameter and returns a vector of this length to be used as the index for that dimension.
- return_array : bool
Whether to return a DataArray when a Dataset was passed.
- groupby : str or list, optional
Name of the coordinate, or list of coordinates, by which the groups are determined.
- group_dim : str, optional
Name of the dimension along which the groups are indexed.
Methods
- fit(self, X[, y]): Fit estimator to data.
- fit_transform(self, X[, y]): Fit to data, then transform it.
- get_params(self[, deep]): Get parameters for this estimator.
- inverse_transform(self, X): Reverse the transformation.
- set_params(self, **params): Set the parameters of this estimator.
- transform(self, X): Transform input data.
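The core of the concatenation can be pictured in plain NumPy. This is a sketch, not the library's implementation: it assumes a Dataset is modelled as a dict of arrays that share the sample axis (axis 0) and are joined along the feature axis (axis 1). The helper `concatenate_variables` is hypothetical, and `new_var`, `new_dim` and `new_index_func` are not modelled.

```python
import numpy as np


def concatenate_variables(variables, names=None):
    """Join the arrays in `variables` along axis 1 (the feature axis).

    `variables` stands in for a Dataset's data variables and `names` for
    the `variables` parameter; by default all arrays are used, in order.
    """
    if names is None:
        names = list(variables)
    return np.concatenate([variables[name] for name in names], axis=1)


# Hypothetical Dataset with two variables sharing 4 samples
ds = {
    "a": np.zeros((4, 2)),
    "b": np.ones((4, 3)),
}
X = concatenate_variables(ds)
print(X.shape)  # (4, 5)
```

The feature lengths of the individual variables add up (2 + 3 = 5), while the sample axis is left unchanged.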

class Featurizer(sample_dim='sample', feature_dim='feature', var_name='Features', order=None, return_array=False, groupby=None, group_dim='sample')
Stack all dimensions and variables except for the sample dimension.
- Parameters
- sample_dim : str
Name of the sample dimension.
- feature_dim : str
Name of the feature dimension.
- var_name : str
Name of the new variable (for Datasets).
- order : list or tuple
Order of dimension stacking.
- return_array : bool
Whether to return a DataArray when a Dataset was passed.
- groupby : str or list, optional
Name of the coordinate, or list of coordinates, by which the groups are determined.
- group_dim : str, optional
Name of the dimension along which the groups are indexed.
Methods
- fit(self, X[, y]): Fit estimator to data.
- fit_transform(self, X[, y]): Fit to data, then transform it.
- get_params(self[, deep]): Get parameters for this estimator.
- inverse_transform(self, X): Reverse the transformation.
- set_params(self, **params): Set the parameters of this estimator.
- transform(self, X): Transform input data.
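Stacking every dimension except the sample dimension amounts to flattening the remaining axes into a single feature axis. Below is a minimal NumPy sketch of that idea. The helper `featurize_array` is hypothetical, it works on integer axes rather than dimension names, and it flattens in C order instead of honouring the `order` parameter.

```python
import numpy as np


def featurize_array(x, sample_axis=0):
    """Flatten every axis except `sample_axis` into one feature axis."""
    # Bring the sample axis to the front, then merge the remaining axes.
    x = np.moveaxis(x, sample_axis, 0)
    return x.reshape(x.shape[0], -1)


x = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # (sample, time, channel)
Xt = featurize_array(x)
print(Xt.shape)  # (2, 12)
```

The result has the (sample, feature) layout that scikit-learn estimators expect.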

class Reducer(dim='feature', func=<function norm>, groupby=None, group_dim='sample')
Reduce data along some dimension.
- Parameters
- dim : str
Name of the dimension.
- func : function
Reduction function.
- groupby : str or list, optional
Name of the coordinate, or list of coordinates, by which the groups are determined.
- group_dim : str, optional
Name of the dimension along which the groups are indexed.
Methods
- fit(self, X[, y]): Fit estimator to data.
- fit_transform(self, X[, y]): Fit to data, then transform it.
- get_params(self[, deep]): Get parameters for this estimator.
- inverse_transform(self, X): Reverse the transformation.
- set_params(self, **params): Set the parameters of this estimator.
- transform(self, X): Transform input data.
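The signature lists a norm function as the default `func`. The sketch below assumes that this is a Euclidean norm such as `numpy.linalg.norm`, applied along the feature axis, and shows the resulting reduction on a plain NumPy array rather than through the Reducer class.

```python
import numpy as np

# Shape (sample, feature): three samples with two features each
x = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [6.0, 8.0]])

# Collapse the feature axis to one value per sample (Euclidean norm),
# analogous to reducing over dim='feature'
Xt = np.linalg.norm(x, axis=1)
print(Xt.tolist())  # [5.0, 0.0, 10.0]
```

The reduced dimension disappears, so each sample ends up as a single value.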

class Resampler(freq=None, dim='sample', groupby=None, group_dim='sample')
Resample along some dimension.
- Parameters
- freq : str
Frequency after resampling.
- dim : str
Name of the dimension along which to resample.
- groupby : str or list, optional
Name of the coordinate, or list of coordinates, by which the groups are determined.
- group_dim : str, optional
Name of the dimension along which the groups are indexed.
Methods
- fit(self, X[, y]): Fit the estimator.
- fit_transform(self, X[, y]): Fit to data, then transform it.
- get_params(self[, deep]): Get parameters for this estimator.
- inverse_transform(self, X): Reverse the transformation.
- set_params(self, **params): Set the parameters of this estimator.
- transform(self, X): Transform input data.
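Downsampling to a coarser frequency can be pictured as binning samples by timestamp and aggregating each bin. The following is only a sketch under stated assumptions. It uses integer timestamps in seconds and a numeric bin width standing in for the `freq` string, and it aggregates with the mean. This reference does not state which aggregation Resampler actually uses, and the helper `downsample_mean` is hypothetical.

```python
import numpy as np


def downsample_mean(t, x, bin_width):
    """Average the rows of `x` whose timestamps `t` fall into the same bin.

    t: 1-D integer timestamps (e.g. seconds), sorted ascending
    x: array with the sample axis first
    bin_width: bin width in the units of `t` (assumed stand-in for `freq`)
    """
    bins = t // bin_width            # bin label of each sample
    labels = np.unique(bins)         # sorted, unique bin labels
    out = np.stack([x[bins == b].mean(axis=0) for b in labels])
    return labels * bin_width, out   # bin start times, aggregated rows


t = np.arange(6)                              # samples at 0..5 s
x = np.arange(12, dtype=float).reshape(6, 2)  # (sample, feature)
t_new, x_new = downsample_mean(t, x, bin_width=2)
print(t_new.tolist())  # [0, 2, 4]
print(x_new.tolist())  # [[1.0, 2.0], [5.0, 6.0], [9.0, 10.0]]
```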

class Sanitizer(dim='sample', groupby=None, group_dim='sample')
Remove elements containing NaNs.
- Parameters
- dim : str
Name of the sample dimension.
- groupby : str or list, optional
Name of the coordinate, or list of coordinates, by which the groups are determined.
- group_dim : str, optional
Name of the dimension along which the groups are indexed.
Methods
- fit(self, X[, y]): Fit estimator to data.
- fit_transform(self, X[, y]): Fit to data, then transform it.
- get_params(self[, deep]): Get parameters for this estimator.
- inverse_transform(self, X): Reverse the transformation.
- set_params(self, **params): Set the parameters of this estimator.
- transform(self, X): Transform input data.
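The following NumPy sketch assumes that an element along the sample dimension is dropped when any of its values is NaN. The helper `drop_nan_samples` is hypothetical and uses an integer axis in place of the `dim` name.

```python
import numpy as np


def drop_nan_samples(x, sample_axis=0):
    """Keep only the samples (along `sample_axis`) that contain no NaN."""
    other_axes = tuple(i for i in range(x.ndim) if i != sample_axis)
    # True for every sample that has at least one NaN in any other axis
    has_nan = np.isnan(x).any(axis=other_axes)
    return np.compress(~has_nan, x, axis=sample_axis)


x = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, 5.0]])
print(drop_nan_samples(x).tolist())  # [[1.0, 2.0], [4.0, 5.0]]
```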

class Segmenter(dim='sample', new_dim=None, new_len=None, step=None, axis=None, reduce_index='subsample', new_index_func=<built-in function arange>, keep_coords_as=None, groupby=None, group_dim='sample', return_view=False)
Split into segments along some dimension.
- Parameters
- dim : str
Name of the dimension along which to split.
- new_dim : str
Name of the newly added dimension.
- new_len : int
Length of the newly added dimension.
- step : int
Number of values between the start of a segment and the start of the next one.
- axis : int
Axis position where the new dimension is to be inserted. If None, the dimension is inserted at the end.
- reduce_index : str
How to reduce the index of the split dimension:
  - 'head': Take the first n values, where n is the length of the dimension after segmenting.
  - 'subsample': Take the values corresponding to the first element of every segment.
- new_index_func : callable
A function that takes new_len as a parameter and returns a vector of length new_len to be used as the indices for the new dimension.
- keep_coords_as : str or None
If set, the coordinate of the split dimension is kept as a separate coordinate with this name. This allows inverse_transform to reconstruct the original coordinate.
- return_view : bool, default False
If True, return a view instead of a copy of the segmented array.
- groupby : str or list, optional
Name of the coordinate, or list of coordinates, by which the groups are determined.
- group_dim : str, optional
Name of the dimension along which the groups are indexed.
Methods
- fit(self, X[, y]): Fit estimator to data.
- fit_transform(self, X[, y]): Fit to data, then transform it.
- get_params(self[, deep]): Get parameters for this estimator.
- inverse_transform(self, X): Reverse the transformation.
- set_params(self, **params): Set the parameters of this estimator.
- transform(self, X): Transform input data.
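Segmenting with `new_len` and `step` produces overlapping windows whenever `step < new_len`. The NumPy sketch below builds such windows with `sliding_window_view` (NumPy 1.20 or later) and appends the window axis at the end, matching the documented `axis=None` behaviour. It also shows the 'subsample' index reduction. The helper `segment_array_sketch` is hypothetical and is not the library's code. Like `return_view=True`, it returns a (read-only) view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def segment_array_sketch(x, new_len, step, axis=0):
    """Cut `x` into windows of length `new_len` along `axis`, `step` apart."""
    # All windows of length new_len; the window axis is appended last.
    windows = sliding_window_view(x, new_len, axis=axis)
    # Keep only every `step`-th window start along the segmented axis.
    index = [slice(None)] * windows.ndim
    index[axis] = slice(None, None, step)
    return windows[tuple(index)]


x = np.arange(10).reshape(10, 1)   # (sample, feature)
segs = segment_array_sketch(x, new_len=4, step=2)
print(segs.shape)                  # (4, 1, 4)
print(segs[:, 0, :].tolist())
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]

# 'subsample' index: the original index value at the start of each segment
sample_index = np.arange(10)[::2][: segs.shape[0]]
print(sample_index.tolist())       # [0, 2, 4, 6]
```

With n samples, the number of segments is (n - new_len) // step + 1, which gives 4 for n = 10, new_len = 4 and step = 2.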

class Selector(dim='sample', coord=None, groupby=None, group_dim='sample')
Selects a subset of the samples.
- Parameters
- dim : str
Name of the sample dimension.
- coord : str
The name of the coordinate that acts as the selector.
- groupby : str or list, optional
Name of the coordinate, or list of coordinates, by which the groups are determined.
- group_dim : str, optional
Name of the dimension along which the groups are indexed.
Methods
- fit(self, X[, y]): Fit estimator to data.
- fit_transform(self, X[, y]): Fit to data, then transform it.
- get_params(self[, deep]): Get parameters for this estimator.
- inverse_transform(self, X): Reverse the transformation.
- set_params(self, **params): Set the parameters of this estimator.
- transform(self, X): Transform input data.
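The following sketch shows coordinate-based selection, assuming the selector coordinate holds one boolean flag per sample. This reference does not specify the coordinate's type. Here the coordinates are a plain dict, and both the helper `select_samples` and the "train" coordinate are hypothetical.

```python
import numpy as np


def select_samples(x, coords, coord):
    """Keep the samples (rows of `x`) flagged True in coords[coord]."""
    mask = np.asarray(coords[coord], dtype=bool)
    if mask.shape != (x.shape[0],):
        raise ValueError("the selector coordinate needs one entry per sample")
    return x[mask]


x = np.arange(8).reshape(4, 2)                  # (sample, feature)
coords = {"train": [True, False, True, False]}  # hypothetical coordinate
Xt = select_samples(x, coords, "train")
print(Xt.tolist())  # [[0, 1], [4, 5]]
```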

class Splitter(dim='sample', new_dim=None, new_len=None, axis=None, reduce_index='subsample', new_index_func=<built-in function arange>, keep_coords_as=None, groupby=None, group_dim='sample')
Split along some dimension.
- Parameters
- dim : str
Name of the dimension along which to split.
- new_dim : str
Name of the newly added dimension.
- new_len : int
Length of the newly added dimension.
- axis : int
Axis position where the new dimension is to be inserted. If None, the dimension is inserted at the end.
- reduce_index : str
How to reduce the index of the split dimension:
  - 'head': Take the first n values, where n is the length of the dimension after splitting.
  - 'subsample': Take every new_len-th value.
- new_index_func : callable
A function that takes new_len as a parameter and returns a vector of length new_len to be used as the indices for the new dimension.
- keep_coords_as : str or None
If set, the coordinate of the split dimension is kept as a separate coordinate with this name. This allows inverse_transform to reconstruct the original coordinate.
- groupby : str or list, optional
Name of the coordinate, or list of coordinates, by which the groups are determined.
- group_dim : str, optional
Name of the dimension along which the groups are indexed.
Methods
- fit(self, X[, y]): Fit estimator to data.
- fit_transform(self, X[, y]): Fit to data, then transform it.
- get_params(self[, deep]): Get parameters for this estimator.
- inverse_transform(self, X): Reverse the transformation.
- set_params(self, **params): Set the parameters of this estimator.
- transform(self, X): Transform input data.
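Unlike segmenting, splitting cuts the dimension into non-overlapping chunks of length `new_len`. The NumPy sketch below illustrates this with a reshape and moves the new axis to the end, following the documented `axis=None` behaviour. It also shows the 'subsample' index, which takes every `new_len`-th value. How the library handles a length that is not a multiple of `new_len` is not described here. The hypothetical helper `split_array_sketch` simply raises in that case.

```python
import numpy as np


def split_array_sketch(x, new_len):
    """Split axis 0 of `x` into non-overlapping chunks of length `new_len`.

    The new axis of length `new_len` is moved to the end. Assumes the
    length of axis 0 is a multiple of `new_len`.
    """
    n = x.shape[0]
    if n % new_len != 0:
        raise ValueError("axis 0 length must be a multiple of new_len")
    chunks = x.reshape(n // new_len, new_len, *x.shape[1:])
    return np.moveaxis(chunks, 1, -1)


t = np.arange(6)                  # sample coordinate
x = np.arange(12).reshape(6, 2)   # (sample, feature)
Xt = split_array_sketch(x, new_len=3)
print(Xt.shape)                   # (2, 2, 3)
print(Xt[0, 0].tolist())          # [0, 2, 4]
print(t[::3].tolist())            # [0, 3]  ('subsample' index)
```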

class Transposer(order=None, groupby=None, group_dim='sample')
Reorder data dimensions.
- Parameters
- order : list or tuple
The new order of the dimensions.
- groupby : str or list, optional
Name of the coordinate, or list of coordinates, by which the groups are determined.
- group_dim : str, optional
Name of the dimension along which the groups are indexed.
Methods
- fit(self, X[, y]): Fit the estimator.
- fit_transform(self, X[, y]): Fit to data, then transform it.
- get_params(self[, deep]): Get parameters for this estimator.
- inverse_transform(self, X): Reverse the transformation.
- set_params(self, **params): Set the parameters of this estimator.
- transform(self, X): Transform input data.
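Reordering is invertible: applying the inverse permutation restores the original layout, which is what an inverse_transform of a transposition needs to do. The NumPy sketch below uses integer axes in place of dimension names. For a DataArray, xarray's `DataArray.transpose(*dims)` accepts dimension names directly. Computing the inverse with `np.argsort` is one common approach, not necessarily the one the library uses.

```python
import numpy as np

order = (2, 0, 1)                 # new position -> old axis
x = np.arange(24).reshape(2, 3, 4)

xt = np.transpose(x, order)
print(xt.shape)                   # (4, 2, 3)

# argsort of a permutation yields its inverse permutation
inverse = np.argsort(order)
restored = np.transpose(xt, inverse)
print(restored.shape)             # (2, 3, 4)
```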

concatenate(X, return_estimator=False, **fit_params)
Concatenates variables along a dimension.
- Parameters
- X : xarray DataArray or Dataset
The input data.
- return_estimator : bool
Whether to return the fitted estimator along with the transformed data.
- Returns
- Xt : xarray DataArray or Dataset
The transformed data.

featurize(X, return_estimator=False, **fit_params)
Stacks all dimensions and variables except for the sample dimension.
- Parameters
- X : xarray DataArray or Dataset
The input data.
- return_estimator : bool
Whether to return the fitted estimator along with the transformed data.
- Returns
- Xt : xarray DataArray or Dataset
The transformed data.

preprocess(X, function, groupby=None, group_dim='sample', **fit_params)
Wraps preprocessing functions from sklearn for use with xarray types.
- Parameters
- X : xarray DataArray or Dataset
The input data.
- function : callable
The function to apply to the data. The function must not change the shape of the data.
- groupby : str or list, optional
Name of the coordinate, or list of coordinates, by which the groups are determined.
- group_dim : str, optional
Name of the dimension along which the groups are indexed.
- Returns
- Xt : xarray DataArray or Dataset
The transformed data.
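The effect of `groupby` can be pictured as applying `function` separately to each group of samples and writing each result back in place. This only works because `function` must not change the shape of the data. The following NumPy sketch rests on a few assumptions. Groups are given by one label per sample along axis 0, standing in for `group_dim`. The helpers `apply_by_group` and `standardize` are hypothetical. `standardize` computes the same column-wise standardization as `sklearn.preprocessing.scale`.

```python
import numpy as np


def apply_by_group(x, labels, function):
    """Apply `function` to each group of rows of `x` separately.

    `labels` has one entry per row (the group of that sample). Because
    `function` preserves shape, each result fits back into its rows.
    """
    out = np.empty_like(x, dtype=float)
    for label in np.unique(labels):
        rows = labels == label
        out[rows] = function(x[rows])
    return out


def standardize(a):
    # Hypothetical example function: zero mean, unit variance per column
    return (a - a.mean(axis=0)) / a.std(axis=0)


x = np.array([[1.0], [3.0], [10.0], [20.0]])
labels = np.array(["a", "a", "b", "b"])
result = apply_by_group(x, labels, standardize)
print(result.ravel().tolist())  # [-1.0, 1.0, -1.0, 1.0]
```

Without grouping, the same function would standardize all four samples together, and the two groups would no longer be centred independently.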

reduce(X, return_estimator=False, **fit_params)
Reduces data along some dimension.
- Parameters
- X : xarray DataArray or Dataset
The input data.
- return_estimator : bool
Whether to return the fitted estimator along with the transformed data.
- Returns
- Xt : xarray DataArray or Dataset
The transformed data.

resample(X, return_estimator=False, **fit_params)
Resamples along some dimension.
- Parameters
- X : xarray DataArray or Dataset
The input data.
- return_estimator : bool
Whether to return the fitted estimator along with the transformed data.
- Returns
- Xt : xarray DataArray or Dataset
The transformed data.

sanitize(X, return_estimator=False, **fit_params)
Removes elements containing NaNs.
- Parameters
- X : xarray DataArray or Dataset
The input data.
- return_estimator : bool
Whether to return the fitted estimator along with the transformed data.
- Returns
- Xt : xarray DataArray or Dataset
The transformed data.

segment(X, return_estimator=False, **fit_params)
Segments X along some dimension.
- Parameters
- X : xarray DataArray or Dataset
The input data.
- return_estimator : bool
Whether to return the fitted estimator along with the transformed data.
- Returns
- Xt : xarray DataArray or Dataset
The transformed data.

select(X, return_estimator=False, **fit_params)
Selects a subset of the samples.
- Parameters
- X : xarray DataArray or Dataset
The input data.
- return_estimator : bool
Whether to return the fitted estimator along with the transformed data.
- Returns
- Xt : xarray DataArray or Dataset
The transformed data.