Open
22 commits
d775c62
Add ArcSinhTransformer for inverse hyperbolic sine transformation
ankitlade12 Jan 8, 2026
841e30c
Enhance ArcSinhTransformer docs: add to index/README, improve user gu…
ankitlade12 Jan 12, 2026
6022a24
Add ArcSinhTransformer to standard estimator checks
ankitlade12 Jan 12, 2026
201b032
Fix duplicate LogCpTransformer in estimator checks
ankitlade12 Jan 12, 2026
0cb0023
Docs: Remove leading 'The' from ArcSinhTransformer references
ankitlade12 Jan 12, 2026
bf44266
Docs: Remove 'the' before LogTransformer reference
ankitlade12 Jan 12, 2026
b9d92d5
Docs: Rename Example section to Python demo
ankitlade12 Jan 12, 2026
246cd2c
Docs: Standardize section underlines to '---'
ankitlade12 Jan 12, 2026
0c44f40
Docs: Add dataframe output to ArcSinhTransformer python demo
ankitlade12 Jan 12, 2026
cc95d3a
Docs: Update transformer setup text in ArcSinhTransformer demo
ankitlade12 Jan 12, 2026
aeeddad
Docs: Add commas around 'however' for grammar
ankitlade12 Jan 12, 2026
e3e6440
Docs: Add transformed dataframe output to ArcSinhTransformer demo
ankitlade12 Jan 12, 2026
a8d880a
Docs: Clarify intro text for plotting code
ankitlade12 Jan 12, 2026
8fad3b8
Docs: Add histogram plot image to ArcSinhTransformer guide
ankitlade12 Jan 12, 2026
9fa7698
Docs: Replace np.allclose with dataframe output in inverse transform …
ankitlade12 Jan 12, 2026
6a8fc64
Docs: Remove API Reference from User Guide (exists in api_doc)
ankitlade12 Jan 12, 2026
bdcb271
Docstring: Clarify linear behavior of arcsinh for small x
ankitlade12 Jan 12, 2026
73b9ef1
Docstring: Remove redundant 'does not learn parameters' sentence from…
ankitlade12 Jan 12, 2026
f055b05
Tests: Add explicit value assertions for negative values in ArcSinh test
ankitlade12 Jan 12, 2026
3f17f04
Tests: Add string and boolean to invalid_scale parameterization
ankitlade12 Jan 12, 2026
3be7004
Docs: Add practical explanation for using loc and scale parameters
ankitlade12 Jan 12, 2026
f990fde
Docs: Add ArcSinhTransformer to api_doc index and update description
ankitlade12 Jan 14, 2026
1 change: 1 addition & 0 deletions README.md
Expand Up @@ -114,6 +114,7 @@ Please share your story by answering 1 quick question
* PowerTransformer
* BoxCoxTransformer
* YeoJohnsonTransformer
* ArcSinhTransformer

### Variable Scaling methods
* MeanNormalizationScaler
5 changes: 5 additions & 0 deletions docs/api_doc/transformation/ArcSinhTransformer.rst
@@ -0,0 +1,5 @@
ArcSinhTransformer
==================

.. autoclass:: feature_engine.transformation.ArcSinhTransformer
    :members:
1 change: 1 addition & 0 deletions docs/api_doc/transformation/index.rst
Expand Up @@ -13,6 +13,7 @@ mathematical transformations.
LogCpTransformer
ReciprocalTransformer
ArcsinTransformer
ArcSinhTransformer
PowerTransformer
BoxCoxTransformer
YeoJohnsonTransformer
Binary file added docs/images/arcsinh_profit_histogram.png
1 change: 1 addition & 0 deletions docs/index.rst
Expand Up @@ -237,6 +237,7 @@ like anova, and machine learning models, like linear regression. Feature-engine
- :doc:`api_doc/transformation/BoxCoxTransformer`: performs Box-Cox transformation of numerical variables
- :doc:`api_doc/transformation/YeoJohnsonTransformer`: performs Yeo-Johnson transformation of numerical variables
- :doc:`api_doc/transformation/ArcsinTransformer`: performs arcsin transformation of numerical variables
- :doc:`api_doc/transformation/ArcSinhTransformer`: performs arcsinh (pseudo-logarithm) transformation of numerical variables

Feature Creation:
~~~~~~~~~~~~~~~~~
195 changes: 195 additions & 0 deletions docs/user_guide/transformation/ArcSinhTransformer.rst
@@ -0,0 +1,195 @@
.. _arcsinh_transformer:

.. currentmodule:: feature_engine.transformation

ArcSinhTransformer
==================

:class:`ArcSinhTransformer()` applies the inverse hyperbolic sine transformation
(arcsinh) to numerical variables. Also known as the pseudo-logarithm, this
transformation is useful for data that contains both positive and negative values.

The transformation is: x → arcsinh((x - loc) / scale)

Comparison to LogTransformer and ArcsinTransformer
--------------------------------------------------

- **LogTransformer**: ``log(x)`` requires ``x > 0``. If your data contains zeros or negative values, you cannot use LogTransformer directly. You would first need to shift the data (e.g., with ``LogCpTransformer``) or remove the non-positive values.
- **ArcsinTransformer**: ``arcsin(sqrt(x))`` is typically used for proportions or ratios bounded between 0 and 1. It is not suitable for general unbounded numerical data.
- **ArcSinhTransformer**: ``arcsinh(x)`` is defined for **all real numbers** (positive, negative, and zero). It maps zero to zero (``arcsinh(0) = 0``) and is symmetric around zero, that is, ``arcsinh(-x) = -arcsinh(x)``.

When to use ArcSinhTransformer:

- Your data contains zeros or negative values (e.g., profit/loss, debt, temperature).
- You want a log-like transformation to stabilize variance or compress extreme values.
- You don't want to add an arbitrary constant (shift) to make values positive.
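
To make the comparison concrete, here is a minimal sketch that uses only Python's standard ``math`` module rather than the transformers themselves. The helper ``safe_log`` and the sample values are made up for illustration and are not part of Feature-engine.

```python
import math

# Illustrative values spanning negative, zero, and positive inputs
values = [-1000.0, -1.0, 0.0, 1.0, 1000.0]


def safe_log(x):
    """Return log(x), or None where the logarithm is undefined (x <= 0)."""
    return math.log(x) if x > 0 else None


log_values = [safe_log(x) for x in values]
arcsinh_values = [math.asinh(x) for x in values]

for x, lg, ash in zip(values, log_values, arcsinh_values):
    print(f"x={x:>8}  log={lg}  arcsinh={ash:.4f}")
```

The logarithm is undefined for the first three inputs, whereas arcsinh returns a value for every input, maps zero to zero, and stays close to ``log(2x)`` for large positive ``x``.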

Intuitive Explanation of Parameters
-----------------------------------

The transformation includes optional ``loc`` (location) and ``scale`` parameters:

.. math::

    y = \text{arcsinh}\left(\frac{x - \text{loc}}{\text{scale}}\right)

- **Why scale?**
  The ``arcsinh(x)`` function is approximately linear near zero (for small ``|x|``) and logarithmic for large ``|x|``.
  The "linear region" is roughly between -1 and 1.
  By adjusting ``scale``, you control which part of your data falls into this linear region versus the logarithmic region.

  - If ``scale`` is large, more of your data falls in the linear region (behavior close to the original data).
  - If ``scale`` is small, more of your data falls in the logarithmic region (stronger compression of values).

  Common practice is to set ``scale`` to 1 or to the standard deviation of the variable.

- **Why loc?**
  The ``loc`` parameter centers the data. The transition from negative logarithmic behavior to positive logarithmic behavior happens around ``x = loc``.
  Common practice is to set ``loc`` to 0 or to the mean of the variable.
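
The effect of ``scale`` and ``loc`` can be sketched with plain Python. The function ``arcsinh_transform`` below is a hypothetical helper that mirrors the formula above; it is not the transformer's implementation, and the numbers are arbitrary.

```python
import math


def arcsinh_transform(x, loc=0.0, scale=1.0):
    """Hypothetical helper mirroring y = arcsinh((x - loc) / scale)."""
    return math.asinh((x - loc) / scale)


x = 5000.0

# scale=1: (x - loc) / scale = 5000, far outside [-1, 1], so the
# result behaves like a logarithm: arcsinh(z) is close to ln(2z).
small_scale = arcsinh_transform(x, scale=1.0)

# scale=10000: (x - loc) / scale = 0.5, inside [-1, 1], so the
# result stays close to the rescaled input itself.
large_scale = arcsinh_transform(x, scale=10000.0)

# loc marks the point that maps to zero.
at_loc = arcsinh_transform(x, loc=5000.0)

print(f"scale=1:     {small_scale:.4f}  (ln(2x) = {math.log(2 * x):.4f})")
print(f"scale=10000: {large_scale:.4f}  (rescaled input = 0.5)")
print(f"x == loc:    {at_loc:.4f}")
```

With the small scale the value sits in the logarithmic region; with the large scale it stays near the rescaled input, which is the trade-off described in the list above.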

References
----------

For more details on the inverse hyperbolic sine transformation:

1. `How should I transform non-negative data including zeros? <https://stats.stackexchange.com/questions/1444/how-should-i-transform-non-negative-data-including-zeros>`_ (StackExchange)
2. `Interpreting Treatment Effects: Inverse Hyperbolic Sine Outcome Variable <https://blogs.worldbank.org/en/impactevaluations/interpreting-treatment-effects-inverse-hyperbolic-sine-outcome-variable-and>`_ (World Bank Blog)
3. `Burbidge, J. B., Magee, L., & Robb, A. L. (1988). Alternative transformations to handle extreme values of the dependent variable. Journal of the American Statistical Association. <https://www.jstor.org/stable/2288929>`_

Python demo
-----------

Unlike :class:`LogTransformer()`, :class:`ArcSinhTransformer()` can handle
zero and negative values without requiring any preprocessing.

Let's create a dataframe with positive and negative values and apply the arcsinh
transformation:

.. code:: python

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split

    from feature_engine.transformation import ArcSinhTransformer

    # Create sample data with positive and negative values
    np.random.seed(42)
    X = pd.DataFrame({
        # Normal draws with std 10,000: most values lie between -30,000 and 30,000
        'profit': np.random.randn(1000) * 10000,
        'net_worth': np.random.randn(1000) * 50000,
    })

    # Separate into train and test
    X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

    print(X.head())

The dataframe contains positive and negative values:

.. code:: python

             profit     net_worth
    0   4967.141530  69967.771829
    1  -1382.643012  46231.684146
    2   6476.885381   2981.518496
    3  15230.298564 -32346.838885
    4  -2341.533747  34911.165681

Now let's set up the ArcSinhTransformer and fit it to the training set:

.. code:: python

    # Set up the arcsinh transformer
    tf = ArcSinhTransformer(variables=['profit', 'net_worth'])

    # Fit the transformer
    tf.fit(X_train)

The transformer does not learn any parameters when applying the fit method. It does
check, however, that the variables are numerical.

We can now transform the variables:

.. code:: python

    # Transform the data
    train_t = tf.transform(X_train)
    test_t = tf.transform(X_test)

    print(train_t.head())

The dataframe with the transformed variables:

.. code:: python

            profit  net_worth
    105   8.997273 -11.552056
    68    8.886371 -10.753000
    479  10.016437 -10.686152
    399  10.116836 -11.092693
    434  10.310523  -9.723893

The arcsinh transformation compresses extreme values while preserving the sign. We can inspect the distribution of the original and transformed variables with histograms:

.. code:: python

    # Compare original and transformed distributions
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))

    X_train['profit'].hist(ax=axes[0], bins=50)
    axes[0].set_title('Original profit')

    train_t['profit'].hist(ax=axes[1], bins=50)
    axes[1].set_title('Transformed profit')

    plt.tight_layout()

.. image:: ../../images/arcsinh_profit_histogram.png

Using loc and scale parameters
------------------------------

:class:`ArcSinhTransformer()` supports location and scale parameters to
center and normalize data before transformation.

In practice, it is common to standardize the variable (zero mean, unit variance)
so that the center of the distribution falls in the linear region of the arcsinh
function, while the tails are compressed logarithmically. We can achieve this
by setting ``loc`` to the mean and ``scale`` to the standard deviation:

.. code:: python

    # Center around mean and scale by std
    tf = ArcSinhTransformer(
        variables=['profit'],
        loc=X_train['profit'].mean(),
        scale=X_train['profit'].std()
    )

    tf.fit(X_train)
    train_t = tf.transform(X_train)
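
As a rough illustration of what this parameter choice does, the sketch below applies the same formula by hand with the standard library. The ``profit`` list is invented for the example, and the code does not use or reproduce the transformer itself.

```python
import math
import statistics

# Made-up values with one large outlier on the positive side
profit = [-25000.0, -3000.0, -500.0, 0.0, 800.0, 4000.0, 60000.0]

loc = statistics.mean(profit)
scale = statistics.stdev(profit)  # sample standard deviation (ddof=1), like pandas .std()

# Standardize first, then apply arcsinh: values near the mean stay
# roughly linear, while the tails are compressed.
standardized = [(x - loc) / scale for x in profit]
transformed = [math.asinh(z) for z in standardized]

for x, z, y in zip(profit, standardized, transformed):
    print(f"x={x:>9.1f}  standardized={z:>7.3f}  arcsinh={y:>7.3f}")
```

Because arcsinh is strictly increasing, the ordering of the observations is preserved; only the spacing between them changes.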

Inverse transformation
----------------------

:class:`ArcSinhTransformer()` supports inverse transformation to recover
the original values:

.. code:: python

    # Transform and then inverse transform
    train_t = tf.transform(X_train)
    train_recovered = tf.inverse_transform(train_t)

    print(train_recovered.head())

The recovered data, back on the original scale:

.. code:: python

               profit     net_worth
    105   4040.508568 -51995.296356
    68    3616.360250 -23385.060066
    479  11195.749114 -21872.915016
    399  12378.163120 -32844.713949
    434  15023.570521  -8356.085689
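
Mathematically, the inverse of ``y = arcsinh((x - loc) / scale)`` is ``x = loc + scale * sinh(y)``. The sketch below checks this round trip with the standard library. The two helper functions are hypothetical, the ``loc`` and ``scale`` values are arbitrary, and the inputs are copied from the output above; this is not the transformer's own code.

```python
import math


def arcsinh_forward(x, loc=0.0, scale=1.0):
    """Hypothetical helper: y = arcsinh((x - loc) / scale)."""
    return math.asinh((x - loc) / scale)


def arcsinh_inverse(y, loc=0.0, scale=1.0):
    """Hypothetical helper: x = loc + scale * sinh(y)."""
    return loc + scale * math.sinh(y)


# Sample values copied from the dataframe output above
original = [4040.508568, -51995.296356, 11195.749114]
loc, scale = 1000.0, 10000.0  # arbitrary illustrative parameters

transformed = [arcsinh_forward(x, loc, scale) for x in original]
recovered = [arcsinh_inverse(y, loc, scale) for y in transformed]

for x, r in zip(original, recovered):
    print(f"original={x:.6f}  recovered={r:.6f}")
```

The recovered values agree with the originals up to floating-point rounding.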


1 change: 1 addition & 0 deletions docs/user_guide/transformation/index.rst
Expand Up @@ -33,6 +33,7 @@ on the nature of the variable.
LogCpTransformer
ReciprocalTransformer
ArcsinTransformer
ArcSinhTransformer
PowerTransformer
BoxCoxTransformer
YeoJohnsonTransformer
4 changes: 3 additions & 1 deletion feature_engine/transformation/__init__.py
Expand Up @@ -4,18 +4,20 @@
"""

from .arcsin import ArcsinTransformer
from .arcsinh import ArcSinhTransformer
from .boxcox import BoxCoxTransformer
from .log import LogCpTransformer, LogTransformer
from .power import PowerTransformer
from .reciprocal import ReciprocalTransformer
from .yeojohnson import YeoJohnsonTransformer

__all__ = [
"ArcsinTransformer",
"ArcSinhTransformer",
"BoxCoxTransformer",
"LogTransformer",
"LogCpTransformer",
"PowerTransformer",
"ReciprocalTransformer",
"YeoJohnsonTransformer",
"ArcsinTransformer",
]