diff --git a/docs/user_guides/fs/feature_group/on_demand_transformations.md b/docs/user_guides/fs/feature_group/on_demand_transformations.md
index 8cabf1370..653c3f144 100644
--- a/docs/user_guides/fs/feature_group/on_demand_transformations.md
+++ b/docs/user_guides/fs/feature_group/on_demand_transformations.md
@@ -254,3 +254,68 @@ On-demand transformation functions can also be accessed and executed as normal f
         ](feature_vector["transaction_time"], datetime.now())
         ```
 
+
+## Testing On-Demand Transformations Locally
+
+Hopsworks allows you to test on-demand transformations locally without requiring a connection to the Hopsworks platform.
+This is useful for validating transformation logic before deploying it to production.
+
+### Testing individual on-demand transformation functions
+
+Individual on-demand transformation functions can be accessed by name from the feature group and tested using the `execute` or `executor` methods.
+Refer to the [Testing Transformation Functions](../transformation_functions.md#testing-transformation-functions) guide for more details.
+
+=== "Python"
+!!! example "Accessing and testing an individual on-demand transformation function from a feature group"
+    ```python
+    # Access the transformation function by name
+    transaction_age_udf = fg["transaction_age"]
+
+    # Quick test
+    result = transaction_age_udf.execute(
+        pd.Series([datetime(2023, 1, 1)]),
+        pd.Series([datetime(2023, 6, 1)])
+    )
+    ```
+
+### Testing all on-demand transformations on a feature group
+
+The `execute_odts` method on a feature group applies all attached on-demand transformations to the provided data.
+This allows you to test the complete on-demand transformation pipeline locally.
+
+=== "Python"
+!!! example "Testing on-demand transformations on a feature group with a DataFrame"
+    ```python
+    @hopsworks.udf(return_type=float)
+    def compute_ratio(amount, quantity):
+        return amount / quantity
+
+    fg = fs.get_or_create_feature_group(
+        name="transactions",
+        version=1,
+        primary_key=["pk"],
+        transformation_functions=[compute_ratio("amount", "quantity")]
+    )
+
+    # Test with a DataFrame (offline mode)
+    test_df = pd.DataFrame({
+        "amount": [100.0, 200.0, 300.0],
+        "quantity": [2, 4, 5]
+    })
+    result_df = fg.execute_odts(test_df)
+    ```
+
+=== "Python"
+!!! example "Testing on-demand transformations on a feature group with a dictionary"
+    ```python
+    # Test with a dictionary (simulating online inference)
+    test_dict = {"amount": 100.0, "quantity": 2}
+    result_dict = fg.execute_odts(test_dict, online=True)
+    ```
+
+The `execute_odts` method accepts the following parameters:
+
+- **`data`**: Input data as a `pd.DataFrame`, `pl.DataFrame`, or `dict[str, Any]`.
+- **`online`**: Whether to execute in online mode (single values) or offline mode (batch). Defaults to offline mode.
+- **`transformation_context`**: A dictionary (or list of dictionaries for batch) mapping variable names to contextual values accessible via the `context` parameter in transformation functions.
+- **`request_parameters`**: A dictionary (or list of dictionaries for batch) of request parameters. These take highest priority when resolving feature values.
diff --git a/docs/user_guides/fs/feature_view/model-dependent-transformations.md b/docs/user_guides/fs/feature_view/model-dependent-transformations.md
index b16972b27..bfeb87457 100644
--- a/docs/user_guides/fs/feature_view/model-dependent-transformations.md
+++ b/docs/user_guides/fs/feature_view/model-dependent-transformations.md
@@ -164,3 +164,94 @@ To achieve this, set the `transform` parameter to False.
             transform=False
         )
         ```
+
+## Testing Transformations Locally
+
+Hopsworks allows you to test transformations attached to a feature view locally without requiring a connection to the Hopsworks platform.
+This is useful for validating transformation logic before deploying it to production.
+
+### Accessing transformation functions by name
+
+Transformation functions attached to a feature view can be accessed by name using dictionary-style or attribute-style access.
+This returns the underlying `HopsworksUdf` object, which can be tested using the `execute` or `executor` methods described in the [Testing Transformation Functions](../transformation_functions.md#testing-transformation-functions) guide.
+
+=== "Python"
+!!! example "Accessing and testing an individual transformation function from a feature view"
+    ```python
+    # Access via dictionary-style syntax
+    normalize_udf = fv["normalize"]
+
+    # Or access via attribute-style syntax
+    normalize_udf = fv.normalize
+
+    # Test with mocked statistics
+    executor = normalize_udf.executor(statistics={"amount": {"mean": 100.0, "std_dev": 25.0}})
+    result = executor.execute(pd.Series([100.0, 125.0, 150.0]))
+    ```
+
+### Testing model-dependent transformations locally
+
+The `execute_mdts` method applies all model-dependent transformations attached to the feature view to the provided data.
+This method requires that training data statistics have been initialized first, either by calling `create_training_data`, `init_batch_scoring`, or `init_serving`.
+
+=== "Python"
+!!! example "Testing model-dependent transformations on a feature view with a DataFrame"
+    ```python
+    from hopsworks import udf
+    from hopsworks.transformation_statistics import TransformationStatistics
+
+    @udf(return_type=float)
+    def normalize(amount, statistics=TransformationStatistics("amount")):
+        return (amount - statistics.amount.mean) / statistics.amount.std_dev
+
+    fv = fs.get_or_create_feature_view(
+        name="transactions_fv",
+        version=1,
+        query=fg.select_features(),
+        transformation_functions=[normalize("amount")]
+    )
+
+    # Initialize statistics by creating training data
+    features, labels = fv.create_training_data()
+
+    # Test with a DataFrame (offline mode)
+    test_df = pd.DataFrame({"amount": [100.0, 200.0, 300.0]})
+    result_df = fv.execute_mdts(test_df)
+    ```
+
+=== "Python"
+!!! example "Testing model-dependent transformations simulating online inference"
+    ```python
+    # Test with a dictionary (simulating online inference)
+    test_dict = {"amount": 100.0}
+    result_dict = fv.execute_mdts(test_dict, online=True)
+    ```
+
+The `execute_mdts` method accepts the following parameters:
+
+- **`data`**: Input data as a `pd.DataFrame`, `pl.DataFrame`, or `dict[str, Any]`.
+- **`online`**: Whether to execute in online mode (single values) or offline mode (batch). Defaults to offline mode.
+- **`transformation_context`**: A dictionary (or list of dictionaries for batch) mapping variable names to contextual values accessible via the `context` parameter in transformation functions.
+- **`request_parameters`**: A dictionary (or list of dictionaries for batch) of request parameters. These take highest priority when resolving feature values.
+
+### Testing on-demand transformations locally
+
+If the feature view includes on-demand features from its underlying feature groups, you can test those transformations using the `execute_odts` method.
+This method applies all on-demand transformations attached to the feature view on the provided data.
+
+=== "Python"
+!!! example "Testing on-demand transformations on a feature view"
+    ```python
+    # Test with a DataFrame (offline mode)
+    test_df = pd.DataFrame({
+        "amount": [100.0, 200.0, 300.0],
+        "quantity": [2, 4, 5]
+    })
+    result_df = fv.execute_odts(test_df)
+
+    # Test with a dictionary (simulating online inference)
+    test_dict = {"amount": 100.0, "quantity": 2}
+    result_dict = fv.execute_odts(test_dict, online=True)
+    ```
+
+The `execute_odts` method accepts the same parameters as `execute_mdts` described above.
diff --git a/docs/user_guides/fs/transformation_functions.md b/docs/user_guides/fs/transformation_functions.md
index 5466e5c96..1b971c767 100644
--- a/docs/user_guides/fs/transformation_functions.md
+++ b/docs/user_guides/fs/transformation_functions.md
@@ -308,6 +308,141 @@ If only the `name` is provided, then the version will default to 1.
         plus_one_fn = fs.get_transformation_function(name="plus_one", version=2)
         ```
 
+## Testing Transformation Functions
+
+Hopsworks provides built-in support for unit testing transformation functions locally, without requiring a connection to the Hopsworks platform.
+This enables you to validate your transformation logic before deploying it to feature groups or feature views.
+
+### Quick testing with `execute`
+
+The `execute` method provides a convenient way to quickly test simple transformation functions that do not require statistics or context variables.
+It executes the transformation function in offline mode (batch processing).
+
+=== "Python"
+    !!! example "Quick testing of a transformation function"
+        ```python
+        from hopsworks import udf
+        import pandas as pd
+
+        @udf(return_type=float)
+        def add_one(value):
+            return value + 1
+
+        # Direct execution for simple tests
+        result = add_one.execute(pd.Series([1.0, 2.0, 3.0]))
+        assert result.tolist() == [2.0, 3.0, 4.0]
+        ```
+
+### Advanced testing with `executor`
+
+The `executor` method creates a reusable callable object for testing transformation functions that require statistics, context variables, or need to be tested in a specific execution mode.
+
+The `executor` method accepts three optional parameters:
+
+- **`statistics`**: Mock statistics for model-dependent transformations. Accepts three formats: a `TransformationStatistics` object, a `dict[str, dict[str, Any]]` mapping feature names to statistics, or a `list[FeatureDescriptiveStatistics]`.
+- **`context`**: A dictionary of contextual variables passed to the transformation function at runtime.
+- **`online`**: Whether to execute in online mode (single values) or offline mode (batch/vectorized). Only relevant for transformation functions using the `default` execution mode. Defaults to `False` (offline).
+
+=== "Python"
+    !!! example "Testing a transformation function with mocked statistics"
+        ```python
+        from hopsworks import udf
+        from hopsworks.transformation_statistics import TransformationStatistics
+        import pandas as pd
+
+        @udf(return_type=float)
+        def normalize(value, statistics=TransformationStatistics("value")):
+            return (value - statistics.value.mean) / statistics.value.std_dev
+
+        # Test with mock statistics provided as a dictionary
+        executor = normalize.executor(statistics={"value": {"mean": 100.0, "std_dev": 25.0}})
+        result = executor.execute(pd.Series([100.0, 125.0, 150.0]))
+        assert result.tolist() == [0.0, 1.0, 2.0]
+        ```
+
+=== "Python"
+    !!! example "Testing a transformation function with context variables"
+        ```python
+        from hopsworks import udf
+        import pandas as pd
+
+        @udf(return_type=float)
+        def apply_discount(price, context):
+            return price * (1 - context["discount_rate"])
+
+        executor = apply_discount.executor(context={"discount_rate": 0.1})
+        result = executor.execute(pd.Series([100.0, 200.0]))
+        assert result.tolist() == [90.0, 180.0]
+        ```
+
+### Testing online and offline execution modes
+
+Transformation functions using the `default` execution mode are executed as Pandas UDFs during batch processing and as Python UDFs during online inference.
+The `executor` method allows you to test both modes by setting the `online` parameter.
+
+=== "Python"
+    !!! example "Testing both online and offline execution modes"
+        ```python
+        from hopsworks import udf
+        import pandas as pd
+
+        @udf(return_type=float)
+        def double_value(value):
+            return value * 2
+
+        # Offline mode (batch processing with Pandas Series)
+        offline_executor = double_value.executor(online=False)
+        batch_result = offline_executor.execute(pd.Series([1.0, 2.0, 3.0]))
+
+        # Online mode (single value processing)
+        online_executor = double_value.executor(online=True)
+        single_result = online_executor.execute(5.0)
+        assert single_result == 10.0
+        ```
+
+!!! note
+    For transformation functions with a `mode` set to `python` or `pandas`, the `online` parameter has no effect since those modes always execute as the specified UDF type.
+
+### Accessing transformation functions by name
+
+Transformation functions attached to feature views and feature groups can be accessed by name using dictionary-style or attribute-style access.
+This is useful for testing individual transformation functions in isolation.
+
+=== "Python"
+    !!! example "Accessing transformation functions from a feature view"
+        ```python
+        # Access via dictionary-style syntax
+        normalize_udf = fv["normalize"]
+
+        # Access via attribute-style syntax
+        normalize_udf = fv.normalize
+
+        # Test the accessed transformation function
+        result = normalize_udf.execute(pd.Series([100.0, 125.0, 150.0]))
+        ```
+
+=== "Python"
+    !!! example "Accessing transformation functions from a feature group"
+        ```python
+        # Access via dictionary-style syntax
+        transaction_age_udf = fg["transaction_age"]
+
+        # Access via attribute-style syntax
+        transaction_age_udf = fg.transaction_age
+
+        # Test the accessed transformation function
+        result = transaction_age_udf.execute(pd.Series([datetime(2023, 1, 1)]), pd.Series([datetime(2023, 6, 1)]))
+        ```
+
+### Testing transformations attached to feature groups and feature views
+
+In addition to testing individual transformation functions, you can test all transformations attached to a feature group or feature view at once using the `execute_odts` and `execute_mdts` methods.
+These methods are described in their respective guides:
+
+- [Testing on-demand transformations on feature groups](./feature_group/on_demand_transformations.md#testing-on-demand-transformations-locally)
+- [Testing on-demand transformations on feature views](./feature_view/model-dependent-transformations.md#testing-on-demand-transformations-locally)
+- [Testing model-dependent transformations on feature views](./feature_view/model-dependent-transformations.md#testing-model-dependent-transformations-locally)
+
 ## Using transformation functions
 
 Transformation functions can be used by attaching it to a feature view to [create model-dependent transformations](./feature_view/model-dependent-transformations.md) or attached to feature groups to [create on-demand transformations](./feature_group/on_demand_transformations.md)