Implement Random Forest Classifier and Regressor from scratch (fixes #13537) by Tejasrahane · Pull Request #13610 · TheAlgorithms/Python

Tejasrahane · 2025-10-20T05:24:36Z

Describe your change:

Add an algorithm?
Fix a bug or typo in an existing algorithm?
Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
Documentation change?

Description:

This PR implements Random Forest Classifier and Regressor from scratch as requested in issue #13537.

Classifier Implementation (random_forest_classifier.py):

Decision tree classifier using entropy-based information gain
Bootstrap sampling (bagging) for ensemble diversity
Random feature selection at each split
Majority voting for final predictions
Comprehensive doctests and examples
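For reference, entropy-based information gain can be sketched as below. This is a minimal standalone version that assumes non-negative integer class labels and a `feature <= threshold` split rule. `entropy` and `information_gain` are hypothetical names, not the PR's `_entropy`/`_information_gain` methods.

```python
import numpy as np


def entropy(labels: np.ndarray) -> float:
    """Shannon entropy (base 2) of an array of non-negative integer class labels."""
    # Class counts -> probabilities; drop zero counts so log2(0) never occurs.
    counts = np.bincount(labels)
    probabilities = counts[counts > 0] / len(labels)
    return float(-np.sum(probabilities * np.log2(probabilities)))


def information_gain(
    labels: np.ndarray, feature_column: np.ndarray, threshold: float
) -> float:
    """Entropy reduction from splitting on feature_column <= threshold."""
    left_mask = feature_column <= threshold
    n_left, n_right = int(left_mask.sum()), int((~left_mask).sum())
    # A split that sends every sample to one side gains nothing.
    if n_left == 0 or n_right == 0:
        return 0.0
    n_samples = len(labels)
    # Children's entropies, weighted by the fraction of samples each receives.
    child_entropy = (n_left / n_samples) * entropy(labels[left_mask]) + (
        n_right / n_samples
    ) * entropy(labels[~left_mask])
    return entropy(labels) - child_entropy
```

A perfectly separating threshold recovers the full parent entropy as gain, while a threshold that leaves one side mixed gains less.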

Regressor Implementation (random_forest_regressor.py):

Decision tree regressor using MSE/variance-based splitting
Bootstrap sampling with feature subsampling
Averaging of predictions from all trees
Comprehensive doctests and examples
Includes demo with sklearn datasets and metrics
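The MSE/variance-based split criterion can be sketched as follows, assuming a `feature <= threshold` rule. A node's variance equals the MSE of predicting its mean, so the search keeps the threshold with the lowest size-weighted child variance (the same weighting as the `mse_left`/`mse_right` expression in a later review hunk). `weighted_split_mse` and `best_threshold` are hypothetical names.

```python
import numpy as np


def weighted_split_mse(
    targets: np.ndarray, feature_column: np.ndarray, threshold: float
) -> float:
    """Size-weighted variance of the two children produced by a split."""
    left_mask = feature_column <= threshold
    left_y, right_y = targets[left_mask], targets[~left_mask]
    n_samples = len(targets)
    n_left, n_right = len(left_y), len(right_y)
    # An empty child contributes nothing (its weight is zero anyway).
    mse_left = float(np.var(left_y)) if n_left > 0 else 0.0
    mse_right = float(np.var(right_y)) if n_right > 0 else 0.0
    return (n_left / n_samples) * mse_left + (n_right / n_samples) * mse_right


def best_threshold(targets: np.ndarray, feature_column: np.ndarray) -> float:
    """Try every unique feature value as a threshold; keep the lowest score."""
    candidates = np.unique(feature_column)
    scores = [weighted_split_mse(targets, feature_column, t) for t in candidates]
    return float(candidates[int(np.argmin(scores))])
```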

Both implementations are written from scratch without sklearn's ensemble models: NumPy handles the numerical operations, and sklearn is used only for the demo and testing.

Checklist:

Fixes #13537

References:

Implements Random Forest Classifier with:
- Decision Tree base learners from scratch
- Bootstrap sampling (bagging)
- Random feature selection at splits
- Majority voting aggregation
- Clear docstrings and example usage

Part of implementation for issue TheAlgorithms#13537
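Majority-vote aggregation over per-tree predictions might look like this, assuming the forest stacks each tree's integer class predictions into an `(n_trees, n_samples)` array. `majority_vote` is a hypothetical helper, not a function from the PR.

```python
import numpy as np


def majority_vote(tree_predictions: np.ndarray) -> np.ndarray:
    """Combine per-tree class predictions by majority vote.

    tree_predictions has shape (n_trees, n_samples) and holds non-negative
    integer class labels. Ties go to the smallest label, because argmax
    returns the first maximum.
    """
    # Each column of the array holds every tree's vote for one sample.
    return np.array(
        [np.bincount(sample_votes).argmax() for sample_votes in tree_predictions.T]
    )
```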

- Implemented DecisionTreeRegressor with MSE-based splitting
- Implemented RandomForestRegressor with bootstrap aggregating
- Added comprehensive docstrings and examples
- Includes doctest and demo usage with sklearn metrics
- Completes issue TheAlgorithms#13537 alongside the classifier implementation
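The demo reports sklearn metrics; for checking results by hand, mean squared error and the coefficient of determination reduce to a few NumPy lines. This is a sketch with hypothetical function names, not sklearn's `mean_squared_error`/`r2_score` implementations, and it assumes the true targets are not all equal (otherwise R² divides by zero).

```python
import numpy as np


def mean_squared_error_np(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Average squared residual."""
    return float(np.mean((y_true - y_pred) ** 2))


def r2_score_np(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """1 - SS_res / SS_tot: 1.0 is a perfect fit, 0.0 matches predicting the mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```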

algorithms-keeper


🔗 Relevant Links

Repository:

Contributing guidelines

Project Euler solution guidelines

Python:

Formatted string literals (f-strings)

Type hints

doctest

unittest

pytest

Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.

algorithms-keeper commands and options

algorithms-keeper actions can be triggered by commenting on this PR:

@algorithms-keeper review to trigger the checks for only added pull request files

@algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.

NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.

algorithms-keeper · 2025-10-20T05:24:46Z

machine_learning/random_forest_classifier.py

+        tree: The built tree structure
+    """
+
+    def __init__(self, max_depth=10, min_samples_split=2, n_features=None):


Please provide return type hint for the function: __init__. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: max_depth

Please provide type hint for the parameter: min_samples_split

Please provide type hint for the parameter: n_features

algorithms-keeper · 2025-10-20T05:24:46Z

machine_learning/random_forest_classifier.py

+        self.n_features = n_features
+        self.tree = None
+
+    def fit(self, X, y):


As there is no test file in this pull request nor any test function or class in the file machine_learning/random_forest_classifier.py, please provide doctest for the function fit

Please provide return type hint for the function: fit. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide descriptive name for the parameter: X

Please provide type hint for the parameter: X

Please provide descriptive name for the parameter: y

Please provide type hint for the parameter: y

algorithms-keeper · 2025-10-20T05:24:46Z

machine_learning/random_forest_classifier.py

+        self.n_features = X.shape[1] if not self.n_features else min(self.n_features, X.shape[1])
+        self.tree = self._grow_tree(X, y)
+
+    def _grow_tree(self, X, y, depth=0):


As there is no test file in this pull request nor any test function or class in the file machine_learning/random_forest_classifier.py, please provide doctest for the function _grow_tree

Please provide return type hint for the function: _grow_tree. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide descriptive name for the parameter: X

Please provide type hint for the parameter: X

Please provide descriptive name for the parameter: y

Please provide type hint for the parameter: y

Please provide type hint for the parameter: depth

algorithms-keeper · 2025-10-20T05:24:46Z

machine_learning/random_forest_classifier.py

+            'right': right
+        }
+
+    def _best_split(self, X, y, feat_idxs):


As there is no test file in this pull request nor any test function or class in the file machine_learning/random_forest_classifier.py, please provide doctest for the function _best_split

Please provide return type hint for the function: _best_split. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide descriptive name for the parameter: X

Please provide type hint for the parameter: X

Please provide descriptive name for the parameter: y

Please provide type hint for the parameter: y

Please provide type hint for the parameter: feat_idxs

algorithms-keeper · 2025-10-20T05:24:46Z

machine_learning/random_forest_classifier.py

+        split_idx, split_thresh = None, None
+
+        for feat_idx in feat_idxs:
+            X_column = X[:, feat_idx]


Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_column

algorithms-keeper · 2025-10-20T05:24:47Z

machine_learning/random_forest_regressor.py

+        for _ in range(self.n_estimators):
+            # Bootstrap sampling
+            indices = np.random.choice(n_samples, n_samples, replace=True)
+            X_bootstrap = X[indices]


Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_bootstrap
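In snake_case, the bootstrap step reads naturally as below. This sketch assumes a NumPy `Generator` is passed in; indexing features and targets with the same `indices` keeps every row paired with its target. `bootstrap_sample` is a hypothetical helper.

```python
import numpy as np


def bootstrap_sample(
    features: np.ndarray, targets: np.ndarray, rng: np.random.Generator
) -> tuple[np.ndarray, np.ndarray]:
    """Draw n_samples rows with replacement (one bagging resample)."""
    n_samples = features.shape[0]
    indices = rng.choice(n_samples, size=n_samples, replace=True)
    # The same indices for both arrays keep (row, target) pairs aligned.
    x_bootstrap = features[indices]
    y_bootstrap = targets[indices]
    return x_bootstrap, y_bootstrap
```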

algorithms-keeper · 2025-10-20T05:24:47Z

machine_learning/random_forest_regressor.py

+            feature_indices = np.random.choice(
+                n_features, max_features, replace=False
+            )
+            X_bootstrap = X_bootstrap[:, feature_indices]


Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_bootstrap

algorithms-keeper · 2025-10-20T05:24:47Z

machine_learning/random_forest_regressor.py

+
+        return self
+
+    def predict(self, X):


Please provide return type hint for the function: predict. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide descriptive name for the parameter: X

Please provide type hint for the parameter: X

algorithms-keeper · 2025-10-20T05:24:47Z

machine_learning/random_forest_regressor.py

+        predictions = []
+
+        for tree, feature_indices in self.trees:
+            X_subset = X[:, feature_indices]


Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_subset
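The prediction loop restricts each tree to the columns it saw during training and then averages the results. This sketch assumes each stored tree exposes a callable mapping an `(n_samples, k)` array to `n_samples` predictions; `forest_predict` is hypothetical.

```python
from collections.abc import Callable

import numpy as np


def forest_predict(
    trees: list[tuple[Callable[[np.ndarray], np.ndarray], np.ndarray]],
    features: np.ndarray,
) -> np.ndarray:
    """Average per-tree predictions, giving each tree only its own columns."""
    predictions = []
    for tree_predict, feature_indices in trees:
        # Select the same columns this tree was trained on.
        x_subset = features[:, feature_indices]
        predictions.append(tree_predict(x_subset))
    # Rows of the stacked list are trees; average across them per sample.
    return np.mean(predictions, axis=0)
```

With two stub "trees" written as lambdas that read columns 0 and 1 respectively, `forest_predict` returns the element-wise mean of their outputs.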

algorithms-keeper · 2025-10-20T05:24:47Z

machine_learning/random_forest_regressor.py

+    )
+
+    # Split the data
+    X_train, X_test, y_train, y_test = train_test_split(


Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_train

Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_test
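With snake_case names the demo's split line becomes `x_train, x_test, y_train, y_test = train_test_split(...)`. For a dependency-free check, a NumPy-only stand-in is sketched below; `split_train_test` is hypothetical and is not sklearn's `train_test_split`, though it returns the arrays in the same order.

```python
import numpy as np


def split_train_test(
    features: np.ndarray,
    targets: np.ndarray,
    test_fraction: float,
    rng: np.random.Generator,
) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    """Shuffle row indices once, then slice off a test portion."""
    permutation = rng.permutation(features.shape[0])
    n_test = int(round(features.shape[0] * test_fraction))
    test_idx, train_idx = permutation[:n_test], permutation[n_test:]
    return features[train_idx], features[test_idx], targets[train_idx], targets[test_idx]


# Row i of the features is [2i, 2i + 1] and its target is i.
x_train, x_test, y_train, y_test = split_train_test(
    np.arange(20.0).reshape(10, 2), np.arange(10.0), 0.2, np.random.default_rng(42)
)
```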

for more information, see https://pre-commit.ci


algorithms-keeper · 2025-10-21T02:29:40Z

machine_learning/random_forest_classifier.py

+        tree: The built tree structure
+    """
+
+    def __init__(self, max_depth=10, min_samples_split=2, n_features=None):


Please provide return type hint for the function: __init__. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: max_depth

Please provide type hint for the parameter: min_samples_split

Please provide type hint for the parameter: n_features

algorithms-keeper · 2025-10-21T02:29:41Z

machine_learning/random_forest_classifier.py

+        self.n_features = n_features
+        self.tree = None
+
+    def fit(self, X, y):


Please provide return type hint for the function: fit. If the function does not return a value, please provide the type hint as: def function() -> None:

As there is no test file in this pull request nor any test function or class in the file machine_learning/random_forest_classifier.py, please provide doctest for the function fit

Please provide type hint for the parameter: X

Please provide descriptive name for the parameter: X

Please provide type hint for the parameter: y

Please provide descriptive name for the parameter: y

algorithms-keeper · 2025-10-21T02:29:41Z

machine_learning/random_forest_classifier.py

+        )
+        self.tree = self._grow_tree(X, y)
+
+    def _grow_tree(self, X, y, depth=0):


Please provide return type hint for the function: _grow_tree. If the function does not return a value, please provide the type hint as: def function() -> None:

As there is no test file in this pull request nor any test function or class in the file machine_learning/random_forest_classifier.py, please provide doctest for the function _grow_tree

Please provide type hint for the parameter: X

Please provide descriptive name for the parameter: X

Please provide type hint for the parameter: y

Please provide descriptive name for the parameter: y

Please provide type hint for the parameter: depth

algorithms-keeper · 2025-10-21T02:29:41Z

machine_learning/random_forest_classifier.py

+            "right": right,
+        }
+
+    def _best_split(self, X, y, feat_idxs):


Please provide return type hint for the function: _best_split. If the function does not return a value, please provide the type hint as: def function() -> None:

As there is no test file in this pull request nor any test function or class in the file machine_learning/random_forest_classifier.py, please provide doctest for the function _best_split

Please provide type hint for the parameter: X

Please provide descriptive name for the parameter: X

Please provide type hint for the parameter: y

Please provide descriptive name for the parameter: y

Please provide type hint for the parameter: feat_idxs

algorithms-keeper · 2025-10-21T02:29:41Z

machine_learning/random_forest_classifier.py

+        split_idx, split_thresh = None, None
+
+        for feat_idx in feat_idxs:
+            X_column = X[:, feat_idx]


Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_column

algorithms-keeper · 2025-10-21T02:29:43Z

machine_learning/random_forest_regressor.py

+        for _ in range(self.n_estimators):
+            # Bootstrap sampling
+            indices = np.random.choice(n_samples, n_samples, replace=True)
+            X_bootstrap = X[indices]


Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_bootstrap

algorithms-keeper · 2025-10-21T02:29:43Z

machine_learning/random_forest_regressor.py

+
+            # Feature sampling
+            feature_indices = np.random.choice(n_features, max_features, replace=False)
+            X_bootstrap = X_bootstrap[:, feature_indices]


Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_bootstrap

algorithms-keeper · 2025-10-21T02:29:43Z

machine_learning/random_forest_regressor.py

+
+        return self
+
+    def predict(self, X):


Please provide return type hint for the function: predict. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: X

Please provide descriptive name for the parameter: X

algorithms-keeper · 2025-10-21T02:29:43Z

machine_learning/random_forest_regressor.py

+        predictions = []
+
+        for tree, feature_indices in self.trees:
+            X_subset = X[:, feature_indices]


Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_subset

algorithms-keeper · 2025-10-21T02:29:43Z

machine_learning/random_forest_regressor.py

+    )
+
+    # Split the data
+    X_train, X_test, y_train, y_test = train_test_split(


Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_train

Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_test

…dule
- Annotate all function parameters and return types
- Rename variables to snake_case (x_column, x_bootstrap, x_subset, x_train/x_test)
- Add/expand doctests for public and core internal functions
- Address algorithms-keeper review comments


algorithms-keeper · 2025-10-21T02:43:12Z

machine_learning/random_forest_classifier.py

+        self.n_features: Optional[int] = n_features
+        self.tree: Optional[TreeNode] = None
+
+    def fit(self, x: np.ndarray, y: np.ndarray) -> None:


Please provide descriptive name for the parameter: x

Please provide descriptive name for the parameter: y

algorithms-keeper · 2025-10-21T02:43:12Z

machine_learning/random_forest_classifier.py

+        )
+        self.tree = self._grow_tree(x, y, depth=0)
+
+    def _grow_tree(self, x: np.ndarray, y: np.ndarray, depth: int = 0) -> TreeNode:


Please provide descriptive name for the parameter: x

Please provide descriptive name for the parameter: y

algorithms-keeper · 2025-10-21T02:43:12Z

machine_learning/random_forest_classifier.py

+        }
+
+    def _best_split(
+        self, x: np.ndarray, y: np.ndarray, feat_indices: Sequence[int]


Please provide descriptive name for the parameter: x

Please provide descriptive name for the parameter: y

algorithms-keeper · 2025-10-21T02:43:12Z

machine_learning/random_forest_classifier.py

+                    split_thresh = float(threshold)
+        return split_idx, split_thresh
+
+    def _information_gain(self, y: np.ndarray, x_column: np.ndarray, threshold: float) -> float:


Please provide descriptive name for the parameter: y

algorithms-keeper · 2025-10-21T02:43:12Z

machine_learning/random_forest_classifier.py

+        ig = parent_entropy - child_entropy
+        return float(ig)
+
+    def _entropy(self, y: np.ndarray) -> float:


Please provide descriptive name for the parameter: y

algorithms-keeper · 2025-10-21T02:43:14Z

machine_learning/random_forest_regressor.py

+        for _ in range(self.n_estimators):
+            # Bootstrap sampling
+            indices = np.random.choice(n_samples, n_samples, replace=True)
+            X_bootstrap = X[indices]


Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_bootstrap

algorithms-keeper · 2025-10-21T02:43:14Z

machine_learning/random_forest_regressor.py

+
+            # Feature sampling
+            feature_indices = np.random.choice(n_features, max_features, replace=False)
+            X_bootstrap = X_bootstrap[:, feature_indices]


Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_bootstrap

algorithms-keeper · 2025-10-21T02:43:14Z

machine_learning/random_forest_regressor.py

+
+        return self
+
+    def predict(self, X):


Please provide return type hint for the function: predict. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: X

Please provide descriptive name for the parameter: X

algorithms-keeper · 2025-10-21T02:43:14Z

machine_learning/random_forest_regressor.py

+        predictions = []
+
+        for tree, feature_indices in self.trees:
+            X_subset = X[:, feature_indices]


Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_subset

algorithms-keeper · 2025-10-21T02:43:14Z

machine_learning/random_forest_regressor.py

+    )
+
+    # Split the data
+    X_train, X_test, y_train, y_test = train_test_split(


Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_train

Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_test

for more information, see https://pre-commit.ci

- Annotate all parameters and return types across tree and forest
- Rename variables to snake_case (x_bootstrap, x_subset, etc.)
- Add doctests for predict, _best_split, _calculate_mse, and class examples
- Replace RNG usage with numpy Generator for determinism


algorithms-keeper · 2025-10-21T02:45:13Z

machine_learning/random_forest_classifier.py

+        self.n_features: Optional[int] = n_features
+        self.tree: Optional[TreeNode] = None
+
+    def fit(self, x: np.ndarray, y: np.ndarray) -> None:


Please provide descriptive name for the parameter: x

Please provide descriptive name for the parameter: y

algorithms-keeper · 2025-10-21T02:45:13Z

machine_learning/random_forest_classifier.py

+        )
+        self.tree = self._grow_tree(x, y, depth=0)
+
+    def _grow_tree(self, x: np.ndarray, y: np.ndarray, depth: int = 0) -> TreeNode:


Please provide descriptive name for the parameter: x

Please provide descriptive name for the parameter: y

algorithms-keeper · 2025-10-21T02:45:13Z

machine_learning/random_forest_classifier.py

+        }
+
+    def _best_split(
+        self, x: np.ndarray, y: np.ndarray, feat_indices: Sequence[int]


Please provide descriptive name for the parameter: x

Please provide descriptive name for the parameter: y

algorithms-keeper · 2025-10-21T02:45:13Z

machine_learning/random_forest_classifier.py

+        return split_idx, split_thresh
+
+    def _information_gain(
+        self, y: np.ndarray, x_column: np.ndarray, threshold: float


Please provide descriptive name for the parameter: y

algorithms-keeper · 2025-10-21T02:45:13Z

machine_learning/random_forest_classifier.py

+        ig = parent_entropy - child_entropy
+        return float(ig)
+
+    def _entropy(self, y: np.ndarray) -> float:


Please provide descriptive name for the parameter: y

algorithms-keeper · 2025-10-21T02:45:14Z

machine_learning/random_forest_regressor.py

+        self.tree = self._grow_tree(x, y)
+        return self
+
+    def _grow_tree(self, x: np.ndarray, y: np.ndarray, depth: int = 0) -> TreeNodeReg:


Please provide descriptive name for the parameter: x

Please provide descriptive name for the parameter: y

algorithms-keeper · 2025-10-21T02:45:14Z

machine_learning/random_forest_regressor.py

+            "right": right_subtree,
+        }
+
+    def _best_split(self, x: np.ndarray, y: np.ndarray, n_features: int) -> Optional[Dict[str, Any]]:


Please provide descriptive name for the parameter: x

Please provide descriptive name for the parameter: y

algorithms-keeper · 2025-10-21T02:45:14Z

machine_learning/random_forest_regressor.py

+        mse_right = float(np.var(right_y)) if n_right > 0 else 0.0
+        return (n_left / n_samples) * mse_left + (n_right / n_samples) * mse_right
+
+    def predict(self, x: np.ndarray) -> np.ndarray:


Please provide descriptive name for the parameter: x

algorithms-keeper · 2025-10-21T02:45:14Z

machine_learning/random_forest_regressor.py

+        self.random_state: Optional[int] = random_state
+        self.trees: List[Tuple[DecisionTreeRegressor, np.ndarray]] = []
+
+    def fit(self, x: np.ndarray, y: np.ndarray) -> "RandomForestRegressor":


Please provide descriptive name for the parameter: x

Please provide descriptive name for the parameter: y

algorithms-keeper · 2025-10-21T02:45:15Z

machine_learning/random_forest_regressor.py

+            self.trees.append((tree, feature_indices))
+        return self
+
+    def predict(self, x: np.ndarray) -> np.ndarray:


Please provide descriptive name for the parameter: x

for more information, see https://pre-commit.ci

Tejasrahane added 2 commits October 20, 2025 10:30

algorithms-keeper bot added the require descriptive names (This PR needs descriptive function and/or variable names), require tests (Tests [doctest/unittest/pytest] are required), and require type hints (https://docs.python.org/3/library/typing.html) labels Oct 20, 2025

algorithms-keeper bot reviewed Oct 20, 2025


algorithms-keeper bot added the awaiting reviews label (This PR is ready to be reviewed) Oct 20, 2025

[pre-commit.ci] auto fixes from pre-commit.com hooks

e0ef096

for more information, see https://pre-commit.ci

algorithms-keeper bot added the tests are failing label (Do not merge until tests pass) Oct 20, 2025

Tejasrahane closed this Oct 21, 2025

Tejasrahane reopened this Oct 21, 2025

algorithms-keeper bot reviewed Oct 21, 2025


algorithms-keeper bot removed the require tests label (Tests [doctest/unittest/pytest] are required) Oct 21, 2025

algorithms-keeper bot reviewed Oct 21, 2025


pre-commit-ci bot and others added 2 commits October 21, 2025 02:43

[pre-commit.ci] auto fixes from pre-commit.com hooks

d2d7392

for more information, see https://pre-commit.ci

algorithms-keeper bot removed the require type hints label (https://docs.python.org/3/library/typing.html) Oct 21, 2025

algorithms-keeper bot reviewed Oct 21, 2025


[pre-commit.ci] auto fixes from pre-commit.com hooks

5e0f844

for more information, see https://pre-commit.ci

This was referenced Oct 22, 2025

Add data structure programs #13664

Closed

Add data structure programs #13665

Closed

Tejasrahane closed this Oct 22, 2025
