# System Testing Report

## Executive Summary

This report documents the system-level black-box testing effort for the pandas library. Our team created 3 system test cases that validate complete end-to-end workflows through pandas' public API, treating the system as a black box without referencing internal implementation details.

---

## Test Scope and Coverage

### Testing Approach

Our system tests validate pandas' core functionality through **black-box testing**, meaning:
- Tests interact only through public APIs (DataFrame, Series, read_csv, to_csv, etc.)
- No reference to internal implementation or private methods
- Tests simulate real user workflows from start to finish
- Validation based on observable behavior and outputs

### Workflows Validated

The system tests cover three critical end-to-end workflows that represent typical pandas usage patterns:

#### 1. Data Loading and Export Workflow (Sandeep Ramavath)
**Scope:** Complete data I/O lifecycle
- **Features Tested:**
  - CSV file import (`pd.read_csv()`)
  - CSV file export (`DataFrame.to_csv()`)
  - Mixed data type handling (integers, floats, strings, dates, booleans)
  - Data persistence and round-trip integrity
  - Datetime parsing during import

- **User Story:** "As a data analyst, I want to load data from CSV files, work with it in pandas, and export results back to CSV so that I can share my analysis with others."
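
For illustration, a minimal sketch of this round-trip in plain pandas. The id, name, score, and active values mirror those documented for SYS-001 later in this report; the dates and the temporary file location are placeholders, not the actual test fixture:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Frame with mixed dtypes; values follow SYS-001, dates are placeholders
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "score": [95.5, 87.3, 92.1, 88.7, 91.4],
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03",
                            "2024-01-04", "2024-01-05"]),
    "active": [True, False, True, True, False],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.csv"
    df.to_csv(path, index=False)                      # export without the index
    result = pd.read_csv(path, parse_dates=["date"])  # re-import, parsing the date column

# Round-trip integrity: shape, column names, and datetime dtype preserved
assert result.shape == df.shape
assert list(result.columns) == list(df.columns)
assert pd.api.types.is_datetime64_any_dtype(result["date"])
```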

#### 2. Data Cleaning and Transformation Workflow (Nithikesh Bobbili)
**Scope:** Missing data handling and data quality
- **Features Tested:**
  - Missing value detection (`isnull()`, `sum()`)
  - Forward fill strategy (`ffill()`)
  - Backward fill strategy (`bfill()`)
  - Constant value fill (`fillna()`)
  - Data integrity preservation during cleaning

- **User Story:** "As a data scientist, I want to identify and handle missing values in my dataset using various filling strategies so that I can prepare clean data for analysis."
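
A minimal sketch of this cleaning workflow. The placeholder frame below follows the NaN layout described for SYS-002 (two gaps in column A, two in B, one trailing gap in C) but is not the exact test data:

```python
import numpy as np
import pandas as pd

# NaN layout matches SYS-002: A missing at rows 1 and 3, B at 0 and 2, C at 4
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, np.nan, 5.0],
    "B": [np.nan, 2.0, np.nan, 4.0, 5.0],
    "C": [1.0, 2.0, 3.0, 4.0, np.nan],
})

missing_per_column = df.isnull().sum()   # A=2, B=2, C=1
forward_filled = df.ffill()              # only the leading NaN in B remains
backward_filled = df.bfill()             # only the trailing NaN in C remains
constant_filled = df.fillna(0)           # no NaN left anywhere

assert missing_per_column.sum() == 5
assert forward_filled.isnull().sum().sum() == 1
assert backward_filled.isnull().sum().sum() == 1
assert constant_filled.isnull().sum().sum() == 0
assert constant_filled.shape == df.shape
```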

#### 3. Aggregation and Analysis Workflow (Mallikarjuna)
**Scope:** Group-by operations and statistical analysis
- **Features Tested:**
  - Categorical grouping (`groupby()`)
  - Statistical aggregations (mean, sum, count)
  - Multiple simultaneous aggregations (`agg()`)
  - Grouped data integrity
  - Result correctness verification

- **User Story:** "As a business analyst, I want to group data by categories and compute statistics for each group so that I can understand patterns and trends in my data."
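
A minimal sketch of this workflow using the eight-row dataset documented for SYS-003. Applying `agg` to the 'value' column is an assumption of the sketch, not necessarily how the test structures the call:

```python
import pandas as pd

# Dataset as documented in SYS-003
df = pd.DataFrame({
    "category": ["A", "B", "A", "B", "A", "B", "A", "B"],
    "value":    [10, 20, 15, 25, 20, 30, 25, 35],
    "quantity": [1, 2, 3, 4, 5, 6, 7, 8],
})

grouped = df.groupby("category")

means  = grouped["value"].mean()   # A: 17.5, B: 27.5
sums   = grouped["value"].sum()    # A: 70,   B: 110
counts = grouped.size()            # A: 4,    B: 4

# Several aggregations at once -> a (2, 3) frame indexed by category
summary = grouped["value"].agg(["mean", "sum", "count"])

assert summary.loc["A", "mean"] == 17.5
assert summary.loc["B", "sum"] == 110
assert summary.shape == (2, 3)
assert list(summary.index) == ["A", "B"]
```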

### Coverage Metrics

| Workflow Category | Public APIs Used | Test Cases | Assertions |
|------------------|------------------|------------|------------|
| Data I/O | 2 APIs | 1 | 8 |
| Data Cleaning | 4 APIs | 1 | 11 |
| Data Aggregation | 4 APIs | 1 | 13 |
| **Total** | **10 unique APIs** | **3** | **32** |

### Out of Scope

The following are explicitly **not** tested in this system testing phase:
- Internal implementation details (block managers, internals, etc.)
- Performance benchmarks or optimization
- Edge cases requiring white-box knowledge
- Deprecated or experimental APIs
- Platform-specific behaviors

---

## Test Case Summary

### Test Case 1: CSV Data Import-Export Workflow

**Test ID:** SYS-001
**Owner:** Sandeep Ramavath
**Category:** Data I/O Workflow
**Test File:** `pandas/tests/system/test_system_workflows.py::TestDataIOWorkflow::test_csv_roundtrip_workflow`

| Attribute | Details |
|-----------|---------|
| **Title** | CSV Data Import-Export Workflow |
| **Pre-conditions** | • Temporary directory available for file operations<br>• pandas library installed and functional<br>• Write permissions in test directory |
| **Test Steps** | **Step 1:** Create DataFrame with mixed data types using public API<br> - Create DataFrame with 5 columns: id (int), name (string), score (float), date (datetime), active (boolean)<br> - Use pandas constructor: `pd.DataFrame()`<br><br>**Step 2:** Export DataFrame to CSV file<br> - Call `to_csv()` method with file path<br> - Use `index=False` parameter<br> - Verify file creation on disk<br><br>**Step 3:** Import CSV file back into new DataFrame<br> - Call `pd.read_csv()` with file path<br> - Use `parse_dates` parameter for date column<br><br>**Step 4:** Verify data integrity and type preservation<br> - Check row count matches original (5 rows)<br> - Verify column names preserved<br> - Compare all values with original data<br> - Verify datetime type correctly parsed |
| **Expected Results** | • CSV file created successfully at specified path<br>• File contains 5 data rows plus header<br>• Data round-trips without any loss<br>• Integer values: [1, 2, 3, 4, 5] preserved<br>• String values: ['Alice', 'Bob', 'Charlie', 'David', 'Eve'] preserved<br>• Float values: [95.5, 87.3, 92.1, 88.7, 91.4] preserved<br>• Boolean values: [True, False, True, True, False] preserved<br>• Date column recognized as datetime64 type<br>• All assertions pass without errors |
| **Actual Results** | **PASSED** - All expected results achieved |

---

### Test Case 2: Missing Data Cleaning Workflow

**Test ID:** SYS-002
**Owner:** Nithikesh Bobbili
**Category:** Data Cleaning Workflow
**Test File:** `pandas/tests/system/test_system_workflows.py::TestDataCleaningWorkflow::test_missing_data_handling_workflow`

| Attribute | Details |
|-----------|---------|
| **Title** | Missing Data Cleaning Workflow |
| **Pre-conditions** | • pandas library available<br>• numpy library available for NaN values<br>• No external dependencies or files required |
| **Test Steps** | **Step 1:** Create DataFrame with missing values using public API<br> - Create 3-column DataFrame with `np.nan` values<br> - Column A: 2 missing values at positions 1 and 3<br> - Column B: 2 missing values at positions 0 and 2<br> - Column C: 1 missing value at position 4<br><br>**Step 2:** Detect missing values using public methods<br> - Call `isnull()` to create boolean mask<br> - Call `sum()` to count missing values per column<br> - Verify counts: A=2, B=2, C=1<br><br>**Step 3:** Fill missing values using multiple strategies<br> - **Strategy 3a:** Forward fill using `ffill()`<br> - Verify propagation of last valid value<br> - Check remaining NaN count<br> - **Strategy 3b:** Backward fill using `bfill()`<br> - Verify propagation of next valid value<br> - Check remaining NaN count<br> - **Strategy 3c:** Constant fill using `fillna(0)`<br> - Verify all NaN replaced with 0<br><br>**Step 4:** Verify all missing values handled correctly<br> - Confirm no NaN values remain after constant fill<br> - Verify DataFrame shape preserved<br> - Check specific filled values match expectations |
| **Expected Results** | • Missing values correctly identified: A=2, B=2, C=1 (total 5)<br>• Forward fill leaves 1 NaN (at first position of column B)<br>• Forward fill propagates value correctly (row 1, col A = 1.0)<br>• Backward fill leaves 1 NaN (at last position of column C)<br>• Backward fill propagates value correctly (row 0, col B = 2.0)<br>• Constant fill (value=0) removes all NaN values<br>• Constant fill replaces NaN with exact value (row 1, col A = 0.0)<br>• DataFrame shape (5, 3) preserved after all operations<br>• All assertions pass without errors |
| **Actual Results** | **PASSED** - All expected results achieved |

---

### Test Case 3: Group-by Aggregation Analysis Workflow

**Test ID:** SYS-003
**Owner:** Mallikarjuna
**Category:** Aggregation and Analysis Workflow
**Test File:** `pandas/tests/system/test_system_workflows.py::TestAggregationWorkflow::test_groupby_aggregation_workflow`

| Attribute | Details |
|-----------|---------|
| **Title** | Group-by Aggregation Analysis Workflow |
| **Pre-conditions** | • pandas library functional<br>• Sufficient memory for group operations<br>• No external data sources required |
| **Test Steps** | **Step 1:** Create DataFrame with categorical and numeric data<br> - Create DataFrame with 3 columns:<br> - category: ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']<br> - value: [10, 20, 15, 25, 20, 30, 25, 35]<br> - quantity: [1, 2, 3, 4, 5, 6, 7, 8]<br> - Total 8 rows, evenly split between categories A and B<br><br>**Step 2:** Group data by category using public API<br> - Call `groupby('category')` on DataFrame<br> - Store grouped object for multiple operations<br><br>**Step 3:** Apply multiple aggregation functions<br> - **Step 3a:** Apply mean aggregation on 'value' column<br> - Calculate average for each category<br> - Verify: Category A mean = 17.5, Category B mean = 27.5<br> - **Step 3b:** Apply sum aggregation on 'value' column<br> - Calculate total for each category<br> - Verify: Category A sum = 70, Category B sum = 110<br> - **Step 3c:** Apply count aggregation<br> - Count items in each category using `size()`<br> - Verify: Category A count = 4, Category B count = 4<br><br>**Step 4:** Apply multiple aggregations simultaneously<br> - Use `agg(['mean', 'sum', 'count'])` on grouped data<br> - Create multi-column result DataFrame<br><br>**Step 5:** Verify aggregated results comprehensively<br> - Check all 6 values (2 categories × 3 aggregations)<br> - Verify result DataFrame shape is (2, 3)<br> - Confirm index contains category labels |
| **Expected Results** | • Data groups correctly into 2 categories (A and B)<br>• Category A mean aggregation: (10+15+20+25)/4 = 17.5 ✓<br>• Category B mean aggregation: (20+25+30+35)/4 = 27.5 ✓<br>• Category A sum aggregation: 10+15+20+25 = 70 ✓<br>• Category B sum aggregation: 20+25+30+35 = 110 ✓<br>• Category A count: 4 items ✓<br>• Category B count: 4 items ✓<br>• Multi-aggregation creates DataFrame with shape (2, 3)<br>• Multi-aggregation preserves all individual results<br>• Result index contains 'A' and 'B' as category labels<br>• Result columns contain 'mean', 'sum', 'count'<br>• All assertions pass without errors |
| **Actual Results** | **PASSED** - All expected results achieved |

---

## Execution and Results

### Test Environment

**Test File:** `pandas/tests/system/test_system_workflows.py`
**Testing Framework:** pytest 8.4.2
**Python Version:** 3.13.5
**Pandas Version:** 3.0.0.dev0+
**NumPy Version:** 1.26+
**Operating System:** macOS

### Execution Command

```bash
python -m pytest pandas/tests/system/test_system_workflows.py -v
```

### Test Results Summary

```
collected 3 items

pandas/tests/system/test_system_workflows.py::TestDataIOWorkflow::test_csv_roundtrip_workflow PASSED [33%]
pandas/tests/system/test_system_workflows.py::TestDataCleaningWorkflow::test_missing_data_handling_workflow PASSED [66%]
pandas/tests/system/test_system_workflows.py::TestAggregationWorkflow::test_groupby_aggregation_workflow PASSED [100%]

=================================== 3 passed in 0.52s ===================================
```

### Detailed Test Results

| Test Case | Status | Duration | Assertions | Outcome |
|-----------|--------|----------|------------|---------|
| CSV Data Import-Export Workflow | PASSED | ~0.18s | 8 | All data round-tripped correctly |
| Missing Data Cleaning Workflow | PASSED | ~0.16s | 11 | All fill strategies worked as expected |
| Group-by Aggregation Workflow | PASSED | ~0.18s | 13 | All aggregations computed correctly |

**Summary Statistics:**
- **Total Test Cases:** 3
- **Passed:** 3 (100%)
- **Failed:** 0 (0%)
- **Skipped:** 0
- **Total Execution Time:** 0.52 seconds
- **Average Test Duration:** 0.17 seconds
- **Total Assertions:** 32
- **Assertions Passed:** 32 (100%)

### Behavioral Analysis

#### Test Case 1: CSV Roundtrip Workflow

**Expected Behavior:**
- CSV export creates valid file with proper formatting
- CSV import reconstructs DataFrame with same data
- Mixed data types (int, float, string, datetime, bool) preserved
- Datetime parsing works correctly with `parse_dates` parameter

**Actual Behavior:**
**Matches Expected** - All behaviors confirmed:
- CSV file created with proper structure (header + 5 data rows)
- All numeric values preserved exactly (no rounding errors)
- String values preserved with proper encoding
- Boolean values correctly written and read as True/False
- Datetime column parsed correctly to datetime64 dtype
- No data loss or corruption during round-trip

**Deviations:** None

#### Test Case 2: Missing Data Cleaning Workflow

**Expected Behavior:**
- `isnull()` correctly identifies NaN values
- `ffill()` propagates last valid observation forward
- `bfill()` propagates next valid observation backward
- `fillna()` replaces all NaN with specified constant
- Original DataFrame remains unchanged (fill operations return new objects rather than modifying in place)

**Actual Behavior:**
**Matches Expected** - All behaviors confirmed:
- Missing value detection accurate (5 total NaN values identified)
- Forward fill correctly propagated values, leaving only the leading NaN
- Backward fill correctly propagated values, leaving only the trailing NaN
- Constant fill successfully eliminated all NaN values
- Shape and non-NaN values preserved across all operations
- Original DataFrame left unchanged (each operation returns a new DataFrame, illustrated in the sketch below)

**Deviations:** None
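
A quick illustration of that non-mutating behaviour (a minimal sketch with made-up values, not taken from the test file):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0]})

filled = df.ffill()                    # returns a new DataFrame
assert filled is not df
assert df["A"].isnull().sum() == 1     # the original still contains its NaN
assert filled["A"].isnull().sum() == 0
```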

#### Test Case 3: Group-by Aggregation Workflow

**Expected Behavior:**
- `groupby()` splits data by category labels
- Aggregation functions compute correct statistics per group
- Multiple aggregations can be applied simultaneously
- Result maintains category labels as index

**Actual Behavior:**
**Matches Expected** - All behaviors confirmed:
- Data correctly split into 2 groups (A and B)
- Mean calculations accurate: A=17.5, B=27.5
- Sum calculations accurate: A=70, B=110
- Count calculations accurate: A=4, B=4
- Multi-aggregation created proper DataFrame structure
- Category labels preserved in result index
- All numeric computations precise (no floating-point errors)

**Deviations:** None

### Failures and Deviations

**Result: No failures or behavioral deviations were discovered.**

All system tests passed successfully, indicating that:
- End-to-end workflows function as designed
- Public APIs behave according to documentation
- Data integrity maintained across operations
- No unexpected errors or exceptions
- All user workflows complete successfully

### Test Coverage Analysis

The system tests successfully validated:

| Workflow Component | Validated | Evidence |
|-------------------|-----------|----------|
| File I/O operations | Yes | CSV roundtrip successful |
| Data type handling | Yes | 5 different types preserved |
| Missing value detection | Yes | All NaN values identified |
| Fill strategies | Yes | 3 strategies all worked |
| Grouping operations | Yes | Categories split correctly |
| Aggregation functions | Yes | 3 aggregations accurate |
| Multi-aggregation | Yes | Combined aggregations worked |
| Data immutability | Yes | Original data preserved |

---

## Group Contributions

### Individual Contributions

| Student | Test Cases | Workflow Validated | Assertions | LOC |
|---------|------------|-------------------|------------|-----|
| **Sandeep Ramavath** | 1 test case | Data I/O (CSV import/export) | 8 | ~50 |
| **Nithikesh Bobbili** | 1 test case | Data Cleaning (missing data handling) | 11 | ~60 |
| **Mallikarjuna** | 1 test case | Aggregation (groupby operations) | 13 | ~65 |