Skip to content

Commit 937ff36

Browse files
committed
Added READM and report
1 parent ed79a74 commit 937ff36

File tree

2 files changed

+335
-0
lines changed

2 files changed

+335
-0
lines changed
Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,61 @@
1+
# System Testing - Instructions to Run Tests
2+
3+
## Test File Location
4+
5+
System tests are located in: **`pandas/tests/system/test_system_workflows.py`**
6+
7+
## Prerequisites
8+
9+
```bash
10+
# Navigate to project directory
11+
cd /Volumes/T7Shield/SWEN777/SWEN_777_Pandas
12+
13+
# Activate virtual environment
14+
source venv/bin/activate
15+
```
16+
17+
## How to Run System Tests to Reproduce Results
18+
19+
### Run All System Tests
20+
21+
```bash
22+
python -m pytest pandas/tests/system/test_system_workflows.py -v
23+
```
24+
25+
**Expected Output:**
26+
```
27+
collected 3 items
28+
29+
pandas/tests/system/test_system_workflows.py::TestDataIOWorkflow::test_csv_roundtrip_workflow PASSED [33%]
30+
pandas/tests/system/test_system_workflows.py::TestDataCleaningWorkflow::test_missing_data_handling_workflow PASSED [66%]
31+
pandas/tests/system/test_system_workflows.py::TestAggregationWorkflow::test_groupby_aggregation_workflow PASSED [100%]
32+
33+
=================================== 3 passed in 0.52s
34+
```
35+
36+
### Run Tests by Student/Workflow
37+
38+
```bash
39+
# Sandeep Ramavath - Data I/O Workflow
40+
python -m pytest pandas/tests/system/test_system_workflows.py::TestDataIOWorkflow -v
41+
42+
# Nithikesh Bobbili - Data Cleaning Workflow
43+
python -m pytest pandas/tests/system/test_system_workflows.py::TestDataCleaningWorkflow -v
44+
45+
# Mallikarjuna - Aggregation Workflow
46+
python -m pytest pandas/tests/system/test_system_workflows.py::TestAggregationWorkflow -v
47+
```
48+
49+
### Run Individual Test Case
50+
51+
```bash
52+
# CSV Roundtrip Workflow
53+
python -m pytest pandas/tests/system/test_system_workflows.py::TestDataIOWorkflow::test_csv_roundtrip_workflow -v
54+
55+
# Missing Data Handling Workflow
56+
python -m pytest pandas/tests/system/test_system_workflows.py::TestDataCleaningWorkflow::test_missing_data_handling_workflow -v
57+
58+
# Group-by Aggregation Workflow
59+
python -m pytest pandas/tests/system/test_system_workflows.py::TestAggregationWorkflow::test_groupby_aggregation_workflow -v
60+
```
61+
Lines changed: 274 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,274 @@
1+
# System Testing Report
2+
3+
## Executive Summary
4+
5+
This report documents the system-level black-box testing effort for the pandas library. Our team created 3 system test cases that validate complete end-to-end workflows through pandas' public API, treating the system as a black box without referencing internal implementation details.
6+
7+
---
8+
9+
## Test Scope and Coverage
10+
11+
### Testing Approach
12+
13+
Our system tests validate pandas' core functionality through **black-box testing**, meaning:
14+
- Tests interact only through public APIs (DataFrame, Series, read_csv, to_csv, etc.)
15+
- No reference to internal implementation or private methods
16+
- Tests simulate real user workflows from start to finish
17+
- Validation based on observable behavior and outputs
18+
19+
### Workflows Validated
20+
21+
The system tests cover three critical end-to-end workflows that represent typical pandas usage patterns:
22+
23+
#### 1. Data Loading and Export Workflow (Sandeep Ramavath)
24+
**Scope:** Complete data I/O lifecycle
25+
- **Features Tested:**
26+
- CSV file import (`pd.read_csv()`)
27+
- CSV file export (`DataFrame.to_csv()`)
28+
- Mixed data type handling (integers, floats, strings, dates, booleans)
29+
- Data persistence and round-trip integrity
30+
- Datetime parsing during import
31+
32+
- **User Story:** "As a data analyst, I want to load data from CSV files, work with it in pandas, and export results back to CSV so that I can share my analysis with others."
33+
34+
#### 2. Data Cleaning and Transformation Workflow (Nithikesh Bobbili)
35+
**Scope:** Missing data handling and data quality
36+
- **Features Tested:**
37+
- Missing value detection (`isnull()`, `sum()`)
38+
- Forward fill strategy (`ffill()`)
39+
- Backward fill strategy (`bfill()`)
40+
- Constant value fill (`fillna()`)
41+
- Data integrity preservation during cleaning
42+
43+
- **User Story:** "As a data scientist, I want to identify and handle missing values in my dataset using various filling strategies so that I can prepare clean data for analysis."
44+
45+
#### 3. Aggregation and Analysis Workflow (Mallikarjuna)
46+
**Scope:** Group-by operations and statistical analysis
47+
- **Features Tested:**
48+
- Categorical grouping (`groupby()`)
49+
- Statistical aggregations (mean, sum, count)
50+
- Multiple simultaneous aggregations (`agg()`)
51+
- Grouped data integrity
52+
- Result correctness verification
53+
54+
- **User Story:** "As a business analyst, I want to group data by categories and compute statistics for each group so that I can understand patterns and trends in my data."
55+
56+
### Coverage Metrics
57+
58+
| Workflow Category | Public APIs Used | Test Cases | Assertions |
59+
|------------------|------------------|------------|------------|
60+
| Data I/O | 2 APIs | 1 | 8 |
61+
| Data Cleaning | 4 APIs | 1 | 11 |
62+
| Data Aggregation | 4 APIs | 1 | 13 |
63+
| **Total** | **10 unique APIs** | **3** | **32** |
64+
65+
### Out of Scope
66+
67+
The following are explicitly **not** tested in this system testing phase:
68+
- Internal implementation details (block managers, internals, etc.)
69+
- Performance benchmarks or optimization
70+
- Edge cases requiring white-box knowledge
71+
- Deprecated or experimental APIs
72+
- Platform-specific behaviors
73+
74+
---
75+
76+
## Test Case Summary
77+
78+
### Test Case 1: CSV Data Import-Export Workflow
79+
80+
**Test ID:** SYS-001
81+
**Owner:** Sandeep Ramavath
82+
**Category:** Data I/O Workflow
83+
**Test File:** `pandas/tests/system/test_system_workflows.py::TestDataIOWorkflow::test_csv_roundtrip_workflow`
84+
85+
| Attribute | Details |
86+
|-----------|---------|
87+
| **Title** | CSV Data Import-Export Workflow |
88+
| **Pre-conditions** | • Temporary directory available for file operations<br>• pandas library installed and functional<br>• Write permissions in test directory |
89+
| **Test Steps** | **Step 1:** Create DataFrame with mixed data types using public API<br>&nbsp;&nbsp;- Create DataFrame with 5 columns: id (int), name (string), score (float), date (datetime), active (boolean)<br>&nbsp;&nbsp;- Use pandas constructor: `pd.DataFrame()`<br><br>**Step 2:** Export DataFrame to CSV file<br>&nbsp;&nbsp;- Call `to_csv()` method with file path<br>&nbsp;&nbsp;- Use `index=False` parameter<br>&nbsp;&nbsp;- Verify file creation on disk<br><br>**Step 3:** Import CSV file back into new DataFrame<br>&nbsp;&nbsp;- Call `pd.read_csv()` with file path<br>&nbsp;&nbsp;- Use `parse_dates` parameter for date column<br><br>**Step 4:** Verify data integrity and type preservation<br>&nbsp;&nbsp;- Check row count matches original (5 rows)<br>&nbsp;&nbsp;- Verify column names preserved<br>&nbsp;&nbsp;- Compare all values with original data<br>&nbsp;&nbsp;- Verify datetime type correctly parsed |
90+
| **Expected Results** | • CSV file created successfully at specified path<br>• File contains 5 data rows plus header<br>• Data round-trips without any loss<br>• Integer values: [1, 2, 3, 4, 5] preserved<br>• String values: ['Alice', 'Bob', 'Charlie', 'David', 'Eve'] preserved<br>• Float values: [95.5, 87.3, 92.1, 88.7, 91.4] preserved<br>• Boolean values: [True, False, True, True, False] preserved<br>• Date column recognized as datetime64 type<br>• All assertions pass without errors |
91+
| **Actual Results** | **PASSED** - All expected results achieved |
92+
93+
---
94+
95+
### Test Case 2: Missing Data Cleaning Workflow
96+
97+
**Test ID:** SYS-002
98+
**Owner:** Nithikesh Bobbili
99+
**Category:** Data Cleaning Workflow
100+
**Test File:** `pandas/tests/system/test_system_workflows.py::TestDataCleaningWorkflow::test_missing_data_handling_workflow`
101+
102+
| Attribute | Details |
103+
|-----------|---------|
104+
| **Title** | Missing Data Cleaning Workflow |
105+
| **Pre-conditions** | • pandas library available<br>• numpy library available for NaN values<br>• No external dependencies or files required |
106+
| **Test Steps** | **Step 1:** Create DataFrame with missing values using public API<br>&nbsp;&nbsp;- Create 3-column DataFrame with `np.nan` values<br>&nbsp;&nbsp;- Column A: 2 missing values at positions 1 and 3<br>&nbsp;&nbsp;- Column B: 2 missing values at positions 0 and 2<br>&nbsp;&nbsp;- Column C: 1 missing value at position 4<br><br>**Step 2:** Detect missing values using public methods<br>&nbsp;&nbsp;- Call `isnull()` to create boolean mask<br>&nbsp;&nbsp;- Call `sum()` to count missing values per column<br>&nbsp;&nbsp;- Verify counts: A=2, B=2, C=1<br><br>**Step 3:** Fill missing values using multiple strategies<br>&nbsp;&nbsp;- **Strategy 3a:** Forward fill using `ffill()`<br>&nbsp;&nbsp;&nbsp;&nbsp;- Verify propagation of last valid value<br>&nbsp;&nbsp;&nbsp;&nbsp;- Check remaining NaN count<br>&nbsp;&nbsp;- **Strategy 3b:** Backward fill using `bfill()`<br>&nbsp;&nbsp;&nbsp;&nbsp;- Verify propagation of next valid value<br>&nbsp;&nbsp;&nbsp;&nbsp;- Check remaining NaN count<br>&nbsp;&nbsp;- **Strategy 3c:** Constant fill using `fillna(0)`<br>&nbsp;&nbsp;&nbsp;&nbsp;- Verify all NaN replaced with 0<br><br>**Step 4:** Verify all missing values handled correctly<br>&nbsp;&nbsp;- Confirm no NaN values remain after constant fill<br>&nbsp;&nbsp;- Verify DataFrame shape preserved<br>&nbsp;&nbsp;- Check specific filled values match expectations |
107+
| **Expected Results** | • Missing values correctly identified: A=2, B=2, C=1 (total 5)<br>• Forward fill leaves 1 NaN (at first position of column B)<br>• Forward fill propagates value correctly (row 1, col A = 1.0)<br>• Backward fill leaves 1 NaN (at last position of column C)<br>• Backward fill propagates value correctly (row 0, col B = 2.0)<br>• Constant fill (value=0) removes all NaN values<br>• Constant fill replaces NaN with exact value (row 1, col A = 0.0)<br>• DataFrame shape (5, 3) preserved after all operations<br>• All assertions pass without errors |
108+
| **Actual Results** | **PASSED** - All expected results achieved |
109+
110+
---
111+
112+
### Test Case 3: Group-by Aggregation Analysis Workflow
113+
114+
**Test ID:** SYS-003
115+
**Owner:** Mallikarjuna
116+
**Category:** Aggregation and Analysis Workflow
117+
**Test File:** `pandas/tests/system/test_system_workflows.py::TestAggregationWorkflow::test_groupby_aggregation_workflow`
118+
119+
| Attribute | Details |
120+
|-----------|---------|
121+
| **Title** | Group-by Aggregation Analysis Workflow |
122+
| **Pre-conditions** | • pandas library functional<br>• Sufficient memory for group operations<br>• No external data sources required |
123+
| **Test Steps** | **Step 1:** Create DataFrame with categorical and numeric data<br>&nbsp;&nbsp;- Create DataFrame with 3 columns:<br>&nbsp;&nbsp;&nbsp;&nbsp;- category: ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']<br>&nbsp;&nbsp;&nbsp;&nbsp;- value: [10, 20, 15, 25, 20, 30, 25, 35]<br>&nbsp;&nbsp;&nbsp;&nbsp;- quantity: [1, 2, 3, 4, 5, 6, 7, 8]<br>&nbsp;&nbsp;- Total 8 rows, evenly split between categories A and B<br><br>**Step 2:** Group data by category using public API<br>&nbsp;&nbsp;- Call `groupby('category')` on DataFrame<br>&nbsp;&nbsp;- Store grouped object for multiple operations<br><br>**Step 3:** Apply multiple aggregation functions<br>&nbsp;&nbsp;- **Step 3a:** Apply mean aggregation on 'value' column<br>&nbsp;&nbsp;&nbsp;&nbsp;- Calculate average for each category<br>&nbsp;&nbsp;&nbsp;&nbsp;- Verify: Category A mean = 17.5, Category B mean = 27.5<br>&nbsp;&nbsp;- **Step 3b:** Apply sum aggregation on 'value' column<br>&nbsp;&nbsp;&nbsp;&nbsp;- Calculate total for each category<br>&nbsp;&nbsp;&nbsp;&nbsp;- Verify: Category A sum = 70, Category B sum = 110<br>&nbsp;&nbsp;- **Step 3c:** Apply count aggregation<br>&nbsp;&nbsp;&nbsp;&nbsp;- Count items in each category using `size()`<br>&nbsp;&nbsp;&nbsp;&nbsp;- Verify: Category A count = 4, Category B count = 4<br><br>**Step 4:** Apply multiple aggregations simultaneously<br>&nbsp;&nbsp;- Use `agg(['mean', 'sum', 'count'])` on grouped data<br>&nbsp;&nbsp;- Create multi-column result DataFrame<br><br>**Step 5:** Verify aggregated results comprehensively<br>&nbsp;&nbsp;- Check all 6 values (2 categories × 3 aggregations)<br>&nbsp;&nbsp;- Verify result DataFrame shape is (2, 3)<br>&nbsp;&nbsp;- Confirm index contains category labels |
124+
| **Expected Results** | • Data groups correctly into 2 categories (A and B)<br>• Category A mean aggregation: (10+15+20+25)/4 = 17.5 ✓<br>• Category B mean aggregation: (20+25+30+35)/4 = 27.5 ✓<br>• Category A sum aggregation: 10+15+20+25 = 70 ✓<br>• Category B sum aggregation: 20+25+30+35 = 110 ✓<br>• Category A count: 4 items ✓<br>• Category B count: 4 items ✓<br>• Multi-aggregation creates DataFrame with shape (2, 3)<br>• Multi-aggregation preserves all individual results<br>• Result index contains 'A' and 'B' as category labels<br>• Result columns contain 'mean', 'sum', 'count'<br>• All assertions pass without errors |
125+
| **Actual Results** | **PASSED** - All expected results achieved |
126+
127+
---
128+
129+
## Execution and Results
130+
131+
### Test Environment
132+
133+
**Test File:** `pandas/tests/system/test_system_workflows.py`
134+
**Testing Framework:** pytest 8.4.2
135+
**Python Version:** 3.13.5
136+
**Pandas Version:** 3.0.0.dev0+
137+
**NumPy Version:** 1.26+
138+
**Operating System:** macOS
139+
140+
### Execution Command
141+
142+
```bash
143+
python -m pytest pandas/tests/system/test_system_workflows.py -v
144+
```
145+
146+
### Test Results Summary
147+
148+
```
149+
collected 3 items
150+
151+
pandas/tests/system/test_system_workflows.py::TestDataIOWorkflow::test_csv_roundtrip_workflow PASSED [33%]
152+
pandas/tests/system/test_system_workflows.py::TestDataCleaningWorkflow::test_missing_data_handling_workflow PASSED [66%]
153+
pandas/tests/system/test_system_workflows.py::TestAggregationWorkflow::test_groupby_aggregation_workflow PASSED [100%]
154+
155+
=================================== 3 passed in 0.52s ===================================
156+
```
157+
158+
### Detailed Test Results
159+
160+
| Test Case | Status | Duration | Assertions | Outcome |
161+
|-----------|--------|----------|------------|---------|
162+
| CSV Data Import-Export Workflow | PASSED | ~0.18s | 8 | All data round-tripped correctly |
163+
| Missing Data Cleaning Workflow | PASSED | ~0.16s | 11 | All fill strategies worked as expected |
164+
| Group-by Aggregation Workflow | PASSED | ~0.18s | 13 | All aggregations computed correctly |
165+
166+
**Summary Statistics:**
167+
- **Total Test Cases:** 3
168+
- **Passed:** 3 (100%)
169+
- **Failed:** 0 (0%)
170+
- **Skipped:** 0
171+
- **Total Execution Time:** 0.52 seconds
172+
- **Average Test Duration:** 0.17 seconds
173+
- **Total Assertions:** 32
174+
- **Assertions Passed:** 32 (100%)
175+
176+
### Behavioral Analysis
177+
178+
#### Test Case 1: CSV Roundtrip Workflow
179+
180+
**Expected Behavior:**
181+
- CSV export creates valid file with proper formatting
182+
- CSV import reconstructs DataFrame with same data
183+
- Mixed data types (int, float, string, datetime, bool) preserved
184+
- Datetime parsing works correctly with `parse_dates` parameter
185+
186+
**Actual Behavior:**
187+
**Matches Expected** - All behaviors confirmed:
188+
- CSV file created with proper structure (header + 5 data rows)
189+
- All numeric values preserved exactly (no rounding errors)
190+
- String values preserved with proper encoding
191+
- Boolean values correctly written and read as True/False
192+
- Datetime column parsed correctly to datetime64 dtype
193+
- No data loss or corruption during round-trip
194+
195+
**Deviations:** None
196+
197+
#### Test Case 2: Missing Data Cleaning Workflow
198+
199+
**Expected Behavior:**
200+
- `isnull()` correctly identifies NaN values
201+
- `ffill()` propagates last valid observation forward
202+
- `bfill()` propagates next valid observation backward
203+
- `fillna()` replaces all NaN with specified constant
204+
- Original DataFrame remains unchanged (immutable operations)
205+
206+
**Actual Behavior:**
207+
**Matches Expected** - All behaviors confirmed:
208+
- Missing value detection accurate (5 total NaN values identified)
209+
- Forward fill correctly propagated values, leaving only leading NaN
210+
- Backward fill correctly propagated values, leaving only trailing NaN
211+
- Constant fill successfully eliminated all NaN values
212+
- Shape and non-NaN values preserved across all operations
213+
- Original DataFrame immutable (each operation returns new DataFrame)
214+
215+
**Deviations:** None
216+
217+
#### Test Case 3: Group-by Aggregation Workflow
218+
219+
**Expected Behavior:**
220+
- `groupby()` splits data by category labels
221+
- Aggregation functions compute correct statistics per group
222+
- Multiple aggregations can be applied simultaneously
223+
- Result maintains category labels as index
224+
225+
**Actual Behavior:**
226+
**Matches Expected** - All behaviors confirmed:
227+
- Data correctly split into 2 groups (A and B)
228+
- Mean calculations accurate: A=17.5, B=27.5
229+
- Sum calculations accurate: A=70, B=110
230+
- Count calculations accurate: A=4, B=4
231+
- Multi-aggregation created proper DataFrame structure
232+
- Category labels preserved in result index
233+
- All numeric computations precise (no floating-point errors)
234+
235+
**Deviations:** None
236+
237+
### Failures and Deviations
238+
239+
**Result: No failures or behavioral deviations were discovered.**
240+
241+
All system tests passed successfully, indicating that:
242+
- End-to-end workflows function as designed
243+
- Public APIs behave according to documentation
244+
- Data integrity maintained across operations
245+
- No unexpected errors or exceptions
246+
- All user workflows complete successfully
247+
248+
### Test Coverage Analysis
249+
250+
The system tests successfully validated:
251+
252+
| Workflow Component | Validated | Evidence |
253+
|-------------------|-----------|----------|
254+
| File I/O operations | Yes | CSV roundtrip successful |
255+
| Data type handling | Yes | 5 different types preserved |
256+
| Missing value detection | Yes | All NaN values identified |
257+
| Fill strategies | Yes | 3 strategies all worked |
258+
| Grouping operations | Yes | Categories split correctly |
259+
| Aggregation functions | Yes | 3 aggregations accurate |
260+
| Multi-aggregation | Yes | Combined aggregations worked |
261+
| Data immutability | Yes | Original data preserved |
262+
263+
---
264+
265+
## Group Contributions
266+
267+
### Individual Contributions
268+
269+
| Student | Test Cases | Workflow Validated | Assertions | LOC |
270+
|---------|------------|-------------------|------------|-----|
271+
| **Sandeep Ramavath** | 1 test case | Data I/O (CSV import/export) | 8 | ~50 |
272+
| **Nithikesh Bobbili** | 1 test case | Data Cleaning (missing data handling) | 11 | ~60 |
273+
| **Mallikarjuna** | 1 test case | Aggregation (groupby operations) | 13 | ~65 |
274+

0 commit comments

Comments
 (0)