# Biomedical Statistical Analysis Module

A Python module that implements non-parametric statistical tests commonly used in biomedical research, together with visualization utilities and educational examples.

## 📊 Overview

This module implements two fundamental non-parametric statistical tests:

- **Wilcoxon Signed-Rank Test**: for paired (dependent) samples
- **Mann-Whitney U Test**: for comparing two independent groups

Both tests are useful when data do not meet the assumptions of the corresponding parametric tests (the paired and independent-samples t-tests), most notably normality.

## 🔬 Features

### Core Implementations
- **Pure Python**: No external dependencies required for core functionality
- **Educational Focus**: Clear, well-documented algorithms for learning
- **Biomedical Context**: Examples and documentation tailored for biomedical research
- **Statistical Rigor**: Proper handling of ties, effect sizes, and p-value calculations

### Visualization Tools
- Text-based visualizations (no external plotting libraries required)
- Box plot representations
- Paired data change visualization
- Group comparison histograms
- Statistical summary displays

### Quality Assurance
- Comprehensive error handling and input validation
- Type hints for better code maintainability
- Extensive documentation and examples
- Educational comments explaining algorithms

## 📚 Theory and Applications

### Wilcoxon Signed-Rank Test

**When to use:**
- Paired or dependent samples (before/after measurements, matched pairs)
- Data are continuous or at least ordinal, and the paired differences can be meaningfully ranked by magnitude
- The paired differences are not assumed to be normally distributed

**Examples in biomedical research:**
- Blood pressure before and after treatment
- Pain scores before and after medication
- Biomarker levels before and after an intervention
- Patient quality-of-life scores over time

**Algorithm:**
1. Calculate the difference between each pair of observations
2. Remove zero differences
3. Rank the absolute differences (tied values receive the average of their ranks)
4. Sum the ranks of the positive differences (W⁺) and of the negative differences (W⁻)
5. For a two-sided test, the test statistic is W = min(W⁺, W⁻); one-sided tests use W⁺ or W⁻ directly
6. Calculate the p-value using exact tables (small n) or the normal approximation (large n)
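
The steps above can be sketched in plain Python. This is an illustrative sketch, not the module's own `wilcoxon_signed_rank_test`: the names `average_ranks` and `signed_rank_test` are hypothetical, and only the large-sample branch of step 6 is implemented (normal approximation, with no tie or continuity correction). Differences are taken as `sample1 - sample2`, so `alternative="greater"` asks whether `sample1` tends to exceed `sample2`.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the positions they occupy."""
    first_pos, counts = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)
        counts[v] = counts.get(v, 0) + 1
    return [first_pos[v] + (counts[v] - 1) / 2 for v in values]


def signed_rank_test(sample1, sample2, alternative="two-sided"):
    """Hypothetical sketch of the Wilcoxon signed-rank test (normal approximation only)."""
    # Steps 1-2: paired differences sample1 - sample2, with zero differences removed
    diffs = [a - b for a, b in zip(sample1, sample2) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Step 3: rank the absolute differences, averaging ranks over ties
    ranks = average_ranks([abs(d) for d in diffs])

    # Step 4: rank sums; W+ + W- always equals n(n+1)/2
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = n * (n + 1) / 2 - w_plus

    # Step 6 (large-n branch): standardize W+ under H0 (no tie/continuity correction)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    upper = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z)
    lower = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z)

    if alternative == "greater":
        p = upper
    elif alternative == "less":
        p = lower
    elif alternative == "two-sided":
        p = min(1.0, 2 * min(upper, lower))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

    # Step 5: report W = min(W+, W-)
    return min(w_plus, w_minus), p
```

On the blood-pressure data from Quick Start every difference is positive, so W = 0 and the one-sided normal-approximation p-value is about 0.006. With only eight pairs, an exact p-value would normally be preferred.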

### Mann-Whitney U Test

**When to use:**
- Two independent groups
- Data are continuous or ordinal but not normally distributed
- Observations are independent within and between groups
- Normality is not required; however, reading a significant result as a shift in medians assumes the two distributions have similar shapes and spreads

**Examples in biomedical research:**
- Treatment vs. control group outcomes
- Disease vs. healthy population comparisons
- Comparisons between drug dosage groups
- Gender differences in biomarker levels

**Algorithm:**
1. Combine both samples and rank all observations (ties receive average ranks)
2. Sum the ranks for each group (R₁, R₂)
3. Calculate the U statistics: U₁ = R₁ - n₁(n₁+1)/2 and U₂ = n₁n₂ - U₁
4. Test statistic U = min(U₁, U₂)
5. Calculate the p-value using exact tables (small n) or the normal approximation (large n)
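
The same steps for two independent groups can be sketched as follows. As with the signed-rank sketch, `mann_whitney_sketch` is a hypothetical name rather than the module's `mann_whitney_u_test`, and only the large-sample normal approximation is shown (no tie or continuity correction). `alternative="greater"` asks whether `group1` tends to be larger than `group2`.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the positions they occupy."""
    first_pos, counts = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)
        counts[v] = counts.get(v, 0) + 1
    return [first_pos[v] + (counts[v] - 1) / 2 for v in values]


def mann_whitney_sketch(group1, group2, alternative="two-sided"):
    """Hypothetical sketch of the Mann-Whitney U test (normal approximation only)."""
    n1, n2 = len(group1), len(group2)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")

    # Step 1: pool both groups and rank every observation (average ranks for ties)
    ranks = average_ranks(list(group1) + list(group2))

    # Step 2: rank sum of group 1 (its values occupy the first n1 pooled positions)
    r1 = sum(ranks[:n1])

    # Step 3: U statistics; U1 + U2 always equals n1 * n2
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1

    # Step 5 (large-n branch): standardize U1 under H0 (no tie/continuity correction)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    upper = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z): group1 tends to be larger
    lower = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z): group1 tends to be smaller

    if alternative == "greater":
        p = upper
    elif alternative == "less":
        p = lower
    elif alternative == "two-sided":
        p = min(1.0, 2 * min(upper, lower))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

    # Step 4: report U = min(U1, U2)
    return min(u1, u2), p
```

On the drug-efficacy data from Quick Start this gives U₁ = 42.5 and U = 6.5. U₁ also equals the number of (treatment, control) pairs in which the treatment value is larger, counting ties as one half, which is a handy check on the rank-based calculation.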

## 🚀 Quick Start

### Basic Usage

```python
from Biomedical import wilcoxon_signed_rank_test, mann_whitney_u_test
from Biomedical import plot_wilcoxon_results, plot_mann_whitney_results

# Wilcoxon test example: Blood pressure study
before_treatment = [145, 142, 138, 150, 155, 148, 152, 160]
after_treatment = [140, 138, 135, 145, 148, 142, 147, 152]

w_stat, p_value, stats = wilcoxon_signed_rank_test(
    before_treatment,
    after_treatment,
    alternative="greater"  # one-sided: treatment reduces BP
)

print(f"W statistic: {w_stat}")
print(f"p-value: {p_value:.4f}")
print(f"Effect size: {stats['effect_size']:.3f}")

# Visualize results
plot_wilcoxon_results(
    before_treatment,
    after_treatment,
    ("Before Treatment", "After Treatment"),
    "Blood Pressure Reduction Study"
)

# Mann-Whitney test example: Drug efficacy study
treatment_group = [85, 88, 90, 92, 95, 98, 100]
control_group = [78, 80, 82, 85, 87, 89, 91]

u_stat, p_value, stats = mann_whitney_u_test(
    treatment_group,
    control_group,
    alternative="greater"  # treatment > control
)

print(f"U statistic: {u_stat}")
print(f"p-value: {p_value:.4f}")
print(f"Median difference: {stats['median_difference']:.1f}")

# Visualize results
plot_mann_whitney_results(
    treatment_group,
    control_group,
    ("Treatment", "Control"),
    "Drug Efficacy Comparison"
)
```

## 📖 Detailed Examples

### Example 1: Clinical Trial - Pain Medication

```python
from Biomedical import wilcoxon_signed_rank_test

# Pre and post medication pain scores (1-10 scale)
pain_before = [8, 7, 9, 6, 8, 7, 9, 8, 7, 6]
pain_after = [4, 3, 5, 3, 4, 3, 5, 4, 3, 2]

w_stat, p_val, stats = wilcoxon_signed_rank_test(
    pain_before, pain_after, alternative="greater"
)

print("Pain Medication Study Results:")
print(f"Median pain reduction: {stats['median_difference']:.1f} points")
print(f"Statistical significance: p = {p_val:.4f}")

if p_val < 0.05:
    print("✓ Medication significantly reduces pain")
else:
    print("✗ No significant pain reduction detected")
```

### Example 2: Biomarker Comparison Study

```python
from Biomedical import mann_whitney_u_test

# Biomarker levels: patients vs healthy controls
patients = [120, 125, 130, 135, 140, 145, 150, 155]
healthy = [100, 105, 110, 115, 118, 120, 125, 128]

u_stat, p_val, stats = mann_whitney_u_test(
    patients, healthy, alternative="greater"
)

print("Biomarker Level Comparison:")
print(f"Patient median: {stats['median1']:.1f}")
print(f"Healthy median: {stats['median2']:.1f}")
print(f"Difference: {stats['median_difference']:.1f}")
print(f"Effect size: {stats['effect_size']:.3f}")

if p_val < 0.05:
    print("✓ Significant difference between groups")
else:
    print("✗ No significant difference detected")
```

## 📊 Interpretation Guidelines

### P-value Interpretation
- **p < 0.001**: Very strong evidence against the null hypothesis
- **0.001 ≤ p < 0.01**: Strong evidence against the null hypothesis
- **0.01 ≤ p < 0.05**: Moderate evidence against the null hypothesis
- **p ≥ 0.05**: Insufficient evidence to reject the null hypothesis (this is not evidence that the null hypothesis is true)

### Effect Size Interpretation (for large samples)
- **Small effect**: r ≈ 0.1 (explains 1% of variance)
- **Medium effect**: r ≈ 0.3 (explains 9% of variance)
- **Large effect**: r ≈ 0.5 (explains 25% of variance)
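
These benchmarks are usually applied to r = |Z|/√N, where Z is the normal-approximation statistic and N is the number of observations used (commonly the number of non-zero pairs for the signed-rank test, and n₁ + n₂ for Mann-Whitney). Whether the module's `effect_size` uses exactly this definition is an assumption here; `effect_size_r` and `describe_effect` are hypothetical helpers, and the cut-offs between bands are a common convention rather than part of the guidelines above.

```python
import math


def effect_size_r(z, n_obs):
    """r = |Z| / sqrt(N), a common effect size for rank-based tests."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return abs(z) / math.sqrt(n_obs)


def describe_effect(r):
    """Map r onto the small / medium / large benchmarks (0.1 / 0.3 / 0.5)."""
    if r < 0.1:
        return "negligible"
    if r < 0.3:
        return "small"
    if r < 0.5:
        return "medium"
    return "large"
```

For example, a Mann-Whitney comparison of 7 vs. 7 patients with Z ≈ 2.3 gives r ≈ 0.61, which falls in the "large" band.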

### Assumptions and Limitations

**Wilcoxon Signed-Rank Test:**
- ✓ Pairs are independent of one another
- ✓ Data are at least ordinal
- ✓ Distribution of differences is approximately symmetric
- ✗ Tied absolute differences receive average ranks and zero differences are discarded, so with many ties the standard exact tables no longer apply and the normal approximation needs a tie correction

**Mann-Whitney U Test:**
- ✓ Observations are independent
- ✓ Data are at least ordinal
- ✓ No assumption of normality
- ✗ Assumes similar distribution shapes (including spread) when interpreted as a comparison of locations

## 🔧 API Reference

### `wilcoxon_signed_rank_test(sample1, sample2, alternative='two-sided')`

**Parameters:**
- `sample1`: First sample (list of numbers)
- `sample2`: Second sample (list of numbers, same length as sample1)
- `alternative`: 'two-sided', 'greater', or 'less'

**Returns:**
- `w_statistic`: Test statistic (float)
- `p_value`: P-value (float)
- `stats`: Dictionary with additional statistics

### `mann_whitney_u_test(group1, group2, alternative='two-sided')`

**Parameters:**
- `group1`: First group (list of numbers)
- `group2`: Second group (list of numbers)
- `alternative`: 'two-sided', 'greater', or 'less'

**Returns:**
- `u_statistic`: Test statistic (float)
- `p_value`: P-value (float)
- `stats`: Dictionary with additional statistics

## 🎯 Best Practices

### Study Design Considerations
1. **Sample Size**: Consider power analysis for adequate sample size
2. **Data Collection**: Ensure independence of observations
3. **Multiple Comparisons**: Apply Bonferroni correction when appropriate
4. **Effect Size**: Always report effect sizes alongside p-values
5. **Visualization**: Use plots to understand data distribution
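
For the multiple-comparisons point, the Bonferroni correction multiplies each of the m p-values by m (capping the result at 1), which is equivalent to comparing the raw p-values against α/m. A minimal sketch; `bonferroni_adjust` and `significant_after_bonferroni` are hypothetical helper names, not part of the module:

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: min(1, m * p) for each of the m tests."""
    m = len(p_values)
    for p in p_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"invalid p-value: {p!r}")
    return [min(1.0, m * p) for p in p_values]


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which tests remain significant at family-wise level alpha."""
    return [p_adj < alpha for p_adj in bonferroni_adjust(p_values)]
```

For instance, if three biomarkers are each compared with a Mann-Whitney test and give raw p-values of 0.01, 0.04 and 0.5, the adjusted values are 0.03, 0.12 and 1.0, so only the first remains significant at α = 0.05.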

### Code Usage Tips
1. Always validate your data before analysis
2. Choose the appropriate alternative hypothesis before looking at the results
3. Check whether the sample size is large enough for the normal approximation, or use exact p-values for small samples
4. Document your analysis assumptions
5. Provide context for statistical significance
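
Tips 1 and 2 can be enforced with a small pre-flight check before calling `wilcoxon_signed_rank_test`. The sketch below is hypothetical (`validate_paired` is not part of the module's documented API) and assumes the inputs are plain lists of numbers, as described in the API Reference.

```python
import math

VALID_ALTERNATIVES = ("two-sided", "greater", "less")


def validate_paired(sample1, sample2, alternative="two-sided"):
    """Check paired inputs; return the number of non-zero differences."""
    if alternative not in VALID_ALTERNATIVES:
        raise ValueError(f"alternative must be one of {VALID_ALTERNATIVES}")
    if len(sample1) != len(sample2):
        raise ValueError("paired samples must have the same length")
    for x in list(sample1) + list(sample2):
        # bool is a subclass of int, so reject it explicitly
        if isinstance(x, bool) or not isinstance(x, (int, float)):
            raise TypeError(f"non-numeric value: {x!r}")
        if math.isnan(x):
            raise ValueError("NaN values must be removed or handled first")
    nonzero = sum(1 for a, b in zip(sample1, sample2) if a != b)
    if nonzero == 0:
        raise ValueError("all paired differences are zero; the test is undefined")
    return nonzero
```

Returning the count of non-zero differences also indicates whether n is small enough that exact p-values are preferable to the normal approximation.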

## 🤝 Contributing

This module was created as part of Hacktoberfest 2024. Contributions are welcome!

### Areas for Enhancement
- [ ] Additional non-parametric tests (Kruskal-Wallis, Friedman)
- [ ] Integration with popular plotting libraries (matplotlib, seaborn)
- [ ] Power analysis functions
- [ ] Bootstrap confidence intervals
- [ ] Multiple comparison corrections

### Development Guidelines
- Follow existing code style and documentation patterns
- Include comprehensive tests for new features
- Maintain educational focus with clear explanations
- Ensure compatibility with existing API

## 📝 License

MIT License. See the LICENSE file for details.

## 🔗 References

1. Wilcoxon, F. (1945). Individual comparisons by ranking methods. *Biometrics Bulletin*, 1(6), 80-83.
2. Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. *Annals of Mathematical Statistics*, 18(1), 50-60.
3. Hollander, M., Wolfe, D. A., & Chicken, E. (2013). *Nonparametric Statistical Methods* (3rd ed.). John Wiley & Sons.

## 📧 Contact

Created for Hacktoberfest 2024: an educational implementation of biomedical statistical methods.

---

*Happy analyzing! 🧬📈*