# Replication of "Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas"

Repository: https://github.com/chirindaopensource/forecasting_macroeconomic_scenarios_synthetic_llm_personas

Owner: 2025 Craig Chirinda (Open Source Projects)
This repository contains an independent, professional-grade Python implementation of the research methodology from the 2025 paper entitled "Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas" by:
- Giulia Iadisernia
- Carolina Camassa
The project provides a complete, end-to-end computational framework for replicating the paper's findings. It delivers a modular, auditable, and extensible pipeline that executes the entire research workflow: from rigorous data validation and cleansing to a large-scale, multi-stage persona filtering process, high-volume asynchronous forecast generation, and the final statistical analysis, including the central ablation study.
## Table of Contents

- Introduction
- Theoretical Background
- Features
- Methodology Implemented
- Core Components (Notebook Structure)
- Key Callable: run_synthetic_economist_study
- Prerequisites
- Installation
- Input Data Structure
- Usage
- Output Structure
- Project Structure
- Customization
- Contributing
- Recommended Extensions
- License
- Citation
- Acknowledgments
## Introduction

This project provides a Python implementation of the analytical framework presented in Iadisernia and Camassa (2025). The core of this repository is the Jupyter notebook `forecasting_macroeconomic_scenarios_synthetic_llm_personas_draft.ipynb`, which contains a comprehensive suite of functions to replicate the paper's findings. The pipeline is designed to be a robust and scalable system for evaluating the impact of persona-based prompting on the macroeconomic forecasting performance of Large Language Models (LLMs).
The paper's central research question is whether sophisticated persona descriptions improve LLM performance in a real-world forecasting task. This codebase operationalizes the paper's experimental design, allowing users to:
- Rigorously validate and manage the entire experimental configuration via a single `config.yaml` file.
- Execute a multi-stage filtering pipeline on the ~370M-entry PersonaHub corpus to distill a set of 2,368 high-quality, domain-specific expert personas (a sketch of the semantic-deduplication stage follows this list).
- Systematically replicate 50 rounds of the ECB Survey of Professional Forecasters using GPT-4o.
- Generate over 120,000 individual forecasts across a main "persona" arm and a "no-persona" baseline for a controlled ablation study.
- Perform a comprehensive statistical analysis comparing the accuracy (MAE, win-share) and disagreement (dispersion) of the AI panels against the human expert panel.
- Run hypothesis tests (Monte Carlo and exact binomial) to assess the statistical significance of the findings.
- Execute the final ablation study tests (paired t-test, Kolmogorov-Smirnov test) to formally evaluate the impact of personas.
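To make the deduplication stage of the persona filter concrete, here is a minimal, self-contained sketch of semantic deduplication with embeddings, HNSW, and connected components. This is not the notebook's code: the embedding model, similarity threshold, and sample personas are illustrative assumptions, and the full pipeline combines this stage with keyword filtering, NER, and an LLM-as-a-judge triage.

```python
# Illustrative sketch of the semantic-deduplication stage only (assumed
# threshold, model, and data; the notebook's actual implementation differs).
import hnswlib
import networkx as nx
from sentence_transformers import SentenceTransformer

personas = [
    "A central bank economist specializing in euro-area inflation dynamics.",
    "An economist at a central bank focused on inflation in the euro area.",
    "A fixed-income strategist covering sovereign bond markets.",
]

# 1. Embed each persona description (unit-normalized for cosine similarity).
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
embeddings = model.encode(personas, normalize_embeddings=True)

# 2. Build an HNSW index over the embeddings in cosine space.
index = hnswlib.Index(space="cosine", dim=embeddings.shape[1])
index.init_index(max_elements=len(personas), ef_construction=200, M=16)
index.add_items(embeddings, list(range(len(personas))))
index.set_ef(50)

# 3. Link any pair whose cosine similarity exceeds a threshold.
SIM_THRESHOLD = 0.9  # assumed; in practice tuned via config
labels, distances = index.knn_query(embeddings, k=min(5, len(personas)))
graph = nx.Graph()
graph.add_nodes_from(range(len(personas)))
for i, (nbrs, dists) in enumerate(zip(labels, distances)):
    for j, d in zip(nbrs, dists):
        if i != j and (1.0 - d) >= SIM_THRESHOLD:  # cosine sim = 1 - distance
            graph.add_edge(i, int(j))

# 4. Keep one representative per connected component of near-duplicates.
deduped = [personas[min(component)] for component in nx.connected_components(graph)]
print(deduped)
```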
## Theoretical Background

The implemented methods are grounded in the principles of experimental design, computational linguistics, and time-series forecast evaluation.
1. Persona-Based Prompting: The core hypothesis is that providing an LLM with a detailed "persona" or "role" can improve its performance on domain-specific reasoning tasks. This is tested via an ablation study, where the performance of prompts with personas is compared to identical prompts without them.
2. Forecast Evaluation Metrics:
- Mean Absolute Error (MAE): A standard metric for point forecast accuracy, measuring the average magnitude of forecast errors. $$ \mathrm{MAE}_{vh} = \frac{1}{n_{vh}} \sum_{r=1}^{n_{vh}} \left| \hat{y}_{rvh} - y_{rvh} \right| $$
- Win-Share: A head-to-head comparison metric that counts the proportion of forecast rounds where one panel's forecast was strictly more accurate than another's, excluding ties. $$ w_{vh} = \frac{W_{vh}}{n_{vh}}, \quad \text{where } W_{vh} = \sum_{r} \mathbf{1}\{ e_{rvh}^{\mathrm{AI}} < e_{rvh}^{\mathrm{H}} \} $$
3. Hypothesis Testing:
- Null Hypothesis: The AI panel and human panel have an equal probability of producing a more accurate forecast ($H_0: p = 0.5$).
- Test for Large Samples (In-Sample): The null distribution is approximated using a Monte Carlo simulation with $N = 10{,}000$ draws from a Binomial distribution, $W^* \sim \mathrm{Binom}(n_{vh}, 0.5)$.
- Test for Small Samples (Out-of-Sample): An exact Binomial test is used, calculating probabilities directly from the Binomial PMF.
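As a concrete illustration of these formulas, the sketch below computes MAE, win-share, and both significance tests for a single variable-horizon cell. The forecast and outcome arrays are made-up toy data, not study results; the notebook implements these metrics with additional alignment and quality-control logic.

```python
# Minimal sketch of MAE, win-share, and the binomial significance tests
# for a single variable-horizon cell (illustrative data, not study results).
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(42)

# Toy per-round data: AI median forecasts, human median forecasts, outcomes.
ai_forecast = np.array([2.1, 1.8, 2.5, 1.9, 2.2])
human_forecast = np.array([2.0, 1.7, 2.6, 2.1, 2.0])
realized = np.array([2.3, 1.6, 2.4, 2.0, 2.1])

# MAE_vh = (1/n) * sum |y_hat - y| for each panel.
ai_err = np.abs(ai_forecast - realized)
human_err = np.abs(human_forecast - realized)
print("AI MAE:", ai_err.mean(), "Human MAE:", human_err.mean())

# Win-share: rounds where the AI error is strictly smaller, ties excluded.
decisive = ai_err != human_err
wins = int((ai_err[decisive] < human_err[decisive]).sum())
n = int(decisive.sum())
print("Win-share:", wins / n)

# Large-sample test: Monte Carlo null W* ~ Binom(n, 0.5), N = 10,000 draws.
null_wins = rng.binomial(n, 0.5, size=10_000)
p_mc = (null_wins >= wins).mean()
print("Monte Carlo p-value:", p_mc)

# Small-sample test: exact binomial under H0: p = 0.5.
p_exact = binomtest(wins, n, p=0.5, alternative="greater").pvalue
print("Exact binomial p-value:", p_exact)
```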
## Features

The provided Jupyter notebook (`forecasting_macroeconomic_scenarios_synthetic_llm_personas_draft.ipynb`) implements the full research pipeline, including:
- Modular, Multi-Task Architecture: The entire pipeline is broken down into 25 distinct, modular tasks, each with its own orchestrator function.
- Configuration-Driven Design: All study parameters are managed in an external `config.yaml` file.
- Scalable Data Processing: Includes streaming validators and processors for the large-scale PersonaHub dataset.
- Resilient API Orchestration: A robust, asynchronous framework for managing over 120,000 API calls with concurrency control, rate limiting, automatic retries, and resumability via checkpointing (see the sketch after this list).
- Advanced Persona Filtering: A four-stage pipeline combining keyword filtering, NER, semantic deduplication (via embeddings and HNSW), and a multi-run, majority-vote LLM-as-a-judge triage.
- Rigorous Statistical Analysis: Implements all specified forecast evaluation metrics and hypothesis tests with high fidelity.
- Complete Replication: A single top-level function call can execute the entire study from raw data to final result tables.
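To make the orchestration pattern concrete, here is a stripped-down sketch of the concurrency, retry, and JSONL-checkpointing ideas. The concurrency cap, retry budget, and checkpoint layout are illustrative assumptions, not the notebook's exact implementation.

```python
# Stripped-down sketch of concurrency-limited, retrying, checkpointed API
# calls (illustrative parameters; the notebook's framework is more complete).
import asyncio
import json
from pathlib import Path

from openai import AsyncOpenAI

MAX_CONCURRENCY = 20   # assumed concurrency cap
MAX_RETRIES = 5        # assumed retry budget

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
semaphore = asyncio.Semaphore(MAX_CONCURRENCY)

async def call_with_retry(prompt_id: str, prompt: str, checkpoint: Path) -> None:
    """Run one forecast prompt, retrying with exponential backoff,
    and append the result to a JSONL checkpoint for resumability."""
    async with semaphore:
        for attempt in range(MAX_RETRIES):
            try:
                response = await client.chat.completions.create(
                    model="gpt-4o",
                    messages=[{"role": "user", "content": prompt}],
                )
                record = {"id": prompt_id, "text": response.choices[0].message.content}
                with checkpoint.open("a") as f:
                    f.write(json.dumps(record) + "\n")
                return
            except Exception:
                await asyncio.sleep(2 ** attempt)  # exponential backoff
        # After exhausting retries, record the failure for later reruns.
        with checkpoint.open("a") as f:
            f.write(json.dumps({"id": prompt_id, "error": "max_retries"}) + "\n")

async def run_batch(prompts: dict[str, str], checkpoint: Path) -> None:
    # Resumability: skip prompt IDs already present in the checkpoint file.
    done = set()
    if checkpoint.exists():
        done = {json.loads(line)["id"] for line in checkpoint.open()}
    tasks = [
        call_with_retry(pid, p, checkpoint)
        for pid, p in prompts.items() if pid not in done
    ]
    await asyncio.gather(*tasks)

# Example invocation (hypothetical prompt ID and path):
# asyncio.run(run_batch({"r1_gdp": "Forecast euro-area GDP growth..."},
#                       Path("output/checkpoints/forecasts.jsonl")))
```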
## Methodology Implemented

The core analytical steps directly implement the methodology from the paper:
- Validation & Cleansing (Tasks 1-4): Ingests and validates all raw inputs, including the ~370M-entry PersonaHub file and all analytical datasets.
- Persona Filtering (Tasks 5-9): Executes the four-stage filtering pipeline to derive the final `persona_final_df` of 2,368 personas. Includes a Cohen's kappa validation of the LLM judge.
- Forecast Generation (Tasks 10-12): Assembles all 123,400 prompts and executes the API calls for both the persona and baseline arms.
- Analysis & Scoring (Tasks 13-22): Consolidates and QCs all forecasts, computes AI panel medians, aligns all data sources, calculates dispersion, errors, MAE, and win-shares, and runs all hypothesis tests.
- Ablation Study (Tasks 24-25): Runs the final paired t-test and Kolmogorov-Smirnov test on the aligned results to formally test the paper's main hypothesis (a sketch of these tests follows this list).
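The ablation comparisons reduce to two standard tests from `scipy.stats`, sketched below on made-up error arrays. The pairing (persona-arm vs. no-persona errors on the same forecast rounds) mirrors the design described above; the numbers are purely illustrative.

```python
# Sketch of the ablation study's two tests on illustrative, paired
# absolute-error arrays (persona arm vs. no-persona baseline).
import numpy as np
from scipy.stats import ks_2samp, ttest_rel

persona_errors = np.array([0.21, 0.35, 0.18, 0.40, 0.27, 0.33])
baseline_errors = np.array([0.25, 0.31, 0.22, 0.38, 0.30, 0.36])

# Paired t-test: do mean absolute errors differ across the two arms
# on the same forecast rounds?
t_stat, t_pvalue = ttest_rel(persona_errors, baseline_errors)
print(f"Paired t-test: t = {t_stat:.3f}, p = {t_pvalue:.3f}")

# Kolmogorov-Smirnov test: do the two error distributions differ in shape?
ks_stat, ks_pvalue = ks_2samp(persona_errors, baseline_errors)
print(f"KS test: D = {ks_stat:.3f}, p = {ks_pvalue:.3f}")
```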
## Core Components (Notebook Structure)

The `forecasting_macroeconomic_scenarios_synthetic_llm_personas_draft.ipynb` notebook is structured as a logical pipeline with modular orchestrator functions for each of the 25 major tasks. All functions are self-contained, fully documented with type hints and docstrings, and designed for professional-grade execution.
## Key Callable: `run_synthetic_economist_study`

The project is designed around a single, top-level user-facing interface function:

- `run_synthetic_economist_study`: This master orchestrator function, located in the final section of the notebook, runs the entire automated research pipeline end-to-end. A single call to this function reproduces the entire computational portion of the project.
## Prerequisites

- Python 3.9+
- An OpenAI API key.
- Core dependencies: `pandas`, `numpy`, `pyyaml`, `pyarrow`, `openai`, `spacy`, `sentence-transformers`, `networkx`, `hnswlib`, `scipy`, `scikit-learn`, `tqdm`, `faker`.
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/chirindaopensource/forecasting_macroeconomic_scenarios_synthetic_llm_personas.git
   cd forecasting_macroeconomic_scenarios_synthetic_llm_personas
   ```

2. Create and activate a virtual environment (recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```

3. Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Download the spaCy model:

   ```bash
   python -m spacy download en_core_web_trf
   ```

5. Set your OpenAI API key:

   ```bash
   export OPENAI_API_KEY='your-key-here'
   ```
## Input Data Structure

The pipeline requires several input files with specific schemas, which are rigorously validated. A synthetic data generator is included in the notebook for a self-contained demonstration.
- `persona_hub_raw.parquet`: The large-scale persona dataset.
- `contextual_data.csv`: Time-series data for the 50 SPF rounds.
- `human_benchmark.csv`: Human expert panel median forecasts.
- `human_micro.csv`: Individual human expert forecasts.
- `realized_outcomes.csv`: Ground-truth macroeconomic data.
- `human_annotations.csv`: Human judgments for the kappa validation.

All other parameters are controlled by the `config.yaml` file.
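As a quick pre-flight check, the declared inputs can be loaded with pandas/pyarrow before running the pipeline. This sketch only assumes the file names listed above, not their column schemas, which the pipeline's validators enforce.

```python
# Quick pre-flight check that the declared input files exist and load.
# Only the file names from the list above are assumed, not their schemas.
from pathlib import Path

import pandas as pd

raw = Path("./synthetic_economist_run/raw_data")

personas = pd.read_parquet(raw / "persona_hub_raw.parquet")  # needs pyarrow
csv_inputs = [
    "contextual_data.csv",
    "human_benchmark.csv",
    "human_micro.csv",
    "realized_outcomes.csv",
    "human_annotations.csv",
]
frames = {name: pd.read_csv(raw / name) for name in csv_inputs}

print(f"Personas: {len(personas):,} rows")
for name, df in frames.items():
    print(f"{name}: {df.shape[0]:,} rows x {df.shape[1]} columns")
```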
## Usage

The `forecasting_macroeconomic_scenarios_synthetic_llm_personas_draft.ipynb` notebook provides a complete, step-by-step guide. The primary workflow is to execute the final cell of the notebook, which demonstrates how to use the top-level `run_synthetic_economist_study` orchestrator:
```python
# Final cell of the notebook.
# This block serves as the main entry point for the entire project.
from pathlib import Path

import yaml

if __name__ == '__main__':
    # 1. Define paths and load configuration.
    run_dir = Path("./synthetic_economist_run")
    raw_data_dir = run_dir / "raw_data"
    output_dir = run_dir / "output"

    with open('config.yaml', 'r') as f:
        config = yaml.safe_load(f)

    # 2. Generate a full set of synthetic data files for the demonstration.
    # (The generation functions are defined in the notebook.)
    data_paths = setup_synthetic_data(raw_data_dir, config)

    # 3. Execute the entire replication study.
    final_artifacts = run_synthetic_economist_study(
        data_paths=data_paths,
        config=config,
        output_dir=str(output_dir),
        total_persona_rows=10000,  # Use the size of our synthetic dataset.
        run_kappa_validation=True,
    )

    # 4. Inspect final results.
    print("--- Ablation Paired T-Test Report ---")
    print(final_artifacts['ablation_ttest_report'])
```

## Output Structure

The pipeline generates a structured output directory:
- `output/processed/`: Intermediate, processed data files (e.g., cleansed and filtered persona sets).
- `output/checkpoints/`: Raw JSONL results from all API calls, enabling resumability.
- `output/results/`: All final output tables (MAE, win-share, dispersion) as CSV files, plus a comprehensive `full_pipeline_report.json`.
- `output/pipeline_run.log`: A detailed log file for the entire run.
## Project Structure

```
forecasting_macroeconomic_scenarios_synthetic_llm_personas/
│
├── forecasting_macroeconomic_scenarios_synthetic_llm_personas_draft.ipynb
├── config.yaml
├── requirements.txt
│
├── study_run/
│ ├── raw_data/
│ └── output/
│ ├── processed/
│ ├── results/
│ ├── checkpoints/
│ └── pipeline_run.log
│
├── LICENSE
└── README.md
```
## Customization

The pipeline is highly customizable via the `config.yaml` file. Users can modify all study parameters, including model names, API settings, filtering thresholds, and file paths, without altering the core Python code. A sketch of this pattern follows.
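For example, a run against a different model could look like the following. The key names shown (`model_name`, `api.max_concurrency`) are hypothetical placeholders; consult the repository's `config.yaml` for the actual schema.

```python
# Hypothetical example of overriding config values before a run.
# The key names here are placeholders; consult the repository's
# config.yaml for the actual schema.
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

config["model_name"] = "gpt-4o-mini"       # hypothetical key: swap the LLM
config["api"]["max_concurrency"] = 10      # hypothetical key: throttle calls

# The modified dict is then passed straight to the orchestrator:
# run_synthetic_economist_study(data_paths=..., config=config, ...)
```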
## Contributing

Contributions are welcome. Please fork the repository, create a feature branch, and submit a pull request with a clear description of your changes. Adherence to PEP 8, type hinting, and comprehensive docstrings is required.
## Recommended Extensions

Future extensions could include:

- Evaluating Different LLMs: The modular design allows for easy substitution of the `model_name` in the config to test other models (e.g., from Anthropic, Google).
- Density Forecasting: Extending the prompts to ask for probability distributions (as in the real SPF) and evaluating the quality of the LLM's density forecasts.
- Alternative Prompting Strategies: Implementing and testing other prompting techniques, such as chain-of-thought or adversarial prompting, to see if they can generate more diverse or accurate forecasts.
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Citation

If you use this code or the methodology in your research, please cite the original paper:

```bibtex
@inproceedings{iadisernia2025prompting,
  author    = {Iadisernia, Giulia and Camassa, Carolina},
  title     = {Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas},
  year      = {2025},
  booktitle = {Proceedings of the 6th ACM International Conference on AI in Finance},
  series    = {ICAIF '25}
}
```

For the implementation itself, you may cite this repository:
Chirinda, C. (2025). A Production-Grade Replication of "Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas".
GitHub repository: https://github.com/chirindaopensource/forecasting_macroeconomic_scenarios_synthetic_llm_personas
## Acknowledgments

- Credit to Giulia Iadisernia and Carolina Camassa for the foundational research that forms the entire basis for this computational replication.
- This project is built upon the exceptional tools provided by the open-source community. Sincere thanks to the developers of the scientific Python ecosystem, including pandas, NumPy, spaCy, Sentence-Transformers, NetworkX, SciPy, scikit-learn, and OpenAI.
---
This README was generated based on the structure and content of the forecasting_macroeconomic_scenarios_synthetic_llm_personas_draft.ipynb notebook and follows best practices for research software documentation.