This repository contains the code and resources for the paper "Explainable Topic Continuity in Political Discourse: A Sentence Pair BERT Model Analysis". The project leverages Sentence Pair Modeling (SPM), BERT, and the Transformers Interpret library to analyze topic continuity in political discourse.
Topic continuity is defined by specific linguistic features that suggest a sustained subject or theme between two consecutive sentences. This research focuses on analyzing five linguistic features that define topic continuity:
- Coreferentiality
- Lexical cohesion
- Semantic cohesion
- Syntactic parallelism
- Transitional cohesion
The project includes a dataset of 2,884 sentence pairs and a fine-tuned BERT model (TopicContinuityBERT) to analyze how these linguistic features influence topic continuity across sentences.
This paper is part of the doctoral thesis:
"Explaining Large Language Models for Passage-Level Political Statement Extraction Using Linguistic Rule-Based Models"
A doctoral thesis submitted to the Faculty 1: Mathematics, Computer Science, Physics, Electrical Engineering and Information Technology of the Brandenburg University of Technology Cottbus-Senftenberg for the academic degree of Dr.-Ing.
This work was published in: Reyes, J. F., "Explainable Topic Continuity in Political Discourse: A Sentence Pair BERT Model Analysis", International Journal of Computational Linguistics (IJCL), Volume 15, Issue 2.
The model and dataset used in this project are published on Hugging Face:
- Dataset: TopicContinuity, https://doi.org/10.57967/hf/2756
- Model: TopicContinuityBERT, https://doi.org/10.57967/hf/2757
- db.py: Database connection and configuration for Google Sheets integration
- paper_c_1_split_dataset.py: Splits the dataset into train, validation, and test sets
- paper_c_2_train_bert.py: Trains the BERT model for topic continuity classification
- paper_c_3_test_bert.py: Evaluates the trained BERT model on the test dataset
- paper_c_4_inference_bert.py: Performs inference using the trained BERT model
- paper_c_5_plot_embeddings.py: Visualizes BERT embeddings
- paper_c_6_lrbm_classify.py: Implements a Logistic Regression Baseline Model for comparison
- paper_c_7_extend_tokenizer.py: Extends the BERT tokenizer with domain-specific tokens
- paper_c_8_transformers_interpret_analysis.py: Performs explainability analysis using Transformers Interpret
- paper_c_9_bert_hop_training.py: Implements a hyperparameter optimization training approach for BERT
- paper_c_10_feature_analysis.py: Analyzes linguistic features in the dataset
- paper_c_11_word_frequency_analysis.py: Analyzes word frequencies in the dataset
- paper_c_12_eda.py: Performs exploratory data analysis
- continuity_checks.py: Implements checks for topic continuity features
- ner_processing.py: Processes named entities for coreferentiality analysis
- text_utils.py: Provides text processing utilities
- utils.py: Contains general utility functions used across the project
- visualizations.py: Implements visualization functions for analysis results
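To illustrate the kind of rule continuity_checks.py encodes, here is a minimal sketch of a lexical-cohesion check based on content-word overlap between two consecutive sentences. The stopword list, the Jaccard-overlap heuristic, and the 0.1 threshold are assumptions for illustration only, not the repository's actual implementation:

```python
# Illustrative lexical-cohesion check: two sentences are lexically
# cohesive when their content words overlap enough (Jaccard >= threshold).
# The stopword list and threshold are toy assumptions.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "on"}

def content_words(sentence: str) -> set:
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    tokens = [t.strip(".,;:!?").lower() for t in sentence.split()]
    return {t for t in tokens if t and t not in STOPWORDS}

def lexical_cohesion(sent_a: str, sent_b: str, threshold: float = 0.1) -> bool:
    """Flag lexical cohesion when the Jaccard overlap of content words
    between the two sentences meets the threshold."""
    a, b = content_words(sent_a), content_words(sent_b)
    if not a or not b:
        return False
    overlap = len(a & b) / len(a | b)
    return overlap >= threshold

pair = ("The senator defended the new budget.",
        "The budget, she argued, protects social programs.")
print(lexical_cohesion(*pair))  # the repeated word "budget" links the pair
```

A production version would also need lemmatization and coreference handling (see ner_processing.py), which this sketch deliberately omits.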
- paper_c_1_dl_setfit_confusion_matrix.png: Confusion matrix visualization
- paper_c_bert_losses_final_28_06.png: Plot of BERT model training losses
- paper_c_bert_roc_curve.png: ROC curve for the BERT model
- paper_c_plot_bert_embeddings_22_07.png: Visualization of BERT embeddings
- topic_continuity_test.jsonl: Test dataset with sentence pairs and labels
- topic_continuity_train.jsonl: Training dataset with sentence pairs and labels
- topic_continuity_valid.jsonl: Validation dataset with sentence pairs and labels
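The three .jsonl files hold one JSON record per line and can be read with the standard json module. The field names below (text_a, text_b, label) and the label values are assumptions for illustration; consult the published dataset on Hugging Face for the actual schema:

```python
import json

def load_pairs(path: str) -> list:
    """Read one JSON object per line from a .jsonl sentence-pair file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Write a tiny stand-in file so the example is self-contained; the real
# records live in topic_continuity_train.jsonl and its siblings.
sample = [
    {"text_a": "The president announced a new policy.",
     "text_b": "The policy targets rising energy costs.",
     "label": "continue"},
    {"text_a": "Voters head to the polls on Tuesday.",
     "text_b": "Meanwhile, the stock market rallied.",
     "label": "not_continue"},
]
with open("sample_pairs.jsonl", "w", encoding="utf-8") as f:
    for record in sample:
        f.write(json.dumps(record) + "\n")

pairs = load_pairs("sample_pairs.jsonl")
print(len(pairs), pairs[0]["label"])  # → 2 continue
```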
- paper-c.html: The full research paper describing the methodology and findings
- unused_lib_files.md: List of library files not directly used in the main scripts
- Dataset Preparation: Run paper_c_1_split_dataset.py to prepare the dataset
- Model Training: Run paper_c_2_train_bert.py to train the BERT model
- Model Evaluation: Run paper_c_3_test_bert.py to evaluate the model
- Analysis: Run the various analysis scripts (paper_c_8_transformers_interpret_analysis.py, paper_c_10_feature_analysis.py, etc.) to analyze the results
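The dataset-preparation step can be sketched as a seeded shuffle-and-slice; the 80/10/10 ratio and fixed seed below are assumptions for illustration, and the actual paper_c_1_split_dataset.py may use different proportions:

```python
import random

def split_dataset(records: list, seed: int = 42):
    """Shuffle with a fixed seed and split into train/validation/test
    using an assumed 80/10/10 ratio."""
    rng = random.Random(seed)
    shuffled = records[:]          # copy so the input list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.8)
    n_valid = int(n * 0.1)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

records = list(range(100))  # stand-in for the 2,884 sentence-pair records
train, valid, test = split_dataset(records)
print(len(train), len(valid), len(test))  # → 80 10 10
```

Fixing the seed keeps the split reproducible across runs, which matters when the test set must stay untouched between training and evaluation.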
The project dependencies are listed in the requirements.txt file. Install them using:
pip install -r requirements.txt
The analysis reveals that coreferentiality, lexical cohesion, and transitional cohesion are pivotal in maintaining thematic consistency across sentence pairs. This research deepens our understanding of political rhetoric and improves the transparency of natural language processing models, offering insights into the dynamics of political discourse.
If you use this code or the findings in your research, please cite the original paper:
Reyes, J. F. (2024). Explainable Topic Continuity in Political Discourse: A Sentence Pair BERT Model Analysis. International Journal of Computational Linguistics (IJCL), Volume 15, Issue 2.