
Machine Learning for Automated Electrical Penetration Graph Analysis of Aphid Feeding Behavior: Accelerating Research on Insect-Plant Interactions


DiscoEPG (Discover EPG) - A library for EPG signal analysis of piercing-sucking insects 🐞🍃⚡💻


Our work has been published in:

  • PLOS ONE: https://doi.org/10.1371/journal.pone.0319484
  • Smart Agricultural Technology: https://doi.org/10.1016/j.atech.2026.101874

🌎 Overview

The electrical penetration graph (EPG) is a technique for studying the feeding behavior of piercing-sucking insects such as aphids. The insect and its host plant are made part of an electrical circuit, which closes when the aphid's mouthparts penetrate plant tissue. The voltage is positive when the stylet is inserted intercellularly and negative when it is inserted intracellularly. EPG waveforms have been correlated with specific feeding behaviors through stylectomy followed by microscopy of the plant tissue, which determines the approximate location of the stylet, and through observation of aphid head movement, posture, and muscle dynamics. EPG is well established and has been widely used to study the mechanisms of plant virus transmission by aphids, the effect of resistant and susceptible plant lines on aphid feeding behavior, and the mechanisms that allow aphids to feed continuously from the phloem.


DiscoEPG (short for Discover-EPG) is an open-source Python package designed to be compatible with the popular Stylet+ EPG System by W. F. Tjallingii [1]. It provides utilities for data visualization, accurate automatic segmentation and annotation of waveforms, and calculation of various EPG parameters, all of which support the data analysis stage of EPG studies. We used the package as a support tool in our own study characterizing aphid behavior from EPG data.

The novelty of DiscoEPG lies in its automatic segmentation procedure. It follows a sliding-window technique: the entire signal is broken into non-overlapping segments, each segment is labeled independently, and the predictions are concatenated to form a unified segmentation. Despite its simplicity, the approach performs well in terms of 1) the segment classification results and 2) the overlap rate between the predicted and the ground-truth aggregated segmentation.
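The procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not DiscoEPG's implementation: the function names, the window size, and the toy classifier (which labels a window by the sign of its mean voltage, following the intercellular/intracellular convention from the overview) are all assumptions made for the example. In the package, a trained model takes the classifier's place.

```python
from itertools import groupby


def split_into_windows(signal, window_size):
    """Break a signal into consecutive, non-overlapping windows.

    A trailing remainder shorter than window_size is kept as a final window.
    """
    return [signal[i:i + window_size] for i in range(0, len(signal), window_size)]


def toy_classifier(window):
    """Hypothetical stand-in for a trained model: label by mean voltage sign."""
    mean = sum(window) / len(window)
    return "intercellular" if mean >= 0 else "intracellular"


def segment(signal, window_size, classify):
    """Label each window independently, then merge runs of identical labels
    into (start_index, end_index, label) segments."""
    windows = split_into_windows(signal, window_size)
    labeled = [(classify(w), len(w)) for w in windows]

    segments = []
    start = 0
    for label, run in groupby(labeled, key=lambda pair: pair[0]):
        length = sum(n for _, n in run)
        segments.append((start, start + length, label))
        start += length
    return segments


# Synthetic signal: positive, then negative, then positive voltage.
signal = [0.5] * 8 + [-0.7] * 4 + [0.3] * 6
segments = segment(signal, window_size=4, classify=toy_classifier)
print(segments)
# [(0, 8, 'intercellular'), (8, 12, 'intracellular'), (12, 18, 'intercellular')]
```

Because every window is labeled on its own, segment boundaries can only fall on window edges, which is one source of the small alignment errors mentioned later in this README.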

📝 Novel features of DiscoEPG

ML for characterizing EPG waveforms

DiscoEPG provides two trainer objects, EPGSegment and EPGSegmentML, which support training deep learning models (CNN1D, ResNet, and CNN2D) and traditional machine learning models (XGB, Random Forest, Logistic Regression), respectively, for automatically detecting EPG waveforms. Trained deep learning models can be saved for future use; among the traditional models, only XGB provides a similar function. The prediction results can be plotted for visual assessment or post-prediction refinement, since small alignment errors are unavoidable. To make this step easier, EPGSegment can save the prediction result to a *.ANA file, which can later be processed by Stylet+.

Visualization

DiscoEPG lets users create color plots of EPG recordings in both static and interactive form. The visualization functions are built on the well-known libraries matplotlib and plotly. To help with visualizing very large numbers of data points, plotly-resampler [2] is incorporated into the package. The figure below compares a predicted segmentation with the ground-truth version. The overlap rate is 95%, and most errors come from minor waveforms such as pd.

[Figure: predicted segmentation compared with the ground-truth segmentation of an EPG recording]
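To make the overlap rate concrete, here is a small sketch that assumes it is the fraction of samples whose predicted label matches the ground-truth label. That definition, the function names, and the synthetic segments are assumptions for illustration; the published papers give the exact metric used in the evaluation.

```python
def expand(segments):
    """Turn contiguous (start, end, label) segments into one label per sample."""
    labels = []
    for start, end, label in segments:
        labels.extend([label] * (end - start))
    return labels


def overlap_rate(predicted, truth):
    """Fraction of samples where the predicted and true labels agree."""
    pred_labels = expand(predicted)
    true_labels = expand(truth)
    if len(pred_labels) != len(true_labels):
        raise ValueError("segmentations must cover the same number of samples")
    matches = sum(p == t for p, t in zip(pred_labels, true_labels))
    return matches / len(true_labels)


# Synthetic example: the short pd segment is found, but its edges are off.
truth = [(0, 50, "other"), (50, 60, "pd"), (60, 100, "other")]
predicted = [(0, 52, "other"), (52, 57, "pd"), (57, 100, "other")]

rate = overlap_rate(predicted, truth)
print(f"{rate:.2f}")  # 0.95
```

In this toy case all five mislabeled samples lie at the edges of the short pd segment, which mirrors the observation that brief waveforms account for most of the disagreement.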

EPG parameters calculation

DiscoEPG can calculate various EPG parameters proposed for aphids, adopted from [4].
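As a rough illustration of what such parameters look like, the sketch below derives the number of occurrences, total duration, and mean duration of each waveform from a list of annotated segments. It is not DiscoEPG's API and does not reproduce the definitions in [4]; the function name, the segment format, and the placeholder label "other" are assumptions made for the example.

```python
from collections import defaultdict


def summarize_waveforms(segments):
    """Compute per-waveform count, total duration, and mean duration.

    segments: list of (start_seconds, end_seconds, label) tuples.
    """
    stats = defaultdict(lambda: {"count": 0, "total_s": 0.0})
    for start, end, label in segments:
        stats[label]["count"] += 1
        stats[label]["total_s"] += end - start
    return {
        label: {**s, "mean_s": s["total_s"] / s["count"]}
        for label, s in stats.items()
    }


# Synthetic annotation: two short pd events inside a longer recording.
segments = [
    (0.0, 120.0, "other"),
    (120.0, 125.0, "pd"),
    (125.0, 400.0, "other"),
    (400.0, 404.0, "pd"),
]

summary = summarize_waveforms(segments)
print(summary["pd"])  # {'count': 2, 'total_s': 9.0, 'mean_s': 4.5}
```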

📓 Example of usage

Installation

To install DiscoEPG, simply run

pip install DiscoEPG

For DiscoEPG to run properly, you only need to prepare a dataset folder containing the recordings in .D0x format and the annotation files in .ANA format, both obtained from the Stylet+ application. Inside the data folder named <data>, there should be one subfolder called <dataset_name> containing the recording data (with the .D0x extension) and another called <dataset_name>_ANA containing the waveform positions (with the .ANA extension). Each complete recording comprises multiple hour-long recording files, which will be concatenated into one complete recording.

For example:

working directory
├── data
│   ├── dataset
│   │   ├── dataset.name0.D01
│   │   ├── dataset.name1.D01
│   │   ├── ...
│   │   ├── dataset.name0.D08
│   │   └── dataset.name1.D08
│   ├── dataset_ANA
│   │   ├── dataset.name0_ANA
│   │   ├── dataset.name1_ANA
│   │   └── ...
│   └── ...
├── config
│   ├── config_file_name.json
│   └── ...
├── Your Python script
└── ...
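The layout above implies that the hour-long parts of one recording share a name and differ only in their .D0x suffix. The sketch below shows one way to group and order such file names before concatenation. DiscoEPG handles this internally, so this is only an illustration: the helper name, the regular expression, and the assumption that the two-digit suffix gives the part order are ours, not the package's.

```python
import re
from collections import defaultdict

# Matches names such as "dataset.name0.D01": a recording name plus a part number.
PART_PATTERN = re.compile(r"^(?P<name>.+)\.D(?P<part>\d{2})$")


def group_recording_parts(filenames):
    """Group .D0x file names by recording and sort each group by part number."""
    groups = defaultdict(list)
    for filename in filenames:
        match = PART_PATTERN.match(filename)
        if match is None:
            continue  # skip files that are not recording parts
        groups[match["name"]].append((int(match["part"]), filename))
    return {name: [f for _, f in sorted(parts)] for name, parts in groups.items()}


files = [
    "dataset.name1.D02",
    "dataset.name0.D01",
    "dataset.name0.D02",
    "dataset.name1.D01",
    "notes.txt",
]
recordings = group_recording_parts(files)
print(recordings["dataset.name0"])  # ['dataset.name0.D01', 'dataset.name0.D02']
```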

To load EPG data and calculate EPG parameters:

from DiscoEPG import EPGDataset

# Replace the placeholder strings with your own values.
root_dir = "<your_working_directory>"  # path to your working directory
dataset = EPGDataset(data_path=root_dir, dataset_name="<a_dataset_name>")

To train ML models or run inference with them:

from DiscoEPG import EPGSegment  # trainer object for deep learning models
from DiscoEPG.utils import process_config

# Replace the placeholder string with the path to your config file.
config_file = "<the_path_to_your_config_file>"
config = process_config(config_file)  # load and process the configuration
epgs = EPGSegment(config)  # create the trainer

NOTE. Please refer to the tutorial notebooks for detailed instructions on working with DiscoEPG.

💡 Acknowledgement

We sincerely thank the authors of the cited works for providing the tools that served as building blocks for DiscoEPG.

✅ If you find our work helpful, please cite it with:

@article{10.1371/journal.pone.0319484,
    doi = {10.1371/journal.pone.0319484},
    author = {Dinh, Quang Dung AND Kunk, Daniel AND Son Hy, Truong AND Nalam, Vamsi AND Dao, Phuong D},
    journal = {PLOS ONE},
    publisher = {Public Library of Science},
    title = {Machine learning for automated electrical penetration graph analysis of aphid feeding behavior: Accelerating research on insect-plant interactions},
    year = {2025},
    month = {04},
    volume = {20},
    url = {https://doi.org/10.1371/journal.pone.0319484},
    pages = {1-25},
    abstract = {The electrical penetration graph (EPG) is a well-known technique that provides insights into the feeding behavior of insects with piercing-sucking mouthparts, mostly hemipterans. Since its inception in the 1960s, EPG has become indispensable in studying plant-insect interactions, revealing critical information about host plant selection, plant resistance, virus transmission, and responses to environmental factors. By integrating the plant and insect into an electrical circuit, EPG allows researchers to identify specific feeding behaviors based on their distinctive waveform patterns. However, the traditional manual analysis of EPG waveform data is time-consuming and labor-intensive, limiting research throughput. This study presents a novel Machine Learning (ML) approach to automate the annotation of EPG signals. We rigorously evaluated six diverse ML models, including neural networks, tree-based models, and logistic regression, using an extensive dataset from multiple aphid feeding experiments. Our results demonstrate that a Residual Network (ResNet) architecture achieved the highest overall waveform classification accuracy of 96.8% and highest segmentation overlap rate of 84.4%, highlighting the potential of ML for accurate and efficient EPG analysis. This automated approach promises to accelerate research in this field significantly and broaden insights into insect-plant interactions, showcasing the power of computational techniques for insect biological research. The source code for all experiments conducted within this study is publicly available at https://github.com/HySonLab/ML4Insects.},
    number = {4},

}
@article{QUANGDINH2026101874,
title = {DiscoEPG: A Python package for characterization of insect electrical penetration graph (EPG) signals},
journal = {Smart Agricultural Technology},
volume = {13},
pages = {101874},
year = {2026},
issn = {2772-3755},
doi = {https://doi.org/10.1016/j.atech.2026.101874},
url = {https://www.sciencedirect.com/science/article/pii/S2772375526000985},
author = {Dung {Quang Dinh} and Daniel Kunk and Truong-Son Hy and Nalam Vamsi and Phuong D. Dao},
keywords = {Electrical penetration graph, Pierce-sucking insect, Automatic annotation, Machine learning, Open-source package},
abstract = {The Electrical Penetration Graph (EPG) technique is a widely recognized tool for monitoring and analyzing the feeding behavior of herbivorous insects with piercing-sucking mouthparts, such as aphids. Traditionally, EPG waveform annotation relies on a skilled practitioner using expert knowledge to compare target waveforms with the standard patterns of feeding behaviors. This process is labor-intensive, often requiring more than 30 minutes to annotate an 8-hour recording, depending on the complexity of the behaviors displayed. Machine learning (ML) has shown significant potential in automating behavioral analysis, including in EPG waveform annotation. However, most publicly available tools that provide automatic annotation suffer from low prediction accuracy due to their simple classification rules. To address this limitation, we developed DiscoEPG, an open-source Python package that offers highly accurate automatic annotation of aphid EPG data generated with EPG Systems hardware and the associated Stylet+ software. We rigorously evaluated multiple ML algorithms, demonstrating superior predictive accuracy compared to existing methods. In addition, DiscoEPG includes novel features to improve usability, such as tools to generate publication-quality visualizations, compute EPG variables, and perform statistical analysis. By streamlining the analysis of aphid EPG data, DiscoEPG aims to make this technique more accessible to researchers studying the feeding behavior of aphids. Our package source code and example interactive workbooks are publicly available at: https://github.com/HySonLab/ML4Insects.}
}

🧑‍🔬 Contributors

  • Quang-Dung DINH, Institut Galilée, Université Sorbonne Paris Nord, Villetaneuse 93430, Paris, France
  • Dr. Truong-Son HY (PI), Department of Computer Science, University of Alabama at Birmingham, Birmingham, AL 35294, United States
  • Dr. Phuong DAO (PI), Department of Agricultural Biology, Colorado State University, Fort Collins, CO 80523, United States

📖 References

1. Aphids' EPG waveforms. Tjallingii WF. Electronic Recording of Penetration Behaviour by Aphids. Entomologia Experimentalis et Applicata. 1978; 24(3): 721–730.

2. Package for effective EPG visualization. J. Van Der Donckt, J. Van der Donckt, E. Deprost and S. Van Hoecke, "Plotly-Resampler: Effective Visual Analytics for Large Time Series," 2022 IEEE Visualization and Visual Analytics (VIS), Oklahoma City, OK, USA, 2022, pp. 21-25.

3. The PyTorch implementation of the wavelet transform. Runia, T.F.H., Snoek, C.G.M. & Smeulders, A.W.M. Repetition Estimation. Int J Comput Vis 127, 1361–1383 (2019).

4. The EPG parameters which we adopt. Elisa Garzo, Antonio Jesús Álvarez, Aránzazu Moreno, Gregory P Walker, W Fred Tjallingii, Alberto Fereres, Novel program for automatic calculation of EPG variables, Journal of Insect Science, Volume 24, Issue 3, May 2024, 28.
