UNITES-Lab/Spatial2Sentence

Spatial2Sentence

Spatial Coordinates as a Cell Language: A Multi-Sentence Framework for Imaging Mass Cytometry Analysis

Chi-Jane Chen*, Yuhang Chen*, Sukwon Yun*, Natalie Stanley, Tianlong Chen

The University of North Carolina at Chapel Hill

*Equal contribution

Imaging mass cytometry (IMC) enables high-dimensional spatial profiling by combining mass cytometry's analytical power with spatial distributions of cell phenotypes. Recent studies leverage large language models (LLMs) to extract cell states by translating gene or protein expression into biological context. However, existing single-cell LLMs face two major challenges: (1) integrating spatial information: they struggle to generalize spatial coordinates and to encode spatial context effectively as text, and (2) treating each cell independently: they overlook cell-cell interactions, which limits their ability to capture biological relationships. To address these limitations, we propose Spatial2Sentence, a novel framework that integrates single-cell expression and spatial information into natural language using a multi-sentence approach. Spatial2Sentence constructs expression similarity and distance matrices, pairing cells that are spatially adjacent and similar in expression as positive pairs while using distant and dissimilar cells as negatives. These multi-sentence representations enable LLMs to learn cellular interactions in both expression and spatial contexts. Equipped with multi-task learning, Spatial2Sentence outperforms existing single-cell LLMs on preprocessed IMC datasets, improving cell-type classification by 5.98% and clinical status prediction by 4.18% on the diabetes dataset while enhancing interpretability.
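
The pairing step described above can be illustrated with a minimal sketch. This is not the repository's implementation (that lives in src/cell2sentence/prompt_formatter.py). The function names, the use of cosine similarity and Euclidean distance, and both thresholds are assumptions chosen for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two expression vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_pairs(expr, coords, sim_thresh=0.9, dist_thresh=10.0):
    """Return (positives, negatives) as lists of index pairs (i, j) with i < j.

    Positive: similar in expression AND spatially close.
    Negative: dissimilar in expression AND spatially distant.
    Both thresholds are hypothetical values, not taken from the paper.
    """
    positives, negatives = [], []
    n = len(expr)
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine_similarity(expr[i], expr[j])
            dist = math.dist(coords[i], coords[j])  # Euclidean, Python 3.8+
            if sim >= sim_thresh and dist <= dist_thresh:
                positives.append((i, j))
            elif sim < sim_thresh and dist > dist_thresh:
                negatives.append((i, j))
    return positives, negatives


# Toy data: cells 0 and 1 are close and similar; cell 2 is far and different.
expr = [[5.0, 0.1, 0.2], [4.8, 0.2, 0.1], [0.1, 6.0, 0.3]]
coords = [(0.0, 0.0), (3.0, 4.0), (100.0, 100.0)]
positives, negatives = select_pairs(expr, coords)
print("positives:", positives)
print("negatives:", negatives)
```

Pairs that satisfy only one of the two criteria (for example, close but dissimilar) fall into neither list in this sketch; how the actual framework treats such pairs is not specified here.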

Repository Layout

  • ours/: preprocessing, training, and inference scripts for the paper
  • src/cell2sentence/: core library code (data conversion, prompt formatting, model wrapper)
  • src/cell2sentence/prompts/: prompt templates for cell-type, status, and multi-task settings
  • data/: released datasets (CSV adjacency and processed h5ad)
  • docs/, tutorials/: legacy documentation/examples from the base code

Data

We keep the original adjacency CSVs and regenerate processed h5ad files via the preprocessing scripts:

  • Diabetes IMC CSVs: data/diabete_csv_adjacency_v2/train, data/diabete_csv_adjacency_v2/test
  • Brain IMC CSVs: data/brain_csv_adjacency_v2/train, data/brain_csv_adjacency_v2/test
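
Before preprocessing, it can help to confirm that the four directories above are present and populated. The following is a small sketch, assuming the CSVs sit directly inside each train/test directory with a .csv extension; the helper name count_csvs and the dictionary keys are illustrative and not part of the repository.

```python
from pathlib import Path

# Directory names are taken from the list above; the keys are just labels.
DATASETS = {
    "diabetes": "data/diabete_csv_adjacency_v2",
    "brain": "data/brain_csv_adjacency_v2",
}


def count_csvs(repo_root="."):
    """Count *.csv files per (dataset, split); a missing directory counts as 0."""
    counts = {}
    for name, rel in DATASETS.items():
        for split in ("train", "test"):
            split_dir = Path(repo_root) / rel / split
            if split_dir.is_dir():
                counts[(name, split)] = len(list(split_dir.glob("*.csv")))
            else:
                counts[(name, split)] = 0
    return counts


if __name__ == "__main__":
    for (name, split), n in count_csvs().items():
        print(f"{name}/{split}: {n} CSV file(s)")
```

Run it from the repository root; a count of 0 suggests the data directory was not downloaded or is laid out differently than assumed.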

Installation

Create a Python environment (3.8+ recommended), then install the package and its dependencies in editable mode from the repository root:

pip install -e .

Preprocessing

To regenerate the processed h5ad files from the CSV adjacency files, run:

python ours/diabete_pre.py
python ours/brain_pre.py

Training

Fine-tune a model with multi-sentence prompts. Required arguments are --task_name, --model_name, --method, --bs, and --dataset.

Example:

python ours/finetune.py \
  --task_name both_pred \
  --model_name <hf_model_or_local_path> \
  --method s2s \
  --bs 4 \
  --dataset diabetes \
  --model_from pretrained

--method options:

  • c2s: single-sentence baseline
  • s2swos: Spatial2Sentence w/o spatial pairing
  • s2s: Spatial2Sentence (spatial+expression pairing)
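
To compare the three methods under otherwise identical settings, the training command can be generated once per --method value. The sketch below only builds and prints the command lines using the flags shown in the example above; the helper finetune_command and the placeholder model path are hypothetical, and nothing is executed.

```python
import shlex

# The three --method values listed above.
METHODS = ("c2s", "s2swos", "s2s")


def finetune_command(method, model_name, dataset="diabetes",
                     task_name="both_pred", bs=4):
    """Build the argument list for ours/finetune.py (flags from the example above)."""
    if method not in METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {METHODS}")
    return [
        "python", "ours/finetune.py",
        "--task_name", task_name,
        "--model_name", model_name,
        "--method", method,
        "--bs", str(bs),
        "--dataset", dataset,
        "--model_from", "pretrained",
    ]


# "path/to/model" is a placeholder for a Hugging Face model name or local path.
for method in METHODS:
    print(shlex.join(finetune_command(method, "path/to/model")))
```

Each returned list can be passed to subprocess.run(cmd, check=True) to launch the runs one after another.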

Inference

Run inference with a fine-tuned checkpoint:

python ours/inference.py \
  --task_name both_pred \
  --dataset diabetes \
  --method s2s \
  --model_path <path_to_finetuned_model>

The script prints cell-type and status accuracy and writes predictions to predictions.txt.
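
For reference, the two reported metrics can be thought of as per-task accuracies over the same set of cells. The sketch below is an illustration, not the script's code: the format of predictions.txt is not documented here, so the function works on in-memory (cell_type, status) tuples, and the labels are placeholders.

```python
def task_accuracies(predictions, labels):
    """Per-task accuracy for multi-task outputs.

    predictions, labels: equal-length lists of (cell_type, status) tuples.
    Returns a dict with the fraction of exact matches for each task.
    """
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    n = len(labels)
    type_hits = sum(p[0] == t[0] for p, t in zip(predictions, labels))
    status_hits = sum(p[1] == t[1] for p, t in zip(predictions, labels))
    return {"cell_type": type_hits / n, "status": status_hits / n}


# Placeholder labels for four cells; not taken from any released dataset.
labels = [("type_A", "status_X"), ("type_B", "status_X"),
          ("type_A", "status_Y"), ("type_C", "status_Y")]
predictions = [("type_A", "status_X"), ("type_B", "status_Y"),
               ("type_A", "status_Y"), ("type_A", "status_X")]
scores = task_accuracies(predictions, labels)
print(scores)
```

In this toy case three of four cell types and two of four statuses match, so the two accuracies differ even though they are computed over the same cells.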

Notes

  • All prompts live in src/cell2sentence/prompts/.
  • The multi-sentence spatial pairing is implemented in src/cell2sentence/prompt_formatter.py.

License

See LICENSE.

Acknowledgements

This project builds on the Cell2Sentence codebase (https://github.com/vandijklab/cell2sentence). We thank the authors for releasing their work and open-source tools that enabled this research.
