Spatial Coordinates as a Cell Language: A Multi-Sentence Framework for Imaging Mass Cytometry Analysis
Chi-Jane Chen*, Yuhang Chen*, Sukwon Yun*, Natalie Stanley, Tianlong Chen
The University of North Carolina at Chapel Hill
*Equal contribution
Imaging mass cytometry (IMC) enables high-dimensional spatial profiling by combining mass cytometry's analytical power with spatial distributions of cell phenotypes. Recent studies leverage large language models (LLMs) to extract cell states by translating gene or protein expression into biological context. However, existing single-cell LLMs face two major challenges: (1) Integration of spatial information: they struggle to generalize spatial coordinates and effectively encode spatial context as text, and (2) Treating each cell independently: they overlook cell-cell interactions, limiting their ability to capture biological relationships. To address these limitations, we propose Spatial2Sentence, a novel framework that integrates single-cell expression and spatial information into natural language using a multi-sentence approach. Spatial2Sentence constructs expression similarity and distance matrices, pairing spatially adjacent and expressionally similar cells as positive pairs while using distant and dissimilar cells as negatives. These multi-sentence representations enable LLMs to learn cellular interactions in both expression and spatial contexts. Equipped with multi-task learning, Spatial2Sentence outperforms existing single-cell LLMs on preprocessed IMC datasets, improving cell-type classification by 5.98% and clinical status prediction by 4.18% on the diabetes dataset while enhancing interpretability.
Repository layout:

- ours/: preprocessing, training, and inference scripts for the paper
- src/cell2sentence/: core library code (data conversion, prompt formatting, model wrapper)
- src/cell2sentence/prompts/: prompt templates for cell-type, status, and multi-task settings
- data/: released datasets (CSV adjacency and processed h5ad)
- docs/, tutorials/: legacy documentation/examples from the base code
We keep the original adjacency CSVs and regenerate processed h5ad files via the preprocessing scripts:
- Diabetes IMC CSVs: data/diabete_csv_adjacency_v2/train, data/diabete_csv_adjacency_v2/test
- Brain IMC CSVs: data/brain_csv_adjacency_v2/train, data/brain_csv_adjacency_v2/test
Create a Python environment (3.8+ recommended), then install dependencies:
pip install -e .

Convert CSV adjacency files into h5ad (if you want to regenerate):
python ours/diabete_pre.py
python ours/brain_pre.py
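For orientation, the conversion these scripts perform can be sketched roughly as below. This is an illustrative sketch only, not the actual logic in ours/diabete_pre.py; the column names (x, y, cell_type) and single-directory layout are assumptions, so check the released CSVs for the real schema.

```python
# Illustrative sketch only; see ours/diabete_pre.py / ours/brain_pre.py for the real logic.
# Column names (x, y, cell_type) are assumptions, not the actual CSV schema.
import glob

import anndata as ad
import numpy as np
import pandas as pd

csv_paths = sorted(glob.glob("data/diabete_csv_adjacency_v2/train/*.csv"))
cells = pd.concat((pd.read_csv(p) for p in csv_paths), ignore_index=True)

# Treat every non-coordinate, non-label column as a protein marker channel.
marker_cols = [c for c in cells.columns if c not in ("x", "y", "cell_type")]

adata = ad.AnnData(X=cells[marker_cols].to_numpy(dtype=np.float32))
adata.obs["cell_type"] = cells["cell_type"].astype(str).to_numpy()
adata.obsm["spatial"] = cells[["x", "y"]].to_numpy()  # per-cell spatial coordinates
adata.write_h5ad("data/diabetes_train.h5ad")
```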
Fine-tune a model with multi-sentence prompts. Required arguments are --task_name, --model_name, --method, --bs, and --dataset. Example:
python ours/finetune.py \
--task_name both_pred \
--model_name <hf_model_or_local_path> \
--method s2s \
--bs 4 \
--dataset diabetes \
--model_from pretrained

--method options (an illustrative prompt sketch follows the list):
- c2s: single-sentence baseline
- s2swos: Spatial2Sentence w/o spatial pairing
- s2s: Spatial2Sentence (spatial + expression pairing)
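To make the c2s/s2s distinction concrete, here is a hypothetical sketch of how a single-sentence prompt differs from a multi-sentence one. The wording and marker values are invented for illustration and do not reproduce the actual templates in src/cell2sentence/prompts/.

```python
# Hypothetical illustration of the single- vs multi-sentence prompt idea; the real
# templates live in src/cell2sentence/prompts/ and differ in wording and structure.
def cell_sentence(markers):
    """Rank markers by expression (highest first) and join them into one 'cell sentence'."""
    return " ".join(sorted(markers, key=markers.get, reverse=True))

target = {"CD45": 9.1, "CD3": 7.4, "INS": 0.2}      # invented expression values
neighbor = {"INS": 8.8, "PDX1": 6.0, "CD45": 0.3}   # a spatially adjacent cell

# c2s-style: one sentence, the target cell in isolation.
single_sentence = f"Cell: {cell_sentence(target)}. What is its cell type?"

# s2s-style: multiple sentences pairing the target with a spatial/expression partner.
multi_sentence = (
    f"Cell A: {cell_sentence(target)}. "
    f"Cell B (spatially adjacent to Cell A): {cell_sentence(neighbor)}. "
    "Given both cells, what is the cell type of Cell A?"
)
```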
Run inference with a fine-tuned checkpoint:
python ours/inference.py \
--task_name both_pred \
--dataset diabetes \
--method s2s \
--model_path <path_to_finetuned_model>

The script prints cell-type and status accuracy and writes predictions to predictions.txt.
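If you want to post-process the output, a minimal sketch is below. It assumes predictions.txt holds one tab-separated predicted/true pair per line, which may not match the actual format written by ours/inference.py, so inspect the file first.

```python
# Assumes one "predicted\ttrue" pair per line in predictions.txt; the real format
# written by ours/inference.py may differ -- check the file before relying on this.
correct = total = 0
with open("predictions.txt") as fh:
    for line in fh:
        pred, true = line.rstrip("\n").split("\t")
        correct += pred == true
        total += 1
print(f"accuracy: {correct / total:.4f}")
```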
- All prompts live in src/cell2sentence/prompts/.
- The multi-sentence spatial pairing is implemented in src/cell2sentence/prompt_formatter.py (a conceptual sketch follows below).
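Conceptually, the pairing selects positive partners that are both spatially close and expression-similar, and negatives that are distant and dissimilar, as described in the paper. The sketch below illustrates that idea only; it is not the code in prompt_formatter.py, and the use of cosine similarity and the thresholds are assumptions.

```python
# Conceptual sketch of the positive/negative pairing idea; not the implementation in
# src/cell2sentence/prompt_formatter.py. Cosine similarity and thresholds are assumptions.
import numpy as np
from scipy.spatial.distance import cdist

def pair_cells(expr, coords, sim_thresh=0.8, near_thresh=30.0, far_thresh=100.0):
    """Return (positive, negative) partner indices per cell; -1 where none qualifies."""
    # Expression similarity matrix (cosine) and pairwise spatial distance matrix.
    normed = expr / (np.linalg.norm(expr, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T
    dist = cdist(coords, coords)
    np.fill_diagonal(sim, -np.inf)  # never pair a cell with itself

    positives, negatives = [], []
    for i in range(expr.shape[0]):
        pos = np.where((sim[i] >= sim_thresh) & (dist[i] <= near_thresh))[0]
        neg = np.where((sim[i] <= 1 - sim_thresh) & (dist[i] >= far_thresh))[0]
        positives.append(pos[np.argmax(sim[i][pos])] if pos.size else -1)
        negatives.append(neg[np.argmin(sim[i][neg])] if neg.size else -1)
    return np.array(positives), np.array(negatives)
```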
See LICENSE.
This project builds on the Cell2Sentence codebase (https://github.com/vandijklab/cell2sentence). We thank the authors for releasing their work and open-source tools that enabled this research.