
DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning

Authors: Chengxuan Qian, Shuo Xing, Shawn Li, Yue Zhao, Zhengzhong Tu

DecAlign is a novel hierarchical cross-modal alignment framework that explicitly disentangles multimodal representations into modality-unique (heterogeneous) and modality-common (homogeneous) components, which not only facilitates fine-grained alignment through prototype-guided optimal transport but also enhances semantic consistency via latent distribution matching. Moreover, DecAlign effectively mitigates distributional discrepancies while preserving modality-specific characteristics, yielding consistent performance improvements across multiple multimodal benchmarks.

DecAlign framework diagram

Figure 1. The Framework of our proposed DecAlign approach.

Installation

Clone this repository:

git clone https://github.com/taco-group/DecAlign.git

Prepare the Python environment:

cd DecAlign
conda create --name decalign python=3.9 -y
conda activate decalign

Install all the required libraries:

pip install -r requirements.txt

Dataset Preparation

The preprocessing of the CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets follows MMSA. We provide the processed datasets through the following links:

CMU-MOSI: https://drive.google.com/drive/folders/1A6lpSk1ErSXhXHEJcNqFyOomSkP81Xw7?usp=drive_link

CMU-MOSEI: https://drive.google.com/drive/folders/1XZ4z94I-AlXNQfsWmW01_iROtjWmlmdh?usp=drive_link

After downloading, organize the data in the following structure:

data/
├── MOSI/
│   └── mosi_data.pkl
├── MOSEI/
│   └── mosei_data.pkl
└── IEMOCAP/
    └── iemocap_data.pkl
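Before training, it can be useful to sanity-check a downloaded `.pkl` file. The sketch below builds a tiny mock file in an MMSA-style layout and then inspects it the same way you would inspect `mosi_data.pkl`; the key names (`text`, `audio`, `vision`, `regression_labels`) and feature shapes are assumptions based on the MMSA convention, and the actual files may differ.

```python
import pickle
import numpy as np

# Build a tiny mock file in an MMSA-style layout (an assumption: the
# real key names and feature dimensions may differ between versions).
mock = {
    split: {
        "text": np.zeros((4, 50, 768), dtype=np.float32),   # token features
        "audio": np.zeros((4, 50, 5), dtype=np.float32),    # acoustic features
        "vision": np.zeros((4, 50, 20), dtype=np.float32),  # visual features
        "regression_labels": np.zeros((4,), dtype=np.float32),
    }
    for split in ("train", "valid", "test")
}
with open("mosi_data.pkl", "wb") as f:
    pickle.dump(mock, f)

# Inspect a downloaded .pkl the same way before launching training:
with open("mosi_data.pkl", "rb") as f:
    data = pickle.load(f)
for split, feats in data.items():
    print(split, {k: v.shape for k, v in feats.items()})
```

If the printed splits or feature keys do not match what the data loader expects, that mismatch is easier to catch here than mid-training.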

Training

Train DecAlign on CMU-MOSI dataset:

python main.py --dataset mosi --data_dir ./data --mode train --seeds 1111 --gpu_ids 0

Train on CMU-MOSEI dataset:

python main.py --dataset mosei --data_dir ./data --mode train --seeds 1111 --gpu_ids 0

Train on IEMOCAP dataset:

python main.py --dataset iemocap --data_dir ./data --mode train --seeds 1111 --gpu_ids 0

Command Line Arguments:

Argument           Description                            Default
--dataset          Dataset name (mosi, mosei, iemocap)    mosi
--data_dir         Path to data directory                 ./data
--mode             Run mode (train or test)               train
--seeds            Random seeds for reproducibility       1111
--gpu_ids          GPU device IDs to use                  0
--model_save_dir   Directory to save trained models       ./pt
--res_save_dir     Directory to save results              ./result
--log_dir          Directory to save logs                 ./log
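The table above can be mirrored with a minimal `argparse` sketch. This is not the parser from `main.py` (whose exact types and choices may differ); it only reproduces the flag names and defaults listed in the table, with `--seeds` and `--gpu_ids` assumed to accept multiple integer values.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the command-line arguments in the table above (a sketch;
    # the authoritative definitions live in main.py).
    p = argparse.ArgumentParser(description="DecAlign runner (sketch)")
    p.add_argument("--dataset", default="mosi",
                   choices=["mosi", "mosei", "iemocap"], help="Dataset name")
    p.add_argument("--data_dir", default="./data", help="Path to data directory")
    p.add_argument("--mode", default="train", choices=["train", "test"],
                   help="Run mode")
    p.add_argument("--seeds", default=[1111], type=int, nargs="+",
                   help="Random seeds for reproducibility")
    p.add_argument("--gpu_ids", default=[0], type=int, nargs="+",
                   help="GPU device IDs to use")
    p.add_argument("--model_save_dir", default="./pt",
                   help="Directory to save trained models")
    p.add_argument("--res_save_dir", default="./result",
                   help="Directory to save results")
    p.add_argument("--log_dir", default="./log", help="Directory to save logs")
    return p

args = build_parser().parse_args(["--dataset", "mosei", "--seeds", "1111", "2222"])
print(args.dataset, args.seeds, args.model_save_dir)
```

Passing several values to `--seeds` is a common way to run the same configuration under multiple seeds and report averaged results.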

Evaluation

Evaluate a trained model:

python main.py --dataset mosi --data_dir ./data --mode test --gpu_ids 0
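Evaluation reports the standard multimodal sentiment analysis metrics computed in `utils/metrices.py`. The sketch below shows three common ones for CMU-MOSI-style regression labels on the [-3, 3] scale: binary accuracy (conventionally excluding neutral samples), mean absolute error, and Pearson correlation. It is an illustration of the usual metric definitions, not the repository's implementation.

```python
import numpy as np

def mosi_style_metrics(preds: np.ndarray, labels: np.ndarray) -> dict:
    # Common MSA metrics on [-3, 3] sentiment scores (a sketch; the
    # repo's utils/metrices.py is the authoritative implementation).
    non_zero = labels != 0  # Acc-2 conventionally excludes neutral samples
    acc2 = np.mean((preds[non_zero] > 0) == (labels[non_zero] > 0))
    mae = np.mean(np.abs(preds - labels))
    corr = np.corrcoef(preds, labels)[0, 1]
    return {"Acc-2": float(acc2), "MAE": float(mae), "Corr": float(corr)}

preds = np.array([1.2, -0.5, 2.0, -1.8, 0.3])
labels = np.array([1.0, -1.0, 2.5, -2.0, 0.0])
print(mosi_style_metrics(preds, labels))
```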

Project Structure

DecAlign/
├── main.py                 # Entry point
├── config.py               # Configuration settings
├── data_loader.py          # Data loading utilities
├── config/
│   └── dec_config.json     # Model hyperparameters
├── models/
│   └── model.py            # DecAlign model architecture
├── trains/
│   ├── ATIO.py             # Training logic
│   └── subNets/            # Sub-network modules
│       ├── BertTextEncoder.py
│       └── transformer.py
├── utils/
│   ├── functions.py        # Utility functions
│   └── metrices.py         # Evaluation metrics
└── scripts/                # Training scripts
    ├── run_mosi.sh
    ├── run_mosei.sh
    └── run_iemocap.sh

Citation

If you find this work useful, please cite our paper:

@article{qian2025decalign,
  title={DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning},
  author={Qian, Chengxuan and Xing, Shuo and Li, Shawn and Zhao, Yue and Tu, Zhengzhong},
  journal={arXiv preprint arXiv:2503.11892},
  year={2025}
}

Acknowledgement

This codebase is built upon MMSA. We thank the authors for their excellent work.
