Authors: Chengxuan Qian, Shuo Xing, Shawn Li, Yue Zhao, Zhengzhong Tu
DecAlign is a novel hierarchical cross-modal alignment framework that explicitly disentangles multimodal representations into modality-unique (heterogeneous) and modality-common (homogeneous) components, which not only facilitates fine-grained alignment through prototype-guided optimal transport but also enhances semantic consistency via latent distribution matching. Moreover, DecAlign effectively mitigates distributional discrepancies while preserving modality-specific characteristics, yielding consistent performance improvements across multiple multimodal benchmarks.
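To give a feel for the optimal-transport component, here is a generic entropy-regularized (Sinkhorn) solver applied to two small sets of "prototypes." This is an illustrative sketch only: the function name, the squared-Euclidean cost, the prototype shapes, and all hyperparameters are assumptions for the toy example, not the implementation used in this repository.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.5, n_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    cost: (m, n) cost matrix; a, b: marginal weights summing to 1.
    Returns an (m, n) transport plan whose rows sum to a and columns to b.
    """
    K = np.exp(-cost / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # rescale columns toward b
        u = a / (K @ v)                  # rescale rows toward a
    return u[:, None] * K * v[None, :]

# Toy example: align 4 "text" prototypes with 5 "audio" prototypes
# (names, sizes, and the cost choice are made up for illustration).
rng = np.random.default_rng(0)
text_protos = rng.normal(size=(4, 8))
audio_protos = rng.normal(size=(5, 8))
cost = ((text_protos[:, None, :] - audio_protos[None, :, :]) ** 2).sum(-1)
cost = cost / cost.max()                 # normalize for numerical stability
plan = sinkhorn(cost, np.full(4, 0.25), np.full(5, 0.2))
```

The resulting plan is a soft matching between the two prototype sets; libraries such as POT provide production-grade versions of the same iteration.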
Clone this repository:

```shell
git clone https://github.com/taco-group/DecAlign.git
```

Prepare the Python environment:

```shell
cd DecAlign
conda create --name decalign python=3.9 -y
conda activate decalign
```

Install the required libraries:

```shell
pip install -r requirements.txt
```
Preprocessing of the CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets follows MMSA. We provide the processed datasets through the following links:
CMU-MOSI: https://drive.google.com/drive/folders/1A6lpSk1ErSXhXHEJcNqFyOomSkP81Xw7?usp=drive_link
CMU-MOSEI: https://drive.google.com/drive/folders/1XZ4z94I-AlXNQfsWmW01_iROtjWmlmdh?usp=drive_link
After downloading, organize the data in the following structure:

```
data/
├── MOSI/
│   └── mosi_data.pkl
├── MOSEI/
│   └── mosei_data.pkl
└── IEMOCAP/
    └── iemocap_data.pkl
```
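Once the files are downloaded, a quick sanity check of the layout can save a failed run. The helper below is not part of the repository; the folder and file names it expects are taken from the tree above.

```python
from pathlib import Path

# Expected layout, taken from the directory tree above.
EXPECTED = {
    "MOSI": "mosi_data.pkl",
    "MOSEI": "mosei_data.pkl",
    "IEMOCAP": "iemocap_data.pkl",
}

def check_data_layout(root="./data"):
    """Return {relative_path: exists} for each expected dataset pickle."""
    root = Path(root)
    return {f"{name}/{fname}": (root / name / fname).is_file()
            for name, fname in EXPECTED.items()}

for path, ok in check_data_layout().items():
    print(f"{'OK  ' if ok else 'MISS'} data/{path}")
```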
Train DecAlign on the CMU-MOSI dataset:

```shell
python main.py --dataset mosi --data_dir ./data --mode train --seeds 1111 --gpu_ids 0
```

Train on the CMU-MOSEI dataset:

```shell
python main.py --dataset mosei --data_dir ./data --mode train --seeds 1111 --gpu_ids 0
```

Train on the IEMOCAP dataset:

```shell
python main.py --dataset iemocap --data_dir ./data --mode train --seeds 1111 --gpu_ids 0
```

Command-line arguments:
| Argument | Description | Default |
|---|---|---|
| `--dataset` | Dataset name (`mosi`, `mosei`, `iemocap`) | `mosi` |
| `--data_dir` | Path to the data directory | `./data` |
| `--mode` | Run mode (`train` or `test`) | `train` |
| `--seeds` | Random seeds for reproducibility | `1111` |
| `--gpu_ids` | GPU device IDs to use | `0` |
| `--model_save_dir` | Directory to save trained models | `./pt` |
| `--res_save_dir` | Directory to save results | `./result` |
| `--log_dir` | Directory to save logs | `./log` |
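For reference, the arguments in the table map onto an `argparse` parser roughly like the sketch below. This is an illustrative reconstruction, not the actual parser in `main.py`; in particular, the exact types (e.g. whether `--seeds` and `--gpu_ids` accept multiple values) are assumptions.

```python
import argparse

def build_parser():
    # Hypothetical sketch mirroring the documented CLI; the real
    # definitions live in main.py / config.py of the repository.
    p = argparse.ArgumentParser(description="DecAlign")
    p.add_argument("--dataset", default="mosi",
                   choices=["mosi", "mosei", "iemocap"])
    p.add_argument("--data_dir", default="./data")
    p.add_argument("--mode", default="train", choices=["train", "test"])
    # Assumption: one or more integer seeds may be passed.
    p.add_argument("--seeds", nargs="+", type=int, default=[1111])
    # Assumption: GPU IDs are passed as a string such as "0" or "0,1".
    p.add_argument("--gpu_ids", default="0")
    p.add_argument("--model_save_dir", default="./pt")
    p.add_argument("--res_save_dir", default="./result")
    p.add_argument("--log_dir", default="./log")
    return p

args = build_parser().parse_args(["--dataset", "mosei", "--mode", "test"])
```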
Evaluate a trained model:

```shell
python main.py --dataset mosi --data_dir ./data --mode test --gpu_ids 0
```

Repository structure:

```
DecAlign/
├── main.py                  # Entry point
├── config.py                # Configuration settings
├── data_loader.py           # Data loading utilities
├── config/
│   └── dec_config.json      # Model hyperparameters
├── models/
│   └── model.py             # DecAlign model architecture
├── trains/
│   ├── ATIO.py              # Training logic
│   └── subNets/             # Sub-network modules
│       ├── BertTextEncoder.py
│       └── transformer.py
├── utils/
│   ├── functions.py         # Utility functions
│   └── metrices.py          # Evaluation metrics
└── scripts/                 # Training scripts
    ├── run_mosi.sh
    ├── run_mosei.sh
    └── run_iemocap.sh
```
If you find this work useful, please cite our paper:

```bibtex
@article{qian2025decalign,
  title={DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning},
  author={Qian, Chengxuan and Xing, Shuo and Li, Shawn and Zhao, Yue and Tu, Zhengzhong},
  journal={arXiv preprint arXiv:2503.11892},
  year={2025}
}
```

This codebase is built upon MMSA. We thank the authors for their excellent work.
