
DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning

Authors: Chengxuan Qian, Shuo Xing, Shawn Li, Yue Zhao, Zhengzhong Tu

DecAlign is a novel hierarchical cross-modal alignment framework that explicitly disentangles multimodal representations into modality-unique (heterogeneous) and modality-common (homogeneous) components, which not only facilitates fine-grained alignment through prototype-guided optimal transport but also enhances semantic consistency via latent distribution matching. Moreover, DecAlign effectively mitigates distributional discrepancies while preserving modality-specific characteristics, yielding consistent performance improvements across multiple multimodal benchmarks.

DecAlign framework diagram

Figure 1. The Framework of our proposed DecAlign approach.

Installation

Clone this repository:

git clone https://github.com/taco-group/DecAlign.git

Prepare the Python environment:

cd DecAlign
conda create --name decalign python=3.9 -y
conda activate decalign

Install all the required libraries:

pip install -r requirements.txt

Dataset Preparation

The preprocessing of the CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets follows MMSA. We provide the processed datasets through the following links:

CMU-MOSI: https://drive.google.com/drive/folders/1A6lpSk1ErSXhXHEJcNqFyOomSkP81Xw7?usp=drive_link

CMU-MOSEI: https://drive.google.com/drive/folders/1XZ4z94I-AlXNQfsWmW01_iROtjWmlmdh?usp=drive_link

After downloading, organize the data in the following structure:

data/
├── MOSI/
│   └── mosi_data.pkl
├── MOSEI/
│   └── mosei_data.pkl
└── IEMOCAP/
    └── iemocap_data.pkl
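Before training, it can be useful to sanity-check a downloaded `.pkl` file. The sketch below builds a tiny mock file in an MMSA-style layout and then inspects it the same way you would inspect `mosi_data.pkl`; the key names (`text`, `audio`, `vision`, `regression_labels`) and feature shapes are assumptions based on the MMSA convention, and the actual files may differ.

```python
import pickle
import numpy as np

# Build a tiny mock file in an MMSA-style layout (an assumption: the
# real key names and feature dimensions may differ between versions).
mock = {
    split: {
        "text": np.zeros((4, 50, 768), dtype=np.float32),   # token features
        "audio": np.zeros((4, 50, 5), dtype=np.float32),    # acoustic features
        "vision": np.zeros((4, 50, 20), dtype=np.float32),  # visual features
        "regression_labels": np.zeros((4,), dtype=np.float32),
    }
    for split in ("train", "valid", "test")
}
with open("mosi_data.pkl", "wb") as f:
    pickle.dump(mock, f)

# Inspect a downloaded .pkl the same way before launching training:
with open("mosi_data.pkl", "rb") as f:
    data = pickle.load(f)
for split, feats in data.items():
    print(split, {k: v.shape for k, v in feats.items()})
```

If the printed splits or feature keys do not match what the data loader expects, that mismatch is easier to catch here than mid-training.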

Training

Train DecAlign on CMU-MOSI dataset:

python main.py --dataset mosi --data_dir ./data --mode train --seeds 1111 --gpu_ids 0

Train on CMU-MOSEI dataset:

python main.py --dataset mosei --data_dir ./data --mode train --seeds 1111 --gpu_ids 0

Train on IEMOCAP dataset:

python main.py --dataset iemocap --data_dir ./data --mode train --seeds 1111 --gpu_ids 0

Command Line Arguments:

Argument           Description                            Default
--dataset          Dataset name (mosi, mosei, iemocap)    mosi
--data_dir         Path to data directory                 ./data
--mode             Run mode (train or test)               train
--seeds            Random seeds for reproducibility       1111
--gpu_ids          GPU device IDs to use                  0
--model_save_dir   Directory to save trained models       ./pt
--res_save_dir     Directory to save results              ./result
--log_dir          Directory to save logs                 ./log
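The table above can be mirrored with a minimal `argparse` sketch. This is not the parser from `main.py` (whose exact types and choices may differ); it only reproduces the flag names and defaults listed in the table, with `--seeds` and `--gpu_ids` assumed to accept multiple integer values.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the command-line arguments in the table above (a sketch;
    # the authoritative definitions live in main.py).
    p = argparse.ArgumentParser(description="DecAlign runner (sketch)")
    p.add_argument("--dataset", default="mosi",
                   choices=["mosi", "mosei", "iemocap"], help="Dataset name")
    p.add_argument("--data_dir", default="./data", help="Path to data directory")
    p.add_argument("--mode", default="train", choices=["train", "test"],
                   help="Run mode")
    p.add_argument("--seeds", default=[1111], type=int, nargs="+",
                   help="Random seeds for reproducibility")
    p.add_argument("--gpu_ids", default=[0], type=int, nargs="+",
                   help="GPU device IDs to use")
    p.add_argument("--model_save_dir", default="./pt",
                   help="Directory to save trained models")
    p.add_argument("--res_save_dir", default="./result",
                   help="Directory to save results")
    p.add_argument("--log_dir", default="./log", help="Directory to save logs")
    return p

args = build_parser().parse_args(["--dataset", "mosei", "--seeds", "1111", "2222"])
print(args.dataset, args.seeds, args.model_save_dir)
```

Passing several values to `--seeds` is a common way to run the same configuration under multiple seeds and report averaged results.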

Evaluation

Evaluate a trained model:

python main.py --dataset mosi --data_dir ./data --mode test --gpu_ids 0
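Evaluation reports the standard multimodal sentiment analysis metrics computed in `utils/metrices.py`. The sketch below shows three common ones for CMU-MOSI-style regression labels on the [-3, 3] scale: binary accuracy (conventionally excluding neutral samples), mean absolute error, and Pearson correlation. It is an illustration of the usual metric definitions, not the repository's implementation.

```python
import numpy as np

def mosi_style_metrics(preds: np.ndarray, labels: np.ndarray) -> dict:
    # Common MSA metrics on [-3, 3] sentiment scores (a sketch; the
    # repo's utils/metrices.py is the authoritative implementation).
    non_zero = labels != 0  # Acc-2 conventionally excludes neutral samples
    acc2 = np.mean((preds[non_zero] > 0) == (labels[non_zero] > 0))
    mae = np.mean(np.abs(preds - labels))
    corr = np.corrcoef(preds, labels)[0, 1]
    return {"Acc-2": float(acc2), "MAE": float(mae), "Corr": float(corr)}

preds = np.array([1.2, -0.5, 2.0, -1.8, 0.3])
labels = np.array([1.0, -1.0, 2.5, -2.0, 0.0])
print(mosi_style_metrics(preds, labels))
```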

Project Structure

DecAlign/
├── main.py                 # Entry point
├── config.py               # Configuration settings
├── data_loader.py          # Data loading utilities
├── config/
│   └── dec_config.json     # Model hyperparameters
├── models/
│   └── model.py            # DecAlign model architecture
├── trains/
│   ├── ATIO.py             # Training logic
│   └── subNets/            # Sub-network modules
│       ├── BertTextEncoder.py
│       └── transformer.py
├── utils/
│   ├── functions.py        # Utility functions
│   └── metrices.py         # Evaluation metrics
└── scripts/                # Training scripts
    ├── run_mosi.sh
    ├── run_mosei.sh
    └── run_iemocap.sh

Citation

If you find this work useful, please cite our paper:

@article{qian2025decalign,
  title={DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning},
  author={Qian, Chengxuan and Xing, Shuo and Li, Shawn and Zhao, Yue and Tu, Zhengzhong},
  journal={arXiv preprint arXiv:2503.11892},
  year={2025}
}

Acknowledgement

This codebase is built upon MMSA. We thank the authors for their excellent work.
