This is an implementation of a transfer learning approach based on unsupervised paraphrasing for Indic languages, which uses a T5 transformer-based architecture.
- `model.py` contains the pretrained model and tokenizer code (see the sketch after this list).
- The `data` folder contains all the data-preprocessing code.
- `main.py` contains the code for training the model on task adaptation.
- `ssmain.py` contains the code for training the self-supervised model.
- `generate_data.py` contains the code for generating pseudo labels.
- `eval.py` contains the code for generating evaluation metrics.
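As a rough illustration (not the repository's exact code), the sketch below shows how a pretrained T5-style model and tokenizer can be loaded with Hugging Face `transformers`, which is the usual way to set up such a model. The checkpoint name `google/mt5-small`, the `paraphrase:` prefix, and the generation settings are assumptions made for this example only.

```python
# Minimal sketch of the kind of setup model.py provides: loading a pretrained
# T5-style model and tokenizer with Hugging Face transformers.
# NOTE: the checkpoint name, task prefix, and generation settings are
# assumptions for illustration, not the repository's actual configuration.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "google/mt5-small"  # assumed multilingual checkpoint covering Indic languages

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# Tokenize an input sentence and generate a paraphrase candidate.
# Meaningful paraphrases would only appear after the training steps described above.
inputs = tokenizer("paraphrase: यह एक उदाहरण वाक्य है।", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```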
All files are run plainly with python, without any arguments.
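For example, assuming a typical pipeline order (self-supervised training, pseudo-label generation, task-adaptation training, then evaluation), the scripts might be invoked as follows; the actual order depends on the pipeline:

```
python ssmain.py         # train the self-supervised model
python generate_data.py  # generate pseudo labels
python main.py           # task-adaptation training
python eval.py           # compute evaluation metrics
```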