Efficient multilingual machine translation

Towards high quality multilingual NMT in production. Blog article by Alexandre Berard, Laurent Besacier and Vassilina Nikoulina.
EMNLP 2021

This page contains resources related to our EMNLP and WMT 2021 publications about multilingual machine translation. We release model checkpoints, fairseq modules to decode from those models, the test splits we used in the papers, and translation outputs by our models.

Read this blog article for an overview of our work on multilingual machine translation.

Download our fairseq modules
The archive contains a README giving installation and usage instructions. Note that most of our checkpoints cannot be used without these modules.
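As a rough illustration of how the modules and checkpoints fit together, a translation call through fairseq's Python hub interface might look like the sketch below. The directory layout, file names, and the use of a user directory are assumptions for illustration only; the README in the modules archive gives the actual installation and usage instructions.

# Minimal sketch, assuming a checkpoint laid out for fairseq's standard
# hub interface; all paths and file names below are placeholders.
import argparse
from fairseq import utils
from fairseq.models.transformer import TransformerModel

# Make fairseq aware of the custom modules (the Python equivalent of the
# --user-dir CLI flag); without them most of the checkpoints cannot be loaded.
utils.import_user_module(argparse.Namespace(user_dir="path/to/fairseq_modules"))

# Load a downloaded checkpoint together with the dictionaries shipped with it.
# Depending on the model, extra arguments (e.g. the subword tokenizer) may be
# required, as described in the README.
model = TransformerModel.from_pretrained(
    "path/to/checkpoint_dir",     # directory containing the checkpoint
    checkpoint_file="model.pt",   # placeholder file name
    data_name_or_path=".",        # dictionaries stored alongside the checkpoint
)

print(model.translate("Hello, world!"))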

Efficient Inference for MNMT

Checkpoints: ParaCrawl models and TED Talks models

Test sets: TED2020 splits

Translation outputs: FLORES

Citation:

@inproceedings{berard2021_efficient,
    title = {Efficient Inference for Multilingual Neural Machine Translation},
    author = {B\'erard, Alexandre and Lee, Dain and Clinchant, St\'ephane and Jung, Kweonwoo and Nikoulina, Vassilina},
    booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    month = nov,
    year = {2021},
    address = {Punta Cana, Dominican Republic},
    publisher = {Association for Computational Linguistics},
    url = {https://arxiv.org/abs/2109.06679}
}

Continual Learning in MNMT via Language-Specific Embeddings

Checkpoints: ParaCrawl models and TED Talks models

Test sets: TED2020 splits

Translation outputs: FLORES

Citation:

@inproceedings{berard2021_continual,
    title = {Continual Learning in Multilingual NMT via Language-Specific Embeddings},
    author = {B\'erard, Alexandre},
    booktitle = {Proceedings of the Sixth Conference on Machine Translation (WMT)},
    month = nov,
    year = {2021},
    publisher = {Association for Computational Linguistics},
    url = {https://arxiv.org/abs/2110.10478}
}

Multilingual Unsupervised Neural Machine Translation with Denoising Adapters

Checkpoints: mBART full fine-tuning, task adapters and denoising adapters

Citation:

@inproceedings{ustun2021_denoising,
    title = {Multilingual Unsupervised Neural Machine Translation with Denoising Adapters},
    author = {{\"U}st{\"u}n, Ahmet and B\'erard, Alexandre and Besacier, Laurent and Gall\'e, Matthias},
    booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    month = nov,
    year = {2021},
    address = {Punta Cana, Dominican Republic},
    publisher = {Association for Computational Linguistics},
    url = {https://arxiv.org/abs/2110.10472}
}

Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters

Test sets: Koran, Medical and IT splits 

Translation outputs: Koran 

Citation:

@inproceedings{stickland2021_multilingual,
    title = {Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters},
    author = {Cooper Stickland, Asa and B\'erard, Alexandre and Nikoulina, Vassilina},
    booktitle = {Proceedings of the Sixth Conference on Machine Translation (WMT)},
    month = nov,
    year = {2021},
    publisher = {Association for Computational Linguistics},
    url = {https://arxiv.org/abs/2110.09574}
}
