A tool for extracting parallel sentences from comparable corpora.

What is this for?

Statistical Machine Translation relies on parallel corpora for training translation models. However, these corpora are limited and take time to create. Yalign is designed to automate this process by finding sentences in comparable corpora that are close translation matches. This opens up avenues for harvesting parallel corpora from sources like translated documents and the web.

Read more about how to use Yalign and how it works in the documentation.


This is an active open source project started by the AI and NLP team of Machinalis.

