

A tool for extracting parallel sentences from comparable corpora.


What is this for?

Statistical Machine Translation relies on parallel corpora for training translation models. However, these corpora are limited in size and take time to create. Yalign is designed to automate this process by finding sentences in comparable corpora that are close translation matches. This opens up avenues for harvesting parallel corpora from sources like translated documents and the web.
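
To make the idea concrete, here is a minimal toy sketch of extracting sentence pairs from two comparable texts. It is not Yalign's API or its actual algorithm: the tiny word dictionary, the overlap score, the threshold value, and the names `TOY_DICTIONARY`, `overlap_score` and `extract_pairs` are all assumptions made up for illustration.

```python
# Toy illustration only: this is NOT Yalign's API or algorithm.
# TOY_DICTIONARY, overlap_score and extract_pairs are hypothetical names.

# A tiny hand-written English -> Spanish word dictionary (assumption).
TOY_DICTIONARY = {
    "the": "el",
    "cat": "gato",
    "sleeps": "duerme",
    "dog": "perro",
    "runs": "corre",
}


def overlap_score(source, target, dictionary):
    """Fraction of source words whose dictionary translation occurs in target."""
    source_words = source.lower().split()
    target_words = set(target.lower().split())
    if not source_words:
        return 0.0
    hits = sum(1 for word in source_words if dictionary.get(word) in target_words)
    return hits / len(source_words)


def extract_pairs(source_sentences, target_sentences, dictionary, threshold=0.6):
    """Pair each source sentence with its best-scoring target sentence.

    A pair is kept only if its score reaches the threshold, so sentences
    without a close translation match are simply dropped.
    """
    pairs = []
    if not target_sentences:
        return pairs
    for src in source_sentences:
        best = max(
            target_sentences,
            key=lambda tgt: overlap_score(src, tgt, dictionary),
        )
        if overlap_score(src, best, dictionary) >= threshold:
            pairs.append((src, best))
    return pairs


# Two "comparable" texts: same topic, different order, extra sentences.
english = ["The cat sleeps", "The dog runs", "Stock prices fell today"]
spanish = ["El perro corre", "Hoy llueve mucho", "El gato duerme"]

pairs = extract_pairs(english, spanish, TOY_DICTIONARY)
for src, tgt in pairs:
    print(src, "<->", tgt)
```

The sketch keeps only the pairs that score above the threshold and discards the rest, which is the general shape of harvesting parallel sentences from documents that are not sentence-by-sentence translations of each other.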

Try it yourself!

See how the library works in our online demo!

Learn more

Read more about how to use Yalign and how it works in the Documentation.


This is an active Open Source project started by the AI and NLP team of Machinalis:

If you're interested in this project, you might want to check out our other projects: