What is this for?
Statistical Machine Translation relies on parallel corpora for training translation models. However, these corpora are limited and take time to create. Yalign is designed to automate this process by finding sentences that are close translation matches in comparable corpora. This opens up avenues for harvesting parallel corpora from sources such as translated documents and the web.
Try it yourself!
See how the library works in our online demo!
Read more about how to use Yalign and how it works in the Documentation.
This is an active Open Source project started by the AI and NLP team of Machinalis:
- Andrew Vine
- Elías Andrawos
- Gonzalo García Berrotarán
- Laura Alonso i Alemany
- Rafael Carrascosa
If you're interested in this project, you might want to check out our other projects:
- Quepy: A framework to transform natural language into database queries.
- SimpleAI: Python implementation of artificial intelligence algorithms.
- REfO: Regular expressions for objects.