TrellisSoft interview question

How would you build a scalable NLP pipeline for multilingual data using open-source libraries?
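One possible answer breaks the pipeline into stages: Unicode normalization, language detection, routing each document to a language-appropriate tokenizer, and streaming the corpus in bounded batches so that the per-document work can be spread across workers. The sketch below shows that shape using only the Python standard library. It is a minimal illustration under stated assumptions, not a definitive design. The functions `detect_script`, `process_doc`, `batched`, and `run_pipeline`, and the `TOKENIZERS` routing table, are all hypothetical names. Majority Unicode script is a crude stand-in for a real language-identification model, such as fastText's language-ID models. The two toy tokenizers stand in for per-language open-source models, such as spaCy pipelines.

```python
# Hypothetical sketch: every name below is illustrative, not part of any library.
import re
import unicodedata
from collections import Counter
from dataclasses import dataclass, field
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Doc:
    text: str
    script: str = "UNKNOWN"
    tokens: List[str] = field(default_factory=list)


def detect_script(text: str) -> str:
    """Stand-in for a language-ID model: majority Unicode script of the letters."""
    counts: Counter = Counter()
    for ch in text:
        if ch.isalpha():
            # e.g. unicodedata.name("ж") -> "CYRILLIC SMALL LETTER ZHE"; first word = script
            counts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def word_tokenize(text: str) -> List[str]:
    # Adequate only for space-delimited scripts (Latin, Cyrillic, Greek, ...).
    return re.findall(r"\w+", text.lower())


def char_tokenize(text: str) -> List[str]:
    # Naive per-character split for scripts usually written without spaces.
    return [ch for ch in text if ch.isalnum()]


# Hypothetical routing table (script -> tokenizer). In a real system each entry
# would wrap a language-specific open-source model; unlisted scripts fall back
# to word_tokenize.
TOKENIZERS: Dict[str, Callable[[str], List[str]]] = {
    "CJK": char_tokenize,
    "HIRAGANA": char_tokenize,
    "KATAKANA": char_tokenize,
    "THAI": char_tokenize,
}


def process_doc(text: str) -> Doc:
    """One pass per document: normalize -> detect -> route -> tokenize."""
    text = unicodedata.normalize("NFKC", text)  # fold full-width forms, ligatures, etc.
    script = detect_script(text)
    tokenize = TOKENIZERS.get(script, word_tokenize)
    return Doc(text=text, script=script, tokens=tokenize(text))


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield lists of up to `size` items without materializing the whole input."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def run_pipeline(
    texts: Iterable[str],
    batch_size: int = 1000,
    mapper: Callable = map,
) -> Iterator[Doc]:
    """Stream documents batch by batch so memory use stays bounded.

    `mapper` defaults to the built-in `map` (sequential). Passing, for example,
    `ProcessPoolExecutor().map` from concurrent.futures, inside an
    `if __name__ == "__main__":` guard, spreads the CPU-bound work across cores.
    """
    for batch in batched(texts, batch_size):
        yield from mapper(process_doc, batch)


docs = list(run_pipeline(["Hello, world!", "Привет, мир!", "東京へ行く"], batch_size=2))
for d in docs:
    print(d.script, d.tokens)
```

The design keeps `process_doc` a pure, module-level function of a single string. As a result, the same stage can run sequentially, in a local process pool, or inside a distributed framework such as Apache Spark or Ray without changing the NLP logic. Only the batching and dispatch layer would differ. Swapping in real language-ID and tokenization models would mean replacing `detect_script` and the `TOKENIZERS` entries.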