NLP experiment

This ongoing project is an attempt to build an NLP classifier that categorizes supermarket products from their titles. The following technologies have been used:

  • Backend - Django, Django REST framework
  • Frontend - React, HTML, CSS (Bulma)
  • Machine learning - PyTorch, Transformers, scikit-learn
  • Data handling - pandas, matplotlib

The main difficulty of the project is that the data spans more than 20 languages, so the main component of the classifier is XLM-RoBERTa, a well-known multilingual model pretrained on 100 languages. I fine-tuned this model for the task at hand. I plan to add image, brand and weight/volume information to my ensemble model to achieve better results.
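
As a rough illustration of the setup described above, here is a minimal sketch of loading XLM-RoBERTa with a classification head through the Transformers library. It is not the project's actual code: the checkpoint name `xlm-roberta-base`, the helper names (`build_label_maps`, `load_classifier`, `predict`), the `max_length` value and the example category names are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed checkpoint; the project may use a different XLM-RoBERTa size.
CHECKPOINT = "xlm-roberta-base"


def build_label_maps(labels: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map category names to integer ids (sorted by name) and back."""
    unique = sorted(set(labels))
    label2id = {name: i for i, name in enumerate(unique)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def load_classifier(label2id: Dict[str, int], id2label: Dict[int, str]):
    """Load the tokenizer and a model with a freshly initialized classification head."""
    # Imported lazily so the label helpers work without the ML stack installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForSequenceClassification.from_pretrained(
        CHECKPOINT,
        num_labels=len(label2id),
        label2id=label2id,
        id2label=id2label,
    )
    return tokenizer, model


def predict(titles: List[str], tokenizer, model) -> List[str]:
    """Return the predicted category name for each product title."""
    import torch

    # Product titles are short, so a small max_length is assumed here.
    batch = tokenizer(
        titles, padding=True, truncation=True, max_length=64, return_tensors="pt"
    )
    model.eval()
    with torch.no_grad():
        logits = model(**batch).logits
    return [model.config.id2label[i] for i in logits.argmax(dim=-1).tolist()]
```

The classification head returned by `load_classifier` is randomly initialized, so `predict` only gives meaningful categories after fine-tuning on labeled titles. Because the tokenizer and encoder are shared across languages, the same model can be trained on titles from all languages at once.
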
An important part of this project is a data cleaner tool used for filtering, cleaning and labeling the data. It is a web tool built with Django. Its interface is shown in the picture below: