Publications

  • Subasa - Adapting Language Models for Low-resourced Offensive Language Detection in Sinhala

    Shanilka Haturusinghe, Tharindu Cyril Weerasooriya, Christopher M Homan, Marcos Zampieri, and Sidath Ravindra Liyanage. 2025. Subasa - Adapting Language Models for Low-resourced Offensive Language Detection in Sinhala. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 260–270, Albuquerque, USA. Association for Computational Linguistics.
    Published in Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop).
    Abstract: Accurate detection of offensive language is essential for a number of applications related to social media safety. There is a sharp contrast in performance in this task between low and high-resource languages. In this paper, we adapt fine-tuning strategies that have not been previously explored for Sinhala in the downstream task of offensive language detection. Using this approach, we introduce four models: "Subasa-XLM-R", which incorporates an intermediate Pre-Finetuning step using Masked Rationale Prediction. Two variants of "Subasa-Llama" and "Subasa-Mistral", are fine-tuned versions of Llama (3.2) and Mistral (v0.3), respectively, with a task-specific strategy. We evaluate our models on the SOLD benchmark dataset for Sinhala offensive language detection. All our models outperform existing baselines. Subasa-XLM-R achieves the highest Macro F1 score (0.84) surpassing state-of-the-art large language models like GPT-4o when evaluated on the same SOLD benchmark dataset under zero-shot settings. The models and code are publicly available.
  • Customer Segmentation and Churn Prediction in the Maritime Shipping Industry Using Machine Learning Techniques: A Sri Lankan Case Study

    Ushara Prabash Melder, Shanilka Haturusinghe, and S P Kasthuri Arachchi. 2026. Customer Segmentation and Churn Prediction in the Maritime Shipping Industry Using Machine Learning Techniques: A Sri Lankan Case Study. In Proceedings of the 2026 6th International Conference on Advanced Research in Computing (ICARC), pages 1–6, February 2026.
    Published in Proceedings of the 2026 6th International Conference on Advanced Research in Computing (ICARC).
    Abstract: Customer churn is a critical issue in industries with high acquisition costs, such as maritime shipping and logistics, where losing a single client can result in substantial financial losses. While churn prediction is well-researched in domains such as telecom and SaaS, limited work has been done in the maritime shipping sector, particularly in Sri Lanka. This research aims to design and deploy a machine learning-based churn prediction and customer segmentation framework tailored for maritime shipping, which uses very common features in the maritime shipping industry. Two primary datasets from the ERP system were combined: the Detailed Income Report, containing financial data, and the operational data report, capturing operational records spanning nine years. Three models were implemented: logistic regression, random forest, and XGBoost. As the real-world data had class imbalance present, it was handled using SMOTE for logistic regression and built-in methods for random forest and XGBoost. Results showed that XGBoost achieved the best performance, outperforming the other models. Feature importance analysis revealed the strongest common predictors for the framework. The front-end for the framework was developed using Streamlit and integrated with Power BI for real-time churn monitoring and customer segmentation. This research contributes to filling the gap in maritime churn prediction by demonstrating how machine learning and business intelligence can provide actionable insights for customer retention strategies.