
🎉 3 papers accepted @ COLING 2025!


Just before the holiday break 🎄❄️, I am delighted to share that some of our latest NLP work is making waves 🌊: we got 3(!) papers accepted at the 31st International Conference on Computational Linguistics (COLING 2025, https://lnkd.in/dR8bgtXu)! Some info on the exceptional work of these A-🌟-M-💫-A-✨-Z-🔥-I-🎉-N-🌈-G-👏 PhD candidates below 👇👇👇

1/ The ultimate PhD paper of Antoine Louis asks "whether to fuse or not to fuse" in a (legal) IR scenario 🔍. BM25 is still a performance beast 💪 in IR, but it's crucial to know when it shines ✨ and where it falls short ⚠️ compared to dense models. In the paper we explore different scenarios and conclude:
● BM25 = still the 🎁 of search, esp. in zero-shot tasks or when efficiency rules.
● Fusing models? 🤝 Great for zero-shot: it boosts general IR models.
● Got domain-specific data? 🧠 Fine-tune one model for best results.

📑 Paper: https://lnkd.in/dxr2VQQE (w/ Gijs van Dijck) 💻 Code: https://lnkd.in/d8wrEP7i 🤗 Models: https://lnkd.in/d4RwVVfc
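For intuition on what "fusing" lexical and dense retrieval can look like, here is a minimal sketch in Python: min-max normalize each run and take a weighted combination. This is a common recipe rather than the paper's exact method, and all document IDs and scores are made up.

```python
# A minimal sketch, not the paper's exact method: fuse BM25 and dense scores
# for one query via min-max normalization + a weighted (convex) combination.
# All document IDs and scores below are made up for illustration.

def minmax(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize a {doc_id: score} run to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def fuse(bm25: dict[str, float], dense: dict[str, float], alpha: float = 0.5):
    """Interpolate normalized BM25 and dense scores; unseen docs contribute 0."""
    b, d = minmax(bm25), minmax(dense)
    fused = {doc: alpha * b.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0)
             for doc in set(b) | set(d)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

bm25_run = {"doc1": 12.3, "doc2": 8.7, "doc3": 4.1}     # lexical (BM25) scores
dense_run = {"doc1": 0.62, "doc2": 0.71, "doc4": 0.55}  # dense (e.g., cosine) scores
print(fuse(bm25_run, dense_run, alpha=0.5))
```

Setting alpha toward 1.0 leans on BM25 (the zero-shot/efficiency regime above), while lower values lean on the dense model once it has been fine-tuned on in-domain data.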

2/ Have you noticed how most information retrieval work is in English and Chinese? Well, Antoine and Vageesh noticed the same and, as a PhD side project, worked on delivering ColBERT-XM 🌍, a modular retriever for 81+ languages 🧩. Built w/ XMOD encoders & ColBERT's backbone, it trains on English (a high-resource language) and transfers zero-shot to other languages, thereby eliminating the need for language-specific labeled retrieval data. 💡✨

📑 Paper: https://lnkd.in/dXHKunum (w/ Gijs van Dijck) 💻 Code: https://lnkd.in/dw4N5PdP 🤗 Model: https://lnkd.in/dHpAN5yR
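To make the ColBERT part of the backbone concrete, here is a minimal sketch of the late-interaction ("MaxSim") scoring that ColBERT is known for and that ColBERT-XM inherits: each query token is matched against its most similar document token, and those maxima are summed. The random tensors are stand-ins for the token embeddings an XMOD encoder would produce.

```python
# A minimal sketch of ColBERT-style late interaction ("MaxSim") scoring.
# Random tensors stand in for encoder token embeddings (e.g., from XMOD).

import torch
import torch.nn.functional as F

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """For each query token, take the max similarity over all document tokens,
    then sum over query tokens (embeddings assumed L2-normalized)."""
    sim = query_emb @ doc_emb.T          # [n_query_tokens, n_doc_tokens]
    return sim.max(dim=1).values.sum()

q = F.normalize(torch.randn(8, 128), dim=-1)     # 8 query tokens, dim 128
d = F.normalize(torch.randn(120, 128), dim=-1)   # 120 document tokens, dim 128
print(maxsim_score(q, d).item())
```

Because scoring only needs token embeddings, swapping the English encoder for another XMOD language adapter leaves this interaction step untouched, which is what makes the zero-shot transfer possible.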

3/ Paweł Mąka's 2nd PhD paper dives into a question: how do context-aware machine translation models really use context? 🤔 We analyzed attention heads and found: 🔑 some are critical for pronoun disambiguation; 🚀 fine-tuning these heads = boosted performance! This work builds on the VOXReality EU project, where we efficiently integrate SoTA MT models into AR/VR 🕶️🌐🎮 scenarios, so effective context use is essential.

📑 Paper: https://lnkd.in/dc9sVYtn (w/ Yusuf Can Semerci, Johannes (Jan) C. Scholtes) 💻 Code: https://lnkd.in/duNdb5YY
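For readers who want to poke at attention heads themselves, here is a minimal sketch of extracting per-head cross-attention from an off-the-shelf MT model with Hugging Face transformers and measuring how much mass each head puts on the context sentence. The model name, sentence pair, and context boundary are illustrative assumptions, not the paper's exact setup.

```python
# A minimal sketch: inspect per-head cross-attention in a seq2seq MT model.
# Model name, sentences, and ctx_len are illustrative assumptions.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Helsinki-NLP/opus-mt-en-de"        # assumed off-the-shelf MT model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

src = "I saw the doctor yesterday. She was very helpful."      # context + current sentence
tgt = "Ich habe gestern die Ärztin gesehen. Sie war sehr hilfreich."

enc = tok(src, return_tensors="pt")
labels = tok(text_target=tgt, return_tensors="pt")["input_ids"]

with torch.no_grad():
    out = model(**enc, labels=labels, output_attentions=True)

# out.cross_attentions: one tensor per decoder layer, [batch, heads, tgt_len, src_len].
# Summing the attention each head places on the context part of the source (here
# the first ctx_len source tokens) gives a rough per-head "context usage" score.
ctx_len = 8   # illustrative token boundary for the context sentence
for layer, att in enumerate(out.cross_attentions):
    ctx_mass = att[0, :, :, :ctx_len].sum(dim=-1).mean(dim=-1)   # averaged over target positions
    print(f"decoder layer {layer}: {[round(m, 3) for m in ctx_mass.tolist()]}")
```

Heads whose mass spikes on the antecedent when translating an ambiguous pronoun are the kind of candidates one would then target for selective fine-tuning.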