Phonological Representations for NLP

Leveraging phonological representations for NLP tasks

How phonology can be used in NLP remains underexplored. Members of this lab paved the way for work in this area with tools such as Epitran (Mortensen et al., 2018) and PanPhon (Mortensen et al., 2016). We are now seeking to apply phonological representations to a variety of tasks, following the path cleared by Bharadwaj et al. (2016) and Chaudhary et al. (2018). We have recently extended this work to modern classes of pretrained models like XPhoneBERT (Sohn et al., 2024). This year, we plan to generalize this investigation to a wider range of linguistic tasks (rather than just NER and MT, as in past work) and to develop better techniques for exploiting phonological resources.
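To make "phonological representation" concrete, here is a minimal sketch of the general idea behind articulatory feature vectors: each sound segment is mapped to a vector of phonological features, so that related sounds end up close together. The feature inventory, the values in the table, and the function names below are hypothetical and hand-written for illustration; they are not PanPhon's actual feature set or API.

```python
# Toy illustration only: the features and values below are simplified
# assumptions for this sketch, NOT PanPhon's real feature table or API.
FEATURES = ("voiced", "nasal", "labial", "coronal", "continuant")

# +1 = feature present, -1 = feature absent (hypothetical values)
TOY_SEGMENTS = {
    "p": (-1, -1, +1, -1, -1),
    "b": (+1, -1, +1, -1, -1),
    "m": (+1, +1, +1, -1, -1),
    "t": (-1, -1, -1, +1, -1),
    "d": (+1, -1, -1, +1, -1),
    "s": (-1, -1, -1, +1, +1),
    "z": (+1, -1, -1, +1, +1),
}


def encode(segments):
    """Map a sequence of segments to a list of feature vectors."""
    return [TOY_SEGMENTS[s] for s in segments]


def feature_distance(a, b):
    """Count the features on which two segments disagree (Hamming distance)."""
    return sum(x != y for x, y in zip(TOY_SEGMENTS[a], TOY_SEGMENTS[b]))


if __name__ == "__main__":
    # /p/ and /b/ differ only in voicing; /p/ and /z/ differ in four features.
    print(feature_distance("p", "b"))
    print(feature_distance("p", "z"))
    print(encode(["b", "d"]))
```

In a sketch like this, a model that consumes the feature vectors rather than opaque character IDs can share information between similar sounds, which is one reason such representations are attractive for low-resource and cross-lingual settings.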

References

2024

  1. Zero-Shot Cross-Lingual NER Using Phonemic Representations for Low-Resource Languages
    Jimin Sohn, Haeji Jung, Alex Cheng, and 3 more authors
    Dec 2024

2018

  1. Epitran: Precision G2P for Many Languages
    David R. Mortensen, Siddharth Dalmia, and Patrick Littell
    In Proceedings of the 11th Language Resources and Evaluation Conference, May 2018
  2. Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations
    Aditi Chaudhary, Chunting Zhou, Lori Levin, and 3 more authors
    In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Oct 2018

2016

  1. PanPhon: A Resource for Mapping IPA Segments to Articulatory Feature Vectors
    David R. Mortensen, Patrick Littell, Akash Bharadwaj, and 3 more authors
    In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Dec 2016
  2. Phonologically Aware Neural Model for Named Entity Recognition in Low Resource Transfer Settings
    Akash Bharadwaj, David Mortensen, Chris Dyer, and 1 more author
    In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Nov 2016