
David Mortensen's publications, including all publications with other members of ChangeLing Lab.


  1. Self-supervised Speech Representations Still Struggle with African American Vernacular English
    Kalvin Chang, Yi-Hui Chou, Jiatong Shi, and 4 more authors
    In Proc. INTERSPEECH 2024, 2024
  2. Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction
    Atharva Naik, Kexun Zhang, Nathaniel Robinson, and 7 more authors
  3. Zero-Shot Cross-Lingual NER Using Phonemic Representations for Low-Resource Languages
    Jimin Sohn, Haeji Jung, Alex Cheng, and 3 more authors
  4. Semisupervised Neural Proto-Language Reconstruction
    Liang Lu, Peirong Xie, and David R. Mortensen
  5. Neural Proto-Language Reconstruction
    Chenxuan Cui, Ying Chen, Qinxin Wang, and 1 more author
  6. A Review of the Applications of Deep Learning-Based Emergent Communication
    Brendon Boldt and David R. Mortensen
    Transactions on Machine Learning Research, 2024
  7. Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons
    Shijia Zhou, Leonie Weissweiler, Taiqi He, and 3 more authors
    In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
  8. Improved Neural Protoform Reconstruction via Reflex Prediction
    Liang Lu, Jingzhi Wang, and David R. Mortensen
    In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
  9. Phonotactic Complexity across Dialects
    Ryan Soh-Eun Shim, Kalvin Chang, and David R. Mortensen
    In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
  10. PWESuite: Phonetic Word Embeddings and Tasks They Facilitate
    Vilém Zouhar, Kalvin Chang, Chenxuan Cui, and 4 more authors
    In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
  11. Verbing Weirds Language (Models): Evaluation of English Zero-Derivation in Five LLMs
    David R. Mortensen, Valentina Izrailevitch, Yunze Xiao, and 2 more authors
    In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
  12. XferBench: a Data-Driven Benchmark for Emergent Language
    Brendon Boldt and David Mortensen
    In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Jun 2024


  1. Kuki-Chin Phonology: An Overview
    David R. Mortensen
    Himalayan Linguistics, Jun 2023
  2. Evaluating self-supervised speech models on a Taiwanese Hokkien corpus
    Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, and 8 more authors
    In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Jun 2023
  3. African Substrates Rather Than European Lexifiers to Augment African-diaspora Creole Translation
    Nathaniel Romney Robinson, Matthew Dean Stutzman, Stephen D. Richardson, and 1 more author
    In 4th Workshop on African Natural Language Processing, Jun 2023
  4. Automating Sound Change Prediction for Phylogenetic Inference: A Tukanoan Case Study
    Kalvin Chang, Nathaniel Robinson, Anna Cai, and 3 more authors
    In Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change, Dec 2023
  5. ChatGPT MT: Competitive for High- (but Not Low-) Resource Languages
    Nathaniel Robinson, Perez Ogayo, David R. Mortensen, and 1 more author
    In Proceedings of the Eighth Conference on Machine Translation, Dec 2023
  6. Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models
    Orevaoghene Ahia, Sachin Kumar, Hila Gonen, and 4 more authors
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023
  7. Counting the Bugs in ChatGPT’s Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model
    Leonie Weissweiler, Valentin Hofmann, Anjali Kantharuban, and 10 more authors
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023
  8. Calibrated Seq2seq Models for Efficient and Generalizable Ultra-fine Entity Typing
    Yanlin Feng, Adithya Pratapa, and David Mortensen
    In Findings of the Association for Computational Linguistics: EMNLP 2023, Dec 2023
  9. Construction Grammar Provides Unique Insight into Neural Language Models
    Leonie Weissweiler, Taiqi He, Naoki Otani, and 3 more authors
    In Proceedings of the First International Workshop on Construction Grammars and NLP (CxGs+NLP, GURT/SyntaxFest 2023), Mar 2023
  10. Transformed Protoform Reconstruction
    Young Min Kim, Kalvin Chang, Chenxuan Cui, and 1 more author
    In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Jul 2023
  11. Generalized Glossing Guidelines: An Explicit, Human- and Machine-Readable, Item-and-Process Convention for Morphological Annotation
    David R. Mortensen, Ela Gulsen, Taiqi He, and 4 more authors
    In Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology, Jul 2023
  12. Multilingual TTS Accent Impressions for Accented ASR
    Georgios Karakasidis, Nathaniel Robinson, Yaroslav Getman, and 6 more authors
    In International Conference on Text, Speech, and Dialogue, Jul 2023
  13. SigMoreFun Submission to the SIGMORPHON Shared Task on Interlinear Glossing
    Taiqi He, Lindia Tjuatja, Nathaniel Robinson, and 4 more authors
    In Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology, Jul 2023


  1. Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican
    Nathaniel Robinson, Cameron Hogan, Nancy Fulda, and 1 more author
    In Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2022), Oct 2022
  2. WikiHan: A New Comparative Dataset for Chinese Languages
    Kalvin Chang, Chenxuan Cui, Youngmin Kim, and 1 more author
    In COLING 2022, Oct 2022
  3. When Is TTS Augmentation Through a Pivot Language Useful?
    Nathaniel Romney Robinson, Perez Ogayo, Swetha R. Gangu, and 2 more authors
    In Interspeech 2022, Oct 2022
  4. Speech Recognition for Around 2000 Languages without Audio
    Xinjian Li, Florian Metze, David R. Mortensen, and 2 more authors
    In Interspeech 2022, Oct 2022
  5. A Hmong Corpus with Elaborate Expression Annotations
    David R. Mortensen, Xinyu Zhang, Chenxuan Cui, and 1 more author
    In Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022), Oct 2022
  6. Phone Inventories and Recognition for Every Language
    Xinjian Li, Florian Metze, David R. Mortensen, and 2 more authors
    In Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022), Oct 2022
  7. Large-Scale Computerized Forward Reconstruction Yields New Perspectives in French Diachronic Phonology
    Clayton Marr and David Mortensen
    Diachronica, Oct 2022
  8. Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble
    Xinjian Li, Florian Metze, David Mortensen, and 2 more authors
    In Findings of the Association for Computational Linguistics: ACL 2022, May 2022
  9. Learning the Ordering of Coordinate Compounds and Elaborate Expressions in Hmong, Lahu, and Chinese
    Chenxuan Cui, Katherine Zhang, and David Mortensen
    In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jul 2022
  10. Wfst4Str
    David R. Mortensen
    Jul 2022
    Rust/Python library for working with strings using weighted finite state transducers


  1. ASR2K: Speech Recognition for Around 2000 Languages without Audio
    In Interspeech 2022, Jul 2021
  2. Quantifying Cognitive Factors in Lexical Decline
    David Francis, Ella Rabinovich, Farhan Samir, and 2 more authors
    Transactions of the Association for Computational Linguistics, Dec 2021
  3. Multilingual phonetic dataset for low resource speech recognition
    Xinjian Li, David R. Mortensen, Florian Metze, and 1 more author
    In ICASSP 2021, Dec 2021
  4. East Tusom: A phonetic and phonological sketch of a largely undocumented Tangkhulic language
    David R. Mortensen and Jordan Picone
    Dec 2021
  5. Tusom2021: A Phonetically Transcribed Speech Dataset from an Endangered Language for Universal Phone Recognition Experiments
    David R. Mortensen, Jordan Picone, Xinjian Li, and 1 more author
    In Proc. Interspeech 2021, Dec 2021
  6. Evaluating the Morphosyntactic Well-formedness of Generated Texts
    Adithya Pratapa, Antonios Anastasopoulos, Shruti Rijhwani, and 4 more authors
    In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Nov 2021
  7. Phoneme Recognition Through Fine Tuning of Phonetic Representations: A Case Study on Luhya Language Varieties
    Kathleen Siminyu, Xinjian Li, Antonios Anastasopoulos, and 3 more authors
    In Proc. Interspeech 2021, Nov 2021
  8. Ranking Transfer Languages with Pragmatically-Motivated Features for Multilingual Sentiment Analysis
    Jimin Sun, Hwijeen Ahn, Chan Young Park, and 2 more authors
    In EACL 2021, Nov 2021
  9. Differentiable Allophone Graphs for Language-Universal Speech Recognition
    Brian Yan, Siddharth Dalmia, David R. Mortensen, and 2 more authors
    In Proc. Interspeech 2021, Nov 2021


  1. Automatic Extraction of Rules Governing Morphological Agreement
    Aditi Chaudhary, Antonios Anastasopoulos, Adithya Pratapa, and 4 more authors
    In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Nov 2020
  2. Towards Zero-shot Learning for Automatic Phonemic Transcription
    Xinjian Li, Siddharth Dalmia, David R. Mortensen, and 3 more authors
    In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, Nov 2020
  3. Universal Phone Recognition with a Multilingual Allophone System
    Xinjian Li, Siddharth Dalmia, Juncheng Li, and 8 more authors
    In ICASSP 2020, Nov 2020
  4. Computerized Forward Reconstruction for Analysis in Diachronic Phonology, and Latin to French Reflex Prediction
    Clayton Marr and David R. Mortensen
    In 1st Workshop on Language Technologies for Historical and Ancient LAnguages (LT4HALA), Nov 2020
  5. Characterizing Sociolinguistic Variation in the Competing Vaccination Communities
    Shahan Ali Memon, Aman Tyagi, David R. Mortensen, and 1 more author
    In Proceedings of the International Conference SBP-BRiMS 2020, Nov 2020
  6. AlloVera: A Multilingual Allophone Database
    David R. Mortensen, Xinjian Li, Patrick Littell, and 6 more authors
    In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020), Nov 2020
  7. Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods
    Maria Ryskina, Ella Rabinovich, Taylor Berg-Kirkpatrick, and 2 more authors
    In Proceedings of the Society for Computation in Linguistics, Nov 2020


  1. CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology
    Aditi Chaudhary, Elizabeth Salesky, Gayatri Bhat, and 3 more authors
    In Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology, Aug 2019
  2. Hmong (Mong Leng)
    David R. Mortensen
    In The Mainland Southeast Asia Linguistic Area, Aug 2019
  3. IndoMorph
    David R. Mortensen and Jong Hyuk Park
    Aug 2019
    Collection of Foma FST morphological analyzers for languages of the Indian subcontinent


  1. Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations
    Aditi Chaudhary, Chunting Zhou, Lori Levin, and 3 more authors
    In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Oct 2018
  2. The ARIEL-CMU situation frame detection pipeline for LoReHLT16: a model translation approach
    Patrick Littell, Tian Tian, Ruochen Xu, and 12 more authors
    Machine Translation, Oct 2018
  3. Parser combinators for Tigrinya and Oromo morphology
    Patrick Littell, Tom McCoy, Na-Rae Han, and 5 more authors
    In Proceedings of the 11th Language Resources and Evaluation Conference, May 2018
  4. Epitran: Precision G2P for Many Languages
    David R. Mortensen, Siddharth Dalmia, and Patrick Littell
    In Proceedings of the 11th Language Resources and Evaluation Conference, May 2018
  5. Epitran
    David R. Mortensen
    May 2018
    Precision orthography-to-IPA conversion for 65 languages
  6. MStem
    David R. Mortensen
    May 2018
    Python multilingual morphological stemming framework and stemmer collection


  1. URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors
    Patrick Littell, David R. Mortensen, Ke Lin, and 3 more authors
    In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, Apr 2017
  2. Hmong-Mien Languages
    David R. Mortensen
    In Oxford Research Encyclopedia of Linguistics, May 2017
  3. Hmong-Mien Languages
    David R. Mortensen
    In Oxford Bibliographies in Linguistics, May 2017


  1. Phonologically Aware Neural Model for Named Entity Recognition in Low Resource Transfer Settings
    Akash Bharadwaj, David Mortensen, Chris Dyer, and 1 more author
    In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Nov 2016
  2. Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik
    Patrick Littell, David R. Mortensen, Kartik Goyal, and 2 more authors
    In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), May 2016
  3. Named Entity Recognition for Linguistic Rapid Response in Low-Resource Languages: Sorani Kurdish and Tajik
    Patrick Littell, Kartik Goyal, David R. Mortensen, and 3 more authors
    In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Dec 2016
  4. PanPhon: A Resource for Mapping IPA Segments to Articulatory Feature Vectors
    David R. Mortensen, Patrick Littell, Akash Bharadwaj, and 3 more authors
    In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Dec 2016
  5. PanPhon
    David R. Mortensen
    Dec 2016
    Articulatory feature extractor and library
  6. Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning
    Yulia Tsvetkov, Sunayana Sitaram, Manaal Faruqui, and 6 more authors
    In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun 2016


  1. Lexical Prefixes and Tibeto-Burman Laryngeal Contrasts
    David R. Mortensen
    In Proceedings of the Thirty-Seventh Annual Meeting of the Berkeley Linguistics Society (BLS 37), Jun 2013
  2. A Reconstruction of Proto-Tangkhulic Rhymes
    David R. Mortensen and James A. Miller
    Linguistics of the Tibeto-Burman Area, Jun 2013
  3. Tonally Conditioned Vowel Raising in Shuijingping Mang
    David R. Mortensen
    Journal of East Asian Linguistics, Jun 2013


  1. Sorbung, an Undescribed Language of Manipur: Its Phonology and Place in Tibeto-Burman
    David R. Mortensen and Jennifer Keogh
    Journal of the Southeast Asian Linguistics Society, Jun 2012
  2. The Emergence of Obstruents after High Vowels
    David R. Mortensen
    Diachronica, Jun 2012
  3. NetSPE
    David R. Mortensen
    Jun 2012
    A web-based application for exploring rule-based analyses of phonological data
  4. A Classification of Compounding in American Sign Language: an Evaluation of the Bisetto and Scalise Framework
    Mary Lou Vercellotti and David R. Mortensen
    Morphology, Jun 2012


  1. HsSPE
    David R. Mortensen
    Jun 2011
    Haskell library implementing SPE-style rule-based phonology
  2. Web Comparator
    David R. Mortensen
    Jun 2011
    A web-based application for organizing and analyzing comparative lexical databases. Used in the production of “Emergence of Obstruents” and “Proto-Tangkhulic Rhymes”


  1. The emergence of dorsal stops after high vowels in Huishu
    David R. Mortensen
    In Proceedings of the Thirtieth Annual Meeting of the Berkeley Linguistics Society (BLS 30), Jun 2004


  1. Review of Baheng Yu Yanjiu [research on the Pa-Hng language] by Mao Zongwu and Li Yunbing
    David R. Mortensen
    Linguistics of the Tibeto-Burman Area, Jun 2003


  1. Review of Les langues Hmong-Mjen: phonologie historique by Barbara Niederer
    David R. Mortensen
    Linguistics of the Tibeto-Burman Area, Jun 2002