publications
David Mortensen's publications, including all publications with other members of the ChangeLing Lab.
2024
- Self-supervised Speech Representations Still Struggle with African American Vernacular English. In Proc. INTERSPEECH 2024, 2024
- Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction. 2024
- Zero-Shot Cross-Lingual NER Using Phonemic Representations for Low-Resource Languages. 2024
- Semisupervised Neural Proto-Language Reconstruction. 2024
- Neural Proto-Language Reconstruction. 2024
- A Review of the Applications of Deep Learning-Based Emergent Communication. Transactions on Machine Learning Research, 2024
- Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
- Improved Neural Protoform Reconstruction via Reflex Prediction. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
- Phonotactic Complexity across Dialects. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
- PWESuite: Phonetic Word Embeddings and Tasks They Facilitate. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
- Verbing Weirds Language (Models): Evaluation of English Zero-Derivation in Five LLMs. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
- XferBench: a Data-Driven Benchmark for Emergent Language. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Jun 2024
2023
- Kuki-Chin Phonology: An Overview. Himalayan Linguistics, Jun 2023
- Evaluating self-supervised speech models on a Taiwanese Hokkien corpus. In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Jun 2023
- African Substrates Rather Than European Lexifiers to Augment African-diaspora Creole Translation. In 4th Workshop on African Natural Language Processing, Jun 2023
- Automating Sound Change Prediction for Phylogenetic Inference: A Tukanoan Case Study. In Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change, Dec 2023
- ChatGPT MT: Competitive for High- (but Not Low-) Resource Languages. In Proceedings of the Eighth Conference on Machine Translation, Dec 2023
- Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023
- Counting the Bugs in ChatGPT’s Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023
- Calibrated Seq2seq Models for Efficient and Generalizable Ultra-fine Entity Typing. In Findings of the Association for Computational Linguistics: EMNLP 2023, Dec 2023
- Construction Grammar Provides Unique Insight into Neural Language Models. In Proceedings of the First International Workshop on Construction Grammars and NLP (CxGs+NLP, GURT/SyntaxFest 2023), Mar 2023
- Transformed Protoform Reconstruction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Jul 2023
- Generalized Glossing Guidelines: An Explicit, Human- and Machine-Readable, Item-and-Process Convention for Morphological Annotation. In Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology, Jul 2023
- Multilingual TTS Accent Impressions for Accented ASR. In International Conference on Text, Speech, and Dialogue, Jul 2023
- SigMoreFun Submission to the SIGMORPHON Shared Task on Interlinear Glossing. In Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology, Jul 2023
2022
- Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican. In Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2022), Oct 2022
- WikiHan: A New Comparative Dataset for Chinese Languages. In COLING 2022, Oct 2022
- When Is TTS Augmentation Through a Pivot Language Useful? In Interspeech 2022, Oct 2022
- Speech Recognition for Around 2000 Languages without Audio. In Interspeech 2022, Oct 2022
- A Hmong Corpus with Elaborate Expression Annotations. In Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022), Oct 2022
- Phone Inventories and Recognition for Every Language. In Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022), Oct 2022
- Large-Scale Computerized Forward Reconstruction Yields New Perspectives in French Diachronic Phonology. Diachronica, Oct 2022
- Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble. In Findings of the Association for Computational Linguistics: ACL 2022, May 2022
- Learning the Ordering of Coordinate Compounds and Elaborate Expressions in Hmong, Lahu, and Chinese. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jul 2022
- Wfst4Str. Jul 2022. Rust/Python library for working with strings using weighted finite-state transducers
2021
- ASR2K: Speech Recognition for Around 2000 Languages without Audio. In Interspeech 2022, Jul 2021
- Quantifying Cognitive Factors in Lexical Decline. Transactions of the Association for Computational Linguistics, Dec 2021
- Multilingual phonetic dataset for low resource speech recognition. In ICASSP 2021, Dec 2021
- East Tusom: A phonetic and phonological sketch of a largely undocumented Tangkhulic language. Dec 2021
- Tusom2021: A Phonetically Transcribed Speech Dataset from an Endangered Language for Universal Phone Recognition Experiments. In Proc. Interspeech 2021, Dec 2021
- Evaluating the Morphosyntactic Well-formedness of Generated Texts. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Nov 2021
- Phoneme Recognition Through Fine Tuning of Phonetic Representations: A Case Study on Luhya Language Varieties. In Proc. Interspeech 2021, Nov 2021
- Ranking Transfer Languages with Pragmatically-Motivated Features for Multilingual Sentiment Analysis. In EACL 2021, Nov 2021
- Differentiable Allophone Graphs for Language-Universal Speech Recognition. In Proc. Interspeech 2021, Nov 2021
2020
- Automatic Extraction of Rules Governing Morphological Agreement. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Nov 2020
- Towards Zero-shot Learning for Automatic Phonemic Transcription. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, Nov 2020
- Universal Phone Recognition with a Multilingual Allophone System. In ICASSP 2020, Nov 2020
- Computerized Forward Reconstruction for Analysis in Diachronic Phonology, and Latin to French Reflex Prediction. In 1st Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA), Nov 2020
- Characterizing Sociolinguistic Variation in the Competing Vaccination Communities. In Proceedings of the International Conference SBP-BRiMS 2020, Nov 2020
- AlloVera: A Multilingual Allophone Database. In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020), Nov 2020
- Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods. In Proceedings of the Society for Computation in Linguistics, Nov 2020
2019
- CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology. In Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology, Aug 2019
- Hmong (Mong Leng). In The Mainland Southeast Asia Linguistic Area, Aug 2019
- IndoMorph. Aug 2019. Collection of Foma FST morphological analyzers for languages of the Indian subcontinent
2018
- Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Oct 2018
- The ARIEL-CMU situation frame detection pipeline for LoReHLT16: a model translation approach. Machine Translation, Oct 2018
- Parser combinators for Tigrinya and Oromo morphology. In Proceedings of the 11th Language Resources and Evaluation Conference, May 2018
- Epitran: Precision G2P for Many Languages. In Proceedings of the 11th Language Resources and Evaluation Conference, May 2018
- Epitran. May 2018. Precision orthography-to-IPA conversion for 65 languages
- MStem. May 2018. Python multilingual morphological stemming framework and stemmer collection
2017
- URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, Apr 2017
- Hmong-Mien Languages. In Oxford Research Encyclopedia of Linguistics, May 2017
- Hmong-Mien Languages. In Oxford Bibliographies in Linguistics, May 2017
2016
- Phonologically Aware Neural Model for Named Entity Recognition in Low Resource Transfer Settings. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Nov 2016
- Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), May 2016
- Named Entity Recognition for Linguistic Rapid Response in Low-Resource Languages: Sorani Kurdish and Tajik. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Dec 2016
- PanPhon: A Resource for Mapping IPA Segments to Articulatory Feature Vectors. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Dec 2016
- PanPhon. Dec 2016. Articulatory feature extractor and library
- Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun 2016
2013
- Lexical Prefixes and Tibeto-Burman Laryngeal Contrasts. In Proceedings of the Thirty-Seventh Annual Meeting of the Berkeley Linguistics Society (BLS 37), Jun 2013
- A Reconstruction of Proto-Tangkhulic Rhymes. Linguistics of the Tibeto-Burman Area, Jun 2013
- Tonally Conditioned Vowel Raising in Shuijingping Mang. Journal of East Asian Linguistics, Jun 2013
2012
- Sorbung, an Undescribed Language of Manipur: Its Phonology and Place in Tibeto-Burman. Journal of the Southeast Asian Linguistics Society, Jun 2012
- The Emergence of Obstruents after High Vowels. Diachronica, Jun 2012
- NetSPE. Jun 2012. A web-based application for exploring rule-based analyses of phonological data
- A Classification of Compounding in American Sign Language: an Evaluation of the Bisetto and Scalise Framework. Morphology, Jun 2012
2011
- HsSPE. Jun 2011. Haskell library implementing SPE-style rule-based phonology
- Web Comparator. Jun 2011. A web-based application for organizing and analyzing comparative lexical databases. Used in the production of “Emergence of Obstruents” and “Proto-Tangkhulic Rhymes”
2004
- The emergence of dorsal stops after high vowels in Huishu. In Proceedings of the Thirtieth Annual Meeting of the Berkeley Linguistics Society (BLS 30), Jun 2004
2003
- Review of Baheng Yu Yanjiu [research on the Pa-Hng language] by Mao Zongwu and Li Yunbing. Linguistics of the Tibeto-Burman Area, Jun 2003
2002
- Review of Les langues Hmong-Mjen: phonologie historique by Barbara Niederer. Linguistics of the Tibeto-Burman Area, Jun 2002