XferBench

Evaluating Emergent Communication Systems with Downstream Tasks

So far, the project has produced one paper (Boldt & Mortensen, 2024):

In this paper, we introduce a benchmark for evaluating the overall quality of emergent languages using data-driven methods. Specifically, we interpret the notion of the “quality” of an emergent language as its similarity to human language within a deep learning framework. We measure this by using the emergent language as pretraining data for a downstream NLP task in human language: the better the downstream performance, the better the emergent language. We implement this benchmark as an easy-to-use Python package that only requires a text file of utterances from the emergent language to be evaluated. Finally, we empirically test the benchmark’s validity using human, synthetic, and emergent language baselines.
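To make the transfer idea concrete, the sketch below illustrates it with a toy stand-in: it “pretrains” a unigram model on a text file of emergent-language utterances and scores it by cross-entropy on a human-language corpus, where lower is better. This is not XferBench’s actual implementation (the benchmark pretrains and fine-tunes neural language models); the file names and helper functions here are hypothetical.

    # Minimal sketch of the transfer-benchmark idea, NOT the XferBench implementation.
    # A unigram model stands in for the neural language model, and cross-entropy on a
    # human-language corpus stands in for downstream performance (lower is better).
    import math
    from collections import Counter

    def train_unigram(path: str) -> Counter:
        """'Pretrain' by counting token frequencies in a text file of utterances."""
        counts: Counter = Counter()
        with open(path, encoding="utf-8") as f:
            for line in f:
                counts.update(line.split())
        return counts

    def cross_entropy(counts: Counter, path: str) -> float:
        """Evaluate the 'pretrained' model on a human-language corpus (bits per token)."""
        total = sum(counts.values())
        vocab = len(counts) + 1  # +1 for unseen tokens
        nll, n_tokens = 0.0, 0
        with open(path, encoding="utf-8") as f:
            for line in f:
                for tok in line.split():
                    # Add-one smoothing so unseen human-language tokens get nonzero mass.
                    p = (counts[tok] + 1) / (total + vocab)
                    nll -= math.log2(p)
                    n_tokens += 1
        return nll / max(n_tokens, 1)

    if __name__ == "__main__":
        model = train_unigram("emergent_utterances.txt")    # hypothetical input file
        score = cross_entropy(model, "human_language.txt")  # hypothetical eval file
        print(f"Downstream cross-entropy: {score:.3f} bits/token (lower is better)")

The real benchmark replaces the unigram model with a deep language model and the cross-entropy stand-in with performance on a human-language NLP task, but the evaluation loop has the same shape: train on the emergent corpus, then measure how well that training transfers.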

References

2024

  1. XferBench: a Data-Driven Benchmark for Emergent Language
    Brendon Boldt and David Mortensen
    In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), June 2024