Youngah presented at the 2023 Linguistics Colloquium organized by Seoul National University.
Infants require two crucial skills to successfully begin language acquisition: (a) the ability to learn fundamental speech sound units, or phonemes, and (b) the capacity to decompose sound sequences into meaningful units. This talk will discuss the effectiveness of an autoencoder model in learning phonemes and phoneme boundaries from unsegmented, non-transcribed waveform data, a learning condition similar to the early stages of infant language acquisition. The experiment was conducted on Mandarin and English data, and the results demonstrate that phonemes and their associated features can be learned through repeated projection and reconstruction without prior knowledge of segmentation. The model clusters segments of the same phoneme and projects different phonemes to separate regions in the hidden space. Furthermore, the model successfully decomposes words into phonemes in sequential order, which is a crucial foundation for phonotactic knowledge. However, the model struggles to cluster allophones closely, indicating the boundary between bottom-up and top-down information in phonological learning. This study suggests that fundamental sound knowledge in the early stages of language acquisition can be learned to some extent through unsupervised learning without labeled data or prior knowledge of segmentation, providing valuable insights into early human language acquisition.
Do, Y. (2023). Unsupervised learning of phonemes in early language acquisition: Insights from an autoencoder model. 2023 Linguistics Colloquium, Seoul National University.