JSEALS Special Publication No. 14
Papers from the 34th Meeting of the Southeast Asian Linguistics Society (2025)
Edited by Mark Alves
© 2026 University of Hawai‘i Press
All rights reserved
OPEN ACCESS
Semiannual with periodic special publications
E-ISSN: 1836-6821
DOI: https://doi.org/10.21313/10524/52560
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
JSEALS publishes fully open access content, which means that all articles are available on the internet to all users immediately upon publication. Non-commercial use and distribution in any medium are permitted, provided the author and the journal are properly credited.
The cover photo, adapted from the cover of the JSEALS journal and provided by Paul Sidwell, was taken at the Vietnam Museum of Ethnology in Hanoi.
JSEALS
Journal of the Southeast Asian Linguistics Society

Editor-in-Chief
Mark Alves (Montgomery College, USA)

Managing Editors
Mathias Jenny (Chiang Mai University, Thailand)
Sigrid Lew (Dallas International University, USA)
Paul Sidwell (University of Sydney, Australia)

Editorial Advisory Committee
Edith ALDRIDGE (Academia Sinica, Taiwan)
Nathan BADENOCH (Villanova University, USA)
Luke BRADLEY (University of Freiburg, Germany)
Marc BRUNELLE (University of Ottawa, Canada)
Christopher BUTTON (Independent researcher)
Thomas CONNERS (University of Maryland, USA)
Kamil DEEN (University of Hawai‘i at Mānoa, USA)
Rikker DOCKUM (Binghamton University, State University of New York, USA)
Ryan GEHRMANN (Payap University, Thailand)
Nathan HILL (Trinity College Dublin, Ireland)
San San HNIN TUN (INALCO, France)
Kitima INDRAMBARYA (Kasetsart University, Thailand)
Peter JENKS (UC Berkeley, USA)
Daniel KAUFMAN (Queens College, City University of New York & Endangered Language Alliance, USA)
James KIRBY (Ludwig-Maximilians-Universität München, Germany)
Hsiu-chuan LIAO (National Tsing Hua University, Taiwan)
Bradley MCDONNELL (University of Hawai‘i at Mānoa, USA)
Alexis MICHAUD (CNRS, Le Centre National de la Recherche Scientifique, France)
Marc MIYAKE (Independent researcher, Hawai‘i, USA)
David MORTENSEN (Carnegie Mellon University, USA)
Peter NORQUEST (University of Arizona, USA)
Teresa Wai See ONG (Singapore University of Social Sciences, Singapore)
Christina Joy PAGE (Kwantlen Polytechnic University, Canada)
John D. PHAN (Columbia University, USA)
Trang PHAN (Ca' Foscari University of Venice, Italy)
Pittayawat PITTAYAPORN (Chulalongkorn University, Thailand)
Alexander D. SMITH (Fudan University, China)
Napasri TIMYAM (Kasetsart University, Thailand)
Kenneth VAN BIK (California State University, Fullerton, USA)
Alice VITTRANT (Aix-Marseille Université / CNRS-DDL, France)
Heather WINSKEL (Southern Cross University, Lismore, Australia)

The Journal of the Southeast Asian Linguistics Society publishes articles on a wide range of linguistic topics concerning the languages and language families of Southeast Asia and surrounding areas. JSEALS has been hosted by the UH Press since the beginning of 2017.

CONTENTS
Statement from the JSEALS Editor-in-Chief v

Articles on Syntax
A Brief Report on Adverbial Realization in Amis 1–12
   Yi-Ting CHEN
On the DP Structure of Standard Indonesian: An Exploration of =nya and se- 13–25
   Evelyn Elmer FETTES
Comparative Expressions in Longming Zhuang: A Functional Analysis of Kwaː and Piː 26–38
   Haiping HUANG
The verb ‘see’ in Hlai 39–45
   Hui-chi LEE
Singlish pre-nominal relative clauses and the systemic transfer of Chinese pre-modifiers 46–54
   Wesley Mark LINCOLN & Marlyse BAPTISTA
Serial Verb Constructions in Lio 55–66
   Grace B. WIVELL, Khanin CHAIPHET, Michelle MAYRO, Maria Magdalena RINI, Maria Floriani SERLIN, & Thomas CONWAY

Articles on Phonology
Modern Standard Tai-Ahom (MSTA): Phonology and Orthography of a Posteriori-Conlang based on Old Ahom 67–80
   Madhurjya BURHAGOHAIN
A Sketch Grammar of Black Lolo 81–93
   Elaine KHARBANDA
A preliminary phonemic analysis of Chuyo 94–103
   Mijke MULDER
Phonetic system of Hani language in Lai Chau, Vietnam 104–111
   PHAN Lương Hùng

Articles on Historical Linguistics
Etymological Notes on Words for Textiles and Fabric Production in Mainland Southeast Asia 112–128
   Mark ALVES & Rikker DOCKUM
Spirit numeral systems in Oceanic and (the) beyond 129–153
   Russell BARLOW
Regional Wanderwörter of Formosan Languages in Northwestern Taiwan: A Case Study of Some Plant Names 154–171
   Samuel Yu-hsiang PAN & Walis Hian-chi SONG
The Precedence for a Conservative Apical *r in Proto-Malayic: Analysis of Early Modern Malay's /r/-phoneme from Foreign Sources 172–189
   M. Natsir Fachruddin SURYATAMA
Loanwords in Kavalan and Basay: Language Contact with Formosan and Philippine Languages 190–201
   Li-yang TSENG

Articles on Sociolinguistics
Towards Continuous Community Collaboration in Language Documentation: Insights from Bugkalot/Eg̓ongot 202–207
   John Michael Vincent S. DE PANO & Patricia Anne Y. ASUNCION
The vitality of ethnic languages in multilingual societies in the southern Philippines 208–219
   Atsuko UTSUMI & Nelson DINO
Gender Representation in Vietnamese Internet Memes: A Multimodal Critical Discourse Analysis 220–239
   LƯƠNG Thị Hiện, NGUYỄN Đức Long, TRỊNH Khánh Hiền, & NGUYỄN An Nguyên

FROM THE JSEALS EDITOR-IN-CHIEF
This is the fourteenth JSEALS Special Publication. The goal of JSEALS Special Publications is to share collections of linguistics articles, such as papers from conferences or other academic events, and to offer linguistic researchers in the greater Southeast Asian region a way to publish monograph-length works.
This publication contains papers based on talks given at the 34th Annual Meeting of the Southeast Asian Linguistics Society (SEALS 34), held in Bali, Indonesia, from 11 to 13 June 2025. Eighteen papers from the conference are included, covering a full range of topics: six on syntax, four on phonology, five on historical linguistics, and three on sociolinguistics. The languages covered in this volume are spoken throughout the greater Southeast Asian region, including both Mainland and Insular Southeast Asia (and Taiwan), bordering areas of southern China, and the Indian Subcontinent. The studies cover the five main language families in the region: Austronesian, Austroasiatic, Hmong-Mien, Kra-Dai, and Sino-Tibetan/Trans-Himalayan. We are very pleased that JSEALS is able to support SEALS conferences and to contribute to the sharing of quality linguistic research on Greater Southeast Asia. We look forward to producing such works for SEALS in the future.

Mark J. Alves
May 1st, 2026
Montgomery College
Rockville, Maryland

A BRIEF REPORT ON ADVERBIAL REALIZATION IN AMIS¹
Yi-Ting CHEN
Wenzao Ursuline University of Languages
97067@mail.wzu.edu.tw

Abstract
This study presents how four types of adverbials in Amis (speaker-oriented, modal-aspectual, subject-oriented, and manner) are realized. First, Amis adverbials can be broadly divided into an adverb type and a predicate type. Adverbs, such as aca ‘also’ and alatek ‘probably’, do not take voice, TAM, or imperative markers. Although they can occur in more than one position, their distribution is constrained by the domains in which they are licensed. For instance, alatek must occupy the outermost position, taking scope over temporal adverbs such as ‘today’, whereas aca cannot precede the main predicate. Second, among adverbials realized as predicates, higher adverbials tend to exhibit fewer predicate properties. Speaker-oriented adverbials are mostly frozen in form and incompatible with voice markers, TAM markers, and arguments.
They typically introduce a clause. Modal and aspectual adverbial verbs also have fixed voice marking but allow TAM markers and can introduce arguments or clauses. Subject-oriented adverbials, divided into agent-oriented and mental-attitude types, behave according to their semantics: agent-oriented ones require an agent argument, while mental-attitude ones allow only the ma- voice marker. Finally, manner adverbial predicates are the closest to ordinary Amis predicates, as they accept a range of voice and TAM markers, but they require the presence of a lexical verb to maintain their adverbial reading. This classification highlights structural and functional distinctions among Amis adverbials, revealing a continuum of predicate properties across the classes of adverbial predicates.

Keywords: Amis, Formosan languages, adverbs, adverbial predicates
ISO 639-3 codes: ami

1 Introduction
In many Formosan languages, adverbials are realized as predicates or verbs (e.g., Starosta 1988; Li 2003 for Thao; Liu 2003 for Amis; Hsiao 2004 for Atayal; Chang 2006 for Kavalan; Holmer 2006 for Seediq; Wu 2006 for Paiwan; Li 2007 for Puyuma; Su 2008 for Bunun; Chang 2009 for Tsou) and function as matrix verbs in complex sentences. For Amis, Liu (2003) shows that manner adverbials act as matrix verbs heading a control adjunct, with the lexical verb denoting the modified action, as in (1) and (2). In (2), switching the order of the manner adverbial verb and the lexical verb renders the sentence ungrammatical.

1. hacikay φ-ci aki (a) cikay
   fast.AV NOM-PPN Aki (LNK) run
   ‘Aki runs very fast.’
2. *cikay φ-ci aki (a) hacikay
   run NOM-PPN Aki (LNK) fast.AV

In addition, adverbial verbs impose an actor-voice restriction (AV restriction) on the lexical verb: regardless of the voice marked on the manner adverbial verb, the lexical verb (the modified verb) must appear in the actor voice.
For example, (4) and (6), in which the lexical verb bears undergoer voice, are judged ungrammatical.

3. palifud tu φ-ci aki (a) mi-palu’ ci kacaw-an
   AV-violent ASP NOM-PPN Aki (LNK) AV-beat PPN Kacaw-DAT
   ‘Aki violently hit Kacaw.’
4. *palifud tu n-i aki (a) ma-palu’ φ-ci kacaw
   UV-violent ASP GEN-PPN Aki (LNK) UV-beat NOM-PPN Kacaw
5. palifud tu n-i aki (a) mi-palu’ φ-ci kacaw
   UV-violent ASP GEN-PPN Aki (LNK) AV-beat NOM-PPN Kacaw
   ‘Kacaw was hit violently by Aki.’
6. *pa-palifud tu n-i aki (a) ma-palu’ φ-ci kacaw
   UV-violent ASP GEN-PPN Aki (LNK) UV-beat NOM-PPN Kacaw

Another observation by Liu (2003) is that the manner verb, rather than the lexical verb, assigns case. For instance, in (5), when the manner verb is marked with undergoer voice, nominative case is assigned to the undergoer and genitive case to the agent, even though the lexical verb is in actor voice. This case alignment matches the pattern found with undergoer-voice verbs. Based on these features, Liu (2003) concludes that manner adverbial verbs are not only verbs but main verbs. Liu (2003) also proposes that the clause headed by the lexical verb functions as a control adjunct to the manner adverbial verb. The main reason is that the manner adverbial verb construction resembles the start-type control construction in three respects: the AV restriction on the embedded verb, the occurrence of the linker a, and the inflexibility of word order. This argument does not hold, however. One major problem is that an NP can be extracted from the putative ‘adjunct’ clause; if that clause were a true adjunct, such extraction would violate the Condition on Extraction Domain (Huang 1982).

Another thorough investigation of adverbial verbs in Amis is Jheng (2025), which surveys a wide range of adverbial modifiers, including Mood, Aspect, and Voice.
Consistent with Liu (2003), Jheng treats these adverbial modifiers as verbal heads, specifically restructuring verbs, given that they “inflect for voice morphology, occur in initial position, take TAM affixes, and attract clitic pronouns” (p. 36). Nonetheless, Jheng (2025:36) departs from Liu by proposing that adverbial verb constructions (AVCs hereafter) constitute restructuring constructions rather than control adjuncts. Jheng’s comprehensive analysis (2025) arrives at a uniform treatment of the different classes of adverbial modifiers. However, it has long been recognized that particular types of adverbials are licensed only in particular clausal domains, and it remains unresolved how Amis adverbial modifiers reflect this variation. This study therefore aims to present how four types of adverbials in Amis (speaker-oriented, modal-aspectual, subject-oriented, and manner) are realized and how their realizations contrast with one another.

The paper is organized as follows. Section 2 briefly reviews previous studies of adverbial distribution in general and of AVCs in Austronesian and Formosan languages. Section 3 presents how different types of Amis adverbials are realized, and Section 4 concludes.

2 Previous Literature
2.1 Distribution
Particular classes of adverbials have long been found to be licensed in specific clausal domains (e.g., Jackendoff 1972: Chapters 2 and 3; Pollock 1989:365–375). For instance, speaker-oriented adverbs are not allowed in raising-verb or Exceptional Case-Marking (ECM) constructions, in which CP is missing. Manner adverbs, on the other hand, mark the VP boundary. Adverbs are therefore grouped into classes that parallel the three major clausal domains, as summarized in Table 1.
Table 1: Adverb classes (Ernst 2002:10)
Clausal domains: [CP SPEECH ACT [IP PROPOSITION [VP EVENT [VP EVENT-INTERNAL]]]]
Jackendoff (1972): Speaker-oriented | Subject-oriented | Manner
Quirk et al. (1972): Conjunct | Disjunct | Process-adjunct
McConnell-Ginet (1982): Ad-S | Ad-VP | Ad-V
Frey & Pittner (1999): Frame | Proposition | Event | Process
Examples: obviously, frankly, probably, willingly, happily, necessarily, still, always, often, almost, completely

There are several accounts of adverbial distribution in the current literature. One approach treats adverbials as functional categories, linearly situated in the specifier positions of hierarchically ordered functional projections. This detailed order is taken to be universal and rigid; when an adverbial occurs in more than one position, this results from movement (of heads) or from an additional projection licensing the same adverbial. Representative works include Laenzlinger (1998), Alexiadou (1997), and Cinque (1999), among others. The other approach rejects rigidly ordered projections as a condition on adjunct licensing. Under this view, adjunction is comparatively free, constrained by the lexicosemantic properties of the adjuncts themselves and by general compositional rules. The syntactic principles in play are those applicable to phrase structure in general, and there appears to be no adjunct-specific syntactic principle. According to Ernst (2002:11), Directionality Principles and extended projection features first set up the positions where adjuncts are allowed in a given language; the lexical semantics of an adjunct, the nature of the compositional rule system applying to it, and its morphological weight then determine its possible positions. Representative works include Rochette (1990), Haider (2004), and Ernst (2002), among others.
A review of the proposals and adverb classes discussed above suggests three principal analyses of the syntactic location of adverbs: an XP adjoined to the modified XP, a specifier of the modified XP, or a specifier in a dedicated functional projection. In studies of adverbial verbs in Formosan languages, each of these analyses has its supporters.

2.2 Adverbials in Austronesian and Formosan languages
One line of research on Austronesian adverbials examines their linear placement and how it aligns with general syntactic theories such as Cinque (1999). Rackowski and Travis (2000) show that in Malagasy and Massam, preverbal adverbials occur in the order predicted by Cinque (1999), while postverbal adverbials appear in the exact mirror image of Cinque’s hierarchy. They argue that the reversed order in Malagasy arises from “iterative VP movement through their specifier” within an antisymmetric SVO structure (Rackowski & Travis 2000:121), providing support for Kayne’s (1994) antisymmetry. Similarly, Holmer (2006) seeks to reconcile Cinque’s (1999) hierarchy, Kayne’s (1994) Linear Correspondence Axiom, and Seediq data, though he adopts a different perspective on the nature of adverbial particles. While leaning toward Cinque’s universal hierarchy, Holmer (2006) treats adverbials as functional heads rather than specifiers. He further proposes two distinct types of heads, X and Y: X-heads undergo head movement, whereas Y-heads trigger XP movement. Accordingly, if adverbials are X-heads in the left periphery, their order reflects Cinque’s hierarchy. If, on the other hand, they are Y-heads, they induce leftward XP movement into their specifier positions, and after successive XP movements the Y-heads surface in the reverse order of Cinque’s hierarchy. Another line of research on Austronesian adverbials focuses on their grammatical status.
A central claim in this line of research is that Formosan adverbials should be analyzed as verbs or predicates. Starosta (1988) suggested nearly four decades ago that adverbials in Formosan languages are likely verbal in nature. Building on this view, Tang (2001) and Chang (2006) classify adverbial verbs, particularly manner adverbials, as predicates or lexical verbs. Their arguments rest on several points. First, unlike typical functional heads such as negation, manner adverbial verbs can take voice markers. Second, they behave like lexical elements in being compatible with the morphological causative prefix, which is characteristically lexical. Moreover, in many Formosan languages, manner adverbial verbs display dual functionality: they may serve as the main predicate of a clause, taking another lexical verb as their complement, or appear as the sole predicate directly selecting an NP. Nevertheless, equating adverbial verbs with lexical verbs is far from unproblematic. The central issue lies in the diversity of adverb types and of the languages examined, which makes adverbial verbs a heterogeneous group within the Austronesian family. As Chang (2010) observes, not all adverbial verbs can function as the main predicate of a clause; frequency verbs, for instance, cannot. Furthermore, most adverbial verbs are limited in their inflectional possibilities, typically occurring only in AF or NAF forms, whereas ordinary lexical verbs exhibit a broader range of inflectional variation (Chang 2010). Another line of analysis treats adverbial verbs as parallel to light verbs, given their “hybrid nature” (Chang 2010:206). Like light verbs more generally, adverbial verbs in Formosan languages are semantically dependent on the lexical verb they co-occur with (Butt & Geuder 2001, cited in Chang 2010).
Yet, as Chang (2010) further points out, the way AVCs are realized is highly language-specific, varying even among closely related Formosan languages. Building on the treatment of adverbials as verbs, analyses that relate AVCs to SVCs or other complement clause types are also widespread in work on Formosan languages. Evidence shows that in some languages AVCs and SVCs are structurally related, while in others they are syntactically distinct. For example, Chang (2010) argues that Amis, along with Paiwan and Mayrinax Atayal, should not be considered a serializing language because of the presence of a linker. By contrast, Chang (2006) identifies four types of AVCs in Kavalan, two of which qualify as SVCs. Yeh and Huang (2009) further contend that AVCs are in fact SVCs in Kavalan, Saisiyat, Squliq Atayal, and Tsou, with adverbial verbs linearized in a relatively fixed order consistent with Cinque’s universal hierarchy. Similarly, Wu (2006) shows that in Kanakanavu, manner adverbials behave like lexical verbs and can be serialized with other lexical verbs, subject to the AF restriction.

Although no consensus has been reached, the complexity of adverbials continues to motivate research on their licensing, distribution, and clausal relationship with other lexical verbs. This study explores the following classes of Amis adverbial modifiers, long considered to be licensed in separate clausal domains, concentrating on their argument-taking capacity, their interaction with voice and TAM markers, and their distribution within the clause.

Higher adverbs (speaker-oriented):
   (i) speech act: frankly (Frankly, I disagree with your idea.)
   (ii) evaluative: fortunately (Fortunately, it’s Aki who was elected.)
   (iii) epistemic: probably (Probably, it will rain today.)
Subject-oriented:
   (i) agent-oriented: intentionally (Aki intentionally beat Panay.)
   (ii) mental-attitude: sadly (He sadly left the village.)
Manner: e.g., violently (Aki violently beat Panay.)
Aspect-oriented:
   (i) repetitive: again (He went to Taipei again.)
   (ii) prospective: almost (He almost finished the homework.)
   (iii) frequency: once, always, often (Aki often beats Panay.)
Emphatic: really (He really likes Panay.)
Focusing: only, also (He only likes Panay. He also likes Panay.)

3 Results
3.1 Two types of adverbial modifiers
Two main arguments are put forward. First, consistent with earlier findings, the majority of Amis adverbial modifiers are realized as verbs or predicates, while a small subset patterns more like English-type adverbs:
   (i) Adverbs: heca ‘also’, alatek ‘probably’
   (ii) Verbs/predicates: most speaker-oriented adverbials, subject-oriented adverbials, lower adverbials, and manner adverbials
Second, among adverbial modifiers realized as verbs, the individual classes diverge in their use of voice and TAM morphology and in how they take arguments.

Amis adverbial modifiers that are equivalent to English adverbs do not take voice, TAM, or imperative markers; forms such as *mi-alatek, *alatek-ay, and *alatek-en are unattested. They also do not impose the AV restriction on the lexical verb, nor do they permit a linker between themselves and the verb. Moreover, they are not confined to the preverbal slot but may occur in various positions within the clause: as illustrated in (7), (8), (10), and (11), heca and alatek can appear in multiple sentence positions. Interestingly, these two adverbs seem restricted to the licensing domains predicted by Ernst (2002). For instance, (7), (8), and (9) show that ‘probably’ must occur at the outermost edge of the sentence, taking scope over the temporal adverb ‘today.’ By contrast, heca ‘also’ cannot precede the main verb or cross the Mod-AspP boundary (see (12)). Within Mod-AspP, however, the positions of ‘also’ and ‘probably’ are relatively flexible (see (13)). Based on their properties and distribution, this study suggests that alatek and heca are integrated into the structure through adjunction.

7. alatek (*a) ma-‘urad anini
   probably (*LNK) NEUT-rain today
   ‘It will probably rain today.’
8. ma-‘urad anini alatek
   NEUT-rain today probably
   ‘It will probably rain today.’
9. *ma-‘urad alatek anini
   NEUT-rain probably today
10. ma-emin tu heca ma-curah k-u-ra luma’
    NEUT-completely ASP also UV-burn NOM-CM-that house
    ‘That house was also completely burned out.’
11. ma-emin tu ma-curah k-u-ra luma’ heca
    NEUT-completely ASP UV-burn NOM-CM-that house also
    ‘That house was also completely burned out.’
12. *heca ma-emin tu ma-curah k-u-ra luma’
    also NEUT-completely ASP UV-burn NOM-CM-that house
13. ma-fana’ (alatek heca/heca alatek) φ-ci aki
    NEUT-know (probably also/also probably) NOM-PPN Aki
    ‘Aki also probably knows (this).’

3.2 Different classes of adverbial verbs
With the exception of a small set of adverbials, such as probably and also, most adverbial modifiers in Amis are realized as verbs or predicates. In Amis, verbs may take voice markers, TAM markers,² applicatives, imperatives, and arguments. The following sections describe how these properties manifest in adverbial verbs.

3.2.1 Argument taking
Different types of adverbial modifiers behave differently with respect to argument taking. For example, speaker-oriented modifiers, such as the evaluative fortunately, select a proposition that surfaces with only one argument (see (14)). Aspect-oriented modifiers, such as almost and often, can preserve their adverbial interpretation even without an accompanying lexical verb, provided that the preceding discourse supplies the relevant context (see (16) and (18)). Notably, in both categories the sole argument, whether it surfaces within a proposition or directly, must bear nominative case for the adverbial interpretation to be preserved. If the sole argument is assigned genitive case, the adverbial interpretation is lost (see (17)).
Within the subject-oriented class, a mental-attitude verb such as sadly functions as an adjectival predicate when it takes a single argument (see (22)), and this argument must bear nominative case (see the contrast between (22) and (23)). By contrast, an agent-oriented adverbial verb with only one argument may assign either nominative or genitive case, as exemplified in (20) and (21), respectively. Manner adverbial verbs cannot select an argument in the absence of another lexical verb, as the ungrammaticality of (24) shows. One possible explanation is that manner verbs, although functioning as predicates, are inherently dependent on modifying another verb; (24) is thus ungrammatical because there is no independent event for the manner modifier to modify.

14. Atay hani φ-ci aki
    fortunately NOM-PPN Aki
    ‘Fortunately, it’s Aki.’
15. *atay hani n-i Aki
    fortunately GEN-PPN Aki
16. ma-ngata kaku
    AV-close 1st.SG.NOM
    ‘I am almost (hit).’
17. ma-ngata n-i aki
    UV-close GEN-PPN Aki
    ‘Aki was approached (by someone).’
18. pa-rarid φ-ci aki
    CAU-often NOM-PPN Aki
    ‘Aki often (does this).’
19. *rarid-en n-i Aki
    often-UV GEN-PPN Aki
20. patedeng saan φ-ci Aki
    intentionally SAAN NOM-PPN Aki
    ‘Aki did (it) intentionally.’
21. patedeng han n-i Aki
    intentionally HAN GEN-PPN Aki
    ‘Something was done by Aki intentionally.’
22. ma-rarum φ-ci aki
    NEUT-sad NOM-PPN Aki
    ‘Aki is sad.’
23. *ma-rarum n-i aki
    UV-sad GEN-PPN Aki
24. *mi-palifud³ φ-ci aki
    AV-violent NOM-PPN Aki

3.2.2 TAM taking
Verbs in Amis can carry TAM markers, such as the factual marker -ay, irrealis Ca- reduplication (Wu 2006), and the past tense marker na-. All classes can take TAM markers except speaker-oriented adverbs, particularly those not derived from other lexical roots (see (25)).

25. *na-ka-tengil-an⁴ φ-ci panay
    PST-KA-hear-UV NOM-PPN Panay
26. pa-rarid-ay tu φ-ci aki (a) mi-palu’ ci panay-an
    CAU-often-FAC ASP NOM-PPN Aki (LNK) AV-beat PPN Panay-DAT
    ‘Aki often beat Panay.’
27. ma-ma-rarum φ-ci aki (a) mi-laliw t-u niyaru
    RED-NEUT-sad NOM-PPN Aki (LNK) AV-leave DAT-CM village
    ‘Aki will sadly leave the tribe.’
28. ma-mi-palifud φ-ci aki mi-palu’ ci panay-an
    RED-AV-violent NOM-PPN Aki AV-beat PPN Panay-DAT
    ‘Aki is going to beat Panay violently.’

3.2.3 Voice marking, Imperative, and Applicative
Amis verbs are obligatorily marked with a voice marker, overtly or covertly. According to Wu (2006), the Amis voice system is dichotomous, consisting of actor and undergoer voices, while markers traditionally described as locative, goal, or instrument “voices” are better analyzed as applicatives. The least marked voice affixes are mi-, ma-, and -en. Other markers, such as the infix -um-, ma-ka-, and the covert form, are lexically restricted to specific verbs and are therefore relatively rare. As in many other Formosan languages, the choice of voice in Amis determines the case alignment of the clause.

Different classes of adverbial verbs exhibit different patterns of voice marking. Speaker-oriented adverbial predicates/verbs are actor-voiced by default,⁵ and many of them are not overtly marked with any voice marker. Many mental-attitude adverbial verbs take only the voice marker ma-, corresponding to what Wu (2007) identifies as the fourth type of ma-, which typically encodes a transient or plain state. Given this stative nature, they cannot take voice markers other than ma-. With the exception of speaker-oriented and mental-attitude adverbial verbs, the voice system of adverbial verbs is generally dichotomous, even when they lack clearly distinct voice markers. The ability of an adverbial verb to bear specific voice markers depends on whether it is derived from an existing lexical item: derived adverbial verbs tend to be more restricted in the range of voice markers they can take.
For instance, ma-ngata ‘almost’, derived from ngata ‘close’, occurs only with the voice marker ma-, but it can nevertheless function in either actor or undergoer voice, assigning nominative or genitive case to the actor when accompanied by a lexical verb (see (29) and (30)). In contrast, ma-rarid (or pa-rarid) ‘often’, which is not derived from another form, allows a wider set of markers, including the voice marker -en, the imperative marker (see (31)), and even applicatives such as sa- (see (32)).

29. Ma-ngata φ-ci aki (a) mi-laheci t-u-na tatiliden
    AV-close NOM-PPN Aki (LNK) AV-finish DAT-CM-that homework
    ‘Aki almost finished that homework.’
30. ma-ngata n-i aki (a) mi-laheci k-u-na tatiliden
    UV-close GEN-PPN Aki (LNK) AV-finish NOM-CM-that homework
    ‘Aki almost finished that homework.’
31. rarid-en (a) kaen k-u futing
    often-UV (LNK) eat.AV NOM-CM fish
    ‘Eat fish often.’
32. sa-ka-rarid ningra (a) mi-palu’ ci panay-an k-u-ni ‘arad
    APPL-KA-often 3rd.SG.GEN (LNK) AV-beat PPN Panay-DAT NOM-CM-this stick
    ‘Aki often beats Panay with this stick.’

Amis is a predicate-initial language. Previous studies (e.g., Liu 2003; Jheng 2025) report that when another lexical verb is present, all adverbial verbs must precede it; reversing their relative order leads to ungrammaticality. This holds for adverbial verbs of all classes, as illustrated in (33)–(37).

33. *Mi-li’ayw-ay kita (a) mi-laliw atay hani
    AV-before-FAC 2nd.PL.NOM (LNK) AV-leave fortunately
34. *ma-mi-laheci φ-ci aki t-u tatiliden ma-ngata
    RED-AV-finish NOM-PPN Aki DAT-CM homework AV-close
35. *mi-laliw-ay φ-ci aki t-u niyaru ma-rarum
    AV-leave-FAC NOM-PPN Aki DAT-CM tribe NEUT-sad
36. *mi-cumud φ-ci aki t-u-na luma’ takaw saan
    AV-enter NOM-PPN Aki DAT-CM-that house steal SAAN
37. *ma-palu’ n-i aki φ-ci aki mi-palifut
    UV-beat GEN-PPN Aki NOM-PPN Aki AV-violent

Table 2 summarizes the derivation, voice selection, TAM affixation, and argument-taking capacity of the different classes of adverbial verbs in Amis.

Table 2: Elicited adverbials in Amis
Class | Example | Derivation | Voice marker | Voice (with lexical verb) | TAM | Argument taking | Placement
Speaker-oriented: speech-act | Su’elin-ay ‘frankly’ (lit. frankly speaking) | Su’elin ‘real’ | N/A | Default AV | N/A | Taking a clause | Preverbal
Speaker-oriented: evaluative | Atay hani ‘fortunately’ | N/A | N/A | Default AV | N/A | Taking a clause | Preverbal
Modal-aspectual: deontic | Ma-nga’ay | Nga’ay ‘well’ | Ma- (fixed) | AV/UV | Yes | Yes; adverbial reading recoverable | Preverbal
Modal-aspectual: ability | Ma-edeng | Edeng ‘capacity’ | Ma- (fixed) | AV/UV | Yes | Yes; adverbial reading recoverable | Preverbal
Modal-aspectual: aspectual | Ma-ngata ‘almost’ | Ngata ‘close’ | Ma- (fixed) | AV/UV | Yes | Yes; adverbial reading recoverable | Preverbal
Modal-aspectual: frequency | Ma-rarid ‘often’; pa-rarid | | Ma-, -en | AV/UV | Yes | Yes; adverbial reading recoverable | Preverbal
Modal-aspectual: additive | Ma-liyaw, mi-liyaw, liyaw-en | | Ma-, mi-, -en | AV/UV | Yes | Yes; adverbial reading recoverable | Preverbal
Subject-oriented: agent-oriented | Ma-padeteng, mi-padeteng, padeteng saan, padeteng han ‘intentionally’ | | mi-, ma-, -en, but saan/han preferred | AV/UV | Yes | Yes; adverbial reading recoverable | Preverbal
Manner | Ma-palifut, mi-palifut, palifut-en ‘violently’ | | Ma-, mi-, -en | AV/UV | Yes | N/A without another lexical verb | Preverbal

4 Conclusion
After examining various classes of adverbials through tests involving argument taking, TAM markers, voice markers, and sentence position, two main conclusions can be drawn. First, Amis adverbials can be divided into two broad categories: adverbs and predicates. Adverbs, such as aca ‘also’ and alatek ‘probably’, do not allow voice, TAM, or imperative markers.
While they can appear relatively freely in a clause, their distribution is restricted by the domains in which they are licensed. For instance, alatek must occur in the outermost position, taking scope over the temporal adverb ‘today,’ whereas aca cannot precede the main predicate. Second, there is a tendency for higher adverbials to be less flexible in taking voice markers, TAM markers, and arguments. At the top of the hierarchy, speaker-oriented adverbials are fixed in form, and most of them are incompatible with voice and TAM markers; they typically select a clause. Modal and aspectual adverbials, while also limited in their voice marking, permit TAM markers. Subject-oriented adverbials, subdivided into agent-oriented and mental-attitude types, behave differently according to their semantic nature: mental-attitude adverbials are compatible only with the ma- voice marker, whereas agent-oriented ones can freely alternate between actor and undergoer voice. Finally, manner adverbial predicates most closely resemble typical Amis predicates, as they can take a wider range of voice and TAM markers. Overall, while most adverbial modifiers in Amis are realized as predicates or verbs, variation remains, and it likely results from the inherent nature of their classes: the extent to which they resemble canonical predicates falls along a continuum across the different classes.
References
Alexiadou, Artemis. 1997. Adverb placement. Amsterdam: John Benjamins Publishing Company.
Butt, Miriam & Wilhelm Geuder. 2001. On the (semi) lexical status of light verbs. In Norbert Corver and Henk van Riemsdijk, eds., On semi-lexical categories: The function of content words and the content of function words, pp. 323–370. Berlin: Mouton de Gruyter.
Chang, Henry Yung-Li. 2006. The guest playing host: Adverbial modifiers as matrix verbs in Kavalan. In Hans-Martin Gärtner, Paul Law, and Joachim Sabel, eds., Clause structure and adjuncts in Austronesian languages, pp. 43–82. Berlin: Mouton de Gruyter.
Chang, Henry Yung-Li. 2009. Adverbial verbs and adverbial compounds in Tsou: A syntactic analysis. Oceanic Linguistics 48:439–476.
Chang, Henry Yung-Li. 2010. On the syntax of Formosan adverbial verb constructions. In Raphael Mercado, Eric Potsdam, and Lisa deMena Travis, eds., Austronesian and theoretical linguistics, pp. 183–212. Amsterdam: John Benjamins Publishing Company.
Cinque, Guglielmo. 1999. Adverbs and functional heads: A cross-linguistic perspective. New York: Oxford University Press.
Ernst, Thomas. 2002. The syntax of adjuncts. Cambridge: Cambridge University Press.
Haider, Hubert. 2004. Pre- and post-verbal adverbials in OV and VO. Lingua 114:779–807.
Holmer, Arthur. 2006. Seediq: Adverbial heads in a Formosan language. In Hans-Martin Gärtner, Paul Law, and Joachim Sabel, eds., Clause structure and adjuncts in Austronesian languages, pp. 83–123. Berlin: Mouton de Gruyter.
Hsiao, Yi-Ling. 2004. Adverbials in Squliq Atayal. Master’s thesis. Hsin-Chu, Taiwan: National Tsing-Hua University.
Huang, James Cheng-Teh. 1982. Logical relations in Chinese and the theory of grammar. Ph.D. dissertation. Cambridge, MA: MIT.
Jackendoff, Ray. 1972. Semantic interpretation in generative grammar. Cambridge: MIT Press.
Jheng, Sam Wei-Cherng. 2025. The syntactic categories of adverbials and the structural integration of complement clauses in Siwkolan Amis. Language and Linguistics. DOI: https://doi.org/10.1075/lali.00240.jhe
Kayne, Richard. 1994. The antisymmetry of syntax. Cambridge: MIT Press.
Laenzlinger, Christopher. 1998. Comparative Studies in Word Order Variation: Adverbs, Pronouns, and Clause Structure in Romance and Germanic. Amsterdam: John Benjamins Publishing Company.
Li, Chao-Lin. 2007. Adverbial verbs and argument attraction in Puyuma. Nanzan Linguistics, Special issue 3:165–201.
Li, Paul Jen-Kuei. 2003. Verbs or adverbs in Thao? Paper presented at the Second Workshop on Formosan Languages, Academia Sinica, November 1–2.
Liu, En-Hsin. 2003. Conjunction and modification in Amis. Master’s thesis. Hsin-Chu: National Tsing-Hua University.
Pollock, Jean-Yves. 1989. Verb movement, universal grammar, and the structure of IP. Linguistic Inquiry 20:365–424.
Rackowski, Andrea & Lisa Travis. 2000. V-initial languages: X or XP movement and adverbial placement. In Andrew Carnie and Eithne Guilfoyle, eds., The syntax of verb-initial languages, pp. 117–141. Oxford: OUP.
Rochette, Annette. 1990. The selectional properties of adverbs. In Karen Deaton, Manuela Noske, and Michael Ziolkowski, eds., CLS 26, pp. 379–391. Chicago: Linguistics Department, University of Chicago.
Starosta, Stanley. 1988. A grammatical typology of Formosan languages. Bulletin of the Institute of History and Philology 59:541–576. Taipei: Academia Sinica.
Su, Yi-Fan. 2008. Adverbials in Takituduh Bunun. Master’s thesis. Hsin-Chu: National Tsing-Hua University.
Tang, Chih-Chen. 2001. Functional categories and adverbial expressions: A case study of Paiwan and Tsou. NSC project report.
Wu, Cumming. 2006. Adverbials in Paiwan. Paper presented at the 10th International Conference on Austronesian Linguistics, January 17–20, 2006, Palawan, Philippines.
Yeh, Yu-Ting & Shuanfan Huang. 2009. A study of triple verb serialization in four Formosan languages. Oceanic Linguistics 48:78–110.
ON THE DP STRUCTURE OF STANDARD INDONESIAN: AN EXPLORATION OF =NYA AND SE-
Evelyn Elmer FETTES
Cornell University
eef55@cornell.edu
Abstract
In Standard Indonesian, the official national language of Indonesia, the clitic =nya may function as an anaphoric definite determiner, while the prefix se- may function as a marker of specificity. se- must prefix to a classifier that precedes a noun phrase, while =nya cliticizes to a noun phrase.
=nya cannot cliticize to a noun phrase preceded by se- + a classifier without yielding a partitive interpretation, as in (1) below.
1. *Seorang gurunya pintar.
a.CL teacher=NYA clever
Intended: ‘The (certain) teacher is clever.’
(Can mean ‘A certain one of the teachers is clever.’)
I argue that the unacceptability of the nonpartitive reading of this sentence arises from the fact that se- and =nya compete for the same position, the head of a DP projection dominating a coalesced Num/ClP.
Keywords: syntax, coalescence, determiners, dialects
ISO 639-3 codes: ind, jav
1 Introduction & Data
1.1 Language Overview
Note that §§1–3 discuss only Standard Indonesian (SI). SI (bahasa Indonesia baku) is the national language of Indonesia and is used in the domains of academia and government. SI is, in some sense, unnatural, in that it is acquired later in life via schooling and is rarely used in daily life. This contrasts with the many varieties of Colloquial Indonesian (CI), or bahasa sehari-hari, which can be conceptualized as a blend of SI and bahasa daerah / tanah, or local/indigenous languages. A typical example of a CI is (mesolectal) Jakarta Indonesian (JI), which exists on a continuum between SI and Betawi, the local Malay isolect of Jakarta. Indonesians occasionally refer to CIs as bahasa gaul (trendy and/or vulgar language, slang). However, just as local languages vary from place to place, so do CIs. For example, the CI used in Yogyakarta is heavily influenced by Javanese. CIs are discussed further in §4.
1.2 =nya as a definite determiner
Schwarz (2009) and Moroney (2021), among others, note that definite determiners may be used in a multitude of contexts, notably contexts of uniqueness and anaphoricity. Languages differ in the number of definite determiners they have and in their functions.
For example, German has two forms of the definite determiner, one used in uniqueness contexts and one used in anaphoric contexts, while English has just one form, which is used in both uniqueness and anaphoric contexts.1 In SI, bare noun phrases may be interpreted as definite or indefinite depending on the context. Bare noun phrases are used in uniqueness contexts; that is, uniqueness is not necessarily morphologically marked in SI.
2. Presiden Amerika Serikat menatap gerhana matahari.
president USA stare eclipse sun
‘The President of the USA stared into a/the solar eclipse.’
However, SI does have an anaphoric definite determiner clitic =nya that is used to refer to a topic already under discussion, as in (3) and (4).2
3. Saya membeli buku dan permainan video. Buku*(nya) tebal.
I buy book and game video book=NYA thick
‘I bought a book and a video game. The book is thick.’
4. Presiden*(nya) bodoh.
president=NYA stupid
‘The president is stupid.’ (Context: discussing the US president staring into the eclipse)
=nya is also used in “bridging” contexts, such as making a part-whole or producer-product connection, as in (5) and (6).
5. Saya naik sepeda ke kerja. Dudukan*(nya) kurang nyaman.
I ride bike to work seat=NYA less comfortable
‘I ride a bike to work. The/Its seat is uncomfortable.’
6. Saya membeli buku hari ini. Penulis*(nya) sombong.
I buy book day this writer=NYA arrogant
‘I bought a book today. The/Its writer is pretentious.’
However, these examples are ambiguous, as one of =nya’s other functions is to act as a third person possessive marker.3 In SI, possession can be marked by a clitic following a noun phrase (7) or by a lexical possessor following a noun phrase (8), but not both (9).
7. buku tebalnya
book thick=NYA
‘her(/the) thick book’
8. buku tebal Jolanda
book thick Jolanda
‘Jolanda’s thick book’
9. *buku tebalnya Jolanda
book thick=NYA Jolanda
Intended: ‘Jolanda’s thick book’4
Notably, =nya functioning as an anaphoric determiner may cooccur with a lexical possessor (10), although =nya marking possession and =nya marking anaphoricity cannot occur on the same noun phrase (11).
10. Saya dan John masing-masing membeli satu buku. Saya membaca kedua bukunya. Bukunya John lebih menarik dari buku saya.5
I and John each buy one book I read both book=NYA book=NYA John more interesting than book I
‘John and I each bought a book. I read both books. John’s book was more interesting than my book.’
11. *bukunyanya
book=NYA=NYA
Intended: ‘the book of his/hers/theirs’
1.3 se- as an indefinite determiner
SI has classifiers that are typically considered “optional”, as they do not obligatorily occur between a numeral and a noun phrase (12). However, when classifiers are present, they must cooccur with a numeral/quantifier (13). The three most common classifiers in Indonesian are orang (for people), ekor (for animals), and buah (for inanimate referents). These classifiers are homophonous with the lexical nouns orang ‘person’, ekor ‘tail’, and buah ‘fruit’. There are other, less common classifiers, such as batang for rod-like objects and tangkai for stems/stalks.6
12. Tiga/Banyak (orang) anak perempuan memasuki ruang kelas.
three/many (CL) child female enter room class
‘Three/Many girls enter the/a classroom.’
13. *Orang anak perempuan memasuki ruang kelas.
CL child female enter room class
Intended: ‘Girl(s) enter the/a classroom.’
Classifiers may cooccur with either the prefix se- or the numeral satu ‘one’. While these two morphemes are very similar in meaning, they occur in different environments: se- is a prefix that attaches to classifiers or units of measurement, and, unlike satu, it may not directly precede a common noun (14–15).
14a. satu (ekor) keledai
one (CL) donkey
‘one donkey’
14b. seekor keledai
a.CL donkey
‘a donkey’
14c. *sekeledai
a.donkey
Intended: ‘a donkey’
15a. satu (gelas) kopi
one (glass) coffee
‘one (glass of) coffee’
15b. segelas kopi
a.glass coffee
‘a glass of coffee’
15c. *sekopi
a.coffee
Intended: ‘a coffee’
Se- is often used discursively to introduce new referents (16).
16. Seorang anak laki-laki dan (seorang) anak perempuan memasuki kelas. Anak perempuannya duduk di baris pertama.
a.CL child male and (a.CL) child female enter class child female=NYA sit in row first
‘A boy and a girl enter the class. The girl sits in the first row.’
There are cases where se- and satu appear to be interchangeable, such as (17), in which a consultant was describing a school where one teacher is known to be a harsh grader. However, in intensional contexts, se- and satu are interpreted very differently: in (18a), se- was used to express the desire to meet a particular movie star, while in (18b), satu was used to express the desire to meet any movie star. This indicates that se- behaves as an indefinite article with a narrower interpretation than English ‘a(n)’: instead of indicating any one member of a collection of objects, se- indicates a specific one.7
17. Satu guru / Seorang guru di sana sulit memberikan nilai.
one teacher / a.CL teacher there hard give grade
‘A teacher there grades harshly.’
18a. Ibu Jolanda pergi ke LA untuk melihat seorang bintang film.
Ibu Jolanda go to LA for see a.CL star movie
‘Ibu Jolanda went to LA to see a movie star.’ (specific)
18b. Ibu Jolanda pergi ke LA untuk melihat satu bintang film.
Ibu Jolanda go to LA for see one star movie
‘Ibu Jolanda went to LA to see a movie star.’ (any)
1.4 Unacceptable cooccurrence of se- and =nya
As shown in (1), repeated below, se- + a classifier may never appear before a noun phrase with =nya to refer to a specific, previously discussed referent, although such phrases may be interpreted as referring to a specific member of a previously discussed group, i.e., as a partitive.
1. *Seorang gurunya pintar.
a.CL teacher=NYA clever
Intended: ‘The (certain) teacher is clever.’
(Can mean ‘A certain one of the teachers is clever.’)
I argue in the following section that this ungrammaticality stems from the fact that both =nya and se- are base-generated as Head, DP.
2 Analysis
2.1 DP spine
Figure 1 is a simple representation of the DP spine in SI.
Figure 1: DP spine in Standard Indonesian
DP represents a Determiner Phrase, NumP a Number Phrase, ClP a Classifier Phrase, and NP a Noun Phrase (whose internal structure I will not discuss). DPs may host a [+spec] feature associated with specificity. NumPs host a singular/plural feature, and numerals are optionally generated in the Spec, NumP position. ClPs host a [+unit] or atomic feature; following Chung 2000 and Winarto 2016, I assume that this feature must always be present syntactically even if no lexical item is externally merged to spell it out. Classifiers also have an uninterpretable number feature, which must be checked via coalescence (see §2.2). Overt classifiers agree with the NP in [φ] features. Head, DP may be filled by either se- or =nya to spell out the [+spec] feature. Se- agrees with the [+sg] feature of NumP; =nya does not agree with any features of lower projections. Se- has an uninterpretable unit feature that must be checked by head-to-head movement of the classifier in the head of a coalesced Num/ClP projection. =nya has an EPP/edge feature that triggers phrasal movement of the NP to Spec, DP. Sample derivations are given in §2.3.
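The division of labor in §2.1 (a single Head, DP slot, se-’s uninterpretable unit feature, =nya’s EPP/edge feature), together with the failed derivations discussed in §2.4, can be summarized as a toy acceptability checker. This is a minimal Python sketch, not the author’s formalism: it assumes that a nominal can be reduced to what sits in Head, DP, Spec, Num/ClP, and Head, Num/ClP, it models only the nonpartitive reading of (1), and the names NominalPhrase, violations, and is_licit are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NominalPhrase:
    """Hypothetical toy encoding of an SI nominal (not the author's formalism)."""
    d_heads: Tuple[str, ...] = ()     # items merged as Head, DP: "se-" and/or "=nya"
    numeral: Optional[str] = None     # numeral in Spec, Num/ClP, e.g. "satu", "tiga"
    classifier: Optional[str] = None  # overt classifier in Head, Num/ClP, e.g. "orang"


def violations(phrase: NominalPhrase) -> List[str]:
    """Return the (toy) constraints that the phrase violates; empty means licit."""
    problems: List[str] = []
    # se- and =nya both spell out [+spec] in the single Head, DP slot (cf. (1)).
    if len(phrase.d_heads) > 1:
        problems.append("two items compete for Head, DP")
    # se- has an uninterpretable unit feature, checked only by a raised classifier (cf. (14c)).
    if "se-" in phrase.d_heads and phrase.classifier is None:
        problems.append("se- has no classifier to check its unit feature")
    # An overt classifier leaves [+sg/pl] unpronounced, so a numeral or se- must supply it (cf. (13)).
    if phrase.classifier is not None and phrase.numeral is None and "se-" not in phrase.d_heads:
        problems.append("classifier without a numeral or se-")
    # =nya's EPP/edge feature needs the bare NP in Spec, DP; overt Num/ClP material
    # either blocks NP movement or forces antilocal Num/ClP movement (cf. (22), (23)).
    if "=nya" in phrase.d_heads and (phrase.numeral is not None or phrase.classifier is not None):
        problems.append("=nya with overt Num/ClP material")
    return problems


def is_licit(phrase: NominalPhrase) -> bool:
    return not violations(phrase)


if __name__ == "__main__":
    examples = {
        "(19a) satu orang guru pintar": NominalPhrase(numeral="satu", classifier="orang"),
        "(20a) seorang guru pintar": NominalPhrase(d_heads=("se-",), classifier="orang"),
        "(21a) guru pintarnya": NominalPhrase(d_heads=("=nya",)),
        "(1) *seorang gurunya pintar": NominalPhrase(d_heads=("se-", "=nya"), classifier="orang"),
        "(13) *orang anak perempuan": NominalPhrase(classifier="orang"),
        "(14c) *sekeledai": NominalPhrase(d_heads=("se-",)),
        "(22) *tiga orang guru pintarnya": NominalPhrase(d_heads=("=nya",), numeral="tiga", classifier="orang"),
    }
    for label, phrase in examples.items():
        print(f"{label}: {violations(phrase) or 'OK'}")
```

Encoding each constraint as a separate check keeps them independent, so a string like (1) reports both the competition for Head, DP and the clash between =nya and overt Num/ClP material; which of these the analysis treats as primary is a matter for the prose, not the sketch.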
2.2 Coalescence
Hsu (2021) posits that coalescence is a process by which head-adjacent projections can be bundled together, enabling multiple features to be introduced on one head: ‘α and β are head-adjacent if and only if a) α and β are minimal projections (i.e., heads), b) α asymmetrically c-commands β, and c) there is no node κ that asymmetrically c-commands β and is asymmetrically c-commanded by α …[thus]… Coalescence creates a single node that contains all features associated with the individual heads’ (Hsu 2021:53). This is schematized in Figure 2.
Figure 2: Coalescence (Hsu 2021:53)
According to the spine proposed in §2.1, the ClP necessarily lacks a specifier and therefore may always be bundled with the adjacent NumP projection. Hsu (2021) also posits that category features introduced by heads may be dominant (D) or recessive (R). Recessive features are those that must occur on a bundled head, while dominant features do not need to be bundled and are potential hosts for recessive features. Only a head with a dominant feature can license a phrasal specifier. In this case, the [+unit] feature of the ClP is recessive and therefore may be hosted by the dominant [+sg/pl] feature of NumP. This is schematized in Figure 3.
Figure 3: Coalesced NumP and ClP
Under Hsu’s theory of coalescence, the recessive feature (in this case [+unit]) does not need to be spelled out by an externally merged lexical item, hence the optionality of classifiers. However, if a classifier is externally merged, the [+sg/pl] feature is not spelled out. I assume that this necessitates the external merge of a numeral specifier, or of a quantifier/determiner in a higher projection that agrees with the number feature, such as se-. See the sample derivations in §2.3 and the explanation of failed derivations in §2.4.
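To make the three clauses of head adjacency and the dominant/recessive asymmetry concrete, the following toy Python sketch bundles a spine of projections. It assumes a strictly binary head-complement spine listed top-down, treats a filled specifier of the lower projection as the only possible intervening node κ, and has each recessive head bundle with the head immediately above it; the names Projection, head_adjacent, coalesce, and bundle_spine are hypothetical, and this is not Hsu’s (2021) own formalization.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Projection:
    """Toy encoding of one projection on a head-complement spine (hypothetical names)."""
    label: str                # e.g. "Num", "Cl"
    features: FrozenSet[str]  # features introduced by the head, e.g. frozenset({"+sg"})
    dominant: bool            # True: dominant category feature; False: recessive
    has_spec: bool = False    # True if the projection has a filled specifier

    def __post_init__(self) -> None:
        # Only a head with a dominant feature can license a phrasal specifier.
        if self.has_spec and not self.dominant:
            raise ValueError(f"recessive {self.label} cannot license a specifier")


def head_adjacent(upper: Projection, lower: Projection) -> bool:
    """Assumes `lower` is the complement of `upper`, so clauses a) and b) hold.
    A filled Spec of `lower` would be a node kappa asymmetrically c-commanded by
    upper's head that asymmetrically c-commands lower's head, violating clause c)."""
    return not lower.has_spec


def coalesce(upper: Projection, lower: Projection) -> Projection:
    """Bundle two head-adjacent heads into a single node carrying all of their features."""
    if not head_adjacent(upper, lower):
        raise ValueError(f"{upper.label} and {lower.label} are not head-adjacent")
    if not (upper.dominant or lower.dominant):
        raise ValueError("no dominant feature to host the bundle")
    return Projection(
        label=f"{upper.label}/{lower.label}",
        features=upper.features | lower.features,
        dominant=True,
        has_spec=upper.has_spec,  # lower has no Spec (head adjacency), so keep upper's
    )


def bundle_spine(spine: List[Projection]) -> List[Projection]:
    """Walk the spine top-down; each recessive projection coalesces with the one above it."""
    result: List[Projection] = []
    for proj in spine:
        if proj.dominant:
            result.append(proj)
        elif result:
            result[-1] = coalesce(result[-1], proj)
        else:
            raise ValueError(f"recessive {proj.label} has nothing above it to bundle with")
    return result


if __name__ == "__main__":
    spine = [
        Projection("D", frozenset({"+spec"}), dominant=True),
        Projection("Num", frozenset({"+sg"}), dominant=True, has_spec=True),  # numeral in Spec
        Projection("Cl", frozenset({"+unit"}), dominant=False),
        Projection("N", frozenset({"N"}), dominant=True),
    ]
    for proj in bundle_spine(spine):
        print(proj.label, sorted(proj.features), "spec" if proj.has_spec else "no spec")
```

Run on the D > Num > Cl > N spine of §2.1 with a numeral in Spec, NumP, the sketch returns D, a single Num/Cl node carrying both [+sg] and [+unit] (and keeping the numeral specifier), and N, mirroring Figure 3. Because D is dominant here, it is left unbundled, in line with the agnosticism about D-level coalescence expressed in §2.3 and §3.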
2.3 Derivations
Before presenting a derivation with se-, let us consider one with satu as a basis for contrast. As per Figure 3, nothing need be merged as the head of Num/ClP, since the [+unit] feature is recessive and therefore does not have to be spelled out. However, if a classifier is merged, a numeral phrase must be merged as a specifier. I am agnostic on whether a DP necessarily base-generates above the NumP, as noun phrases with numerals can be interpreted as definite or indefinite depending on context. If a DP is present above the Num/ClP projection in (19b), it either has no probes that motivate the movement of lower heads/phrases or is further coalesced with a QP and/or the Num/ClP; see §3 for a brief discussion.
19a. satu (orang) guru pintar
one (CL) teacher clever
‘one clever teacher’
19b. [tree diagram]
In contrast to satu, se- base-generates as Head, DP. Se- agrees with the [+sg] feature of the NumP and has a [+spec] feature and an uninterpretable unit feature. The head of the Num/ClP projection undergoes head-to-head movement to adjoin to the D head se-, checking the uninterpretable unit feature.
20a. seorang guru pintar
a.CL teacher clever
‘a (certain) clever teacher’
20b. [tree diagram]
Lastly, =nya also base-generates as Head, DP. It has a [+spec] feature and an EPP/edge feature. The closest phrase available for movement to check the EPP/edge feature is the NP.
21a. guru pintarnya
teacher clever=NYA
‘the (previously mentioned) clever teacher(s)’
21b. [tree diagram]
There is an interesting, seemingly epiphenomenal, effect: φ-features such as category/animacy are present in the DP both in (20), since the classifier agrees with the NP in φ-features, and in (21), since the NP itself moves to DP. These φ-features are therefore visible to higher projections and subsequent operations.
2.4 A Brief Discussion of Failed Derivations
It is now clear why (1) fails as a derivation meaning ‘the one clever teacher’: se- and =nya cannot both occupy the same position, Head, DP.
I have not yet investigated theories of partitive structure and therefore cannot present a tree accurately representing the acceptable partitive interpretation of (1); however, see §3 for thoughts on QP structure in general. A more salient question is why the following word orders are unacceptable:
22. *tiga (orang) guru pintarnya
three (CL) teacher clever=NYA
Intended: ‘the three clever teachers’
23. *guru pintarnya tiga (orang)
teacher clever=NYA three (CL)
Intended: ‘the three clever teachers’
(Can mean ‘three of the clever teachers’)
=nya must cliticize to a maximal projection/lexical phrase that occupies Spec, DP, motivated by the EPP/edge feature. However, antilocality rules out moving the Num/ClP projection, making (22) impossible, as schematized in (24):
24. [tree diagram]
Less clear is why (23) is unacceptable. One possibility is that phrasal movement is “blocked” by the intervening externally merged lexical items tiga (orang). Another possibility is that the EPP/edge feature works in conjunction with a φ-feature probe: since classifiers agree with the noun phrase in φ-features, this probe would find the Num/ClP before the NP, but the Num/ClP cannot undergo phrasal movement due to antilocality, as shown in (24), and the derivation therefore crashes. Se- is forbidden from attaching to an N(P), as shown in (14c) and (15c). I attributed this to an uninterpretable unit feature, but it could equally be ruled out by the Head Movement Constraint: Head, DP is not head-adjacent to NP, so the head of NP cannot move to D.
2.5 Interim Conclusion
DPs exist in SI. Head, DP can be occupied by se- or =nya, but not by both simultaneously. Coalescence bundles the NumP and ClP projections. Se- as Head, DP agrees in number with Num/ClP and has an uninterpretable unit feature that necessitates the presence of a classifier.
=nya as Head, DP has an EPP/edge feature that triggers phrasal movement of the NP to Spec, DP. Antilocality restrictions and the need to check uninterpretable features rule out the majority of unacceptable word orders.
3 What’s up with QP and Reduplication?
At ISMIL28, prior to the SEALS34 meeting, I noted that classifiers in SI may never reduplicate or cooccur with a reduplicated noun, as in (25).
25. *Saya melihat ekor~ekor keledai / ekor keledai~keledai.
I see CL~CL donkey / CL donkey~donkey
Intended: ‘I see donkeys.’
I proposed that this was a syntactic phenomenon emerging from competition between classifiers and a reduplicative morpheme RED to head a coalesced Quantifier/Number/Classifier Phrase. The head of Q/Num/ClP may host the recessive feature [+unit] and the dominant features [+sg/pl] and [+quant]. Either a classifier or RED may be externally merged as Head, Q/Num/ClP. As discussed in §2.2, if a classifier is merged to spell out the [+unit] feature, a numeral or quantifier must be externally merged in Spec, Q/Num/ClP to spell out the [+quant] or [+sg/pl] feature. The spine is schematized in Figure 4 below. Note that this structure is similar to that proposed by Sato (2009), schematized in Figure 5. The salient difference between my analysis and Sato’s, apart from labels, is that in my analysis numerals occupy Spec, Q(/Num/ClP) rather than the head. In this I follow one aspect of Winarto’s (2016) analysis, using a numeral specifier to “block” phrasal movement of the NP (as in (23)), and I adopt a core tenet of Hsu’s (2021) theory, namely that projections with dominant category features have specifiers.
Figure 4: Coalesced QP, NumP, and ClP
Figure 5: QP spine for SI (Sato 2009:199)
A flaw in the analysis in Figure 4 is that in §2.2 I hypothesized that the Number projection hosts a dominant category feature while the Classifier projection hosts a recessive category feature, thus allowing the ClP to be bundled with NumP.
However, if the Number category feature is dominant, it does not need to be bundled with QP. I have a few thoughts regarding this dilemma. One possibility is that QP must agree with NumP and that this somehow enables bundling with a QP head hosting a recessive feature. A second possibility is that two grammars exist simultaneously in “Indonesian”:8 in the first, NumP has a dominant feature and ClP a recessive feature, as previously described; in the second, ClP is similar to a MeasureP in that it is not obligatorily present. In the latter case, NumP would likely have a recessive category feature and be able to be bundled with a QP with a dominant feature. The second theory is perhaps the more likely one, given the tendency of CIs to lack classifiers altogether (see §4).
A related flaw is that I was, and continue to be, unclear about the nature of RED in terms of what features it spells out. Authors disagree on whether RED encodes simple plurality or has distributive properties. In other words, it is not clear whether RED is associated with a [+unit], a [+pl], or a [+quant] feature, or some combination thereof. The answer likely lies in exploring a fascinating phenomenon in Indonesian: in an NP comprising a noun and an adjective, either the noun or the adjective may reduplicate, as in (26).
26a. guru~guru pintar
teacher~teacher clever
‘clever teachers’
26b. guru pintar~pintar
teacher clever~clever
‘clever teachers’
It is possible that in (26b) the NP raises around the projection containing RED, while in (26a) the NP remains in situ. The mechanics of this are as yet unknown, as they depend on resolving outstanding questions regarding the relationship between QP and DP.
Sato states: ‘It is possible that contemporary Indonesian has the DP on top of QP, given MacDonald’s (1976: 85) observation that itu “that” and the enclitic pronoun –nya “his, her, its, their”, when combined with nouns, tend to make them definite, thereby “coming to fulfill a function very much like that of the definite article “the” in English”.’ (Sato 2009:99)
If the structure of Indonesian is [DP [QP [Num(/Cl)P [NP]]]], there are still multiple possibilities for coalesced structures. For instance, does DP host a dominant feature and QP a recessive feature that would allow D/QP bundling? Can a DP take a NumP directly ([DP [Num(/Cl)P [NP]]]), or must there be an intervening QP level? What is going on with partitive structures? Are they QPs that take DP complements? This section must end on the classic tropey line: more research is required.
4 Microvariation in Varieties of Colloquial Indonesian
As noted in §1.1, the preceding sections discuss only Standard Indonesian (SI). Colloquial Indonesians (CIs) and local languages show a much wider range of possible word combinations in nominal phrases, perhaps due to the lack of classifiers in many isolects. For example, several authors have noted that mesolectal Jakarta Indonesian (JI) frequently omits, or potentially lacks, classifiers. This suggests that in CIs the [+unit] feature has become lexicalized / is introduced in the NP or in a higher projection, rather than in a ClP projection, supporting the idea presented in §3 that multiple grammars of Indonesian can exist simultaneously depending on whether a speaker is using the formal or an informal register. These competing/complementary grammars are possibly responsible for variation in speaker judgments as to whether it is acceptable in “Indonesian” to use numerals with =nya to express a definite number of objects. Occasionally, scholars even disagree with themselves.
For example, Winarto (2016) claims that numerals may never cooccur with =nya, yet Little & Winarto (2018) claim that phrases such as (27) below are grammatical. However, both articles agree that classifiers and =nya are incompatible, leading to their conclusion that it is the classifier itself that functions (perhaps in conjunction with se-) as an indefinite article,9 as in (28).
27. ?lima mangganya
five mango=NYA
‘the five mangoes’
28. sebuah mangga(*=nya)
a.CL mango
‘a mango’
In SI, RED may never cooccur with a numeral or a quantifier. However, in JI it is grammatical to use both banyak ‘many’ and segala ‘all’ with reduplicated nouns.10 A few speakers I have interviewed informally find other quantifiers and numerals marginally acceptable with a reduplicated noun. However, in a 2024 corpus study I found no instances in the Jakarta Indonesian corpus compiled by Gil and Tadmor (2015) of numerals or quantifiers other than banyak/segala occurring with a reduplicated noun; there were 11 instances of banyak and 3 instances of segala preceding a reduplicated noun (Fettes 2024). Interestingly, this study also found several instances of banyak following a reduplicated noun phrase in JI. In SI, quantifiers must precede a noun phrase. However, quantifier order seems more fluid in CIs. For example, in the variety spoken in Yogyakarta, quantifiers may precede or follow the noun phrase. This is perhaps due to Javanese influence, since quantifier placement seems equally fluid in Javanese.11
29a. Colloquial Indonesian (Yogyakarta):
banyak orang / orang banyak
many person / person many
‘many people’
29b. Standard Javanese (Ngoko):
wong akeh / akeh wong
person many / many person
‘many people’
Another characteristic of the CI spoken in Yogyakarta is that =nya may cooccur with a lexical possessor. As mentioned in §1.2, in SI, bare noun phrases are followed by either a clitic or a lexical possessor to indicate possession.
However, in CIs, noun phrases take the clitic =nya before being followed by a third person possessor to indicate possession. Again, this is likely due to the influence of Javanese, in which the cognate clitic =(n)e12 is mandatory.
30a. Standard Indonesian:
rumah / buku John
house / book John
‘John’s house/book’
30b. Colloquial Indonesian (Yogyakarta):
rumahnya / bukunya John
house=NYA / book=NYA John
‘John’s house/book’
30c. Standard Javanese (Ngoko):
omah*(e) / buku*(ne) John
house=(N)E / book=(N)E John
‘John’s house/book’
Sneddon (2006:36) notes that =nya functions as “an optional ligature or linker before possessive nouns” in Jakarta Indonesian (JI) as well, also attributing this function to Javanese influence. It can occur not only before third person possessors but also before first and second person possessors, as in (31). Beyond that, it can occur with proper names for “emphasis”, as in (32).
31. Abu-abu, kayak-nya handphone-nya aku.
grey like-nya mobile.phone-nya my
‘It’s grey, like my mobile phone.’ (Sneddon 2006:36)
32. Selamat malam. Vennynya ada?
[good evening] V-nya be
‘Good evening. Is Venny in?’ (Sneddon 2006:38)
Sneddon (2006) also notes that =nya can act as a “definitiser or identifier” and gives examples of =nya acting anaphorically in JI in ways analogous to its behavior in SI. However, it is less clear whether =nya has gained additional characteristics of definite determiners. Little & Winarto (2018) claim that =nya is optional in uniqueness contexts, as in (33).
33. Matahari(-nya) panas sekali hari ini.
sun(-DEF) hot very day this
‘The sun is very hot today.’ (Little & Winarto 2018:201)
In contrast, Kaufman, Martohardjono, & Dayal (forthcoming) claim that =nya is required with an adjectival predicate, as in (33, 34), but infelicitous with a verbal predicate, as in (35), and that ‘N-nya denotes the N-now (or at the time of evaluation) while the bare N denotes the N-generally’.
34. Matahari#(-nya) terang hari ini.
sun-NYA bright day this
‘The sun is bright/shining today.’
35. Matahari(#-nya) ber-sinar.
sun(-NYA) AV-shine
‘The sun shines.’ (Kaufman, Martohardjono, & Dayal, forthcoming)
It seems possible that competing grammars are responsible for the “optionality” of =nya with matahari in (33, 34), with speakers producing either the SI structure, without =nya, or the CI structure, in which =nya functions as a ligature / coindexes matahari with hari ini. In summary, CIs show microvariation in the acceptable cooccurrence of numerals/quantifiers with reduplication, more variable word order with respect to quantifiers, and a polysemous =nya with more functions than it has in SI. The influence of Javanese on the CIs spoken in both Yogyakarta and Jakarta raises the question of how CIs in regions with different local languages use determiners. This question will (hopefully!) be explored in my forthcoming dissertation.
5 Conclusion
The impetus for this paper, and for the SEALS presentation on which it is based, was the observation that in Standard Indonesian =nya cannot cliticize to a noun phrase preceded by se- + a classifier without yielding a partitive interpretation. I attribute this fact to se- and =nya both being externally merged as Head, DP, since =nya functions as an anaphoric definite determiner and se- functions as a type of specific indefinite determiner, as per the data given in Section 1. Section 2 presents an analysis of DP structure in SI inspired by Hsu (2021), Winarto (2016), and Sato (2009). Section 3 outlines additional issues that must be addressed in any extension of the analysis presented in Section 2. Section 4 gives an overview of some interesting points of microvariation in Colloquial Indonesians compared to Standard Indonesian.
Acknowledgements
I am indebted to my primary Standard Indonesian consultant, Ibu Jolanda Pandin. I am equally indebted to my Javanese and Indonesian instructors at Wisma Bahasa, Yogyakarta, both for allowing me to elicit data and for their saintlike patience with my seemingly erratic focus in language classes. I am also grateful to my dissertation committee (John Whitman, Sarah Murray, and Abby Cohn), as well as to the SEALS program and the Einaudi Center at Cornell University, for supporting my winding fieldwork journey.

References
Arkoh, Ruby, and Lisa Matthewson. 2013. A Familiar Definite Article in Akan. Lingua 123 (January): 1–30. https://doi.org/10.1016/j.lingua.2012.09.012
Blust, R. A. 2009. The Austronesian Languages. Pacific Linguistics 602. Canberra: Pacific Linguistics, Research School of Pacific and Asian Studies, Australian National University.
Chung, Sandra. 2000. On reference to kinds in Indonesian. Natural Language Semantics 8: 157–171.
Fettes, Evelyn Elmer. 2024. The functions of full nominal reduplication in Jakarta Indonesian: A corpus-based examination. Proceedings of the Linguistic Society of America 9(1): 5695.
Gil, David, and Uri Tadmor. 2015. Jakarta Indonesian. A joint project of the Department of Linguistics, Max Planck Institute for Evolutionary Anthropology and the Center for Language and Culture Studies, Atma Jaya Catholic University. https://hdl.handle.net/1839/00-0000-0000-0022-5AC9-0
Hsu, Brian. 2021. Coalescence: A Unification of Bundling Operations in Syntax. Linguistic Inquiry 52(1): 39–87. https://doi.org/10.1162/ling_a_00372
Kaufman, Daniel, Gita Martohardjono, and Veneeta Dayal. To appear. Chapter 7: (In)definiteness in Indonesian: A Case Study. In Veneeta Dayal (ed.), The Open Handbook of (In)definiteness: A Hitchhiker’s Guide to Interpreting Bare Arguments. Open Handbooks in Linguistics. MIT Press.
Little, Carol-Rose, and Ekarina Winarto. 2019. Classifiers and the definite article in Indonesian. Proceedings of the 49th North East Linguistics Society, vol. 2: 209–220.
MacDonald & Soenjono. 1967. A Student’s Reference Guide of Modern Formal Indonesian. Georgetown University Press.
Moroney, Mary. 2021. Updating the Typology of Definiteness: Evidence from Bare Nouns in Shan. Glossa: A Journal of General Linguistics 6(1). https://doi.org/10.5334/gjgl.1221
Robson, Stuart. 1992. Javanese Grammar for Students. Centre of Southeast Asian Studies, Monash Asia Institute, Monash University.
Sato, Y. 2009. Radical underspecification, general number and nominal mapping in Indonesian. AFLA16: The Proceedings of the 16th Meeting of the Austronesian Formal Linguistics Association.
Schwarz, Florian. 2009. Two Types of Definites in Natural Language. PhD dissertation, University of Massachusetts Amherst.
Sneddon, James N. 2006. Colloquial Jakartan Indonesian. Canberra: Pacific Linguistics, Research School of Pacific and Asian Studies, The Australian National University.
Sneddon, J. N., K. A. Adelaar, D. N. Djenar, and M. Ewing. 2010. Indonesian: A Comprehensive Grammar. 2nd edition. Routledge Comprehensive Grammars. Taylor & Francis.
Winarto, Ekarina. 2016. The Indonesian DP. AFLA22: The Proceedings of the 22nd Meeting of the Austronesian Formal Linguistics Association. http://hdl.handle.net/1885/101155

COMPARATIVE EXPRESSIONS IN LONGMING ZHUANG: A FUNCTIONAL ANALYSIS OF KWAː AND PIː
Haiping HUANG
Keio University
chiwann2011@hotmail.co.jp

Abstract
This paper presents a descriptive analysis of comparative constructions in Longming Zhuang, a little-documented Southern Zhuang (Central Tai) dialect. Based on original fieldwork, the study identifies two primary strategies for expressing comparison. The first employs lexical or adverbial means. The second uses two distinct syntactic markers, kwaː³ and piː⁵. The analysis focuses on these marked constructions and reveals a significant functional and syntactic divergence between the two morphemes. kwaː³ serves as the default marker for objective superiority. piː⁵, by contrast, is pragmatically specialized to convey subjective, often negative evaluations such as blame or complaint. This functional divergence is mirrored in the markers' distinct syntactic behaviors, including word order and the placement of negation. The evidence suggests that kwaː³ functions as a particle. piː⁵ exhibits a split syntactic status: it acts as a preposition-like head in superiority contexts but as a particle in equality and inferiority contexts. Crucially, the analysis shows that piː⁵, a marker typically associated with Northern Zhuang, is used productively in this southern dialect. This finding calls for further investigation of similar usage in other southern varieties. It also suggests that the degree of Chinese influence on Zhuang comparative constructions may vary regionally.
Keywords: Zhuang language, comparative constructions, syntactic markers, functional specialization
ISO 639-3 codes: zzj, zyb, cmn

1 Introduction
1.1 Purpose and Background
The expression of comparison is a universal semantic domain, yet its linguistic realization is highly diverse across languages. Much research has focused on dedicated grammatical markers. A comprehensive understanding, however, requires viewing these markers as components of a broader system of “comparative strategies.” Adopting such a systemic approach, this paper analyzes the comparative constructions of Longming Zhuang.

Longming Zhuang is a Southern Zhuang dialect belonging to the Central Tai branch (Li 1977) of the Tai-Kadai language family. Zhuang dialects have been the subject of increasing research. Even so, no comprehensive grammatical description of Longming Zhuang has yet been published. Previous studies by the author (Huang 2018–2024) have examined its phonology, its tone system, and certain grammatical features such as serial verb constructions and negation. Comparative constructions, however, have not yet been investigated, which leaves a significant gap in the literature.

The primary purpose of this paper is to address this gap. Drawing on field data collected by the author, the study provides the first detailed analysis of comparative expressions in Longming Zhuang. Specifically, it maps the full inventory of comparative strategies available in the dialect. It then investigates the functional division of labor between the dialect's two primary syntactic options. In doing so, it contributes both to a deeper understanding of Zhuang syntax and to the typological study of functional variation within comparative systems.
1.2 The Zhuang Language and Its Dialects
The Zhuang language is spoken by over 18 million people, primarily in the Guangxi Zhuang Autonomous Region of southern China. It is characterized by considerable dialectal diversity. It is conventionally divided into Northern and Southern dialect groups, which differ significantly in phonology and lexicon (Wang et al. 1984).

Longming Zhuang, the focus of this study, belongs to the Southern group. It is a variety of the Zuojiang dialect spoken in and around Longming Town, Tiandeng County, Guangxi, near the Vietnamese border (see Figure 1). The dialect is situated within a linguistically dense Zhuang-speaking area. Tiandeng County has a total population of approximately 450,000, of whom about 98% are ethnic Zhuang (People’s Government of Tiandeng County 2025a). The immediate research site, Longming Town, has a population of 24,375 (People’s Government of Tiandeng County 2025b). Official ethnic statistics for Longming Town are unavailable. Given the overwhelming demographic homogeneity of the county, however, it is reasonable to assume that the town has a similar concentration of Zhuang speakers. The dialect's geographical position has facilitated historical contact with other Tai varieties and also with Southwestern Mandarin. This contact has led to distinct phonological and lexical features in the dialect.

Figure 1: Map of Tiandeng County, with the location of Longming Town indicated

1.3 Key Linguistic Features of Longming Zhuang
Longming Zhuang exhibits several typological features characteristic of the Tai-Kadai language family. It is a tonal, analytic language with canonical SVO (Subject-Verb-Object) word order. Grammatical relations are determined primarily by word order rather than by morphological inflection, as the language lacks marking for case, gender, or number.
Grammatical aspect (e.g., perfective) is indicated through particles. Tense, by contrast, is not grammatically marked and is typically inferred from context or from temporal adverbs. A quintessential feature of Longming Zhuang, as of all Tai languages, is its tone system, in which pitch contours distinguish lexical meaning. The language is also topic-prominent, a common feature of many East and Southeast Asian languages. Discourse topics such as time, place, or participants can therefore be fronted to sentence-initial position for emphasis. They can also be omitted entirely if they are understood from the preceding context.

Other notable syntactic and morphological features appear in the nominal and verbal domains. The language has a rich system of noun classifiers, which are required when a noun is quantified. It also employs post-nominal modification, in which adjectives follow the nouns they describe. Furthermore, Longming Zhuang makes extensive use of serial verb constructions. In these, multiple verbs appear in sequence within a single clause, without overt conjunctions, to denote a complex, multi-component event.

Example (1) illustrates post-nominal modification: the adjective ɗeːŋ² ‘red’ follows the noun θɯː⁵ ‘clothes’.

(1) θɯː⁵ ɗeːŋ²
    clothes red
    ‘red clothes’

Example (2) is a clear instance of a serial verb construction. Five verbs are concatenated: pɤj¹ ‘go’, θɤɰ⁵ ‘buy’, maː² ‘come’, cʰeːw⁵ ‘cook’, and hɤɰ⁵ ‘give’. Together they express a sequence of related actions performed by a single subject. This structure is a highly productive and characteristic feature of the language's grammar.

(2) teː¹ pɤj¹ ɰaːŋ² θɤɰ⁵ pjaː¹ maː² cʰeːw⁵ hɤɰ⁵ noːŋ⁵-ɓaːw³.
    he go market buy fish come cook give younger.brother
    ‘He went to the market, bought some fish, came back, and cooked it for his younger brother.’

1.4 Previous Studies on Comparative Expressions in Zhuang
The syntax of comparative expressions in Zhuang has been the subject of several dialect-specific studies. These studies reveal both common structural patterns and considerable lexical variation across dialect regions. They nonetheless converge on a canonical pattern for comparatives of superiority, which comprises four core functional components:
• the Comparee: the entity being evaluated;
• the Parameter: the quality or action under comparison;
• the comparative marker (CMK): a dedicated marker;
• the Standard: the benchmark for the comparison.
Syntactically, the Comparee and Standard are typically realized as noun phrases (NPs). The Parameter is expressed as an adjective phrase (AP) or a verb phrase (VP). The following schema captures the relationship between these functional roles and their syntactic realizations:

NP (Comparee) + (AdvP) + {AP / VP} (Parameter) + CMK + NP (Standard)

The syntactic frame is relatively stable. The lexical item used as the comparative marker, however, varies considerably, often correlating with the primary Northern-Southern dialectal division. As summarized in Table 1, previous research has documented a range of markers.¹ In Southern Zhuang dialects, for instance, markers such as kwaː and ka have been identified (Lu 2011; Yan 2018). Studies on Northern dialects, in contrast, have documented a wider array of markers, including kwaː, jiŋ, pi, poːi, and waːi (Liu 2020; Wei et al. 2011; Wei 2014).

Table 1: Previous Studies on Comparative Expressions in Zhuang Dialects

Study             Dialect           Comparative markers   Dialectal area
Lu 2011           Daxin Zhuang      kwaː, ɳaːŋ…kwaː       Southern Zhuang
Yan 2018          Zuozhou Zhuang    ka                    Southern Zhuang
Liu 2020          Lingyun Zhuang    kwaː, jiŋ, pi         Northern Zhuang
Wei et al. 2011   Yanqi Zhuang      kwaː, poːi            Northern Zhuang
Wei 2014          Xia’ao Zhuang     waːi, pi              Northern Zhuang

The markers pi and poːi, in particular, are often considered to be borrowings from Chinese. They have been documented predominantly in Northern Zhuang dialects, which have historically had more contact with Mandarin. These studies provide a foundation for the topic, but they also reveal a significant gap in the literature. Specifically, the presence and function of pi in Southern dialects remain underexplored. Neither its full range of semantic and pragmatic functions nor its contrast with indigenous markers such as kwaː has been systematically analyzed. This study addresses this gap through a detailed investigation of the comparative markers in Longming Zhuang, a Southern dialect.

1.5 Research Objectives and Methodology
Building on the unresolved issues identified in the previous section, this study has two objectives. The first is to provide a systematic description of comparative constructions in Longming Zhuang. The second is to analyze the functional and structural distinctions between its two primary markers, kwaː³ and piː⁵.

The analysis is based on a data corpus collected through fieldwork with six native speakers of Longming Zhuang (three female and three male, born 1941–1986). The speakers exhibited no significant internal dialectal variation. Two methods were combined:
• collecting spontaneous utterances from semi-structured, topic-based conversations;
• targeted elicitation designed to test specific grammatical constraints.
Both were considered essential. Spontaneous speech confirms the naturalistic use of these constructions. Elicited data yield crucial evidence for grammatical properties that rarely surface in conversation. Together, these two complementary data sources form the empirical basis for the synchronic description presented in this study.
2 Comparative Structures in Longming Zhuang
Longming Zhuang employs two primary strategies to express comparison. The first relies on lexical or adverbial means. Here the comparative meaning is inferred from the juxtaposition of clauses or from the inherent semantics of a specific verb or adverb, without a dedicated comparative morpheme. The second, more structurally explicit strategy uses dedicated syntactic constructions built around one of two comparative markers: kwaː³ and piː⁵. This section gives an overview of both strategies. Because of the complex functional division between the two markers, however, the paper focuses mainly on the syntactic constructions.

2.1 Lexical and Context-Dependent Comparatives
Besides the constructions built around kwaː³ and piː⁵, Longming Zhuang uses several other strategies to express comparison. These do not employ a general comparative marker. Instead, they rely on pragmatically inferred context, dedicated adverbs, or specific lexical items with an inherent comparative or equative meaning.

First, a comparison can be made against a standard that is not explicitly stated. The standard is instead inferred from the situational context shared by the speakers. In such cases, the degree of superiority is often reinforced by an adverbial modifier such as nok² (‘more’), as in (3), or laːj¹ (‘more’), as in (4).

(3) ʔan² taŋ³ ʔan² neː⁴ leː⁴ han¹ nok² maɰ³.
    ᴄʟ chair ᴄʟ this see appear more new
    ‘This chair looks newer (than the other chairs).’
(4) ruːŋ² θɯː⁵ ruːŋ² neː⁴ peːŋ² laːj¹ cɤː³.
    ᴄʟ clothes ᴄʟ this expensive more ꜱꜰᴘ
    ‘This piece of clothing is more expensive (than the other items).’

Second, superlatives establish an entity as possessing a quality to the highest degree within a given set. They are formed with the dedicated adverb cɤj⁵ ‘most’, as in (5).

(5) pʰiŋ² cɤj⁵ ɗiːp³ kɤn²-keː³ lɤw⁴.
    Ping most devoted parent ꜱꜰᴘ
    ‘Ping is the most devoted (to her parents among her siblings).’

Finally, certain common comparative notions, particularly age and equality, are lexicalized. They are expressed through specific verbs or dedicated compounds rather than through a general-purpose comparative structure. Relative age, for instance, is expressed via specific predicates, as in (6). Equality employs dedicated lexical items, as shown in (7) and (8).

(6) teː¹ hat⁴ noːŋ⁵/pɤj⁴ ŋoː⁴ (θoːŋ¹ pɤj¹).
    she do younger/older I (two years)
    ‘She is younger/older than me (by two years).’
(7) ŋoː⁴ caj² teː¹ kuːŋ⁴ pɤj¹.
    I with she same age
    ‘I am the same age as her.’
(8) ŋoː⁴ caj² teː¹ θuŋ¹ piː²-kaŋ¹.
    I with she tall same
    ‘I am as tall as her.’

As these examples illustrate, comparison in Longming Zhuang is not limited to the two primary syntactic markers. These lexical and context-dependent strategies are relatively straightforward, however. The remainder of this paper therefore focuses on the more complex functional division between the dedicated syntactic markers kwaː³ and piː⁵.

2.2 Syntactic Comparative Constructions
The more common and grammatically intricate way of expressing comparison in Longming Zhuang uses a dedicated syntactic frame containing a specific comparative marker (CMK). The author's fieldwork identified two distinct markers for this purpose: kwaː³ and piː⁵. These two markers are not interchangeable. As the subsequent analysis demonstrates, they show a clear functional specialization. kwaː³ is the default, general-purpose marker for objective superiority. In superiority comparisons, piː⁵ is pragmatically restricted to contexts conveying a subjective, often negative, evaluation. The following sections analyze the distinct syntactic and semantic-pragmatic properties of each marker.
2.2.1 The kwaː³-type Comparative Construction
The marker kwaː³ is the canonical means of expressing superiority in Longming Zhuang. Its semantic function is narrowly constrained. It is used exclusively for superiority comparisons (‘A is more X than B’) and cannot express equality or inferiority. Syntactically, kwaː³ is a particle. It follows the predicative element (the Parameter) and precedes the nominal representing the Standard of comparison. Affirmative clauses with kwaː³ have the following canonical structure:

NP (Comparee) + (AdvP) + {AP / VP} (Parameter) + CMK + NP (Standard) + (Measure Phrase)

Example (9) shows the prototypical instantiation of this schema, in which the Parameter is a simple adjective phrase (AP).

(9) maːk³ kaːm¹ peːŋ² kwaː³ kjuːj⁵.
    fruit orange expensive ᴄᴍᴋ banana
    ‘The orange is more expensive than the banana.’

The Parameter slot is not limited to adjectives. It can also be filled by a more complex verb phrase (VP), often one describing a capacity or performance. In (10), for example, the VP kin¹ law⁵ kʰeː⁵ ‘be strong at drinking’ is the basis for comparison.

(10) ŋoː⁴ kin¹ law⁵ kʰeː⁵ kwaː³ teː¹.
     I drink alcohol strong ᴄᴍᴋ he
     ‘I have a greater capacity for alcohol than he does.’

The schema also accommodates an optional Measure Phrase (MP), which quantifies the degree of difference between the Comparee and the Standard. This phrase typically follows the Standard NP, as shown in (11).

(11) noːŋ⁵-ɓaːw³ θuŋ¹ kwaː³ ceː¹ haː⁵ liː²-miː¹.
     younger.brother tall ᴄᴍᴋ older.sister five centimeter
     ‘My younger brother is taller than my older sister by five centimeters.’

Finally, the core arguments of the construction, namely the Comparee and the Standard, can be elided when they are recoverable from the discourse context. In such elliptical contexts, kwaː³ may appear in clause-final or near-final position, modifying the predicate directly. The short answer in (12) is an example.
(12) A: Is this dish tastier than that one?
     B: ɗaj²-kin¹ kwaː³.
        delicious ᴄᴍᴋ
        ‘(This one) is more delicious.’

2.2.2 The piː⁵-type Comparative Construction
In sharp contrast to the functionally specialized kwaː³, piː⁵ is considerably more versatile: it can express superiority, equality, and inferiority. Crucially, this functional versatility is mirrored by a structural bifurcation, since the position of piː⁵ relative to the predicate is not fixed. As the following analysis shows, piː⁵ precedes the predicate in superiority constructions but follows it in equality and inferiority constructions. This syntactic difference suggests that piː⁵ may not be a single grammatical morpheme. It may instead be two homophonous markers with distinct syntactic functions: a preposition-like element for superiority and a particle for equality and inferiority.

2.2.2.1 Superiority Comparisons: piː⁵ as a Pre-predicate Element
When used to express superiority, piː⁵ is a pre-predicate element, structurally akin to a preposition. It forms a constituent with the Standard of comparison, and this [piː⁵ + NP (Standard)] phrase precedes the predicative element (the Parameter). The schema for this construction is as follows:

NP (Comparee) + (AdvP) + [CMK + NP (Standard)] + {AP / VP} (Parameter) + (MP)

(13) cʰiŋ²-ciː² niː⁴ piː⁵ moː⁵-kɤn² rɤː³ laːj¹ lɤw⁴.
     grades you ᴄᴍᴋ other people bad much ꜱꜰᴘ
     ‘Your grades are much worse than those of others.’
(14) teː¹ piː⁵ ŋoː⁴ maː² θwaj¹ θoːŋ¹ ʔan² coːŋ¹-tʰaw².
     he ᴄᴍᴋ I come late two ᴄʟ hour
     ‘He came two hours later than me.’

This construction is structurally possible in neutral, objective statements of fact, but it is less frequent there than the kwaː³ construction. It is often pragmatically loaded with a negative evaluation of the Comparee. In summary, this pre-predicate piː⁵ behaves like a head that takes the Standard as its complement. The resulting phrase functions as an adjunct modifying the predicate.
2.2.2.2 Equative and Inferiority Comparisons: piː⁵ as a Post-predicate Particle
In equality and inferiority comparisons, the syntax of piː⁵ changes markedly. In these functions it is a post-predicate particle, occupying the same syntactic slot as kwaː³. The inferiority reading is derived by adding a negative marker before the predicate. The schema for the equative construction is:

NP (Comparee) + {AP / VP} (Parameter) + CMK + NP (Standard)

(15) raj³ kaj³ tʰoː⁵ ɗaj²-kin¹ piː⁵ raj³ nok² pʰjaː¹.  [Equative]
     egg chicken local delicious ᴄᴍᴋ egg bird mountain
     ‘The local chicken eggs are as delicious as the mountain bird eggs.’
(16) ŋoː⁴ mɤj² ŋɤn² laːj¹ piː⁵ teː¹ jaː³ lɤw⁴.  [Equative]
     I have money much ᴄᴍᴋ he perfective ꜱꜰᴘ
     ‘I have become as rich as him.’

The schema for the inferiority construction is:

NP (Comparee) + NEG + {AP / VP} (Parameter) + CMK + NP (Standard)

The inferiority construction, shown in (17) and (18), is formed by negating the predicate within this same equative frame.² This structure expresses that the Comparee does not possess the relevant quality to the same degree as the Standard.

(17) ŋoː⁴ poː⁴ mɤj² ŋɤn² laːj¹ piː⁵ teː¹ neː¹.  [Inferiority]
     I ɴᴇɢ have money much ᴄᴍᴋ he ꜱꜰᴘ
     ‘I have less money than him.’ (Lit. ‘I don’t have as much money as him.’)
(18) ɓaj³ mɤw¹ mɤj² peːŋ² piː⁵ ɓaj³ moː² (kiː⁵-laːj¹) neː¹.  [Inferiority]
     meat pig ɴᴇɢ expensive ᴄᴍᴋ meat cow (how.much) ꜱꜰᴘ
     ‘Pork is not as expensive as beef (by much).’

In the superiority construction with piː⁵, the Comparee (subject) cannot be omitted. In these constructions, by contrast, it can be omitted in conversational contexts. In summary, the post-predicate piː⁵ is a particle that links the predicate to the Standard, a role syntactically parallel to that of kwaː³.

2.2.2.3 Lexicalized Expressions
Finally, piː⁵ appears in several fixed, lexicalized expressions in which it has fused with a verb to form a compound.

(19) niː⁴ piː⁵-ɗaj⁵ teː¹ ʔaː¹?
     you ᴄᴍᴋ-able him Q
     ‘Can you even compare to him?’
(20) niː⁴ piː⁵-mɤj²-ɗaj⁵ kɤn².
     you ᴄᴍᴋ-ɴᴇɢ-able people
     ‘You’re no match for others.’

These expressions reflect a further stage of grammaticalization, in which the morpheme has become part of the lexicon.

2.2.3 Summary: An Asymmetric System and the Homophone Hypothesis
The preceding analyses show that kwaː³ and piː⁵ are not a simple pair of alternatives. They constitute a highly asymmetric system. kwaː³ is a functionally dedicated and syntactically fixed particle. piː⁵, by contrast, exhibits a complex pattern of behavior that challenges the notion that it is a single, unified morpheme. Table 2 contrasts their core properties.

Table 2: Core Feature Comparison of kwaː³ and piː⁵

Feature              kwaː³                        piː⁵
Semantic function    Superiority only             Superiority, equality, and inferiority
Syntactic position   Post-predicate (particle)    Split: pre-predicate (superiority) / post-predicate (equality and inferiority)
Pragmatic nuance     Neutral, objective           Subjective, negative (in superiority contexts)

As the table shows, the markers diverge on every analytical level. The most striking divergence is the syntactic bifurcation of piː⁵. It occupies a pre-predicate position for superiority but a post-predicate position for equality and inferiority. A single morpheme with two distinct, mutually exclusive syntactic frames conditioned by semantic function would be highly unusual. This structural split correlates perfectly with a clear division of functional and pragmatic labor. It therefore provides compelling evidence for the central hypothesis of this analysis: the phonological form piː⁵ represents two distinct, homophonous grammatical morphemes.
• piː⁵₁ is a pre-predicate element, likely a preposition, and is used for pragmatically specialized superiority comparisons.
• piː⁵₂ is a post-predicate particle, syntactically analogous to kwaː³. It is used for neutral equality comparisons, as well as for inferiority comparisons when negated.
From a synchronic perspective, this ‘two-morpheme’ hypothesis offers a more parsimonious and structurally coherent explanation of the observed data than an analysis assuming a single morpheme with multiple contradictory grammatical properties. Recognizing this functional and structural division is therefore fundamental to understanding the comparative system of Longming Zhuang. 3 Negation and Interrogation in Comparison Having established the basic affirmative structures of kwaː³ and piː⁵ constructions, this section examines their behavior in more complex syntactic environments: negation and interrogation. The distinct ways in which these markers interact with negative and interrogative operators provide crucial diagnostic evidence that corroborates the structural analysis proposed in section 2.2.3. 3.1 Negated Comparative Constructions The placement of the negative marker in a comparative construction is a powerful tool for diagnosing its underlying syntactic structure. In Longming Zhuang, kwaː³ and piː⁵ constructions exhibit starkly different and mutually exclusive negation patterns, reinforcing the hypothesis that they belong to different grammatical categories. 3.1.1 Negating the kwaː³ Construction To negate a superiority comparison with kwaː³, the negator (mɤj² or mɤj²-cɤɰ⁴) is placed immediately before the predicative element (the Parameter). It negates the property being ascribed, not the comparison itself. The schema for the negated kwaː³ construction is as shown here. NP (Comparee) + NEG + {AP / VP} (Parameter) + CMK + NP (Standard) + (MP). (21) ŋoː⁴ mɤj²/mɤj²-cɤɰ⁴ θuŋ¹ kwaː³ teː¹ (naːw³)3. I ɴᴇɢ/ɴᴇɢ-be tall ᴄᴍᴋ he (ɴᴇɢ.ꜱꜰᴘ) ‘I am not taller than him.’ The choice between the two negators is constrained by the presence of a Measure Phrase (MP). Without an MP, both mɤj² and mɤj²-cɤɰ⁴ are possible, as in (21). However, when an MP is present, only the emphatic negator mɤj²-cɤɰ⁴ is permissible. (22) ŋoː⁴ mɤj²-cɤɰ⁴ θuŋ¹ kwaː³ teː¹ haː⁵ liː²-miː¹ (naːw³). 
I ɴᴇɢ-be tall ᴄᴍᴋ he five centimeter (ɴᴇɢ.ꜱꜰᴘ) ‘I am not taller than him by five centimeters (e.g., but by three).’ This negation pattern, where the negator targets the predicate directly, is consistent with the analysis of kwaː³ as a post-predicate particle. 3.1.2 Negating the piː⁵ Superiority Construction In stark contrast, negating a superiority comparison with piː⁵ requires placing the negator immediately before the marker itself. Since the negator targets the comparative relation rather than the predicate, the schema for the negated piː⁵ superiority construction, as exemplified in (23) and (24), is structured as follows. NP (Comparee) + NEG + [CMK + NP (Standard)] + {AP / VP} (Parameter) + (MP) (23) ŋoː⁴ mɤj²-cɤɰ⁴ piː⁵ teː¹ θuŋ¹ (naːw¹). I ɴᴇɢ-be ᴄᴍᴋ he tall (ɴᴇɢ.ꜱꜰᴘ) ‘I am not taller than him.’ (24) koː¹ mɤj²-cɤɰ⁴ piː⁵ ŋoː⁴ θuŋ¹ θam¹ liː²-miː¹ (naːw¹). older brother ɴᴇɢ-be ᴄᴍᴋ I tall three centimeter (ɴᴇɢ.ꜱꜰᴘ) ‘My older brother is not taller than me by three centimeters.’ In these constructions, only the negator mɤj²-cɤɰ⁴ is used, regardless of whether a Measure Phrase is present. Crucially, the standard verbal negator mɤj² is ungrammatical (*mɤj² piː⁵). This rigid placement and the specific choice of negator provide the strongest evidence yet for the prepositional status of piː⁵. To illustrate this, we can compare the negation of piː⁵ with that of the locative marker jɤw³ ‘at’. Since jɤw³ can function as both a main verb (‘to be at’) and a preposition-like coverb, it allows for two types of negation. When functioning as a verbal predicate, it takes the standard verbal negator mɤj² (25). However, when the locative phrase is negated as a constituent, similar to ‘not at home’ in English, the form mɤj²-cɤɰ⁴ is used (26). (25) teː¹ mɤj² jɤw³ rɯːn². he ɴᴇɢ at home ‘He is not at home.’ (Verbal negation) (26) teː¹ mɤj²-cɤɰ⁴ jɤw³ rɯːn² (naːw³). 
he ɴᴇɢ-be at home (ɴᴇɢ.ꜱꜰᴘ) ‘He is not [at home].’ / ‘It is not at home that he is.’ (Constituent negation) The syntax of piː⁵ aligns exclusively with the pattern in (26). Unlike jɤw³, piː⁵ cannot be negated by mɤj². This demonstrates that it lacks the verbal properties exhibited in (25). Instead, piː⁵ functions structurally as a prepositional constituent that requires the constituent-negator mɤj²-cɤɰ⁴. Furthermore, the negator strictly targets the comparative marker and cannot be placed before the predicate. As shown in the ungrammatical example (27), placing the negator before the adjective results in ill-formedness. This confirms that the syntactic relationship between piː⁵ and the predicate is fundamentally different from that of a serial verb construction or clausal embedding. (27) *ŋoː⁴ piː⁵ teː¹ mɤj²-cɤɰ⁴ θuŋ¹. I ᴄᴍᴋ he ɴᴇɢ-be tall Intended meaning: ‘I am not taller than him.’ 3.2 Interrogative Comparative Constructions The formation of interrogatives serves as another crucial diagnostic for determining the underlying structure of a construction. The ability of core components—such as the Standard or the Measure Phrase—to be questioned, and the placement of question particles, reveals the constituency and integrity of the comparative phrase. Both the kwaː³ and piː⁵ constructions can be systematically questioned, but the patterns observed, particularly in yes/no questions involving inferiority, offer further support for the distinct analyses proposed. 3.2.1 Interrogatives with the kwaː³ Construction The kwaː³ construction forms both yes/no and wh-questions in a straightforward and predictable manner, consistent with its analysis as a simple post-predicate structure. Yes/no questions are formed by appending one of the sentence-final interrogative particles, mɤj² or ʔaː¹, to the end of the declarative clause. (28) niː⁴ θuŋ¹ kwaː³ teː¹ mɤj²/ʔaː¹?        
you tall ᴄᴍᴋ he ɪᴍᴋ ‘Are you taller than him?’ Wh-questions are formed by substituting a core component of the construction with a wh-phrase. The Standard can be questioned with kaː⁴-raɰ⁴ (‘who’), and the Measure Phrase with kiː⁵-laːj¹ (‘how much’). (29) niː⁴ θuŋ¹ kwaː³ kaː⁴-raɰ⁴? you tall ᴄᴍᴋ who ‘Who are you taller than?’ (30) niː⁴ θuŋ¹ kwaː³ teː¹ kiː⁵-laːj¹? you tall ᴄᴍᴋ he how.much ‘How much taller are you than him?’ The ease with which these questions are formed confirms that the Standard and the Measure Phrase are distinct constituents that are accessible to syntactic operations like questioning. 3.2.2 Interrogatives with the piː⁵ Constructions The piː⁵ constructions are also fully compatible with interrogative formation, and the patterns observed align with the proposed structural division between the pre-predicate superiority construction and the post-predicate equality/inferiority constructions. Yes/no questions across all three functions (superiority, equality, inferiority) are formed by adding a sentence-final interrogative particle. (31) ɓaj³ mɤw¹ piː⁵ ɓaj³ moː² peːŋ² mɤj²/ʔaː¹? [Superiority] meat pig ᴄᴍᴋ meat cow expensive ɪᴍᴋ ‘Is pork more expensive than beef?’ (32) ɓaj³ mɤw¹ ɗaj²-kin¹ piː⁵ ɓaj³ moː² mɤj²/ʔaː¹? [Equality] meat pig delicious ᴄᴍᴋ meat cow ɪᴍᴋ ‘Is pork as tasty as beef?’ (33) ɓaj³ mɤw¹ mɤj² peːŋ² piː⁵ ɓaj³ moː² ʔaː¹? [Inferiority] meat pig ɴᴇɢ expensive ᴄᴍᴋ meat cow ɪᴍᴋ ‘Is pork less expensive than beef?’ A notable morphosyntactic constraint appears in the inferiority question (33): because the internal negator mɤj² is already present, the interrogative particle mɤj² is blocked, and only ʔaː¹ can be used. This avoidance of repetition provides further evidence that the inferiority construction is indeed a negated form of the equative.
Wh-questions targeting the Standard (kaː⁴-raŋ¹ ‘what’) and the Measure Phrase (kiː⁵-laːj¹ ‘how much’) are also possible with the superiority construction, paralleling the kwaː³ pattern. (34) ɓaj³ mɤw¹ piː⁵ (ɓaj³) kaː⁴-raŋ¹ peːŋ²? meat pig ᴄᴍᴋ (meat) what expensive ‘Pork is more expensive than what (meat)?’ (35) ɓaj³ mɤw¹ piː⁵ ɓaj³ moː² peːŋ² kiː⁵-laːj¹? meat pig ᴄᴍᴋ meat cow expensive how.much ‘How much more expensive is pork than beef?’ 3.3 Summary of Evidence from Negation and Interrogation The behavior of kwaː³ and piː⁵ in sentences with negation and interrogation provides strong converging evidence for the structural analyses proposed. The placement of the negator offers the most conclusive evidence of their distinct categorial statuses (particle vs. preposition). For kwaː³ it comes before the predicate, and for piː⁵₁ it comes before the marker. Furthermore, the formation of interrogatives demonstrates that both constructions are productive and that their core components are accessible syntactic constituents. The constraint observed in negated interrogatives with piː⁵₂ further supports its analysis as a particle operating within a negated equative frame. In sum, these complex syntactic environments do not challenge, but rather strongly reinforce, the fundamental asymmetry of the comparative system in Longming Zhuang. 4 Discussion The analysis presented in the preceding sections reveals that the comparative markers kwaː³ and piː⁵ in Longming Zhuang form a highly asymmetric system. This section discusses the theoretical and dialectological implications of these findings, focusing on three key areas: the functional division of labor, the structural status of the markers, and the typological significance of piː⁵. First, the data confirm a clear functional division of labor that extends into the pragmatic domain. Kwaː³ is the default, pragmatically neutral marker for objective superiority.
The crucial finding of this study is that piː⁵, in its superiority function, is a pragmatically specialized tool employed to frame a comparison with a subjective, often negative, authorial stance. This reveals a sophisticated system where the choice of a grammatical construction is governed not only by semantic content but also by the speaker's attitude. Second, the structural evidence strongly supports the synchronic analysis of piː⁵ as two functionally and syntactically distinct morphemes: a pre-predicate element for superiority (piː⁵₁) and a post-predicate particle for equality/inferiority (piː⁵₂). The distinct negation patterns provide the most compelling evidence for this structural division. This synchronic reality, however, raises a deeper diachronic and theoretical question: does this split represent a case of true homophony, where two unrelated forms converged, or an extreme instance of polysemy, where a single historical morpheme has diverged into two incompatible grammatical functions? While their shared phonological form suggests a common origin (polysemy), their complete lack of syntactic interchangeability makes them function as distinct lexemes (homophones) in contemporary grammar. Thus, while the ultimate diachronic pathway remains a topic for future investigation, the synchronic facts of Longming Zhuang are unambiguous. For an accurate grammatical description of the language as it is spoken today, treating them as two separate morphemes provides the clearest and most empirically adequate account. Finally, the productive use of piː⁵ in Longming Zhuang is typologically and dialectologically significant. This marker, often considered a loanword from Chinese and predominantly associated with Northern Zhuang, is shown here to be fully integrated into a Southern dialect. 
This finding challenges the traditional view of its geographical distribution and suggests a more complex history of diffusion, language contact, and internal development within the Zhuang language family. 5 Conclusion This study has provided the first detailed analysis of the syntactic comparative constructions in Longming Zhuang. The primary contribution of this paper is the demonstration that the two main markers, kwaː³ and piː⁵, constitute two fundamentally different grammatical subsystems. It has shown that kwaː³ is a dedicated post-predicate particle for objective superiority, while the phonological form piː⁵ operates within two distinct and mutually exclusive syntactic frames: a pre-predicate structure for subjective superiority and a post-predicate structure for equality and inferiority. This structural dichotomy, supported by converging evidence from syntactic position and negation patterns, offers a coherent explanation for the complex data observed. While this research has illuminated the core system of syntactic comparison, it also opens several important avenues for future investigation. A comprehensive study of the lexical and context-dependent comparatives is needed to achieve a complete understanding of the full comparative system of the language. Furthermore, the origin and dialectal variation of piː⁵, including the diachronic relationship between its two distinct syntactic frames, require further research. Finally, the rich socio-pragmatic functions of the pre-predicate piː⁵ construction warrant a more detailed discourse-based analysis. These unresolved issues will be the focus of future research. Acknowledgements This study was funded by the Japan Grant-in-Aid for Young Scientists, ‘Descriptive Study of the Zhuang Dialects as a Language in Danger of Extinction’ (task number: JP22K13122). I would like to express my sincere gratitude to the Zhuang people who cooperated in this research.
I am also deeply indebted to Professor MINEGISHI Makoto, Professor Emeritus of Tokyo University of Foreign Studies, for his invaluable guidance. Finally, I wish to thank my friend Dr. Salvatore Carlino for his insightful advice. References Huang, Haiping 黄海萍. 2018a. チワン語龍茗方言の音韻体系 chiwann go ryūmei hōgen no on’in taikei [Phonological system of the Longming Zhuang dialect]. 言語社会 gengoshakai 12:343–366. Tokyo: The Graduate School of Language and Society, Hitotsubashi University. Huang, Haiping 黄海萍. 2018b. チワン語龍茗方言研究 chiwann go ryūmei hōgen kenkyū [The study of the Longming Zhuang dialect]. Doctoral dissertation, Hitotsubashi University, Tokyo. Huang, Haiping 黄海萍. 2021. チワン語の情報構造について chiwann go no jōhō kōzō ni tsuite [Information structure in Zhuang]. 言語の類型的特徴対照研究会論集 gengo no ruikeiteki tokuchō taishō kenkyūkai ronshū 4:119–138. Osaka: 日中言語文化出版社 Nicchū Gengo Bunka Shuppansha. Huang, Haiping 黄海萍. 2022a. チワン語の動詞連続:運動・移動表現を中心に chiwann go no dōshi renzoku: undō idō hyōgen o chūshin ni [Serial verb constructions in Zhuang: Focus on motion expressions]. 言語社会 gengoshakai 16:305–333. Tokyo: The Graduate School of Language and Society, Hitotsubashi University. Huang, Haiping 黄海萍. 2022b. チワン語の時の表現 chiwann go no toki no hyōgen [Temporal expressions in Zhuang]. 言語の類型的特徴対照研究会論集 gengo no ruikeiteki tokuchō taishō kenkyūkai ronshū 5:65–90. Osaka: 日中言語文化出版社 Nicchū Gengo Bunka Shuppansha. Huang, Haiping 黄海萍. 2024. チワン語の否定表現 chiwann go no hitei hyōgen [Negative expressions in Zhuang]. 言語の類型的特徴対照研究会論集 gengo no ruikeiteki tokuchō taishō kenkyūkai ronshū 6:129–159. Osaka: 日中言語文化出版社 Nicchū Gengo Bunka Shuppansha. Li, Fang-Kuei. 1977. A Handbook of Comparative Tai. Honolulu: University of Hawai‘i Press. Liu, Lifeng 刘立峰. 2020. 凌云壮语参考语法 lingyun zhuangyu cankao yufa [A reference grammar of Lingyun Zhuang]. Doctoral dissertation, Shanghai Normal University, Shanghai. Lu, Yelin 卢业林. 2011. 大新壮语语法调查与研究 daxin zhuangyu yufa diaocha yu yanjiu [Investigation and research on Daxin Zhuang grammar]. 
Master’s thesis, Guangxi University, Nanning. People’s Government of Tiandeng County (天等县人民政府). 2025a. Tiandeng County Overview (天等县概况). Official Portal of the People’s Government of Tiandeng County, Chongzuo, Guangxi. http://www.tiandeng.gov.cn/zjtd/tdgk/t2645488.shtml (Accessed 15 November 2025). People’s Government of Tiandeng County (天等县人民政府). 2025b. Longming Town Introduction (龙茗镇简介). Official Portal of the People’s Government of Tiandeng County, Chongzuo, Guangxi. http://www.tiandeng.gov.cn/zjtd/xzjs/t26094253.shtml (Accessed 15 November 2025). Wang, Jun 王均, et al. 1984. 壮侗语族语言简志 zhuang dong yuzu yuyan jianzhi [A brief description of the Zhuang-Dong languages]. Beijing: Publishing House of Minority Nationalities. Wei, Jingyun 韦景云, He, Shuang 何霜, and Luo, Yongxian 罗永现. 2011. 燕齐壮语参考语法 yanqi zhuangyu cankao yufa [A reference grammar of Yanqi Zhuang]. Beijing: China Social Sciences Press. Wei, Maofan 韦茂繁. 2014. 下坳壮语参考语法 xia’ao zhuangyu cankao yufa [A reference grammar of Xia’ao Zhuang]. Nanning: Guangxi Normal University Press. Yan, Shu 晏姝. 2018. 崇左左州壮语参考语法 chongzuo zuozhou zhuangyu cankao yufa [A reference grammar of Chongzuo Zuozhou Zhuang]. Master’s thesis, Guangxi University, Nanning. THE VERB ‘SEE’ IN HLAI Hui-chi LEE National Cheng Kung University hclee6@mail.ncku.edu.tw Abstract This paper examines the semantic and grammatical extensions of the Hlai verb of visual perception laai55. Hlai, spoken on Hainan Island in China, belongs to the Kra–Dai language family rather than the Sinitic branch (Ostapirat 2004, 2008). Field data show that laai55 exhibits four major semantic functions: faculty, meet, evaluate, and appear.
In addition, it displays two functions that are highly atypical cross-linguistically, namely its use as a conditional conjunction and as an otherwise marker. The direct grammaticalization of a verb meaning ‘see’ into a conditional conjunction ‘if’ appears to be unique to Hlai and has no parallel in Mandarin or Hainan Min. This paper argues that the conditional function most plausibly developed from a recurrent see–if sequence, while the otherwise meaning reflects a shift toward conditional mood combined with a negative element. These two developments reveal distinct semantic–syntactic pathways converging on conditionality and contribute to typological discussions on perception verbs and grammaticalization. Keywords: Hlai, visual perception verbs, semantic extension, conditional markers ISO 639-3 codes: eng, deu, ell, spa 1 Introduction Hlai is a language of the Kra-Dai language family (Ostapirat 2004, 2008, Ouyang and Zheng 1983, Yuan 1994). The Hlai language is spoken exclusively on Hainan Island. The data discussed in the current study were collected during my fieldwork on Hainan Island.1 The present study takes the Hlai verb of visual perception, laai55, as the point of departure for an inquiry into the class of perceptual verbs in Hlai, and its core sections examine the polysemous behavior of this verb. In Lien (2005), verbs of visual perception in Taiwanese Southern Min are shown to exhibit a number of semantic interpretations: faculty, spectate, guard/tend, read, meet, visit, purpose, consult, appear, condition, recognize, determine, classification, judging, face, and tentative marker. There are three verbs with the meaning ‘to see’ in Taiwan Southern Min: khoann3, kinn3, and siong3. These three verbs each have distinct meanings, but they also share some overlapping senses.
In Hainan, a related variety of Southern Min, namely Hainan Min, is spoken in proximity to Hlai. Due to prolonged language contact between Hlai and Hainan Min, certain linguistic similarities may have emerged. Since verbs of seeing have not previously been discussed in the literature on Hlai, this paper uses the semantics of seeing verbs in Taiwanese Southern Min as a reference for analyzing the meanings of the seeing verb in Hlai. In the following discussion, I examine how the verb laai55 extends its meaning beyond the domain of visual perception. 2 Meanings of laai55 We begin by investigating how the realm of visual perception interfaces with various non-perceptual frames, including faculty, meet, evaluate and appear. 2.1 Faculty In the see_faculty frame, what is foregrounded is the inherent perceptual capacity encoded by the lexeme, rather than the volitional act of gaze direction. Moreover, the frame does not require an explicitly specified object of perception. Instead, a generic referent is presupposed and left unexpressed in surface form. In (1), the generic referent is ‘thing’. In (2), the verb of seeing, laai55, may occur without an overt object. (1) Muut55 plom51 tshuung55-tsha33 vei11 laai55 mau51. hat cover hole-eye NEG see thing ‘The hat covers my eyes, so I can’t see anything.’ (2) Ha55aau33 haɯ51 tshuung55-tsha33 pɯ33tsun51 laai55 guaai33 e33, person that hole-eye previously see EXP PERF, phɯɯn33-ni51 tshok55, ka51 laai55 e33. generation-this sick cannot see PERF ‘That person once had normal eyesight, but due to illness he can no longer see.’ 2.2 Meet Cao and Su (1999: 249) state that ‘meet’ is a semantic extension derived from the more basic sense of ‘see’. In many languages, the verb meaning ‘see’ tends to undergo semantic extension to cover the sense of ‘meet.’ Hlai is no exception, as its verb for ‘see’ has also developed the meaning of ‘meet,’ as in (3) and (4). (3) Pha55 thom33 kuun33 laai55 a55ra121, phɯɯn33-ni51 zaan11 vei11 paaɯ11.
father half road see who now still NEG return ‘Father met someone on his way and has not returned yet.’ (4) Mɯɯ33 ɗuuh55 laai55 ha55-aau33 he33 vei11? 2SG have see people what NEG ‘Do you have anyone to meet?’ 2.3 Evaluate/Judge The verb ‘see’ can develop an evaluative meaning through a process of semantic extension grounded in embodied cognition and metaphorical mapping. In Chinese and many other languages, visual perception is closely linked to judgment and evaluation. What we ‘see’ often forms the basis of what we ‘think’ or ‘believe.’ In English, we say ‘I see what you mean,’ meaning ‘I think I understand what you mean.’ The semantic scope of the verb ‘see’ in Hlai extends to express the speaker’s subjective interpretation and epistemic assessment of events. For example, sentence (5) conveys a subjective evaluation of the individual’s condition, specifically construing him as not pitiable. (5) Hou33 laai55 na33 vei11 ɯ55-gen11. 1SG see 3SG NEG pitiful ‘I do not regard him as pitiable.’ Cross-linguistically, verbs of perception such as ‘see’ frequently undergo semantic extension to express the speaker’s stance or evaluative judgment toward events, a development that is also attested in Hlai. 2.4 Appear In Mandarin, the verb see commonly undergoes semantic extension to convey the meaning of ‘appear’ (e.g., bu-jian ‘not-see’, i.e., ‘disappear’), but this reading is licensed exclusively in negative contexts. In Hlai, the verb denoting see likewise allows an interpretation corresponding to ‘appear’. As in Mandarin, this reading is restricted to negative environments, as in (6) and (7). (6) Ɗi55-liih55 vei11 laai55 e33. DIM-kid NEG see PERF ‘The child disappeared.’ (7) Ɯ55 lang33 ɬa33-tsaɯ55 nun55 ɬut55 pleeh55 vei11 laai55 e33. one CL fish-grandmother creep enter soil NEG see PERF ‘A loach burrowed into the soil and disappeared.’ Analysis of the above data confirms that the Hlai verb laai55 ‘see’ encompasses the following semantic frames: faculty, meet, evaluate, and appear.
My fieldwork indicates that, in contrast to Taiwanese Southern Min (Lien 2005), the Hlai verb meaning ‘see’ lacks semantic extensions into frames such as spectate, guard, recognize, read, purpose, consult, condition, determine, classification, judging, face and tentative marking. 3 Semantic Changes 3.1 Conditional conjunction if Hlai laai55 ‘see’ exhibits a distinctive usage that is relatively rare cross-linguistically: it can carry a hypothetical meaning and function as a conditional connective. When employed in hypothetical constructions, it exhibits both semantic and syntactic properties comparable to those of the English conditional marker if, as in (8) and (9). (8) Laai55 mɯɯ33 ɗuuh55 me55he33 thun33, ɓe55 hou33. if 2SG have what thing call 1SG ‘Call me if anything happens to you.’ (9) Laai55 vei11 fun33, pok55 vei11 uut55-kou33. if NEG rain millet NEG sprout-mushroom ‘If it hadn’t rained, the grain wouldn’t have gotten moldy.’ Similar to English if, the word laai55 may appear in sentence-initial position. Nevertheless, in contrast to English if, it can also occur immediately after the subject, as in (10). This positional flexibility highlights a typologically unusual property of Hlai conditional marking. (10) Mɯɯ33 laai55 rien11 fei33 plan51 kuun33 2SG see say walk wrong road mɯɯ33 ka51 ɗaan55 plong55 na33. 2SG cannot arrive house 3SG ‘If you take the wrong path, you won’t be able to get to his house.’ In Sinitic languages such as Mandarin, Cantonese, and Southern Min, the conditional marker ‘if’ may occur either in sentence-initial position or in post-subject position. This syntactic distribution mirrors that of the conditional marker in Hlai, and it clearly differs from that found in languages such as English.
3.2 Conditional adverb otherwise An additional typologically remarkable property of Hlai is that the verb of visual perception, when combined with the negator vei11 in hypothetical constructions, gives rise to the meaning ‘otherwise’, as in (11) and (12). (11) Mɯɯ33 kham55 roong121 tha51, laai55-vei11 mɯɯ33 vei11 ɗuuh55 tha51 lau51. 2SG must cook rice otherwise 2SG NEG have rice eat ‘You need to cook. Otherwise, you won’t have anything to eat.’ (12) Khong33mo51 mɯɯ33 lau51 ɓaai55 ɓo11, laai55-vei11 hou33 vei11 laai55. thing 2SG eat finish Q otherwise 1SG NEG see ‘Did you finish all the food? Otherwise, why didn’t I see any left?’ Such a conditional connective function is likewise attested in English, in which otherwise commonly exhibits distributional and semantic similarities to if not. The see–not sequence observed in the Hlai otherwise construction is not a typical word order in Hlai. Lee (2023) states that the negator in Hlai typically precedes the verb and does not occur in the post-verbal position, as in (13). (13) a. Na33 oh33 ngaau51. 3SG drink wine ‘He drinks.’ b. Na33 vei11 oh33 ngaau51. 3SG NEG drink wine ‘He does not drink.’ c. *Na33 oh33 vei11 ngaau51. 3SG drink NEG wine ‘He does not drink.’ d. *Na33 oh33 ngaau51 vei11. 3SG drink wine NEG ‘He does not drink.’ Following Lee (2023), the verb-negator sequence is not attested as a natural pattern in Hlai. In addition, the see-not sequence is not a natural pattern in Chinese either. Lee (2020) observes that in Hainan Min, a Chinese dialect spoken on Hainan Island, the negator consistently occurs before the element it modifies, as in (14). (14) a. Mue22 bo22 tsiah33 ɗit55 liau21. rice NEG eat get finished ‘The rice cannot be finished.’ b. *Mue22 tsiah33 bo22 ɗit55 liau21. rice eat NEG get finished ‘The rice cannot be finished.’ c. *Mue22 tsiah33 ɗit55 bo22 liau21. rice eat get NEG finished ‘The rice cannot be finished.’
Accordingly, this see-not sequence represents a distinctive lexical item in Hlai, dedicated to encoding the meaning of ‘otherwise,’ and shows no evidence of influence from the neighboring Chinese dialects. The development of the see–not sequence into an otherwise meaning should be regarded as a language-internal lexical change in Hlai, rather than as a change involving grammatical rules. 4 From SEE to IF Given the foregoing discussion, the verb-negator sequence appears to be specific to Hlai, as does the use of the verb laai55 ‘see’ as a conditional connective, another language-specific innovation. In this section, I explore how the Hlai visual perception verb laai55 ‘see’ may have developed into a conditional connective. 4.1 See if To my knowledge, the development of a verb meaning ‘see’ into a conditional complementizer equivalent to if has not been reported for any other language. However, the co-occurrence of see with if constructions is comparatively common across languages. To illustrate this phenomenon, examples from English, German, Greek, and Spanish are presented below. English In English, certain expressions containing see carry a subtle nuance of conditionality. For example, in (15), the verb see implicitly conveys a sense of hypothesis or contingency, functioning in a manner semantically close to if. (15) Let’s see if it works. Although the grammatical subject ‘we/us’ is formally construed as the experiencer of the verb ‘see,’ the utterance as a whole functions pragmatically as a conditional statement. German In German, the verb sehen means ‘to see’, while conditionality is typically expressed through conjunctions such as wenn ‘if’. However, in colloquial usage, constructions expressing tentative observation can occur, such as (16). (16) Mal sehen, ob es morgen regnet.
let’s see if it tomorrow rain ‘Let’s see if it rains tomorrow.’ In this example, mal sehen functions as ‘let’s see,’ while ob introduces an indirect question, equivalent to English if. German employs a combination of ‘seeing’ and a conditional particle ‘ob’, but sehen itself does not grammaticalize into a conditional marker. Moreover, sehen and ob can be separated by a pause or a comma. Greek In Greek, the verb vlépo means ‘see’, while conditionality is typically expressed using the particle an ‘if’. The verb vlépo does not directly express conditionality; however, it often co-occurs with the conditional particle an ‘if’, as in (17). (17) Gia na doúme an tha vréxei ávrio. for SUBJUN we.see if FUT it.rains tomorrow ‘Let’s see if it will rain tomorrow.’ In Greek, the conditional meaning requires the explicit use of an ‘if’; the verb see merely contributes an evidential or exploratory nuance rather than fulfilling a grammatical conditional role. Spanish Among the languages considered, Spanish a ver si ‘to see if’ most closely exemplifies the direct use of a perception verb, ver ‘see’, to express a conditional meaning akin to ‘if’. In Spanish, the verb ver ‘see’ frequently appears in the colloquial construction a ver si... ‘to see if’, which pragmatically functions to introduce conditional or hypothetical propositions, approximating the semantics of ‘if’ in English. (18) A ver si llueve mañana. to see if it.rains tomorrow ‘Let’s see if it rains tomorrow.’ (19) A ver si llegamos a tiempo al cine. to see if we.arrive to on.time to.the cinema ‘Let’s see if we can arrive at the cinema on time.’ (20) A ver si me llama esta noche. to see if 1SG call this night ‘Let’s see if he calls me tonight.’ Examples (18) to (20) demonstrate that the sequence a ver si ‘to see if’ serves a conditional function. In this construction, a ver si functions almost equivalently to if in English and is perceived as entirely natural in colloquial Spanish.
However, unlike Hlai, Spanish does not ordinarily use ver directly to convey conditional meanings. To sum up, English, German, Greek, and Spanish exhibit a related tendency in expressions such as let’s see if..., where see carries a subtle conditional or hypothetical nuance. By contrast, Hlai employs the verb of seeing laai55 directly to express conditionality. Cross-linguistic comparison suggests that the development from a verb of seeing to a conditional conjunction may represent a natural diachronic pathway. 4.2 From if to otherwise In English, otherwise is a conjunctive adverb carrying a conditional mood. Because the nature of otherwise is essentially conditional or hypothetical, it often conveys the sense of ‘if not so’. Given this ‘if not so’ semantics, once the Hlai verb of visual perception grammaticalized into a conditional marker if, the negator may have co-occurred with it, resulting in a lexical reanalysis in Hlai: the original sequence see–not was reinterpreted as if–not, which subsequently gave rise to the meaning otherwise. In Hlai, the meaning conveyed by ‘otherwise’ is analytically distributed across two elements, a conditional marker and a negator, whereas in English, otherwise is a synthetic form that unites these functions in a single lexical item. English otherwise typically introduces an implied condition rather than an overt one. Syntactically, however, otherwise does not function as a conditional conjunction like if. Instead, it is analyzed as an adverb that can appear in clause-initial position, as in (21). (21) Otherwise, we would have left. The expression laai55-vei11 ‘otherwise’ in Hlai has also developed a conjunctive function, in which it serves as a clause linker connecting two clauses, as in (22).
(22) Mɯɯ33 vei11 meeh55 vei11 phoh31 thun33, laai55-vei11 pi15 gua51-heeng33 2SG NEG get NEG listen word otherwise mother angry tsiang33 tshok55. become sick ‘You must not be disobedient. Otherwise, Mom might get sick with anger.’ 5 Concluding remarks The present study has investigated the visual perception verb laai55 in Hlai. Previous research on Hlai syntax and semantics has been extremely limited in scope, and there has been no work devoted specifically to verbs of seeing. Lien (2005) explores three different verbs of visual perception in Taiwan Southern Min. He finds that these ‘seeing’ verbs are polysemous, and they can be associated with semantic categories such as faculty, read, visit, determine, judge, recognize, consult, comprehend and so on. Following Lien’s (2005) findings in Taiwan Southern Min, our study first examines the ‘seeing’ verb in Hlai with respect to its polysemy. In addition to functioning as an ordinary lexical verb, laai55 has developed from ‘see’ into a conditional conjunction ‘if’ and even a functional adverb ‘otherwise’. This study further addresses the semantic development of the visual perception verb. The study demonstrates that the Hlai verb of visual perception, laai55, encodes six semantic functions: faculty, meet, appear, evaluate, conditional and otherwise. The first four meanings are expressed through verbal forms, whereas the latter two are no longer realized as verbs. All of these meanings except ‘otherwise’, which obligatorily combines with negation, can be expressed by the Hlai verb of seeing on its own. Most remarkable is the conditional interpretation, conveyed solely by the verb itself, a phenomenon that is exceedingly rare cross-linguistically. While in many languages the construction see if is widespread, Hlai exhibits a different pattern: the verb of visual perception alone expresses conditionality, without requiring an accompanying conditional conjunction.
In Hlai, the verb ‘see’ has grammaticalized as a conditional subordinator, a highly unusual pattern. These findings and analyses of Hlai contribute to a broader understanding of perception verbs across languages. References Cao, Xianzhuo, and Peicheng Su. (eds.) 1999. Hanzi xingyi fenxi zidian [An analytic morpho-semantic dictionary of Chinese characters]. Beijing: Beijing Daxue Chubanshe. Lee, Hui-chi. 2020. Negation of dynamic modals with dit 得 in Hainan Min. In Lien, Chinfa & Alain Peyraube (eds.), Diachronic perspectives and synchronic variation in Southern Min, 76–101. New York: Routledge. Lee, Hui-chi. 2023. Negation in Hlai. Cahiers de Linguistique Asie Orientale 52:163–189. Lien, Chinfa. 2005. Verbs of visual perception in Taiwanese Southern Min: A Cognitive Approach to Shift of Semantic Domains. Language and Linguistics 6.1:109–132. Ostapirat, Weera. 2004. Proto-Hlai sound system and lexicons. In Lin, Ying-chin, Fang-min Hsu, Tsun-chih Lee, Jackson T.-S. Sun, Hsiu fang Yang and Dah-an Ho (eds.), Hanzangyu yanjiu: Gong Huangcheng xiansheng qishi shouqing lunwenji [Studies on Sino-Tibetan languages: Papers in honor of Professor Hwang-Cherng Gong on his seventieth birthday], 121–175. Taipei: Institute of Linguistics, Academia Sinica. Ostapirat, Weera. 2008. The Hlai language. In Diller, Anthony V.N. & Edmondson, Jerold A. & Luo, Yongxian (eds.), The Tai-Kadai languages, 623–652. New York: Routledge. Ouyang, Jueya, and Yiqing Zheng. 1983. Liyu diaocha yanjiu [A survey and study of Hlai languages]. Beijing: Chinese Academy of Social Science. Yuan, Zhong-shu. 1994. Liyu yufa gangyao [Outline of grammar in Li]. Beijing: Zhongyang Minzu University Press.
SINGLISH PRE-NOMINAL RELATIVE CLAUSES AND THE SYSTEMIC TRANSFER OF CHINESE PRE-MODIFIERS Wesley Mark LINCOLN University of Pennsylvania wlincoln@sas.upenn.edu Marlyse BAPTISTA University of Pennsylvania marlyse.baptista@sas.upenn.edu Abstract In this article, we show that Singlish (“Colloquial Singapore English”) nominals can take pre-modifiers such as pre-nominal relative clauses and prepositional phrases, which we argue to have been transferred from Mandarin pre-modifiers linked by de. Bao (2009) shows that phrase-final de was transferred to Singlish as one, claiming that pre-nominal de was not transferred due to an inviolable constraint. We argue that de in pre-nominal positions was in fact reanalyzed as a definite article the and is not relexified by one, emphasizing that substrate morphemes do not necessarily map one-to-one with a single output in a resulting contact variety. We propose a pivot-matching account à la Matras and Sakel (2007) without recourse to an ad hoc inviolable constraint, while preserving Bao’s (2009, 2012) insight that substrate transfer targets grammatical subsystems. Keywords: creoles, Singlish, English, Singapore, language contact ISO 639-3 codes: eng, cmn 1 Introduction Singlish, frequently termed Colloquial Singapore English by scholars, is an English-lexified contact variety spoken in Singapore. Most scholars agree that Singlish has predominantly Chinese languages as its substrates, principal among which are Hokkien, which was formerly the de facto lingua franca for the Singaporean Chinese community, and Mandarin, which has become the main variety spoken today owing to language policy efforts in the 1970s, although Mandarin education had been present since earlier in the 20th century (Bao 2020). Malay and Malay-lexified contact varieties like Baba Malay may have also had an influence, albeit less extensive, on Singlish grammar (Sato 2013).
Key features of Singlish that have been elucidated by prior contact-oriented research include topic prominence, a repertoire of sentence-final particles, and the tense-aspect-modality (TAM) system, all of which have been tied to Chinese influence. Singlish also exhibits optional nominal (Kim et al. 2009) and verbal inflection (Sato & Kim 2012), the latter of which has been linked to copula deletion under Chinese influence (Kim 2012). In work on the aspectual system of Singlish, Bao has observed the systematic transfer of related grammatical features from Chinese to Singlish, such as the entire aspectual system being modelled after that of Chinese, albeit within the constraints of English morphosyntax (Bao 2010), as well as the transfer of topic-prominence (Bao & Lye 2005). Based on these patterns, he proposes the concept of system transfer (1). (1) System transfer: Substratum transfer involves an entire grammatical subsystem. (Bao 2009:348) This study focuses on pre-modifiers in Singlish, such as the pre-nominal RC in (2a) and the PP adjunct in (2b).1 (2) a. John eating [DP [RC Mary make] the cake]. (Singlish) ‘John is eating the cake that Mary made.’ b. [DP [PP on the table] that cake] ‘that cake on the table’ We argue here that pre-modifiers were transferred from Mandarin, where they are linked to nominals by 的 de, as shown by the examples in (3) that parallel (2). (3) a. Yuehan zai chi [DP [RC Mali mai ] de dangao] (Mandarin) John PROG eat Mary buy DE cake ‘John is eating the cake that Mary bought.’ b. [DP [PP zuozi shang ] de dangao] table on DE cake ‘the cake on the table’ Bao (2009) demonstrates that Mandarin de was calqued into Singlish as one, which can act as a sentence-final particle (SFP, 4a) or a nominalizer (5a), parallel to Mandarin, as shown by the corresponding examples in (4b) and (5b) respectively. (4) a. He always like that one! (Singlish) b. ta meici dou zhe-yang de! (Mandarin) 3SG always all this-way DE ‘He’s always like this!’ (5) a. 
large one (Singlish) b. da de (Mandarin) big DE ‘large one(s)’ Bao proposes that the grammatical system of de was transferred to Singlish as one, in line with (1), but concludes that pre-modifiers were exceptionally excluded from this transfer on the basis of comparisons like (6). In (6a), we see that the AdjP da ‘big’ can be linked pre-nominally to shu ‘book’ by de in Mandarin, whereas in (6b), the morpheme one cannot link large to durian in Singlish. Bao uses similar comparisons, substituting de with one, to argue that nominal, pronominal, and PP pre-modifiers are not available in Singlish, which he then accounts for by means of an inviolable constraint against *[XP-one N] strings in Singlish. (6) a. da de shu (Mandarin) big DE book ‘big book’ b. *large one durian (Singlish) Intended: large durian(s) In this paper, we argue that it is erroneous to predict that de should have the same reflex, one, in all syntactic environments after transfer, and as such, comparisons like (6) are not sufficient evidence to demonstrate the absence of pre-modifiers in Singlish, which are clearly available, as in (2). We show that the range of pre-modifiers in Mandarin is also available in Singlish, and that such pre-modified DPs must be definite, headed by either the definite article the or a demonstrative such as that. We propose that D[+DEF] is in fact the exponent of de in Singlish, a state of affairs which arose from the conflation of Mandarin pre-nominal de with English the, in agreement with S. Lee’s (2023) proposal in the context of pre-nominal RCs (see Baptista 2020 for the role of congruence in creole formation). Finally, we present a pattern-replication account of de transfer by pivot-matching (Matras 2010, Matras & Sakel 2007), proposing that de was transferred to Singlish as two separate systems: one in phrase-final position with a one exponent, and another in pre-nominal position with a D[+DEF] exponent.
2 Bao (2009) on Singlish one In Singlish, one is used as a nominalizer, rather reminiscent of English one. This function, previewed in (5), is further illustrated by (7a) and (8a), taken from Bao (2009) with modifications to the glosses, and parallels the behavior of Mandarin de in (7b) and (8b) respectively. (7) a. my one (Singlish; Bao 2009:340, ex. 1c) b. wo de (Mandarin; Bao 2009:342, ex. 4c) 1SG DE ‘mine (n.)’ (8) a. from Thailand one (Singlish; Bao 2009:340, ex. 1d) b. cong Taiguo lai de (Mandarin; Bao 2009:342, ex. 4d) from Thailand come DE ‘ones from Thailand’ The nominalizer function, in addition to the shared SFP function of one and de, leads Bao to conclude that one is a calque of de. According to system transfer (1), however, the transfer of a substratum feature like de should involve the importation of the entire grammatical subsystem, i.e., all functions of de. Yet an entire cluster of functions seems to be unavailable in Singlish: that of pre-modification linker. Mandarin de links pre-modifying XPs to nouns, such as AdjPs (9a) and PPs (10a), but using one in such a way is ungrammatical in Singlish (9b, 10b). Another class of pre-modifier not examined by Bao is pre-nominal relative clauses (RCs), which are available in Mandarin with de (11a), but not in Singlish with one (11b). To our knowledge, S. Lee (2023) is the first author to study pre-nominal RCs in Singlish. Here, we will examine them as part of a broader transfer of de-linked pre-modifiers in Singlish, taking a contact-centric approach. (9) a. [AdjP da] de shu (Mandarin; Bao 2009:343, ex. 5a) big DE book b. *big one book (Singlish) ‘(the) big book’ (10) a. [PP zuozi shang] de shu (Mandarin) table on DE book b. *on the table one book ‘(the) book on the table’ (11) a. [RC Mali zuo] de dangao (Mandarin) Mary make DE cake b. *Mary make one cake.
‘(the) cake that Mary made’ Bao accounts for the apparent lack of pre-modifiers in Singlish by means of an inviolable constraint that rules out *[XP-one N] sequences. However, his account offers no motivation for why early Singlish innovators, who did transfer the other functions of Mandarin de to Singlish, would have been subject to such an ad hoc constraint. Furthermore, we note that the examples in (9–11) are word-for-word equivalents of Mandarin: these are the forms that would be predicted if we assumed that Mandarin de always maps to Singlish one, as (1) seems to implicitly assume. In the following section, we show that Singlish does in fact have the same set of pre-modifiers as Mandarin, but that they do not involve one. 3 Pre-modifiers in Singlish In making the case for pre-modifiers in Singlish, we place particular focus on pre-nominal RCs, which first attracted our attention and which had escaped linguists’ attention prior to S. Lee (2023); prior work on Singlish RCs is limited, and furthermore restricted to post-nominal ones (e.g., Alsagoff and Ho 1998). The truth-conditionally equivalent minimal pair in (12) shows that both types are available in Singlish. (12) a. John eating that cake [RC Mary make yesterday]. (Singlish) b. John eating [RC Mary make yesterday] that cake. ‘John is eating that cake that Mary made yesterday.’ Pre-nominal RCs may resemble paratactic topic-comment structures which are prevalent in Singlish (Bao & Lye 2005). Thus, one analysis for the string Mary buy that cake very nice is as in (13a), where Mary buy that cake is a full clause in topic position; the comment consists of an adjectival predicate very nice with a null pro subject coindexed with cake. We assume that there is a null copula, notated Ø, in the comment. Alternatively, Mary buy that cake could be an NP modified with a pre-nominal RC (RC-NP), occupying the subject position, while very nice is an adjectival predicate, also with a null copula (13b). (13) a.
[CP/Topic Mary buy the cakek] [CP prok Ø very nice]. (Singlish) ‘Mary bought the cake; it was very good.’ b. [TP [DP/Subject Mary buy the cake] [T’ Ø very nice]]. ‘The cake that Mary bought was very nice.’ We maintain that such sentences represent a syntactic ambiguity – both structures are possible, with (13b) being the one relevant to our discussion. Importantly, the ambiguity disappears when the RC-NP is in object position (14), where there is no clear way to posit a topic-comment analysis. (14) John eating [DP [RC Mary make] the cake]. (Singlish) ‘John is eating the cake that Mary made.’ A string like Mary buy the cake can also be an answer to a question targeting a nominal. For instance, per the principle of question-answer congruence (Rooth 1992), we would expect a question like (15Q) to be answered with a nominal, and indeed, (15A) can only be interpreted as an RC-NP and not a clause; the latter interpretation would be infelicitous. (15) Q: John eating what? (Singlish) ‘What is John eating?’ A: Mary buy the chocolate cake. ‘The chocolate cake that Mary bought.’ #‘Mary bought that chocolate cake.’ RC-NPs can also be coordinated with other DPs. In (16), we see a proper name DP Ah Huat coordinated with an RC-NP, yesterday we meet that man. Assuming that coordination targets constituents of the same grammatical category, yesterday we meet that man must also be a DP. (16) [DP Ah Huat] and [DP yesterday we meet that man] both going to Punggol. (Singlish) ‘Both [Ah Huat] and [that man we met yesterday] are going to Punggol.’ Finally, RC-NPs can serve as the complement of a preposition like with (17). Assuming that prepositions require a nominal complement, this fact also supports the nominal status of RC-NPs. (17) Ah Huat going to Punggol [PP [P with] [DP yesterday we meet that man]].
(Singlish) ‘Ah Huat is going to Punggol with that man we met yesterday.’ 4 The Chinese origin of pre-nominal RCs We have thus far made the case that Singlish has pre-nominal RCs. Here, we argue that they arose from Chinese influence, in line with S. Lee (2023), providing further arguments in support of this view. Among the languages proposed to be substrates of Singlish, only Chinese varieties have pre-nominal RCs; Malay RCs, like English ones, are post-nominal (Alsagoff & Ho 1998). Indeed, juxtaposing the Mandarin and Singlish RCs in (18) already shows a striking surface parallelism between the two. In (18), we have not aligned that and de, but we make the case in Section 6 that that – or, more precisely, D[+DEF] elements including that – are the reflex of de in Singlish. (18) John eat Mary make that cake. (Singlish) Yuehan chi Mali zuo de dangao. (Mandarin) ‘John is eating the cake that Mary made.’ Singlish verbal inflection is generally optional, including in post-nominal RCs (19a), but in pre-nominal RCs, inflected verbs are degraded (19b). (19) a. John eating that cake Mary buy/bought yesterday. (Singlish) b. John eating Mary yesterday buy/*bought that cake. ‘John is eating that cake Mary bought yesterday.’ Early Singlish speakers would have received mixed input consisting of English clauses containing verbal inflection and Chinese clauses with no verbal inflection. This mixed input would have established variability in Singlish finite verb-forms: speakers could select more English-like (inflected) or more Chinese-like (uninflected) variants, possibly reflecting different orientations towards either lect in the early creole continuum. As described by N. Lee (2024) for the case of Baba Malay, variation in nascent creoles can be understood as a reflection of speakers’ alignment towards either the lexifier or the substrate language.
Indeed, work tracing the development of Singlish alongside the evolution of societal patterns of language shift and education shows a split between allegiance towards English and English education on the one hand, and towards Chinese and Chinese education on the other (Bao 2020). Thus, the general variability of modern Singlish verbal marking can be understood as a vestige of the mixed input received by its early innovators. Our observations show, however, that pre-nominal RCs are not part of the envelope of variation for verbal inflection, suggesting that pre-nominal RCs in the input were invariant with respect to verbal inflection, i.e., they never contained inflected verbs, which is expected if pre-nominal RCs originated in Chinese. Another piece of evidence for a Chinese origin has to do with the distinction between gapped and gapless RCs. As indicated by their name, gapped RCs involve a syntactic gap in the position semantically occupied by the relativized noun. Thus, in (20a) below, nühai ‘girl’ is the agent of tan ‘play’, a role associated with the subject position of the RC; correspondingly, there is a gap in this position. On the other hand, gapless RCs do not have such a gap (20b). (20) a. [RC ____k tan gangqin] de nühaik (Mandarin) play piano DE girl ‘the girl who is playing piano’ b. [RC Lulu tan gangqin] de shengyin Lulu play piano DE sound ‘the sound of Lulu playing the piano’ In Singlish, only pre-nominal RCs can be gapless, as in Mandarin (21a); post-nominal ones cannot (21b), whether or not the relativizer that or the relative pronoun which is present. (21) a. [RC Lulu play piano] the sound (Singlish) b. *the sound [RC (which/that) Lulu play piano] ‘the sound of Lulu playing the piano’ Pre-nominal RCs in Singlish therefore pattern just like Chinese ones. This close parallelism, combined with the converging evidence for Chinese influence on Singlish’s grammatical system, paints what we believe to be a compelling picture of Chinese transfer of pre-nominal RCs.
Next, we turn to the other pre-modifiers that de links to nominals in Mandarin. 5 Other Pre-modifiers As alluded to in Section 2, the ‘b’ examples in (9–11) show word-for-word equivalents of Mandarin rendered with English lexis, with de mapped to one. The ungrammaticality of such forms only shows that one does not have a pre-modifier function, as Bao correctly states – but we should not conclude from this that Singlish does not have those pre-modifiers. In this section, we show examples of pre-modified DPs in Singlish representing the full range of pre-modifier categories available in Mandarin. Crucially, we will see that one is not used as a linker in any of them, but that the DPs are definite in each case. For instance, in (22a), the demonstrative that intervenes between the adjective, which is reduplicated, and the noun. A bare adjective without reduplication is degraded (22b), which may be due to prosodic or phonological weight constraints. Equally acceptable as (22a), however, is (22c), where the pre-modifier is red color – a nominal rather than an AdjP, but one that nonetheless closely mirrors Mandarin (22d). (22) a. [AdjP red red] that/*a/*Ø book (Singlish) ‘that very red book’ b. ?[AdjP red] that book (Singlish) Intended: ‘that red book’ c. [NP red color] that book (Singlish) ‘that red book’ d. [NP hong se] de shu (Mandarin) red colour DE book ‘red book’ In (22c), the NP pre-modifier has an attributive function, indicating a property of the modified DP. In Mandarin, de can also link possessor DPs to possessed DPs. In this case, a demonstrative (DEM) and classifier (CL) can intervene between de and the possessed DP, as schematized in (23a) and illustrated in (23b). Singlish allows exactly this ordering, except that no classifier is used (23c); this structure is not possible in English.
Instead, as shown by the free translation of (23c), simultaneous use of demonstratives and possessors requires an of-possessive construction in English, where the possessor is in a following PP headed by of. (23) a. Mandarin possessive with demonstrative: possessor + de + DEM + CL + DP b. Yuehan de na ben shu (Mandarin) John DE DEM CL book ‘that book of John’s’ c. John(’s) that book (Singlish; ungrammatical in English) ‘that book of John’s’ In (23c), we see that the genitive -’s is optional in Singlish. Its availability might create the impression that -’s is the reflex of de, considering their functional and positional similarities. This idea is untenable, however, given that -’s cannot occur with any of the other pre-modifier types. Next, we turn to PP modifiers. In (24), we see that Mandarin (24a) and Singlish (24b) allow PP adjuncts to precede nominals, unlike English (24c), which only allows PPs to the right. (24) a. [PP zuozi shang] de shu (Mandarin) table on DE book ‘book on the table’ b. on the table *(the) book (Singlish) ‘the book on the table’ c. the book on the table (English) 6 Transfer of de pre-modifiers: Pivot-matching Given the observation in the previous section that pre-modified Singlish DPs must have a D[+DEF] head, we propose that this D[+DEF] head itself is the exponent of de, giving the following pattern: (25) XP [D[+DEF] NP] This accords with S. Lee’s (2023) proposal that Mandarin de, realized as [tə] with an unaspirated voiceless coronal stop, was conflated with English the, which is often realized as [də] in Singlish, rendering it phonetically very similar to [tə] (see Baptista 2020 for the role of congruence in creole formation). From there, early Singlish innovators generalized this exponent to the broader category of definite determiners, giving pre-modifier de a different exponent in Singlish (i.e., D[+DEF]) from that of phrase-final de (i.e., one).
We account for this “bifurcation” in the transfer of de through the pivot-matching approach to pattern replication advanced by Matras and Sakel (2007) and Matras (2010). Under this approach, multilingual speakers in contact settings can avail themselves of resources from their full repertoires. These resources include patterns (sentence structures or constructions) as well as material (word-forms or morphemes) from both or all the varieties that they speak. In interaction, speakers pursue a particular communicative goal, for which they scan through their entire repertoire to identify a construction that would best serve their needs. They then identify pivotal features of the construction, which are mapped to appropriate material (morphemes and their combinatorial rules); the chosen material is then used to populate the structure, giving rise to an innovative construction that is communicatively effective. Using this model, we can now understand how innovators of Singlish might have transferred Chinese pre-modifiers to Singlish through pivot-matching. A Mandarin-English bilingual would be able to make use of both word-forms and structures from either of these languages for communication. In interaction, they would then select a structure from their repertoire that is appropriate for their communicative needs. Pre-modifiers are useful for identifying specific referents in discourse, as the presence of a pre-modifier narrows the domain of potential referents. Indeed, in Singlish, which makes available both Chinese-style pre-modifiers and English-style modifiers (which may precede or follow the DP depending on the type of modifier), Chinese-style pre-modifiers can often be used in situations where a speaker wants to emphasize or foreground the referent that they are speaking of.
Thus, a Mandarin-English bilingual wishing to identify a specific referent may select from their linguistic repertoire the abstract structure in (26), which schematizes the pre-modifier construction: (26) XP de DP Next, lexical material would be used to populate this structure. The XP and DP would, of course, be filled by the relevant lexical material for the communicative situation, selected mainly from among English word-forms. This bias for English word-forms comes from the fact that, in the colonial setting, English was the socio-politically favored language. Indeed, Singlish is argued to have evolved in the context of shift towards English, particularly in the school system, where English education began in 1819 (Bao 2020), a context in which prescriptive norms would have favored lexical material of English origin. Given this normative pressure to select English word-forms, speakers would then have had to identify an item suitable to expound de, which has no clear counterpart in English. While we showed that de might be likened to the genitive -’s on the basis of possessive structures, -’s is an unlikely candidate given that it is semantically limited to possession and not clearly appropriate for PP, AdjP, or RC pre-modifiers. Instead, we propose, speakers mapped de to English the not just for their phonetic similarity noted above, but also because the encodes definiteness in English, reminiscent of the specificity communicated by pre-modifiers. We now return to the fact that, unlike pre-modifier de, SFP de and nominalizer de were transferred to Singlish as one. That de has ended up with two Singlish exponents is not predicted by system transfer; in our view, such an abstract principle does not account for the perspective of speakers, whose agency is central to approaches like pivot-matching.
While system transfer has been fruitful in accounting for many key aspects of Singlish grammar (Bao 2009, 2012), Bao himself recognizes that this concept is underspecified with respect to how a grammatical subsystem is defined (Bao & Lye 2005:271–272). From the speaker’s perspective, the transfer of a grammatical system is not an autonomous process but is crucially contingent on speakers recognizing that a cluster of features is related. So, while all structures involving de could be said to form one subsystem based on the common morpheme de, it is not necessarily the case that speakers treat them as such, and thus, it is not a given that speakers will transfer them as a package. In the present case study, de as a pre-modifier must not have been seen as part of the same system as de when used as an SFP or nominalizer. This is supported by their syntactic distribution: while pre-modifier de is always followed by a noun, SFP de and nominalizer de are phrase-final. As such, they were not transferred as a single system but as two separate ones, each assigned a different morphological exponent. The value of pivot-matching, as with other approaches that conceive of communication in multilingual contexts as a set of problems for speakers to solve creatively and agentively (e.g., Muysken 2013, Backus et al. 2011), is that it centers the speaker’s perspective, allowing it to account for observations that are otherwise unpredicted by more abstract theoretical principles like system transfer. The replacement of Mandarin de with English the (S. Lee 2023) creatively resolves the lack of an English counterpart to the Mandarin morpheme, and does not straightforwardly fit into the picture painted by system transfer, which implicitly predicts a one-to-one mapping between transferred lexical items in the source and recipient languages. The status of language users as the agents of change is foregrounded by the types of user-centered frameworks that we advocate here.
Author Contributions WL conceptualized the project and collected Singlish data through elicitation with language consultants. MB and WL jointly developed the presented analysis. WL took the lead in writing the manuscript with input and comments from MB. References Backus, Ad, A. Seza Doğruöz, and Bernd Heine. 2011. Salient stages in contact-induced grammatical change: Evidence from synchronic vs. diachronic contact situations. Language Sciences 33.5:738–752. Bao, Zhiming, and Lye Hui Min. 2005. Systemic transfer, topic prominence, and the bare conditional in Singapore English. Journal of Pidgin and Creole Languages 20.2:269–291. Bao, Zhiming. 2009. ‘One’ in Singapore English. Studies in Language 33.2:338–365. Bao, Zhiming. 2010. A usage-based approach to substratum transfer: The case of four unproductive features in Singapore English. Language 86.4:792–820. Bao, Zhiming. 2012. Substratum transfer targets grammatical system. Journal of Linguistics 48.2:479–482. Bao, Zhiming. 2020. The origins of Singapore English. In Multilingual global cities, ed. Peter Siemund and Jakob R.E. Leimgruber, 19–37. London: Routledge. Baptista, Marlyse. 2020. Competition, selection, and the role of congruence in Creole genesis and development. Language 96.1:160–199. Kim, Chonghyuck, Chang Qizhong, and Leslie Lee. 2009. Number marking in Colloquial Singapore English. Journal of Cognitive Science 10.2:149–172. Kim, Chonghyuck. 2012. The role of copula deletion on the emergence of a new grammatical feature. 영어영문학연구 [Journal of English Language and Literature Research] 38.4:223–254. Lee, Nala. 2024. The early Baba Malay continuum. Journal of Pidgin and Creole Languages 39.2:339–364. Lee, Si Kai. 2023. At the Edge of Contact: Insights from the Left Peripheries of Singlish. University of Connecticut doctoral dissertation. Matras, Yaron, and Jeannette Sakel. 2007. Investigating the mechanisms of pattern replication in language convergence. Studies in Language 31.4:829–865. Matras, Yaron. 2010. 
Contact, convergence, and typology. In The handbook of language contact, ed. Raymond Hickey, 66–85. Hoboken, NJ: John Wiley & Sons. Muysken, Pieter. 2013. Language contact outcomes as the result of bilingual optimization strategies. Bilingualism: Language and Cognition 16.4:709–730. Rooth, Mats. 1992. A theory of focus interpretation. Natural Language Semantics 1.1:75–116. Sato, Yosuke, and Chonghyuck Kim. 2012. Radical pro drop and the role of syntactic agreement in Colloquial Singapore English. Lingua 122.8:858–873. Sato, Yosuke. 2013. Wh-questions in Colloquial Singapore English: Adaptive traits from vernacular Malay and typological congruence. Journal of Pidgin and Creole Languages 28.2:299–322. SERIAL VERB CONSTRUCTIONS IN LIO Grace B. WIVELL¹, Khanin CHAIPHET², Michelle MAYRO¹, Maria Magdalena RINI³, Maria Floriani SERLIN³, & Thomas CONWAY⁴ ¹Stony Brook University, ²Chulalongkorn University, ³Universitas Flores, ⁴University of California Los Angeles Abstract This study explores serial verb constructions (SVCs) in Lio, a Malayo-Polynesian language of Indonesia. It builds on Conway et al. (2022), who drew on data from the PARADISEC Lio collection (Yanti 2019), by adding new data from more recent fieldwork, also available on PARADISEC. The new data largely confirm the generalizations of Conway et al. (2022): Lio SVCs are monoclausal, contain no linking element, and involve no argument relation between the verbs (neither verb is an argument of the other). Each SVC expresses one semantic event, and SVCs tend to be contiguous. Lio also exhibits SVC subtypes that align with other typological findings (Aikhenvald 2006, Baird 2008), including synonymic, benefactive, causative, and manner SVCs, as well as a mixed type. These subtypes have different properties with regard to word order and contiguity, which raises questions about their status as SVCs and has implications for the syntactic structure of each subtype.
Keywords: serial verb constructions (SVCs), Austronesian language, syntax ISO 639-3 codes: ljl 1 Background on Lio Lio is a Malayo-Polynesian1 language spoken on the island of Flores, Indonesia. It belongs to a dialect chain referred to as the Central Flores (CF) Languages, which consists of the following languages from west to east: Rongga, Ngadha, Kéo, Nage, Nga'o, Ende, and Lio (Elias 2018). The Lio-speaking region of Flores is located in the east-central part of the island, as shown in Figure 1. Figure 1: Map of the languages of Flores (Edwards and UBB 2018) Although Lio has approximately 105,000 speakers (Eberhard et al. 2024), little has been published about the language. Existing works include a German-Lio dictionary (Arndt 1993), two Indonesian-language theses (Levi 1978 and Mbete 2020), a brief Indonesian-language grammar funded by the Center for Language Development and Cultivation (Pusat Pembinaan dan Pengembangan Bahasa) (Sawardo et al. 1987), and an English-language thesis describing Lio’s phonology and the historical relations of the Central Flores languages (Elias 2018). There are also several proceedings articles on Lio: a case study on consonant acquisition (Wivell 2022), an analysis of stop voicing contrast (Miatto and Wivell 2022), a study of stress perception (Mayro and Wivell 2025), and a description of kinship terms (Fluit et al. 2023). Lio has subject-verb-object (SVO) word order. It is a highly isolating language with almost no morphology and high levels of polysemy and homophony. Words in Lio are mostly disyllabic (Elias 2018) and have penultimate stress unless the penultimate syllable contains a schwa (Elias 2018, Mayro 2025). 2 Data Data for this study come from two collections of naturalistic Lio speech on PARADISEC (Yanti 2019, Wivell 2026). In total, thirteen texts were considered, including monologues, conversations, and data collected using instruments (coded as MON, CON, and INSTR, respectively, in the examples throughout this paper).
This included data from fifteen speakers, six women and nine men, ranging in age from 20 to 67 years. In total, over 150 examples of serial verb constructions were considered. 3 Serial Verb Constructions Lio exhibits a rich set of serial verb constructions (hereafter SVCs). Here, following (2016) and Aikhenvald (2006), an SVC is defined as a monoclausal construction consisting of two or more verbs that describes one semantic event, with no linking element and no argument-predicate relationship between the verbs. An example can be seen in (1). (1) Kai mèra minu kopi no’o bèku 3SG sit drink coffee with civet ‘I drink coffee with the civet.’ (Yanti 2019, PARADISEC; MON006, 20; VJS M20, Wolowaru) While most SVCs contain two verbs in the verb complex, SVCs of up to four verbs have been observed in Lio. 3.1 SVC Diagnostics Many of the diagnostics for serial verbs rely heavily on verbal morphology; given that Lio is highly isolating, these diagnostics are insufficient for Lio. Closely following Baird (2008), who worked on Kéo, another Central Flores language with minimal morphology, we used the following diagnostics to determine whether two (or more) verbs could be considered an SVC: (1) describing a single semantic event, (2) single subject, (3) single negator, (4) single TAM marker, and (5) single intonational phrase. That an SVC in Lio describes a single semantic event is illustrated in (2) to (4). In (2), Lio speakers understand that the monkey, which is the sole subject of the constituent in which the SVC under discussion is found, did not cut the civet’s tail, then take the tail, and then run off, but rather did all three as one action. Similarly, in (3), it is not that the king gave something and then the letter came out, but that the king, the sole subject, sent out a letter. Finally, in (4), the single subject, “I,” will not first “go” and then “take,” but rather go to get the tail, as one event.
(2) Eko aku ro’a nde'dhe sawe, ro’a gète nde'dhe tail 1SG monkey take already, monkey cut take paru sawe run finished ‘My tail was taken by the monkey; the monkey ran off (with it).’ (Yanti 2019, PARADISEC; MON006, 120–121; VJS M20, Wolowaru) (3) Kai raja gheta Holo Tana pati wa’u sura 3SG king inland Holo Tana give come.out letter ‘He, the king of Holo Tana, sent out a letter.’ (Yanti 2019, PARADISEC; MON003, 33; GBR, M64, Wolowaru) (4) Aku mbana nde’dhe eko kau 1SG go take tail 2SG ‘I will go to get your tail.’ (Yanti 2019, PARADISEC; MON006, 122; VJS, M20, Wolowaru) That an SVC has only a single negator is illustrated in (5) and (6). There is only one negator in Lio, iwa, which precedes the constituent it negates; with SVCs, the negator appears before the first verb of the verb complex and does not appear again within the verb complex. (5) Seba ghea iwa mbana dai uma=ke gheta Seba there NEG go watch.over garden=3SG.Poss inland ‘Seba did not go to watch over his garden.’ (Yanti 2019, PARADISEC; MON018, 161; PLL, F36, Wolowaru) (6) Èbe iwa uku aja 3PL NEG measure teach ‘They did not involve themselves (with us).’ (Yanti 2019, PARADISEC; MON002, 125; KKR, M66, Wolowaru) That an SVC has only a single Tense-Aspect-Mood (TAM) marker is illustrated in (7) and (8). TAM markers in Lio come either before or after the verb, depending on the marker. When appearing with an SVC, the TAM marker occurs only once, either before or after the verb complex, as expected for that particular marker. Thus, nèbu, which occurs before verbs, occurs once before the SVC in (7), and rowa, which occurs after verbs, occurs once after the SVC in (8).
(7) Nèbu kopa ripe mbola ina=ke gha ina
    PROG cover shut.tight woven.reed.basket this=nya here this
    ‘It is inside this basket.’ (Yanti 2019, PARADISEC; MON006, 110; VJS, M20, Wolowaru)

(8) Ha kita ale gae rowa ana lo’o tei dowa
    yes 1PL.INCL ask look.for PERF child small see PERF
    ‘Yes, (the one) we looked for, we have already found the child.’ (Wivell 2026, PARADISEC; INSTR020, 53; RVO, M53, Detusoko)

Whether SVCs form a single intonational phrase was determined by measuring pauses. For this, we used only a subset of the data from the larger study, as not all recordings were suitable for phonetic analysis: recordings from three speakers (M20, F36, M66) containing a total of 26 potential SVCs. All SVCs were phrase-medial, to avoid prosodic effects associated with phrase edges. SVCs were analyzed in Praat (Boersma and Weenink 2025) and annotated following ToBI conventions (Beckman and Elam 1997). In general, there was no measurable break between the verbs in the verb complex, as shown in Figure 2.

Figure 2: Prototypical 2-Word SVC with No Pauses: mbana sogo ‘go pay debt’

3.2 Types of SVCs

Up to six types of SVCs can be identified in Lio: motion, causative, manner, synonymic, benefactive, and mixed-type. Whether there are five or six depends largely on how the benefactive is analyzed, which is discussed at length below.

Examples (9) and (10) illustrate motion SVCs. Motion SVCs in Lio were always found to be contiguous; that is, no lexical items intervened between the two constituents of the verb complex. Almost without exception, the first constituent of a motion SVC is a ‘go’ verb, usually mbana. The order of the constituents is fixed: *gae mbana is ungrammatical.

(9) Geso jara ina da ghele mbana saka no jara mesa.
    move horse this to inland go ride with horse always
    ‘He moved his horse up there, and always rode by on his horse.’ (Yanti 2019, PARADISEC; MON018, 94; PLL, F36, Wolowaru)

(10) Mai=se kita mbana gae Ero.
     come.here=POS.IMP 1PL.EXCL go look.for Ero
     ‘Alright, let’s go look for Ero.’ (Wivell 2026, PARADISEC; MON033, 44; KLL, M46, Detusoko)

Examples (11) and (12) illustrate causative SVCs. Causative SVCs are also contiguous. Almost without exception, the first constituent of a causative SVC is pati ‘give’, and the order of the constituents is fixed.

(11) Pati mata do harimau ina
     give die already tiger this
     ‘(He) kills the tiger.’ (Yanti 2019, PARADISEC; MON003, 83; GBR, M64, Wolowaru)

(12) Pati ka aji ana ina, are lala mesa…
     give eat younger.sibling child this, rice mushy all
     ‘(I) would feed my younger siblings, all the congee…’ (Wivell 2026, PARADISEC; MON031, 21; RVO, M53, Detusoko)

Examples (13) to (15) illustrate manner SVCs. In a manner SVC, one constituent of the verb complex denotes the action and the other the manner in which that action is done. Thus, in (13), the water touches the wife and children in a splashing manner, and in (14) and (15), the bringing is done by carrying the item on the shoulders. Manner SVCs are also contiguous. However, unlike motion and causative SVCs, they have no single verb that consistently appears as the first constituent. Furthermore, the order of the two constituents is flexible, as (14) and (15) demonstrate. These SVCs were uttered by the same speaker in the same monologue, and Lio speakers perceive no difference in meaning between the two orders.
(13) Ae gèsi gèna lèka fai ana ngèta lei.sawe
     water splash touch LOC wife child like all
     ‘The water soaked his wife and children.’ (Yanti 2019, PARADISEC; MON006, 83; VJS, M20, Wolowaru)

(14) Kami règu tu uta.jèpa da ghawa
     1PL.EXCL carry.on.shoulders bring chayote to seaward
     ‘We bring the chayote down (to the market).’ (Yanti 2019, PARADISEC; MON018, 36; PLL, F36, Wolowaru)

(15) Èbe tu règu pusu lèma da gha kami…
     3PL bring carry.on.shoulders heart tongue to here 1PL.EXCL
     ‘They brought the heart and tongue (of the pig) here to us.’ (Yanti 2019, PARADISEC; MON018, 300; PLL, F36, Wolowaru)

Examples (16) and (17) illustrate synonymic SVCs. As the name suggests, the constituents of the verb complex in a synonymic SVC are synonyms. Synonymic SVCs are contiguous, and while it has been suggested that synonymic SVCs in Keo have flexible word order (Baird 2008), this has not been observed in Lio; in fact, a form such as *nasu nemo is ungrammatical for the speakers with whom we worked.

(16) Ho’o èmba aku nemo nasu lèka tu’a ana
     yes where 1SG cook cook LOC in-law child
     ‘Yes, where I cook with my daughter-in-law’ (Yanti 2019, PARADISEC; CON002, 33; DRF, F67, Wolowaru)

(17) Iwa latu eo mbi’a nggera.
     NEG EXIST REL break break
     ‘There has not been (anything) that has broken (us apart).’ (Yanti 2019, PARADISEC; MON002, 17; KKR, M66, Wolowaru)

Example (18) shows what may be a benefactive SVC. Benefactive SVCs are the only non-contiguous SVCs, and the second constituent of the verb complex must be pati ‘give’.

(18) Ana ata=ke holo rua taku ae pati aku
     child people=nya CLFperson two carry water give 1SG
     ‘Two children (of other people) carry the water for me.’ (Yanti 2019, PARADISEC; CON002, 139; DRF, F67, Wolowaru)

Analyzing the structure in (18) as an SVC follows Baird’s (2002) analysis of Keo SVCs; a Keo benefactive SVC is reproduced in (19). Baird analyzes these structures as obligatorily non-contiguous.
(19) ja’o tendo jawa ti’i iné
     1SG plant corn give mother
     ‘I planted corn for mother.’
     *ja’o tendo ti’i jawa iné
     1SG plant give corn mother (Baird 2002)

However, Levi (1978) analyzed pati as meaning both ‘give’ (as a verb) and ‘for’ (as a preposition), which is in keeping with our own analysis of Lio prepositions. The relevant data are reproduced in (20).

(20) Sura ina pati kai
     letter this for 3SG
     ‘This letter is for him.’ (Levi 1978:99)

Bearing in mind the high levels of polysemy and homophony in Lio, we must ask whether the pati in (18), reproduced in (21), is a verb at all, or whether it is instead a preposition and should be glossed accordingly.

(21) Ana ata=ke holo rua taku ae pati aku
     child people=nya CLFperson two carry water for 1SG
     ‘Two children (of other people) carry the water for me.’ (Yanti 2019, PARADISEC; CON002, 139; DRF, F67, Wolowaru)

Alternatively, pati can in fact be analyzed as a verb even in sentences like (20), where it is the main verb and is thus unambiguous. The ‘give’-to-benefactive cline is common across languages (Heine & Kuteva 2002:321); the question is merely whether the Lio benefactive involves a prepositional phrase or an SVC. Further work on the status these lexical items have for speakers is needed before any firm conclusion can be drawn, and for now we simply present both analyses.

The final type of SVC observed in the data is the mixed-type SVC, as in (22). In many ways, (22) resembles a manner SVC: Mother came in the manner of running and crying. However, ke nangi resembles a synonymic SVC, so this construction has been labeled a mixed-type SVC. Because mixed-type SVCs must have more than two constituents in the verb complex, and such long verb complexes are rare in Lio, it is unclear which SVC types are most likely to combine.
(22) Mama paru ra’i ke nangi mèmè ghea ina
     mother run come cry cry.mournfully LOC there this
     ‘Mother came running and crying (in a mournful way) here.’ (Yanti 2019, PARADISEC; MON018, 36; PLL, F36, Wolowaru)

The types of SVCs found in Lio and their properties are summarized in Table 1.

Table 1: SVC types and their qualities summarized
TYPE           ‘FIXED’ ORDER   CONTIGUOUS
Motion         Y               Y
Causative      Y               Y
Synonymic      Y*              Y
Manner         N               Y
Benefactive    Y**             N**
Mixed-Type     ?               Y
*Fixed at least in the data considered thus far
**If, in fact, the benefactive is an SVC at all

4 Implications for Lio Syntax

Our diagnostics in Section 3.1 support the claim that Lio SVCs are monoclausal. The constructions contain a sequence of multiple verbs with a single subject and, when present, a single negator and TAM marker. Collins (1993, 1997) further proposes that SVCs share internal verbal arguments and functional categories within the clause, and that the direct object of the first verb controls PRO, which takes the second verb as its complement, following Baker (1989). As a result, SVCs are analyzed as control structures, in which the second verb is incorporated into the first at LF. Furthermore, Collins (2002) assumes that SVCs consist of nested vP shells and that lexical verbs undergo head movement to form a complex head with their associated functional head v, such that the first verb in an SVC always precedes any object. The tree in (23a) illustrates the pre-movement structure, while the tree in (23b) shows the multiple verb movement required to derive a different word order in Ewe.

(23) a. [Tree: pre-movement structure]
     b. [Tree: multiple verb movement]

The ability of the single head v to license multiple lexical verbs is captured by the parameter for SVCs in (24) (Collins 2002:9).

(24) Serialization parameter
     The light verb v can license multiple Vs.
For the syntactic analysis of Lio SVCs, we assume this parameter in constructing a nested vP-shell structure to accommodate multiple verbs, and we adopt multiple verb movement to derive the different word orders found in Lio SVCs. The types of Lio SVCs are summarized in (25).²

(25) a. Motion SVCs
     b. Causative SVCs
     c. Synonymic SVCs
     d. Manner SVCs
     e. Benefactive SVCs

Among these SVC types, manner SVCs are the only ones that allow flexible ordering, and they will therefore be represented with different verb movement orders.³ As shown in (26), the verb règu ‘carry.on.shoulders’ can precede the verb tu ‘bring’, and vice versa. In (26a), règu, the head of the upper VP, undergoes head movement first to form a complex head with its associated functional head v, while tu moves in the second step. In (26b), the order of movement is reversed.

(26) Flexible word order: Manner SVCs
     a. Kami règu tu uta.jèpa da ghawa
        1PL.EXCL carry.on.shoulders bring chayote to seaward
        ‘We bring the chayote down (to the market).’
     b. Èbe tu règu pusu lèma da gha kami
        3PL bring carry.on.shoulders heart tongue to here 1PL.EXCL
        ‘They brought the heart and tongue (of the pig) to us here.’

By contrast, motion, causative, synonymic, and benefactive SVCs exhibit fixed word order and consequently fixed verb movement. These SVC types are represented syntactically in (27a) to (27d), respectively.

(27) Fixed word order
     a. Motion SVC
        mbana gae Ero
        go look.for Ero
        ‘(We) go look for Ero.’
     b. Causative SVC
        Pati ka aji ana ina
        give eat younger.sibling child this
        ‘(I) would feed my younger siblings…’
     c. Synonymic SVC
        Ho’o èmba aku nemo nasu lèka tu’a ana
        yes where 1SG cook cook LOC in-law child
        ‘Yes, where I cook with my daughter-in-law’
     d. Benefactive SVC
        Ana ata=ke holo rua taku ae pati aku
        child people=nya CLFperson two carry water give 1SG
        ‘Two children (of other people) carry the water for me.’

Recall that benefactive SVCs are non-contiguous.
Therefore, only the first verb undergoes head movement, while the second does not, leaving the object ae ‘water’ between the two verbs. Recall also that the second verb, pati ‘give’, is polysemous and can alternatively be analyzed as a preposition meaning ‘for’. This suggests that Lio may be undergoing a diachronic shift toward reanalyzing certain serial verbs as adpositions, paralleling changes documented in other isolating languages (Heine & Kuteva 2002:149–155). Based on the Lio data we have collected so far, pati is compatible with both analyses. If it is treated as a preposition, however, the construction is no longer an SVC; instead, pati aku is analyzed as a PP modifying the main verb of the sentence. This alternative analysis is illustrated in (28).

(28) Benefactive construction (pati as P)
     Ana ata=ke holo rua taku ae pati aku
     child people=nya CLFperson two carry water for 1SG
     ‘Two children (of other people) carry the water for me.’

This section has developed a syntactic representation of Lio SVCs as monoclausal structures, following our diagnostics and the theoretical analyses of Collins (1993, 1997, 2002) and Baker (1989). Multiple verbs within a clause share arguments and functional categories, and verb order is derived via successive head movement in a nested vP-shell architecture. Among the SVC types, only manner SVCs allow flexible verb ordering; the others exhibit fixed order. Benefactive SVCs further differ in being non-contiguous, and the verb used in this construction is polysemous, functioning either as a verb or as a preposition. Under the latter analysis, the construction is not an SVC but a single verb with an adjunct PP.

5 Serial Verb Constructions and Parallelism

Like other Central Flores languages, Lio exhibits substantial parallelism in ritual speech. The verbs found in these parallel structures often also occur in SVCs. An example of parallelism is given in (29).
(29) Wiki sai wiwi mèdi sai lèma
     take POS.IMP mouth, take POS.IMP tongue
     ‘Watch your mouth, look after your tongue.’ (Be careful about the way you speak so as not to offend people.) (Rini, Master’s thesis, Universitas Negeri Malang, 2014)

The two verbs in this expression can be combined into a synonymic SVC, wiki mèdi ‘take’. Baird (2008) proposes for Keo SVCs that, while in the present day either constituent can be used in a mono-verbal clause, at an earlier stage of the language only one was used in everyday speech, while the other was used only in ritual speech. It is unclear whether this is also the case for Lio.

6 Conclusion

This is the first in-depth study of serial verb constructions in Lio, an understudied Austronesian language. It shows that Lio has a range of SVC subtypes, including motion, causative, manner, synonymic, and benefactive SVCs, as well as a mixed type. These subtypes have different properties, which have implications for the syntactic structure of each subtype and may even cast doubt on their status as SVCs. This is especially true of the benefactive subtype, which may be undergoing a diachronic shift in which the second constituent of the verb complex is reanalyzed as an adposition. There are also notable differences between Lio and related languages, namely a seemingly fixed word order in synonymic SVCs, rather than the flexible order of constituents observed in related languages such as Keo, and even in another SVC subtype in Lio, the manner SVC. Research on Lio is, of course, ongoing, and with more data we might in fact observe synonymic SVCs with flexible word order. Additional data may also clarify which subtypes of SVCs occur within mixed-type constructions, and thus allow a clearer syntactic analysis of these forms.
Nonetheless, as a whole, Lio fits neatly into the typology of SVCs observed in other Austronesian languages, especially in its similarities to Keo, a related language in the largely understudied Central Flores dialect chain.

Author Contributions

All authors contributed to the intellectual development of this work. The initial corpus of SVCs was developed by Conway and expanded by Wivell. Wivell worked with speakers to gain additional insights into SVCs in Lio and organized the overall project. Mayro did the phonetic analysis, and Chaiphet the formal syntactic analysis. Rini and Serlin provided invaluable speaker insights, and our comments on SVCs and parallelism build on previous work by Rini. Wivell and Chaiphet wrote the manuscript, with input from all other authors.

Acknowledgements

There are many groups and individuals without whom this work would not have been possible. We would like to thank the Fulbright Student Research Program in Indonesia, the Center for Inclusive Education at Stony Brook (specifically the Turner Academic Year Research Grant), and the Linguistics Department at Stony Brook for funding the dissertation fieldwork of the first author, which provided the basis for much of this research. We would also like to thank the many undergraduates who assisted with the transcription and translation of data collected during recent fieldwork, as well as Fransiskus (Faris) X. Mbete and Fortunata (Fortun) Dala, who, alongside the first author, took part in the 2019 training program that produced much of the 2019 data considered in this study. Finally, we would like to thank all the Lio speakers who shared their time and knowledge of their language with us, without whom this work would not be possible.

References

Aikhenvald, A. Y. 2006. Serial verb constructions in typological perspective. In A. Y. Aikhenvald & R. M. W. Dixon (Eds.), Serial verb constructions: A cross-linguistic typology. Oxford University Press.
Baird, Louise. 2002. A grammar of Kéo: An Austronesian language of East Nusantara. Doctoral dissertation, Australian National University.
Baird, L. 2008. Motion serialisation in Keo. In Serial verb constructions in Austronesian and Papuan languages. Pacific Linguistics.
Baker, M. C. 1989. Object sharing and projection in serial verb constructions. Linguistic Inquiry 20.3:513–553.
Beckman, M. E., & Elam, G. A. 1997. Guidelines for ToBI labelling, version 3.0. Ohio State University.
Boersma, Paul & Weenink, David. 2025. Praat: doing phonetics by computer [Computer program]. Version 6.4.47, retrieved 7 November 2025 from https://praat.org
Collins, C. 1993. Topics in Ewe syntax. Doctoral dissertation, Massachusetts Institute of Technology.
Collins, C. 1997. Argument sharing in serial verb constructions. Linguistic Inquiry 28.3:461–497.
Collins, C. 2002. Multiple verb movement in ǂHoan. Linguistic Inquiry 33.1:1–29.
Conway, Thomas, Grace B. Wivell, and Fransiskus X. Mbete. 2023. Serial verb constructions in Lio [Poster]. 96th Annual Meeting of the Linguistic Society of America, Denver, Colorado, USA.
Dixon, R. M. W. 2006. Serial verb constructions: Conspectus and coda. In A. Y. Aikhenvald & R. M. W. Dixon (Eds.), Serial verb constructions: A cross-linguistic typology, pp. 338–350. Oxford University Press.
Eberhard, D., Simons, G., & Fennig, C. (Eds.). 2020. Ethnologue: Languages of the World, 23rd edition. Dallas, Texas: SIL International. Online version: http://www.ethnologue.com
Elias, Alexander. 2018. Lio and the Central Flores languages. Unpublished Master’s thesis, Faculty of Humanities, Leiden University.
Fluit, Arwen Canino, Grace B. Wivell, and Fransiskus X. Mbete. 2023. Lio kinship terminology. Proceedings of the Linguistic Society of America (PLSA), Vol. 8, No. 1.
Haspelmath, M. 2016. The serial verb construction: Comparative concept and cross-linguistic generalization. Language and Linguistics.
Heine, B., & Kuteva, T. 2002. World lexicon of grammaticalization. Cambridge University Press.
Levi, Ferdinandus. 1978. A preliminary study of Lionese. B.A. thesis, Sanata Dharma Teacher’s Training Institute, Yogyakarta.
Mayro, Michelle. 2024. An acoustic investigation of stress in Lio [Talk]. 16th International Conference on Austronesian Linguistics, Manila, Philippines, June 2024.
Mayro, Michelle, and Grace B. Wivell. 2025. Perception of stress in Lio. In Mark Alves (ed.), Papers from the 33rd Annual Meeting of JSEALS, pp. 162–169. JSEALS Special Publication No. 13. University of Hawai‘i Press.
Miatto, Veronica, and Grace B. Wivell. 2022. Stop voicing contrast in Lio. Proceedings of Meetings on Acoustics (POMA), Volume 45, Issue 1, 29.
Sawardo, P., Wakidi, Tarno, Y. Lita, and S. Kusharyanto. 1987. Struktur Bahasa Lio (Structure of the Lio Language). Pusat Pembinaan dan Pengembangan Bahasa, Departemen Pendidikan dan Kebudayaan (The Center for Language Development and Cultivation, Department of Education and Culture).
Wivell, Grace B. 2022. Consonant acquisition in Lio. Proceedings of the Linguistic Society of America (PLSA), Volume 7, No. 1.
Wivell, Grace B. (collector). 2026. Recordings of various texts in Lio, an Austronesian language spoken in Flores, Indonesia. Collection LJL2026 at catalog.paradisec.org.au [Open Access]. (Collection forthcoming.)
Yanti (collector). 2019. Recordings of various texts in Lio, a language spoken in Ende, East Nusa Tenggara, Indonesia. Collection LJL2019 at catalog.paradisec.org.au [Open Access]. https://dx.doi.org/10.26278/5f35639ea0bcc

MODERN STANDARD TAI-AHOM (MSTA): PHONOLOGY AND ORTHOGRAPHY OF A SEMI-ENGINEERED LANGUAGE BASED ON OLD AHOM

Madhurjya Burhagohain
The English and Foreign Languages University, Hyderabad, India
8ieldarch@gmail.com

Abstract

Ahom, a member of the Shanic languages belonging to the Southwestern Tai group, has generally been classified as a “dead” language with no continuous native speech community since at least the early modern period (Morey 2005). There have been several personal and community-driven efforts to revive the language, with differing degrees of success. This article introduces the phonology and orthography of the author’s personal conlanging effort. Modern Standard Tai-Ahom (hereafter MSTA) is a semi-engineered language created to revitalize and standardize Old Ahom for modern use. Based on comparative evidence from closely related Tai languages, especially Aiton, MSTA establishes a streamlined phonemic inventory and introduces new orthographic reforms and conventions for consistency and pedagogical utility. This article presents the justification for its phonological decisions and discusses its orthographic innovations.

Keywords: Tai, Ahom, standardization, conlang, tones, orthography
ISO 639-3 codes: aho

1 Introduction

Ahom (an exonym) is a technically “dead” language that was used as a medium of communication for several centuries by the Tai people who accompanied the Mao-Shan prince Sue-Ka-Pha to the valley of Assam. For various reasons, chiefly administrative, since their subjects were largely non-Tai, they switched to Assamese and discontinued the use of Ahom (Buragohain 2018). However, the tradition of writing manuscripts continued until the official end of Ahom rule in 1826.
There have been several efforts by Ahom revivalists and traditionalists to “revive” the long-lost language and the partially lost cultural heritage attached to it. Significant ones include the establishment of the Central Tai Academy in 1964, followed by the Eastern Tai Literary Society in 1981. The government of Assam made Tai-Ahom an elective subject in schools in Ahom-dominated areas in the early 1990s. In the early 2000s, the Institute of Tai Studies and Research in Moran began institutionalized teaching of the language through certificate and diploma courses, which also attracted significant scholarly attention to the study of manuscripts written in Ahom. More recently, the Ahom script was included in Unicode in 2015. However, none of these efforts has provided a proper solution to two issues, namely (1) the lack of tones and (2) orthographic ambiguity, both of which pose significant challenges for teaching. Moreover, whether modern Tai Ahom usage is a continuation of the old Ahom language remains highly disputed (Terwiel 1989).

As for my own involvement, I have not historically been part of any Ahom revival movement. However, as someone interested in the Tai languages of Assam, I aim to help address the issues above for heritage learners. This work should not be treated as a full historical reconstruction of the Ahom tone system, but rather as a tentative solution. I argue that learning Ahom with a constructed tone system is a better choice than learning it without tones, for two reasons: (1) it reintroduces the tonal nature consistent with Tai typology, making the language sound indigenous to its family, whereas atonal Ahom sounds structurally dissimilar to its living sister languages; and (2) continuing to learn atonal Ahom may habituate learners to a non-Tai-sounding prosody.
This would make the later introduction and acquisition of tones challenging, even in the hypothetical situation where the tone system is fully reconstructed. MSTA is designed as a neo-Ahom vernacular (as Terwiel and Wichasin 1992 termed it), with the linguistic treatment this requires. I have tried to keep it close to what old Ahom phonology could plausibly have been. The first task was devising a working tonal system for this semi-constructed language. Whether it can be considered a (semi-)constructed language at all is a matter for further discussion; after some reflection, I have chosen to use the term ‘semi-engineered’ instead of ‘conlang’ for the time being. For the writing system, I had to make several major and minor revisions to the existing old Ahom abugida to make it suitable for contemporary use.

1.1 Ahom as a Southwestern Tai language of the Shan group

Ahom has been classified as a member of the Southwestern branch of the Tai language family. Southwestern Tai languages are further subdivided into two categories: the P languages and the PH languages. Chamberlain classifies Ahom, Shan, Tai Dam, and others as ‘P languages’. He further suggests that this binary classification makes it possible to reconstruct the tone systems of both groups. The P languages have *ABCD 123-4, whereas the PH languages fall into two further subgroups, *BCD 123-4 and *BCD 1-23-4. He also proposes generalizations about the P languages, for example that no P language has an A1-23-4 or A1-234 split (Chamberlain 1975). However, there are counterexamples, such as Tai Nüa (A1-23-4) and Tai Phake (A1-234), which later led Chamberlain to revise his classification. It remains unclear what led Chamberlain to group Ahom with the *ABCD 123-4 languages. Chamberlain’s classification is given in Figure 1 for reference.
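The split notation used above (A1-23-4, *ABCD 123-4, and so on) can be made explicit with a small parser. The sketch below assumes the usual Gedney-style tone-box reading, in which the letters A–D name proto-tone columns, the digits 1–4 name classes of initial consonants, digits written together share one reflex tone, and a hyphen marks a split. It is an illustration only: `parse_split` and `split_count` are hypothetical helper names, and cross-column mergers such as B4=C123 are deliberately not modeled.

```python
import re

# Assumed reading of the notation: letters A-D name proto-tone columns, digits 1-4
# name initial classes, digits written together share one tone, '-' marks a split.
SPLIT = re.compile(r"^\*?\s*([A-D]+)\s*([1-4](?:-?[1-4])*)$")


def parse_split(notation: str) -> dict[str, list[set[int]]]:
    """Expand e.g. 'A1-23-4' or '*ABCD 123-4' into the tone groups of each column."""
    match = SPLIT.match(notation.strip())
    if match is None:
        raise ValueError(f"unsupported notation (mergers like 'B4=C123' are not modeled): {notation!r}")
    columns, pattern = match.groups()
    parts = pattern.split("-")
    digits = sorted(int(d) for part in parts for d in part)
    if digits != [1, 2, 3, 4]:
        raise ValueError(f"each initial class 1-4 must appear exactly once: {notation!r}")
    groups = [{int(d) for d in part} for part in parts]
    # Copy the sets so that different columns do not share mutable objects.
    return {column: [set(g) for g in groups] for column in columns}


def split_count(notation: str) -> int:
    """Total number of tone groups over all columns (cross-column mergers ignored)."""
    return sum(len(groups) for groups in parse_split(notation).values())
```

For instance, `parse_split("A1-23-4")` returns the groups {1}, {2, 3}, and {4} for column A, the three-way split cited above for Tai Nüa, while `split_count("*ABCD 123-4")` gives 8, two groups in each of the four columns. Because the parser rejects anything with `=`, statements of merger have to be recorded separately rather than being silently misread as splits.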
Figure 1: Chamberlain’s Classification of P and PH Languages

Furthermore, Edmondson (2008) subcategorizes the Shanic languages in particular into two major subgroups according to consonant development and tone splitting. These subgroups are (a) the Northern Shan system, with six tones: 1-4, A23=B4, B123, C123-C4, and (b) the Southern Shan system, with five tones: A123-4, B123, B4=C123, and C4 (Diller et al. 2008). A well-known Tai variety with the Northern Shan system is Tai Nüa. We can further assign Ahom and the other living Tai languages of Assam and Arunachal Pradesh to the Northern subgroup rather than the Southern one.

2 Phonology

This section introduces the phonology of MSTA. It outlines the consonant and vowel inventories and briefly discusses the phonological features relevant to the proposed orthographic system.

2.1 Consonant Inventory

MSTA has the pulmonic consonant inventory presented in Table 1.

Table 1: MSTA Consonant Inventory
                     Bilabial    Alveolar    Postalveolar  Palatal  Velar      Glottal
Plosive              p (& ph) b  t (& th) d                c        k (& kh)   ʔ
Nasal                m           n                         ɲ        ŋ
Tap or Flap                      ɾ
Fricative                        s           ʒ                                 h
Approximant                                                j
Lateral approximant              l

Note that the alveolar tap/flap /ɾ/ may vary freely with the post-alveolar approximant /ɹ/, sporadically and depending on the speaker. Similarly, /ʒ/ and /j/ can be in free variation. /kh/ may also be realized phonetically as [x] by some speakers.

2.1.1 Reflexes of Proto-Southwestern Tai in MSTA

The reflexes of some important proto-Tai consonants (Li 1977) in our neo-Ahom are as follows:
*f, *v > ph
*hw, *w > w (also in Aiton)
*ɲ, *hɲ, *j, *ʔj > j ~ ʒ (also in Aiton)
*hr > r, h; *r > r
*ʔb > b, m; *ʔd > d, n (also in Aiton)

2.2 Vowel Inventory

Most Shan languages have a ten-vowel system; the exception is Aiton, whose seven vowels form the smallest vowel inventory among the Shanic languages.
According to the most recent reconstruction of the Ahom sound system (Gogoi et al. 2020), Ahom had the seven-vowel system shown in Table 2.

Table 2: Reconstructed Tai Ahom Vowel Inventory
Front   Central-Back   Back
/i/     /ɯ/            /u/
/e/                    /o/
        /a/ & /a:/

This turns out to be quite similar to the reconstruction of Morey (2008), differing in only two vowels. For MSTA, I propose the vowel system in Table 3.1, which builds on this system but does not underrepresent the phonemic contrast among mid vowels.

Table 3.1: MSTA Vowel Inventory
Front   Central-Back   Back
/i/     /ɯ/            /u/
/e/                    /o/
/ɛ/                    /ɔ/
        /a/ & /a:/

Note that the mid vowels /e/ and /o/ do occur in open syllables in Aiton. Raising of mid vowels is a feature of Aiton, and I apply this assumption to MSTA. Table 3.2 shows the vowel mergers in Aiton and Ahom.

Table 3.2: Vowel Mergers in Aiton & Ahom
Shan, Phake   Aiton, Ahom
i, e          i
u, o          u
ə             ɯ

It is interesting to observe that the “Tai” textbooks published for teaching purposes by some educational institutions in Assam, such as Dibrugarh University, use the mid vowel /ɤ~ə/, which is not present in Aiton and possibly not in Old Ahom. This may be because most of the teachers (and the compilers of these textbooks) were Tai Phake. As Tai Phake has this mid vowel in its phonological inventory, they may have superimposed the Tai Phake vowel system onto Ahom for teaching purposes. However, since the merger of mid and high vowels is a known phenomenon in Ahom and Aiton, there is no need for this vowel in either closed or open syllables in Ahom.

2.2.1 Diphthongs

The only diphthong in Ahom is [aɯ] (Gogoi et al. 2020).

2.3 Tones

Tai languages, like many other Southeast and East Asian languages, are tonal. The tones of Ahom, unfortunately, have been lost and are nearly impossible to reconstruct, largely because there are no recordings of native speakers. At present, nothing is known about the tones of Ahom (Morey 2005).
In my view, even if well-documented tonal categories were available, it would still be impossible to reconstruct the original tone values that Ahom might have had. Therefore, seeking a feasible, tentative solution is the most practical approach.

“The attempt to create a viable Ahom vernacular, one that is not too far removed from the original Ahom, would have a chance to succeed if the leaders of this movement were to select from among the many Tai languages the one that shares the largest number of characteristics with Ahom as it is found in old documents…. It would not come as a surprise to us if our most knowledgeable readers were to form the opinion that the Aiton language must be regarded as the closest living relative of old Ahom… Those who wish to recreate Ahom will have to make a decision as to which particular set of tones to adopt. In addition, we would recommend that such tones be added to the old writing system.” (Terwiel and Wichasin 1992)

The first and only attempt to reconstruct the Ahom tones is Weidert’s German-language article “Die Rekonstruktion des Tonsystems des Ahom” (The Reconstruction of the Ahom Tone System) (Weidert 1997). By surveying the tones of individual syllables in Khamti and Central Thai, he attempted to guess the tones Ahom might have had for those syllables. However, Morey has criticized this reconstruction as invalid, noting that Weidert fails to take into account the tonal systems of the existing neighboring sister languages in Assam. Moreover, Weidert’s methodology gives equal weight to the three languages he considered, one of which is Standard Thai, even though Standard Thai is not a Shanic language and separated from Ahom long ago (Diller et al. 2008).
Morey (2005) further indicates what future attempts to reconstruct the original Ahom tone system might involve:
“a) Exploring foreign sources, particularly from China, which have data about the Ahom. It is likely that Chinese missions visited Assam over the centuries, and if they did, they may have collected some linguistic information about Ahom. Any recording of the Ahom language in Chinese would encode information about the Ahom tones.
b) Comparing the tonal systems of the existing Tai languages of Assam and trying to explain how they came about. Existing tone systems in Assam are quite different from those in, for example, Shan state of Burma. This may be due to the influence of the Ahom.”

At present, some Baiyi glossaries from the Ming and Qing periods are available in the public domain. A preliminary observation gives the impression that the script used for writing these glossaries, in particular the Ming-era Baiyiguan Yiyu, resembles the script style used by the Ahom in Assam. However, we should also keep in mind that the Ahom migrated to the Brahmaputra valley long before the Baiyiguan Yiyu glossaries were compiled in the Ming period. The second point, that the tone systems of the Shanic languages in Assam are quite different from those in the Shan State of Burma, deserves serious consideration.

2.3.1 The Aiton tone system as a rootstock for Ahom
A knowledgeable reader, observing the phonological, lexical, and morphosyntactic similarities, might advocate directly adopting the contemporary Aiton tone system for Ahom syllables. According to the latest available literature, Aiton currently has 3 (+2 merged) tones in its tonal system. For MSTA, however, I have chosen not to adopt the contemporary Aiton tone system directly, for the following reasons:
a) Owing to historical mergers, Aiton currently has a three-tone system, which may be the smallest tonal inventory among the Shanic languages.
This may burden new learners of MSTA with allotones and a large number of homophones, which would hinder rather than facilitate teaching.
b) Adopting the contemporary Aiton tones as they are would impose Aiton-specific historical tone mergers that may never have taken place in Ahom before it ceased to be spoken as an everyday language.
c) The contemporary Aiton tone system most probably reflects mergers that postdate Ahom. A better aim is therefore to set up a tone system for MSTA using broader comparative data from the other Shanic languages, especially the existing Tai languages of Assam, taking Aiton as the base and primary point of reference.

Figure 2 shows the present situation of tones in Aiton villages. The tone values can be roughly inferred from this diagram as follows: 1 [33]/[44], 2 [553], 3 [31], 4 [35], 5 [31ʔ].
Figure 2: Tone Box of Contemporary Aiton (Morey 2005)

We can assume that all these Northern Shan languages once had a tripartite A1-23-4 split, and that A23 and A4 later merged in all the Assamese Tai varieties except Khamyang. All of them also have a bipartite B123-4 split. One clear innovation in Aiton is the merger A1 = B123.

Let us now look at the earlier tone diagrams for Aiton in turn, beginning with the diagram proposed by Banchob in 1977 (Morey 2005). The tone values can be roughly inferred from this diagram as follows: 1 [44]/[55], 2 [553], 3 [13], 4 [31], 5 [11], 6 [35], 7 [553] (~tone 2).
Figure 3: Aiton Tone Diagram of Banchob

Next is Diller's (1992) tone diagram for Aiton, which Morey terms Aiton2 (Morey 2005), shown in Figure 4. The tone values can be inferred as 1 [11], 2 [31], 3 [553], 4 [13], 5 [31ʔ], 6 [35]. This tone diagram has been characterized as more conservative (Morey 2005).
Figure 4: Aiton Tone Diagram of Diller (1992)

Drawing on all the Aiton tone diagrams available in the literature, we can further attempt a rough reconstruction of the proto-Aiton tone diagram, as in Figure 5. If the Aiton have lived in the Assam valley since the 1500s (Morey 2008), it is justifiable to use, or build upon, the proto-Aiton (pre-merger) model for MSTA.

Figure 5: Proposed Tone Diagram for Proto-Aiton
A              B              C          D
1 [13]/[35]    3 [44]/[55]    4 [53ʔ]    3 [44]/[55]
2 [553]        6 [11]         5 [31ʔ]    6 [11]

2.3.2 Towards a Tone System for MSTA
Gedney (1972) devised a framework of twenty boxes for the systematic study of tones in Tai languages, as shown in Figure 6.

Figure 6: Gedney's Tone Boxes
                           A    B    C    DL    DS
1  voiceless friction
2  voiceless unaspirated
3  preglottalized
4  voiced

I am not attempting a complete historical reconstruction of the Ahom tone system or its precise tone shapes. The available evidence from manuscripts, inscriptions, numismatics, and oral traditions does not provide sufficient tonal information for such a reconstruction. Comparative data from related Shanic languages, especially Aiton, are used here only as points of reference, not as the basis for reconstructing actual historical contours. The primary aim of this personal project is to develop a practical and internally consistent tonal framework for MSTA that can be applied for pedagogical and revitalization purposes. The present work should therefore be understood simply as a semi-engineered, functional model rather than as an attempt to establish the historical tonal categories and tone shapes of Ahom. It is worth bearing in mind that the contemporary tone boxes of all the Northern and Western Shanic languages may derive from a single tone box in the 13th century, when Sue-Ka-Pha entered the valley of Assam (Morey 2005).
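Gedney's framework in Figure 6 crosses five syllable-type columns with four initial-consonant rows. The minimal Python sketch below enumerates the twenty boxes and expands the split notation used in this paper (e.g., A1-23-4 versus A1-234) into groups of merged boxes. The names `COLUMNS`, `ROWS`, `gedney_boxes`, and `parse_split` are hypothetical and do not belong to any existing tool. The sketch only manipulates box labels and says nothing about tone values.

```python
from itertools import product

# Gedney's (1972) tone boxes: five syllable-type columns crossed with four
# initial-consonant rows give 5 x 4 = 20 boxes, labelled e.g. "A1" or "DL4".
COLUMNS = ["A", "B", "C", "DL", "DS"]
ROWS = {
    1: "voiceless friction",
    2: "voiceless unaspirated",
    3: "preglottalized",
    4: "voiced",
}


def gedney_boxes() -> list[str]:
    """Return all 20 box labels, column by column (A1, A2, ..., DS4)."""
    return [f"{col}{row}" for col, row in product(COLUMNS, ROWS)]


def parse_split(column: str, pattern: str) -> list[list[str]]:
    """Expand a split pattern such as '1-23-4' for one column into groups of
    merged boxes. Each hyphen-separated group of row digits is one tone."""
    return [[f"{column}{digit}" for digit in group] for group in pattern.split("-")]


if __name__ == "__main__":
    print(len(gedney_boxes()))          # 20
    print(parse_split("A", "1-23-4"))   # [['A1'], ['A2', 'A3'], ['A4']]
    print(parse_split("A", "1-234"))    # [['A1'], ['A2', 'A3', 'A4']]
```

Under this representation, the tripartite A1-23-4 split yields three A-column groups and the bipartite A1-234 split yields two. This is the difference between the Khamyang-type and Aiton-type patterns discussed above.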
The evidence further suggests that this 13th-century tone box may have been similar to that of contemporary Tai Nüa (see Luo 1999; Chantanaroj 2007).
Figure 7: 13th Century Tones in Proto-Assam/Dehong/Northern Burma Tai as proposed by Morey (2005)

This prototypal tone box (Figure 7) later developed into two types of tone boxes (around the 1600s), named ‘Assam Tai A’ and ‘Assam Tai B’ (Morey 2005). Assam Tai B has a three-way split of the A tone, A1-23-4, which is also seen in modern Tai Nüa. This type of three-way split is still found in Khamyang of Assam. Considering that the original Ahom population migrated from somewhere around Meng Mao, a three-way split of A is justifiable. Considering the proto-Aiton situation, a bipartite A1-234 split is also justifiable.

Tone column A: Khamti, Phake, Aiton, and Shan Ni (Indawgyi) have a two-way A1-234 split, while Khamyang and Myitkyina Shan Ni both have a three-way A1-23-4 split (Marseille 2019). Southern Shan, however, has an A123-4 split (Lengtai 2009). This suggests that A1-23-4 may have been the original split pattern and that A1-234 in the modern varieties is an innovation. I have chosen an Aiton-like bipartite split for MSTA, in line with the proposed proto-Aiton system (see Figure 5).

Consistency in the B and C tone columns: On careful observation, columns B and C (voiceless rows) are consistent across all the Shanic languages: B1, B2, and B3 are merged, and likewise C1, C2, and C3. In almost all cases, B123 = DL123. In the Assamese Tai varieties, viz. Khamyang, Aiton, and Phake, there is a consistent correspondence B123 = DS123 = DL123, i.e., B123 = D123. The merger of A1 with B123 in Aiton can be seen as an innovation. Similarly, for C, all the languages have C1, C2, and C3 merged in a regular pattern.
Some languages, such as Khamti of the Chindwin and Aiton, have been reported to have merged C123 and C4 (Marseille 2019). However, the merger of C1, C2, and C3 is consistent across all the Shanic languages. Following these observations, I have merged B1, B2, and B3 (B123) for MSTA, and likewise C1, C2, and C3 (C123).

Dead Syllables: As mentioned previously, all the Assamese Tai varieties have B123 = D123. We may assume that Ahom had a similar merger configuration. It is highly unlikely to have had a DS123/DL123 split by the time it ceased to be a spoken language. Hence, the configuration B123 = DS123 = DL123 has been assigned for MSTA.

Building on these observations, the tone box for MSTA has a two-way split in the A column (A1-234), giving two different tones. Moreover, B123 = D123, and C4 = D4. Figure 8 shows the outline tone box deduced from these observations. Merged boxes share the same tone number.

Figure 8: An Outline Tone Box for MSTA
     A    B    C    D
1    1    2    3    2
2    5    2    3    2
3    5    2    3    2
4    5    6    4    4

2.3.3 Deciding the Tone Shapes
Having set up the merger and split patterns for MSTA, the next task is to choose the tone shapes. These were decided as follows:

Tone 1: Careful observation of box A1 in most Shanic languages shows a rising inflection of various qualities. A1 has been reconstructed as ‘rising’ in Morey's (2005) proto-Assam/Dehong/Northern Burma Tai, as well as in its two daughter systems, ‘Assam Tai A’ and ‘Assam Tai B’. This may be a retention from the parent system. From this, it is reasonable to assume that Ahom likewise had a rising tone of some quality for A1 syllables. Hence, in MSTA as well, a rising tone of the Aiton shape [35] has been assigned to box A1.

Tone 2: For the B column, we can observe mostly level, non-contour tones for B123 in various Shanic languages.
For example, Shan Ni has [33] (Marseille 2019), Tai Nüa and Southern Shan have [11]/[21] (Lengtai 2009), Phake has [55], and Aiton has [33]/[44] (Morey 2005). For B123 in MSTA, I have chosen the high level tone [44] from Aiton's two variants, [33] and [44]. As B123 = D123, D123 also has the tone shape [44].

Tone 3: The C tone is believed to have originally had creaky phonation in Proto-Tai (Li 1977). Its reflexes in modern Shanic languages are of various creaky types. For example, Phake and Khamyang have [21ʔ], and Aiton has [31]/[31ʔ] (Morey 2005). Shan Ni has [21ʔ] and [41ʔ] for C123 and C4 respectively (Marseille 2019), while Southern Shan has [332] and [41] for C123 and C4 respectively (Lengtai 2009). From these, we can infer that Ahom may have had a similar reflex of the C tone. For pedagogical ease, I have chosen a mid-falling tone without a glottalized ending. I therefore chose the Aiton [31] for C123 in MSTA because of its simpler tone shape.

Tone 4: For C4 = D4, Banchob's (1977) Aiton has [31] and Phake has [42], while Morey (2005) describes it as [21]. Shan Ni realizes it as a high-falling glottalized [41ʔ] (Marseille 2019), and Southern Shan as a high-falling [41] (Lengtai 2009). As I have merged C4 with D4, a non-glottalized high-falling tone [41] has been assigned to both. This matches most existing varieties while keeping it more audibly distinct from the tone of A234.

Tone 5: A4 has different tone shapes in Khamyang and Shan Ni (Myitkyina). A majority of the Shan group of languages currently have a high initial pitch for this box, possibly long and level or slightly falling towards the end. In Aiton and Phake, A234 has a high falling tone of the shape [553]. Dehong Dai in Luo's (1999) dictionary is reported to have a [53] tone for A4. The Dehong Dai Language coursebook by Cai et al.
(2014) assigns a rising-falling [453] contour tone. Shan Ni has a high-falling [442] tone for the same box (Marseille 2019). However, the (unspecified) variety in Meng's (2007) Dai-Chinese dictionary assigns [55] to A4, as in Southern Shan. To remain faithful to the living Tai languages of Assam, A4 in MSTA has been assigned a high level-falling tone [553]. As A23 has been merged with A4, A234 as a whole has the [553] contour.

Tone 6: B4 has the tone shape [11], like the B4 tone in the proposed proto-Aiton tone diagram in Figure 5.

In summary, the tone values assigned are: A1 = [35], A234 = [553], B123 = D123 = [44], B4 = [11], C123 = [31], and C4 = D4 = [41]. Figure 9.1 presents the six-tone system of MSTA and the tone shape in each box. Figure 9.2 additionally provides example words for each box.

Figure 9.1: Proposed Tone Diagram for MSTA
     A           B          C          DL         DS
1    T1 [35]     T2 [44]    T3 [31]    T2 [44]    T2 [44]
2    T5 [553]    T2 [44]    T3 [31]    T2 [44]    T2 [44]
3    T5 [553]    T2 [44]    T3 [31]    T2 [44]    T2 [44]
4    T5 [553]    T6 [11]    T4 [41]    T4 [41]    T4 [41]

Figure 9.2: MSTA Tone Box with Test Words
Row 1
A1, T1 [35] [˧˥]: rUV [ru˧˥] ‘ear’; xaV [khaː˧˥] ‘leg’; maV [maː˧˥] ‘dog’
B1, T2 [44] [˦˦]: xNq [khaj˦˦] ‘egg’; fa [phaː˦˦] ‘to split’; m] [maɯ˦˦] ‘new’
C1, T3 [31] [˧˩]: xwqS [khaw˧˩] ‘rice’; siuwqS [sɯ˧˩] ‘shirt’; xNqS [khaj˧˩] ‘fever’
DS1, T2 [44] [˦˦]: sukq [suk˺˦˦] ‘ripe, cooked’; mtq [mat˺˦˦] ‘flea’; sipq [sip˺˦˦] ‘ten’
DL1, T2 [44] [˦˦]: x[tq [khaːt˺˦˦] ‘to be broken’; fukq [phuk˺˦˦] ‘to tie’; sokq [sɔk˺˦˦] ‘elbow’
Row 2
A2, T5 [553] [˥˥˧]: pIC [pi˥˥˧] ‘year’; taC [taː˥˥˧] ‘eye’; kinqC [kin˥˥˧] ‘to eat’
B2, T2 [44]: pa [paː˦˦] ‘forest’; kNq [kaj˦˦] ‘chicken’; pwq [paw˦˦] ‘to blow’
C2, T3 [31]: paS [paː˧˩] ‘aunt’; tumqS [tum˧˩] ‘to boil’; kwqS [kaw˧˩] ‘nine’
DS2, T2 [44]: tpq [tap˺˦˦] ‘liver’; cipq [cip˺˦˦] ‘to hurt’; pitq [pit˺˦˦] ‘duck’
DL2, T2 [44]: potq [pɔt˺˦˦] ‘lung’; :ptq [pɛt˺˦˦] ‘eight’; p[kq [paːk˺˦˦] ‘mouth’
Row 3
A3, T5 [553]: binqC [bin˥˥˧] ‘to fly’; d[wqC [daːw˥˥˧] ‘star’; b]C [baɯ˥˥˧] ‘leaf’
B3, T2 [44]: ba [baː˦˦] ‘shoulder’; b[wq [baːw˦˦] ‘young man’; da [daː˦˦] ‘to strike, scold’
C3, T3 [31]: b[nqS [baːn˧˩] ‘village’; baS [baː˧˩] ‘crazy’; AoNqS [ʔɔi˧˩] ‘sugarcane’
DS3, T2 [44]: dipq [dip˺˦˦] ‘alive; raw’; Aukq [ʔuk˺˦˦] ‘chest’; bitq [bit˺˦˦] ‘fish hook’
DL3, T2 [44]: :dtq [dɛt˺˦˦] ‘sunshine’; A[pq [ʔaːp˺˦˦] ‘to bathe’; blokq [blɔk˺˦˦] ‘flower’
Row 4
A4, T5 [553]: miuwqC [mɯ˥˥˧] ‘hand’; naC [naː˥˥˧] ‘paddy field’; x[NqC [khaːj˥˥˧] ‘buffalo’
B4, T6 [11] [˩˩]: pIL [pi˩˩] ‘older sibling’; pUwqL [po˩˩] ‘father’; n\qL [naŋ˩˩] ‘to sit’
C4, T4 [41] [˦˩]: nmqZ [nam˦˩] ‘water’; mNqZ [maj˦˩] ‘wood’; maZ [maː˦˩] ‘horse’
DS4, T4 [41] [˦˩]: nukqZ [nuk˺˦˩] ‘bird’; mtqZ [mat˺˦˩] ‘to tie up’; lkqZ [lak˺˦˩] ‘to steal’
DL4, T4 [41] [˦˩]: mitqZ [mit˺˦˩] ‘knife’; lukqZ [luk˺˦˩] ‘child’; liutqZ [lɯt˺˦˩] ‘blood’

3 Orthographic reforms and additions
All the old Lik-Tai script variants show spelling ambiguity because many vowel phonemes are underrepresented. A careful survey of some of the earliest written documents in Lik-Tai shows similar underrepresentation and orthographic inadequacies (Tangsiriwattanakul & Burhagohain 2024, 2025). Such documents include the Baiyiguan Yiyu and the Mengmao Yiyu of the Ming and Qing periods (the 16th to 18th centuries). Tai manuscripts found in Assam likewise do not systematically mark tones, and Ahom is no exception (Morey 2005). There have been both individual and collective efforts to reform these orthographies. Among the most successful examples are Standard Shan (Tai Long), Khamti Shan, Shan-Ni (Tai Laing), and Tai Le.

3.1 Reforms in Vowel Graphemes
As evident from the various Lik-Tai manuscripts, the contrasts /a/ vs /aː/, /i/ vs /e/ vs /ɛ/, /u/ vs /o/, and /ɯ/ vs /ɤ/ were underrepresented in both closed and open syllables. They were often written with one single vowel grapheme ( for /i/, /e/, and /ɛ/) or a combination of vowel graphemes (<ē> + for both /o/ and /ɔ/), creating ambiguity for readers.

3.1.1 Differentiating /a/ & /aː/ in closed syllables
In all the Shanic languages, the only true phonetic vowel length distinction is between /a/ and /aː/ (Morey 2005). In all old Lik-Tai manuscripts, this length distinction is critically underrepresented in closed syllables. In MSTA, I propose a shortened version of the existing grapheme “ a ” <ā>, which looks like “ [ ”, to represent /aː/ in closed syllables.
3.1.2 Differentiating /i/, /e/, & /ɛ/ in closed syllables
The unreformed Ahom script does not distinguish /i/, /e/, and /ɛ/ in closed syllables. All three vowels are written uniformly with just the vowel grapheme . This makes it difficult for learners and users to distinguish and read such syllables correctly. A simple strategy has been adopted to represent these underrepresented vowels in the reformed Ahom script without introducing any newly innovated vowel grapheme. For /e/, I use the vowel grapheme <ē>. Doubling it, i.e., <ēē>, gives a grapheme for /ɛ/. This solution should look familiar to anyone who knows the corresponding vowel grapheme in Tai Dam, Thai, Tai Lü, and Lao. Table 4 illustrates the representation of the vowels /aː/, /e/, and /ɛ/ in closed syllables, as discussed above:

Table 4: Usage examples of newly added/reformed vowel graphemes for closed syllables
Vowel    Old Ahom    Reformed (MSTA)    IPA      Gloss
/aː/     n\q         n[\q               naːŋA    lady; female
/e/      mitq        emtq               metDS    grain; droplet; seed
/ɛ/      pitq        :ptq               pɛtDL    number ‘8’

Note that the word for “seed” should technically be /mitDS/ in Ahom. /metDS/ is used here for demonstration only.

3.1.3 Differentiating /o/ vs /ɔ/, & /e/ vs /ɛ/ in open syllables
The combination of the vowel graphemes <ē> and <ā> (= <ō>) was used to represent both the low-mid vowel /ɔ/ and the high-mid vowel /o/ in open syllables. To resolve this, I use the vowel combination <ū> + for the high-mid vowel /o/ in open syllables, for example, pUwq /po/ ‘father.’ The original <ē> + <ā> combination has been kept to represent the low-mid vowel /ɔ/, for example, eka /kɔ/ ‘also.’ This way of keeping a clear distinction was especially inspired by the revised Khamti script (see Kusalananda and Namnaeu 2013).
Similarly, for the /e/ vs /ɛ/ distinction in open syllables, <ē> and <ē> + have been assigned respectively, for example, ek /ke/ versus ek] /kɛ/. These small reforms make the reformed script more phonetic, rather than strictly bound to fossilized etymological spellings.

3.2 Reforms in Consonant Graphemes
The original consonant graphemes have been kept unmodified for transcribing native Ahom words. Provision has been made for several additional glyphs to transcribe words from the following sources:
- neighboring Tibeto-Burman languages (especially Burmese)
- Indo-Aryan languages (such as Assamese, Pali, and Sanskrit)
- English
- occasionally, other Tai (especially Shanic) languages

Currently, only two graphemes have been added, namely T /θ/ and F /f/.

3.2.1 Reforms in final consonant forms
Two final consonant forms have been reformed: /j/ and /m/. The old Ahom orthography represents these with two special symbols, j /-aj/ and M /-am/. In MSTA, these two special symbols have been removed as redundant. The strategy for representing final /j/ was inspired by modern Tai Nüa, which uses <ñ> as a final to represent /j/. For example, the word pj /paj/ is written in MSTA as pNq, and mj /maːj/ as m[Nq. Similarly, for final /m/, the consonant grapheme itself does the job. For example, kM /kam/ is written in MSTA as kmq, and nM /naːm/ as n[mq.

3.2.2 Clusters
MSTA follows a simple representation for consonant clusters. The segment /r/ is not treated as a cluster-forming consonant in its own right. It can, however, appear as the medial consonant of a cluster, written with its usual old Ahom grapheme, “ P”. The possible clusters in Ahom may have been kl, ml, pl, bl, khr, phr, and kw (Strecker 2023). These can be represented as kl /kl/, ml /ml/, pl /pl/, bl /bl/, xP /khr/, fP /phr/, and kb /kw/.
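The cluster inventory above is small enough to check mechanically. The following is a sketch only. It assumes onsets are given as phoneme strings, with kh and ph for the aspirates, rather than in Ahom script. The function `split_cluster` is a hypothetical name. It accepts only the seven clusters listed from Strecker (2023) and splits each one into its initial and its medial (l, r, or w).

```python
from typing import Optional

# Onset clusters Ahom may have had (Strecker 2023), as listed above.
# Aspirates are written with a following "h" (kh, ph).
POSSIBLE_CLUSTERS = {"kl", "ml", "pl", "bl", "khr", "phr", "kw"}


def split_cluster(onset: str) -> Optional[tuple[str, str]]:
    """Split a listed cluster into (initial, medial), e.g. 'khr' -> ('kh', 'r').

    Returns None for single consonants (including a lone /r/) and for any
    combination that is not in the list above.
    """
    if onset not in POSSIBLE_CLUSTERS:
        return None
    # In every listed cluster, the medial (l, r, or w) is the final symbol;
    # everything before it is the initial consonant.
    return onset[:-1], onset[-1]


if __name__ == "__main__":
    for onset in ["khr", "kw", "bl", "r", "tr"]:
        print(onset, split_cluster(onset))
```

A lone /r/ returns None, which mirrors the statement above that /r/ occurs in clusters only as a medial.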
3.3 Tone markers
As mentioned before, the original Ahom orthography, like those of the other Shanic languages, did not systematically mark tones. It is therefore necessary to assign markers to the tones of the system proposed for MSTA. In the current system, tone markers are placed on the right-hand side of a syllable.

Table 5: Representing Tones in MSTA Orthography
No.  Toneme  IPA  Description          Tone Grapheme  Example with ka
1    35      ˧˥   mid-rising           V              kaV
2    11      ˩˩   extra low            L              kaL
3    31      ˧˩   mid-falling          S              kaS
4    41      ˦˩   high-falling         Z              kaZ
5    553     ˥˥˧  high level-falling   C              kaC
6    44      ˦˦   high level           none           ka

As Table 5 shows, the tone mark for A234 (Tone 5) is the visarga (း), which is used in Burmese to indicate a high/falling tone with lengthening. Southern Shan and other Shanic languages also use the same mark for a similar tone in the A column. For this reason, the same tone mark has been assigned to A234 in MSTA. The tone marks for Nos. 1, 2, and 4 in Table 5 (mid-rising, extra low, and high-falling) are taken from the old Tai Le orthography. These markers were abandoned when tone letters were introduced. These three tone markers suit the aesthetics of the Ahom script well. The marker for Tone 3 was inspired by the revised Khamti script (see Inglis 2017).

4 Discussion and Conclusion
The Kra–Dai family is the most recently recognized language family in the Indian Union. Ahom was the first Kra–Dai language to establish a major presence in the region, followed by Aiton, Phake, Khamyang, Khamti, and Turung. Uplifting Ahom from its current dormant status will therefore not only increase the overall linguistic diversity of India but also strengthen the representation of the Kra–Dai language family within the country. Revitalizing Ahom will also help restore a long-lost but important member of the Southwestern Tai group to the modern world. According to the UNESCO Ad Hoc Expert Group on Endangered Languages (2003): “Language diversity is essential to the human heritage.
Each and every language embodies the unique cultural wisdom of a people. The loss of any language is thus a loss for all humanity.”

Fishman (1996) emphasizes the impact of language loss on a culture and its people, as well as the need for language restoration and stabilization: “The most important relationship between language and culture that gets to the heart of what is lost when you lose a language is that most of the culture is in the language and is expressed in the language. Take it away from the culture, and you take away its greetings, its curses, its praises, its laws, its literature, its songs, its riddles, its proverbs, its cures, its wisdom, its prayers. The culture could not be expressed and handed on in any other way. What would be left? When you are talking about the language, most of what you are talking about is the culture. That is, you are losing all those things that essentially are the way of life, the way of thought, the way of valuing, and the human reality that you are talking about.”

MSTA, as mentioned above, is a personal conlanging effort. It is primarily intended as a tentative means of facilitating the learning of Ahom as an ancestral heritage language. This article elaborates on the strategies the author used to set up a tone system for Ahom and to reform the existing orthography. It also argues that a constructed tone system can be a better choice than learning Ahom without tones. In general, MSTA attempts to show that language revival does not always require strict preservation of ancient forms. It may also involve a degree of strategic innovation that maintains linguistic authenticity while promoting communicative and pedagogical efficiency.
This ongoing work will also address vocabulary creation and standardization, as well as the development of pedagogical resources. The aim is for MSTA to remain useful for both scholarly and community-focused goals in Ahom language revitalization in the near future. I welcome constructive criticism and suggestions from scholars and community members to help refine and improve the present work.

References
Buragohain, Dipima. 2018. Tracing the “extinctness” of Tai Ahom: Issues of language loss and death. International Journal of the Sociology of Language 2018 (254):31–56. Berlin: De Gruyter.
Cai, Rongnian, et al. 2014. 德宏傣语教程 Dehong Dai Yu Jiaocheng [Dehong Dai Language Coursebook]. Kunming: Yunnan University Press.
Chamberlain, James R. 1975. A New Look at the History and Classification of the Tai Languages. In Studies in Tai Linguistics in Honor of William J. Gedney, 49–66. Bangkok: Central Institute of English Language, Office of State Universities.
Chantanaroj, Apiradee. 2007. A Sociolinguistic Survey of Selected Tai Nüa Speech Varieties. Chiang Mai: Payap University.
Edmondson, Jerold A. 2008. Shan and Other Northern Tier Southeast Tai Languages of Myanmar and China: Themes and Variations. In Diller, Anthony V. N., Jerold A. Edmondson, and Yongxian Luo (eds.), The Tai–Kadai Languages, 184–206. London: Routledge.
Fishman, Joshua A. 1996. What Do You Lose When You Lose Your Language? In Stabilizing Indigenous Languages, ed. by Gina Cantoni, 71–81. Flagstaff: Northern Arizona University.
Gedney, William J. 1972. A Checklist to Determine the Tones in Tai Dialects. In Studies in Linguistics in Honor of George L. Trager, ed. by M. Estellie Smith, 423–437. The Hague: Mouton.
Gogoi, Poppy, Stephen Morey, and Pittayawat Pittayaporn. 2020. The Tai Ahom Sound System as Reflected by the Texts Recorded in the Bark Manuscripts. Journal of the Southeast Asian Linguistics Society 13.2:14–42.
Inglis, Douglas. 2017. Myanmar-based Khamti Shan Orthography. Journal of the Southeast Asian Linguistics Society 10.1:47–61.
Kusalananda and Namnaeu. 2013. Khaam Taet Tai Khamti-Maan [Tai Khamti-Burmese Dictionary]. Mueng Khamtilong, Kachin State, Myanmar.
Li, Fang-Kuei. 1977. A Handbook of Comparative Tai. Oceanic Linguistics Special Publications No. 15. Honolulu: The University of Hawai’i Press.
Luo, Yongxian. 1999. A Dictionary of Dehong, Southwest China. Canberra: Pacific Linguistics, Research School of Pacific and Asian Studies.
Meng, Zunxian. 2007. 傣汉词典 ᥓᥣᥛᥰ ᥖᥨᥲ ᥑᥣᥛᥰ ᥖᥭᥰ ᥑᥥᥱ Dai-Han Cidian: Tsaam To Xaam Tai‑Xe [Dai-Chinese Dictionary]. Kunming: Yunnan Nationalities Publishing House.
Morey, Stephen. 2005. The Tai Languages of Assam: A Grammar and Texts. Canberra: Pacific Linguistics.
Morey, Stephen. 2005. Tonal Change in the Tai Languages of Northeast India. Linguistics of the Tibeto-Burman Area 28.2:1–34.
Morey, Stephen. 2008. The Tai Languages of Assam. In Diller, Anthony V. N., Jerold A. Edmondson, and Yongxian Luo (eds.), The Tai–Kadai Languages, 207–253. London: Routledge.
Strecker, David. 2023. Words Reconstructed by Fang Kuei Li with Proto-Tai Consonant Clusters: An Aid to Those Involved in the Revival of Ahom. Unpublished manuscript.
Tangsiriwattanakul, Shinnakrit, and M. Burhagohain. 2024. The Phonological History of Tai Nüa: Implications from the Sino-Baiyi Manual of Translation. Journal of the Southeast Asian Linguistics Society 17.1:19–45.
Tangsiriwattanakul, Shinnakrit, and Madhurjya Buragohain. 2025. The Evolution of Tai Nüa Orthography: Evidence from the Meng Mao Manual of Translation. Oral presentation at the National Academic Conference on Thai Language Day 2025. Bangkok: Princess Maha Chakri Sirindhorn Anthropology Centre.
Terwiel, B. J. 1989. Neo-Ahom and the Parable of the Prodigal Son. Bijdragen tot de taal-, land- en volkenkunde / Journal of the Humanities and Social Sciences of Southeast Asia 145.1:125–145.
Terwiel, B. J., and Ranoo Wichasin. 1992. Tai Ahoms and Stars: Three Ritual Texts to Ward Off Danger. Ithaca, NY: Cornell Southeast Asia Program.
UNESCO Ad Hoc Expert Group on Endangered Languages. 2003. Language Vitality and Endangerment. Document submitted to the International Expert Meeting on UNESCO Programme Safeguarding of Endangered Languages. Paris: UNESCO.
Weidert, Alphons. 1979. Die Rekonstruktion des Tonsystems des Ahom [The Reconstruction of the Tone System of Ahom]. Zeitschrift der Deutschen Morgenländischen Gesellschaft 129:321–334.

A SKETCH GRAMMAR OF BLACK LOLO
Elaine R. KHARBANDA
Columbia University
elainekharbanda@gmail.com

Abstract
To the author's knowledge, this article presents the first sketch grammar of Black Lolo. It is based on the author's original fieldwork with a small number of speakers. The study provides an initial description of the language's phonology, syntax, and other core grammatical features. The analysis is explicitly preliminary, and further fieldwork and data are required to confirm and refine the descriptions proposed here. In particular, certain phonetic and phonological features, such as patterns of glottalization and breathiness, remain tentative and are identified as areas for future investigation.
Keywords: sketch grammar, phonology, syntax
ISO 639-3 codes: nty

1 Introduction
This research focuses on the language of the Black Lolo peoples, a subgroup of the Lolo ethnic group. As of 2019, the Lolo people in Vietnam made up 0.01% of the population, or just over 4,800 people (Vietnam General Statistics Office 2020). This total includes all four known subgroups of the Lolo people: the Black Lolo, Red Lolo, White Lolo, and Flowery Lolo. While prior research has been published on the languages of the Red Lolo and White Lolo peoples, Black Lolo “remain[s] to be studied” (Edmondson 2003).
This paper therefore presents the first formal documentation of the variety of Black Lolo spoken in Lũng Cú village, Tuyên Quang Province, Việt Nam.1 Lũng Cú is the northernmost village in Vietnam and lies on the border with China, specifically Yunnan Province. Black Lolo is one of Vietnam's many endangered languages and is estimated to have fewer than 1,000 remaining speakers. It is also an oral language without a writing system, so it is vital to document the language to support its preservation. As of November 2023, it was one of 37 endangered languages in Vietnam, classified as a “moderately endangered” oral language by the Vietnam Academy of Social Sciences (Vietnam Academy of Social Sciences 2023).

Black Lolo is an analytic language belonging to the Lolo-Burmese subgroup of the Tibeto-Burman language family. It likely falls under the Mondzish branch of Lolo-Burmese, proposed by Ziwo Lama (2012) and further researched by Hsiu (2014). The endonym of Black Lolo is [mu31 ⁿdʑi31 nɔ̃33], but many Black Lolo people refer to themselves using the Vietnamese term Lô Lô Đen ‘Black Lolo’. Other Loloish languages are spoken in Northern Vietnam and Southern China, many of which are also endangered and sparsely researched. The majority of Black Lolo speakers also speak Vietnamese, and many are multilingual in neighboring minority languages such as Hmong as well. Black Lolo speakers tend to use Black Lolo among themselves, while Vietnamese serves as the lingua franca with speakers of other languages.

The following sketch is based on field research and elicitation sessions conducted primarily with three members of the Black Lolo community. Only one Black Lolo individual was known to speak English, so English was used as the contact language in sessions with this informant, and most of the research is based on those sessions.
Vietnamese was used as the contact language with the other informants. The research was conducted in Lô Lô Chải, Lũng Cú commune, during a three-week field trip from November 19 to December 6, 2023.2 The primary literature consulted for this research was existing research on Tibeto-Burman languages and grammars of related Lolo-Burmese languages. To the author's knowledge, no other linguistic research has been conducted on the Black Lolo variety of the Loloish languages spoken in Northern Vietnam or Lũng Cú. Edmondson's (2003) paper “Three Tibeto-Burman Languages of Vietnam” provides a very helpful, if brief (three-page), sketch of Red Lolo phonology. However, Red Lolo differs from Black Lolo in some respects. For example, Edmondson notes, “the Flowery Lolo of Hà Giang reside in Xín Cái (Mèo Vạc) and at Lũng Cú (Đồng Văn) within 5 km of the Sino-Vietnam border. The Black Lolo live in the villages Hoáng Tri and Đức Hạnh (Bảo Lạc) not far from the Bảo Lạc District capital on the China border.” As of this research in 2023, 20 years after Edmondson's publication, the Black Lolo people live in Lũng Cú, and no Red Lolo were known to live there. Edmondson classifies Red Lolo and Black Lolo as belonging to two mutually unintelligible linguistic groups. Nevertheless, the phonological system and portions of the lexicon elicited for Black Lolo show notable similarities to those described in Edmondson's sketch of Red Lolo spoken in Mèo Vạc (Edmondson 2003). These similarities suggest a close genetic relationship between the two varieties, but they do not in themselves provide evidence regarding mutual intelligibility. Perhaps the closest and most relevant research on the Lũng Cú variety of Black Lolo is Hsiu's (2014) papers on the Mondzish branch of Lolo-Burmese languages.
The Mondzish subgroup was originally proposed by Ziwo Lama in 2012. It includes the documented languages Mandzi, Maang (Lama 2012), Mantsi (Edmondson 2003), Kathu (Wu Zili 1994), and Munji, as well as undocumented Mondzish languages such as Muangphe, Mango, and Maza. Mondzish languages are spoken mainly in Wenshan Prefecture, Yunnan Province, southwestern China, which, as mentioned, borders Lũng Cú. These languages remain underdocumented, and most existing work is limited to basic wordlists (Hsiu 2014). However, Black Lolo appears to share many lexical items with the Munji of Hetaowan, Yongli village, Donggan Township, Malipo County, Yunnan, China, as documented in Hsiu’s April 2013 audio recordings.

2 Phonology

Black Lolo is a monosyllabic language. Monosyllables in Black Lolo have a (C)V(N) syllable structure and may include prenasalized initials. Some transcriptions include syllable-final glottal stops or [h], and this requires further research.

2.1 Consonants and vowels

The Black Lolo system of initial consonants makes many distinctions in manner of articulation, as presented in Figure 1. Black Lolo consonant phonemes include aspirated and unaspirated distinctions, fricatives and affricates, retroflexes, prenasalized consonants, nasals, and labio-velarization. Final consonants are restricted to the four nasals in Figure 1. However, some transcriptions in my field data suggest that glottal stops or [h] may appear syllable-finally. A final glottal stop gives the vowel a creaky quality and a checked tone. A final [h] gives the vowel a breathy quality, and the vowel is sometimes lengthened. It is unclear whether these non-nasal finals are still present word-finally or have mostly been eroded.

Figure 1: Consonants
p t tɕ ʈ tʂ k ʔ q b ts dʑ ɖ dʐ ɡ ph th tɕh tʂh kh qh nʈh nt nts ntɕ ʈh ntʂ nqw f s sh ɕ ʂ x h v z ʑ ʐ ɣ m n ɲ ŋ l j ɭ

The vowel system in Black Lolo includes nasalized vowels and diphthongs.
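The (C)V(N) template and the restriction of finals to the four nasals can be stated as a small check in Python. This is an illustrative sketch only. The function `fits_cvn`, its hand-segmented input format, and the choice not to validate onsets against Figure 1 are assumptions made for this example. The vowel set is copied from Figure 2 and normalized with Python's standard `unicodedata` module, so that precomposed and combining-tilde spellings of nasal vowels compare equal.

```python
import unicodedata


def nfc(s: str) -> str:
    """Normalize to NFC so precomposed and decomposed nasal vowels compare equal."""
    return unicodedata.normalize("NFC", s)


# Vowel nuclei as listed in Figure 2; diphthongs count as a single nucleus.
VOWELS = {nfc(v) for v in [
    "i", "ɯ", "u", "e", "o", "ẽ", "õ", "ɛ", "ɔ", "ɔ̃",
    "ɑ", "ɑ̃", "iɑ", "ɯi", "iɑ̃",
]}

# Finals are restricted to the four nasals of Figure 1.
NASAL_FINALS = {"m", "n", "ɲ", "ŋ"}


def fits_cvn(segments: list[str]) -> bool:
    """Return True if a hand-segmented, toneless syllable fits (C)V(N).

    Hypothetical helper: segmentation is assumed to be done already
    (e.g. ["tɕh", "ɑ", "ŋ"]), and onsets are not checked against Figure 1.
    """
    segs = [nfc(s) for s in segments]
    nuclei = [i for i, s in enumerate(segs) if s in VOWELS]
    if len(nuclei) != 1:
        return False  # exactly one vocalic nucleus is required
    v = nuclei[0]
    onset, coda = segs[:v], segs[v + 1:]
    if len(onset) > 1 or len(coda) > 1:
        return False  # at most one initial and one final consonant
    return not coda or coda[0] in NASAL_FINALS


print(fits_cvn(["m", "u"]))       # True  (mu44 'rice')
print(fits_cvn(["n", "ɔ̃"]))       # True  (nɔ̃33 in the endonym)
print(fits_cvn(["p", "ɔ", "ʔ"]))  # False (pɔʔ31 'friend' has a glottal final)
```

This template rejects a form like [pɔʔ31] ‘friend’. That outcome is consistent with treating syllable-final glottal stops as an open question rather than as part of the established syllable template.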
As noted, I observed a creaky or breathy quality on certain vowels in certain environments, but this requires further research.

Figure 2: Vowels
i ɯ u e o ẽ õ ɛ ɔ ɔ̃ ɑ ɑ̃ iɑ ɯi iɑ̃

2.2 Tones

Black Lolo appears to have six tones, outlined in Table 1. In this paper, tones are transcribed on a scale from 1 to 5, with 1 the lowest pitch and 5 the highest.

Table 1: Tones in Black Lolo
Name | Tone value
High Flat | 44
Neutral | 33
Falling | 31
Falling-Rising | 314
High Rising | 35
Rising | 24

3 Syntax

This section first reviews word order in clauses and phrases. It then turns to other grammatical structures identified in this research: comparative and superlative constructions, and verbs, including tense, aspect, mode, and serial verb constructions.

3.1 Clause and Phrase Structure

Like many other Tibeto-Burman languages, Black Lolo has Subject-Object-Verb word order, as in sentence 1.

1. ŋɔ31 mu44 zɔ31
   1SG.NOM rice eat
   ‘I eat rice.’

In sentence 2, the indirect object follows the direct object and precedes the verb.

2. ŋɔ31 mu44 nã314 pɛ31
   1SG.NOM rice 2SG.NOM give
   ‘I give you the rice.’

Adjectives usually follow the noun, but noun phrases sometimes show flexible word order. In grammaticality judgment testing, sentences 3, 4, and 5 were all judged acceptable. Quantity expressions may appear either before or after the head noun. In post-nominal position, the modifier and the quantity expression may occur in either order.

3. ŋɔ31 vo44 lai44 kʰɯ44 ɔ33 nɛ̃314
   1SG.NOM pig dirty six NUM have
   ‘I have six dirty pigs.’

4. ŋɔ31 kʰɯ44 ɔ33 vo44 lai44 nɛ̃314
   1SG.NOM six NUM pig dirty have
   ‘I have six dirty pigs.’

5. ŋɔ31 vo44 kʰɯ44 ɔ33 lai44 nɛ̃314
   1SG.NOM pig six NUM dirty have
   ‘I have six dirty pigs.’

Sentences 6 and 7 show more complex structures and illustrate concepts covered later in this sketch. These include genitive pronouns, serial verb constructions, and andative and venitive constructions with Lũng Cú as the geographical center.

6. ŋa31 pɔʔ31 jiu44 ɖɔŋ31 van33 ti44 tʂe31 ɡuai44 luŋ314 ku24 lɔ31
   1SG.GEN friend Đồng Văn from motorbike drive Lũng Cú return
   ‘My friend took a motorcycle from Đồng Văn to Lũng Cú.’

7. ŋa31 pɔʔ31 jiu44 luŋ314 ku24 ti44 tʂe31 ɡuai44 ɖɔŋ31 van33 i24
   1SG.GEN friend Lũng Cú from motorbike drive Đồng Văn go
   ‘My friend took a motorcycle from Lũng Cú to Đồng Văn.’

3.2 Comparative and Superlative Structures

In addition to marking possession, the genitive pronoun is used for the comparative complement in comparative constructions, as in examples 8 and 9. The standard pronoun is used for the subject in comparative and superlative constructions, as in examples 10 and 11.

8. ŋɔ31 lã44 ɲi31 ⁿʈʰõⁿ314 ɖu24
   1SG.NOM COP 2SG.GEN COMP old
   ‘I am older than you.’

9. nã314 lã44 ŋa31 ⁿʈʰõⁿ314 ɖu24
   2SG.NOM COP 1SG.GEN COMP old
   ‘You are older than me.’

10. luŋ314 ku24 ŋɔ31 lã44 d͡ʑui314 ɖu24
   Lũng Cú 1SG.NOM COP SUP old
   ‘I am the oldest person in Lũng Cú.’

11. nã314 lã44 d͡ʑui314 ɭo31
   2SG.NOM COP SUP beautiful
   ‘You are the most beautiful person.’

3.3 Verbs and Tense, Aspect, and Mode

The verb markings listed below have been observed in Black Lolo. Tense particles occur before the main verb, as in examples 13 and 14. Aspect and mode markers follow the main verb, as in examples 15 through 18. Several of these markers can also be used as verbs on their own, which suggests that they have been grammaticalized from verbs into aspect and mode markers. Serial verb constructions are common in Black Lolo (see Section 3.4).
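The placement pattern just described can be sketched as a clause template: subject, optional tense particle, object, verb, and an optional aspect or mode marker. This is a hypothetical helper written for illustration and does not claim to cover the full grammar. The marker forms are copied from Table 2, and only simple transitive clauses like examples 12 and 14 to 16 are modeled.

```python
from typing import Optional

# Marker forms copied from Table 2 (transcriptions as given there).
TENSE = {"HOD": "ʈʰɔ44 di44", "FUT": "d͡ʑun44 pɛ31"}
ASPECT_MODE = {"CONT": "ʈɔ44", "PFV": "sɔ̃44", "VOL": "sɑ̃͡i44"}


def build_clause(subject: str, obj: str, verb: str,
                 tense: Optional[str] = None,
                 aspect_mode: Optional[str] = None) -> str:
    """Assemble S (TENSE) O V (ASPECT/MODE) for a simple transitive clause.

    Hypothetical helper: negation, time adverbials, and serial verbs are not
    handled, so a complex clause such as example 13 is out of scope.
    """
    words = [subject]
    if tense is not None:
        words.append(TENSE[tense])  # tense particle precedes the object and verb
    words += [obj, verb]
    if aspect_mode is not None:
        words.append(ASPECT_MODE[aspect_mode])  # aspect/mode marker follows the verb
    return " ".join(words)


for t, am in [(None, None), ("FUT", None), (None, "CONT"), (None, "PFV")]:
    print(build_clause("ŋɔ31", "mu44", "zɔ31", tense=t, aspect_mode=am))
```

The four printed clauses follow the word order of examples 12, 14, 15, and 16. Example 13 shows that hodiernal clauses can be more complex, so the template should not be read as exhaustive.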
Table 2: Tense, Aspect, and Mode Markings in Black Lolo
Tense/Aspect/Mode | Black Lolo | Placement
Present Indicative | Ø | V
Hodiernal Tense | ʈʰɔ44 di44 | HOD + V
Future Tense | d͡ʑun44 pɛ31 | FUT + V
Continuous Aspect | ʈɔ44 | V + CONT
Perfective Aspect | sɔ̃44 | V + PFV
Volitional Mode | sɑ̃͡i44 | V + VOL

Present Tense

12. ŋɔ31 mu44 zɔ31
   1SG.NOM rice eat
   ‘I eat rice.’

Hodiernal Future Tense

13. ŋɔ31 mu44 ma31 zɔ31 nɛ̃314 ʈʰɔ44 di44 xĩ31 tɛ44 zɔ31
   1SG.NOM rice NEG eat have HOD seven o’clock eat
   ‘I haven’t eaten yet, but I will eat at 7 p.m.’

Future Tense

14. ŋɔ31 d͡ʑun44 pɛ31 mu44 zɔ31
   1SG.NOM FUT rice eat
   ‘I will eat rice.’

Continuous Aspect

15. ŋɔ31 mu44 zɔ31 ʈɔ44
   1SG.NOM rice eat CONT
   ‘I am eating rice.’

Perfective Aspect

16. ŋɔ31 mu44 zɔ31 sɔ̃44
   1SG.NOM rice eat PFV
   ‘I ate rice.’

The perfective aspect marker is also used to indicate completion.

17. ŋɔ31 ŋa31 tɕiu33 ɣɛʔ31 mu314 sɔ̃44
   1SG.NOM 1SG.GEN book do PFV
   ‘I finished doing my homework.’

Volitional Mode

18. ŋɔ31 ka31 fe33 ⁿʈaᵑ314 sã͡i44
   1SG.NOM coffee drink VOL
   ‘I want to drink coffee.’

The volitional mode marker can also be used as a standalone verb meaning ‘to like’ or ‘to love’. This suggests that the verb has been grammaticalized into the volitional marker seen in example 18. Sentence 19 shows the verb used on its own.

19. ŋɔ31 tɕioŋ44 ta24 nã314 sãi44
   1SG.NOM always 2SG.NOM love
   ‘You were always loved by me.’

Imperative Mood

The imperative mood is unmarked.

20. mɔ44 lã314 ŋi35 i24
   mom COM market go
   ‘Go to the market with mom.’

3.4 Specific Verbs and Serial Constructions

The following verbs were very common in the data I collected, and the examples illustrate the uses that were elicited. Further research is needed to determine how these verbs are used and what rules govern them. The preliminary findings below aim to provide a baseline for some of the ways in which they may be used.

[lɑ̃44] ‘to be’

The verb [lɑ̃44] may be a copula.
It is used optionally in simple predicate nominative sentences, as in example 21.

21. ŋɔ31 (lɑ̃44) ɕi͡o314 sɑ̃͡iⁿ35
   1SG.NOM COP student
   ‘I am a student.’

In locative clauses, [lɑ̃44] is used together with [nɛ̃314] ‘to have’, which is discussed in more detail below.

22. ŋɔ31 lã44 vo44 qãⁿ44 nɛ̃314
   1SG.NOM COP pig around have
   ‘I am next to the pig.’

[nɛ̃314] ‘to have’

On its own, the verb [nɛ̃314] means ‘to have’, but it is also used in various other structures. A simple possession sentence is shown in 23.

23. ŋɔ31 ka31 fe33 nɛ̃314
   1SG.NOM coffee have
   ‘I have a cup of coffee.’

In some phrases, [nɛ̃314] indicates location, as in 24 and 25.

24. t͡ɕi͡ɑᵑ314 t͡ɕɯ24 ka31 fe33 nɛ̃314
   cup in coffee have
   ‘In the cup there is coffee.’

25. ŋɔ31 ŋa31 pã44 nɛ̃314
   1SG.NOM 1SG.GEN house have
   ‘I am at my house.’

In negative sentences meaning ‘not yet’, [nɛ̃314] follows the main verb, as in example 26.

26. ŋɔ31 mu44 ma31 zɔ31 nɛ̃314
   1SG.NOM rice NEG eat have
   ‘I didn’t eat yet.’

[pɛ31] ‘to give’

The verb [pɛ31] means ‘to give’.

27. ŋɔ31 mu44 nã314 pɛ31
   1SG.NOM rice 2SG.NOM give
   ‘I give you the rice.’

28. ɡa31 ɲi44 ŋɔ31 d͡ʑun44 pɛ31 ɣɔ44 tu͡i44 vɯ͡i33 nã314 pɛ31
   tomorrow 1SG.NOM FUT egg buy 2SG.NOM give
   ‘Tomorrow I will buy eggs to give to you.’

In some sentences, the particle [ⁿtʰɑ44] ‘to take’ follows the direct object.

29. ŋɔ31 ɣɛ24 ⁿtʰa44 nã314 pɛ31
   1SG.NOM water take 2SG.NOM give
   ‘I give you the water.’

[i24] ‘to go’

On its own, the verb [i24] means ‘to go’, as in examples 30 and 31. In 31 the future tense marker is not needed, because [ɡɑ31 ɲi44] ‘tomorrow’ already implies future time.

30. ŋi35 i24
   market go
   ‘Go to the market.’

31. ɡa31 ɲi44 ⁿtʰa31 ɖɔŋ31 van33 i24
   tomorrow 1DL Đồng Văn go
   ‘Tomorrow we go to Đồng Văn town.’

In sentences 32 and 33, [i24] serves as an andative particle. It follows the main verb at the end of the sentence and indicates the direction of motion.

32. ⁿtʰa31 sã͡ɪᵑ ŋi35 i24
   1DL run market AND
   ‘We run to the market.’

33. ŋa31 pu44 ɲi33 fɯl35 ma31 ɕi͡u44 i24
   1SG.GEN grandfather often NEG walk AND
   ‘My grandfather doesn’t walk often.’

[mu314] ‘to do’ or ‘to make’

This verb means ‘to do’ or ‘to make’. In Black Lolo it is commonly used to mean ‘to cook’, as in example 34. Examples 35 and 36 show additional uses of the verb.

34. mu44 mu314
   rice cook
   ‘Cook rice.’

35. ʈe31 mu314
   electricity do
   ‘Fix electricity.’

36. ka31 fe33 mu314
   coffee do
   ‘Make coffee.’

4 Special Lexical Classes

This section reviews the following topics elicited in the field research: negation, pronouns, postpositions, numerals, the intentionality particle, Lũng Cú as geographical center, and time words.

4.1 Negation

There are two ways to negate sentences in Black Lolo, depending on the type of sentence. The first can be considered standard negation: the negation particle [mɑ31] is inserted into the sentence. Standard negation is used to negate the predicate adjective of a subject-complement sentence, and [mɑ31] is placed before the predicate adjective. Grammaticality judgments indicate that it is better to omit the copula [lɑ̃44] in such negative subject-complement sentences.

37. d͡ʑa31 ma31 ɭo31
   3SG.NOM NEG beautiful
   ‘He is not beautiful.’

Standard negation is also used in sentences with intransitive and transitive verbs. In these cases, [mɑ31] is placed before the main verb.

38. ŋɔ31 ma31 sã͡iᵑ44
   1SG.NOM NEG dance
   ‘I don’t dance.’

39. ɡa31 ɲi44 ŋɔ31 ɲi31 lã314 ŋi35 ma31 i24
   tomorrow 1SG.NOM 2SG.GEN COM market NEG go
   ‘I am not going to the market with you tomorrow.’

Subject-complement sentences with predicate nominatives are negated differently. The particle [mɑ31 ŋɑ͡i44] is added at the end of the sentence, after the predicate nominative, as in 40.

40. ŋɔ31 la44 si24 ma31 ŋa͡i44
   1SG.NOM teacher NEG
   ‘I am not a teacher.’

Additionally, one way to ask ‘How are you?’ in Black Lolo uses the A-NEG-A question format, as in 41.

41. nã314 i44 ma31 i44
   2SG.NOM good NEG good
   ‘How are you?’

4.2 Pronouns

Like Vietnamese, Black Lolo has two systems of personal reference: a personal pronoun system and a kinship term system. Black Lolo kinship terms are given in Table 3.

Table 3: Kinship Terminology in Black Lolo
Gloss | Vietnamese3 | Black Lolo
grandfather | ông | pu44
grandmother | bà | pe44
dad | bố | po31
mom | mẹ | mɔ44
older brother | anh | vɯi314
aunt | cô | ke33 mɔ44
uncle | chú | ke33 po31
younger person | em | lɯ33

In addition to kinship terms, Black Lolo has a complex personal pronoun system with standard and genitive forms, shown in Table 4.

Table 4: Personal Pronouns in Black Lolo
Person | Standard pronoun | Genitive pronoun
1SG | ŋɔ31 | ŋɑ31
2SG | nɑ̃314 | ɲi31
3SG | d͡ʑɑ31 | d͡ʑɑ31
1DL | ⁿtʰɑ31 | (not elicited)
1PL | ɑʔ31 d͡ʒi31 | ŋɑ31
non-1PL | ɔ44 pʰi31 | ⁿtʰi44

The pronominal system distinguishes the first-person dual from the first-person plural. The first-person dual is also commonly expressed as [ŋɔ31 ɲɛ44 nɑ̃314], literally ‘I and you’. A genitive form of the first-person dual pronoun has not been elicited. Black Lolo does not distinguish between second-person and third-person plural pronouns; instead, it uses a single pronoun for all non-first-person plurals. The standard pronouns are illustrated in examples 42 through 48.

First-Person Singular

42. ŋɔ31 i44
   1SG.NOM good
   ‘I am good.’

Second-Person Singular

43. nã314 i44
   2SG.NOM good
   ‘You are good.’

Third-Person Singular

44. d͡ʑa31 (lã44) ɕi͡o314 sã͡ɪⁿ35
   3SG.NOM COP student
   ‘S/he is a student.’

First-Person Dual Inclusive

45. ɡa31 ɲi44 ⁿtʰa31 ɖɔŋ31 van33 i24
   tomorrow 1DL Đồng Văn go
   ‘Tomorrow we [you and I] go to Đồng Văn.’

First-Person Plural

46. aʔ31 d͡ʒi31 i44
   1PL good
   ‘We are good.’

Non-First-Person Plural

47. ɔ44 pʰi31 i44
   non-1PL good
   ‘You all are good.’

48. ɔ44 pʰi31 i44
   non-1PL good
   ‘They are good.’

4.3 Postpositions

Black Lolo uses postpositional phrases. The postpositions elicited are listed in Table 5, and sentences 49 through 51 show how they are used in simple sentences.

Table 5: Postpositions in Black Lolo
Gloss | Black Lolo
around | qɑ̃ⁿ44
at | ti44 bɑ35
by/with (instrumental) | pɛ31
from | ti44
in | t͡ɕɯ24
on | ⁿʈɑ̃ᵑ314

49. ŋɔ31 lɑᵑ31 qãⁿ44 ɕi͡u44
   1SG.NOM lake around walk
   ‘I walk around the lake.’

50. ŋɔ31 ʈʰã44 pɛ31 ɯ24 ɣi44
   1SG.NOM knife by meat cut
   ‘I cut the meat using a knife.’

Word order with postpositional phrases may also be flexible. Sentence 51 shows the postpositional phrase at the start of the sentence. However, grammaticality judgment tests showed that [si44 t͡ɕi35] ‘orange’ could also appear sentence-initially.

51. pʰa44 ⁿʈɑ̃ᵑ314 si44 t͡ɕi35 nɛ̃314
   plate on orange have
   ‘On the plate there is an orange.’

4.4 Numerals

Cardinal and ordinal numerals are shown in Table 6. Distinct cardinal and ordinal forms were elicited only for ‘one’ and ‘two’.

Table 6: Numerals in Black Lolo
Gloss | Cardinal | Ordinal
one | tʰɑ31 | i44 tʰɯ31
two | ɲi44 | ɣɔ44 ⁿʈʰɔ44
three | sɔm44 | sɔm44
four | lɑ31 | lɑ31
five | ŋo33 | ŋo33
six | kɯ44 | kɯ44
seven | xĩ31 | xĩ31
eight | si35 | si35
nine | kɔ33 | kɔ33
ten | si314 | si314
one hundred | tʰɑ31 ɕiɔ35 | tʰɑ31 ɕiɔ35

52. mu44 ɲi44 lã44 tʰi44 lɔ44 i44 tʰɯ31 nɛ̃314
   today COP month one.ORD have
   ‘Today is the first day of the month.’

4.5 Intentionality Particle

In example 53, the particle [kɑ44] indicates that the speaker went to the market in order to go shopping. Example 54, which lacks the particle, carries no such intention. Only this one instance of an intentionality particle was elicited.

53. ti44 ʐa44 ŋɔ31 ŋi35 ka44 i24 sɔ̃44
   morning 1SG.NOM market INTV go PFV
   ‘This morning I went to the market [with the intent to go shopping].’

54. ti44 ʐa44 ŋɔ31 ŋi35 i24 sɔ̃44
   morning 1SG.NOM market go PFV
   ‘This morning I went to the market [with no intention of shopping].’

4.6 Lũng Cú as Geographical Center

Because Black Lolo is spoken in Lũng Cú, expressions of going to or coming from a place treat Lũng Cú as the geographical center. Different structures are used depending on whether the movement is toward or away from it. Example 55 uses the ‘return home’ structure [ti44…lɔ31] because the friend is leaving Đồng Văn town and returning to Lũng Cú.

55. ŋa31 pɔʔ31 jiu44 ɖɔŋ31 van33 ti44 tʂe31 ɡuai44 luŋ314 ku24 lɔ31
   1SG.GEN friend Đồng Văn from motorcycle drive Lũng Cú return
   ‘My friend took a motorcycle from Đồng Văn to Lũng Cú.’

Example 56 uses the verb [i24] because the speaker’s friend left Lũng Cú to go (away) to Đồng Văn.

56. ŋa31 pɔʔ31 jiu44 luŋ314 ku24 ti44 tʂe31 ɡuai44 ɖɔŋ31 van33 i24
   1SG.GEN friend Lũng Cú from motorcycle drive Đồng Văn go
   ‘My friend took a motorcycle from Lũng Cú to Đồng Văn.’

4.7 Time Words

Time words in Black Lolo are shown in Table 7. They often occur at the start of the sentence. The word for ‘day’ appears to be [ɲi44], but this should be confirmed.

Table 7: Time words in Black Lolo
Gloss | Black Lolo
yesterday | ɲi44 nɛ31
today | mu44 ɲi44
tomorrow | ɡɑ31 ɲi44

When a specific day is referenced, tense, aspect, and mode markers are optional. In a grammaticality judgment test, combining [mu44 ɲi44] ‘today’ with the hodiernal tense marker [ʈʰɔ44 di44] in the same sentence was judged ungrammatical or unnatural. Similarly, the perfective aspect marker [sɔ̃44] is omitted in example 57.

57. ɲi44 nɛ31 ŋɔ31 tɕiu33 ɣɛʔ31 mu314
   yesterday 1SG.NOM book do
   ‘Yesterday I finished doing my homework.’

The expression [ɑ31 tʰoⁿ44] ‘when’ is used in adverbial clauses that express time. Its position in the clause depends on whether the time referred to is in the past or in the future.
If the sentence refers to an event in the past, [ɑ31 tʰoⁿ44] is placed at the start of the adverbial clause, as in 58.

58. a31 tʰoⁿ44 ŋɔ31 jiu35 ŋɔ31 la44 si24 mu314 sã͡i44
   when 1SG.NOM young 1SG.NOM teacher do VOL
   ‘When I was a child, I wanted to be a teacher.’

If the time is in the future, [ɑ31 tʰoⁿ44] is placed at the end of the adverbial clause, as in 59.

59. ŋɔ31 ɖu24 a31 tʰoⁿ44 ŋɔ31 la44 si24 mu314 sã͡i44
   1SG.NOM old when 1SG.NOM teacher do VOL
   ‘When I grow up, I want to be a teacher.’

5 Loanwords from Chinese

Other Loloish languages are spoken on the Chinese side of the border in Yunnan Province. Black Lolo also appears to have some Chinese loanwords, possibly from Southwestern Mandarin, the main variety of Chinese spoken in Yunnan. Loloish languages such as Yi are spoken both in China’s Yunnan Province and in the bordering areas of Vietnam, with considerable cross-border overlap (Edmondson and Gregerson 2001). Table 8 lists the Black Lolo lexical items that are hypothesized to be Mandarin loanwords. The borrowings have undergone phonological changes such as the loss of some final nasals.

Table 8: Related Lexical Items in Mandarin and Black Lolo
Gloss | Mandarin | Pinyin | Black Lolo
prepare (Mandarin); will/future tense (Black Lolo) | 准备 | zhǔnbèi | d͡ʑun44 pɛ31
student | 学生 | xuéshēng | ɕi͡o314 sɑ̃͡iⁿ35
school | 学校 | xuéxiào | ɕi͡u31 ɕɔ31
cow’s milk | 牛奶 | niúnǎi | ni͡u31 nɑ͡i44
most / superlative | 最 | zuì | d͡ʑui314
teacher | 老师 | lǎoshī | lɑ44 si24
friend | 朋友 | péngyǒu | pɔʔ31 jiu44

References

Edmondson, Jerold A. 2003. Three Tibeto-Burman languages of Vietnam. In Language variation: Papers on variation and change in the Sinosphere and in the Indosphere in honour of James A. Matisoff, edited by David Bradley, Randy LaPolla, Boyd Michailovsky & Graham Thurgood, 305–333. Pacific Linguistics PL-555. Canberra: The Australian National University.

Edmondson, Jerold A. & Kenneth J. Gregerson. 2001. Four languages of the Vietnam–China borderlands. In Papers from the Sixth Annual Meeting of the Southeast Asian Linguistics Society, edited by Karen L. Adams & Thomas John Hudak, 101–133. Tempe: Arizona State University, Program for Southeast Asian Studies.

Hsiu, Andrew. 2014. Mondzish: A new subgroup of Lolo-Burmese. In Proceedings of the 14th International Symposium on Chinese Languages and Linguistics (IsCLL-14). Taipei: Institute of Linguistics, Academia Sinica.

Lama, Ziwo Qiu-Fuyuan. 2012. Subgrouping of Nisoic (Yi) languages: A study from the perspectives of shared innovation and phylogenetic estimation. PhD dissertation. Arlington: University of Texas at Arlington.

Vietnam Academy of Social Sciences. 2023. Various notes.

Vietnam General Statistics Office. 2020. Kết Quả Toàn Bộ Tổng Điều Tra Dân Số Và Nhà Ở Năm 2019 [Completed results of the 2019 population and housing census]. Hà Nội: Nhà xuất bản Thống kê. https://www.nso.gov.vn/du-lieu-va-so-lieu-thong-ke/2020/11/ket-qua-toan-bo-tong-dieu-tra-dan-so-va-nha-o-nam-2019/

A PRELIMINARY PHONEMIC ANALYSIS OF CHUYO
Mijke MULDER
Payap University
mijke.mulder@gmail.com

Abstract

This paper presents a first phonemic analysis of Chuyo (Myanmar, Sino-Tibetan), including an inventory of consonants, vowels and tones and a description of its syllable structure. To my knowledge, no linguistic work on this language has been published to date, and I hope that this paper can serve as a starting point for further research into the language.

Keywords: Sino-Tibetan, Northern Naga, Chuyo, phonology
ISO 639-3 codes: nst

1 Introduction

The data on which this phonemic study is based were collected by the linguist Hoipo Myers over two visits to the Chuyo community, in May and November of 2023.
The collected data consist of a recorded 1,624-item wordlist, with each item repeated three times, and 132 words recorded in a container phrase of the type ‘I said ___’, with each phrase repeated twice. All data were collected from one male speaker, 60 years old at the time of recording. He was born in Chuyo Noknyu village and grew up and lived there for 46 years. His parents and spouse were also born in Chuyo Noknyu village. The speaker also speaks Konyak, Nagamese, and Burmese. Hoipo Myers and the Chuyo speaker communicated in Nagamese. Two other Chuyo speakers, who were not themselves recorded, helped clarify and check the wordlist entries: a 51-year-old man from Chuyo Noknyu village and a 33-year-old man from Chuyo Longpa village. In addition, the following activities were carried out with a group of Chuyo speakers: dialect mapping based on self-report, sorting of words to determine tone categories, and elicitation of minimal pair data (‘minimal pair fishing’). The analysis of the data was a collaborative effort of Hoipo Myers and the author of this paper. Hoipo Myers has reported the results of the study back to the community members, who intend to use them to make informed decisions about orthography development. The current author has written up this report for an audience of linguists.

2 Background1

The Chuyo people reside in Nanyun Township, Sagaing Region, Myanmar. Community members have identified the following six villages as predominantly Chuyo (Hoipo Myers, p.c.): Chuyo Noknyu, Chuyo Longkei, Chuyo Longpa, Chuyo Shahyup, Chuyo Papong, and Chuyo Yorong. The first four villages form a cluster. It lies near the area of Myanmar where Konyak speakers live, close to the border with the Indian state of Arunachal Pradesh, and closer to Lahe Town than to Nanyun Town.
The last two villages are located a fair distance from the first cluster, close to Nanyun Town, also in the Sagaing Region. These two villages are the older settlements. The cluster of four was founded after people migrated from the older settlements in search of better land. Chuyo is reportedly still used in daily life in the older villages.

The area in which Chuyo is spoken is linguistically diverse. The following language groups are found around the Chuyo community near Lahe, listed clockwise: Wakka/Gakkat to the north, Hazik Olo to the northeast, Kaisan to the east, Sansik to the southeast, Lainong to the south, and Kahyu to the west. Like many minority groups, Chuyo people tend to be multilingual and use different languages in different domains. Chuyo is the language of the home. At church, Chuyo is used alongside Konyak, English and Burmese. Members of the Chuyo community may also understand other languages, including Wakka/Gakkat, Kahyu, Chen, Sansik, and Olo Hasik. They may also understand some language varieties across the border in India, such as Tangnyu, Sheanghah, and Longwa.

The speakers refer to themselves and to their language by the endonym Chuyo, which our speaker pronounced as /t͡ɕi1jɔ2~t͡ɕu1jɔ2/.2 Neighboring groups refer to the Chuyo people by different exonyms. Community members report that they are called Chuya by the Chen, Chuyan by speakers of Burmese, Bangkoi by the Kaisan people, Bangku by the Hasik people, Juhza by the Wancho people in Arunachal, and Chuyang by speakers of Nagamese (Hoipo Myers, p.c.). Eberhard et al. (2025) and Statezni (2013) list the following names and spellings besides Chuyo: Cyuyo, Vanggu, Wanggu, Wangoo, Vangku.

The Chuyo language belongs to the Sino-Tibetan language family. It falls under the Sal group (Burling 1983) and, within Sal, under the Northern Naga group. Within Northern Naga, Hsiu (2018) subsumes Chuyo under the Wancho group. Eberhard et al. (2025) subsume Chuyo, together with a multitude of other language varieties, under Tangshang. This group name has been in use in Myanmar since 2003 (Statezni 2013:5–6). On the Indian side, a section of this group has been known as Tangsa since 1956 (Saul 2005:28). The terms Tangshang and Tangsa differ in scope. Tangshang is a very large umbrella term that covers, for example, the Wancho language varieties, while Tangsa is a smaller but still large umbrella term that does not include them. Note that this discussion concerns linguistic affiliation, which is distinct from how people identify themselves culturally, ethnically, or socio-politically, although the two perspectives may overlap. With regard to low-level classification, we have insufficient data to draw conclusions. However, different Chuyo speakers report that the Tangnyu language variety is very similar to their language and that Kahyu and their language are mutually intelligible (Hoipo Myers, p.c.). It remains to be determined to what degree the reported intelligibility is due to language similarity (inherent intelligibility) or to exposure (acquired intelligibility). In sum, we have a general idea of where Chuyo stands in the Sino-Tibetan family, but its lower-level affiliation remains unclear and needs further research.

3 Phoneme inventory

3.1 Overview of contrastive consonant sounds

Chuyo has 24 consonant phonemes, as shown in Table 1. The plosive series is fairly symmetrical, with three sets of three plosives showing the same distinctions in aspiration and voicing. However, the voiceless alveolar stops /tʰ, t/ do not have a voiced counterpart at the same place of articulation. Instead, they are complemented by a voiced retroflex stop. The voiced retroflex can be pronounced as an alveolar [d], but [ɖ] is the most frequent realization, so it has been chosen to represent the underlying form.
While the language does not have palatal plosives, it does have two alveopalatal affricates, a palatal nasal, and two palatal approximants. There are three fricatives, including a lateral fricative. There is no rhotic consonant, though /ɖ/ can be realized as a tap [ɾ] or an approximant [ɹ] (and as [d], as mentioned above). Notably, the approximants /w/ and /j/ have glottalised counterparts: /wʔ/ and /jʔ/. The sound sequences [tɕ] and [dʑ] have been analyzed as complex consonant phonemes rather than sequences of two consonants, for two reasons. First, the language does not allow any unambiguous consonant clusters within the syllable. Second, [ɕ] and [ʑ] do not occur independently of [t] and [d], respectively. Similarly, [wʔ] and [jʔ] have been analyzed as consonants with a secondary articulation rather than as coda clusters, because the language otherwise does not allow consonant clusters in coda position.

Table 1: Chuyo consonant phoneme chart (rows: manner of articulation; columns: place of articulation)
MOA / POA | labial | alveolar/retroflex | (alveo)palatal | velar | glottal
plosive | pʰ p b | tʰ t ɖ | | kʰ k ɡ | ʔ
affricate | | | t͡ɕ d͡ʑ | |
nasal | m | n | ɲ | ŋ |
fricative | | s | | | h
lateral fricative | | ɬ | | |
approximant | w wʔ | | j jʔ | |
lateral approximant | | l | | |

All Chuyo consonants can occur in onset position except the following four: /w, wʔ, jʔ, ʔ/. A glottal stop at the beginning of a word is considered a phonetic byproduct of a word-initial vowel. In coda position, there is no contrast in voicing or aspiration. The following eleven consonants are permitted as codas, as exemplified in (1):
- the voiceless unaspirated plosives /p, t, k, ʔ/
- all nasals except the palatal: /m, n, ŋ/
- all approximants except the lateral: /w, wʔ, j, jʔ/

1. plosive codas: /tap0/ ‘hut’, /lat0/ ‘nail’, /mik0/ ‘eye’, /ɲiʔ0/ ‘day’
   nasal codas: /dʑʌm2/ ‘mat’, /ɡɔn2/ ‘goat’, /ʌŋ2/ ‘sky’
   approximant codas: /ʌw1/ ‘chicken’, /an2tʰawʔ0/ ‘eggplant’, /ɬʌj3/ ‘wind’, /hʌjʔ0/ ‘to gather’

Table 2 and Table 3 provide evidence of contrast in identical or analogous environments for pairs of phonetically similar segments.
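The counts in this section can be cross-checked by listing Table 1 together with the onset exclusions and the coda set given in the prose. The following Python sketch does only that bookkeeping. The grouping keys and variable names are illustrative.

```python
# Consonant phonemes of Table 1, grouped by manner of articulation.
CONSONANTS = {
    "plosive": ["pʰ", "p", "b", "tʰ", "t", "ɖ", "kʰ", "k", "ɡ", "ʔ"],
    "affricate": ["t͡ɕ", "d͡ʑ"],
    "nasal": ["m", "n", "ɲ", "ŋ"],
    "fricative": ["s", "h"],
    "lateral fricative": ["ɬ"],
    "approximant": ["w", "wʔ", "j", "jʔ"],
    "lateral approximant": ["l"],
}
INVENTORY = [c for group in CONSONANTS.values() for c in group]

# Restrictions as stated in the prose of Section 3.1.
NOT_IN_ONSET = {"w", "wʔ", "jʔ", "ʔ"}
CODAS = {"p", "t", "k", "ʔ"} | {"m", "n", "ŋ"} | {"w", "wʔ", "j", "jʔ"}

onsets = [c for c in INVENTORY if c not in NOT_IN_ONSET]

assert CODAS <= set(INVENTORY)  # every coda is a Table 1 phoneme
assert not CODAS & {"pʰ", "tʰ", "kʰ", "b", "ɖ", "ɡ"}  # no aspirated or voiced plosive codas

print(len(INVENTORY), len(onsets), len(CODAS))  # prints: 24 20 11
```

Counted this way, the coda set named in the prose (four plosives, three nasals, four approximants) has eleven members, one for each example word in (1).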
Table 2 shows contrasts in onset position, and Table 3 shows contrasts in coda position.

Table 2: (Near) minimal pairs for contrasts in onset position
Phonemes | Chuyo | English
pʰ / p | pʰaw1 / paw1 | ‘jungle’ / ‘to skip’
p / b | puŋ2 / buŋ2 | ‘granary’ / ‘testicles’
tʰ / t | tʰʌn2 / tʌn2 | ‘face’ / ‘to divide’
t / ɖ | tuʔ0 / ɖuʔ0 | ‘thorn’ / ‘cup’
kʰ / k | kʰʌm2 / kʌm2 | ‘leftover’ / ‘house’
k / ɡ | kʌk0 / ɡʌk0 | ‘sickness’ / ‘pig’
t͡ɕ / d͡ʑ | t͡ɕiʔ0 / d͡ʑiʔ0 | ‘to chop finely’ / ‘excrement’
n / m | nɔn1 / mɔm1 | ‘to lift up’ / ‘mist/fog’
n / ŋ | nɔk0 / ŋɔk0 | ‘village’ / ‘bend or curve’
n / ɲ | nɔk0 / ɲɔk0 | ‘village’ / ‘to limp’
ɲ / ŋ | ɲam3 / ŋam3 | ‘gong’ / ‘prideful’
s / h | sam2 / ham2 | ‘rib’ / ‘drum’
s / ɬ | sʌj2 / ɬʌj2 | ‘jealous’ / ‘wind’
ɬ / l | ɬɔŋ1 / lɔŋ1 | ‘insect’ / ‘stone’
l / ɖ | lɛp0 / ɖɛp0 | ‘example’ / ‘all’

Table 3: (Near) minimal pairs for contrasts in coda position
Phonemes | Chuyo | English
n / ŋ | ɡɔn2 / ɡɔŋ2 | ‘goat’ / ‘pounded rice’
k / ʔ | mik0 / miʔ0 | ‘eye’ / ‘to mate’
t / ʔ | lat0 / laʔ0 | ‘nail’ / ‘to take’
p / ʔ | tap0 / taʔ0 | ‘hut’ / ‘paddy rice’

3.2 Realization of consonant sounds

The sonorants in Chuyo show little variation in pronunciation, but the obstruents vary considerably. Among the plosives, /pʰ/ can alternatively be pronounced as a fricative [ɸ, f]. The fricative realization is found most often in intervocalic position. However, the phoneme is not consistently realized as a fricative between vowels, as (2) shows.

2. /ku2pʰɛn1/ [ku⁵⁵pʰʲɛn³¹~ku⁵⁵fɛn³¹] ‘to be low’

The aspirated velar plosive /kʰ/ can alternatively be pronounced with a fricative release or as a full fricative [kˣ, x, ç]. Variation can be found even between tokens of the same word uttered by the same speaker, as shown in example (3). The voiceless palatal fricative allophone is found before a high front vowel, as shown in example (4).

3. /t͡ɕi2 kʰam1/ [tɕi⁵⁵ kʰam³¹~tɕi⁵⁵ kˣam³¹~tɕi⁵⁵ xam³¹]
   water hot
   ‘hot water’

4. /mʌj1kʰi2ɡɔŋ2lɔp0pɔŋ1/ [mʌj³³çi⁵⁵ɡɔŋ⁵⁵lɔp⁵⁵pɔŋ³¹] ‘barking deer’

The voiced velar plosive /ɡ/ can be pronounced as a fricative [ɣ] or an approximant [ɰ] in intervocalic position.
For example, /ɡ/ in example (5) was pronounced as [ɣ] when preceded by a word ending in vowel in a container sentence. However, this rule is again not automatic: /ɡ/ can remain [ɡ] also in intervocalic position. For example, the word /ɡʌk0/ ‘pig’ was not pronounced with [ɣ] but with [ɡ] in the same container sentence. 5. /ɡak0/ [ɣak⁵⁵] ‘to sing’ The final plosives /p, t, k/ tend to be pronounced without an audible release, as in example (6). This makes them auditorily closer to the glottal stop than their released counterparts, hence their inclusion in Table 3. 6. /tap0/ [tap̚⁵⁵] ‘hut’ /lat0/ [lat̚⁵⁵] ‘nail’ /mik0/ [mik̚⁵⁵] ‘eye’ The glottal stop may alternatively be realized as tenseness on a syllable rhyme in non-final syllables. This is mostly observed in antepenultimate syllables, like in example (7). In penultimate syllables, the glottal stop is pronounced clearly, sometimes with a short epenthetic vowel [ʌ] between the glottal stop and the following consonant, as shown in (8). 7. /kaʔ0ɡɔ1mun2/ [ka̰⁵⁵ɡɔ³¹mun⁵¹] ‘beard (lit. chin bone hair)’ 8. /kaʔ0ɡɔ1/ [kaʔ⁵⁵ʌɡɔ³¹] ‘chin’ /taʔ0ju3/ [taʔ⁵⁵ʌju³⁵¹] ‘rice beer’ As mentioned in the previous section, the voiced retroflex plosive /ɖ/ shows quite some variation in its realization, [ɖ, d, d̺, ɾ, ɹ, ɹ̥], but minimal pair evidence shows that these all represent the same phoneme. In the source database for this study, [ɖ] is the most frequent surface form, followed closely by [ɾ]. The language has only one sibilant /s/, which unsurprisingly varies in its realization, [s, s̠, ʃ]. While the retracted alveolar allophone [s̠] is the most frequent realization, the alveolar symbol /s/ has been chosen for convenience to represent the sibilant phoneme. Finally, the voiced affricate /d͡ʑ/ can alternatively be pronounced as [z]. Example (9) shows this variation in tokens of the same word. Variation between the affricate and a fricative realization has not yet been observed for its voiceless counterpart /t͡ɕ/. 9. 
   /dʑʌm2/ [dʑʌm⁵¹~zʌm⁵¹] ‘mat’

3.3 Overview of contrastive vowel sounds

Chuyo has six contrastive vowel sounds: three unrounded front vowels, and two rounded and one unrounded back vowel. Evidence for contrast between the phonetically similar vowels /a/ and /ʌ/ comes from minimal pairs such as /ɡak0/ ‘to sing’ and /ɡʌk0/ ‘pig’.

Table 4: Chuyo vowel phoneme chart (columns: front, central, back)
high: i, u
mid: ɛ, ʌ, ɔ
low: a

The language does not have any diphthongs according to the current analysis. The ambiguous segment sequences [ai, ʌi, ɔi, ui] and [ao, ʌo] are analyzed as combinations of vowel plus glide (VC) rather than as diphthongs (V) or vowel sequences (VV), because (a) all such sequences end in a high front vowel or a close-mid back vowel, which can potentially be interpreted as a glide, and never in a low or open-mid vowel, and (b) they would otherwise be restricted in the syllable structures in which they may occur, namely open syllables and glottal-stop-final syllables. Example words are provided in (10).

10. /baj2/ ‘waterfall’
    /tʰaw3/ ‘chisel’
    /pʌj3/ ‘wood’
    /ɖʌw1/ ‘lung’
    /ɡɔj3/ ‘snot’
    /lɔ2muj1/ ‘to soar’

Similarly, the ambiguous sequences [ʌiʔ, ɔiʔ, uiʔ] and [aoʔ, ʌoʔ] are analyzed as combinations of vowel plus glottalised glide. Examples are provided in (11). The rhyme /ajʔ/ has not been observed in the source database for this study.

11. /an2tʰawʔ0/ ‘eggplant’
    /pʌjʔ0/ ‘to be wrong’
    /dʑiŋ1ɖʌwʔ0/ ‘hiccough’
    /ɬɔjʔ0/ ‘to move (intrans.)’
    /kʰujʔ0tɕi2tɕɛn2nu1/ ‘to avoid’

3.4 Realization of vowel sounds

Fluctuation or variation can be observed in the pronunciation of vowel sounds in the database with regard to at least three features: vowel quality, duration, and vowel breaking. The first feature, differences in vowel quality, is most prominent for the front mid vowel /ɛ/ and the back mid rounded vowel /ɔ/.
The front mid vowel fluctuates between close-mid and open-mid, sometimes between tokens of the same word, but some widespread patterns can be observed. In open syllables, /ɛ/ is mostly realized as an open-mid vowel, as in example (12), while in closed syllables there is more fluctuation in vowel height, [ɛ~e]. Example (13) shows that [e] can still sometimes be found in open syllables. Perhaps this is because the element /-lɛ1/ is grammatical, and grammatical elements can show this type of phonological irregularity, or perhaps we are dealing with an underlying vowel-glide sequence /ɛj/.

The back mid rounded vowel also fluctuates between close-mid and open-mid, and again the variation may occur even between tokens of the same word. In open syllables, /ɔ/ is mostly realized as open-mid [ɔ], as in example (14). In closed syllables, a dorsal or glottal coda (i.e., a velar nasal or plosive, the palatal glide, or the glottal stop) tends to trigger the higher realization [o], as shown in (15). Some instances of [o] derive from the vowel-glide combination /aw/. An example is provided in (16), which can be compared to (12). These vowel quality differences are not contrastive; that is, no minimal pairs show that they represent different phonemes.

12. /aw1ɖɛ3/ [ʔaw³³ɾɛː³⁵¹] ‘feast’
13. /ɡʌ0li2lɛ1/ [ɡʌ³³li⁵⁵le³¹] ‘to be slow’
    /siŋ2 tɕʌ0miŋ2lɛ1/ [siŋ⁵⁵ tɕʌ⁵⁵miŋ⁵⁵le³¹] ‘to be awake, alert’
14. /ɡɔ1/ [ɡɔː³¹] ‘bone’
15. /kɔŋ2/ [koːŋ⁵⁵] ‘pair’
    /lɔj2/ [loj⁵¹] ‘buffalo’
    /ɲaʔ0mɛ2mɔʔ0/ [ɲa̰⁵⁵me⁵⁵moʔ⁵¹] ‘gill’
16. /ɬʌm3aw1ɖɛ3/ [ɬʌm³⁵o³¹ɾɛ³⁵²] ‘wedding ceremony’

The second type of variation, a difference in vowel duration, is found for the vowels /i/ and /u/ between open and closed syllables. Unsurprisingly, these vowels tend to be longer in open syllables and shorter in closed syllables. A notable exception to this tendency is the verbal suffix /u1/, which is short in duration.
This suffix appears in most of the verb entries in the wordlist. For all vowel qualities, the duration differences observed in the database are clearly non-contrastive, except perhaps for /ɔ/: in syllables ending in an oral stop, there may be a distinction between /ɔ/ and /ɔː/. A short [ɔ] appears to be raised to [o] before a dorsal or glottal coda, as described above, whereas a long [ɔː] is not raised but is instead pronounced as a long open-mid vowel or undergoes vowel breaking, [ɔʌ]. Examples are provided in (17). Further research is needed to determine whether a phonemic contrast exists between /ɔ/ and /ɔː/ in this language. It would not be unusual to have a vowel length contrast only in stopped syllables (as is common in the Chin languages), nor is it impossible to have a duration difference for one vowel quality and not for the others (see, for example, Iu Mien; Arisawa 2016:149). However, we currently have no evidence of contrast in identical environments and no confirmation of such a contrast from native speaker judgements, and the longer duration could represent free variation, an effect of listing intonation, or even a tonal contrast. Until a phonemic contrast is proven, /ɔ/ will be transcribed without any indication of duration. The vowel /ʌ/ seems to be shorter in general, while the vowels /ɛ, a/ seem to be longer regardless of syllable type.

17. /lɔk0/ [lɔːk⁵¹] ‘elephant’
    /pʰɔk0/ [pʰɔːk⁵¹~pʰɔʌk⁵³] ‘to borrow’

The third type of variation, vowel breaking, occurs in particular environments. The front mid vowel /ɛ/ may be realized as [ɛʌ, eʌ] before a coda /p, k/, as shown in example (18). This is not an automatic process that always applies for this speaker: even between tokens of the same word there may be alternation between a monophthong and a vowel-glide realization.
Vowel breaking of /ɛ/ does not occur before the coda /t/ in the source database for this study, though admittedly there are only five instances of /ɛt/ in the wordlist, so vowel breaking may well occur in this environment but simply did not happen in the recordings of these words. Similarly, the rounded mid back vowel /ɔ/ may be pronounced as [ɔʌ, oʌ]. In our recordings, this is observed before all oral plosive codas, /p, t, k/, and the nasal codas /n, m/, as exemplified in (19). So far, I have not observed this alternation before the velar nasal /ŋ/. Finally, what could also be considered vowel breaking is that /ɛ/ may sometimes be pronounced [ʲɛ], as exemplified in (20).

18. /ku0sɛp0/ [ku³³s̠ɛːp⁵¹~ku⁵⁵sɛʌp⁵¹] ‘to be heavy’
    /ɡɛk0/ [ɡɛːk³⁵²~ɡɛʌk³⁵²] ‘rice husk powder’
19. /ɡɔp0/ [ɡɔʌp⁴⁵²] ‘waist’
    /ɡɔt0ɡum2/ [ɡɔʌt⁵⁵ɡum⁵⁵¹] ‘garbage dump’
    /pʰɔk0/ [pʰɔːk⁵¹~pʰɔʌk⁵³] ‘to borrow’
    /ɡɔm1/ [ɡɔʌm³¹] ‘belly’
    /ɡɔn2/ [ɡɔːn⁵⁵~ɡɔʌn⁵¹] ‘goat’
20. /ɡɛn2pa1/ [ɡɛn⁵⁵pa³¹~ɡʲɛn⁵⁵pa³¹] ‘argument, quarrel’

4 Structure of major syllables

It seems helpful to distinguish between two types of syllables in Chuyo: strong, major syllables and weaker, non-major syllables. It remains to be determined whether all non-major syllables fit neatly into one category or need to be further divided into subtypes. This section is limited to the discussion of major syllables. Major syllables are specified for tone if they are open or have a sonorant coda. The maximal syllable template for major syllables is (C)V(C)(T), where T stands for tone. All four possible syllable types are listed in (21), with an example monosyllabic word for each. Syllables with an onset are more frequent than onsetless syllables, in line with the universal preference for syllables with an onset.

21.
    CV /ɡi2/ ‘rattan’
    CVC /lɔŋ1/ ‘stone’
    VC /ʌŋ2/ ‘sky’
    V /i3/ ‘to count’

The distribution chart in Table 5 shows the rhymes that occur in the source database for this study, indicated with a plus symbol and grey shading. Empty cells indicate rhymes that are not attested in the dataset; if they do occur in the language, they are presumably infrequent. The rhyme /im/ has so far only been observed in the personal pronoun paradigm, and /ʌʔ/ only in deictic words (both marked with an asterisk). Among the codas, there are likely phonotactic constraints on which nuclei can combine with the glides. Whether the empty cells for the rhymes /ɛʔ/, /ɛŋ/ and /ajʔ/ reflect phonotactic constraints, or whether these rhymes do occur outside the source database for this study, remains to be investigated. Also, /ʌ/ has so far not been found in open major syllables, but it remains to be determined whether it may occur in grammatical elements.

Table 5: Distribution chart for rhymes of major syllables (# = open syllable)
     p   t   k   ʔ   m   n   ŋ   j   jʔ  w   wʔ  #
i    +   +   +   +   +*  +   +                   +
ɛ    +   +   +       +   +                       +
a    +   +   +   +   +   +   +   +       +   +   +
ʌ    +   +   +   +*  +   +   +   +   +   +   +
ɔ    +   +   +   +   +   +   +   +   +           +
u    +   +   +   +   +   +   +   +   +           +

5 Toneme inventory

5.1 Overview of tones

In Chuyo, a three-way tone system is observed in lexical words. Not all syllables carry tone: non-stopped major syllables are specified for tone, but stopped syllables and non-major syllables are either not specified for tone or have had the tonal contrast neutralized. Stopped syllables are by default pronounced with a high pitch, either high level or high falling. In this respect, Chuyo differs from the closely related language Chen, which does show a two-way tonal contrast in syllables ending in an oral stop (Konyak and Mulder 2022:716–717). In this paper, Chuyo stopped syllables, which include syllables with an oral stop coda, syllables with a glottal stop coda, and syllables with a glottalised glide, are treated as atonal.
One could alternatively treat these as a fourth tone category or as belonging to tone 2, the high tone. Table 6 shows the default relative pitch trajectory, in Chao tone numerals, of monosyllabic words in a container phrase, where 1 represents the lowest and 5 the highest level of the relative pitch range. The tone categories are labelled with numbers, as is usual in descriptions of languages from this region (see, for example, Van Dam 2018:105–108). Throughout this paper, underlying tone categories are indicated with a subscript number after the syllable, while the surface realization is indicated with superscript Chao tone numerals after the syllable.

Table 6: Chuyo tones
Tone category   Chao tone numerals
1               31
2               55
3               35

5.2 Realization of tones

The relative pitch trajectories in Table 6 reflect the default realization of monosyllabic words in a container phrase. When a monosyllabic word is pronounced in isolation, not only tone 1 but also tones 2 and 3 drop towards the low end of the relative pitch range, as shown in Table 7. In isolation, tone 1 can be described as a straight mid-to-low falling tone, tone 2 as having a delayed drop or as a pushed tone, and tone 3 as having a rising-falling contour. The three words in (22) illustrate how monosyllabic words were pronounced by the speaker in a container phrase versus in isolation.

Table 7: Chuyo default tone realization in non-final vs. final syllables
Tone category   Container phrase   Isolation
1               31                 31
2               55                 51
3               35                 351

22.              container phrase   isolation
    /suŋ1/       [s̠uŋ³¹]            [s̠uŋ³¹]       ‘navel’
    /tɕun2/      [tɕun⁵⁵]           [tɕun⁵¹]      ‘mouth’
    /suŋ3/       [s̠uŋ³⁵]            [s̠uŋ³⁵¹]      ‘river’

An in-depth study of tone realization on non-final syllables in disyllabic and multisyllabic words may reveal tone sandhi rules.
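To make the relation between Table 6, Table 7 and example (22) explicit, the default realizations can be written as a small lookup from tone category and context to a Chao contour. The Python sketch below is illustrative only and is not part of the analysis itself: the function names are hypothetical, it covers only the default (non-sandhi) realizations of monosyllabic words, and it assumes, following the analysis above, that stopped and non-major syllables (written with 0 in the transcriptions) are atonal and therefore left out.

```python
# Illustrative sketch: default Chuyo tone realizations (Table 7) as a lookup.
# Assumption: only tone categories 1-3 are listed; stopped and non-major
# syllables (written with 0 in the transcriptions) are treated as atonal
# and are deliberately left out.
TONE_REALIZATION = {
    1: {"container": "31", "isolation": "31"},
    2: {"container": "55", "isolation": "51"},
    3: {"container": "35", "isolation": "351"},
}

# Map plain Chao digits to superscript digits for surface transcriptions.
SUPERSCRIPT = str.maketrans("012345", "⁰¹²³⁴⁵")


def surface_tone(category: int, context: str) -> str:
    """Return the default Chao contour for a tone category as superscripts.

    `context` is either "container" (container phrase) or "isolation".
    """
    return TONE_REALIZATION[category][context].translate(SUPERSCRIPT)


def realize(segments: str, category: int, context: str) -> str:
    """Attach the default surface contour to a segmental string, e.g. 'tɕun'."""
    return f"[{segments}{surface_tone(category, context)}]"


def falls_in_isolation(category: int) -> bool:
    """True if the isolation contour ends at the bottom of the pitch range (1)."""
    return TONE_REALIZATION[category]["isolation"].endswith("1")


if __name__ == "__main__":
    # The three words of example (22).
    words = [("s̠uŋ", 1, "navel"), ("tɕun", 2, "mouth"), ("s̠uŋ", 3, "river")]
    for segments, category, gloss in words:
        print(realize(segments, category, "container"),
              realize(segments, category, "isolation"),
              gloss)
```

Run as a script, it prints the container-phrase and isolation forms of the three words in (22), for example [tɕun⁵⁵] [tɕun⁵¹] mouth. The helper falls_in_isolation only restates the observation that all three tones end at the low end of the pitch range in isolation; any tone sandhi rules for compounds would need a separate mechanism.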
One initial observation is that, in compounds, syllables carrying any of the three tones are sometimes realized with a level pitch that matches the starting pitch of the following syllable, or with a mid level pitch regardless of the starting pitch of the next syllable. This tone neutralization occurs in some compounds but not in others, and it seems to go hand in hand with syllable shortening. The first syllables in these compounds are therefore considered non-major. An example of tone neutralization (and syllable shortening) is the element /pu3/ ‘snake’ in (23). However, the same element retains its status as a major syllable in the compound in (24). It seems that more highly ‘integrated’ compounds show this syllable weakening process, whereas less integrated compounds do not.

23. /pu3/ [puː³⁵¹] ‘snake’ → /pu0ɬan2/ [pu⁵⁵ɬan⁵¹] ‘viper’
24. /pu3/ [puː³⁵¹] ‘snake’ → /pu3kʰun2/ [puː³⁵xun⁵¹] ‘snake venom’

6 Discussion

This paper constitutes a preliminary investigation of the phonology of Chuyo, including an inventory of consonants, vowels and tones and a description of the structure of major syllables. One challenge in this preliminary analysis is deciding where in the system the complexity should be placed. No matter how we analyze the ambiguous sounds [w] and [j] and the sound sequences [wʔ] and [jʔ], the analysis adds complexity to some part of the sound system, which is not unusual but part of the nature of languages. With the current analysis of simple glides /w, j/ and glottalised glides /wʔ, jʔ/, the complexity is assigned to the consonant inventory. This analysis also introduces more asymmetry into the syllable structure: all four sounds occur in coda position, but only /j/ is permitted in onset position, for example in /jan2/ ‘gun’, whereas usually the balance is the other way around (i.e., more is allowed in onset than in coda position).
If we were instead to analyze [w] and [j] as forming diphthongs with the preceding vowel, we would be left with more complexity in the vowel inventory and in the syllable structure, since these diphthongs would only be permitted in open syllables and glottal-stop-final syllables and not in others. However, the current analysis also shows phonotactic complexity, in that the glides do not seem to combine freely with all nuclei. Finally, if we were to analyze [wʔ] and [jʔ] as sequences of a glide and a segmental glottal stop, we would be introducing a new syllable type, (C)VCC, into the inventory of possible syllable types. In every case, complexity increases somewhere in the system: in the consonant inventory, in the vowel inventory, in the syllable inventory, or in a combination of these.

Several topics for further investigation have been identified in this paper, such as a potential length contrast for /ɔ/, whether /ʌ/ can occur in open syllables in affixes or particles, the structure of non-major syllables, and tone sandhi rules. While some of these matters, such as tone sandhi in compounds or the structure of non-major syllables, can be partly investigated using the existing database, others cannot be addressed without further data collection or consultation with speakers of Chuyo. For example, studying the phonotactics and tone patterns of grammatical elements requires textual data, and to confirm or reject a contrast between /ɔ/ and /ɔː/, we would need to elicit minimal pair data and carry out similarity judgement tasks with Chuyo speakers.

References

Arisawa, Tatsuro D. 2016. An Iu Mien grammar: A tool for language documentation and revitalization. Doctoral dissertation. La Trobe University.
Burling, Robbins. 1983. The Sal languages. Linguistics of the Tibeto-Burman Area 7.2:1–31.
Eberhard, David M., Gary F. Simons, and Charles D. Fennig (Eds.). 2025.
Ethnologue: Languages of the world (28th ed.). Dallas, Texas: SIL International. https://www.ethnologue.com
Hsiu, Andrew. 2018. Northern Naga (Konyak). Sino-Tibetan Branches Project. https://sites.google.com/site/sinotibetanbranches/
Konyak, Hoipo, and Mijke Mulder. 2022. A brief outline of Chen phonology. In Proceedings of the Payap University Research Symposium 2022, Chiang Mai, 11 February 2022, pp. 709–721. Payap University: Research and Academic Service Affairs.
Saul, Jamie D. 2005. The Naga of Burma: Their festivals, customs and way of life. Orchid Press.
Statezni, Nathan. 2013. Fiftyfive dialects and growing: Literacy and comprehension of vernacular literature among the Tangshang Naga in Myanmar. Master’s thesis. Graduate Institute of Applied Linguistics.
Van Dam, Kellen Parker. 2018. The tone system of Tangsa-Nocte and related Northern Naga varieties. Doctoral dissertation. La Trobe University.

PHONETIC SYSTEM OF HANI LANGUAGE IN LAI CHAU, VIETNAM
PHAN Lương Hùng
Institute of Linguistics of Vietnam
hungphanluong@gmail.com

Abstract
Hani is a Sino-Tibetan language (Tibeto-Karen branch, Tibeto-Burman sub-branch, Burmese-Loloish group). The Hani people in Vietnam include the Black Hani and Flower Hani subgroups. This paper synchronically describes the phonetic system of the Hani language in Lai Châu province of Vietnam, including its tonal, consonant and vowel systems. The results show that there are two dialects of Hani in Lai Châu, the Mường Tè dialect and the Phong Thổ dialect. The differences between these two dialects lie mostly in the consonant system.
Keywords: Hani, phonology
ISO 639-3 codes: hni

1 Introduction

Hani is also known as Uní (Lã Văn Lô, Nguyễn Hữu Thấu, Mai Văn Trí, Ngọc Anh and Mạc Như Đường 1959), Xá U Ní (Vietnam Institute of Ethnology 1978), and Akhà, Maá and Kà Niá (Hoàng Sơn 2008). According to statistical data from 2009, there were 21,725 Hani people in Vietnam, mainly concentrated in the provinces of Lai Châu (13,752 people), Lào Cai (4,026 people), and Điện Biên (3,786 people). By 2019, according to Vietnam’s General Statistics Office, the Hani population in Vietnam had increased to 25,539. Outside Vietnam, Hani people also live in China, Thailand, Laos, and Burma, with a total population of nearly 2 million. The largest group of Hani people is in China, with more than a million, followed by Myanmar with over 200,000 and Laos with 100,000 (Thurgood and LaPolla 2006).

There are three subgroups of Hani in Vietnam: Hani Co Cho (Flower Hani), Hani La Mi (Sino Hani) and Black Hani. The first two subgroups are mainly distributed in Muong Te, Nam Nhun, and Sin Ho districts of Lai Châu province and Muong Nhe district of Dien Bien province. The Black Hani live in Phong Tho district of Lai Châu and in Y Ty commune, Bat Xat district of Lao Cai (Viện Dân tộc học 1978; Nguyễn Khắc Tụng 1981; Lê Đông and Tạ Văn Thông 2001; Hoàng Sơn 2008; Trần Bình 2014). Hani belongs to the Sino-Tibetan language family, in the Lolo-Burmese group, Southern Loloish sub-group (Thurgood and LaPolla 2006). In Vietnam, six languages are reported in this subgroup: Lahu, Lolo, Cong, Sila, Phu La and Hani.

2 Literature review

There are several descriptions of the phonetic system of the Hani language in Vietnam.
Based on fieldwork data from Mù Cả and Trung Chai communes of Lai Châu, Lương Bèn (1986) proposes a system of 29 consonant phonemes, /p, t, c, k, pʼ, tʼ, cʼ, kʼ, b, d, j, ɡ, m, n, ɲ, ŋ, ts, tʃ, f, s, ʃ, x, h, v, z, Z, ɣ, l, (/; eight vowels, /i, y, e, ɯ, ə̆ɯ, a, u, o/; and three tones with the contours /55/, /25/ and /51/. Of these, /Z/ and /(/ are nonstandard symbols, and it is not clear which consonants they represent. Also drawing on data from Mù Cả commune, Lê Đông and Tạ Văn Thông (2001) put forward 29 consonants, /p, t, c, k, ʔ, ph, th, kh, b, d, ɡ, ts, tɕ, tsʰ, tɕʰ, dz, dʑ, m, n, ɲ, ŋ, s, ɕ, x, h, z, j, ɣ, l, lʰ/; eight vowels, /i, y, ɛ, ɯ, u, ũ, ɔ, ɑ/; and four tones with the contours /33, 45, 32, 21/. Jerold A. Edmondson (2002), in The Central and Southern Loloish Languages of Vietnam, proposed the following phoneme system for Ha Nhi based on data from Mù Cả and Hua Bum: /p, t, c, k, ʔ, ph, th, kh, pj, bj, phj, mj, tj, dj, thj, b, d, ɡ, ts, tɕ, tsʰ, tɕʰ, dz, dʑ, m, n, ɲ, ŋ, s, ɕ, x, h, f, z, j, ɣ, l, lʰ/ and five tones, comprising three tones in smooth syllables, /55, 33, 31/, and two tones in checked syllables, /31ʔ/ and /33ʔ/. I have also noted three differences between Hani in Vietnam and Akha, a subgroup of Hani in Thailand: (1) many Hani words in Vietnam lack the voicing of initial consonants found in Akha of Thailand; (2) Hani in Vietnam has lost final nasal codas; and (3) Hani in Vietnam has the partially voiced or breathy lateral initial /lʰ/. All three researchers identified similar consonant inventories for Hani in Mu Ca commune of Muong Te district of Lai Châu, with minor differences for which they offer various phonological explanations. For example, the series of palatalized consonants /pj, bj, phj, mj, tj, dj, thj/ in Edmondson’s (2002) description is listed as initial clusters in Lương Bèn (1986) and Lê Đông and Tạ Văn Thông (2001).
One significant difference is that the tonal systems vary: three tones in Lương Bèn’s (1986) system, four tones in Lê Đông and Tạ Văn Thông’s (2001) system, and five tones in Edmondson’s (2002) system. Other differences among the inventories are the nasal vowel /ũ/ in Lê Đông and Tạ Văn Thông’s (2001) system, and the phonemes /v, tʃ, ʃ, Z, □/, which appear in the description of Lương Bèn (1986) but are absent from the inventories of Lê Đông and Tạ Văn Thông (2001) and Edmondson (2002). Conversely, Lê Đông and Tạ Văn Thông (2001) and Edmondson (2002) posit /ɕ/, /tɕ/, /tsʰ/, /tɕʰ/, /dz/, /dʑ/, and /lʰ/, which are not found in Lương Bèn’s (1986) inventory. The consonant /f/ appears only in the descriptions of Lương Bèn (1986) and Edmondson (2002).

3 Data and methods

The research data used in this study consist of 2,000 words collected by the author in each of eight communes of Lai Châu (Mu Ca, Ka Lang, Thu Lum, Ta Ba, Kan Ho, Huoi Luong, Hua Bum, and Si Lo Lau) from 2019 to 2023. All informants are native speakers of Hani. The method applied in this paper is descriptive, based on both auditory impression and experimental acoustic analysis using Praat.

4 Results

The data show that a phonological word in Hani may consist of one or more syllables. The canonical syllable shape in Hani is CGV+tone, where the glide (G) may be /w/ or /j/. The glide /w/ is found only in loanwords from Cantonese, whereas the glide /j/ is common in Hani. Examples include /b̤ja32/ ‘bee’, /lja32/ ‘cool’, /bjɔ35/ ‘mucus’, /mjɔ33/ ‘ripe’, /si33 χwa33/ ‘be happy’, /kwɛ35 kɔ32/ ‘expensive’, and /Ɂa35 hɔ32 hɔ32 ɲɔ32/ ‘sticky rice’. There are 34 consonants in syllable-initial position.
- /p/: /pa33/ ‘white,’ /pɔ33/ ‘strong,’ /pɛ33/ ‘to start’
- /ph/: /pʰa35/ ‘to change,’ /pʰɔ33/ ‘to open,’ /pʰɛ32/ ‘to release’
- /b/: /ba31/ ‘to carry,’ /bɯ33/ ‘to shoot,’ /bɛ33/ ‘to distribute’
- /b̤/: /b̤a32/ ‘thin,’ /da33 b̤ɔ35/ ‘bank,’ /b̤i35/ ‘to divide’
- /t/: /ta33/ ‘sharp,’ /tɔ33/ ‘to wrap,’ /tɛ33/ ‘flat’
- /th/: /tʰa35/ ‘to fry,’ /tʰɔ32/ ‘to pound,’ /tʰɛ32/ ‘to jostle’
- /d/: /da33/ ‘to go up,’ /dɔ33/ ‘to enter,’ /dɛ31/ ‘raw’
- /d̤/: /hɔ32 d̤a32/ ‘mattress,’ /d̤ɔ35/ ‘straight,’ /b̤y32 d̤ɛ35/ ‘worm’
- /ɣ/: /ɣa32/ ‘be able to,’ /ɣɔ33/ ‘to fly,’ /pa32 ɣɛ35/ ‘hornet’
- /g/: /ga33/ ‘cold,’ /gɔ31/ ‘skinny,’ /gi33/ ‘dark’
- /k/: /ka33/ ‘to drop,’ /kɔ33/ ‘to poke,’ /ki33/ ‘to run’
- /kh/: /kʰa35 bɛ33/ ‘mud,’ /kʰɔ33/ ‘heavy,’ /kʰɛ32/ ‘inclined’
- /h/: /Ɂa32 ha33/ ‘chicken,’ /Ɂa35 hɔ32/ ‘rice,’ /hɛ35 dɔ33/ ‘to put into’
- /Ɂ/: /Ɂa32 ha33/ ‘chicken,’ /Ɂɔ31/ ‘ripe,’ /Ɂɛ35/ ‘to remind’
- /m/: /ma32/ ‘not,’ /mɔ32/ ‘old,’ /mɛ33/ ‘hungry’
- /n/: /na31/ ‘deep,’ /nɔ32/ ‘to step on,’ /nɛ31 ɲu32/ ‘bull’
- /ɲ/: /ɲa33/ ‘above,’ /ɲɔ33/ ‘to pick by chopsticks,’ /tsʰɯ32 ɲɛ35/ ‘hoarfrost’
- /ŋ/: /ŋa35/ ‘me,’ /ŋɔ32/ ‘to break by hand,’ /ŋɛ32/ ‘unfinished’
- /z/: /za32/ ‘child,’ /zɔ33/ ‘hit on target,’ /χa32 zɯ32/ ‘tiger’
- /ʒ/: /ʒɔ35/ ‘host, owner,’ /ʒɛ35/ ‘rain,’ /ʒa33/ ‘fast’
- /s/: /sa31 ɣɔ35/ ‘breath,’ /sɔ35/ ‘fragrant,’ /kʰa35 sɛ35/ ‘sand’
- /ɕ/: /ɕi35/ ‘to die,’ /ɕɔ33/ ‘to touch,’ /ɕɛ33/ ‘to fall’
- /l/: /Ɂa32 la31/ ‘hand,’ /lɔ35 tɛ33/ ‘paddy field,’ /Ɂu35 tʂu31 tɔ35 lɛ32/ ‘flood’
- /lʰ/: /pa33 lʰa33/ ‘moon,’ /Ɂu32 lʰɛ35/ ‘moon,’ /lʰɔ32/ ‘boat’
- /ɬ/: /ɬa33/ ‘month,’ /ɬɔ35/ ‘hot,’ /dʒa32 ɬɛ35/ ‘wind’
- /w/: /χwaŋ32 ti35/ ‘king,’ /kwɛ35 gɔ32/ ‘expensive,’ /ma32 χwa33/ ‘narrow’
- /j/: /ja33/ ‘to sweep,’ /jɔ31/ ‘distort,’ /jɛ32/ ‘to slice’
- /Ɂ/: /Ɂɔ31 mjɔ31/ ‘flash,’ /zja35/ ‘rain’
- /ts/: /tsa32/ ‘beautiful,’ /tsɔ35/ ‘to sit,’ /mɛ32 tsɛ35/ ‘edge’
- /tsʰ/: /tsʰa32 za32/ ‘illegitimate child,’ /tsʰɔ33/ ‘to follow,’ /tsʰɛ33/ ‘rice’
- /tɕ/: /tɕa32/ ‘to eat,’ /tɕɔ33/ ‘to stab,’ /ta35 tɕɛ33/ ‘to measure’
- /tɕʰ/: /tɕʰa32 pa33/ ‘to split firewood,’ /tɕʰɔ32/ ‘to jump,’ /tɕʰɛ33/ ‘sharp’
- /dz/: /dza32/ ‘to eat,’ /dzɔ35/ ‘to study,’ /dzɛ33/ ‘torn’
- /dʒ/: /dʒa32 ɬɛ35/ ‘wind,’ /dʒɔ35 sa35/ ‘satisfy,’ /dʒi32/ ‘to hit’

The consonant system of Hani can be summarized as in Table 1.

Table 1: Consonant system of Hani
p    t    k    Ɂ
ph   th   kh
b    d    ɡ
b̤    d̤
s    ɕ    χ    h
z    ʒ    ɣ
m    n    ɲ    ŋ
w    j
ɬ    l    lʰ
ts   tɕ   tsʰ  tɕʰ  dz  dʒ

Not all of these consonants occur in the speech variety of every commune. /ɬ/ occurs exclusively in the Hani variety of Si Lo Lau, whereas /lʰ/ is attested in the seven other communes. The breathy-voiced stops /b̤, d̤/ are found in only three communes: Hua Bum, Huoi Luong, and Si Lo Lau. The glide /j/ appears with high frequency in Hua Bum, Huoi Luông, and Si Lo Lau, while in my data it occurs only marginally in the Hani varieties of the other communes. The affricates /dz/ and /dʒ/ occur with high frequency in Hani spoken in Hua Bum, Huoi Luông, and Si Lo Lau, whereas they are attested only sporadically, at low frequency, in Mu Ca and are absent from the remaining communes. Table 2 summarizes the presence or absence of consonant phonemes across the surveyed communes, based on the collected data. Here, the symbol (–) indicates that the phoneme is absent in the corresponding commune.
Table 2: Distribution of Hani consonants among communes
       MC  KL  TL  TB  KH  HB  HL  SLL
p      +   +   +   +   +   +   +   +
t      +   +   +   +   +   +   +   +
k      +   +   +   +   +   +   +   +
Ɂ      +   +   +   +   +   +   +   +
pʰ     +   +   +   +   +   +   +   +
tʰ     +   +   +   +   +   +   +   +
kʰ     +   +   +   +   +   +   +   +
b      +   +   +   +   +   +   +   +
d      +   +   +   +   +   +   +   +
ɡ      +   +   +   +   +   +   +   +
ts     +   +   +   +   +   +   +   +
tɕ     +   +   +   +   +   +   +   +
tsʰ    +   +   +   +   +   +   +   +
tɕʰ    +   +   +   +   +   +   +   +
dz     +   –   –   –   –   +   +   +
dʒ     +   –   –   –   –   +   +   +
m      +   +   +   +   +   +   +   +
n      +   +   +   +   +   +   +   +
ɲ      +   +   +   +   +   +   +   +
ŋ      +   +   +   +   +   +   +   +
s      +   +   +   +   +   +   +   +
ɕ      +   +   +   +   +   +   +   +
χ      +   +   +   +   +   +   +   +
h      +   +   +   +   +   +   +   +
z      +   +   +   +   +   +   +   +
ʒ      –   –   –   –   –   –   –   +
ɣ      +   +   +   +   +   +   +   +
l      +   +   +   +   +   +   +   +
lʰ     +   +   +   +   +   +   +   –
b̤      –   –   –   –   –   +   +   +
d̤      –   –   –   –   –   +   +   +
ɬ      –   –   –   –   –   –   –   +
j      +   +   +   +   +   +   +   +
w      +   –   –   –   –   +   +   +
(MC: Mù Cả, KL: Ka Lăng, TL: Thu Lũm, TB: Tá Bạ, KH: Kan Hồ, HB: Hua Bum, HL: Huổi Luông, SLL: Sì Lở Lầu)

The data also show regular phonological correspondences between these communes: /ts/ and /tɕ/ in Ka Lang, Thu Lum, Mu Ca and Kan Ho correspond to /dʒ/ and /dz/ in Hua Bum, Huoi Luong and Si Lo Lau, and /lʰ/ in Ka Lang, Thu Lum, Mu Ca and Kan Ho corresponds to /ɬ/ in Hua Bum, Huoi Luong and Si Lo Lau. These cases are shown in Table 3.

Table 3: Consonant reflexes among communes
           MC           KL           TL           TB           KH           HB           HL           SLL
cloud      tsɔ32 χɔ32   tsɔ32 χɔ32   tsɔ32 χɔ32   tsɔ32 χɔ32   tsɔ32 χɔ32   dʒɔ32 χu32   dʒɔ32 kʰy32  dʒi32 χu32
eat        tɕa32        tsa32        tsa32        tsa32        tɕa32        dza32        dza32        dza32
closed     tɕi35        tɕi35        tɕi35        tɕi35        tɕi35        dzi35        dzi35        dzi35
boat       lʰɔ32        lʰɔ32        lʰɔ32        lʰɔ32        lʰɔ32        lʰɔ32        lʰɔ32        ɬɔ32
to fry     lʰu45        lʰu35        lʰu35        lʰu35        lʰu35        lʰu35        lʰu35        ɬy35
to guard   lʰu32        lʰu32        lʰu32        lʰu32        lʰu32        lʰu32        lʰu31        ɬu31

As for vowels, seven monophthongs are attested across all of the surveyed communes, and their realizations are consistent, as summarized in Table 4.
Table 4: Vowel reflexes among communes
     gloss    MC          KL          TL          TB          KH          HB          HL          SLL
i    wife     χa32 mi32   χa32 mi32   χa32 mi32   χa32 mi32   mi32 za32   mi32 za32   mi32 za32   mi32 za32
ɛ    raw      dɛ31        dɛ31        dɛ31        dɛ31        dɛ31        dɛ31        dɛ31        dɛ31
a    sharp    ta33        ta33        ta33        ta33        ta33        ta33        ta33        ta33
u    sky      Ɂu32        Ɂu32        Ɂu32        Ɂu32        Ɂu32        Ɂu32        Ɂu32        Ɂu32
ɔ    boat     lʰɔ32       lʰɔ32       lʰɔ32       lʰɔ32       lʰɔ32       ɬɔ32        ɬɔ32        ɬɔ32
ɯ    shave    ɕɯ31        ɕɯ31        ɕɯ31        ɕɯ31        ɕɯ31        ɕɯ31        ɕɯ31        ɕɯ31
y    hole     py33        py33        py33        py33        py33        b̤y33        b̤y33        b̤y33

This vowel system can be summarized as in Table 5.

Table 5: Vowel system of Hani
high: i (front unrounded), y (front rounded), ɯ (back unrounded), u (back rounded)
mid: ɛ (front unrounded), ɔ (back rounded)
low: a

For the acoustic parameters of Hani vowels, we used Praat to obtain the average F1 and F2 values shown in Table 6 and the movements of F1 and F2 shown in Figure 1.

Table 6: F1 and F2 of vowels
          /i/    /y/    /ɛ/    /a/    /ɯ/    /u/    /ɔ/
F1 (Hz)   382    454    543    731    420    425    591
F2 (Hz)   1811   1816   1608   1269   1292   1128   1014

Figure 1: Waveform and formants of /a/

There are four tones in all eight surveyed communes: /33, 35, 31, 32/. Examples are shown in Table 7. Figure 2 shows an acoustic waveform of a sample word with tones.

Table 7: Examples of words with all tones in Hani
Hani              Gloss
/pa33/            white
/Ɂa35 pa31/       leaf
/kɛ33 pa35/       vanished
/tsi35 pa32/      wine
/ŋa35/            I, me
/Ɂa35 ŋa31/       to stutter
/ŋa32/            borrow
/pɔ33/            strong
/Ɂa35 pɔ31/       bagasse
/Ɂa32 pɔ35/       great-grandparent
/χɔ32 pɔ32/       hollow tree trunk
/ki33/            run
/ki31/            evening
/sa33 ki35/       to measure weight
/gi32 sa33/       to erase

Figure 2: Waveform and F0 of /pa33 lʰa33/ ‘moon’

There are various phonetic realizations of vowels among speakers.
/y/ tends to become [w̯i] among young speakers. /ɛ/ and /ɔ/ are sometimes raised to [ɛ̝] and [ɔ̝]. Sometimes [ũ] and [ĩ] occur as variants of /u/ and /i/ in Mu Ca, Ka Lang, Thu Lum, Ta Ba and Kan Ho, for example [Ɂu32 tũ32] ‘head’ and [tĩ32] ‘to beat’. The nasal [ũ] was also reported by Lê Đông and Tạ Văn Thông (2001) in Mu Ca, with a very low frequency.

5 Conclusion

The phonological system of Hani in Lai Chau shows several notable features. The presence of [+voiced] and [+aspirated] contrasts within the lateral series, an uncommon pattern among the languages of Vietnam, is particularly striking. Also noteworthy is the emergence of phonemic contrasts among affricates distinguished by [+voice], [–voice], and [+aspiration], including /dz/, /dʒ/, /ts/, /tɕ/, /tsʰ/, and /tɕʰ/. The vowel system of Hani in Lai Chau consists of seven monophthongs, with no diphthongs. The presence of /y/ is distinctive, as it is rarely attested in other languages of Vietnam. The tonal system of Hani in Lai Châu comprises four tones grouped into phonetically high and low series, and all four tones exhibit a single contour direction. Regarding dialectal differentiation, the findings indicate that the Hani varieties spoken in the eight surveyed communes fall into two major groups. The first group includes Ka Lăng, Thu Lũm, Mù Cả, Tá Bạ, and Kan Hồ, while the second group includes Hua Bum, Huổi Luông, and Sì Lở Lầu. The primary distinctions among the communes lie in the consonant inventory, most notably the presence of /b̤/ and /d̤/ in the second group, as well as the correspondence /lʰ/ ~ /ɬ/ between Sì Lở Lầu and the other communes.

References

Benedict, Paul K. 1972. Sino-Tibetan: A conspectus. Cambridge: Cambridge University Press.
Chu Thùy Liên & Lê Đình Lai. 2001. Xa Nhà Ca: trường ca dân tộc Hà Nhì [Xa Nhà Ca: The epic of Hani]. Hanoi: Nhà Xuất Bản Văn Hóa Dân Tộc.
Edmondson, Jerold A. 2002. The Central and Southern Loloish languages of Vietnam.
Proceedings of the Twenty-Eighth Annual Meeting of the Berkeley Linguistics Society: Special Session on Tibeto-Burman and Southeast Asian Linguistics, 1–13.
Hansson, Inga-Lill. 1982. A phonological comparison of Akha and Hani. Linguistics of the Tibeto-Burman Area 7.1:63–115.
Hoàng Sơn. 2008. Người Hani ở Huổi Luông [Hani people in Huổi Luông]. Hanoi: Nhà Xuất Bản Văn Hóa Dân Tộc.
Lê Đông & Tạ Văn Thông. 2001. Tiếng Hà Nhì [Hani language]. Hanoi: Nhà Xuất Bản Văn Hóa Dân Tộc.
Lương Bèn. 1986. Hệ thống ngữ âm tiếng Hani [Phonetic system of Hani]. Tạp Chí Ngôn Ngữ (1986) 1:6–7.
Nguyễn Khắc Tụng & Ngô Vĩnh Bình. 1981. Đại gia đình dân tộc Việt Nam [Ethnic communities in Vietnam]. Hanoi: Nhà Xuất Bản Giáo Dục.
Thurgood, Graham & Randy J. LaPolla. 2006. The Sino-Tibetan languages. New York: Routledge.
Trần Bình. 2014. Các dân tộc thiểu số ở Việt Nam [Ethnic minorities in Vietnam]. Hanoi: Nhà Xuất Bản Lao Động.
Viện Dân Tộc Học. 1978. Các dân tộc ít người ở Việt Nam (các tỉnh phía Bắc) [Ethnic minorities in Vietnam: Northern provinces]. Hanoi: Nhà Xuất Bản Khoa Học Xã Hội.

ETYMOLOGICAL NOTES ON WORDS FOR TEXTILES AND FABRIC PRODUCTION IN MAINLAND SOUTHEAST ASIA
Mark Alves
Montgomery College
mark.alves@montgomerycollege.edu
Rikker Dockum
Binghamton University, State University of New York
rdockum@gmail.com

Abstract
This is an etymological study of words in the semantic domain of fabrics and fabric production that are attested regionally among languages of Mainland Southeast Asia (MSEA hereafter). Based on lexical data assembled from databases, proto-language reconstructions, and other extensive digital lexical resources, combined with ethnohistorical information, we propose hypotheses about the possible origins of the words and the earliest evidence of their presence in MSEA.
Primary source languages of words in this domain include Chinese, Indic, and Southwest Tai, but in many instances, Southwest Tai languages and Vietnamese were secondary sources that facilitated the spread of words from Chinese or Indic. Additional localized exchange took place between Malayo-Chamic and Austroasiatic languages. Building on previous studies of regional lexical borrowing in semantic domains such as metals and metallurgy, domestic animals, and others, this study contributes further explicit evidence of the regional lexical (and hence sociocultural) influence of Indic and Chinese, of the impact of the Southwest Tai expansion into MSEA on Austroasiatic languages, and of other observed language contact scenarios (e.g., Vietnamese and Malay as donor languages to neighboring minority languages, and contact between Chamic and Bahnaric).

Keywords: etymology, loanwords, ethnohistorical linguistics
ISO 639-3 codes: cmc, khm, hmx, lao, tha, vi, zh, zsm

1 Introduction

The increasing availability of extensive lexical resources and reconstructions for languages throughout Mainland Southeast Asia (MSEA) has facilitated etymological research into aspects of material culture shared by groups in multiple language families. Our interest is in words which (a) are found throughout the region, (b) have substantial history (e.g., from around the beginning of the Common Era to the mid-colonial period), and (c) belong to a variety of semantic domains. Previous studies have examined words for metals (Alves 2018) and metal implements (Alves 2015), pottery and ceramics (Alves 2022), domestic birds (Alves 2015), houses and household structures (Alves and Dockum 2023), and other areas of material culture. Such studies show recurring patterns of widespread loanwords, including the following.
• A north-to-south spread of words from Chinese and Southwest Tai, and a secondary spread of Chinese words via Southwest Tai languages, Vietnamese, and Khmer
• A small but culturally significant set of words from Sanskrit and/or Pali
• A radiation of words around Khmer in the center of MSEA
• Occasional instances of words of undetermined origin (i.e., Wanderwörter)

The focus of this study is on regionally attested words for fabric weaving (as opposed to the plaiting of baskets and mats). The wordforms in this study are considered widespread because they are attested in multiple language families and in at least several languages. Exchanges of words between just two languages or within a single language family are not considered here, as those are typically localized instances of borrowing that deserve separate study. In addition, we have identified words with substantial historical depth in the region, many of which date back to the early 1st millennium CE. However, in most cases, detailed histories of the spread of these words can only be determined with additional historical phonological information from the respective recipient languages, which is beyond the scope of the current study.

The key questions considered herein are the following; they are addressed in various ways throughout the subsections below.

• Wordforms: What are the wordforms for fabrics and fabric production with regionally attested interphylum distribution among the languages of MSEA?
• Geography: What is their geographic distribution among the MSEA language families?
• Sources: What are the likely etymological sources?
• Earliest attestations: When are the words first attested, and what other chronological information can be provided about their spread?
• Ethnohistory: What do relevant ethnohistorical and archaeological data indicate about the related objects or practices, and how do they aid in answering the questions above?
• Implications: What characterizations of historical language contact can be made based on the data?

In the next sections, we review the lexical data and methods of analysis, present the main widespread wordforms, and offer concluding observations on the data.

2 Data and Methods

This section presents the lexical data and data sources for this study, describes key methods of searching for and processing the data (especially assessing the relatedness of identified comparable wordforms), summarizes the ethnohistorical context of the spread of fabric and fabric production in MSEA, and discusses the limits of the approach.

Regarding the lexical data, the sources include publicly accessible databases and digital publications for the major language families in the region: Austroasiatic, Austronesian (especially Malayic and Chamic), Sino-Tibetan, Kra-Dai (especially Southwest Tai), Hmong-Mien, and Indic (especially Sanskrit and Pali). Collectively, the data comes from hundreds of languages and/or dialects, as well as from reconstructions at both the family and branch levels. These resources are listed in Appendix B. We searched the various databases, dictionaries, and reconstructions for multiple related senses to assemble the largest possible sets of comparable words. The data was compiled into worksheets organized by language family. We then reviewed the assembled data and highlighted potentially related wordforms. In general, we take a conservative approach: the goal is to identify wordforms with strong phonological and semantic comparability and to exclude those with phonological differences that are unusual (e.g., changes that are typologically uncommon for the language(s) in question) or unexplainable, as well as those with significant and/or unexplainable semantic differences. Where unexpected differences or irregularities occur, a feasible explanation is required, and such explanations are provided in the discussions of the individual regional wordforms.
We also acknowledge cases in which more than one explanation is feasible. We have made efforts to avoid claiming relatedness for wordforms that may be instances of chance partial similarity. Because we have assembled data from large numbers of languages with feasible phonological and semantic comparability, and because the words belong to a cultural domain whose spread is supported by ethnohistorical evidence, the likelihood that widespread similarity among the wordforms is due to chance is reduced.

Regarding the timing of the spread of the wordforms, we note the earliest periods with evidence of such words in the region and then consider additional information about their later spread. However, there is often insufficient information to determine how long it took for words to reach their modern regional extent or, for example, when words were borrowed by minority groups that have no early textual records. Still, we use ethnohistorical information to identify broad chronological periods of lexical borrowing, which is a starting point for this line of inquiry. The following ethnohistorical sketch of fabric production in MSEA provides historical context up to the pre-modern era.

• The Hoabinhian period (up to c. 2000 BCE): We have found no archaeological evidence of fabric weaving in MSEA in this period, prior to the Austroasiatic expansion.
• The Neolithic agricultural revolution (from c. 2000 BCE): In this period, numerous spindle whorls used for spinning thread have been excavated in northern Vietnam, some dated to the Phung Nguyen culture (2000–1500 BCE) (Hán 2009:92, 223). This matches the timing of the hypothesized Austroasiatic dispersal into MSEA (e.g., Higham 2017, Sidwell and Rau 2019, Alves 2020).
However, we find no evidence in the archaeological record from this period of either fabric, which would show how thread was used, or tools for weaving fabric.1
• The beginning of the Common Era: At the end of the 1st millennium BCE, the Chinese expansion into northern Vietnam and early Indic contact in MSEA brought new cultural practices.2 Evidence of looms dates to this period, including body-tension wooden loom parts (Buckley 2023:7). In northern Vietnam, Han brick tombs contained Chinese cultural artifacts (Phan 1988), and fragmentary evidence of silk dating to the Han Dynasty and Dong Son period has been found in the same area (Cameron 2015).
• The 2nd millennium CE and the Southwest Tai expansion: The traditional first Tai kingdom in MSEA is Sukhothai, beginning in 1238 CE, which implies that migrations took place some centuries earlier, and Pittayaporn (2014) has dated the southward Tai migrations to the 8th to 10th centuries CE based on historical linguistic evidence. We take the position that Tai groups expanded into MSEA around the beginning of the 2nd millennium CE, though much about these migration events remains unclear (e.g., when they occurred, how many there were, and how long they lasted), which makes some of our observations about timing tentative. As for weaving practices, Bayesian studies of Kra-Dai groups show parallels between the linguistic phylogeny and the types of looms used by groups within Kra-Dai (Buckley et al. 2024). Consequently, the Tai words in this semantic domain examined here can be regarded as belonging to a tradition distinct from those of other Kra-Dai branches. The later colonial period may have affected interactions among groups in complex ways, but language contact among them naturally continued.

Based on this ethnohistorical information and the lexical data collected for this study, we posit the following periods of early language contact and lexical borrowing.
• Before Sinitic and Indic language contact: It is unclear whether any words in this study date to this period.
• Early Sinitic and Indic language contact: From the end of the 1st millennium BCE through the 1st millennium CE, archaeological and historical evidence demonstrates contact with Indic-speaking and Sinitic-speaking groups. Linguistic evidence (i.e., historical phonological data and texts/inscriptions) indicates borrowing (a) from Chinese into Kra-Dai, Hmong-Mien, Vietic, and Old Khmer and (b) from Indic into Chinese, Vietic, Old Khmer, and Tai.
• The Southwest Tai expansion: As discussed, the Southwest Tai expansion into MSEA began around the start of the 2nd millennium CE. As numerous historical linguistic studies have shown, and as the data herein support, Southwest Tai languages experienced intense contact with many Austroasiatic and Tibeto-Burman languages. This contact involved major languages (e.g., Thai and Cambodian) but also many minority languages, as the lexical data in the following sections show.

Overall, in making these characterizations, we can only posit very general paths of borrowing, and in many cases the details may not be recoverable. We hope, however, that future historical linguistic studies of languages in the region will offer insights into both the timing and the paths of transmission of the wordforms in this study.

3 Words in the Primary Semantic Domains

This section consists of four subsections of categories of words related to fabric production: words for implements, words for materials, words for actions, and words for miscellaneous matters. All the items are listed in Table 3 in Appendix A. Each entry provides the posited source language(s), chronological information, and supporting comparative data.

• In the headers of the entries, (pseudo-)phonetic approximations of the regionally attested generalized wordforms are presented in all capital letters.
This approach is taken because no single reconstruction is possible for wordforms found across multiple language families, yet some representative form is needed for presentation. The header then gives the posited source language (with an indication of the degree of certainty, such as “probable” or “possible”) and a general indication of the recipient languages or secondary donor languages. For example, in the entry ‘needle’ KEM, TSIM (see section 3.1.4), Old Chinese is the probable donor language to Tai and Vietic, which in turn were donor languages to various Austroasiatic languages in MSEA.
• Next comes a discussion of the wordforms and their larger historical context, including ethnohistorical and historical linguistic background information and the earliest evidence of the words in MSEA.
• Finally, the comparative lexical data is organized by language family, with examples from proto-languages and/or specific languages. For attestations listed by branch, “etc.” indicates an unspecified number of other languages in that branch with the wordform, while the absence of “etc.” means that only one language in the branch has the wordform in the available data (though others could of course be found in future archival research or fieldwork). When a listed definition is identical to the entry label, no gloss is given; when it differs, a gloss is provided. A question mark indicates that the phonological and/or semantic features differ in an unexplained way, decreasing the certainty of the relatedness, though the overall historical linguistic characterization is not affected.

3.1 Implements: ‘loom’, ‘shuttle of loom’, ‘needle’

The senses in this category include the general terms ‘loom’, ‘shuttle of a loom’, and ‘needle’. The spread of words in this domain suggests bilingualism that allowed for the transmission of these practices, not merely transmission via trade.
These items all have a deep history in the region, though for looms, the history appears to begin in central China. We have not found a description of the history of needles in MSEA, but humans have used them for tens of thousands of years (e.g., d’Errico et al. 2018, Gilligan et al. 2024). It is also unclear how the BCE-era Metal Age and increasing sociocultural contact in the early Common Era affected the use of needles. However, as noted below, two wordforms meaning ‘needle’ have spread: a Chinese word found throughout the region and a word of Malayo-Chamic origin that reached neighboring languages in Malaysia and central Vietnam.

As for the history of looms in southern China and MSEA, several recent studies have presented ethnohistorical information, including Buckley (2018), focusing on Tai-Kadai groups, and Buckley et al. (2024), on the southward spread into Southeast Asia. According to Buckley (2023:5), the earliest loom remains, dating from 7000 to 6000 years before present (BP), are from the Hemudu site in central China. Complex frame looms appear in southern China no later than 3000 BP (Buckley 2023:9). In MSEA, however, loom technology appears later. At sites in both northern and southern Vietnam, body-tension wooden loom parts date to c. 2000 BP (Buckley 2023:7), coinciding with the reach of the Han Dynasty into MSEA. No studies, though, have focused on the relevant lexical evidence. Notably, no word for ‘loom’ has been reconstructed for the proto-language of any language family (see Table 1). However, an Old Chinese word has been reconstructed, and the lexical data we have assembled shows that Kra-Dai branches have a few distinct reconstructions for ‘loom’. This suggests a deep history of these implements among Kra-Dai speakers, consistent with the work of Buckley et al. (2024). However, these reconstructions are only at the branch level, and no Proto-Kra-Dai etymon can be reconstructed.
Table 1: Reconstructions for ‘loom’ in languages in MSEA

Kra-Dai
• Proto-Tai *trukD ‘loom’ (Pittayaporn 2009)
• Kam-Sui *kraːk7 ‘loom’ (Thurgood 1988)
• Proto-Ong-Be *ɗə:k D2 ‘loom’ (Chen 2018)
• Proto-Hlai NA
• Proto-Kra NA
Austroasiatic
• Proto-Austroasiatic NA
Hmong-Mien
• Proto-Hmong-Mien NA
• Proto-Hmong NA
• Proto-Mien NA
Austronesian
• Proto-Austronesian NA3
• Proto-Chamic NA
Trans-Himalayan
• Proto-Sino-Tibetan NA
• Old Chinese *kəj 機 jī ‘mechanism’ (Baxter and Sagart 2015)

Of the two widespread wordforms for ‘loom’ in the data, one is a probable Chinese loanword that spread via Southwest Tai into Austroasiatic languages. The other is a Proto-Tai word that was also borrowed by Austroasiatic languages. In addition, a Southwest Tai word for ‘shuttle of a loom’ is also widespread. These words are discussed below.

3.1.1 ‘Loom’ KI – Probable Chinese source: Old Chinese > Vietic, Tai > Austroasiatic

The KI form is a probable Chinese loan in Tai that subsequently spread from Southwest Tai languages into many Austroasiatic languages in the 2nd millennium. However, the tone category presents challenges. For a source with an Old Chinese glottal stop coda, the expected Tai tone category is C, not the attested B. The Vietnamese word also has this unexpected tone, which would be expected to derive from a final fricative rather than a glottal stop coda. For the time being, we assume that the word spread along with this Chinese technological practice, that the tone category nonetheless suggests borrowing from Chinese in the 1st millennium CE, and that an explanation for the tone category must await additional information. This wordform then spread from Tai to Austroasiatic languages (but not to Vietnamese, whose wordform has a distinct vowel) at some point in the 2nd millennium. The diphthong in the Khmer wordform suggests that it dates back at least a few centuries (cf. Maspong 2023), having been borrowed as a monophthong and subsequently diphthongized, while the /i/ in other languages provides no information about the time of borrowing.

• Chinese – 機 jī ‘mechanism’, MC *kjiB, LH *kɨB, OC *kriʔ (Schuessler 2009)
• Kra-Dai – Thai kìi, Lao kīː, Shan ki2, etc. (over 20 Southwest Tai lects); Proto-Southwest Tai *kiB (Jonsson 1991)
• Austroasiatic – Katuic (Bru ki:), Khmeric (Khmer kəj & kʔɨəj, Surin Khmer kɛj), Khmuic (Phong kiː huːk), Palaungic (Riang ki, kji), Vietic (Vietnamese cửi [kɨj31], Thavung ki:)
• Hmong-Mien – Various languages, in compounds (e.g., East Xiangxi ki⁰³zaŋ³¹dəɯ³⁵ ‘loom’)
• Tibeto-Burman – Mpi ki⁶

3.1.2 ‘Loom’ HUK – Tai source: Southwest Tai > Austroasiatic

The HUK form is likely a Tai word: it has a Proto-Tai reconstruction and is attested across a wide geographic range of Tai languages, while it is found in only a few Austroasiatic languages. Also, the spread of the Chinese KI ‘loom’ form via Tai languages, as noted in 3.1.1, is further evidence of the spread of this kind of technology and its accompanying words. The Proto-Tai reconstruction has an onset cluster *tr, so the HUK form in the Austroasiatic languages must have been borrowed after this cluster had reduced to /h/ in the donor Tai languages. This suggests relatively recent borrowing, though exactly how recent remains to be determined.

• Kra-Dai – Proto-Tai *trukD ‘loom’ (Pittayaporn 2009), Thai hùuk, Lao hȕːk, Shan huk2
• Austroasiatic – Bahnaric (Cheng huːk), Khmuic (Phong kiː huːk)

3.1.3 ‘Shuttle’ KSUI – Probable Tai source: Southwest Tai > Austroasiatic

The only regionally attested word for ‘shuttle (of a loom)’ comes from Proto-Southwest Tai, and it is found in only a few Austroasiatic languages. Considering the spread of other words from Southwest Tai in this domain, this is likely a borrowing into Austroasiatic.

• Kra-Dai – Proto-Southwest Tai *suaj ‘shuttle’ (Jonsson 1990) (e.g., Lao ká sŭaːj, Keng Tung Shan soj3, etc.)
• Austroasiatic – Katuic (Kui (k)suːj, Bru kasuəj), Khmeric (Surin Khmer ksʊːj)

3.1.4 ‘Needle’ KEM, TSIM – Probable Chinese source: Old Chinese > Vietic, Tai > Austroasiatic languages

In this entry, two wordforms probably derive from a single Chinese source word borrowed in two different periods, as indicated by their onsets. The KEM form appears to be connected to the OC stage, for which *k is reconstructed, whereas the TSIM form was likely borrowed later, during the MC period. However, the latter form, with a palatalized onset, is found only in Kra-Dai and Mienic languages in China, whereas the older form with /k/ appears in languages in MSEA. Vietnamese shows evidence of two 1st-millennium borrowings from two periods: ghim ‘hairpin’ from the OC layer, in which the gh /ɣ/ onset developed from OC *k in intervocalic position in the original OC word, and kim from an early MC period after the loss of the presyllabic material. Vietnamese kim is the likely source of the forms in other Vietic languages and in Bahnaric. Elsewhere in MSEA, the wordforms were likely borrowed via Tai languages.

• Chinese – 針 zhēn ‘needle’, MC tsyim, OC *t.[k]əm (Baxter and Sagart 2015)
• Kra-Dai – Proto-Tai *qemA (Pittayaporn 2009); TSIM in dozens of Northern Tai lects (e.g., Yongbei Zhuang ɕim1, etc.); KHEM in two dozen Southwest Tai lects (e.g., Lao kʰĕm, etc.); TEM in a dozen Kam-Sui lects (e.g., Cao Miao ȶhəm³⁵, etc.)
• Austroasiatic – Bahnaric (Mnong kiːm), Khmuic (Khmu skam),4 Palaungic (Lamet khɪm, khʌm, Riang kɤm² riːm² ‘numb (pins and needles)’), Pearic (Chong kʰem), Vietic (Vietnamese ki:m33 ‘needle’ and ghim ‘pin’, Ruc ki:m1, etc.)
• Hmong-Mien – Mienic (Mien sim33, Zao Min tsum44, etc.)

3.1.5 ‘Needle’ JRUM – Malayo-Chamic source: Chamic, Malayic > Austroasiatic

While this item is a Proto-Austronesian etymon, it is best characterized as a Malayo-Chamic wordform that spread into Austroasiatic languages.
In Vietnam, the source was Chamic languages, while in Malaysia, the donor language was most likely Malay. The reconstruction of this word for Bahnaric branches suggests pre-modern borrowing, but how far back remains unclear.

• Malayo-Chamic – Proto-Austronesian *zaRem ‘needle’
• Austroasiatic – Aslian (Temiar ɟaɹup, Jahai jarum, etc.), Bahnaric (Proto-South Bahnaric *ɟrum, Proto-North Bahnaric *ɟruːm), Katuic (Pacoh ta.rum, Katu ɟarum, etc.)

3.2 Materials: string/thread, cotton, silk, cloth

This section considers wordforms with the related senses ‘thread’, ‘string’, ‘cloth’, ‘cotton’, and ‘silk’. Indirect archaeological evidence of early thread/string comes from spindle whorls. Buckley (2023:3) describes a north-to-south progression in the history of spindle whorls, which appear c. 6000 BP in central/northern China, 5000 to 4000 BP in northern MSEA, and 4000 to 3000 BP in central MSEA. The latter periods in MSEA coincide with the Neolithic agricultural revolution and the Proto-Austroasiatic dispersal. However, while Proto-Austroasiatic has etymons for ‘to weave’ *ta:ɲ and ‘string, cord, rope, thread’ *kə.'sɛːˀ (Sidwell 2024), there is no reconstructed proto-language etymon for spindle whorls. As for cotton, the earliest evidence of its use is found to the west of the region (Castillo, Bellina, and Fuller 2016:1260), while silk in China dates back at least to the 4th millennium BCE (Cameron 2015:67). These are discussed further in sections 3.2.1 and 3.2.3, respectively.

For some wordforms in this section, semantic shift has occurred in the recipient languages (e.g., to the senses ‘fabric’ or ‘thread’). However, given the consistency of the phonological forms, this generally does not cast doubt on the relatedness of the wordforms. The donor languages for these terms include Indic, Chinese, Southwest Tai, Malayo-Chamic, and Austroasiatic.
3.2.1 ‘Cotton’ KPAS – Probable Indic source: Indic > Austroasiatic (Vietnamese, Khmer, etc.), ?Chinese, Tai, Tibeto-Burman (Chin branch)

Some archaeological studies date the domestication of tree cotton to 6000 to 5000 BCE (Castillo et al. 2016:1260). However, evidence of the cotton trade in Southeast Asia dates only to the 4th century BCE (Castillo et al. 2016:1264). So regardless of whether the Sanskrit form is an innovation in an Indo-Aryan language or in another language, the wordform in question spread via Indic into many languages of China and the greater Southeast Asian region.5 The wordform is extremely widespread, found in dozens of languages, and it has even been reconstructed for multiple Austroasiatic branches (e.g., Bahnaric, Katuic, Khasic). Textual evidence suggests that the spread of cotton production in China dates to the Han Dynasty (Zurndorfer n.d.), though the matter is complex and beyond the scope of this study.

As for the linguistic data, Pulleyblank (1981:286) lists a few obsolete two-syllable Chinese compounds (e.g., 古貝 gǔ bèi, OC *kˤaʔ & *pˤa[t]-s, as noted by Pelliot (1959:433–442)) that were used to transcribe this Indic word. Based on their OC reconstructions, these transcriptions partially match the Indic phonological form. We also raise the possibility that the modern Chinese word 布 bù ‘cloth’, OC *pˤa-s, may come from the same Indic source, minus the presyllable, since its phonological form matches the main stressed syllable of the Indic forms and the semantic shift from ‘cotton’ to ‘cloth’ is feasible. Indeed, in Vietnamese, the proposed loanword also means ‘(cotton) cloth’, and it has the expected lenited ‘v’ onset from *p, which was in intervocalic position in the source Indic form. Conservative Vietic languages also retain the bisyllabic form, reflecting the earlier shape of the modern Vietnamese word.
Regardless, it appears that the Indic form (Sanskrit and/or Pali) spread into various languages in the region, though this spread cannot easily be tracked or dated. The Sanskrit word is attested with multiple syllables in Old Khmer inscriptions of the 800s, and the medial /r/ in that form (cf. Sanskrit kārpāsa vs. Pali kappāsa) suggests borrowing from Sanskrit.6 An even earlier possible attestation is the Old Khmer word ʼaṃpas ‘cotton thread (that which has been spun)’, dating to 620 CE. While this is admittedly speculative, if this is a nominalized form with a nasal affix, this potential loanword has an even earlier history in MSEA. In Vietnam, the modern phonological features of the Vietnamese word indicate that it must have been borrowed when northern Vietic still had disyllabic or sesquisyllabic typology, which suggests first-millennium borrowing.

Finally, there is uncertainty regarding the proposed comparable wordform in Tai. Proto-Tai *fajC is reconstructed by Li (1977) but not by Pittayaporn (2009), though cognates of this form are widespread in all Tai branches. We simply note that it bears surface similarity to other regional forms and may be a borrowing into Proto-Tai; a more careful account of its segments awaits further study.
• Indic – Sanskrit kārpāsa or Pali kappāsa ‘cotton’
• Chinese – 布 bù ‘cloth’, MC puH, ?OC *pˤa-s (Baxter and Sagart 2015)
• Austroasiatic – Aslian (Temiar & Kensiu kapas, etc.), Bahnaric (pBahnaric *kpaːs ‘cotton plant’), Katuic (pKatuic *kpaas ‘cotton plant’), Khasic (pKhasic *knpaat ‘cotton’), Khmeric (Old Khmer krapās (803 CE), Surin Khmer kbaːh ‘cotton plant/fiber/cloth’), Khmuic (Ti’n [Mal] phwáːj, Khmu pʰaːj), Munda (Juang kapas, Korku kapuso, etc.), Palaungic (Lawa rapas, Riang pɑj¹), Pearic (Chong (various dialects) kbah, pah, kpaːˀt, Pear kəpaːs, etc.), Vietic (pVietic *k.pa:s ‘cotton cloth’, Vietnamese vải ‘cotton cloth’)
• Kra-Dai – Proto-Tai *fajC ‘cotton’ (Li 1977) (also Lao káp paː sĭː ‘cotton’)
• Malayo-Chamic – Malay kapas, Rhade kpaih, Cham kapăh (and widespread in Western Malayo-Polynesian languages)
• Tibeto-Burman – ?Proto-Chin *pat ‘fiber, cotton, thread’ (also Tibetan (Balti) kupa:s, and several others scattered among Tibeto-Burman branches)

3.2.2 ‘Cloth, cotton, thread’ BRAJ – Possible Austroasiatic source: Austroasiatic > Chamic

Another word referring to ‘cotton’, ‘cotton thread’, or ‘cloth’ is the BRAJ form. Thurgood (1999:359) posits that it is a loanword in Chamic from neighboring Austroasiatic languages, based on his expectations about the phonological form, though he does not give a precise explanation (e.g., that the *mr onset cluster is typologically less common). This wordform is found only in minority Austroasiatic languages (not Khmer), mostly in languages neighboring Chamic (though it also occurs in non-adjacent Pearic). The phonological evidence for the direction of borrowing is currently somewhat impressionistic, and ethnohistorical evidence of this practice among these groups is not readily available. As for chronological information, the forms reconstructed for three Austroasiatic branches suggest substantial time depth, though, again, we have no ethnohistorical descriptions.
Still, the word was likely innovated at some point after the spread of cotton into the region, during a period of significant contact between Chamic and Bahnaric (and/or Katuic).

• Austroasiatic – Bahnaric (pBahnaric *braːj ‘(cotton) thread’), Katuic (pKatuic *baraaj ‘thread’), Pearic (pPearic *braːj ‘cotton thread’)
• Malayo-Chamic – Proto-Chamic *mray ‘cotton, cloth’

3.2.3 ‘Silk’ SI – Chinese source: Middle Chinese > Tibeto-Burman, Kra-Dai, Vietnamese

As noted in section 2 and in 3.2, early evidence of silk production in MSEA dates to the Dong Son period, while silk production in China is millennia older. Thus, we consider the SI ‘silk’ wordform to be a Chinese loanword that has spread throughout languages in southern China and MSEA. However, within Kra-Dai, the wordform is not found in Southwest Tai (see PRE ‘silk’ below). In Austroasiatic, the form is found only in Vietnamese, in which it is an early Chinese loanword, as indicated by the vowel (i.e., schwa, possibly retained from OC). There are comparable wordforms in Tibeto-Burman, but as we are less familiar with Tibeto-Burman historical phonology, we cannot claim with confidence that these are loanwords from Chinese.

• Chinese – 絲 sī ‘silk’, MC si, OC *[s]ə (B&S 2015)
• Kra-Dai – Not in MSEA; only in China, in several varieties of Kam-Sui (e.g., Mulam səi¹, etc.) and Northern Tai (e.g., Zhuang sei²⁴, etc.)
• Austroasiatic – Vietnamese tơ [təː] (vs. Sino-Vietnamese ti)
• Tibeto-Burman – Lolo-Burmese (Gazhuo tsha³²³ sɿ³³, etc.), Tibetan (Xiahe) sə

3.2.4 ‘Silk’ PRE – Tai source: Southwest Tai > Austroasiatic

Another regionally attested wordform for ‘silk’ is PRE. Within Tai, it is not found outside MSEA and has been reconstructed for Proto-Southwest Tai (whereas the SI ‘silk’ form just discussed is from Chinese).
Its reconstruction in Proto-Southwest Tai does not guarantee a Tai source, but it makes it more likely that Tai speakers innovated this term as part of Tai weaving traditions. Also, its occurrence in only a single language in some Austroasiatic branches (Bahnaric, Katuic, Monic) suggests borrowing into those branches. A comparable wordform is also found in Burmish, though specialists should verify whether these are viable loanwords (i.e., /ph/ from /pr/).

• Kra-Dai – Proto-Southwest Tai *brɛA ‘silk clothes, satin’ (Jonsson 1990)
• Austroasiatic – Bahnaric (Alak p’ap’ree), Katuic (Ngeq p’ree daaj), Khmeric (Khmer prɛɛ), Monic (Nyah Kur phrɛ̀ɛ), Palaungic (Palaung phre, etc.), Pearic (Chong prɛ̀ɛ, etc.)
• Tibeto-Burman – Burmish (Achang phe⁵⁵, Burmese phɛ⁵⁵, etc.)

3.2.5 ‘Thread, string’ SEN – Possible Chinese source: Middle Chinese (?) > Southwest Tai > Austroasiatic

A comparable SEN wordform is found in Middle Chinese, in Tai and Kam-Sui languages, and in several Austroasiatic languages. However, the Tai and Kam-Sui words appear to have an unexpected tone (i.e., Chinese tone C typically corresponds to Tai tone B in the traditional labelling categories). This raises the possibility that the word was borrowed in a later period, after tonogenesis; another possibility is chance similarity. Pittayaporn (2009) reconstructs *selC ‘classifier for long thin objects’ for Proto-Tai, and if that form, with its *l coda, is the source, the Tai word does not come from Chinese. The wordform is found in a few branches of Austroasiatic, but in only four minority languages in total, suggesting borrowing from Southwest Tai languages at some point in the 2nd millennium CE. Thus, it could be a Chinese loanword; if it is not, it is a Tai innovation that spread into Austroasiatic.

• Chinese7 – 線 xiàn ‘thread, wire, line’, MC sjänC, LH sianC, OC *sans
• Kra-Dai – Several Southwest Tai languages (Thai sên, Shan sʰen3, Lao sȅn, etc.); SIN in a few Kam-Sui languages (e.g., Dong sin⁴⁵³, etc.); cf. Proto-Tai *selC ‘classifier for long, thin object’ (Pittayaporn 2009)
• Austroasiatic – Katuic (Bru sɛn, Kui & Souei sen), Monic (Nyah Kur ɛn), Vietic (Thavung sɛ̂n)

3.2.6 ‘Thread’ MAJ – Tai source: Southwest Tai > Austroasiatic (Mang), Tibeto-Burman (Mpi)

The MAJ ‘thread’ form has been reconstructed for Proto-Tai and is found in only a couple of Austroasiatic and Tibeto-Burman languages, making Southwest Tai languages the likely donor group.8

• Kra-Dai – Proto-Tai *ʰmajA ‘thread’ (Pittayaporn 2009)
• Austroasiatic – Mangic (Mang maj⁴ ‘thread; silk’)
• Tibeto-Burman – Southern Loloish (Mpi mai45)

3.3 Other items: indigo, blue, green, silkworm

The words in this section include color terms, a word for ‘indigo’ (both the pigment and the color), and a word for ‘silkworm’.

3.3.1 ‘Indigo, blue’ KRAM (?CHAM) – Probable Indic source: Sanskrit > Old Chinese (?), Kra-Dai, Austroasiatic

Some studies suggest that indigo was used as a dye in India perhaps several thousand years ago, while in China, its use may date to the Xia Dynasty some 3,600 years ago (Zhang 2016:14). However, it has been suggested that widespread use of indigo appears in the historical record only in the Han Dynasty (c. 200 BCE to 200 CE) (e.g., Prasad 2018). Considering this information, we posit that the Sanskrit word grāmaḥ ‘the indigo plant’ is a viable source of the wordforms in Old Chinese, Proto-Tai, and various Austroasiatic languages. As for the spread among the languages after borrowing from Indic, it is not yet possible to determine the precise paths. However, if this word follows the pattern of other items, Old Chinese was the donor to Tai and, eventually, to various Austroasiatic languages. Still, we cannot rule out direct borrowing from Indic into various languages. The form is not attested in early Khmer texts, so these cannot clarify the time of borrowing in Khmer-speaking territory.
We have also noted Vietnamese chàm /ca:m21/, with comparable forms in other Vietic languages and some Tai languages in and around Vietnam (and a comparable form in Mon). However, its relationship to this set is less certain, as the palatal onset cannot be clearly connected to the *gr onset of the apparent Indic source. As for the Tibeto-Burman languages Tibetan and Lepcha, it is unclear whether the source is Indic, considering the lack of the /k/ onset, but borrowing via other languages seems unlikely. A final alternative is that the Sanskrit form is only coincidentally similar and not the source, while the Old Chinese form is the source that spread throughout southern China and Southeast Asia. However, the Indic source seems more likely, and we adopt it as our working assumption unless additional evidence shows otherwise.
• Indic – Sanskrit grāmaḥ ‘the indigo plant’
• Chinese – 藍 lán ‘indigo’, MC lam, OC *[N-k.]rˤam (B&S)
• Kra-Dai – Proto-Tai *g.ra:mA ‘indigo’ (Pittayaporn 2009) (e.g., Bangkok Thai khraːm³³, etc.); also CHAM in Tai languages in and near Vietnam (e.g., caam4 in White Tai and Black Tai in Vietnam, but also Red Tai in Laos, etc.)
• Austroasiatic – Katuic (Bru khaːm ‘indigo (the substance)’), Khmeric (Surin Khmer kraːm ‘indigo (the substance)’), Monic (Nyah Kur khràam, etc., and ?Mon chɒm ‘dark blue’), ?Vietic (Vietnamese chàm, Tho caːm², Muong caːm², etc.)
• Tibeto-Burman – Tibetan ræm, Lepcha ryom
3.3.2 ‘Blue, green’ KHIAU – Tai source: Southwest Tai > Austroasiatic
A Kra-Dai KHIAU ‘blue, green’ wordform has spread widely among Austroasiatic languages. It is distinct from ‘indigo’, which refers to the pigment. The ‘grue’ (i.e., green-blue) color category may have been introduced in the context of fabric trade and/or production, among other possible contexts; however, we do not have specific ethnographic data to support this speculation.
The wordform has been reconstructed in Proto-Tai, but it is also found in many Hlai and Kam-Sui languages, suggesting a deep history in Kra-Dai before the Southwest Tai expansion. Thus, Southwest Tai is the likely source of this wordform among Austroasiatic languages in MSEA. The varied onsets (e.g., /s/, /kh/, /h/) of the wordform among the Austroasiatic recipient languages could reflect varied borrowing circumstances, but also a time depth during which phonological changes could have occurred. If so, the borrowings predate the modern era but still fall within the 2nd millennium CE.
• Kra-Dai – Proto-Tai *xiəwA ‘green’ (Pittayaporn 2009), *xiauA ‘green, blue’ (Li 1977); also many Hlai and Kam-Sui languages
• Austroasiatic – Bahnaric (Tampuan khiəw ‘blue, green’), ?Khmuic (Khmu si:w ‘green, blue, and dark blue’), Khmeric (Khmer bɑŋkhiev ‘to make blue’), Pearic (Pear kʰiəw ‘blue’), Vietic (Pong hɛːw ‘blue’)
3.3.3 ‘Silkworm’ MƆN – Probable Tai source: Southwest Tai > Austroasiatic
One widespread term is the MƆN ‘silkworm’ wordform. This word has been reconstructed in Proto-Tai, though it is mostly limited to Southwestern Tai; in our data it is found in only one Central Tai language (Tai of Ping Siang moon4, from Hudak 1995). This limited distribution may be a sampling artifact, as the term is not core vocabulary and most wordlists do not include it. As Southwest Tai languages have spread the PRE word for ‘silk’, this wordform is also a likely Southwest Tai loanword in the recipient Austroasiatic languages. In Austroasiatic, it is found in several languages in the Lao/Northeast Thailand region. Since silkworms are found natively throughout the region, the borrowing of this word more plausibly reflects the spread of silk production from Tai-speaking groups than the introduction of the insect itself.
• Kra-Dai – Proto-Tai *mo:nC (Pittayaporn 2009)
• Austroasiatic – Bahnaric (Brao mɑɑn), Katuic (?Bru mṳan ‘white mulberry tree’), Khmuic (pKhmuic *mɔːn ‘silkworm’), and Mangic (Mang moːn¹)
3.4 Actions: weave, dye, mend, sew
While most of the words in this study are nouns, a few key verbs in this domain have also spread regionally. The sense ‘to weave’, however, has mostly distinct etyma among the language families in the region, as shown in Table 2. It is reconstructed in Proto-Austroasiatic, Proto-Tai, Proto-Kam-Sui, Proto-Western-Malayo-Polynesian, and Proto-Chamic, among others. However, the Hlai branch of Kra-Dai and the Mienic branch of Hmong-Mien have apparently borrowed a Chinese word meaning ‘to weave’, highlighting the influence of Chinese in this cultural domain. Nonetheless, in MSEA, language families have largely maintained their own native etyma. Indeed, in the World Loanword Database (Haspelmath and Tadmor 2009), the senses ‘to weave’ and ‘to weave or plait/braid’ both have very low borrowability rates, about 0.07 each, which places them in the category of basic vocabulary (see discussion in Appendix A).
Table 2: Proto-language forms for ‘to weave’ among languages in Greater SEA
Language Family | Language Group | Form
Austroasiatic | Proto-Austroasiatic | *ta:ɲ (AA005) (Sidwell 2024)
Austronesian | Proto-Chamic | *mañam ‘to weave, twill’ (Thurgood 1999)
Austronesian | Proto-Western-Malayo-Polynesian | *ma-añam ‘plait, weave (mats, baskets)’ (Blust and Trussel 2010)
Kra-Dai | Proto-Tai | *tamB (Pittayaporn 2009)
Kra-Dai | Proto-Kam-Sui | *tam3 (Thurgood 1988)
Kra-Dai | Proto-Hlai | *tʃhwɯ:k (< Chinese 織 zhī) (Norquest 2008)
Hmong-Mien | Proto-Hmong-Mien | *ntət (Ratliff 2010)
Hmong-Mien | Proto-Mien | *tsi̯ɛkD (< Chinese 織 zhī) (Ratliff 2010)
Trans-Himalayan | Middle Chinese | tsyik 織 zhī ‘weave (v.)’ (Baxter and Sagart 2015)
Trans-Himalayan | Old Chinese | *tək 織 zhī ‘weave (v.)’ (Baxter and Sagart 2015)
Trans-Himalayan | Proto-Tibeto-Burman | *(t/d)ak ‘to weave’ #2686; *rak #5669 ‘to weave, drive, chase’ (Matisoff 2003)
While nouns can be borrowed readily even in less intense language contact situations, the borrowing of verbs suggests more substantive language contact and bilingualism. This was certainly the case between Sinitic and both Tai and Vietic, and later between Tai and Austroasiatic languages, as well as between Austroasiatic languages in Vietnam and Chamic. Thus, beyond the trade of objects, the borrowing of verbs in this domain suggests the sharing of technology and cultural practices.
3.4.1 ‘Dye (verb)’ NYOM – Possible Chinese source: Chinese > Tai, Vietic > Austroasiatic, Chamic
The wordform NYOM meaning ‘to dye’ is found in all five language families in MSEA, so its distribution alone does not reveal its source. However, as the overall word shape (i.e., onset, coda, and tone) and semantics match the Chinese form, and as this is an expected kind of cultural borrowing, we take the position that this is originally a Chinese word unless additional evidence can show otherwise. The tone categories in Tai and Vietnamese both indicate that this is an early Chinese loanword.
Thus, this wordform has been used in northern Vietnam since the early part of the 1st millennium CE and was then brought again in the 2nd millennium with the Southwest Tai expansion. However, beyond that general period, the more detailed timing of the borrowing of this word into Austroasiatic languages (other than Vietnamese) and Chamic languages cannot yet be determined. As for the wordforms in Tibeto-Burman languages, we have found them in only two subgroups in Southeast Asia (though we are less confident of the noted Loloish forms, as we are not familiar with the historical phonology of those languages and how those forms might be explained). This suggests borrowing, not retention from a higher-level node in Trans-Himalayan. However, we hope specialists in these languages can check these data and offer evaluations.
• Chinese – 染 rǎn ‘to dye’, MC nyemX, OC *C.n[a]mʔ (Baxter & Sagart 2015)
• Kra-Dai – Proto-Tai *ɲwu:mC (Pittayaporn 2009), Kam-Sui (widespread)
• Austroasiatic – Bahnaric (Mnong ɲuom, Stieng ɲɔːm, etc.), Katuic (Bru ɲṳam, ?Pacoh ʄɯm ‘to dye, dip, soak’), Khmuic (Phong nhɔːm), Monic (jɔ̀ɔm in several Nyah Kur varieties), Palaungic (pPalaungic *ɲɔm), Pearic (Chong jaːm), Vietic (Proto-Vietic *ɲɔːmʔ, Vietnamese ɲuəm22, etc.)
• Malayo-Chamic – Cham ɲɔm, Rhade ɲuom
• Hmong-Mien – pMienic *ɲumC
• Tibeto-Burman – Proto-Kuki-Chin *hnim ‘dip, dye, submerge’, ?Loloish (Lisu nɯ⁵⁵, Hani na̱³³)
3.4.2 ‘Mend’ CƏ.PA – Probable Chinese source: Old Chinese > Vietic, Tai > Austroasiatic
The deep history of the borrowing of this word is indicated by both the presyllable and the tone category. In both Vietic and Tai, there is evidence of borrowing from a syllable with a final glottal stop, which dates the borrowing at least to the 1st millennium. In Kra-Dai, the word appears limited to Southwest Tai, so it could have been borrowed from OC or from early MC.
However, in Vietic, there is evidence of a presyllable, both directly in conservative Vietic languages such as Ruc and indirectly in the /v/ onset of Vietnamese, which reflects lenition of the borrowed OC *p in intervocalic position. This suggests that the word was borrowed into Vietic in the Han Dynasty, near the beginning of the Common Era. In the following data, we have included comparable words in Khmer and Pearic, though the unexpected fricative finals reduce the certainty that these are borrowings. The provisional Proto-Tibeto-Burman reconstruction is listed, though it is unclear whether Tibeto-Burman languages retained this word from an earlier stage, a matter Tibeto-Burman specialists would need to determine.
• Chinese – 補 bǔ ‘to patch’, MC puX, OC *[Cə]-pˤaʔ (Baxter & Sagart 2015)
• Kra-Dai – Southwest Tai (Thai pàʔ, Tai of Surat paʔ⁵, Lao póʔ, etc.)
• Austroasiatic – Bahnaric (?Sedang kəpɔ̰), ?Khmer (Old Khmer *paḥ, cf. Old Khmer *pramaḥ ‘one who mends garments’ (921 CE)), ?Pearic (Chong pak, pas ‘mend’), Vietic (Vietnamese vá, Ruc tupáː)
• Tibeto-Burman – #2548 Proto-Tibeto-Burman *s/m-p(ʷ)a ‘patch, mend, sew’ (STEDT provisional)
3.4.3 ‘Mend’ TAAP – Possible Tai source: Southwest Tai > Austroasiatic
A second wordform meaning ‘to mend’ is TAAP. In available data, it is found in a few Southwest Tai lects, one Northern Tai language (Saek), and three Austroasiatic languages. The word has no distinctive features to indicate the direction or timing of borrowing. While this could make it a word of uncertain origin, Zhou Daguan’s 13th-century description of Siamese women commonly offering mending services in their communities (Zhou 2007:76) makes it plausible that this is a Southwest Tai word that spread to Austroasiatic languages in the 2nd millennium CE.
• Kra-Dai – Southwest Tai (Red Tai taap2, Lao tȁːp, etc.), Northern Tai (Saek taap6)
• Austroasiatic – Khmeric (Surin Khmer taːp), Khmuic (Khmu taːp), Vietic (Thavung taːp)
4 Summary and Concluding Observations
We can now offer observations on the geographic spread of words in the domain of fabrics and fabric production in MSEA, based on the various lines of data and analyses presented above.
• North-to-south transmission: There is a recurring pattern of north-to-south spread of words, with Chinese and Southwest Tai (the latter spreading both native Tai words and borrowed Chinese words) as the most common donor languages. Chinese was the source of several words (e.g., ‘loom’, ‘needle’, ‘thread’, ‘silk’, ‘mend’, ‘dye’), which often spread to Tai and Vietic, and subsequently to other languages in the region.
• Indic loans spreading regionally: Early contact with Indic led to the spread of a few culturally significant words (i.e., ‘cotton’ and ‘indigo’) starting from around the beginning of the Common Era, but without clear evidence of the precise paths of transmission. Sanskrit loans may have been transmitted directly to various languages in some cases, while indirect spread via other languages is possible, or even likely, in others.
• Southwest Tai to Austroasiatic: Southwest Tai languages were the donors of many words (e.g., ‘loom’, ‘thread/string’, ‘silk’, ‘silkworm’, ‘blue/green’, ‘mend’), primarily to minority Austroasiatic languages. This most likely occurred in the 2nd millennium CE, after the Southwest Tai expansion. The significant number of loanwords in this semantic domain and others (e.g., metals and metal implements, headwear and footwear, number terms and grammatical words) highlights the linguistic and sociocultural impact of that large-scale migration and settlement in the region.
• Khmer and Vietnamese: Vietnamese and Khmer were also significant donor languages over the centuries within their territories, as both served as administrative languages and, more broadly, as regional trade languages and lingua francas.
• Malayo-Chamic and Austroasiatic: Intensive language contact between Chamic and both Bahnaric and Katuic, and between Malayic and Aslian, has led to some lexical transmission in this domain. These words include Malayo-Chamic ‘needle’, borrowed into various neighboring Austroasiatic branches, and Austroasiatic ‘needle’ and ‘cloth, string’, borrowed into Chamic.
Overall, these data sets match the tendencies found in previous studies of inter-phylum contact, in terms of the direction, quantity, and timing of loanwords, as well as related ethnohistorical descriptions. The data herein could help advance inquiry into unrecorded sociocultural contact, as lexical data provide concrete evidence of such contact along with some relative chronological information.
Author Contributions
Alves contributed to the initial conceptual development of this work and the preliminary writing. Dockum in particular assembled and processed the Kra-Dai data for this study and reviewed and helped refine the writing.
References
Alves, Mark J. 2015. Historical notes on words for knives, swords, and other metal implements in early Southern China and mainland Southeast Asia. Mon-Khmer Studies 44:39–56.
Alves, Mark. 2015. Etyma for ‘chicken’, ‘duck’, and ‘goose’ among language phyla in China and Southeast Asia. Journal of the Southeast Asian Linguistics Society 8:39–55.
Alves, Mark. 2018. Early Sino-Vietnamese Lexical Data and the Relative Chronology of Tonogenesis in Chinese and Vietnamese. Bulletin of Chinese Linguistics 11.1-2:3–33.
Alves, Mark J. 2021. The Đông Sơn Speech Community: Evidence for Vietic. Crossroads 20:1–41.
Alves, Mark. 2022. Lexical Evidence of the History of Ceramics in Mainland Southeast Asia. Presentation. Historical Relationships of East and Southeast Asian Languages 2022 (September 3rd-4th). Tsing-Hua University. DOI: 10.13140/RG.2.2.34238.15686/1
Alves, Mark. 2024. An Etymological Study of Vietnamese Words for Weaving and Woven Objects. In T.
Phan, Nguyen TC, Shimizu M (eds.), Studies in Vietnamese Historical Linguistics. Global Vietnam: Across Time, Space and Community, 9–28. Singapore: Springer. https://doi.org/10.1007/978-981-97-4314-8_2
Alves, Mark and Rikker Dockum. 2023. Etymological Research on Words for the Household in Greater Mainland Southeast Asia. Slide presentation for the 24th Annual Meeting of the Southeast Asian Linguistics Society (SEALS), 16-17 May 2023, Chiang Mai University, Thailand. DOI: 10.13140/RG.2.2.12372.58242/1
Baxter, William H. & Laurent Sagart. 2014a. Baxter-Sagart Old Chinese reconstruction, version 1.1 (20 September 2014).
Bilmes, Leela. 1998. The /ka-/ and /kra-/ Prefixes in Thai. Linguistics of the Tibeto-Burman Area 21.2:73–96.
Buckley, Christopher D. 2018. Connecting Tai, Kam and Li peoples through weaving techniques. Journal of the Siam Society 106 (2018):95–130.
Buckley, Christopher D. 2023. The origins of southeast Asian weaving traditions: the perspective from archaeology. Asian Archaeology 7:151–162. https://doi.org/10.1007/s41826-023-00074-4
Buckley, Christopher D., Emma Kopp, Thomas Pellard, Robin J. Ryder, and Guillaume Jacques. 2024. Contrasting modes of cultural evolution: Kra-Dai languages and weaving technologies. Pre-print manuscript. https://doi.org/10.31219/osf.io/8pz67
Cameron, Judith Anne. 2002. Textile Technology in the Prehistory of Southeast Asia. Doctoral dissertation. The Australian National University, Canberra.
Cameron, Judith. 2015.
Direct, indirect questionable archaeological evidence for silk in Viet Nam. Khảo Cổ [Archaeology] 10 (2015):66–70.
Castillo, Cristina Cobo, Bérénice Bellina and Dorian Q. Fuller. 2016. Rice, beans and trade crops on the early maritime Silk Route in Southeast Asia. Antiquity 90.353 (2016):1255–1269.
d’Errico, Francesco, Luc Doyon, Shuangquan Zhang, Malvina Baumann, Martina Lázničková-Galetová, Xing Gao, Fuyou Chen, & Yue Zhang. 2018. The origin and evolution of sewing technologies in Eurasia and North America. Journal of Human Evolution 125:71–86. https://doi.org/10.1016/j.jhevol.2018.10.004
Gilligan, Ian, Francesco d’Errico, Luc Doyon, Wei Wang, & Yaroslav V. Kuzmin. 2024. Paleolithic eyed needles and the evolution of dress. Science Advances 10:26. DOI: 10.1126/sciadv.adp28.
Hán, Văn Khẩn. 2009. Xóm Rền: Một di tích khảo cổ đặc biệt quan trọng của thời đại đồ đồng Việt Nam [Xom Ren: An important archaeological site of the Bronze Age in Vietnam]. Hanoi: Nhà Xuất Bản Đại Học Quốc Gia Hà Nội.
Haspelmath, Martin & Tadmor, Uri (eds.). 2009. World Loanword Database. Leipzig: Max Planck Institute for Evolutionary Anthropology. Available online at http://wold.clld.org, accessed on 2025-08-07.
Higham, Charles. 2014. Early mainland Southeast Asia: from first humans to Angkor. Bangkok: River Books Press.
Higham, Charles F. W. 2017a. First farmers in Mainland Southeast Asia. Journal of Indo-Pacific Archaeology 41:13–21.
Hoàng, Văn Khoán. 2003. Nghề đan của người Đồng Đậu (qua các dấu đan in trên đồ gốm) [The weaving craft of the Dong Dau people (through imprints of weavings on pottery)].
In Văn Hóa Đồng Đậu: 40 Năm Phát Hiện và Nghiên Cứu (1962-2002) [The Dong Dau Culture: 40 Years of Discovery and Research (1962-2002)], 172–180. Hanoi: Nhà Xuất Bản Khoa Học Xã Hội.
Maspong, Sireemas. 2023. Chronology of Registrogenesis in Khmer: Analyses of Poetry and Inscriptions. Journal of the Southeast Asian Linguistics Society 17.1:46–62.
Pelliot, Paul. 1959. Notes on Marco Polo. Paris: Librairie Adrien-Maisonneuve.
Phan Tiến Ba. 1988. Mộ gạch 10 thế kỉ đầu công nguyên [Brick tombs of the first ten centuries of the Common Era]. Khảo Cổ [Archaeology] 1988.1-2:92–106.
Pittayaporn, Pittayawat. 2009. The phonology of Proto-Tai. Doctoral dissertation. Ithaca, New York: Cornell University.
Pittayaporn, Pittayawat. 2014. Layers of Chinese loanwords in Proto-Southwestern Tai as evidence for the dating of the spread of Southwestern Tai. MANUSYA: Journal of Humanities, Special Issue No 20, 2014:47–68.
Prasad, R. 2018. Indigo—the crop that created history and then itself became history. Indian Journal of History of Science 53.3 (2018):296–301.
Schuessler, Axel. 2009. Minimal Old Chinese and Later Han Chinese: A Companion to Grammata Serica Recensa. Honolulu: University of Hawai‘i Press.
Sidwell, Paul and Felix Rau. 2019. The Munda maritime hypothesis. The Journal of the Southeast Asian Linguistics Society 12.2:33–57.
Siri-Aksornsat, Pojanee. 1996. The origin and development of /kra-/ and /ka-/ words in Siamese Thai. In The Fourth International Symposium on Language and Linguistics, Thailand, pp. 1730–1742. Institute of Language and Culture for Rural Development, Mahidol University.
Xhauflair, Hermine, Sheldon Jago-on, Timothy James Vitalfes, Dante Manipon, Noel Amano, John Rey Callado, Danilo Tandang, Céline Kerfant, Omar Choa, & Alfred Pawlik. 2023. The invisible plant technology of Prehistoric Southeast Asia: Indirect evidence for basket and rope making at Tabon Cave, Philippines, 39–33,000 years ago. PLOS One 18(6):e0281415.
https://doi.org/10.1371/journal.pone.0281415
Zhang, Lei, Kehui Deng, & Ziqi Wang. 2016. Research on the Dyeing Process of Chinese Traditional Plant Indigo Based on Tianmen Blue Calico. Chemical Engineering Transactions 55 (2016):13–18. DOI: 10.3303/CET1655003
Zhou, Daguan (Peter Harris, translator). 2007. A Record of Cambodia: The Land and Its People. Silkworm Books.
Zurndorfer, Harriet T. n.d. The Resistant Fibre: The Pre-modern History of Cotton in China. Unpublished manuscript.
Appendix A: List of senses, wordforms, and posited origins
Note: For perspective on the borrowability of words for weaving, Table 3 provides borrowability rates from the Max Planck Institute’s World Loanword Database (Haspelmath and Tadmor 2009) for the words in this study. While no hard lines can be drawn between categories, rates for basic vocabulary are generally below 0.10 (i.e., single digits when expressed as percentages), such as 0.07 for ‘to weave’, while relatively commonly borrowed senses are in the range of 0.25 or higher, as are most of the words in this study. In this table, the words for ‘needle’, ‘cotton’, and ‘silk’ are all in this upper range. Borrowability rates of specific senses necessarily vary according to sociocultural circumstances.
Table 3: Wordforms in this study
Domain | Sense and wordform | Posited sources | Borrowability rates
Implements | ‘Loom’ KI | Probable Chinese source: Old Chinese > Vietic, Tai > Austroasiatic | 0.16
Implements | ‘Loom’ HUK | Tai source: Southwest Tai > Austroasiatic | 0.16
Implements | ‘Needle’ KEM, TSIM | Probable Chinese source: Old Chinese > Vietic, Tai > Austroasiatic minority languages | 0.37
Implements | ‘Needle’ JRUM | Malayo-Chamic source: Chamic, Malayic > Austroasiatic | 0.37
Implements | ‘Shuttle’ KSUI | Likely Tai source: Southwestern Tai > Austroasiatic | NA
Materials | ‘Cotton’ KPAS | Probable Indic source: Indic > Austroasiatic (Vietnamese, Khmer, etc.), ?Chinese, Tai, Tibeto-Burman (Chin branch) | 0.49
Materials | ‘Cloth, cotton, thread’ BRAJ | Possible Austroasiatic source: Austroasiatic > Chamic | NA
Materials | ‘Silk’ SI | Chinese source: Middle Chinese > Tibeto-Burman, Kra-Dai, Vietnamese | 0.80
Materials | ‘Silk’ PRE | Tai source: Southwest Tai > Austroasiatic | 0.80
Materials | ‘Thread, string’ SEN | Possible Chinese source: Middle Chinese (?) > Southwest Tai > Austroasiatic | 0.28
Materials | ‘Thread’ MAJ | Tai source: Southwest Tai > Austroasiatic (Mang), Tibeto-Burman (Mpi) | 0.28
Materials | ‘Indigo, blue’ KRAM (?CHAM) | Probable Indic source: Sanskrit > Old Chinese (?), Kra-Dai, Austroasiatic | NA
Materials | ‘Blue, green’ KHIAU | Tai source: Southwest Tai > Austroasiatic | 0.24, 0.17
Materials | ‘Silkworm’ MƆN | Probable Tai source: Southwest Tai > Austroasiatic | NA
Actions | ‘Dye (verb)’ NYOM | Possible Chinese source: Chinese > Tai, Vietic > Austroasiatic, Chamic | 0.22
Actions | ‘Mend’ CƏ.PA | Probable Chinese source: Old Chinese > Vietic, Tai > Austroasiatic | NA
Actions | ‘Mend’ TAAP | Possible Tai source: Southwest Tai > Austroasiatic | NA
Appendix B: Online digital lexical resources consulted
Blust, Robert and Stephen Trussel. 2010. Austronesian Comparative Dictionary. Updated December 21, 2014. Web. http://www.trussel2.com/ACD/acd-ak_a.htm Last accessed: 1 May 2025
Dictionary of Old Khmer. SEACLASSICS Khmer.
Bangkok: Center for Research in Computational Linguistics. Web. http://sealang.net/ok/ Last accessed: 1 May 2025
Digital Dictionaries of South Asia. University of Chicago. Web. https://dsal.uchicago.edu/dictionaries/list.html Last accessed: 1 May 2025
Old Javanese Dictionary. Bangkok: Center for Research in Computational Linguistics. Web. http://sealang.net/ojed/ Last accessed: 1 May 2025
Proto-Tai-o-Matic. Bangkok: Center for Research in Computational Linguistics. Web. http://sealang.net/crcl/proto/ Last accessed: 1 May 2025
SEALang Library (Digital dictionaries of Burmese, Indonesian, Karen, Khmer, Lao, Malay, Mon, Shan, Thai, Vietnamese, and various other major languages in Southeast Asia). Bangkok: Center for Research in Computational Linguistics. Web. http://sealang.net/library/ Last accessed: 1 May 2025
SEALang Mon-Khmer Etymological Dictionary. Bangkok: Center for Research in Computational Linguistics. Web. http://www.sealang.net/monkhmer/dictionary/ Last accessed: 1 May 2025
SEALang Munda Etymological Dictionary. Bangkok: Center for Research in Computational Linguistics. Web.
http://www.sealang.net/munda/dictionary/ Last accessed: 1 May 2025
The Sino-Tibetan Etymological Dictionary and Thesaurus. The University of California, Berkeley. Web. http://stedt.berkeley.edu/~stedt-cgi/rootcanal.pl Last accessed: 1 May 2025
SPIRIT NUMERALS IN OCEANIC AND (THE) BEYOND
Russell BARLOW
Max Planck Institute for Evolutionary Anthropology
russell_barlow@eva.mpg.de
Abstract
This paper offers a survey of numeral systems that speakers ascribe to the languages of supernatural beings. The majority of these “spirit numerals” are attested for languages of the Oceanic subgroup of Austronesian. However, they occur elsewhere in that family, as well as in other Southeast Asian families, such as Sino-Tibetan. The formal characteristics of these systems are discussed, and theories as to their origins are suggested.
Keywords: counting, numeral systems, language games, archaisms, metathesis, sequential number word formation
ISO 639-3 codes: app, blb, cdm, fnb, grt, ivv, kos, ksd, lww, mmg, pma, uli, vmg
1 Spirit numeral systems in Oceanic languages
In the speaker communities of a few Oceanic languages of Vanuatu, New Britain (Papua New Guinea), and Micronesia, there are reports of supernatural beings using special numeral systems. That is, speakers of these languages know of nonstandard counting systems, which they attribute to spirits or other nonhuman entities. In this paper, I describe these Oceanic “spirit numeral systems”, focusing on their formal features (§1). I then describe a single case of spirit numerals in an Austronesian language outside the Oceanic subgroup (§2) before turning to a few instances found outside the Austronesian family (§3).
I conclude with some thoughts on the possible origins of these unusual ways of counting (§4).
1.1 Epi Island, Vanuatu
The Reverend Thomas Ewart Riddle (1915, 1916) published 21 traditional stories from Epi Island in central Vanuatu. One of these, titled “The origin of counting”, begins as follows (Riddle 1916:24):
A girl was wandering in the cover of the bush, when a spirit came and climbed up a Kinai (almond nut, Pometia pinnata) tree, picked ten kinai nuts and threw them down. The girl took away the tenth. When the spirit climbed down he gathered the nuts and then counted them saying :— “Kinateta, kinaloa, ligan, plakito, vero, verovero, kotua mligani, plakito, lualima.” Having found one missing he again climbed to get one in its place saying :— “Tangka, luangka, tetuka, vari-ka, limka, kona, visi, paro, siwe kuru.” (This last spoken in falsetto.)
The girl returns to the same spot with her parents, and they hear the spirit again count to ten, both ways (though with a few minor spelling differences). Riddle (1916:24) notes that these forms differ from the regular numerals used on Epi, interpreting this to mean that they may be “archaic” or “fanciful”. Finally, he records “another old count given by the father of the oldest woman at Nikaura, when she was young and he an old man”. In his reference grammar of the Lewo language of eastern Epi, Robert Early (1994:219 [fn. 27]) notes: “The Lewo counting system for spirit-creatures is still widely known, retained as an item of conscious cultural knowledge, but without any current function.” He provides forms that are similar to the second set given by Riddle (1916:24). Early (1994:34–35) also discusses a set of obscure numerals from an unidentified language of northern Epi that were reported by Steel (1880:470) and Codrington (1885:469–470) (transcribed initially by Bishop John Patteson). Early (1994:35) writes: “it is thought [by speakers? by the author?]
that the forms are not the regular counting system, but that of spirit-creatures”. However, Early (1994:35) also notes that the first five forms in this set are not actually numerals but instead mean ‘tomorrow’, ‘day after tomorrow’, ‘third day’, ‘fourth day’, and ‘fifth day’. The forms for ‘six’ through ‘ten’ do not look remarkably different from those of other “human” numeral systems found on Epi. Table 1 presents the forms of the numerals ‘one’ through ‘ten’ in Lewo, as given by Early (1994:219–221), both for the regular system of counting and for that used by spirits; also included in the table is the second set of forms given in the story recorded by Riddle (1916:24).¹
Table 1: Lewo numeral systems (based on Early 1994:219–221 and Riddle 1916:24)
Number | People | Analysis | Spirits (Riddle) | Spirits (Early) | Analysis
1 | taːŋa | ‘one’ | taŋ-ka | ta-ka | suffix
2 | lua | ‘two’ | lua-ŋ-ka | lua-ka | suffix
3 | telu | ‘three’ | tetu-ka ~ tel-ka | tel-ka | suffix
4 | vari | ‘four’ | vari-ka | ver-ka | suffix
5 | lima | ‘five’ | lim-ka | lim-ka | suffix
6 | o-rai | [5]+1 | kona | kona | < POC *onom ‘six’
7 | o-lua | [5]+2 | visi | isi | < POC *pitu ‘seven’
8 | o-relu | [5]+3 | paro | varo | < POC *walu ‘eight’
9 | o-vari | [5]+4 | siwe | siwe | < POC *siwa ‘nine’
10 | lua-lima | 2[×]5 | kuru | kuru | < POC *sa-ŋa-puluq ‘ten’ (?)
Like the numerals of many Oceanic languages, those of Lewo do not reflect the full set of inherited Proto-Oceanic (POC) numerals. Rather than being monomorphemic, the numerals ‘six’ through ‘ten’ are formed in a quinary-like fashion (cf. Barlow 2023; Ross 2023b). Although lima ‘five’ does not recur in the numerals ‘six’ through ‘nine’, the number 5 is nevertheless implied as an augend, to which forms representing 1, 2, 3, and 4 are added. The number 10 is represented by lualima, which is formed multiplicatively, again with 5 as an “anchor” (cf. Barlow 2025; Pelland 2025). Remarkably, as already noted by Lynch (2009:405) and Garde (2015:129–130), the numerals ‘six’ through ‘nine’ in the spirit system do appear to reflect the monomorphemic POC forms.
The origin of ‘ten’ in the spirit system is less clear, although it perhaps also reflects an older decimal-like term for ‘ten’. The numerals ‘one’ through ‘five’ in the spirit system are clearly related to the respective forms in the regular system, only apocopated and suffixed with the form -ka, which is elsewhere used to derive collective numerals, at least for the numerals ‘two’ through ‘four’: luaka ‘pair’, teluka ‘trio’, varoka ‘quartet’ (Early 1994:224). A homophonous form ka also occurs, apparently indicating addition to 5, in the numeral ka vari ‘nine’ in one of the obscure systems of northern Epi (Steel 1880:470), a form that Codrington (1885:470) identifies as “strange”.
Table 2: More Lewo numeral systems (based on Early 1994:219–221 and Riddle 1916:24)
Number | People | Analysis | A spirit | Analysis | “Old” | Analysis
1 | taːŋa | ‘one’ | kina-te-ta | prefix; infix (?) | kina-ta | prefix; infix (?)
2 | lua | ‘two’ | kina-loa | prefix; infix (?) | kina-lua | prefix; infix (?)
3 | telu | ‘three’ | liɡan ~ liɡnan | ‘put, send’ (?) | kina-tou | prefix; infix (?)
4 | vari | ‘four’ | plakito | (?) (= ‘9’) | tou-repa | ‘3’-‘4’ (?)
5 | lima | ‘five’ | ve-ro | ‘be’-‘half’ (?) | ropa-sia | ‘4’-sia (?)
6 | o-rai | [5]+1 | ve-ro-ve-ro | reduplication | pili-roŋo | cf. ‘7’
7 | o-lua | [5]+2 | kotua | ‘close’ (?) | roŋo-pili | metathesis of ‘6’
8 | o-relu | [5]+3 | m-liɡani | ‘3’ with prefix (?) | rasis-ila | same ending as ‘9’
9 | o-vari | [5]+4 | plakito | (?) (= ‘4’) | rap-ila | same ending as ‘8’
10 | lua-lima | 2[×]5 | lua-lima | 2[×]5 | rap-wisi | same beginning as ‘9’
Table 2 provides the additional numeral forms reported by Riddle (1916:24): the set of numerals used in the story the first time the spirit counts and the “old count” recalled by the old woman at Nikaura. Although this reportedly “old” system was not attributed to any spirit, it shares some formal similarities with the spirit systems. The “regular” Lewo forms are provided again for comparison.
It is impossible to know whether the first set of numerals recited by the spirit in the story recorded by Riddle (1916:24) represents a conventionalized counting series or was perhaps the novel creation of the narrator. Either way, it bears some similarity to the other “spirit” set recorded by Riddle (1916:24) and Early (1994:219). The numerals kinateta ‘one’ and kinaloa ‘two’ are clearly derived from the equivalent regular numerals, only by means of prefixation rather than suffixation. Here, the prefixed element is kina-, which perhaps derives from the element ka- (suffixed in the other spirit set), itself infixed with -in-, which occurs elsewhere in the language as a fossilized element (cf. Early 1994:116 [fn. 25]). In addition to kina-, the form for ‘one’ also contains the element te-, which may simply be an alternative form of ta- ‘one’. Alternatively, te- may be associated with the language of the supernatural: Early (1994:107) mentions the existence of the “female bush spirit prefix” te-, an apparent cranberry morpheme that derives te-leriko ‘female bush spirit’ from leriko ‘male bush spirit’. Riddle (1916:24) writes the word for ‘three’ in this system in two different ways, represented as liɡan and liɡnan in Table 2. I wonder whether the intended form was the same as that of the verb /liŋan(i)/ ‘put, send’ (cf. Early 1994:135, 462, 499). At any rate, liɡ(n)an ‘three’ bears a clear relationship to mliɡani ‘eight’, a common connection in quinary-like systems, whereby ‘eight’ is constructed as 5+3. Perhaps both forms derive from the idea of 3 being ‘put onto’ or ‘sent to’ 5 in order to add up to 8. If mliɡani ‘eight’ is to be read as /m-liŋan(i)/, then it contains the same verb as ‘three’, only with the realis prefix m- (cf. Early 1994:136, 145, 378, 425, inter alia). The numeral plakito ‘four’ is obscure to me. However, vero ‘five’ is conceivably derived from the copular verb ve ‘be’ (cf. Early 1994:320) and the nuclear layer serial verb -ro ‘half, across’ (cf. Early 1994:150).
Thus, ‘being half’ could refer to 5 being half the full count of 10 in a decimal system. The next numeral, verovero ‘six’, is clearly a reduplication of ‘five’, here perhaps including the meaning of ‘across’ found in -ro: a possible reflection of a digit-based tallying practice whereby counters would switch ‘across’ to the other hand when counting the fingers representing ‘six’ through ‘ten’. The numeral kotua ‘seven’ is, again, obscure to me. Could it be related to the verb kotaβa ‘close’ (cf. Early 1994:316)? Next, plakito ‘nine’ is homophonous with the form for ‘four’, and these two numerals usually bear a relationship in quinary-like systems (since 9 = 5+4). However, a fully functional numeral system should not permit homophony between numerals (see §3.3). Finally, the form lualima ‘ten’ in this system is the same as its non-spirit equivalent. The “old count” numerals share the prefixed element kina-, not only for ‘one’ and ‘two’ but also for ‘three’. The form for ‘one’ in this system is not further augmented by te-. The numeral kinatou ‘three’, in addition to exhibiting the prefixed element kina-, has apparently undergone an irregular loss of l. In the following numerals, something rather interesting seems to be going on: tourepa ‘four’ recapitulates the ending of kinatou ‘three’, and ropasia ‘five’, in turn, appears to recapitulate the ending of tourepa ‘four’ (albeit with a vowel change). This sort of morphological “tail-head linkage” in numerals, although exceedingly rare crosslinguistically, occurs in a few languages of New Britain, New Ireland, Bougainville, the Solomon Islands, Temotu, and Vanuatu. Christoph Holz (2021), who describes some occurrences of this phenomenon, dubs it “sequential number word formation”. Although the putative morpheme repa ‘four’ conceivably derives from POC *pat[i] ‘four’ (with prefixation of re-?), the putative morpheme sia ‘five’ remains obscure.
Hereafter, the numerals become even more challenging to analyze: piliroŋo ‘six’ and roŋopili ‘seven’ are clearly metathesized versions of each other, but their ultimate etymologies remain unclear (although the verb vili ~ pili ‘move back and forward by hand’ seems semantically related to the act of metathesis itself; cf. Early 1994:143 [fn. 15]). Finally, the remaining forms share some internal relationships but, again, lack clear etymologies: rasisila ‘eight’ and rapila ‘nine’ share the ending -ila, while rapila ‘nine’ and rapwisi ‘ten’ share the beginning rap-.

1.2 Ambrym Island, Vanuatu

On Ambrym Island, north of Epi, there are similar reports of spirit counting systems, again preserved in the form of a traditional tale in which a spirit is observed counting nuts. Paton (1971:20–21) gives two versions of this tale, one of which is specified as being from North Ambrym (aside from capitalization of the initial letter of the numeral ‘one’, the two sets of numerals are identical). Richards (2010 [in Garde 2015:129]) gives a very similar version of the forms in a text told in the closely related language spoken at Fanbak (i.e., Orkon-Fanbak). Table 3 presents the regular North Ambrym numerals as given by Franjieh (2014) and the spirit numerals recorded by Paton (1971:19); the spirit forms for Fanbak (from Richards 2010 [in Garde 2015:129]) are also given for comparison.

Table 3: North Ambrym [= A] and Fanbak [= F] numeral systems (based on Franjieh 2014, Paton 1971:19, and Richards 2010 [in Garde 2015:129])

| | People | Analysis | Spirits [A] | Spirits [F] | Analysis |
|---|---|---|---|---|---|
| 1 | hu | ‘one’ | sɔ-ŋae | sɔ-kae | < POC *sa-kai ‘one’ |
| 2 | ru | ‘two’ | na-loe | be-na-lua | < (be-)na + POC *rua ‘two’ |
| 3 | sul | ‘three’ | na-tolu | be-na-telu | < (be-)na + POC *tolu ‘three’ |
| 4 | virː | ‘four’ | tɔlu-nɛmba | telu-nimba | ‘3’-‘4’ (?) |
| 5 | lim | ‘five’ | nɪmba-ŋeŋe | nimba-ŋɡaŋɡa | ‘4’-‘5’ (?) |
| 6 | leu-se | [5]+1 | na-or-ŋeŋe | na-or-eŋɡa | na-1(?)[+]5(?) |
| 7 | leu-ru | [5]+2 | na-or-bɪsi | na-or-bisi | na-1(?)[×]7; < POC *pitu ‘seven’ (?) |
| 8 | leu-sul | [5]+3 | bɪsɪ-nɪŋɡe | bisi-nia | ‘7’-‘8’ (?) |
| 9 | la-ferː | [5]+4 | taŋa-ŋae | taŋa-ŋae | [10]−1 (?) |
| 10 | sa-ŋul | 1[×]10 | taŋo-ŋolo | taŋo-ŋolo | ‘9’-‘10’ (?) |

Like the regular numeral system in Lewo (§1.1), the North Ambrym system does not reflect the POC monomorphemic forms for ‘six’ through ‘nine’. These are instead formed in a quinary-like fashion, with the ligature leu- ~ la- likely tracing back to POC *lapi ‘take, get, give’ (cf. Lynch 2009:403; Ross 2023b:542). Thus, a form like leu-ru ‘seven’ presumably derives from a phrase meaning something like ‘five give two [more]’, with the initial ‘five’ having been elided. Although it has lost the forms for ‘six’ through ‘nine’, North Ambrym nevertheless preserves an apparent reflex of POC *sa-ŋa-puluq ‘ten’, which itself was formed from *sa ‘one’ and *puluq ‘(unit of) ten’. The North Ambrym spirit numerals, on the other hand, are more challenging to analyze than those of Lewo. The first three numerals appear to be archaic retentions of older Oceanic forms, with the addition of the prefixed form na- in ‘two’ and ‘three’ (perhaps from the POC article *na). Then, we have what appears to be another case of sequential number word formation (cf. §1.1): tɔlunɛmba ‘four’ appears to repeat the ending of the preceding numeral natolu ‘three’, and nɪmbaŋeŋe ‘five’ appears to repeat the ending of tɔlunɛmba ‘four’. The origins of nɛmba ~ nɪmba (putatively *‘four’) and ŋeŋe (putatively *‘five’) are, however, obscure. Following ‘five’, the numerals only become more challenging to analyze. The form naorŋeŋe ‘six’ has the same ending as nɪmbaŋeŋe ‘five’; if we assume this ŋeŋe to be the “older” morpheme for ‘five’ in this sequential number word formation, then perhaps the initial naor represents ‘one’, or or is ‘one’ and na- is the same prefixed element found in ‘two’ and ‘three’ (in either case, the assumption is that ‘six’ is derived by an arithmetic operation of 1+5).
This initial naor of the form for ‘six’ also occurs at the beginning of naorbɪsi ‘seven’, the ending of which Garde (2015:129) interprets as an archaic reflex of Proto-North-Central Vanuatu (PNCV) *bitu ‘seven’. If naor (or or) represents ‘one’ and bɪsi represents ‘seven’, then perhaps this numeral is derived by an arithmetic operation of 1×7. Regardless of the etymology, we appear to have another instance of sequential number word formation between naorbɪsi ‘seven’ and bɪsɪnɪŋɡe ‘eight’, which, if bɪsɪ is the older morpheme for ‘seven’, contains the putative morpheme nɪŋɡe ‘eight’ (which bears a suspicious similarity to the putative “older” morpheme for ‘five’). When we get to taŋaŋae ‘nine’, I can only speculate more wildly. The ending ŋae could plausibly mean ‘one’ (as found also in sɔŋae ‘one’). Could this form for ‘nine’ be derived subtractively, with the ‘one’ being understood as ‘one less than ten’? Perhaps the initial taŋa is a reflex of POC *taŋo(p) ‘take hold of, grasp, touch with the hand’ (cf. Osmond & Pawley 2016:513–514); in other words, ‘take one’ (with ‘from ten’ being implied). Finally, we may again have a case of sequential number word formation in taŋoŋolo ‘ten’, with taŋo standing for ‘nine’ and ŋolo a likely reflex of *sa-ŋa-puluq ‘ten’ (cf. sa-ŋul ‘ten’ in the regular system).

1.3 Pentecost Island, Vanuatu

Spirit numerals are also attested on Pentecost Island, north of Ambrym. Andrew Gray (2012:110) discusses the phenomenon:

Old people in some areas of Pentecost remember sequences of ‘devils’ numbers’ that are very different from the ordinary numbers. In the old days, it is claimed, people working in their gardens would hear devils in the bushes counting to their children using these numbers. Some sets of devils’ numbers are attributed to specific mythological figures. The numbers often have an archaic sound to them, although it is unlikely that they simply represent an older form of numbering.
Gray’s transcriptions of an elderly Apma speaker’s recollection of these “devils’ numbers”, along with the regular Apma counting forms (following Schneider 2010:114), are given in Table 4.

Table 4: Apma numeral systems (based on Schneider 2010:114 and Gray 2012:110)

| | People | Analysis | Spirits | Analysis |
|---|---|---|---|---|
| 1 | bʷaleh | ‘one’ (?) | ɡo-na-leh-a | prefix (= ‘one’-‘DEF’?), suffix |
| 2 | ka-ru | ka-2 | ɡo-na-ru-a | prefix (= ‘one’-‘DEF’?), suffix |
| 3 | ka-tsil | ka-3 | ɡo-na-tsil-a | prefix (= ‘one’-‘DEF’?), suffix |
| 4 | ka-βet | ka-4 | ɡo-na-βet-a | prefix (= ‘one’-‘DEF’?), suffix |
| 5 | ka-lim | ka-5 | ɡo-na-lim-a | prefix (= ‘one’-‘DEF’?), suffix |
| 6 | la-bʷaleh | [5]+1 | lowaŋɡa | neologism (?) |
| 7 | lavi-ru | [5]+2 | dulambana | neologism (?) |
| 8 | lab-tsil | [5]+3 | ro-ro-mbosisa | reduplication? (minus *rua ‘two’?) |
| 9 | lab-et | [5]+4 | se-se-lua | reduplication? (minus *sa- ‘one’?) |
| 10 | saŋ-wul | 1[×]10 | bʷal-utem | 1×10 (?) |

Aside from the innovative form bʷaleh ‘one’, the regular Apma numerals are relatively straightforward to explain. The numerals ‘two’ through ‘five’ contain a prefixed element ka-, probably a fossilized form of the POC numeral classifier *kai- (cf. Ross 2023a:485–486). The ligature lavi- ~ la(b)- is probably cognate with North Ambrym leu- ~ la-, and saŋwul ‘ten’ likewise exhibits the same formulation as North Ambrym saŋul ‘ten’ (§1.2). The etymologies of the spirit numerals, on the other hand, are much more perplexing, especially the higher numerals. As with Lewo (§1.1) and North Ambrym (§1.2), the lower numerals are more easily explained: the spirit numerals for ‘one’ through ‘five’ are formed regularly by dropping the initial syllable, prefixing ɡona-, and suffixing -a. The prefixed element ɡona- possibly consists of ɡo ‘one, another’ and the definite article na (cf. Schneider 2010:137). The etymology of the suffixed element -a is less clear. It has the same form as a transitive suffix (cf. Schneider 2010:33), but this seems coincidental. The spirit forms lowaŋɡa ‘six’ and dulambana ‘seven’ are completely opaque to me.
They are possibly neologisms, created independently of any known morphemes. The same may also be true of rorombosisa ‘eight’ and seselua ‘nine’. However, I note that each of these two numerals begins with a formally reduplicated CV element: ro-ro- for ‘eight’ and se-se- for ‘nine’. If ro- is a reflex of POC *rua ‘two’ and se- is a reflex of POC *sa- ‘one’, then perhaps these two forms point to subtractive constructions (i.e., ‘two’ being subtracted [from ten] in the case of ‘eight’, and ‘one’ being subtracted [from ten] in the case of ‘nine’). That said, the -mbosisa ending of ‘eight’ (unless it means something like ‘less’) remains obscure, as does the -lua ending of ‘nine’. The latter is possibly derived (perhaps involving metathesis) from the middle of bʷalutem ‘ten’ in the spirit system, which itself has the same beginning as bʷaleh ‘one’ of the regular system, perhaps suggesting a decimal formulation like 1×10.

1.4 Paama Island, Vanuatu

Just south of Ambrym, on the small island of Paama, there is also a record of spirit numerals. This occurs in Terry Crowley’s (1992) dictionary of Paamese, which contains twenty entries for words (all of them numerals) indicated as being “in lisefsef language”. The Bislama word lisefsef ‘bush sprite’ has the Paamese equivalent titamol, which Crowley (1992:163) defines as a “short long-haired being with long fingernails that lives in the roots of banyan trees and who causes evil”. Apparently, these creatures count, as is also suggested by one of the example sentences in the dictionary, which is translated as follows: “When the lisefsef counts, he says words from everywhere” (Crowley 1992:201). Table 5 provides the Paamese numerals from ‘one’ through ‘twenty’ with their lisefsef equivalents.
Table 5: Paamese numeral systems (based on Crowley 1982:245; 1992)

| | People | Analysis | Spirits | Analysis |
|---|---|---|---|---|
| 1 | taːi | ‘one’ | taː-ɡa | 1[×]1 |
| 2 | e-lu | e-2 | luaː-ɡa | 2[×]1 |
| 3 | e-tel | e-3 | telu-ɡa | 3[×]1 |
| 4 | e-hat | e-4 | hatu-ɡa | 4[×]1 |
| 5 | e-lim | e-5 | lima-ɡa | 5[×]1 |
| 6 | lahi-taːi | [5]+1 | kuana | < POC *onom ‘six’ |
| 7 | lau-lu | [5]+2 | tiːti | < POC *pitu ‘seven’ (?) |
| 8 | lau-tel | [5]+3 | vaːlo | < POC *walu ‘eight’ |
| 9 | lau-hat | [5]+4 | teː-ɡa | [10]−1 (?) |
| 10 | haː-lua-lim | ha-2[×]5 | lu-ri | 2[×]5 (?) |
| 11 | taːi-dan-taːi | 1[×10]-‘down’-[+]1 | so-ɡa | [10]+1 (?) |
| 12 | taːi-dan-e-lu | 1[×10]-‘down’-[+]2 | be-rua | [10]+2 (?) |
| 13 | taːi-dan-e-tel | 1[×10]-‘down’-[+]3 | be-rua-i | [10]+2+[1] (?) |
| 14 | taːi-dan-e-hat | 1[×10]-‘down’-[+]4 | pe-ta | [10]+4 (?) |
| 15 | taːi-dan-e-lim | 1[×10]-‘down’-[+]5 | pe-ta-i | [10]+4+[1] (?) |
| 16 | taːi-dan-lahi-taːi | 1[×10]-‘down’-[+5]+1 | vimoma | (?) |
| 17 | taːi-dan-lau-lu | 1[×10]-‘down’-[+5]+2 | uerie | (?) |
| 18 | taːi-dan-lau-tel | 1[×10]-‘down’-[+5]+3 | masa | (?) |
| 19 | taːi-dan-lau-hat | 1[×10]-‘down’-[+5]+4 | nabu | (?) |
| 20 | hanu-mau | ‘person’-‘whole’ | so-ɡa-i | [10]+(1×[10]) (?) |

Like the initial ka- in Apma (§1.3), the initial e- in the Paamese forms for ‘two’ through ‘five’ is probably a reflex of the POC numeral classifier *kai-. The ligature lahi- ~ lau- in ‘six’ through ‘nine’ is likely cognate with the similar forms in North Ambrym (§1.2) and Apma (§1.3) (cf. Lynch 2009:403). The numeral ‘ten’ is formed multiplicatively (2[×]5) and is preceded by haː-, which may be a reflex of the POC frequentative adverb *pa[ka]- (Ross 2023a:480). Above 10, the system proceeds in a more decimal-like manner, with the numerals ‘eleven’ through ‘nineteen’ formed by adding ‘one’ through ‘nine’ to an augend of taːi ‘one [unit of ten]’. The ligature that connects taːi with each following addend is dan ‘below, down; lower’ (Crowley 1992:17), which likely reflects the system’s origin as a physical digit-based tallying method, whereby the ten upper digits of the body (i.e., the fingers) would be counted before proceeding down to the ten lower digits (i.e., the toes).
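Because the regular Paamese series in Table 5 is built compositionally, its construction can be restated as a short procedure. The Python sketch below is illustrative only: it regenerates the listed forms for ‘one’ through ‘twenty’ from the morphemes discussed above (the bases taːi, e-lu, e-tel, e-hat, e-lim; the ligature lahi- ~ lau-; and taːi dan ‘one [ten] down’). The function name paamese is hypothetical; the choice between lahi- and lau- is treated as a simple lookup that mirrors Table 5, not as a claim about what conditions the alternation; and ‘ten’ and ‘twenty’ are taken as listed forms rather than derived.

```python
# Hypothetical sketch: regenerate the regular Paamese numerals 1-20 of Table 5.
# Base forms 1-5 as listed in Table 5 (Crowley 1982:245; 1992).
BASE = {1: "taːi", 2: "e-lu", 3: "e-tel", 4: "e-hat", 5: "e-lim"}


def paamese(n: int) -> str:
    """Return the regular Paamese numeral for n (1-20), following Table 5."""
    if not 1 <= n <= 20:
        raise ValueError("this sketch only covers 1-20")
    if n <= 5:
        return BASE[n]
    if n <= 9:
        # Quinary-like [5]+addend; the augend 5 is implied, not spoken.
        addend = n - 5
        if addend == 1:
            # Table 5 lists lahi- before taːi (lahi-taːi 'six').
            return "lahi-" + BASE[1]
        # Elsewhere lau- attaches to the addend without its e- (e-lu -> lau-lu).
        return "lau-" + BASE[addend].removeprefix("e-")
    if n == 10:
        return "haː-lua-lim"  # ha- + 2[×]5, taken as a listed form
    if n == 20:
        return "hanu-mau"  # 'person'-'whole': all fingers and toes
    # 11-19: taːi '1[×10]' + dan 'down' + the numeral for n - 10.
    return "taːi-dan-" + paamese(n - 10)


if __name__ == "__main__":
    for number in range(1, 21):
        print(number, paamese(number))
```

Calling paamese(16), for instance, returns taːi-dan-lahi-taːi, matching the analysis 1[×10]-‘down’-[+5]+1 in Table 5. Forms above 20 (such as hanuo mau elue ‘forty’) are deliberately left outside the sketch.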
Reflecting this finger-and-toe tally, the number 20 is referred to as hanu mau ‘twenty’, a phrase referring to a ‘whole person’, that is, the sum of fingers and toes of a single body. From 20 onward, the system is more vigesimal-like, with 40 referred to as hanuo mau elue ‘forty’ (i.e., 20[×]2), 60 as hanuo mau etelu ‘sixty’ (i.e., 20[×]3), and so on (Crowley 1982:98, 245–247). As in other spirit systems, the lower numerals of the lisefsef are fairly transparently derived from their equivalent “human” forms. The lisefsef numerals ‘one’ through ‘five’ are derived from the regular forms, only lacking the initial e- and bearing the suffix -ɡa, perhaps related to -ɡaː ‘have one each; different’ (cf. Crowley 1992:23). The form for ‘one’ is apocopated, whereas the forms for ‘two’ through ‘five’ have seemingly paragogic vowels, perhaps actually reflecting more archaic forms of these numerals. Indeed, as Crowley (1992:51, 175) himself notes, kuana ‘six’ and vaːlo ‘eight’ are archaic retentions of PNCV *ono ‘six’ and *walu ‘eight’, respectively. Clark (2009:60 [fn. 5]) suggests that tiːti ‘seven’ “may retain some slight formal trace of *bitu” (that is, of the PNCV form for ‘seven’). The form teːɡa ‘nine’ also contains the ending -ɡa. If the initial teː derives from /tehe/ ‘cut’ (cf. Crowley 1992:124) and this final ɡa signifies ‘one’, then the form may have a subtractive structure (i.e., ‘cut one [from ten]’). Perhaps luri ‘ten’ derives from an altered form of ‘2[×]5’. Although lu ‘two’ is easily identifiable, the numeral lim ‘five’ would have to have lost its final m and undergone an irregular change of l > r to yield the element ri in luri ‘ten’. The following form, soɡa ‘eleven’, also ends in -ɡa ‘one (?)’. The initial so perhaps derives from soːrin ‘join’ (cf. Crowley 1992:149), thereby suggesting an additive construction (i.e., ‘join one [to ten]’).
There is a clear phonological relationship between berua ‘twelve’ and beruai ‘thirteen’, but I doubt that there is any morphological relationship. Instead, I suspect that the similarity in forms is the result of the recitative nature of this sort of numeral series. Could it be, though, that the ending -rua of ‘twelve’ is an archaic retention of PNCV *rua ‘two’? Although the initial be- is obscure (is it related to veː ‘step’? cf. Crowley 1992:181), this would give the form a decimal-like additive structure (i.e., [10]+2). There is the same sort of phonological relationship between peta ‘fourteen’ and petai ‘fifteen’. Could initial pe- be a devoiced form of the initial be- in ‘twelve’ and ‘thirteen’? Could the final -ta of ‘fourteen’ ultimately derive from PNCV *vati ‘four’? The lisefsef forms for ‘sixteen’ through ‘nineteen’ are completely opaque to me, and I wonder whether they could be pure neologisms. Finally, soɡai ‘twenty’ bears a strong resemblance to soɡa ‘eleven’. Indeed, there is the same phonological relationship between these two forms as between ‘thirteen’ and ‘twelve’ and between ‘fifteen’ and ‘fourteen’. Perhaps the final -i conveys finality, with 11 being the start of additive decimal counting and 20 the end of it. Alternatively, as in ‘eleven’, the element so may connote addition (to 10) and the element ɡa signify ‘one’; only here, with the help of final -i, it is taken to mean ‘one unit of ten’, thus giving the structure [10]+(1×[10]) = 20.

1.5 Tolai language area (New Britain, Papua New Guinea)

Outside Vanuatu, among the Kuanua-speaking Tolai community of East New Britain, there are a few reports of “old counting systems” (cf. Lean 1985:21). The system, as I recorded it on Makada Island (in the Duke of York Islands, between New Britain and New Ireland), was described to me as being “a traditional counting method that was taught to the ancestors by supernatural spirits” (Barlow 2024:17).
The numerals from the Makada dialect of Kuanua are given in Table 6.

Table 6: Kuanua (Makada dialect) numeral systems (based on Barlow 2024:14–17)

| | People | Analysis | Spirits | Analysis |
|---|---|---|---|---|
| 1 | tikai | ‘one’ | təke | reduced |
| 2 | ə-uruə | ‘ART’-‘two’ | urə-de | suffix |
| 3 | ə-utul | ‘ART’-‘three’ | təmapu | neologism (?) |
| 4 | ə-iwat | ‘ART’-‘four’ | kaiti | neologism (?) |
| 5 | ə-ilim | ‘ART’-‘five’ | kait-a | 4+[1] (?) |
| 6 | ləp-tikai | [5]+1 | libur | ‘play’ (?) |
| 7 | ləw-uruə | [5]+2 | matam | ‘your [SG] eyes’ (?) |
| 8 | ləw-utul | [5]+3 | tumam | metathesis (?) |
| 9 | ləw-uwat | [5]+4 | tərərek | ‘do silently’ (?) |
| 10 | winun | ‘ten’ (?) | mu-de | (?) |

The regular Kuanua numerals require little comment. The numerals ‘two’ through ‘five’ in Table 6 contain the article ə, and the numerals ‘six’ through ‘nine’ exhibit the ligature ləp- ~ ləw-, another likely reflex of POC *lapi ‘take, get, give’. The etymology of winun ‘ten’ is unclear, despite a casual resemblance to POC *puluq ‘(unit of) ten’. In the spirit system, we see, as in the Vanuatu cases, that the lower numerals are the most clearly related to their non-spirit equivalents: təke ‘one’ appears to be a phonologically reduced form of tikai ‘one’ and urəde ‘two’ a reduced form of uruə ‘two’, with the suffixed element -de, whose origin is unknown to me. The numerals ‘three’ through ‘ten’, however, bear no clear resemblance to any of the regular numerals. That said, there are some apparent internal relationships. The pair of numerals kaiti ‘four’ and kaita ‘five’ appears to exhibit a similar phenomenon to that found in the Paamese pair of berua ‘twelve’ and beruai ‘thirteen’ (§1.4), if we view the final vowel (monophthong or diphthong) of the second form as an alternation of the final vowel of the first form. When these numerals were recited to me, they were produced in a rather singsong fashion, and I wonder again whether the phonological relationship between the consecutive forms for ‘four’ and ‘five’ may relate to the recitative nature of the series.
The initial lib- of libur ‘six’ is perhaps reminiscent of the ligature ləp-, but the full form is the same as that of the verb libur ‘play’ (cf. Meyer 1961:209: líbur ‘sich ergehn [sic], spielen, spazieren gehen, nichts Wichtiges tun’ [‘indulge oneself, play, stroll, do nothing important’]), with which it perhaps has a semantic connection: assuming these spirit numerals are somehow “playful” and accepting that Tolai counting has its origins in finger-based tallying, the numeral ‘six’ would be formed as the person who is counting (playfully?) switches from counting on one hand to counting on the other. The following form, matam ‘seven’, has the same form as mata-m ‘your [SG] eye(s)’ (cf. Mosel 1984:32, 55). It is conceivable that it derives from the notion of the two eyes (of an interlocutor?) being added to an implied five fingers of the person counting to reach the number 7, again reflecting a body-based origin of counting in Tolai culture. The next numeral, tumam ‘eight’, may derive from a metathesized version of matam ‘seven’, albeit with a different vowel in the first syllable. After this, tərərek ‘nine’ perhaps relates to tárarak ‘geräuchlos tun’ [‘do silently’] (Meyer 1961:374), possibly reflecting the cryptic nature of these spirit numerals. Finally, the spirit form mude ‘ten’ may contain the same final element -de as seen in ‘two’, perhaps reflecting a quinary-like formulation (i.e., 5×2). However, the Makada-dialect form of this “spirit” ‘ten’ may be the result of metathesis, as other dialects exhibit forms like idumek and dumek for their esoteric numeral ‘ten’ (Table 7). In general, the Kuanua spirit numerals for ‘three’ through ‘ten’ are opaque, and I suspect at least some of them to be neologisms without any clear connection to other morphemes in the language.
Although the Makada-dialect forms are the only ones from the area in which spirits have been explicitly named as the source of the system, other dialects of Kuanua, as well as the closely related Vinitiri language, attest to the presence of esoteric counting systems, viewed by speakers as being “old” or “traditional” (see Barlow, in preparation, for more on “esoteric numerals” in Austronesian languages). These forms are given in Table 7, with the Makada-dialect spirit forms for comparison.

Table 7: Esoteric Tolai numeral systems (based on Barlow 2024:17, Lean 1985:21, and personal communications)

| | Makada (spirits) | Matupit (?) (Lean) | (Bonnie Emos, p.c.) | (Martin Maden, p.c.) | Tavui #3 (Roland, p.c.) | Vinitiri (Kinakeva, p.c.) |
|---|---|---|---|---|---|---|
| 1 | təke | tike | tike | take | tike | tike |
| 2 | urə-de | ura-det | ura-de | ura-de | urə-de | vura-de |
| 3 | təmapu | tamapu | kamapu | tomapu | təməpu | tamapu |
| 4 | kaiti | kaiti | kaiti | kaiti | təməp-o | ra-matam |
| 5 | kait-a | kait-a | kait-e | kait-a | kaiti | limb-limbur |
| 6 | libur | i-luba | libuŋ | i-liɡur | kait-a | |
| 7 | matam | ra-matam | makam | mati | liɡur | |
| 8 | tumam | i-tumam | tumam | tumak | matam | |
| 9 | tərərek | pererek | pererek | pererek | tərərek | |
| 10 | mu-de | i-dumek | dumek | i-dumek | mu-de | |

The “Matupit (?)” column comes from Lean (1985:21), who writes: “A number of CSQ [Counting Systems Questionnaire] informants, in particular those from Matupit Island, report the existence of an old counting system which is now rarely used.” The “Bonnie Emos” column comes from a 2023 personal communication with this Kuanua speaker (she did not provide any dialect information). The “Martin Maden” column comes from a 2024 personal communication with this Kuanua speaker, who refers to the numerals as belonging to an “ancient” or “archaic” language. The “Tavui #3” column comes from a 2024 personal communication with Johnny Roland, who had interviewed an elderly woman from Tavui #3 village, who gave him these “ancestral” numerals.
Finally, the “Vinitiri” column comes from a 2022 personal communication with Paschalis Kinakava, a speaker of Vinitiri, who notes that “there is an older language of Vinitiri for counting”; he believed the system continued up to ‘ten’ but could only remember the first five numerals. Vinitiri (sometimes called “Minigir”) is closely related to Kuanua (Van Der Mark 2007). Lawrence Vue & Malcolm Ross’s Tolai wordlist, a contribution to the ABVD (Greenhill, Blust & Gray 2008) based on data from Ratavul village, also reveals similar forms: tike ‘one’, urade ‘two’, tamapu ‘three’, kaiti ‘four’, kaita ‘five’ (these are presented along with more regular-looking Kuanua numerals). The various sets in Table 7 exhibit interesting and highly irregular sound correspondences. Bonnie Emos’s version has k instead of t word-initially in ‘three’ and word-medially in ‘seven’, and it has ŋ word-finally in ‘six’, as opposed to r or Ø. Martin Maden’s version has ɡ instead of b word-medially in ‘six’, and it has k word-finally in ‘eight’, as opposed to m. The Makada version, like the Tavui #3 version, has t word-initially in ‘nine’, whereas the others have p. The Tavui #3 set is notable in having an “extra” form, təməpo ‘four’, which then “displaces” the numerals ‘five’ through ‘eight’, such that they correspond to the numerals ‘four’ through ‘seven’ in the other Kuanua sets. Thus, the pair of numerals təməpu, təməpo ‘three, four’ in this version is similar to the pair kaiti, kaita ‘five, six’, in that the second form of each pair is apparently derived from the first by altering the final vowel. The Tavui #3 set is then “missing” a form similar to tumam ‘eight’ entirely, thereby realigning with the other Kuanua sets for ‘nine’ and ‘ten’. The Vinitiri set also exhibits some apparent “displacements”: ramatam ‘four’ corresponds to ‘seven’ in the other sets, and limblimbur ‘five’ corresponds to ‘six’ in the other sets, having additionally undergone reduplication in Vinitiri.
Since the speaker could only remember five forms, these apparent displacements may result from the truncation of a longer list due to some forms having been forgotten.

1.6 Ulithian (Micronesia)

Lessa (1980:8–16) records (mainly in English) a folktale from Ulithi Atoll called ‘Haluwai’, in which the eponymous hero travels from Yap to the Sky World (the destination of souls after death). There, he comes upon a blind old woman with man-eating children, who is counting her twenty taro tubers. Her counting method (based on Lessa 1980:9) is given in Table 8, along with the regular human system (based on Walsh’s contribution to Chan et al. 2019).

Table 8: Ulithian numeral systems (based on Walsh’s contribution to Chan et al. 2019 and Lessa 1980:9)

| | People | Analysis | Spirits (?) | Analysis |
|---|---|---|---|---|
| 1 | se-iu | ‘one’-CLF | s-oθ | < se- ‘one’ + ioθ ‘taro’ (?) |
| 2 | ruːo-u | ‘two’-CLF | faŋ | (?) |
| 3 | sel-uː | ‘three’-CLF | lim | < ‘five’ (?) |
| 4 | fa-u | ‘four’-CLF | rot͡ʃ | (?) |
| 5 | lɪmo-u | ‘five’-CLF | le | (?) |
| 6 | wɔːl-uː | ‘six’-CLF | iel | (?) |
| 7 | fɪs-uː | ‘seven’-CLF | sau | (?) |
| 8 | waːl-uː | ‘eight’-CLF | faθ | (?) |
| 9 | θuwo-u | ‘nine’-CLF | li | (?) |
| 10 | sei-ɡ | 1[×]10 | waroŋ | (?) |
| 11 | sei-ɡ-mʌ-se-ruː | 1[×]10-‘and’-1-CLF | titi | (?) |
| 12 | sei-ɡ-mʌ-ruːo-u | 1[×]10-‘and’-2-CLF | hafta | (?) |
| 13 | sei-ɡ-mʌ-sel-uː | 1[×]10-‘and’-3-CLF | fali | (?) |
| 14 | sei-ɡ-mʌ-fa-u | 1[×]10-‘and’-4-CLF | powa | (?) |
| 15 | sei-ɡ-mʌ-lɪmo-u | 1[×]10-‘and’-5-CLF | laŋwe | (?) |
| 16 | sei-ɡ-mʌ-wɔːl-uː | 1[×]10-‘and’-6-CLF | marue | (?) |
| 17 | sei-ɡ-mʌ-fɪs-uː | 1[×]10-‘and’-7-CLF | t͡ʃou | (?) |
| 18 | sei-ɡ-mʌ-waːl-uː | 1[×]10-‘and’-8-CLF | wela | < ‘eight’ (?) |
| 19 | sei-ɡ-mʌ-θuwo-u | 1[×]10-‘and’-9-CLF | hauti | (?) |
| 20 | ruːe-ɡ | 2[×]10 | haut-a | 19+[1] (?) |

As in other Micronesian languages, the numeral system in Ulithian is a fairly regular decimal system and requires little comment. Different classifier suffixes are used with the set of numeral bases depending on the referent being counted (Sohn & Bender 1973:201–202; Bender & Beller 2006:389). The regular forms in Table 8 probably reflect the “general object” classifier suffix (cf. Sohn & Bender 1973:202).
It is not clear whether there is or was a general belief in supernatural beings (or other inhabitants of the Sky World) having a special system of counting, or whether the forms given in Table 8 are particular to this one story. Lessa (1980:9 [fn. 6]) provides the following note:

The numbers here used in counting are not Palauan, Yapese, Ulithian, Woleaian, Puluwatan, Trukese, or any other language of the Carolines of which I am aware. Melchethal [Lessa’s key informant] said that he did not know where they came from, but a suggestion that they are nonsense words came from the narrator himself, Taiethau, who said that he did not know their origin but remembered them because they are used by children. However, he was the one who taught the children! He said that children use the numbers in fun.

As in Paamese (§1.4), these putative spirit numerals continue up to (at least) ‘twenty’. Few of them bear any close resemblance to the regular human forms. The spirit form soθ ‘one’ may derive from the root se- ‘one’ and the word ioθ ‘taro’ (as, indeed, the woman in the story is counting taro). However, I have found no evidence outside Lessa (1980) that ioθ means ‘taro’ in Ulithian. In fact, Sohn & Bender (1973:245) give what is apparently the same form, glossed ‘one’, as the first of a set of “rapid counting numerals”. Some of the other numerals used by the old woman in the Sky World may have connections to regular numeral forms. Although a connection between the old woman’s wela ‘eighteen’ and the regular numeral waːluː ‘eight’ makes sense (given a decimal numeral system), any connection between her lim ‘three’ and the regular lɪmou ‘five’ would be harder to explain. Finally, the pair of numerals hauti ‘nineteen’ and hauta ‘twenty’ is notable in sharing the same sort of phonological relationship as found in the Kuanua spirit numerals kaiti ‘four’ and kaita ‘five’ (§1.5). Largely, though, the numerals used by the old woman in the story remain opaque to me.
Whether this particular story encodes a list of “spirit numerals” or not, it is clear that, in Ulithian culture, as elsewhere in the Caroline Islands, a high spiritual value is placed on the act of counting, as suggested by the practice of divination through counting knots tied in coconut fronds (cf. Girschner 1912; Lessa 1959; Alkire 1970). Lessa (1961:434–444; 1980:87) describes 10 as a formulistic number in Ulithian folk tales and identifies parallels in stories from around the Pacific.

1.7 Kosraean (Micronesia)

More than 2,500 kilometers east of Ulithi Atoll, on the Micronesian island of Kosrae, another story with supernatural counting is recorded by Sarfert (1920:451–452), entitled “Šikḗinfuṅ”. In the story, the eponymous hero meets a blind old man and, like the protagonist of the story from Ulithi Atoll (§1.6), cures the old man of his blindness. In gratitude, the old man devises a plan to protect Šikḗinfuṅ from being devoured by his ten man-eating sons, whom he identifies as kōt spirits. In the story, one of these kōt spirits counts himself and his nine brothers. Sarfert (1920:452) writes: “Der erste Bruder zählte in ganz andere Weise, als wir in Kušae zählen; er zählte nach Kōt-Art” [“The first brother counted in a completely different way from how we in Kosrae count; he counted in the kōt fashion”]. In addition to providing the spirit numerals used in this story, Sarfert (1920:495–496) describes a Kosraean game called sī́ro, which involves counting. In the game, each participant makes pairs of holes in the sand with their index and middle fingers, rapidly and without counting. Then, the players count the holes they have made, one by one, winning if they have managed to create exactly ten holes or a multiple thereof. In this game, a special set of numerals is used to count. Sarfert (1920:496) notes: “Der Sinn der Zahlwörter weiß man nicht” [“The meaning of the numerals is not known”].
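The arithmetic implicit in Sarfert’s description of sī́ro can be made explicit with a small sketch. Because every move produces two holes, a player’s total is always even, so reaching ten holes or a multiple of ten amounts to having made a number of pairs divisible by five. The Python below is a hypothetical illustration, not a reconstruction of the game: it assumes that a total of zero does not count as a win and that the special numerals (as listed in Table 9) are recited one per hole. Since Sarfert gives only ten numerals, the sketch does not guess how counting continued beyond ten.

```python
# Hypothetical sketch of the scoring rule of the Kosraean sand game sī́ro.
# Game numerals for holes 1-10, as listed in Table 9 (Sarfert 1920:496).
SAND_GAME = ["sī́ro", "mắtan", "tŏ́fa", "ŋel", "nom", "kaŋ", "sī́e", "ʂắk", "ʂắk", "lū́o"]


def holes_made(pairs: int) -> int:
    """Each move (index + middle finger) creates two holes."""
    return 2 * pairs


def is_win(pairs: int) -> bool:
    """Win with exactly ten holes or a multiple of ten (assumption: zero is not a win)."""
    n = holes_made(pairs)
    return n > 0 and n % 10 == 0


def recite(pairs: int) -> list[str]:
    """Count the holes one by one with the game numerals (attested only up to ten)."""
    n = holes_made(pairs)
    if n > len(SAND_GAME):
        raise ValueError("Sarfert lists only ten numerals; longer counts are unattested")
    return SAND_GAME[:n]
```

For example, is_win(5) is true and is_win(4) is false, and recite(5) ends on the tenth numeral, lū́o. The pairing of holes is the design point worth noticing: only every second numeral can ever close a player’s count.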
Table 9 presents the kōt spirit numerals and the sī́ro sand game numerals, along with the regular Kosraean counting method, based on Lee (1975:118–119, 124).2
Table 9: Kosraean numeral systems (based on Lee 1975:118–119, 124 and Sarfert 1920:452, 496)
Value | People | Analysis | Spirits | Analysis | Sand game | Analysis
1 | ʂa | ‘one’ | ʂắ-ka | suffix | sī́-ro | ‘1’-‘2’ (?)
2 | lo | ‘two’ | rŏ́-ka | suffix | mắtan | ‘eyes’ (?)
3 | tol | ‘three’ | tŭ́r-tur | reduplication | tŏ́-fa | ‘3’-‘4’ (?)
4 | æŋ | ‘four’ | pắta | < POC *pat[i] ‘four’ (?) | ŋ-el | ‘4’-‘5’ (?)
5 | lʌm | ‘five’ | sī́re | ‘lucky’ (?) | nom | ‘your (EDIBLE)’ (?)
6 | on | ‘six’ | si-pā́-re | infix (?) | kaŋ | ‘eat’ (?)
7 | it | ‘seven’ | rḗti | (?) | sī́e | ‘one’ (?)
8 | o͡al | ‘eight’ | rḗt-a | 7+[1] (?) | ʂắ-k | ‘one’ (?) (= ‘9’)
9 | juk | ‘nine’ | sĭ́r-sir | < POC *siwa ‘nine’ (?) | ʂắ-k | ‘one’ (?) (= ‘8’)
10 | si-ŋʌ-ul | 1×10 | eä̆́ | (?) | lū́o | 2[×5] (?)
Again we see that the lower numerals in the spirit system are the most similar to their respective non-spirit forms: ʂaka ‘one’ is simply a suffixed form of ʂa ‘one’; roka ‘two’ has the same suffix -ka but on a form of lo ‘two’ that has r instead of l; turtur ‘three’ appears to derive from a reduplicated form of tol ‘three’, again with r instead of l, in addition, perhaps, to having a different vowel quality. The next numeral, pata ‘four’, looks suspiciously like a highly archaic retention of POC *pat[i] ‘four’, of which the regular numeral æŋ ‘four’ is itself a reflex. Beyond ‘four’, the spirit numerals become quite challenging to analyze. Could sire ‘five’ be related to sire ‘lucky, fortunate’ (cf. Lee 1976:309)? Is sipare ‘six’ an infixed form of sire ‘five’, perhaps incorporating the focus marker pa (cf. Lee 1976:241)? The consecutive numerals reti ‘seven’ and reta ‘eight’ bear the same phonological relationship to each other as do hauti ‘nineteen’ and hauta ‘twenty’ in the Ulithian spirit system (§1.6) and kaiti ‘four’ and kaita ‘five’ in the Kuanua spirit system (§1.5).
Could sirsir ‘nine’ be a reduplicated form of a reflex of POC *siwa ‘nine’, or perhaps a reduplicated (and apocopated) form of sire ‘five’? The game that Sarfert (1920:495–496) describes is not said to have any connection with spirits. Nevertheless, it perhaps shares formal similarities with some of the spirit systems under discussion. For instance, some of the numerals used in this game possibly reflect sequential number word formation (cf. §1.1, §1.2). However, if so, then it is of a peculiar “anticipatory” sort, in which phonological material from the following numeral (or its etymon) is incorporated into the form of the preceding numeral. For example, tŏ́fa ‘three’ may derive its first syllable from the regular numeral tol ‘three’ but its second syllable from an archaic reflex of POC *pat[i] ‘four’; ŋel ‘four’, in turn, may derive its initial consonant from the synchronic regular numeral æŋ ‘four’ and its final consonant from lʌm ‘five’. Perhaps this sort of anticipatory sequential number word formation owes its origin to the pairing nature of the game, in which holes are created in the sand two-by-two. This connection may also be seen in the form sī́ro ‘one’, which is also given as the name of the game itself, and which seems to derive from the combination of reflexes of POC *sa- ‘one’ and POC *rua ‘two’. The origins of the other forms in the game are less clear. Could mắtan ‘two’ be mʌtʌn ‘eye’ (in the so-called “construct” form; Lee 1975:69)? The semantic connection between the number 2 and a person’s eyes is clear (and is perhaps also reflected in the Kuanua spirit numeral matam ‘seven’; §1.5). However, possible connections between nom ‘five’ and the second-person edible possessive classifier nɔm (cf. Lee 1975:112) or between kaŋ ‘six’ and kaŋ ‘eat, consume’ (cf. Lee 1976:113) are more difficult to justify on semantic grounds.
Perhaps even stranger are the numerals for ‘seven’, ‘eight’, and ‘nine’ used in this game, all of which seem to derive from words for ‘one’, and two of which are identical to each other: sī́e ‘seven’ appears to contain the putative ‘one’ element of the game numeral sī́ro ‘one’ (only with a suffixed -e rather than a putative ‘two’ element -ro); ʂắk ‘eight; nine’ seems to be the regular numeral ʂa ‘one’ with a suffixed element -k (reminiscent of the numerals ‘one’ and ‘two’ of the spirit system, both of which seem to be suffixed with -ka). Such homophony between forms is permissible in a “numeral system” that is used only for serial counting. It would not work for communicating exact quantities (see §3.3). Finally, lū́o ‘ten’ is possibly derived from lo ‘two’, perhaps suggesting a quinary-like construction (that is, with an implied multiplicand of 5).
2 Austronesian spirit numerals beyond Oceanic: Itbayat (Batanes, Philippines)
The cases described in §1 represent all the cases of spirit numerals that I have been able to find for languages of the Oceanic subgroup of Austronesian. Outside this subgroup, I have found only a single attested case of Austronesian speakers ascribing distinct numeral forms to supernatural beings. These are speakers of Itbayat, a Batanic language of the Philippines, which, interestingly, is located at essentially the opposite corner of the Malayo-Polynesian world. Itbayat Island is located some 6,500 kilometers northwest of Epi Island (discussed in §1.1). Yamada (1972:46) presents several sets of unusual numeral forms that are used by Itbayat speakers, in addition to a set of forms (“used in daily life”) that regularly reflect the inherited Proto-Austronesian (PAN) forms and a set of forms borrowed from Spanish, which are used for telling time and counting money. Yamada (1972:46) notes that one of these sets “is often referred to as … archaic or semi-archaic numbers, and is rarely used”.
Another such set is formed via a regular pattern of reversing the order of all segments in a word, a technique that can be used throughout the lexicon as a form of “speech disguise” (i.e., a secret language, cant, argot, language game, or ludling). Yet another of these unusual numeral sets is formed by metathesizing the syllables of each numeral, which is similarly used for disguising speech involving non-numerals as well (Yamada 1972:46–47). Yamada (1972:46) further presents three versions of unusual numerals that “are called among the Itbayat vilavilaŋ no tawo aŋkakohay ‘counting system of people in older times’ or vilavilaŋ no inannoma ‘ancestral way of counting’, and are also known only to a small number of old people” (p. 48). Most relevant to our present purposes, Yamada (1972:46) presents two similar sets of numerals (“Set 6-a” and “Set 6-b”), describing them as follows (p. 47):
Sets 6-a and 6-b seem to be known to a very limited number of old people. The old woman who gave me Set 6-a calls it vilavilaŋ no anito ‘counting or number of anito [fn.: “supernatural being, ghost, spirit, of which people are usually afraid”]’, but she does not know of its use or its significance. It was probably once used as a secret language, for many of the numbers of Sets 6-a and 6-b can be explained by such phenomena as partial reversal of the phonemic shape of the word base (metathesis), loss or change of phonemes, and the like.
Yamada (2002:313) provides three such sets of “numerals of the departed souls (anito’s)”, two of which match the sets given in Yamada (1972:46), only with orthographic differences, and the third of which has some distinct forms of its own. Yamada (2002:6–7) provides the following description of anito:
A person in the village transforms into an anito (ghost) after he or she dies. People are afraid of the anito in daily life and many folktales in Itbayat reflect the people’s attitude toward the anito.
They believe that the anito stays in the takey (wilderness, field) because he or she died with some worldly desires left unsolved or unsatisfied in the hili (village, space for the living persons). Until these worldly desires are satisfied, the anito remains in the wilderness which surrounds the village of tawo (living people), and affects or controls them to such a degree as it actually prescribes the way of daily life for the people. The anito gives instructions to the people through such means as dreams, illnesses, accidents, mysteries, or practical jokes. The mamihay (healer) or the mangaptos (masseur / masseuse) is supposed to interpret these happenings and give people appropriate instructions, so that the anito may be propitiated and presumably leave the island for the eternal world, that is, the hawa (sea) where there are among (fish) into which the anito probably transforms.
Yamada (2002) also gives six sets of “ancestor’s numerals” (p. 313), sets of “metathesized and reversed numerals” (pp. 313–314), a set of “archaic numerals for counting things nearby” (p. 312), and a set of “numerals for persons, animals, or less frequently for things” (p. 309), in addition to the regular numerals—both the indigenous set and the set of loans from Spanish (pp. 308–309). All these sets of Itbayat numerals are also treated in Yamada (2014:9, 36–37, 179–180, 198–203), which includes a seventh set of “ancestor’s numerals”. Table 10 compares the regular Itbayat numerals used by people with three versions of spirit numerals used by anito. Table 11 and Table 12 show some of the additional Itbayat systems. The bracketed forms in the “Metathesized” column of Table 12 were apparently constructed by Yamada based on the pattern of syllable metathesis found in the other numerals (and elsewhere in the lexicon; cf. Yamada 1972:46–47).
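Because Yamada describes the “reversed” numerals as formed by a regular pattern of reversing all segments, the procedure can be stated explicitly. The Python sketch below is illustrative only: the function names, the table dictionary, and the tokenization rule (a length mark ː or a tie bar is treated as part of the preceding segment) are assumptions made for this illustration, and the morpheme hyphen in saː-poɣo is dropped for the comparison. Checked against the “Reversed” column of Table 12, strict segment reversal reproduces six of the ten attested forms exactly. The other four depart from it: aʔpat gives attested tapa rather than tapʔa, aʔnəm gives məna rather than mənʔa, waɣo gives owaɣ rather than oɣaw, and saːpoɣo gives oɣopas rather than oɣopaːs.

```python
"""Sketch of Itbayat segment reversal, checked against Table 12 (illustrative only)."""

LENGTH_MARK = "\u02d0"  # ː, assumed to belong to the preceding segment
TIE_BAR = "\u0361"      # combining tie bar joining two letters, as in t͡ɕ


def segments(word: str) -> list[str]:
    """Split an IPA string into segments (tokenization rule is an assumption)."""
    segs: list[str] = []
    i = 0
    while i < len(word):
        seg = word[i]
        i += 1
        while i < len(word) and word[i] in (LENGTH_MARK, TIE_BAR):
            if word[i] == TIE_BAR:
                # keep the tie bar and the letter after it in this segment
                seg += word[i:i + 2]
                i += 2
            else:
                seg += word[i]
                i += 1
        segs.append(seg)
    return segs


def reverse_segments(word: str) -> str:
    """Reverse the order of all segments in a word."""
    return "".join(reversed(segments(word)))


# Regular numerals and attested "Reversed" forms from Table 12
# (morpheme hyphen in saː-poɣo dropped for this comparison).
TABLE_12_REVERSED = {
    "aʔsa": "asʔa", "doxa": "axod", "atlo": "olta", "aʔpat": "tapa",
    "lima": "amil", "aʔnəm": "məna", "pito": "otip", "waɣo": "owaɣ",
    "sijam": "majis", "saːpoɣo": "oɣopas",
}


def compare_with_table() -> dict[str, bool]:
    """Map each regular numeral to whether strict reversal gives the attested form."""
    return {
        regular: reverse_segments(regular) == attested
        for regular, attested in TABLE_12_REVERSED.items()
    }


if __name__ == "__main__":
    for regular, attested in TABLE_12_REVERSED.items():
        predicted = reverse_segments(regular)
        status = "exact" if predicted == attested else "differs"
        print(f"{regular} -> {predicted} (attested {attested}): {status}")
```

Treating length and tie bars as part of a segment matters for forms such as saːpoɣo, where naive character reversal would detach ː from its vowel.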
Table 10: Itbayat numeral systems (based on Yamada 2014:198, 202)
Value | People | Analysis | Anito (a) | Anito (b) | Anito (c) | Analysis
1 | aʔsa | ‘one’ | ŋa-ka-sa | ma-saŋa | na-ka-sa | prefix (?)
2 | doxa | ‘two’ | ŋa-joda | ma-jada | ŋa-joda | prefix; metathesis (?)
3 | atlo | ‘three’ | nal-jod | talo | lal-jot | prefix; metathesis (?)
4 | aʔpat | ‘four’ | tap-tap | atap | ta-tap | metathesis; reduplication
5 | lima | ‘five’ | mila | mila | mila | metathesis
6 | aʔnəm | ‘six’ | anim | anim | mina | ə > i; reversed
7 | pito | ‘seven’ | sipo | tipo | tipo | metathesis
8 | waɣo | ‘eight’ | ɣajo | ɣajo | ɣajo | metathesis; w > j
9 | sijam | ‘nine’ | miːjas | miɡa | mijas | metathesis
10 | saː-poɣo | 1[×]10 | sop | sop | soːp | metathesis (?); reduced
Table 11: Itbayat “ancestor’s numerals” (based on Yamada 2014:198, 202)
Value | Regular numerals | “Ancestor’s numerals” (a) | (b) | (c) | (d) | (e)
1 | aʔsa | itːi | itːi | itːi | itːi | saŋasaŋ
2 | doxa | dojːi | dojːi | doji | dojːi | doŋodoŋ
3 | atlo | ilːo | ilːo | ilːo | ilːo | lotolot
4 | aʔpat | ipːa | ipːa | ipa | ipːa | patapat
5 | lima | malana | lamana | malana | malana | limilim
6 | aʔnəm | idːam | idːam | idːam | idːam | nəmənəm
7 | pito | viriɡo | biriɡo | biriɡo | biriɡom | pitipit
8 | waɣo | salaɡo | salaɡo | salaɡo | salaɡom | waɣawaɣ
9 | sijam | omajam | omajam | omajam | xomajam | sinisin
10 | saː-poɣo | kaloji | kalawit͡ɕ | kaloji | kalawi | poɣopoɣ
11 | | | | | tə |
Table 12: Itbayat “archaic”, “metathesized”, and “reversed” numerals (based on Yamada 2014:198, 201–203)
Value | Regular | “Archaic” | “Metathesized” | “Reversed”
1 | aʔsa | sax ~ sa | saʔa | asʔa
2 | doxa | dox | xaʔdo | axod
3 | atlo | tlo | (loʔat) | olta
4 | aʔpat | pat | (patʔa) | tapa
5 | lima | lim | (maʔli) | amil
6 | aʔnəm | nəm | (nəmʔa) | məna
7 | pito | pit | (toʔpi) | otip
8 | waɣo | waɣ | ɣoʔwa | owaɣ
9 | sijam | six | (amsi) | majis
10 | saː-poɣo | poɣ | ɣosapo | oɣopas
As Yamada (1972:47–48) notes in his analysis, most of the anito numerals are clearly derived from their regular numeral equivalents. However, unlike other spirit numeral systems, where it is the lowest numerals that most closely resemble regular numerals (§1), the Itbayat systems seem to show the most irregularity in their numerals for ‘one’ through ‘three’.
Yamada (1972:48) observes that the form asa (from aʔsa ‘one’) “may be extracted from” the spirit forms ŋakasa ~ masaŋa ‘one’, but that “it is hard to explain the remaining part”, also noting the “alliterative” nature of the forms for ‘one’ and ‘two’ in the spirit systems; the form naljod ‘three’, however, is “unexplainable”. The loss of the glottal stop in the spirit numeral forms seems regular, as it also happens in the forms for ‘four’ and ‘six’: none of the forms in any of the three spirit systems has a glottal stop. The alliterative nature of the first two numerals may suggest prefixed elements (ŋa- in two of the sets, ma- in the other), similar to the (often obscure) prefixation found in the lower numerals of other spirit systems (§1.1, §1.2, §1.3). The spirit forms for ‘two’—assuming a prefix ŋa- or ma- in common with the preceding numeral—contain the root joda ~ jada, which suggests metathesized forms of doxa ‘two’, with a further change of x > j (and a different vowel in the latter variant). These forms thus approximate the “ancestor’s numeral” forms dojːi ~ doji ‘two’ (Table 11). The forms naljod ~ laljot ‘three’ are indeed difficult to explain. However, they appear to contain abbreviated forms of the root joda ‘two’, perhaps with the prefixed element contributing a value of 1 that is added to this 2. The word for ‘three’ in set (b) of the anito numerals (talo) is simply a metathesized version of the regular form atlo ‘three’.
The higher numerals in the anito numeral sets are all more easily connected to their regular equivalents: atap ‘four’ derives from aʔpat ‘four’ with loss of glottal stop and metathesis, while tatap ‘four’ and taptap ‘four’ are further restructured through reduplication, whether partial or full; mila ‘five’ is formed simply through metathesis of the consonants of lima ‘five’; anim ‘six’ derives from aʔnəm ‘six’ with loss of glottal stop (and a different vowel), while mina ‘six’ is further restructured through the reversed ordering of each of the four segments; tipo ‘seven’ is formed simply through metathesis of the consonants of pito ‘seven’, while sipo ‘seven’ undergoes the additional change of t > s; ɣajo ‘eight’ derives from waɣo ‘eight’ via metathesis of the two consonants, with the additional change of w > j; and mijas ‘nine’ derives from sijam ‘nine’ via the unusual long-distance metathesis of the initial and final consonants, while miːjas ‘nine’ has further undergone vowel lengthening and miɡa ‘nine’ has both lost the final -s and undergone the change of j > ɡ. Finally, sop ‘ten’ appears to be a reduced form of saːpoɣo ‘ten’, perhaps having passed through a stage of vowel metathesis—that is, saːpoɣo became sopaːɣo (?) before losing its ending—and soːp ‘ten’ is similarly derived, only retaining the long-vowel feature of the initial vowel. The “ancestor’s numerals”, although not designated by speakers as being the language of supernatural beings, exhibit similarities with Itbayat’s spirit numerals as well as with the spirit numerals of other Austronesian languages (§1). Like the anito numerals, the “ancestor’s” numerals all lack glottal stops. Yamada (1972:48) notes the frequent correspondence between a in the regular numeral set and i in several of the “ancestor’s” sets.
Once this is taken into consideration, it is relatively simple to find the less regular changes in the sets (a) through (d): s > tː (perhaps showing compensatory lengthening) in ‘one’, x > j in ‘two’, t > l in ‘three’, t > Ø in ‘four’, and n > d in ‘six’. The forms lamana ~ malana ‘five’ exhibit the opposite correspondence of having a where the regular numeral lima ‘five’ has i, in addition to exhibiting an extra syllable -na and (in the case of the latter form) metathesis of the first two consonants. The forms for ‘six’ through ‘nine’ in the “ancestor’s” sets (a) through (d) all vaguely resemble the equivalent regular numeral forms (forms for ‘seven’ somewhat less so), but it is challenging to find any pattern. They have the appearance of deriving from portmanteau words with neologisms as beginnings and the regular numerals as endings. The forms kaloji ~ kalawit͡ɕ ~ kalawi ‘ten’ are obscure to me. Set (d) is notable for apparently having a special monomorphemic form for ‘eleven’, whose etymology is also opaque, although it shares t with the form for ‘one’ in that set. The forms in set (e) of the “ancestor’s numerals” are relatively straightforward. They are all fully reduplicative forms with additional harmonic linking vowels. More specifically, they appear most similar to the set of “archaic forms” (Table 12), all of which are clearly abbreviated forms of the regular numerals. There are some peculiarities, however. Thus, while saŋasaŋ ‘one’ seems to reduplicate the “archaic” form sax ‘one’, there is an additional correspondence of x : ŋ, which makes the reduplicated form reminiscent of the internal string -saŋ- in the anito numeral masaŋa ‘one’. Likewise, doŋodoŋ ‘two’, when compared to the “archaic” form dox ‘two’, exhibits this x : ŋ correspondence. 
The form lotolot ‘three’ appears to reduplicate lot, a metathesized form of tlo ‘three’, while patapat ‘four’ derives from reduplication of pat ‘four’, limilim ‘five’ from reduplication of lim ‘five’, nəmənəm ‘six’ from reduplication of nəm ‘six’, pitipit ‘seven’ from reduplication of pit ‘seven’, and waɣawaɣ ‘eight’ from reduplication of waɣ ‘eight’. However, sinisin ‘nine’ is once again irregular, here apparently exhibiting a correspondence of x : n when compared with the “archaic” form six ‘nine’. Finally, poɣopoɣ ‘ten’ derives from reduplication of poɣ ‘ten’. The forms in set (f) of the “ancestor’s numerals” are phonetically the closest to the regular numerals, with pito ‘seven’ and sijam ‘nine’ indeed being identical to the respective regular forms. The numerals asa ‘one’, apat ‘four’, and anəm ‘six’ simply lack the glottal stop; atdo ‘three’ and dima ‘five’ exhibit a correspondence with their regular equivalents of d : l; waxo ‘eight’ exhibits a correspondence of x : ɣ, while polo ‘ten’ has the correspondence l : ɣ, in addition to lacking the initial saː-; perhaps dwa ‘two’ is the most divergent, exhibiting the correspondence w : ox. The forms in set (g) of the “ancestor’s numerals”, on the other hand, are the most divergent. Only ati ‘one’ resembles other forms of ‘one’ (in particular, those of sets (a) through (d), although it retains the initial a- of the regular aʔsa ‘one’). The remaining forms are all rather peculiar, including the four-syllable forms bisalokton ‘three’ and saravanda ‘nine’, along with the monosyllabic yet opaque dok ‘four’. There may be some internal relationships among the forms, however. Thus, pinasil ‘five’ possibly derives from the immediately following numeral paːsil ‘six’ via infixation of -in-, a form that perhaps also occurs within kinilaw ‘seven’. It is remarkable how many variations of numeral systems there are in Itbayat.
To conclude this section, it may be mentioned that Itbayat is also notable for employing the crosslinguistically unusual pattern of overcounting in its higher numerals, such that a numeral like ‘eleven’ is formed as ‘one at the 2nd stage or step’ (Yamada 2002:308; 2014:198)—that is, numerals between multiples of 10 are counted in anticipation of the next multiple of 10 to be counted.
3 Spirit numeral systems outside Austronesian
How widespread is it for cultures to ascribe special counting systems to supernatural beings? Are these eight or so cases from Austronesian languages of Vanuatu, New Britain, Micronesia, and the Philippines incredibly rare, or do we find similar phenomena elsewhere? To try to answer this question, I searched through the eHRAF World Cultures database of the Human Relations Area Files for co-occurrences of terms like “spirit”, “ghost”, “supernatural” (and so on) with terms like “numeral”, “number”, “counting” (and so on) in ethnographic descriptions of over 360 cultures.3 I performed a similar search through digital versions of grammatical descriptions of languages from around the world.4 This was, of course, in addition to all the standard methods of secondary research. Although I cannot claim this to have been an exhaustive search, I am nevertheless surprised by how few attestations of spirit numeral systems I was able to find. Beyond Austronesian, I was able to find spirit numerals associated with just three languages: (1) Chepang, a Sino-Tibetan language of Nepal; (2) Garo, a Sino-Tibetan language of India and Bangladesh; and (3) Bilua, an isolate of the Solomon Islands.
3.1 Chepang (Sino-Tibetan; Nepal)
Although numerals for ‘one’ through ‘ten’ were recorded for Chepang in the mid-nineteenth century (Hodgson 1848:657; 1857:322), there are no longer any speakers who regularly use the inherited Sino-Tibetan numerals beyond ‘three’ or ‘five’, instead using borrowed numerals from the Indo-European language Nepali (Caughley 1988:197).
Chepang speakers also have a specific duodecimal counting system, which is used “in certain situations, (especially for tallying game such as birds and bats)” (Caughley 1988:197; cf. Mazaudon 1982:12, 29; 2009:139–141; Hammarström 2010:38 [n. 10]). These numerals, whose forms are to be found in Caughley (1972:2, 10) and Hale (1973:47, 202–204), contain Nepali loans for ‘six’ through ‘eleven’, but higher numerals in this system are built with the anchor hale ‘12’, yielding constructions such as 1[×]12 for ‘twelve’, 1[×]12[+]8 for ‘twenty’, and 2[×]12[+]5 for ‘twenty-nine’. The addends higher than 5 in these constructions are likewise loans from Nepali (e.g., the atom ‘eight’ in the complex form for ‘twenty’). Despite the contemporary absence of inherited Sino-Tibetan forms for higher numerals, Caughley (1988:198) reports a “system regarded now by Chepang speakers as a mythological spirit system of counting”. Table 13 gives the contemporary Chepang forms based on Caughley (1972:2, 10), the older forms based on Hodgson (1848:657), and the spirit forms based on Caughley (1988:198–199).
Table 13: Chepang numeral systems (based on Hodgson 1848:657 and Caughley 1972:2, 10; 1988:198–199)
Value | People (older) | People (newer) | Analysis | Spirits | Analysis
1 | ja-d͡zo | jat-d͡zoʔ | ‘one’-CLF | ja | ‘one’
2 | nʰi-d͡zo | nis-d͡zoʔ | ‘two’-CLF | ɡi | ‘two’
3 | sum-d͡zo | sum-d͡zoʔ | ‘three’-CLF | sum | ‘three’
4 | ploi-d͡zo | pləj-d͡zoʔ | ‘four’-CLF | kləj | ‘four’
5 | puma-d͡zo | poŋa-d͡zoʔ | ‘five’-CLF | poŋa | ‘five’
6 | kruk-d͡zo | t͡sʰə-ɡota | < Nepali | prek | ‘eight’ (?)
7 | t͡sana-d͡zo | sat-ɡota | < Nepali | taɡu-d͡zi | ‘nine’-‘ten’ (?)
8 | prap-d͡zo | ʔat-ɡota | < Nepali | hlukum | ‘eleven’ (?)
9 | taku-d͡zo | nəw-ɡota | < Nepali | trak | ‘twelve’ (?)
10 | ɡjib-d͡zo | dəs-ɡota | < Nepali | |
11 | | ʔeɡʰarə-ɡota | < Nepali | |
12 | | jat-hale | 1[×]12 | |
The column showing the “newer” Chepang numerals includes the formulation for 12 that is occasionally used in specialized kinds of counting, although regular counting in Chepang would proceed with forms borrowed from Nepali.
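The hale formulations amount to dividing a number by 12, with the quotient serving as the multiplier and the remainder as the addend. The Python sketch below restates that arithmetic for the three examples cited above (‘twelve’, ‘twenty’, and ‘twenty-nine’). The function name and output format are illustrative, and the assumption that the pattern extends regularly to every number from 12 upward is not drawn from the sources. The sketch also flags addends above 5, which, as noted above, are supplied by Nepali loans.

```python
"""Sketch of the Chepang 12-based (hale) formulation, as arithmetic only."""


def hale_formula(n: int) -> tuple[str, bool]:
    """Return the bracketed formula for n and whether its addend exceeds 5.

    Assumes n >= 12 and that the pattern seen in 'twelve', 'twenty', and
    'twenty-nine' applies regularly (an assumption, not attested data).
    """
    if n < 12:
        raise ValueError("the hale construction starts at 12")
    multiplier, addend = divmod(n, 12)
    formula = f"{multiplier}[×]12"
    if addend:
        formula += f"[+]{addend}"
    # Addends above 5 are supplied by Nepali loans, as described in the text.
    return formula, addend > 5


if __name__ == "__main__":
    for n in (12, 20, 29):
        print(n, hale_formula(n))
```

For example, ‘twenty’ comes out as 1[×]12[+]8 with its addend flagged as a Nepali loan, while ‘twenty-nine’ (2[×]12[+]5) has an addend of 5 and is not flagged.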
Mazaudon (1982:12; 2009:140) identifies the element hale ‘twelve/dozen’ in this form as cognate with forms like khal ~ kal ‘twenty/score’ found in other Sino-Tibetan languages (especially Bodish) with vigesimal numeral systems. This may suggest a reanalysis of the word used as an anchor for counting in cycles of 20 as an anchor for counting in cycles of 12. The endings -d͡zo and -ɡota are numeral classifiers, the latter apparently borrowed from Nepali (cf. Pons 2022:639), along with the numerals ‘six’ through ‘eleven’. The spirit numerals exhibit several similarities to the regular Chepang numerals (uninfluenced by Nepali). The spirit forms for ‘one’, ‘three’, and ‘five’ are identical with one or both of the equivalent human forms included in the table. The spirit form for ‘two’ may reflect an irregular change of n > ɡ. Similarly, the spirit form for ‘four’ may reflect an irregular change of p > k. Caughley (1988:198) proposes that the spirit form recorded as taɡud͡zi ‘seven’ is actually a conflation of two numerals: taɡu (corresponding to Hodgson’s taku ‘nine’) and d͡zi (corresponding to Hodgson’s ɡjib ‘ten’). He further suggests that the spirit system represents a truncation of an older, possibly duodecimal, system, now missing the original forms for ‘six’ and ‘seven’. Under this proposal, the spirit form prek ‘six’ corresponds to Hodgson’s prap ‘eight’, and the spirit forms hlukum ‘eight’ and trak ‘nine’ presumably reflect older atomic forms used for ‘eleven’ and ‘twelve’, respectively. Alternatively, considering the aforementioned irregular correspondences of n : ɡ and of p : k (perhaps instances of taboo deformation?), I wonder whether prek ‘six’ in the spirit system could correspond with Hodgson’s kruk ‘six’—in other words, a correspondence of k : p, the inverse of what is found in ‘four’ (i.e., the spirit system involves some “swapping” of voiceless labial and velar stops). 
Similarly, perhaps taɡud͡zi ‘seven’ in the spirit system corresponds to Hodgson’s t͡sana ‘seven’, here again exhibiting the correspondence of n : ɡ that is also found in ‘two’. Following this analysis, the spirit form hlukum ‘eight’ remains somewhat mysterious, but trak ‘nine’ is not unlike Hodgson’s taku ‘nine’. Although it is unclear whether the Chepang “mythological spirit system of counting” is evidence of an earlier duodecimal system, it does seem to reflect an older system with inherited Sino-Tibetan forms that was partially replaced by elements borrowed from the Nepali numeral system.
3.2 Garo (Sino-Tibetan; India and Bangladesh)
According to Burling (2003a:244–246), speakers of Garo’s Mandi dialects in Bangladesh no longer know the inherited Garo numerals beyond ‘five’ and instead use only numerals borrowed from the Indo-European language Bengali. In the Achik dialect of Northeast India, however, people use a decimal system with inherited forms allowing one to count up to 999, with only words for ‘thousand’ and higher numerical orders borrowed from Bengali. There is also an alternative and mostly forgotten system of counting with vigesimal elements, whereby the numerals ‘forty’, ‘sixty’, and ‘eighty’ are formed as 20×2, 20×3, and 20×4 rather than being constructed by means of multiplication with 10. I have seen no reports of a traditional Garo belief in spirit numeral systems. The only possible indication of such systems is found in a folktale called “Dejan and the matchadu”, told in the village of Salpara in Goalpara District, Assam, and recorded (primarily in English) by Rongmuthu (1960:214–220). A matchadu is a mythological creature that can change form between that of a human and that of a tiger. In the story, a traveling party of ten men spends the night in an old village. As they sleep, a ghost counts the men, one by one, but excludes from the count the coward Dejan, who was too fearful to fall asleep.
On the return trip, the men stay the night in the same village. This time, a matchadu counts all ten men, ending with the coward Dejan, still awake, whom the matchadu carries off as a servant. Table 14 provides the Garo (Achik dialect) numerals for ‘one’ through ‘ten’, based on Burling (2003a:245), along with the numerals used by the ghost and the matchadu in the story recorded by Rongmuthu (1960:215–217).5
Table 14: Garo numeral systems (based on Burling 2003a:245 and Rongmuthu 1960:215–217)
Value | People | Analysis | Ghost | Analysis | Matchadu | Analysis
1 | sa | ‘one’ | a-sa | prefix | a-sa | prefix
2 | ɡini | ‘two’ | a-ɡin | prefix | a-ɡin | prefix
3 | ɡittam | ‘three’ | d͡ʒor-a | ‘pair’-‘1’ (?) | d͡ʒor-a | ‘pair’-‘1’ (?)
4 | bri | ‘four’ | d͡ʒo-ɡin | ‘pair’-‘2’ (?) | d͡ʒo-ɡin | ‘pair’-‘2’ (?)
5 | boŋa | ‘five’ | ila-ʃi | α-γ (?) | bina | < ‘five’ (?)
6 | dok | ‘six’ | nia-ʃi | β-γ (?) | kʰawa | (?)
7 | sini | ‘seven’ | il-ɡoʃ | α-δ (?) | o-na | (?)
8 | t͡ʃet | ‘eight’ | nia-ɡoʃ | β-δ (?) | o-ŋ-ɡet | (?)
9 | sku | ‘nine’ | ɡoʃ | δ (?) | o-la-ʃi | (?)
10 | t͡ʃikiŋ | ‘ten’ | | | ɡaŋ-ɡet | (?)
Of the set of numerals used by the ghost, Rongmuthu (1960:372 [n. 1]) notes: “No particular earthly meanings are attached to these words except that they are generally taken as the Ghost names of numerals from 1 to 9.” Concerning the numerals used by the matchadu, Rongmuthu (1960:372 [n. 2]) writes: “These words bear no particular mundane meanings to human beings; but they are merely taken as the names of numerals of the tiger from 1 to 10.” These notes might suggest some general familiarity with one or more supernatural forms of counting, but this is not clear. As with other spirit systems, we again see that the lowest numerals bear the greatest similarity to their human equivalents. The ghost’s and matchadu’s ‘one’ and ‘two’ are the same as people’s ‘one’ and ‘two’, only prefixed, in both instances, with a-.
The two supernatural beings also share the same forms for ‘three’ and ‘four’, which potentially contain the element d͡ʒora- ~ d͡ʒura- ~ d͡ʒur, a “classifier for pairs, teams of animals”, borrowed from Bengali (cf. Burling 2003b:52). In this way, ‘three’ possibly derives from ‘[a] pair [and] one’ (with an elided form of sa ‘one’) and ‘four’ possibly derives from ‘[a] pair [and] two’ (with an elided form of d͡ʒor ‘pair’). Beyond the numeral ‘four’, the two supernatural systems diverge. The ghost’s numerals for ‘five’ through ‘nine’ (we lack ‘ten’) show clear internal relationships, and they appear to be structured in pairs. Still, it is difficult to make exact sense of them: ‘five’ and ‘six’ both end in -ʃi, whereas ‘seven’ and ‘eight’ both end in -ɡoʃ (which, when alone, is the numeral ‘nine’); and of these pairs, the first form begins with il(a)- and the second form begins with nia-. The matchadu’s numeral bina ‘five’, on the other hand, is perhaps an altered form of the regular numeral boŋa ‘five’, while his form kʰawa ‘six’ appears to be unlike any other. The matchadu’s numerals ‘seven’ through ‘ten’ perhaps exhibit some internal relationships, but again these are unclear: ‘seven’, ‘eight’, and ‘nine’ all begin with o-, and ‘eight’ and ‘ten’ both end with -ɡet. The matchadu’s numeral ‘nine’, furthermore, ends with -ʃi, the form found at the end of the ghost’s ‘five’ and ‘six’.
3.3 Bilua (isolate; Solomon Islands)
Bilua is one of four non-Austronesian languages spoken in the Solomon Islands. Like the other languages of the Solomon Islands—both Austronesian and non-Austronesian—Bilua has a decimal numeral system, with synchronically unanalyzable monomorphemic forms for the numerals ‘one’ through ‘ten’ and higher numerals that use 10 as an anchor, yielding formulations like 10[+]1 for ‘eleven’, 3[×]10 for ‘thirty’, and 3[×]10[+]2 for ‘thirty-two’ (Obata 2003:53–54).
There is also a set of numerals in Bilua that Obata (2003:54) describes as follows: “Only some old people remember these forms. According to local people, these are forms which devils or spirits used to use and that is why local people do not use them any more.” Table 15 presents these spirit numerals along with the regular Bilua forms.
Table 15: Bilua numeral systems (based on Obata 2003:53)
Value | People | Analysis | Spirits | Analysis
1 | o-madeu | ‘one’ | madeu | reduced
2 | o-muɡa | ‘two’ | muɡa | reduced
3 | zouke | ‘three’ | ke | reduced (= ‘5’)
4 | ariku | ‘four’ | ariku | ‘four’
5 | sike | ‘five’ | ke | reduced (= ‘3’)
6 | βari-mud͡ʒa | < [5]+1 (?) | mud͡ʒa | reduced
7 | sike-ura | < ‘five’ + POC *rua ‘two’ (?) | ke-ura | reduced
8 | sio-tolu | < ‘five’ + POC *tolu ‘three’ (?) | tolu | reduced
9 | siaka-βa | < ‘five’ + POC *pat[i] ‘four’ (?) | ka-βa | reduced
10 | toni | ‘ten’ (?) | ni | reduced
The spirit numerals are very clearly derived from their non-spirit equivalents. Except for ariku ‘four’, which is the same in both the regular system and the spirit system, each spirit numeral is simply an elided form of the regular version, missing the first one or two syllables. There is something rather telling in this, though. Although the spirit forms can be successfully recited as a counting series from ‘one’ to ‘ten’, they would perform poorly as actual numerals in discourse. For example, the spirit numerals would be deficient for quantifying nominal referents or providing answers to the question ‘how many?’, since the forms for ‘three’ and ‘five’ are homophonous, something that should never occur in a numeral system (cf. §1.1, §1.7). Although the Bilua spirit numerals have somewhat unexciting etymologies, the regular numerals in the language point to a truly remarkable history. Although synchronically unanalyzable, the numerals ‘six’ through ‘nine’ seem to derive historically from quinary-like constructions, whereby forms representing 1 through 4 are added to an anchor of 5.
Quite bizarrely, though, the forms representing the numbers 2, 3, and 4 in the construction of ‘seven’, ‘eight’, and ‘nine’ appear to be borrowed from an Austronesian language (cf. Owens & Lean 2018:177). Thus, sikeura ‘seven’ derives from Bilua sike ‘five’ and a metathesized form of POC *rua ‘two’; siotolu ‘eight’ derives from an altered form of Bilua sike ‘five’ and POC *tolu ‘three’; and siakaβa ‘nine’ derives from an altered form of Bilua sike ‘five’ and an apocopated form of POC *pat[i] ‘four’. Bilua βarimud͡ʒa ‘six’, however, may derive from wholly indigenous material, assuming the ending mud͡ʒa is related to omadeu ‘one’ and the beginning βari is a verb meaning ‘increase’ (cf. Obata 2003:280). Thus, ‘six’ would derive from an expression meaning something like ‘increase [by] one’, with the anchor of 5 being implied—thus, [5]+1. The partial borrowing of Oceanic material in the Bilua numerals to create quinary-like constructions is especially noteworthy, since, although many Oceanic languages exhibit quinary-like features in their numerals ‘six’ through ‘nine’, those of the Solomon Islands actually retain the inherited monomorphemic forms for these numerals (Barlow 2023).
4 Conclusion
Although the dozen or so “spirit numeral systems” presented here likely owe their origins to vastly different sources and sociocultural contexts, it is remarkable that we find recurring formal features among them. Several systems seem to exhibit archaic retentions of numeral forms (§1.1, §1.2, §1.4, §1.7[?], §3.1). In some cases, the spirit numerals are derived via phonological reduction (§1.5, §2, §3.2, §3.3), while in other cases they are derived via elaboration, whether through prefixation (§1.1, §1.2, §1.3, §2[?], §3.2), suffixation (§1.1, §1.3, §1.5, §1.6[?], §1.7), or even possibly infixation (§1.1[?], §1.7[?]). We also find phonological processes like reduplication (§1.1, §1.3, §1.7, §2) and metathesis (§1.1, §1.5, §2).
One of the more remarkable recurring patterns is that of sequential number word formation (§1.1, §1.2). In a large number of spirit systems, there are morphemes that are either recycled from elsewhere in the lexicon or else are neologisms (§1.1, §1.2, §1.3, §1.4, §1.5, §1.6, §1.7, §3.2). Even when the morphemes are obscure, there may be internal patterns within a system (in particular within consecutive pairs of numerals), such as final vowel alternation (§1.4, §1.5, §1.6, §1.7) and recycled beginnings or endings (§1.1, §3.2). In two cases, there is evidence of a particular phonation or prosody that is associated with the spirit numeral system (§1.1, §1.5). It is reasonable to hypothesize that some of the apparent archaic retentions indicate the survival (albeit in an esoteric and narrow domain) of otherwise abandoned numeral systems. In the case of Chepang (§3.1), we have good evidence that this has happened: speakers of the language were documented as using Sino-Tibetan numeral forms in the 1840s but were reported in the 1980s to have replaced all numerals above ‘five’ with loans from Nepali. The older forms, it seems, nevertheless survive in the spirit numeral system. There is no way of knowing what form the numerals in the isolate Bilua took before Austronesian speakers arrived (perhaps sometime after about 3,200 years ago; Sheppard, Walter & Roga 2010:97), but they were certainly different, as there has clearly been borrowing from Austronesian in the numerals ‘six’ through ‘nine’ (§3.3). Therefore, although the spirit forms (which are simply reduced versions of the regular numerals) are not themselves archaisms, their mere existence may point to a history of speakers shifting from one form of counting to another, or perhaps adopting a more highly conventionalized numeral system when previously only a few (low) numerals were lexicalized.
Under this theory, the fact that speakers remember a “spirit” system may be the outcome of a generations-old cultural memory of the arrival of foreign-looking “spirits”, who, among other peculiarities, had an elaborate decimal counting system. It is tempting to extend this notion to the Oceanic languages of Melanesia. Here, we know that some formerly non-Austronesian-speaking populations ultimately came to speak Oceanic languages, whether through intermarriage or through language shift (cf. Bellwood 2001:40–41; Pawley 2002:267; Ross 2014). Thus, we could again imagine a situation in which non-Austronesian-speaking groups, many of which likely lacked highly conventionalized numeral systems (Barlow 2023), witnessed decimal counting for the first time, along with other cultural practices, when Austronesian speakers arrived. Under this theory, although a language like Kuanua (§1.5) is undoubtedly Oceanic, its speakers would have had ancestors who were (at least in part) members of groups that witnessed the arrival of foreigners, perhaps perceived as “spirit”-like, with decimal counting systems, and this memory has survived in the recitation of spirit numeral systems (irrespective of whether the forms in the systems have any particular antiquity themselves). An alternative theory can be formulated without any appeal to non-Austronesian speakers encountering decimal numeral systems for the first time. Early Oceanic speakers in Melanesia likely employed a highly conventionalized verbal decimal numeral system, inherited from Proto-Austronesian (cf. Blust 2013:278), alongside physical tallying practices structured around the five digits of each hand (and possibly of each foot), with the decimal systems likely having been used only for ceremonial purposes (Ross 2023b:515, 529).
Many Oceanic languages ultimately lost the inherited decimal system, in many cases “rebuilding” their numeral systems with 5 as an anchor; that is, the physical digit-tallying practices likely influenced verbal practices in which the numerals ‘six’, ‘seven’, ‘eight’, and ‘nine’ were formed as 5+1, 5+2, 5+3, and 5+4. In other words, while some Oceanic speaker communities preserved the inherited decimal numeral system (perhaps extending its domains of use in the modern era), others at some point in their past lost at least some components of this inherited system. Following this train of thought, the spirit numeral systems with archaic retentions of otherwise lost forms would be the last vestiges of a system that the speaker community has otherwise completely lost. This certainly seems to be the case for one of the Lewo systems (§1.1: Table 1), in which all numerals from ‘one’ through ‘nine’ (and possibly also ‘ten’) reflect the inherited decimal forms, whereas the regular Lewo numerals ‘six’ through ‘ten’ are constructed with 5 as an anchor. In the Paama spirit system (§1.4), in which only some of the forms appear to be archaic retentions and others are possibly neologisms, it could be that the existence of a more elaborate manner of counting was remembered but that not all of its forms were successfully transmitted. The North Ambrym spirit system (§1.2) preserves possibly only one archaic numeral between ‘six’ and ‘nine’ (*pitu ‘seven’), if that. The situation in Apma (§1.3) and Kuanua (§1.5) is even more extreme, as they apparently do not preserve any formal material from an earlier decimal system. The idea here is thus that they preserve only the memory that such a system existed, using innovative forms to “recreate” it in a sense.
However, these hypotheses seem less tractable in Micronesia, where Austronesian-speaking groups were the first humans to arrive and where there does not seem to have been any interruption to the transmission of the inherited decimal counting system. Thus, unlike the Melanesian Oceanic languages with spirit numeral systems (§1.1, §1.2, §1.3, §1.4, §1.5), Ulithian (§1.6) and Kosraean (§1.7) preserve the inherited decimal numerals in their regular counting systems. Although the Kosraean spirit numerals may include one or two relatively archaic forms, we are not dealing here with vestigial traces of an otherwise lost method of counting. Clearly, there must be other reasons why a community would recite spirit numerals. If anything, the Micronesian languages seem to illustrate the opposite phenomenon: whereas in the Melanesian examples communities had in some ways lost interest in elaborate numeral systems, preserving the older decimal system only in bits of esoterica, Micronesian communities are well known for their keen interest in numbers, numerals, and enumeration, in some cases having indigenous terms for powers as high as a ‘billion’ (Bender & Beller 2021). Perhaps stories with supernatural beings that count in special ways, along with children’s games involving special counting terms (§1.7), simply reflect a culture’s heightened interest in counting. A similar case could perhaps be made for Itbayat (§2), whose speakers, in addition to spirit systems, have many different ways of counting. Some of these systems can probably be attributed to secret languages (i.e., cants, ludlings, etc.), although that is not necessarily the case for all. Whatever their origins, these spirit numeral systems are fascinating cultural features. Their recurrent formal properties, which also appear in other nonstandard counting practices, undoubtedly warrant further investigation.
Acknowledgments
This work was supported by the ERC-funded Synergy Project QUANTA (no. 951388) and has benefited from the feedback of other members of the QUANTA project. I wish to thank Harald Hammarström for help searching through sources; Bonnie Emos, Paschalis Kinakava, Martin Maden, and Johnny Roland for providing data; and the audience members of the SEALS-34 conference for their questions and comments.
References
Alkire, William H. 1970. Systems of measurement on Woleai Atoll, Caroline Islands. Anthropos 65.1/2:1–73. https://www.jstor.org/stable/40457613.
Barlow, Russell. 2023. Papuan-Austronesian contact and the spread of numeral systems in Melanesia. Diachronica 40.3:287–340. doi: 10.1075/dia.22005.bar.
Barlow, Russell. 2024. The Makada dialect of Kuanua. Te Reo 67.1:1–71. https://hdl.handle.net/21.11116/0000-000F-6F4C-1.
Barlow, Russell. 2025. The notion of the numeral base in linguistics. Philosophical Transactions of the Royal Society B: Biological Sciences 380.1937:20240208. doi: 10.1098/rstb.2024.0208.
Barlow, Russell. In preparation. Esoteric numerals in Austronesian languages.
Bellwood, Peter. 2001. Archaeology and the historical determinants of punctuation in language-family origins. In Alexandra Y. Aikhenvald and R. M. W. Dixon (eds.), Areal diffusion and genetic inheritance: Problems in comparative linguistics (Explorations in Linguistic Typology), 27–43. Oxford: Oxford University Press. doi: 10.1093/oso/9780198299813.003.0002.
Bender, Andrea, and Sieghard Beller. 2006. Numeral classifiers and counting systems in Polynesian and Micronesian languages: Common roots and cultural adaptations. Oceanic Linguistics 45.2:380–403. doi: 10.1353/ol.2007.0000.
Bender, Andrea, and Sieghard Beller. 2021. Ways of counting in Micronesia. Historia Mathematica 56:40–72. doi: 10.1016/j.hm.2021.04.002.
Blust, Robert. 2013. The Austronesian languages. Revised edition (A-PL 008). Canberra: Asia-Pacific Linguistics, Research School of Pacific and Asian Studies, The Australian National University. https://hdl.handle.net/1885/10191.
Burling, Robbins. 2003a. The language of the Modhupur Mandi (Garo), Vol. I: Grammar. Ann Arbor: Michigan Publishing, University of Michigan Library. doi: 10.3998/spobooks.bbv9808.0001.001.
Burling, Robbins. 2003b. The language of the Modhupur Mandi (Garo), Vol. II: The lexicon. Ann Arbor: Michigan Publishing, University of Michigan Library. doi: 10.3998/spobooks.bbv9808.0002.001.
Caughley, Ross C. 1972. A vocabulary of the Chepang language. Kirtipur: Summer Institute of Linguistics and Institute of Nepal Studies, Tribhuvan University. https://www.sil.org/resources/archives/36902.
Caughley, Ross C. 1988. Chepang: A Sino-Tibetan language with a duodecimal numeral base? In David Bradley, Eugénie J. A. Henderson, and Martine Mazaudon (eds.), Prosodic analysis and Asian linguistics: To honour R. K. Sprigg (Pacific Linguistics C-104), 197–199. Canberra: Research School of Pacific Studies, The Australian National University. http://hdl.handle.net/1885/145648.
Chan, Eugene, Hans-Jörg Bibiko, Christoph Rzymski, Simon J. Greenhill, and Robert Forkel. 2019. Channumerals (v1.0). doi: 10.5281/zenodo.3475912. Derived from Eugene Chan’s “Numeral systems of the world’s languages” (accessed 30 September 2019). https://lingweb.eva.mpg.de/channumerals.
Clark, Ross. 2009. *Leo Tuai: A comparative lexical study of North and Central Vanuatu languages (Pacific Linguistics 603). Canberra: Research School of Pacific and Asian Studies, The Australian National University. https://hdl.handle.net/1885/146751.
Codrington, Robert H. 1885. The Melanesian languages. Oxford: Clarendon Press.
Crowley, Terry. 1982. The Paamese language of Vanuatu (Pacific Linguistics B-87). Canberra: Research School of Pacific Studies, The Australian National University. http://hdl.handle.net/1885/145173.
Crowley, Terry. 1992. A dictionary of Paamese (Pacific Linguistics C-121). Canberra: Research School of Pacific Studies, The Australian National University. http://hdl.handle.net/1885/145802.
Early, Robert. 1994. A grammar of Lewo, Vanuatu. Canberra: The Australian National University. PhD thesis. http://hdl.handle.net/1885/132959.
Franjieh, Michael James. 2014. Rral North Ambrym dictionary. Unpublished manuscript.
Garde, Murray. 2015. Numerals in Sa. In Alexandre François, Sébastien Lacrampe, Michael Franjieh, and Stefan Schnell (eds.), The languages of Vanuatu: Unity and diversity (Asia-Pacific Linguistics A-PL 021, Studies in the Languages of Island Melanesia SLIM 5), 117–136. Canberra: College of Asia and the Pacific, The Australian National University. http://hdl.handle.net/1885/14819.
Girschner, Max. 1912. Die Karolineninsel Námōluk und ihre Bewohner. In P. Ehrenreich (ed.), Baessler-Archiv: Beiträge zur Völkerkunde, Band II, 123–215. Leipzig: B. G. Teubner.
Gray, Andrew. 2012. The languages of Pentecost Island. Ashford: Manples (BFoV) Publishing.
Greenhill, Simon J., Robert Blust, and Russell D. Gray. 2008. The Austronesian basic vocabulary database: From bioinformatics to lexomics. Evolutionary Bioinformatics 4:271–283. doi: 10.4137/EBO.S893. https://abvd.eva.mpg.de. Data available as CLDF dataset derived from Greenhill et al.’s “Austronesian Basic Vocabulary Database” (v2020). https://github.com/lexibank/abvd.
Hale, Austin. 1973. Clause, sentence, and discourse patterns in selected languages of Nepal: Part IV, word lists (Summer Institute of Linguistics Publications in Linguistics and Related Fields 40[4]). Norman: Summer Institute of Linguistics of the University of Oklahoma. https://www.sil.org/resources/archives/8633.
Hammarström, Harald. 2010. Rarities in numeral systems. In Jan Wohlgemuth and Michael Cysouw (eds.), Rethinking universals: How rarities affect linguistic theory, 11–60. Berlin: De Gruyter Mouton. doi: 10.1515/9783110220933.11.
Hodgson, Brian H. 1848. On the Chépáng and Kúsúnda tribes of Népál. Journal of the Asiatic Society of Bengal 17.2:650–658.
Hodgson, Brian H. 1857. Comparative vocabulary of the languages of the broken tribes of Népál. Journal of the Asiatic Society of Bengal 26.5:317–349.
Holz, Christoph. 2021. Sequential number word formation in the Tungag-Nalik languages (New Ireland). Oceanic Linguistics 60.1:231–242. doi: 10.1353/ol.2021.0007.
Lean, Glendon A. 1985. Counting systems of Papua New Guinea, volume 4: The New Britain Provinces, draft edition. Department of Mathematics, Papua New Guinea University of Technology. Unpublished manuscript.
Lee, Kee-dong. 1975. Kusaiean reference grammar (Pali Language Texts: Micronesia). Honolulu: The University Press of Hawaii.
Lee, Kee-dong. 1976. Kusaiean-English dictionary (Pali Language Texts: Micronesia). Honolulu: University of Hawaii Press. http://hdl.handle.net/10125/62887.
Lessa, William A. 1959. Divining from knots in the Carolines. The Journal of the Polynesian Society 68.3:188–204. https://www.jstor.org/stable/20703747.
Lessa, William A. 1961. Tales from Ulithi Atoll: A comparative study in Oceanic folklore (Folklore Studies 13). Berkeley: University of California Press.
Lessa, William A. 1980. More tales from Ulithi Atoll: A content analysis (Folklore and Mythology Studies 32). Berkeley: University of California Press.
Lynch, John. 2009. At sixes and sevens: The development of numeral systems in Vanuatu and New Caledonia. In Bethwyn Evans (ed.), Discovering history through language: Papers in honour of Malcolm Ross (Pacific Linguistics 605), 391–411. Canberra: Research School of Pacific and Asian Studies, The Australian National University. http://hdl.handle.net/1885/146753.
Mazaudon, Martine. 1982. Dzongkha numerals. Paper presented at the 15th International Conference on Sino-Tibetan Languages and Linguistics, August 1982, Beijing, China. https://shs.hal.science/halshs-00452217v1.
Mazaudon, Martine. 2009. Number-building in Tibeto-Burman languages. In Stephen Morey and Mark Post (eds.), North East Indian linguistics, volume 2, 117–148. Delhi: Foundation Books. doi: 10.1017/UPO9788175968554.009.
Meyer, Otto. 1961. Wörterbuch der Tuna-Sprache (Micro-Bibliotheca Anthropos 34). Posieux: Anthropos-Institut. doi: 10.4225/72/56FD43D737050.
Mosel, Ulrike. 1984. Tolai syntax and its historical development (Pacific Linguistics B-92). Canberra: Research School of Pacific Studies, The Australian National University. http://hdl.handle.net/1885/145237.
Obata, Kazuko. 2003. A grammar of Bilua: A Papuan language of the Solomon Islands (Pacific Linguistics 540). Canberra: Research School of Pacific and Asian Studies, The Australian National University. http://hdl.handle.net/1885/146708.
Osmond, Meredith, and Andrew Pawley. 2016. Perception. In Malcolm Ross, Andrew Pawley, and Meredith Osmond (eds.), The lexicon of Proto Oceanic: The culture and environment of ancestral Oceanic society, volume 5: People: Body and mind (Asia-Pacific Linguistics 28), 489–517. Canberra: College of Asia and the Pacific, The Australian National University. http://hdl.handle.net/1885/106908.
Owens, Kay, and Glen Lean. 2018. Testing the diffusion theory. In Kay Owens and Glen Lean (eds.), History of number: Evidence from Papua New Guinea and Oceania (History of Mathematics Education), 167–192. Cham: Springer. doi: 10.1007/978-3-319-45483-2_9.
Paton, W. F. 1971. Tales of Ambrym (Pacific Linguistics D-10). Canberra: Research School of Pacific Studies, The Australian National University. http://hdl.handle.net/1885/146607.
Pawley, Andrew. 2002. The Austronesian dispersal: Languages, technologies and people. In Peter Bellwood and Colin Renfrew (eds.), Examining the farming/language dispersal hypothesis (McDonald Institute Monographs), 251–273. Cambridge: McDonald Institute for Archaeological Research.
Pelland, Jean-Charles. 2025. Compositionality beyond bases. Philosophical Transactions of the Royal Society B: Biological Sciences 380.1937:20240209. doi: 10.1098/rstb.2024.0209.
Pons, Marie-Caroline Lyda. 2022. The Chepang language: Phonology, nominal and verbal morphology – synchrony and diachrony of the varieties of the Lothar and Manhari Rivers. Eugene: University of Oregon. PhD dissertation. https://hal.science/tel-04644030.
Richards, Houghton. 2010. Digital audio file, May 13, 2010. Lisepsep spirit numbers counted by Joel Saksak, North Ambrym.
Riddle, T. E. 1915. Some myths and folk stories from Epi, New Hebrides. The Journal of the Polynesian Society 24.4:156–167. https://www.jstor.org/stable/20701117.
Riddle, T. E. 1916. Some myths and folk stories from Epi, New Hebrides: Continued from page 156, Vol. XXIV. The Journal of the Polynesian Society 25.1:24–30. https://www.jstor.org/stable/20701127.
Rongmuthu, Dewan Sing. 1960. The folk-tales of the Garos (University of Gauhati, Department of Tribal Culture and Folk Research, Publication No. 4). Gauhati [Printed in Calcutta]: University of Gauhati, Department of Publications.
Ross, Malcolm. 2014. Reconstructing the history of languages in northwest New Britain. Journal of Historical Linguistics 4.1:84–132. doi: 10.1075/jhl.4.1.03ros.
Ross, Malcolm. 2023a. Counting: Numerals and numeral classifiers. In Malcolm Ross, Andrew Pawley, and Meredith Osmond (eds.), The lexicon of Proto Oceanic: The culture and environment of ancestral Oceanic society, volume 6: People: Society, 427–513. Canberra: College of Asia and the Pacific, The Australian National University. http://hdl.handle.net/1885/106908.
Ross, Malcolm. 2023b. Digit tallying. In Malcolm Ross, Andrew Pawley, and Meredith Osmond (eds.), The lexicon of Proto Oceanic: The culture and environment of ancestral Oceanic society, volume 6: People: Society, 515–548. Canberra: College of Asia and the Pacific, The Australian National University. http://hdl.handle.net/1885/106908.
Sarfert, Ernst. 1920. Kusae, 2. Halbband (Ergebnisse der Südsee-Expedition 1908–1910, II. Ethnographie: B. Mikronesien, Band 4). Hamburg: L. Friederichsen & Co.
Schneider, Cynthia. 2010. A grammar of Abma: A language of Pentecost Island, Vanuatu (Pacific Linguistics 608). Canberra: College of Asia and the Pacific, The Australian National University. http://hdl.handle.net/1885/307437.
Sheppard, Peter, Richard Walter, and Kenneth Roga. 2010. Friends, relatives, and enemies: The archaeology and history of interaction among Austronesian and non-Austronesian speakers in the western Solomons. In John Bowden, Nikolaus P. Himmelmann, and Malcolm Ross (eds.), A journey through Austronesian and Papuan linguistic and cultural space: Papers in honour of Andrew K. Pawley (Pacific Linguistics 615), 95–112. Canberra: College of Asia and the Pacific, The Australian National University. http://hdl.handle.net/1885/146763.
Sohn, Ho-min, and Byron W. Bender. 1973. A Ulithian grammar (Pacific Linguistics C-27). Canberra: Research School of Pacific Studies, The Australian National University. http://hdl.handle.net/1885/146585.
Steel, Robert. 1880. The New Hebrides and Christian missions: With a sketch of the labour traffic, and notes of a cruise through the group in the mission vessel. London: James Nisbet & Co.
Van Der Mark, Sheena Chantal. 2007. A grammar of Vinitiri: An Austronesian language of Papua New Guinea. Bundoora: La Trobe University. PhD thesis.
Yamada, Yukihiro. 1972. Speech disguise in Itbayaten numerals. Asian Studies 10.1:44–49. https://asj.upd.edu.ph.
Yamada, Yukihiro. 2002. Itbayat-English dictionary (Endangered Languages of the Pacific Rim A3-006). Himeji: Himeji Dokkyo University.
Yamada, Yukihiro. 2014. A grammar of the Itbayat language of the Philippines. Himeji: [no publisher].
REGIONAL WANDERWÖRTER OF FORMOSAN LANGUAGES IN NORTHWESTERN TAIWAN: A CASE STUDY OF SOME PLANT NAMES*
Samuel Yu-hsiang PAN, Walis Hian-chi SONG
National Tsing Hua University
samuel.yhpan@gmail.com, walis.hcs@gapp.nthu.edu.tw
Abstract
This study investigates the regional shared lexicon among Formosan languages, focusing on the distribution and interrelations of plant names in northwestern Taiwan. Previous works by Ferrell (1969) and Li (1994) have compared lexical items among Formosan languages. However, the historical relationships among these words have not been discussed broadly or systematically. We collected data from multiple Formosan languages in northwestern Taiwan, including Atayal, Seediq, Pazeh-Kaxabu, Saisiyat, Bunun, Thao, Kavalan, Taokas, and other nearby languages. Through lexical comparison, we examined whether these words show regular phonological correspondences with one another. Despite surface-level similarities, irregular phonological patterns suggest that these plant names are more likely “regional Wanderwörter” resulting from long-term language contact than direct descendants of Proto-Austronesian (PAn) etyma.
Based on distribution and historical contact patterns, we categorize these shared words into three groups: (1) cross-regional shared words, such as %BALAYAR ~ BIALAR ‘Alocasia’, %BAYLUH ~ BAWLU ‘beans’, %(BA)TAKAN ‘bamboo (tubes)’, %BARASIQ ~ DARESIQ ‘wax trees’, and %QALUP(AS) ‘peaches’; (2) words found in the Atayal diffusion area, such as %DALU ‘persimmons’ and %BASAR ‘Broussonetia trees’; and (3) words spreading across the northern coastal areas, such as %QAWPIR ‘sweet potatoes’ and %SILIH ‘chili peppers’. Our findings suggest that these shared words emerged from early language contact. By clarifying historical relationships among Formosan languages, this research provides a concrete basis for identifying regional Wanderwörter and offers insights into historical language interactions in northwestern Taiwan.
Keywords: Wanderwörter, Formosan Languages, Northwestern Taiwan, Plant Names
ISO 639-3 codes: tay, trv, pzh, xsy, bnn, ssf, ckv, bzg, ami
1 Introduction
Formosan languages, as the earliest offshoots of the Austronesian family, form a rich and diverse linguistic landscape. Over the past few decades, scholars such as Ferrell (1969) and Li (1994) have conducted extensive lexical comparisons of Formosan languages, laying important foundations for understanding the relationship between Proto-Austronesian (PAn) and its descendant languages. However, the distribution of many shared lexical items still requires clarification. Specifically, it remains uncertain whether certain words can be traced directly back to PAn or whether they are regionally shared lexemes arising from language contact, a question that has yet to receive systematic investigation. We investigate plant-related vocabulary through cross-linguistic comparison and diachronic phonological analysis, aiming to identify regional Wanderwörter and their diffusion patterns.
Plant vocabulary, being closely tied to daily life and the natural environment, often spreads through language contact and trade, making it an ideal domain for exploring regional contact phenomena. Methodologically, the study employs the historical-comparative method to examine phonological correspondences, semantic shifts, and geographical distributions of target lexemes across languages. The plant-related vocabulary is then categorized, and three main diffusion areas are identified, illustrating the heterogeneous and multi-layered nature of lexical contact and spread. Through this research, we aim to provide a more nuanced understanding of the historical linguistic relationships among the Formosan languages and to deepen our insight into the mechanisms underlying the formation of regional Wanderwörter. The findings also offer new perspectives on linguistic contact zones and cultural exchange in the history of Taiwan.
2 Theoretical Background and Methodology
2.1 Defining Wanderwort
The concept of Wanderwort (plural: Wanderwörter) refers to lexical items that, although superficially similar across languages, result from regional diffusion and contact rather than direct inheritance from a common ancestor (Campbell & Mixco 2007:220–221). Earlier comparative works on Formosan languages (e.g., Ferrell 1969; Li 1994) have successfully identified numerous cognates traceable to PAn. However, the possibility of contact-induced diffusion seems to have been neglected in these studies. Unlike typical loanwords, which generally have an identifiable donor language and exhibit regular phonological integration, Wanderwörter lack such clear-cut sound correspondences. These words often do not follow the usual patterns of sound change, which distinguishes them from typical borrowings.
In this paper, reconstructed forms are marked with an asterisk (*xyz), while Wanderwörter are written in capitals with a percent sign (%XYZ), a more abstract representation. Recent developments in contact linguistics and areal typology emphasize the need to distinguish inherited features from those shaped by prolonged regional interaction. In particular, vocabulary domains closely tied to cultural and ecological practices, such as plant names, are especially prone to diffusion through trade, migration, and cultural exchange. Examining the distribution and phonological development of plant-related vocabulary can therefore reveal the intricate layers of contact and inheritance in the historical relationships among Formosan languages.
2.2 Dataset and Method
This study adopts a historical-comparative approach, supplemented by insights from language contact, to investigate plant-related vocabulary in northwestern Formosan languages. Data were gathered through a systematic review of existing linguistic documentation, such as the online dictionaries of the Indigenous Languages Research and Development Foundation (ILRDF) and Li and Tsuchida’s (2001) Pazeh dictionary. The study focuses on plant names in the Formosan languages of northwestern Taiwan, including Taokas, Kavalan, Pazeh-Kaxabu, Saisiyat, Atayal, Seediq, Bunun, and Thao, as well as other nearby languages such as Amis. The distribution of these languages is shown in Figure 1. For each plant name, we examine which reflexes are regular and which are irregular. Using resources such as Blust and Trussel’s (ongoing) Austronesian Comparative Dictionary (ACD), we compare sound correspondences across Formosan languages to identify the phonological irregularities in the dataset.
Special attention is given to cases where apparent phonological similarity does not align with PAn reconstructions, suggesting potential instances of regional lexical diffusion rather than direct inheritance.

Figure 1: Distribution of some Formosan languages in Northwestern Taiwan

Finally, to refine the analysis, the plant-related vocabulary is categorized into three primary diffusion patterns: (i) cross-regional shared terms, (ii) Atayal diffusion area terms, and (iii) northwestern plains diffusion area terms. This categorization allows a more comprehensive interpretation of early contact networks and contributes to a deeper understanding of historical linguistic and cultural interactions among Formosan languages in Northwestern Taiwan.

3 Irregular Patterns in Wanderwörter

In analyzing plant-related vocabulary across northwestern Formosan languages, it is crucial to distinguish superficial phonological similarities from historically valid sound correspondences. That is, we need systematic phonological evidence consistent with established patterns of sound change across Formosan languages, not surface-level similarity.

3.1 Irregular Sound Correspondences

For example, words for ‘fern’ exhibit surface-level phonological similarity in Atayal, Seediq, and Pazeh. However, a closer comparison of Proto-Austronesian reflexes across these languages reveals clear irregularities, suggesting that this form may represent a regional Wanderwort rather than a true cognate set.

Table 1: Irregular correspondences in %RIDI(L) ‘fern’
Form | Languages and Examples | Sound Irregularity
%RIDI(L) ‘fern’ | Ata. giriʔ; Sed. giril; Paz. xizi ‘k.o. fern’ | irregular r ~ z; existence of -l

As shown in Table 2, although the initial g in Atayal and Seediq can correspond to x in Pazeh, the medial correspondence r : r : z does not align with any expected reflex pattern of Proto-Austronesian phonemes.
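The reasoning above amounts to asking whether a single PAn phoneme could regularly yield every segment in one aligned position of a correspondence set. A minimal Python sketch of that check follows. It assumes the reflex inventory is limited to the six PAn phonemes listed in Table 2, so an empty result only means "no source among these phonemes"; the names `REFLEXES` and `possible_sources` are illustrative and do not belong to any existing tool.

```python
# Regular reflexes of six PAn phonemes, transcribed from Table 2.
# Each language maps to the set of segments that count as a regular reflex.
REFLEXES = {
    "*R":    {"Atayal": {"g"}, "Seediq": {"g"}, "Pazeh": {"x"}},
    "*d/*z": {"Atayal": {"r"}, "Seediq": {"d"}, "Pazeh": {"d"}},
    "*l":    {"Atayal": {"ɹ", "y", "∅"}, "Seediq": {"r"}, "Pazeh": {"r"}},
    "*s":    {"Atayal": {"h"}, "Seediq": {"h"}, "Pazeh": {"z"}},
    "*j":    {"Atayal": {"g"}, "Seediq": {"y"}, "Pazeh": {"z"}},
    "*N":    {"Atayal": {"l"}, "Seediq": {"l"}, "Pazeh": {"l"}},
}


def possible_sources(correspondence):
    """Return the PAn phonemes whose regular reflexes match every language's segment.

    `correspondence` maps a language name to the segment it shows at one
    aligned position, e.g. {"Atayal": "g", "Seediq": "g", "Pazeh": "x"}.
    """
    return [
        proto
        for proto, by_lang in REFLEXES.items()
        if all(seg in by_lang[lang] for lang, seg in correspondence.items())
    ]


# Aligned positions in %RIDI(L) 'fern': Atayal giriʔ, Seediq giril, Pazeh xizi.
initial = {"Atayal": "g", "Seediq": "g", "Pazeh": "x"}
medial = {"Atayal": "r", "Seediq": "r", "Pazeh": "z"}

print(possible_sources(initial))  # ['*R']
print(possible_sources(medial))   # [] (no single source among these phonemes)
```

The initial g : g : x set resolves to *R, while the medial r : r : z set has no single source: Atayal r points to *d/*z, Seediq r to *l, and Pazeh z to *s or *j.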
Furthermore, the final l in Seediq adds to the irregularity, making this an even more problematic case for straightforward reconstruction. It is therefore more appropriate to regard this form as a regional Wanderwort than as a genuine cognate inherited from Proto-Austronesian.

Table 2: Some PAn reflexes in Atayal, Seediq and Pazeh
PAn | *R | *d/*z | *l | *s | *j | *N
Atayal | g [ɣ] | r | ɹ, y [j], ∅ | h [ħ] | g [ɣ] | l
Seediq | g [ɣ] | d | r | h [ħ] | y [j] | l
Pazeh | x | d | r | z | z | l

Another example can be found in the word for ‘persimmon’, represented here as %QALUPAZ. Across languages, the forms display surface similarities, but closer phonological examination again reveals irregular correspondences.

Table 3: Irregular correspondences in %QALUPAZ ‘persimmon’
Form | Languages and Examples | Irregularity
%QALUPAZ ‘persimmon’ | Bun. qalupaz; Ami. ʡalopal; Kav. inupal | irregular l ~ n; irregular z ~ l

In Table 3, the initial sequence qa- : ʡa- : i- is relatively stable across the three languages, but notable irregularities occur in the medial and final segments. For the relevant sound correspondences among Bunun, Amis, and Kavalan, see Table 4.

Table 4: Some PAn reflexes in Bunun, Amis and Kavalan
PAn | *q | *a | *R | *N | *y
Bunun | q | a | l | n | z
Amis | ʡ | a | l | ɬ | y
Kavalan | ∅ | i /*q_; _q | R | n | y

Compared with these PAn reflexes, the correspondence of l in Amis and Bunun to n in Kavalan, and of final z in Bunun to l in both Amis and Kavalan, is clearly unexpected. These inconsistencies do not match the expected reflexes of Proto-Austronesian phonemes and suggest that %QALUPAZ is more likely a regional Wanderwort.

3.2 Double Reflexes in Seediq

In addition to genuine irregular sound correspondences, Seediq also exhibits a number of “irregular but expected” sound changes, the so-called “double reflexes”, which complicate the task of historical reconstruction.
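One consequence for correspondence checking is that some Seediq segments become ambiguous: d may continue either PAn *d or PAn *N. The Python sketch below is a hypothetical illustration, not the procedure used in this study. It compares the consonants of Atayal raluʔ and Seediq cadu (the %DALU ‘wild persimmon’ forms in Table 6) twice, first with the regular Seediq reflexes of Table 2 and then with the additional double reflexes listed in Table 5. It covers only PAn *d and *N and ignores the metathesized variant lacu.

```python
# Seediq reflexes of PAn *d and *N: regular ones (Table 2) and the set
# enlarged by the "double reflexes" of Table 5 (*d > d ~ c, *N > l ~ d).
SEEDIQ_REGULAR = {"*d": {"d"}, "*N": {"l"}}
SEEDIQ_WITH_DOUBLES = {"*d": {"d", "c"}, "*N": {"l", "d"}}
# Atayal reflexes from Table 2 (which groups *d together with *z).
ATAYAL = {"*d": {"r"}, "*N": {"l"}}


def sources(atayal_seg, seediq_seg, seediq_reflexes):
    """PAn phonemes that could regularly yield this Atayal : Seediq consonant pair."""
    return [
        proto
        for proto in ATAYAL
        if atayal_seg in ATAYAL[proto] and seediq_seg in seediq_reflexes[proto]
    ]


# Consonants of Atayal raluʔ and Seediq cadu, aligned by position
# (the Atayal final glottal stop has no Seediq counterpart and is skipped).
PAIRS = [("r", "c"), ("l", "d")]

regular = [sources(a, s, SEEDIQ_REGULAR) for a, s in PAIRS]
with_doubles = [sources(a, s, SEEDIQ_WITH_DOUBLES) for a, s in PAIRS]
print(regular)       # [[], []]
print(with_doubles)  # [['*d'], ['*N']]
```

With regular reflexes only, neither consonant pair has a source; once the double reflexes are admitted, r : c aligns with *d and l : d with *N. This is why Seediq forms call for careful phonological checking before a correspondence is counted as regular or irregular.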
Table 5: “Double reflexes” in the Seediq language
Gloss | ‘upriver’ | ‘bring’ | ‘camphor tree’ | ‘tree bean’
PAn | *daya | *adaS | *dakeS | *qaRidaŋ
Seediq | daya | adas | cakus | gicaŋ
Gloss | ‘grayhair’ > ‘hair’ | ‘banana’ | ‘moon’ | ‘hunt’
PAn | *qubaN | *bəNbəN | *qiNaS | *qaNup
Seediq | ubal | bləbul | idas | aduk

One notable irregularity is the unexpected change of Proto-Austronesian *d to Seediq c, which occurs alongside the more common regular reflex d. Another involves fluctuation between Seediq l and d in forms expected to reflect PAn *N, with no identifiable conditioning environment. These double reflexes appear across different Seediq dialects and can thus be reconstructed for Proto-Seediq, although the cause of the phenomenon has yet to be determined (Song 2025). The same irregular sound changes can also be found in some regional Wanderwörter, as shown by ‘wild persimmon’ in Table 6.

Table 6: Irregular correspondences in %DALU ‘wild persimmon’
Form | Languages and Examples | Irregularity
%DALU ‘wild persimmon’ | Sai. raloʔ ~ laroʔ; Ata. raluʔ; Sed. cadu ~ lacu | Seediq double reflexes; metathesis

These irregular sound changes suggest that Seediq may have undergone internal phonological innovations or been influenced by neighboring languages through prolonged contact. Consequently, careful phonological checking is necessary when evaluating potential cognates involving Seediq forms, especially in studies of regional lexical diffusion and Wanderwörter.

3.3 Atayalic Gender-register Morphology

The mountain corridor inhabited by Atayal and Seediq shows a second layer of changes. What makes this bundle special is the presence of a gender-register system (GRS), a lexical alternation between male and female registers preserved in the Mayrinax dialect of Atayal and partly in Seediq (Li 1980, 1982; Goderich 2020).
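In the transcriptions used in this paper, hyphens and angle brackets mark morpheme boundaries, including GRS affixes such as those in PAta (M) *kahu-niq and *raqi<ɹa>s (Table 7) or the suffix -hiŋ (Table 12). Before comparing such forms across languages, it can help to strip the marked affixes. The Python sketch below does this with a regular expression. The suffix list is an illustrative assumption drawn from forms cited in this paper, not an inventory of Atayalic GRS morphology, and unmarked fossilized affixes (such as -ux in Squliq qzimux) are deliberately left untouched.

```python
import re

# Hyphenated suffixes that this paper treats as GRS (or fossilized GRS)
# morphology in Tables 7, 10 and 12. Illustrative, not exhaustive.
GRS_SUFFIXES = ("-niq", "-hiŋ", "-ux")
# Infixes are written between angle brackets, e.g. raqi<ɹa>s.
INFIX = re.compile(r"<[^>]*>")


def strip_grs(form):
    """Remove bracketed infixes and one listed hyphenated suffix from a form."""
    form = INFIX.sub("", form)
    for suffix in GRS_SUFFIXES:
        if form.endswith(suffix):
            return form[: -len(suffix)]
    return form


for form in ["*kahu-niq", "*raqi<ɹa>s", "rh<ənu>k", "ryu-hiŋ", "bsər-ux", "qzimux"]:
    print(form, "->", strip_grs(form))
```

Because only overtly marked boundaries are stripped, fossilized material that the transcription does not segment still has to be identified by the comparative analysis itself.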
Table 7: Some examples of gender register in Proto-Atayal
Gloss | ‘tree’ | ‘face’ | ‘wide; broad’
PAn | *kaSiw | *daqiS | –
PAta (F) | *kahuy | *raqis | *gVlabaŋ
PAta (M) | *kahu-niq | *raqi<ɹa>s | *gVlahaŋ

As illustrated in Table 7, Proto-Atayal reconstructions show a systematic difference between male and female forms. Typically, male-register forms involve additional affixation or morphological modification, whereas female-register forms tend to retain shapes closer to their Proto-Austronesian (PAn) etyma.

Table 8 presents another case in which GRS-related morphology may play a role in irregular patterns. Among the shared forms derived from %IYUK ‘citrus’, a probable infix appears in the Squliq dialect of Atayal.

Table 8: Irregular correspondences in %IYUK ‘citrus’
Form | Languages and Forms | Irregularity
%IYUK ‘citrus (2)’ | Bun. izuk; Sai. ʔiyok; AtaMx. ʔiyuk; AtaSq. yuk | probable GRS infix in AtaSq.

Thus, understanding gender-register morphology is crucial for evaluating irregular patterns in Atayal and Seediq, particularly when shared regional vocabulary exhibits unexpected forms. The presence of gender-register features in some regional Wanderwörter further suggests that these lexical items may reflect a deeper and more prolonged history of language contact in Northwestern Taiwan.

4 Plant Vocabulary across Languages in Northwestern Taiwan

In this section, we categorize plant-related vocabulary by plant type, focusing on the distribution of these terms across languages in Northwestern Taiwan. By examining the spread of these terms, we can explore why certain plants that are integral to daily life and culture show widespread lexical forms across languages. This pattern reflects not only shared ecological knowledge but also historical interaction and contact between these communities.

4.1 Fruits

Fruits, which are essential for both consumption and cultural practices, show significant lexical overlap across languages.
The spread of these terms indicates the importance of fruits in these societies and suggests that cultural practices surrounding fruit cultivation and exchange facilitated the transmission of vocabulary. The distributions of fruit-related Wanderwörter are shown in Figure 2.

Table 9: Fruit-related Wanderwörter
Form | Gloss | Languages and Examples | Irregularities
%(A)MULU | ‘citrus (1)’ | Kavalan mulu, Pazeh amulu, Saisiyat morok ‘pomelo’, Plngawan Atayal namuruʔ, Seediq mudu | presence/absence of initial syllable; irregular l ~ r ~ d; irregular k ~ ʔ ~ ∅
%IYUK | ‘citrus (2)’ | Bunun izuk, Saisiyat ʔiyok ‘tangerine’, Mayrinax Atayal ʔiyuk, Squliq Atayal yuk | Atayal variety shows fossilized GRS infix
%QALIM | ‘peach (1)’ | Pazeh arim, Saisiyat ʔæLim, Tgdaya/Toda Seediq ariŋ, Mayrinax Atayal qaim, Squliq Atayal qzimux | Atayal variety shows fossilized GRS suffix -ux
%QALUP(AS) | ‘peach (2)’ | Bunun qalup, Thao qalup, Truku Seediq qlup-as, Kavalan lupas, Amis lopas | presence/absence of qa-; irregular liquids; additional -as ending
%QALUPAZ | ‘persimmon’ | Bunun qalupaz, Amis ʡalopal, Kavalan inupal | irregular l ~ n; irregular z ~ l
%DALU | ‘wild persimmon’ | Saisiyat raloʔ ~ laroʔ, Atayal raluʔ, Seediq cadu ~ lacu | Seediq double reflexes; metathesis
%BANUAZ | ‘plum’ | Bunun banuaz, Seediq bnuar, Amis manowad | irregular b ~ m; irregular z ~ r ~ d
%RILUK | ‘wild berry’ | Kavalan Rinuq, Saisiyat Lilok, Plngawan Atayal ɹiluk, Squliq Atayal biluk, Mayrinax Atayal wa-yhuk, Seediq rh<ənu>k ~ rh<əlu>k | irregular initial in Squliq Atayal; GRS h substitution in Mayrinax Atayal and Seediq; GRS infixes <ənu> and <əlu> in Seediq

The spread of these terms may also reflect trade or agricultural practices that involved exchanging fruit species across regions, reinforcing the connection between linguistic and cultural exchange.
Figure 2: Fruit-related Wanderwörter

4.2 Trees

Tree-related terms are another important category with a widespread distribution across the languages of Northwestern Taiwan. Trees play a central role in the daily lives of these communities, and their widespread lexical forms suggest that these species were not only ecologically significant but also part of a shared cultural heritage. The distributions of tree-related Wanderwörter are shown in Figure 3. Given the importance of trees in daily life, the transmission of tree terms reflects both historical cultural exchange and the lasting impact of shared forest resources.

Table 10: Tree-related Wanderwörter
Form | Gloss | Languages and Examples | Irregularities
%SINERIL | ‘cherry blossom’ | Seediq snəgil, Saisiyat siŋLil | irregular nasal
%PEHUL | ‘Roxburgh sumac’ | Pazeh puhun, Saisiyat ka-phœl, Seediq pihut | irregular n ~ l ~ t†; fossilized prefix ka- in Saisiyat; fossilized infix in Seediq
%QACER | ‘Taiwan crepe myrtle’ | Bunun atul, Thao qaθuɬ, Saisiyat qasəL, Seediq səw | irregular q; fossilized infix in Seediq
%BANGUN | ‘Taiwanese cypress’ | Kavalan baŋun, Taokas banun | N/A
%BASAR | ‘tapa cloth tree’ | Saisiyat baʃaL, Mayrinax Atayal basaw, Plngawan Atayal basw, Seediq bsər-ux | irregular -w in Atayal; GRS affixes in AtaPl. and Sed.
%BARASIQ ~ DARESIQ | ‘wax tree’ | Pazeh baxasa, Saisiyat baLaʃiʔ, Mayrinax Atayal bagasuq, Seediq drəsiq, Amis forəɬiʡ | irregular b ~ d; vowel mismatch; fricative changes

† The Seediq word-final -t might derive from the devoicing of -d. For the double reflexes l ~ d, see the discussion in section 3.2.

Figure 3: Tree-related Wanderwörter

4.3 Bamboos and Bamboo Shoots

Shared bamboo-related vocabulary is found primarily in Atayal, Seediq, Saisiyat, Pazeh, and Bunun. The meanings of these terms often overlap but may designate different species: for example, Atayal qoran refers to ‘Ma bamboo’, while Saisiyat ʔæwran refers to ‘thorny bamboo’.
The distributions of bamboo-related Wanderwörter are shown in Figure 4. These terms were shaped by the shared ecological importance of bamboo across linguistic communities. Furthermore, conservative forms such as Pazeh batakan ‘bamboo tube’ and Amis ʡaɬəci ‘bamboo shoot’ might point to a source of transmission.

Table 11: Bamboo-related Wanderwörter
Form | Gloss | Languages and Examples | Irregularities
%(DA)DUMAH | ‘bamboo sp.’ | Squliq Atayal rumaʔ, Seediq (Toda, Truku) ddima, Pazeh ruma ‘Makino bamboo’, raruma ‘Ma bamboo’, Saisiyat raromæh | irregular r in Pazeh†; irregular -h; irregular i ~ u/o
%(BA)TAKAN | ‘bamboo sp.’ | Pazeh batakan, Seediq btakan, Bunun takan ‘bamboo tube’, Thao takan | presence/absence of ba-
%QAURAN | ‘bamboo sp.’ | Mayrinax Atayal qawran, Squliq Atayal qoran ‘Ma bamboo’, Seediq qoran ‘Ma bamboo’, Pazeh auran ‘village name’, Saisiyat ʔæwran ‘thorny bamboo’ | irregular q; irregular r
%TANAYAN | ‘bamboo sp.’ | Kavalan tənayan ‘bamboo’, Saisiyat tanayan ‘k.o. bamboo’ | irregular ə ~ a; cf. Basay tənayan ‘bamboo, bamboo fence’
%QALESI ~ ALI | ‘bamboo shoot’ | Amis ʡaɬəci ~ ɬaʡ(ə)ci, Thao qati, Seediq ləxi, Saisiyat ʔanhiʔ, Pazeh ali, Atayal (Mayrinax, Squliq, Plngawan) ʔaliʔ, Bunun (Takituduh) ʔaʔali | irregular l ~ n ~ ∅; loss of medial syllable

† Cf. PAn *duSa > Atayal (Matu’uwal) rusaʔ; Seediq dəha ~ daha; Pazeh dusa; Saisiyat roʃaʔ.

Figure 4: Bamboo-related Wanderwörter

4.4 Herbs and Beans

Herb- and bean-related Wanderwörter are likewise found across different Formosan languages in northwestern Taiwan, often with substantial phonological adaptation. In particular, Seediq and Atayal varieties frequently show double reflexes, implying complex internal developments after borrowing. The distributions of herb/bean-related Wanderwörter are shown in Figure 5.
Moreover, the presence of consistent morphological elements such as the Atayalic -hiŋ suffix suggests that not just individual words but entire morphological patterns were transmitted across groups. Irregular correspondences further highlight the need for caution when reconstructing proto-forms, as treating these borrowed and remodeled items as genuine cognates could distort phylogenetic trees.

Table 12: Herb/bean-related Wanderwörter
Form | Gloss | Languages and Examples | Irregularities
%BARAYAR ~ BIYALAR | ‘Alocasia’ | Pazeh biarax, Saisiyat byaraL, Bunun baihal, Seediq brayaw, Atayal bagayaw (Squliq, Plngawan), bagatiʔ (Mayrinax) | irregular r ~ h ~ g; cf. Taokas bixax ‘leaf’
%(QA)DURUP ~ CURUK | ‘Bidens sp.’ | Mayrinax Atayal gərup, Squliq Atayal qregup, Plngawan Atayal ruk, Seediq cuguk, Isbukun Bunun susuluk | Seediq double reflexes
%DALUKU ~ DALUHING | ‘Bird-nest fern’ | Saisiyat raLokoʔ, Mayrinax Atayal raw-hiŋ, Plngawan Atayal ɹaɹu-hiŋ, Squliq Atayal ryu-hiŋ, Seediq cru-hiŋ | Seediq double reflexes; Atayalic GRS suffix -hiŋ
%RIDI(L) | ‘fern’ | Squliq Atayal giriʔ, Seediq giril, Pazeh xizi ‘elkhorn fern’ | irregular r ~ z; additional -l in Seediq
%BAYLUH ~ BAWLU | ‘bean’ | Seediq beyluh, Mayrinax Atayal bawluʔ, Squliq Atayal boluʔ, Bunun bainu, Thao bailu | irregular l ~ n; irregular ay ~ aw; irregular final consonant
%LAYAN | ‘mung bean’ | Seediq layan, Mayrinax Atayal layan, Plngawan Atayal layan, Squliq Atayal layan mtasiq, Bunun laian, Thao layan a bailu | irregular l ~ n; irregular ay ~ ai

Figure 5: Herb/bean-related Wanderwörter

4.5 Post-Columbian Crops

A set of lexical items is clearly associated with crops introduced during the Columbian Exchange, such as sweet potato, chili pepper, peanut, and guava. These forms can be identified in the languages along Taiwan’s northern coast, including Kavalan, Taokas, Pazeh, and Saisiyat. As these crops reached Taiwan only after the 17th century, the diffusion of these terms must be more recent than that of the other sets.
Here the direction of flow is reversed: languages in or near coastal areas appear as donors, with inland groups borrowing the terms along with the crops. This distribution suggests a dynamic network of intergroup contact, in which some irregular variation may be due to later adaptation in the borrowing languages. The distributions of post-Columbian crop-related Wanderwörter are shown in Figure 6.

Table 13: Post-Columbian crop-related Wanderwörter
Form | Gloss | Languages and Examples | Irregularities
%SILIH | ‘chili pepper’ | Kavalan sidiʔ, Pazeh siri, Saisiyat silih | irregular s; irregular d ~ l ~ r; possible fossilized infix in Pazeh
%LAPUAD ~ LAPAT | ‘guava’ | Pazeh lapuat, Saisiyat lapuar, Mayrinax Atayal qapuwa, Squliq Atayal sebwal, Bunun lapat, Thao lapat | irregular initial syllable; irregular ua ~ a; irregular final consonant
%TAWTAW | ‘peanuts’ | Taokas tawtaw, Pazeh tawtaw, Saisiyat tawtaw | N/A
%BUNGA | ‘sweet potato (1)’ | Thao buna, Mayrinax Atayal buŋaʔ, Squliq Atayal ŋa-hiʔ, Plngawan Atayal ŋa-hiʔ, Seediq buŋa, Amis foŋa ~ koŋa | irregular f ~ k in Amis
%KAWPIR | ‘sweet potato (2)’ | Kavalan qawpiR, Taokas khapit, Saisiyat ʔæwpir, Sakizaya kawpil ‘~ leaves’, Northern Amis kawpir ‘~ leaves’ | irregular q ~ kh ~ ʔ; irregular R ~ t ~ r

Notably, the form %SILIH ‘chili pepper’ not only spreads across Formosan languages but also corresponds to a form widespread throughout Southeast Asia, such as cili in Malay and sili in various Philippine languages, which ultimately derives from Nahuatl chīlli via Spanish chile, reflecting the global diffusion routes established during the Columbian Exchange. Meanwhile, %TAWTAW ‘peanut’ shows a high degree of uniformity across Saisiyat, Taokas, and Pazeh and is likely a direct borrowing from Southern Min 塗豆 thɔ⁵-tau⁷, further supporting the role of Sinitic coastal trade in the introduction of New World crops.
Together, these cases illustrate a multi-layered history of contact-driven lexical innovation, shaped by both local interaction and long-distance maritime exchange.

Figure 6: Post-Columbian crop-related Wanderwörter

5 Discussion

This section discusses how multiple contact processes, lexical similarity patterns, and the identification of regional Wanderwörter contribute to understanding early interactions among Formosan languages and to refining Proto-Austronesian reconstruction. We focus on the diagnostic value of shared plant-related terms and emphasize the importance of distinguishing borrowed forms from inherited vocabulary.

5.1 Possible Spreading Pathways

The two examples %TANA(Q) ‘prickly ash’ and %BARAYAR ~ BIYALAR ‘Alocasia’ reveal different pathways of lexical spread among Formosan languages. In the case of %TANA(Q), irregular sound correspondences, such as the loss of final q in Pazeh and the shift from a to i in Kavalan, suggest that the word spread through multiple intermediate languages rather than via direct inheritance.

Table 14: Irregular correspondences in %TANA(Q) ‘prickly ash’
Form | Languages and Examples | Irregularities
%TANA(Q) ‘prickly ash’ | Kav. tani; Sai. taniʔ; Paz. tana; Bun. tana; Tha. ta-tanaq; Ata. tanaʔ | q ~ ∅; a ~ i

Table 15: Possible spreading pathways of %TANA(Q) ‘prickly ash’
PAn | Inherited forms | Spread forms
*tanaq | > Thao ta-tanaq | –
*tanaq | > Pazeh tana | → Atayal tanaʔ, Bunun tana
*tanaq | > Kavalan tani | → Saisiyat taniʔ

In contrast, the diffusion of %BARAYAR ~ BIYALAR is traceable not only through the irregular correspondences r ~ h ~ g but also through morphological evidence, such as gender-register suffixes and fossilized infixes preserved in Atayalic forms. These morphological features provide a more reliable diagnostic than phonetic similarity alone for reconstructing spreading pathways.
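The pathways in Table 15 can be read as a small tree in which each form has exactly one parent, linked either by inheritance (>) or by lateral spread (→). A minimal Python sketch, assuming this reading of the table, stores the tree as a parent map and walks back from any form to the PAn etymon. The names `PARENT`, `pathway` and `is_spread` are hypothetical, chosen for illustration only.

```python
# Proposed pathways for %TANA(Q) 'prickly ash' (Table 15), stored as
# child -> (parent, link); ">" marks inheritance and "→" marks spread.
PARENT = {
    "Thao ta-tanaq": ("*tanaq", ">"),
    "Pazeh tana": ("*tanaq", ">"),
    "Kavalan tani": ("*tanaq", ">"),
    "Atayal tanaʔ": ("Pazeh tana", "→"),
    "Bunun tana": ("Pazeh tana", "→"),
    "Saisiyat taniʔ": ("Kavalan tani", "→"),
}


def pathway(form):
    """Walk from a form back to the PAn etymon and return the chain as text."""
    chain = [form]
    while form in PARENT:
        form, link = PARENT[form]
        chain[:0] = [form, link]  # prepend the parent and the link symbol
    return " ".join(chain)


def is_spread(form):
    """True if the form reached its language by lateral spread, not inheritance."""
    return PARENT[form][1] == "→"


print(pathway("Saisiyat taniʔ"))  # *tanaq > Kavalan tani → Saisiyat taniʔ
print(is_spread("Bunun tana"))    # True
```

A parent map is enough here because each form in Table 15 has a single proposed source; a form with competing sources would need a graph with multiple incoming edges instead.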
Table 16: Irregular correspondences in %BARAYAR ~ BIYALAR ‘Alocasia’
Form | Languages and Examples | Irregularity
%BARAYAR ~ BIYALAR ‘Alocasia’ | Paz. biarax; Sai. byaraL; BunBh. baihal; SedTr. brayaw; AtaPl. bagayaw | irregular r ~ h ~ g; cf. Taokas bixax ‘leaf’

Table 17: Possible spreading pathways of %BARAYAR ~ BIYALAR ‘Alocasia’
PAn | Inherited forms | Semantic change | Assimilation and metathesis
*biRaq ‘leaf’ | > Kavalan biRi ‘leaf’; > Taokas bixax ‘leaf’ | → Pre-Proto-Atayalic **biRa-yaR ‘Alocasia’ | → %BIYALAR ~ BARAYAR

Furthermore, the coexistence of two plant-related bundles, one centered on forest and wild plants and the other on cultivated crops, suggests that borrowing was shaped by semantic domains and ecological contexts. Rather than reflecting a single event, the observed patterns point to different routes of diffusion shaped by specific cultural and environmental factors.

5.2 Main Diffusion Areas in Northwestern Taiwan

The shared plant-related vocabulary among the languages of northwestern Taiwan likely arises from early language contact rather than inheritance from a common ancestor. Two main mechanisms of lexical spread can be identified: contact via trade and dominant-language influence. To illustrate the patterns of diffusion, a similarity matrix (Table 18) was constructed, showing the number of shared plant-related terms between language pairs.

Table 18: Similarity matrix of shared plant-related forms
| Tks | Kav | Paz | Sai | Ata | Sed | Bun | Tha
Tks | 4
Kav | 2 | 8
Paz | 2 | 3 | 16
Sai | 3 | 5 | 14 | 22
Ata | 1 | 2 | 6 | 11 | 15
Sed | 1 | 3 | 8 | 11 | 10 | 19
Bun | 1 | 3 | 5 | 6 | 7 | 8 | 12
Tha | 1 | 2 | 5 | 5 | 7 | 8 | 9 | 10

The results show the shared terms found among different language groups, with higher numbers indicating more frequent contact or stronger historical ties. For instance, the similarity between Pazeh (Paz) and Saisiyat (Sai), at 14 the highest off-diagonal value in the matrix, suggests that these languages may have shared a cultural or historical background, resulting in more interaction over time.
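Table 18 counts, for each pair of languages, the plant-related forms attested in both. The Python sketch below shows one way such a matrix could be computed from a table of attestations. As input it uses only the five post-Columbian sets of Table 13, restricted to the eight languages of Table 18 (Amis and Sakizaya are omitted), so its numbers are not those of Table 18. Reading each diagonal cell as the number of sets attested in that language is an assumption about how the diagonal was derived; the names `ATTESTATIONS` and `similarity_matrix` are illustrative.

```python
from itertools import combinations_with_replacement

# Languages attesting each set, transcribed from Table 13 (post-Columbian crops),
# restricted to the eight languages that appear as columns in Table 18.
ATTESTATIONS = {
    "%SILIH":  {"Kav", "Paz", "Sai"},
    "%LAPUAD": {"Paz", "Sai", "Ata", "Bun", "Tha"},
    "%TAWTAW": {"Tks", "Paz", "Sai"},
    "%BUNGA":  {"Tha", "Ata", "Sed"},
    "%KAWPIR": {"Kav", "Tks", "Sai"},
}
ORDER = ["Tks", "Kav", "Paz", "Sai", "Ata", "Sed", "Bun", "Tha"]


def similarity_matrix(attestations, order):
    """Count, for every language pair, the sets attested in both languages.

    Diagonal cells (a language paired with itself) count all sets attested
    in that language. The result is symmetric: (a, b) and (b, a) are equal.
    """
    matrix = {}
    for a, b in combinations_with_replacement(order, 2):
        shared = sum(1 for langs in attestations.values() if a in langs and b in langs)
        matrix[(a, b)] = matrix[(b, a)] = shared
    return matrix


def print_lower_triangle(matrix, order):
    """Print the matrix in the lower-triangular layout of Table 18."""
    print("    " + " ".join(f"{lang:>3}" for lang in order))
    for i, row in enumerate(order):
        cells = " ".join(f"{matrix[(row, col)]:>3}" for col in order[: i + 1])
        print(f"{row:<3} {cells}")


sim = similarity_matrix(ATTESTATIONS, ORDER)
print_lower_triangle(sim, ORDER)
```

Raw counts like these favor languages with many documented forms; normalizing each cell by the smaller of the two diagonal values would be one way to compare pairs of unevenly documented languages, though Table 18 reports raw counts.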
Based on distribution and historical contact patterns, we categorize these shared words into three groups: (1) cross-regional shared words, such as %BARAYAR ~ BIYALAR ‘Alocasia’; (2) words found in the Atayal diffusion area, such as %DALU ‘wild persimmon’ and %BASAR ‘Broussonetia trees’; and (3) words spreading across the northern coastal areas, such as %KAWPIR ‘sweet potato’ and %SILIH ‘chili pepper’.

Dominant languages, such as Atayal, could have exerted significant influence over neighboring languages through sociopolitical or demographic factors. Vocabulary borrowed from dominant languages often shows more regular phonological adaptation and sometimes even retains morphological features, reflecting the power dynamics between neighboring language groups. Contact through trade, on the other hand, facilitated the spread of items such as ‘sweet potato’ and ‘chili pepper’, which were likely exchanged through human mobility and economic activity. These borrowings often show a widespread distribution across linguistic boundaries but may lack deep morphological integration, as trade-induced borrowings typically involve more phonologically variable forms.

Further examination of the plant categories shows that different diffusion waves were likely associated with different periods of contact. Wild and forest plants, for example, tend to be concentrated in the mountain regions, while imported crops like sweet potato and chili pepper appear more frequently in coastal areas. This division suggests that diffusion occurred in at least two separate waves, corresponding to distinct historical contact events.

5.3 Implications for PAn Reconstruction

The identification of regional Wanderwörter is crucial for maintaining the integrity of Proto-Austronesian reconstruction.
Without recognizing the effects of early contact and borrowing, there is a significant risk of misinterpreting borrowed forms as inherited cognates, which could lead to artificial subgroupings and obscure true phylogenetic relationships among languages. In particular, the presence of shared plant-related vocabulary across otherwise distantly related languages points to historical interactions that are independent of common descent. Such shared forms often cluster geographically rather than genealogically, suggesting diffusion through trade, intermarriage, or other forms of sustained contact.

Moreover, some Wanderwörter may not simply reflect Austronesian-internal borrowing but could represent deeper substratal influences. The possibility that prehistoric Taiwan harbored non-Austronesian-speaking populations, such as potential prehistoric Negrito groups, raises the hypothesis that certain lexical items were adopted from extinct or poorly documented languages. If so, such substratal layers could account for phonological irregularities, and for concentrations of shared vocabulary in particular semantic domains (such as forest plants), that are otherwise difficult to explain within Austronesian comparative frameworks.

By identifying lexical diffusion areas, we can avoid mistaking borrowed forms for inherited vocabulary and gain a more nuanced understanding of the historical interactions and contact dynamics that shaped the development of the Formosan languages. Careful attention to irregular correspondences, semantic fields, and geographic distribution is therefore necessary. A stricter methodology that separates truly inherited forms from contact-induced vocabulary allows a cleaner, more reliable reconstruction of Proto-Austronesian and a clearer picture of Taiwan’s complex linguistic prehistory.
6 Concluding remarks

In this study, we reexamined the distribution and phonological variation of plant-related vocabulary across languages in northwestern Taiwan. Previous works have noted certain shared lexical items among Atayalic, Saisiyat, Pazeh, and other languages in northwestern Taiwan, but the mechanisms underlying their spread and variation have not been systematically explored. To gain a better understanding of these shared forms, we provided a structured overview of plant vocabulary, identifying both regular phonological correspondences and irregular deviations and categorizing them by plant type. Evidence of Wanderwörter suggests that these forms were not simply inherited but also shaped by processes of diffusion and re-borrowing.

A similar analysis of animal names or material culture terms could reveal additional corridors of contact, and expanding the research to these domains may uncover further evidence of linguistic diffusion. Moreover, research into trading history could provide a more definitive timeline for the diffusion of these plants and their associated vocabulary, grounding the historical layers of borrowing in a more concrete temporal framework.

In conclusion, the study of plant-related Wanderwörter in northwestern Formosan languages emphasizes the importance of recognizing contact histories in lexical studies. It shows that shared vocabulary can point not only to genetic inheritance but also to complex trajectories of diffusion, re-borrowing, and semantic change. Careful disentanglement of these processes enriches our understanding of Formosan linguistic prehistory and offers broader insights into the dynamics of language contact and cultural exchange in prehistoric and maritime Taiwan.

Abbreviations
Ami. Amis (Central)
Ata. Atayal
AtaMx. Mayrinax Atayal
AtaPl. Plngawan Atayal
AtaSq. Squliq Atayal
Bun. Bunun
BunTd. Takituduh Bunun
BunBh. Takibakha Bunun
BunIs. Isbukun Bunun
Kav. Kavalan
Paz. Pazeh (Auran)
Pap. Papora
Sai. Saisiyat (Ta’ai)
Sed. Seediq
SedTg. Tgdaya Seediq
SedTr. Truku Seediq
Tha. Thao
Tks. Taokas
PAn Proto-Austronesian

References
Blust, Robert. 1999a. Notes on Pazeh phonology and morphology. Oceanic Linguistics 38(2):321–365.
Blust, Robert. 1999b. Subgrouping, circularity and extinction: some issues in Austronesian comparative linguistics. In Elizabeth Zeitoun & Paul Jen-kuei Li (eds.), Selected Papers from the Eighth International Conference on Austronesian Linguistics, 31–94. Symposium Series of the Institute of Linguistics Preparatory Office, Academia Sinica, No. 1. Taipei: Academia Sinica.
Blust, Robert. 2009. The Austronesian languages. Canberra: Pacific Linguistics, Research School of Pacific and Asian Studies, Australian National University.
Blust, Robert, Stephen Trussel & Alexander D. Smith. 2023. CLDF dataset derived from Blust’s “Austronesian Comparative Dictionary” v1.2 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7741197, https://acd.clld.org/
Campbell, Lyle & Mauricio J. Mixco. 2007. A Glossary of Historical Linguistics. Glossaries in Linguistics. Edinburgh: Edinburgh University Press.
Ferrell, Raleigh. 1969. Taiwan Aboriginal Groups: Problems in Cultural and Linguistic Classification. Institute of Ethnology, Academia Sinica, Monograph No. 17.
Ferrell, Raleigh. 1970. The Pazeh-Kahabu language. Bulletin of the Department of Archaeology and Anthropology, National Taiwan University 31/32:73–97.
Goderich, Andre. 2020. Atayal phonology, reconstruction, and subgrouping. Doctoral dissertation, Graduate Institute of Linguistics, National Tsing Hua University.
Indigenous Languages Research and Development Foundation. Ongoing. 原住民族語言線上辭典 yuánzhù mínzú yǔyán xiànshàng cídiǎn [Online Dictionary of Indigenous Languages]. https://e-dictionary.ilrdf.org.tw/index.htm
Li, Paul Jen-kuei. 1978. A comparative vocabulary of Saisiyat dialects. Bulletin of the Institute of History and Philology 49:133–199. Taipei: Academia Sinica.
Li, Paul Jen-kuei. 1980. Men’s and women’s speech in Mayrinax. In Papers in Honor of Professor Lin Yu-k’eng on Her Seventieth Birthday, 9–17. Taipei: Wen Shin Publishing.
Li, Paul Jen-kuei. 1982. Male and female forms of speech in the Atayalic group. Bulletin of the Institute of History and Philology, Academia Sinica 53:265–304.
Li, Paul Jen-kuei. 1994. Some plant names in Formosan languages. In Andrew Pawley & Malcolm Ross (eds.), Austronesian Terminologies: Continuity and Change, 241–266. (Pacific Linguistics C-127.) Canberra: Pacific Linguistics.
Li, Paul Jen-kuei. 2003. The internal relationships of six Western Plains languages. In Paul Jen-kuei Li, Selected Papers on Formosan Languages, vol. 2. Taipei: Institute of Linguistics, Academia Sinica.
Li, Paul Jen-kuei & Shigeru Tsuchida. 2001. Pazih dictionary. Language and Linguistics Monograph Series No. A2. Taipei: Institute of Linguistics Preparatory Office, Academia Sinica.
Lim, Hong-sui. 2022. A reference grammar of Kaxabu, a moribund Formosan language. Doctoral dissertation, Graduate Institute of Chinese Literature, National Chi Nan University, Nantou, Taiwan.
Pan, Samuel Yu-hsiang. 2022. The historical relationships of languages in northwest Taiwan reconsidered. Historical Relationships of East and Southeast Asian Languages 2022.
Pan, Yung-li. 2015. Kaxabu dictionary. Nantou: Shoucheng Community Development Association.
Song, Walis Hian-chi. 2025. Eluw ndaan kari Seediq: The historical development of the Seediq language. M.A. thesis, Graduate Institute of Linguistics, National Tsing Hua University.
Tsuchida, Shigeru. 1982. A comparative vocabulary of Austronesian languages of sinicized ethnic groups in Taiwan, Part I: West Taiwan. Memoirs of the Faculty of Letters, University of Tokyo No. 7. 166 pp. + 1 map.
Tsuchida, Shigeru. 1985. Kulon: Yet another Austronesian language in Taiwan? Bulletin of the Institute of Ethnology 60:1–59.

THE PRECEDENCE FOR A CONSERVATIVE APICAL *R IN PROTO-MALAYIC: ANALYSIS OF EARLY MODERN MALAY'S /R/-PHONEME FROM FOREIGN SOURCES

M Natsir Fachruddin SURYATAMA
Universiteit Leiden
s4769392@vuw.leidenuniv.nl

Abstract
This paper will discuss the reconstruction of Proto-Malayic *r as a velar or uvular fricative, as proposed in Adelaar (1992b). This reconstruction was justified by evidence that Peninsular and Sumatran isolects show a predominantly dorsal distribution, in contrast to Malayic isolects spoken outside the areas where Malayic is native, such as Jakartan Indonesian or Papuan Malay, where the dorsal fricative realization is said to have been replaced through L2 acquisition. However, the paper will argue that the notion of a unanimous dorsal-fricative reflex of *r as an ancestral feature of Malayic may be challenged by Early Modern documentation of the Malay language. The materials analyzed are the orthographic conventions of three wordlists dating from around the fifteenth century to the beginning of the seventeenth century. Two of the documents are traced to European sources and one to a Chinese source.
Although the three documents are of widely divergent origins, they all point to an apical realization of the rhotic in Early Modern Malay. This challenges the previous view that the apical realization is confined to koiné or contact-influenced isolects. It also challenges the view that the dorsal realization is a conservative, rather than an innovative, feature of the Malayic subgroup. In addition, this paper discusses a parallel case of a better-established apical-to-dorsal rhotic shift, namely the European “guttural r”, and how such rhotics have been perceived in borrowings through creolization and contact.
Keywords: Malay, Early Modern Malay, Historical Linguistics, Rhotic, Malayic
ISO 639-3 codes: zlm
Introduction
In his thesis, Adelaar (1992b) reconstructed Proto-Malayic morphology and its phoneme inventory. Alongside the phonemic reconstructions, the thesis also makes claims about the phonetic realization of each Proto-Malayic phoneme. This paper focuses only on the thesis’ claims regarding the liquid protophoneme *r (henceforth, the rhotic). In Adelaar’s work, *r is reconstructed as dorsal1 because it is reflected as a dorsal (cf. Table 1) in most Malayic isolects of the Malay Peninsula, Borneo, and Sumatra. The reconstruction of the Proto-Malayic liquid as a velar *r was first proposed by Collins (1986) and later expanded upon by Adelaar (1992b). Collins’ work stipulated a dorsal Proto-Malayic rhotic partly because most “Core” areas of Malayic speakers have a dorsal reflex. The “secondary” Malay isolects, by contrast, have a primarily apical reflex (also cf. Table 1).
Adelaar (1992b) further argued against reconstructing Proto-Malayic *r as an apical trill. He observed that most cases of apical realization are due to contact with Standard Malay and Indonesian, which generally have an apical rhotic. Contact varieties also uniformly reflect the rhotic as an apical trill (cf. Baba Malay (Lee 2022), Papua Malay (Kluge 2014), and Colloquial Jakartan Indonesian (Sneddon 2006)). The dorsal reflex of *r is a general trend in the Core Malay areas, but it appears to be much more extensive in the Malay Peninsula and Sumatra than in Borneo. In the Malay isolects of Banjar Hulu (Durdje & Durasid 1978) and Salako (Adelaar 2005), *r is reflected as an apical trill. On the other hand, Collins’ (1997) dialectological survey of West Borneo also found isolects with a dorsal pronunciation, among them Pantai Mas, which realizes /r/ as a uvular trill. Even in Core Malay isolects, the apical realization has been observed to spread to isolects in more urbanized areas (cf. Anderbeck 2008 on the split between [r] and [ɣ] in Jambi Malay varieties). This contact-induced shift is generally attributed to increasing contact with Standard Indonesian and Javanese in these urban and port areas. In some cases, the earlier dorsal rhotic also forms doublets. McDowell & Anderbeck (2020) discuss such a case in Ogan, where the inherited reflex of *buru ‘hunt’ is buχu ‘chase away’, versus the loan buru ‘chase’. Gil (2024) observes a similar phenomenon with recent loans in other Malay varieties. In Siak Malay and Kaur, loans from Standard Indonesian are rendered with the apical trill, while the native reflexes of *r are [ɣ] and [ʕ] respectively.
On the basis of the literature on the diachronic and synchronic realizations of the Malay rhotic, it might seem safe to assume two things. The first is that Malayic *r was originally realized as a dorsal. The second is that all coronal realizations can be attributed to L2 acquisition or standardization. However, three documents dating from before the sixteenth century to the onset of the seventeenth century seem to contradict the earlier reconstructions:
- Antonio Pigafetta’s Vocaboli de Questi Popoli Mori (henceforth PG; dated 1521);
- Yang Lin (楊林)’s Mǎnlàjiā guó yìyǔ (滿剌加國譯語) (henceforth YL; dated pre-1511);
- Frederick de Houtman’s Spraeck ende Woord-boeck (henceforth dH; dated 1603).
Taking these sources into account, I will argue that the shift from an apical to a dorsal realization is fairly recent and can be traced through written records.
Earlier Work on Malayic Liquids
This work responds directly to the current consensus in Malayic diachrony that the rhotic *r was a velar fricative. This consensus stems from the aforementioned works of Collins (1986) and Adelaar (1992b), who both postulate that Proto-Malayic *r was realized as a dorsal, either *[ɣ] or *[ʁ]. In more recent work, Anderbeck (2024) further supports this reconstruction. He argues that the sound change *r > y is more probable from a dorsal fricative than from an apical flap or trill.
The Nineteenth Century Descriptions of Malay Phonemes
These recent works are not, however, the first to describe guttural /r/ sounds in Malay(ic) languages. Works of the nineteenth century had already noticed such pronunciations of the Malay /r/ phoneme. Adelaar (1992b) mentions Fokker’s (1895) work on Malay phonetics, which further remarks that the guttural is quite widespread.
He noted the guttural r not only in Borneo, where it is synchronically uncommon, but also in Batavia (now Jakarta) and Singapore. Adelaar takes note of this observation in his reconstruction of Proto-Malayic *r. Several nineteenth-century works corroborate Fokker’s remarks on the Malay guttural /r/. Hugh Clifford’s (1852) Dictionary of the Malay Language describes the “r” sound as having a guttural quality. In his words, it is comparable to the German pronunciation of the word Berg. He also likened it to the Northumbrian Burr2, a uvular pronunciation of English /r/ first noted in the literature in Daniel Defoe’s A Tour Thro’ the Whole Island of Great Britain (1724–1727). The dictionary also records dialectal differences that give early insight into the split of the /r/ phoneme in Malay. While describing /r/ primarily as a guttural in Perak and Terengganu3, he also remarks that the “tendency to roll the letter r is also observable among the Malays of Sumatra”. This statement does not, however, capture the complexity of the reflexes of *r on the island. Anderbeck’s (2008) dialectological surveys show a wide range of reflexes within Jambi and South Sumatra alone. Fokker and Clifford compare the sound explicitly to the Northumbrian Burr. Swettenham’s Vocabulary of the Malay Language, in contrast, describes the letter r more vaguely, stating that it is “a peculiarity of the people that they lay much stress on the r on the pronunciation.” This description is too vague and broad for any meaningful phonetic inference. The book does note, however, that in Kedah the chiefs speak with what he described as a “curious lisp”, in which /s r l/ are not pronounced.
This matches the current data on Malay spoken in Kedah (cf. Mohd Tarmizi Hasrah & Mohammad Khirulanwar Abdul Ghani 2021). The Baling subdialect of Kedah underwent the changes final *r, *l > ∅4 and final *s > h. Another English scholar, Marsden, whose 1812 work predates most of the English scholarship, gave a pronunciation guide for Malay in A Grammar of the Malayan Language. In stark contrast to the later works, he describes r without any reference to a guttural pronunciation. It is instead the letter gh (represented in Jāwī as غ) that he likens to the Northumbrian pronunciation. Dutch scholars working on Malay give a more ambiguous description, allowing for the state of phonetic description in the nineteenth century. Van Wijk (1890), in Spraakleer der Maleische taal, describes r as “softly spoken in some regions” after the vowel ê (= /ə/), so that pərnah is realized as pənah. He notes that this is most noticeable in the prefixes pər-, bər-, tər-, and the like. This description of the elision of /r/ in these prefixes is not unprecedented. Deterding et al. (2022) show that the elision of postvocalic /r/ in these prefixes is common in many Malay isolects.
The Variability of Malayic *r Reflexes
This section discusses the attested reflexes of Proto-Malayic *r, with regard to differences between isolects and between subgroups. The paper has so far referred to this concept as a single phoneme, but the term reflexes of a protophoneme may be more appropriate. The term “diaphoneme” might also seem apt. However, it would imply that the reflexes of *r form a single perceptual category across speakers. McDonnell, quoted in Gil’s (2024) case studies of borrowing between Standard Indonesian and Kaur, illustrates that the reflexes are perceptually distinct phonemes in certain isolects.
Consider recent loans from Standard Indonesian such as partai “party” (ultimately from Dutch partij) versus native lexemes such as ʕuma “house” from Proto-Malayic *rumah. Another case can be noted in Ogan (McDowell & Anderbeck 2020), where *buru “to hunt” has produced a lexical doublet. As the same authors note, the native reflex of *r is [χ], and the native form buχu is glossed as “chase away”. There is thus precedent that, in some cases, the apical and dorsal realizations are perceived as distinct and do not form a single neat category. Like the rhotics of Europe, the Malayic rhotic varies considerably from one variety to another. Such variability between apical and dorsal realizations is not unprecedented outside Europe (cf. Magnuson (2008) on variability in Japanese varieties). The situation in Malayic, however, most closely resembles the rhotic variation found in European languages, where varieties split between alveolar and guttural pronunciations. This split recalls the languages of Western and Central Europe with highly variable rhotic realizations, such as French, Dutch (Sebregts 2014), Portuguese (Rennicke 2013), and German (Wiese 1996). In each of these European languages, the rhotic has diachronically shifted in a single direction: from an apical trill (or tap) towards reduced, dorsal places of articulation.5 The system of Malayic *r reflexes (see Table 1) differs notably from these European languages in one respect. Malayic has no approximant6 realization. The tap realizations can be explained by the crosslinguistic tendency of trills to surface as taps (Lindau 1985), which is observed in many isolects with an apical trill (cf. Soderberg & Olson (2008) for Indonesian data). The absence of approximants is more interesting, because parallel implications can be drawn from this phonological gap.
Table 1: Selected Reflexes of Malayic *r
Realization | Reflex | Isolects within Malayic
Apical | r | Jakarta Indonesian (Sneddon 2006); Papua Malay (Kluge 2014); Baba Malay (Lee 2022); Salako (Adelaar 1992a & 2005); Dusun Teluk (Jambi Malay dialect) (Anderbeck 2008); Ketapang Malay (Sulissusiawan et al. 1998); Standard Indonesian/Malay, etc.
Dorsal | ɣ | Kelantan and Terengganu Malay (Jiang Wu 2023); Mualang (Tjia 2007)
Dorsal | ʕ | Kaur (McDonnell, p.c., quoted in Gil 2024)
Dorsal | ʁ | Sekujam (Collins & Herpanus 2018); most Jambi Malay varieties (Anderbeck 2008)
Dorsal | x | Serawai (Helfrich 1904, quoted in Adelaar 1992b)
Dorsal | χ | Ogan (South Barisan Malay isolect) (McDowell & Anderbeck 2020)
Dorsal | h | Seling (Jambi Malay isolect) (Anderbeck 2008)
This, however, mirrors the situation in the European languages with highly variable /r/ sounds. According to Sebregts (2014) and Rennicke (2013), the developments *r > [ɹ] and *r > [ʁ] result from different sets of phonological reductions. Sebregts (2014) notes that Dutch [ɹ] results from the reduction of an apical trill to a tap [ɾ] or a fricated [r̝]. The same applies to its subsequent approximant reductions, such as [j] and [ɻ]. In contrast, Sebregts (2014) considers the change from an apical trill to a uvular trill to have arisen from the retention of a feature of child language acquisition. As Vihman (2014) observes, [ʀ] occurs in the babbling stage, whereas the apical trill [r] is absent. Nor is this retention of child language in rhotics a sui generis case. Compare Knight et al. (2007) and King & Ferragne (2020), where Southern British English /r/ may be realized as the “infantilism” [ʋ].
The Documents
This section introduces each of the sources discussed in the paper.
As a preface to the following subsections, matters of romanization need to be addressed. First, I have opted to use a toneless form of Standard (Modern) Mandarin Pinyin for ease of use. The tones of the characters used in Mǎnlàjiā guó yìyǔ do not seem to make much, if any, difference to the transcription. In the transcriptions discussed in 5.1, the document uses Chinese characters to represent Malay words phonetically, but the tone of each syllable appears to be irrelevant. The phonological differences between Early Modern Mandarin and Standard Mandarin are also minimal. A transcription in the more accessible Standard Mandarin is therefore still suitable for discussing phonological differences. The most notable difference is that Early Modern Mandarin had not yet undergone palatalization of /s z k kʰ/ before /i/. This is perhaps best illustrated by the transcriptions of the Malay words sisir ‘comb’ as xi xi er 西昔兒 and kilat ‘lightning’ as ji lan 肌藍.
Mǎnlàjiā guó yìyǔ
Provenance
The Mǎnlàjiā guó yìyǔ (滿剌加國譯語) (henceforth YL) is a wordlist of four hundred and eighty-two Malay words rendered phonetically in Hanzi. Although most Chinese settlers in Nusantara are of Southern Min origin, the document itself is written in Mandarin. It is the oldest attested foreign source on a Malay isolect. Edwards & Blagden (1931) date this document to the fifteenth to early sixteenth century, prior to the fall of Malacca to the Portuguese in 1511. They base this estimate on the first attestations of the Ming Dynasty’s envoys of Malacca (1403)7. As Edwards & Blagden (1931) note, this manuscript is itself a revision of an earlier text. Its appendix states that it was compiled by the interpreter Yang Lin 楊林, working for the Sìyìguǎn 四譯館 (College of All Foreigners), in the twenty-eighth year of the Jiajing Emperor (1549 A.D.).
This Malay wordlist is only one of many in the volumes compiled by the Sìyìguǎn. As Hoogervorst (2021)8 notes, other languages studied by the Sìyìguǎn include Tibetan, Sanskrit, Persian, Shan, Uyghur, and Burmese. The later date is notable because the YL wordlist does not contain any loanwords of European origin. In contrast, Malay loanwords from pre-European strata (Arabic, Persian, Sanskrit, and Sinitic) are attested in the wordlist. Given this provenance, it can be safely concluded that YL documents a peninsular isolect spoken around Malacca. However, it is open to debate whether this isolect is a Traditional or a Koiné variety of Malay, and the wordlist alone cannot settle the question conclusively. This unresolved detail may be crucial if we follow Adelaar’s (1992b) analysis of the apical [r] realization as the result of L2 acquisition. That analysis draws on the uniform apical [r] of Betawi (Ikranagara 1975), Papua Malay (Kluge 2014), and Baba Malay (Lee 2022), among many others. As for the document itself, the wordlist consists of 482 Malay lemmas, divided into the seventeen sections shown in Table 2.
Table 2: Chapters of the YL Wordlist
Part | Title | Translation
I | 天文 | Astronomy
II | 地理 | Geography
III | 時令 | Time
IV | 花木 | Flowers and Trees
V | 鳥獸 | Birds and Beasts
VI | 宮室 | Houses
VII | 器用 | Implements, etc.
VIII | 人物 | Persons
IX | 人事 | Human Affairs
X | 身體 | Body
XI | 衣服 | Clothes
XII | 飮食 | Food and Drink
XIII | 珍寶 | Jewels
XIV | 文史 | Literature and History
XV | 彩色 | Colours
XVI | 數目 | Numerals
XVII | 通用 | Current Words
The rhotics
This section discusses the distribution of the rhotics in YL.
Unlike the documents of European origin, YL uses Mandarin as its medium of phonetic transcription, which poses a problem for how /r/ is rendered. Chinese lacks a distinction between the lateral /l/ and a (tap-like) rhotic /r/9 in initial position. Most lexemes with /r/ in Modern Malay are therefore transcribed with /l/ instead, with a few exceptions discussed later in this subsection. YL contains the least phonetic information of the three early documents discussed in the paper. Even so, it has entries with the phoneme /r/ in all word positions:
- initially, as in lu sa = rusa ‘deer’;
- medially, as in bu la si = bêras ‘husked rice’;
- finally, as in su mu er = sumur ‘well’.
In initial and medial position, Malay /r/ is rendered with characters that correspond to the lateral /l/ in Modern Mandarin, as shown in Table 3. All lexemes with word-initial /r/ but one are rendered with initial /l/; the exception is discussed below.
Table 3: /r/ in Initial Position in YL
Gloss | Standard Malay | YL | Pinyin | IPA
‘deer’ | rusa | 鹿撒 | lu sa | /lusa/
‘rattan’ | rotan | 澇丹 | lao dan | /laotan/
‘king’ | raja | 剌札 | la zha | /lat͡ʂa/
Most of the transcriptions are uniform, with word-initial /r/ rendered as an /l/ onset, but one exception should be noted. The transcription in entry 465 is surprising and further muddies the data. The entry glossed ‘low’ is rendered as en da for rəndah and lacks the expected /l/ for initial r-. Edwards & Blagden (1931) also note this curiosity. It is impossible to discern whether it reflects phonology or is merely a transcription error. The occurrence is unique among the YL entries, and the document contains other transcription errors. In word-medial position, a similar trend can be observed: the phoneme /r/ of Contemporary Malay is rendered with Mandarin /l/.
There are notable exceptions to this. Word-initial /r/ is transcribed as /l/ almost without exception. In some YL entries, however, /r/ is transcribed as /n/ (as shown in Table 6). The loss of /l/ in entry 465 occurs only once. This substitution of /n/ for the more straightforward /l/, by contrast, occurs in multiple entries, which suggests that it is not merely a transcription error. It is also notable that these phonemes are evidently coronal rather than dorsal. Again, this strongly suggests that /r/ in fifteenth- to early sixteenth-century Malacca was realized as an apico-alveolar trill rather than a dorsal fricative.
Table 4: /r/ in Medial Position in YL
Gloss | Standard Malay | YL | Pinyin | IPA
‘thunder’ | guruh | 孤路 | gu lu | /kulu/
‘bright’ | terang | 得浪 | de lang | /telaŋ/
‘day’ | hari | 哈利 | ha li | /hali/
‘north’ | utara | 烏答剌 | wu da la | /wutala/
‘tiger’ | harimau | 亞利毛 | ya li mao | /jalimao/
‘run’ | lari | 剌利 | la li | /lali/
‘silver’ | perak | 必剌 | bi la | /pila/
Table 5: /r/ in Final Position in YL
Gloss | Standard Malay | YL | Pinyin | IPA
‘well’ | sumur | 末的亞納 | su mu er | /sumuɚ/
‘comb’ | sisir | 必答納 | xi xi er | /ɕiɕiɚ/
‘water’ | air | 卜記那 | ya i er | /jaiɚ/
Two entries show another notable transcription choice, rendering /r/ in medial position as /t/1. These are entry 321 di di for (ber)diri23 ‘stand’, and entry 380 zhe di, which Edwards & Blagden (1931) identified as jari ‘finger’. In sum, /l/ is the usual transcription in medial position. Even so, other ‘aberrant’ transcriptions also point towards a coronal realization through their shared place of articulation. The liquid /l/ has already been discussed in word-initial position. The other phonemes, /n/ and /t/, resemble an apical /r/ in place of articulation, and /n/ also resembles it in sonority. Lastly, we consider the rendition of /r/ in final position.
Most entries use /l/ in initial and medial position. Word-final /r/, by contrast, is often transcribed using the rhotacized vowel /ɚ/, as in ya i er for air ‘water’. Table 5 lists final /r/ rendered as rhotacized /ɚ/.
A Brief Aside on Lexical Analysis
Most of the lexical material in YL is pan-Malay, that is, found in most Malay isolects, but several entries in YL are distinctive. Comparing the lexical material of YL is difficult given how little material is provided. The pronouns, however, warrant discussion in future studies. YL contains two personal pronouns, bi ta (= beta) ‘1SG’ and duan nan ba (= tuan hamba) ‘2SG’. The first-person pronoun beta is relevant to our current interest. It is present in Classical Malay texts (Adelaar 1992b) but is little used in many Standard and ‘Core’ varieties of Malay today. Synchronically, this pronoun survives in the spoken language and is largely characteristic of the Eastern Indonesian isolects of Kupang, Makassar, and Ambon (Paauw 2009). These pronouns are not, however, found in the contact varieties of Malacca Malay, such as Chetty Malay (Mohamed 2006) or Baba Malay (Lee 2022). The eastern contact varieties use beta as the first-person pronoun. Chetty and Baba Malay, in contrast, use the pronoun gua (< Hokkien góa). This trait is shared with many other isolects, predominantly in areas of strong Chinese influence, and is perhaps best known from Colloquial Jakarta Indonesian (Sneddon 2006). YL's pronominal system is thus closer to a more traditional variety of Malay than to a contact variety. Does this conclusively prove that the isolect recorded in YL is a traditional variety? Again, not much can be said, given the limited data and the limited textual history of the contact varieties of Malay.
Conclusions on YL
In YL, some syllables with the same phonemes are rendered inconsistently, such as the aforementioned use of , bǔ bǔ, and bù for the sequences /bu/ and /pu/. As previously mentioned, this inconsistency led Edwards & Blagden (1931) to propose that the text was compiled by multiple authors. If their analysis is correct, several people in the fifteenth to early sixteenth centuries associated the Malacca Malay /r/ sound with an alveolar lateral. Word-finally, they associated it with an alveolar approximant. Edwards & Blagden (1931) caution that the limited syllable inventory of Mandarin prevents much phonetic detail from being extracted from YL. It is fair to assume that not much can be inferred from the phonetic transcriptions. Nevertheless, YL at least shows that an apical pronunciation of /r/ is attested in the Core area of Malacca in the fifteenth century. The patterns of transliteration into Chinese characters thus follow those expected of an apical trill rather than a dorsal fricative. YL renders /r/ with the Modern Mandarin alveolar /l/, and it uses erhua (i.e., rhotacization of final sounds) for postvocalic /r/. These transcriptions also follow the trends found in Chinese transliterations of apical /r/ in Manchu, which, like Malay and unlike Chinese, contrasts /r/ and /l/. The /r/ phoneme in YL can therefore safely be inferred to be apical rather than dorsal, despite the sparse phonetic detail evident in the text itself.
Vocaboli de Questi Popoli Mori
Vocaboli de Questi Popoli Mori (hereafter PG) is the earliest known Western source on the Malay language and, by extension, on any Malayic isolect.
The document was composed by Antonio Pigafetta of Vicenza, a Venetian explorer on Ferdinand Magellan’s expedition around the world. Pigafetta’s Malay wordlist is only one of several wordlists he compiled during the expedition. The others include early wordlists of the languages of the Philippines and the Americas, such as ‘Patagonian’ (identified as Tehuelche; cf. Fernández Rodríguez & Regúnaga 2020). The wordlist is similar in composition to YL, with a modest 426 entries of Malay words. The glosses are in Italian, or more precisely in Venetian with Italianized and Hispanicized characteristics (cf. Bausani 1960).
Provenance
According to Bausani (1960), Pigafetta wrote the wordlist around 1521, when the Magellan expedition sojourned in the Moluccas, part of what is now conceived of as the ‘Nusantara Archipelago’. The isolectal profile of PG has been debated in the past, as summarized by Bausani (1960). Many early interpretations of the wordlist, such as Schuchardt (1890), treat it as corresponding perfectly to a variety of ‘Moluccan Malay’. Later authors argued that this was not the case. Le Roux (1929) also proposed that the document had a single origin, namely that it was composed in Tidore during Pigafetta’s stay there. Blagden (1931) seems to have discredited this single-location hypothesis. He found a mixture of entries of Bruneian and Philippine origin, which suggests that the words in the wordlist have scattered origins. This is most apparent in the entry tubig, glossed as hacqua ‘water’, a chiefly Central Philippine lexeme with no reflexes outside the Central Philippine branches of Malayo-Polynesian.
Blagden (1931) also notes that the lexical items of Malay origin share more similarities with those of the Bruneian isolects than with those of the Moluccas. Table 6 gives examples, with Brunei forms taken from Awang Haji Muhammad bin Hj Jambul & Awang Alipuddin bin Haji Omarkandi (1997).
Table 6: Bruneian Words in PG
Gloss | Pigafetta | Standard Malay | Brunei
‘Eyebrow’ | quilai | alis | kiray
‘Chin’ | aghai | janggut | ajay
‘Neck’ | tun dun | leher | tundun
‘Dog’ | cuiu | anjing | kuyuk
‘Salt’ | sira | garam | sira
Unlike the isolects behind the two other documents discussed in this paper, Brunei Malay is a uniformly apical lect (Clynes 2001). At first, this would make the presence of ⟨r⟩ or ⟨l⟩ in these documents unremarkable. However, the neighboring lects of Kampong Ayer and Kadayan show an interesting sound change. According to Anderbeck (2024), this change could only have happened through a stage of *ɣ. In Kampong Ayer, a sub-isolect of Brunei Malay, *r is uniformly reflected as a palatal glide. Poedjosoedarmo (1992) postulates that these two isolects are closely related and form a distinct subgroup, with lexical comparison showing 95% cognacy between the two varieties. He further argued that the Kampong Ayer dialect is more archaic than Brunei Malay. In his analysis, Brunei Malay diverged from the Kampong Ayer–Brunei Malay subgroup only after the founding of Brunei Town. Despite this focus on Bruneian etymology, Clynes (2001) notes that some of the entries are not of Brunei origin, for example igao ‘green’ versus Brunei Malay gadung. The analysis below should therefore be treated with caution. It should not be taken as representative of a specific isolect, except for lexemes that can be clearly identified as such.
The purpose of the following section is thus to show that, regardless of position in the word, the /r/ in Pigafetta’s documents can be narrowed down to an apical rather than a dorsal realization.
The Document
As mentioned in the previous subsection, the document is a wordlist of Malay words with Italian glosses. Unlike YL, however, PG as reproduced in Bausani (1960) seems to be only partially divided into sections, if at all. Some entries are clustered thematically, as in entries 89 to 97, which are exclusively animal names. Adjacent entries may also list unconnected things, such as entries 233 and 234, ‘meat’ (carne) and ‘snail’ (corniolo) respectively. The wordlist consists largely of individual entries. Two distinct sections in Bausani’s reproduction, however, are labeled ‘the winds’ (Li Venti) and ‘numbers’ (Numero), in contrast to the loosely organized entries that precede them. In total, PG records 426 entries of words or multi-word phrases. As a wordlist, PG (like YL) displays little morphology. Some entries do show verbal morphology, such as the actor voice in magnurat - menyurat ‘to write’ (entry 271) or the stative in babini - bərbini ‘to be married’.
The Rhotics
The rhotics in PG are usually rendered as either ⟨r⟩ or ⟨l⟩. The latter grapheme is the more significant of the two for establishing the phonetic value of /r/. The many articulatory similarities that /l/ shares with /r/ would be absent if the rhotic recorded in PG’s list were a dorsal fricative. Because the entries are of mixed provenance, the analysis of /r/ should be treated with caution.
The divide between a Bruneian and a non-Bruneian word is sometimes clear. Many lexical items, however, are shared by Bruneian and Standard Malay or by other Malay isolects. Where no distinctive isogloss is available, these entries should not be taken to represent a specific isolect of Malay. In word-initial position, /r/ is most consistently rendered as ⟨r⟩, though ⟨l⟩ is also found, as in lambut - rambut ‘head hair’ (see Table 7).
Table 7: /r/ in Initial Position in PG
Gloss | Pigafetta | Malay
‘Deer’ | roza | rusa
‘House’ | ruma | rumah
‘King’ | raia | raja
‘(Head) hair’ | lambut | rambut
The same holds in word-medial position, where both ⟨r⟩ and ⟨l⟩ renderings are attested. The presence of /r/ can also be shown in word-medial clusters, as in cartas - kərtas (< Arabic qarṭās) ‘paper’. However, in words with the Standard Malay stative prefix bər-, the /r/ seems to have been dropped, as in babini - bərbini ‘married’ and belaiar - bərlayar ‘to sail’. Compare Clynes (2001) on the Brunei Malay prefixes ba- and ma-, which are cognate with Standard Malay bər- and mər-. This lends credibility to Pigafetta’s phonetic transcriptions and supports a Brunei Malay provenance for many entries, as opposed to a provenance in other isolects.
Table 8: /r/ in Medial Position in PG
Gloss | Pigafetta | Malay
‘Eyebrow’ | quilai | kiray
‘Salt’ | sira | sira
‘Galley’ | gurap | gurab
‘Blood’ | dala | darah
‘Day’ | alli | hari
‘Paper’ | cartas | kərtas
In contrast to YL, PG renders word-final /r/ consistently, with the caveat of the aforementioned alternation between ⟨r⟩ and ⟨l⟩. Table 9 gives some examples of /r/ in final position.
Table 9: /r/ in Final Position in PG
Gloss | Pigafetta | Malay
‘Truth’ | benar | bənar
‘Neck’ | laher | léhér
‘Egg’ | talor | təlur
‘Comb’ | sissir | sisir
‘Thunder’ | gunthur | guntur
‘Merchant’ | saudagar | saudagar
In final position, /r/ is consistently written with the grapheme ⟨r⟩, in line with the reflexes of Standard Malay final consonants.
Thus, the reflex of *r in the isolect(s) recorded in Pigafetta’s wordlist is rendered with either the grapheme <r> or the grapheme <l>. However, few words that are clearly identifiable as Bruneian contain a word-initial or word-final /r/. This makes it difficult to ascertain whether any of the lexemes in these positions reflect a Bruneian provenance. Word-medial position differs, since it includes the distinctly Bruneian quilai and sira.

Conclusion and Discussion

As shown above, Pigafetta’s rendering of /r/ as either <r> or <l> points towards an apical realization in the isolect(s) he recorded. This interchangeability of graphemes is present even in clearly identifiable Bruneian words (quilai versus sira), which suggests that this apical pronunciation was in use around the sixteenth century. Again, this is not surprising, given that the synchronic reflex of Bruneian *r is an apical trill (setting aside Kampong Ayer). However, this raises a question. If the Kampong Ayer variety is truly ‘ancestral’, or more conservative than the other Brunei Malay isolects, why does Pigafetta record a (probable) apical trill /r/ and never the palatal glide /y/? Poedjosoedarmo (1992) proposes that the Kampong Ayer and ‘Brunei Town’ dialects split after the founding of Brunei Town. Adelaar (1999), in his review of the volume containing Poedjosoedarmo (1996) on Malay linguistics in Brunei, raises a major criticism: treating the Kampong Ayer variety as conservative means positing that the direct ancestor of the two dialects had the proto-phoneme */y/ for *r, as seen in synchronic Kampong Ayer. However, the material collected by Pigafetta shows a notable lack of a palatal glide reflex for the rhotic, especially in the entries that can be traced to a Bruneian provenance.
In fact, the graphemic correspondences suggest that it was an apical trill, as in modern-day Brunei Malay. These hypotheses are not mutually exclusive. One can still propose that the two varieties developed from an Early Modern Malay variety with an apical trill. That trill would then have shifted to a palatal glide in Kampong Ayer after the two varieties split. This is illustrated in Figure 1.

Figure 1: A Proposal for Proto-Bruneian Malay
Proto-Malayic *r [r] → 16th-century Brunei Malay *r [r] → Modern Day Bruneian: /y/ (Kampong Ayer), /r/ (Brunei Malay)

The proposal above thus follows the framework of Poedjosoedarmo (1996) in classifying the two isolects of Brunei Malay as a single grouping, whose two varieties split after the founding of historical Brunei Town. Pigafetta’s documents predate the supposed dialectal split, so an apical trill realization should be considered for the common ancestor of both Brunei Malay and Kampong Ayer. This also implies a shift of *r > y in the Kampong Ayer dialect (or > ∅ in some cases, as noted by Clynes 2001).

Lastly, this shift of apical *r > y has implications for Hoogervorst’s (2024) argument for a velar fricative *r at the Malayic stage. He posits that a shift from a velar fricative to a palatal glide would be a natural one, as opposed to a shift from an apical trill or flap. However, the case of Kampong Ayer and Brunei Malay points the other way. The variety ancestral to both most likely had an apical trill, and there is no sign of a velar or uvular fricative. Nor is a change from an apical trill to a palatal glide sui generis. Typological parallels are attested in some North American languages, cf. Proto-Mayan *r > y in the Greater Q’anjob’alan languages (Justeson et al. 1985) and Proto-Algonquian *r > y in Plains Cree (Goddard 1994). In sum, Pigafetta’s wordlist suggests that at some point in Malay history apical *r shifted to /y/. This may have happened more than once, as the Malayic Dayak cases discussed in Anderbeck (2024) show.

Spraeck ende Woord-boek, inde Maleysche ende Madagaskarsche Talen

Unlike YL and PG, Spraeck ende Woord-boek, inde Maleysche ende Madagaskarsche Talen (dH) has a clear provenance. It is therefore not surprising that scholarship on dH is more extensive than on the other two texts presented in this paper. This scholarship includes textual analyses by Drewes (1958) and Drewes & Voorhoeve (1958), which discuss Acehnese etymons. More recent work includes Hoogervorst (2024) on the potential of Early Modern sources for etymology and diachrony within the Malay(ic) subgroup. Beyond its Malay wordlist and dialogues, the text has also been used in Malagasy diachronic scholarship (cf. Adelaar 2024 on the Malagasy dialogues in dH as a source for identifying loanwords from Malay and Javanese).

Provenance

Among the three documents analyzed in this paper, dH has the clearest provenance, as its authorship and origins are known. According to Hoogervorst (2024), it was compiled while the Dutch navigator Frederik de Houtman was held captive in Aceh. Besides the twelve Malay-Dutch dialogues, the document also contains a Malay wordlist with more extensive linguistic information than the previous two documents. The Acehnese influence on the documented isolect has been discussed before (see Drewes 1958, Drewes & Voorhoeve 1958, and Lombard & Tucci 1970). Hoogervorst (2024) also notes that many of the words show North Sumatran1 isolectal influence.
Table 10, as reproduced from Hoogervorst (2024), demonstrates the similarity of dH’s words to North Sumatran Malay isolects.

Table 10: North Sumatran Malay Words in dH
Gloss          North Sumatran Malay   dH
‘Net’          bubol                  muboel
‘Garden’       əmpus                  empus
‘Machete’      gədubang               gedoubang
‘To lie down’  guʀin                  goering
‘To see’       kəleh, kəlih           kelih, kelich
‘Nephew’       kəmun                  kemon, comon
‘Closed up’    ləkap                  lekap
‘To throw’     lutar                  loutar
‘Barn’         mandah                 mandah
‘What’         maya                   maya
‘Faint’        pangsan                pangsawan
‘Empty’        soh                    soch
‘Earlier’      tain                   tain, táin
‘Fat’          təmbun                 tombon

De Houtman’s Malay dialogues contain Acehnese and North Sumatran Malay vocabulary. Beyond these, the isolect as written also shows influence from other languages of northern Sumatra. Perhaps the most apparent is Karo Batak, as seen in the Karo loan sentabi ‘sorry, excuse me’ in the dialogues.

The Document

Unlike the previous documents, which are first and foremost wordlists, dH is much more comprehensively organized. The document includes twelve Malay-Dutch dialogues, all set in Aceh (Lombard & Tucci 1970). Of particular interest to this paper, it also provides a Dutch-Malay-Malagasy wordlist, which gives direct Dutch translations of Early Modern Malay lexical material. The wordlist provides linear word-for-word translations (e.g., groot - besar), as discussed in subsection 11. The variability of attestations in the dialogues, by contrast, gives at least a slight hint about phonetics.

The Rhotics

In dH, the majority of rhotics are rendered as <r>, with little variation. In word-initial position, the rhotic is consistently rendered as <r>, as shown in Table 11.
Table 11: /r/ in Initial Position in dH
Gloss      dH             Malay
‘King’     rayja, raija   raja
‘Bread’    rotty          roti
‘House’    roema          rumah
‘Boiled’   reboes         rəbus

Word-medially, the rhotic is also regularly written as <r>, as given in Table 12.

Table 12: /r/ in Medial, Non-Cluster Position in dH
Gloss             dH          Malay
‘Thing’           barang      barang
‘How much’        barappa     bərapa
‘Price’           harga       harga
‘Give’            bri         bəri
‘Husked rice’     bras        bəras
‘Woman, female’   parampouan  pərəmpuan
‘Tiger’           harrimou    harimau

The rhotic in dH may also appear in medial clusters, as shown in Table 13.

Table 13: /r/ in Medial Cluster Position in dH
Gloss           dH          Malay
‘Dirham’        derham      dirham
‘Prosper’       dergahayo   dirgahayu
‘Price’         harga       harga
‘Treasure’      arta/harta  harta
‘Buffalo’       karbou      kərbau
‘Pigeon, dove’  merpaty     mərpati
‘Fly’           terbang     tərbang

In word-final coda position, the rendering of the rhotic shows an interesting pattern. Table 14 shows that /r/ may be rendered as <r>. In some cases, however, it is not written at all, as discussed in the following subsection.

Table 14: /r/ in Word-Final Position in dH
Gloss         dH              Malay
‘Hear’        deng’ar         dəngar
‘To sail’     balayer         bərlayar
‘(A) port’    (se)bandaer     səbandar
‘Large’       besar/besa[!]   bəsar
‘Go outside’  calouwar        kəluar

A Brief Note on ‘Big’ and ‘True’

The word for ‘big’ is also attested in dH as besa (cf. Modern Malay bêsar). The lack of <r> may thus point towards a sporadic change of final *r > ʔ (cf. Anderbeck 2008 on Jambi Malay).10 A similar rendering is found for the word ‘true’, which is variably written bena or benar. Uri Tadmor (p.c.) takes this change as evidence that [ɣ] is conservative. However, the same ‘sporadic’ loss is also observed with the other liquid, /l/ (cf. Anderbeck 2008 on ‘unstable *l’ in Jambi Malay).
Thus, this change should be seen as an analogical one, following the pattern of the other liquid consonant, rather than as a regular sound change.11

Conclusions

Some minor epiphenomenal features hint at phonological and phonetic changes in the rhotic, but the evidence from dH is slight. The loss of final rhotics in besa/bena could be attributed to debuccalization of /r/ in these lexical items. Transcriptional variation that reflects no phonetic difference should not be disregarded either.12 Additionally, the elision of /r/ in the prefixes ber-, ter-, and per- may be pertinent to the analysis of the phonetic details of dH’s rhotics. However, the same elision is also observed in isolects with /ɣ/, so it does not diagnose a particular realization.13 14 At present, then, little can be concluded. It remains open whether dH’s transcription of Malay rhotics can be pinned down to a specific pronunciation, or whether analysis must rest on slight epiphenomenal features.

Conclusion & Discussion

In sum, this paper proposes that the shift of the rhotic to a dorsal realization dates to between the sixteenth and the nineteenth century, when guttural rhotics were first attested. This proposal is consistent with the phonetic inferences, which point towards a widespread apical [r] rather than [ɣ] in Early Modern Malay. On the basis of these documents, it is proposed that Proto-Malayic *r was originally an apical trill instead of a dorsal, as proposed by Adelaar (1992b) and later corroborated by authors such as Anderbeck (2024). Considering directionality and the seemingly recent attestations of a guttural /r/, it is more plausible to regard Proto-Malayic *r as originally an apical trill that later changed to a dorsal. The evidence in this paper suggests that early Malay documents give no hint of a uvular pronunciation until the nineteenth century. That evidence, however, is still basic and incomplete.
The data allow for other interpretations. One alternative, suggested by Evelyn Fettes (p.c.), is that these documents are recent relative to Proto-Malayic. On that view, a dorsal-to-apical sound shift could already have happened in these varieties. However, one fact is pertinent to dating the proposed *r > [ɣ] shift: older strata of European loans also underwent it. As previously discussed, European loans entered Malay after the Fall of Malacca in 1511 A.D. Wu (2023) illustrates this pattern. Older loans have velar fricatives, for example Portuguese carreta ‘cart’ > Kelantan Malay xːɛta ‘car/automobile’ (cf. Standard Malay kəreta), in contrast to newer loans such as aɾnab ‘rabbit’ (< Ar. ʔarnab). This suggests that the shift could have happened between 1511 A.D. and the mid-1800s, when the guttural /r/ on the Peninsula was first explicitly mentioned (cf. Crawfurd 1852).

More Speculative Asides

A final, more speculative proposal, raised by Uri Tadmor (p.c.), concerns the direction in which the dorsal rhotic would have spread. Wu (2023) notes migrations from Sumatra to the Malay Peninsula (specifically the Kelantan and Terengganu area). He postulates that Coastal Terengganu Malay and Kelantan Malay stem from later migrations in the post-Malacca Sultanate era. The current evidence is sparse, but a few things can be noted. The Trengganu Inscription (c. 1303 A.D.) writes /r/ with the Jāwī letter rā (never ġayn). This suggests that the isolect spoken in Terengganu at the time may have had an apical trill. However, Wu (2023) proposes multiple migrations, so a migration from Sumatra may have spread the dorsal fricative pronunciation of /r/, given its ubiquity on that island.
Furthermore, a dorsal pronunciation is more widely distributed in Sumatra (cf. Anderbeck 2008, McDowell & Anderbeck 2020, Adelaar 1992b) than in Borneo (cf. Tadmor 2015 on West Bornean Malay varieties, and Adelaar 2005 on Salako) and the Peninsula. This distribution would also support the hypothesis, as the dorsal pronunciation would have been brought by (back-)migrations from Sumatra.

References

Adelaar, Karl Alexander. 1992a. Proto Malayic: the reconstruction of its phonology and parts of its lexicon and morphology (Pacific Linguistics, Series C, 119). Canberra: Department of Linguistics, Research School of Pacific Studies, The Australian National University.
Adelaar, Karl Alexander. 1992b. The Relevance of Salako for Protomalayic and for Old Malay Epigraphy. Bijdragen tot de Taal-, Land- en Volkenkunde 148.3/4:381–408. Leiden: KITLV Press.
Adelaar, Karl Alexander. 1999. Review of Peter W. Martin, Conrad Ożóg & Gloria Poedjosoedarmo (eds.), Language Use and Language Change in Brunei Darussalam (Monographs in International Studies, Southeast Asia Series 100; Athens, Ohio: Ohio University Center for International Studies, 1996). Journal of Southeast Asian Studies 30.2:359–363. https://doi.org/10.1017/S0022463400013163.
Adelaar, Karl Alexander. 2005. Salako or Badameá: sketch grammar, text and lexicon of a Kanayatn dialect in West Borneo. Wiesbaden: Harrassowitz.
Anderbeck, Karl. 2008. Malay dialects of the Batanghari River Basin (Jambi, Sumatra). Dallas: SIL International.
Anderbeck, Karl. 2024. Historical linguistics of the Malayic subgroup. In The Oxford Guide to the Malayo-Polynesian Languages of Southeast Asia, 111–126. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198807353.003.0009.
Bausani, Alessandro. 1960. The First Italian-Malay Vocabulary by Antonio Pigafetta. East and West, NS 11.4:229–248.
Blagden, C. O. 1931. Corrigenda to Malay and other Words collected by Pigafetta. Journal of the Royal Asiatic Society of Great Britain & Ireland 63.4:857–861. Cambridge: Cambridge University Press. https://doi.org/10.1017/S0035869X00073482.
Campbell, Lyle. 2017. Mayan History and Comparison. In The Mayan Languages, 43–61. New York: Routledge. https://doi.org/10.4324/9781315192345-3.
Chairani Nasution, Sriasrianti, Anharuddin Hutasuhut & Zufri Hidayat. 2018. Kamus Melayu Sumatera Utara-Indonesia [North Sumatran Malay-Indonesian dictionary]. Medan: Balai Bahasa Sumatera Utara.
Clifford, Hugh. 1894. A dictionary of the Malay language: Malay-English. Ed. by Frank Athelstane Swettenham. Taiping: Government Printing Office.
Clynes, Adrian. 2001. Brunei Malay: an overview. Occasional Papers in Language Studies 7.1:11–43. Department of English Language and Applied Linguistics, Universiti Brunei Darussalam.
Coblin, W. South. 2000. A diachronic study of Míng Guānhuá phonology. Monumenta Serica 48:267–335.
Collins, James T. 1986. Kajian dialek daerah dan rekonstruksi Bahasa Purba [Study (in) regional dialects and the reconstruction of proto-languages]. Antologi Kajian Dialek Melayu, 30:344–65. Kuala Lumpur: Dewan Bahasa.
Collins, James T. 1997. The Malays and non-Malays of Kalimantan Barat: evidence from the study of language. Paper presented at the International Conference on Tribal Communities in the Malay World, Institute of Southeast Asian Studies.
Collins, James T. & Herpanus. 2018. The Sekujam language of West Kalimantan (Indonesia). Wacana, Journal of the Humanities of Indonesia 19.2:425–458. Faculty of Humanities, University of Indonesia. https://doi.org/10.17510/wacana.v19i2.702.
Crawfurd, John. 1852. A grammar and dictionary of the Malay language, with a preliminary dissertation. London: Smith, Elder & Co.
Dennys, Nicholas Belfield. 1878. A handbook of Malay colloquial, as spoken in Singapore: being a series of introductory lessons for domestic and business purposes. Singapore: Mission Press.
Deterding, David, Ishamina Athirah Gardiner & Najib Noorashid. 2022. The Phonetics of Malay. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108942836.
Drewes, G.W.J. 1972. De Invloed van de Atjéhse Omgeving om het Maleise Spraeck ende Woordboek van Frederick de Houtman [The Influence of the Acehnese Language on the Malay in Spraeck ende Woordboek by Frederick de Houtman]. Bijdragen tot de Taal-, Land- en Volkenkunde 128.4:447–457. Brill, KITLV, Royal Netherlands Institute of Southeast Asian and Caribbean Studies.
Edwards, E. D. & C. O. Blagden. 1931. A Chinese Vocabulary of Malacca Malay Words and Phrases Collected between A.D. 1403 and 1511 (?). Bulletin of the School of Oriental Studies, London Institution 6.3:715–749.
Favre, Pierre Étienne Lazare. 1876. Grammaire de la langue Malaise [Grammar of the Malay language]. Vienne: Imprimerie Impériale et Royale.
Fernández Rodríguez, Rebeca & María Alejandra Regúnaga. 2020. Patagonian Lexicography (Sixteenth–Eighteenth Centuries). In Astrid Alexander-Bakkerus, Liesbeth Zack, Otto Zwartjes & Rebeca Fernández Rodríguez (eds.), Missionary Linguistic Studies from Mesoamerica to Patagonia, 236–259. Leiden: Brill. https://doi.org/10.1163/9789004427006_009.
Fokker, Abraham Anthony. 1895. Malay phonetics. Leiden: Brill.
Gil, David. 2024. Borrowing within Malayic: the role of exotericity. Wacana, Journal of the Humanities of Indonesia 25.3:6. https://doi.org/10.17510/wacana.v25i3.1793.
Goddard, Ives. 1994. The West-to-East Cline in Algonquian Dialectology. Algonquian Papers - Archive 25. https://ojs.library.carleton.ca/index.php/ALGQP/article/view/616.
Haji Awang Muhammad Awang Hj Jambu & Alipuddin bin Hj Omarkandi. 1997. Etimologi dalam bidang perkamusan dan hubungannya dengan bahasa Melayu Brunei [Etymology in the studies of lexicography and its use in Brunei Malay]. Paper presented at the Simposium Bahasa Melayu. Academy Pengkajian Brunei.
Helfrich, O. L. 1904. Bijdragen tot de kennis van het Midden Maleisch: (Běsěmahsch en Sěrawajsch dialect) [Contributions to the studies of Middle-Malay: Běsěmahsch and Sěrawajsch dialects] (Verhandelingen van het Bataviaasch Genootschap van Kunsten en Wetenschappen 53). Batavia: Landsdrukkerij.
Hirth, Friedrich. 1888. Notes on the Chinese documentary style. Shanghai: Kelly & Walsh.
Hoogervorst, Tom. 2021. Language ungoverned: Indonesia’s Chinese print entrepreneurs, 1911-1949 (Cornell Scholarship Online). Ithaca: Southeast Asia Program Publications, an imprint of Cornell University Press. https://doi.org/10.1515/9781501758256.
Hoogervorst, Tom G. 2024. Seventeenth-century Malay wordlists and their potential for etymological scholarship. Wacana, Journal of the Humanities of Indonesia 25.3:7. Faculty of Humanities, University of Indonesia. https://doi.org/10.17510/wacana.v25i3.1782.
Houtman, Frederick de. 1603. Spraeck ende word-boeck, inde Malaysche ende Madagaskarsche Talen met vele Arabische ende Tursche woorden: Inhoudende twaalf tsamensprekinghen inde maleysche ende drie inde Madagaskarsche spraken met alderhande woorden ende namen ghestelt naer ordre vanden A.B.c. alles in Nederduytsch verduytst [Dialogues and Dictionaries in Malay and Madagascarese languages with many Arabic and Turkish words: Contains twelve dialogues in Malay and three in Malagasy languages with speeches with all words and names given in alphabetical order, all translated into Dutch]. Amsterdam: Jan Evertsen Cloppenburgh.
Kluge, Angela. 2014. A grammar of Papuan Malay (LOT Dissertation Series 361). Utrecht: LOT.
Kostakis, Andrew. 2007. More on the Origin of Uvular [ʀ]: Phonetic and Sociolinguistic Motivations. IULC Working Papers 7.2.
Le Roux, C.C.F.M. 1929. De Elcano’s tocht door den Timorarchipel met Magalhães’ schip “Victoria” [Elcano’s journey through the Timor archipelago with Magalhães’ ship “Victoria”]. In Feestbundel uitgegeven door het Koninklijk Bataviaasch Genootschap van Kunsten en Wetenschappen bij gelegenheid van zijn 150 jarig bestaan 1778-1928 [Festschrift given to the Koninklijk Bataviaasch Genootschap van Kunsten en Wetenschappen for its 150th anniversary (1778-1928)]. Weltevreden: Kolff.
Lee, Nala H. 2022. A grammar of modern Baba Malay. Berlin: De Gruyter Mouton.
Lombard, D. & G. Tucci. 1970. Le “Spraeck ende Woord-Boek” de Frederick de Houtman: première méthode de malais parlé (fin du XVIe s.) [The “Spraeck ende Woord-boek” of Frederick de Houtman: the first method of spoken Malay (end of the 16th century)] (Publications de l’École Française d’Extrême-Orient). Paris: École Française d’Extrême-Orient.
Maguire, Warren. 2017. Variation and Change in the Realisation of /r/ in an Isolated Northumbrian Dialect. In Language and a Sense of Place, 87–104. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781316162477.006.
McDowell, Jonathan & Karl Anderbeck. 2020. The Malay Lects of Western Sumatra. Journal of the Southeast Asian Linguistics Society Special Publication No. 7. Honolulu: University of Hawai’i Press. https://evols.library.manoa.hawaii.edu/server/api/core/bitstreams/d3483ef7-a33d-42f3-8428-59f02ccf74cd/content.
Mohamed, Noriah. 2006. The Malay Chetty Creole Language of Malacca: A Historical and Linguistic Perspective. Kuala Lumpur: Universiti Sains Malaysia, Laporan Projek.
Mohammad Khairulanwar, Abdul Ghani & Mohd Tarmizi Hasrah. 2023. Aspiration in Baling Malay. NUSA 75.1:3–18. Universitas Katolik Indonesia Atma Jaya, Tokyo University of Foreign Studies. https://doi.org/10.15026/0002000125.
Paauw, Scott H. 2009. The Malay contact varieties of eastern Indonesia: A typological comparison. Doctoral dissertation. Buffalo, NY: State University of New York at Buffalo.
Påhlsson, Christer. 1972. The Northumbrian Burr (Lund Studies in English 41). Lund: Gleerup.
Poedjosoedarmo, Gloria R. 1996. Variation and Change in the Sound Systems of Brunei Dialects of Malay. In Peter W. Martin, Conrad Ożóg & Gloria R. Poedjosoedarmo (eds.), Language Use and Language Change in Brunei Darussalam, 37–43. Athens, Ohio: Ohio University Center for International Studies.
Poedjosoedarmo, Soepomo. 1992. Unsur lama dalam tatabahasa dialek Melayu Brunei [Archaic features in the grammar of Brunei Malay]. In Dato Seri Laila Jasa Awang Haji Abu Bakar bin Haji Apong (ed.), Sumbangasih UBD. Esei-esei mengenai Negara Brunei Darussalam. Gadong: Universiti Brunei Darussalam.
Purcell, Victor. 1956. The Chinese in modern Malaya (Background to Malaya Series 9). Singapore: Donald Moore.
Rennicke, Iiris. 2015. Variation and Change in the Rhotics of Brazilian Portuguese. Helsinki: University of Helsinki doctoral thesis.
Schuchardt, Hugo. 1890. Kreolische Studien IX: Ueber das Malaioportugiesische von Batavia und Tugu [About the Malayo-Portuguese of Batavia and Tugu]. Sitzungsberichte der kaiserlichen Akademie der Wissenschaften zu Wien 105:881–904.
Sebregts, Koen. 2015. The sociophonetics and phonology of Dutch “r” (LOT Dissertation Series 379). Utrecht: LOT.
Sneddon, James N. 2006. Colloquial Jakartan Indonesian (Pacific Linguistics). Canberra: Pacific Linguistics, Research School of Pacific and Asian Studies, The Australian National University.
Soderberg, Craig D. & Kenneth S. Olson. 2008. Illustrations of the IPA: Indonesian. Journal of the International Phonetic Association 38.2:209–213. Cambridge University Press.
Sulissusiawan, Ahadi, Chairil Effendi, Sonlie & M. Yunus. 1998. Struktur Bahasa Melayu Dialek Ketapang [Structure of the Ketapang Dialect of Malay]. Jakarta: Pusat Pembinaan dan Pengembangan Bahasa Departemen Pendidikan dan Kebudayaan Jakarta.
Tadmor, Uri. 2015. Languages of Western Borneo Documentation Project. Jakarta Field Station, Department of Linguistics, Max Planck Institute for Evolutionary Anthropology, 1999-2015.
Tjia, Johnny. 2007. A grammar of Mualang, an Ibanic language of Western Kalimantan, Indonesia (LOT Dissertation Series 153). Utrecht: LOT.
Vihman, Marilyn May. 2014. Phonological development: the first two years. Hoboken: Wiley-Blackwell.
Werndly, George Henrik. 1736. Maleische Spraakkunst [Malay grammar]. Amsterdam: Wetstein.
Wijk, Gerth van. 1890. Spraakleer der maleische taal [Grammar of the Malay language]. Batavia: Kolff.
Wu, Jiang. 2023. Malayic varieties of Kelantan and Terengganu: description and linguistic history (LOT Dissertation Series 651). Utrecht: LOT.

LOANWORDS IN KAVALAN AND BASAY: LANGUAGE CONTACT WITH FORMOSAN AND PHILIPPINE LANGUAGES
Li-yang TSENG
Nanyang Technological University
Liyang003@e.ntu.edu.sg

Abstract
Kavalan and Basay, two Austronesian languages of northern Taiwan, have maintained close linguistic contact with Malayo-Polynesian languages (Li 1995, 2004; Tsuchida et al. 1991, Tsuchida 2006). Earlier studies took a top-down approach. They identified loanwords either through irregular reflexes of Proto-Austronesian (PAN) etyma or through resemblance to Malayo-Polynesian lexical forms. This paper instead draws on geographical distribution and a bottom-up reconstruction (Tseng 2023) to examine lexical irregularities. Three case studies of borrowing and innovation in Li (2004) and Tsuchida (2006) are revisited. Additional contact-induced lexical items are also proposed on the basis of geographical distribution and Tseng’s reconstruction.
Keywords: Language Contact, Loanwords, Austronesian Languages, East Formosan, Sound Correspondence
ISO 639-3 codes: ami, bnn, byq, ckv, dru, eng, ifu, pag, pwn, pyu, pzh, mel, mog, spa, sun, sxr, szy, tay, tgl, trv, tsu, xsy, zlm

1 Introduction

This paper aims to identify loanwords in Kavalan and Basay and their source languages. Kavalan, spoken in northeastern Taiwan, has long been influenced by Amis, as its speakers migrated to and lived in Amis communities. Basay was spoken in northern Taiwan and has been extinct since the early 20th century. A Basay dialect, Trobiawan, appeared for reasons that remain unclear in Yilan County, adjacent to the Kavalan community. Its lexicon shows heavy influence from Kavalan (cf. Li and Tsuchida 2014). According to Blust (1999:45), Kavalan and Basay-Trobiawan form the northern branch of the East Formosan subgroup. East Formosan is defined by (i) the merger of PAN *t/*C, (ii) the merger of PAN *j/*n, and (iii) the shift of PAN *q > ʔ. The northern branch is further justified by (i) the merger of *q/∅, (ii) the merger of *n/*N, and (iii) the irregular change *susu > /sisu/ ‘breast’. Previous studies have detected irregularities against the regular sound changes proposed by Blust (1999), or by comparing the lexicons of geographically close languages. This work on contact-induced lexical forms and lexical replacement has been fruitful (Li 1995, 2001, 2004; Tsuchida et al. 1991, Tsuchida 2006). Loanwords from Spanish and Philippine languages serve as valuable linguistic evidence for reconstructing Taiwanese society and culture in the distant past. For example, Basay baŋka ‘canoe’, puti ‘banana’, and manuk ‘bird’ are not found in other Formosan languages, so they were apparently borrowed from PMP *baŋkaq, *punti, and *manuk ‘chicken’.
Kavalan baka ‘cow’, byabas ‘guava’, and pRasku ‘bottle’ resemble Spanish vaca, guayabas, and frasco, respectively, in both syllable structure and sound, and are thus also seen as loanwords. The form b~vilang ‘to count’ in both Kavalan and Basay may come from PMP *bilang. Kavalan Raaq ‘wine’ and RayaR ‘sail’ are possible loanwords from Austronesian languages spoken in Southeast Asia, such as Malay layar and Tagalog layag ‘sail’, and Tagalog alak ‘wine’ (Li 1995, 2001, 2004; Tsuchida et al. 1991, Tsuchida 2006; Blust 2013). All these loanwords reflect Taiwan’s engagement with the wider region over the centuries.

The loanwords proposed in earlier studies were identified by top-down comparison between PAN etyma and their modern reflexes. A word was regarded as a borrowing if it showed irregular reflexes or was attested only in non-Formosan languages. However, identifying irregularities top-down may be problematic when the languages in question do not belong to a primary branch of Austronesian. In Blust’s (1999) subgrouping, Kavalan and Basay belong to a lower-level branch of East Formosan rather than to a primary branch. They must therefore have developed from an intermediate protolanguage after Proto-Austronesian split into its descendants. Without reconstructing the protolanguage ancestral to Kavalan and Basay, we cannot be certain which changes took place during this intermediate stage. Our understanding of regular and irregular sound changes therefore remains incomplete. For this reason, I question three case studies in Li (2001, 2004) and Tsuchida (2006). They concern the modern reflexes of PAN *piliq ‘to choose; to select; to pick out’, *mula ‘to plant’, and *layaR ‘sail’.
The reflexes of PAN *piliq (> KAV pamil, BAS pamici) and of *mula (> KAV paruma) are argued to be shared innovations that justify the northern branch of East Formosan, whereas the reflexes of *layaR (> KAV RayaR, BAS ɭayaɭ) are identified as loanwords from Malayo-Polynesian. For PAN *piliq, Li (2004) accounts for the innovative form by infixation, while an alternative analysis, credited in Li (2004) to an anonymous reviewer and in line with the sound change process proposed by Tsuchida (2006), treats the form as a language-specific *paN- prefixation in both Kavalan and Basay. However, the sound change process proposed by Tsuchida (2006), *piliq (> **pamili > **pamill) > KAV pamil, is highly ad hoc. In fact, the elimination of final syllables is better generalized as a synchronic morphophonemic rule. Moreover, the *paN- prefixation analysis implies that *paN- is an inherited, native prefix in both Kavalan and Basay. This assumption is problematic given both the geographical distribution and the morphological properties of *paN-, which is not typical of Formosan languages. For PAN *layaR, Li (1995) and Tsuchida (2006) note that the form is found in a large number of Malayo-Polynesian languages but is attested in only two Formosan languages, so it is very likely a borrowing in both Kavalan and Basay. Yet the borrowing analysis remains debatable. If the Kavalan and Basay words were borrowings, it would be surprising that their sound correspondence (KAV R vs. BAS ɭ) is quite regular and that none of the RGH-series consonants appear in them. The present paper also aims to expand our understanding of language contact in Kavalan and Basay by identifying new loanwords of different types, namely family-internal borrowings and areal diffusion.
This paper is organized as follows. Section 2 introduces Tseng's reconstruction of Proto-Northern East Formosan (PNEF) phonology and of its descendants, Proto-Kavalan (PKV) and Proto-Basay (PBS). Section 3 revisits the proposed innovations (PAN *piliq and *mula), and Section 4 revisits the proposed loanword (PAN *layaR) and proposes additional loanwords borrowed from other Formosan languages spoken in eastern Taiwan.

2 Reconstructing Proto-Northern East Formosan and Its Descendants

Tseng (2023) reconstructs three protolanguages, Proto-Kavalan, Proto-Basay, and Proto-Northern East Formosan, using a bottom-up approach. The dialects of Kavalan and those of Basay each differ slightly in their phonetic features. For example, Kavalan dialects vary in consonants such as b~v, q~χ, z~r, and ʁ~ɣ~χ, whereas Basay dialects vary in b~v, q~h, and -c~-t. The dialects of each language are compared to reconstruct Proto-Kavalan and Proto-Basay, respectively. Most of this variation is immaterial to the claims of this study. The Proto-Kavalan phonemic inventory consists of 15 consonants, 4 monophthongs, and 4 diphthongs, as shown in Table 1.

Table 1: Phonemic inventory of Proto-Kavalan (Tseng 2023:83)
Stops: *p, *b, *t, *k, *q
Fricatives: *s, *z, *ʁ
Nasals: *m, *n, *ŋ
Flap: *r [ɾ]
Lateral: *l
Semivowels: *w, *y [j]
Vowels: high *i, *u; mid *ə; low *a
Diphthongs: *aw, *ay, *iw, *uy

The Proto-Basay phonemic inventory consists of 14 consonants, 4 monophthongs, and 4 diphthongs, as shown in Table 2.
Table 2: Phonemic inventory of Proto-Basay (Tseng 2023:115)
Stops: *p, *b, *t, *k, *q
Fricative: *s
Affricate: *c [ts]
Nasals: *m, *n, *ŋ
Flap: *r [ɾ]
Lateral: *l
Semivowels: *w, *y [j]
Vowels: high *i, *u; mid *ə; low *a
Diphthongs: *aw, *ay, *iw, *uy

Following the subgrouping hypotheses of Blust (1999) and Li (2004), Kavalan and Basay belong to the northern branch of East Formosan. To reconstruct their ancestral language, Tseng (2023) proposes Proto-Northern East Formosan, reconstructed from a comparison of sound correspondences between Proto-Kavalan and Proto-Basay. The Proto-Northern East Formosan phonemic inventory consists of 16 consonants, 4 vowels, and 4 diphthongs. The reconstructed phonemes and their reflexes in the daughter languages are shown in Table 3.

Table 3: Proto-Northern East Formosan phonemes and reflexes (Tseng 2023)
PNEF | PKV | PBS
*p | *p | *p
*t | *t | *t
*k | *k, *q | *k
*Q¹ | ∅ | ∅
*q | *q | *q
*b | *b | *b
*s | *s | *s
*z | *z | *r
*ʁ | *ʁ | *l
*c | *s | *c
*m | *m | *m
*n | *n | *n
*ŋ | *ŋ | *ŋ
*r | *r | *c
*w | *w | *w
*y | *y | *y
*i | *i | *i
*u | *u | *u
*ə | *ə | *u, *ə
*a | *i, *a | *a
*aw | *aw | *aw
*ay | *ay | *ay
*iw | *iw | *iw
*uy | *uy | *uy

Two reconstructions and one sporadic change are important for the present paper, namely PNEF *ʁ (< PAN *R), PNEF *r, and the sporadic change of *y and *i to l in Kavalan. PNEF *ʁ is reflected as PKV *ʁ and PBS *l, and PNEF *r (< PAN *l) is reflected as PKV *r and PBS *c. These sound changes are discussed in Sections 3 and 4, where irregularities are found in the reconstructions of Li (2004) and Tsuchida (2006). PAN *y and *i both become KAV l in a few examples. Li (2001) notes that PAN *y > KAV l occurs intervocalically between two instances of a. This change is attested only in alam ‘bird’ (< PAN *qayam). PAN *i > KAV l, on the other hand, is attested in two examples: PAN *Caqi > KAV tal ‘feces’ and PAN *Cinaqi ‘intestines’ > KAV tnal.
According to Tsuchida (2006), these modern reflexes underwent a series of sound changes, as shown in (1).

(1) Step-by-step changes in KAV tal (Tsuchida 2006:589):
*Caqi > *taiq (by metathesis) (cf. Ami tai’, tinai’)
*Caiq > *tayi (loss of *q and development of a glide y)
*tayi > *tali (regular change of *-y- > -l-)
*tali > *tall (assimilation of *i to the preceding *l)
*tall > tal (simplification of geminated final consonants)

While *y > l accounts for the l that emerges at the intermediate stage (PAN *Caqi itself contains no l), it is unclear why a further sporadic change *i > l must be assumed. In Tsuchida's reasoning, the sporadic change *i > l accounts for a morphophonemic alternation in which the root-final consonant is geminated when non-Agent focus (NAF) markers are attached, such as the NAF imperative -i in (2) and the Patient/Locative focus marker -an in (3) and (4).

(2) tall-i ka ‘Shit on it!’
(3) put-tall-an ‘anus’
(4) qat-tall-an ‘toilet’

Tsuchida treats the gemination/simplification alternation as a diachronic development. However, additional data show that this morphophonemic alternation is not restricted to the few examples Tsuchida cites. A synchronic morphophonemic analysis of the alternation is proposed in Section 3.

3 Revisiting Innovative Lexical Forms

While most loanwords can be identified through irregularities in the comparison between PAN etyma and their reflexes, irregularity is not necessarily an explicit indicator of borrowing. For example, Blust (1999) proposes the irregular change PAN *susu ‘breast’ > KAV sisu, BAS cicu as one piece of subgrouping evidence for the northern branch of East Formosan. However, Tseng (2023) argues that this evidence is implausible. First, the conservative form is found in some Basay dialects, e.g., Tatayu Basay cucu and Paktau Basay cucu.
Second, the irregular form citoo is found in Babuza. The irregular form should therefore be considered a parallel development in these languages. Likewise, Li (2004) and Tsuchida (2006) propose several innovative lexical items on the basis of irregularities observed in the comparison between PAN and its descendants. Whether these items are innovations or contact-induced remains uncertain, since these studies consider neither the intermediate development from Proto-Northern East Formosan to its descendants nor the geographical distribution of the lexical items. In the following subsections, two proposed innovations are reevaluated, taking into account both the historical development from PNEF to the modern languages and the geographical distribution of the lexical items.

3.1 *piliq ‘to choose; to select; to pick out’

PAN *piliq is reflected as pamil in Kavalan and as pamici in Basay. According to Li (2004), these reflexes are innovative because they begin with pam-, while the forms in most other Formosan languages are relatively conservative, e.g., Amis piliʔ, Sakizaya mi-piliʔ, Pazeh pii-piri, Rukai u-pili. Li (2004) suggests that these reflexes involve an insertion of the infix . However, Tsuchida, in personal communication with Li, points out that this infix is not attested in any Austronesian language and that a prefix with nasal substitution, i.e., *paN-/*maN-, is more plausible. Tsuchida (2006) further proposes that BAS pamici, like KAV pamil, resulted from nasal substitution on PAN *piliq, and that the Kavalan word underwent further changes beyond the initial nasal substitution. In parallel with his derivation of KAV tal (cf. (1)), Tsuchida proposes that KAV pamil was derived through subsequent assimilation of the final *i to the preceding *l and simplification of the resulting final geminate, as shown in (5).
(5) Step-by-step changes in KAV pamil:
*maN-/paN-piliq > *mamili/*pamili (nasal substitution and loss of *q)
*mamili/*pamili > *mamill/*pamill (assimilation of *i to the preceding *l)
*mamill/*pamill > mamil/pamil (simplification of geminated final consonants)

As mentioned in Section 2, Tsuchida (2006) argues that the gemination/simplification alternation is specific to word-final l. However, not only roots ending in l but also monosyllabic roots ending in almost any other consonant are geminated when NAF markers are attached, as shown in (6).

(6) Morphophonemic alternation of NAF-marked verbal forms:
p: zap ‘to find’ → zapp-an ‘found-PF’
t: put ‘stuffy’ → putt-i ka ‘Block it!’
b: tub ‘to cover’ → tubb-i ka ‘Cover it!’
R: ŋaR ‘to open the mouth’ → ŋaRR-i ka ‘Open the mouth!’
n: qan ‘to eat’ → qann-i ka ‘Eat it!’; zan ‘old’ → zann-i ka ‘Make it old!’
m: sum ‘to urinate’ → summ-i ka ‘Urinate here!’; tim ‘to press the upper and lower lips tightly together’ → timm-i ka ‘Press the upper and lower lips tightly together!’; tum ‘to light a cigarette’ → tumm-i ka ‘Light the cigarette!’; zazzam ‘to catch up (reduplicated)’ → zazzamm-i ka ‘Catch up!’
s: bus ‘to block’ → buss-i ka ‘Block it!’; nis ‘to take off clothes’ → niss-i ka ‘Take it off!’; mi-nes ‘to make an effort to discharge’ → ness-i ka ‘Bear it down!’
z: niz ‘all’ → nizz-an ‘all-PF’

(7) Gemination rule for NAF monosyllabic verbal roots:
Ø → C1 / CVC1 __ #]-an/-i ka

I therefore argue that the gemination of NAF monosyllabic verbal roots is not a sporadic change confined to a few words; it should instead be treated synchronically as phonologically conditioned allomorphy. In the comparison of sound correspondences, the set ‘KAV l : BAS c’ between KAV pamil and BAS pamici is not regular. According to Table 3, the regular counterparts of PBS *c are either PKV *r or PKV *s. In this correspondence set, BAS c is a regular reflex of PBS *c.
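Rule (7) is simple enough to state as a function. The following is a minimal Python sketch, not part of the original analysis, assuming the orthography of (6) with one character per segment and treating a, i, u, e, and ə as vowels; the names `is_cvc` and `attach_naf` are hypothetical. As written, rule (7) is restricted to monosyllabic roots, so the reduplicated zazzam in (6) falls outside this sketch.

```python
# A sketch of gemination rule (7): Ø → C1 / CVC1 __ #]-an/-i ka.
# Assumption: each character is one segment, and "e"/"ə" count as vowels.
VOWELS = set("aiueə")
NAF_SUFFIXES = ("-an", "-i")  # Patient/Locative focus -an; NAF imperative -i


def is_cvc(root: str) -> bool:
    """True if the root is a single CVC syllable."""
    return (
        len(root) == 3
        and root[0] not in VOWELS
        and root[1] in VOWELS
        and root[2] not in VOWELS
    )


def attach_naf(root: str, suffix: str) -> str:
    """Attach an NAF suffix, geminating the final consonant of a CVC root."""
    if suffix not in NAF_SUFFIXES:
        raise ValueError(f"suffix not covered by rule (7): {suffix}")
    # Rule (7): insert a copy of C1 between a CVC1 root and the suffix.
    stem = root + root[-1] if is_cvc(root) else root
    form = stem + suffix
    # In (2) and (6) the imperative -i is always followed by ka.
    return form + " ka" if suffix == "-i" else form
```

For example, `attach_naf("zap", "-an")` returns "zapp-an" and `attach_naf("tal", "-i")` returns "tall-i ka", matching (6) and (2). Treating the rule as a conditioned choice of stem, rather than as a sound change applied word by word, is what makes it synchronic in the sense argued above.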
KAV l, by contrast, is rarely found as the reflex of PKV *r (which surfaces as KAV r), nor does it continue PKV *s. In addition, ‘KAV l : BAS c’ is attested only in this example, which makes these words less likely to be inherited. Apart from the irregular sound correspondence, the morphology of the *piliq reflexes in Kavalan and Basay also points to borrowing. If Tsuchida's (2006) analysis is correct, nasal substitution occurs in Kavalan and Basay. This would mean that, among more than ten Formosan languages, only these two carry a morphological pattern typical of Malayo-Polynesian languages (Dahl 1976; Li 1995; Blust 2004), which makes borrowing the more likely explanation. In conclusion, instead of analyzing KAV pamil and BAS pamici as innovations, I argue that these cognates are loanwords. The word, perhaps **paN-piliq, was introduced into Kavalan² and Basay no later than the split of Proto-Northern East Formosan into its descendants. PNEF speakers may have realized **l as *r, and this liquid later developed into c in Basay, which is a regular sound change in Proto-Basay. If these two words are loanwords, the further question arises of their source.

3.1.2 Evidence of borrowing source: lexical geographic distribution, archaeology, and other supporting loanwords

The most explicit indicators are the prefixes *paN-/maN-, which are characteristic PMP morphemes. Clearly, the borrowing was from a Malayo-Polynesian language. However, it is difficult to narrow down the source among thousands of languages. To identify a plausible source, I examine the geographic distribution of those reflexes of *piliq in Malayo-Polynesian languages that contain *paN-/maN-. According to Blust et al. (2023), reflexes of *piliq with nasal substitution are found mostly in the Philippines and western Indonesia, e.g., Itbayaten pamili/mamili,
Tagalog pami-míliʔ/mamili, Ifugaw ma-mīli, Pangasinan ma-milí, Mongondow ma-miliʔ, and Sundanese milih/pa-milih. The combination ‘*paN-/maN- + *piliq’ is not found in eastern Indonesia or in Oceanic languages. According to Li (1995:676), several loanwords in Kavalan and Basay are attested only in Malayo-Polynesian languages, including reflexes of PMP *bilaŋ, *baŋka, *punti, and *manuk, as shown in (8).

(8) KAV vilaŋ, BAS bilaŋ ‘to count’
BAS baŋka ‘canoe’
BAS manukka ‘bird’
BAS puti ‘banana’

Together with KAV pamil and BAS pamici, these loanwords form a contextually coherent set: they may have been introduced into Kavalan and Basay through trade. In fact, Formosa and Southeast Asia had a prolonged trading connection in prehistory. In such circumstances, verbs denoting trading events and nouns for products and means of transportation are very likely to be introduced into the recipient communities. The most plausible source languages that Kavalan and Basay speakers would have encountered in trade are Philippine languages. Archaeological studies indicate that early Philippine peoples had contact with Formosan indigenous peoples. Borao (2013) and Green (2022) note that Pampanga and Cagayan people lived in northern Taiwan during the Spanish period in the 17th century. Based on Borao's (2013:586) findings, Filipinos numbering at least in the thousands successively inhabited Taiwan. Consequently, the northern Formosan indigenous communities were the most likely recipients of lexical items from Philippine languages. Basay, the lingua franca of northern Taiwan, and Kavalan, its closest neighboring language, almost certainly borrowed the form *paN-/maN-piliq from a Philippine source.

3.2 *mula ‘to plant’

Reflexes of PAN *mula ‘to plant’ are attested in most Formosan and most Malayo-Polynesian languages. The word is reflected as paruma in both Kavalan and Basay.
According to Li (2004), this form underwent metathesis of *m and *l not only in Kavalan and Basay but also in Amis, another East Formosan language, and the metathesis is regarded as an innovation. In addition, as Li points out, BAS paruma is not the expected form: given the regular reflex of PAN *l in Basay, the expected form is **pacuma. The Basay word should therefore be a borrowing from Kavalan. Although BAS paruma is apparently a loanword, one cannot confirm that paruma is inherited even in Kavalan, since this sporadic metathesis occurs not only in East Formosan but also in another primary subgroup. In fact, the metathesized and conservative forms of *mula show a clear geographical distribution. The metathesized form, listed in (9), is found in three primary subgroups located mostly in the northeastern mountains and along the coast, with one additional case in a western plains language, Pazeh paxu-ruma (Li 2004:366, footnote 9). The conservative form, on the other hand, is found in most languages spoken in western Taiwan, e.g., Saisiyat ma-moLa, Atayal muhi, Tsou mʉ'a, Hla’alua lumalʉmʉkʉ, as well as in most Malayo-Polynesian languages.

(9) Seediq mhuma, Truku mhuma (Atayalic)
Amis paruma, Sakizaya paruma, Kavalan paruma, Basay paruma (East Formosan)

Instead of treating paruma as an innovative form, I suggest that the sporadic metathesis is either a contact-induced change or a parallel development. Kavalan, Amis, and Basay are geographically close and share many words for flora, fauna, and everyday materials, whereas the item is not attested in Siraya, so the metathesized form is unlikely to have been inherited from Proto-East Formosan. Alternatively, the form could be analyzed as a parallel development, since it is found only in some East Formosan languages and some Atayalic languages (cf. Atayal muyaʔ ~ muhiʔ).
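The correspondence arguments of this section can be checked mechanically against Table 3. The Python sketch below is an illustration under stated assumptions, not Tseng's (2023) procedure: it treats modern Kavalan and Basay segments as identical to their PKV and PBS counterparts (writing KAV R as ʁ), takes forms already segmented into lists, and compares only aligned positions, ignoring unmatched trailing segments such as the final i of pamici. The names `TABLE_3`, `REGULAR`, and `irregular_slots` are hypothetical.

```python
# Regular PKV : PBS pairings read off Table 3, keyed by PNEF phoneme.
# PNEF *Q (lost in both daughters) is omitted: it leaves nothing to align.
TABLE_3 = {
    "p": (["p"], ["p"]), "t": (["t"], ["t"]), "k": (["k", "q"], ["k"]),
    "q": (["q"], ["q"]), "b": (["b"], ["b"]), "s": (["s"], ["s"]),
    "z": (["z"], ["r"]), "ʁ": (["ʁ"], ["l"]), "c": (["s"], ["c"]),
    "m": (["m"], ["m"]), "n": (["n"], ["n"]), "ŋ": (["ŋ"], ["ŋ"]),
    "r": (["r"], ["c"]), "w": (["w"], ["w"]), "y": (["y"], ["y"]),
    "i": (["i"], ["i"]), "u": (["u"], ["u"]), "ə": (["ə"], ["u", "ə"]),
    "a": (["i", "a"], ["a"]),
    "aw": (["aw"], ["aw"]), "ay": (["ay"], ["ay"]),
    "iw": (["iw"], ["iw"]), "uy": (["uy"], ["uy"]),
}

# Every (KAV, BAS) segment pair that some PNEF phoneme licenses.
REGULAR = {(kv, bs) for kvs, bss in TABLE_3.values() for kv in kvs for bs in bss}


def irregular_slots(kav, bas):
    """Return (index, KAV segment, BAS segment) for each aligned pair not in Table 3."""
    return [(i, k, b) for i, (k, b) in enumerate(zip(kav, bas)) if (k, b) not in REGULAR]
```

Under these assumptions, pamil : pamici is flagged only at index 4 (l : c), and paruma : paruma is flagged at index 2 (r : r), whereas pairing KAV paruma with the expected BAS **pacuma yields no irregular slot. A real comparison would also need gapped alignment and conditioning environments, which this sketch leaves out.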
4 Revisiting Borrowed Lexical Forms

4.1 *layaR ‘sail’

PAN *layaR is reflected as KAV RayaR and BAS ɭayaɭ. Apart from these two East Formosan reflexes, Paiwan la-laya ‘flag, banner’ is the only other Formosan reflex of the word, and it has apparently undergone a semantic shift. Otherwise, reflexes of *layaR are found mostly in Malayo-Polynesian. Tsuchida (2006) suggests that the Kavalan and Basay words are likely loanwords from Malay layar. He infers that the borrowing may have resulted from the migration of the Spaniards' oarsmen from the Philippines or Indonesia during the 17th century, which makes the borrowing hypothesis historically plausible. Given that very few primary subgroups have a reflex of PAN *layaR, reconstructing this word for PAN is problematic, and one might argue that it should instead be reconstructed for PMP. I propose, however, an alternative analysis that supports retaining the reconstruction PAN *layaR. If KAV RayaR and BAS ɭayaɭ were loanwords, this would not account for the fact that neither form exhibits the RGH consonants commonly found in Malayo-Polynesian. Instead, they show the voiced uvular fricative R and possibly the retroflex lateral ɭ. According to Conant (1911), Malayo-Polynesian languages display variable sound correspondences involving r, g, h, and sometimes y. Languages with particular RGH consonants also show specific phonetic tendencies; for instance, a language that regularly has r may also show g as an interchangeable variant. This correspondence set has been reconstructed as PAN *R. Therefore, if KAV RayaR and BAS ɭayaɭ were borrowings from Malay layar, as Tsuchida (2006) suggests, or from other Malayo-Polynesian languages such as Tagalog láyag or Melanau layah, we would expect RGH consonants to surface in Kavalan and Basay. Since they do not, the precise source of borrowing remains uncertain.
Although irregularities emerge in the top-down comparison between PAN *layaR and its modern reflexes in Kavalan and Basay (namely, *l is irregularly reflected as R in Kavalan, and *R is irregularly reflected as ɭ in Basay), these forms can nevertheless be interpreted within the regular correspondence set ‘KAV R : BAS l’ (cf. Table 3). Accordingly, Tseng (2023) reconstructs PNEF *ʁayaʁ, which implies two sound changes in the history of *layaR: PAN *l > PNEF *ʁ (instead of the expected *r) and PAN *R > PNEF *ʁ. I interpret the irregular shift PAN *l > PNEF *ʁ as a case of regressive assimilation:

(10) PAN *l > PNEF *ʁ / _XR

This regressive assimilation affects not only PAN *layaR but also PAN *qiCəluR ‘egg’ and *laRiw ‘to run’, as shown in Table 4.

Table 4: Regressive assimilation in *layaR, *qiCəluR, and *laRiw (Tseng 2023, Appendix III)
PAN | PNEF | PKV | PBS | Gloss
*layaR | *ʁayaʁ | *ʁayaʁ | *layal | ‘sail’
*qiCəluR | *tiʁuʁ | *tiʁuʁ | *tirul (< **tilul) | ‘egg’
*laRiw | *ʁaʁiw | *ʁaʁiw | *laliw | ‘to run’

In these cases, PAN *l is reflected as PNEF *ʁ when another *R follows in the word. These reconstructions rest on regular sound correspondence sets for each segment. For example, PBS *tirul ‘egg’ is reconstructed from ma-terol and telod̊ in Basay dialects, which show two distinct liquids. Tseng (2023) interprets *tirul as a later development from **tilul through liquid dissimilation, the only instance of such a change. Although this sporadic development remains unexplained, liquid assimilation and dissimilation are widely attested cross-linguistically. To sum up, the actual source of the putative borrowing cannot be confirmed, given the absence of RGH consonants in the Kavalan and Basay forms. The irregular reflexes of PAN *layaR can instead be explained as the result of regressive assimilation, although this process is opaque in the present data.
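Rule (10) and the regular liquid reflexes of Table 3 can be combined into a small derivation. The Python sketch below is a hypothetical illustration that models only the liquids: PAN *l becomes PNEF *ʁ when a *R follows anywhere later in the word (the X in (10) may be empty) and *r otherwise, PAN *R becomes *ʁ, and PNEF *ʁ and *r surface as PKV *ʁ, *r and PBS *l, *c. All other segments are copied unchanged, so changes such as the loss of *q, *C > t, and vowel loss in *qiCəluR are outside its scope, and that word is not modeled. The function and table names are assumptions for illustration.

```python
# Liquid developments only; every other segment is copied unchanged.
PNEF_TO_PKV = {"ʁ": "ʁ", "r": "r"}
PNEF_TO_PBS = {"ʁ": "l", "r": "c"}


def pan_to_pnef(pan: str) -> str:
    """Apply rule (10), then the regular PAN *l > *r and *R > *ʁ."""
    out = []
    for i, seg in enumerate(pan):
        if seg == "l":
            # (10): *l > *ʁ / _XR, i.e. when any *R follows later in the word.
            out.append("ʁ" if "R" in pan[i + 1:] else "r")
        elif seg == "R":
            out.append("ʁ")  # regular PAN *R > PNEF *ʁ
        else:
            out.append(seg)
    return "".join(out)


def reflex(pnef: str, table: dict) -> str:
    """Map PNEF liquids to their PKV or PBS reflexes (Table 3)."""
    return "".join(table.get(seg, seg) for seg in pnef)
```

Under these assumptions, `pan_to_pnef("layaR")` gives "ʁayaʁ", which maps to PKV "ʁayaʁ" and PBS "layal", and *laRiw yields "ʁaʁiw" and PBS "laliw", as in Table 4. A word with no following *R, such as *mula, takes the regular *r ("mura"); this is only the rules' prediction for the unmetathesized form, not an attested reflex.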
Tsuchida's (2006) borrowing hypothesis therefore appears implausible; rather, KAV RayaR and BAS ɭayaɭ are more likely to be inherited forms. One might argue that borrowing could have occurred alongside the trade-related loanwords discussed in Section 3. This possibility remains open, but it still fails to explain why the reflexes of *layaR in Kavalan and Basay do not exhibit RGH consonants.

4.2 Other Contact-Induced Irregularities under Bottom-Up Reconstruction

4.2.1 Family-internal borrowings

With a fuller understanding of the historical developments within Northern East Formosan, additional loanwords can be identified, as in (11).

(11) Irregular sound correspondence sets
‘KAV r : BAS l ~ ɭ’: KAV riŋi : BAS sa-liŋi ‘guard’; KAV saraŋ : BAS saɭaŋ ‘soup; to sap’; KAV sumkir : BAS sumukiɭ ‘to answer’
‘KAV l : BAS l’: KAV bəlayiŋ : BAS balayən ‘pan’

Regularly, KAV r corresponds to BAS c, and BAS l corresponds to KAV R. Accordingly, the sets ‘KAV r : BAS l’ and ‘KAV l : BAS l’ are irregular, and the words in these sets should be regarded as loanwords. However, the direction of borrowing cannot be determined, as the irregularity could lie in either KAV r or BAS l. In contrast, the direction is clear for the Trobiawan dialect of Basay, which was heavily influenced by Kavalan owing to geographic proximity. This influence is reflected in its consonant z, which is evidently borrowed from Kavalan z. In the other Basay dialects, the counterpart of Kavalan z is l, e.g., BSSTN m-ulan : KAV uzan ‘(to) rain’; BSSTN laise : KAV zais ‘face’; BSSTN lusa : KAV zusa ‘two’. The Trobiawan Basay words with z are therefore evidently loanwords from Kavalan, as shown in (12).

(12) Kavalan loanwords in Trobiawan Basay
BSTRB zanum ‘water’ (cf. KAV zanum); BSTRB zais ‘face’ (cf. KAV zais); BSTRB m-azas ‘to bring’ (cf. KAV azas); BSTRB ŋazuy ‘lips’ (cf. KAV ŋazuy); BSTRB izip ‘body’ (cf. KAV izip); BSTRB uzip ‘bitter; ginger’ (cf. KAV uzip); BSTRB ma-sazmakən ‘to allow’ (cf. KAV sazmakən); BSTRB ŋuzus ‘cape’ (cf. KAV ŋuzus); BSTRB tawiz ‘button’ (cf. KAV tawiz); BSTRB m-utuz ‘earthquake’ (cf. KAV utuz).

4.2.2 Loanwords on the East Coast

Apart from internally borrowed words, some lexical items are shared across several Formosan languages spoken in eastern Taiwan. I argue that these items are likely borrowings, as illustrated in (13).

(13) Lexical exchange among Basay, Kavalan, Amis, Bunun, Puyuma, and Paiwan
Figure 1: Geographical distribution of Basay, Kavalan, Amis, Bunun, Puyuma, and Paiwan
Bowl; basin: Basay kisiŋ, Kavalan kaysiŋ, Amis kaisiŋ, Bunun haysiŋ ‘rice’, Puyuma kaisiŋ, Paiwan kisi
Plate: Kavalan piaz, Puyuma piaɖ, Paiwan piaḍ
Cauldron: Kavalan siuy, Amis sioy, Bunun siuy
Bottle: Amis talid, Puyuma dalilr, Paiwan dalilj
Mat: Basay sukan/sikkam, Puyuma (Kapitul) skam, Paiwan səkam
Pot: Kavalan kapuʔuy, Amis kafo'oy, Puyuma kapuʔuy
Spatula: Kavalan siansi, Amis (Sakizaya) tiansi, Puyuma siyansi
Shrimp: Kavalan qabus, Amis kafos, Puyuma kabus
Field mouse: Kavalan melabaw, Amis kolafaw, Puyuma kulabaw, Rukai kulrabaw
Bamboo shoot: Amis tefo', Paiwan cuvuq
Nest fern: Amis lokot, Puyuma lukutr, Rukai lrukucu, Paiwan lukuc
Autumn maple tree: Kavalan saquR, Amis (Sakizaya) sakul, Bunun sual

One may reconstruct these forms for Proto-Austronesian, since they are shared across primary branches. However, two factors argue against such reconstructions. First, these items are highly concentrated in eastern Taiwan and absent from most other Austronesian languages, which suggests that their distribution reflects areal diffusion rather than inheritance from PAN etyma. Second, the semantic fields of these shared items further undermine the possibility of inheritance: they fall largely into two categories, flora and fauna on the one hand and tools on the other, which are either geographically or culturally specific.
The shared lexical items indicate that the Formosan languages spoken in eastern Taiwan may have interacted intensively in the past. Given the semantic fields of the shared vocabulary, the borrowing may have taken place in everyday contexts through trade or intermarriage among different language communities.

5 Conclusion

While Li (2004) and Tsuchida (2006) both suggest that the reflexes of PAN *piliq ‘to choose’ in Kavalan and Basay, as well as the reflex of *mula ‘to plant’ in Kavalan, are lexical innovations, the present paper suggests that the former may have been borrowed from Malayo-Polynesian through maritime trade between the ancestors of both sides, while the latter is better explained by local contact or parallel development. For the borrowing of *piliq, the main evidence is the unusual paN-/maN- prefixation, which is rarely seen in Formosan languages, together with the irregular ‘KAV l : BAS c’ correspondence. Historical evidence also shows that some Filipinos came with Spanish sailors to northern and northeastern Taiwan and had contact with the indigenous people. A borrowing scenario is therefore more plausible. As for *mula, the word has undergone metathesis in both Kavalan and Basay. Li (2004) suspects that the Basay form was borrowed from Kavalan, since it does not show the expected form **pacuma but is identical to Kavalan paruma, and the present paper supports this view. Moreover, broadening the investigation to all Formosan languages shows that the metathesized form of *mula is attested mostly in the Formosan languages of the northeastern mountains and the eastern coast, which indicates that it is not a shared innovation and is more likely the result of areal diffusion. Although the matter remains debatable, this paper regards the reflexes of PAN *layaR ‘sail’ in Kavalan and Basay as inherited rather than borrowed.
Under the bottom-up reconstruction, their irregularity results from a sporadic regressive liquid assimilation in the development from PAN to Proto-Northern East Formosan, which affected a few words, e.g., *qiCəluR ‘egg’, *laRiw ‘to run’, and *layaR ‘sail’. Finally, the exploration of loanwords in this paper has shown that there was a close relationship among the Formosan languages spoken on the east coast of Taiwan, such as Basay, Kavalan, Amis, Puyuma, and Paiwan. They share not only everyday vocabulary but also terms for flora and fauna, and these shared items are not attested in other Formosan languages. These are thus further examples of areal diffusion in this region, which implies a history of frequent cross-ethnic interaction.

References

Blust, Robert. 1999. Subgrouping, circularity and extinction: Some issues in Austronesian comparative linguistics. In Elizabeth Zeitoun and Paul Jen-kuei Li (eds.), Selected Papers from the Eighth International Conference on Austronesian Linguistics (Symposium Series of the Institute of Linguistics (Preparatory Office), No. 1), pp. 31–94. Taipei: Institute of Linguistics, Academia Sinica.
Blust, Robert. 2004. Austronesian nasal substitution: A survey. Oceanic Linguistics 43:73–148.
Borao, José Eugenio. 2007. An overview of the Spaniards in Taiwan (1626–1642). Proceedings of the Conference on China and Spain during the Ming and Qing Dynasties, Centre of Sino-Western Cultural Studies: 1–17.
Conant, Carlos Everett. 1911. The RGH Law in Philippine Languages. Journal of the American Oriental Society 31.1:70–85.
Dahl, Otto Christian. 1976. Proto-Austronesian (2nd, revised edition). Scandinavian Institute of Asian Studies Monograph Series, No. 15. London: Curzon Press.
Green, Simon. 2022. Formosan indigenous peoples and the Spanish (1626–1642). Ph.D. dissertation, School of Humanities, Language & Global Studies, University of Central Lancashire.
Li, Paul Jen-kuei. 1995.
Formosan vs. non-Formosan features in some Austronesian languages in Taiwan. In Paul Jen-kuei Li, Cheng-hwa Tsang, Ying-kuei Huang, Dah-an Ho, and Chiu-yu Tseng (eds.), Austronesian studies relating to Taiwan (Symposium Series of the Institute of History and Philology, Academia Sinica, No. 3), pp. 651–681. Taipei: Institute of History and Philology, Academia Sinica.
Li, Paul Jen-kuei. 2001. The linguistic position of Basay. Language and Linguistics 2.2:155–171. (In Chinese)
Li, Paul Jen-kuei. 2004. Origins of the East Formosans: Basay, Kavalan, Amis, and Siraya. Language and Linguistics 5.2:363–376.
Li, Paul Jen-kuei. 2014. Texts of the Trobiawan dialect of Basay. Tokyo: Tokyo University of Foreign Studies.
Li, Paul Jen-kuei, and Shigeru Tsuchida. 2006. Kavalan dictionary. Taipei: Institute of Linguistics, Academia Sinica.
Tseng, Li-yang. 2023. A reconstruction of Northeast Formosan. M.A. thesis, Institute of Linguistics, National Tsing Hua University.
Tsuchida, Shigeru. 2006. Kavalan alam ‘bird’: Loanword or inheritance? In Henry Y. Chang, Lillian M. Huang and Dah-an Ho (eds.), Streams converging into an ocean: Festschrift in honor of Professor Paul Jen-kuei Li on his 70th birthday (Language and Linguistics Monograph Series W-5), pp. 585–593. Taipei: Institute of Linguistics, Academia Sinica.
Tsuchida, Shigeru, Yukihiro Yamada and Tsunekazu Moriguchi. 1991. Linguistic Materials of the Formosan Sinicized Populations 1: Siraya and Basai. Tokyo: University of Tokyo.

TOWARDS CONTINUOUS COMMUNITY COLLABORATION IN LANGUAGE DOCUMENTATION: INSIGHTS FROM BUGKALOT/EG̓ONGOT¹
John Michael Vincent S. DE PANO, University of the Philippines Diliman, jsdepano1@up.edu.ph
Patricia Anne Y. ASUNCION, University of the Philippines Diliman, pyasuncion@up.edu.ph

Abstract
This paper describes the history and status of the documentation of the Bugkalot/Eg̓ongot language of the Philippines as an example of the new linguistics (Florey 2008).
In this approach, speakers of a language are given the license to determine how their language is documented, rather than having documentation conducted strictly by researchers from outside the community. Experiences from a series of fieldwork trips for Bugkalot/Eg̓ongot are evaluated, highlighting in particular the value of collaboration between language communities and linguists ‘to ensure the sustainability of language documentation’ (Gallego & Barcelo 2024). The need for linguists in language description is decreasing as speech communities gain the capacity to do it themselves (Barcelo 2024).

Keywords: capacity building, documentation, endangerment, Bugkalot/Eg̓ongot, fieldwork
ISO 639-3 codes: ilk, tgl

1 Introduction: Language documentation in the Philippines

In 1580, Friar Juan de Plasencia was commissioned by the Roman Catholic Church and, by extension, the Spanish government, to undertake what is now known as one of the earliest attempts at documenting a language in the Philippines: the writing of a grammar, a dictionary, and a catechism in/of Tagalog (ISO 639-3 [tgl]). With great help from a child named Miguel de Talavera, Plasencia did the work by comparing the structures and patterns of Tagalog with those of Latin. De Talavera acted as the translator between Plasencia and Tagalog speakers. This led Phelan (1955) to remark that ‘without the aid of Miguel de Talavera, it is doubtful whether Plasencia could have completed his Tagalog texts in a short space of time’ (p. 156). Although not all three works were published, they represent what may now be called the Boasian trilogy (Evans & Dench 2006): the set of works usually produced after linguistic fieldwork in a community. For about three and a half centuries, the description of Philippine languages was carried out solely by Spanish churchmen (Constantino 1971).
In 1901, Blumentritt and Mason published a list of Philippine ethnolinguistic groups and remarked that only ‘a small number of grammars and dictionaries [of Philippine languages] have been published’ (p. 19). Most of the sources for the list are ethnographic accounts by Western scholars. Blumentritt and Mason explain that, despite earlier attempts, the monumental task of describing the structures of the different Philippine languages remained to be done. From the American occupation up until the late 1990s, language description efforts were spearheaded by Christian missionaries, soldiers, and language scholars. Henry Swift, a US Army officer, even remarked that ‘the English and Dutch, as masters in the art of colonizing, make the knowledge of the languages of the natives a necessary part of the curriculum’ (Swift 1909). This highlights the importance of language and language description in colonization. All of this led Liao (2009) to describe the documentation of Philippine languages as ‘hav[ing] been done mostly by non-Filipinos’ (p. 31). Throughout this history, the role of the community in language description and documentation has remained minimal. Constantino (1971) noted that speakers were gaining more agency in language description, but Paz (1984) observed that the number of Filipinos working on minor languages was extremely low. Speakers of these languages are still primarily treated as mere sources of data. This is evident in most institutions in the Philippines, including the Department of Linguistics of the University of the Philippines (UP).
2 Communities and legacy materials
Starting in 1963, Constantino built the Archives of Philippine Languages and Dialects through an extension grant from UP. This body of work, now called the Constantino collection, includes data from dozens of Philippine languages. Some of these are considered endangered according to contemporary vitality assessment tools.
The preservation and use of this collection are especially important given the current linguistic situation in the Philippines. Most inventories estimate that there are more than a hundred indigenous languages in the country, and Ethnologue (Eberhard et al. 2025) puts the figure at 175. A considerable fraction of these are in critical condition: 44 are endangered, 11 are on the verge of extinction, and two are already extinct. This situation calls for the urgent description of the Philippines’ languages and dialects as one of the starting points for language revitalization. It also poses the challenge of helping communities recognize the importance and relevance of language description and documentation to their lived realities. To achieve this, documentation activities must be aligned with the new linguistics (Florey 2008): A close collaboration between a community and linguists, with language users as the prime agents of language documentation. This is especially important for communities speaking endangered languages. Good working relationships and solid collaboration of this kind lead to sustainable language documentation. This is what we aim to continuously achieve as we cooperate with and learn from the Bugkalot/Eg̓ongot community.
3 Bugkalot/Eg̓ongot: A case study
Bugkalot/Eg̓ongot (ISO 639-3 [ilk]) is a Southern Cordilleran language of the Austronesian family. It is spoken by fewer than 6,000 speakers (Philippine Statistics Authority 2010) in the northern Philippine provinces of Nueva Vizcaya, Quirino, Aurora, and Nueva Ecija. It is a threatened language, with weak intergenerational transmission in the domains of family, education, government, and commerce. This status, coupled with the rich history and customs of the ethnolinguistic group, moved the Philippine national government to recognize the community as an indigenous people by law.
The names Bugkalot, Egongot, and Ilongot have all historically been used to refer to the ethnolinguistic group, as a general consensus among speakers was reached only in 2023. Elders and other members of the community have designated Bugkalot/Eg̓ongot as the single term to be used, as it combines the autonym Bugkalot and the exonym Eg̓ongot. Eg̓ongot is derived from the morphemes e ‘from’ and gongot ‘forest,’ and it roughly translates to ‘of/from the forest.’ In this paper, the name Bugkalot/Eg̓ongot is used to acknowledge the agency of the community.
Since the 1960s, the Bugkalot/Eg̓ongot community and the UP Department of Linguistics have worked together on different projects. The first of these was the collection of sentence lists, word lists, and oral history recordings from the different provinces. In 2009, a group of undergraduate students went to Nueva Vizcaya for a field methods course, at the end of which they wrote a preliminary grammar sketch. Less than two decades later, in 2023, five graduate students and two faculty members returned to the province, aiming to arrive at a more detailed description of the language’s phonology and morphosyntax. Most recently, in July 2025, ten undergraduate students and three faculty members conducted fieldwork in Nueva Vizcaya. They collaborated with about a dozen community members on descriptions of the language’s phonology and morphosyntax and on building a dictionary. In addition, four other field activities were organized between July 2023 and August 2024.
The insights in the rest of this paper are grouped into four parts, following the chronological order of activities in the collaboration between the community and the linguists. The first part concerns the challenges encountered so far in digitizing the Constantino collection.
The second part presents observations on the repatriation of legacy materials to the community and the kinship ties retraced through this endeavor. The third part discusses how the orthography of the Bugkalot/Eg̓ongot language was built through the community’s initiatives. Finally, the fourth part describes research dissemination efforts involving both the community and the academe.
3.1 Digitization: Goals and challenges
The Bugkalot/Eg̓ongot documentation project is an offshoot of earlier extension projects spearheaded by the UP Department of Linguistics: the Digital Archiving of Linguistic Fieldwork Cassette Tapes and the Cataloguing and Digitization of Dr. Ernesto Constantino’s Archive. The goal of these three related projects is to preserve previously gathered data, some of which was collected six decades ago. They also aim to provide copies of the digitized recordings to the communities that use these languages. As part of the Constantino collection, the Bugkalot/Eg̓ongot tapes are subject to the challenges that Or and Estrellado (2023) identify for archivists and secondary users of legacy materials. The primary concerns involve the form of the materials, their content, and the analyses of the recorded data. Related to the third is the lack of context. Linguists later had the chance to listen to the recordings, try to annotate them, and share them with community members. In terms of format, audio recordings sometimes need to be compressed and converted to other file formats to meet the compatibility requirements of annotation software. Such steps are sometimes necessary to access and process the contents of the recordings, but they may cause damage to, or loss of, parts of the recordings. Another challenge related to file compression is file size.
Because the files remain relatively large, common devices such as smartphones are not well suited to handling the amount of data being shared with the community. Thus, digitized recordings and other pertinent files are temporarily stored on an external hard drive that has been turned over to the Bugkalot/Eg̓ongot community. However, the collaboration will produce more files in the future. The team therefore has to find an optimal storage solution that works not only for language researchers but also for community members. Concerns about access and metadata tagging also need to be addressed, because some recordings contain personal histories or sensitive stories about the relationships and dynamics between different community members and families.
Likewise, inconsistent transcription of the collected data creates additional obstacles to the immediate documentation of the language, although this may reflect the realities of fieldwork at the time. Most of the data for the Constantino collection were gathered in the 1960s, when paper and magnetic reel tapes were the common materials. The linguist also had to travel to different provinces carrying these materials. Together, these conditions may have led to differing transcription norms and formatting. During the recent fieldwork in Bugkalot/Eg̓ongot-speaking areas, our team attempted to apply a uniform transcription method. We worked with community members to achieve consistency while also noting natural variation. As for metadata, a third (around 20 hours) of the digitized Bugkalot/Eg̓ongot data from the Constantino collection now has an in-depth description of its contents, but much work remains to be done. The community’s insights and the audio commentary embedded in the tapes have been of great help.
Additional information recorded includes the location (province, town, barangay) of the data collection session, the date, and brief descriptions (language profile, physical features) of the language consultants. However, not all recordings include such information.
3.2 Repatriation: Conducting fieldwork, six decades later
In early 2023, an initial consultation was held with the faculty of the local state university, Nueva Vizcaya State University (NVSU), and with members of the Bugkalot/Eg̓ongot community in the province. It served as an avenue to discuss the details of the project and identify its primary targets: training in language documentation and description, planning for orthography development, and repatriation of data from the collection currently housed at the UP Department of Linguistics. During the 2023 fieldwork by graduate students, more information about the Constantino collection was shared with Bugkalot/Eg̓ongot community members residing in Nueva Vizcaya. The interaction was also an opportunity to learn from community members about the contents of the recordings and what they reveal about their culture and customs. Equally important was the discussion of what can be done with these materials, specifically how they can contribute to the documentation and description of the language. By the end of this fieldwork, as mentioned earlier, a one-terabyte external hard drive was turned over to community members. It contained the digitized Constantino recordings and the additional video and audio recordings made during two weeks in July 2023. Detailed metadata for these files were created during the undergraduate fieldwork. Besides listening to the recordings, community members commented on their contents and provided further information crucial to data organization, language description, and access restrictions.
3.3 Description: A look at orthography-building
Workshops were conducted between 2023 and 2025 in Nueva Vizcaya and Quirino. In these workshops, elders, teachers, and translators from the different provinces identified and described the features of their language’s phonology, morphology, and syntax to arrive at an orthography guide. The series was designed, with the assistance of several government institutions and state universities, to train community members in linguistic analysis. It was an opportunity to identify variation among the Bugkalot/Eg̓ongot varieties and reach internal agreements on orthographic representation. It was also a chance to understand historical connections among the Bugkalot/Eg̓ongot communities of three different provinces. A recurring challenge during the orthography-building workshops was that the participants came from different provinces, which was reflected in their linguistic and social relationships. However, as the facilitators observed, the community members worked with each other throughout the series of workshops to achieve a common goal: to produce an orthography representative of their language’s unique sounds and features. For instance, there is dialectal allophony between /j/ and /z/, and between /w/ and /v/, that coincides with geographical boundaries. Speakers from Nueva Vizcaya and Quirino say [be.ja] for ‘knowledge,’ while those from Aurora say [be.za]. Likewise, speakers from Nueva Vizcaya and Aurora say [ke.va] for ‘walk,’ while those from Quirino say [ke.wa]. In the first in-person workshop, community members from the three provinces discussed the nuances and implications of the choice. They then agreed on a single grapheme for each pair.
3.4 Dissemination: To the community, to the linguists, and to the public
The collaboration has also created opportunities for both Bugkalot/Eg̓ongot community members and linguists to disseminate their findings in different venues. Beginning with welcoming a small class of graduate students, members of the Bugkalot/Eg̓ongot community have shared their stories with a more diverse audience. Sir Fred is a culture bearer and Bible translator from Nueva Vizcaya. He was one of the plenary speakers at the International Conference on Language Endangerment (ICLE) in October 2024. He was also a panel member in a roundtable discussion at the 16th International Conference on Austronesian Linguistics (ICAL) in June 2024. In addition, he co-presented this community-led documentation project at the 2024 Language Documentation and Archiving (LD&A) Conference. Aiming to make the community’s collective voice heard, Sir Fred works patiently with linguists and provides important insights into the future directions of this collaboration. Related to this project are two conference presentations by members of the UP Department of Linguistics and two journal articles by one project member. The initial findings of the undergraduate students who went to Nueva Vizcaya in 2025 were also presented at a student research colloquium in August 2025.
4 Future directions
This section lists the most recent activities organized under the continuing cooperation between the Bugkalot/Eg̓ongot community and the UP Department of Linguistics. Two graduate students are writing their theses on the language. One thesis is on the subcategorization of verbs. The other is a dialectological study that traces the use of the language in the provinces of Nueva Vizcaya, Quirino, Aurora, and Nueva Ecija. In addition, a group of undergraduate students went to Nueva Vizcaya for their field methods class.
This fieldwork focused on building general metadata for the Constantino recordings and on particular grammatical aspects such as pronoun paradigms and clitic ordering. Key informants from the community are also writing about the prospects of building a dictionary with the aid of computer software and are helping to update the grammar sketch produced in 2009. Meanwhile, a co-authored paper on the repatriation of Bugkalot/Eg̓ongot legacy language materials was presented at the Philippine Studies Conference in September 2025. Apart from Ernesto Constantino, the anthropologist Renato Rosaldo also produced several works about the ethnolinguistic group based on his fieldwork in the 1960s. Rosaldo’s digitized collection was repatriated to the community in Kakidugen, Nueva Vizcaya, in June 2025 with the assistance of linguist Daniel Kaufman. Likewise, in 2024, a descendant of Otto Johns Scheerer, the first professor emeritus of Philippine linguistics, returned a digitized version of an 1893 catechism written in Bugkalot/Eg̓ongot.
The community-led description of Bugkalot/Eg̓ongot is not the only project of this kind at the UP Department of Linguistics. Faculty members have previously worked with and for other communities, such as the Mangyan communities in Mindoro, in response to specific needs. Rather than imposing an outside agenda, these projects treat it as crucial that communities identify what they need and want. These needs are also an important reason for communities to reach out to and collaborate with institutions and organizations. This kind of motivation ultimately contextualizes and guides the direction of the partnership. It serves as a basis for setting clear goals and addressing the actual needs of communities.
It seems fitting to end with a reminder from Consuelo Paz (2005), a pillar of Philippine linguistics who spent years interacting with and learning from ethnolinguistic communities in different parts of the country: ‘The best attitude a field researcher can have is to consider or treat those who help them as co-researchers, be it an individual or a community’ (p. 4). The community may not have the technical training that linguists or researchers have, but that does not mean its members cannot contribute to or participate in linguistic analysis. As speakers, they have ways to describe what their language does and explain how it works. The Bugkalot/Eg̓ongot are well aware of variation in certain sounds across the three provinces, and they treat certain words and expressions as characteristic of particular varieties. Beyond judging what is grammatical or ungrammatical, well-formed or ill-formed, they try to make sense of such judgments and consult each other about them. They acknowledge that their language is changing. They also know that they want it to remain spoken not only during rituals or church-related activities: they assert that their language has to be an important part of their lived reality. Thus, we all look forward to continuing to work with and learn from each other.
References
Blumentritt, Ferdinand, & Mason, O. T. 1901. List of the native tribes of the Philippines and of the languages spoken by them. US Government Printing Office.
Constantino, Ernesto. 1971. Tagalog and other major languages of the Philippines. In Linguistics in Oceania, pp. 112–54. De Gruyter Mouton. https://doi.org/10.1515/9783111418827-005
Eberhard, David M., Simons, Gary F., & Fennig, Charles D. (Eds.). 2025. Ethnologue: Languages of the world (27th ed.). SIL Global.
Evans, Nicholas, & Dench, Alan. 2006. Introduction: Catching language. In F. K. Ameka, A. Dench, & N. Evans (Eds.), Catching language: The standing challenge of grammar writing, pp. 1–39. Mouton de Gruyter.
Florey, Margaret. 2008. Language activism and the “new linguistics”: Expanding opportunities for documenting endangered languages in Indonesia. Language Documentation and Description, 5.
Florey, Margaret, & Himmelmann, Nikolaus. 2010. New directions in field linguistics: Training strategies for language documentation in Indonesia. In Endangered languages of Austronesia, pp. 121–40. Oxford University Press.
Gallego, Maria Kristina S., & Barcelo, Frederick. 2024, August 4. Developing a community-led documentation project for Bugkalot/Eg̓ongot [Conference presentation]. 2024 Language Documentation and Archiving (LD&A) Conference, Berlin.
Liao, Hsiu-chuan. 2009, March 12–14. The state of the art of the documentation of Philippine languages [Conference presentation]. The First International Conference on Language Documentation and Conservation, Hawai‘i.
Or, Elsie Marie, & Estrellado, Dustin Matthew. 2023. Legacy language materials in the Ernesto Constantino Collection: Challenges and lessons for building a Philippine language archive. The Archive Journal 4.1–2:157–207. https://journals.upd.edu.ph/index.php/archive/article/view/9594
Paz, Consuelo J. 1984. Mga unang pag-aaral tungkol sa mga maynor na wika [Early studies on the minor languages]. The Archive Journal.
Paz, Consuelo J. 2005. Gabay sa fildwurk [A guide to fieldwork]. University of the Philippines Press.
Phelan, John Leddy. 1955. Philippine linguistics and Spanish missionaries, 1565–1700. Mid-America, A historical review 37:153–70.
Philippine Statistics Authority. 2010. 2010 census of population and housing.
Retrieved from https://psa.gov.ph/system/files/main-publication/2010_PHIILIPPINES_FINAL%2520DF.pdf
Swift, Henry. 1909. A study of the Iloco language. Washington: Byron S. Adams.
THE VITALITY OF ETHNIC LANGUAGES IN MULTILINGUAL SOCIETIES IN THE SOUTHERN PHILIPPINES1
Atsuko UTSUMI, Meisei University, utsumi@lc.meisei-u.ac.jp
Nelson DINO, Mindanao State University Tawi-Tawi College of Technology and Oceanography, nelsondino@msutcto.edu.ph
Abstract
This paper explores the language use of young people in the southern Philippines and the vitality of indigenous languages, specifically Tausug (Sinug). The data come from a sociolinguistic study conducted in January 2024 in Zamboanga City and on Tawi-Tawi Island. Questionnaires were distributed to and collected from college students: 87 students in Zamboanga and 96 in Tawi-Tawi. The research shows that indigenous languages in the southern Philippines have strong vitality with family and friends and in various private situations. At the same time, English and Tagalog are used with increasing frequency among young people in the Philippines, especially those pursuing higher education. Tausug, one of these indigenous languages, showed especially strong vitality in both locations. We argue that Tausug’s vitality stems from its deep regional roots and its use in early childhood education.
Keywords: sociolinguistics, language vitality, language maintenance
ISO 639-3 codes: eng, tgl, tsg
1 Aims of this paper
This study aims to answer the following three research questions, focusing on two locations in the southern Philippines. First, how do speakers select a language in each sociolinguistic situation? Second, how well do young people maintain their native language? Third, what factors condition the level of linguistic vitality?
To answer these questions, sociolinguistic questionnaires were distributed to college students born between 1994 and 2004. The respondents were 87 college students in Zamboanga City and 96 college students on Tawi-Tawi Island. The research was conducted in January 2024 with the help of staff from West Mindanao State University and Mindanao State University Tawi-Tawi College of Technology and Oceanography. As will be shown in sections 5 and 6, young people in the southern Philippines retain their native languages remarkably well, but the vitality and wide range of domains of use of the Tausug language stand out. To explain this strength, a brief history and description of the research area is presented in section 2, followed by the linguistic composition of the areas in section 3. The survey method is described in section 4, and the results are shown in sections 5 (Zamboanga City) and 6 (Tawi-Tawi Island). Section 7 summarizes the linguistic situation and language use in the areas.
2 Method of the survey: questionnaire and respondents
2.1 Method and the content of the questionnaire
A sociolinguistic questionnaire with 31 questions was used in this study. A total of 87 college students at Mindanao State University in Zamboanga City,2 and 96 college students at Mindanao State University Tawi-Tawi College of Technology and Oceanography on Tawi-Tawi Island participated. Respondents were born between 1994 and 2004. The questionnaire comprised three parts. The first part was the face sheet, which contained thirteen questions about the ethnic background of the respondent. The information included name/initials, age (date of birth), gender, ethnicity of each parent and spouse, marital status, education, occupation, place of birth, place of residence during childhood, current place of residence, and experience of living outside the region. The second part consisted of questions about language use, in which three variables were considered.
The first variable was the addressee. These questions asked which language(s) respondents used with fathers, mothers, grandparents, siblings, friends, teachers, strangers, people of the same ethnicity (of younger, the same, and older generations), and people of different ethnicities. The second variable was the situation: These questions asked which language(s) are used at schools, banks, municipal offices, ceremonies, small shops, and shopping malls in the city. The third variable concerned language use in private domains, such as praying alone, counting money, and talking to babies who are not yet able to speak. The third part of the questionnaire consisted of questions about language attitudes. These covered the respondents’ most competent language and the language they use most often. They also covered their favorite singer and the language of that singer’s songs, and whether they own books in their own ethnic language. The remaining questions asked about their favorite language, the language they hope their children will acquire as their native language, and its prospects for vitality. The answers were diverse and will not be dealt with in this paper. For most questions in the first and second parts, respondents could choose multiple languages. Respondents tended to select only one language for questions about the language they use with family members. In contrast, they often selected multiple languages for questions such as ‘the most competent language’, ‘the most often used language’, ‘the language used with friends’, and ‘the language used in small shops’.
2.2 Ethnic background of respondents
The ethnic background of the respondents is summarized in Tables 1 and 2. Most respondents were born and raised in the southern Philippines. Some have lived outside the region, for example in Manila and Basilan, and a few have lived abroad, mainly in neighboring Malaysian Borneo.
Respondents predominantly belong to one of the indigenous groups of the southern Philippines. Tausug is the largest group in the respondents’ parental generation, accounting for 32% to 55% of parents. In Zamboanga City, Bisaya and Chavacano are the second and third largest groups, comprising 10% to 22%, and Sama, at 9%, is the fourth largest. On Tawi-Tawi Island, Sama is the second largest indigenous group, comprising 38% to 49%. Other ethnicities, such as Mapun, Bisaya, Chavacano, and Tagalog, account for only small proportions (2% to 7%). Intra-ethnic marriages are very common: 60% of the students in Zamboanga and 78% on Tawi-Tawi have parents of the same ethnicity. This factor is critical for the prospects of indigenous languages. The more often people marry within their own ethnic group, the more stable indigenous languages are. In such families, parents use their native indigenous language at home, and their children acquire it and keep using it until adolescence. Another important factor is the community in which a speaker lives. If the majority of the population belongs to one indigenous group, residents use that language very often. From this perspective, Tausug, Bisaya, and Chavacano seem to fulfill the requirements for language maintenance in Zamboanga. On Tawi-Tawi, on the other hand, only Tausug and Sama appear to have strong vitality. The prospects for other languages, such as Mapun, Chavacano, Bisaya, and Tagalog, depend on language use in the community and at home.
Table 1: Ethnicity of respondents’ parents in Zamboanga City
Ethnicity | Father (% of 82 known ethnicities) | Mother (% of 82 known ethnicities) | Intra-ethnic marriage
Tausug | 45 (55%) | 34 (41%) | 30 (66%)
Bisaya | 11 (13%) | 18 (22%) | 7 (63%)
Chavacano | 9 (11%) | 8 (10%) | 6 (66%)
Sama | 7 (9%) | 7 (9%) | 4 (57%)
Tagalog | 2 (2%) | 2 (2%) | 1 (50%)
Others | 8 (10%) (Ilongo 2, Yakan 2, Iranun 1, Kolibungan 1, Mandarin 1, Subanen 1) | 13 (16%) (Yakan 6, Ilongo 1, Iranun 1, Kolibungan 1, Waray 1, Maranao 1, Mangindanao 1, Batagen 1) | 1 (12%)
Total | 82 (100%) | 82 (100%) | 49 (60%)
Table 2: Ethnicity of respondents’ parents in Tawi-Tawi
Ethnicity | Father (% of 87 known ethnicities) | Mother (% of 87 known ethnicities) | Intra-ethnic marriage
Tausug | 37 (46%) | 28 (32%) | 28 (76%)
Sama | 33 (38%) | 43 (49%) | 32 (97%)
Mapun | 6 (7%) | 6 (7%) | 5 (83%)
Chavacano | 3 (3%) | 2 (2%) | 1 (33%)
Tagalog | 2 (2%) | 2 (2%) | 2 (100%)
Bisaya | 3 (3%) | 6 (7%) | 1 (16%)
Others | Subanen 2, Ilongo 1, unknown 9 | Unknown 9 |
Total | 96 | 96 | 68 (78%, out of 87 known ethnicities)
3 Results in Zamboanga City
This section presents the answers to the questionnaire collected in Zamboanga City. Zamboanga City has more than one million inhabitants and is home to the Chavacano people, whose language is a Spanish-based creole. Today, the Bisaya, Tausug, and Sama populations are increasing. The results are presented in four parts. Section 3.1 covers answers to the questions on the language(s) of highest proficiency and the most frequently used languages. Section 3.2 presents language use at home and in the surrounding neighborhood. Section 3.3 covers language use in the most private situations, such as saying a prayer alone, and section 3.4 covers language use in public domains.
In the following description, Tagalog and Filipino, the standardized version of Tagalog, are not differentiated, because respondents themselves consider them the same speech variety. Tagalog is used as the language name and includes Filipino.
3.1 Competent languages and frequently used languages of respondents
Respondents were allowed to give multiple answers to the questions “What language(s) are you most competent in?” and “What is/are the language/s you use most often?”. Almost all respondents named more than one language. The results are shown in Table 3. Tagalog and Tausug were chosen most frequently as the languages of highest proficiency and most frequent use. Respondents were competent in Tagalog and used it very often because of its use in the public domain, as we will see in section 3.4. Of the respondents, 62% chose Tagalog as the language in which they were most competent, and 55% selected it as the language they used most frequently. Tausug also showed strong vitality because of its use at home and in the surrounding neighborhood, as will be shown in section 3.2. Many respondents have a Tausug ethnic background: 55% have a Tausug father, and 41% have a Tausug mother. This explains why 55% selected Tausug as their most competent language and 52% chose it as their most used language. Respondents with a background in one of the other indigenous groups, such as Chavacano and Sama/Bajau, also selected their native language as their most competent and most used language. These results show that young people in the southern Philippines are still able to speak their native languages with a high level of proficiency. English was also selected by a good number of respondents. Of these, 21 (26%) selected it as the language in which they were most competent, and 6 (7%) chose it as their most frequently used language.
This is understandable because the use of English is mandatory in Philippine universities, and the respondents are university students. Children in the Philippines start learning English in elementary school, and it is the medium of instruction in mathematics and science classes. Some students attended high schools that offer a special curriculum focusing on English, mathematics, and science. This explains why a good portion of the students have a good command of English.
Table 3: Competent and frequently used languages in Zamboanga City
Language | Most competent language (multiple answers) | Most frequently used language (multiple answers)
Tagalog | 51 (62%) | 45 (55%)
Tausug | 45 (55%) | 43 (52%)
Chavacano | 7 (9%) | 3 (4%)
Sama | 5 (6%) | 3 (4%)
English | 21 (26%) | 6 (7%)
Others (including Bisaya, Yakan, Iranun, Maranao, Kolibugan, Arabic, Malay) | 16 (20%) | 15 (18%)
3.2 Language use at home and in the community
Although the authors did not instruct respondents to select only one language, most of them did so, except for a small number of answers to the question “What language do you use with your grandparents/father/mother?”. In contrast, a third of them chose multiple languages as the languages they used with their siblings. The answers are summarized in Table 4. Students in Zamboanga City showed a strong tendency to use their native language. Tausug was selected as the language used with mothers by considerably more respondents (60%) than have Tausug mothers (41%). This suggests that if either parent is Tausug, the language tends to be chosen. The other indigenous languages also displayed strong vitality at home and in the community: the percentages for Bisaya, Sama, Chavacano, and Yakan corresponded to the ethnic backgrounds of the respondents.
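Because multiple answers were allowed, each cell in these tables counts the respondents who named a language, and its percentage is taken over all respondents (82 in Zamboanga City, 96 in Tawi-Tawi) rather than over all answers, so a column can total more than 100%. The following minimal Python sketch illustrates that tallying, assuming each respondent's answer is stored as a set of language names. The function name and the four sample answers are hypothetical illustrations, not the authors' procedure or survey data.

```python
from collections import Counter


def tally_multiple_response(responses: list[set[str]]) -> dict[str, tuple[int, int]]:
    """Return {language: (count, percent)} for a multiple-answer question.

    count   = number of respondents who named the language
    percent = count as a share of all respondents, rounded to a whole number
              (Python's round() sends exact halves to the nearest even integer)
    """
    n_respondents = len(responses)
    # Each answer is a set, so a respondent adds at most one to each language;
    # a respondent naming two languages adds one to both rows.
    counts = Counter(lang for answer in responses for lang in answer)
    return {lang: (c, round(100 * c / n_respondents)) for lang, c in counts.items()}


# Hypothetical answers from four respondents; the second named two languages.
answers = [{"Tausug"}, {"Tausug", "Tagalog"}, {"Tagalog"}, {"Sama"}]
table = tally_multiple_response(answers)

for lang in sorted(table):
    count, pct = table[lang]
    print(f"{lang} {count} ({pct}%)")

# The percentages sum to 125%, not 100%, because one respondent gave two answers.
print(f"Total: {sum(pct for _, pct in table.values())}%")
```

With the Zamboanga City figures, for example, 51 of 82 respondents naming Tagalog gives round(100 * 51 / 82) = 62, matching the 62% in Table 3.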
Although 62% of respondents selected Tagalog as their most competent language, only 11% used it with their grandparents and 12% with their fathers. The percentage rose to 18% with mothers and 26% with siblings. Tagalog thus seems to be used more often with younger family members, which suggests that the national language will be used more frequently in the future. This is also supported by the fact that a third of students answered that they used more than one language with their siblings; presumably, they used their native language together with Tagalog.
Table 4: Language use at home in Zamboanga City
Language | To grandparents | To father | To mother | To siblings
Tausug | 51 (62%) | 45 (53%) | 49 (60%) | 50 (60%)
Bisaya | 13 (16%) | 10 (12%) | 4 (5%) | 9 (11%)
Sama | 9 (11%) | 6 (7%) | 6 (7%) | 7 (9%)
Tagalog | 9 (11%) | 10 (12%) | 15 (18%) | 21 (26%)
Chavacano | 7 (9%) | 5 (6%) | 6 (7%) | 10 (12%)
English | 1 (1%) | 0 (0%) | 2 (2%) | 8 (10%)
Others | Yakan 3, Ilongo 2, Iranun 1, Kolibugan 1 | Yakan 2; Malay, Mandarin, Iranun, Maranao, Kolibugan, Ilongo 1 each | Yakan 2, English 2; Maranao, Iranun, Kolibugan, Waray 1 each | Yakan 2; Kolibugan, Maranao, Iranun 1 each
The answers to the question “What language(s) do you use with people of the same ethnicity?” showed the same tendency as above, but more respondents selected multiple languages: 35–40% of them selected multiple languages for people of the same generation, as shown in Table 5. In sharp contrast, 88% of respondents answered that they used multiple languages with their close friends. They appeared to have friends from various ethnic backgrounds, and if the addressee spoke a different indigenous language, they preferred to use Tagalog.
Speakers appeared to choose a language according to the addressee's language, but many also used Tagalog together with their native language even with people of the same ethnicity: Tagalog was selected by a little less than half of the respondents. The age difference between speaker and addressee did not affect language choice. English was also used more often than at home, and nearly one in five students used English with close friends.
Table 5: Language use with neighbors in Zamboanga City
Language | To elders of the same ethnicity | To the same ethnicity and generation | To younger people of the same ethnicity | To close friends
Tausug | 46 (56%) | 44 (54%) | 47 (57%) | 52 (63%)
Bisaya | 8 (10%) | 8 (10%) | 7 (9%) | 7 (9%)
Sama | 8 (10%) | 6 (7%) | 6 (7%) | 4 (5%)
Tagalog | 34 (41%) | 38 (46%) | 36 (44%) | 61 (74%)
Chavacano | 4 (5%) | 6 (7%) | 5 (6%) | 12 (15%)
English | 7 (9%) | 10 (12%) | 7 (9%) | 15 (18%)
Others | Yakan 2; Maranao, Kolibugan 1 each | Yakan 2, Kolibugan 1 | Yakan 2, Kolibugan 1 | Arabic 2, Kolibugan 1
3.3 Language use in private domains
The questions on language use in private domains were “What language do you use when you pray alone?”, “What language do you use when you are angry?”, “What language do you use with babies who are not able to speak yet?”, and “What language do you use when you count money or things?”. The results are shown in Table 6. These situations were intended to capture respondents' language use when they are alone. For saying a prayer alone, Tausug and Arabic were the languages chosen most often. The use of Arabic reflects the large Muslim population of the southern Philippines, which uses Arabic in prayer. The Tausug people are predominantly Muslim, and the high rate of Tausug in prayer (45%) shows that respondents also use their native language alongside Arabic in this situation.
The Catholic respondents chose their native languages alongside English, but the Chavacano speakers (seven respondents) did not choose their family language at all. Respondents tended to use their family languages when angry and when talking to babies who could not yet speak; however, Tagalog also appears to be gaining ground, as a third of them answered that they used it with babies. Interestingly, young people in the survey often used English with babies; the reason for this is not known. English was selected by 45% of respondents for talking to babies, and the percentage rose to 84% for counting money or things. In contrast, the use of indigenous languages decreased significantly in the counting situation, which reflects the strong influence of school education, where calculation and mathematics are taught in English. Tausug and Tagalog were nevertheless still used for counting by more than a quarter of respondents (30% and 27%, respectively).
Table 6: Language use in private situations in Zamboanga City
Language | Saying a prayer alone | When angry | To babies who do not yet speak | When counting money/things
Tausug | 37 (45%) | 45 (55%) | 38 (46%) | 25 (30%)
Bisaya | 7 (9%) | 9 (11%) | 0 (0%) | 2 (2%)
Sama | 5 (6%) | 4 (5%) | 4 (5%) | 0 (0%)
Tagalog | 12 (15%) | 28 (34%) | 29 (35%) | 22 (27%)
Chavacano | 0 (0%) | 5 (6%) | 0 (0%) | 2 (2%)
English | 13 (16%) | 11 (13%) | 37 (45%) | 69 (84%)
Others | Arabic 33 (40%); Iranun, Maranao, Kolibugan 1 each | Yakan 3; Kolibugan, Thai, Arabic 1 each | Magindanao, Iranun, Kolibugan 1 each | Arabic 1
3.4 Language use in public domains
The results of four questions on language use in public domains are presented in Table 7. The questions were “What language do you use when talking to strangers?”, “What language do you use with your friends before the ceremonies begin?”, “What language do you use with your friends at school recess?”, and “What language do you use with your teacher at school?”. Between 20% and 90% of respondents selected more than one language.
These answers differed significantly from those for private domains: the use of indigenous home languages decreased, while the use of Tagalog and English increased. In every public domain, 78% to 98% of respondents used Tagalog, which indicates that the national language has attained the status of a common language between people of different ethnic backgrounds. Almost all respondents answered that they used Tagalog with strangers, but 23% also used English, and 10% used Tausug alongside Tagalog. At school, Tagalog and English were the two main languages. Overall, respondents mainly used Tagalog with their friends (78% to 82%), while English was used mostly with teachers (78%). Although students are supposed to use English in class, the percentage of respondents who used Tagalog exceeds that of those who used English: even with their teachers, 96% used Tagalog while 78% used English. Tausug showed its vitality in public domains: 57% used it with friends at school recess, and 38% with friends before a ceremony starts.
Table 7: Language use in public domains in Zamboanga City (1)
Language | To strangers | To friends before ceremonies | To friends at school recess | To teachers at school
Tausug | 8 (10%) | 31 (38%) | 47 (57%) | 8 (10%)
Bisaya | 2 (2%) | 4 (5%) | 8 (10%) | 1 (1%)
Sama | 0 (0%) | 2 (2%) | 0 (0%) | 0 (0%)
Tagalog | 80 (98%) | 67 (82%) | 64 (78%) | 79 (96%)
Chavacano | 3 (4%) | 5 (6%) | 7 (9%) | 1 (1%)
English | 19 (23%) | 18 (22%) | 10 (12%) | 64 (78%)
Others | 0 (0%) | Arabic 1 (1%) | Arabic 1 (1%) | 0 (0%)
As Table 8 shows, Tagalog was also dominant in other public situations, such as talking to a clerk, shopping in the city center, or shopping at small shops in the community. In a bank, an office, or a shopping center, the students used Tagalog or English. With a clerk in a bank or office, English was used by 61%, but Tagalog was even more preferred, at 88%.
At a shopping center, Tagalog was almost always used, but English and Tausug were also spoken by 16–17% of respondents. Other indigenous languages were rarely used in these situations. When shopping at a small shop in the community, the use of indigenous languages, including Tausug, Bisaya, and Chavacano, rose, as communities often consist of people of the same ethnicity. Even so, Tagalog remained the most used language, at 79%.
Table 8: Language use in public domains in Zamboanga City (2)
Language | To a clerk at a bank/office | Shopping in the city center | Shopping at a small shop in the community
Tausug | 1 (1%) | 13 (16%) | 27 (33%)
Bisaya | 0 (0%) | 2 (2%) | 11 (13%)
Sama | 0 (0%) | 0 (0%) | 1 (1%)
Tagalog | 72 (88%) | 79 (96%) | 65 (79%)
Chavacano | 0 (0%) | 5 (6%) | 10 (12%)
English | 50 (61%) | 14 (17%) | 5 (6%)
Others | 0 (0%) | 0 (0%) | Yakan 2, Kolibugan 1
3.5 Summary: Language Use in Zamboanga City
Zamboanga City is a typical mid-sized city in the southern Philippines. Several major indigenous groups make up its original inhabitants, while people from outside the region frequently arrive seeking education and job opportunities. Tausug, Bisaya, Chavacano, and Sama are the main indigenous groups, but descendants of Ilongo, Yakan, Magindanao, Iranun, and other indigenous groups are also present. This study shows that the younger generation still uses their native languages at home and in other private domains. They tend to speak indigenous languages with people of the same ethnic group, regardless of the addressee's age. They also use their native languages in situations where no one else is listening, such as saying a prayer alone or when angry; the exceptions are Arabic for Muslim prayer and English for counting money or things. As expected, Tagalog and English are the predominant languages in public domains. Tagalog is selected in every public situation, whereas English is used mainly with school teachers and office clerks.
Although Tausug is one of the region's indigenous languages, it is also selected in the public domains more often than any other indigenous language.
4 Language use in Tawi-Tawi
In Tawi-Tawi, Tausug and Sama (Bajau) make up most of the local population, but other indigenous groups, such as Mapun, Chavacano, and Bisaya, are also present. The composition of indigenous groups therefore differs somewhat from that of Zamboanga City. The questionnaire used the same set of questions as in Zamboanga City, and multiple answers were allowed for every question. Section 4.1 presents answers regarding the language(s) of highest proficiency and frequency, and section 4.2 summarizes respondents' language use at home and in the surrounding neighborhood. Section 4.3 covers language use in the most private domains, and section 4.4 covers language use in public domains.
4.1 Competent languages and frequently used languages of respondents
Respondents' answers to the questions “What language(s) are you most competent in?” and “What is/are the language/s you use most often?” are given in Table 9. Many of them selected more than one language. As the languages in which respondents feel most competent, Tagalog, Tausug, and Sama were chosen most frequently (44%, 46%, and 40%, respectively). Respondents with other ethnic backgrounds chose their own indigenous languages as well. As the most frequently used language, Tausug was chosen by the largest number, 48% of respondents; Sama was chosen by a third and Tagalog by a quarter. English was selected as the most competent language by 14% of respondents, but only 6% chose it as a frequently used language.
Table 9: Competent and frequently used languages in Tawi-Tawi
Language | Most competent language (multiple answers) | Most frequently used language (multiple answers)
Tagalog | 42 (44%) | 24 (25%)
Tausug | 44 (46%) | 46 (48%)
Sama | 38 (40%) | 31 (32%)
Mapun | 8 (8%) | 8 (8%)
English | 13 (14%) | 6 (6%)
Others | Bisaya 3, Chavacano 1, Malay 1 | 0 (0%)
4.2 Language use at home and in the community
The answers about language use at home reflect the ethnic composition of the students in the Tawi-Tawi study. Tausug and Sama were each selected by around 40% of respondents for talking to grandparents, parents, and siblings, as shown in Table 10. Mapun, Bisaya, and Chavacano were spoken at home by members of those groups. Tagalog was also used at home by students from Luzon and by those born into inter-ethnic families. Tagalog and English were more common with members of the same generation, such as siblings, than with older generations: a quarter of respondents selected Tagalog and a tenth selected English for speaking to their siblings.
Table 10: Language use at home in Tawi-Tawi
Language | To grandparents | To father | To mother | To siblings
Tausug | 34 (35%) | 42 (44%) | 39 (41%) | 47 (49%)
Sama | 44 (46%) | 37 (39%) | 44 (46%) | 39 (41%)
Mapun | 7 (7%) | 7 (7%) | 7 (7%) | 8 (8%)
Tagalog | 10 (10%) | 7 (7%) | 9 (9%) | 25 (26%)
Bisaya | 4 (4%) | 4 (4%) | 4 (4%) | 4 (4%)
English | 3 (3%) | 4 (4%) | 5 (5%) | 10 (10%)
Chavacano | 2 (2%) | 1 (1%) | 1 (1%) | 1 (1%)
The languages students used with people of the same ethnicity differed little from those used at home, except for Tagalog, which was used by more than 40% of respondents, as shown in Table 11. This reflects the status of Tagalog as the language used with people outside one's family. Indigenous languages were still used vigorously, which indicates their strong vitality within the community.
Table 11: Language use in the neighborhood in Tawi-Tawi
Language | To elders of the same ethnicity | To the same ethnicity and generation | To younger people of the same ethnicity | To close friends
Tausug | 33 (34%) | 37 (39%) | 40 (42%) | 53 (55%)
Sama | 32 (33%) | 26 (27%) | 31 (32%) | 32 (33%)
Mapun | 6 (6%) | 5 (5%) | 3 (3%) | 9 (9%)
Tagalog | 40 (42%) | 46 (48%) | 42 (44%) | 62 (65%)
Bisaya | 3 (3%) | 2 (2%) | 2 (2%) | 1 (1%)
English | 5 (5%) | 13 (14%) | 3 (3%) | 30 (31%)
Chavacano | 0 (0%) | 1 (1%) | 0 (0%) | 0 (0%)
4.3 Language use in the most private situations
When asked which language they used when no one was around, students commonly named their native languages, as shown in Table 12. When saying Muslim prayers alone, around 10% used Arabic alongside their native language. The use of Tagalog increased in situations such as “when angry”, “when talking to babies”, and “when counting”. English was the language most often selected for counting money or things, which reflects its role as the medium of instruction at school, where counting and calculation are taught mostly in English. It was also preferred when respondents spoke to babies who had not yet learned to speak. Apart from these cases, language selection followed the same tendency as within the family and the neighborhood.
Table 12: Language use in private domains in Tawi-Tawi
Language | Saying a prayer alone | When angry | To babies who do not yet speak | When counting money/things
Tausug | 39 (41%) | 44 (46%) | 25 (26%) | 15 (16%)
Sama | 34 (35%) | 34 (35%) | 27 (28%) | 15 (16%)
Mapun | 7 (7%) | 7 (7%) | 3 (3%) | 1 (1%)
Tagalog | 10 (10%) | 24 (25%) | 25 (26%) | 16 (17%)
Bisaya | 2 (2%) | 2 (2%) | 1 (1%) | 1 (1%)
English | 11 (11%) | 17 (18%) | 41 (43%) | 76 (79%)
Chavacano | 0 (0%) | 1 (1%) | 0 (0%) | 0 (0%)
Others | Arabic (10%) | Malay 1 (1%) | 0 (0%) | 0 (0%)
4.4 Language use in public domains
As in Zamboanga City, Tagalog was the primary language for communicating with strangers in Tawi-Tawi, as Table 13 shows. Tausug and English were also commonly used in this situation, by 30% and 26% of respondents, respectively.
Tagalog was also preferred for speaking with friends before a ceremony (70%) or at school recess (75%). These figures suggest that it serves as the medium of communication between different ethnic groups. Although English is the medium of instruction and students are supposed to use it in the classroom, the rate of Tagalog use exceeded that of English: with their teachers, 92% of students chose Tagalog and only 71% chose English. Sama was selected for talking to friends by respondents of the same ethnic background, but its rate of use in public domains is lower than in private situations. Tausug, however, was frequently used: 30% of respondents used it with strangers, 39% with friends before ceremonies, and 47% during school recess. Even at school, 20% answered that they used it with their teachers.
Table 13: Language use in public domains in Tawi-Tawi (1)
Language | To strangers | To friends before ceremonies | To friends at school recess | To teachers at school
Tausug | 29 (30%) | 37 (39%) | 45 (47%) | 19 (20%)
Sama | 6 (6%) | 16 (17%) | 17 (18%) | 6 (6%)
Mapun | 0 (0%) | 2 (2%) | 5 (5%) | 0 (0%)
Tagalog | 88 (92%) | 67 (70%) | 72 (75%) | 88 (92%)
Bisaya | 1 (1%) | 1 (1%) | 1 (1%) | 1 (1%)
English | 25 (26%) | 30 (31%) | 23 (24%) | 68 (71%)
Chavacano | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%)
Others | 0 (0%) | Arabic 1 (1%) | 0 (0%) | 0 (0%)
Table 14 shows language use in banks, offices, shopping centers, and small shops within the community. Tagalog was used most in banks, offices, and large shopping centers: 90% of respondents said they spoke Tagalog to a clerk in a bank or office, and 86% said they spoke it in a shopping center. English was also a popular choice, used by 38% in banks/offices and by 30% in shopping centers. A fifth of respondents used Tausug in banks/offices, but very few chose other indigenous languages. Tausug was chosen by 27% for conversations in shopping centers and by many more (46%) in small shops in the community.
Sama was also favored in small shops, where 27% chose it, but other indigenous languages were rarely used. The language choices in public situations in Tawi-Tawi show the same tendency as in Zamboanga City: Tagalog has attained the status of a common language in formal and public situations, whereas indigenous languages are used much less frequently. Tausug, however, is used at a higher rate than any other indigenous language, which indicates its vitality.
Table 14: Language use in public domains in Tawi-Tawi (2)
Language | To a clerk in a bank/office | Shopping in the city center | Shopping in a small shop in the community
Tausug | 18 (19%) | 26 (27%) | 44 (46%)
Sama | 4 (4%) | 6 (6%) | 16 (27%)
Mapun | 0 (0%) | 0 (0%) | 3 (3%)
Tagalog | 86 (90%) | 83 (86%) | 60 (63%)
Bisaya | 1 (1%) | 1 (1%) | 1 (1%)
English | 36 (38%) | 28 (30%) | 11 (11%)
Chavacano | 0 (0%) | 0 (0%) | 0 (0%)
Others | 0 (0%) | 0 (0%) | 0 (0%)
5 Summary of language use in Zamboanga City and Tawi-Tawi
We investigated language use among the younger generation in two multi-ethnic areas of the southern Philippines. College students in Zamboanga City and on Tawi-Tawi Island took part in a sociolinguistic survey conducted from January to February 2024, in which they reported their language use according to variables such as the addressee's attributes and the communicative situation. Their ethnic backgrounds are diverse, reflecting the ethnic composition of the two areas. Zamboanga City is home to speakers of Chavacano, a Spanish-based creole, but Bisaya, Tausug, Sama, and other ethnic groups also live there. Tawi-Tawi Island, which lies very close to Kalimantan, is also an ethnic mosaic. Tausug and Sama are the two major ethnic groups on the island, but Bisaya, Chavacano, Mapun, and other groups are also present. Notably, although inter-ethnic marriages are increasingly common, 60% (Zamboanga City) to 78% (Tawi-Tawi) of the parents' generation married within the same ethnic group.
When parents speak their ethnic language to each other, their children naturally acquire it, so these minority languages remain strong. The questionnaire answers demonstrate that the younger generation maintains its family languages very well. They use their own languages with family members and with people of the same ethnicity, and in private situations most respondents also tend to use their native languages. Because both areas comprise several ethnic groups, people select the most appropriate language according to the addressee's ethnicity. Tagalog is the most popular option when two speakers belong to different ethnic groups, and English is the second: around 70% of young people used Tagalog, but only around 20% used English. Tausug, one of the local ethnic languages, is also sometimes selected when one of the interlocutors is Tausug. Tagalog, or Filipino, the national language of the Philippines, has gained a firm position as the most frequently used language in these areas. It is used not only among young people of different ethnicities but also in the classroom, even with teachers. English has become a common language among young people and is the medium of instruction in schools. However, 20% of respondents in Zamboanga City and 30% in Tawi-Tawi did not select English when communicating with their teachers. Tagalog thus holds a much stronger position, both as the most frequently used language among people of different ethnicities and as the language of public domains. Around 90% of young people used Tagalog in banks or offices, but English was selected by only 40% in Tawi-Tawi and 60% in Zamboanga City. This paper has also highlighted the vitality of Tausug. In public domains such as banks, offices, or schools, it was still selected by around 20% of respondents on Tawi-Tawi Island.
In Zamboanga City, few people chose Tausug in banks or offices, but 10% used it to address their teachers at school, and nearly 60% used it among students at school. The vitality of Tausug can partly be explained by its large number of speakers in the two locations we surveyed: an estimated 40% of their population have at least one Tausug parent. However, the Sama population on Tawi-Tawi Island is as large as the Tausug population, yet Sama is used less frequently than Tausug for communication between people of different ethnicities. A historical perspective offers another reason for the vitality of Tausug. From the 16th to the 19th centuries, the Sulu Sultanate was a powerful maritime polity that ruled the Sulu Archipelago, the coastal areas of Zamboanga City, part of Palawan, and North Kalimantan. Tawi-Tawi, the southernmost province of the Philippines, lies at the southwestern tip of the Sulu Archipelago. Both locations surveyed for this paper lay within the territory of the Sulu Sultanate, where Tausug was the predominant language throughout the dominion (McKenna and Steinberg 1981). Since that time, Tausug has been the language most frequently used among different ethnic groups (Fox and Sather 2006), and its vitality persists today. To understand the future vitality of Tausug, we need to investigate actual language use in communities and schools, especially elementary schools, in order to assess the language use of an even younger generation.
Author Contributions
This paper was conceptualized by the first author, Atsuko Utsumi, who created the questionnaire and scheduled the survey in Zamboanga City and on Tawi-Tawi Island. Utsumi also analyzed and interpreted the collected data. The second author, Nelson Dino, collaborated on the data collection in Tawi-Tawi and surveyed the linguistic composition of the southern Philippines.
The first author edited the whole paper.
References
Abinales, Patricio N., and Donna J. Amoroso. 2017. State and Society in the Philippines. Lanham, MD: Rowman & Littlefield.
Caparas, Pilar Superales. 2019. Dialectology of Tausug in the island province of Basilan. Doctoral dissertation, De La Salle University, Manila.
Frake, Charles O. 2006. The Cultural Ecology of the Yakan. In Origins, Ancestry, Alliance: Explorations in Austronesian Ethnography, edited by James J. Fox and Clifford Sather, 189–209. Canberra: ANU Press.
Kurais II, Mohammad. 1979. The History of Tawi-Tawi and Its People. Bongao: MSU Sulu College of Technology and Oceanography.
Majul, Cesar Adib. 1973. Muslims in the Philippines. Quezon City: University of the Philippines Press.
McFarland, Curtis D. 2004. Linguistic Atlas of the Philippines. Manila: Summer Institute of Linguistics.
McKenna, Thomas M., and David C. Steinberg, eds. 1981. Islamic Southern Philippines: Historical Perspectives. Quezon City: New Day Publishers.
Nimmo, H. Arlo. 1986. Recent Population Movements in the Sulu Archipelago: Implications to Sama Culture History. Archipel 32: 25–38.
Nimmo, H. Arlo. 2001. Magosaha: An Ethnography of the Tawi-Tawi Sama Dilaut. Quezon City: Ateneo de Manila University Press.
Pallesen, Arne A. 1985. Culture and Language of the Sama-Bajau of the Sulu Archipelago. Cebu: Linguistic Society of the Philippines.
Rodil, Rudy B. 2003. The Minoritization of the Indigenous Communities of Mindanao and the Sulu Archipelago. Davao City: Alternate Forum for Research in Mindanao.
Rodríguez, J. J. R. B., et al. 2022. Ethical Challenges in Genetic Research Among Philippine Indigenous Peoples: Insights from Fieldwork in Zamboanga and the Sulu Archipelago. Frontiers in Genetics 13: 901515.
Rubino, Carl R. Galvez. 2000. A Handbook of Philippine Languages. Manila: De La Salle University Press.
Sather, Clifford. 1997. The Bajau Laut: Adaptation, History, and Fate in a Maritime Fishing Society of South-Eastern Sabah. Kuala Lumpur: Oxford University Press.
Warren, James Francis. 2002. The Sulu Zone, 1768–1898: The Dynamics of External Trade, Slavery, and Ethnicity in the Transformation of a Southeast Asian Maritime State. Singapore: NUS Press.
Wernstedt, Frederick L., and Joseph E. Spencer. 1967. The Philippine Island World: A Physical, Cultural, and Regional Geography. Berkeley: University of California Press.
GENDER REPRESENTATION IN VIETNAMESE INTERNET MEMES: A MULTIMODAL CRITICAL ANALYSIS
LƯƠNG Thị Hiền, Hanoi National University of Education, luonghien@hnue.edu.vn
NGUYỄN Đức Long, Vietnam Academy of Social Sciences, duclong0067@yahoo.com
TRỊNH Khánh Hiền, Hanoi National University of Education, khanhhien816@gmail.com
NGUYỄN An Nguyên, Hanoi University, an.nguyen3.2003@gmail.com
Abstract
This study analyzes gender representation in 120 Vietnamese Internet memes collected from two popular Facebook pages. Using Multimodal Critical Discourse Analysis, which combines Fairclough’s Critical Discourse Analysis, Kress & van Leeuwen’s Visual Grammar, and Butler’s gender performativity, the research explores how linguistic and visual choices construct gender ideologies. Findings show that women dominate meme discourse but are repeatedly portrayed through fatigue, helplessness, and negative affect. The study introduces the concept of “Gendered Discourse Fatigue” to describe how femininity is framed as emotional vulnerability rather than active agency. At the same time, memes function as a “safe zone” where humor and parody enable subtle forms of soft resistance, allowing women to negotiate identity without overtly disrupting social norms. By aestheticizing exhaustion and embedding it in collective humor, Vietnamese memes reinforce soft inequality: they empower women to voice emotions while constraining them within passive roles.
Keywords: Vietnamese memes, gender representation, Multimodal Critical Discourse Analysis
ISO 639-3 codes: eng, vie
1 Introduction
1.1 Memes as Social Discourse in the Digital Age
In the digital era, social media has become a space that generates and disseminates new forms of communication at an unprecedented scale. Among these, internet memes have emerged as a prominent multimodal text type combining language and images. Initially created by young people as spontaneous entertainment, memes have increasingly taken on the role of social discourse: they reflect ideologies, shape identities, and contribute to the reconfiguration of contemporary cultural and social values.
In Vietnam, according to the We Are Social & Meltwater (2025) report, there were 79.8 million internet users (78.8% of the population) and 76.2 million social media users (75.2% of the population) as of early 2025. Notably, young people aged 18–24 account for 9.2%, forming a dynamic generation of digital natives who act simultaneously as content creators and active consumers. Within this demographic, memes function not only as entertainment but also as a tool for expressing emotions, commenting on social issues, and articulating personal viewpoints, particularly on gender. By combining language, images, and popular cultural references, Vietnamese memes constitute a distinctive discursive space where humor intertwines with sociological depth. The multimodal and expressive nature of memes therefore requires that they be examined not merely as entertainment but as ideologically loaded discourses capable of maintaining, reproducing, or contesting social norms, especially those related to gender.
1.2 Research gap and theoretical background
In recent years, a growing body of international scholarship has affirmed that memes are an important medium for articulating and negotiating social issues such as politics, ethnicity, education, and especially gender (Drakett, Rickett, Day & Milnes, 2018; Blewitt-Golsch & Lorraine, 2019; Howard & Adan, 2022; Love & Wimsatt, 2019). Several studies have employed Critical Discourse Analysis (CDA) and Visual Grammar to decode how language and imagery in memes reproduce, sustain, or contest social ideologies (Calimbo, 2015; Cannizzaro, 2016; Grundlingh, 2017; Yus, 2018). In Vietnam, however, research on memes remains limited. Existing studies have largely been restricted to initial explorations through the lens of Systemic Functional Linguistics (SFL) or to the application of memes in language and literature education (Lương & Trịnh, 2023a, 2023b; Nguyễn, 2024). Notably, there is still no systematic research situating memes at the intersection of language, visuality, and gender ideology within the local cultural context, particularly within the framework of Multimodal Critical Discourse Analysis (MCDA).
1.3 Research Problem and Research Questions
When images and captions in memes circulate with high frequency, they may inadvertently reproduce entrenched gender stereotypes in Vietnamese society, thereby contributing to the construction of a “gendered reality” in digital spaces. This raises a central issue: do Vietnamese memes primarily reinforce traditional gender norms, or are they emerging as tools for disrupting and renegotiating gender representation in contemporary culture? To address this issue, the study poses three research questions:
(1) How do linguistic and visual elements in Vietnamese memes construct gender representation?
(2) What gender stereotypes are manifested across four thematic categories of memes: professions, mental health, physical health, and everyday habits?
(3) In what ways can memes’ roles in reinforcing or challenging gender representation be interpreted in relation to ideology and the broader socio-cultural context of Vietnam?

1.4 Theoretical and Practical Contributions

At the theoretical level, this study contributes to the development of Multimodal Critical Discourse Analysis (MCDA) by integrating three complementary approaches: (1) Critical Discourse Analysis (CDA) (Fairclough, 1995), which enables the examination of how language functions as a means of reproducing or challenging ideology; (2) Visual Grammar (Kress & van Leeuwen, 2006), which provides tools for analyzing how images construct social roles, power relations, and gender representation; and (3) the conception of gender as a discursive practice (Butler, 1990), which emphasizes that gender identities are constituted through the repeated enactment of social behaviors and discourses. This integrative framework allows for a holistic analysis of how memes operate simultaneously as linguistic and visual texts, and of how they contribute to the reproduction, contestation, or renegotiation of gender representation in digital culture. At the practical level, the study provides a systematic account of Vietnamese memes, showing how they not only mirror social realities but also participate in shaping and negotiating gendered subjectivities in digital media. This contribution is particularly significant for recognizing the role of social media in sustaining or questioning gendered power structures in contemporary Vietnamese society.

2 Theoretical Framework and Multimodal Critical Discourse Analysis

2.1 Memes as a Form of Social Discourse

The concept of the meme was first introduced by Richard Dawkins (1976, p. 194) in The Selfish Gene to denote cultural units capable of transmission, persistence, and transformation through replication within a community.
Dawkins identified three core attributes that determine the “success” of a meme: copying-fidelity, fecundity, and longevity. Building on this foundation, Shifman (2014, p. 41) extended the concept to the internet meme, defined as: “(a) a group of digital items sharing common characteristics of content, form, and/or stance, which (b) were created with awareness of each other, and (c) were circulated, imitated, and/or transformed via the Internet by many users”. This definition frames memes as a form of collective discourse whose variations reflect diverse social perspectives. In practice, any text, image, sound, or video may become an internet meme (hereafter simply meme). Memes typically embody four defining features:
(1) Replicability: memes are received, edited, and reproduced from an original version, generating countless variations of a shared template.
(2) Humoristic mode: memes often rely on parody, exaggeration, or irony to produce humor while simultaneously providing social commentary.
(3) Virality: memes spread widely through platforms such as Facebook, Zalo, Instagram, or Messenger, where they are continuously shared and remixed.
(4) Multimodality: memes integrate multiple semiotic resources (language, images, icons, sound), creating layered meanings across linguistic and visual modes.
In Vietnam, memes have evolved into an informal vernacular of youth culture, functioning both as a medium of personal expression and as a tool for social critique. Vietnamese memes are created through the community’s creative practices by combining globally recognizable visuals with locally produced captions or dialogues. Many are adaptations of international meme formats, localized through Vietnamese language, cultural references, and social sensibilities.
Within their multimodal structure, the linguistic channel (written text) usually appears in two primary forms: (1) captions, which provide descriptive framing or interpretive cues for the image; and (2) character speech or inner thought, which contributes to humor through dialogic or monologic expression. While the visual channel often draws on globally circulated and easily recognizable images, the linguistic channel reflects distinctly local cultural identities. The flexibility and expressive potential of Vietnamese allow memes to exploit wordplay, cultural metaphors, and social innuendo, thereby generating secondary layers of meaning rich in symbolism and ideology.

Example (1):
Figure 1: A Vietnamese meme using caption as description
Source: https://www.facebook.com/share/p/17U9FwLBcm/
Figure 2: A Vietnamese meme using character speech
Source: https://www.facebook.com/share/p/1CZwNmJMvT/

2.2 Gender as Discursive Practice

From a sociolinguistic perspective, gender is not merely a biological attribute but the outcome of social practices institutionalized through discourse. Classic works such as Lakoff’s Language and Woman’s Place (2004) demonstrated that language contributes to the maintenance of gender inequality through patterns of pronoun use, syntactic structures, and the expression of affect. Following this line of inquiry, Judith Butler (1990) argues that gender is not a fixed property but an identity constituted through the repeated enactment of behaviors and discursive practices. In digital communication, memes can be regarded as a discursive practice in which gender representation is continuously reproduced.
Characters are frequently positioned as Sensers (those who perceive or feel), Sayers (those who speak), or Phenomena (those who are observed). Accordingly, analyzing gender in memes requires uncovering how linguistic and visual elements interact to reinforce or challenge gender norms within the local socio-cultural context.

2.3 Critical Discourse Analysis (CDA)

Critical Discourse Analysis (CDA), particularly the three-dimensional framework developed by Norman Fairclough (1995), provides a powerful tool for uncovering the layered meanings embedded in texts. The model comprises three interrelated levels:
(1) Text level: the analysis of vocabulary, grammar, speech acts, and textual structures to identify indicators of gender bias or strategies of resistance.
(2) Discursive practice level: the examination of how memes are produced, circulated, and interpreted within specific social contexts.
(3) Social practice level: the positioning of memes within broader structures of power, such as cultural norms, workplace hierarchies, or systems of gender inequality, to explain their deeper social meanings.
Applying this three-dimensional model shows that memes are not merely individual products of humor but a form of collective discourse. Memes encapsulate multiple layers of indirect meaning tied to power relations, gender stereotypes, and the representation of gender roles in society.

2.4 Visual Grammar

To analyze the visual components of memes, this study adopts the Visual Grammar framework of Kress & van Leeuwen (1996, 2006), which adapts Halliday’s (1994) three metafunctions (ideational, interpersonal, and textual) to images as representational, interactive, and compositional meaning. First, representational meaning concerns how images depict actions, states, or social relations. Two major structures are distinguished: (1) Narrative representation, which highlights processes of action and the relationships among participants.
Core processes include:
(i) Action process, involving Actors and Goals;
(ii) Reactional process, involving Reactors and Phenomena;
(iii) Verbal process, involving Sayers and Utterances;
(iv) Mental process, involving Sensers and Phenomena;
(v) Conversion process, a relay structure in which one participant (the Relay) is simultaneously the Goal of one action and the Actor of another.
(2) Conceptual representation, which emphasizes the qualities, states, or essences of participants. Processes include:
(i) Classification: hierarchical relations between Superordinates and Subordinates;
(ii) Analytical: whole–part relations;
(iii) Symbolic: Carriers embody symbolic attributes, either attributive (a concrete trait) or suggestive (an abstract or atmospheric quality).
These processes and participant roles (summarized in Table 1) provide tools for identifying how gender representation is constructed through transitivity and the allocation of social roles in meme images.

Table 1: Structures of representation and participants in Visual Grammar
Structure | Process | Participants
Narrative | Action | Actor, Goal
Narrative | Reactional | Reactor, Phenomenon
Narrative | Mental/Speech (Projective) | Senser, Sayer, Utterance, Phenomenon
Narrative | Conversion (Relay) | Double-role participant (Actor–Goal)
Narrative | Verbal | Sayer, Utterance
Conceptual | Classification | Superordinate, Subordinates
Conceptual | Analytical | Whole, Parts
Conceptual | Symbolic (Attributive/Suggestive) | Carrier, Symbolic Attribute

Second, interactive meaning explains the relationship between depicted participants and viewers. Its three key features are gaze (direct demand vs. indirect offer), angle (high, low, or horizontal, signaling relations of power or equality), and social distance (from close-up to long shot, indicating degrees of intimacy or detachment). These dimensions allow us to assess whether characters are subjectivized or objectivized in the visual structure. Third, compositional meaning concerns the arrangement of visual elements to establish hierarchies of meaning.
This includes left–right placement (given vs. new information), top–bottom placement (ideal vs. real), and center–margin placement (salient vs. peripheral elements). Kress & van Leeuwen also highlight two further compositional factors: salience (the prominence of elements via size, color, or brightness) and framing (the degree of separation or connection among components within a composition). Together, these dimensions help identify central figures and the distribution of power within the visual field. Visual Grammar thus treats images as a semiotic system parallel to language, enabling the decoding of the social meanings embedded in visual structures.

2.5 Multimodal Critical Discourse Analysis Framework

To approach memes as multimodal discourses, this study adopts an integrated analytical framework that draws on three complementary theoretical traditions. First, Critical Discourse Analysis (CDA) (Fairclough, 1995) enables the analysis of how language operates as a means of reproducing or contesting ideology. Second, Visual Grammar (Kress & van Leeuwen, 2006) provides tools to decode how images construct social roles, power relations, and forms of gender representation. Third, the conception of gender as a discursive practice (Butler, 1990) underscores that gender is constituted through the repeated enactment of socially regulated behaviors and discourses.
The integrated framework is operationalized across three levels of analysis to examine how language and image jointly contribute to the construction, maintenance, or contestation of gender representation in digital spaces (Table 2).

Table 2: Three levels of analysis in the Multimodal Critical Discourse Analysis framework
Level of analysis | Focal elements | Theoretical sources | Analytical goals
Textual / Representational level | Language: vocabulary, pronouns, speech acts, linguistic strategies; Image: participants, processes, gaze, composition | Fairclough (1995); Halliday (1994, 2001); Kress & van Leeuwen (2006) | To identify how gender is represented through linguistic and visual forms
Discursive practice level | Processes of meme production, circulation, and reception; intertextuality and digital context | Fairclough (1995) | To examine memes as discursive practices embedded in multimodal communication
Social practice level | Social conditions, cultural context, and gendered power structures | Fairclough (1995); Butler (1990); Cameron (2003) | To trace the ideological conditions underpinning gender representation

By synthesizing these three perspectives, the framework moves beyond text-internal analysis to incorporate both discursive practice and social practice, thereby providing a multi-layered account of how Vietnamese memes function as ideologically loaded multimodal discourses. This approach not only extends multimodal critical discourse analysis to the study of gender and digital culture but also shows how theoretical models can be localized to examine gender representation in specific socio-cultural contexts such as Vietnam.

3 Methodology

3.1 Research Design

This study applies Multimodal Critical Discourse Analysis (MCDA) to explore how language, images, and socio-cultural contexts interact in constructing and negotiating gender representation in Vietnamese memes.
The research design follows a mixed qualitative–quantitative exploratory approach: the quantitative component provides descriptive statistics on frequency and distribution patterns, while the qualitative component allows for in-depth interpretation of meaning-making strategies and underlying ideologies.

3.2 Data Collection

The dataset consists of 120 Vietnamese memes collected from two Facebook fanpages popular among Vietnamese youth: Hội Người Lười Việt Nam (872,000 followers) and Deadline Trong Ngày (165,000 followers). The collection period spans January 2024 to May 2025, ensuring that the dataset reflects contemporary digital practices. All memes were archived in their original image format and systematically organized and coded in an Excel-based corpus for analysis.

3.3 Sampling and Coding

A purposive sampling method was employed, guided by three criteria:
(1) Each meme must demonstrate multimodality, combining both linguistic and visual elements.
(2) Each meme must contain explicit or implicit gender-related components in language, images, or both.
(3) Each meme must have significant online visibility, defined as at least 1,500 likes or shares, to ensure representativeness and social impact.
To ensure analytical consistency, a coding scheme was developed based on the integrated theoretical framework (CDA, Visual Grammar, and gender-as-discourse perspectives). The scheme comprised three main categories of variables:
(1) Linguistic variables: pronouns, speech acts (narrative, directive, expressive, declarative, commissive), and emotional keywords.
(2) Visual variables: narrative processes and participants (Actor, Goal, Reactor, Phenomenon, Senser, Sayer, Utterance, Relay), gaze (demand vs. offer), angle (high, low, equal), and social distance (intimate, social, impersonal).
(3) Social variables: gender of characters (female, male, unspecified), meme topics (profession, emotion, habit, physical), and gender ideology (reinforcing, resisting, or neutral with respect to stereotypes).
Conceptual representation was not coded in this study, as most Vietnamese memes in the dataset rely on narrative structures of action, speech, and emotion rather than on abstract taxonomies or symbolic attributions.

Table 3: Coding scheme for meme analysis
Category | Variables | Values (examples)
Linguistic | Pronouns | em, tôi, chị, anh, bạn…
Linguistic | Speech acts | narrative, directive, expressive, declarative, commissive
Linguistic | Emotional keywords | chịu, mệt, chán, áp lực, buồn, không muốn…
Visual | Narrative structures | Action, Reactional, Verbal, Mental, Conversion
Visual | Participants | Actor, Goal, Reactor, Phenomenon, Senser, Sayer, Utterance, Relay
Visual | Gaze | demand, offer
Visual | Angle | high, low, equal
Visual | Social distance | intimate, social, impersonal
Social | Gender | female, male, unspecified
Social | Meme topic | profession, emotion, habit, physical
Social | Gender ideology | reinforcing, resisting, neutral

3.4 Analytical Procedure

The analysis proceeded in three steps:
(1) Data coding: all memes were annotated according to the coding scheme.
(2) Quantitative analysis: descriptive statistics (frequency counts and percentages) were used to identify patterns in the relationship between gender and linguistic/visual elements.
(3) Qualitative analysis: twelve representative memes were selected from each main theme for in-depth analysis. This stage combined textual and visual description to reveal the mechanisms of meaning-making, ideological strategies, and the role of memes in reinforcing or challenging gender representation.

4 Findings

This section presents (1) the results of the quantitative analysis of linguistic and visual elements and (2) a qualitative analysis of gendered patterns and resistance in Vietnamese memes.
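The descriptive statistics used throughout 4.1 (frequency counts and percentages over the Table 3 variables) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors’ workflow, which used an Excel-based corpus: the record type `CodedMeme`, the helper `frequency_table`, and the four sample records are hypothetical, and counting each pronoun at most once per meme is an assumption about how pronoun frequencies might be tallied. The main point it illustrates is the choice of denominator: memes for pronoun and gender shares, coded speech acts for speech-act shares.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CodedMeme:
    """One annotated meme; fields loosely mirror the Table 3 variables (hypothetical structure)."""
    meme_id: str
    gender: str                                           # female, male, unspecified
    topic: str                                            # profession, emotion, habit, physical
    ideology: str                                         # reinforcing, resisting, neutral
    pronouns: list[str] = field(default_factory=list)     # e.g. ["tôi", "em"]
    speech_acts: list[str] = field(default_factory=list)  # e.g. ["expressive"]
    process: str = ""                                     # Action, Reactional, Verbal, Mental, Conversion
    gaze: str = ""                                        # demand, offer


def frequency_table(values, denominator):
    """Map each coded value to (count, percentage of denominator), most frequent first."""
    counts = Counter(values)
    return {value: (n, round(100 * n / denominator, 2)) for value, n in counts.most_common()}


# Four hypothetical records for illustration only; these are NOT data from the corpus.
corpus = [
    CodedMeme("m001", "female", "profession", "reinforcing",
              pronouns=["em", "chị"], speech_acts=["expressive"], process="Mental", gaze="offer"),
    CodedMeme("m002", "female", "emotion", "reinforcing",
              pronouns=["tôi", "tôi"], speech_acts=["narrative", "expressive"], process="Verbal", gaze="offer"),
    CodedMeme("m003", "male", "habit", "neutral",
              pronouns=[], speech_acts=["declarative"], process="Action", gaze="demand"),
    CodedMeme("m004", "female", "profession", "resisting",
              pronouns=["tôi"], speech_acts=["expressive"], process="Verbal", gaze="offer"),
]

# Gender shares: one value per meme, divided by the number of memes.
gender_dist = frequency_table([m.gender for m in corpus], len(corpus))

# Pronoun shares (assumption): a pronoun counts once per meme, divided by the number of memes.
pronoun_dist = frequency_table([p for m in corpus for p in set(m.pronouns)], len(corpus))

# Speech-act shares: divided by the total number of coded speech acts, not by memes.
all_acts = [a for m in corpus for a in m.speech_acts]
speech_act_dist = frequency_table(all_acts, len(all_acts))
```

With these toy records, `gender_dist` is `{"female": (3, 75.0), "male": (1, 25.0)}` and `speech_act_dist["expressive"]` is `(3, 60.0)`, since three of the five coded speech acts are expressives.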
4.1 Quantitative Analysis

4.1.1 Linguistic Elements

4.1.1.1 Pronouns

The analysis of 120 Vietnamese memes reveals that personal pronouns are used with high frequency and considerable variation, particularly in memes featuring female characters. The six most common pronouns are presented in Table 4. These pronouns appear primarily in dialogues or captions attached to female characters, reflecting the intimate and socially nuanced style of digital discourse.

Table 4: Distribution of pronouns attached to female characters in Vietnamese memes (n = 120 memes)
No. | Pronoun | Frequency | Percentage of memes (%)
1 | Tôi (I) | 36 | 30.00
2 | Em (younger sibling; I/you) | 26 | 21.67
3 | Chị (older sister) | 13 | 10.83
4 | Mình (I/we) | 12 | 10.00
5 | Tao (I) | 5 | 4.17
6 | Ta/chúng tôi (we) | 3 | 2.50

The results indicate a clear socio-gendered stratification in pronoun use. In professional memes, dyads such as chị–em (“older sister–younger sibling”) or tôi–em index soft power relations, often in workplace contexts where women are positioned as colleagues or subordinates. In mental health memes, pronouns such as tôi, em, and mình are used in a narrative and introspective manner, emphasizing personal experiences and inner feelings. By contrast, in everyday habit memes, pronouns such as em, bạn (“friend”), and mình highlight peer intimacy and solidarity among interlocutors of the same generation. Occasional uses of ta or chúng tôi are marked by parody, heightening humor and irony. A particularly notable finding is the absence of male-associated pronouns in the dataset: within the Vietnamese meme space, women overwhelmingly appear as the linguistic subjects, actively constructing social interaction and emotional expression, whereas men are largely absent from the pronoun system.

4.1.1.2 Speech Acts

The analysis of 225 speech acts across the 120 memes reveals a marked gender imbalance in the distribution of speaking roles, as shown in Table 5.
Table 5: Distribution of speech acts by gender in Vietnamese memes
Type of speech act | Female | Male | Total | Percentage of all speech acts (%)
Narrative | 48 | 4 | 52 | 23.11
Directive | 8 | 0 | 8 | 3.56
Expressive | 128 | 4 | 132 | 58.67
Declarative | 19 | 2 | 21 | 9.33
Commissive | 10 | 2 | 12 | 5.33
Total | 213 | 12 | 225 | 100.00

Women dominate almost entirely in the two most frequent categories of speech acts: expressives (97.0%) and narratives (92.3%). Specifically, 128 expressive utterances and 48 narrative utterances are attributed to female characters, compared with only 4 of each for male characters. These speech acts primarily convey psychological states, emotions, or personal experiences, thereby reinforcing the role of women as “emotional narrators” in digital discourse. By contrast, the other types of speech acts (directives, declaratives, and commissives) occur much less frequently, and male participation in them remains minimal. This indicates that men in Vietnamese memes tend to be discursively “silent”, rarely taking on speaking roles or projecting subjectivity. The data thus reflect a gendered stratification of discourse: women occupy the central role in constructing expressive and narrative discourse, while men remain marginal and seldom participate as active speakers. These findings point to a broader feminization of expressive space in Vietnamese memes and underscore the dual nature of digital discourse as both a medium of self-presentation and a site for the reproduction of traditional gender roles.
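The percentages in Tables 4 and 5 rest on different denominators: pronoun counts in Table 4 are divided by the 120 memes, speech-act counts in Table 5 by the 225 coded speech acts, and the female shares cited in the prose are computed within each speech-act category. The minimal Python sketch below recomputes these figures from the published counts; the helper name `pct` and the dictionary layout are illustrative choices, not part of the study.

```python
def pct(part, whole, digits=2):
    """Percentage of part in whole, rounded to the given number of decimals."""
    return round(100 * part / whole, digits)


# Table 5 counts: speech act -> (female, male)
speech_acts = {
    "Narrative": (48, 4),
    "Directive": (8, 0),
    "Expressive": (128, 4),
    "Declarative": (19, 2),
    "Commissive": (10, 2),
}
total = sum(f + m for f, m in speech_acts.values())  # all coded speech acts

# Share of each category among all speech acts (last column of Table 5).
category_share = {k: pct(f + m, total) for k, (f, m) in speech_acts.items()}

# Female share within each category (the figures quoted in the prose), one decimal.
female_share = {k: pct(f, f + m, 1) for k, (f, m) in speech_acts.items()}

# Table 4: pronoun frequencies divided by the number of memes, not by pronoun tokens.
pronoun_counts = {"tôi": 36, "em": 26, "chị": 13, "mình": 12, "tao": 5, "ta/chúng tôi": 3}
pronoun_share = {k: pct(n, 120) for k, n in pronoun_counts.items()}
```

Run on the published counts, `total` is 225, the expressive category covers 58.67% of all speech acts, and the female share of expressives is 128/132 ≈ 97.0% (narratives: 48/52 ≈ 92.3%).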
4.1.1.3 Emotional Keywords

The analysis of emotional vocabulary in Vietnamese memes groups keywords into five categories:
(1) Fatigue–exhaustion–stress, denoting physical and mental depletion at varying levels;
(2) Boredom–pessimism, associated with negative moods ranging from lack of motivation to bleak outlooks on the future;
(3) Anger–indignation, representing outward, explosive reactions;
(4) Anxiety–insecurity, expressing fear, uncertainty, and lack of safety;
(5) Self-deprecation–self-mockery–self-blame, capturing reflexive emotions directed inward.
The findings reveal a striking trend: female characters frequently produce utterances marked by negative affect, focusing on fatigue, sadness, helplessness, or self-mockery. Of the 128 expressive utterances coded, 97 contained emotional keywords; the overwhelming majority belonged to female characters, and most carried negative tones. These expressions fall into the five groups summarized in Table 6.

Table 6: Distribution of negative emotional keywords in Vietnamese memes (n = 97)
Emotional category | No. of expressive utterances | Percentage (%)
Fatigue – exhaustion – stress | 21 | 21.65
Boredom – pessimism | 30 | 30.93
Anger – indignation | 14 | 14.43
Anxiety – insecurity | 18 | 18.56
Self-deprecation – mockery – blame | 14 | 14.43
Total | 97 | 100

Frequently recurring keywords such as mệt (“tired”), kiệt sức (“exhausted”), chán (“bored”), không còn sức (“out of energy”), áp lực (“pressure”), stress, tuyệt vọng (“despair”), thất vọng (“disappointed”), không làm được (“unable to do”), chưa làm gì (“have done nothing”), nghèo (“poor”), and đói (“hungry”) together form a dense semantic field of negativity.
These lexical choices dominate female speech in memes, effectively positioning women as subjects of exhaustion and vulnerability. Although often framed humorously or ironically, the repeated circulation of such expressions tacitly normalizes a discourse of femininity characterized by resignation, fatigue, and self-negation. Visual metaphors such as lying flat, bent backs, collapsing bodies, crying out loud, or habitually bowed heads further reinforce the depiction of women in passive, physically and mentally depleted states. Moreover, the combination of negative vocabulary with emphatic constructions such as “không còn gì” (“nothing left”), “chán đến mức không thở nổi” (“so bored that I can’t breathe”), “áp lực quá rồi” (“too much pressure”), or “tôi chưa làm được gì” (“I haven’t accomplished anything”) illustrates a strategy of emotional exaggeration. While serving humorous purposes, these structures simultaneously consolidate women’s representation as subjects of prolonged helplessness in digital discourse.

4.1.2 Visual Elements

4.1.2.1 Gender Distribution of Characters in Visuals

The dataset reveals a significant imbalance in the gender distribution of characters represented in the 120 memes analyzed. Female characters account for 75.2% and male characters for only 15.9%, while memes featuring both genders make up 8.8% (see Figure 3). This disparity reflects a consistent bias in gender representation, in which female figures are far more frequently placed at the center of both linguistic and visual meaning-making.

Figure 3: Gender distribution of characters in Vietnamese memes

4.1.2.2 Visual Processes and Participants

The results indicate a significant gender imbalance in the visual structure of memes. Of the 120 coded processes, female characters account for 89.2% (107 instances), while male characters account for only 10.8% (13 instances).
This imbalance reflects a stratification of gender roles in visual discourse, in which women are represented with far greater frequency and symbolic prominence than men.

Table 7: The distribution of visual processes by gender in the meme corpus (N = 120)
Process type | Total | Female | Male | % Female | % Male
Mental | 70 | 65 | 5 | 92.9 | 7.1
Verbal | 30 | 28 | 2 | 93.3 | 6.7
Action | 11 | 8 | 3 | 72.7 | 27.3
Reactional | 9 | 6 | 3 | 66.7 | 33.3
Conversion | 0 | 0 | 0 | – | –
Total | 120 | 107 | 13 | 89.2 | 10.8

Notably, in the two most frequent process types, mental and verbal, female characters dominate almost completely (over 92%). Mental processes, representing cognition, emotion, and inner states, are the most common, with women accounting for 92.9% and men for only 7.1%. Verbal processes, indexing speech, follow a similar pattern, with 93.3% attributed to women and 6.7% to men. These findings are consistent with the linguistic results: women in memes are typically positioned as Sensers (feelers) and Sayers (speakers) rather than as Actors in physical actions. Men are relatively more visible in action processes (27.3%) and reactional processes (33.3%), yet their overall presence remains marginal. They not only appear less frequently but also lack discursive centrality, being largely excluded from prominent roles in meaning-making. By contrast, women dominate self-narration and emotional expression, reproducing traditional gender scripts in popular culture. The absence of conversion processes is noteworthy. It may be explained by the static nature of memes, which rarely depict extended relay sequences or cyclical transformations. Consequently, the corpus reflects not only entrenched gender stereotypes but also the expressive limitations inherent to the meme genre.
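Table 7 can be checked in the same way: the row percentages divide each gender’s count by that row’s total, while the overall 89.2% / 10.8% split divides by all 120 coded processes. The minimal Python sketch below uses the published counts; the variable and function names are illustrative. The empty Conversion row is returned as `None`, matching the “–” cells in the table.

```python
# Table 7 counts: process type -> (female, male)
processes = {
    "Mental": (65, 5),
    "Verbal": (28, 2),
    "Action": (8, 3),
    "Reactional": (6, 3),
    "Conversion": (0, 0),
}


def row_percentages(female, male):
    """Return (% female, % male) within one row, or None for an empty row."""
    total = female + male
    if total == 0:
        return None  # shown as '–' in Table 7
    return round(100 * female / total, 1), round(100 * male / total, 1)


rows = {name: row_percentages(f, m) for name, (f, m) in processes.items()}

# Column totals and the overall gender split across all coded processes.
female_total = sum(f for f, _ in processes.values())
male_total = sum(m for _, m in processes.values())
overall = row_percentages(female_total, male_total)
```

For the published counts, `rows["Mental"]` is `(92.9, 7.1)`, the column totals are 107 and 13, and `overall` is `(89.2, 10.8)`, reproducing the Total row.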
4.1.2.3 Gaze, Angle, and Social Distance

The analysis of visual variables shows that the visual structure of memes operates as a consistent system in which gaze, angle, and social distance jointly contribute to the discursive construction of gender representation. The results are summarized in Table 8.

Table 8: Distribution of visual variables by gender in the meme corpus (N = 120)
Visual variable | Description | Female | Male | % Female | % Male
Gaze | Direct (demand) | 17 | 7 | 15.9 | 53.8
Gaze | Indirect (offer) | 88 | 8 | 82.2 | 61.5
Gaze | Unspecified | 2 | 1 | 1.9 | 7.7
Angle | High (viewer has power over participant) | 0 | 0 | 0.0 | 0.0
Angle | Low (participant has power over viewer) | 32 | 8 | 29.9 | 61.5
Angle | Horizontal (equality) | 75 | 5 | 70.1 | 38.5
Distance | Close-up (intimacy) | 25 | 3 | 23.4 | 23.1
Distance | Medium (social distance) | 68 | 8 | 63.6 | 61.5
Distance | Long shot (detachment) | 14 | 2 | 13.1 | 15.4

The data in Table 8 show that visual variables are unevenly distributed across gender, with female characters dominating most configurations.
(1) Gaze: The majority of female characters are depicted with indirect gaze (82.2%), compared with 61.5% of male characters. Conversely, direct gaze, which invites active engagement with the viewer, is far more common among men (53.8%) than women (15.9%). This suggests that, although men appear less frequently overall, when they do appear they are often framed as more active interlocutors, while women are positioned primarily as objects of observation.
(2) Angle: Female characters are overwhelmingly portrayed from a horizontal angle (70.1%), indexing visual equality with viewers. Male characters also appear in this configuration, but in a much lower proportion (38.5%). Strikingly, low-angle shots, which signal authority or elevated power, are far more frequent for men (61.5%) than for women (29.9%). High angles are absent altogether, indicating that no character in the corpus is visually depicted as weak or subordinate to the viewer.
This pattern suggests that while women dominate numerically, their representation tends toward neutrality, whereas men, even as a minority, are often granted positions of symbolic authority.
(3) Social distance: Female characters most frequently appear in medium shots (63.6%), suitable for depicting social contexts, followed by close-ups (23.4%), which emphasize personal emotion. Male characters follow a similar distribution (61.5% medium, 23.1% close-up), but in far smaller absolute numbers. Long shots account for 15.4% of male characters compared with 13.1% of female characters, suggesting that when men do appear, they are slightly more likely to be positioned at a distance, visually peripheral to the frame.
Overall, the findings point to a gendered asymmetry in visual discourse: women dominate in frequency but are predominantly depicted in passive positions (indirect gaze, horizontal angle, medium distance), while men, though numerically marginal, are often assigned visual markers of power and agency (direct gaze, low angle). This dual imbalance highlights a layered form of gender representation in Vietnamese memes: women as quantitative centers but symbolic peripheries, and men as quantitative minorities but symbolic authorities.

4.2 Qualitative Analysis: Gendered Patterns and Resistance in Vietnamese Memes

The quantitative analysis above shows that both visual and linguistic discourse in Vietnamese memes operate within a clearly stratified structure: women dominate in frequency but are largely associated with passive and emotionalized roles, whereas men, though less frequently represented, are often positioned in roles of symbolic authority. To uncover the meaning-making mechanisms behind these numbers, the following qualitative analysis examines how gendered stereotypes are reproduced, negotiated, and resisted within the meme space.
4.2.1 Reproducing Gendered Stereotypes: Endurance, Fatigue, and Exhaustion

4.2.1.1 Women as Endurers of Workplace Injustice

Workplace-related memes frequently portray women in subordinate and passive roles, in which their skills are under-recognized, their emotions disregarded, and their identities implicitly framed as those who “endure.” Captions saturated with frustration, disappointment, and disillusionment about low salaries, heavy workloads, or lack of recognition from superiors are most often voiced through female characters. The repetition of the “woman = endurer” trope constructs a discourse in which enduring injustice becomes naturalized as a feminine trait. Visually, these memes use indirect gaze (offer), static central composition, and soft lighting, with no vectors of action. The repeated depiction of bent backs, slouched postures, or bowed heads not only illustrates the verbal content but also normalizes femininity as passive endurance.

Example (2):
Figure 4: How much is your salary? – 5 chịu
Source: https://www.facebook.com/share/p/1Awtgr94D6/

Here, the use of the pronoun em reinforces a lower hierarchical position. In the dialogue “How much is your salary? – 5 endurances: enduring hardship, enduring criticism, enduring scolding, enduring pressure”, emotional burdens are substituted for monetary value. The speaker puns on the near-homophones chịu (to endure) and triệu (million, a unit of currency): instead of the expected numeric response (e.g., 5 million), the answer is reframed as a catalog of affective burdens. In this discursive frame, women are represented in a workplace context where their labor is valued not by productivity but by their capacity to withstand injustice.
Visually, the meme draws on classical illustration, with a centrally placed figure whose gaze is averted. The female figure is portrayed not as an individual subject but as an archetype of women collectively burdened by workplace injustice. The use of a vintage painting further serves to “historicize” feminized fatigue, rendering the state of endurance culturally familiar and socially acceptable.

4.2.1.2 Women’s Exhaustion in Mental and Physical Health

Memes addressing mental and physical health in the dataset articulate a distinctly gendered discourse in which fatigue, helplessness, and exhaustion are not only expressed verbally but also visualized as symbolic markers of female identity. The representational structures in these memes frequently position women as Sensers (subjects who perceive and feel), thereby reaffirming a discourse of femininity grounded in emotional depletion and physical weariness.

Example (4).
Figure 5: I work well under stress
Source: https://www.facebook.com/share/p/1BD9VkGeyX/

The caption “I perform well under pressure… Me under real pressure: already breaking down” is paired with an image of a woman crying while brushing her hair, visually symbolizing exhaustion. From a visual grammar perspective, the meme employs an indirect gaze and close-up framing, positioning the viewer as a powerless witness to the performance of fatigue, invited to empathize but not to intervene. The suffering is embodied through posture, facial expression, and isolated composition, constructing a discourse in which exhaustion is inseparably linked with femininity.

Example (5).
Figure 6: I am your sister
Source: https://www.facebook.com/share/p/14GpktcN6eN/
The dialogue “Work will be tough later, grandma?” – “Later? I am already your elder sister” is paired with a classical illustration of an older woman. Addressed as “grandma,” the speaker replies that she is only an elder sister, implying that work has already aged her. The reply thus metaphorically conveys the accelerated pace of aging and its material consequences inscribed on the female body, turning her appearance into a symbol of physical depletion. The use of classical imagery reinforces the “historicization” of female exhaustion, suggesting that fatigue is not merely an individual condition but a transgenerational trait. The combination of sarcastic dialogue and vintage artwork heightens the symbolic effect: exhaustion is represented as a long-standing cultural attribute of femininity. From a Critical Discourse Analysis (CDA) perspective, this linguistic expression functions as a mechanism that sustains traditional gender norms, positioning women as Sensers (those who feel and suffer) rather than Actors (those who act). Visually, these memes adopt a passive representational structure: the female figure occupies the center of the frame but avoids direct interaction with the viewer (offer gaze), with no clear vectors of action. Depictions of bowed heads, bent backs, and inert postures further emphasize bodily passivity. Within this context, the female body becomes a semiotic token of fatigue, symbolizing exhaustion as an essentialized aspect of womanhood.
Taken together, the three memes (Examples 3, 4, and 5) demonstrate that discourses of health in digital culture are not merely spaces of emotional sharing but also mechanisms for reproducing gendered power structures through the emotionalization and corporealization of women. While memes foster empathy, they simultaneously normalize fatigue as a cultural script of femininity.
4.2.2 Strategies to Resist Gender Stereotypes in Vietnamese Memes
While the majority of memes reinforce feminine stereotypes of endurance, fatigue, and self-regulation, a number of examples exhibit subtle forms of resistance enacted through parody, irony, and rejection of gendered norms. These strategies do not confront power structures directly; rather, they function within a space of “safe discursive deviation,” where humor operates as a symbolic counter-mechanism that enables individuals to renegotiate gender representation within socially acceptable boundaries.
4.2.2.1 Parodic Strategies in Workplace Contexts
Although most workplace memes in the dataset depict women in passive and enduring roles, some employ parody as a form of soft resistance. These instances do not directly dismantle gendered power relations but instead create a discursive safe zone in which humor undermines authority and allows women to reframe their identities within permissible social limits.
Example (6).
Figure 7: I am the child
Source: https://www.facebook.com/photo/?fbid=742588608019315&set=pcb.742591424685700
A recruitment interview meme depicts a female candidate being asked: “Do you plan to have a baby in the next two years?”—a discriminatory question commonly encountered in Vietnamese hiring processes. The female character responds: “I am the baby.” The reply is simultaneously humorous and resistant: the speaker identifies herself with the object of the discriminatory inquiry, thereby subverting the intent of the question and challenging the interviewer’s authority to define her role. Functionally, this form of resistance does not abolish the stereotype altogether but instead creates a discursive gap—a space where gender bias is recognized, parodied, and stripped of its normative seriousness through humor.
Rather than confronting structural power directly, the female character reclaims discursive agency by making the bias itself laughable, thereby undermining its taken-for-granted authority. The repeated use of such parodic strategies across memes can be read as an individualized form of resistance under systemic inequality. Although these practices cannot dismantle structural pressures, they nonetheless provide women with a sense of agency and autonomy, shifting discourse from pragmatic compliance to ironic subversion of stereotypes. In this way, memes operate as a cultural tool for gender expression that is safe yet subversive, offering a mode of resistance that is cultural rather than overtly political.
4.2.2.2 Rejecting Traditional Femininity
Memes focusing on daily habits and lifestyle illustrate forms of counter-normative femininity, in which behaviors such as procrastination, neglect of self-care, or refusal to perform expected femininity are parodied in order to renegotiate social expectations about gender.
Figure 8: I am a good example (Examples 7, 8, and 9)
Source of Example (7): https://www.facebook.com/share/p/1AVnbun9o4/
Source of Example (8): https://www.facebook.com/share/p/16N6NvNxjM/
Source of Example (9): https://www.facebook.com/share/p/1ERPpc2kYx/
Example (7): This meme uses humorous exaggeration to resist traditional expectations imposed on women regarding appearance and behavior. Shaving one’s head functions as a rejection of performing gender according to aesthetic norms.
The radical gesture is encoded in humor, producing a gentle yet deliberate denial of femininity. Visually, these memes employ an offer gaze, with either central or off-center composition, constructing a mode of “active passivity,” where the female subject chooses not to act as a statement of self-representation.
Example (8): This meme uses a split composition (e.g., “me in my profile picture” vs. “me at work”) to represent the contradiction between the idealized self and the lived self. Visual grammar features such as indirect gaze, medium distance, and divided framing contribute to a sense of fragmented identity, where the character is pulled between conflicting expectations of beauty and productivity. Rather than directly resisting, these memes expose pressure through structural opposition, highlighting the paradox between the demand to appear attractive online and the demand to perform effectively in the workplace.
Example (9): This meme rejects appearance norms through wordplay on dieting, a common mechanism of bodily control associated with women in popular culture. The character on the left teases: “Didn’t you say you were dieting? Why are you getting bigger every day?”—evoking the widespread stereotype that women are responsible for maintaining a slim figure. The female character does not outright reject the norm (“must diet, must be beautiful”) but instead replays it parodically, asserting the right to postpone compliance: “I’ll start dieting tomorrow.” Here, the subject simultaneously plays the role of the compliant woman and the refuser, thereby resisting without negating the norm. This act does not abolish social pressure but enables the subject to avoid being fully consumed by it.
5 Discussion
5.1 Female Gendered Discourse Fatigue in Memes
The combined quantitative and qualitative findings delineate a new discursive pattern, conceptualized in this study as Gendered Discourse Fatigue.
This concept captures a state of emotional exhaustion linked to the continuous reproduction of women’s representation in memes, where women are encouraged to express emotions but rarely granted agency to act. In this discursive configuration, emotion is framed less as an individual choice than as a social obligation. Three dimensions characterize this pattern: (1) Linguistic: repeated use of negative affective expressions (e.g., fatigue, boredom, pressure), frequently framed through parody or self-mockery; (2) Visual: reliance on indirect gaze, passive bodily postures, and medium or close-up framing that highlight resignation; (3) Discursive function: the construction of emotion as a collective burden of women rather than a pathway toward subjectivity or action. This form of fatigue underscores a paradox of digital discourse: while memes provide women with opportunities to voice their experiences, the constant circulation of negative affect risks reproducing traditional stereotypes, positioning women as those who “always feel” but “never act.” Such dynamics maintain a form of soft inequality, wherein the empowerment of expression does not translate into empowerment of action.
5.2 Memes as a “Safe Zone” for Gender Negotiation
Another key finding is that memes do not serve as sites of direct resistance but rather function as “safe zones” where gender representation is negotiated through emotion and humor. Within this discursive space, women can articulate fatigue, dissatisfaction, or injustice, yet these expressions are typically encoded in parody and satire—forms more readily accepted in public culture than overt resistance. In Vietnamese memes, women are seldom depicted as active agents; rather, they appear primarily as Sensers (those who feel) or Sayers (those who speak ironically). In workplace memes, they recount injustice through captions such as “5 endurances” without directly challenging structural systems.
In health-related memes, images of bent backs, downward gazes, or slouched postures operate as existential markers of feminized exhaustion. Humor in this context works as a form of soft gender resistance: it allows women to renegotiate and question gender norms without disrupting the existing social order. Unlike direct resistance, which carries political and social risks, soft resistance is symbolic and limited—it does not fracture structures of power but instead opens discursive safe spaces where suppressed emotions can be voiced. This dual function highlights humor’s ambivalence: on the one hand, it serves as a coping mechanism that diffuses messages and fosters communities of empathy; on the other, its repeated self-deprecating tones may inadvertently sustain inequality by trivializing the seriousness of gendered issues.
5.3 Aestheticization of Fatigue in Vietnamese Memes: From Individual Experience to Collective Identity
A distinctive aesthetic of Vietnamese memes lies in the hybridization of classical artistic imagery (traditional illustrations, historical paintings) with colloquial language and Gen Z slang. This juxtaposition produces a hybrid aesthetic that oscillates between solemnity and irony. Within this aesthetic frame, fatigue and exhaustion—initially personal experiences—are aestheticized and widely circulated. When paired with humorous or sarcastic captions, fatigue is no longer interpreted as a sign of inequality or a problem requiring critique, but as a collective marker of identity in digital culture. This generates a paradox: instead of interrogating the socio-structural roots of fatigue—gendered pressures, labor exploitation, or cultural norms—memes embellish and normalize it. The repeated imagery of “exhausted bodies” and “resigned postures” transforms fatigue into a cultural aesthetic of femininity.
In other words, within Vietnamese memes, exhaustion is not presented as a condition to be resolved but as an affect to be shared, circulated, and consumed. This demonstrates how digital discourse can aestheticize social injustice, reducing it to a collective joke rather than a site of critique.
5.4 Risks of Soft Inequality in Meme Culture
An important consequence of the aestheticization of fatigue is the emergence of a form of soft inequality. In this configuration, women are encouraged to express exhaustion, but not to demand change. Fatigue becomes a shared discursive language for generating empathy and community, yet simultaneously entrenches women’s representation as subjects who feel but do not act. Humor here plays a double-edged role: it helps alleviate pressure and fosters communal bonds, but it also risks masking and normalizing inequality. Through ironic tone and repetitive circulation, memes centered on fatigue and endurance dilute the seriousness of structural gendered issues into laughter. Humor thus functions as a mechanism of concealment, rendering soft inequality harder to identify and more difficult to challenge. This leads to a profound paradox: while memes give women “a voice” to articulate emotional states, that voice remains constrained within the realm of sharing and empathy, without expanding into demands for social transformation. Meme culture thus simultaneously empowers and regulates: it grants women the right to speak emotionally, but denies them the discursive authority to act politically or socially.
6 Conclusion, Implications, and Future Directions
This study analyzed 120 Vietnamese memes to examine how gender representation is performed, reproduced, and negotiated in digital communication.
By integrating three theoretical perspectives—Critical Discourse Analysis (CDA), Visual Grammar, and Gender Performativity—the study illuminates the interplay between language, images, and socio-cultural context in the construction of gendered meanings. Overall, the research highlights the dual role of memes: they function both as a medium for articulating and negotiating gender identity, and as a subtle discursive mechanism that sustains forms of soft inequality in contemporary digital culture. Four major findings emerge: (1) Feminization of digital discourse: women occupy central positions in both linguistic and visual representation, while men are largely absent. (2) Exhaustion as representation: female fatigue and depletion are consistently visualized through indirect gaze, passive postures, and medium shots that emphasize resignation. (3) Humor as soft resistance: humor is mobilized as a strategy of soft resistance, enabling critique within safe discursive boundaries. (4) Gendered Discourse Fatigue: the study proposes this new concept to describe how women are simultaneously empowered to express emotion and burdened with the responsibility of collective affect, a systemic form of emotional inequality. From a scholarly perspective, this research extends theoretical discussions by introducing Gendered Discourse Fatigue as an indicator of soft inequality in digital communication, demonstrating that repeated expressions of emotion do not necessarily lead to empowerment and may instead reduce women’s capacity for action. Methodologically, the combination of CDA, Visual Grammar, and Gender Performativity demonstrates the potential to localize multimodal critical discourse analysis in Southeast Asian contexts. Empirically, the systematically coded corpus provides a reference point for future studies on digital culture, gender, and media in Vietnam.
From a societal perspective, the findings raise critical concerns about the ways women are encouraged to “speak about fatigue” rather than to “speak demands for change.” Humor in memes, while entertaining, may serve as a mechanism of maintaining soft gender inequality by reproducing feminized patterns of affect that are easy to share and empathize with, but difficult to challenge critically. This calls for greater recognition of indirect gendered discourses, which operate not through explicit hate speech but through the repetitive framing of women as default symbols of collective exhaustion. This study also acknowledges several limitations. First, the dataset was restricted to 120 memes from two popular Facebook fanpages, which does not fully capture the diversity of Vietnamese memes across other platforms (e.g., TikTok, Instagram, X). Second, the analysis focused primarily on linguistic and visual elements, without incorporating user comments, shares, or feedback, which could provide additional insight into audience reception and negotiation. Third, the quantitative analysis remained descriptive and did not employ statistical tests to confirm significance across gendered categories. These limitations open promising avenues for future research. Potential directions include: (1) Expanding datasets across multiple platforms and longer timeframes to capture discursive shifts. (2) Conducting cross-cultural comparisons between Vietnamese memes and those in other contexts (e.g., Japan, Korea, Western societies). (3) Examining user interactions such as comments and shares as a secondary discursive layer. (4) Assessing the impact of humor on shifting social perceptions of gender. (5) Quantifying the cumulative effects of Gendered Discourse Fatigue using sentiment analysis and sociological surveys. In sum, this study contributes to a critical understanding of how memes simultaneously enable and constrain gender representation in digital culture.
Memes should therefore be viewed not merely as entertainment but as ideologically charged discourses that subtly restructure gender roles in contemporary society—both reflecting and reproducing the boundaries of inequality.
References
Blewitt-Golsch, Ashley Lorraine. 2019. Transgender experience depicted through memes: an ethnographic investigation of minority stress and resilience. Electronic Theses and Dissertations 1565.
Butler, Judith P. 1990. Gender trouble: feminism and the subversion of identity. New York: Routledge.
Calimbo, Alma Cito. 2015. Deconstructing myths via humor: a semiotic analysis of Philippine political internet memes. CASS Langkit Journal 06:24–41. MSU-Iligan Institute of Technology, Philippines.
Cameron, Deborah. 2003. Feminism and linguistic theory. 2nd ed. Palgrave Macmillan.
Cannizzaro, Sara. 2016. Internet memes as internet signs: a semiotic view of digital culture. Sign Systems Studies 44.4:562–86.
Dawkins, Richard. 1976. The selfish gene. Oxford: Oxford University Press. Reprint: The selfish gene, 40th anniversary edition.
Drakett, Jessica, Bridgette Rickett, Katy Day & Kate Milnes. 2018. Old jokes, new media – online sexism and constructions of gender in Internet memes. Feminism & Psychology 28.1:109–27. https://doi.org/10.1177/0959353517727560
Fairclough, Norman. 1995. Critical discourse analysis: the critical study of language. London: Longman.
Grundlingh, Lezandra. 2017. Memes as speech acts. Social Semiotics 28.2:147–68. Routledge, Taylor & Francis Group.
Halliday, M. A. K. & C. M. I. M. Matthiessen. 2004. An introduction to functional grammar. 3rd ed. London: Hodder Arnold.
Halliday, M. A. K. & C. M. I. M. Matthiessen. 2014. Halliday’s introduction to functional grammar. 4th ed. Routledge.
Howard, V. & A. Adan. 2022. The end justifies the memes: a feminist relational discourse analysis of the role of macro memes in facilitating supportive discussions for victim-survivors of narcissistic abuse. Cyberpsychology: Journal of Psychosocial Research on Cyberspace 16.4:1–20. https://doi.org/10.5817/CP2022-4-10
Kress, Gunther & Theo van Leeuwen. 1996/2006. Reading images: the grammar of visual design. 1st ed. 1996; 2nd ed. 2006. London: Routledge.
Lakoff, Robin Tolmach. 2004. Language and woman’s place: text and commentaries. Edited by Mary Bucholtz. Oxford: Oxford University Press.
Love, Alan C. & William C. Wimsatt. 2019. Beyond the meme: development and structure in cultural evolution. Minnesota Studies in the Philosophy of Science 22:191–214. Minneapolis: University of Minnesota Press.
Lương, Thị Hiền & Trịnh Khánh Hiền. 2023a. Nghiên cứu đặc điểm meme tiếng Việt từ góc độ phân tích diễn ngôn đa phương thức [A study on the characteristics of Vietnamese memes from the perspective of multimodal discourse analysis]. HNUE Journal of Science: Social Sciences 68.3:211–20.
Lương, Thị Hiền & Trịnh Khánh Hiền. 2023b. Vận dụng meme tiếng Việt trong dạy học một số nội dung chương trình Ngữ văn [Applying Vietnamese memes in teaching some contents of Philology General Education Curriculum]. VNU Journal of Science: Education Research 40.1:91–103.
Nguyễn, Phương Ngân. 2024. Internet meme nhìn từ lí thuyết ngôn ngữ học chức năng hệ thống (Trường hợp meme macro hình ảnh) [Internet memes based on the theory of systemic functional linguistics: a case study of image macro memes]. Tạp chí Khoa học và Công nghệ – Đại học Đà Nẵng [The University of Danang – Journal of Science and Technology] 22.8:59–66.
Shifman, Limor. 2014. Memes in digital culture. Cambridge, MA: MIT Press. PDF version accessed from https://www.rosario.gob.ar/inicio/sites/default/files/2024-09/Memes%20in%20Digital%20Culture%20-Limor%20Shifman.pdf
We Are Social & Meltwater. 2025. Digital 2025: Vietnam. DataReportal. Accessed September 10, 2025. https://datareportal.com/reports/digital-2025-vietnam
Yus, Francisco. 2018. Multimodality in memes: a cyber pragmatic approach. In Analyzing digital discourse: new insights and future directions, ed. by R. Taiwo, A. Odebunmi, and A. Adetunji, 105–31. Cham: Palgrave Macmillan.
1 I am deeply indebted to my two language consultants, one hailing from Kaheciday village and the other from Kinanuka village. Both of them are now based in Kaohsiung City, but they frequently return to their home villages. I sincerely thank the National Science and Technology Council, R.O.C., for funding this research (NSC101-2010-H-160-004-). This paper has been revised from sections of the project’s research report.
2 Applicatives and imperatives are not tested in this study.
3 An interesting observation is that mental-attitude adverbial verbs can be reanalyzed as adjectival predicates, whereas manner adverbial verbs cannot. This contrast may be attributed to their distinct selection of voice markers. The mental-attitude verb ‘sadly’ is restricted to the marker ma-, which denotes a transient state (Wu 2007), whereas the manner verb ‘violently’ alternates between the actor voice mi- (expressing transitive activity) and the undergoer voice ma- (distinct from the ma- used with mental-attitude verbs).
4 At SEALS 34, participants proposed that ‘reportedly’ ka-tengil-an corresponds more closely to the English expression ‘it is heard’. Even if that is the case, the argument remains valid. For instance, atay hani ‘fortunately’ appears independently and does not combine with any tense or aspectual marker. As noted in Jheng (2025), the mood-related evaluative adverbial modifier lemed ‘luckily’ may appear with the past tense marker na-, a possibility attributed to its derivation from the lexical root ‘luck’.
5 As reported in Jheng (2025), the evaluative adverb ‘luckily’ is realized in Siwkolan Amis as lemed, a form that may be marked with either ma- or -en.
1 To generalize, the non-contracted ‘strong’ form of the definite article is used to express anaphoricity, while the contracted ‘weak’ form is used to express uniqueness. Thus, in the following example from Schwarz (2009:14), a) could be used to refer to a house previously mentioned in the discourse, while b) could be used to refer to a house that is known to both speech participants. By contrast, in English, ‘the’ can be used with either a unique or an anaphoric referent.
a) Hans ging zu dem Haus.
   Hans went to the.STRONG house
   ‘Hans went to the house.’
b) Hans ging zum Haus.
   Hans went to-the.WEAK house
   ‘Hans went to the house.’
2 Without the definite marker and in isolation (i.e., without context), buku tebal and presiden bodoh would have generic interpretations, i.e., ‘books are thick’ and ‘presidents are stupid’.
3 In this way, =nya is similar to the “familiar definite article” nʊ in Akan, a Niger-Congo language spoken in Ghana (Arkoh & Matthewson 2013).
4 This construct is perfectly acceptable in many CIs; see §4.
5 It is unclear whether one could say bukunya saya in this context in SI/CIs. However, it is certainly not possible to use the first-person possessive clitic =ku with =nya, e.g., *bukukunya or *bukunyaku. This could be for syntactic or morphological reasons.
6 Blust (2013:295) notes that grammars of numerous Malay varieties from the early 1900s through the 1970s list 15–38 classifiers and hypothesizes that ‘urbanisation and the creation of national languages has had a simplifying effect on their use’.
7 It is not clear that this distinction exists for all speakers, or that it exists in CIs (see §4).
8 Analogously, Hsu (2021:67–69) mentions the likelihood of two concurrent grammars in Kashmiri with regard to V2/V3 alternation.
9 It may be that (se- +) classifiers have undergone reanalysis in CIs and become lexicalized as indefinite articles, and are thus externally merged as the head of DP.
10 MacDonald & Soenjono (1967:130) also note that these are the only ‘numeratives’ (numerals and quantifiers) that may occur with a reduplicated noun. It is possible that the prohibition against using banyak and segala with reduplicated nouns is a recent prescriptive phenomenon in SI.
11 The colloquial Javanese spoken by young speakers in Yogyakarta also shows syntactic evidence of Indonesian influence in certain word orders. It is historically likely that Javanese influenced the formation of Betawi; both share pad(h)a as a marker of plural agreement, among other traits, such as a lack of classifiers.
12 According to Robson (1992), this alternation is purely phonological in Javanese, with -né affixing to words ending in a vowel and -é affixing to words ending in a consonant, as shown in (30c), with some lexical exceptions. He also notes that these suffixes ‘have another kind of function, namely as a kind of demonstrative, translatable with the English definite article’, e.g., wedhusé ‘his/the goat’ (Robson 1992:34).
1 For the sake of notational consistency and readability, the tone markings (e.g., register numbers or contour values) provided for these markers in the original sources have been omitted throughout this paper.
2 Note that mɤj² represents two homophonous but distinct items: the possession verb ‘have’ in (17) and the negative marker ‘not’ in (18). As shown in (17), the possession verb mɤj² is negated by poː⁴ (i.e., poː⁴ mɤj² ‘not have’) rather than the homophonous negative marker mɤj².
3 The abbreviation NEG.SFP stands for Negative Sentence-Final Particle. Longming Zhuang has two such particles, which modulate the force of the negation: naːw³ softens the assertion, while naːw¹ reinforces it, adding an emphatic nuance.
1 Since 2008, I have been conducting fieldwork on the Hlai language and analyzing field-collected data. My field site is Tongzha City on Hainan Island, where the local Hlai people primarily speak the Gei dialect, the second-largest dialect of Hlai. All Hlai data discussed in this paper are from the Gei dialect.
1 Unless otherwise stated, Singlish data reflect the native-speaker judgments of the first author and six other Singlish speakers. All consultants speak English alongside Singlish. Four of the consultants are ethnic Chinese and additionally speak Mandarin, while one of them also speaks Hokkien (Min Nan). The other two consultants are ethnic Malays who also speak Malay.
1 Elias (2018) classifies Lio as Central Malayo-Polynesian, while Glottolog classifies Lio as a Bima-Lembata Malayo-Polynesian language. As a more nuanced formal classification is beyond the scope of this study, we shall conform to the more general grouping.
2 The syntactic representation of mixed-type SVCs is not shown here, as they mirror the structures of the SVC types involved and do not generate any structures unique to the mixed type.
3 The fact that only manner SVCs permit flexible ordering invites further investigation into the interaction between semantic type and syntactic movement. This supports cross-linguistic claims that not all SVC types are uniform in their syntactic behaviour (Aikhenvald 2006; Dixon 2006).
1 Prior to Vietnam’s Plan to Arrange and Merge Administrative Units, Lũng Cú village was located within Đồng Văn District, Hà Giang Province, Vietnam. However, following the reorganization of provinces on July 1, 2025, in which Hà Giang Province was merged into Tuyên Quang Province, Lũng Cú is now located in Tuyên Quang Province.
2 This fieldwork was conducted as independent research while studying at the School for International Training: Vietnam. In Vietnam, I wish to thank Cô Dương Vân Thanh and Trần Minh Trí for helping me to organize and obtain the necessary permits to complete this fieldwork, and Dìu Thị Hương and the Black Lolo people for their time and hospitality in teaching me their language. I wish also to thank the Columbia University Department of Linguistics, specifically my advisor, Professor William A. Foley, and Professor Meredith Landman. Finally, thank you to my parents, Laura Forte and Kenneth Kharbanda, for their unwavering support.
3 The Vietnamese words used for elicitation are shown here, as some have more precise senses than their English equivalents.
1 Most of the background information presented in this section comes from Hoipo Myers (p.c.), who collected this information through conversations with Chuyo speakers and through various sociolinguistic mapping tools with a group of Chuyo speakers in Chuyo Noknyu village in May 2023.
2 The speaker appears to show free alternation between the phonemes /u/ and /i/ in this word. Throughout the current paper, underlying tone categories are indicated with a subscript number after the syllable, while surface realization is indicated in superscript Chao tone numerals after the syllable.
1 Weaving of baskets, mats, and the like has a much deeper history in the region, but again, such items are not in the scope of this study.
2 See Alves (2025) for a survey of loanwords in this domain in Vietnamese from this period to the Colonial era.
3 Blust and Trussel (2010) reconstruct *tenun-an ‘loom’ in Proto-Austronesian. However, this is based on only two Formosan languages. It is also a transparently nominalized form of the verb ‘to weave’, Proto-Austronesian *tenun, which they also reconstruct. We find this insufficient to claim that this is a proto-form in the deep history of Proto-Austronesian, though perhaps more robust evidence can be provided.
4 The form skam in Khmu is provocative as it has an onset cluster, but whether the /s/ is a kind of innovated prefix or a retention from a very early borrowing is unknown.
5 Weitzel (1999:52) suggested that the ‘cotton’ word noted here was a loanword from Munda. However, the evidence in India showing domestication upwards of 8,000 years ago means that the domestication event long predates the expansion of Munda languages into India, which dates only to the 2nd millennium BCE (Sidwell and Rau 2019).
6 Pulleyblank (ibid.) suggested a Sanskritizing influence leading to a similar word shape in a Pali loan, but it could just as well be a direct borrowing from Sanskrit.
7 While the MC form has a palatal medial, as in modern Mandarin, a review of data in the Xiaoxuetang database of many dozens of modern Sinitic lects shows that most lack palatal medials, and while /i/ is the more common modern reflex among southern Chinese varieties, both /e/ and /ɛ/ are seen, especially among Hakka dialects.
8 We have noted a Chinese word with some similarity, including the syllable shape and the voiceless onset: 徽 huī ‘rope’, MC xjw+j, OC *m̥əj. However, as it has the meaning of ‘rope’, and the vowel does not match, it may be an instance of chance partial similarity.
1 I have standardized the orthographies in these tables to approximate the IPA. I have also added hyphens, not necessarily to indicate synchronic morpheme boundaries but to facilitate diachronic analysis. For the reconstruction of Proto-Oceanic (POC) cardinal numerals, see Ross (2023a).
2 Exceptionally, in this table, I have retained Sarfert’s (1920) orthography for the vowels, since it is not clear to me which IPA symbols correspond to them. I have, however, changed his <š> to ʂ and his <ṅ> to ŋ. 3 https://urldefense.com/v3/__https://ehrafworldcultures.yale.edu__;!!PvDODwlR4mBZyAb0!QKVkcztz5UvRopxhjleCDnBhIltsUXI2US9MQvhvQP5lgMqFQYGkiZsNl9RdKGI7ddcmoEPtDhsCKeM9rAm9WBmPrnxf8jNv$ . 4 Many thanks to Harald Hammarström for help with this. 5 The Greek letters used in Table 14 are simply a way of tracking recurrences of (mostly) identical formal elements. 1 Namely, a “velar or uvular” fricative. 2 However, this uvular ‘burr’ is not synchronically productive, cf. Påhlsson (1972) showing that the burr is restricted to the older generations. 3 This is in line with contemporary data, cf. Wu (2023) where Malay varieties of Terengganu consistently reflect *r as /ɣ/ 4 The source also states that final *r causes centralization of vowels; namely, *-ir *-ur *-ar are reflected as -e -o -ɐ. While not explicitly stated in the work, final *l also seems to exhibit this centralization phenomenon, cf. [kʰaɛ] or [kʰaːe] for Standard Malay [kail] ‘fish hook’ 5 With caveats of slight synchronic differences. 6 However, attestations of approximants in free variation are found. In postvocalic positions, Deterding et al. (2022) observed that /r/ is often realized as [ɹ] in Standard Malay in word final position or in clusters 7 Purcell (1956) (also briefly mentioned in Edwards & Blagden (1931)) notes two Ming envoys to Malacca in the years 1403 and 1405 A.D. respectively. 8 Referencing the works of Hirth (1888) 9 However, during this period, /ʐ-/ were distinctive from /l-/ initials; cf. (Coblin 2000); thus, it should be taken into account that the Chinese transcribers saw Malay /r/ as closer to [l] than a [ʐ]. 10 In a similar vein, a word final /k/ (realized as a glottal stop in Standard Malay) is orthographically rendered as a zero; also cf. 
bapa for /bapak/ 11 However, the isolect as described in dH does not feature this change for kecil. In the document, it is mostly rendered as ketjil. 12 cf. YL /-ar/ finals which vary between the erhua and ∅. 13 cf. Anderbeck (2003) on Jambi Malay varieties which elide /r/ prevocalically, see bajalan~bajalɛn in the Jambi isolects described in the paper. Similar behavior in dH’s should thus be considered for whether dH described an [r]-ful or a [ɣ]-ful isolect. 14 On the other hand, [r]-ful isolects are also known to elide /r/ in prefixes, cf. Brunei Malay (Clynes 2001) ba- for Standard Malay bər-, though, to my knowledge, there are no isolects in Sumatra which have the elision of /r/ and where the rhotic is pronounced with an apical trill instead of a dorsal fricative. 1 Blust (1999) states that PAN *q had already been lost in East Formosan, while Tseng (2023) hypothetically proposes *Q in Proto-Northern East Formosan. If comparing PAN etyma and the modern reflexes, *q is supposed to be the environment that triggers *a > i in Kavalan except for intervocalic position, e.g., PAN *qabu ‘stone’ > KAV ibu, *qamiS > imis ‘north’, *biRaq > biRi ‘leaf’, *mataq > mti ‘unripe, raw’, and *panaq ‘to throw; bow and arrow’ > pani. These examples imply that the loss of *q should be no earlier than Proto-Northern East Formosan, so there should be an exact phoneme being reconstructed. Given that it is phonetically uncertain, it is symbolized as *Q. 2 It remains unclear why the liquid had not shifted to r in Kavalan pamil (**pamir). Note that l in the present data has a quality of [ɮ] (Li and Tsuchida 2006:3). 1 This paper is an output of the extension project, Developing a community-based documentation project for Bugkalot – Phase II, funded by the Office of the Chancellor of the University of the Philippines Diliman, through the Office of the Vice Chancellor for Research and Development (OVCRD). The authors thank Associate Professor Maria Kristina S. 
Gallego, PhD, and Assistant Professor Ria P. Rafael of the University of the Philippines Department of Linguistics for their guidance during the conduct of the field activities from where this paper is derived from. Utmost gratitude is also extended to the Bugkalot/Eg̓ongot community for welcoming the authors into their community and letting them conduct collaborative language documentation towards the revitalization of the language. 1 This study was supported by JSPS KAKENHI Grant Number 23K20093. The authors are grateful for the assistance of Nurhasan Danial at West Mindanao University in the data collection. We also thank students at West Mindanao University and Mindanao State University Tawi-Tawi College of Technology and Oceanography for answering the questionnaire. 2 The data collection in Zamboanga City was conducted in West Mindanao State University. Professor Nurhasan Daniel, the head of Islamic Studies and Research coordinator, assisted with the data collection. --------------- ------------------------------------------------------------ --------------- ------------------------------------------------------------ Laurence Reid 2 Laurence Rei Laurence Rei Papers from SEALS 34 – Chen Laurence Rei Papers from SEALS 34 – Fettes Laurence Rei Papers from SEALS 34 – Huang Laurence Rei Papers from SEALS 34 – Lee Laurence Rei Papers from SEALS 34 – Lincoln & Baptista Laurence Rei Papers from SEALS 34 – Wivell et al. 