Lu Wang, Yahui Guo, Chengcheng Ren
Glottometrics, Issue 50, pp. 42–56 (2021), https://doi.org/10.53482/2021_50_387
This paper reports a quantitative study of the parts of speech of English words, using data from the British National Corpus. Most part-of-speech investigations focus on the rank-frequency distribution. However, in English and many other languages, a word's part of speech can be ambiguous: for example, hope can be a noun or a verb. Such words are called polyfunctional words, whereas words that belong to only one part of speech are called monofunctional words. The number of parts of speech to which a word belongs is referred to as its polyfunctionality. First, we study the polyfunctionality distribution of English words and find that the Shenton-Skees-geometric and the Waring distributions capture the data very well. Then, we group words according to their parts of speech; for example, monofunctional nouns such as Saturday and polyfunctional nouns such as hope (noun, verb) together make up the noun group. We try to work out a general model for all the groups and find that the extended positive binomial distribution captures every group except the article group, where the data are too sparse. Last, we study the diversification variants. Since each group contains polyfunctional words (e.g., in the noun group, a polyfunctional noun may also be a verb), we treat the additional function, here “verb”, as a diversification variant and model the rank-frequency distribution of the variants with the Popescu-Altmann function, as used in the previous investigation. The results show a very good fit for all groups except the conjunction group.
Polyfunctionality, polyfunctional words, parts of speech, BNC.
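The quantities defined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the toy lexicon, the function names, and the choice to count variants over word types (rather than corpus tokens) are assumptions made only for illustration, and no distribution fitting is attempted.

```python
from collections import Counter

# Hypothetical toy lexicon (NOT BNC data): word -> set of parts of speech.
LEXICON = {
    "hope": {"noun", "verb"},
    "Saturday": {"noun"},
    "run": {"noun", "verb"},
    "fast": {"adjective", "adverb", "verb", "noun"},
    "the": {"article"},
    "quickly": {"adverb"},
}


def polyfunctionality(lexicon):
    """Number of parts of speech each word belongs to."""
    return {word: len(tags) for word, tags in lexicon.items()}


def polyfunctionality_distribution(lexicon):
    """How many words have polyfunctionality 1, 2, 3, ..."""
    return Counter(len(tags) for tags in lexicon.values())


def pos_group(lexicon, pos):
    """All words, mono- or polyfunctional, that can function as `pos`."""
    return {word: tags for word, tags in lexicon.items() if pos in tags}


def variant_rank_frequency(lexicon, pos):
    """Rank the other parts of speech found inside the `pos` group.

    Assumption: frequency is counted over word types, one count per word.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter()
    for tags in pos_group(lexicon, pos).values():
        counts.update(tags - {pos})
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(polyfunctionality(LEXICON))
    print(sorted(polyfunctionality_distribution(LEXICON).items()))
    print(sorted(pos_group(LEXICON, "noun")))
    print(variant_rank_frequency(LEXICON, "noun"))
```

In this toy lexicon, hope has polyfunctionality 2, and within the noun group the “verb” function is the highest-ranked diversification variant. Fitting the distributions named in the abstract would be a separate step applied to counts of this kind.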