Research

Published & Forthcoming Papers

Inverse morpheme words are pairs of compound words that share the same morphemes arranged in opposite order. Most related work on the subject has been limited to a narrow investigation of dictionary definitions, with few studies based on large-scale corpora. Using the People's Daily corpus (1946-2017), we added words to and removed words from a base list, obtaining a list of 668 pairs of inverse morpheme words. We then computed the cosine similarity of each pair from word embeddings (distributed representations). The Pearson correlation coefficient between these similarities and manually annotated values is 0.907, indicating that the method measures the semantic similarity of inverse morpheme words in close agreement with human judgment. We also found that 76 percent of inverse morpheme word pairs have a cosine similarity of 0.4 or higher, and that word formation, part of speech, and frequency all affect semantic similarity.
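The evaluation described in this abstract (cosine similarity between the embeddings of the two words in a pair, then Pearson correlation against human ratings) can be illustrated with a minimal sketch. This is not the paper's code: the function names, the three-dimensional toy vectors, the example word pairs, and the human scores below are all hypothetical and chosen only to show the two computations.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical toy embeddings; real ones would come from a model
# trained on a large corpus and have hundreds of dimensions.
embeddings = {
    "互相": [0.9, 0.1, 0.2], "相互": [0.8, 0.2, 0.2],
    "代替": [0.1, 0.9, 0.3], "替代": [0.2, 0.8, 0.4],
    "蜜蜂": [0.7, 0.1, 0.6], "蜂蜜": [0.1, 0.6, 0.7],
}
pairs = [("互相", "相互"), ("代替", "替代"), ("蜜蜂", "蜂蜜")]

# Hypothetical human similarity ratings for the same pairs, in [0, 1].
human_scores = [0.95, 0.90, 0.30]

model_scores = [cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs]
print("cosine similarities:", model_scores)
print("Pearson r vs. human ratings:", pearson(model_scores, human_scores))
```

With real data, `model_scores` would hold one cosine similarity per word pair and `human_scores` the corresponding annotated values; a threshold such as 0.4 could then be applied to `model_scores` to count how many pairs fall at or above it.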

Building Chinese Word Knowledge Base for Children's Leveled Reading
with Zhiying Liu, Lijiao Yang, Lu Zhang.

With the development of great Chinese education, leveled reading of Chinese has attracted growing attention in China. Both schools and parents urgently need a reading system that matches the development of children's reading ability. Grading words, which are the carriers of reading materials, is especially important, because the difficulty level of words has a direct and significant impact on the text complexity of reading materials. This paper focuses on character and word grading for a Chinese leveled reading system, and attempts to establish a Chinese character knowledge base with character ranks that reflect the characteristics of Chinese characters themselves. For the character knowledge base, the paper draws on research results from exegetical studies and determines the hierarchical attributes of Chinese characters, including shape, meaning, and word-formation ability, building a character knowledge base for leveled reading that contains 3350 Chinese characters with features. For the word knowledge base, the paper describes attributes such as part of speech, word meaning, and context, and in particular uses Hierarchical Network of Concepts theory to define difficulty levels for the cognitive attributes of semantic categories. The result is a leveled-reading word knowledge base containing 18300 words with features covering shape, meaning, and context. Based on this knowledge base, attributes such as the content of words, word density, the proportion of super-class words, the number of class symbols, and IOG are described to guide the automatic grading of Chinese texts, which achieved better results.

Jiaomei Zhou
