When we read English with no spaces between the words, a sentence seems to turn into gibberish: !@#¥%......&*(). Yet when we add spaces between the words of a Chinese sentence, they seem redundant, as in the sentence pictured below.
English without spaces and Chinese with spaces (Image source: made by the editor)
In fact, even native English speakers depend on spaces between words in order to read. So why does English need spaces to separate words when Chinese does not? What is the underlying reason for this difference? Scientists from the Institute of Psychology of the Chinese Academy of Sciences have found that a question of "economy" lies behind it.
Spaces carry an "amount of information", and it is not the same in Chinese and English
English is an alphabetic writing system: each letter represents a phoneme, and a word usually consists of several letters. English text uses spaces to mark clearly and unambiguously where each word begins and ends, that is, its word boundaries. Does Chinese not need to mark word boundaries?
Chinese is a typical ideographic writing system, with each Chinese character representing a syllable or morpheme. Chinese text consists of consecutive characters, with no spaces separating one word from the next. Most Chinese words are written with one or two characters, so words are short and vary little in length (the average word length is 1.40 characters, with a standard deviation of 0.57). Chinese readers can therefore predict word length easily when reading and identify where words begin and end more quickly. In other words, the uncertainty about the position of word boundaries in Chinese is small.
In contrast, English words tend to consist of multiple letters, and word length varies greatly (the average word length is 3.78 letters, with a standard deviation of 2.04). This makes it harder for English readers to predict where each word begins and ends. In other words, the uncertainty about the position of word boundaries in English is large.
Using large-scale corpora, the researchers applied information-theoretic methods to quantify how much information spaces provide for determining word boundaries in 27 languages. The results show that whether a writing system uses spaces to mark word boundaries is related to how much word boundary information the spaces provide: in writing systems that use spaces, such as English, spaces provide more information (2.90 bits), while in writing systems that do not use spaces, such as Chinese, inserted spaces provide less information (1.10 bits).
The amount of information provided by the spaces between words in 27 languages to determine the boundaries of words
The root cause of this difference lies in how uncertain the position of word boundaries is in each writing system. In Chinese, the uncertainty is small, so even if spaces are inserted between words, they add only limited information for determining word boundaries. In English, the uncertainty is large, so the spaces between words provide much more information for determining word boundaries.
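The link between variability and uncertainty can be illustrated with Shannon entropy, the standard information-theoretic measure of uncertainty in bits. The sketch below is not the measure used in the study, and the two word-length distributions are made-up toy values rather than corpus statistics. It only shows the general principle that a distribution spread over many lengths is more uncertain than one concentrated on a few.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)) in bits; zero-probability outcomes are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical word-length distributions (length -> probability).
# Toy values for illustration only, NOT the corpus statistics from the study.
narrow_lengths = {1: 0.5, 2: 0.5}                # lengths cluster tightly
wide_lengths = {n: 1 / 8 for n in range(1, 9)}   # lengths spread evenly over 1..8

narrow_h = entropy_bits(narrow_lengths.values())
wide_h = entropy_bits(wide_lengths.values())

print(f"narrow: {narrow_h:.2f} bits")  # narrow: 1.00 bits
print(f"wide: {wide_h:.2f} bits")      # wide: 3.00 bits
```

The more evenly the possible word lengths are spread, the more bits a reader would need, on average, to pin down where the next word ends, and the more an explicit boundary marker such as a space has to offer.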
What does the "amount of information" in a space mean for the reader?
The amount of word boundary information that spaces provide reflects the cognitive effort a reader must spend on segmenting words when reading text without spaces.
Without spaces, the reader has to cut a continuous string of characters into separate words. This is word segmentation, or what we usually call "sentence breaking". To do so, readers draw on contextual information and linguistic knowledge. In some cases the segmentation turns out to be wrong, and the reader has to detect and correct the error. For example, on seeing the news headline "World Cup China, Japan and South Korea enter the round of 16", many readers will segment it as "World Cup / China, Japan and South Korea / enter the round of 16". In the Chinese original, the character for "China" (中) can also mean "in" or "during", which is what makes the headline ambiguous. Only after reading the story and finding that it does not match this reading do they realize the segmentation was wrong and correct it to "In the World Cup / Japan and South Korea / enter the round of 16".
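The same kind of ambiguity is easy to reproduce in unspaced English. The sketch below is only an illustration, not a model of how readers actually segment text: it enumerates every way to split a string into words from a small lexicon. Both the example string and the mini-lexicon are hypothetical and were chosen to produce two competing readings.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into a sequence of words from `lexicon`."""
    if not text:
        return [[]]  # the empty string has exactly one segmentation: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            # Keep this word and segment whatever remains after it.
            for rest in segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results

# Hypothetical mini-lexicon, chosen for illustration only.
lexicon = {"god", "is", "now", "here", "nowhere"}

parses = segmentations("godisnowhere", lexicon)
for words in parses:
    print(" / ".join(words))
# god / is / now / here
# god / is / nowhere
```

A lexicon alone cannot choose between the two readings. As with the headline, it is context that tells the reader which segmentation was intended.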
The cognitive effort a reader spends on word segmentation, and on detecting and correcting segmentation errors, affects reading speed. In English, spaces carry a large amount of information, so once they are removed from the text, readers must work harder to segment words and are more likely to make segmentation errors. In Chinese, by contrast, inserted spaces carry little information, which means that readers do not need much cognitive effort to segment unspaced text in the first place. Therefore, English tends to use spaces to reduce the cognitive burden of word segmentation, while Chinese does without them.
Consistent with this finding, previous studies have shown that changing the way word boundaries are marked affects reading efficiency differently across languages. In writing systems where spaces carry a large amount of information, such as English, removing the spaces reduced reading speed significantly, by about 50%. In writing systems where spaces carry little information, such as Chinese, inserting spaces did not significantly improve reading speed.
The effect of the way word boundaries are marked on reading efficiency
Spaces or no spaces: which is more "economical"?
That English uses spaces and Chinese does not may reflect choices made for the sake of economy in reading.
When reading, the range of text that can be perceived in a single fixation is limited. Inserting spaces means that the reader perceives fewer characters per fixation, which reduces the efficiency of visual perception. For Chinese, inserted spaces provide little information, and the reader does not need much cognitive effort to segment unspaced text. The benefit of marking word boundaries with spaces is therefore not enough to offset the cost in visual perception, so it is more economical for Chinese not to use spaces. In contrast, in alphabetic writing systems such as English, spaces provide a large amount of information, and removing them forces readers to put more cognitive effort into word segmentation. For English, then, the benefit of marking word boundaries with spaces far outweighs the cost in visual perception.
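A rough back-of-the-envelope calculation gives a feel for the visual cost. The sketch below is not taken from the study. It takes the mean word lengths quoted earlier in this article and assumes, as a simplification, that one space follows every word and that a space is as wide as one character. Under those assumptions it estimates what share of the character slots in a line would be taken up by spaces.

```python
def space_share(mean_word_length):
    """Fraction of character slots occupied by spaces if one space follows each word.

    Simplifying assumption: a space is exactly as wide as one character.
    """
    return 1 / (mean_word_length + 1)

# Mean word lengths quoted earlier in the article.
chinese_share = space_share(1.40)  # characters per word
english_share = space_share(3.78)  # letters per word

print(f"Chinese: {chinese_share:.1%}")  # Chinese: 41.7%
print(f"English: {english_share:.1%}")  # English: 20.9%
```

On these assumptions, spaces would take up roughly twice as large a share of each fixation in Chinese as in English, so the visual cost of spacing would be proportionally higher for the writing system that gains the least information from it.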
In short, although writing systems differ in whether they use spaces to mark word boundaries, each has settled on the more economical way of marking them after weighing the cognitive effort required for word segmentation against the efficiency of visual processing during reading.
Evidence of historical evolution
The history of alphabetic writing systems shows that people gradually reformed them to arrive at the most economical way of marking word boundaries.
Historically, alphabetic writing systems have not always used spaces to mark word boundaries. Early written texts had no spaces, because the spoken language they transcribed carries no explicit word boundary information and because writing materials were expensive. Readers had to read aloud in order to understand the meaning of a text, so reading was inefficient. In this period, writing was used only by a small number of scribes or missionaries. It was not until the Renaissance, with the growing demand for mass reading, that these writing systems gradually added spaces between words, improving literacy and reading efficiency. The addition of spaces between words in alphabetic languages thus shows a writing system gradually adapting to human cognitive needs and coming more into line with the principle of economy.
In contrast, Chinese text has historically not used spaces to mark word boundaries. The introduction of punctuation marks reduced the difficulty of reading Chinese texts: it made sentence boundaries clearer, helped readers grasp the structure and meaning of sentences faster, and improved reading efficiency.
However, even after adopting punctuation, Chinese still does not use interword spaces as alphabetic languages do. This suggests that punctuation is sufficient to reduce the cognitive load on Chinese readers, and that the additional benefit of spaces is not enough to offset their negative impact on visual processing efficiency. The evolutionary path of Chinese differs from that of alphabetic writing systems, but it follows the same principle of economy: it kept the original written form and improved reading efficiency effectively with fewer changes.
Bibliography:
[1] Bai, X., Yan, G., Liversedge, S. P., Zang, C., & Rayner, K. (2008). Reading spaced and unspaced Chinese text: Evidence from eye movements. Journal of Experimental Psychology: Human Perception and Performance, 34(5), 1277–1287.
[2] Gibson, E., Futrell, R., Piantadosi, S. T., Dautriche, I., Mahowald, K., Bergen, L., & Levy, R. (2019). How efficiency shapes human language. Trends in Cognitive Sciences, 23(5), 389–407.
[3] Huang, L., & Li, X. (2020). Early, but not overwhelming: The effect of prior context on segmenting overlapping ambiguous strings when reading Chinese. Quarterly Journal of Experimental Psychology, 73(9), 1382–1395.
[4] Huang, L., & Li, X. (2023). The effects of lexical- and sentence-level contextual cues on Chinese word segmentation. Psychonomic Bulletin & Review, 31, 293–302.
[5] Huang, L., Reichle, E. D., & Li, X. (2024). Comparative analyses of the information content of letters, characters, and inter-word spaces across writing systems. Annals of the New York Academy of Sciences, 1537(1), 129–139.
[6] Huang, L., Staub, A., & Li, X. (2021). Prior context influences lexical competition when segmenting Chinese overlapping ambiguous strings. Journal of Memory and Language, 118, 104218.
[7] Li, X., Huang, L., Yao, P., & Hyönä, J. (2022). Universal and specific reading mechanisms across different writing systems. Nature Reviews Psychology, 1(3), 133–144.
[8] Ma, G., Li, X., & Rayner, K. (2014). Word segmentation of overlapping ambiguous strings during Chinese reading. Journal of Experimental Psychology: Human Perception and Performance, 40(3), 1046–1059.
[9] Rayner, K., Fischer, M. H., & Pollatsek, A. (1998). Unspaced text interferes with both word identification and eye movement control. Vision Research, 38(8), 1129–1144.
[10] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379–423.
[11] Veldre, A., Reichle, E. D., Yu, L., & Andrews, S. (2023). Understanding the visual constraints on lexical processing: New empirical and simulation results. Journal of Experimental Psychology: General, 152, 693–722.
Author: Huang Linjieqiong
Author affiliation: Institute of Psychology, Chinese Academy of Sciences