Quantitative characteristics of natural language complexity
Date
2014
Publisher
Institute of Nuclear Physics Polish Academy of Sciences
Abstract
This doctoral dissertation includes the following main theses:
- As samples of natural language, literary texts show several properties of complex systems. They have internal organization, including a hierarchical structure, and the interactions between their components (such as words) are complex in nature. Among other things, this can result from the imposed rules of grammar and from the author's writing style. One also observes large-scale effects that cannot be explained on the basis of knowledge of the individual words alone. Such effects can include the content, the emotional charge, and the artistic value of the text.
- Interactions between words, defined by their mutual adjacency and expressed in a network representation, show certain features of networks with accelerated growth and an approximately scale-free node degree distribution. Such networks also have a distinctive tendency toward condensation, which shortens the path lengths between nodes as the number of nodes increases.
- Despite strong grammatical differences, different European languages do not show comparable differences in network topology. Substantially larger differences appear within a single language when one compares texts that represent different literary genres.
- The empirical word-adjacency networks can be modelled either directly, with appropriate network models (e.g., various kinds of networks with accelerated growth), or indirectly, through the network representation of relevant stochastic processes. Comparing the topology of the model networks with that of the empirical ones shows, however, that language has subtleties that relatively simple, generic models cannot fully capture.
- Literary texts, when parameterized by sentence lengths and expressed as time series, show a clear fractal structure and, in some cases, even a multifractal structure. From the standpoint of literary studies, the latter group of texts can be linked with the narrative technique known as stream of consciousness.
This dissertation is divided into five chapters. Chapter 1 is a short introduction that lists the main objectives and theses of the work. Chapter 2 describes natural language as a phenomenon: its origins, evolution, and morphology. This part of the work also discusses the main theories of the origin of language and the formal classification of languages. Chapter 3 introduces complex systems science. It begins by explaining why physics is the branch of science best equipped to study such systems, and natural language in particular. It then introduces the concept of complexity and discusses the most important properties of complex systems, together with the methodology for studying them.
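The word-adjacency representation named in the theses can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the procedure used in the dissertation. It assumes that words are lowercased, that punctuation is dropped, that links may cross sentence boundaries, and that repeated adjacent words do not create self-loops. The function names are hypothetical. The sketch computes the two quantities the theses focus on: the node degree distribution and the average shortest path length.

```python
import re
from collections import Counter, deque


def word_adjacency_network(text: str) -> dict[str, set[str]]:
    """Undirected word-adjacency network: nodes are distinct lowercase words,
    edges join words that occur next to each other (assumption: punctuation is
    dropped and links may cross sentence boundaries)."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # Unicode letters only
    graph: dict[str, set[str]] = {w: set() for w in words}
    for a, b in zip(words, words[1:]):
        if a != b:  # ignore self-loops such as "very very"
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_distribution(graph: dict[str, set[str]]) -> dict[int, int]:
    """Map each degree k to the number of nodes that have degree k."""
    return dict(sorted(Counter(len(nbrs) for nbrs in graph.values()).items()))


def average_shortest_path(graph: dict[str, set[str]]) -> float:
    """Mean breadth-first-search distance over all ordered pairs of
    distinct, mutually reachable nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


g = word_adjacency_network("The cat sat on the mat.")
print(degree_distribution(g))    # {1: 1, 2: 3, 3: 1}
print(average_shortest_path(g))  # 1.6
```

On a toy sentence these numbers only demonstrate the mechanics. For a full literary text, the usual first check for approximately scale-free behaviour is to plot the degree distribution on log-log axes.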
Chapter 4 describes all the analyses and discusses the results obtained. It consists of several sections, each devoted to a specific issue. Section 4.1 presents a statistical analysis of empirical data consisting of the vocabulary of six European languages, with particular emphasis on the Zipf approach. In Section 4.2, literary texts represented by word adjacencies are subjected to network analysis. The focus is on the topological properties of these networks, especially the node degree distributions and the average shortest path lengths. The empirical results are compared with the results of simulations based on different network models. The last section, 4.3, presents the results of fractal analysis applied to time series of sentence lengths, with the main emphasis on identifying multifractal properties. Finally, Chapter 5 contains a summary with a critical discussion of the results presented throughout this work, and indicates possible directions for future research.
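The abstract does not name the algorithm behind the fractal analysis of sentence-length time series. Multifractal detrended fluctuation analysis (MFDFA) is a common choice for such series, so the sketch below uses it as an assumption. The function name `mfdfa`, the scale range, and the q values are illustrative and are not taken from the dissertation. The input series would be the sequence of sentence lengths in text order. The output is the set of generalized Hurst exponents h(q). An h(q) that varies noticeably with q points to multifractality, whereas a nearly constant h(q) indicates a monofractal.

```python
import numpy as np


def mfdfa(series, scales, q_values, order=1):
    """Generalized Hurst exponents h(q) by multifractal detrended
    fluctuation analysis (sketch).

    series   -- 1-D sequence, e.g. sentence lengths (in words) in text order
    scales   -- segment lengths s; each must exceed `order` + 1
    q_values -- moments q; q == 0 uses the logarithmic average
    order    -- degree of the detrending polynomial in each segment
    """
    x = np.asarray(series, dtype=float)
    profile = np.cumsum(x - x.mean())  # integrated, mean-subtracted series
    n = len(profile)
    log_f = np.empty((len(q_values), len(scales)))
    for j, s in enumerate(scales):
        n_seg = n // s
        # segments taken from both ends so the tail of the series is also used
        segments = np.concatenate([
            profile[: n_seg * s].reshape(n_seg, s),
            profile[n - n_seg * s:].reshape(n_seg, s),
        ])
        t = np.arange(s)
        f2 = np.empty(len(segments))  # detrended variance in each segment
        for k, seg in enumerate(segments):
            trend = np.polyval(np.polyfit(t, seg, order), t)
            f2[k] = np.mean((seg - trend) ** 2)
        for i, q in enumerate(q_values):
            if q == 0:
                log_f[i, j] = 0.5 * np.mean(np.log(f2))
            else:
                log_f[i, j] = np.log(np.mean(f2 ** (q / 2))) / q
    # h(q) is the slope of log F_q(s) against log s
    log_s = np.log(np.asarray(scales, dtype=float))
    return np.array([np.polyfit(log_s, row, 1)[0] for row in log_f])


rng = np.random.default_rng(0)
scales = np.unique(np.logspace(4, 9, 12, base=2).astype(int))  # 16 ... 512
q_values = [-4, -2, 0, 2, 4]
h_noise = mfdfa(rng.normal(size=4096), scales, q_values)
# for uncorrelated noise, theory predicts h(q) = 0.5 for every q
print(h_noise)
```

This demonstration runs on synthetic Gaussian noise rather than on real sentence lengths. Before attributing a spread of h(q) to genuine multifractality, compare it with the spread obtained for shuffled or surrogate versions of the same series, because finite-size effects alone produce some spread.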