Overview (translated from the German): It is shown that the redundancy-free information content of natural-language texts, measured in bits per letter, cannot have a fixed lower bound of 1 bit per letter, but must decline monotonically with growing text length. Surprisingly, this agrees with the theory already published by Shannon. The region in which the sought function may lie is estimated. A first approximation for the total redundancy-free information contained in a text of given length turns out to be proportional to the square root of that length.

Abstract: It is shown that the redundancy-free value of specific information (entropy) in natural-language texts, measured in bits per letter, cannot be a constant; rather, it must decline monotonically with growing text length. Surprisingly, this fact was already mentioned incidentally by Shannon in his famous paper on printed English. The region in which the function of minimum entropy can lie is now determined numerically. A first approximation for this function leads to integrated information values that are proportional to the square root of the text length.

For documentation: information theory / Shannon / minimum entropy / natural-language texts
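The square-root behaviour claimed in the abstract can be illustrated with a small numerical sketch. The specific model h(n) = c/√n below is an illustrative assumption (the paper only states that entropy per letter declines monotonically); under that assumption, the integrated information, the partial sum of h(k), grows like 2c√n, i.e. proportionally to the square root of the text length.

```python
import math

def entropy_per_letter(n, c=1.0):
    """Hypothetical per-letter entropy model h(n) = c / sqrt(n) bits.

    The constant c and the exact functional form are illustrative
    assumptions, not taken from the paper.
    """
    return c / math.sqrt(n)

def total_information(n, c=1.0):
    """Integrated information: sum of h(k) over the first n letters."""
    return sum(entropy_per_letter(k, c) for k in range(1, n + 1))

# The partial sum of c/sqrt(k) approaches 2*c*sqrt(n) for large n,
# so the ratio below tends to 1: total information grows with the
# square root of the text length.
for n in (100, 10_000, 1_000_000):
    print(n, total_information(n) / (2 * math.sqrt(n)))
```

The decline of h(n) itself is what rules out a fixed lower bound of 1 bit per letter in this toy model; any monotonically vanishing h would show the same qualitative effect, with only the growth constant changing.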
Existing artificial neural network models are not very successful at understanding or generating natural-language texts. It is therefore proposed to design novel neural network structures at higher levels of abstraction. This concept leads to a hierarchy of network layers in which every layer extracts and stores local details and transfers the remaining non-local context information to the next higher level. At the same time, data compression is achieved from layer to layer. The reuse of the same network elements (meta-words) at higher levels for different word sequences in the base level is introduced and discussed with respect to grammatical identity or similarity. In this way, text can be compressed into forms that are almost free of redundancy. Possible applications include the storage, transmission, understanding, generation and translation of texts.
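The layer-by-layer idea can be sketched with a toy example: one layer absorbs a frequent local pattern (an adjacent word pair) into a single meta-word and passes the shortened sequence upward. The pair-merging rule, the meta-word notation and all names below are illustrative assumptions; the abstract does not specify a concrete algorithm.

```python
from collections import Counter

def compress_layer(tokens, min_count=2):
    """One hypothetical layer: replace the most frequent adjacent
    token pair with a single meta-word, compressing the sequence.
    """
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens, None
    pair, count = pairs.most_common(1)[0]
    if count < min_count:
        return tokens, None              # nothing worth merging
    meta = "<" + "+".join(pair) + ">"    # meta-word standing for the pair
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(meta)             # local detail stored in the meta-word
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out, meta

text = "the cat sat on the mat and the cat ran".split()
layer1, meta1 = compress_layer(text)
print(meta1, layer1)
```

Applying such layers repeatedly yields progressively shorter, less redundant sequences, mirroring the proposed hierarchy; the same meta-word could, as the abstract suggests, also stand for grammatically similar (not only identical) word sequences, which this sketch does not attempt.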