When information seekers use an information retrieval system their strategy is based, at least in part, on the perceptions they have formed about that environment. A random sample was gathered of more than 2,000 actual search queries submitted by users to one Web search engine, WebCrawler, in two separate capture sessions. The results suggest that a high proportion of users do not employ advanced search features, and those who do frequently misunderstand them. Furthermore, many users seem to have formed a model of the Web that imbues it with the intelligence found in a reference librarian, for example, but not a retrieval system. The linguistic structure of many queries resembles a typical human‐human communication model that is unlikely to produce satisfactory results in a human‐computer communication environment such as that offered currently by the Web. Design of more intuitive systems is dependent upon a more complete understanding of user behaviour at the intellectual and emotional as well as the technical levels.
The amount of electronic information in Arabic and other non-English languages available, especially on the World Wide Web, is increasing. Searches for such information can be undertaken on engines developed with the English language in mind, but will these engines work as effectively in other languages? This article investigates the impact on retrieval of prefixes in Arabic, which are far more common than in English. Typically, search engines such as AltaVista, designed implicitly for English, include right-hand (suffix) but not left-hand (prefix) truncation. A test collection of 271 Arabic HTML records was created and indexed using the personal version of AltaVista. A series of searches was conducted on this collection, again using AltaVista. The results showed that searches on nouns stripped of prefixes reduced recall, in some cases dramatically, and that total recall of nouns can only be guaranteed by repeating searches that include the various prefixed versions of the nouns. The research questions the assumption that search engines designed with English in mind will work as well with different language structures.
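The workaround described in this abstract, repeating a search with the various prefixed versions of a noun, can be sketched as a simple query expansion. This is a minimal illustration only, not the method used in the study: the prefix list is a small, non-exhaustive set of common Arabic proclitics chosen for the example, the function names are hypothetical, and the `OR` query syntax is a generic assumption rather than AltaVista's documented syntax.

```python
# Illustrative, non-exhaustive list of common Arabic proclitics.
# This list is an assumption for the example, not taken from the study:
# definite article, conjunctions, prepositions, and a few combinations.
PREFIXES = ["ال", "و", "ف", "ب", "ل", "ك", "وال", "فال", "بال", "كال", "لل"]


def prefixed_variants(noun: str) -> list[str]:
    """Return the bare noun followed by each prefixed form, without duplicates."""
    variants = [noun]
    for prefix in PREFIXES:
        candidate = prefix + noun
        if candidate not in variants:
            variants.append(candidate)
    return variants


def build_or_query(noun: str) -> str:
    """Join all variants with a generic OR operator (hypothetical query syntax)."""
    return " OR ".join(prefixed_variants(noun))


if __name__ == "__main__":
    # Expand the noun "كتاب" (book) into one query covering its prefixed forms.
    print(build_or_query("كتاب"))
```

An engine with left-hand truncation would make this expansion unnecessary; without it, the searcher (or a front end) has to enumerate the variants explicitly, which is the burden the abstract points to.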
The World Wide Web offers access to information resources in many languages. Certain developments facilitate multilingual exploitation of these resources. Some search engines, for example, allow the user to restrict retrieved sites to those in particular languages; some also provide the searcher with an interface in a chosen language. Many web sites also offer their information in several languages, one of which typically is English. Systran, a machine translation system available from the AltaVista search engine, can even translate a search statement or a retrieved page from one language to another. Despite these features, however, language also creates obstacles to full exploitation of web resources. Not all languages are catered for by these multilingual tools. Machine translation output typically is but a rough and ready version of a human translation. The variety of scripts in which the written forms of the world’s languages appear also creates major problems in searching, inputting, displaying and printing text in non‐roman scripts. The paper offers an overview of multilingual information access issues in relation to the Web.
The performances of general and Arabic search engines were compared based on their ability to retrieve morphologically related Arabic terms. The findings highlight the importance of making users aware of what they miss by using the general engines, underscoring the need to modify these engines to better handle Arabic queries.
Many search engines on the Web offer their users the option to restrict searches to a specific language (a search-by-language feature), and therefore only retrieve HTML documents that contain text in the language of their choice. This search feature can be indispensable when searchers are not interested in material in other languages, or when searchers want to retrieve bilingual/multilingual documents, that is, documents containing the search term(s) in one language and additional text in another language, or in a multitude of languages. The implementation of language-recognition capabilities in search engines necessitates the development of a mechanism through which the indexing software is able to recognize identifiable language properties in a document and consequently indicate the language(s) of this document. These properties include language tags, character encoding, and other language-identifying characteristics in the document. The development of language-recognition mechanisms on the Web is still in its infancy, and the successful identification of the language of documents by search engines can at best be described as a procedure fraught with inconsistencies and errors. The degree of success varies between one engine and another, and a better understanding of how the engines handle the identification process and how their results compare to each other will facilitate more successful implementation of language-based searching, and identify the strengths and weaknesses of the existing engines. This poster theorizes that hasty implementations of language-recognition features by some search engines have led to serious problems with the accuracy of search results, while informed acknowledgement of the limitations of these features contributed to an acceptable level of success by others.
A random sample of English-language search queries submitted by users to one Web search engine was gathered in early 2002. Unique search terms were extracted from these queries and entered as individual searches in two major search engines that allow users to search for Arabic documents. The searches were limited to documents in Arabic, as provided by the two engines. The results of the searches were analyzed to ascertain the degree to which each engine succeeded in limiting the retrieved documents to those containing Arabic text. Also, each retrieved document that did not contain Arabic text (a false hit) will be analyzed to indicate the reason why it was retrieved by the engine(s). Preliminary analyses of the search results showed that there was a big difference between the numbers of correctly identified documents that were retrieved by each of the two engines, and that both engines produced false hits. Some false hits that have been analyzed so far resulted from the search engine's confusing the country of origin of a retrieved document with the language of this document. Others were the result of the misidentification of languages other than Arabic as Arabic, or of incorrect use...
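The evaluation step described in this abstract, checking whether each retrieved document actually contains Arabic text and counting the false hits, can be sketched as follows. This is a rough illustration under stated assumptions, not the procedure used in the poster: the function names and the 0.5 threshold are hypothetical, and the check only tests for letters in the basic Unicode Arabic block (U+0600 to U+06FF). Because that script is shared with languages such as Persian and Urdu, a script-based check like this one cannot by itself tell Arabic apart from them, which is one of the misidentification problems the abstract mentions.

```python
def arabic_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the basic Arabic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)


def contains_arabic_text(text: str, threshold: float = 0.5) -> bool:
    """Treat a document as Arabic-script text if enough of its letters are Arabic.

    The default threshold is an arbitrary assumption for this sketch.
    """
    return arabic_ratio(text) >= threshold


def false_hit_rate(documents: list[str]) -> float:
    """Fraction of retrieved documents that do not pass the Arabic-text check."""
    if not documents:
        return 0.0
    false_hits = sum(1 for doc in documents if not contains_arabic_text(doc))
    return false_hits / len(documents)


if __name__ == "__main__":
    # Hypothetical result set: one Arabic-script document, one English one.
    retrieved = ["هذا كتاب جديد", "This page is in English only"]
    print(false_hit_rate(retrieved))
```

Comparing this rate across engines for the same set of queries gives a simple measure of how well each engine's language restriction works.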