Extracting accurate and complete results from search engines: Case study Windows Live

Thelwall, Mike

doi:10.1002/asi.20704

Cited by 58 publications

(36 citation statements)

References 25 publications

“…These represent the three current major search engine families, and although combining their results would probably cover less than half of the Web (perhaps under 16% each), it still gives a significant amount of data. A main difficulty at this stage derives from the fact that for a given query, a search engine returns a maximum of 1,000 matches and automatically filters its results to avoid apparently redundant pages (Thelwall, 2008). The redundancy problem is particularly acute when searching for copies of a single joke because these copies are, by definition, large chunks of similar or identical text.…”

Section: Gathering URLs and Assessing the Web Presence of the Meme (mentioning)

confidence: 99%

Assessing global diffusion with Web memetics: The spread and evolution of a popular joke

Shifman

Thelwall

2009

J. Am. Soc. Inf. Sci.

Self Cite

Memes are small units of culture, analogous to genes, which flow from person to person by copying or imitation. More than any previous medium, the Internet has the technical capabilities for global meme diffusion. Yet, to spread globally, memes need to negotiate their way through cultural and linguistic borders. This article introduces a new broad method, Web memetics, comprising extensive Web searches and combined quantitative and qualitative analyses, to identify and assess: (a) the different versions of a meme, (b) its evolution online, and (c) its Web presence and translation into common Internet languages. This method is demonstrated through one extensively circulated joke about men, women, and computers. The results show that the joke has mutated into several different versions and is widely translated, and that translations incorporate small, local adaptations while retaining the English versions' fundamental components. In conclusion, Web memetics has demonstrated its ability to identify and track the evolution and spread of memes online, with interesting results, albeit for only one case study.

“…Using this software, we were also able to automatically split the queries whose results exceeded the maximum of 1,000 hits permitted when using Yahoo! for this purpose (Thelwall, 2008b). In this way, we obtained up to 19,619 inlinks for each query.…”

Section: Methods (mentioning)

confidence: 99%

Mapping the network structure of science parks

Minguillo

Thelwall

2012

Aslib Proceedings

Self Cite

Purpose: This study introduces a method based on link analysis to investigate the structure of the R&D support infrastructure associated with science parks in order to determine whether this webometric approach gives plausible results. Design/methodology/approach: Three science parks from Yorkshire and the Humber in the UK were analysed with webometric and social network analysis techniques. Interlinking networks were generated through the combination of two different data sets extracted from three sources (Yahoo!, Bing, SocSciBot). Findings: These networks suggest that institutional sectors, representing business, universities and public bodies, are primarily tied together by a core formed by research institutions, support structure organisations and business developers. The comparison of the findings with traditional indicators suggests that the web-based networks reflect the offline conditions and policy measures adopted in the region, giving some evidence that the webometric approach is plausible for investigating science park networks. Originality/value: This is the first study that applies a web-based approach to investigate to what extent the science parks facilitate a closer interaction between the heterogeneous organisations that converge in R&D networks. This indicates that link analysis may help to get a first insight into the organisation of the R&D support infrastructure provided by science parks.

“…Instead of hit count estimates, complete lists of matching URLs can be obtained. This is an improvement because the hit count estimates can be unreliable (Thelwall, 2008; Uyar, 2009) and because additional information can be extracted from URL lists, as described below. Full URL lists can be extracted from the results pages manually, but this can be time-consuming, so the use of software like Webometric Analyst (http://lexiurl.wlv.ac.uk) is recommended to automate this, although this program is only able to use Bing.…”

Section: Lists and Counts of Web Pages Citing the Documents (mentioning)

confidence: 99%

“…A problem arises if there are more than 1,000 results because search engines never return any results after the 1,000th. The "query splitting" technique has been designed to resolve this issue by automatically constructing new queries to retrieve additional results (Thelwall, 2008). This is available in Webometric Analyst.…”

Section: Lists and Counts of Web Pages Citing the Documents (mentioning)

confidence: 99%
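
The query splitting idea quoted above can be illustrated with a minimal sketch. It assumes one common way of splitting: when a query hits the result cap, issue two narrower queries, one that adds an extra term and one that excludes it, then merge the URL sets. The quoted statements do not describe the exact procedure, so the splitting rule, the `split_query` and `make_toy_search` names, the split terms, and the in-memory toy search engine are all hypothetical and are not the Webometric Analyst implementation.

```python
from typing import Callable, Dict, List, Set

MAX_RESULTS = 1000  # per-query cap mentioned in the citation statements


def split_query(query: str,
                search: Callable[[str], List[str]],
                split_terms: List[str],
                cap: int = MAX_RESULTS,
                depth: int = 0) -> Set[str]:
    """Collect URLs for `query`, splitting it whenever the cap is reached.

    Assumption: a capped query is split into "query term" and "query -term",
    which together match the same pages as the original query.
    """
    results = search(query)
    urls = set(results)
    # Stop when the result list is below the cap or no split terms remain.
    if len(results) < cap or depth >= len(split_terms):
        return urls
    term = split_terms[depth]
    urls |= split_query(f"{query} {term}", search, split_terms, cap, depth + 1)
    urls |= split_query(f"{query} -{term}", search, split_terms, cap, depth + 1)
    return urls


def make_toy_search(docs: Dict[str, Set[str]], cap: int) -> Callable[[str], List[str]]:
    """Hypothetical stand-in for a search engine that truncates at `cap` hits."""
    def search(query: str) -> List[str]:
        tokens = query.split()
        required = [t for t in tokens if not t.startswith("-")]
        excluded = [t[1:] for t in tokens if t.startswith("-")]
        hits = [url for url, words in sorted(docs.items())
                if all(t in words for t in required)
                and not any(t in words for t in excluded)]
        return hits[:cap]  # results past the cap are never returned
    return search


# Toy corpus: ten pages that all contain "joke"; some also contain other words.
docs = {
    f"u{i}": {"joke"}
    | ({"computer"} if i % 2 == 0 else set())
    | ({"email"} if i % 3 == 0 else set())
    for i in range(10)
}
search = make_toy_search(docs, cap=4)
print(len(search("joke")))                                          # capped list
print(len(split_query("joke", search, ["computer", "email"], cap=4)))  # merged set
```

In this toy setup the plain query returns only the capped list, while the split queries together recover every matching page. With a real search engine the split terms would have to be chosen so that each sub-query falls under the cap, and automatic filtering of near-duplicate pages could still hide some results.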

Substance without citation: evaluating the online impact of grey literature

2013

Self Cite

Individuals and organisations producing information or knowledge for others sometimes need to be able to provide evidence of the value of their work in the same way that scientists may use journal Impact Factors and citations to indicate the value of their papers. There are many cases, however, when organisations are charged with producing reports but have no real way of measuring their impact, including when they are distributed free, do not attract academic citations and their sales cannot be tracked. Here, the Web Impact Report (WIRe) is proposed as a novel solution for this problem. A WIRe consists of a range of web-derived statistics about the frequency and geographic location of online mentions of an organisation's reports. WIRe data is typically derived from commercial search engines. This article defines the component parts of a WIRe and describes how to collect and analyse the necessary data. The process is illustrated with a comparison of the web impact of the reports of a large UK organisation. Although a formal evaluation was not conducted, the results suggest that WIRes can indicate different levels of web impact between reports and can reveal the type of online impact that the reports have.
