2017
DOI: 10.1111/weng.12281

ICE vs GloWbE: Big data and corpus compilation

Abstract: ICE and GloWbE are the two main corpora available for the study of world Englishes, and as such they share several features. However, they also differ in a number of respects owing to the different techniques adopted during compilation: while ICE is a traditional corpus with 32 text types represented, GloWbE only contains material from the Internet. This article compares the two corpora from a number of perspectives: (i) frequency and collocation of modal verbs of necessity, (ii) degree of orality and informal…

Cited by 28 publications (23 citation statements)
References 37 publications
“…However, there are also a number of problems associated with this type of big data corpus, as noted by Nelson (), Mukherjee (), and Mair (). We therefore concur with Loureiro‐Porto () that there is a need for carefully compiled traditional‐type corpora of interactive online registers either as independent datasets or as extensions of existing corpora like ICE, which, in addition to blogs, also include other registers like those studied in this research paper. Not only do the two types of data sources complement each other in the feature‐based kind of research that both in principle allow, but carefully compiled multi‐register corpora are particularly essential to address broader questions about how register and regional variation in English are evolving in our current technological age.…”
Section: Results (supporting)
confidence: 83%
“…Perhaps the most major change of contents for second generation corpora is the recommendation for the inclusion of a new component of electronic texts, totalling up to 500,000 words, making the total maximum size of a second generation corpus 1.5 million words. Indeed, from her thorough comprehensive comparison between ICE and the GloWbE corpus of blogs and websites (Davies, ), Loureiro‐Porto (, p. 468) concludes that ‘the future of ICE should include web registers alongside the text types included so far’; from other comparisons, similar conclusions are drawn by Mair (), Mukherjee (), Nelson () and Peters (), in their responses to Davies and Fuchs (). A list of possible electronic texts is presented in Appendix 3 for discussion.…”
Section: Second Generation Corpora (mentioning)
confidence: 79%
“…The project, with currently 27 component corpora, has contributed immeasurably to research and to the advancement of knowledge on world Englishes. As Loureiro‐Porto (, p. 448) rightly remarks: ‘the validity of ICE is wholly unquestioned’. The need for ongoing continuity, effective coordination and directive leadership has never been greater.…”
Section: Results (mentioning)
confidence: 99%
“…In spite of all these caveats, ICE continues to be the only project that provides representative corpora of varieties of English, although excluding online texts (and important differences have been found between those two ways of compiling corpora, see Loureiro-Porto, 2017). In addition to the 12 corpora released thus far (India, New Zealand, Singapore, Australia, Canada, Great Britain, East Africa, Hong Kong, Ireland, Jamaica, Nigeria and Philippines), the written components of three varieties are also available (Ghana, Sri-Lanka and USA), and 12 international teams are working on the compilation of new members of the ICE family: Bahamas, Fiji, Gibraltar, Malaysia, Malta, Namibia, Pakistan, Puerto Rico, Scotland, South Africa, Trinidad & Tobago and Uganda.…”
Section: The ICE Project (mentioning)
confidence: 99%