Social media data now enriches and supplements information flow in various sectors of society. The question addressed here is whether social media can act as a credible information source of sufficient quality to meet the needs of transport planners, operators, policy makers and the travelling public. A typology of primary transport data needs and of current and new data sources is first established, after which this study focuses on social media textual data in particular. Three sub-questions are investigated: the potential to use social media data alongside existing transport data, the technical challenges in extracting transport-relevant information from social media and the wider barriers to the uptake of this data. Following an overview of the text mining process to extract relevant information from the corpus, a review of the challenges this approach holds for the transport sector is given. These include ontologies, sentiment analysis, location names and measuring accuracy. Finally, institutional issues in the greater use of social media are highlighted, and the study concludes that social media information has not yet been fully explored. The contribution of this study is in scoping the technical challenges in mining social media data within the transport context, laying the foundation for further research in this field.
Harnessing the potential of new-generation transport data and increasing public participation are high on the agenda for transport stakeholders and the broader community. The initial phase in the program of research reported here proposed a framework for mining transport-related information from social media, and demonstrated and evaluated it using transport-related tweets associated with three football matches as case studies. The goal of this paper is to extend and complement the previously published studies. It reports an extended analysis of the research results, highlighting and elaborating the challenges that need to be addressed before a large-scale application of the framework can take place. The focus is specifically on the automatic harvesting of relevant, valuable information from Twitter. The results from automatically mining transport-related messages are presented for two scenarios: a small-scale labelled dataset and a large-scale dataset of 3.7 million tweets. Tweets authored by individuals that mention a need for transport, express an opinion about transport services or report an event, with respect to different transport modes, were mined. The challenges faced in automatically analysing Twitter messages, written in Twitter's specific language, are illustrated. The results presented show a strong degree of success in the identification of transport-related tweets, with similar success in identifying tweets that expressed an opinion about transport services. The identification of tweets that expressed a need for transport services or reported an event was more challenging, a finding mirrored during the human-based message annotation process. Overall, the results demonstrate the potential of automatic extraction of valuable information from tweets while pointing to areas where challenges were encountered and additional research is needed.
The impact of a successful solution to these challenges (thereby creating efficient harvesting systems) would be to enable travellers to participate more effectively in the improvement of transport service
Rapid and recent developments in social media networks are providing a vision amongst transport suppliers, governments and academia of ‘next-generation’ information channels. This chapter identifies the main requirements for a social media information harvesting methodology in the transport context and highlights the challenges involved. Three questions are addressed, concerning (1) the ways in which social media data can be used alongside, or potentially instead of, current transport data sources; (2) the technical challenges in text mining social media that create difficulties in generating high-quality data for the transport sector; and (3) whether there are wider institutional barriers in harnessing the potential of social media data for the transport sector. The chapter demonstrates that information harvested from social media can complement, enrich (or even replace) traditional data collection. Whilst further research is needed to develop automatic or semi-automatic methodologies for harvesting and analysing transport-related social media information, new skills are also needed in the sector to maximise the benefits of this new information source.