2022
DOI: 10.12694/scpe.v23i1.1957

Exploring Usability of Reddit in Data Science and Knowledge Processing

Abstract: This contribution argues that Reddit, as a massive, categorized, open-access dataset, is a useful data source on "almost any topic" for data science and knowledge exploration. The claim is backed up by an analysis based on 180 manually annotated papers related to Reddit itself and on data acquired from top databases of scientific papers. Finally, an open-source tool is introduced that provides easy access to Reddit resources, together with an exploratory data analysis of how Reddit cover…

Cited by 5 publications (3 citation statements)
References 37 publications

“…Reddit (https://Reddit.com) has approximately 57 million daily active users until 2023, one of the largest social media outlets in terms of users [11]. Reddit users can share text, links, images, or videos in various sub-communities (called subreddits and dedicated to specific topics) [48]. Everyone has access to the public subforum (called subreddits on Reddit), and users can comment and vote on posts and comments for free and anonymously.…”
Section: Methods (mentioning, confidence: 99%)

“…Two recent overviews of Reddit-related research [14,15] suggested that natural language processing and graph networks are the most popular techniques used to analyze various aspects of Reddit-derived datasets. The following sections discuss the state of the art of the NLP methods, graph networks, and performance evaluation found in Reddit-related work.…”
Section: Pertinent State Of the Art Of Reddit-related Research (mentioning, confidence: 99%)

“…Reddit consists of over 3.5 million communities (and over 1.5 billion monthly visitors). The most popular Reddit data source is the Pushshift database [13], [14]. The subreddit data was extracted from Pushshift subreddit dumps.…”
Section: A Dataset (mentioning, confidence: 99%)
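The last statement mentions extracting subreddit data from Pushshift dumps. As a rough illustration only, the sketch below assumes the dump has already been decompressed into newline-delimited JSON (one Reddit object per line; the published dumps are usually zstd-compressed, and decompression is not shown) and that each object carries a `subreddit` field, as Reddit's own JSON does. The helper names `iter_records`, `filter_subreddit`, and `count_by_subreddit` and the sample lines are hypothetical and are not part of the tool described in the abstract.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one JSON object per non-empty line (newline-delimited JSON)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            # Skip malformed lines rather than aborting the whole scan.
            continue
        if isinstance(obj, dict):  # ignore bare numbers, lists, etc.
            yield obj


def filter_subreddit(records: Iterable[dict], subreddit: str) -> Iterator[dict]:
    """Keep records whose 'subreddit' field matches, ignoring case."""
    target = subreddit.lower()
    for rec in records:
        if str(rec.get("subreddit", "")).lower() == target:
            yield rec


def count_by_subreddit(records: Iterable[dict]) -> Counter:
    """Count records per (lower-cased) subreddit, a first exploratory step."""
    return Counter(str(rec.get("subreddit", "")).lower() for rec in records)


if __name__ == "__main__":
    # Hypothetical sample lines standing in for a decompressed dump.
    sample = [
        '{"subreddit": "datascience", "title": "Pandas tips", "score": 12}',
        '{"subreddit": "askscience", "title": "Why is the sky blue?", "score": 40}',
        '{"subreddit": "DataScience", "title": "Career advice", "score": 3}',
        "",
        "not json",
    ]
    records = list(iter_records(sample))
    print(count_by_subreddit(records))
    print([r["title"] for r in filter_subreddit(records, "datascience")])
```

Because the functions are generators over an iterable of lines, the same code could be fed a streaming file reader so that a large dump never has to fit in memory.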