Preliminary analysis showed that users did not stick to the intended forum behavior of discussing exactly one topic per thread. Instead, they drifted away from the original topic over time, sometimes returning to it later. In short, any topic could appear in any thread. Thus, we needed a way to decide for every individual post whether it was relevant or not. The simplest approach is to scan every post for a set of predefined keywords. However, we assumed that the context of a post also plays a role in determining its relevance. We therefore defined an Information Retrieval algorithm that extends the keyword-based approach by also taking structural (contextual) information about posts into account. (The following is mostly taken from [5].)
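For illustration only, the keyword-only baseline mentioned above could be sketched as follows. This sketch is not taken from [5]; the keyword set, the example posts, and the names `KEYWORDS`, `tokenize`, and `is_relevant` are hypothetical and chosen purely for demonstration.

```python
import re

# Hypothetical keyword set; the actual keywords are not given in the text.
KEYWORDS = {"battery", "charger", "warranty"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_relevant(post, keywords=KEYWORDS):
    """Return True if the post contains at least one keyword as a whole word."""
    return any(token in keywords for token in tokenize(post))


# Made-up posts from a single thread, in posting order.
posts = [
    "My battery drains within two hours.",
    "Anyone watching the game tonight?",
    "Same here, mine too.",
]

# One relevance flag per post, based on keywords alone.
flags = [is_relevant(p) for p in posts]
```

In this made-up thread, the third post may well be a reply to the first and therefore on topic, yet it contains no keyword and is flagged as not relevant. This is the kind of case in which the context of a post would be needed to judge its relevance.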
Reasoning behind the new Algorithm
Approaches based on linguistic features alone cannot be expected to perform well. This is because: