Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation 2007
DOI: 10.1145/1276958.1277279

Evolving Lucene search queries for text classification

Abstract: We describe a method for generating accurate, compact, human understandable text classifiers. Text datasets are indexed using Apache Lucene and Genetic Programs are used to construct Lucene search queries. Genetic programs acquire fitness by producing queries that are effective binary classifiers for a particular category when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from classification tasks.
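The fitness idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the query operators (`Term`, `And`, `Or`), the use of F1 as the fitness score, and the in-memory word-set matching that stands in for a Lucene index are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical query tree nodes; the paper's actual function and
# terminal set is not given in the abstract.
@dataclass(frozen=True)
class Term:
    word: str


@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"


Query = Union[Term, And, Or]


def to_query_string(query: Query) -> str:
    """Render the tree in a Lucene-style boolean query syntax."""
    if isinstance(query, Term):
        return query.word
    op = "AND" if isinstance(query, And) else "OR"
    return f"({to_query_string(query.left)} {op} {to_query_string(query.right)})"


def matches(query: Query, words: set) -> bool:
    """Evaluate the query against one document's word set (index stand-in)."""
    if isinstance(query, Term):
        return query.word in words
    if isinstance(query, And):
        return matches(query.left, words) and matches(query.right, words)
    return matches(query.left, words) or matches(query.right, words)


def f1_fitness(query: Query, docs: list) -> float:
    """Score a query as a binary classifier; docs are (text, in_category) pairs."""
    tp = fp = fn = 0
    for text, in_category in docs:
        hit = matches(query, set(text.lower().split()))
        if hit and in_category:
            tp += 1
        elif hit:
            fp += 1
        elif in_category:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy training set, invented for this example.
TRAINING = [
    ("crude oil prices rise", True),
    ("oil exports fall", True),
    ("wheat harvest improves", False),
    ("olive oil recipe", False),
]
```

In a genetic programming loop, a score of this kind would rank candidate query trees, and the best tree could be printed with `to_query_string` so that a person can read the resulting classifier.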

Cited by 13 publications (5 citation statements)
References 19 publications
“…Luo (Luo and Zincir-Heywood 2006) describes a system where recurrent linear GP is used to classify documents that are encoded as word sequences. Genetic methods have also been used to induce rules or queries useful for classifying online text (Smith and Smith 1997; Hirsch et al 2007; Pietramala et al 2008). In this case, the evolution requires a fitness test based on some measure of classification accuracy.…”
Section: Genetic Methods In Text Classification (mentioning)
confidence: 99%
“…To overcome this irregularity, automatic categorization is performed from the master vocabulary using information retrieval techniques. Apache Solr acts as the information retrieval engine: https://lucene.apache.org/solr While some proposals for classifying job offers rely on machine learning based on support vector machines (Javed et al, 2015), we prefer to apply classification algorithms that automatically construct queries to be executed in Apache Solr (Hirsch et al, 2007; Sood et al, 2007; Cai et al, 2016). A program written in Python is developed to preprocess the text of the advertisements, to interact with Apache Solr while the corresponding queries are executed, and to build the datasets from the results.…”
Section: Processing of the Collected Information (unclassified)
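The citing passage above mentions a Python program that sends constructed queries to Apache Solr. A minimal sketch of that interaction might look like the following. The core name `job_ads`, the field name `text`, and the local base URL are hypothetical, and the citing authors' actual program is not shown in the source; only Solr's standard `select` handler with its `q`, `rows`, and `wt` parameters and the `numFound` field of its JSON response are relied on.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's standard select handler, asking for JSON."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


def num_found(payload: dict) -> int:
    """Read the hit count from a decoded Solr JSON response."""
    return payload["response"]["numFound"]


# Hypothetical category query over a hypothetical "text" field.
url = build_select_url(
    "http://localhost:8983/solr",
    "job_ads",
    "text:(engineer AND software)",
)
```

The URL could then be fetched with any HTTP client and the decoded JSON passed to `num_found`; keeping URL construction separate from the network call makes the query logic easy to check without a running Solr instance.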
“…Lucene uses a common data structure to accept index input, so it can be flexibly adapted to a variety of data sources, such as databases, office documents, PDF documents, and HTML documents; when indexing data, it needs only an appropriate parser to convert the data source into the corresponding data structure. Although Lucene has powerful search and indexing capabilities, it is not a complete search engine: it cannot collect information from Internet pages, and its sorting of results has yet to be perfected [8]. The sorting of search results is very important for a search engine, because users usually pay attention only to the first page of results returned; placing the pages most valuable to users at the top is therefore an important topic in search engine research.…”
Section: Technical Analysis Of Lucene (mentioning)
confidence: 99%