Combination of levenshtein distance and rabin-karp to improve the accuracy of document equivalence level

Siahaan, Andysah Putera Utama; Aryza, Solly; Hariyanto, Eko; Rusiadi; Lubis, Andre Hasudungan; Ikhwan, Ali; Kan, Phak Len Eh

doi:10.14419/ijet.v7i2.27.12084

Cited by 25 publications

(4 citation statements)

References 18 publications


“…where $m$ is the number of similar characters, $s_1$ is the length of string-1, $s_2$ is the length of string-2, and $t$ is the number of transpositions. The Rabin-Karp algorithm utilizes the hash method for performing multiple searches [31]. The steps involved in Rabin-Karp include: i) removing punctuation marks from the document and converting the text to lowercase for the search; ii) dividing the texts into grams with a predefined k-gram value; iii) calculating the hash value using the rolling hash function for each gram, following the formula: $h = c_1 b^{k-1} + c_2 b^{k-2} + \dots + c_{k-1} b + c_k$; iv) identifying matching hash values between two texts; and v) determining the similarity between two pieces of text using Dice's similarity coefficient equation.…”

Section: Jaro Winkler Distance Versus Rabin-Karp (mentioning)

confidence: 99%

Generate fuzzy string-matching to build self attention on Indonesian medical-chatbot

Suwarningsih,

Nuryani

2024

IJECE


A chatbot is a form of interactive conversation that requires quick and precise answers. The process of identifying answers to users’ questions involves string matching and handling incorrect spelling. Therefore, a system that can independently predict and correct letters is highly necessary. The approach used to address this issue is to enhance the fuzzy string-matching method by incorporating several features for self-attention. The combinations of fuzzy string-matching methods employed are Jaro-Winkler distance + Damerau-Levenshtein distance and Damerau-Levenshtein distance + Rabin-Karp. These combinations are used because they can not only match strings but also correct typing errors in words. This research contributes by developing a self-attention mechanism through a modified fuzzy string-matching model with enhanced word feature structures. The goal is to utilize this self-attention mechanism in constructing the Indonesian medical bidirectional encoder representations from transformers (IM-BERT). This will serve as a foundation for additional features to provide accurate answers in the Indonesian medical question and answer system, achieving an exact match of 85.7% and an F1-score of 87.6%.
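The five Rabin-Karp steps listed in the citation statement for this entry can be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: the function names, the base `256`, the modulus `1_000_000_007`, the removal of whitespace during preprocessing, and the use of sets of hashes (rather than counts) in the Dice coefficient are all assumptions made for the example.

```python
import string


def preprocess(text: str) -> str:
    # Step i: remove punctuation and convert to lowercase.
    # Dropping whitespace as well is an assumption of this sketch.
    table = str.maketrans("", "", string.punctuation)
    return "".join(text.lower().translate(table).split())


def kgram_hashes(text: str, k: int, base: int = 256,
                 mod: int = 1_000_000_007) -> list[int]:
    # Steps ii and iii: slide a window of k characters over the text and
    # compute h = c1*b^(k-1) + c2*b^(k-2) + ... + ck for each window.
    # The modulus is an added assumption to keep the values bounded.
    n = len(text)
    if k <= 0 or n < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the leading character
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, n):
        # Rolling update: drop the leading character, shift, append the new one.
        h = ((h - ord(text[i - k]) * high) * base + ord(text[i])) % mod
        hashes.append(h)
    return hashes


def dice_similarity(a: str, b: str, k: int = 3) -> float:
    # Step iv: find hash values shared by both texts.
    # Step v: Dice's coefficient, 2 * |A & B| / (|A| + |B|), over hash sets.
    ha = set(kgram_hashes(preprocess(a), k))
    hb = set(kgram_hashes(preprocess(b), k))
    if not ha and not hb:
        return 0.0
    return 2 * len(ha & hb) / (len(ha) + len(hb))
```

For example, `dice_similarity("Hello, World!", "hello world")` compares the two strings after both normalize to `helloworld`. Because matching is done on hash values alone, a hash collision could overstate the similarity slightly; a production version might verify the underlying grams.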



“…A hashing-based string-matching algorithm known as Rabin-Karp (RK) was developed in 1987 [34]. This algorithm uses the hashing approach to identify patterns within a text [35]. Lecroq introduced the Hash-q algorithm, which calculates a hash value between 0 and 255 for each q-gram in the pattern p [36,37].…”

Section: Literature Review (mentioning)

confidence: 99%

An Improved Hashing Approach for Biological Sequence to Solve Exact Pattern Matching Problems

Mahmud,

Rahman,

Hasan Talukder

2023

Applied Computational Intelligence and Soft Computing


Pattern matching algorithms have gained a lot of importance in computer science, primarily because they are used in various domains such as computational biology, video retrieval, intrusion detection systems, and fraud detection. Finding one or more patterns in a given text is known as pattern matching. Two important measures of how well exact pattern matching algorithms work are the total number of attempts and the number of character comparisons made during the matching process. The primary focus of our proposed method is reducing both wherever possible. Although fast, hash-based pattern matching algorithms may suffer from hash collisions. The Efficient Hashing Method (EHM) algorithm is improved in this research. Despite the EHM algorithm’s effectiveness, it takes a lot of time in the preprocessing phase, and some hash collisions are generated. A novel hashing method has been proposed, which has reduced the preprocessing time and hash collisions of the EHM algorithm. We devised the Hashing Approach for Pattern Matching (HAPM) algorithm by taking the best parts of the EHM and Quick Search (QS) algorithms and adding a way to avoid hash collisions. The preprocessing step of this algorithm combines the bad character table from the QS algorithm, the hashing strategy from the EHM algorithm, and the collision-reducing mechanism. To analyze the performance of our HAPM algorithm, we have used three types of datasets: E. coli, DNA sequences, and protein sequences. We compared our proposed method with six algorithms discussed in the literature. The Hash-q with Unique FNG (HqUF) algorithm was only compared on the E. coli and DNA datasets because it creates unique bits for DNA sequences. Our proposed HAPM algorithm also overcomes the problems of the HqUF algorithm. The new method beats older ones regarding average runtime, number of attempts, and character comparisons for long and short text patterns, though it did worse on some short patterns.


“…The Rabin-Karp algorithm is used for string matching and has advantages in the simple string matching process. This algorithm uses hashing to find a collection of string patterns in a text [24]. In this research, the Rabin-Karp algorithm is used to guarantee data consistency in the blockchain process.…”

Section: Rabin Karp Algorithm (mentioning)

confidence: 99%

Enhanced PBFT Blockchain based on a Combination of Ripple and PBFT (R-PBFT) to Cryptospatial Coordinate

Wibowo,

Hariadi

Suhartono

et al. 2022

regist. j. ilm. teknol. sist. inf.


In this research, we introduce the combination of two Blockchain methods. Ripple Protocol Consensus Algorithm (RPCA) and Practical Byzantine Fault Tolerance (PBFT) are applied to cryptospatial coordinates to support cultural heritage tourism. The PBFT process is still used until the preparation process to ensure a maximum error of 33%, and every node would add a new chain in all nodes, so PBFT has a slower processing speed than other methods. This research cuts the PBFT process. After the preparation process in PBFT, the data were entered into the RPCA node and calculated using an equation to minimize errors with a maximum limit of 20%. After this process, the data were sent to the commit process to be stored in all connected nodes in the Blockchain network; we call this combination of two methods R-PBFT. Combining the two methods can enhance data processing security and speed because it still uses the PBFT work combined with the speed of RPCA. Furthermore, this method uses a fault tolerance value from the RPCA of 20% to enhance data processing security and speed.
