Proceedings of the 5th Annual on Lifelog Search Challenge 2022
DOI: 10.1145/3512729.3533006

Memento 2.0: An Improved Lifelog Search Engine for LSC'22

Abstract: In this paper, we present Memento 2.0, an improved version of our system which first participated in the Lifelog Search Challenge 2021. Memento 2.0 employs image-text embeddings derived from two CLIP models (ViT-L/14 and ResNet-50x64) and adopts a weighted ensemble approach to derive a combined final ranking. Our approach significantly improves the performance over the baseline LSC'21 system. We additionally make important updates to the system's user interface after analysing the shortcomings to make it more …
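
The weighted ensemble mentioned in the abstract can be illustrated with a minimal sketch. The truncated abstract does not say how the two models' scores are combined, so everything below is an assumption for illustration: the function names, the model keys `vit_l14` and `rn50x64`, the use of cosine similarity, the 0.6/0.4 weights, and the toy vectors are all hypothetical and are not taken from Memento 2.0.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def ensemble_rank(query_embs, image_embs, weights):
    """Rank image ids by a weighted sum of per-model cosine similarities.

    query_embs: {model_name: query_vector}
    image_embs: {model_name: {image_id: image_vector}}
    weights:    {model_name: weight}  (hypothetical values, not from the paper)
    """
    scores = {}
    for model, weight in weights.items():
        query = query_embs[model]
        for image_id, vec in image_embs[model].items():
            scores[image_id] = scores.get(image_id, 0.0) + weight * cosine(query, vec)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)

# Toy embeddings; each model has its own embedding dimension.
query_embs = {"vit_l14": [1.0, 0.0], "rn50x64": [1.0, 0.0, 0.0]}
image_embs = {
    "vit_l14": {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]},
    "rn50x64": {"a": [0.0, 1.0, 0.0], "b": [1.0, 0.0, 0.0], "c": [1.0, 1.0, 0.0]},
}
weights = {"vit_l14": 0.6, "rn50x64": 0.4}

ranking = ensemble_rank(query_embs, image_embs, weights)
print(ranking)
```

In this toy case image `c` scores moderately under both models and therefore outranks `a` and `b`, which each match only one model well. In a real system the vectors would come from the text and image encoders of each CLIP model, and the weights would have to be tuned.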

Cited by 16 publications (7 citation statements)
References 26 publications
“…The most common approach to organizing, retrieving, and analyzing data from wearable cameras involves assigning semantic contexts to images, like visual descriptions, time, and location [ 13 , 14 ]. Various computer vision models are employed to extract visual information from the images, including object detection, activity recognition, optical character recognition [ 13 , 15 ], and embedding models [ 16 , 17 ]. A typical retrieval system would also incorporate different techniques, namely, query enhancement [ 13 ], visual similarity search [ 16 ], and temporal search [ 16 ].…”
Section: Discussion (mentioning)
confidence: 99%
“…This involves assigning semantic contexts like visual descriptions, time, and location [ 13 , 14 ]. Various computer vision models are employed, such as object detection, activity recognition, and optical character recognition, in addition to embedding models [ 13 , 15 - 17 ]. Retrieval systems incorporate techniques such as query enhancement, visual similarity search, and temporal search [ 13 , 16 ].…”
Section: Introduction (mentioning)
confidence: 99%
“…Furthermore, this search engine could only capture the objects but not other information, such as the interaction between objects, the visible text in images, or the context of the images. Recently, many current lifelog retrieval systems [31,2,23] used CLIP to deal with those problems and obtained the top performance at LSC '22 [12]. This was because the CLIP model learns not only the general information of images but also detailed information, such as the visible text in them.…”
Section: Related Work (mentioning)
confidence: 99%
“…CLIP in Lifelog Retrieval. Recent works [2,27] experimented to show the performance of different versions of CLIP in the lifelog retrieval task. However, unlike their experiments, we compared the concept-based model with many SOTA crossmodality retrieval models, including CLIP, BLIP, and HADA, in the lifelog retrieval task with two configurations: automatic manner and interactive manner.…”
Section: Related Work (mentioning)
confidence: 99%
“…It also enhances its user interface to accommodate the new features while maintaining simplicity. Memento 2.0 [2] utilised a weighted ensemble approach to CLIP integration, which significantly improved the performance over the LSC'21 system and it also introduced a number of updates to the UI to enhance user efficiency.…”
Section: Participating Systems (mentioning)
confidence: 99%