Near-duplicate Question Detection
Suggesting relevant questions to users is an important task in various applications, such as community Q&A or e-commerce websites.
To avoid redundancy in the selected set of candidate questions,
near-duplicate questions must be filtered out. With the adoption of
Large Language Models (LLMs), identifying near-duplicate questions
has a second use case: fetching
pre-computed answers for similar questions. However, identifying
the similarity of questions is more complex than identifying that
of generic text, because questions entail open-ended information that is
not explicitly contained in their wording.
We introduce a taxonomy that captures the subtleties
of near-duplicate questions and propose an LLM-based method
for detecting them.