Skip to content

iRangeGraph: A Dynamic Approach for Enhancing Range-Filtering Nearest Neighbor Search Performance Through Efficient Graph Construction and Reduced Memory Footprint in Large-Scale Data Systems Aswin Ak Artificial Intelligence Category – MarkTechPost

  • by

​[[{“value”:”

Graph-based methods have become increasingly important in data retrieval and machine learning, particularly in nearest neighbor (NN) search. NN search helps identify data points closest to a given query, which becomes critical with high-dimensional data such as text, images, or audio. Approximate nearest neighbor (ANN) methods emerged due to the inefficiency of exact searches in high-dimensional spaces. ANN methods, especially graph-based approaches, balance response time and accuracy, making them widely used in real-world applications such as recommendation engines, e-commerce platforms, and AI-based search systems. These systems rely heavily on timely and accurate retrieval of relevant data from large datasets.

One major challenge in NN search arises when there is a need to combine vector-based search with additional numeric attribute constraints. For instance, a user on an e-commerce platform might want to find products similar to a specific item within a certain price range. Traditional ANN methods filter out irrelevant data before the search or search without considering constraints and filter afterward. Both approaches face performance issues. Pre-filtering can become inefficient for large datasets, while post-filtering may return many irrelevant results, wasting computational resources. The need for efficient search techniques incorporating vector similarity and numeric constraints has become increasingly important, especially in systems handling massive data volumes across various industries.

Existing approaches to range-filtering approximate nearest neighbor (RFANN) queries include pre-filtering and post-filtering, where numeric constraints are applied before or after an ANN search. Another method, in-filtering, tries to integrate these numeric constraints during the search, aiming only to visit data points within the query’s numeric range. However, these methods struggle to provide optimal performance across different query scenarios. For instance, pre-filtering becomes slow when the numeric constraint is not selective enough while post-filtering results in wasted effort when too many irrelevant data points are visited. The inherent shortcomings of these strategies have prompted researchers to explore alternative approaches, particularly for cases where query workloads vary in size and complexity.

Researchers from Nanyang Technological University and Aalborg University have introduced a new method called iRangeGraph to address the limitations of existing processes. Instead of precomputing graphs for every possible numeric range, iRangeGraph materializes elemental graphs for only a few ranges. These graphs can then be used to dynamically construct a dedicated graph for any query range during execution, reducing the need for large-scale index storage. The technique has garnered attention from industry players like Apple and Alibaba, which utilize similar methods for their large-scale search systems. iRangeGraph’s primary advantage is its ability to reduce memory consumption while maintaining high query performance, making it an attractive solution for companies with large datasets.

The iRangeGraph technique involves a dynamic construction of graph-based indexes during query processing. Instead of building and storing an index for every possible range, the method constructs these graphs as needed, leveraging pre-built elemental graphs that cover a moderate number of ranges. This approach conserves memory and ensures that the query response time remains efficient. iRangeGraph is particularly useful in scenarios where the numeric constraints applied to the search are neither highly selective nor unselective and where existing methods tend to perform poorly. iRangeGraph can handle multi-attribute RFANN queries, meaning that queries involving more than one numeric constraint can be processed efficiently. For example, a query might look for data points within a specific price and date range, and iRangeGraph can handle such scenarios effectively.

Performance testing of iRangeGraph was conducted on several real-world datasets, including WIT-Image, TripClick, Redcaps, and YouTube datasets. These datasets involved high-dimensional vector data and numeric attributes such as image size, publication date, and number of likes. The tests showed that iRangeGraph outperformed existing methods significantly. At 0.9 recall, iRangeGraph achieved 2x to 5x better query-per-second (qps) performance than its competitors. The memory footprint was consistently smaller, a key advantage when dealing with large-scale systems where storage is a critical concern. Compared to dedicated graph-based indexes materialized for every query range, iRangeGraph was slower by less than 2x while consuming much less memory. For multi-attribute RFANN queries, iRangeGraph demonstrated a performance improvement of 2x to 4x in qps compared to the most competitive baseline methods.

In conclusion, iRangeGraph presents a novel and efficient solution for range-filtering approximate nearest neighbor queries. By dynamically constructing graph indexes during query execution and using elemental graphs to reduce memory requirements, this method successfully addresses the shortcomings of existing RFANN techniques. iRangeGraph’s ability to deliver high performance across various query workloads while significantly reducing memory consumption makes it an ideal choice for large-scale data systems. The method’s flexibility in handling multi-attribute queries extends its applicability in real-world scenarios. The research findings highlight iRangeGraph’s potential to revolutionize range-filtering in nearest neighbor search, especially for systems managing high-dimensional data with numeric constraints.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter..

Don’t Forget to join our 50k+ ML SubReddit

FREE AI WEBINAR: ‘SAM 2 for Video: How to Fine-tune On Your Data’ (Wed, Sep 25, 4:00 AM – 4:45 AM EST)

The post iRangeGraph: A Dynamic Approach for Enhancing Range-Filtering Nearest Neighbor Search Performance Through Efficient Graph Construction and Reduced Memory Footprint in Large-Scale Data Systems appeared first on MarkTechPost.

“}]] [[{“value”:”Graph-based methods have become increasingly important in data retrieval and machine learning, particularly in nearest neighbor (NN) search. NN search helps identify data points closest to a given query, which becomes critical with high-dimensional data such as text, images, or audio. Approximate nearest neighbor (ANN) methods emerged due to the inefficiency of exact searches in
The post iRangeGraph: A Dynamic Approach for Enhancing Range-Filtering Nearest Neighbor Search Performance Through Efficient Graph Construction and Reduced Memory Footprint in Large-Scale Data Systems appeared first on MarkTechPost.”}]]  Read More AI Paper Summary, AI Shorts, Applications, Artificial Intelligence, Databases, Editors Pick, Staff, Tech News, Technology 

Leave a Reply

Your email address will not be published. Required fields are marked *