MDEV-35327: Add VEC_DISTANCE_MANHATTAN (L1) distance function#4654
MDEV-35327: Add VEC_DISTANCE_MANHATTAN (L1) distance function#4654TQSH-Dev wants to merge 1 commit intoMariaDB:mainfrom
Conversation
vuvova
left a comment
There was a problem hiding this comment.
please add tests. with and without a vector index
Thanks for the review,will do it by tommorow. |
sql/item_vectorfunc.cc
Outdated
| switch (kind) { | ||
| case EUCLIDEAN: calc_distance= calc_distance_euclidean; break; | ||
| case COSINE: calc_distance= calc_distance_cosine; break; | ||
| case MANHATTAN: calc_distance= calc_distance_manhattan;break; |
There was a problem hiding this comment.
nit: additional space before break
sql/item_vectorfunc.h
Outdated
|
|
||
| public: | ||
| enum distance_kind { EUCLIDEAN, COSINE, AUTO } kind; | ||
| enum distance_kind { EUCLIDEAN, COSINE, MANHATTAN,AUTO } kind; |
| }; | ||
|
|
||
|
|
||
| Create_func_vec_distance_manhattan Create_func_vec_distance_manhattan::s_singleton; |
There was a problem hiding this comment.
nit: 1 blank line before and 2 blank lines after for consistency.
gkodinov
left a comment
There was a problem hiding this comment.
This is a preliminary review: I have no new remarks aside from what others already did.
added |
|
The build failure in rpl.rpl_row_foreign_key_mdl appears to be a random timeout unrelated to my changes, I am unsure as to why this is happening. My added test main.mdev_35327 passed successfully. |
mysql-test/main/mdev_35327.test
Outdated
| --echo # | ||
| --echo # With Vector Index | ||
| --echo # | ||
| CREATE VECTOR INDEX idx ON t1(v); |
There was a problem hiding this comment.
This is an index created with the euclidean distance. It cannot be used for searches with manhattan distance.
You didn't implement support for manhattan distance at all. It's rather more complex, than adding a new Item class.
There was a problem hiding this comment.
Thanks for the feedback. You are absolutely correct,I dug into the EXPLAIN plans and realized that while the VEC_DISTANCE_MANHATTAN function returns the correct values, the optimizer is still performing a full table scan. I'll work on this.
6af8462 to
5131eae
Compare
359e824 to
36d3ef8
Compare
|
Fixed the part_of_sortkey loop to correctly identify Manhattan distance for index utilization.New test cases in mdev_35327.test confirm the mhnsw index is now being utilized for Manhattan distance queries. |
sql/item_vectorfunc.cc
Outdated
| Field *f= item->field; | ||
| KEY *keyinfo= f->table->s->key_info; | ||
| for (uint i= f->table->s->keys; i < f->table->s->total_keys; i++) | ||
| for (uint i= 0; i < f->table->s->total_keys; i++) |
mysql-test/main/mdev_35327.test
Outdated
| --echo # Without Vector Index | ||
| --echo # | ||
| CREATE TABLE t1 (id INT, v VECTOR(3) NOT NULL); | ||
| INSERT INTO t1 VALUES (1, VEC_FromText('[0,0,0]')), (2, VEC_FromText('[1,1,1]')), (3, VEC_FromText('[5,5,5]')); |
There was a problem hiding this comment.
this isn't a very good dataset. L1 distance between your vectors is 0,3,12, as your tests show.
L2 distance would be 0, 1.732, 6.9282.
That is, you cannot know if the index works correctly, Euclidean and Manhattan provide exactly the same ordering. Try vectors where L1 and L2 produce different results.
a24ad01 to
28407cb
Compare
mysql-test/main/mdev_35327.result
Outdated
| # | ||
| CREATE VECTOR INDEX idx ON t1(v) DISTANCE=manhattan; | ||
| # Output should be 3,5 and 6 again | ||
| SELECT id, VEC_DISTANCE_MANHATTAN(v, VEC_FromText('[0,0,0]')) as dist FROM t1 ORDER BY dist; |
There was a problem hiding this comment.
you don't have LIMIT here, so the index is not used. You have LIMIT in EXPLAIN below, but not here.
28407cb to
ed38e47
Compare
| 1 6 | ||
| # Checking if the vector index is actually implemented using manhattan distance | ||
| EXPLAIN SELECT id FROM t1 FORCE INDEX (idx) | ||
| ORDER BY VEC_DISTANCE_MANHATTAN(v, VEC_FromText('[0,0,0]')) LIMIT 1; |
There was a problem hiding this comment.
why you don't use EXPLAIN of exactly the same query as above? You cannot see whether the query above uses the index, you only know that the one here with LIMIT 1 does.
please, do a query with EXPLAIN and exactly the same query without EXPLAIN.
Implemented the Manhattan distance function as described in the Jira task. Added item_vectorfunc implementation and registered the function in item_create