Skip to content
/ server Public

Comments

MDEV-35327: Add VEC_DISTANCE_MANHATTAN (L1) distance function#4654

Open
TQSH-Dev wants to merge 1 commit intoMariaDB:mainfrom
TQSH-Dev:mdev-35327-manhattan
Open

MDEV-35327: Add VEC_DISTANCE_MANHATTAN (L1) distance function#4654
TQSH-Dev wants to merge 1 commit intoMariaDB:mainfrom
TQSH-Dev:mdev-35327-manhattan

Conversation

@TQSH-Dev
Copy link

Implemented the Manhattan distance function as described in the Jira task. Added item_vectorfunc implementation and registered the function in item_create

@CLAassistant
Copy link

CLAassistant commented Feb 15, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@vuvova vuvova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add tests. with and without a vector index

@TQSH-Dev
Copy link
Author

please add tests. with and without a vector index

Thanks for the review,will do it by tommorow.

switch (kind) {
case EUCLIDEAN: calc_distance= calc_distance_euclidean; break;
case COSINE: calc_distance= calc_distance_cosine; break;
case MANHATTAN: calc_distance= calc_distance_manhattan;break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: additional space before break

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


public:
enum distance_kind { EUCLIDEAN, COSINE, AUTO } kind;
enum distance_kind { EUCLIDEAN, COSINE, MANHATTAN,AUTO } kind;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: sapce before AUTO

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

};


Create_func_vec_distance_manhattan Create_func_vec_distance_manhattan::s_singleton;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 1 blank line before and 2 blank lines after for consistency.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@grooverdan grooverdan added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Feb 16, 2026
Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a preliminary review: I have no new remarks aside from what others already did.

@TQSH-Dev
Copy link
Author

please add tests. with and without a vector index

added

@TQSH-Dev
Copy link
Author

The build failure in rpl.rpl_row_foreign_key_mdl appears to be a random timeout unrelated to my changes, I am unsure as to why this is happening. My added test main.mdev_35327 passed successfully.

@TQSH-Dev TQSH-Dev requested a review from vuvova February 17, 2026 04:35
--echo #
--echo # With Vector Index
--echo #
CREATE VECTOR INDEX idx ON t1(v);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an index created with the euclidean distance. It cannot be used for searches with manhattan distance.

You didn't implement support for manhattan distance at all. It's rather more complex, than adding a new Item class.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. You are absolutely correct,I dug into the EXPLAIN plans and realized that while the VEC_DISTANCE_MANHATTAN function returns the correct values, the optimizer is still performing a full table scan. I'll work on this.

@TQSH-Dev TQSH-Dev force-pushed the mdev-35327-manhattan branch 2 times, most recently from 6af8462 to 5131eae Compare February 18, 2026 16:52
@TQSH-Dev TQSH-Dev changed the base branch from main to bb-10.11-new-innodb-julius February 18, 2026 18:17
@TQSH-Dev TQSH-Dev requested a review from ottok as a code owner February 18, 2026 18:17
@TQSH-Dev TQSH-Dev changed the base branch from bb-10.11-new-innodb-julius to main February 18, 2026 18:17
@TQSH-Dev TQSH-Dev force-pushed the mdev-35327-manhattan branch from 359e824 to 36d3ef8 Compare February 18, 2026 18:50
@TQSH-Dev
Copy link
Author

Fixed the part_of_sortkey loop to correctly identify Manhattan distance for index utilization.New test cases in mdev_35327.test confirm the mhnsw index is now being utilized for Manhattan distance queries.

@TQSH-Dev TQSH-Dev requested a review from vuvova February 18, 2026 19:39
Field *f= item->field;
KEY *keyinfo= f->table->s->key_info;
for (uint i= f->table->s->keys; i < f->table->s->total_keys; i++)
for (uint i= 0; i < f->table->s->total_keys; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is that?

--echo # Without Vector Index
--echo #
CREATE TABLE t1 (id INT, v VECTOR(3) NOT NULL);
INSERT INTO t1 VALUES (1, VEC_FromText('[0,0,0]')), (2, VEC_FromText('[1,1,1]')), (3, VEC_FromText('[5,5,5]'));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't a very good dataset. L1 distance between your vectors is 0,3,12, as your tests show.
L2 distance would be 0, 1.732, 6.9282.

That is, you cannot know if the index works correctly, Euclidean and Manhattan provide exactly the same ordering. Try vectors where L1 and L2 produce different results.

#
CREATE VECTOR INDEX idx ON t1(v) DISTANCE=manhattan;
# Output should be 3,5 and 6 again
SELECT id, VEC_DISTANCE_MANHATTAN(v, VEC_FromText('[0,0,0]')) as dist FROM t1 ORDER BY dist;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't have LIMIT here, so the index is not used. You have LIMIT in EXPLAIN below, but not here.

@TQSH-Dev TQSH-Dev force-pushed the mdev-35327-manhattan branch from 28407cb to ed38e47 Compare February 21, 2026 17:54
@TQSH-Dev TQSH-Dev requested a review from vuvova February 21, 2026 17:56
1 6
# Checking if the vector index is actually implemented using manhattan distance
EXPLAIN SELECT id FROM t1 FORCE INDEX (idx)
ORDER BY VEC_DISTANCE_MANHATTAN(v, VEC_FromText('[0,0,0]')) LIMIT 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why you don't use EXPLAIN of exactly the same query as above? You cannot see whether the query above uses the index, you only know that the one here with LIMIT 1 does.

please, do a query with EXPLAIN and exactly the same query without EXPLAIN.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

5 participants