Conversation
update dependency
|
Before you submit for review:
If you did not complete any of these, then please explain below. |
MarkWolters
left a comment
There was a problem hiding this comment.
a couple nit-picky suggestions but looks good
...s/src/main/java/io/github/jbellis/jvector/example/benchmarks/datasets/DataSetLoaderHDF5.java
Show resolved
Hide resolved
...s/src/main/java/io/github/jbellis/jvector/example/benchmarks/datasets/DataSetLoaderHDF5.java
Show resolved
Hide resolved
...s/src/main/java/io/github/jbellis/jvector/example/benchmarks/datasets/DataSetLoaderHDF5.java
Show resolved
Hide resolved
...s/src/main/java/io/github/jbellis/jvector/example/benchmarks/datasets/DataSetLoaderHDF5.java
Show resolved
Hide resolved
| similarityFunction = VectorSimilarityFunction.EUCLIDEAN; | ||
| } | ||
| else { | ||
| throw new IllegalArgumentException("Unknown similarity function -- expected angular or euclidean for " + filename); |
There was a problem hiding this comment.
Let's use the terminology we use elsewhere: cosine, l2 (Euclidean is fine), and dot product. And dot product should map to dot product, not cosine.
Also, it's pretty precarious selecting the VSF based on the file name.
There was a problem hiding this comment.
Ok, cosine, l2, dot_product, got it. Not sure why we have the wrong matching there.
...main/java/io/github/jbellis/jvector/example/benchmarks/datasets/DataSetLoaderVectordata.java
Show resolved
Hide resolved
...main/java/io/github/jbellis/jvector/example/benchmarks/datasets/DataSetLoaderVectordata.java
Show resolved
Hide resolved
| } | ||
|
|
||
| private VectorSimilarityFunction mapDistanceFunction(DistanceFunction df) { | ||
| if (df == null) return COSINE; |
There was a problem hiding this comment.
Why wouldn't df == null be an error?
If there is a need for a default, it should be DOT_PRODUCT and is only safe to arbitrarily use in all cases if the vectors are normalized.
Also, shouldn't this return VectorSimilarityFunction.COSINE?
There was a problem hiding this comment.
Yes, undefined should be an error.
I think it might actually return VectorSimilarityFunction.COSINE on line 281, but it doesn't look like it from the way it is written. I agree that explicit is better here.
tlwillke
left a comment
There was a problem hiding this comment.
Please see my comments for requested changes. It's looking good so far, but I cannot evaluate it further until the catalog.yaml comment is addressed.
This PR activates the vectodata loader as the third option after the HDF5 and MFD loaders.
If a dataset is not found in either of these, then it will be loaded from vectordata sources so long as the dataset is visible.
Users will want to add a ~/.config/vectordata/catalogs.yaml file to get access to their own or group shared datasets.