@ottointhesky
Contributor

This merge request will contain small improvements to the documentation and unit tests of the label feature. For now, only unit tests were added, which check submitted labels with the dictDB and sqliteDB backends.

A couple of days ago I realized that label is written twice to the DB, which may be unwanted (since it wastes resources):

[screenshot]

We need label as an explicit column to make it queryable. Hence, we could remove the entry from the metadata before writing the record to the database. This can be handled centrally. However, retrieving a record then requires re-adding the label to the metadata, which makes everything more complicated since it needs specific handling for the different DB backends. So is it worth the effort, given that label will be empty for most users anyway? Probably not...
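For illustration only, here is a minimal sketch of what the central strip-and-restore step could look like. The record layout (a dict with a `metadata` dict and a top-level `label` column) and the helper names `split_label` and `restore_label` are assumptions made for this example, not the actual ipp record format or API.

```python
def split_label(record):
    """Return a copy of `record` with 'label' moved out of metadata.

    Assumed layout: the label lives in record['metadata']['label'] and
    is stored in the top-level 'label' column instead.
    """
    stored = dict(record)
    metadata = dict(stored.get("metadata") or {})
    # Remove the duplicate so the label is written only once.
    stored["label"] = metadata.pop("label", None)
    stored["metadata"] = metadata
    return stored


def restore_label(stored):
    """Return a copy of `stored` with the label re-added to metadata."""
    record = dict(stored)
    metadata = dict(record.get("metadata") or {})
    if record.get("label") is not None:
        metadata["label"] = record["label"]
    record["metadata"] = metadata
    return record
```

Both helpers copy the dicts they touch, so the caller's record is left unmodified. Each backend would still need to call `restore_label` on every read path, which is the extra complexity mentioned above.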

As mentioned earlier, we also need the possibility to find records based on substrings within DB columns. In MongoDB syntax this can be achieved using regex. E.g.
{'label': {'$regex': 'my'}}
would find any record where label contains the string my (at any position). So this would require a new comparison operator ($regex) for the filter definition in ipp. Supporting this operator in dictDB shouldn't be too difficult, but for sqlite only a strongly reduced regex definition could be supported via like. SQL like basically only supports wildcards (single- and multi-character matching). So a regex containing only ^ $ . .* .+ could be translated; anything else isn't possible. So the question here is: should we extend the supported operators by $regex, or should we go a different/new way by adding the possibility of passing backend-specific filter objects (e.g. a lambda object for dictDB and where clauses for sqliteDB)? If you are thinking of dropping support for MongoDB, the second option might be more appealing. If you do not want to drop support for MongoDB yet, I suggest that we add a MongoDB installation to the GitHub Actions. Using the following action script, this shouldn't be too difficult. No matter which way you want to go, I'm happy to provide the necessary implementation...
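To make the sqlite limitation concrete, here is a rough sketch of how a regex restricted to ^ $ . .* .+ could be translated to a LIKE pattern. The function name `regex_to_like` is hypothetical and nothing like it exists in ipp today. Note that the result is only an approximation: SQLite's LIKE is case-insensitive for ASCII characters by default, whereas a regex is case-sensitive.

```python
def regex_to_like(pattern, escape="\\"):
    """Translate a regex limited to ^ $ . .* .+ into a SQL LIKE pattern.

    Raises ValueError for any other regex syntax. Literal % and _ are
    prefixed with `escape`, so the query must use a matching ESCAPE clause.
    """
    anchored_start = pattern.startswith("^")
    anchored_end = pattern.endswith("$")
    start = 1 if anchored_start else 0
    end = len(pattern) - 1 if anchored_end else len(pattern)
    body = pattern[start:end]

    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if ch == ".":
            if nxt == "*":
                out.append("%")   # any sequence, including empty
                i += 2
            elif nxt == "+":
                out.append("_%")  # at least one character
                i += 2
            else:
                out.append("_")   # exactly one character
                i += 1
        elif ch in "[](){}|?\\*+^$":
            raise ValueError(f"unsupported regex syntax: {ch!r}")
        else:
            if ch in "%_":
                out.append(escape)  # keep LIKE wildcards literal
            out.append(ch)
            i += 1

    like = "".join(out)
    # An unanchored regex matches anywhere, so pad with % on that side.
    if not anchored_start:
        like = "%" + like
    if not anchored_end:
        like = like + "%"
    return like
```

With this sketch, `{'label': {'$regex': 'my'}}` would become `label LIKE '%my%' ESCAPE '\'`, while something like `a|b` would be rejected with a ValueError.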

@minrk
Member

minrk commented Feb 12, 2026

I don't think we need to worry about the cost of writing the label twice to make it queryable. It's quite small compared to anything else, so the impact will be negligible.

I don't imagine full regex search is going to be that useful, since users would only craft the labels specifically to make them searchable, I imagine wildcard matching is plenty.

If you wanted to put some time into testing mongodb, that would be super appreciated! If it takes too much of your time, just say so, and we can probably drop it.

@ottointhesky
Contributor Author

> I don't think we need to worry about the cost of writing the label twice to make it queryable. It's quite small compared to anything else, so the impact will be negligible.

Ok & thanks. I just wanted to double check with you...

> If you wanted to put some time into testing mongodb, that would be super appreciated! If it takes too much of your time, just say so, and we can probably drop it.

As expected, adding MongoDB to the GitHub tests was easy. supercharge/mongodb-github-action only works for Linux containers, but that's definitely better than no test. I also changed to the pymongo 4.x API and now raise an exception if the pymongo version is below 4.
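A minimal sketch of such a version guard is shown below; the helper name `require_pymongo_4` and the choice of ImportError are assumptions for illustration and may differ from what the PR actually does.

```python
def require_pymongo_4(version_tuple):
    """Raise ImportError if the given pymongo version is older than 4.

    `version_tuple` is expected to look like pymongo.version_tuple,
    e.g. (4, 6, 1).
    """
    if version_tuple[0] < 4:
        found = ".".join(str(part) for part in version_tuple)
        raise ImportError(f"pymongo >= 4 is required, found {found}")


# Typical use, assuming pymongo is installed:
#   import pymongo
#   require_pymongo_4(pymongo.version_tuple)
```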

> I don't imagine full regex search is going to be that useful, since users would only craft the labels specifically to make them searchable, I imagine wildcard matching is plenty.

Agreed, but how should wildcard matching look in Python code? So far the query object syntax is defined by MongoDB (query objects are passed to MongoDB untouched), and there is no wildcard syntax there. If we come up with something new, e.g. based on SQL like

{'label': {'$like': '%my%'}}

query objects will also need preprocessing for MongoDB, which is NOT currently the case. Which direction should we go?
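To show what that preprocessing might involve, here is a rough sketch that rewrites a `$like` test into a MongoDB `$regex` test. `$like` is only the proposed operator from above, and both function names are hypothetical. The sketch handles flat queries only and supports no escape character for literal % or _.

```python
import re


def like_to_regex(like):
    """Convert a SQL LIKE pattern (% and _ wildcards) to an anchored regex."""
    parts = []
    for ch in like:
        if ch == "%":
            parts.append(".*")  # any sequence, including empty
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return "^" + "".join(parts) + "$"


def preprocess_for_mongo(query):
    """Return a copy of a flat query with each $like test rewritten as $regex.

    All other operators and plain values are passed through unchanged.
    """
    out = {}
    for key, test in query.items():
        if isinstance(test, dict):
            test = {
                ("$regex" if op == "$like" else op):
                    (like_to_regex(value) if op == "$like" else value)
                for op, value in test.items()
            }
        out[key] = test
    return out
```

The same `like_to_regex` helper could back a dictDB filter via `re.search`, while sqliteDB could pass the `$like` value straight to a LIKE clause.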

@ottointhesky
Contributor Author

FYI: for whatever reason, the MongoDB container seems to interfere with the Slurm container. Sometimes it works, but most of the time it doesn't. Deactivating MongoDB via an if condition for the Slurm test doesn't seem to work. Hopefully I can find a solution to this problem...
