Currently the `evaluate` method compares two sets of local results to each other, which is useful. But as suggested by @marius10p, sometimes we want the evaluation to incorporate metadata from the "standard" ground truth datasets.
So one idea is to add an extra method, maybe called `benchmark` or `evaluate-remote`, that takes as input ONE set of results and the name of a ground truth dataset, fetches both the remote regions and the metadata, and returns the scores (a rough sketch follows the examples below).
In other words, we'll have both
```
neurofinder evaluate a.json b.json
```
and
```
neurofinder benchmark 01.00 a.json
```
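Here is a minimal sketch of what the `benchmark` path could look like, just to make the idea concrete. The ground-truth URL is a placeholder, `fetch_ground_truth` and `benchmark` are hypothetical names, and it assumes the same `load` / `centers` / `shapes` helpers the local `evaluate` path already uses (returning `(recall, precision)` and `(inclusion, exclusion)` pairs).

```python
# Sketch only: URL and function names below are placeholders, not a final design.
import tempfile
import urllib.request

import neurofinder

# Placeholder: wherever the "standard" ground truth regions end up being hosted.
GROUND_TRUTH_URL = 'https://example.com/neurofinder/{name}/regions.json'


def fetch_ground_truth(name):
    """Download the remote regions for a named dataset (e.g. '01.00')."""
    with urllib.request.urlopen(GROUND_TRUTH_URL.format(name=name)) as response:
        payload = response.read()
    # write to a temp file so the existing JSON loader can be reused;
    # dataset metadata could be fetched alongside in the same way
    with tempfile.NamedTemporaryFile(mode='wb', suffix='.json', delete=False) as f:
        f.write(payload)
        path = f.name
    return neurofinder.load(path)


def benchmark(name, results_file):
    """Score ONE set of local results against a remote ground truth dataset."""
    truth = fetch_ground_truth(name)
    results = neurofinder.load(results_file)
    recall, precision = neurofinder.centers(truth, results)
    inclusion, exclusion = neurofinder.shapes(truth, results)
    return {'recall': recall, 'precision': precision,
            'inclusion': inclusion, 'exclusion': exclusion}
```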
Thoughts?
cc @syncrostone