BMDeep tutorial added #95
DanielaSchacherer wants to merge 28 commits into ImagingDataCommons:master
was stored with a wrong name
|
@fedorov did we already talk about this? If not, maybe we should briefly in the Slim meeting later today! |
|
I admit I lost track of this PR... |
|
no problem, me too :D |
|
@DanielaSchacherer I pushed some minor changes, and also left some suggestions in the comments here: https://colab.research.google.com/drive/1kD_mEbfi1ozhyyS0WK9zl3vyKYMJJJjA?usp=sharing |
|
I incorporated the feedback and replied to your comments in the notebook linked above. I already changed the code to use ann_index, so let's not merge yet; let me test as soon as ann_index is out :) |
|
@fedorov do you have any additional feedback? |
|
You rely on "monolayer" in SeriesDescription in one of the queries, which is not a good pattern. Searching by unstructured data is not desirable, especially when the query can be resolved using structured data. Why not use the structured attributes instead, as in the query below?

```python
from idc_index import IDCClient

client = IDCClient()  # skip if the notebook already created a client

# Both annotation indices are joined in the query below
client.fetch_index("ann_group_index")
client.fetch_index("ann_index")  # assumption: available in the installed idc-index release

# Select all annotation groups in the bonemarrowwsi_pediatricleukemia collection
# whose coded property type is "Selected region" (no SeriesDescription matching)
monolayer_ann_groups = client.sql_query("""
    SELECT
        ag.SeriesInstanceUID,
        ag.AnnotationGroupNumber,
        ag.AnnotationGroupUID,
        ag.AnnotationGroupLabel,
        ag.NumberOfAnnotations,
        ag.GraphicType,
        ag.AnnotationPropertyCategory_CodeMeaning,
        ag.AnnotationPropertyType_CodeMeaning,
        ai.referenced_SeriesInstanceUID
    FROM ann_group_index ag
    JOIN ann_index ai
        ON ag.SeriesInstanceUID = ai.SeriesInstanceUID
    JOIN index i
        ON ag.SeriesInstanceUID = i.SeriesInstanceUID
    WHERE i.collection_id = 'bonemarrowwsi_pediatricleukemia'
        AND ag.AnnotationPropertyType_CodeMeaning = 'Selected region'
    ORDER BY ag.SeriesInstanceUID, ag.AnnotationGroupNumber
""")

print(f"Found {len(monolayer_ann_groups)} annotation groups "
      f"across {monolayer_ann_groups['SeriesInstanceUID'].nunique()} ANN series")
display(monolayer_ann_groups)
```
|
Also note that |
|
I think similar comment applies further in the notebook where you deal with unlabeled cells. |
|
@fedorov I have adapted the notebook. You are right, it is of course better to query by standardized values. Here, however, it added another layer of complexity. I will ask André to review and give me feedback on how understandable the notebook is and how it could be improved. Let me know what you think about the current version. |
|
The issue is that if we rely on collection-specific conventions to be able to use the data, it will be more difficult to discover and reuse - most importantly, these days, by the LLMs. This collection-specific approach that is oriented towards a human using a single collection does not scale, and this is exactly why we put the effort into using codes and standard DICOM attributes. But I am fine either way for the tutorial - it's your call. It will be an interesting experiment to ask Claude + IDC skill to write a tutorial on this topic, and compare! |
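To make the point about free-text matching versus coded attributes concrete, here is a minimal, self-contained sketch. It does not touch IDC or idc-index: the table `toy_ann`, its three rows, and the helper names are hypothetical and exist only for illustration, and the `'Cell'` value is an assumed example rather than a value taken from the collection. It uses the standard-library `sqlite3` module in place of the real index.

```python
import sqlite3

# Hypothetical toy rows, NOT real IDC data:
# (SeriesInstanceUID, SeriesDescription, AnnotationPropertyType_CodeMeaning)
ROWS = [
    ("1.2.3.1", "Monolayer regions of interest", "Selected region"),
    ("1.2.3.2", "ROI annotations", "Selected region"),
    ("1.2.3.3", "Cell annotations in monolayer", "Cell"),
]


def build_toy_index():
    """Create an in-memory table that mimics a tiny annotation index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE toy_ann ("
        "SeriesInstanceUID TEXT, "
        "SeriesDescription TEXT, "
        "AnnotationPropertyType_CodeMeaning TEXT)"
    )
    conn.executemany("INSERT INTO toy_ann VALUES (?, ?, ?)", ROWS)
    return conn


def by_free_text(conn, keyword):
    """Unstructured search: substring match on SeriesDescription."""
    cur = conn.execute(
        "SELECT SeriesInstanceUID FROM toy_ann "
        "WHERE lower(SeriesDescription) LIKE ? "
        "ORDER BY SeriesInstanceUID",
        (f"%{keyword.lower()}%",),
    )
    return [row[0] for row in cur]


def by_code_meaning(conn, meaning):
    """Structured search: exact match on the coded property type."""
    cur = conn.execute(
        "SELECT SeriesInstanceUID FROM toy_ann "
        "WHERE AnnotationPropertyType_CodeMeaning = ? "
        "ORDER BY SeriesInstanceUID",
        (meaning,),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    conn = build_toy_index()
    print("free text :", by_free_text(conn, "monolayer"))
    print("coded     :", by_code_meaning(conn, "Selected region"))
```

In this toy setup the free-text query misses the series described only as "ROI annotations" and picks up the cell-level series that merely mentions "monolayer", while the coded query returns exactly the two region series regardless of how the description is worded.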
As discussed, I added the BMDeep tutorial. Happy for feedback! :)