Merged
163 changes: 114 additions & 49 deletions notebooks/pathomics/microscopy_dicom_ann_intro.ipynb
@@ -34,7 +34,7 @@
"----------------------\n",
"\n",
"Initial version: Jan 2025 \n",
"Last updated: Oct 2025"
"Last updated: Feb 2026"
]
},
{
@@ -68,7 +68,7 @@
"\n",
"## Example\n",
"\n",
"As an example, you can open the sample slide and annotations available here https://viewer.imaging.datacommons.cancer.gov/slim/studies/2.25.150973379448125660359643882019624926008/series/1.3.6.1.4.1.5962.99.1.1062471168.429178244.1637445043712.2.0. Note that not all of the slides in a given study contain annotations. Make sure you select the slide that has the suffix \"DX1\", as shown in the screenshot below. To enable visualization of the annotations, open \"Annotation Groups\" section in the right panel, and toggle \"Nuclei\" switch.\n",
"As an example, you can open the sample slide and annotations available here: https://viewer.imaging.datacommons.cancer.gov/slim/studies/2.25.150973379448125660359643882019624926008/series/1.3.6.1.4.1.5962.99.1.1062471168.429178244.1637445043712.2.0. Note that not all of the slides in a given study contain annotations. Make sure you select the slide that has the suffix \"DX1\", as shown in the screenshot below. To enable visualization of the annotations, open the \"Annotation Groups\" section in the right panel and toggle the \"Nuclei\" switch.\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/ImagingDataCommons/IDC-Tutorials/master/notebooks/pathomics/pan_cancer_annotation_slim_example.png\" alt=\"Example visualization of annotations\" width=\"1000\"/>"
]
@@ -88,7 +88,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"metadata": {
"id": "M2nIduVum6mP"
},
@@ -111,11 +111,23 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"metadata": {
"id": "3IuBLeoEnWVU"
"id": "3IuBLeoEnWVU",
"outputId": "cfaab9bb-ed0c-4421-e65d-e443c3649152",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"outputs": [],
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:pydicom:get_frame_offsets is deprecated and will be removed in v4.0\n"
]
}
],
"source": [
"import os\n",
"import json\n",
@@ -140,62 +152,66 @@
},
"source": [
"## Accessing DICOM ANNs from the IDC\n",
"To access and download the ANNs files, we utilize the Python package [idc-index](https://github.com/ImagingDataCommons/idc-index) that facilitates querying metadata and downloading DICOM files from the IDC. Since all available ANN documents in the IDC have a combined size of 1.82 TB - for the demonstration in this tutorial we will use only a single ANN from the TCGA-BRCA collection as an example."
"To access and download ANN files, we use the Python package [idc-index](https://github.com/ImagingDataCommons/idc-index), which facilitates querying metadata and downloading DICOM files from the IDC. Since all available ANN objects in the IDC have a combined size of 1.82 TB, this tutorial uses only a single ANN from the TCGA-LUAD collection as an example."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"metadata": {
"id": "RxHTkvyZ_FzV"
},
"outputs": [],
"source": [
"idc_client = index.IDCClient() # set-up idc_client\n",
"idc_client.fetch_index('sm_instance_index')"
"idc_client.fetch_index('sm_instance_index') # fetch additional sm_instance_index containing all slide microscopy (SM) instances available in the IDC\n",
"idc_client.fetch_index('ann_index') # fetch additional ann_index containing all ANN series available in the IDC"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 4,
"metadata": {
"id": "cBxXm8bQ_Hcq"
},
"outputs": [],
"source": [
"query_sr = '''\n",
"query_ann = '''\n",
"SELECT\n",
" SeriesInstanceUID,\n",
" index.SeriesInstanceUID,\n",
" StudyInstanceUID,\n",
" PatientID,\n",
" collection_id\n",
"FROM\n",
" index\n",
"JOIN\n",
" ann_index ON ann_index.SeriesInstanceUID = index.SeriesInstanceUID\n",
"WHERE\n",
" analysis_result_id = 'Pan-Cancer-Nuclei-Seg-DICOM' AND Modality = 'ANN' AND collection_id = 'tcga_luad'\n",
" analysis_result_id = 'Pan-Cancer-Nuclei-Seg-DICOM'\n",
" AND collection_id = 'tcga_luad'\n",
"ORDER BY\n",
" crdc_series_uuid\n",
"LIMIT 1\n",
"'''\n",
"pan_ann = idc_client.sql_query(query_sr)"
"pan_ann = idc_client.sql_query(query_ann)"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "mkInIVjjARnu",
"outputId": "34ab1e81-90cd-4d6f-a2f9-230c18b1a1a2"
"outputId": "0b545010-ecfa-4fcf-a57a-33970f76b3fd"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"name": "stderr",
"text": [
"Downloading data: 100%|██████████| 215M/215M [00:02<00:00, 85.5MB/s]\n"
"Downloading data: 100%|██████████| 215M/215M [00:02<00:00, 106MB/s]\n"
]
}
],
@@ -218,11 +234,39 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 7,
"metadata": {
"id": "O7VT4D0jVnhv"
"id": "O7VT4D0jVnhv",
"outputId": "822b713b-6b70-4147-f1f4-dcf8445bf364",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 921
}
},
"outputs": [],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<IPython.lib.display.IFrame at 0x7f91a7affce0>"
],
"text/html": [
"\n",
" <iframe\n",
" width=\"1500\"\n",
" height=\"900\"\n",
" src=\"https://viewer.imaging.datacommons.cancer.gov/slim/studies/2.25.150973379448125660359643882019624926008\"\n",
" frameborder=\"0\"\n",
" allowfullscreen\n",
" \n",
" ></iframe>\n",
" "
]
},
"metadata": {},
"execution_count": 7
}
],
"source": [
"viewer_url = idc_client.get_viewer_URL(studyInstanceUID=pan_ann['StudyInstanceUID'].iloc[0], viewer_selector='slim')\n",
"from IPython.display import IFrame\n",
@@ -239,29 +283,25 @@
"\n",
"DICOM ANNs extend the capabilities of [DICOM Structured Report (SR)](https://highdicom.readthedocs.io/en/latest/generalsr.html) documents as they were developed specifically for the storage of a **large number of similar annotations** and corresponding measurements (hence the full name Microscopy Simple **Bulk** Annotations). A popular example are annotations of small structures like cells or cell nuclei.\n",
"\n",
"Each ANN object contains one or more \"Annotation Groups\" consisting of many similar graphical annotations, optionally accompanied by one or several numerical measurements belonging to those graphical annotations as well as some required and some optional metadata that describe the contents of the group (see [here](https://highdicom.readthedocs.io/en/latest/package.html#highdicom.ann.AnnotationGroup) for more information).\n",
"\n",
"The following code uses the Python library [highdicom](https://github.com/ImagingDataCommons/highdicom) to extract annotation groups and corresponding measurements from a single DICOM ANN.\n",
"Each ANN object contains one or more \"Annotation Groups\". Each group consists of many similar graphical annotations, optionally accompanied by one or several numerical measurements belonging to those annotations, as well as some required and some optional metadata that describe the contents of the group. The metadata include an annotation group identifier, a human-readable label, and coded values that describe the category and the type of the annotated structure (see [here](https://highdicom.readthedocs.io/en/latest/package.html#highdicom.ann.AnnotationGroup) for the complete documentation).\n",
"\n",
"In highdicom, the annotation data are encoded as a list of numpy arrays, each of the shape (N x D). N is the number of coordinates which depends on the graphic type, e.g. a `POINT` will have one coordinate, while a `POLYGON` has >= 3 coordinates. Coordinates are either defined in the 2D image coordinate system (D=2), or in the frame-of-reference coordinate system (D=3).\n",
"\n",
"More explanations and guidance through implementation details for ANN objects in highdicom can be found [here](https://highdicom.readthedocs.io/en/latest/ann.html#microscopy-bulk-simple-annotation-ann-objects)."
"The following code uses [highdicom](https://github.com/ImagingDataCommons/highdicom) to extract annotation groups and metadata from a single DICOM ANN. Further explanations and implementation details for ANN objects in highdicom can be found [here](https://highdicom.readthedocs.io/en/latest/ann.html#microscopy-bulk-simple-annotation-ann-objects)."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 8,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "1gacgd-ELU3l",
"outputId": "1eb09b11-e30b-4b9a-b222-c0443168650d"
"outputId": "e851613e-08b4-430b-ee5e-56f6a7bba935"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"name": "stdout",
"text": [
"Number of annotation groups: 1\n"
]
@@ -276,55 +316,80 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "cFM0--gRPXtp",
"outputId": "3d6b1d0e-ce37-4cc6-fa28-309cf98fc60c"
"outputId": "40fb6c1e-9c39-4fe3-cfea-7f7bcf3993b2"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"name": "stdout",
"text": [
"Label of the annotation group: Nuclei\n",
"Unique identifier of the annotation group: 1.2.826.0.1.3680043.10.511.3.11186236449463673483170621129963156\n",
"Human-readable label of the annotation group: Nuclei\n",
"Coded label of the annotated property category: \n",
"(0008,0100) Code Value SH: '91723000'\n",
"(0008,0102) Coding Scheme Designator SH: 'SCT'\n",
"(0008,0104) Code Meaning LO: 'Anatomical Stucture'\n",
"Coded label of the annotated property type: \n",
"(0008,0100) Code Value SH: '84640000'\n",
"(0008,0102) Coding Scheme Designator SH: 'SCT'\n",
"(0008,0104) Code Meaning LO: 'Nucleus'\n",
"Graphic type of annotations: GraphicTypeValues.POLYGON\n",
"Number of annotations: 865921\n"
"Number of annotations: 865921\n",
"Algorithm type used to generate the annotations: AnnotationGroupGenerationTypeValues.AUTOMATIC\n"
]
}
],
"source": [
"# Most essential metadata of the annotation group can be accessed as follows:\n",
"ann_group = ann_groups[0]\n",
"print(f'Label of the annotation group: {ann_group.label}')\n",
"print(f'Unique identifier of the annotation group: {ann_group.uid}')\n",
"print(f'Human-readable label of the annotation group: {ann_group.label}')\n",
"print(f'Coded label of the annotated property category: \\n{ann_group.annotated_property_category}')\n",
"print(f'Coded label of the annotated property type: \\n{ann_group.annotated_property_type}')\n",
"print(f'Graphic type of annotations: {ann_group.graphic_type}')\n",
"print(f'Number of annotations: {ann_group.number_of_annotations}')"
"print(f'Number of annotations: {ann_group.number_of_annotations}')\n",
"print(f'Algorithm type used to generate the annotations: {ann_group.algorithm_type}')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "g-z2XnaMyIpH"
},
"source": [
"In highdicom, the graphic annotation data are encoded as a list of numpy arrays, each of shape (N x D). N is the number of coordinates, which depends on the graphic type: a `POINT` has one coordinate, while a `POLYGON` has >= 3 coordinates. Coordinates are defined either in the 2D image coordinate system (D=2) or in the frame-of-reference coordinate system (D=3)."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 10,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "t1h11OwixXD-",
"outputId": "0e158d16-d47a-4b4f-95d7-e1a32c71adc6"
"outputId": "20d6cc4d-ebc8-4fcc-bc45-0f701fbf5fcc"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"name": "stdout",
"text": [
"First annotation: [[15950.0, 15620.0], [15949.0, 15621.0], [15948.0, 15621.0], [15946.0, 15623.0], [15946.0, 15626.0], [15947.0, 15627.0], [15947.0, 15632.0], [15949.0, 15634.0], [15950.0, 15634.0], [15951.0, 15635.0], [15954.0, 15635.0], [15955.0, 15634.0], [15956.0, 15634.0], [15958.0, 15632.0], [15958.0, 15631.0], [15959.0, 15630.0], [15959.0, 15624.0], [15958.0, 15623.0], [15958.0, 15622.0], [15957.0, 15622.0], [15956.0, 15621.0], [15955.0, 15621.0], [15954.0, 15620.0]]\n"
"Coordinates of first annotation: [[15950.0, 15620.0], [15949.0, 15621.0], [15948.0, 15621.0], [15946.0, 15623.0], [15946.0, 15626.0], [15947.0, 15627.0], [15947.0, 15632.0], [15949.0, 15634.0], [15950.0, 15634.0], [15951.0, 15635.0], [15954.0, 15635.0], [15955.0, 15634.0], [15956.0, 15634.0], [15958.0, 15632.0], [15958.0, 15631.0], [15959.0, 15630.0], [15959.0, 15624.0], [15958.0, 15623.0], [15958.0, 15622.0], [15957.0, 15622.0], [15956.0, 15621.0], [15955.0, 15621.0], [15954.0, 15620.0]]\n"
]
}
],
"source": [
"nuclei_annotations = ann_group.get_graphic_data(coordinate_type='2D')\n",
"# The actual graphical annotation can be accessed as follows:\n",
"nuclei_annotations = ann_group.get_graphic_data(coordinate_type=ann.annotation_coordinate_type)\n",
"first_ann = nuclei_annotations[0].tolist()\n",
"print(f'First annotation: {first_ann}')"
"print(f'Coordinates of first annotation: {first_ann}')"
]
},
{
@@ -333,23 +398,23 @@
"id": "AfxGQ-WXNGkj"
},
"source": [
"As previously mentioned, annotations can be accompanied by one or multiple measurements. For the Pan-Cancer-Nuclei-Seg-DICOM collection these are each nuclei's area given in µm²."
"As previously mentioned, annotations can be accompanied by one or multiple measurements. In highdicom, they are returned as a tuple of `(names, values, units)`, where `names` and `units` are lists of coded values and `values` is a numpy array. For the Pan-Cancer-Nuclei-Seg-DICOM collection, the measurements are the areas of the individual nuclei, given in µm²."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 11,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "xHfHTZqUEZTV",
"outputId": "7471c8a5-a469-42a4-d910-b71b8628289b"
"outputId": "ab727308-f2cf-43aa-9ab5-be278bfbc488"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"name": "stdout",
"text": [
"Measurements for \"Area\" in unit \"square micrometer\"\n",
"[11.367216 21.59136 4.699296 ... 7.429968 5.905872 3.111696]\n"
@@ -751,7 +816,7 @@
"id": "_YEYKt-Rj1qd"
},
"source": [
"The resulting GeoJSON file can then be imported into other tools for viewing and analysis, such as for example [QuPath](https://qupath.github.io/). Here QuPath v.0.5.1 was used. After opening the slide, you can load annotations as GeoJSON under `File` > `Import objects from file` and adapt the visualization (e.g. changing color) in the `Annotations` pane. \n",
"The resulting GeoJSON file can then be imported into other tools for viewing and analysis, such as [QuPath](https://qupath.github.io/). QuPath v0.5.1 was used here. After opening the slide, you can load the annotations as GeoJSON under `File` > `Import objects from file` and adapt the visualization (e.g. change the color) in the `Annotations` pane.\n",
"\n",
"<img src=\"https://github.com/ImagingDataCommons/IDC-Tutorials/releases/download/0.22.0/pan_cancer_annotation_qupath_example.png\" alt=\"Example visualization of annotations\" width=\"1000\"/>"
]
@@ -784,4 +849,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}
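
The notebook text in this diff describes each annotation as an (N x D) array of coordinates. As a minimal sketch that is not part of the PR, the following shows how the area of a single 2D `POLYGON` annotation could be computed from such coordinates with the shoelace formula. It assumes the coordinates are `[x, y]` pairs in pixel units, as in the printed first annotation. The helper `polygon_area` and the value of `pixel_spacing_um` are hypothetical; the real pixel spacing would have to be taken from the referenced slide.

```python
from typing import Sequence


def polygon_area(coords: Sequence[Sequence[float]]) -> float:
    """Area of a simple polygon given as N (x, y) vertices (shoelace formula)."""
    if len(coords) < 3:
        raise ValueError("a POLYGON needs at least 3 coordinates")
    twice_area = 0.0
    for i, (x0, y0) in enumerate(coords):
        # Pair each vertex with the next one, wrapping around to close the ring
        x1, y1 = coords[(i + 1) % len(coords)]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


# Toy polygon standing in for one annotation: a 2 x 2 pixel square
square = [[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]]
area_px = polygon_area(square)

# Hypothetical pixel spacing in micrometers per pixel (an assumption, not from the PR)
pixel_spacing_um = 0.25
area_um2 = area_px * pixel_spacing_um ** 2
print(area_px, area_um2)
```

With real data, one entry of the list returned by `get_graphic_data` could be passed after `.tolist()`, and the result compared against the stored "Area" measurement as a sanity check.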