10 changes: 0 additions & 10 deletions .dockerignore

This file was deleted.

4 changes: 4 additions & 0 deletions .pre-commit-config.yaml
@@ -14,3 +14,7 @@ repos:
rev: v5.10.1 # Use the latest version
hooks:
- id: isort
- repo: https://github.com/PyCQA/flake8
rev: 7.0.0
hooks:
- id: flake8
24 changes: 0 additions & 24 deletions Dockerfile

This file was deleted.

45 changes: 30 additions & 15 deletions README.md
@@ -3,41 +3,57 @@

<table>
<tr>
<!-- Disable huggingface space until there's any demand -->
<!-- <td>
<a href="https://huggingface.co/spaces/TornikeO/simms" rel="nofollow"><img src="https://camo.githubusercontent.com/5762a687b24495afb299c2c0bc68674a2a7dfca9bda6ee444b9da7617d4223a6/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f25463025394625413425393725323048756767696e67253230466163652d5370616365732d626c7565" alt="Hugging Face Spaces" data-canonical-src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue" style="max-width: 100%;"></a>
</td> -->
<!-- Needs an update -->
<!-- <td>
<a target="_blank" href="https://colab.research.google.com/drive/1ppcCy5gTWUaOQdnH4eXqyEn2hBaQRolR?usp=sharing">
<img alt="Static Badge" src="https://img.shields.io/badge/colab-quickstart-blue?logo=googlecolab">
</a>
</td> -->
<td>
<a target="_blank" href="https://colab.research.google.com/github/PangeAI/simms/blob/main/notebooks/samples/colab_tutorial_pesticide.ipynb">
<img alt="Static Badge" src="https://img.shields.io/badge/colab-quickstart-blue?logo=googlecolab">
</a>
</td>
<td>
<a target="_blank" href="https://colab.research.google.com/github/PangeAI/simms/blob/main/notebooks/samples/upload_your_own_mgf.ipynb">
<img alt="Static Badge" src="https://img.shields.io/badge/colab-upload_your_mgf-blue?logo=googlecolab">
</a>
</td>
</tr>
</table>

Calculate similarity between a large number of mass spectra using a GPU. SimMS aims to provide very fast, drop-in replacements for commonly used similarity functions in [matchms](https://github.com/matchms/matchms/).

<div style='text-align:center'>

![img](./assets/perf_speedup.svg)

</div>

![alt text](assets/accuracy.png)

Note: CudaCosineGreedy uses the fp32 format, whereas MatchMS uses fp64; this precision difference accounts for most of the occasional score mismatches.
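The kind of fp32-vs-fp64 rounding gap behind these mismatches is easy to reproduce (a toy demo, not SimMS code):

```python
import numpy as np

# The same arithmetic done in fp32 (the GPU path) and fp64 (the matchms path)
a32 = np.float32(0.1) * np.float32(0.2)
a64 = np.float64(0.1) * np.float64(0.2)

# Tiny but non-zero rounding gap; accumulated over many peaks it can
# flip a greedy match and produce an occasional score difference.
diff = abs(float(a32) - float(a64))
print(diff)
```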
# How SimMS works, in a nutshell

![alt text](assets/visual_guide.png)

Comparing large sets of mass spectra can be done in parallel, since each score can be calculated independently of the others. By leveraging the large number of threads in a GPU, we created a GPU program (kernel) that calculates a 4096 x 4096 similarity matrix in a fraction of a second. By iteratively calculating similarities for batches of spectra, SimMS can quickly process datasets much larger than the GPU memory. For details, visit the [preprint](https://www.biorxiv.org/content/biorxiv/early/2024/07/25/2024.07.24.605006.full.pdf).
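The batching scheme above can be sketched in plain NumPy (a CPU toy model of the tiling, not the actual CUDA kernel; `pair_score` stands in for any pairwise similarity function):

```python
import numpy as np

def batched_scores(references, queries, pair_score, batch=4096):
    """Fill the full score matrix tile by tile. Each batch x batch
    tile is independent of the others, which is what the GPU kernel
    exploits: one tile fits in device memory even when the whole
    matrix does not."""
    scores = np.zeros((len(references), len(queries)), dtype=np.float32)
    for i in range(0, len(references), batch):
        for j in range(0, len(queries), batch):
            # On the GPU, every pair in this tile is scored by its own thread.
            for a, ref in enumerate(references[i:i + batch]):
                for b, qry in enumerate(queries[j:j + batch]):
                    scores[i + a, j + b] = pair_score(ref, qry)
    return scores
```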

# Quickstart

## Hardware

Any GPU [supported](https://numba.pydata.org/numba-doc/dev/cuda/overview.html#requirements) by numba can be used. We tested a number of GPUs:

- RTX 2070, used on local machine
- T4 GPU, offered for free on Colab
- RTX4090 GPU, offered on vast.ai
- Any A100 GPU, offered on vast.ai

The `pytorch/pytorch:2.2.1-cuda12.1-cudnn8-devel` docker [image](https://hub.docker.com/layers/pytorch/pytorch/2.2.1-cuda12.1-cudnn8-devel/images/sha256-42204bca460bb77cbd524577618e1723ad474e5d77cc51f94037fffbc2c88c6f?context=explore) was used for development and testing.

## Install
```bash
pip install git+https://github.com/PangeAI/simms
```

@@ -89,13 +105,11 @@ pangea-simms --references library.mgf --queries queries.mgf --output_file scores
- `CudaCosineGreedy`, equivalent to [CosineGreedy](https://matchms.readthedocs.io/en/latest/_modules/matchms/similarity/CosineGreedy.html)
- `CudaFingerprintSimilarity`, equivalent to [FingerprintSimilarity](https://matchms.readthedocs.io/en/latest/_modules/matchms/similarity/FingerprintSimilarity.html) (`jaccard`, `cosine`, `dice`)

- More coming soon - **requests are welcome**!


# Installation
The **easiest way** to get started is to use the <a target="_blank" href="https://colab.research.google.com/github/PangeAI/simms/blob/main/notebooks/samples/colab_tutorial_pesticide.ipynb">colab notebook
</a> that has everything ready for you.

For local installations, we recommend using [`micromamba`](https://mamba.readthedocs.io/en/latest/installation/micromamba-installation.html), which is much faster.
@@ -131,6 +145,7 @@ Use [this template](https://cloud.vast.ai/?ref_id=51575&template_id=f45f6048db51
```
pip install git+https://github.com/PangeAI/simms
```

# Frequently asked questions

### I want to get `reference_id`, `query_id` and `score` as 1D arrays, separately. How do I do this?
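One generic way to do this with NumPy, shown as an illustration of the idea rather than SimMS's own API (the variable names are made up for the example):

```python
import numpy as np

# Suppose `scores` is the dense (n_references x n_queries) score matrix
scores = np.array([[0.9, 0.1],
                   [0.2, 0.8]], dtype=np.float32)

# Indices of all pairs above a threshold, as two separate 1D arrays,
# plus the matching 1D array of scores
reference_id, query_id = np.nonzero(scores > 0.5)
score = scores[reference_id, query_id]
print(reference_id, query_id, score)
```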