Spaces: halleewong/ScribblePrompt

halleewong committed on Dec 13, 2023

Commit

b6bb35e

1 Parent(s): 26e8a2f

initial commit

Files changed (20)

LICENSE +201 -0
README.md +5 -5
app.py +591 -0
checkpoints/ScribblePrompt_unet_v1_nf192_res128.pt +3 -0
network.py +123 -0
predictor.py +242 -0
requirements.txt +4 -0
test_examples/COBRE.jpg +0 -0
test_examples/SCR.jpg +0 -0
test_examples/TotalSegmentator.jpg +0 -0
test_examples/TotalSegmentator_2.jpg +0 -0
val_od_examples/ACDC.jpg +0 -0
val_od_examples/BTCV.jpg +0 -0
val_od_examples/BUID.jpg +0 -0
val_od_examples/DRIVE.jpg +0 -0
val_od_examples/HipXRay.jpg +0 -0
val_od_examples/PanDental.jpg +0 -0
val_od_examples/SCD.jpg +0 -0
val_od_examples/SpineWeb.jpg +0 -0
val_od_examples/WBC.jpg +0 -0

LICENSE ADDED

	@@ -0,0 +1,201 @@

+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+   1. Definitions.
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+   END OF TERMS AND CONDITIONS
+   APPENDIX: How to apply the Apache License to your work.
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+   Copyright [yyyy] [name of copyright owner]
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.

README.md CHANGED

@@ -1,12 +1,12 @@
 ---
-title: ScribblePrompt
-emoji: 👁
+title: Scribbleprompt
+emoji: 🩻
 colorFrom: blue
-colorTo: yellow
+colorTo: pink
 sdk: gradio
-sdk_version: 4.8.0
+sdk_version: 3.41.0
 app_file: app.py
-pinned: false
+pinned: true
 license: apache-2.0
 ---

app.py ADDED

	@@ -0,0 +1,591 @@

+import gradio as gr
+import numpy as np
+import torch
+import torch.nn.functional as F
+import os
+import cv2
+import pathlib
+device = 'cuda' if torch.cuda.is_available() else 'cpu'
+from predictor import Predictor
+RES = 256
+test_example_dir = pathlib.Path("./test_examples")
+test_examples = [str(test_example_dir / x) for x in sorted(os.listdir(test_example_dir))]
+val_example_dir = pathlib.Path("./val_od_examples")
+val_examples = [str(val_example_dir / x) for x in sorted(os.listdir(val_example_dir))]
+default_example = test_example_dir / "TotalSegmentator_2.jpg"
+exp_dir = pathlib.Path('./checkpoints')
+default_model = 'ScribblePrompt-Unet'
+model_dict = {
+    'ScribblePrompt-Unet': 'ScribblePrompt_unet_v1_nf192_res128.pt'
+}
+# -----------------------------------------------------------------------------
+# Model initialization functions
+# -----------------------------------------------------------------------------
+def load_model(exp_key: str = default_model):
+    fpath = exp_dir / model_dict.get(exp_key)
+    exp = Predictor(fpath)
+    return exp, None
+# -----------------------------------------------------------------------------
+# Visualization functions
+# -----------------------------------------------------------------------------
+def _get_overlay(img, lay, const_color="l_blue"):
+    """
+    Helper function for preparing overlay
+    """
+    assert lay.ndim==2, "Overlay must be 2D, got shape: " + str(lay.shape)
+    if img.ndim == 2:
+        img = np.repeat(img[...,None], 3, axis=-1)
+    assert img.ndim==3, "Image must be 3D, got shape: " + str(img.shape)
+    if const_color == "blue":
+        const_color = 255*np.array([0, 0, 1])
+    elif const_color == "green":
+        const_color = 255*np.array([0, 1, 0])
+    elif const_color == "red":
+        const_color = 255*np.array([1, 0, 0])
+    elif const_color == "l_blue":
+        const_color = np.array([31, 119, 180])
+    elif const_color == "orange":
+        const_color = np.array([255, 127, 14])
+    else:
+        raise NotImplementedError
+    x,y = np.nonzero(lay)
+    for i in range(img.shape[-1]):
+        img[x,y,i] = const_color[i]
+    return img
+def image_overlay(img, mask=None, scribbles=None, contour=False, alpha=0.5):
+    """
+    Overlay the ground truth mask and scribbles on the image if provided
+    """
+    assert img.ndim == 2, "Image must be 2D, got shape: " + str(img.shape)
+    output = np.repeat(img[...,None], 3, axis=-1)
+    if mask is not None:
+        assert mask.ndim == 2, "Mask must be 2D, got shape: " + str(mask.shape)
+        if contour:
+            contours = cv2.findContours((mask[...,None]>0.5).astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
+            cv2.drawContours(output, contours[0], -1, (0, 255, 0), 1)
+        else:
+            mask_overlay = _get_overlay(img, mask)
+            mask2 = 0.5*np.repeat(mask[...,None], 3, axis=-1)
+            output = cv2.convertScaleAbs(mask_overlay * mask2 + output * (1 - mask2))
+    if scribbles is not None:
+        pos_scribble_overlay = _get_overlay(output, scribbles[0,...], const_color="green")
+        cv2.addWeighted(pos_scribble_overlay, alpha, output, 1 - alpha, 0, output)
+        neg_scribble_overlay = _get_overlay(output, scribbles[1,...], const_color="red")
+        cv2.addWeighted(neg_scribble_overlay, alpha, output, 1 - alpha, 0, output)
+    return output
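
Note on the non-contour branch of `image_overlay`: the mask colour is mixed into the image at a fixed 50% weight wherever the mask is set, and the image is left unchanged elsewhere. The per-pixel arithmetic can be sketched in plain Python as follows. `blend_pixel` is a hypothetical helper used only for illustration (it is not part of this commit), it assumes a binary mask value, and it leaves out the final `cv2.convertScaleAbs` rounding to 8-bit values.

```python
L_BLUE = (31, 119, 180)  # same RGB triple as the "l_blue" branch of _get_overlay


def blend_pixel(gray, mask_value, color=L_BLUE, weight=0.5):
    """Blend one grayscale pixel with the overlay colour.

    Mirrors `mask_overlay * mask2 + output * (1 - mask2)` with
    `mask2 = weight * mask`. Returns floats; rounding to uint8 is omitted.
    """
    m = weight * mask_value
    return tuple(c * m + gray * (1 - m) for c in color)


# Background pixel (mask == 0): the gray value is kept in all three channels.
print(blend_pixel(100, 0))
# Foreground pixel (mask == 1): an even mix of the colour and the gray value.
print(blend_pixel(100, 1))
```
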
+def viz_pred_mask(img, mask=None, point_coords=None, point_labels=None, bbox_coords=None, seperate_scribble_masks=None, binary=True):
+    """
+    Visualize image with clicks, scribbles, predicted mask overlaid
+    """
+    assert isinstance(img, np.ndarray), "Image must be numpy array, got type: " + str(type(img))
+    if mask is not None:
+        if isinstance(mask, torch.Tensor):
+            mask = mask.cpu().numpy()
+    if binary and mask is not None:
+        mask = 1*(mask > 0.5)
+    out = image_overlay(img, mask=mask, scribbles=seperate_scribble_masks)
+    if point_coords is not None:
+        for i,(col,row) in enumerate(point_coords):
+            if point_labels[i] == 1:
+                cv2.circle(out,(col, row), 2, (0,255,0), -1)
+            else:
+                cv2.circle(out,(col, row), 2, (255,0,0), -1)
+    if bbox_coords is not None:
+        for i in range(len(bbox_coords)//2):
+            cv2.rectangle(out, bbox_coords[2*i], bbox_coords[2*i+1], (255,165,0), 1)
+        if len(bbox_coords) % 2 == 1:
+            cv2.circle(out, tuple(bbox_coords[-1]), 2, (255,165,0), -1)
+    return out
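
Note on `viz_pred_mask`: `bbox_coords` is handled as a flat list of clicks. Each consecutive pair is drawn as one rectangle, and an unpaired trailing click is drawn as a dot until its partner arrives. A small sketch of that pairing, with `split_box_clicks` as a hypothetical helper introduced here only for illustration:

```python
def split_box_clicks(bbox_coords):
    """Group a flat list of (x, y) clicks into complete boxes plus a pending corner."""
    n_pairs = len(bbox_coords) // 2
    boxes = [(bbox_coords[2 * i], bbox_coords[2 * i + 1]) for i in range(n_pairs)]
    # An odd number of clicks means the last one is still waiting for its partner.
    pending = bbox_coords[-1] if len(bbox_coords) % 2 == 1 else None
    return boxes, pending


clicks = [(10, 20), (60, 80), (5, 5)]
boxes, pending = split_box_clicks(clicks)
print(boxes)    # [((10, 20), (60, 80))]
print(pending)  # (5, 5)
```

Drawing accepts any number of pairs, whereas `get_predictions` in this file only builds a `box` prompt when exactly two clicks are present.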
+# -----------------------------------------------------------------------------
+# Collect scribbles
+# -----------------------------------------------------------------------------
+def get_scribbles(seperate_scribble_masks, last_scribble_mask, scribble_img, label: int):
+    """
+    Record scribbles
+    """
+    assert isinstance(seperate_scribble_masks, np.ndarray), "seperate_scribble_masks must be numpy array, got type: " + str(type(seperate_scribble_masks))
+    if scribble_img is not None:
+        color_mask = scribble_img.get('mask')
+        scribble_mask = color_mask[...,0]/255
+        not_same = (scribble_mask != last_scribble_mask)
+        if not isinstance(not_same, bool):
+            not_same = not_same.any()
+        if not_same:
+            # In case any scribbles were removed
+            corrected_scribble_masks = np.stack(2*[(scribble_mask > 0)], axis=0)*seperate_scribble_masks
+            corrected_last_scribble_mask = last_scribble_mask*(scribble_mask > 0)
+            delta = (scribble_mask - corrected_last_scribble_mask) > 0
+            new_scribbles = scribble_mask * delta
+            corrected_scribble_masks[label,...] = np.clip(corrected_scribble_masks[label,...] + new_scribbles, a_min=0, a_max=1)
+            last_scribble_mask = scribble_mask
+            seperate_scribble_masks = corrected_scribble_masks
+    return seperate_scribble_masks, last_scribble_mask
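
Note on `get_scribbles`: the canvas only reports which pixels are currently painted, so the function recovers per-label scribbles by comparing the current canvas with the previous one. The same bookkeeping can be sketched with sets of pixel coordinates instead of arrays. This is a simplified analogue under the assumption of binary masks; `update_scribbles` and its argument names are hypothetical and not part of the commit.

```python
def update_scribbles(per_label, last_canvas, canvas, label):
    """Set-based analogue of get_scribbles.

    per_label   : [positive_pixels, negative_pixels], each a set of (row, col)
    last_canvas : pixels painted on the canvas at the previous call
    canvas      : pixels painted on the canvas now
    label       : 0 for positive, 1 for negative (the current brush)
    """
    if canvas == last_canvas:
        return per_label, last_canvas
    # Drop anything the user erased since the last call.
    per_label = [pixels & canvas for pixels in per_label]
    # Newly painted pixels are assigned to the current brush label.
    new_pixels = canvas - last_canvas
    per_label[label] = per_label[label] | new_pixels
    return per_label, canvas


masks, last = [set(), set()], set()
masks, last = update_scribbles(masks, last, {(0, 0), (0, 1)}, label=0)          # positive stroke
masks, last = update_scribbles(masks, last, {(0, 0), (0, 1), (3, 3)}, label=1)  # negative stroke
masks, last = update_scribbles(masks, last, {(0, 1), (3, 3)}, label=1)          # (0, 0) erased
print(masks)  # [{(0, 1)}, {(3, 3)}]
```

Because only newly painted pixels receive a label, repainting an existing pixel with the other brush leaves the canvas unchanged and is not recorded, which is consistent with the limitation stated in the app instructions.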
+def get_predictions(predictor, input_img, click_coords, click_labels, bbox_coords, seperate_scribble_masks, low_res_mask, img_features, multimask_mode):
+    """
+    Make predictions
+    """
+    box = None
+    if len(bbox_coords) == 1:
+        # gr.Error has no effect unless raised; gr.Warning notifies the user without aborting
+        gr.Warning("Please click a second time to define the bounding box")
+        box = None
+    elif len(bbox_coords) == 2:
+        box = torch.Tensor(bbox_coords).flatten()[None,None,...].int().to(device) # B x n x 4
+    if seperate_scribble_masks is not None:
+        scribble = torch.from_numpy(seperate_scribble_masks)[None,...].to(device)
+    else:
+        scribble = None
+    prompts = dict(
+        img=torch.from_numpy(input_img)[None,None,...].to(device)/255,
+        point_coords=torch.Tensor([click_coords]).int().to(device) if len(click_coords)>0 else None,
+        point_labels=torch.Tensor([click_labels]).int().to(device) if len(click_labels)>0 else None,
+        scribble=scribble,
+        mask_input=low_res_mask.to(device) if low_res_mask is not None else None,
+        box=box,
+        )
+    mask, img_features, low_res_mask = predictor.predict(prompts, img_features, multimask_mode=multimask_mode)
+    return mask, img_features, low_res_mask
+def refresh_predictions(predictor, input_img, output_img, click_coords, click_labels, bbox_coords, brush_label,
+                        scribble_img, seperate_scribble_masks, last_scribble_mask,
+                        best_mask, low_res_mask, img_features, binary_checkbox, multimask_mode):
+    # Record any new scribbles
+    seperate_scribble_masks, last_scribble_mask = get_scribbles(
+        seperate_scribble_masks, last_scribble_mask, scribble_img,
+        label=(0 if brush_label == "Positive (green)" else 1) # current color of the brush
+    )
+    # Make prediction
+    best_mask, img_features, low_res_mask = get_predictions(
+        predictor, input_img, click_coords, click_labels, bbox_coords, seperate_scribble_masks, low_res_mask, img_features, multimask_mode
+    )
+    # Update input visualizations
+    mask_to_viz = best_mask.detach().cpu().numpy()
+    click_input_viz = viz_pred_mask(input_img, mask_to_viz, click_coords, click_labels, bbox_coords, seperate_scribble_masks, binary_checkbox)
+    scribble_input_viz = viz_pred_mask(input_img, mask_to_viz, click_coords, click_labels, bbox_coords, None, binary_checkbox)
+    out_viz = [
+        viz_pred_mask(input_img, mask_to_viz, point_coords=None, point_labels=None, bbox_coords=None, seperate_scribble_masks=None, binary=binary_checkbox),
+        255*(mask_to_viz[...,None].repeat(axis=2, repeats=3)>0.5) if binary_checkbox else mask_to_viz[...,None].repeat(axis=2, repeats=3),
+    ]
+    return click_input_viz, scribble_input_viz, out_viz, best_mask, low_res_mask, img_features, seperate_scribble_masks, last_scribble_mask
+def get_select_coords(predictor, input_img, brush_label, bbox_label, best_mask, low_res_mask,
+                      click_coords, click_labels, bbox_coords,
+                      seperate_scribble_masks, last_scribble_mask, scribble_img, img_features,
+                      output_img, binary_checkbox, multimask_mode, autopredict_checkbox, evt: gr.SelectData):
+    """
+    Record user click and update the prediction
+    """
+    # Record click coordinates
+    if bbox_label:
+        bbox_coords.append(evt.index)
+    elif brush_label in ['Positive (green)', 'Negative (red)']:
+        click_coords.append(evt.index)
+        click_labels.append(1 if brush_label=='Positive (green)' else 0)
+    else:
+        raise TypeError(f"Invalid brush label: {brush_label}")
+    # Only make new prediction if not waiting for additional bounding box click
+    if (len(bbox_coords) % 2 == 0) and autopredict_checkbox:
+        click_input_viz, scribble_input_viz, output_viz, best_mask, low_res_mask, img_features, seperate_scribble_masks, last_scribble_mask = refresh_predictions(
+            predictor, input_img, output_img, click_coords, click_labels, bbox_coords, brush_label,
+            scribble_img, seperate_scribble_masks, last_scribble_mask,
+            best_mask, low_res_mask, img_features, binary_checkbox, multimask_mode
+        )
+        return click_input_viz, scribble_input_viz, output_viz, best_mask, low_res_mask, img_features, click_coords, click_labels, bbox_coords, seperate_scribble_masks, last_scribble_mask
+    else:
+        click_input_viz = viz_pred_mask(
+            input_img, best_mask, click_coords, click_labels, bbox_coords, seperate_scribble_masks, binary_checkbox
+        )
+        scribble_input_viz = viz_pred_mask(
+            input_img, best_mask, click_coords, click_labels, bbox_coords, None, binary_checkbox
+        )
+        # Don't update output image if waiting for additional bounding box click
+        return click_input_viz, scribble_input_viz, output_img, best_mask, low_res_mask, img_features, click_coords, click_labels, bbox_coords, seperate_scribble_masks, last_scribble_mask
+def undo_click(predictor, input_img, brush_label, bbox_label, best_mask, low_res_mask, click_coords, click_labels, bbox_coords,
+               seperate_scribble_masks, last_scribble_mask, scribble_img, img_features,
+                output_img, binary_checkbox, multimask_mode, autopredict_checkbox):
+    """
+    Remove last click and then update the prediction
+    """
+    if bbox_label:
+        if len(bbox_coords) > 0:
+            bbox_coords.pop()
+    elif brush_label in ['Positive (green)', 'Negative (red)']:
+        if len(click_coords) > 0:
+            click_coords.pop()
+            click_labels.pop()
+    else:
+        raise TypeError(f"Invalid brush label: {brush_label}")
+    # Only make new prediction if not waiting for additional bounding box click
+    if (len(bbox_coords)==0 or len(bbox_coords)==2) and autopredict_checkbox:
+        click_input_viz, scribble_input_viz, output_viz, best_mask, low_res_mask, img_features, seperate_scribble_masks, last_scribble_mask = refresh_predictions(
+            predictor, input_img, output_img, click_coords, click_labels, bbox_coords, brush_label,
+            scribble_img, seperate_scribble_masks, last_scribble_mask,
+            best_mask, low_res_mask, img_features, binary_checkbox, multimask_mode
+        )
+        return click_input_viz, scribble_input_viz, output_viz, best_mask, low_res_mask, img_features, click_coords, click_labels, bbox_coords, seperate_scribble_masks, last_scribble_mask
+    else:
+        click_input_viz = viz_pred_mask(
+            input_img, best_mask, click_coords, click_labels, bbox_coords, seperate_scribble_masks, binary_checkbox
+        )
+        scribble_input_viz = viz_pred_mask(
+            input_img, best_mask, click_coords, click_labels, bbox_coords, None, binary_checkbox
+        )
+        # Don't update output image if waiting for additional bounding box click
+        return click_input_viz, scribble_input_viz, output_img, best_mask, low_res_mask, img_features, click_coords, click_labels, bbox_coords, seperate_scribble_masks, last_scribble_mask
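
Note on prediction gating: `get_select_coords` and `undo_click` both skip the model call while a bounding box is half-defined, but they test for that differently. The two conditions are restated below as standalone functions. The function names are hypothetical; the boolean expressions are copied from the two handlers above.

```python
def should_predict_after_click(bbox_coords, autopredict):
    """Condition in get_select_coords: an even number of box clicks."""
    return len(bbox_coords) % 2 == 0 and autopredict


def should_predict_after_undo(bbox_coords, autopredict):
    """Condition in undo_click: zero or exactly two box clicks."""
    return len(bbox_coords) in (0, 2) and autopredict


four_clicks = [(0, 0), (10, 10), (20, 20), (30, 30)]
print(should_predict_after_click(four_clicks, True))  # True
print(should_predict_after_undo(four_clicks, True))   # False
```

The two conditions agree for zero, one, or two box clicks and differ only once more than one box has been placed.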
+# --------------------------------------------------
+with gr.Blocks(theme=gr.themes.Default(text_size=gr.themes.sizes.text_lg)) as demo:
+    # State variables
+    seperate_scribble_masks = gr.State(np.zeros((2,RES,RES), dtype=np.float32))
+    last_scribble_mask = gr.State(np.zeros((RES,RES), dtype=np.float32))
+    click_coords = gr.State([])
+    click_labels = gr.State([])
+    bbox_coords = gr.State([])
+    # Load default model
+    predictor = gr.State(load_model()[0])
+    img_features = gr.State(None) # For SAM models
+    best_mask = gr.State(None)
+    low_res_mask = gr.State(None)
+    gr.HTML("""\
+    <h1 style="text-align: center; font-size: 28pt;">ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Medical Image</h1>
+    <p style="text-align: center; font-size: large;"><a href="https://scribbleprompt.csail.mit.edu">ScribblePrompt</a> is an interactive segmentation tool designed to help users segment <b>new</b> structures in medical images using scribbles, clicks <b>and</b> bounding boxes.
+    </p>
+    """)
+    with gr.Accordion("Open for instructions!", open=False):
+        gr.Markdown(
+        """
+            * Select an input image from the examples below or upload your own image through the <b>'Input Image'</b> tab.
+            * Use the <b>'Scribbles'</b> tab to draw <span style='color:green'>positive</span> or <span style='color:red'>negative</span> scribbles.
+                - Use the buttons in the top right-hand corner of the canvas to undo or adjust the brush size
+                - Note: the app cannot detect new scribbles drawn on top of previous scribbles in a different color. Please undo/erase the scribble before drawing on the same pixel in a different color.
+            * Use the <b>'Clicks/Boxes'</b> tab to draw <span style='color:green'>positive</span> or <span style='color:red'>negative</span> clicks and <span style='color:orange'>bounding boxes</span> by placing two clicks.
+            * The <b>'Output'</b> tab will show the model's prediction based on your current inputs and the previous prediction.
+            * The <b>'Clear Input Mask'</b> button will clear the latest prediction (which is used as an input to the model).
+            * The <b>'Clear All Inputs'</b> button will clear all inputs (including scribbles, clicks, bounding boxes, and the last prediction).
+        """
+        )
+    # Interface ------------------------------------
+    with gr.Row():
+        model_dropdown = gr.Dropdown(
+            label="Model",
+            choices = list(model_dict.keys()),
+            value=default_model,
+            multiselect=False,
+            interactive=False,
+            visible=False
+        )
+    with gr.Row():
+        with gr.Column(scale=1):
+            brush_label = gr.Radio(["Positive (green)", "Negative (red)"],
+                           value="Positive (green)", label="Scribble/Click Label")
+            bbox_label = gr.Checkbox(value=False, label="Bounding Box (2 clicks)")
+        with gr.Column(scale=1):
+            binary_checkbox = gr.Checkbox(value=True, label="Show binary masks", visible=False)
+            autopredict_checkbox = gr.Checkbox(value=True, label="Auto-update prediction on clicks")
+            gr.Markdown("<span style='color:orange'>Troubleshooting:</span> If the image does not fully load in the Scribbles tab, click 'Clear Scribbles' or 'Clear All Inputs' to reload (it may take multiple tries). If you encounter an <span style='color:orange'>error</span> try clicking 'Clear All Inputs'.")
+            multimask_mode = gr.Checkbox(value=True, label="Multi-mask mode", visible=False)
+    with gr.Row():
+        display_height = 500
+        with gr.Column(scale=1):
+            with gr.Tab("Scribbles"):
+                scribble_img = gr.Image(
+                    label="Input",
+                    brush_radius=3,
+                    interactive=True,
+                    brush_color="#00FF00",
+                    tool="sketch",
+                    height=display_height,
+                    type='numpy',
+                    value=default_example,
+                )
+                clear_scribble_button = gr.ClearButton([scribble_img], value="Clear Scribbles", variant="stop")
+            with gr.Tab("Clicks/Boxes") as click_tab:
+                click_img = gr.Image(
+                    label="Input",
+                    type='numpy',
+                    value=default_example,
+                    height=display_height
+                )
+                with gr.Row():
+                    undo_click_button = gr.Button("Undo Last Click")
+                    clear_click_button = gr.Button("Clear Clicks/Boxes", variant="stop")
+            with gr.Tab("Input Image"):
+                input_img = gr.Image(
+                    label="Input",
+                    image_mode="L",
+                    visible=True,
+                    value=default_example,
+                    height=display_height
+                )
+                gr.Markdown("To upload your own image: click the `x` in the top right corner to clear the current image, then drag & drop")
+        with gr.Column(scale=1):
+            with gr.Tab("Output"):
+                output_img = gr.Gallery(
+                    label='Outputs',
+                    columns=1,
+                    elem_id="gallery",
+                    preview=True,
+                    object_fit="scale-down",
+                    height=display_height+50
+                )
+    submit_button = gr.Button("Refresh Prediction", variant='primary')
+    clear_all_button = gr.ClearButton([scribble_img], value="Clear All Inputs", variant="stop")
+    clear_mask_button = gr.Button("Clear Input Mask")
+    # ----------------------------------------------
+    # Loading Models
+    # ----------------------------------------------
+    model_dropdown.change(fn=load_model,
+                          inputs=[model_dropdown],
+                          outputs=[predictor, img_features]
+                          )
+    # ----------------------------------------------
+    # Loading Examples
+    # ----------------------------------------------
+    gr.Examples(examples=test_examples,
+                inputs=[input_img],
+                examples_per_page=10,
+                label='Unseen Examples from Test Datasets'
+                )
+    gr.Examples(examples=val_examples,
+                inputs=[input_img],
+                examples_per_page=10,
+                label='Unseen Examples from Validation Datasets'
+                )
+    # When clear scribble button is clicked
+    def clear_scribble_history(input_img):
+        if input_img is not None:
+            input_shape = input_img.shape[:2]
+        else:
+            input_shape = (RES, RES)
+        return input_img, input_img, np.zeros((2,)+input_shape, dtype=np.float32), np.zeros(input_shape, dtype=np.float32), None, None
+    clear_scribble_button.click(clear_scribble_history,
+        inputs=[input_img],
+        outputs=[click_img, scribble_img, seperate_scribble_masks, last_scribble_mask, best_mask, low_res_mask]
+    )
+    # When clear clicks button is clicked
+    def clear_click_history(input_img):
+        return input_img, input_img, [], [], [], None, None
+    clear_click_button.click(clear_click_history,
+                             inputs=[input_img],
+                             outputs=[click_img, scribble_img, click_coords, click_labels, bbox_coords, best_mask, low_res_mask])
+    # When clear all button is clicked
+    def clear_all_history(input_img):
+        if input_img is not None:
+            input_shape = input_img.shape[:2]
+        else:
+            input_shape = (RES, RES)
+        return input_img, input_img, [], [], [], [], np.zeros((2,)+input_shape, dtype=np.float32), np.zeros(input_shape, dtype=np.float32), None, None, None
+    input_img.change(clear_all_history,
+                    inputs=[input_img],
+                    outputs=[click_img, scribble_img,
+                            output_img, click_coords, click_labels, bbox_coords,
+                            seperate_scribble_masks, last_scribble_mask,
+                            best_mask, low_res_mask, img_features
+                    ])
+    clear_all_button.click(clear_all_history,
+                    inputs=[input_img],
+                    outputs=[click_img, scribble_img,
+                            output_img, click_coords, click_labels, bbox_coords,
+                            seperate_scribble_masks, last_scribble_mask,
+                            best_mask, low_res_mask, img_features
+                    ])
+    # clear previous prediction mask
+    def clear_best_mask(input_img, click_coords, click_labels, bbox_coords, seperate_scribble_masks):
+        click_input_viz = viz_pred_mask(
+            input_img, None, click_coords, click_labels, bbox_coords, seperate_scribble_masks
+        )
+        scribble_input_viz = viz_pred_mask(
+            input_img, None, click_coords, click_labels, bbox_coords, None
+        )
+        return None, None, click_input_viz, scribble_input_viz
+    clear_mask_button.click(
+        clear_best_mask,
+        inputs=[input_img, click_coords, click_labels, bbox_coords, seperate_scribble_masks],
+        outputs=[best_mask, low_res_mask, click_img, scribble_img],
+    )
+    # ----------------------------------------------
+    # Clicks
+    # ----------------------------------------------
+    click_img.select(get_select_coords,
+                     inputs=[
+                        predictor,
+                        input_img, brush_label, bbox_label, best_mask, low_res_mask, click_coords, click_labels, bbox_coords,
+                        seperate_scribble_masks, last_scribble_mask, scribble_img, img_features,
+                        output_img, binary_checkbox, multimask_mode, autopredict_checkbox
+                      ],
+                     outputs=[click_img, scribble_img, output_img, best_mask, low_res_mask, img_features,
+                              click_coords, click_labels, bbox_coords, seperate_scribble_masks, last_scribble_mask],
+                    api_name = "get_select_coords"
+                    )
+    submit_button.click(fn=refresh_predictions,
+                        inputs=[
+                            predictor, input_img, output_img, click_coords, click_labels, bbox_coords, brush_label,
+                            scribble_img, seperate_scribble_masks, last_scribble_mask,
+                            best_mask, low_res_mask, img_features, binary_checkbox, multimask_mode
+                        ],
+                        outputs=[click_img, scribble_img, output_img, best_mask, low_res_mask, img_features,
+                                 seperate_scribble_masks, last_scribble_mask],
+                        api_name="refresh_predictions"
+                        )
+    undo_click_button.click(fn=undo_click,
+                            inputs=[
+                                predictor,
+                                input_img, brush_label, bbox_label, best_mask, low_res_mask, click_coords, click_labels, bbox_coords,
+                                seperate_scribble_masks, last_scribble_mask, scribble_img, img_features,
+                                output_img, binary_checkbox, multimask_mode, autopredict_checkbox
+                            ],
+                            outputs=[click_img, scribble_img, output_img, best_mask, low_res_mask, img_features,
+                                    click_coords, click_labels, bbox_coords, seperate_scribble_masks, last_scribble_mask],
+                            api_name="undo_click"
+                        )
+    def update_click_img(input_img, click_coords, click_labels, bbox_coords, seperate_scribble_masks, binary_checkbox,
+                         last_scribble_mask, scribble_img, brush_label, best_mask):
+        """
+        Draw scribbles in the click canvas
+        """
+        seperate_scribble_masks, last_scribble_mask = get_scribbles(
+            seperate_scribble_masks, last_scribble_mask, scribble_img,
+            label=(0 if brush_label == "Positive (green)" else 1) # currently selected brush color (0 = positive, 1 = negative)
+        )
+        click_input_viz = viz_pred_mask(
+            input_img, best_mask, click_coords, click_labels, bbox_coords, seperate_scribble_masks, binary_checkbox
+        )
+        return click_input_viz, seperate_scribble_masks, last_scribble_mask
+    click_tab.select(fn=update_click_img,
+        inputs=[input_img, click_coords, click_labels, bbox_coords, seperate_scribble_masks,
+                binary_checkbox, last_scribble_mask, scribble_img, brush_label, best_mask],
+        outputs=[click_img, seperate_scribble_masks, last_scribble_mask],
+        api_name="update_click_img"
+    )
+    # ----------------------------------------------
+    # Scribbles
+    # ----------------------------------------------
+    def change_brush_color(seperate_scribble_masks, last_scribble_mask, scribble_img, label):
+        """
+        Record new scribbles when changing brush color
+        """
+        if label == "Negative (red)":
+            brush_update = gr.Image.update(brush_color = "#FF0000") # red
+        elif label == "Positive (green)":
+            brush_update = gr.Image.update(brush_color = "#00FF00") # green
+        else:
+            raise ValueError(f"Invalid brush color: {label}")
+        # Record latest scribbles
+        seperate_scribble_masks, last_scribble_mask = get_scribbles(
+            seperate_scribble_masks, last_scribble_mask, scribble_img,
+            label=(1 if label == "Positive (green)" else 0) # previous color of the brush
+        )
+        return seperate_scribble_masks, last_scribble_mask, brush_update
+    brush_label.change(fn=change_brush_color,
+        inputs=[seperate_scribble_masks, last_scribble_mask, scribble_img, brush_label],
+        outputs=[seperate_scribble_masks, last_scribble_mask, scribble_img],
+        api_name="change_brush_color"
+    )
+if __name__ == "__main__":
+    demo.queue(api_open=False).launch(show_api=False)
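
The two `get_scribbles` calls in app.py pick the scribble channel differently: `update_click_img` receives the currently selected brush, while `change_brush_color` receives the new brush and has to file pending strokes under the old one. A minimal sketch of that convention follows, assuming the radio offers only the two labels shown and that channel 0 holds positive scribbles and channel 1 negative ones. The helper names are hypothetical and not part of app.py.

```python
POSITIVE = "Positive (green)"
NEGATIVE = "Negative (red)"


def channel_for_current_brush(brush_label: str) -> int:
    """Channel for strokes drawn with the brush that is selected right now."""
    return 0 if brush_label == POSITIVE else 1


def channel_for_previous_brush(new_label: str) -> int:
    """Channel for strokes drawn before a color change.

    The change callback only sees the new label, so the strokes still
    waiting to be recorded belong to the other color.
    """
    if new_label not in (POSITIVE, NEGATIVE):
        raise ValueError(f"Invalid brush color: {new_label!r}")
    return 1 if new_label == POSITIVE else 0


# Switching from green to red: pending strokes were green (channel 0).
print(channel_for_previous_brush(NEGATIVE))
```

Both helpers agree on the channel layout; only the interpretation of the label argument differs.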

checkpoints/ScribblePrompt_unet_v1_nf192_res128.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:43f57ee8fa8ec529c31be281e06749f9e629b30157bbbcc9baf200cddec1acbe
+size 15977486
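
The three lines above are a Git LFS pointer, not the checkpoint itself: the repository stores only the spec version, a SHA-256 object id, and the size in bytes (roughly 16 MB here). As a rough illustration, the pointer can be parsed with the standard library alone. The function name `parse_lfs_pointer` is hypothetical, and the sketch assumes the simple `key value` layout shown in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file made of 'key value' lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid is written as '<hash algorithm>:<hex digest>'
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:43f57ee8fa8ec529c31be281e06749f9e629b30157bbbcc9baf200cddec1acbe
size 15977486
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size_bytes"])
```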

network.py ADDED

	@@ -0,0 +1,123 @@

+from typing import Optional, Dict, Any, List
+import torch
+import torch.nn as nn
+# -----------------------------------------------------------------------------
+# Blocks
+# -----------------------------------------------------------------------------
+class Conv2d(nn.Module):
+    """ Perform a 2D convolution
+    inputs are [b, c, h, w] where
+        b is the batch size
+        c is the number of channels
+        h is the height
+        w is the width
+    """
+    def __init__(self,
+                 in_channels: int,
+                 out_channels: int,
+                 kernel_size: int,
+                 padding: int,
+                 do_activation: bool = True,
+                 ):
+        super(Conv2d, self).__init__()
+        conv = nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, padding=padding)
+        lst = [conv]
+        if do_activation:
+            lst.append(nn.PReLU())
+        self.conv = nn.Sequential(*lst)
+    def forward(self, x):
+        # x is [B, C, H, W]
+        return self.conv(x)
+# -----------------------------------------------------------------------------
+# Network
+# -----------------------------------------------------------------------------
+class _UNet(nn.Module):
+    def __init__(self,
+                 in_channels: int = 1,
+                 out_channels: int = 1,
+                 features: List[int] = [64, 64, 64, 64, 64],
+                 conv_kernel_size: int = 3,
+                 conv: Optional[nn.Module] = None,
+                 conv_kwargs: Dict[str,Any] = {}
+                 ):
+        """
+        UNet (but can switch out the Conv)
+        """
+        super(_UNet, self).__init__()
+        self.in_channels = in_channels
+        padding = (conv_kernel_size - 1) // 2
+        self.ups = nn.ModuleList()
+        self.downs = nn.ModuleList()
+        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
+        # Down part of U-Net
+        for feat in features:
+            self.downs.append(
+                conv(
+                    in_channels, feat, kernel_size=conv_kernel_size, padding=padding, **conv_kwargs
+                )
+            )
+            in_channels = feat
+        # Up part of U-Net
+        for feat in reversed(features):
+            self.ups.append(nn.UpsamplingBilinear2d(scale_factor=2))
+            self.ups.append(
+                conv(
+                    # Factor of 2 is for the skip connections
+                    feat * 2, feat, kernel_size=conv_kernel_size, padding=padding, **conv_kwargs
+                )
+            )
+        self.bottleneck = conv(
+            features[-1], features[-1], kernel_size=conv_kernel_size, padding=padding, **conv_kwargs
+            )
+        self.final_conv = conv(
+            features[0], out_channels, kernel_size=1, padding=0, do_activation=False, **conv_kwargs
+            )
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        skip_connections = []
+        for down in self.downs:
+            x = down(x)
+            skip_connections.append(x)
+            x = self.pool(x)
+        x = self.bottleneck(x)
+        skip_connections = skip_connections[::-1]
+        for idx in range(0, len(self.ups), 2):
+            x = self.ups[idx](x)
+            skip_connection = skip_connections[idx // 2]
+            concat_skip = torch.cat((skip_connection, x), dim=1)
+            x = self.ups[idx + 1](concat_skip)
+        return self.final_conv(x)
+class UNet(_UNet):
+    """
+    UNet with standard Conv2d blocks
+    input shape: B x in_channels x H x W
+    output shape: B x out_channels x H x W
+    """
+    def __init__(self, **kwargs) -> None:
+        super().__init__(conv=Conv2d, **kwargs)
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        return super().forward(x)
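
In `_UNet.forward`, every encoder level saves a skip tensor and then halves the spatial size with `MaxPool2d(kernel_size=2, stride=2)`, and every decoder level doubles it again before concatenating with the matching skip. The concatenation only works if the sizes line up, which in practice means the input side length should be divisible by 2 to the power of the number of levels. The dependency-free sketch below traces the sizes for one spatial dimension. The function name `unet_spatial_trace` is hypothetical, and it assumes the convolutions preserve size (true for the default 3x3 kernel with padding 1).

```python
from typing import List, Tuple


def unet_spatial_trace(size: int, n_levels: int) -> Tuple[List[int], int, bool]:
    """Trace one spatial dimension through the encoder and decoder.

    Returns the skip sizes, the bottleneck size, and whether every
    upsampled tensor matches its skip connection.
    """
    skips = []
    for _ in range(n_levels):
        skips.append(size)  # skip is saved before pooling
        size = size // 2    # 2x2 max pooling with stride 2 floors odd sizes
    bottleneck = size

    shapes_match = True
    for skip in reversed(skips):
        size = size * 2     # bilinear upsampling with scale_factor=2
        if size != skip:
            shapes_match = False
            break
    return skips, bottleneck, shapes_match


# Four levels at 128 x 128, as configured in predictor.py
print(unet_spatial_trace(128, 4))
# A side length of 100 is not divisible by 16, so a skip size mismatches
print(unet_spatial_trace(100, 4))
```

This is one reason a wrapper would resize inputs to a fixed 128 x 128 grid before calling the network.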

predictor.py ADDED

	@@ -0,0 +1,242 @@

+import pathlib
+import torch
+import torch.nn.functional as F
+from typing import Dict, Tuple, Optional, Union
+import network
+class Predictor:
+    """
+    Wrapper for ScribblePrompt Unet model
+    """
+    def __init__(self, path: Union[str, pathlib.Path], verbose: bool = False):
+        self.verbose = verbose
+        # Accept either a string or a Path; .exists() and .open() need a Path
+        path = pathlib.Path(path)
+        assert path.exists(), f"Checkpoint {path} does not exist"
+        self.path = path
+        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+        self.build_model()
+        self.load()
+        self.model.eval()
+        self.to_device()
+    def build_model(self):
+        """
+        Build the model
+        """
+        self.model = network.UNet(
+            in_channels = 5,
+            out_channels = 1,
+            features = [192, 192, 192, 192],
+        )
+    def load(self):
+        """
+        Load the state of the model from a checkpoint file.
+        """
+        with (self.path).open("rb") as f:
+            state = torch.load(f, map_location=self.device)
+            self.model.load_state_dict(state, strict=True)
+            if self.verbose:
+                print(
+                    f"Loaded checkpoint from {self.path} to {self.device}"
+                )
+    def to_device(self):
+        """
+        Move the model to cpu or gpu
+        """
+        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+        self.model = self.model.to(self.device)
+    def predict(self, prompts: Dict[str, Optional[torch.Tensor]], img_features: Optional[torch.Tensor] = None, multimask_mode: bool = False):
+        """
+        Make predictions!
+        Returns:
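+            mask (torch.Tensor): H x W probabilities (sigmoid of the logits) at the input image resolution
+            img_features (None): always None for this UNet; the slot is kept for SAM models
+            low_res_mask (torch.Tensor): B x 1 x 128 x 128 logits at the model resolution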
+        """
+        if self.verbose:
+            print("point_coords", prompts.get("point_coords", None))
+            print("point_labels", prompts.get("point_labels", None))
+            print("box", prompts.get("box", None))
+            print("img", prompts.get("img").shape, prompts.get("img").min(), prompts.get("img").max())
+            if prompts.get("scribble") is not None:
+                print("scribble", prompts.get("scribble", None).shape, prompts.get("scribble").min(), prompts.get("scribble").max())
+        original_shape = prompts.get('img').shape[-2:]
+        # Rescale to 128 x 128
+        prompts = rescale_inputs(prompts)
+        # Prepare inputs for ScribblePrompt unet (1 x 5 x 128 x 128)
+        x = prepare_inputs(prompts).float()
+        with torch.no_grad():
+            yhat = self.model(x.to(self.device)).cpu()
+        mask = torch.sigmoid(yhat)
+        # Resize for app resolution
+        mask = F.interpolate(mask, size=original_shape, mode='bilinear').squeeze()
+        # mask: H x W (original resolution), yhat: 1 x 1 x 128 x 128 logits
+        return mask, None, yhat
+# -----------------------------------------------------------------------------
+# Prepare inputs
+# -----------------------------------------------------------------------------
+def rescale_inputs(inputs: Dict[str, Optional[torch.Tensor]], res=128):
+    """
+    Rescale the inputs
+    """
+    h,w = inputs['img'].shape[-2:]
+    if h != res or w != res:
+        inputs.update(dict(
+            img = F.interpolate(inputs['img'], size=(res,res), mode='bilinear')
+        ))
+        if inputs.get('scribble') is not None:
+            inputs.update({
+                'scribble': F.interpolate(inputs['scribble'], size=(res,res), mode='bilinear')
+            })
+        if inputs.get("box") is not None:
+            boxes = inputs.get("box").clone()
+            coords = boxes.reshape(-1, 2, 2)
+            coords[..., 0] = coords[..., 0] * (res / w)
+            coords[..., 1] = coords[..., 1] * (res / h)
+            inputs.update({'box': coords.reshape(1, -1, 4).int()})
+        if inputs.get("point_coords") is not None:
+            coords = inputs.get("point_coords").clone()
+            coords[..., 0] = coords[..., 0] * (res / w)
+            coords[..., 1] = coords[..., 1] * (res / h)
+            inputs.update({'point_coords': coords.int()})
+    return inputs
+def prepare_inputs(inputs: Dict[str,torch.Tensor], device = None) -> torch.Tensor:
+    """
+    Prepare inputs for ScribblePrompt Unet
+    Returns:
+        x (torch.Tensor): B x 5 x H x W
+    """
+    img = inputs['img']
+    if device is None:
+        device = img.device
+    img = img.to(device)
+    shape = tuple(img.shape[-2:])
+    if inputs.get("box") is not None:
+        # Embed bounding box
+        # Input: B x 1 x 4
+        # Output: B x 1 x H x W
+        box_embed = bbox_shaded(inputs['box'], shape=shape, device=device)
+    else:
+        box_embed = torch.zeros(img.shape, device=device)
+    if inputs.get("point_coords") is not None:
+        # Encode points
+        # B x 2 x H x W
+        scribble_click_embed = click_onehot(inputs['point_coords'], inputs['point_labels'], shape=shape)
+    else:
+        scribble_click_embed = torch.zeros((img.shape[0], 2) + shape, device=device)
+    if inputs.get("scribble") is not None:
+        # Combine scribbles with click encoding
+        # B x 2 x H x W
+        scribble_click_embed = torch.clamp(scribble_click_embed + inputs.get('scribble'), min=0.0, max=1.0)
+    if inputs.get('mask_input') is not None:
+        # Previous prediction
+        mask_input = inputs['mask_input']
+    else:
+        # Initialize empty channel for mask input
+        mask_input = torch.zeros(img.shape, device=img.device)
+    x = torch.cat((img, box_embed, scribble_click_embed, mask_input), dim=-3)
+    # B x 5 x H x W
+    return x
+# -----------------------------------------------------------------------------
+# Encode clicks and bounding boxes
+# -----------------------------------------------------------------------------
+def click_onehot(point_coords, point_labels, shape: Tuple[int,int] = (128,128), indexing='xy'):
+    """
+    Represent clicks as two HxW binary masks (one for positive clicks and one for negative)
+    with 1 at the click locations and 0 otherwise
+    Args:
+        point_coords (torch.Tensor): BxNx2 tensor of xy coordinates
+        point_labels (torch.Tensor): BxN tensor of labels (0 or 1)
+        shape (tuple): output shape
+    Returns:
+        embed (torch.Tensor): Bx2xHxW tensor
+    """
+    assert indexing in ['xy','uv'], f"Invalid indexing: {indexing}"
+    assert len(point_coords.shape) == 3, "point_coords must be BxNx2"
+    assert point_coords.shape[-1] == 2, "point_coords must be BxNx2"
+    assert point_labels.shape[-1] == point_coords.shape[1], "point_labels must be BxN"
+    assert len(shape)==2, f"shape must be 2D: {shape}"
+    device = point_coords.device
+    batch_size = point_coords.shape[0]
+    n_points = point_coords.shape[1]
+    embed = torch.zeros((batch_size,2)+shape, device=device)
+    labels = point_labels.flatten().float()
+    idx_coords = torch.cat((
+        torch.arange(batch_size, device=device).reshape(-1,1).repeat(1,n_points)[...,None],
+        point_coords
+    ), dim=2).reshape(-1,3)
+    if indexing=='xy':
+        embed[ idx_coords[:,0], 0, idx_coords[:,2], idx_coords[:,1] ] = labels
+        embed[ idx_coords[:,0], 1, idx_coords[:,2], idx_coords[:,1] ] = 1.0-labels
+    else:
+        embed[ idx_coords[:,0], 0, idx_coords[:,1], idx_coords[:,2] ] = labels
+        embed[ idx_coords[:,0], 1, idx_coords[:,1], idx_coords[:,2] ] = 1.0-labels
+    return embed
+def bbox_shaded(boxes, shape: Tuple[int,int] = (128,128), device='cpu'):
+    """
+    Represent bounding boxes as a binary mask with 1 inside boxes and 0 otherwise
+    Args:
+        boxes (torch.Tensor): Bx1x4 [x1, y1, x2, y2]
+    Returns:
+        bbox_embed (torch.Tensor): Bx1xHxW according to shape
+    """
+    assert len(shape)==2, "shape must be 2D"
+    if isinstance(boxes, torch.Tensor):
+        boxes = boxes.int().cpu().numpy()
+    batch_size = boxes.shape[0]
+    n_boxes = boxes.shape[1]
+    bbox_embed = torch.zeros((batch_size,1)+tuple(shape), device=device, dtype=torch.float32)
+    if boxes is not None:
+        for i in range(batch_size):
+            for j in range(n_boxes):
+                x1, y1, x2, y2 = boxes[i,j,:]
+                x_min = min(x1,x2)
+                x_max = max(x1,x2)
+                y_min = min(y1,y2)
+                y_max = max(y1,y2)
+                bbox_embed[ i, 0, y_min:y_max, x_min:x_max ] = 1.0
+    return bbox_embed
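
The prompt encoding in predictor.py comes down to three steps: scale coordinates onto the 128 x 128 grid, shade boxes as a filled binary mask, and write clicks into a positive and a negative channel. The sketch below mirrors that logic with plain Python lists so it runs without PyTorch. The function names are hypothetical, a single image is assumed (no batch dimension), and coordinates are assumed to be non-negative and inside the image.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def rescale_xy(x: int, y: int, w: int, h: int, res: int = 128) -> Tuple[int, int]:
    """Map a pixel from a w x h image onto the res x res grid, truncating."""
    return int(x * (res / w)), int(y * (res / h))


def shade_box(x1: int, y1: int, x2: int, y2: int, shape: Tuple[int, int]) -> Grid:
    """Binary mask with 1.0 inside the box; corners may come in any order."""
    h, w = shape
    x_min, x_max = min(x1, x2), max(x1, x2)
    y_min, y_max = min(y1, y2), max(y1, y2)
    mask = [[0.0] * w for _ in range(h)]
    # Half-open ranges, like the tensor slice y_min:y_max, x_min:x_max
    for row in range(y_min, min(y_max, h)):
        for col in range(x_min, min(x_max, w)):
            mask[row][col] = 1.0
    return mask


def click_channels(points: Sequence[Tuple[int, int]], labels: Sequence[int],
                   shape: Tuple[int, int]) -> Tuple[Grid, Grid]:
    """Positive and negative click masks using 'xy' indexing."""
    h, w = shape
    pos = [[0.0] * w for _ in range(h)]
    neg = [[0.0] * w for _ in range(h)]
    for (x, y), label in zip(points, labels):
        # x selects the column and y the row
        pos[y][x] = float(label)
        neg[y][x] = 1.0 - float(label)
    return pos, neg


print(rescale_xy(256, 128, w=512, h=256))
box = shade_box(5, 3, 2, 1, shape=(4, 8))
print(sum(map(sum, box)))
```

Because the ranges are half-open, a box whose two corners coincide shades no pixels at all, which matches the slicing in `bbox_shaded`.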