**`_sources/advanced_configuration.md`** (8 additions, 5 deletions)

Below is a non-comprehensive list of places where you can adapt the code to your own purposes.
***
## Changes you can make to Distributed-CellProfiler outside of the Docker container
* **Location of ECS configuration files:** By default these are placed into your bucket with a prefix of 'ecsconfigs/'.
* **Distributed-CellProfiler version:** At least CellProfiler version 4.2.4 is required; set the DOCKERHUB_TAG in config.py to `bethcimini/distributed-cellprofiler:2.1.0_4.2.4_plugins`.
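
For reference, that is a single assignment in config.py (a minimal sketch showing only the variable named above):

```python
# In config.py: the Docker Hub tag named above (CellProfiler 4.2.4 with plugins)
DOCKERHUB_TAG = 'bethcimini/distributed-cellprofiler:2.1.0_4.2.4_plugins'
```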
* **Custom model:** If using a [custom user-trained model](https://cellpose.readthedocs.io/en/latest/models.html) generated using Cellpose, add the model file to S3.
We use the following structure to organize our files on S3.

```text
└── <project_name>
    └── workspace
        └── model
            └── custom_model_filename
```
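
For example, the model file can be copied into that location with the AWS CLI. This is a sketch, not from the original docs: the bucket name is a placeholder, and the key follows the mounted-path example below.

```sh
# Hypothetical upload into the layout above; substitute your own bucket and names
aws s3 cp custom_model_filename s3://<bucket_name>/projects/<project_name>/workspace/model/custom_model_filename
```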
* **RunCellpose module:**
  * Inside RunCellpose, select the "custom" Detection mode.
    In "Location of the pre-trained model file", enter the mounted bucket path to your model,
    e.g. **/home/ubuntu/bucket/projects/<project_name>/workspace/model/**
  * In "Pre-trained model file name", enter your custom_model_filename
```
UPLOAD_FLAGS='--acl bucket-owner-full-control --metadata-directive REPLACE' # Examples of flags that may be necessary
```
## Permissions setup
If you are reading from a public bucket, no additional setup is necessary.
Note that, depending on the configuration of that bucket, you may not be able to mount the public bucket, so you will need to set `DOWNLOAD_FILES='True'`.
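
One quick way to see what anonymous access allows (and therefore whether mounting is likely to work) is an unsigned listing with the AWS CLI; the bucket name here is a placeholder:

```sh
# Unsigned request: succeeds only if the bucket permits anonymous reads
aws s3 ls s3://<public_bucket_name> --no-sign-request
```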
If you are reading from a non-public bucket or writing to a bucket that is not yours, you will need further permissions setup.
Often, access to someone else's AWS account is handled through a role that can be assumed.
Learn more about AWS IAM roles [here](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html).
Your collaborator will define the access limits of the role within their AWS IAM.
You will also need to define role limits within your AWS IAM so that when you assume the role (giving you access to your collaborator's resource), that role also has the appropriate permissions to run DCP.
### In your AWS account
In AWS IAM, for the role that has external bucket access, you will need to add all of the DCP permissions described in [Step 0](step_0_prep.md).
You will also need to edit the trust relationship for the role so that ECS and EC2 can assume the role.
A template is as follows; the statement shown here is a standard trust policy that lets the EC2 and ECS task services assume the role:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com",
                    "ecs-tasks.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```
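
If you prefer the CLI to the IAM console, the trust relationship can be applied with the following sketch (role and file names are placeholders):

```sh
# Replace the role's trust policy with the JSON template above, saved locally
aws iam update-assume-role-policy --role-name <role_name> --policy-document file://trust-policy.json
```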
### In your DCP instance
DCP reads your AWS_PROFILE from your [control node](step_0_prep.md#the-control-node).
Edit your AWS CLI configuration files for assuming that role in your control node as follows:
In `~/.aws/credentials`, copy in the following text block at the bottom of the file:
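
A minimal sketch of such an assume-role profile (the profile name, account ID, and role name are placeholders, not values from the original docs):

```ini
[dcp_assumed_role]
role_arn = arn:aws:iam::<collaborator_account_id>:role/<role_name>
source_profile = default
```

Here `source_profile` names an existing profile in the same file whose long-term credentials are used to assume the role; you can then point DCP at it by exporting, e.g., `export AWS_PROFILE=dcp_assumed_role` on the control node (assumed usage).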

**`_sources/overview.md`** (2 additions, 0 deletions)

**How do I run CellProfiler on Amazon?** Use Distributed-CellProfiler!
Distributed-CellProfiler is a series of scripts designed to help you run a Dockerized version of CellProfiler on [Amazon Web Services](https://aws.amazon.com/) (AWS) using AWS's file storage and computing systems.
* Data is stored in S3 buckets.
* Software is run on "Spot Fleets" of computers (or instances) in the cloud.

Docker is a software platform that packages software into containers.
A container holds the software that you want to run as well as everything needed to run it (e.g. your software source code, operating system libraries, and dependencies).
Dockerizing a workflow has many benefits, including:
* Ease of use: Dockerized software doesn't require the user to install anything themselves.
* Reproducibility: You don't need to worry about results being affected by the versions of your software or its dependencies, as those are fixed.

**`_sources/overview_2.md`** (12 additions, 7 deletions)

# What happens in AWS when I run Distributed-CellProfiler?
The steps for actually running the Distributed-CellProfiler code are outlined in the repository [README](https://github.com/DistributedScience/Distributed-CellProfiler/blob/master/README.md), and details of the parameters you set in each step are on their respective Documentation pages ([Step 1: Config](step_1_configuration.md), [Step 2: Jobs](step_2_submit_jobs.md), [Step 3: Fleet](step_3_start_cluster.md), and optional [Step 4: Monitor](step_4_monitor.md)).
We'll give an overview of what happens in AWS at each step here and explain what AWS does automatically once you have it set up.
**Step 1**:
In the Config file you set quite a number of specifics that are used by EC2, ECS, SQS, and in making Dockers.
When you run `$ python3 run.py setup` to execute the Config, it does three major things:
* Creates task definitions.
These are found in ECS.
They define the configuration of the Dockers and include the settings you gave for **CHECK_IF_DONE_BOOL**, **DOCKER_CORES**, **EXPECTED_NUMBER_FILES**, and **MEMORY**.
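
For a sense of how those settings look, here is a sketch of the corresponding Config assignments; the values are illustrative only, not recommendations from the original docs:

```python
# In config.py: settings baked into the ECS task definition (illustrative values)
CHECK_IF_DONE_BOOL = 'True'   # skip jobs whose expected outputs already exist
EXPECTED_NUMBER_FILES = 7     # how many output files mark a job as done
DOCKER_CORES = 4              # CellProfiler processes per Docker container
MEMORY = 15000                # MB of memory available to each Docker container
```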

In the Config file you set the number and size of the EC2 instances you want.
This information, along with account-specific configuration in the Fleet file, is used to start the fleet with `$ python3 run.py startCluster`.
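Taken together, the setup steps reduce to a short command sequence. This is a sketch: the Step 2 subcommand and the example file names follow the repository's README and are assumptions here, not taken from this page.

```sh
python3 run.py setup                                 # Step 1: execute the Config
python3 run.py submitJob files/exampleJob.json       # Step 2: queue the jobs (assumed file name)
python3 run.py startCluster files/exampleFleet.json  # Step 3: start the spot fleet (assumed file name)
```
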
**After these steps are complete, a number of things happen automatically**:
* ECS puts Docker containers onto EC2 instances.
If there is a mismatch within your Config file and the Docker is larger than the instance, it will not be placed.
ECS will keep placing Dockers onto an instance until it is full, so if you accidentally create instances that are too large, you may end up with more Dockers placed on them than intended.

Read more about this and other configurations in [Step 1: Configuration](step_1_configuration.md).

## How do I determine my configuration?
To some degree, you determine the best configuration for your needs through trial and error.
* Looking at the resources your software uses on your local computer when it runs your jobs can give you a sense of roughly how much hard drive and memory space each job requires, which can help you determine your group size and what machines to use.
* Prices of different machine sizes fluctuate, so the choice of which type of machines to use in your spot fleet is best determined at the time you run it.
How long a job takes to run and how quickly you need the data may also affect how much you're willing to bid for any given machine.

However, you're also at a greater risk of running out of hard disk space.

Keep an eye on all of the logs the first few times you run any workflow and you'll get a sense of whether your resources are being utilized well or if you need to do more tweaking.
## What does this look like on AWS?
The following five are the primary resources that Distributed-CellProfiler interacts with.
After you have finished [preparing for Distributed-CellProfiler](step_0_prep), you do not need to directly interact with any of these services outside of Distributed-CellProfiler.
If you would like a granular view of what Distributed-CellProfiler is doing while it runs, you can open each console in a separate tab in your browser and watch their individual behaviors, though this is not necessary, especially if you run the [monitor command](step_4_monitor.md) and/or have DS automatically create a Dashboard for you (see [Configuration](step_1_configuration.md)).