59 changes: 59 additions & 0 deletions .env.example
@@ -0,0 +1,59 @@
# Copy this file to .env and fill in token placeholders.
# Port and image defaults are referenced by docker-compose.yml; override
# any of them here if you have a local collision.

# --- Image / platform knobs --------------------------------------------------
# Defaults to mongo:7. Override with mongo:4.4.28 on legacy CPUs
# without AVX support (older Macs, some VMs).
MONGO_IMAGE=mongo:7
# Force an image platform (e.g. linux/amd64) if you need to. Empty = native.
PLATFORM=

# --- Host ports --------------------------------------------------------------
WEB_PORT=3000
SERVICE_PORT=4000
CRAWLER_PORT=5000
MONGO_PORT=27017
SERVICE_DEBUG_PORT=9230
CRAWLER_DEBUG_PORT=9229

# --- Curation GitHub Info ----------------------------------------------------
# A GitHub token with minimal access (repo:public_repo) is required.
# The same token can be used for CRAWLER_GITHUB_TOKEN.
CURATION_GITHUB_BRANCH="master"
CURATION_GITHUB_OWNER="clearlydefined"
CURATION_GITHUB_REPO="curated-data-dev"
CURATION_GITHUB_TOKEN="<your GitHub token>"

# --- Curation Store Info -----------------------------------------------------
CURATION_MONGO_CONNECTION_STRING="mongodb://clearlydefined_mongo_db"
CURATION_MONGO_DB_NAME="clearlydefined"
CURATION_MONGO_COLLECTION_NAME="curations"
CURATION_PROVIDER="github"
CURATION_STORE_PROVIDER="mongo"

# --- GitLab info -------------------------------------------------------------
# A random string is fine unless you're working on GitLab API code.
GITLAB_TOKEN="<your GitLab token>"

# --- Definition Store Info ---------------------------------------------------
DEFINITION_STORE_PROVIDER="mongo"
DEFINITION_MONGO_CONNECTION_STRING="mongodb://clearlydefined_mongo_db"
DEFINITION_MONGO_DB_NAME="clearlydefined"
DEFINITION_MONGO_COLLECTION_NAME="definitions-paged"

# --- Harvest Store Info ------------------------------------------------------
HARVEST_STORE_PROVIDER="file"
FILE_STORE_LOCATION="/tmp/harvested_data"

# --- Webhooks ----------------------------------------------------------------
WEBHOOK_CRAWLER_SECRET="secret"
WEBHOOK_GITHUB_SECRET="secret"

# --- Crawler Info ------------------------------------------------------------
CRAWLER_API_URL="http://crawler:5000"
CRAWLER_GITHUB_TOKEN="<your GitHub token>"
CRAWLER_DEADLETTER_PROVIDER=cd(file)
CRAWLER_NAME=cdcrawlerlocal
CRAWLER_QUEUE_PROVIDER=memory
CRAWLER_STORE_PROVIDER=cd(file)
87 changes: 28 additions & 59 deletions README.md
@@ -37,48 +37,12 @@ $ git clone git@github.com:clearlydefined/service.git
$ git clone git@github.com:clearlydefined/crawler.git
```

Alternately, you can edit the **docker-compose.yml** file to point to where you have those repos cloned on your local system:
Alternately, you can edit the **docker-compose.yml** file to point at clones elsewhere on your system by changing the `build.context` for `web`, `service`, and `crawler`.

**docker-compose.yml**
```bash
version: "3.8"
services:
  web:
    build:
      context: <path-to-website-repo-on-your-system>
      dockerfile: DevDockerfile
    ports:
      - "3000:3000"
    stdin_open: true
  service:
    build:
      context: <path-to-service-repo-on-your-system>
      dockerfile: DevDockerfile
    ports:
      - "4000:4000"
    env_file: .env
    volumes:
      - ./harvested_data:/tmp/harvested_data/
    links:
      - clearlydefined_mongo_db
      - crawler
  crawler:
    build:
      context: <path-to-crawler-repo-on-your-system>
      dockerfile: DevDockerfile
    env_file: .env
    volumes:
      - ./harvested_data:/tmp/harvested_data/
    ports:
      - "5000:5000"
```
If your CPU does not support AVX (older Macs, some VMs), override the mongo image in your `.env`:

If you are using a Mac computer and your CPU does not have a AVX support, please change the version of Mongo from 5.0.6 to 4.4.28 in docker-compose.yml file.
```bash
clearlydefined_mongo_db:
  image: "mongo:4.4.28"
  ports:
    - "27017:27017"
```env
MONGO_IMAGE=mongo:4.4.28
```

### Setting up environmental variables
@@ -87,13 +51,13 @@ This environment handles environmental variables a little differently from the [

The docker-compose.yml file loads environmental variables from a **.env** file.

To set this up, copy the **sample_env** file in this repo to **.env**
To set this up, copy the **.env.example** file in this repo to **.env**:

```bash
$ cp sample_env .env
$ cp .env.example .env
```

And add in appropriate values to the .env file:
And fill in token placeholders. Port and image defaults (`SERVICE_PORT`, `MONGO_IMAGE`, `PLATFORM`, etc.) are also configurable there — change them locally if you have a port collision or need a different mongo image.

(You will need a [GitHub token](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/creating-a-personal-access-token) with minimal permissions)
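
One way to confirm that no token placeholder was left behind is to scan the copied file for the `<your ...>` markers used in **.env.example**. The following is a minimal sketch and not part of this repo: the function name `find_unfilled_tokens` is hypothetical, and it assumes every placeholder value starts with `<your `.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repo): print the name of every
# variable in an env file whose value still contains a "<your ...>" placeholder.
# Returns 0 when nothing is left to fill in, 1 otherwise.
find_unfilled_tokens() {
  local env_file="${1:-.env}"
  local found=0
  local line
  local name
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      \#*|'') continue ;;          # skip comments and blank lines
    esac
    case "$line" in
      *=*'<your '*)
        name="${line%%=*}"         # keep everything before the first "="
        printf '%s\n' "$name"
        found=1
        ;;
    esac
  done < "$env_file"
  return "$found"
}
```

Calling `find_unfilled_tokens .env` before `docker compose up` lists the variables that still need a real value and exits non-zero if any remain.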

@@ -108,7 +72,7 @@ CURATION_GITHUB_REPO="curated-data-dev"
CURATION_GITHUB_TOKEN="<your GitHub token>"

# GitLab info
GITLAB_TOKEN="<your GitLab token (unless you are working on code that interacts with the GitLab API, this can be a random string of characters)>
GITLAB_TOKEN="<your GitLab token (unless you are working on code that interacts with the GitLab API, this can be a random string of characters)>"

# Curation Store Info
CURATION_MONGO_CONNECTION_STRING="mongodb://clearlydefined_mongo_db"
@@ -141,14 +105,21 @@ CRAWLER_QUEUE_PROVIDER=memory
CRAWLER_STORE_PROVIDER=cd(file)
```

Now, from withing your **docker_dev_env_experiment** directory, run:
Now, from within your **docker_dev_env_experiment** directory, run:

```bash
$ docker-compose build
$ docker-compose up
$ docker compose build
$ docker compose up
```

Or, if you have [Task](https://taskfile.dev) installed:

```bash
$ task build
$ task up
```

*NOTE: If you have an issue seeding, prune all volumes, containers, and images then try again.*
The mongo seed is idempotent (it upserts), so re-running `task seed` or letting the seed container run again on startup will not duplicate data.

And head to http://localhost:3000 to see your running website UI along with some seeded data!
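
When the stack is started in the background (as `task up` does), the UI may need a little time before it answers requests. The sketch below polls a URL until it responds; it assumes `curl` is installed, and `wait_for_url` is a hypothetical helper name rather than something this repo provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers successfully or the
# attempts run out. Arguments: URL, max attempts (default 30), delay in
# seconds between attempts (default 2).
wait_for_url() {
  local url="$1"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors, -sS keeps output quiet except errors
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_url "http://localhost:3000/" && echo "web is up"` waits up to about a minute with the defaults.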

@@ -168,9 +139,11 @@ The `docker-compose-azurite.yml` file is modified to include the Azurite Docker
- Replace the token placeholders in `azurite_localdb_env` and rename the file to `azurite_localdb.env`.
- Run the command:
```bash
docker-compose -f docker-compose-azurite.yml up
docker compose -f docker-compose.yml -f docker-compose-azurite.yml up
```

(Or `task up-azurite`.) The Azurite compose file is now an override that layers Azurite + Redis on top of the base stack instead of duplicating every service definition.
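
To see what the two layered files resolve to, `docker compose config` prints the merged configuration without starting anything. The wrapper below is a small sketch that mirrors the `AZURITE_COMPOSE` variable from `Taskfile.yml`; the function name `azurite_compose` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run any compose subcommand against the base file
# with the Azurite override layered on top (later -f files override earlier ones).
azurite_compose() {
  docker compose -f docker-compose.yml -f docker-compose-azurite.yml "$@"
}
```

With it, `azurite_compose config --services` lists the services in the merged stack, and `azurite_compose up -d` starts them.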

To view the data stored in Azurite, you can use Storage Explorer. See the [documentation](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azurite?tabs=docker-hub%2Cblob-storage#microsoft-azure-storage-explorer) for connection information.


@@ -276,7 +249,7 @@ The database contains the following collections:
* definitions-paged (contains definitions)
* definitions-trimmed (contains definitions without files)

Our production Azure setup includes definitions-trimmed, that is actively used. The reason the definitions collection is called definitions-trimmed is because, previously, the definitions collection was paged. The pagination was added in [this January 2019 pull request](https://github.com/clearlydefined/service/pull/364). To improve performance and reduce cost of the definition database, [this Feb 2023 pull request](https://github.com/clearlydefined/service/pull/976) subsequently stores definitions without files.
Our production Azure setup includes definitions-trimmed, which is actively used. The definitions collection is called definitions-trimmed because, previously, the definitions collection was paged. The pagination was added in [this January 2019 pull request](https://github.com/clearlydefined/service/pull/364). To improve performance and reduce the cost of the definition database, [this Feb 2023 pull request](https://github.com/clearlydefined/service/pull/976) subsequently began storing definitions without files.

The sample setup uses definitions-paged instead of definitions-trimmed because definitions-trimmed works well as part of a definition store collection, but not on its own. In comparison, the definitions-paged collection stores definitions in their entirety and can be used as a standalone definitions store. To emulate the production environment more closely, one can use dispatch+file+mongoTrimmed as DEFINITION_STORE_PROVIDER.

@@ -309,23 +282,19 @@ This will show all of the logs from all of the containers in your current shell.

### Rebuilding after changes

When you make changes to one of the code bases, you do need to rebuild the Docker images.

If you were to make a change to the website, and want to rebuild only that container, you can do so by running:
When you make changes to one of the code bases, you currently need to rebuild the Docker image for that service. Rebuild a single container with:

```bash
$ docker-compose up --detach --build web
$ docker compose up --detach --build service
```

The same will work for the service and the crawler:
Or with Task:

```bash
$ docker-compose up --detach --build service
$ task rebuild SVC=service
```

```bash
$ docker-compose up --detach --build crawler
```
(Hot-reload via bind mounts + nodemon is on the roadmap; see `docker-compose.yml` for the current entrypoint overrides used for `--inspect` debugging.)

### Limitations

84 changes: 84 additions & 0 deletions Taskfile.yml
@@ -0,0 +1,84 @@
# Copyright (c) Microsoft Corporation and others. Licensed under the MIT license.
# SPDX-License-Identifier: MIT
#
# Task runner for the ClearlyDefined local dev env. Install Task from
# https://taskfile.dev and run `task --list` to see available targets.

version: "3"

vars:
  COMPOSE: docker compose
  AZURITE_COMPOSE: docker compose -f docker-compose.yml -f docker-compose-azurite.yml

tasks:
  default:
    desc: List available tasks
    cmds:
      - task --list

  init:
    desc: Create .env from .env.example if it doesn't exist
    cmds:
      - test -f .env || cp .env.example .env
      - echo "Edit .env to fill in your GitHub / GitLab tokens."

  build:
    desc: Build all images
    cmds:
      - "{{.COMPOSE}} build"

  up:
    desc: Start the full dev stack in the background
    cmds:
      - "{{.COMPOSE}} up -d"

  up-fg:
    desc: Start the dev stack in the foreground (Ctrl-C to stop)
    cmds:
      - "{{.COMPOSE}} up"

  up-azurite:
    desc: Start the dev stack with Azurite + Redis layered on top
    cmds:
      - "{{.AZURITE_COMPOSE}} up -d"

  down:
    desc: Stop containers (keeps volumes)
    cmds:
      - "{{.COMPOSE}} down"

  nuke:
    desc: Stop containers AND remove volumes (resets seeded data)
    prompt: This will delete the mongo volume. Continue?
    cmds:
      - "{{.COMPOSE}} down -v"

  rebuild:
    desc: 'Rebuild a single service. Usage: task rebuild SVC=service'
    cmds:
      - "{{.COMPOSE}} up -d --build {{.SVC}}"
    requires:
      vars: [SVC]

  logs:
    desc: 'Tail logs. Usage: task logs SVC=service'
    cmds:
      - "{{.COMPOSE}} logs -f {{.SVC}}"
    requires:
      vars: [SVC]

  seed:
    desc: Re-run the mongo seed container (idempotent upsert)
    cmds:
      - "{{.COMPOSE}} run --rm clearlydefined_mongo_seed"

  ps:
    desc: Show container status
    cmds:
      - "{{.COMPOSE}} ps"

  smoke:
    desc: Smoke test the running stack
    cmds:
      - 'curl -fsS "http://localhost:${SERVICE_PORT:-4000}/" && echo'
      - 'curl -fsS -o /dev/null -w "web: %{http_code}\n" "http://localhost:${WEB_PORT:-3000}/"'