diff --git a/.env.example b/.env.example
new file mode 100644
index 0000000..2537bad
--- /dev/null
+++ b/.env.example
@@ -0,0 +1,59 @@
+# Copy this file to .env and fill in the token placeholders.
+# docker-compose.yml reads the port and image settings below; change them
+# here if, for example, a default port is already in use on your machine.
+
+# --- Image / platform knobs --------------------------------------------------
+# Defaults to mongo:7. Override with mongo:4.4.28 on CPUs without AVX
+# support (older Macs, some VMs).
+MONGO_IMAGE=mongo:7
+# Force an image platform (e.g. linux/amd64) if you need to. Empty = native.
+PLATFORM=
+
+# --- Host ports --------------------------------------------------------------
+WEB_PORT=3000
+SERVICE_PORT=4000
+CRAWLER_PORT=5000
+MONGO_PORT=27017
+SERVICE_DEBUG_PORT=9230
+CRAWLER_DEBUG_PORT=9229
+
+# --- Curation GitHub Info ----------------------------------------------------
+# A GitHub token with minimal access (repo:public_repo) is required.
+# The same token can be used for CRAWLER_GITHUB_TOKEN.
+CURATION_GITHUB_BRANCH="master"
+CURATION_GITHUB_OWNER="clearlydefined"
+CURATION_GITHUB_REPO="curated-data-dev"
+CURATION_GITHUB_TOKEN=""
+
+# --- Curation Store Info -----------------------------------------------------
+CURATION_MONGO_CONNECTION_STRING="mongodb://clearlydefined_mongo_db"
+CURATION_MONGO_DB_NAME="clearlydefined"
+CURATION_MONGO_COLLECTION_NAME="curations"
+CURATION_PROVIDER="github"
+CURATION_STORE_PROVIDER="mongo"
+
+# --- GitLab info -------------------------------------------------------------
+# A random string is fine unless you're working on GitLab API code.
+GITLAB_TOKEN=""
+
+# --- Definition Store Info ---------------------------------------------------
+DEFINITION_STORE_PROVIDER="mongo"
+DEFINITION_MONGO_CONNECTION_STRING="mongodb://clearlydefined_mongo_db"
+DEFINITION_MONGO_DB_NAME="clearlydefined"
+DEFINITION_MONGO_COLLECTION_NAME="definitions-paged"
+
+# --- Harvest Store Info ------------------------------------------------------
+HARVEST_STORE_PROVIDER="file"
+FILE_STORE_LOCATION="/tmp/harvested_data"
+
+# --- Webhooks ----------------------------------------------------------------
+WEBHOOK_CRAWLER_SECRET="secret"
+WEBHOOK_GITHUB_SECRET="secret"
+
+# --- Crawler Info ------------------------------------------------------------
+CRAWLER_API_URL="http://crawler:5000"
+CRAWLER_GITHUB_TOKEN=""
+CRAWLER_DEADLETTER_PROVIDER=cd(file)
+CRAWLER_NAME=cdcrawlerlocal
+CRAWLER_QUEUE_PROVIDER=memory
+CRAWLER_STORE_PROVIDER=cd(file)
diff --git a/README.md b/README.md
index cae10ff..370ca20 100644
--- a/README.md
+++ b/README.md
@@ -37,48 +37,12 @@ $ git clone git@github.com:clearlydefined/service.git
 $ git clone git@github.com:clearlydefined/crawler.git
 ```
 
-Alternately, you can edit the **docker-compose.yml** file to point to where you have those repos cloned on your local system:
+Alternatively, you can edit the **docker-compose.yml** file to point at clones elsewhere on your system by changing the `build.context` for `web`, `service`, and `crawler`.
 
-**docker-compose.yml**
-```bash
-version: "3.8"
-services:
-  web:
-    build:
-      context:
-      dockerfile: DevDockerfile
-    ports:
-      - "3000:3000"
-    stdin_open: true
-  service:
-    build:
-      context:
-      dockerfile: DevDockerfile
-    ports:
-      - "4000:4000"
-    env_file: .env
-    volumes:
-      - ./harvested_data:/tmp/harvested_data/
-    links:
-      - clearlydefined_mongo_db
-      - crawler
-  crawler:
-    build:
-      context:
-      dockerfile: DevDockerfile
-    env_file: .env
-    volumes:
-      - ./harvested_data:/tmp/harvested_data/
-    ports:
-      - "5000:5000"
-```
+If your CPU does not support AVX (older Macs, some VMs), override the mongo image in your `.env`:
 
-If you are using a Mac computer and your CPU does not have a AVX support, please change the version of Mongo from 5.0.6 to 4.4.28 in docker-compose.yml file.
-```bash
-clearlydefined_mongo_db:
-  image: "mongo:4.4.28"
-  ports:
-    - "27017:27017"
+```env
+MONGO_IMAGE=mongo:4.4.28
 ```
 
 ### Setting up environmental variables
@@ -87,13 +51,13 @@ This environment handles environmental variables a little differently from the [
 
 The docker-compose.yml file loads environmental variables from a **.env** file.
 
-To set this up, copy the **sample_env** file in this repo to **.env**
+To set this up, copy the **.env.example** file in this repo to **.env**:
 
 ```bash
-$ cp sample_env .env
+$ cp .env.example .env
 ```
 
-And add in appropriate values to the .env file:
+Then fill in the token placeholders. Port and image defaults (`SERVICE_PORT`, `MONGO_IMAGE`, `PLATFORM`, etc.) are also set there; change them locally if you have a port collision or need a different mongo image.
 
 (You will need a [GitHub token](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/creating-a-personal-access-token) with minimal permissions)
 
@@ -108,7 +72,7 @@ CURATION_GITHUB_REPO="curated-data-dev"
 CURATION_GITHUB_TOKEN=""
 
 # GitLab info
-GITLAB_TOKEN="
+GITLAB_TOKEN=""
 
 # Curation Store Info
 CURATION_MONGO_CONNECTION_STRING="mongodb://clearlydefined_mongo_db"
@@ -141,14 +105,21 @@ CRAWLER_QUEUE_PROVIDER=memory
 CRAWLER_STORE_PROVIDER=cd(file)
 ```
 
-Now, from withing your **docker_dev_env_experiment** directory, run:
+Now, from within your **docker_dev_env_experiment** directory, run:
 
 ```bash
-$ docker-compose build
-$ docker-compose up
+$ docker compose build
+$ docker compose up
+```
+
+Or, if you have [Task](https://taskfile.dev) installed (`task up` starts the stack detached; `task up-fg` keeps it in the foreground):
+
+```bash
+$ task build
+$ task up
 ```
 
-*NOTE: If you have an issue seeding, prune all volumes, containers, and images then try again.*
+The mongo seed upserts documents by `_id`, so re-running `task seed` (or letting the seed container run again on startup) does not duplicate data.
 
 And head to http://localhost:3000 to see your running website UI along with some seeded data!
 
@@ -168,9 +139,11 @@ The `docker-compose-azurite.yml` file is modified to include the Azurite Docker
 - Replace the token placeholders in `azurite_localdb_env` and rename the file to `azurite_localdb.env`.
 - Run the command:
 ```bash
-docker-compose -f docker-compose-azurite.yml up
+docker compose -f docker-compose.yml -f docker-compose-azurite.yml up
 ```
 
+(Or `task up-azurite`.) The Azurite compose file is now an override that layers Azurite + Redis on top of the base stack instead of duplicating every service definition.
+
 To view the data stored in Azurite, you can use Storage Explorer. See the [documentation](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azurite?tabs=docker-hub%2Cblob-storage#microsoft-azure-storage-explorer) for connection information.
 
 
@@ -276,7 +249,7 @@ The database contains the following collections:
 * definitions-paged (contains definitions)
 * definitions-trimmed (contains definitions without files)
 
-Our production Azure setup includes definitions-trimmed, that is actively used. The reason the definitions collection is called definitions-trimmed is because, previously, the definitions collection was paged. The pagination was added in [this January 2019 pull request](https://github.com/clearlydefined/service/pull/364). To improve performance and reduce cost of the definition database, [this Feb 2023 pull request](https://github.com/clearlydefined/service/pull/976) subsequently stores definitions without files.
+Our production Azure setup actively uses definitions-trimmed. It is named definitions-trimmed because the definitions collection used to be paged: pagination was added in [this January 2019 pull request](https://github.com/clearlydefined/service/pull/364), and, to improve performance and reduce the cost of the definition database, [this Feb 2023 pull request](https://github.com/clearlydefined/service/pull/976) then began storing definitions without files.
 
 The sample setup uses definitions-paged instead of definitions-trimmed because definitions-trimmed works well as part of a definition store collection, but not on its own. In comparison, definitions-paged collection stores the definitions in entirety, and can be used as a standalone definitions store. To emulate the production environment more closely, one can use dispatch+file+mongoTrimmed as DEFINITION_STORE_PROVIDER.
 
@@ -309,23 +282,19 @@ This will show all of the logs from all of the container in your current shell.
 
 ### Rebuilding after changes
 
-When you make changes to one of the code bases, you do need to rebuild the Docker images.
-
-If you were to make a change to the website, and want to rebuild only that container, you can do so by running:
+When you make changes to one of the code bases, you currently need to rebuild the Docker image for that service. Rebuild a single container (`web`, `service`, or `crawler`) with:
 
 ```bash
-$ docker-compose up --detach --build web
+$ docker compose up --detach --build service
 ```
 
-The same will work for the service and the crawler:
+Or with Task:
 
 ```bash
-$ docker-compose up --detach --build service
+$ task rebuild SVC=service
 ```
 
-```bash
-$ docker-compose up --detach --build crawler
-```
+(Hot-reload via bind mounts + nodemon is on the roadmap; see `docker-compose.yml` for the current entrypoint overrides used for `--inspect` debugging.)
 
 ### Limitations
 
diff --git a/Taskfile.yml b/Taskfile.yml
new file mode 100644
index 0000000..2052552
--- /dev/null
+++ b/Taskfile.yml
@@ -0,0 +1,84 @@
+# Copyright (c) Microsoft Corporation and others. Licensed under the MIT license.
+# SPDX-License-Identifier: MIT
+#
+# Task runner for the ClearlyDefined local dev env. Install Task from
+# https://taskfile.dev and run `task --list` to see available targets.
+
+version: "3"
+
+vars:
+  COMPOSE: docker compose
+  AZURITE_COMPOSE: docker compose -f docker-compose.yml -f docker-compose-azurite.yml
+
+tasks:
+  default:
+    desc: List available tasks
+    cmds:
+      - task --list
+
+  init:
+    desc: Create .env from .env.example if it doesn't exist
+    cmds:
+      - test -f .env || cp .env.example .env
+      - echo "Edit .env to fill in your GitHub / GitLab tokens."
+
+  build:
+    desc: Build all images
+    cmds:
+      - "{{.COMPOSE}} build"
+
+  up:
+    desc: Start the full dev stack in the background
+    cmds:
+      - "{{.COMPOSE}} up -d"
+
+  up-fg:
+    desc: Start the dev stack in the foreground (Ctrl-C to stop)
+    cmds:
+      - "{{.COMPOSE}} up"
+
+  up-azurite:
+    desc: Start the dev stack with Azurite + Redis layered on top
+    cmds:
+      - "{{.AZURITE_COMPOSE}} up -d"
+
+  down:
+    desc: Stop containers (keeps volumes)
+    cmds:
+      - "{{.COMPOSE}} down"
+
+  nuke:
+    desc: Stop containers AND remove volumes (resets seeded data)
+    prompt: This will delete the mongo volume. Continue?
+    cmds:
+      - "{{.COMPOSE}} down -v"
+
+  rebuild:
+    desc: 'Rebuild a single service. Usage: task rebuild SVC=service'
+    cmds:
+      - "{{.COMPOSE}} up -d --build {{.SVC}}"
+    requires:
+      vars: [SVC]
+
+  logs:
+    desc: 'Tail logs. Usage: task logs SVC=service'
+    cmds:
+      - "{{.COMPOSE}} logs -f {{.SVC}}"
+    requires:
+      vars: [SVC]
+
+  seed:
+    desc: Re-run the mongo seed container (idempotent upsert)
+    cmds:
+      - "{{.COMPOSE}} run --rm clearlydefined_mongo_seed"
+
+  ps:
+    desc: Show container status
+    cmds:
+      - "{{.COMPOSE}} ps"
+
+  smoke:
+    desc: Smoke test the running stack
+    cmds:
+      - 'curl -fsS "http://localhost:${SERVICE_PORT:-4000}/" && echo'
+      - 'curl -fsS -o /dev/null -w "web: %{http_code}\n" "http://localhost:${WEB_PORT:-3000}/"'
diff --git a/docker-compose-azurite.yml b/docker-compose-azurite.yml
index 9cda5b5..a075926 100644
--- a/docker-compose-azurite.yml
+++ b/docker-compose-azurite.yml
@@ -1,23 +1,18 @@
 # (c) Copyright 2025, SAP SE and ClearlyDefined contributors. Licensed under the MIT license.
 # SPDX-License-Identifier: MIT
+#
+# Override file: layer Azurite + Redis on top of docker-compose.yml.
+#
+# Usage:
+#   cp azurite_localdb_env azurite_localdb.env   # fill in tokens
+#   docker compose -f docker-compose.yml -f docker-compose-azurite.yml up
+#
+# The base compose file provides web / service / crawler / mongo / seed; this
+# file only adds azurite + redis, appends azurite_localdb.env to env_file (read
+# after .env, so its values win), and wires the extra dependencies.
 
 services:
-  web:
-    platform: linux/amd64
-    build:
-      context: ./website
-      dockerfile: DevDockerfile
-    ports:
-      - "3000:3000"
-    stdin_open: true
   service:
-    platform: linux/amd64
-    build:
-      context: ./service
-      dockerfile: DevDockerfile
-    ports:
-      - "4000:4000"
-      - "9230:9229"
     env_file: azurite_localdb.env
     environment:
       - NODE_OPTIONS=--trace-deprecation
@@ -27,37 +22,27 @@ services:
       - CACHING_REDIS_SERVICE=redis
       - CACHING_REDIS_PORT=6379
       - CACHING_REDIS_TLS=false
-    entrypoint:
-      - "node"
-      - "--inspect=0.0.0.0:9229"
-      - "./bin/www"
-    links:
-      - clearlydefined_mongo_db
-      - crawler
-      - azurite
-      - redis
     depends_on:
+      clearlydefined_mongo_db:
+        condition: service_healthy
+      crawler:
+        condition: service_started
+      azurite:
+        condition: service_healthy
       redis:
         condition: service_healthy
+
   crawler:
-    platform: linux/amd64
-    build:
-      context: ./crawler
-      dockerfile: DevDockerfile
     env_file: azurite_localdb.env
     environment:
       - CRAWLER_ECHO=true
-    ports:
-      - "5000:5000"
-      - "9229:9229"
-    entrypoint:
-      - "node"
-      - "--inspect=0.0.0.0:9229"
-      - "./index.js"
-    links:
-      - azurite
+    depends_on:
+      azurite:
+        condition: service_healthy
+
   azurite:
     image: "mcr.microsoft.com/azure-storage/azurite"
+    platform: ${PLATFORM:-}
     command: "azurite --blobHost 0.0.0.0 --queueHost 0.0.0.0 --tableHost 0.0.0.0 --skipApiVersionCheck"
     ports:
       - "10000:10000"
@@ -65,17 +50,16 @@ services:
       - "10002:10002"
     volumes:
       - ./azurite_data:/data
-  clearlydefined_mongo_db:
-    image: "mongo:4.4.28"
-    #image: "mongo:5.0.6"
-    ports:
-      - "27017:27017"
-  clearlydefined_mongo_seed:
-    image: "clearlydefined/docker_dev_env_experiment_clearlydefined_mongo_seed"
-    links:
-      - clearlydefined_mongo_db
+    healthcheck:
+      test: ["CMD-SHELL", "nc -z localhost 10000 || exit 1"]
+      interval: 5s
+      timeout: 3s
+      retries: 20
+      start_period: 5s
+
   redis:
-    image: "redis:6.0.14-alpine"
+    image: "redis:7-alpine"
+    platform: ${PLATFORM:-}
     ports:
       - "6379:6379"
     healthcheck:
diff --git a/docker-compose.yml b/docker-compose.yml
old mode 100755
new mode 100644
index 7b50056..958c334
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1,24 +1,23 @@
 # Copyright (c) Microsoft Corporation and others. Licensed under the MIT license.
 # SPDX-License-Identifier: MIT
 
-version: "3.8"
 services:
   web:
-    platform: linux/amd64
+    platform: ${PLATFORM:-}
     build:
       context: ./website
       dockerfile: DevDockerfile
     ports:
-      - "3000:3000"
+      - "${WEB_PORT:-3000}:3000"
     stdin_open: true
   service:
-    platform: linux/amd64
+    platform: ${PLATFORM:-}
     build:
       context: ./service
       dockerfile: DevDockerfile
     ports:
-      - "4000:4000"
-      - "9230:9229"
+      - "${SERVICE_PORT:-4000}:4000"
+      - "${SERVICE_DEBUG_PORT:-9230}:9229"
     env_file: .env
     entrypoint:
       - "node"
@@ -26,11 +25,19 @@ services:
       - "./bin/www"
     volumes:
       - ./harvested_data:/tmp/harvested_data
-    links:
-      - clearlydefined_mongo_db
-      - crawler
+    depends_on:
+      clearlydefined_mongo_db:
+        condition: service_healthy
+      crawler:
+        condition: service_started
+    healthcheck:
+      test: ["CMD-SHELL", "wget -qO- http://localhost:4000/ || exit 1"]
+      interval: 10s
+      timeout: 5s
+      retries: 12
+      start_period: 30s
   crawler:
-    platform: linux/amd64
+    platform: ${PLATFORM:-}
     build:
       context: ./crawler
       dockerfile: DevDockerfile
@@ -38,18 +45,40 @@ services:
     volumes:
       - ./harvested_data:/tmp/harvested_data
     ports:
-      - "5000:5000"
-      - "9229:9229"
+      - "${CRAWLER_PORT:-5000}:5000"
+      - "${CRAWLER_DEBUG_PORT:-9229}:9229"
     entrypoint:
       - "node"
       - "--inspect=0.0.0.0:9229"
       - "./index.js"
+    healthcheck:
+      test: ["CMD-SHELL", "wget -qO- http://localhost:5000/ || exit 1"]
+      interval: 10s
+      timeout: 5s
+      retries: 12
+      start_period: 30s
   clearlydefined_mongo_db:
-    # image: "mongo:4.4.28"
-    image: "mongo:5.0.6"
+    image: ${MONGO_IMAGE:-mongo:7}
+    platform: ${PLATFORM:-}
     ports:
-      - "27017:27017"
+      - "${MONGO_PORT:-27017}:27017"
+    volumes:
+      - mongo_data:/data/db
+    healthcheck:
+      test:
+        - "CMD-SHELL"
+        - "mongosh --quiet --eval 'db.adminCommand({ ping: 1 }).ok' 2>/dev/null | grep -q 1 || mongo --quiet --eval 'db.adminCommand({ ping: 1 }).ok' 2>/dev/null | grep -q 1"
+      interval: 5s
+      timeout: 5s
+      retries: 20
+      start_period: 10s
   clearlydefined_mongo_seed:
-    image: "clearlydefined/docker_dev_env_experiment_clearlydefined_mongo_seed"
-    links:
-      - clearlydefined_mongo_db
+    build: ./mongo_seed
+    platform: ${PLATFORM:-}
+    depends_on:
+      clearlydefined_mongo_db:
+        condition: service_healthy
+    restart: "no"
+
+volumes:
+  mongo_data:
diff --git a/mongo_seed/Dockerfile b/mongo_seed/Dockerfile
index 5c07d56..debc6e7 100644
--- a/mongo_seed/Dockerfile
+++ b/mongo_seed/Dockerfile
@@ -1,9 +1,15 @@
 # Copyright (c) Microsoft Corporation and others. Licensed under the MIT license.
 # SPDX-License-Identifier: MIT
 
-FROM mongo:5.0.6
-RUN mkdir /definitions
-COPY ./definitions/* definitions/
-COPY ./curations/* curations/
-COPY ./seed_data.sh seed_data.sh
-CMD sh seed_data.sh
+# Defaults to mongo:7 (the runtime default); MONGO_IMAGE in .env does not affect it.
+# seed_data.sh calls mongosh, so any MONGO_TAG override needs an image that ships it.
+ARG MONGO_TAG=7
+FROM mongo:${MONGO_TAG}
+
+WORKDIR /seed
+COPY ./definitions/ ./definitions/
+COPY ./curations/ ./curations/
+COPY ./seed_data.sh ./seed_data.sh
+RUN chmod +x ./seed_data.sh
+
+CMD ["./seed_data.sh"]
diff --git a/mongo_seed/seed_data.sh b/mongo_seed/seed_data.sh
old mode 100644
new mode 100755
index b47d30c..450b861
--- a/mongo_seed/seed_data.sh
+++ b/mongo_seed/seed_data.sh
@@ -1,8 +1,46 @@
+#!/usr/bin/env bash
 # Copyright (c) Microsoft Corporation and others. Licensed under the MIT license.
 # SPDX-License-Identifier: MIT
+#
+# Seed the local ClearlyDefined Mongo with sample definitions and curations.
+# Idempotent as long as every seed document carries an _id: --mode=upsert
+# matches on _id by default, so re-running replaces documents instead of
+# duplicating them, and you don't need to destroy volumes to re-seed.
 
-mongoimport --host clearlydefined_mongo_db --db clearlydefined --collection definitions-trimmed --type json --file definitions/angular.json --jsonArray
-mongoimport --host clearlydefined_mongo_db --db clearlydefined --collection definitions-trimmed --type json --file definitions/flyway-maven-plugin.json --jsonArray
-mongoimport --host clearlydefined_mongo_db --db clearlydefined --collection definitions-paged --type json --file definitions/angular-paged.json --jsonArray
-mongoimport --host clearlydefined_mongo_db --db clearlydefined --collection definitions-paged --type json --file definitions/flyway-maven-plugin-paged.json --jsonArray
-mongoimport --host clearlydefined_mongo_db --db clearlydefined --collection curations --type json --file curations/387.json --jsonArray
+set -euo pipefail
+
+HOST="${MONGO_HOST:-clearlydefined_mongo_db}"
+PORT="${MONGO_PORT:-27017}"
+DB="${MONGO_DB:-clearlydefined}"
+
+echo "Waiting for mongo at ${HOST}:${PORT}..."
+for i in $(seq 1 60); do
+  if mongosh --quiet --host "${HOST}" --port "${PORT}" --eval 'db.adminCommand({ ping: 1 }).ok' 2>/dev/null | grep -q 1; then
+    echo "Mongo is ready."
+    break
+  fi
+  if [ "${i}" -eq 60 ]; then
+    echo "Mongo never became ready." >&2
+    exit 1
+  fi
+  sleep 1
+done
+
+import() {
+  local collection="$1"
+  local file="$2"
+  echo "Seeding ${collection} from ${file}..."
+  mongoimport \
+    --host "${HOST}" --port "${PORT}" \
+    --db "${DB}" --collection "${collection}" \
+    --type json --jsonArray \
+    --mode=upsert \
+    --file "${file}"
+}
+
+import definitions-trimmed definitions/angular.json
+import definitions-trimmed definitions/flyway-maven-plugin.json
+import definitions-paged definitions/angular-paged.json
+import definitions-paged definitions/flyway-maven-plugin-paged.json
+import curations curations/387.json
+
+echo "Seed complete."
diff --git a/sample_env b/sample_env
deleted file mode 100644
index 0b8af6f..0000000
--- a/sample_env
+++ /dev/null
@@ -1,46 +0,0 @@
-# Curation GitHub Info
-# Note:
-# - GitHub token with minimal access (e.g. repo:public_repo) is required
-# - This can be the same token used for CRAWLER_GITHUB_TOKEN.
-CURATION_GITHUB_BRANCH="master"
-CURATION_GITHUB_OWNER="clearlydefined"
-CURATION_GITHUB_REPO="curated-data-dev"
-CURATION_GITHUB_TOKEN=""
-
-# Curation Store Info
-CURATION_MONGO_CONNECTION_STRING="mongodb://clearlydefined_mongo_db"
-CURATION_MONGO_DB_NAME="clearlydefined"
-CURATION_MONGO_COLLECTION_NAME="curations"
-CURATION_PROVIDER="github"
-CURATION_STORE_PROVIDER="mongo"
-
-# GitLab info
-GITLAB_TOKEN=""
-
-# Definition Store Info
-DEFINITION_STORE_PROVIDER="mongo"
-DEFINITION_MONGO_CONNECTION_STRING="mongodb://clearlydefined_mongo_db"
-DEFINITION_MONGO_DB_NAME="clearlydefined"
-DEFINITION_MONGO_COLLECTION_NAME="definitions-paged"
-
-# Harvest Store Info
-HARVEST_STORE_PROVIDER="file"
-
-# Note:
-# - This is mounted as a volume into the container for the clearly defined service.
-#   See docker-compose.yml for more details.
-FILE_STORE_LOCATION="/tmp/harvested_data"
-
-WEBHOOK_CRAWLER_SECRET="secret"
-WEBHOOK_GITHUB_SECRET="secret"
-
-# Crawler Info
-# Note
-# - GitHub token with minimal access (e.g. repo:public_repo) is required.
-# - This can be the same token used for CURATION_GITHUB_TOKEN
-CRAWLER_API_URL="http://crawler:5000"
-CRAWLER_GITHUB_TOKEN=""
-CRAWLER_DEADLETTER_PROVIDER=cd(file)
-CRAWLER_NAME=cdcrawlerlocal
-CRAWLER_QUEUE_PROVIDER=memory
-CRAWLER_STORE_PROVIDER=cd(file)
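The `smoke` task in Taskfile.yml expands `${SERVICE_PORT:-4000}` and `${WEB_PORT:-3000}` from the shell environment only. As far as we can tell, Task does not read `.env` unless the Taskfile lists it under `dotenv:`. A port changed only in `.env` is therefore ignored by `task smoke`, even though Compose honors it.

Below is a minimal sketch of one workaround. It assumes a hypothetical helper script (for example `scripts/ports.sh`, not part of this PR) and a deliberately simplified `.env` parser that only handles plain `KEY=VALUE` lines with optional double quotes. Each port is resolved with the precedence Compose uses for interpolation:

1. The shell environment.
2. `.env`.
3. The `:-` default, which also applies to empty values.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): resolve a host port the way
# docker-compose.yml's "${VAR:-default}" interpolation would.
# Simplified .env parsing: plain KEY=VALUE lines with optional surrounding
# double quotes; no "export" prefixes, escapes, or inline comments.
set -euo pipefail

env_or_default() {
  local key="$1" default="$2" env_file="${3:-.env}"
  local value=""
  if [ -n "${!key+x}" ]; then
    # Set in the shell environment (even if empty): that wins, as in Compose.
    value="${!key}"
  elif [ -f "${env_file}" ]; then
    # Otherwise take the last KEY= line in the env file and strip quotes.
    # grep exits 1 when nothing matches, hence the "|| true".
    value="$(grep -E "^${key}=" "${env_file}" | tail -n 1 | cut -d= -f2- | sed -e 's/^"//' -e 's/"$//')" || true
  fi
  # ":-" semantics: fall back to the default when the value is unset or empty.
  printf '%s\n' "${value:-${default}}"
}

# Print the two URLs the smoke task probes, using the resolved ports.
smoke_urls() {
  local env_file="${1:-.env}"
  printf 'http://localhost:%s/\n' "$(env_or_default SERVICE_PORT 4000 "${env_file}")"
  printf 'http://localhost:%s/\n' "$(env_or_default WEB_PORT 3000 "${env_file}")"
}
```

Something like `for url in $(smoke_urls); do curl -fsS -o /dev/null -w "%{http_code} ${url}\n" "${url}"; done` would then probe both endpoints on whatever ports `.env` actually configures. If your Task version supports it, declaring `dotenv: ['.env']` at the top of Taskfile.yml is the simpler alternative.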