Merged
56 changes: 43 additions & 13 deletions ENV.md
@@ -28,14 +28,45 @@ Please ensure these are properly defined in a `.env` file in the root directory.
 
 [^1]: The user account in question will require elevated permissions to access certain endpoints. At a minimum, the user will require the `source_collector` and `db_write` permissions.
 
+# Flags
+
+Flags are used to enable/disable certain features. They are set to `1` to enable the feature and `0` to disable the feature. By default, all flags are enabled.
+
+## Configuration Flags
+
+Configuration flags are used to enable/disable certain configurations.
+
+| Flag | Description |
+|--------------|------------------------------------|
+| `POST_TO_DISCORD_FLAG` | Enables posting errors to Discord. |
+
+
 ## Task Flags
-Task flags are used to enable/disable certain tasks. They are set to `1` to enable the task and `0` to disable the task. By default, all tasks are enabled.
+Task flags are used to enable/disable certain tasks.
+
+Note that some tasks/subtasks are themselves enabled by other tasks.
+
+### Scheduled Task Flags
+
+| Flag | Description |
+|-------------------------------------|--------------------------------------------------------------------|
+| `SCHEDULED_TASKS_FLAG` | All scheduled tasks. Disabling disables all other scheduled tasks. |
+| `SYNC_AGENCIES_TASK_FLAG` | Synchronize agencies from Data Sources App. |
+| `SYNC_DATA_SOURCES_TASK_FLAG` | Synchronize data sources from Data Sources App. |
+| `PUSH_TO_HUGGING_FACE_TASK_FLAG` | Pushes data to HuggingFace. |
+| `POPULATE_BACKLOG_SNAPSHOT_TASK_FLAG` | Populates the backlog snapshot. |
+| `DELETE_OLD_LOGS_TASK_FLAG` | Deletes old logs. |
+| `RUN_URL_TASKS_TASK_FLAG` | Runs URL tasks. |
+| `IA_PROBE_TASK_FLAG` | Extracts and links Internet Archives metadata to URLs. |
+| `IA_SAVE_TASK_FLAG` | Saves URLs to Internet Archives. |
+
+### URL Task Flags
+
+URL Task Flags are collectively controlled by the `RUN_URL_TASKS_TASK_FLAG` flag.
+
 The following flags are available:
 
-| Flag | Description |
-|-------------------------------------|--------------------------------------------------------|
-| `SCHEDULED_TASKS_FLAG` | All scheduled tasks. |
+| Flag | Description |
+|-------------------------------------|--------------------------------------------------------------------|
 | `URL_HTML_TASK_FLAG` | URL HTML scraping task. |
 | `URL_RECORD_TYPE_TASK_FLAG` | Automatically assigns Record Types to URLs. |
 | `URL_AGENCY_IDENTIFICATION_TASK_FLAG` | Automatically assigns and suggests Agencies for URLs. |
@@ -45,14 +76,13 @@ The following flags are available:
 | `URL_AUTO_RELEVANCE_TASK_FLAG` | Automatically assigns Relevances to URLs. |
 | `URL_PROBE_TASK_FLAG` | Probes URLs for web metadata. |
 | `URL_ROOT_URL_TASK_FLAG` | Extracts and links Root URLs to URLs. |
-| `SYNC_AGENCIES_TASK_FLAG` | Synchronize agencies from Data Sources App. |
-| `SYNC_DATA_SOURCES_TASK_FLAG` | Synchronize data sources from Data Sources App. |
-| `PUSH_TO_HUGGING_FACE_TASK_FLAG` | Pushes data to HuggingFace. |
-| `POPULATE_BACKLOG_SNAPSHOT_TASK_FLAG` | Populates the backlog snapshot. |
-| `DELETE_OLD_LOGS_TASK_FLAG` | Deletes old logs. |
-| `RUN_URL_TASKS_TASK_FLAG` | Runs URL tasks. |
-| `IA_PROBE_TASK_FLAG` | Extracts and links Internet Archives metadata to URLs. |
-| `IA_SAVE_TASK_FLAG` | Saves URLs to Internet Archives. |
+
+### Agency ID Subtasks
+
+Agency ID Subtasks are collectively controlled by the `URL_AGENCY_IDENTIFICATION_TASK_FLAG` flag; disabling it disables all of them.
+
+| Flag | Description |
+|-------------------------------------|--------------------------------------------------------------------|
 | `AGENCY_ID_HOMEPAGE_MATCH_FLAG` | Enables the homepage match subtask for agency identification. |
 | `AGENCY_ID_NLP_LOCATION_MATCH_FLAG` | Enables the NLP location match subtask for agency identification. |
 | `AGENCY_ID_CKAN_FLAG` | Enables the CKAN subtask for agency identification. |
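
The parent/child relationship between flags described in ENV.md could be evaluated along the following lines. This is a minimal sketch, not the repository's implementation: it assumes flags are plain `1`/`0` environment variables that default to enabled, and `PARENT_FLAGS` and `flag_enabled` are hypothetical names introduced here for illustration. The PR itself reads flags through `environs` (for example `env.bool("POST_TO_DISCORD_FLAG", True)` in `src/api/main.py`). Only a subset of the documented flags is mapped.

```python
import os

# Hypothetical parent map (illustration only): a child flag takes effect
# only when its parent flag is also enabled. Flag names come from ENV.md.
PARENT_FLAGS = {
    "SYNC_AGENCIES_TASK_FLAG": "SCHEDULED_TASKS_FLAG",
    "RUN_URL_TASKS_TASK_FLAG": "SCHEDULED_TASKS_FLAG",
    "URL_HTML_TASK_FLAG": "RUN_URL_TASKS_TASK_FLAG",
    "URL_AGENCY_IDENTIFICATION_TASK_FLAG": "RUN_URL_TASKS_TASK_FLAG",
    "AGENCY_ID_CKAN_FLAG": "URL_AGENCY_IDENTIFICATION_TASK_FLAG",
}


def flag_enabled(name, environ=None):
    """Return True if `name` and every ancestor flag are enabled."""
    environ = os.environ if environ is None else environ
    current = name
    while current is not None:
        # Assumption: an unset flag counts as enabled, and only the
        # literal value "0" disables it.
        if environ.get(current, "1").strip() == "0":
            return False
        current = PARENT_FLAGS.get(current)
    return True
```

With `RUN_URL_TASKS_TASK_FLAG=0`, `flag_enabled("URL_HTML_TASK_FLAG")` returns `False` even when the child flag itself is unset or set to `1`.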
15 changes: 11 additions & 4 deletions src/api/main.py
@@ -39,11 +39,13 @@
 from src.external.internet_archives.client import InternetArchivesClient
 from src.external.pdap.client import PDAPClient
 from src.external.url_request.core import URLRequestInterface
-
+from environs import Env
 
 @asynccontextmanager
 async def lifespan(app: FastAPI):
     env_var_manager = EnvVarManager.get()
+    env = Env()
+    env.read_env()
 
     # Initialize shared dependencies
     db_client = DatabaseClient(
@@ -57,11 +59,16 @@ async def lifespan(app: FastAPI):
 
     session = aiohttp.ClientSession()
 
-    task_handler = TaskHandler(
-        adb_client=adb_client,
-        discord_poster=DiscordPoster(
+    if env.bool("POST_TO_DISCORD_FLAG", True):
+        discord_poster = DiscordPoster(
             webhook_url=env_var_manager.discord_webhook_url
         )
+    else:
+        discord_poster = None
+
+    task_handler = TaskHandler(
+        adb_client=adb_client,
+        discord_poster=discord_poster
     )
     pdap_client = PDAPClient(
         access_manager=AccessManager(
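
The wiring change in `lifespan` can be isolated into a small standalone sketch. Everything named here is an assumption for illustration: `build_discord_poster` is a hypothetical helper, the `DiscordPoster` dataclass is a stub standing in for the real class, `DISCORD_WEBHOOK_URL` is a placeholder variable name, and the set of falsy strings is a simplification of what `environs` accepts.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass
class DiscordPoster:
    """Stub for illustration; not the real DiscordPoster class."""

    webhook_url: str


def build_discord_poster(environ=None) -> DiscordPoster | None:
    """Return a poster, or None when POST_TO_DISCORD_FLAG disables posting."""
    environ = os.environ if environ is None else environ
    # Assumption: the flag defaults to enabled when it is not set.
    raw = environ.get("POST_TO_DISCORD_FLAG", "1").strip().lower()
    if raw in {"0", "false", "no", "off"}:
        return None
    # In this sketch the webhook URL is only looked up when posting is on.
    return DiscordPoster(webhook_url=environ["DISCORD_WEBHOOK_URL"])
```

Returning `None` rather than a no-op object mirrors the PR's choice, which is why `TaskHandler` now accepts `DiscordPoster | None`.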
11 changes: 7 additions & 4 deletions src/core/tasks/handler.py
@@ -14,7 +14,7 @@
     def __init__(
             self,
             adb_client: AsyncDatabaseClient,
-            discord_poster: DiscordPoster
+            discord_poster: DiscordPoster | None
     ):
         self.adb_client = adb_client
         self.discord_poster = discord_poster
@@ -24,7 +24,10 @@
         self.logger.setLevel(logging.INFO)
 
 
-    async def post_to_discord(self, message: str):
+    async def post_to_discord(self, message: str) -> None:
+        if self.discord_poster is None:
+            print("Post to Discord disabled by POST_TO_DISCORD_FLAG")
+            return
         self.discord_poster.post_to_discord(message=message)
 
     async def initiate_task_in_db(self, task_type: TaskType) -> int: #

GitHub Actions / flake8 annotations on line 27 of src/core/tasks/handler.py:
./src/core/tasks/handler.py:27:1: D102 Missing docstring in public method (warning)
./src/core/tasks/handler.py:27:5: E303 too many blank lines (2) (failure)
@@ -50,9 +53,9 @@
                 task_id=run_info.task_id,
                 error=run_info.message
             )
-            msg: str = f"Task {run_info.task_id} ({run_info.task_type.value}) failed with error: {run_info.message}"
+            msg: str = f"Task {run_info.task_id} ({run_info.task_type.value}) failed with error: {run_info.message[:100]}..."
             print(msg)
-            self.discord_poster.post_to_discord(
+            await self.post_to_discord(
                 message=msg
             )
 
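
The guarded posting path in `handler.py` can be exercised without a real webhook, as in the sketch below. `RecordingPoster`, `Handler`, `report_failure`, and `truncate` are hypothetical stand-ins written for this example, not code from the repository. One deliberate difference: the PR's f-string appends `...` unconditionally, whereas `truncate` here adds it only when the message was actually cut.

```python
from __future__ import annotations

import asyncio


class RecordingPoster:
    """Hypothetical stand-in for DiscordPoster that records messages."""

    def __init__(self) -> None:
        self.messages: list[str] = []

    def post_to_discord(self, message: str) -> None:
        self.messages.append(message)


def truncate(text: str, limit: int = 100) -> str:
    """Shorten `text` to `limit` characters, adding "..." only if cut."""
    return text if len(text) <= limit else text[:limit] + "..."


class Handler:
    """Simplified stand-in for TaskHandler (illustration only)."""

    def __init__(self, discord_poster: RecordingPoster | None) -> None:
        self.discord_poster = discord_poster

    async def post_to_discord(self, message: str) -> None:
        # A missing poster means posting was disabled by the flag.
        if self.discord_poster is None:
            return
        self.discord_poster.post_to_discord(message=message)

    async def report_failure(self, task_id: int, error: str) -> None:
        # Route through the guarded method instead of calling the poster
        # directly, so a None poster cannot raise AttributeError.
        await self.post_to_discord(
            message=f"Task {task_id} failed with error: {truncate(error)}"
        )


if __name__ == "__main__":
    poster = RecordingPoster()
    asyncio.run(Handler(poster).report_failure(1, "x" * 150))
    asyncio.run(Handler(None).report_failure(2, "ignored"))
    print(len(poster.messages))
```

Routing every call through `post_to_discord` is what makes the `None` case safe; the last hunk of the PR makes the same change at the error-reporting call site.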
Empty file.
10 changes: 10 additions & 0 deletions tests/manual/external/discord/test_post.py
@@ -0,0 +1,10 @@
+from discord_poster import DiscordPoster
+from environs import Env
+
+def test_post_to_discord():
+    env = Env()
+    env.read_env()
+    dp = DiscordPoster(
+        webhook_url=env.str("PROD_DISCORD_WEBHOOK_URL")
+    )
+    dp.post_to_discord("Testing")

GitHub Actions / flake8 annotations for tests/manual/external/discord/test_post.py:
./tests/manual/external/discord/test_post.py:1:1: D100 Missing docstring in public module (warning)
./tests/manual/external/discord/test_post.py:4:1: D103 Missing docstring in public function (warning)
./tests/manual/external/discord/test_post.py:10:34: W292 no newline at end of file (warning)