feat: Uptime Monitoring Alarm Integration (#268)#351
feat: Uptime Monitoring Alarm Integration (#268)#351mendarb wants to merge 5 commits intodatabuddy-analytics:mainfrom
Conversation
|
@mendarb is attempting to deploy a commit to the Databuddy OSS Team on Vercel. A member of the Team first needs to authorize it. |
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR integrates the uptime monitoring system with the existing alarms framework (#267), adding automatic "Site Down" / "Site Recovered" notifications after configurable consecutive failure thresholds, a The overall architecture is sound — non-blocking alarm checks, per-channel notification dispatch, in-memory deduplication state — but there are several correctness issues that need to be addressed before this ships:
Confidence Score: 1/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant QStash
participant UptimeService as Uptime Service (index.ts)
participant AlarmTrigger as alarm-trigger.ts
participant DB as Database
participant Notifier as @databuddy/notifications
QStash->>UptimeService: POST / (upstash-signature, x-schedule-id)
UptimeService->>UptimeService: verify signature
UptimeService->>DB: lookupSchedule(scheduleId)
DB-->>UptimeService: schedule data
UptimeService->>UptimeService: checkUptime(url)
UptimeService->>UptimeService: sendUptimeEvent(result)
UptimeService-->>QStash: 200 OK (non-blocking below)
UptimeService--)AlarmTrigger: checkAndTriggerAlarms(scheduleId, result) [fire & forget]
AlarmTrigger->>DB: query uptimeSchedules WHERE id = scheduleId
DB-->>AlarmTrigger: schedule (orgId, websiteId)
AlarmTrigger->>DB: query alarms WHERE org + triggerType=uptime [+ websiteId?]
DB-->>AlarmTrigger: matching alarms
AlarmTrigger->>AlarmTrigger: check monitorState Map (consecutiveFailures, lastNotifiedStatus)
alt threshold reached AND not already DOWN
AlarmTrigger->>Notifier: sendSlackWebhook / sendDiscordWebhook / sendWebhook
Notifier-->>AlarmTrigger: ok
AlarmTrigger->>AlarmTrigger: set lastNotifiedStatus = DOWN
else site recovered (isDown=false, wasDown=true)
AlarmTrigger->>Notifier: sendSlackWebhook / sendDiscordWebhook / sendWebhook
Notifier-->>AlarmTrigger: ok
AlarmTrigger->>AlarmTrigger: set lastNotifiedStatus = UP
end
Last reviewed commit: "feat: integrate upti..." |
apps/uptime/src/lib/alarm-trigger.ts
Outdated
| const conditions = [ | ||
| eq(alarms.enabled, true), | ||
| eq(alarms.triggerType, "uptime"), | ||
| eq(alarms.organizationId, organizationId), | ||
| ]; | ||
|
|
||
| if (websiteId) { | ||
| conditions.push(eq(alarms.websiteId, websiteId)); | ||
| } | ||
|
|
||
| const matchingAlarms = await db.query.alarms.findMany({ | ||
| where: and(...conditions), | ||
| }); |
There was a problem hiding this comment.
Alarm filter misses website-specific and org-level alarms
The query logic for fetching matching alarms has two symmetric bugs:
-
When the schedule has a
websiteId, the filtereq(alarms.websiteId, websiteId)is added — which correctly restricts to website-specific alarms, but excludes org-level alarms (wherealarms.websiteId IS NULL). An alarm created at the organization level without a specific website will never fire for any monitor. -
When the schedule has no
websiteId(just anorganizationId), no website filter is applied — so the query returns all org uptime alarms, including those tied to specific websites. An alarm linked to "website-A" will fire for a monitor that has no website association at all.
The intended semantics should be:
- A schedule with
websiteId: match alarms wherewebsiteId = schedule.websiteIdORwebsiteId IS NULL(org-level alarms) - A schedule without
websiteId: match only alarms wherewebsiteId IS NULL
// Correct approach using isNull from drizzle-orm
import { and, eq, isNull, or } from "@databuddy/db";
const websiteCondition = websiteId
? or(eq(alarms.websiteId, websiteId), isNull(alarms.websiteId))
: isNull(alarms.websiteId);
const conditions = [
eq(alarms.enabled, true),
eq(alarms.triggerType, "uptime"),
eq(alarms.organizationId, organizationId),
websiteCondition,
];| notificationChannels: string[]; | ||
| triggerType: string; | ||
| } | ||
|
|
||
| function MonitorAlarms({ | ||
| organizationId, | ||
| websiteId, | ||
| }: { |
There was a problem hiding this comment.
websiteId prop received but not used in query
The MonitorAlarms component accepts a websiteId prop but never passes it to the alarms.list query:
const { data: alarmsList } = useQuery({
...orpc.alarms.list.queryOptions({
input: {
organizationId,
triggerType: "uptime",
// websiteId is missing here
},
}),
});As a result, every monitor detail page shows all uptime alarms for the entire organization, not just those relevant to that specific monitor's website. This is confusing UX — a user visiting monitor A will see alarms configured for monitor B.
Pass websiteId through to the query (and to the invalidation key in onSuccess):
| notificationChannels: string[]; | |
| triggerType: string; | |
| } | |
| function MonitorAlarms({ | |
| organizationId, | |
| websiteId, | |
| }: { | |
| const { data: alarmsList } = useQuery({ | |
| ...orpc.alarms.list.queryOptions({ | |
| input: { | |
| organizationId, | |
| websiteId: websiteId ?? undefined, | |
| triggerType: "uptime", | |
| }, | |
| }), | |
| }); |
| for (const channel of channels) { | ||
| try { | ||
| if (channel === "slack" && alarm.slackWebhookUrl) { | ||
| await sendSlackWebhook(alarm.slackWebhookUrl, payload); | ||
| } else if (channel === "discord" && alarm.discordWebhookUrl) { | ||
| await sendDiscordWebhook(alarm.discordWebhookUrl, payload); | ||
| } else if (channel === "webhook" && alarm.webhookUrl) { | ||
| await sendWebhook(alarm.webhookUrl, payload, { | ||
| headers: | ||
| (alarm.webhookHeaders as Record<string, string>) ?? undefined, | ||
| }); | ||
| } | ||
| } catch (error) { | ||
| captureError(error, { | ||
| type: "alarm_notification_error", | ||
| alarmId: alarm.id, | ||
| channel, | ||
| }); | ||
| } | ||
| } |
There was a problem hiding this comment.
Email channel silently ignored — no notification sent
sendAlarmNotifications iterates over alarm.notificationChannels but only handles slack, discord, and webhook. When a user selects the email channel (which is a fully supported option in the UI and schema), the channel loop falls through without sending anything and without capturing an error or warning.
This creates a silent failure: users who configure email-only alarms will never receive notifications. The test endpoint in alarms.ts explicitly returns an error for email ("Email notifications are not yet configured"), so at least the test path is honest about this — but the live alarm path is not.
Consider adding an explicit captureError or log for unhandled channels so failures surface in observability:
} else {
captureError(new Error(`Unsupported notification channel: ${channel}`), {
type: "alarm_notification_error",
alarmId: alarm.id,
channel,
});
}
packages/rpc/src/routers/alarms.ts
Outdated
| .object({ | ||
| organizationId: z.string().optional(), | ||
| websiteId: z.string().optional(), | ||
| triggerType: z.string().optional(), |
There was a problem hiding this comment.
triggerType filter accepts any string without enum validation
The list input schema accepts triggerType: z.string().optional(), yet the value is cast directly to the DB enum type:
eq(
alarms.triggerType,
input.triggerType as | "uptime" | "traffic_spike" | ...
)A caller passing an arbitrary string (e.g., "uptime; DROP TABLE alarms") would reach the DB layer — Drizzle will safely parameterize it, so there's no SQL injection, but the semantic intent is broken and the type cast hides that the schema is insufficiently constrained.
Use the existing triggerTypeSchema for this field:
| triggerType: z.string().optional(), | |
| triggerType: triggerTypeSchema.optional(), |
| import { describe, expect, it } from "bun:test"; | ||
| import { MonitorStatus } from "../types"; | ||
|
|
||
| describe("alarm-trigger", () => { | ||
| describe("getConsecutiveFailureThreshold", () => { | ||
| // We test the threshold logic inline since the function is not exported | ||
| // but we verify the behavior through the module's contract | ||
|
|
||
| it("should default to 3 consecutive failures when no trigger conditions", () => { | ||
| const conditions = null; |
There was a problem hiding this comment.
Tests validate reimplemented logic, not the actual implementation
The tests in this file duplicate the internal logic of alarm-trigger.ts rather than importing and exercising the real module. For example, getThreshold in the test file is a copy of the private getConsecutiveFailureThreshold, and the "notification deduplication" tests manually increment a counter variable.
This gives false confidence — if the production function had a bug (e.g., using >= instead of === for the threshold check), all these tests would still pass. The key path (checkAndTriggerAlarms) is not tested at all.
Consider exporting getConsecutiveFailureThreshold or testing checkAndTriggerAlarms via dependency injection / mocking the DB and notification functions, so the tests actually cover the real code paths.
| const [name, setName] = useState(alarm?.name ?? ""); | ||
| const [description, setDescription] = useState(alarm?.description ?? ""); | ||
| const [enabled, setEnabled] = useState(alarm?.enabled ?? true); | ||
| const [triggerType, setTriggerType] = useState<TriggerType>( | ||
| (alarm?.triggerType as TriggerType) ?? "uptime" | ||
| ); | ||
| const [channels, setChannels] = useState<Channel[]>( | ||
| (alarm?.notificationChannels as Channel[]) ?? [] | ||
| ); | ||
| const [slackUrl, setSlackUrl] = useState(alarm?.slackWebhookUrl ?? ""); | ||
| const [discordUrl, setDiscordUrl] = useState(alarm?.discordWebhookUrl ?? ""); | ||
| const [emailAddresses, setEmailAddresses] = useState( | ||
| alarm?.emailAddresses?.join(", ") ?? "" | ||
| ); | ||
| const [webhookUrl, setWebhookUrl] = useState(alarm?.webhookUrl ?? ""); |
There was a problem hiding this comment.
Form state not reset when
alarm prop changes
All form fields are initialized via useState(alarm?.field ?? default). In React, useState only uses the initial value on the first render — subsequent changes to the alarm prop do not reset the form state. If this dialog is reused across different alarms (open alarm A → close → open alarm B), the user will see stale data from alarm A in the form.
The standard fix is to pass a key prop at the call site so React fully remounts the dialog for each alarm instance. This is simpler and more reliable than a useEffect that manually resets every field.
- Add @databuddy/notifications to uptime package.json (was imported but not declared as a dependency, causing test and build failures) - Rewrite alarm-trigger tests to import and exercise the real exported getConsecutiveFailureThreshold function with broader input coverage - Remove misleading inline logic tests that duplicated implementation details instead of testing actual code paths - Document integration test needs for checkAndTriggerAlarms Addresses Greptile review feedback on PR databuddy-analytics#351. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Addressed the Greptile review feedback: AlarmDialog stale state — Added The other issues flagged by Greptile were already fixed in previous commits:
|
…board UI Implements the complete alarms system (databuddy-analytics#267): - Database: alarms table with trigger types, notification channels, and proper indexes - API: CRUD endpoints (list, get, create, update, delete, test) via ORPC - Dashboard: notifications settings page with alarm management UI - Tests: validation schema tests for create/update operations - Integrates with @databuddy/notifications package for Slack, Discord, and webhook delivery Closes databuddy-analytics#267 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add alarm trigger system that sends notifications when monitors go down or recover. Includes consecutive failure tracking, deduplication to prevent notification spam, and support for Slack, Discord, and webhook channels. - Add alarm-trigger helper with checkAndTriggerAlarms() - Integrate trigger into uptime service (non-blocking) - Add MonitorAlarms component to monitor detail page - Enhance alarms list endpoint with websiteId/triggerType filters - Add tests for threshold logic and notification deduplication Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix critical alarm query bug: use OR(websiteId match, websiteId IS NULL) for schedules with a websiteId, and IS NULL filter for schedules without, so org-level alarms fire correctly and cross-monitor false alarms are prevented - Pass websiteId to alarms.list query in MonitorAlarms component so each monitor detail page only shows relevant alarms - Log unsupported notification channels (e.g. email) via captureError instead of silently ignoring them - Use triggerTypeSchema enum instead of z.string() for the list endpoint triggerType filter, removing the unsafe type cast - Export getConsecutiveFailureThreshold and import it in tests instead of duplicating the logic in a test helper - Add key prop to AlarmDialog to force remount when switching between alarms, preventing stale form state Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add @databuddy/notifications to uptime package.json (was imported but not declared as a dependency, causing test and build failures) - Rewrite alarm-trigger tests to import and exercise the real exported getConsecutiveFailureThreshold function with broader input coverage - Remove misleading inline logic tests that duplicated implementation details instead of testing actual code paths - Document integration test needs for checkAndTriggerAlarms Addresses Greptile review feedback on PR databuddy-analytics#351. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents stale data from being displayed when editing different alarms in the same session. The useState initializers only run once on mount, so a useEffect is needed to sync form fields with the alarm prop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
3ad9506 to
5cf500d
Compare
Summary
Implements uptime monitoring alarm integration as described in #268. This builds on the alarms system from #267 to automatically trigger notifications when uptime monitors detect outages or recoveries.
Changes
apps/uptime/src/lib/alarm-trigger.ts): Core logic that checks uptime results against configured alarms. Tracks consecutive failures per monitor in memory to prevent duplicate notifications. Sends "Site Down" alerts when the failure threshold is reached and "Site Recovered" alerts when the site comes back online.apps/uptime/src/index.ts): CallscheckAndTriggerAlarms()after each uptime check completes. Runs non-blocking to avoid impacting uptime check response times.apps/dashboard/app/(main)/monitors/[id]/page.tsx): AddedMonitorAlarmscomponent showing uptime alarms with enable/disable toggles directly on the monitor detail page. Links to settings when no alarms are configured.packages/rpc/src/routers/alarms.ts): AddedwebsiteIdandtriggerTypefilter parameters to the list endpoint for targeted alarm queries.apps/uptime/src/lib/alarm-trigger.test.ts): Tests for threshold logic, consecutive failure tracking, notification deduplication, and recovery detection.Notification Channels
Supports all channels from the alarms system:
Key Design Decisions
.catch()to avoid impacting uptime check latency.triggerConditions.consecutiveFailures.Depends on #267 (Alarms System).
Closes #268
Test Plan
bun test apps/uptime/src/lib/alarm-trigger.test.ts🤖 Generated with Claude Code