-
Notifications
You must be signed in to change notification settings - Fork 112
feat: async opportunity parsing #3443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@claude review |
|
Claude finished @capJavert's task in 1m 40s —— View job Code ReviewTasks
Review SummaryThis PR implements async opportunity parsing using a worker, similar to the preview flow. The implementation is well-structured and follows existing patterns in the codebase. I've identified several issues and suggestions below. Critical Issues1. Missing Error State Reset (src/schema/opportunity.ts:2772)When triggering async parsing, the code doesn't clear any existing Recommendation: Clear the flags: updateFlagsStatement<OpportunityJob>({
file: fileData,
parseError: null, // Clear any previous parse errors
}),2. Inconsistent Organization Association LogicIn Impact: Anonymous users who complete registration won't have their parsed opportunity automatically associated with their organization even if they had previous opportunities. Recommendation: Extract userId from 3. Race Condition in State Validation (src/workers/opportunity/parseOpportunity.ts:28-30)The worker returns early if state is not Recommendation: Add logging to track when workers skip processing: if (opportunity.state !== OpportunityState.PARSING) {
logger.info({ opportunityId, state: opportunity.state }, 'Skipping: opportunity not in PARSING state');
return;
}Major Issues4. File Cleanup on Early ReturnsThe worker downloads the file from GCS (line 51-54) but only deletes it on successful completion (line 68). If the worker returns early (e.g., missing opportunity, wrong state, missing file data), the file remains in GCS indefinitely. Recommendation: Use try-finally to ensure cleanup, or track which cases need cleanup: let shouldCleanupFile = false;
try {
const fileData = opportunity.flags?.file;
if (fileData) {
shouldCleanupFile = true;
// ... download and process
}
// ... success path
shouldCleanupFile = false; // Already cleaned up
} finally {
if (shouldCleanupFile && fileData) {
await storage.bucket(fileData.bucketName).file(fileData.blobName).delete().catch(err =>
logger.warn({ err, blobName: fileData.blobName }, 'Failed to cleanup file')
);
}
}5. Missing Recruiter Assignment in Async FlowWhen Impact: Edge case where a user starts parsing anonymously, logs in before parsing completes, but doesn't get assigned as recruiter. 6. Potential Memory Issue with Large FilesThe worker loads the entire file into memory as a buffer (line 51-54) before passing to Brokkr. For large PDF files (e.g., 10MB+), this could cause memory pressure if multiple workers process large files concurrently. Recommendation: Consider adding file size validation or streaming if Brokkr supports it. Minor Issues / Suggestions7. Inconsistent MIME Type ValidationIn Recommendation: Add basic validation after download or rely on Brokkr's error handling. 8. Missing Performance MetricsThe worker tracks duration but only logs on success/failure. Consider adding intermediate timing for debugging slow operations:
9. Test Coverage GapThe test file
10. DocumentationThe AGENTS.md update mentions the async flow but could benefit from a sequence diagram showing:
Code QualityStrengths:
Areas for Improvement:
Security Considerations
Architecture AlignmentThe implementation correctly follows the codebase patterns:
VerdictOverall Assessment: Good implementation with solid architecture. The critical issues around error flag management and file cleanup should be addressed before merging. The worker pattern is correctly implemented and aligns well with existing code. Recommendation: Request changes for critical issues #1 and #4. Issues #2, #3, and #5 should be considered for follow-up improvements. |
|
🍹 The Update (preview) for dailydotdev/api/prod (at a72024c) was successful. Resource Changes Name Type Operation
~ vpc-native-update-tags-str-cron kubernetes:batch/v1:CronJob update
~ vpc-native-post-analytics-clickhouse-cron kubernetes:batch/v1:CronJob update
~ vpc-native-deployment kubernetes:apps/v1:Deployment update
- vpc-native-api-db-migration-f5819835 kubernetes:batch/v1:Job delete
~ vpc-native-post-analytics-history-day-clickhouse-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-trending-cron kubernetes:batch/v1:CronJob update
~ vpc-native-private-deployment kubernetes:apps/v1:Deployment update
- vpc-native-api-clickhouse-migration-f5819835 kubernetes:batch/v1:Job delete
+ api-sub-api.opportunity-parse gcp:pubsub/subscription:Subscription create
~ vpc-native-generic-referral-reminder-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-zombie-images-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-current-streak-cron kubernetes:batch/v1:CronJob update
+ vpc-native-api-db-migration-6e5d1b1d kubernetes:batch/v1:Job create
~ vpc-native-update-views-cron kubernetes:batch/v1:CronJob update
~ vpc-native-validate-active-users-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-gifted-plus-cron kubernetes:batch/v1:CronJob update
+ vpc-native-api-clickhouse-migration-6e5d1b1d kubernetes:batch/v1:Job create
~ vpc-native-personalized-digest-cron kubernetes:batch/v1:CronJob update
~ vpc-native-daily-digest-cron kubernetes:batch/v1:CronJob update
~ vpc-native-bg-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-clean-stale-user-transactions-cron kubernetes:batch/v1:CronJob update
~ vpc-native-ws-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-update-source-public-threshold-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-zombie-opportunities-cron kubernetes:batch/v1:CronJob update
~ vpc-native-check-analytics-report-cron kubernetes:batch/v1:CronJob update
~ vpc-native-hourly-notification-cron kubernetes:batch/v1:CronJob update
~ vpc-native-generate-search-invites-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-source-tag-view-cron kubernetes:batch/v1:CronJob update
~ vpc-native-temporal-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-user-profile-updated-sync-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-zombie-user-companies-cron kubernetes:batch/v1:CronJob update
~ vpc-native-personalized-digest-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-clean-zombie-users-cron kubernetes:batch/v1:CronJob update
~ vpc-native-sync-subscription-with-cio-cron kubernetes:batch/v1:CronJob update
~ vpc-native-calculate-top-readers-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-highlighted-views-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-tag-recommendations-cron kubernetes:batch/v1:CronJob update
|
Parse opportunity in the worker similar how we do preview. Mostly targeted for anon flow where we parse while user does registration so they instantly see their results.