Complete ETL Pipeline (Advanced Level) #12
Open
nihal16000 wants to merge 1 commit into
Implemented live OpenAlex API integration, passing a mailto parameter so requests go through the polite pool.
Added cursor-based pagination to retrieve large document sets page by page, avoiding memory overflow.
Added requests retry logic with exponential backoff to handle transient HTTP 429 and 500 responses.
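The extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it uses only the standard library instead of requests so that it stays dependency-free, and the names `fetch_page`, `with_backoff`, and `iter_works`, the page size, and the retry limits are all hypothetical. It assumes the OpenAlex works endpoint accepts `cursor=*` for the first page and returns the next cursor under `meta.next_cursor`.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

OPENALEX_WORKS_URL = "https://api.openalex.org/works"
RETRYABLE_STATUS = {429, 500}  # the transient statuses named in this PR


def fetch_page(query, cursor, mailto, per_page=200):
    """Fetch one page of works; `mailto` opts the request into the polite pool."""
    params = urllib.parse.urlencode(
        {"search": query, "per-page": per_page, "cursor": cursor, "mailto": mailto}
    )
    with urllib.request.urlopen(f"{OPENALEX_WORKS_URL}?{params}", timeout=30) as resp:
        return json.load(resp)


def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Run `call`, retrying retryable HTTP errors after 1s, 2s, 4s, ... delays."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # Give up on non-transient errors or once the retry budget is spent.
            if err.code not in RETRYABLE_STATUS or attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)


def iter_works(query, mailto, fetch=fetch_page, sleep=time.sleep):
    """Yield works one at a time, following `meta.next_cursor` until it is empty."""
    cursor = "*"  # OpenAlex starts cursor paging with "*"
    while cursor:
        page = with_backoff(lambda: fetch(query, cursor, mailto), sleep=sleep)
        yield from page.get("results", [])
        cursor = page.get("meta", {}).get("next_cursor")
```

Because `iter_works` is a generator, a caller can stream results into a DataFrame builder or stop early without holding every page in memory at once. The `fetch` and `sleep` parameters exist only so the loop can be exercised without network access.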
Mapped deeply nested OpenAlex JSON payloads into the strict, flat Web of Science (WoS) schema required by native Bibliometrix functions.
Reconstructed OpenAlex's inverted abstract indices into readable strings for text mining.
Computed the primary-key Short Reference (SR) column dynamically.
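For the abstract reconstruction mentioned above, a minimal sketch is shown here. It assumes the inverted index has the shape OpenAlex documents for `abstract_inverted_index` (each word mapped to the list of positions where it occurs); the function name `rebuild_abstract` is hypothetical and is not necessarily what the standardizer in this PR uses.

```python
def rebuild_abstract(inverted_index):
    """Turn {word: [positions]} back into a plain-text abstract."""
    if not inverted_index:
        return ""  # works without an abstract carry no index
    # Flatten to (position, word) pairs, then order by position.
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    return " ".join(word for _, word in sorted(positioned))


example = {"Deep": [0], "learning": [1, 4], "for": [2], "deep": [3]}
print(rebuild_abstract(example))  # Deep learning for deep learning
```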
Implemented a strict validation layer using pandera.
Enforced DataFrame type contracts so that downstream graphing functions do not crash on malformed data types (e.g., verifying that multi-value fields are true Python lists rather than strings).
Bug Fixes to Native Code:
Annual Scientific Production Crash: Identified a TypeError in the native graphing functions caused by string-based publication years. Patched this in the standardizer by casting the PY column with pd.to_numeric before loading it into the reactive state, which unblocks the graphical analysis tabs.
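The cast described above can be sketched as follows. The PR states only that `pd.to_numeric` is applied to `PY`; the `errors="coerce"` argument, the nullable `Int64` dtype, and the function name `coerce_publication_year` are assumptions made for this illustration.

```python
import pandas as pd


def coerce_publication_year(df):
    """Return a copy of `df` whose PY column is numeric instead of string."""
    out = df.copy()
    # Unparseable years become missing values instead of raising (assumption).
    years = pd.to_numeric(out["PY"], errors="coerce")
    # Nullable integers keep years as 2020 rather than 2020.0.
    out["PY"] = years.astype("Int64")
    return out
```

With string years removed before the DataFrame reaches the reactive state, year arithmetic and grouping in the plotting code operate on integers.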
UI Integration:
Overhauled the app.py frontend to include a fully reactive "API" tab.
Bound the standardized DataFrame directly to the Shiny reactive state (df.set), allowing users to search, preview, and instantly generate analytical visuals without ever leaving the interface.
This pipeline operates cleanly within an isolated services/ directory and does not overwrite or destroy any existing native functions in lib/ or functions/.
GROUP MEMBERS
Name - Nihal Nawaz Kaleem Nawaz
Matricola - D03000283
Name - Hunain Raza
Matricola - D03000256
Name - Parth Kumar Rai
Matricola - D03000255