5 Directions for improving data catalog descriptions #29
Merged
…th lineage - Generates table metadata including table name, description, and column info - Adds downstream and upstream lineage with degree filtering and optional sorting - Includes column-level upstream lineage for fine-grained analysis
Closed
9 tasks
Collaborator
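💬 Wow! I had only vaguely thought "it would be nice to make good use of this data," but the potential here really is endless. Thank you for organizing it so well! I'll test it this afternoon and follow up with a further review!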
ehddnr301
approved these changes
Mar 23, 2025
Contributor
Author
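👍 The point you raised bothered me a little as well: it seemed odd that downstream and upstream use different baselines for degree. Downstream appears to count degree starting from the table itself, whereas upstream appears to start counting from the directly connected table.
💬 Thank you for the great feedback!! As you said, it would be even better if the agent could use the lineage information at the right time and in the right place!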
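To make the degree mismatch described above concrete, here is a minimal sketch of one way to align the two conventions. It is not code from this PR: the helper name `align_degrees` and the `self_degree` parameter are hypothetical, and the assumption that downstream results include the table itself at some base degree comes only from the observation in the comment.

```python
from typing import Dict, List, Union

Entry = Dict[str, Union[str, int]]


def align_degrees(entries: List[Entry], self_table: str, self_degree: int = 0) -> List[Entry]:
    """Drop the table's own entry and re-base degrees so direct neighbours get degree 1.

    `self_degree` is the degree the source assigns to the table itself
    (an assumption; pass 1 if the source starts counting at 1).
    """
    aligned: List[Entry] = []
    for entry in entries:
        if entry["table"] == self_table:
            continue  # the table itself is not part of its own lineage
        aligned.append({"table": entry["table"], "degree": int(entry["degree"]) - self_degree})
    return aligned


downstream = [
    {"table": "orders", "degree": 1},       # the table itself
    {"table": "daily_sales", "degree": 2},  # directly connected
]
aligned = align_degrees(downstream, self_table="orders", self_degree=1)
```

After this step, a directly connected table has degree 1 in both directions, so a single degree filter means the same thing for upstream and downstream.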
ParkGyeongTae
approved these changes
Apr 7, 2025
Contributor
ParkGyeongTae
left a comment
👍 Ooh.. this feels like it is getting richer! In an environment where lineage is available, this should be very usable!!
Contributor
Author
I've made the changes!

#️⃣ Issue Number
📝 Summary
Added functions to datahub_source.py that extract DataHub lineage information.
- get_table_lineage: fetches the DOWNSTREAM/UPSTREAM lineage of a URN
- get_column_lineage: fetches the column-level lineage of a URN's UPSTREAM tables
- min_degree_lineage: keeps only the minimum degree out of the many lineage entries
- build_table_metadata: builds table metadata from the table name, description, columns, and lineage
Added a get_metadata_from_db() function to tools.py that generates metadata for all tables.
💬 To Reviewers
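As a rough illustration of the idea behind min_degree_lineage from the summary (keeping only the smallest degree when the same table is reachable along several paths), here is a minimal sketch. It is not the PR's implementation: the function name `keep_min_degree`, its input shape (pairs of table name and degree), and the sort order are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Tuple, Union


def keep_min_degree(pairs: Iterable[Tuple[str, int]]) -> List[Dict[str, Union[str, int]]]:
    """Collapse repeated tables to their smallest degree.

    `pairs` is an assumed input shape: (table name, degree) tuples,
    where one table may appear several times at different degrees.
    """
    best: Dict[str, int] = {}
    for table, degree in pairs:
        if table not in best or degree < best[table]:
            best[table] = degree
    # Sort by degree, then by name, so the closest tables come first.
    ordered = sorted(best.items(), key=lambda item: (item[1], item[0]))
    return [{"table": table, "degree": degree} for table, degree in ordered]


result = keep_min_degree([("orders", 2), ("users", 1), ("orders", 1), ("events", 3)])
```

The output uses the same `{"table": ..., "degree": ...}` shape as the `downstream` and `upstream` lists in the metadata structure.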
📂 Metadata Structure
The return structure of get_metadata_from_db() is as follows:

    [
      {
        "table_name": str,
        "description": str,
        "columns": [
          { "column_name": str, "column_description": str },
          ...
        ],
        "lineage": {
          "downstream": [{"table": str, "degree": int}, ...],
          "upstream": [{"table": str, "degree": int}, ...],
          "upstream_columns": [
            {
              "upstream_dataset": str,
              "columns": [
                {"upstream_column": str, "downstream_column": str, "confidence": float}
              ]
            },
            ...
          ]
        }
      },
      ...
    ]

📖 Metadata Terminology
The main keys used in the metadata structure and their meanings are listed below:
- lineage
- downstream
- upstream
- degree
- upstream_columns
- upstream_dataset
- upstream_column
- downstream_column
- confidence
PR Checklist
datahub_source.py)
tools.py)
reference) How to Code Review
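For reviewers who want a feel for how the structure returned by get_metadata_from_db() (see Metadata Structure above) might be consumed, here is a minimal sketch that turns one entry into a short text description with nearby lineage. The function name `summarize_table`, the `max_degree` parameter, and the sample data are all hypothetical and are not part of this PR.

```python
from typing import Any, Dict, List


def summarize_table(meta: Dict[str, Any], max_degree: int = 1) -> str:
    """Render one metadata entry as plain text, keeping lineage up to `max_degree`."""
    lines: List[str] = [f"{meta['table_name']}: {meta['description']}"]
    for col in meta.get("columns", []):
        lines.append(f"  - {col['column_name']}: {col['column_description']}")
    lineage = meta.get("lineage", {})
    for direction in ("upstream", "downstream"):
        tables = [e["table"] for e in lineage.get(direction, []) if e["degree"] <= max_degree]
        if tables:
            lines.append(f"  {direction}: {', '.join(tables)}")
    return "\n".join(lines)


# Made-up sample entry that follows the documented structure.
sample = {
    "table_name": "orders",
    "description": "Order facts",
    "columns": [{"column_name": "id", "column_description": "Order id"}],
    "lineage": {
        "downstream": [{"table": "daily_sales", "degree": 1}, {"table": "report", "degree": 2}],
        "upstream": [{"table": "raw_orders", "degree": 1}],
        "upstream_columns": [],
    },
}
summary = summarize_table(sample)
```

Filtering by degree keeps the text short, which may matter when the description is passed to an agent as context.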