Fix bug where only one chunk is generated for hwpx files #141
- When hwpx files are parsed, plain text is imported as paragraph items.
- Paragraph items are now included as comparison targets when matching against the TOC results.
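To illustrate the failure mode, here is a minimal sketch. It assumes that chunk boundaries are placed wherever a document item's text matches a TOC entry. `Label`, `Item`, and `chunk_by_toc` are hypothetical stand-ins, not docling's real classes or the project's actual chunking code. If PARAGRAPH items are excluded from the comparison, no TOC entry ever matches, so the whole document ends up in a single chunk:

```python
from dataclasses import dataclass
from enum import Enum


class Label(str, Enum):
    # Hypothetical stand-in for docling's DocItemLabel (only the values used here)
    TEXT = "text"
    LIST_ITEM = "list_item"
    PAGE_HEADER = "page_header"
    PARAGRAPH = "paragraph"


@dataclass
class Item:
    # Hypothetical stand-in for a parsed document item
    label: Label
    text: str


def chunk_by_toc(items, toc_titles, allowed_labels):
    """Start a new chunk at every item whose text matches a TOC title.

    Only items whose label is in `allowed_labels` are compared with the TOC.
    """
    chunks = [[]]
    for item in items:
        is_heading = item.label in allowed_labels and item.text.strip() in toc_titles
        if is_heading and chunks[-1]:
            chunks.append([])  # a matched TOC entry opens a new chunk
        chunks[-1].append(item.text)
    return chunks


# hwpx-like input: every line of plain text arrives labelled PARAGRAPH
items = [
    Item(Label.PARAGRAPH, "1. Introduction"),
    Item(Label.PARAGRAPH, "Body of section 1."),
    Item(Label.PARAGRAPH, "2. Method"),
    Item(Label.PARAGRAPH, "Body of section 2."),
]
toc = {"1. Introduction", "2. Method"}

old_labels = {Label.TEXT, Label.LIST_ITEM, Label.PAGE_HEADER}
new_labels = old_labels | {Label.PARAGRAPH}

print(len(chunk_by_toc(items, toc, old_labels)))  # 1
print(len(chunk_by_toc(items, toc, new_labels)))  # 2
```

With the old label set, every item fails the label check, so nothing matches the TOC and one chunk is produced. Adding PARAGRAPH restores one chunk per TOC entry.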
Summary of Changes
This pull request fixes a bug where only a single chunk was generated when processing HWPX documents. This is because, during parsing, plain text …
Code Review
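I reviewed the change that includes the PARAGRAPH label when parsing hwpx files so that TOC matching works correctly. Adding the function name to log messages for easier debugging is also a good improvement. Overall, this change improves the correctness and maintainability of the code. I've left one suggestion for improving readability as a review comment.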
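The review does not show how the function name was added to the log messages. If it is written into each message by hand, Python's standard `logging` module can supply it automatically through the `%(funcName)s` format attribute. A minimal sketch, in which the logger name `enrichment_demo` and the function `match_toc` are hypothetical:

```python
import io
import logging

# Capture output in memory so the formatted record can be inspected
stream = io.StringIO()
handler = logging.StreamHandler(stream)
# %(funcName)s is filled in from the function that emitted the record
handler.setFormatter(logging.Formatter("%(funcName)s: %(message)s"))

logger = logging.getLogger("enrichment_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger


def match_toc():
    # hypothetical function standing in for the enrichment code
    logger.info("matched 2 TOC entries")


match_toc()
print(stream.getvalue().strip())  # match_toc: matched 2 TOC entries
```

This keeps message strings free of hardcoded names, so renaming a function cannot leave a stale name in the logs.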
docling/utils/document_enrichment.py
Outdated
```diff
 if (isinstance(item, TextItem) or isinstance(item, ListItem))
-    and (item.label == DocItemLabel.TEXT or item.label == DocItemLabel.LIST_ITEM or item.label == DocItemLabel.PAGE_HEADER)
+    and (item.label in [DocItemLabel.TEXT, DocItemLabel.LIST_ITEM, DocItemLabel.PAGE_HEADER, DocItemLabel.PARAGRAPH])
```
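The diff replaces a chain of `==` comparisons with a membership test and adds PARAGRAPH. The sketch below checks the old and new label conditions against every label. It uses a hypothetical `DocLabel` enum as a stand-in for a subset of docling's `DocItemLabel`, and it confirms that the rewrite changes behavior only for PARAGRAPH:

```python
from enum import Enum


class DocLabel(str, Enum):
    # Hypothetical subset of docling's DocItemLabel values
    TEXT = "text"
    LIST_ITEM = "list_item"
    PAGE_HEADER = "page_header"
    PARAGRAPH = "paragraph"
    SECTION_HEADER = "section_header"


def old_label_ok(label):
    # Condition before the fix: chained == comparisons
    return label == DocLabel.TEXT or label == DocLabel.LIST_ITEM or label == DocLabel.PAGE_HEADER


def new_label_ok(label):
    # Condition after the fix: membership test with PARAGRAPH added
    return label in [DocLabel.TEXT, DocLabel.LIST_ITEM, DocLabel.PAGE_HEADER, DocLabel.PARAGRAPH]


# Labels whose acceptance differs between the two conditions
changed = [label for label in DocLabel if old_label_ok(label) != new_label_ok(label)]
print([label.value for label in changed])  # ['paragraph']
```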
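To improve readability, I suggest simplifying the `isinstance` condition. Passing a tuple of types lets the separate `isinstance` calls be merged into one, which makes the code more concise.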
```diff
-if (isinstance(item, TextItem) or isinstance(item, ListItem))
-    and (item.label in [DocItemLabel.TEXT, DocItemLabel.LIST_ITEM, DocItemLabel.PAGE_HEADER, DocItemLabel.PARAGRAPH])
+if isinstance(item, (TextItem, ListItem))
+    and item.label in [DocItemLabel.TEXT, DocItemLabel.LIST_ITEM, DocItemLabel.PAGE_HEADER, DocItemLabel.PARAGRAPH]
```
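The suggestion relies on the built-in `isinstance` accepting a tuple of types and returning `True` if the object is an instance of any of them. A small check with hypothetical stand-in classes (not docling's real `TextItem`/`ListItem`) shows that the two forms agree:

```python
class TextItem:  # hypothetical stand-ins for docling's item classes
    pass


class ListItem:
    pass


class TableItem:
    pass


def verbose(item):
    # Original form: two separate isinstance calls joined with `or`
    return isinstance(item, TextItem) or isinstance(item, ListItem)


def concise(item):
    # Suggested form: isinstance with a tuple matches if any type matches
    return isinstance(item, (TextItem, ListItem))


samples = (TextItem(), ListItem(), TableItem(), "plain string")
for sample in samples:
    assert verbose(sample) == concise(sample)
```

Because the two forms are equivalent, the suggestion is purely a readability change and does not alter which items reach the label check.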