[MS] Extend table support for wide tables #1552
…d add comprehensive test cases
…itdown into u/vilesyk/wide_tables
Thanks for the PR, @lesyk. Could you please clarify:
- How were the adaptive constants (0.70 percentile, [25,50] clamp, 10 cols/inch threshold) chosen? Were other values tested?
- Were the existing PDF tests run before and after this change to confirm no regressions?
- Why was the version number bumped?
Could you also update the PR description to include the commands needed to test your changes, and state that you have manually verified all changes, especially if any AI was used to write the code?
We have internal testing datasets that contain a variety of files. After a new dataset was added, we found that the old parsing process did not work on it, hence these changes.
I see no regressions on our internal datasets or in the tests I added previously.
I think I misunderstood versioning for beta channels. I will change to
I am following the repo's setup:
This pull request enhances the handling and extraction of complex tables from PDF files in the markitdown package. It increases the flexibility of the PDF table extraction logic to support documents with a larger number of columns, updates the package version, and adds comprehensive tests for new PDF scenarios. Additionally, it improves repository configuration for handling binary files.
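
For readers following the review question about the adaptive constants (0.70 percentile, [25, 50] clamp, 10 cols/inch threshold), here is a minimal sketch of how constants of this kind are often combined. It is not the PR's code: the thread does not say which quantity the percentile is taken over or how the threshold is compared, so the function names, the use of PDF points as the unit, and the strict `>` comparison are all assumptions made for illustration.

```python
import math
from typing import Sequence

# Values quoted in the review comment; what they apply to is assumed here.
PERCENTILE = 0.70
CLAMP_MIN, CLAMP_MAX = 25.0, 50.0
COLS_PER_INCH_THRESHOLD = 10.0
POINTS_PER_INCH = 72.0  # standard PDF user-space unit


def percentile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile of `values`, with q in [0, 1]."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def adaptive_value(measurements: Sequence[float]) -> float:
    """Hypothetical helper: 70th percentile of the measurements, clamped to [25, 50]."""
    return min(CLAMP_MAX, max(CLAMP_MIN, percentile(measurements, PERCENTILE)))


def is_wide_table(num_columns: int, page_width_pt: float) -> bool:
    """Hypothetical helper: True when column density exceeds the assumed threshold."""
    cols_per_inch = num_columns / (page_width_pt / POINTS_PER_INCH)
    return cols_per_inch > COLS_PER_INCH_THRESHOLD
```

The clamp keeps a derived value within fixed bounds when the measurements are unusually small or large, which is one reason a reviewer might ask whether other bounds were tested.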