Commit cceee49

Merge pull request #7 from CoreFiling/feature/fix-tab-characters
Map tab characters to spaces rather than private use area
2 parents f87d55e + e6a7a92 commit cceee49

5 files changed: +31 -3 lines changed

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 * Add optional overrides for command-line arguments passed to `pdf2htmlEX`.
 * Patch and build `pdf2htmlEX` as part of this build process to use `libopenjp` instead of `libjpeg` for JPEG-2000 support.
 * All patches are in this source tree, and are applied to directly to the source of the upstream tag during build.
-* Patch issue with non-breaking spaces in `pdf2HTMLEX`.
+* Patch issue with non-breaking spaces and tab characters in `pdf2HTMLEX`.
 * Convert complex SVGs images to bitmaps.
 
 ## 0.1.0

src/Pdf2Html/Dockerfile

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ RUN patch ./buildScripts/versionEnvs ./patches/versionEnvs.patch
 RUN patch ./buildScripts/buildPoppler ./patches/buildPoppler.patch
 RUN patch ./buildScripts/getBuildToolsApt ./patches/getBuildToolsApt.patch
 RUN patch ./buildScripts/getDevLibrariesApt ./patches/getDevLibrariesApt.patch
+RUN patch ./pdf2htmlEX/src/util/unicode.cc ./patches/unicode.cc.patch
 RUN patch ./pdf2htmlEX/src/util/unicode.h ./patches/unicode.h.patch
 RUN patch ./pdf2htmlEX/CMakeLists.txt ./patches/CMakeLists.patch

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+@@ -47,6 +47,8 @@ Unicode unicode_from_font (CharCode code, GfxFont * font)
+         if(cname)
+         {
+             Unicode ou = globalParams->mapNameToUnicodeText(cname);
++            if(ou == '\t')
++                return ' ';
+             if(!is_illegal_unicode(ou))
+                 return ou;
+         }
+@@ -62,6 +64,8 @@ Unicode check_unicode(Unicode const * u, int len, CharCode code, GfxFont * font)
+ 
+     if(len == 1)
+     {
++        if(*u == '\t')
++            return ' ';
+         if(!is_illegal_unicode(*u))
+             return *u;
+     }
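
The effect of the two added checks can be sketched in isolation. The snippet below is a simplified illustration, not the pdf2htmlEX source: `Unicode` is a stand-in typedef, and `is_illegal_unicode_sketch`, `map_to_private_sketch`, and `check_unicode_sketch` are hypothetical helpers. The Private Use Area base of U+E000 and the "control characters are illegal" rule are assumptions made for the example, based only on the commit title ("rather than private use area").

```cpp
#include <cstdint>

// Assumption: stand-in for the Unicode code point type used in the patch.
using Unicode = std::uint32_t;

// Hypothetical, simplified legality check: only C0 control characters
// (below U+0020) are treated as illegal in this sketch.
bool is_illegal_unicode_sketch(Unicode c) {
    return c < 0x20;
}

// Hypothetical fallback: place the character code in the Private Use Area.
// The U+E000 base is an assumption for illustration.
Unicode map_to_private_sketch(Unicode code) {
    return 0xE000 + code;
}

// Mirrors the order of checks in the patched single-character path:
// tab is rewritten to a space before the legality test can reject it.
Unicode check_unicode_sketch(Unicode u, Unicode code) {
    if (u == 0x09)                       // '\t'
        return 0x20;                     // ' '
    if (!is_illegal_unicode_sketch(u))
        return u;
    return map_to_private_sketch(code);
}
```

The ordering is the point: without the early return, a tab would fail the legality test and fall through to the private-use fallback, so it would no longer behave as whitespace in the generated HTML.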

src/Pdf2Html/pdf2htmlEX/patches/unicode.h.patch

Lines changed: 9 additions & 0 deletions
@@ -1,3 +1,12 @@
+@@ -27,7 +27,7 @@ namespace pdf2htmlEX {
+  * 00(NUL)--09(\t)--0A(\n)--0D(\r)--20(SP)--7F(DEL)--9F(APC)--A0(NBSP)--AD(SHY)--061C(ALM)--1361(Ethiopic word space)
+  * webkit: [--------------------------------) [------------------) [-]
+  * moz: [--------------------------------) [---------] [-]
+- * p2h: [--------------------------------) [------------------] [-] [-] [-]
++ * p2h: [--------------------------------) [------------------) [-] [-] [-]
+  *
+  * 200B(ZWSP)--200C(ZWNJ)--200D(ZWJ)--200E(LRM)--200F(RLM)--2028(LSEP)--2029(PSEP)--202A(LRE)--202E(RL0)--2066(LRI)--2069(PDI)
+  * webkit: [-----------------------------------------------] [----------]
 @@ -39,9 +39,6 @@ namespace pdf2htmlEX {
   * moz:
   * p2h: [------------------] [-] [-] [-----------------]
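
The one-character change in the `p2h` row, from `]` to `)`, reads as turning a closed interval into a half-open one. The sketch below encodes one plausible reading of that row; the column alignment of the original comment is lost here, so the mapping of brackets to code points (`[00, 20)`, `[7F, A0)` or `[7F, A0]`, then `AD`, `061C`, `1361`) is an assumption, and `in_first_row_ranges` is a hypothetical name rather than a function from pdf2htmlEX.

```cpp
#include <cstdint>

// Assumption: stand-in for the Unicode code point type.
using Unicode = std::uint32_t;

// Hypothetical range test for the first "p2h" row of the comment table.
// a0_inclusive = true models the old row ("[7F..A0]"),
// a0_inclusive = false models the new row ("[7F..A0)").
bool in_first_row_ranges(Unicode c, bool a0_inclusive) {
    if (c < 0x20)                        // [00, 20): C0 controls, space excluded
        return true;
    if (c >= 0x7F && c < 0xA0)           // [7F, A0): DEL and C1 controls
        return true;
    if (a0_inclusive && c == 0xA0)       // old closed bound also covered NBSP
        return true;
    return c == 0xAD || c == 0x061C || c == 0x1361;  // SHY, ALM, Ethiopic word space
}
```

Under this reading, U+00A0 (NBSP) is the only code point whose classification changes, which is consistent with the changelog entry about non-breaking spaces.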
Lines changed: 2 additions & 2 deletions
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ff65d9e1cc4864dc0db647594c33c01333faa20e0e104379b42ae2b8e9694c0a
-size 1086803
+oid sha256:e020014ff0cab94ab78700278ed7b54852b944ccb366015b1a60ae944e0780d7
+size 1086801
