Fix Arabic / complex-script text input (wide WNDCLASS + BMP-aware filters)#2574
Fix Arabic / complex-script text input (wide WNDCLASS + BMP-aware filters)#2574wahidkurdo wants to merge 3 commits intoTheSuperHackers:mainfrom
Conversation
The text-entry widgets fed every WM_CHAR through GameWindowManager::winIsAscii / winIsAlNum, which called iswascii / iswalnum and rejected everything above U+007F. Additionally the main window was created with the ANSI WNDCLASS / CreateWindow / DefWindowProc triplet, which transcodes WM_CHAR through the system ANSI codepage and corrupts non-Latin-1 input. This patch accepts printable BMP code points in winIsAscii / winIsAlNum and switches both Generals and GeneralsMD main windows to the wide (Unicode) class pipeline so WM_CHAR is delivered as UTF-16.
|
| Filename | Overview |
|---|---|
| Core/GameEngine/Source/GameClient/GUI/GameWindowGlobal.cpp | Widens winIsDigit/winIsAscii/winIsAlNum to accept non-ASCII BMP code points; winIsAlNum fallback at 0xA0 admits non-alphanumeric symbols including U+00A0 NON-BREAK SPACE. |
| Generals/Code/Main/WinMain.cpp | Switches window registration and creation to WNDCLASSW/RegisterClassW/CreateWindowW and replaces DefWindowProc with DefWindowProcW so WM_CHAR delivers UTF-16 code units; also includes whitespace/indentation cleanup. |
| GeneralsMD/Code/Main/WinMain.cpp | Same Unicode window-class pipeline change as Generals/Code/Main/WinMain.cpp applied to the Zero Hour main loop; formatting aligned with the Generals counterpart. |
Sequence Diagram
sequenceDiagram
participant KB as Keyboard / IME
participant OS as Windows Message Loop
participant WP as WndProc
participant DF as DefWindowProcW
participant FI as winIsAscii / winIsAlNum
participant TE as GadgetTextEntry buffer
Note over KB,TE: Before patch (ANSI window class)
KB->>OS: Arabic keypress
OS->>WP: WM_CHAR (ANSI codepage transcoded — garbled)
WP->>FI: c = garbled byte
FI-->>WP: 0 (filtered out)
WP->>DF: DefWindowProcA (ANSI)
Note over KB,TE: After patch (Unicode window class)
KB->>OS: Arabic keypress
OS->>WP: WM_CHAR (UTF-16 code point, e.g. 0x0645)
WP->>FI: c = 0x0645
FI-->>WP: 1 (c >= 0xA0, passes)
WP->>TE: Arabic character stored
WP->>DF: DefWindowProcW (Unicode)
Prompt To Fix All With AI
This is a comment left during a code review.
Path: Core/GameEngine/Source/GameClient/GUI/GameWindowGlobal.cpp
Line: 267-268
Comment:
**`winIsAlNum` accepts non-alphanumeric BMP code points**
The `if (c >= 0xA0) return 1` guard accepts every non-control BMP code point above DEL, including non-alphanumeric ones: U+00A0 (NON-BREAK SPACE), U+00A2 (¢ CENT SIGN), U+00B7 (· MIDDLE DOT), currency/math symbols, etc. An `alphaNumericalOnly`-flagged widget would now silently admit these symbols and an invisible non-breaking space. If any such widget feeds a system that treats its output as truly alphanumeric (username uniqueness keys, server-side validation, etc.), these code points will slip through undetected.
Consider narrowing the fallback to Unicode letter/digit general categories (e.g. via `iswctype`) or at minimum excluding known non-alphanumeric ranges in the Latin-1 Supplement block (0xA0–0xBF), which are entirely punctuation and symbols.
How can I resolve this? If you propose a fix, please make it concise.Reviews (3): Last reviewed commit: "Extend winIsDigit to accept script-speci..." | Re-trigger Greptile
Addresses review feedback: the project style rule requires if/else/for/while bodies on their own line for debugger breakpoint placement.
|
Tip: Greploop — Automatically fix all review issues by running Use the Greptile plugin for Claude Code to query reviews, search comments, and manage custom context directly from your terminal. |
|
The aSCIIOnly flag is really just an input filter on the widget side its not doing anything load-bearing for security or the wire format. Strings in the engine are UnicodeString (UTF-16) all the way through, and the places that actually need ASCII (filesystem paths, the old GameSpy/WOL lobby name marshalling, map/script IDs) do their own ASCII check further down the pipeline. So widening winIsAscii doesnt punch through any of those guards, it just stops the widget from eating Arabic/CJK keystrokes. Youre right though that the flag name is misleading after this change. If youd rather keep aSCIIOnly strict, I can go two ways: Rename the widened version to winIsPrintableBMP and put the old strict check back into winIsAscii. Then only the text entries that actually need complex-script input get switched over. Smaller patch, no .wnd changes needed. |
Addresses review feedback: winIsDigit stayed on iswdigit, which in the default C locale only accepts ASCII 0-9. For consistency with the widened winIsAscii / winIsAlNum filters, also accept Arabic-Indic (U+0660-U+0669), Extended Arabic-Indic (U+06F0-U+06F9), Devanagari (U+0966-U+096F) and Bengali (U+09E6-U+09EF) digit ranges so digit-only text entry widgets work for users typing native numerals.
| #ifdef RTS_ENABLE_CRASHDUMP | ||
| #include "Common/MiniDumper.h" | ||
| #endif | ||
| #include "../OnlineServices_Init.h" |
There was a problem hiding this comment.
This patch appears to be optimized for the Generals Online repository fork.
There was a problem hiding this comment.
No changes. AI tools change only the position of the pointer.
| // case WM_MOUSEHOVER: return "WM_MOUSEHOVER"; | ||
| // case WM_MOUSELEAVE: return "WM_MOUSELEAVE"; | ||
| // case WM_MOUSEHOVER: return "WM_MOUSEHOVER"; | ||
| // case WM_MOUSELEAVE: return "WM_MOUSELEAVE"; |
There was a problem hiding this comment.
There are a lot of tab/space diffs
| return 0;*/ | ||
|
|
||
| return DefWindowProc( hWnd, message, wParam, lParam ); | ||
| // Complex-text patch: use DefWindowProcW so WM_CHAR messages are |
There was a problem hiding this comment.
Maybe it's better to reduce the volume of the comment, and use the following format
// TheSuperHackers @BugFix Mauller 10/05/2025 Always handle this command to prevent halting the game when left Alt is pressed.
Problem
Typing Arabic (or any non-Latin script) into in-game text fields produced either nothing or Latin-1 gibberish, depending on the system ANSI codepage.
Two independent bugs were responsible:
GameWindowManager::winIsAsciiandwinIsAlNumusediswascii/iswalnum, which reject everything >= 0x80. Every non-ASCII character got filtered out before reaching the text-entry buffer.The main window was registered and created with the ANSI variants (
WNDCLASS,RegisterClass,CreateWindow,DefWindowProc). Windows then transcodesWM_CHARthrough the system ANSI codepage, so on a machine whose ANSI CP is not Windows-1256, Arabic characters are mangled before the game ever sees them.Fix
winIsAscii/winIsAlNum.GeneralsandGeneralsMDmain loops soWM_CHARis delivered as UTF-16.Scope
Pure client-side fix, no asset or protocol changes. Affects both Generals and Zero Hour. No behavioural change for users who only type ASCII.
Testing
Built both targets locally. Verified that Arabic characters typed into lobby/chat widgets are now accepted and stored correctly.
Rendering of complex-script shaping (RTL / ligatures) is a separate concern and not addressed here — this PR is only about getting the code points into the buffer in the first place.