Fix Arabic / complex-script text input (wide WNDCLASS + BMP-aware filters) #2574

Open
wahidkurdo wants to merge 3 commits into TheSuperHackers:main from wahidkurdo:fix/arabic-text-input

@wahidkurdo

Problem

Typing Arabic (or any non-Latin script) into in-game text fields produced either nothing or Latin-1 gibberish, depending on the system ANSI codepage.

Two independent bugs were responsible:

  1. GameWindowManager::winIsAscii and winIsAlNum used iswascii / iswalnum, which reject everything >= 0x80. Every non-ASCII character got filtered out before reaching the text-entry buffer.

  2. The main window was registered and created with the ANSI variants (WNDCLASS, RegisterClass, CreateWindow, DefWindowProc). Windows then transcodes WM_CHAR through the system ANSI codepage, so on a machine whose ANSI CP is not Windows-1256, Arabic characters are mangled before the game ever sees them.

Fix

  • Accept printable BMP code points in winIsAscii / winIsAlNum.
  • Use the wide (Unicode) window class pipeline in both Generals and GeneralsMD main loops so WM_CHAR is delivered as UTF-16.
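
To make the first bullet concrete, a widened filter could look roughly like the sketch below. This is not the code from the PR: the name `isPrintableBmp` is hypothetical, and the exact ranges (C0/C1 controls and DEL rejected, lone surrogate halves rejected, everything else in the BMP accepted) are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical helper, NOT the PR's actual implementation.
// Takes one UTF-16 code unit and reports whether it can be treated as a
// printable BMP character by a text-entry filter.
bool isPrintableBmp(std::uint16_t c)
{
    if (c < 0x20 || c == 0x7F)          // C0 control characters and DEL
        return false;
    if (c >= 0x80 && c < 0xA0)          // C1 control characters
        return false;
    if (c >= 0xD800 && c <= 0xDFFF)     // surrogate halves are not characters on their own
        return false;
    return true;                        // ASSUMPTION: everything else is accepted
}
```

With this shape, `isPrintableBmp(0x0645)` (ARABIC LETTER MEEM) is accepted, while DEL and the C1 range are still filtered out, so ASCII-only typing behaves as before.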

Scope

Pure client-side fix, no asset or protocol changes. Affects both Generals and Zero Hour. No behavioural change for users who only type ASCII.

Testing

Built both targets locally. Verified that Arabic characters typed into lobby/chat widgets are now accepted and stored correctly.

Rendering of complex-script shaping (RTL / ligatures) is a separate concern and not addressed here — this PR is only about getting the code points into the buffer in the first place.

The text-entry widgets fed every WM_CHAR through GameWindowManager::winIsAscii / winIsAlNum, which called iswascii / iswalnum and rejected everything above U+007F. Additionally the main window was created with the ANSI WNDCLASS / CreateWindow / DefWindowProc triplet, which transcodes WM_CHAR through the system ANSI codepage and corrupts non-Latin-1 input. This patch accepts printable BMP code points in winIsAscii / winIsAlNum and switches both Generals and GeneralsMD main windows to the wide (Unicode) class pipeline so WM_CHAR is delivered as UTF-16.
@greptile-apps

greptile-apps bot commented Apr 10, 2026

Greptile Summary

This PR fixes Arabic/complex-script text input by switching both game targets to the W (Unicode) Win32 window-class pipeline (WNDCLASSW / RegisterClassW / CreateWindowW / DefWindowProcW) so WM_CHAR delivers true UTF-16 code units instead of ANSI-codepage bytes, and by widening the winIsDigit / winIsAscii / winIsAlNum character-filter functions to pass non-ASCII BMP code points through GadgetTextEntry.
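
For readers unfamiliar with the pattern, the W pipeline named above can be sketched as follows. This is a minimal illustration, not the contents of WinMain.cpp: `SketchWndProc`, `createSketchWindow`, `g_sketchTyped`, `codeUnitFromWParam` and the class name are all hypothetical, and the Win32 part is guarded so the file still compiles on other platforms.

```cpp
#include <cstdint>
#include <string>

// Portable helper (hypothetical name): for a Unicode window, WM_CHAR carries
// one UTF-16 code unit in wParam. Masking keeps only the low 16 bits.
inline char16_t codeUnitFromWParam(std::uintptr_t wParam)
{
    return static_cast<char16_t>(wParam & 0xFFFF);
}

#ifdef _WIN32
#include <windows.h>

// Hypothetical stand-in for the game's text-entry buffer.
static std::u16string g_sketchTyped;

static LRESULT CALLBACK SketchWndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch (msg)
    {
    case WM_CHAR:
        // The class was registered with RegisterClassW, so this is UTF-16.
        g_sketchTyped.push_back(codeUnitFromWParam(wParam));
        return 0;
    case WM_DESTROY:
        PostQuitMessage(0);
        return 0;
    default:
        // Unhandled messages go to the wide default procedure.
        return DefWindowProcW(hWnd, msg, wParam, lParam);
    }
}

HWND createSketchWindow(HINSTANCE hInstance)
{
    WNDCLASSW wc = {};
    wc.lpfnWndProc = SketchWndProc;
    wc.hInstance = hInstance;
    wc.lpszClassName = L"SketchWideClass";   // illustrative class name
    if (!RegisterClassW(&wc))
        return nullptr;
    return CreateWindowW(wc.lpszClassName, L"Sketch", WS_OVERLAPPEDWINDOW,
                         CW_USEDEFAULT, CW_USEDEFAULT, 640, 480,
                         nullptr, nullptr, hInstance, nullptr);
}
#endif
```

The point of the sketch is only that all four pieces (class struct, registration, creation, default procedure) use the W variants together.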

Confidence Score: 5/5

Safe to merge; the Unicode window-class fix is correct and the only remaining concern is a minor semantic widening in winIsAlNum.

Both root causes described in the PR are correctly addressed: the ANSI→Unicode window-class pipeline change is a well-understood Win32 pattern and the character-filter widening correctly unblocks Arabic/CJK input. The single P2 finding (winIsAlNum accepting non-alphanumeric BMP symbols ≥ 0xA0) mirrors the already-flagged winIsAscii/ASCIIONLY concern; it is worth investigating but does not block merge.

Core/GameEngine/Source/GameClient/GUI/GameWindowGlobal.cpp — the winIsAlNum fallback at 0xA0 deserves a follow-up to confirm alphaNumericalOnly-gated widgets are safe.

Important Files Changed

Filename Overview
Core/GameEngine/Source/GameClient/GUI/GameWindowGlobal.cpp Widens winIsDigit/winIsAscii/winIsAlNum to accept non-ASCII BMP code points; winIsAlNum fallback at 0xA0 admits non-alphanumeric symbols including U+00A0 NON-BREAK SPACE.
Generals/Code/Main/WinMain.cpp Switches window registration and creation to WNDCLASSW/RegisterClassW/CreateWindowW and replaces DefWindowProc with DefWindowProcW so WM_CHAR delivers UTF-16 code units; also includes whitespace/indentation cleanup.
GeneralsMD/Code/Main/WinMain.cpp Same Unicode window-class pipeline change as Generals/Code/Main/WinMain.cpp applied to the Zero Hour main loop; formatting aligned with the Generals counterpart.

Sequence Diagram

sequenceDiagram
    participant KB as Keyboard / IME
    participant OS as Windows Message Loop
    participant WP as WndProc
    participant DF as DefWindowProcW
    participant FI as winIsAscii / winIsAlNum
    participant TE as GadgetTextEntry buffer

    Note over KB,TE: Before patch (ANSI window class)
    KB->>OS: Arabic keypress
    OS->>WP: WM_CHAR (ANSI codepage transcoded — garbled)
    WP->>FI: c = garbled byte
    FI-->>WP: 0 (filtered out)
    WP->>DF: DefWindowProcA (ANSI)

    Note over KB,TE: After patch (Unicode window class)
    KB->>OS: Arabic keypress
    OS->>WP: WM_CHAR (UTF-16 code point, e.g. 0x0645)
    WP->>FI: c = 0x0645
    FI-->>WP: 1 (c >= 0xA0, passes)
    WP->>TE: Arabic character stored
    WP->>DF: DefWindowProcW (Unicode)
This is a comment left during a code review.
Path: Core/GameEngine/Source/GameClient/GUI/GameWindowGlobal.cpp
Line: 267-268

Comment:
**`winIsAlNum` accepts non-alphanumeric BMP code points**

The `if (c >= 0xA0) return 1` guard accepts every non-control BMP code point above DEL, including non-alphanumeric ones: U+00A0 (NON-BREAK SPACE), U+00A2 (¢ CENT SIGN), U+00B7 (· MIDDLE DOT), currency/math symbols, etc. An `alphaNumericalOnly`-flagged widget would now silently admit these symbols and an invisible non-breaking space. If any such widget feeds a system that treats its output as truly alphanumeric (username uniqueness keys, server-side validation, etc.), these code points will slip through undetected.

Consider narrowing the fallback to Unicode letter/digit general categories (e.g. via `iswctype`) or at minimum excluding known non-alphanumeric code points in the first part of the Latin-1 Supplement block (0xA0–0xBF), which is almost entirely punctuation and symbols (the main exceptions are the letters U+00AA, U+00B5 and U+00BA).
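
One possible shape for the "at minimum" variant of that suggestion is sketched below. It is an assumption-laden illustration rather than a proposed patch: `isAlNumNarrowed` is a hypothetical name, and everything above U+00BF is still accepted wholesale, which remains more permissive than a real letter/digit category check.

```cpp
#include <cstdint>

// Hypothetical narrowed filter, NOT the PR's code.
bool isAlNumNarrowed(std::uint16_t c)
{
    if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return true;                    // plain ASCII alphanumerics
    if (c < 0xA0)
        return false;                   // other ASCII, DEL and C1 controls
    if (c <= 0xBF)
    {
        // U+00A0..U+00BF is mostly punctuation and symbols (NBSP, cent sign,
        // middle dot, ...). Keep only the three letters in that range.
        return c == 0xAA || c == 0xB5 || c == 0xBA;
    }
    // ASSUMPTION: the rest of the BMP is treated as alphanumeric. This still
    // lets some symbols through and would need a category lookup to be exact.
    return true;
}
```

Under this sketch U+00A0, U+00A2 and U+00B7 are rejected, while U+0645 still passes.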

Reviews (3): Last reviewed commit: "Extend winIsDigit to accept script-speci..."

Addresses review feedback: the project style rule requires if/else/for/while bodies on their own line for debugger breakpoint placement.
@wahidkurdo
Author

The aSCIIOnly flag is really just an input filter on the widget side; it's not doing anything load-bearing for security or the wire format. Strings in the engine are UnicodeString (UTF-16) all the way through, and the places that actually need ASCII (filesystem paths, the old GameSpy/WOL lobby name marshalling, map/script IDs) do their own ASCII check further down the pipeline. So widening winIsAscii doesn't punch through any of those guards; it just stops the widget from eating Arabic/CJK keystrokes.

You're right, though, that the flag name is misleading after this change. If you'd rather keep aSCIIOnly strict, I can go one of two ways:

  1. Rename the widened version to winIsPrintableBMP and put the old strict check back into winIsAscii. Then only the text entries that actually need complex-script input get switched over. Smaller patch, no .wnd changes needed.

  2. Keep winIsAscii strict and add a new COMPLEXTEXT flag that opts a widget into the relaxed filter. More conservative, but it means touching every chat/name field in the .wnd files.

Let me know and I will amend the PR.
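
To illustrate the second option, a per-widget opt-in could be dispatched roughly like this. All names here (`EntryFlags`, `ENTRY_ASCII_ONLY`, `ENTRY_COMPLEX_TEXT`, `acceptChar`, and the two predicates) are hypothetical and do not correspond to existing engine identifiers; the relaxed range is likewise an assumption.

```cpp
#include <cstdint>

// Hypothetical widget flags; illustrative only.
enum EntryFlags : std::uint32_t
{
    ENTRY_ASCII_ONLY   = 1u << 0,   // widget filters its input
    ENTRY_COMPLEX_TEXT = 1u << 1    // widget opts into the relaxed filter
};

// Strict check, mirroring an iswascii-style test.
bool isStrictAscii(std::uint16_t c)
{
    return c < 0x80;
}

// Relaxed check: ASCII plus BMP code units from U+00A0 up, minus surrogates.
bool isRelaxed(std::uint16_t c)
{
    return c < 0x80 || (c >= 0xA0 && !(c >= 0xD800 && c <= 0xDFFF));
}

bool acceptChar(std::uint32_t flags, std::uint16_t c)
{
    if ((flags & ENTRY_ASCII_ONLY) == 0)
        return true;                    // unfiltered widget
    if (flags & ENTRY_COMPLEX_TEXT)
        return isRelaxed(c);            // opted-in widget
    return isStrictAscii(c);            // everything else stays strict
}
```

The trade-off is the one described above: widgets without the new flag keep their old behaviour, at the cost of tagging each chat/name field explicitly.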

Addresses review feedback: winIsDigit stayed on iswdigit, which in the default C locale only accepts ASCII 0-9. For consistency with the widened winIsAscii / winIsAlNum filters, also accept Arabic-Indic (U+0660-U+0669), Extended Arabic-Indic (U+06F0-U+06F9), Devanagari (U+0966-U+096F) and Bengali (U+09E6-U+09EF) digit ranges so digit-only text entry widgets work for users typing native numerals.
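
The digit ranges listed in that commit message can be checked with a small table of "zero" code points, as in the sketch below. The function names `digitValue` and `isScriptDigit` are hypothetical and this is not the PR's winIsDigit; only the ranges are taken from the message.

```cpp
#include <cstdint>

// Returns 0-9 if c is a decimal digit in one of the listed scripts, else -1.
// Hypothetical helper, NOT the PR's winIsDigit.
int digitValue(std::uint16_t c)
{
    // First code point ("zero") of each contiguous ten-digit range:
    // ASCII, Arabic-Indic, Extended Arabic-Indic, Devanagari, Bengali.
    static const std::uint16_t zeroes[] = { 0x0030, 0x0660, 0x06F0, 0x0966, 0x09E6 };
    for (std::uint16_t zero : zeroes)
    {
        if (c >= zero && c <= zero + 9)
            return c - zero;
    }
    return -1;
}

bool isScriptDigit(std::uint16_t c)
{
    return digitValue(c) >= 0;
}
```

Returning the numeric value as well as the yes/no answer is a design choice of this sketch; it would matter only if a digit-only widget later needs to parse what was typed.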
#ifdef RTS_ENABLE_CRASHDUMP
#include "Common/MiniDumper.h"
#endif
#include "../OnlineServices_Init.h"

This patch appears to be optimized for the Generals Online repository fork.

Author


No changes. AI tools change only the position of the pointer.


I don't have this header file

// case WM_MOUSEHOVER: return "WM_MOUSEHOVER";
// case WM_MOUSELEAVE: return "WM_MOUSELEAVE";
// case WM_MOUSEHOVER: return "WM_MOUSEHOVER";
// case WM_MOUSELEAVE: return "WM_MOUSELEAVE";

There are a lot of tab/space diffs

return 0;*/

return DefWindowProc( hWnd, message, wParam, lParam );
// Complex-text patch: use DefWindowProcW so WM_CHAR messages are

Maybe it's better to shorten the comment and use the following format:
// TheSuperHackers @BugFix Mauller 10/05/2025 Always handle this command to prevent halting the game when left Alt is pressed.
