diff --git a/.agents/skills/jira/SKILL.md b/.agents/skills/jira/SKILL.md index 86c43978b4..7571a8b96e 100644 --- a/.agents/skills/jira/SKILL.md +++ b/.agents/skills/jira/SKILL.md @@ -12,6 +12,14 @@ update, reproduce, potentially fix and/or close them. Go on with this workflow to the end unless you are actually blocked or get to one of the points where the workflow tells you to wait for confirmation or ask something. +### General Notes + +Typical fields you need to know: +* "components" is typically one of "Python", "Mx", "Infra", "Compiler", "Truffle" +* "issuetype" is typically "Task", "Bug (non BugDB)", "Testing", "Build Failure" +* "project" is typically "GR" +* "labels" is typically left empty when creating new issues + ### 1. Getting context To get the issue data, start with `ol-cli`, for example: @@ -20,6 +28,8 @@ To get the issue data, start with `ol-cli`, for example: Read the description and follow any links that seem relevant. +Run this in a subagent if possible and let it give you a summary. + ### 2. Check if there is work to do Issues may be stale, already solved, or no longer apply. Search the context and @@ -27,6 +37,8 @@ logs for other potentially relevant keywords, use `ol-cli jira search` to find out if there are potentially other related issues, query the codebase and git history and look for reproducers. +Run this in a subagent if possible and let it give you a summary. + ### 3. Reproduce the issue It is PARAMOUNT to reproduce an issue first before changing code. You should @@ -63,6 +75,9 @@ DO NOT STOP POLLING AND RETRYING UNTIL EITHER YOU REPRODUCE THE ISSUE, MORE THAN 8 HOURS HAVE ELAPSED WHILE YOU TRIED, OR YOU HAVE USED AT LEAST AROUND 2 MILLION TOKENS (you may estimate from the conversation history) WHILE TRYING! +Make sure to decline the temporary reproducer PR once you are done with it +using `ol-cli bitbucket`. + ### 4a. Fixing a reproducible issue. 
Once you have a reproducer (even if it may mean running something in a loop for @@ -87,7 +102,13 @@ by approval of the human user), it needs to be prepared for inclusion. Transition the Jira issue to be "In Progress" using `ol-cli jira transition`. Make sure your changes are committed in reviewable, focused, incremental -commits. Create a bitbucket PR +commits. + +Run a subagent to REVIEW the code changes. Give it enough context to understand +why specific implementation decisions were made. Consider the subagent's +comments carefully and change the code where they make sense. + +Create a Bitbucket PR: 1. Push your branch. 2. Open a PR using ol-cli bitbucket with a title including the Jira issue ID, like "[GR-XXXXX] Short description of overall fix." @@ -110,7 +131,8 @@ You can do this in parallel while watching the Bitbucket PR from step 5. Add a comment using `ol-cli jira comment` to the Jira issue, summarizing your findings and any work you may have done. Do NOT use Attlassian markup, the comment just ONLY be PLAIN TEXT. For paragraphs, just use double '\n'. You can -make plaintext lists by making lines begin with '* '. +make plaintext lists by making lines begin with '* '. Do NOT use ADF; use raw +text, regardless of what the tool's help message says. Also decide yourself or confer with the human about whether this change needs to be backported, and what the "fix version" assignment for the Jira label diff --git a/CHANGELOG.md b/CHANGELOG.md index b1f2651b57..cb93d64ee4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,12 +3,10 @@ This changelog summarizes major changes between GraalVM versions of the Python language runtime. The main focus is on user-observable behavior of the engine.
-## Version 25.2.0 +## Version 25.1.0 * Add support for [Truffle source options](https://www.graalvm.org/truffle/javadoc/com/oracle/truffle/api/source/Source.SourceBuilder.html#option(java.lang.String,java.lang.String)): * The `python.Optimize` option can be used to specify the optimization level, like the `-O` (level 1) and `-OO` (level 2) commandline options. * The `python.NewGlobals` option can be used to run a source with a fresh globals dictionary instead of the main module globals, which is useful for embeddings that want isolated top-level execution. - -## Version 25.1.0 * Intern string literals in source files * Allocation reporting via Truffle has been removed. Python object sizes were never reported correctly, so the data was misleading and there was a non-neglible overhead for object allocations even when reporting was inactive. * Better `readline` support via JLine. Autocompletion and history now works in `pdb` @@ -18,6 +16,7 @@ language runtime. The main focus is on user-observable behavior of the engine. * Add Github workflows that run our gates from the same job definitions as our internal CI. This will make it easier for contributors opening PRs on Github to ensure code contributions pass the same tests that we are running internally. * Added support for specifying generics on foreign classes, and inheriting from such classes. Especially when using Java classes that support generics, this allows expressing the generic types in Python type annotations as well. * Added a new `java` backend for the `pyexpat` module that uses a Java XML parser instead of the native `expat` library. It can be useful when running without native access or multiple-context scenarios. This backend is the default when embedding and can be switched back to native `expat` by setting `python.PyExpatModuleBackend` option to `native`. Standalone distribution still defaults to native expat backend. 
+* Add a new option `python.UnicodeCharacterDatabaseNativeFallback` that controls whether the `unicodedata` module may fall back from the ICU database to CPython's native Unicode character database for features and characters that ICU does not support, such as named sequences in `unicodedata.lookup`. The fallback requires native access and is disabled by default for embeddings; the standalone distribution enables it by default. When it is disabled, `unicodedata.ucd_3_2_0` is a compatibility object backed by the current Unicode data rather than by Unicode 3.2.0. ## Version 25.0.1 * Allow users to keep going on unsupported JDK/OS/ARCH combinations at their own risk by opting out of early failure using `-Dtruffle.UseFallbackRuntime=true`, `-Dpolyglot.engine.userResourceCache=/set/to/a/writeable/dir`, `-Dpolyglot.engine.allowUnsupportedPlatform=true`, and `-Dpolyglot.python.UnsupportedPlatformEmulates=[linux|macos|windows]` and `-Dorg.graalvm.python.resources.exclude=native.files`. diff --git a/graalpython/com.oracle.graal.python.shell/src/com/oracle/graal/python/shell/GraalPythonMain.java b/graalpython/com.oracle.graal.python.shell/src/com/oracle/graal/python/shell/GraalPythonMain.java index 2fa46d76af..9ff11e94e6 100644 --- a/graalpython/com.oracle.graal.python.shell/src/com/oracle/graal/python/shell/GraalPythonMain.java +++ b/graalpython/com.oracle.graal.python.shell/src/com/oracle/graal/python/shell/GraalPythonMain.java @@ -816,9 +816,8 @@ protected void launch(Builder contextBuilder) { contextBuilder.option("python.PosixModuleBackend", "java"); } - if (!hasContextOptionSetViaCommandLine("WarnExperimentalFeatures")) { - contextBuilder.option("python.WarnExperimentalFeatures", "false"); - } + setOptionIfNotSetViaCommandLine(contextBuilder, "WarnExperimentalFeatures", "false"); + setOptionIfNotSetViaCommandLine(contextBuilder, "UnicodeCharacterDatabaseNativeFallback", "true"); if (multiContext) { contextBuilder.engine(Engine.newBuilder().allowExperimentalOptions(true).options(enginePolyglotOptions).build()); @@ -1009,19 +1008,13 @@ private void findAndApplyVenvCfg(Builder contextBuilder, String executable) { } break; case "venvlauncher_command": - if (!hasContextOptionSetViaCommandLine("VenvlauncherCommand")) {
contextBuilder.option("python.VenvlauncherCommand", parts[1].trim()); - } + setOptionIfNotSetViaCommandLine(contextBuilder, "VenvlauncherCommand", parts[1].trim()); break; case "base-prefix": - if (!hasContextOptionSetViaCommandLine("SysBasePrefix")) { - contextBuilder.option("python.SysBasePrefix", parts[1].trim()); - } + setOptionIfNotSetViaCommandLine(contextBuilder, "SysBasePrefix", parts[1].trim()); break; case "base-executable": - if (!hasContextOptionSetViaCommandLine("BaseExecutable")) { - contextBuilder.option("python.BaseExecutable", parts[1].trim()); - } + setOptionIfNotSetViaCommandLine(contextBuilder, "BaseExecutable", parts[1].trim()); break; } } @@ -1052,6 +1045,12 @@ private String getContextOptionIfSetViaCommandLine(String key) { return null; } + private void setOptionIfNotSetViaCommandLine(Context.Builder builder, String key, String value) { + if (!hasContextOptionSetViaCommandLine(key)) { + builder.option("python." + key, value); + } + } + private boolean hasContextOptionSetViaCommandLine(String key) { if (System.getProperty("polyglot.python." + key) != null) { return System.getProperty("polyglot.python." + key) != null; diff --git a/graalpython/com.oracle.graal.python.test/src/tests/test_unicodedata.py b/graalpython/com.oracle.graal.python.test/src/tests/test_unicodedata.py index 4fecd2c658..0a0f3c9e7e 100644 --- a/graalpython/com.oracle.graal.python.test/src/tests/test_unicodedata.py +++ b/graalpython/com.oracle.graal.python.test/src/tests/test_unicodedata.py @@ -1,4 +1,4 @@ -# Copyright (c) 2018, 2025, Oracle and/or its affiliates. All rights reserved. +# Copyright (c) 2018, 2026, Oracle and/or its affiliates. All rights reserved. # DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
# # The Universal Permissive License (UPL), Version 1.0 @@ -75,6 +75,16 @@ def test_lookup(self): with self.assertRaisesRegex(KeyError, "name too long"): unicodedata.lookup("a" * 257) + def test_lookup_named_sequence(self): + if unicodedata.ucd_3_2_0.bidirectional == unicodedata.bidirectional: + raise unittest.SkipTest("Only supported with CPython's unicodedata.ucd_3_2_0") + + unicode_name = "LATIN SMALL LETTER R WITH TILDE" + self.assertEqual(unicodedata.lookup(unicode_name), "\u0072\u0303") + + with self.assertRaisesRegex(KeyError, "undefined character name 'LATIN SMALL LETTER R WITH TILDE'"): + unicodedata.ucd_3_2_0.lookup(unicode_name) + def test_east_asian_width(self): list = [1, 2, 3] @@ -101,4 +111,4 @@ def test_combining(self): empty_string = "" with self.assertRaisesRegex(TypeError, r"combining\(\) argument must be a unicode character, not str"): - unicodedata.combining(empty_string) \ No newline at end of file + unicodedata.combining(empty_string) diff --git a/graalpython/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/UnicodeDataModuleBuiltins.java b/graalpython/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/UnicodeDataModuleBuiltins.java index a5f1ef49d6..99b88b82fe 100644 --- a/graalpython/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/UnicodeDataModuleBuiltins.java +++ b/graalpython/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/UnicodeDataModuleBuiltins.java @@ -1,5 +1,5 @@ /* - * Copyright (c) 2018, 2025, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2018, 2026, Oracle and/or its affiliates. All rights reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
* * The Universal Permissive License (UPL), Version 1.0 @@ -43,6 +43,8 @@ import static com.oracle.graal.python.nodes.BuiltinNames.J_UNICODEDATA; import static com.oracle.graal.python.nodes.BuiltinNames.T_UNICODEDATA; import static com.oracle.graal.python.nodes.BuiltinNames.T___GRAALPYTHON__; +import static com.oracle.graal.python.nodes.SpecialAttributeNames.T___MODULE__; +import static com.oracle.graal.python.nodes.SpecialAttributeNames.T___QUALNAME__; import static com.oracle.graal.python.runtime.exception.PythonErrorType.KeyError; import static com.oracle.graal.python.runtime.exception.PythonErrorType.ValueError; import static com.oracle.graal.python.util.PythonUtils.TS_ENCODING; @@ -56,21 +58,32 @@ import org.graalvm.shadowed.com.ibm.icu.text.Normalizer2; import org.graalvm.shadowed.com.ibm.icu.util.VersionInfo; +import com.oracle.graal.python.PythonLanguage; import com.oracle.graal.python.annotations.ArgumentClinic; import com.oracle.graal.python.annotations.Builtin; import com.oracle.graal.python.builtins.CoreFunctions; import com.oracle.graal.python.builtins.Python3Core; +import com.oracle.graal.python.builtins.PythonBuiltinClassType; import com.oracle.graal.python.builtins.PythonBuiltins; import com.oracle.graal.python.builtins.objects.PNone; import com.oracle.graal.python.builtins.objects.module.PythonModule; +import com.oracle.graal.python.builtins.objects.object.PythonObject; +import com.oracle.graal.python.builtins.objects.type.PythonAbstractClass; +import com.oracle.graal.python.builtins.objects.type.PythonClass; import com.oracle.graal.python.lib.PyObjectCallMethodObjArgs; +import com.oracle.graal.python.lib.PyObjectGetAttr; import com.oracle.graal.python.nodes.ErrorMessages; import com.oracle.graal.python.nodes.PRaiseNode; +import com.oracle.graal.python.nodes.call.CallNode; import com.oracle.graal.python.nodes.function.PythonBuiltinBaseNode; import com.oracle.graal.python.nodes.function.builtins.PythonBinaryClinicBuiltinNode; import 
com.oracle.graal.python.nodes.function.builtins.PythonUnaryClinicBuiltinNode; import com.oracle.graal.python.nodes.function.builtins.clinic.ArgumentClinicProvider; +import com.oracle.graal.python.nodes.object.BuiltinClassProfiles.IsBuiltinObjectProfile; import com.oracle.graal.python.nodes.object.GetOrCreateDictNode; +import com.oracle.graal.python.nodes.statement.AbstractImportNode; +import com.oracle.graal.python.runtime.PythonOptions; +import com.oracle.graal.python.runtime.exception.PException; import com.oracle.graal.python.runtime.object.PFactory; import com.oracle.truffle.api.CompilerDirectives.TruffleBoundary; import com.oracle.truffle.api.dsl.Bind; @@ -87,6 +100,11 @@ @CoreFunctions(defineModule = J_UNICODEDATA) public final class UnicodeDataModuleBuiltins extends PythonBuiltins { + private static final TruffleString T__CPYTHON_UNICODEDATA = toTruffleStringUncached("_cpython_unicodedata"); + private static final TruffleString T_LOOKUP = toTruffleStringUncached("lookup"); + private static final TruffleString T_UCD_3_2_0 = toTruffleStringUncached("ucd_3_2_0"); + private static final TruffleString T_UNIDATA_VERSION = toTruffleStringUncached("unidata_version"); + @Override protected List<? extends NodeFactory<? extends PythonBuiltinBaseNode>> getNodeFactories() { return UnicodeDataModuleBuiltinsFactory.getFactories(); @@ -120,12 +138,31 @@ private static String getUnicodeNameTB(int cp) { public void postInitialize(Python3Core core) { super.postInitialize(core); PythonModule self = core.lookupBuiltinModule(T_UNICODEDATA); - self.setAttribute(toTruffleStringUncached("unidata_version"), toTruffleStringUncached(getUnicodeVersion())); - PyObjectCallMethodObjArgs.executeUncached(core.lookupBuiltinModule(T___GRAALPYTHON__), toTruffleStringUncached("import_current_as_named_module_with_delegate"), - /* module_name= */ T_UNICODEDATA, - /* delegate_name= */ toTruffleStringUncached("_cpython_unicodedata"), - /* delegate_attributes= */ PFactory.createList(core.getLanguage(), new Object[]{toTruffleStringUncached("ucd_3_2_0")}),
- /* owner_globals= */ GetOrCreateDictNode.executeUncached(self)); + self.setAttribute(T_UNIDATA_VERSION, toTruffleStringUncached(getUnicodeVersion())); + if (core.getLanguage().getEngineOption(PythonOptions.UnicodeCharacterDatabaseNativeFallback)) { + PyObjectCallMethodObjArgs.executeUncached(core.lookupBuiltinModule(T___GRAALPYTHON__), toTruffleStringUncached("import_current_as_named_module_with_delegate"), + /* module_name= */ T_UNICODEDATA, + /* delegate_name= */ T__CPYTHON_UNICODEDATA, + /* delegate_attributes= */ PFactory.createList(core.getLanguage(), new Object[]{T_UCD_3_2_0}), + /* owner_globals= */ GetOrCreateDictNode.executeUncached(self)); + } else { + self.setAttribute(T_UCD_3_2_0, createUCDCompatibilityObject(core, self)); + } + } + + private PythonObject createUCDCompatibilityObject(Python3Core core, PythonModule self) { + TruffleString t_ucd = toTruffleStringUncached("UCD"); + PythonClass clazz = PFactory.createPythonClassAndFixupSlots(null, core.getLanguage(), t_ucd, PythonBuiltinClassType.PythonObject, + new PythonAbstractClass[]{core.lookupType(PythonBuiltinClassType.PythonObject)}); + for (String s : new String[]{"normalize", "is_normalized", "lookup", "name", "bidirectional", "category", "combining", "east_asian_width", "decomposition", "digit", "decimal"}) { + TruffleString ts = toTruffleStringUncached(s); + clazz.setAttribute(ts, PFactory.createStaticmethodFromCallableObj(core.getLanguage(), self.getAttribute(ts))); + } + clazz.setAttribute(T___MODULE__, T_UNICODEDATA); + clazz.setAttribute(T___QUALNAME__, t_ucd); + PythonObject obj = PFactory.createPythonObject(clazz, clazz.getInstanceShape()); + obj.setAttribute(T_UNIDATA_VERSION, toTruffleStringUncached("3.2.0")); + return obj; } static final int NORMALIZER_FORM_COUNT = 4; @@ -214,23 +251,26 @@ abstract static class LookupNode extends PythonUnaryClinicBuiltinNode { @Specialization @TruffleBoundary static Object lookup(TruffleString name, + @Bind PythonLanguage lang, @Bind Node 
inliningTarget) { String nameString = ToJavaStringNode.getUncached().execute(name); if (nameString.length() > NAME_MAX_LENGTH) { throw PRaiseNode.raiseStatic(inliningTarget, KeyError, ErrorMessages.NAME_TOO_LONG); } - // TODO: support Unicode character named sequences (GR-68227) - // see test/test_ucn.py.UnicodeFunctionsTest.test_named_sequences_full String character = getCharacterByUnicodeName(nameString); if (character == null) { character = getCharacterByUnicodeNameAlias(nameString); } - if (character == null) { - throw PRaiseNode.raiseStatic(inliningTarget, KeyError, ErrorMessages.UNDEFINED_CHARACTER_NAME, name); + if (character != null) { + return FromJavaStringNode.getUncached().execute(character, TS_ENCODING); } - return FromJavaStringNode.getUncached().execute(character, TS_ENCODING); + Object namedSequence = lookupNamedSequenceFromFallback(lang, name); + if (namedSequence != null) { + return namedSequence; + } + throw PRaiseNode.raiseStatic(inliningTarget, KeyError, ErrorMessages.UNDEFINED_CHARACTER_NAME, name); } @Override @@ -238,10 +278,6 @@ protected ArgumentClinicProvider getArgumentClinic() { return UnicodeDataModuleBuiltinsClinicProviders.LookupNodeClinicProviderGen.INSTANCE; } - /** - * Finds a Unicode code point by its Unicode name and returns it as a single character - * String. Returns null if name is not found. - */ @TruffleBoundary private static String getCharacterByUnicodeName(String unicodeName) { int codepoint = UCharacter.getCharFromName(unicodeName); @@ -253,10 +289,6 @@ private static String getCharacterByUnicodeName(String unicodeName) { return UCharacter.toString(codepoint); } - /** - * Finds a Unicode code point by its Unicode name alias and returns it as a single character - * String. Returns null if name alias is not found. 
- */ @TruffleBoundary private static String getCharacterByUnicodeNameAlias(String unicodeName) { int codepoint = UCharacter.getCharFromNameAlias(unicodeName); @@ -267,6 +299,22 @@ private static String getCharacterByUnicodeNameAlias(String unicodeName) { return UCharacter.toString(codepoint); } + + @TruffleBoundary + private static Object lookupNamedSequenceFromFallback(PythonLanguage lang, TruffleString name) { + if (lang.getEngineOption(PythonOptions.UnicodeCharacterDatabaseNativeFallback)) { + try { + PythonModule cpythonUnicodeData = AbstractImportNode.importModule(T__CPYTHON_UNICODEDATA); + Object lookup = PyObjectGetAttr.executeUncached(cpythonUnicodeData, T_LOOKUP); + return CallNode.executeUncached(lookup, name); + } catch (PException e) { + if (!IsBuiltinObjectProfile.profileObjectUncached(e.getUnreifiedException(), PythonBuiltinClassType.ImportError)) { + throw e; + } + } + } + return null; + } } // unicodedata.name(chr, default) diff --git a/graalpython/com.oracle.graal.python/src/com/oracle/graal/python/runtime/PythonOptions.java b/graalpython/com.oracle.graal.python/src/com/oracle/graal/python/runtime/PythonOptions.java index 7cd0e7c6e0..89e292fa97 100644 --- a/graalpython/com.oracle.graal.python/src/com/oracle/graal/python/runtime/PythonOptions.java +++ b/graalpython/com.oracle.graal.python/src/com/oracle/graal/python/runtime/PythonOptions.java @@ -244,6 +244,9 @@ public static void checkBytecodeDSLEnv() { @EngineOption @Option(category = OptionCategory.USER, help = "Choose the backend for the pyexpat module.", usageSyntax = "java|native", stability = OptionStability.STABLE) // public static final OptionKey<TruffleString> PyExpatModuleBackend = new OptionKey<>(T_JAVA, TS_OPTION_TYPE); + @EngineOption @Option(category = OptionCategory.USER, help = "Allow the unicodedata module to fall back from the ICU database to CPython's native UCD for unsupported features.", usageSyntax = "true|false", stability = OptionStability.STABLE) // + public static final OptionKey<Boolean>
UnicodeCharacterDatabaseNativeFallback = new OptionKey<>(false); + @Option(category = OptionCategory.USER, help = "Install default signal handlers on startup", usageSyntax = "true|false", stability = OptionStability.STABLE) // public static final OptionKey<Boolean> InstallSignalHandlers = new OptionKey<>(false); diff --git a/mx.graalpython/mx_graalpython.py b/mx.graalpython/mx_graalpython.py index d51c787a05..c94f524966 100644 --- a/mx.graalpython/mx_graalpython.py +++ b/mx.graalpython/mx_graalpython.py @@ -115,6 +115,7 @@ def get_boolean_env(name, default=False): '--python.Sha3ModuleBackend=java', '--python.CompressionModulesBackend=java', '--python.PyExpatModuleBackend=java', + '--python.UnicodeCharacterDatabaseNativeFallback=false', ] @@ -2253,6 +2254,7 @@ def bytecode_dsl_build_args(prefix=''): '-Dpolyglot.python.PosixModuleBackend=native', '-Dpolyglot.python.Sha3ModuleBackend=native', '-Dpolyglot.python.CompressionModulesBackend=native', + '-Dpolyglot.python.UnicodeCharacterDatabaseNativeFallback=true', ] + bytecode_dsl_build_args(), language='python', default_vm_args=[ diff --git a/mx.graalpython/suite.py b/mx.graalpython/suite.py index 867cf73f24..8bd4ae553a 100644 --- a/mx.graalpython/suite.py +++ b/mx.graalpython/suite.py @@ -889,11 +889,12 @@ # GraalPy standalone specific flags # uncomment to disable JLine FFM provider at native image build time #'-Dorg.graalvm.shadowed.org.jline.terminal.ffm.disable=true', - '--enable-native-access=org.graalvm.shadowed.jline', + '--enable-native-access=org.graalvm.shadowed.jline', "-Dpolyglot.python.PosixModuleBackend=native", "-Dpolyglot.python.Sha3ModuleBackend=native", "-Dpolyglot.python.CompressionModulesBackend=native", "-Dpolyglot.python.PyExpatModuleBackend=native", + "-Dpolyglot.python.UnicodeCharacterDatabaseNativeFallback=true", ], "dynamicBuildArgs": "libpythonvm_build_args", },
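The Python-visible behavior this patch adds to `unicodedata` can be approximated in pure Python. This is an illustrative sketch only. It assumes CPython's own `unicodedata` stands in for both the ICU-backed module and the native `_cpython_unicodedata` delegate. `make_ucd_compat`, `single_char_lookup`, and `lookup_with_fallback` are hypothetical helpers, not GraalPy APIs.

The sketch mirrors two parts of the Java change:

* `createUCDCompatibilityObject` builds a `UCD` class whose static methods are the module's own functions and whose instance reports `unidata_version = "3.2.0"`.
* The `lookup` path tries the primary database first. It consults the native module only when the fallback option is enabled, and it ignores an `ImportError` raised while loading that module.

```python
# Illustrative sketch only: hypothetical helpers that mimic, in pure Python,
# the unicodedata behavior added by this patch. These are not GraalPy APIs.
import importlib
import unicodedata

# Functions the Java code re-exports on its UCD stand-in object.
UCD_FUNCTIONS = (
    "normalize", "is_normalized", "lookup", "name", "bidirectional",
    "category", "combining", "east_asian_width", "decomposition",
    "digit", "decimal",
)


def make_ucd_compat(module):
    """Build a 'UCD' instance whose methods are the module's own functions.

    The methods are static and use the module's current Unicode data;
    only the unidata_version attribute claims "3.2.0".
    """
    namespace = {fn: staticmethod(getattr(module, fn)) for fn in UCD_FUNCTIONS}
    namespace["__module__"] = module.__name__
    ucd_class = type("UCD", (object,), namespace)
    ucd = ucd_class()
    ucd.unidata_version = "3.2.0"
    return ucd


def single_char_lookup(name):
    """Hypothetical stand-in for the ICU path: resolves only names that map
    to a single code point, so named sequences yield None."""
    try:
        char = unicodedata.lookup(name)
    except KeyError:
        return None
    return char if len(char) == 1 else None


def lookup_with_fallback(name, primary_lookup, fallback_enabled,
                         fallback_module="_cpython_unicodedata"):
    """Try the primary database first. If that fails and the fallback is
    enabled, delegate to the fallback module's lookup(). A missing module
    (ImportError) is ignored; a KeyError from the delegate propagates."""
    char = primary_lookup(name)
    if char is not None:
        return char
    if fallback_enabled:
        try:
            native = importlib.import_module(fallback_module)
        except ImportError:
            pass
        else:
            return native.lookup(name)
    raise KeyError(f"undefined character name '{name}'")


if __name__ == "__main__":
    seq = "LATIN SMALL LETTER R WITH TILDE"  # named sequence U+0072 U+0303
    print(ascii(lookup_with_fallback(seq, single_char_lookup, True, "unicodedata")))
    try:
        lookup_with_fallback(seq, single_char_lookup, fallback_enabled=False)
    except KeyError as err:
        print("KeyError:", err)
    ucd = make_ucd_compat(unicodedata)
    print(ucd.unidata_version, ucd.bidirectional == unicodedata.bidirectional)
```

The delegate is imported only for names that the primary lookup cannot resolve. This matches the Java code, which performs the import inside `lookupNamedSequenceFromFallback` rather than during module initialization. The last line also shows why the new test uses `ucd_3_2_0.bidirectional == unicodedata.bidirectional` to detect the compatibility object: its static methods are the module's own function objects.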