MathML rendering refactor #1671
base: mathml_tweaks_indent
|
@mmatera: This is far from complete, but I wanted to get something out earlier for you to look at and think about. The main thing right now to look at is This starts to set box properties in the box object, even though until we get other boxes converted, we'll still have to pass things around in First of all, the name is a poor choice. There are other places in the code where "options" refers to the user-settable parameters, like color, style, and width. Here, we are mingling in computed attributes of the box, e.g., indent level. In the future, we will need many more, like the number of characters in text boxes (its width), the number of lines (the number of embedded "\n"s), etc. So a simple thing to do is rename (I don't think WMA allows Forms and box forms to take options, but one might imagine things like a way to specify flavors of TeX/LaTeX, or the maximum width in a text box, and what to do if that width is exceeded). Another "code smell" we've introduced (actually, I introduced this in splitting off the rendering code for asy, svg, etc.) is using The problem with this is that this kind of thing is an unspecified, unchecked, untyped bag of whatever. And here, we can do better. There is now box_options, but instead of a generic dict, that would be better expressed as a Python dataclass, which is typed. There is way more to mention. But I have other stuff right now, so more later... |
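A minimal sketch of the dataclass idea in the comment above. The names `BoxAttributes`, `indent_level`, `line_count`, `max_width`, and `child_attributes` are hypothetical and are not taken from the Mathics3 code base; the point is only that each field has a declared name and type.

```python
from dataclasses import dataclass, replace


@dataclass
class BoxAttributes:
    """Hypothetical computed attributes of a box (illustrative names only)."""

    indent_level: int = 0  # passed down from the parent box
    line_count: int = 1    # number of lines in the rendered text
    max_width: int = 0     # widest rendered line, in characters


def child_attributes(parent: BoxAttributes) -> BoxAttributes:
    """Return a copy of the parent's attributes, one indent level deeper."""
    return replace(parent, indent_level=parent.indent_level + 1)


top = BoxAttributes()
nested = child_attributes(top)
print(top, nested)
```

Unlike a generic dict, a misspelled field name such as `BoxAttributes(indent=1)` fails at construction time, and a static checker can verify the field types.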
|
Here we have specific code, and presumably that helps to understand better what I'm talking about. On to the more high-level ideas that we should be following to match conventional expression translation code like this. Some information propagates from the bottom of the tree upwards. Indented string results work like that. And some information propagates from the parent down. The nesting level works like this. A problem or feature of the way we were doing this is that a string was returned based on parameter information coming down. To the extent that all one ever needs is the string, okay. But there are situations where we may want to query and use those other pieces of information. Probably not nesting level, but the number of lines accumulated or the maximum width of text characters used are like this. So what I ask is to think about information and transforming expressions as though the information propagates through the nodes of the tree rather than as pieces of information that are found in parameters or in return results, even though that is in fact how tree attributes get updated. Right now, I'd like to see the MathML code improved and revised to make it clean and type-annotated. This includes removing the hard-coded characters. And I think that with that activity we will have a better sense of how tree-transformation code should work. This is applicable to the other render or boxing functions as well. While this MathML code is significantly better than what we had before, there is still a little way to go. And I think it is very beneficial to do before moving on to other, more complex forms, like 2-dimensional character-oriented output, etc. |
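The up-and-down flow described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `Node`, `annotate`, and `render` are hypothetical names, the nesting level is the attribute passed down, and the line count and maximum width are the attributes accumulated on the way up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node; not a Mathics3 class."""

    text: str = ""
    children: List["Node"] = field(default_factory=list)
    indent_level: int = 0  # set from the parent (flows down)
    line_count: int = 0    # computed from the children (flows up)
    max_width: int = 0     # computed from the children (flows up)


def annotate(node: Node, indent_level: int = 0) -> None:
    """Fill in the attributes of `node` and of everything below it."""
    node.indent_level = indent_level
    node.line_count = 1
    node.max_width = 2 * indent_level + len(node.text)
    for child in node.children:
        annotate(child, indent_level + 1)  # information flows down
        node.line_count += child.line_count  # information flows up
        node.max_width = max(node.max_width, child.max_width)


def render(node: Node) -> str:
    """Render using only attributes already stored on the nodes."""
    lines = ["  " * node.indent_level + node.text]
    lines.extend(render(child) for child in node.children)
    return "\n".join(lines)


tree = Node("mrow", [Node("mi x"), Node("mo +"), Node("mn 1")])
annotate(tree)
print(render(tree))
print(tree.line_count, tree.max_width)
```

After `annotate` runs, the line count and width can be queried on any node without re-rendering, which is the kind of information a plain returned string does not expose.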
Remove expression box_properties
|
In a draft of this PR, I renamed "box_option" to "box_property". In compiler expression manipulation terminology, "property" and "attribute" are somewhat synonymous. However, in Python, "properties" and "attributes" are different in that properties are read-only, while attributes are read-write. So, I will be renaming "box_properties" to "box_attributes" to reflect the read-only versus read-write aspect. Later edit: change now made. |
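A small illustration of the distinction drawn above, using a hypothetical `Box` class. Note that a Python property is read-only only when no setter is defined for it, which is the case assumed here.

```python
class Box:
    """Hypothetical class contrasting a plain attribute with a property."""

    def __init__(self, text: str) -> None:
        self.text = text  # plain attribute: readable and writable

    @property
    def line_count(self) -> int:
        # Property without a setter: computed on access, not assignable.
        return self.text.count("\n") + 1


box = Box("a\nb")
box.text = "a\nb\nc"  # assigning to an attribute is allowed
try:
    box.line_count = 10  # no setter is defined for this property
except AttributeError:
    print("line_count cannot be assigned")
```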
Thanks for tackling this. This refactor is large, and it is good to have another point of view.
OK, it is enough to start the discussion.
The point is that in this case, I used the word
A BoundingBoxes, and attributes like the indentation level are not options, because they are determined by the information already available on the container or the structure of the box expression: the indentation level of an element in a box expression is not something that you can specify with an option like
Still, you could cache certain properties inside the objects. For example, in
Then we reach the render part:
Again, is a bag because what we want to share is very general. A more explicit way to specify what that parameter is would be to pass a dictionary instead of keyword arguments. But keyword arguments in Python are dictionaries. They also have the convenience that you do not need to explicitly copy the dictionaries to ensure that a change to one element at a certain level does not affect the contents of the dictionary at another level (Python does it by default).
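A short sketch of how keyword arguments behave when passed down through `**kwargs`. The function names and the `indent_level` and `seen` keys are hypothetical. Each call receives a fresh dictionary, so rebinding a key in the callee does not change the caller's dictionary; the copy is shallow, though, so mutable values are still shared.

```python
from typing import Tuple


def render_child(**kwargs) -> int:
    # kwargs is a new dict for this call; rebinding a key stays local.
    kwargs["indent_level"] = kwargs.get("indent_level", 0) + 1
    return kwargs["indent_level"]


def render_parent(**kwargs) -> Tuple[int, int]:
    child_level = render_child(**kwargs)
    # The parent's own dictionary is unchanged by the child's update.
    return kwargs["indent_level"], child_level


levels = render_parent(indent_level=0)
print(levels)


def append_seen(**kwargs) -> None:
    # Mutating a shared value is visible to the caller (shallow copy).
    kwargs["seen"].append("child")


shared = {"seen": []}
append_seen(**shared)
print(shared["seen"])
```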
|
good
Just one thing: I know maybe I sound like the least indicated person to ask this, but in order to follow the discussion, let's try to change the different aspects of the implementation in different PRs.
|
Notice that something like this was done in the implementation of
OK, but are these parameters attributes of the Box Expression object, or of the specific representation used in the render? the container corresponds with an object having the attribute text-align:center. But at the HTML level, ` does not have this attribute set: we can move this element to another part of the HTML code and then its attribute could be any other thing. This is what happens here with properties computed at render time.
This is part of the process.
The time I have to make very large changes is running out (I am going back to work this week). #1643, #1661 and #1663 complete what I can finish before the release. With them in, it is easy to make progress in specific parts (e.g., how SVG or prettyprint are rendered) without making structural changes. Reformulate |
I don't understand which part you agree on and which part you don't.
That I haven't started to address, but it needs to be addressed.
I don't mind splitting this into different PRs. But we should not move forward with new rendering and boxing work until the code we currently have, both in the PR and in master, has been cleaned up. Otherwise, we are proliferating bad patterns. I am sorry I didn't notice and catch this sooner. So which aspect would you like a PR for? |
@rocky, from what is in now, I like the change from
Thanks! And I agree: at this point we need to be on the same page to go forward.
|
When something is an option, it is fine and proper to call it an option. Computed properties like indent level and bounding box information are not options; they are box attributes. Storing them into a dictionary called
Each Box type has very specific attributes that it needs, such as, for certain LaTeX and MathML boxes, whether there are multiple lines, and possibly in the future, the maximum width in characters of the lines. This is not bag-like; we know in advance which box attributes are used for which kinds of boxes, and can specify a hierarchy for these. So this stuff should follow conventional OO and Python practice and should be attributes of the box object. For things like options on built-in commands, which are more varied and can be added at will, sure, the general dictionary mechanism is sometimes very convenient. But we should limit this to when it is needed. It is not needed in specifying box attributes. As has been said many times, a problem with dictionaries when used for things like box attributes is that you lose the ability to check attribute names (called a "key" in dictionary parlance), and you also lose the type information in the value, since the type has to cover all possible values covered by keys. |
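To make the name-checking point concrete, here is a hedged sketch. `TextBoxAttributes` and its fields are hypothetical, and `__slots__` is used only as one simple way to make unknown attribute names fail at run time; a static type checker would flag the same mistake earlier.

```python
class TextBoxAttributes:
    """Hypothetical attribute holder; __slots__ rejects unknown names."""

    __slots__ = ("multiline", "max_width")

    def __init__(self, multiline: bool = False, max_width: int = 0) -> None:
        self.multiline = multiline
        self.max_width = max_width


# With a dictionary, a misspelled key is silently accepted.
options = {"multiline": False}
options["mutliline"] = True  # typo goes unnoticed
print(options["multiline"])

# With declared attributes, the same typo fails immediately.
attrs = TextBoxAttributes()
try:
    attrs.mutliline = True
except AttributeError as exc:
    print("rejected:", exc)
```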
In this part I agree: in the signature of render functions,
First, there are no LaTeX/MathML boxes: there are Boxes that can eventually be rendered as an SVG picture, MathML code or LaTeX code (and eventually, as a PNG picture). Some BoxExpressions have specific options, while others do not: there are box expressions like
The consistency check on whether the options received are the right ones happens at evaluation time ( |
Thanks for the information. I just looked at these PRs. Right now, I think we can get the features covered in the PRs merged in soon. Specifics:
|
Whether this is called
These routines work off of various kinds of Boxes.
Again, if something is an option, it should stay an option. Just don't put bounding box and box attributes into the options dictionary. Instead it is an attribute of the box object.
For things that are options, that's great. For things that are box attributes, we should follow Python annotation for type checking. I feel like a broken record repeating stuff: options are options and (box) attributes are attributes. Things that are "computed on the fly" are attributes. |
|
@mmatera Looking over the discussion so far, the one thing that saddens me the most is that I don't see an acknowledgement or understanding that the compiler design pattern that should be used here is one of thinking of attributes associated with the expression nodes (here, "boxes"), and there is a pattern of information propagating up and down the (expression) tree. Code should be written informed by this principle. Information passed down the expression tree right now is done via Strictly speaking, though we don't need to pass two parameters, we can do that in the parent by assigning (often via copy) to each child box from the parent before the call involving child nodes. I did that somewhere in the draft to show how that's done. |
I think I understand and acknowledge the pattern. What I do not agree with is that the information that propagates should be attached to the BoxExpression, just as you would not make a C compiler store the information used in compilation inside the source code. As I see it, Box Expressions are the equivalent of the source code, and the MathML output is the object code. The place where I think the information used in compilation should be stored is the kwargs dictionary. It could also be done in many other ways, I think. Code should be written informed by this principle.
OK, I agree with this. The detail is how to do that in a way that plays well with the dispatch table.
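One possible way the node-attribute style could coexist with a dispatch table is sketched below. Everything here is hypothetical (`Box`, `RowBox`, `StringBox`, `RENDERERS`, and the render functions are not the Mathics3 classes or API): the table maps a box class to a function that takes only the box, and the parent assigns the child's attribute before dispatching on the child.

```python
from typing import Callable, Dict, List, Type


class Box:
    """Hypothetical base class; the real BoxExpression classes differ."""

    def __init__(self, *children: "Box") -> None:
        self.children: List[Box] = list(children)
        self.indent_level = 0


class RowBox(Box):
    pass


class StringBox(Box):
    def __init__(self, text: str) -> None:
        super().__init__()
        self.text = text


def render_string(box: StringBox) -> str:
    return "  " * box.indent_level + f"<mtext>{box.text}</mtext>"


def render_row(box: RowBox) -> str:
    pad = "  " * box.indent_level
    parts = []
    for child in box.children:
        # The parent sets the child's attribute before the child is rendered.
        child.indent_level = box.indent_level + 1
        parts.append(render(child))
    return "\n".join([pad + "<mrow>", *parts, pad + "</mrow>"])


# Dispatch table keyed by box class; each entry takes only the box.
RENDERERS: Dict[Type[Box], Callable[..., str]] = {
    StringBox: render_string,
    RowBox: render_row,
}


def render(box: Box) -> str:
    return RENDERERS[type(box)](box)


output = render(RowBox(StringBox("a"), StringBox("b")))
print(output)
```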
|
Good to hear.
A compiler is way more complicated than this. The aspect I am talking about here is part of what is called the "front-end" of a compiler and you do find it in interpreters as well. Think of Python's AST structure.
Ok. So take an interpreter like Perl, which doesn't have an AST structure, but it has an interpreter tree called Optree, which it runs off of. Again, when doing transformations, structurally, it helps the organization by thinking of the information as passing through the tree instead of via parameters.
Although there are always many ways to do things, my personal experience with this kind of transformation with large code bases is that things are more comprehensible when you think about and work with stuff in this node-centric way. I talked about this in https://rocky.github.io/YAPC2018-deparse/#/9 https://rocky.github.io/YAPC2018-deparse/#/9/1 https://rocky.github.io/YAPC2018-deparse/#/10. A really poorly-presented talk I gave on this is https://youtu.be/gREriCbwW8E?si=otz2X-cRBqv3UdPP&t=1001 |
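As a concrete look at the Python AST mentioned above, the standard-library `ast` module stores position information as attributes on the nodes themselves. This is only an illustration of node-carried attributes, not a claim about how Mathics3 should be structured.

```python
import ast

# Parse a two-line expression; positions are recorded on each node.
tree = ast.parse("x = (1 +\n     2)")
assign = tree.body[0]
binop = assign.value

# Each node carries its own line and column as attributes.
print(type(binop).__name__, binop.lineno, binop.col_offset)
print(type(binop.right).__name__, binop.right.lineno)
```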
Is Python AST structure modified during its conversion to bytecode?
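OK, but I would look not at every compiler/renderer but at the ones with a similar interface. How do HTML/SVG renderers work? How does Python's xml library do the conversion from text to a tree structure and back to text?
OK, then my question is: can we implement the MathML render in this way, without changing the design of the boxes_to_format methods by dispatch tables? Shall we change the implementation of it? Or can the changes be restricted to the mathics.format.render.mathml module? |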
|
On Mon, Feb 2, 2026 at 11:51 AM Juan Mauricio Matera < ***@***.***> wrote:
*mmatera* left a comment (Mathics3/mathics-core#1671)
<#1671 (comment)>
>>> I think I understand and acknowledge the pattern.
>>
>> Good to hear.
>>
>>> What I am not agree is that the information that propagates should be
>>> attached to the BoxExpression, as you do not make a C compiler to store the
>>> information used in compilation inside the source code.
>>
>> A compiler is way more complicated than this. The aspect I am talking
>> about here is part of what is called the "front-end" of a compiler and you
>> do find it in interpreters as well. Think of Python's AST structure.
>
> Is Python AST structure modified during its conversion to bytecode?

This part is done in the creation of the AST.

>>> As I see, Box Expressions are the equivalent to the source code, and the
>>> MathML output is the object code.
>>
>> Ok. So take an interpreter like Perl, which doesn't have an AST structure,
>> but it has an interpreter tree called Optree, which runs off of. Again,
>> when doing transformations, structurally, it helps the organization by
>> thinking of the information as passing through the tree instead of via
>> parameters.
>
> OK, but I would look not at every compiler/render but at the ones with a
> similar interface. How does HTML/SVG renders do work? How Python's xml
> library does the conversion from text to a tree structure and back to text?

I am telling you that this is a common compiler pattern. See attribute
grammar. <https://en.wikipedia.org/wiki/Attribute_grammar> I am not
interested in investigating this further.

>>> The place where I think the information used in compilation should be
>>> stored is in the kwargs dictionary. Could be also done in many other ways,
>>> I think.
>>
>> Although there are always many ways to do things, my personal experience
>> with this kind of transformation with large code bases is that things are
>> more comprehensible when you think about and work with stuff in this
>> node-centric way. I talked about this in
>> https://rocky.github.io/YAPC2018-deparse/#/9
>> https://rocky.github.io/YAPC2018-deparse/#/9/1
>> https://rocky.github.io/YAPC2018-deparse/#/10. A really poorly-presented
>> talk I gave on this is
>> https://youtu.be/gREriCbwW8E?si=otz2X-cRBqv3UdPP&t=1001
>
> OK, then my question is: can be implement the mathml render in this way,
> without changing the design of the boxes_to_format methods by dispatch
> tables? Shall we change the implementation of it? Or the changes can be
> restricted to the mathics.format.render.mathml module?

One way is done more or less the way in the PR. There are fancier versions
of that. But if we start off this way we will be in a better position for the
fancier ways.
|
OK, and I trust you that it is. I was just pointing out that how Perl's interpreter works does not strike me as the most relevant example. So, regarding my questions
If the changes just require adding some attributes to the BoxExpression subclasses and modifying |
Refactors mathics.format.render code to something that uses better style and is more akin to current expression evaluation technology.