-
Notifications
You must be signed in to change notification settings - Fork 788
Implement Garbage Collector type system #2607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
243ec44 to
68fe37e
Compare
include/wabt/binary-reader.h
Outdated
| }; | ||
| using TypeMutVector = std::vector<TypeMut>; | ||
|
|
||
| // Garbage Collector specific type information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a core part of the patch. It contains the (sub ...) part of the type. It is declared as a structure, because it is not mandatory.
include/wabt/binary-reader.h
Outdated
| virtual Result OnTypeCount(Index count) = 0; | ||
| virtual Result OnRecursiveRange(Index start_index, Index type_count) = 0; | ||
| virtual Result OnFuncType(Index index, | ||
| GCTypeExtension* gc_ext, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This structure is passed as a second argument, because it is a header, but it could go to the last argument since it is an extra (and optional) information. Which one you prefer?
| struct RecursiveRange { | ||
| Index start_index; | ||
| Index type_count; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another core structure to encode (rec ...) constructs. It represents the range.
include/wabt/type-checker.h
Outdated
| std::vector<FuncType> func_types; | ||
| std::vector<StructType> struct_types; | ||
| std::vector<ArrayType> array_types; | ||
| std::vector<RecursiveRange> recursive_ranges; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is stored in an ordered array. It cannot be stored as part of the types, because zero length (rec) range is allowed for whatever reason.
e0ce7f8 to
adcbdf7
Compare
|
I have reworked the type validation system of the patch. Now it is capable of detecting the first type index for all equal types. This first type index is called canonical index. If I have two types ( |
75bdfe5 to
890a316
Compare
|
The |
|
It sounds like you are canonicalising wrt type indices. But type indices are meaningless in any program that consists of more than one module. Type canonicalisation must happen globally, across module boundaries, based on the types' structure. I suspect that is the reason for the link/run-time tests failing. |
|
The link tests are not failing, although the interpreter do a slow type comparison for import/exports. As far as I understand the interpreter here is just a demonstration, so this is probably ok. The runtime tests fail because the operations (such as The global type canonicalisation sounds like a very good idea! A high performance engine should do that! |
cc8e21f to
097046d
Compare
|
@sbc100 there is a fuzzer issue in the code. The code is correct though. https://github.com/WebAssembly/wabt/blob/main/src/interp/binary-reader-interp.cc#L772 https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/fuzzer/FuzzerLoop.cpp What shall I do? |
We don't tend to have time to worry about fixing all the fuzz tests issues, unless they could conceivable show up in real world programs. i.e. we tend to assume trusted and save inputs, since we don't have the resources the harden wabt against other things. Having said that we obviously would be happy to accept fixes for such issues if folks come up with them. |
|
There is nothing to fix here, the code is correct (and not related to this patch). It is simply a limitation of the fuzzer, it assumes too much memory allocation is likely a bug. |
68c749c to
b7fc45c
Compare
f694b94 to
21a32fa
Compare
|
I have checked the type hashing system in the whole test system. It performs 45157 comparisons, and the hash/type_count is equal 513 times. From that, 467 times the types are really equal, and 46 times they are not. Overall the efficiency is 99.9%. I could not improve the hash by using more complex hashes. However, Edit: Now the efficiency is 100%. It does not mean it is perfect or anything. The missing cases were valid, worth improving the code. |
a4981a8 to
ee17d2b
Compare
0e0a846 to
508ab52
Compare
|
@sbc100 This is the first patch of the GC support. It triggers a fuzzer fail I described above. Reserving memory is correct for valid wasm files. However, a random value generated by a fuzzer causes a large memory allocation, which is considered as an error by the fuzzer. The fuzzer has an option to raise this allocation limit. What do you suggest? |
1510410 to
f73967c
Compare
|
Note: the issue is visible on the fuzzer backtrace: The OnDataCount gets a huge number, runs the |
1bc9c5e to
0222374
Compare
zherczeg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch is finally green. I have fixed some fuzzer bugs, which are unrelated to the patch. These are interpreter related, which main purpose is testing.
|
|
||
| Result BinaryReaderInterp::OnFunctionCount(Index count) { | ||
| module_.funcs.reserve(count); | ||
| module_.funcs.reserve(std::min(count, 1024u)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fuzzer fix: avoid allocating too much memory. If count is really > 1024, the array will still grow, just it happens later. However, some elements at the end of the buffer might not be used. I don't know what should we do here. There are similar cases below.
| CHECK_RESULT( | ||
| validator_.OnFunction(GetLocation(), Var(sig_index, GetLocation()))); | ||
| FuncType& func_type = module_.func_types[sig_index]; | ||
| Result result = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fuzzer bug: we need to add something to the list, even if the validation fails. The module_.funcs will be used later.
sbc100
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow! This is awesome.
There is a lot going on here though. I didn't get time to look through all of it get.
@tlively could you take a quick look over this and see if anything jumps out? Does the general approach look reasonable?
| Result OnStructType(Index index, | ||
| Index field_count, | ||
| TypeMut* fields) override { | ||
| TypeMut* fields, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should be using C++ std::array rather then size + ptr in these API ? But I see its a pre-existing thing so no worries for this PR.
include/wabt/binary-reader.h
Outdated
| using TypeMutVector = std::vector<TypeMut>; | ||
|
|
||
| // Garbage Collector specific type information | ||
| struct GCTypeExtension { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why "Extension" here? Maybe GCTypeInfo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see its like info about the types that extend a given type maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the idea. Normally, you don't use them (except passing them to the validator). And they might be changed in the future.
I am not a native speaker, so naming is never my strength. Shall I change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why "GC", though? Subtyping isn't limited to GC types like structs and arrays, it's used for all types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am open to better names! Naming is not my best skill. I just wanted a name, which represents the extension where it comes, and "gc" is the name of the extension in wabt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably the "Garbage Collector specific type information" comment is misleading. The "Extra type information added by the Garbage Collector proposal" would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the originating proposal is just historical accident, it is not semantically relevant. Func types are unrelated to GC, as will be other proposed types like cont. (How about just TypeExtension or Supertypes?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I picked SupertypesInfo, which is the combination of the two suggestions. I think this name can avoid symbol clashes.
include/wabt/interp/interp-inl.h
Outdated
| canonical_sub_index(kInvalidIndex), | ||
| is_final_sub_type(true), | ||
| recursive_start(0), | ||
| recursive_count(0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some reason not just just put the defaults like = 0 into the class definition itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
include/wabt/interp/interp.h
Outdated
| }; | ||
|
|
||
| // To simplify the implementation, FuncType may also represent | ||
| // Struct and Array types. To do this, the mutability is stored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems odd.. is the idea to fix this later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be changed. My idea was that the interpreter is just a test program, so this choice should not affect library users, and the patch is already quite big. It can be improved later.
include/wabt/ir.h
Outdated
| Start, | ||
| Tag | ||
| Tag, | ||
| EmptyRec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is EmptyRec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the best choice here. Rec = recursive type. Normally, it contains several types, so it can be reconstructed without storing any extra data. This is not true for empty recursive types. Their position must be marked by something, and that is the purpose of that type.
I don't really understand the purpose of empty recursive types. They cannot be used for anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by: Empty recursion groups don't declare anything, they are like an empty section or an empty group of locals. I don't think they require any representation in the IR, you can just ignore them. Or, if you really want to represent them, the natural form would be as empty RecursionRanges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use case is a bit more complicated. Wabt supports load/store module operations, and when an empty group is present during loading, we need to serialize it during storing to produce the same module.
There is another problem. Wabt has a list of types, and I don't want to change it to a list of recursive types, because that would require a large code refactor (and I suspect the new code would be even slower). Instead the types know when a new recursive type starts, and its size. But empty recursive blocks has no types, so this trick don't work with them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Wabt preserve empty sections? If not, there is no reason to expect it to preserve empty recursion groups.
As for the list of types, your RecursionRange is a start index and a count, so isn't an empty range representable just fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is working as intended. The interpreter generates the binary in a single pass, backpatching section/function sizes, so no attempt is made to optimise their length. They will all be 5-byte LEBs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you suggest we should drop the (rec) blocks? This would simplify the code, so I would be happy. Hopefully all tools will follow this method as well.
I'd suggest they do whatever is most convenient with their respective implementation. It just doesn't matter either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbc100 please decide, the code is there, but I can remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The EmptyRec thing does seem out of place here. If its only there to support round-tripping of empty (rec) groups, and this is not necessary for spec compliance (am I understanding that correctly) then I suggest we drop it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The EmptyRec is removed from the patch. Please let me know if you need more changes.
include/wabt/type-checker.h
Outdated
| TypeMut field; | ||
| }; | ||
|
|
||
| struct RecursiveRange { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should use RecGroup and rec_group rather then Recursive or rec ? Or maybe I'm just not used to seeing "rec" / "recurisive" and thinking of recursion group?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
c0e578c to
031fa07
Compare
This patch supports parsing the new GC types - Abstract types - Recursive types - Composite types The patch also improves type comparison.
This patch supports parsing the new GC types
The patch also improves type comparison.