Improve startup cache loading performance by mbien · Pull Request #9307 · apache/netbeans

mbien · 2026-03-29T05:29:14Z

use Data*Streams instead of Object*Streams
simplify file format (no XML)
code renovation and minor smaller optimizations

result are about 4x faster cache loading times (writing probably too but its not relevant for UX)

minor:

List -> Set for field solely used for contains() in inner loop

in numbers:

readCache(): 113ms -> 28ms
calculateParents(): 128ms -> 30ms

or see the readCache() and calculateParents() marked in purple

before:

after:

third PR for some more startup optimizations pending. (first was #9303)

mbien · 2026-04-02T06:00:17Z

tested a few failure modes. e.g using a corrupted cache or one from an older NB version and it got reset on start.

Details

e.g cache containing invalid data would show:

INFO [org.netbeans.core.startup.ModuleList]: Cannot read cache
java.io.UTFDataFormatException: malformed input around byte 6
	at java.base/java.io.DataInputStream.readUTF(DataInputStream.java:641)
	at java.base/java.io.DataInputStream.readUTF(DataInputStream.java:550)
	at org.netbeans.core.startup.ModuleList.readProps(ModuleList.java:652)
[catch] at org.netbeans.core.startup.ModuleList.readCache(ModuleList.java:626)
	at org.netbeans.core.startup.ModuleList$ReadInitial.run(ModuleList.java:1583)
	at org.openide.filesystems.EventControl.runAtomicAction(EventControl.java:102)
	at org.openide.filesystems.FileSystem.runAtomicAction(FileSystem.java:494)
	at org.netbeans.core.startup.ModuleList.readInitial(ModuleList.java:147)
	at org.netbeans.core.startup.ModuleSystem.readList(ModuleSystem.java:280)
	at org.netbeans.core.startup.Main.getModuleSystem(Main.java:171)
	at org.netbeans.core.startup.Main.getModuleSystem(Main.java:141)
	at org.netbeans.core.startup.Main.start(Main.java:307)
	at org.netbeans.core.startup.TopThreadGroup.run(TopThreadGroup.java:99)
	at java.base/java.lang.Thread.run(Thread.java:1583)

but there are various checks before that so it would fail silently in most cases without even getting to the point where it is loaded.

BradWalker · 2026-04-02T11:09:00Z

Nice work!

matthiasblaesing · 2026-04-03T12:45:50Z

platform/openide.modules/src/org/openide/modules/Dependency.java

        return deps;
    }

+    /**


This is an exported class - before this change an instance of Dependency could only be created through Dependency#create now there is a option to inject a DataInput. Is there an option to keep this an implementation detail?

The created format does not look like something I'd like people to potentially rely on.

the class itself can be only constructed from the inside. The create(String body) factory method is essentially a parsing constructor and can be fed Strings like org.netbeans.bootstrap/1 to create dependencies (which is done in some cases).

Even Dependency.create(Dependency.TYPE_MODULE, "hugo"); would already return a dependency.

The first impl for performance measurements had the constructor made public and the new read/write() methods were outside. However, the code was also a bit less maintainable since read/write() are used from two different places:

org.netbeans.ModuleData and org.netbeans.core.startup.ModuleList both use read()/write()

I couldn't come up with a better solution, since the goal is to avoid using ObjectOutputStream. Using read/writeObject actually performed worse and produced larger files, Externalizable would require a public no-arg constructor so I discarded that thought early + I believe it would also have required to keep ObjectOutputStream. (both cases would also have public methods)

Having them package private would require accessor utilities in two modules using the same namespace. Also not pretty. Maybe a comment that the methods are only meant for short term storage would be sufficient?

package private methods in Dependency + pasting this into the two modules is probably worse than a comment on the methods?

package org.openide.modules; //... public class DepAccessor { public static Dependency read(DataInput is) throws IOException { return Dependency.read(is); } public static void write(Dependency dep, DataOutput os) throws IOException { dep.write(os); } }

I assume using Dependency#create would be out of the question? If so, I tend to agree, that adding comments seems like a valid approach. It mimics the "Serialization" warning that ca be found in the swing classes, that explicitly declares, that serialized versions will not be compatible with future versions. I'd add that there is no guarantee for the used data format.

Yes, agreed. Is there any usage here where we read / write across IDE versions? If not, I'd be more explicit that the format is only intended to be read by the same version of openide.modules. Could possibly even enforce that incompatibility with an extra header derived from the spec version (perhaps with magic number) somehow?

Glad to see use of DataInput and DataOutput here rather than explicit stream type.

Deps are in the streams of all-modules.dat (ModuleList) and all-manifests.dat (ModuleDataCache).

The second file is a bit larger and contains more module related info. All of this is already custom IO code, serialization of the Dependency as ObjectOutputStream was just an anomaly among it.

The all-modules cache is invalidated on any change in the module system

however: there is a second layer in the Stamps class (for all caches) which checks if

any of the clusters changed (which is cached in all-clusters.dat)

checks if any of the .lastModified files within the cluster is newer than the cache file.

if yes it would invalidate the dat file. To test #9307 (comment) I had to disable this timestamp mechanism, otherwise it would notice.

So IMO this is already covered since stream format versioning is essentially time based. NB would ignore the caches if you would copy NB 29 dat files into the dev build folder.

Sure, but this makes that custom IO code an explicit and public detail. That was just a thought to fail fast in case someone did rely on it for something else. I agree that's unlikely. Other options might be @PatchedPublic(???) or just a second create method and put the serialization in Bootstrap where it can be accessed by Startup?

I'm not against merging as is, just share some of the same reservations. Doc is probably OK.

ok. i thought your concern was something else, I didn't think about that anymore since the doc already had a comment that the format was ephemeral.

Adding a second public factory method while trying to move the serialization methods somewhere else where it is not public is also an option. This in fact was the very first version of this, which simply made the constructor public. I wasn't sure what was the lesser evil. But it looks like we want another public factory method now (constructor equivalent).

put the serialization in Bootstrap where it can be accessed by Startup?

right. will try something like that. As long the io code is not copied around its ok with me.

tried a mix between #9307 (comment) and ~~public~~ factory method.

bootstrap exports now a package more for its friends list

matthiasblaesing · 2026-04-03T13:12:04Z

platform/core.startup/src/org/netbeans/core/startup/ModuleList.java

+                    .map(e -> e.getKey() + "=" + e.getValue())
+                    .collect(Collectors.joining(","));


This will break one either a key or a value contains = or ,.

yes but can this happen? The set of keys is restricted to (see readProps):

case "name", "jar" -> entry[1]; case "enabled", "autoload", "eager", "reloadable" -> Boolean.valueOf(entry[1]); case "startlevel" -> Integer.valueOf(entry[1]); default -> throw new IOException("unknown key " + entry[0]);

the first two are String values, the rest are primitives.

This does not cache all properties, only a very limited set. (this was already done before this change)

The problem with these ad-hoc encoding is, that they have the tendency to creep up later and hunt you when you least expect it.

name currently is the codename base, at this point in time to my knowledge CNDs can't contain both, but that is not set into stone.

Not sure that a naming convention for jar files is enforced for modules, but at its core it is a filename and that can hold both , and =.

Why not reuse an approach you are already using: don't invent a new serialization, but write the number of key/value pairs to the stream and then the keys and values as pure strings. That way there is no need to consider "special" characters.

i will try to come up with something better. Part of the reason I remained at the String level is convenience and simplicity. But if we can't do that due to the separator issue, we could indeed go one level lower which would also allow us to map all booleans to a single byte and write the two strings without keys.

I dropped the idea of bitset encoding, since we would need a 3-state due to the unset state.

instead I made a slight modification of the implementation:

map entries are written as individual Strings

key/value separator is still the =, but this is fine since the String is split exactly once (split(.., 2)) and we know keys don't have that char in them

changes are in separate commit

- use Data*Streams instead of Object*Streams - simplify file format (no XML) - code renovation and other minor optimizations results in 4x faster cache loading times (readCache() and calculateParents() methods) minor: - ModuleManager.EnableContext: List -> Set for field solely used for contains() in inner loop - Module: data loading can synchronize on instance instead of class (DATA_LOCK no longer static)

matthiasblaesing

I have minimal nitpick, that you might consider. Apart from that thanks for the work.

matthiasblaesing · 2026-04-07T17:19:42Z

platform/o.n.bootstrap/src/org/netbeans/DepUtil.java

I suggest not to use org.openide.modules as that package is also part of the corresponding module. If we ever want to move towards support for JPMS split packages have to go away. Going into the opposite direction feels wrong to me. I have not yet investigated how OSGI reacts to split packages, but I suspect it will also be problematic.

ok. Side effect of that move is that it the new create() method in Dependency becomes public API.

I thought about this and I am leaning towards to keeping it as is atm due to the following reasons:

it wouldn't prevent future code movements to solve this in a different way since nothing is public atm

to make the new create() method public, I would have to add validation to it and i don't really like to reverse engineer the validation of the other create method right now. There is also the danger to make it slower again since it wasn't t here before due to point 3

it doesn't really have to be public since it is only used internally without having a purpose for client code

I have not yet investigated how OSGI reacts to split packages, but I suspect it will also be problematic.

I am by no means an OSGI expert, but I do remember that even back in the days OSGI was ok with split packages as long there was no overlap in classes. E.g if you defined the same class in two modules -> it caused issues. Since this is friend API, this would be super unlikely to occur by chance.

But I have even less experience with the OSGI impl used within NB or what spec level it actually is.

@matthiasblaesing @neilcsmith-net please let me know what you both think.

Well, we have split packages all over the place. Not that I think we should necessarily be adding more. In the short term, what would be the impact of keeping the method non-public and using a method handle?

I have some reservations that we're in the wind down phase to freeze and still some open questions on this given how fundamental it is.

and still some open questions on this given how fundamental it is.

did I overlook a question?

Just the minor things raised in this thread about split package. I didn't mean there are fundamental questions, but that this is a change in something quite fundamental.

added commit which moves it back and it uses now method handles. Added more comments and a test since there is now more to explain.

performance is similar, it seems to vary a little more though from run to run.

mbien added this to the NB30 milestone Mar 29, 2026

mbien added Code cleanup Label for cleanup done on the Netbeans IDE performance Platform [ci] enable platform tests (platform/*) ci:dev-build [ci] produce a dev-build zip artifact (7 days expiration, see link on workflow summary page) labels Mar 29, 2026

mbien mentioned this pull request Mar 29, 2026

Early startup phase optimizations #9308

Open

mbien force-pushed the cache-loading-perf branch from 0047b0a to 34d1d39 Compare March 29, 2026 17:08

mbien mentioned this pull request Apr 1, 2026

Move Splash initialization to EDT and make it no longer modal #9303

Merged

mbien force-pushed the cache-loading-perf branch from 34d1d39 to e28b5f1 Compare April 1, 2026 22:09

mbien marked this pull request as ready for review April 2, 2026 06:00

mbien force-pushed the cache-loading-perf branch from e28b5f1 to 54c0790 Compare April 2, 2026 08:06

mbien requested review from lkishalmi and matthiasblaesing April 3, 2026 02:40

matthiasblaesing reviewed Apr 3, 2026

View reviewed changes

mbien added 2 commits April 7, 2026 09:40

properties format adjustments and javadoc update

13c9102

mbien force-pushed the cache-loading-perf branch from 54c0790 to 13c9102 Compare April 7, 2026 10:16

move code around

cbdd6e5

mbien added the Need Squashing label Apr 7, 2026

matthiasblaesing approved these changes Apr 7, 2026

View reviewed changes

move code around even more. No split package but method handles.

f636af8

		.map(e -> e.getKey() + "=" + e.getValue())
		.collect(Collectors.joining(","));

Conversation

mbien commented Mar 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mbien commented Apr 2, 2026

Uh oh!

BradWalker commented Apr 2, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mbien Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

neilcsmith-net Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mbien Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mbien Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mbien Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mbien Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

matthiasblaesing left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mbien commented Mar 29, 2026 •

edited

Loading

mbien Apr 3, 2026 •

edited

Loading

neilcsmith-net Apr 7, 2026 •

edited

Loading

mbien Apr 7, 2026 •

edited

Loading

mbien Apr 7, 2026 •

edited

Loading

mbien Apr 3, 2026 •

edited

Loading

mbien Apr 7, 2026 •

edited

Loading