Conversation

@fosslinux
Owner

No description provided.

@fosslinux fosslinux marked this pull request as ready for review October 15, 2025 00:19
@fosslinux fosslinux force-pushed the qemu-ci branch 5 times, most recently from 9583f58 to 7612200 Compare October 17, 2025 10:32
@fosslinux
Owner Author

Ok, I'm confused :\

@Googulator
Collaborator

hmm, that last run provides a clue:
[ 21.213595] serial8250: too much work for irq4

Linux serial console bug?

@fosslinux
Owner Author

Maybe this will help?

@alganet
Contributor

alganet commented Oct 22, 2025

There's a 6h time limit on GitHub jobs. If it goes past that, it cancels the job. You guys probably already noticed that, but I thought it wouldn't hurt to say it.

@fosslinux
Owner Author

Indeed, that is something that we will deal with after this first issue is fixed.

@alganet
Contributor

alganet commented Oct 24, 2025

@fosslinux what is the current issue?

@fosslinux
Owner Author

fosslinux commented Oct 24, 2025

This is a good example.
https://github.com/fosslinux/live-bootstrap/actions/runs/18679236148/job/53256256615
Notice how, midway through the first musl build after the new Linux kernel is run, QEMU suddenly dies.
It's very unclear why this is happening. Note that the serial8250: too much work for irq4 messages only appear sometimes.

Sometimes QEMU won't die but just hang instead.

@alganet
Contributor

alganet commented Oct 24, 2025

@fosslinux do we need -serial for something or was it just an attempt at working out the problem or getting info on it?

I think these couple of runs show promise:

https://github.com/fosslinux/live-bootstrap/actions/runs/18630275708/job/53113976191 ("another test")
https://github.com/fosslinux/live-bootstrap/actions/runs/18623287945/job/53097415539 ("jasdklfjasdklfj")

They were both cancelled by the time limit, which can be seen on their summary page:

image

GitHub recommends this approach instead of sudo. I don't think it's related to the instabilities, but it might be a better way to enable kvm than running as root.

I was attempting to do the exact same thing in my personal fork months ago. Here are some of my runs:

https://github.com/alganet/live-bootstrap/actions (look for the ones with qemu in the title).

At the time, I was consistently getting it to run up until the 6h limit.

@fosslinux
Owner Author

> @fosslinux do we need -serial for something or was it just an attempt at working out the problem or getting info on it?

Not needed, that was just a test.

> I think these couple of runs show promise:
>
> https://github.com/fosslinux/live-bootstrap/actions/runs/18630275708/job/53113976191 ("another test")
> https://github.com/fosslinux/live-bootstrap/actions/runs/18623287945/job/53097415539 ("jasdklfjasdklfj")

Unfortunately, they don't either. That is the other case, where the build hangs. Look at the logs: the build process dies at a similar point. See the two-hour gap between the last command output and the job being terminated? Same problem.
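To make that kind of gap easier to spot across several runs, here is a minimal sketch that scans a downloaded raw log and reports the longest silence between consecutive lines. It assumes each line starts with an ISO 8601 UTC timestamp followed by a space (the shape GitHub raw logs appear to use); `parse_line` and `largest_gap` are hypothetical helper names, not part of live-bootstrap.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed line shape: "<ISO 8601 UTC timestamp>Z <message>".
# Lines that do not match (continuation lines, blank lines) are skipped.
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z\s?(.*)$")


def parse_line(line):
    """Return (timestamp, message), or None if the line has no leading timestamp."""
    m = _TS.match(line.lstrip("\ufeff").rstrip("\r\n"))
    if m is None:
        return None
    base, frac, message = m.groups()
    stamp = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    if frac:
        # Keep microsecond precision; any further digits are dropped.
        stamp += timedelta(microseconds=int(frac[:6].ljust(6, "0")))
    return stamp, message


def largest_gap(lines):
    """Return (gap, message_before_gap) for the longest silence, or None."""
    best = None
    previous = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        if previous is not None:
            gap = parsed[0] - previous[0]
            if best is None or gap > best[0]:
                best = (gap, previous[1])
        previous = parsed
    return best
```

Calling `largest_gap(open("raw.log", encoding="utf-8"))` on a saved log (the file name is just an example) returns the gap as a `timedelta` together with the last message printed before the silence, which should point at the build step where things stall.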

> I was attempting to do the exact same thing in my personal fork months ago. Here are some of my runs:
>
> https://github.com/alganet/live-bootstrap/actions (look for the ones with qemu in the title).
>
> At the time, I was consistently getting it to run up until the 6h limit.

Thanks, that's helpful. I will inspect the differences.

@alganet
Contributor

alganet commented Oct 26, 2025

@fosslinux

I see, you're right. For the first run, the cancelling happens after it hangs for a while without any output:

image

However, for the second run "jasdklfjasdklfj", the timestamps on the GitHub raw logs indicate that qemu was producing output just before the process reached the time limit:

image

Unfortunately, the logs for my runs are not available anymore. One similar thing between my runs and the "jasdklfjasdklfj" run here is that both were invoking qemu outside python's subprocess.run. I don't know if that is related (seems unlikely), but I think it is worth some job retries to see if it's consistent.

I'll probably do that test in my fork tomorrow, using the exact code from "jasdklfjasdklfj". If I find it to be consistent, I'll investigate some subprocess.run optional parameters or maybe an alternative to it. I will also report back if I discover that any of the runs hang.
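As one possible alternative to subprocess.run, here is a rough sketch that uses subprocess.Popen with a reader thread, so the caller can tell "the child went quiet" apart from "the child ran out of time". It is not the code this PR uses; `run_with_watchdog`, `idle_limit` and `total_limit` are hypothetical names, and `cmd` stands for whatever QEMU command line the workflow builds.

```python
import queue
import subprocess
import sys
import threading
import time


def run_with_watchdog(cmd, idle_limit, total_limit):
    """Run cmd and echo its output; kill it if it goes quiet or runs too long.

    Returns (status, returncode); status is "exited", "idle" or "deadline".
    Both limits are in seconds.
    """
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,  # the child does not inherit the runner's stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    lines = queue.Queue()

    def pump():
        # Reader thread: forward each output line, then None at end of file.
        for raw in proc.stdout:
            lines.put(raw)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()

    start = time.monotonic()
    status = "exited"
    while True:
        remaining = total_limit - (time.monotonic() - start)
        if remaining <= 0:
            status = "deadline"
            break
        wait = min(idle_limit, remaining)
        try:
            raw = lines.get(timeout=wait)
        except queue.Empty:
            # No output for `wait` seconds: decide which limit was hit.
            status = "idle" if wait == idle_limit else "deadline"
            break
        if raw is None:
            break  # child closed its output, so it has exited or is exiting
        sys.stdout.write(raw.decode("utf-8", errors="replace"))
        sys.stdout.flush()

    if status != "exited":
        proc.kill()
    return status, proc.wait()
```

With `total_limit` set somewhat below the 6h job limit, the job could still report a status of its own before GitHub cancels it, and an "idle" status would flag the hang case without waiting hours for the cancellation.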

@Googulator
Collaborator

New clue: "cat: write error: Resource temporarily unavailable". And then the output of "cat" is seemingly truncated.

Unfortunately, the run time of qemu suggests that it still failed at the exact same point as previous runs, so the serial console IRQ error is likely unrelated.
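For context on the `cat` error quoted above: "Resource temporarily unavailable" is the usual message for EAGAIN, which write() returns when the descriptor is in non-blocking mode and the pipe or terminal cannot take more data right now. One possible cause, not confirmed for these runs, is that another process sharing the same open file description switched it to non-blocking. The sketch below is only a way to check that theory on the host side; `is_nonblocking`, `set_blocking` and `demo_eagain` are hypothetical helpers.

```python
import errno
import fcntl
import os


def is_nonblocking(fd):
    """Return True if O_NONBLOCK is set on fd's open file description."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


def set_blocking(fd):
    """Clear O_NONBLOCK on fd; return True if the flag had been set."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    if not flags & os.O_NONBLOCK:
        return False
    fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)
    return True


def demo_eagain():
    """Fill a non-blocking pipe until write() fails; return the errno seen."""
    read_end, write_end = os.pipe()
    try:
        flags = fcntl.fcntl(write_end, fcntl.F_GETFL)
        fcntl.fcntl(write_end, fcntl.F_SETFL, flags | os.O_NONBLOCK)
        chunk = b"x" * 4096
        while True:
            try:
                os.write(write_end, chunk)  # nobody reads, so the pipe fills up
            except OSError as exc:
                if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                    raise
                return exc.errno
    finally:
        os.close(read_end)
        os.close(write_end)
```

Logging `is_nonblocking(1)` before and after the QEMU process runs would show whether the runner's stdout ever flips to non-blocking. If it does, calling `set_blocking(1)` afterwards, or giving QEMU its own pipe instead of the inherited descriptors, would be things to try.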

@fosslinux
Owner Author

"jasdklfjasdklfj" has a bug: the build process runs twice. That is why we see output right up until the time limit; it comes from the second run of the build process.

@alganet
Contributor

alganet commented Oct 27, 2025

I noticed some mounts are failing. This doesn't seem to happen on the bubblewrap version.

image image

I am not familiar with that part. If it's not related and an issue for later, just ignore me :D

4 participants