2026-03-01 00:48:14 i added some debugging code to my pipeline... on the (siLj_NXx) shared-runner nor-ci-1, which is crashing with SIGILL on a vector instruction, cpuinfo does claim vector support: model name: Spacemit(R) X60 isa: rv64imafdcv_zicbom_zicboz_zicntr_zicond_zicsr_zifencei_zihintpause_zihpm_zfh_zfhmin_zca_zcd_zba_zbb_zbc_zbs_zkt_zve32f_zve32x_zve64d_zve64f_zve64x_zvfh_zvfhmin_zvkt_sscofpmf_sstc_svinval_svnapot_svpbmt 2026-03-01 00:50:35 and, even if go is compiled with GORISCV64=rva20u64 (which is also the default), it will check at runtime whether the CPU supports RVV 1.0 (vector extensions). and it seems this CPU does, but regardless crashes with SIGILL at internal/bytealg/indexbyte_riscv64.s:83 2026-03-01 00:54:21 i am thinking, maybe we patch out the runtime check and always use the non-vector impls (which is what go1.25 does) and report this to the go people to let them sort it out? i'm not familiar with riscv64 cpu extensions 2026-03-01 06:54:50 the weird thing here is that the commits which implement the vector support in go specifically mention Spacemit X60, which is ostensibly the same CPU as is failing here 2026-03-01 07:29:33 ikke: you around? i could really use a rv64 shell for this instead of trying to hope my CI jobs get scheduled on the correct box :p 2026-03-01 07:32:22 Only on milk-v pioneer hw. nor-ci-1 is different 2026-03-01 07:32:53 that doesn't help unfortunately, shared-runner pioneer1 cpu does not support the vector insns and thus does not exhibit the issue 2026-03-01 07:34:56 is nor-ci-1 ncopa's perchance? 2026-03-01 07:37:13 Yes 2026-03-01 07:38:02 so i guess i need to talk to him unless i get lucky with CI scheduling 2026-03-01 07:38:46 ncopa: would you mind giving me a temp shell on nor-ci-1 for debugging a vector instruction issue on that specific cpu? 2026-03-01 22:45:11 the s390x runner is idle... but has not picked up any of the spirv rebuilds yet. maybe it's broken? 
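For reference, the isa line quoted from cpuinfo above can be split into its extensions to confirm that it really advertises `v`. This is a minimal sketch, assuming the flat, underscore-separated format shown in that message (no version suffixes). `parse_riscv_isa` is a hypothetical helper, not part of Go or of any tool mentioned here, and it only reports what the kernel claims, not whether the instruction that raised SIGILL is actually implemented.

```python
def parse_riscv_isa(isa):
    """Split a cpuinfo-style RISC-V isa string into (xlen, single, multi).

    Assumes the flat format seen in the log: "rv<xlen><letters>_<ext>_<ext>...".
    """
    parts = isa.strip().lower().split("_")
    base = parts[0]
    if not base.startswith("rv"):
        raise ValueError(f"not a RISC-V isa string: {isa!r}")
    # Collect the register width digits that follow "rv".
    digits = ""
    for ch in base[2:]:
        if not ch.isdigit():
            break
        digits += ch
    if not digits:
        raise ValueError(f"missing register width in: {isa!r}")
    single = set(base[2 + len(digits):])  # single-letter extensions, e.g. i, m, a, v
    multi = set(parts[1:])                # multi-letter extensions, e.g. zve64d
    return int(digits), single, multi


# The isa string exactly as quoted in the log.
ISA = (
    "rv64imafdcv_zicbom_zicboz_zicntr_zicond_zicsr_zifencei_zihintpause_zihpm"
    "_zfh_zfhmin_zca_zcd_zba_zbb_zbc_zbs_zkt_zve32f_zve32x_zve64d_zve64f_zve64x"
    "_zvfh_zvfhmin_zvkt_sscofpmf_sstc_svinval_svnapot_svpbmt"
)

xlen, single, multi = parse_riscv_isa(ISA)
print("xlen:", xlen)
print("claims v:", "v" in single)
print("claims zve64d:", "zve64d" in multi)
```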
2026-03-03 10:23:26 gentle poke, what's the state of the s390x builder? 2026-03-03 10:26:29 Will check later. 2026-03-03 10:48:23 something happened with the s390x builder 2026-03-03 10:49:18 it rebooted 3 days ago and it appears that /var was not mounted, so no lxc 2026-03-03 10:52:00 vgdisplay shows scary messages 2026-03-03 10:52:06 vg1 is gone apparently 2026-03-03 10:53:39 WARNING: VG vg1 is missing PV fKw74P-69tj-NEHO-jWLi-KTCg-ycFF-mfykG6 (last written to /dev/dasdg1). 2026-03-03 10:53:39 WARNING: VG vg1 is missing PV jHErDq-RYC7-w9u1-iaQo-lCfR-P4mP-WLCra3 (last written to /dev/dasdh1) 2026-03-03 11:17:17 i think this effectively means that we lost the data 2026-03-03 11:23:35 I think I will reboot it again and see if the disks come back 2026-03-03 11:24:02 usa2-dev1 [~]# uptime 2026-03-03 11:24:02 11:23:54 up 3 days, 18:32, 0 users, load average: 0.00, 0.00, 0.00 2026-03-03 11:25:00 nope 2026-03-03 11:25:50 :( 2026-03-03 11:26:26 i think we lost /dev/dasdg1 and /dev/dasdh1 for some reason 2026-03-03 11:26:52 I don't know who we should ask 2026-03-03 11:27:59 Martha 2026-03-03 11:28:35 I think it was rebooted Friday 27 February and after that we lost those two disks 2026-03-03 11:34:15 Sent an email 2026-03-03 11:38:08 Thanks! 2026-03-03 16:06:31 lotheac: Sorry, I can give you a login to nor-ci-1, but not right now. Please ping me tomorrow 2026-03-03 16:44:34 ncopa: ping 2026-03-03 16:45:28 I got a reaction from Marist. They serviced and rebooted the machine. They expect the new disk configuration was not made persistent. I found that the disks are listed in /etc/zipl.conf, and added the suggested disk numbers there 2026-03-03 16:48:52 I ran `zipl`. Rebooting now to see if it comes back 2026-03-03 16:51:50 ok, it's back, including the volumes and containers. 2026-03-04 06:14:15 aelin: thanks for reporting btw 2026-03-04 06:14:53 ncopa: good morning. ping re: nor-ci-1 2026-03-04 07:35:52 lotheac: good morning! do you have access to the dmvpn network?
can you route to 172.21.1.3? 2026-03-04 07:43:07 i do not 2026-03-04 07:43:47 ok. let me try set up some portforward after i eat breakfast 2026-03-04 07:43:51 cheers 2026-03-04 16:35:40 ^ is expected 2026-03-05 11:51:41 Tweaked this trigger to wait slightly longer 2026-03-06 06:01:48 jujutsu build has been stuck on build-edge-riscv64 for at least 6 hours, or even longer 2026-03-06 06:03:39 ikke: do you have time to take a look? thanks 2026-03-06 06:09:41 huajingyun: I've killed the build 2026-03-06 06:16:29 ikke: Okay, thank you. hope it will pass this time 2026-03-08 23:09:12 looks like build-edge-riscv64 is stuck on jujutsu again 2026-03-09 01:27:17 omni: yeah, the second build also got stuck. mio has already disabled checks for riscv64, so it might still require ikke to manually kill the build that has entered the 3rd build of jujutsu 2026-03-09 06:50:23 huajingyun: killed it again 2026-03-09 06:52:18 ikke: thanks a lot 2026-03-11 03:17:02 ikke: I was curious about the package mirror disc usage, put together a little analysis on it here: https://lambdacreate.com/paste/alpine-mirror-analysis.txt 2026-03-11 03:18:18 It looks like we're using ~5tb of space based on what I can see from rsync, not sure if that's the absolute total. Regardless I was kind of surprised how much space all the EOL releases take up and how much the distro has grown 2026-03-11 03:18:46 edge is like 40x the size of 3.3 which is wild. 2026-03-11 03:19:28 good stuff man! 2026-03-11 03:19:38 also the addition of disc images for AWS, Azure, GCP is non-trivial 2026-03-11 03:20:43 invoked: thank you! lots of discussion going on around this stuff lately, but it's hard to actually discuss it without data 2026-03-11 03:21:26 very helpful, thank you 2026-03-11 03:21:58 anyways, I think long term maybe the answer is we reach out to archive.org and ask if they can provide an archive.alpinelinux.org mirror for us like they do for Debian, Fedora, etc. 
That'd probably need TSC or Council involvement I guess, someone to represent the distro and broker the connection. 2026-03-11 03:22:13 or well, maybe not THE answer, but a possible solution. 2026-03-11 08:00:46 durrendal: nice 2026-03-11 08:01:17 i think thats a good idea reg archive.org 2026-03-11 08:02:24 would you mind checking this out and let us know what to do? 2026-03-11 09:29:11 I did look at archive.org before 2026-03-11 09:31:06 It would probably take an announcement from us that we are deleting those files before they take it u 2026-03-11 09:31:08 Up 2026-03-11 09:32:21 Archlinux does have some arrangement though 2026-03-11 12:49:45 clandmeter: yeah I don't mind at all, our best bet is to probably get it on Jason Scott's radar, he's the curator for those archives. I'll see if I can find an email for him and then reach out. 2026-03-11 12:50:21 Any issues if I cc you both on that? 2026-03-11 12:52:52 my sense is it probably will require an announcement at some point, but they do this for most of the larger distros so I don't think we'd be asking too much 2026-03-11 13:00:48 durrendal: go ahead, we appricate the support. 2026-03-11 13:10:07 No problem for me 2026-03-11 14:01:58 awesome :) I just sent, used both of your @alpinelinux.org emails for the CC, felt more official that way 2026-03-11 14:02:12 fingers crossed we get a good response! 2026-03-11 15:26:40 Durrendal: Thanks, though 3.14-3.19 are not EOL yet. The official designation is "on request" 2026-03-11 15:36:45 Drat, I forgot about that distinction. Still if they can take everything from 3.0 to 3.13 that still gives us some breathing room and a transition plan 2026-03-11 15:37:30 how much space do we have total? Is it enough to just trim space consistently? 
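Whether trimming a fixed amount of space is enough depends on how fast each release grows. A rough sketch of that arithmetic, assuming each new release is larger than the previous one by a constant factor; the function name and all numbers passed to it are illustrative placeholders, not measurements from the mirrors.

```python
def releases_that_fit(free_tb, next_release_tb, growth=1.0):
    """Count how many future releases fit into free_tb of space.

    next_release_tb is the size of the next release, and each later release
    is assumed to be `growth` times the size of the one before it.
    """
    if next_release_tb <= 0:
        raise ValueError("next_release_tb must be positive")
    if growth < 1.0:
        raise ValueError("growth must be >= 1.0 (releases are assumed not to shrink)")
    count = 0
    size = next_release_tb
    while size <= free_tb:
        free_tb -= size   # space consumed by this release
        size *= growth    # the following release is larger
        count += 1
    return count


# Illustrative only: 2 TB free, next release 0.6 TB, 10% growth per release.
print(releases_that_fit(2.0, 0.6, growth=1.1))
```

With a growth factor above 1, any one-off amount of freed space buys a bounded number of releases, so trimming alone only moves the deadline.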
2026-03-11 15:39:40 With each release increasing in size, it would not 2026-03-11 15:40:14 How does archlinux handle redirects in case of 3rd-party mirrors? 2026-03-11 15:40:41 durrendal: we have several components with different amounts of storage 2026-03-11 15:58:27 that was my sense, and the real space saving would include the larger more recent releases. 2026-03-11 15:58:31 https://wiki.archlinux.org/title/Arch_Linux_Archive 2026-03-11 15:59:01 My understanding of how Arch does it is the historical archive has a redirect, so from a user perspective it's transparent, just a little slower 2026-03-11 15:59:31 They are a rolling release, so less comparable 2026-03-11 16:00:29 yes unfortunately I haven't found any references to how it works for Debian etc but I believe they have this sort of arrangement and are closer to our use case 2026-03-11 16:10:31 For our CDN, we could probably implement the redirects, but the 3rd-party mirrors won't implement that and the files would get removed 2026-03-11 16:11:33 Another thing worth looking at is retention for cloud images 2026-03-11 16:12:28 like you pointed out, they also take up quite some space 2026-03-11 16:23:30 naive question: is changing the builders to share noarch content (as Sertonix suggested) a complex change? 2026-03-11 16:25:20 invoked: one builder would need to be the provider for these packages 2026-03-11 16:25:31 which means that builder becomes a dependency for other builders 2026-03-11 16:26:14 Harder to get consistent builds 2026-03-11 16:27:11 hm.
on the other hand that seems like it could be the most impactful thing we could do 2026-03-11 16:27:19 if builder B provides the noarch packages, and builder A finishes before builder B, which finished before builder C, it could be that builder A was built against version N of a package, while builder C builds against version N+1 2026-03-11 16:27:55 ah, so it's racy 2026-03-11 16:28:17 Yes 2026-03-11 16:28:53 The current consistency is achieved by having a single builder which has a full view and sole custody over it's repositories 2026-03-11 16:29:20 Distributing that, and you get all the fun and joy from distributed systems 2026-03-11 16:29:38 :( 2026-03-11 16:29:51 thanks for the explanation 2026-03-11 16:32:10 Changing the build architecture would need to find a solution for this problem 2026-03-11 16:32:20 (or see how others have fixed it) 2026-03-11 16:33:21 Wow, I would not have foreseen that builder race condition, but it makes sense to me now that I have read that a few times. 2026-03-11 16:36:59 These problems are not impossible to solve, just requires a well-thought out architecture 2026-03-11 16:37:24 (" 2026-03-11 16:37:32 ("just" carrying a lot of weight here :P) 2026-03-11 16:39:01 Was gonna say, so you mean it is impossible to solve, haha 2026-03-11 16:41:28 The power of the current architecture, with all its limitations, is its simplicity 2026-03-11 16:41:54 Changing that would probably introduce more points of failure / instability 2026-03-11 17:28:27 ikke: My idea for noarch would be to do that post-builds: indexing a directory, checking for duplicate package ids, repack file with combined signatures, sym-/hardlink file. 2026-03-11 17:30:35 The checking of package id/hash is important since there are many noarch packages that are not the same across arches 2026-03-11 17:32:48 If I understand it correctly, that would only save space on the mirrors, right? 
Each builder would still have all the packages (including noarch) 2026-03-11 17:35:38 (not dismissing the idea, just trying to understand it) 2026-03-11 17:36:14 If packages have to be local, yes 2026-03-11 17:37:29 Right now, they have to be 2026-03-11 18:20:04 ikke: so it's not enough to address the storage problem on the mirrors, it must also be addressed on the builders due to their current design? 2026-03-11 18:20:23 durrendal: More or less, yes 2026-03-11 18:20:49 Though, technically we could prune more on the builders 2026-03-11 18:32:40 hmm a bit of a rock and a hard place. It's one thing to argue funneling money into expanding mirror storage, another if you also have to scale every other system that is related to packaging and release. 2026-03-11 18:33:11 Even mirror storage involves multiple hosts due to geolocation 2026-03-11 18:34:06 I assumed as much from your previous comment. And throwing money at a problem is a bandaid fix, not a long term solution. 2026-03-11 18:35:14 can we release 3.24 as is without addressing the storage problem? 3.25? trying to get a sense for how much runway we have 2026-03-11 18:36:09 also, I wasn't around, I assumed we purged 0.x through 2.0 since we only have 3.x on the mirrors? 2026-03-11 18:37:37 Yes, I recall we used to have an old.a.o, but it got dropped somewhere among several infra changes 2026-03-11 18:43:49 haha well 3.24 could be 4.0, cut it and start over. 2026-03-11 18:43:57 jk of course 2026-03-11 18:44:44 In any case, how it's now, 3.24 would not fit 2026-03-11 18:44:48 on the master mirror 2026-03-11 18:44:57 Note that they have always provided us more space when we asked for it 2026-03-11 18:45:33 But, I don't know how much more we expect their generosity to extend 2026-03-11 18:50:18 That's a sobering statement, and explains where the discussions around package acceptance policy have been coming from. If we grew less quickly, we'd have more runway. 
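The post-build noarch idea from earlier (index a directory, check for duplicates, then link the files) could start with a scan like the one below. This is only a sketch under stated assumptions: a hypothetical layout of `<root>/<arch>/<name>.apk`, and a whole-file SHA-256 comparison standing in for the package id check. It does not repack files or combine signatures, which the proposal would also need.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_packages(repo_root):
    """Group byte-identical .apk files that appear under more than one arch dir.

    Returns {(file name, content hash): [paths]} for groups with 2+ members.
    Files with the same name but different content are kept apart, since many
    noarch packages are not the same across arches.
    """
    groups = defaultdict(list)
    for path in sorted(Path(repo_root).glob("*/*.apk")):
        groups[(path.name, sha256_of(path))].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}
```

Each returned group is a candidate for keeping one copy and hardlinking the rest. As noted in the discussion, this would only save space on the mirrors; the builders would still hold every package locally.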
2026-03-11 18:52:48 Is the policy on which versions get maintenance by request set in stone? If we dropped 3.14 - 3.18 for example that's 1196.2GB of space, leaving 3.19 as still at request? 2026-03-11 18:53:04 that last ? should be a . 2026-03-11 18:54:22 It's also that those releases are more likely to still have active users, and dropping them has a much larger impact 2026-03-11 18:55:28 You'd underestimate how many docker images / pipelines there are that do `FROM alpine:; RUN apk add ....` 2026-03-11 18:56:58 that's a valid argument, we don't want to break user space. But if we publicly announced a policy change, in tandem migrated what we could to archive (if they'll help), and requested additional space on the master mirror then perhaps that change in policy would be less impactful 2026-03-11 18:57:16 I'm assuming a lot, we can't affect the technical decisions of downstream consumers. 2026-03-11 18:57:20 yes, but that would probably long-term, not short-term 2026-03-11 18:58:21 agreed, and if the only thing we get is a short term fix we'll be talking about not being able to release 3.25+ or similar, which is what I worry will happen. 2026-03-11 18:59:25 there's likely not silver bullet or magic solution, but if each snapshot of edge is going to take up 0.6TB+ going forward then policy has to change somewhere. 2026-03-11 19:00:18 I guess what's unclear to me, and probably other contributors, what is the greater good? preserving the old version history as semi-warm, or mass pruning packages from the repos to refocus the distro? 2026-03-11 19:00:49 or something else, I haven't sat with the problem long enough to definitively state these are our options/ 2026-03-11 19:05:20 is the storage the one that is expensive, or traffic? storage is dirt cheap itself, e.g. you get 1tb for 3.2 euro/month, or 20tb for ~40 euros/month. 
traffic on the other hand is massively expensive especially with our scale 2026-03-11 19:05:35 (that is on hetzner as a example) 2026-03-11 19:05:41 For us, it's the opposite 2026-03-11 19:05:46 We got plenty of traffic 2026-03-11 19:05:50 But storage is limited 2026-03-11 19:06:51 interesting, i wouldve expected not such a tight storage limitation. really makes me wonder 2026-03-11 19:06:57 And for the record, at the moment, everything is sponsored 2026-03-11 19:07:04 yeah 2026-03-11 19:07:13 achill: what we get is mostly bare-metal 2026-03-11 19:07:44 Not VMs with network-backed storage that can infinitely expand 2026-03-11 19:08:07 hm 2026-03-11 19:08:44 but if it's bare metal could we funnel some of the donations into additional storage then? 2026-03-11 19:08:49 but also a few tbs hdd storage are not that expensive 2026-03-11 19:08:51 durrendal: possibly 2026-03-11 19:09:04 achill: we need more than a few TB 2026-03-11 19:09:05 durrendal: that would be good yea 2026-03-11 19:09:23 I'd be happy to know that my donations bought storage that allowed us to release 3.24 for what it's worth. I imagine many would feel the same 2026-03-11 19:09:41 the donations are made for exactly this purpose in mind afaik 2026-03-11 19:09:47 Yes 2026-03-11 19:09:57 but i dont know how to funnel this money into the actual storage servers 2026-03-11 19:10:26 ack, that's the tricky situation. We would at least need to discuss that with the sponsors 2026-03-11 19:10:35 yeah yeah makes sense 2026-03-11 19:10:43 thank you for being on top of this stuff! 
:3 2026-03-11 19:10:58 this is really great 2026-03-11 19:11:13 indeed, will second that, and for taking the time to discuss it with those of us that are interested in understanding the problem and trying to help solve it 2026-03-11 19:11:39 I also appreciate you brainstorming along 2026-03-11 19:11:46 This has been in my mind for quite some time already 2026-03-11 19:12:16 you got me thinking about it the last time we talked, I've just had bandwidth issues on my end due to $work 2026-03-11 19:12:26 yeah, we're all busy 2026-03-11 19:13:08 ain't that the truth, which makes all of this all the more meaningful to keep going, we all care enough to do it despite that 2026-03-11 19:15:48 I don't have the level of expertise of most of you, but one thing I have acquired is time. I am now a pensioner. If some don't mind doing some occasional hand holding, I am willing and interested in helping. 2026-03-11 19:17:13 ikke: i don't suppose there is a document with current infra design? 2026-03-11 19:17:13 jvvv: I'm certainly willing to do 2026-03-11 19:17:17 or is it fine to pester you for answers :> 2026-03-11 19:17:54 There'se a lot I have to document.. 2026-03-11 19:17:59 Things changing all the time does not help 2026-03-11 19:18:15 pj: anything in particular you wonder about? 2026-03-11 19:18:22 I know that too well 2026-03-11 19:19:43 I'm not sure, at least not yet. 
I have like some ideas: i.e noarch could be scheduled on the most powerful arch builders 2026-03-11 19:22:02 about the racing stuff, I'm feeling that there should be some kind of locking/cancelling 2026-03-11 19:22:31 That requires some orchestrating / centralized services 2026-03-11 19:22:36 I have to write this down in my notes and then do some scribbles to see how that would work 2026-03-11 19:22:41 each builder is now completely independent 2026-03-11 19:23:11 (read, each arch) 2026-03-11 19:23:38 When a new commit is pushed, each builder can start building it right away 2026-03-11 19:23:41 right but there is a centralised pipe of information for builds, iirc mqtt 2026-03-11 19:24:01 is it mqtt that schedules builds or each builder looks at git 2026-03-11 19:24:16 each git push is anounced over mqtt 2026-03-11 19:24:27 and then the builders pick that up and start building 2026-03-11 19:24:58 So triggering would be a better word 2026-03-11 19:26:41 I'll see if I'm awake enough to ponder about this later 2026-03-11 19:27:42 https://gitlab.alpinelinux.org/alpine/tsc/-/issues/82 2026-03-11 19:29:30 I know about this issue but I remember it didn't went anywhere and I'm hoping to not redesign anything 2026-03-11 19:29:33 just do small patch :> 2026-03-11 19:30:32 > The builders need to have all packages locally, resulting in storage issues. Sometimes it's possible to get extra disk storage, but that's not the case for all architectures 2026-03-11 19:31:02 I think that could be solved with rootbld? 2026-03-11 19:31:15 but I guess rootbld hasn't been tested (and I doubt it works flawlessly) 2026-03-11 19:31:16 No, rootbld solves a different problem 2026-03-11 19:31:35 I mean rootbld would pull the packages on demand no? 
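The reason locking or cancelling comes up is the version skew described earlier: independent builders that resolve a shared noarch package at different times can see different versions. A toy timeline makes this concrete. It is a sketch with made-up builder names, versions and times, not a model of the real builders or of the mqtt trigger.

```python
def resolve_versions(publish_times, builder_start_times):
    """Return, per builder, the newest version published at or before its start.

    publish_times: {version: time the provider published it}
    builder_start_times: {builder: time it resolved its dependencies}
    """
    resolved = {}
    for builder, start in builder_start_times.items():
        visible = [v for v, t in publish_times.items() if t <= start]
        resolved[builder] = max(visible) if visible else None
    return resolved


# Hypothetical timeline: the provider publishes version 1 at t=0 and
# version 2 at t=10; builder A resolves at t=5, builder C at t=15.
published = {1: 0, 2: 10}
starts = {"A": 5, "C": 15}
print(resolve_versions(published, starts))
```

Here A builds against version 1 and C against version 2, even though both were triggered by the same push. Avoiding that needs some shared state about which version a given build round should use, which is the orchestration the current independent builders do not have.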
2026-03-11 19:31:45 The builders could do that as well 2026-03-11 19:31:56 without rootbld* 2026-03-11 19:32:11 The builders are now the canonical source of the repo 2026-03-11 19:32:29 they rsync it to the master mirror, which then gets distributed 2026-03-11 19:32:40 I guess that's true but it isn't really explained why it needs all packages 2026-03-11 19:32:50 or is it because builders just churn through everything and upload on end? 2026-03-11 19:32:57 Yes 2026-03-11 19:33:06 and rsync --delete to the master mirror 2026-03-11 19:33:15 So they need a full snapshot of the repo 2026-03-11 19:33:24 ok, i see now 2026-03-11 19:34:54 But rootbld on lxc needs some special config 2026-03-11 19:35:19 What is solves is some persistent state problems that the builders now have 2026-03-11 19:38:21 have you explored already the existing solutions then? 2026-03-11 19:38:58 Not really yet 2026-03-11 19:39:34 Someone making a list (with references) would be helpful 2026-03-11 19:40:14 i meant the solutions listed in tsc issue 2026-03-11 19:41:00 I did experiment with buildbot in the past 2026-03-11 19:41:48 The rest are very generic and would require some additional layers to glue everything together 2026-03-11 19:41:52 like we would need to run actual PoC and try something out to see if it works well 2026-03-11 19:42:45 I'm not even looking at NATS, it's too complex 2026-03-11 19:42:45 We need some orchestration that ensures the built packages remain consistent (if we do distributed builds) 2026-03-11 19:44:13 if I push an update to a package, and realize I forgot to bump pkgrels, so push that 1 minute later, those rebuilds should happen only after the first package has been built, and the rebuilds should have access to the first package 2026-03-11 19:44:26 (just an made up scenario) 2026-03-11 19:45:16 Right now, because there is just a single builder, it will first finish the building of the package, then start the rebuilds, and because the built package is local, the 
rebuilds immediately have access to that package 2026-03-11 19:46:16 So, I think that's the first question we need to answer, before looking at software that we may use to implement it 2026-03-11 19:48:01 I think that can influence the solution to choose but I also think we don't need to concern ourselves with distributed building (unless you want noarch to be distributed?) 2026-03-11 19:48:03 for now 2026-03-11 19:48:37 sounds like we need a state machine 2026-03-11 19:49:12 I'm type of person that always wants perfect world but if we try hunting for perfect thing, we might never solve any of the issues :>  2026-03-11 19:49:21 yeah 2026-03-11 19:49:52 If we don't want distributed builds, it would probably be best to stay with the current architecture and try to improve the painpoints 2026-03-11 19:50:15 But for architectures like riscv64, there's a lot to win there 2026-03-11 19:50:23 (But also a high price to pay) 2026-03-11 19:51:08 Is it because of smaller builders? 2026-03-11 19:51:13 or do you have just multiple same machines 2026-03-11 19:51:22 combination of both 2026-03-11 19:51:30 the current hardware is slow 2026-03-11 19:51:51 so distributing it over multiple machines could potentially improve things 2026-03-11 19:51:56 one concern is that we generally shouldn't schedule things like LLVM on weakest machines 2026-03-11 19:52:34 mmmm, distcc would go nice 2026-03-11 19:52:56 but that would only help with packages built with c(++), right? 2026-03-11 19:53:08 (it wasn't serious suggestion) 2026-03-11 19:53:08 go, rust etc would not benefit from it? 2026-03-11 19:53:14 heh 2026-03-11 19:53:41 For CI, I already have a system to add an additional tag to make sure the projects are only schduled on larger instances 2026-03-11 19:53:50 well, if there isn't any clash with CPU flash across those boards 2026-03-11 19:53:51 it could technically be used 2026-03-11 19:53:58 s/flash/flags/ 2026-03-11 19:54:36 right but that is per gitlab project? 
2026-03-11 19:54:54 Yes, just for aports 2026-03-11 19:55:02 or is it done as a pre-job to figure out which package is built and uses dynamic pipeline 2026-03-11 19:55:10 that, yes 2026-03-11 19:55:31 https://gitlab.alpinelinux.org/alpine/aports/-/blob/master/.gitlab/job-config.yaml 2026-03-11 19:56:04 nice 2026-03-11 19:56:36 ok, I have to eat something and get some juices for the brain to function 2026-03-11 19:56:48 o/ 2026-03-11 19:59:26 So one thing that APKv3 was mentioned to help with is that it could do remote signing 2026-03-11 20:29:45 Number of requests per repo on a single mirror server over a period of ~24h https://tpaste.us/xD01 2026-03-11 20:30:28 And that's not even taking cdn caching into account 2026-03-11 21:17:11 The alternative is manually backporting the security fixes, but that's also not without risk 2026-03-11 21:17:16 wrong channel 2026-03-11 21:34:29 Wow half as much volume on 3.14 as latest stable is baffling, not at all what I expected. I thought it'd be 25% at most by comparison 2026-03-11 21:35:01 Like I said, you would understimate it :P 2026-03-11 21:38:21 Hahaha you weren't wrong! 2026-03-12 16:58:05 The build for py3-lmdb on the loongarch64 builder segfaulted in check(). AFAICT it hasn't tried to rebuild it. Could it have crashed the builder? 2026-03-12 17:00:11 jvvv: it tried it over and over and over and over 2026-03-12 17:00:52 excerpt: https://tpaste.us/vo0o 2026-03-12 17:12:41 Oh, wow. I think I better ask for advice on the loongarch64 channen 2026-03-12 17:12:48 *channel 2026-03-12 17:14:47 That would be helpful 2026-03-12 18:05:10 I've pushed !98926 to address that test. If I can find a fix, can add a bump. but at least this should help it to not backup the builder 2026-03-12 18:11:42 jvvv: thanks! 
2026-03-12 18:14:07 Welcome 2026-03-12 18:16:04 Packetloss / flapping 2026-03-13 10:33:35 can't get this to pass w/o getting "OOMKilled": https://gitlab.alpinelinux.org/dne/aports/-/jobs/2257614 2026-03-13 10:33:58 (and on 32-bit x86 the threading test never completes…) 2026-03-14 09:01:18 i'm confused about build-edge-riscv64 uploads. go-1.26.1-r1 was built there on march 10th according to the logs timestamp but not available on mirrors or visible on pkgs.alpinelinux.org 2026-03-14 12:05:23 lotheac: it only uploads once the entire repository is finished, which has not happened yet 2026-03-14 12:12:18 ah i see 2026-03-14 12:12:31 so build failures can block uploads of other stuff? 2026-03-14 12:13:14 yes 2026-03-14 12:13:48 thanks for the explanation 2026-03-15 05:44:31 ikke: ERROR: Job failed (system failure): error dialing backend: dial tcp 192.168.40.103:10250: i/o timeout https://gitlab.alpinelinux.org/alpine/aports/-/jobs/2260358 2026-03-15 05:45:00 10250 is the kubelet port (used for eg. getting logs or starting exec's on pods) 2026-03-15 05:50:23 lotheac: the node itself appears to be healthy 2026-03-15 05:50:44 can you connect to 10250 on it from another node? 2026-03-15 05:51:55 192.168.40.103 (192.168.40.103:10250) open 2026-03-15 05:52:28 curl -k https://192.168.40.103:10250 2026-03-15 05:52:30 404 page not found 2026-03-15 05:52:34 So, yes 2026-03-15 05:53:16 well that's weird 2026-03-15 05:54:42 clearly the gitlab executor is able to connect to the api server (otherwise it would not be able to get statuses of pods or create them), but then it fails execing into the pod (which would be a tcp connection initiated from the api server host to the target kubelet) 2026-03-15 05:55:44 so... does opening a tcp connection to that address work from all control plane nodes?
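The reachability question above (can a given host open a TCP connection to the kubelet port?) can be checked with a few lines of standard-library Python. A minimal sketch, assuming nothing beyond plain TCP: `can_connect` is a hypothetical helper, and a successful connect only proves the port is reachable from wherever the script runs, not that the kubelet API is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False
```

Running something like `can_connect("192.168.40.103", 10250)` on each control plane node, rather than on a neighbouring worker, would answer the question as asked, since the exec and logs connections originate from the API server side.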
2026-03-15 05:55:59 Not to that address 2026-03-15 05:57:12 That's a private vlan only reachable from the nodes within that network 2026-03-15 05:57:28 if that's the case then "kubelet logs pod/whatever" should also fail (for a pod running on that host) 2026-03-15 05:57:58 and it would mean that the host's internal node ip is wrong if that address is not connectable from the rest of the cluster 2026-03-15 05:59:07 Interestingly enough it does work on the other nodes, which use the same network as internalIP 2026-03-15 05:59:39 really? sounds like it should not :D 2026-03-15 06:00:47 https://tpaste.us/wa0X 2026-03-15 06:02:03 so .102 and .104 tcp/10250 *are* connectable from cplane nodes? or? 2026-03-15 06:03:08 No, it would not 2026-03-15 06:03:21 I mean, the nodes are able to accept jobs 2026-03-15 06:03:23 and they pass 2026-03-15 06:03:26 ah, i see 2026-03-15 06:03:51 that doesn't make much sense either though... afaiu the gitlab kube executor always uses exec 2026-03-15 06:04:07 and connectivity to internalIP:10250 is required for exec as well as kubectl logs 2026-03-15 06:05:06 I should've probably explicitly specified what interface to use 2026-03-15 06:05:17 But strange that it works 2026-03-15 06:05:48 yeah, i can't explain why it would work :D 2026-03-15 06:06:24 I have a wg tunnel to each of the nodes 2026-03-15 06:06:53 from? 2026-03-15 06:07:10 dmvpn <-> wg concentrator <-> nodes 2026-03-15 06:07:44 routed using those same addresses? 
2026-03-15 06:07:49 ah i gotta go afk for a bit anyway 2026-03-15 06:07:56 not the internal ip obviously 2026-03-15 06:08:04 internal lan subnet 2026-03-15 06:08:32 lotheac: alright, thanks 2026-03-15 06:25:12 anyway i think the correct thing to do is rejoin those nodes to the cluster with internalIPs that are reachable from the control plane (or even all other nodes) 2026-03-15 06:27:51 maybe it's not a requirement that we understand why .102 and .104 are working without that :D 2026-03-15 06:34:25 i guess it's this option https://github.com/k0sproject/k0sctl?tab=readme-ov-file#spechostsprivateinterface-string-optional-default- 2026-03-15 06:36:10 yes 2026-03-15 06:36:17 I've already used it before 2026-03-15 06:36:58 Let me start with node3 2026-03-15 06:40:37 you could kubectl drain them before you make changes if you want to minimize disruption i guess, assuming that the gitlab executor pods have appropriate disruption budgets that prevent running jobs from being evicted 2026-03-15 06:47:08 I already cordonned the node 2026-03-15 06:47:27 alright 2026-03-15 06:48:29 usa11-ci-3.alpinelinux.org Ready,SchedulingDisabled 25d v1.33.3+k0s 172.16.252.37 2026-03-15 06:49:05 Hmm, apparently it can update it 2026-03-15 06:49:31 ah, cool, i had assumed it would be immutable for the node obj 2026-03-15 06:49:46 not that it's much of a difference to have to delete and rejoin it anyways though 2026-03-15 06:52:37 It does reinstall k0s as it calls it, but the running pods are not interrupted 2026-03-15 06:52:47 (the daemonset pods were and are still running on the node) 2026-03-15 06:53:05 yeah, those are handled by the container runtime 2026-03-15 06:53:16 right 2026-03-15 06:53:30 But it did not do a complete reset or anything like that 2026-03-15 06:53:50 right, it might have recreated the local containers after rejoining 2026-03-15 06:55:36 tarted: Thu, 12 Mar 2026 00:06:15 +0100 2026-03-15 06:55:48 Container has kept running 2026-03-15 06:56:16 i believe you, i meant to 
say "it might have ... [but clearly didn't]" 2026-03-15 06:56:54 thanks for handling it 2026-03-15 07:16:00 I've done 2 nodes, the last one is building a long-running job that I don't want to interrupt 2026-03-15 07:16:38 you could leave it cordoned and do it later 2026-03-15 07:20:40 yes 2026-03-15 07:20:49 That's exactly what I did :) 2026-03-15 07:21:34 _b 2026-03-15 07:29:52 I have a node where calico is not working correctly after reboot. ip -d link show dev vxlan.calico shows that it uses the wrong external interface 2026-03-15 07:32:13 i don't use calico on my clusters, but i image it would be configurable in some fashion to help it pick the right interfaces. in cilium that can be done with a crd CiliumNodeConfig 2026-03-15 07:33:35 maybe the Node object https://docs.tigera.io/calico/latest/networking/ipam/ip-autodetection#autodetecting-node-ip-address-and-subnet 2026-03-15 08:00:51 Trying ipAutodetectionMethod: "kubernetes-internal-ip" in the calico config 2026-03-15 11:47:21 is CI having DNS isssues? 2026-03-15 12:25:38 why? did you get an error about resolving something? 2026-03-15 12:27:21 I may have broken something 😔 2026-03-15 12:54:32 fatal: unable to access 'https://gitlab.alpinelinux.org/alpine/aports.git/': Could not resolve host: gitlab.alpinelinux.org (Timeout while contacting DNS servers) 2026-03-15 13:24:45 during generate-build-jobs 2026-03-15 13:42:06 gitlab also feels extra sluggish, but it could be my connection 2026-03-15 17:06:46 omni - i'm getting the same 2026-03-15 19:49:06 I think it should be fixed now 2026-03-15 19:49:10 Let me know if you still face issues 2026-03-15 19:58:23 lotheac: I had to make sure every node consistently used the internal ip address, which I did now 2026-03-15 20:16:09 I see the latest go package for riscv64 is now also available 2026-03-15 20:19:00 in lint runner: ERROR: Preparation failed: starting pod watcher: not synced: *v1.Pod 2026-03-15 20:19:25 should the pipeline be restarted? 
only retried just lint 2026-03-15 20:19:36 the rest of the pipeline was fine 2026-03-15 20:19:59 https://gitlab.alpinelinux.org/alpine/aports/-/jobs/2260904 2026-03-15 20:25:57 retried the job, and it works. Not sure what the problem is/was 2026-03-15 20:26:26 okay, thanks! 2026-03-16 00:43:31 ikke: thanks 2026-03-16 11:10:44 ikke: reminder for https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/96922 2026-03-16 20:22:58 Sertonix[m]: Do we think we can keep maintaining it even though upstream is dropping 32-bit support? 2026-03-16 20:25:53 The only thing dropped by upstream was optimizations on x86 for openjdk25. Before that I don't think there was anything dropped by upstream, they just never tested musl libc. 2026-03-16 20:26:24 *never test musl libc on 32-bit systems 2026-03-17 09:11:57 is build-edge-x86_64 stuck? 2026-03-17 09:26:31 No 2026-03-17 09:45:16 oh, ok, it just hasn't uploaded anything the past few hours, perhaps the duckdb build (with failure) takes a lot of time 2026-03-17 09:49:48 It's building rust atm 2026-03-17 09:50:03 And build.a.o shows that 2026-03-17 09:56:33 ikke: i need to kick bld2? 2026-03-17 09:57:20 Yes please 2026-03-17 10:19:19 ikke: it was building duckdb when I asked 2026-03-17 10:19:42 Understood, it just started building rust when I looked 2026-03-17 10:20:05 yeah, and it then failed with OOM =( 2026-03-17 10:20:29 probably because I thought I could go back to default number of jobs, as it worked in CI 2026-03-17 10:20:31 Something's not healthy 2026-03-17 10:21:28 it's building rustc_driver in stage2 that it seems to have the same issue aarch64 had, before I switched to the default thin-local lto for it 2026-03-17 10:21:43 Is it stack exhaustion? 2026-03-17 10:23:14 how much memory per core (or thread) is available on the aarch64 and x86_64 builders? 2026-03-17 10:24:44 Roughly 3G and 5G respectively 2026-03-17 10:42:14 how is it on loongarch64, ppc64le and s390x?
where we seem to not have the issue 2026-03-17 10:44:47 On ppc64le it's 2G per core 2026-03-17 10:45:17 Similar on s390x 2026-03-17 10:45:45 Loongarch is 8GB 2026-03-17 10:45:54 But I have not seen these machines run low on memory 2026-03-17 10:48:37 no.. =/ 2026-03-17 10:49:24 anyway, !99162 if it continues to fail 2026-03-17 10:50:27 thin-local is the default lto but I think we want to use thin if we can 2026-03-17 10:50:38 https://nnethercote.github.io/perf-book/build-configuration.html#link-time-optimization 2026-03-17 10:53:17 the alternative seems to be to lower the number of build jobs 2026-03-17 10:55:29 But why is it not an issue for s390x and ppc64le, which have the least amount of memory available per job 2026-03-17 11:02:51 probably related to the uefi target, only enabled on aarch64 & x86_64, the comment for lowering number of jobs said as much 2026-03-17 11:03:24 but I went with thin-local lto for aarch64 instead of lowering the number of jobs even more 2026-03-17 11:04:02 I haven't tried to build arm* with thin lto, perhaps that's no longer an issue 2026-03-17 11:45:12 clandmeter: thanks 2026-03-17 21:48:05 Is something broken with the ipv6 config of irclogs.alpinelinux.org? 'curl -6 https://irclogs.alpinelinux.org' never returns 2026-03-17 21:52:43 *timeouts 2026-03-18 08:00:30 Sertonix[m]: I'll check 2026-03-18 11:34:10 Sertonix[m]: it points to the wrong address, I'll fix it 2026-03-18 11:55:49 Sertonix[m]: https://gitlab.alpinelinux.org/alpine/infra/linode-tf/-/merge_requests/117 2026-03-18 12:29:20 Sertonix[m]: should be fixed now (once everything has propagated) 2026-03-18 12:30:21 Yes, thanks! 2026-03-18 16:17:39 load is high 2026-03-18 19:47:34 Sertonix[m]: trying to bootstrap openjdk11 on x86. The abuild-setup-cross script fails because CBUILDROOT is not set. It would require $CHOST != $CTARGET, but when function.sh is sourced, that's not the case 2026-03-18 19:48:04 It's apparently overwritten 2026-03-18 19:49:16 Am I missing something?
2026-03-18 19:49:36 I'm running ash `abuild-setup-cross x86 ./aports` like mentioned in the MR 2026-03-18 19:51:32 ah, my bad 2026-03-18 19:52:41 I didn't run the commands exactly like written in the MR so mistakes are possible 2026-03-18 19:53:21 I missed one crucial detail :P 2026-03-18 20:06:54 Ok, build finished 2026-03-18 20:08:08 \o/ 2026-03-18 20:10:59 I guess the next step is to use https://dev.alpinelinux.org/~kevin/openjdk11-bootstrap/community/x86/ to bootstrap it on the builder? 2026-03-18 20:14:01 Would it make sense to merge !96922 and then use the built packages to bootstrap it? 2026-03-18 20:15:28 Sounds reasonable. I don't know what is possible on the builders 2026-03-18 20:15:42 Similar to how we bootstrap go and others 2026-03-18 20:17:26 abuild-apk add -X https://dev.alpinelinux.org/~kevin/openjdk11-bootstrap/community openjdk11-bootstrap 2026-03-18 20:17:46 That installs the -bootstrap package, then openjdk11 itself should be able to be built 2026-03-18 20:18:32 Sertonix[m]: it's building now :) 2026-03-18 20:22:59 Sertonix[m]: and finished 2026-03-18 20:42:57 Thanks! 2026-03-18 21:08:08 hi friends, I was wondering about the Alpine packaging infrastructure, are the packages built via cross compiling for different archs? 2026-03-18 21:35:08 no, native hosts 2026-03-18 21:43:41 ah, interesting 2026-03-18 21:43:57 For 32-bit, we use 64-bit hosts in 32-bit mode 2026-03-18 21:45:35 I'm new to package maintaining and was looking at submitting one that's not currently available, is the testing on different arches done post-submission in the testing repo? 2026-03-18 21:45:48 I should check it's not in testing actually 2026-03-18 21:46:26 When you create a merge request, it's already built in CI on all arches 2026-03-19 08:34:37 lotheac: I finally rebooted the bananapi runner 2026-03-19 13:28:41 thanks! i will test my theory and commit if it works 2026-03-19 13:31:14 ..
tomorrow, after gitlab is no longer down 2026-03-19 13:32:04 It should be back in a sec 2026-03-19 13:32:34 you're the best ikke <3 2026-03-19 13:32:46 It's a planned upgrade :) 2026-03-19 13:33:01 doesn't make you any less than the best 2026-03-19 13:33:59 :_ 2026-03-19 13:34:01 :) 2026-03-19 13:34:03 It's booting again 2026-03-19 13:35:20 it's alive! 2026-03-19 13:35:21 thank you 2026-03-19 13:53:44 i guess it will have to be tomorrow though, because i'm not sober enough for this: 2026-03-19 13:53:45 cmd/dist 2026-03-19 13:53:45 Building Go cmd/dist using /usr/lib/go. (go1.26.1 linux/riscv64) 2026-03-19 13:53:45 go tool dist: unknown $GOARCH unsupported 2026-03-19 13:53:45 >>> ERROR: go: build failed 2026-03-19 13:55:09 (on nor-ci-1, manual build in manual container) 2026-03-19 13:56:10 well, problems for future me. thanks ncopa :) 2026-03-19 16:01:23 can i give alpine matrix rooms alpine logo 2026-03-19 16:01:51 i sometimes lose them between all other IRC rooms :> 2026-03-19 16:22:26 Funny, just as I upgrade gitlab today, they release a new version 2026-03-19 16:22:32 pj: I think so 2026-03-19 19:13:14 ikke: is the build.sh used in CI in the CI image? 2026-03-19 19:13:22 yes 2026-03-19 19:13:41 https://gitlab.alpinelinux.org/alpine/infra/docker/alpine-gitlab-ci/-/blob/master/overlay/usr/local/bin/build.sh 2026-03-19 20:22:44 anarcat shared how to fix the "work items" situation in the new gitlab. 
https://gitlab.com/gitlab-org/gitlab/-/work_items/590689#note_3175494866 2026-03-19 20:22:58 that change is going over like a lead balloon 2026-03-19 21:20:05 It mentions gitlab-ctl reconfigure, but we're not using the omnibus installation 2026-03-26 16:22:23 lotheac: I'll discuss with carlo if I can add you as a developer to the infra group, then you no longer need to use forks 2026-03-26 16:22:30 cheers 2026-03-26 16:22:49 We don't have any instance runners for building docker images 2026-03-26 16:22:58 (on purpose) 2026-03-26 16:23:06 that makes sense 2026-03-26 16:23:11 so I need to explicitly assign runners to the fork 2026-03-26 16:23:19 Which I did 2026-03-26 16:23:48 But by using $CI_REGISTRY_IMAGE, it would require the image to build the images to be present in the fork 2026-03-26 16:24:30 https://gitlab.alpinelinux.org/alpine/infra/docker/exec/-/jobs/2276008 2026-03-26 16:24:47 right. i guess it could have been bootstrapped in the fork separately but that doesn't feel productive :p 2026-03-26 16:25:25 No 2026-03-26 16:25:47 Would be a lot of busy work :) 2026-03-26 16:26:42 https://gitlab.alpinelinux.org/alpine/infra/docker/exec/container_registry/59 2026-03-26 16:26:52 it's alive :) 2026-03-26 16:27:48 Thanks, I'll try to test it out when I have time 2026-03-26 16:27:56 np, thanks for merging 2026-03-26 16:28:25 i didn't really do any perf testing, it's possible it's slower than docker 2026-03-26 16:28:41 but the tradeoff is less privs 2026-03-26 16:28:46 yeah 2026-03-26 16:28:58 The only case where it matters somewhat is the gitlab image 2026-03-26 16:30:48 if it does become an issue it *is* possible to run docker in kube too. 
might be slightly annoying to set up though 2026-03-26 16:31:05 like this https://gitlab.alpinelinux.org/alpine/infra/docker/exec/-/issues/1#note_585276 2026-03-26 16:32:08 In general I just push it and wait for it to be finished, but if there is some urgent thing to fix (security or bug) it can be annoying if it takes too long 2026-03-28 13:00:43 could the build-edge-ppc64le and build-edge-s390x builds be restarted please? unfortunately the test is hanging 2026-03-28 13:01:57 done 2026-03-28 13:03:48 thanks! sorry, x86 as well? 2026-03-28 13:07:21 done 2026-03-28 13:07:21 thanks 2026-03-28 13:07:46 thank you, should be able to pass on retry 2026-03-29 21:01:36 ikke: ^ 2026-03-29 21:01:47 clandmeter: yes, I noticed, Thanks! 2026-03-30 14:50:40 can someone restart the build on build-edge-x86 please? 2026-03-30 15:11:01 thanks 2026-03-30 15:11:01 done 2026-03-30 15:11:07 thanks! 2026-03-31 07:29:31 is it an infrastructure issue that !98913 is failing on x86* ? 2026-03-31 07:52:47 Technically it's a project using excessive memory :P 2026-03-31 13:22:51 so, large build hosts for that too? 2026-03-31 13:59:55 clandmeter: thanks
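The recurring question in this log (2026-03-17 OOM on the rust build, 2026-03-31 "project using excessive memory") is how many parallel build jobs a builder can run for a given amount of memory per job. A rough sketch of that arithmetic, not anything the builders actually run: the function name `capped_jobs` and the per-job memory figure are hypothetical, only the "roughly 3G per core" figure for aarch64 comes from the discussion.

```python
def capped_jobs(cores: int, total_mem_gib: float, mem_per_job_gib: float) -> int:
    """Return a parallel job count limited by both CPU count and memory.

    mem_per_job_gib is an assumed peak memory need of one build job
    (e.g. one heavy link step); it is not a measured value.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if mem_per_job_gib <= 0:
        raise ValueError("mem_per_job_gib must be positive")
    # How many jobs fit into memory if each one hits its assumed peak.
    by_memory = int(total_mem_gib // mem_per_job_gib)
    # Never exceed the core count, and always allow at least one job.
    return max(1, min(cores, by_memory))


if __name__ == "__main__":
    # Hypothetical 64-core builder with roughly 3 GiB per core (192 GiB total)
    # and an assumed 6 GiB peak per job.
    print(capped_jobs(cores=64, total_mem_gib=192, mem_per_job_gib=6))
```

With these assumed numbers the job count is bounded by memory rather than by cores, which is the situation where lowering the number of build jobs (or picking a cheaper lto mode) helps.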