2023-03-01 09:57:50 just got an AL phishing email 2023-03-01 10:40:22 Targeted? 2023-03-01 11:18:41 if it's like the ones i've gotten, probably not 2023-03-01 11:22:53 arm devs lxc still not booted? 2023-03-01 11:24:10 mps-edge-aarch64 RUNNING 1 - 172.16.27.110 2023-03-01 11:24:12 mps-edge-armv7 RUNNING 1 - 172.16.27.111 2023-03-01 11:24:35 hmm, I can't ping nor connect 2023-03-01 11:24:41 see ^^ 2023-03-01 11:25:38 dmvpn is disconnected 2023-03-01 11:25:47 ah 2023-03-01 11:26:37 ipv4 latency is sky high 2023-03-01 11:28:14 where is the problem? infra in general or ungleich? 2023-03-01 11:28:59 The router these servers are connected to routes everything via a wg tunnel 2023-03-01 11:29:19 The performance seems to be wanting 2023-03-01 11:31:00 hm, strange that even one ping can't pass 2023-03-01 11:34:21 mps: can you try now? 2023-03-01 11:34:42 it works now 2023-03-01 11:34:50 dmvpn is up 2023-03-01 11:35:18 ikke: thanks 2023-03-01 11:46:24 Changed the triggers a bit to make sure only vigir23 alerts if there are network issues that affect all 2023-03-01 11:53:37 psykose: psykose-edge-armhf.che-bld-1.alpin.pw 2023-03-01 11:54:23 thanks 2023-03-01 11:54:34 i forgot to set up wg on my laptop 2023-03-01 11:54:35 gr 2023-03-01 11:56:53 If we need to add a new pubkey, let us know 2023-03-01 14:35:14 ikke: if you have the time, could you add 647+s8Hp2CDsoW30kXiemudXtmXp6v9cG02ozsYLu20= and walk me thru the setup again 2023-03-01 14:35:17 kinda forgot every step 2023-03-01 14:35:35 The infra repo wiki has the instructions 2023-03-01 14:37:41 sure 2023-03-01 14:37:43 gimme ip 2023-03-01 15:43:35 arm lxcs are now reachable but network is terribly slow 2023-03-01 16:17:35 mps: the issue appears to be that the bandwidth is limited, so anytime something uses a lot of bandwidth, latency / packetloss increase 2023-03-01 16:19:11 could someone add some priorities by type of traffic, i.e. fast for interactive (ssh) and lower for bulk (http) 2023-03-01 16:22:27 Don't see any QoS options 2023-03-01 16:23:23 if you have access to the router (or host) you can set QoS 2023-03-01 17:03:00 mps: need to figure out how to do it on openwrt 2023-03-01 17:03:10 without conflicting with the config system 2023-03-01 17:03:16 (luci) 2023-03-01 17:17:03 ikke: ah, never worked with luci on openwrt, first thing I do when installing openwrt is remove luci 2023-03-01 17:19:27 thanks to alpine I don't need openwrt on most of my devices, only have one wifi AP with a mips SoC so alpine doesn't work on it 2023-03-01 17:25:13 https://lartc.org/lartc.txt has an example of prioritizing interactive traffic 2023-03-01 17:41:02 psykose: I assume you still need help with vpn? 2023-03-01 17:41:17 you have to make an ip for me 2023-03-01 17:41:25 yes 2023-03-01 17:41:32 Was just about to give you one 2023-03-01 17:42:26 psykose: 172.16.252.17/24 2023-03-01 17:43:20 oh wait 2023-03-01 17:43:26 clandmeter already took that one 2023-03-01 17:43:49 /24 sounds big :p 2023-03-01 17:44:01 how so?
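A sketch of the client-side wg0.conf this exchange is building toward (the 172.16.252.x addressing and the 172.16.6.3 resolver come from the discussion; keys, endpoint and port are placeholders, and the /32-versus-/24 point is settled in the lines that follow):

    [Interface]
    # the peer's own address; the prefix length here is not a grant of the whole subnet
    Address = 172.16.252.19/32
    PrivateKey = <client private key>
    # wg-quick can also push the resolver, as an alternative to a local dnsmasq rule
    DNS = 172.16.6.3

    [Peer]
    PublicKey = <hub public key>
    Endpoint = <hub address>:51820
    # an ACL for tunnel traffic; wg-quick additionally installs a route for this range
    AllowedIPs = 172.16.0.0/16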
2023-03-01 17:44:04 19 is free as I see 2023-03-01 17:44:08 mps: yes 2023-03-01 17:44:50 172.16.252.19/24 2023-03-01 17:45:08 psykose: /24 just specifies the subnet length, it doesn't give you the entire /24 2023-03-01 17:45:25 But you are right, these should be /32 2023-03-01 17:45:34 ah right 2023-03-01 17:45:34 thanks 2023-03-01 17:45:40 address should be /32 and allowed network /24 2023-03-01 17:46:10 actually allowed network could be /16 2023-03-01 17:47:07 hm 2023-03-01 17:47:16 ikke: if you didn't notice, I added one ipv6 address for myself to test how the hub works with ipv6 2023-03-01 17:47:31 mps: I saw you mentioning it 2023-03-01 17:48:45 did you add 647+s8Hp2CDsoW30kXiemudXtmXp6v9cG02ozsYLu20= as an allowed key? 2023-03-01 17:49:43 Not yet, I did not know your public key 2023-03-01 17:51:03 Now I have 2023-03-01 17:51:27 And I see it's online? 2023-03-01 17:52:06 ikke: would be nice to add a blank line above [Peer], should I or you 2023-03-01 17:52:38 added it, normally it's already there 2023-03-01 17:52:55 yes, I see you are in a hurry 2023-03-01 17:53:20 I mean, I use add-wg-user.sh >>/etc/wireguard/wg0.conf 2023-03-01 17:53:34 ah 2023-03-01 17:53:38 That script adds the whitespace below it, so the next one is nicely separated 2023-03-01 17:54:11 ping works 2023-03-01 17:54:18 hmm, yes, I forgot about this script ;-) 2023-03-01 17:54:55 thanks 2023-03-01 17:55:00 which ip was the dns again 2023-03-01 17:55:09 172.16.6.3 2023-03-01 17:55:17 (somehow I felt the question coming) 2023-03-01 17:55:39 hm 2023-03-01 17:55:42 why can't i ping that 2023-03-01 17:55:51 Did you route 172.16.0.0/16 to wg? 2023-03-01 17:56:02 good point 2023-03-01 17:56:24 yes, with AllowedIPs 2023-03-01 17:56:29 unless there was more 2023-03-01 17:56:33 there was more yeah 2023-03-01 17:56:50 AllowedIPs is just an ACL 2023-03-01 17:56:56 It does not add routes 2023-03-01 17:57:07 mps: what was it that you used again? 2023-03-01 17:57:29 ikke: don't understand question 2023-03-01 17:57:29 (I do ip route add 172.16.0.0/16 via 172.16.252.1) 2023-03-01 17:57:43 mps: to route dmvpn traffic via wg 2023-03-01 17:58:06 works 2023-03-01 17:58:20 ah, wg-quick adds routes from AllowedIPs 2023-03-01 17:58:20 mps had another trick 2023-03-01 17:59:07 'AllowedIPs = 172.16.0.0/16' is enough 2023-03-01 17:59:40 and you get '172.16.0.0/16 dev qwg0 scope link' in the routing table 2023-03-01 17:59:57 ah, that's what I meant, yes 2023-03-01 18:00:16 yeah, that would work 2023-03-01 18:00:21 just make it the entire subnet 2023-03-01 18:00:28 also, with wg-quick the DNS param works 2023-03-01 18:00:40 I do dns via dnsmasq 2023-03-01 18:01:09 server=/alpin.pw/172.16.6.3/ 2023-03-01 18:01:22 yes, dnsmasq is better for a fine-grained setup. but for simple cases the DNS param in the conf is enough 2023-03-01 18:10:56 ikke: ok, everything works, except that i cannot dns resolve anything on che-bld-1 2023-03-01 18:11:07 so none of the armhf/armv7/aarch64 containers are connectable 2023-03-01 18:11:26 i.e.
dig @172.16.6.3 psykose-edge-armv7.che-bld-1.alpin.pw => fail 2023-03-01 18:13:22 psykose: dig works for me 2023-03-01 18:13:34 yeah, so weird that i can't reach it 2023-03-01 18:13:40 the other containers work 2023-03-01 18:13:45 and resolve that 2023-03-01 18:14:05 on nld-dev1 2023-03-01 18:14:18 same for the deu-* stuff 2023-03-01 18:14:50 psykose: probably your new record is not yet added to DNS server 2023-03-01 18:14:55 ah 2023-03-01 18:14:58 well 2023-03-01 18:15:06 yeah, they worked with old pre-move name 2023-03-01 18:15:09 guess that's true 2023-03-01 18:15:15 you can ping ip address I guess 2023-03-01 18:43:05 psykose: does it work if you target 172.16.27.1 directly for dns? 2023-03-01 18:43:43 yes 2023-03-01 18:44:26 and does it work now via 172.16.6.3? 2023-03-01 18:44:53 nope 2023-03-01 18:44:55 hmm 2023-03-01 18:46:29 oh, still need to add che-bld-1 ns 2023-03-01 18:48:33 can you try now? 2023-03-01 19:02:19 nope 2023-03-01 19:02:23 still only works on .27.1 2023-03-01 19:20:13 https://tpaste.us/6MVW 2023-03-01 19:20:15 caching? 2023-03-01 19:45:41 now works 2023-03-01 19:45:45 so caching 2023-03-01 19:45:52 caching in .6.3 yeah 2023-03-01 19:46:05 I mean, I tested on 6.3 and from my local system 2023-03-01 19:46:34 as did i 2023-03-01 19:46:38 also asks me for a password 2023-03-01 19:47:31 Invalid user psykose 2023-03-01 19:47:44 I did not add a user, just root 2023-03-01 19:47:45 ah 2023-03-01 19:48:01 root also asks 2023-03-01 19:48:39 you use this key? https://gitlab.alpinelinux.org/psykose.keys 2023-03-01 19:53:29 yea 2023-03-01 21:17:09 ikke: linode announce 20% price increase 2023-03-01 21:17:50 yes, I heard 2023-03-01 21:18:00 specifically dedicated cpu 2023-03-01 21:29:10 clandmeter: if it would apply to all our lines, we would go from 465 per month to 558 2023-03-01 21:29:16 linodes 2023-03-01 21:31:53 Ok then let’s look at it when back 2023-03-02 17:35:22 people weren't kiddin bout this arm networking 2023-03-02 17:35:23 hot damn 2023-03-02 17:35:43 my mobile networking is faster :D 2023-03-02 17:46:00 ah right 2023-03-02 17:46:04 no ptrace in containers 2023-03-02 17:57:21 psykose: the issue is the wg connection on an openwrt router in front of it 2023-03-02 17:57:45 We seem to get at most ~80mbit out of it, and when it's full, packets drop / latency increases 2023-03-02 17:57:53 mhm 2023-03-02 17:58:06 according to telmich, the wg connection should be able to handle more 2023-03-02 17:59:47 load is low on the box, but connecting very sluggish 2023-03-02 17:59:54 (box == router) 2023-03-02 18:00:26 mhm 2023-03-02 18:15:58 rust doesn't even build itself on armhf :D 2023-03-02 18:22:32 heh 2023-03-02 18:22:41 not the first time something like that happens 2023-03-02 18:24:02 going to be pretty impossible to debug it with no ptrace 2023-03-02 18:24:40 I think I can give you ptrace 2023-03-02 18:25:38 need to restart your container 2023-03-02 18:26:00 can I do that? 2023-03-02 18:33:32 maybe network is misconfigured on this openwrt router 2023-03-02 18:40:35 mps: No clue how to find out 2023-03-02 18:44:29 ikke: do you have root access on it 2023-03-02 18:45:12 yes 2023-03-02 18:45:25 first thing is to look queuing 'discipline' 2023-03-02 18:45:43 ethtool? 
2023-03-02 18:45:59 wg0: mtu 1420 qdisc noqueue state UNKNOWN qlen 1000 2023-03-02 18:46:11 no, ip a show $interface 2023-03-02 18:46:18 ^ 2023-03-02 18:46:24 so, noqueue 2023-03-02 18:47:46 and 'tc qdisc show' 2023-03-02 18:47:57 tc is not installed 2023-03-02 18:48:03 ah 2023-03-02 18:48:04 sure 2023-03-02 18:48:52 done 2023-03-02 18:49:48 tc qdisc show dev wg0 2023-03-02 18:50:24 installing tc-tiny 2023-03-02 18:50:47 qdisc noqueue 0: root refcnt 2 2023-03-02 18:52:04 could you try the same command on the underlying interface (eth or something) 2023-03-02 18:53:30 wwan0: qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1500 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64 2023-03-02 18:53:33 hm, how is tc named in opkg search 2023-03-02 18:53:52 tc-tiny or tc-full 2023-03-02 18:55:25 ahm, the above (fq_codel) is the default for openwrt 2023-03-02 18:59:32 you could check if it has any classification with 'tc class show dev wg0' 2023-03-02 19:00:05 returns empty 2023-03-02 19:00:10 same for wg0 2023-03-02 19:00:13 wwan0 2023-03-02 19:00:26 and 'tc filter show dev wg0' for filters 2023-03-02 19:00:47 aha, that means no special QoS is set 2023-03-02 19:01:55 But QoS does not improve performance, only alleviates symptoms 2023-03-02 19:02:11 right 2023-03-02 19:02:43 but with it is possible to give some traffic priority 2023-03-02 19:23:22 more gitlab updates 2023-03-02 19:24:29 Haven't received an e-mail yet 2023-03-02 19:28:27 https://about.gitlab.com/releases/2023/03/02/security-release-gitlab-15-9-2-released/ 2023-03-02 21:11:47 ikke: did you restart it for ptrace 2023-03-02 21:11:57 yes 2023-03-02 21:13:07 still says not permitted 2023-03-02 21:13:33 can I restart it again? 2023-03-02 21:13:35 sure 2023-03-02 21:14:56 Can you try now? 2023-03-02 21:17:29 still not permitted 2023-03-02 21:17:31 just gdb -p 2023-03-02 21:17:50 ah 2023-03-02 21:17:51 works as root 2023-03-02 21:17:54 sure 2023-03-02 21:18:09 weird that all the processes are the same user but the same user can't, haha 2023-03-02 21:18:10 sure, works 2023-03-02 21:18:11 thanks 2023-03-02 21:19:00 ok 2023-03-02 21:19:11 this container doesn't work, but a qemu-armhf does work for building 2023-03-02 21:19:24 so, our builders and these containers in lxc are broken, but actual qemu is not 2023-03-02 21:19:37 so the compiler is "fine" and itself works i guess 2023-03-02 21:19:44 the errors are in the interaction here 2023-03-02 21:19:56 do you know if anything changed in the lxc hosting at all between the old and new? 2023-03-02 21:20:05 the old ci was broken too, but builders worked, now both don't 2023-03-02 21:20:11 so i assume whatever tiny thing changed is the difference 2023-03-02 21:20:17 would be a nice clue 2023-03-02 21:26:01 only difference I can imagine is that the old lxc containers ran on alpine 3.16 2023-03-02 21:26:03 3.15 2023-03-02 21:26:36 and corresponding kernel 2023-03-02 21:27:04 hrm 2023-03-02 21:27:07 yeah, that is a clue 2023-03-02 21:27:46 but, before, ci was still broken with whatever kernel 2023-03-02 21:27:51 now they both are after new stuff 2023-03-02 21:28:32 sadly gdb attach is not that useful 2023-03-02 21:28:35 unlucky 2023-03-02 21:42:22 armhf ci is 3.16 2023-03-02 21:44:04 the actual kernel versions do not differ that much 2023-03-02 21:44:20 5.15.85-0-lts on che-bld-1 2023-03-02 21:49:04 i downgraded to 3.15 and the compiler works 2023-03-02 21:49:33 so it _is_ something in the alpine version?
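Back on the QoS thread — a minimal sketch of the LARTC-style prioritization mps linked on 2023-03-01, assuming egress on wg0; the match rules are purely illustrative, and as noted above this only reorders queued packets rather than adding bandwidth:

    # three-band priority queue on the tunnel interface
    tc qdisc add dev wg0 root handle 1: prio
    # interactive ssh traffic into the highest band
    tc filter add dev wg0 parent 1: protocol ip prio 1 u32 match ip dport 22 0xffff flowid 1:1
    # bulk http into the lowest band
    tc filter add dev wg0 parent 1: protocol ip prio 2 u32 match ip dport 80 0xffff flowid 1:3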
2023-03-02 21:50:23 if 3.16 fails i have a good hunch 2023-03-02 21:50:37 that being some musl difference in 1.2.2->1.2.3 2023-03-02 21:50:43 aside from the compiler itself being broken of course 2023-03-02 21:51:02 but why would the musl version of the host matter? 2023-03-02 21:51:27 you mean the container? i can't change anything on the host itself 2023-03-02 21:51:57 You say rust built fine on usa9 lxc armhf edge 2023-03-02 21:52:06 but now on che-bld-1 lxc armhf edge, it hangs 2023-03-02 21:52:16 yea 2023-03-02 21:52:24 ah, i see what you mean 2023-03-02 21:52:54 i'd love to give faster responses to all this but every time i want to try something i have to ghosttype things into ssh and wait 10 minutes 2023-03-02 21:53:03 my 2010 internet experience was faster :p 2023-03-02 21:53:05 yeah, it's annoying 2023-03-02 21:53:22 psykose: push less things :P 2023-03-02 21:53:37 :) 2023-03-02 21:55:44 nope, 3.16 works too 2023-03-02 22:00:47 haha at the siskin thing 2023-03-02 22:01:57 Was just looking at it 2023-03-02 22:02:19 same thing as before but even worse, impressive 2023-03-02 22:03:28 3.17 also works 2023-03-02 22:03:51 i guess it really is just something broken in newest rust and or llvm 2023-03-02 22:06:55 psykose: what's wrong with siskin, the approach? 2023-03-02 22:07:24 it's the same curl bin > /usr/bin approach as before 2023-03-02 22:09:57 it is the same llvm in both and the same musl stuff 2023-03-02 22:10:10 i guess it really is just rust 1.64->1.67.1 that breaks something 2023-03-02 22:10:15 weird that it's only in lxc 2023-03-02 22:18:41 ok, hangs on just `rustc main.rs` for a hello world too 2023-03-02 22:19:31 psykose: where are they curling binaries? All I can see is that it's building things from source 2023-03-02 22:21:34 ah my mistake 2023-03-02 22:21:47 it does build something with the precompiled bootstrap tarball 2023-03-02 22:22:44 That tarball contains c, reb, r3, h files 2023-03-02 22:22:53 aport from some files they use for testing 2023-03-02 22:22:58 apart* 2023-03-02 22:23:07 and that is an old 3.7.2 source-only version indeed 2023-03-02 22:24:02 So I suppose this could act as stage0 2023-03-02 22:24:28 But this works if they want to keep it like this 2023-03-02 22:24:41 if it was only the 3.7.2 rebol-stage0 sure 2023-03-02 22:26:00 Right, this is a make-like tool 2023-03-02 22:33:27 urgh 2023-03-02 22:33:37 really have no guesses as to why it works in qemu but not in the lxc thing 2023-03-02 22:36:02 qemu is emulating armhf directly? (not armv8l?) 2023-03-02 22:36:35 `qemu-arm` and armhf alpine chroot 2023-03-02 22:37:09 idk what cpu that is or isn't 2023-03-02 22:37:13 what does uname -m return? 2023-03-02 22:37:24 armv7l 2023-03-02 22:37:34 that could be a difference? 2023-03-02 22:37:43 armv8l vs armv7l 2023-03-02 22:37:50 well 2023-03-02 22:37:52 yes and no 2023-03-02 22:37:59 one is qemu and a 32-bit host 2023-03-02 22:38:19 the containers are 32-bit userspace via linux32 on a 64-bit host so they say armv8l 2023-03-02 22:38:29 yes 2023-03-02 22:38:30 it's a pretty big difference but it doesn't say much 2023-03-02 22:39:30 if i strace it stays in a futex forever 2023-03-02 22:40:03 so a deadlock?
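A sketch of how a hang like this can be pinned down once ptrace is permitted, using the same tools the thread already reaches for (rustc as the assumed stuck process; pgrep -n picks the newest match):

    # which syscall is it parked in? (the futex observed above)
    strace -f -p "$(pgrep -n rustc)"
    # dump every thread's stack in one shot, to see who waits on whom
    gdb -p "$(pgrep -n rustc)" -batch -ex 'thread apply all bt'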
2023-03-02 22:40:35 possibly 2023-03-02 22:50:21 https://img.ayaya.dev/d3AD4UKZuoBh 2023-03-02 23:02:59 if i run a containerd container with arm32v6/alpine:edge linux32 on my rpi4 2023-03-02 23:03:01 it works 2023-03-02 23:03:06 it is literally only our lxc that is broken 2023-03-02 23:03:25 same armhf userspace, same armv8l uname 2023-03-02 23:03:41 the host is edge and the kernel is 6.1.8 in my case 2023-03-02 23:03:56 is there anything you can play around with in the lxc setup or host stuff or nah 2023-03-02 23:04:00 kinda losing my mind 2023-03-02 23:06:29 could setarch set it 2023-03-02 23:06:43 setarch set what 2023-03-02 23:06:52 armhf 2023-03-02 23:07:07 setarch only masks the uname return output 2023-03-02 23:07:13 it doesn't really do anything here 2023-03-02 23:07:15 and no 2023-03-02 23:07:30 yes, but doesn't it help 2023-03-02 23:07:52 nothing here even reads the uname 2023-03-02 23:07:56 and it doesn't work like that 2023-03-02 23:07:58 I forgot how I set up the armhf lxc but I remember I did somehow 2023-03-02 23:08:09 all you can do is `setarch arm` and it then makes it return armv8l instead of aarch64 2023-03-02 23:08:21 it's the same thing as using `linux32`, which is just a symlink to setarch 2023-03-02 23:08:49 and again, it doesn't affect any semantics/code, just the uname output 2023-03-02 23:09:00 everything here already uses it to make it armv8l instead of aarch64 2023-03-02 23:09:28 sorry, I forgot then 2023-03-02 23:15:50 I have this in old lxc 'lxc.arch = armhf' 2023-03-02 23:16:08 and 'lxc.uts.name = armhf' 2023-03-02 23:16:29 but I haven't worked with this for a few years so I'm not sure it still works 2023-03-02 23:18:29 but I remember I used it for some armhf development, building the kernel and some other things 2023-03-02 23:22:44 that's probably already there but ikke would have to check 2023-03-03 15:42:31 seems like my ncopa-edge-riscv64 does not get a dhcp address? 2023-03-03 16:35:45 ncopa: dhcp is broken for rv64 for some reason 2023-03-03 16:36:05 clandmeter: managed to get it working, but at some point it stopped again 2023-03-03 16:59:40 is it dhcp client or server? 2023-03-03 16:59:46 that is broken 2023-03-03 17:23:42 It works for other containers, so I think the client 2023-03-03 18:55:06 ikke: could you post the lxc configs or whatnot for both the armhf/armv7 containers of the builders and mine 2023-03-03 18:55:09 maybe there is something sus 2023-03-03 18:55:21 i can't tell why it works literally everywhere else except in that one container 2023-03-03 18:55:24 well, those two 2023-03-03 18:56:18 If only I could work on that server 2023-03-03 18:57:14 well 2023-03-03 18:57:18 same, but i spent like 6 hours on it 2023-03-03 18:59:15 ouch 2023-03-03 19:08:36 psykose: this is build-edge-armhf: https://tpaste.us/MRb9 (alpine.common.conf: https://tpaste.us/EJOD) 2023-03-03 19:09:16 hmm 2023-03-03 19:10:12 i can't find what lxc.arch really does except some personality stuff 2023-03-03 19:10:24 usually armhf would be some v7 debianism thing, or completely meaningless like the personality 2023-03-03 19:10:27 could just make it `arm` 2023-03-03 19:10:50 in 30 minutes or so :P 2023-03-03 19:11:22 small attempt: make it `arm` and comment out every cap-drop line on my container 2023-03-03 19:11:28 then i'll see if by magic it changed anything 2023-03-03 19:11:32 in 30 minutes or so :) 2023-03-03 19:15:01 can i restart the container?
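For reference, what linux32/setarch does and doesn't do, per the exchange above — the personality only changes the reported machine string, never the cpu features or code semantics:

    uname -m              # aarch64 on the 64-bit host
    linux32 uname -m      # armv8l; linux32 is just a symlink to setarch
    setarch arm uname -m  # same effect, spelled out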
2023-03-03 19:16:26 I just did the armhf -> armf change first 2023-03-03 19:17:35 arm* 2023-03-03 20:19:09 yeah 2023-03-03 20:20:13 done 2023-03-03 20:21:45 Reading https://www.linux.com/training-tutorials/qos-linux-tc-and-filters/ 2023-03-03 20:29:18 anything I should backup from usa9? I've got /root /etc /var/lib/lxc/*/config 2023-03-03 20:29:40 The CI VMs have nothing special 2023-03-03 20:29:48 (ARM eagerly wants it back) 2023-03-03 20:31:15 just spent 10 minutes trying to connect because for some reason ssh fails trying to connect to it over ipv6 2023-03-03 20:31:22 even though looking up AAAA is just.. nxdomain 2023-03-03 20:31:23 huh 2023-03-03 20:31:47 I have no idea how that is etup 2023-03-03 20:31:48 setup 2023-03-03 20:59:47 I'm puzzled where these slowness comes from 2023-03-03 21:01:08 less than 20mbit traffic out on both servers combined 2023-03-03 21:59:22 One advantage is that you really learn to use vim the proper way 2023-03-03 21:59:41 as you have time to think and you cannot rely on spamming arrows lots of times 2023-03-03 22:08:12 haha 2023-03-03 22:08:28 yeah, really mentally planned out the chess moves :-) 2023-03-06 12:36:14 hey is kevin here? im available to help setup the second ppc64le 2023-03-06 12:40:28 ikke: ping ^^^ 2023-03-06 12:41:18 Hey mick_ibm, I'm currently at work, but help would be appreciated 2023-03-06 12:46:18 so i can open up the ticket at OSU OSL for the power box and CC you and anyone else. according to Stan the machine is "racked and stacked" :) 2023-03-06 12:46:35 so what OS would you like running on it? specific version of alpine? 2023-03-06 12:53:44 Alpine 3.17 preferably 2023-03-06 12:58:18 ok 2023-03-06 12:59:47 once i send the email out (which opens the request ticket) you could post a public ssh key in the email thread 2023-03-06 13:07:54 request has been sent - its 5am over there now. lemme know if theres anything else i can help you out with 2023-03-06 13:13:51 mick_ibm: thanks, will do 2023-03-06 13:18:41 one question, what is the status of netboot for ppc64le? im not seeing anything here: https://boot.alpinelinux.org/ however there looks like a nice list of mirrors that host ppc64le alpine images and packages. i help manage the OSU mirror (that was copied over from unicamp as is): http://ftp2.osuosl.org/pub/ppc64el/alpine-netboot/ 2023-03-06 13:19:02 Right now im thinking I should remove this directory on OSU ftp2 unless this would be helpful for you guys since you got a dev box there 2023-03-06 13:24:44 netboot could use some love. It needs to be modernized. 2023-03-06 13:59:10 does ppc64le support uefi? 2023-03-06 14:41:41 hmmm back when coreos was being ported to power, it was not... 
but im checking with kernel team now 2023-03-06 14:54:06 ppc64le does not support uefi 2023-03-06 18:37:51 psykose: you're fast 2023-03-06 18:37:58 what with 2023-03-06 18:38:02 spam 2023-03-06 18:38:11 7 new messages 2023-03-06 18:38:12 haha 2023-03-07 19:33:03 zabbix has been upgraded to 6.4.0 2023-03-07 19:33:06 (living on the edge) 2023-03-08 11:28:30 I'll be returning USA9 to ARM 2023-03-08 11:28:53 (old arm / aarch64 host) 2023-03-08 15:02:46 anyone have by chance somewhere archived rust 1.66 apks 2023-03-08 15:03:03 for aarch64 2023-03-08 15:49:48 i dont 2023-03-08 20:43:10 mps: fyi, I did try to implement QoS with htb, but it does not seem to help a lot 2023-03-08 20:43:32 I'm not sure even what the bottleneck is 2023-03-08 21:03:31 ikke: I can't debug it as you know 2023-03-08 21:03:45 Yes, understand 2023-03-08 21:57:47 mps: started to monitor the interface throughput in Zabbix on the router 2023-03-08 21:57:54 see if that provides any insight 2023-03-08 22:38:07 RIP usa9 2023-03-09 06:43:22 telmich: bandwidth at the moment seems to be at most 30mbps in, not sure what's going on 2023-03-09 06:43:56 https://imgur.com/a/qLwkJcc 2023-03-09 06:44:14 ping time increases to ~1s when I try to download a large file 2023-03-09 11:38:51 ikke: do I have access to the arm machines? i wonder if it would help to change the io scheduler or preemption 2023-03-09 11:40:14 ncopa: they are connected to an openwrt router (vigir23) 2023-03-09 11:40:22 That's the bottleneck 2023-03-09 11:41:36 ok 2023-03-09 11:42:05 are they connected to dmvpn? 2023-03-09 11:42:13 what subnets? 2023-03-09 11:42:34 che-bld-1 is 2023-03-09 11:42:42 (and the ci lives on che-bld-1) 2023-03-09 11:43:40 ncopa: the openwrt router is connected to the internet via a wireguard tunnel 2023-03-09 11:44:00 ncopa: che-bld-1 is 172.16.27.1 2023-03-09 11:44:44 oh, so the internet connection goes via wireguard tunnel? 2023-03-09 11:44:53 yes 2023-03-09 11:44:58 and dmvpn is a tunnel inside a wireguard tunnel? 2023-03-09 11:45:16 yes 2023-03-09 11:45:24 I guess MTU? 2023-03-09 11:45:52 but most traffic is outside of dmvpn 2023-03-09 11:48:14 can i ssh to it somehow? seems like I cannot connect via dmvpn 2023-03-09 11:48:57 ncopa: do you have ipv6? 2023-03-09 11:49:13 Do you want to connect to the router? 2023-03-09 11:49:23 i have ipv6 2023-03-09 11:49:32 i'd like to connect to the arm machines 2023-03-09 11:49:50 2a0a:e5c1:517:cafe:da5e:d3ff:fee6:1f48 2023-03-09 11:49:53 that's che-bld-1 2023-03-09 11:49:58 (haven't setup dns yet) 2023-03-09 11:50:26 (fyi, there will be quite some lag at the moment) 2023-03-09 11:51:19 (adding your keys) 2023-03-09 11:51:20 is the ipv6 via wireguard tunnel as well? 2023-03-09 11:51:31 yes 2023-03-09 11:52:26 at least, I think so 2023-03-09 11:52:34 there is a wan and a wwan0 interface 2023-03-09 11:53:23 Hmm 2023-03-09 11:53:26 and the openwrt router is connected to where? 2023-03-09 11:53:28 default from 2a0a:e5c0:0:3::/64 via fe80::20d:b9ff:fe57:2f91 dev wan metric 384 2023-03-09 11:53:31 default dev wg0 metric 1024 2023-03-09 11:53:41 ncopa: ungleich infra 2023-03-09 11:53:50 ok 2023-03-09 11:54:12 do we have access to the openwrt router?
2023-03-09 11:54:57 yes 2023-03-09 11:55:06 though figuring out how to add your key 2023-03-09 11:59:44 ncopa: fyi, I stopped the builders, otherwise I couldn't do anything 2023-03-09 12:00:43 ncopa: ok, you should be able to log into root@vigir23.place6.ungleich.ch 2023-03-09 12:05:04 i get prompted for password with root 2023-03-09 12:05:28 which ssh key are you using? 2023-03-09 12:06:58 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBblcU1qMyXsRG1zDI0GfcfXk01O4p6bAlM3A6zHHxnM ncopa@ncopa-mbp14 2023-03-09 12:07:25 or 2023-03-09 12:07:31 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDExEDGhrpnu3bFJJ0n9hWElHAs815BQUXNJAWyc4Yc7jP4oGruJS+zGp/ifTzXSK5Km8F6oyllYd8+Ffv1oFEjjftcHFi2u5wk/5NqMYfEILg2Zb7U2DVg2dT6xB/v9Y/fb5um1cl09IWvGNL34ErqfNwMXyoxUUwJbG+nAW3qBtMnk/6WsdPRTO7JTr8yyil+WjyONREWsj/BnYNPkAfhiAyYJNZYTYE4LsIjZzM9Lj0v0Tc3XfluoGNvLviws+hTwUhQbOJ5KzNpYtyKErR94xem1HZb33mhr2VVJCrp2Ms9FR5Lce7wAW 2023-03-09 12:07:31 asG5WWLzKas/0HIQcvL1zmqtniE/xh ncopa@alpine-netbook 2023-03-09 12:07:49 added the first one 2023-03-09 12:07:57 it was not in the list of 4 keys that I already added :D 2023-03-09 12:08:32 :) 2023-03-09 12:08:36 ok im in 2023-03-09 12:08:41 so this is a mips machine 2023-03-09 12:09:04 yes 2023-03-09 12:12:40 ncopa: I already tried to implement some form of QoS with tc, but I could not get all commands accepted, and the bandwidth references I used assumed 80mbit, which is what we were able to reach before at least 2023-03-09 12:13:37 Right now there is no traffic, so it's snappy, but as soon as there is traffic saturating some bottleneck, latency and packetloss increase tremendously 2023-03-09 12:16:54 right. this router is not able to keep up with the network io 2023-03-09 12:17:06 or the cpu is not sufficient for the wireguard encryption 2023-03-09 12:18:33 Yeah, I was suspecting something like that 2023-03-09 12:19:35 telmich mentioned they measured 400mbit on that device a few years ago 2023-03-09 12:20:02 but not sure if that included wg or not (the interface is 1gbit, so I assume the device should be able to reach that 2023-03-09 12:20:10 https://ungleich.ch/u/products/vigir/ 2023-03-09 12:25:08 i remember we measured dmvpn network performance with via cpu a decade ago 2023-03-09 12:25:38 and back then, IIRC, the pci bus was the bottleneck 2023-03-09 12:26:06 even if NIC was gigabit we were not able to get anything better than 200-300 mbit 2023-03-09 12:26:12 or maybe it was 600mbit? 2023-03-09 12:26:28 we were never able to get gigabit 2023-03-09 12:26:49 right, but even that should be plenty for us 2023-03-09 12:27:32 do we have cpu load graphs in zabbix? 2023-03-09 12:27:58 could be the cpu is not enough for the wireguard tunnel 2023-03-09 12:28:38 I did see the load increasing before on the realtime graphs on the webinterface, but lately if I looked at the load in the OS itself, it was not high, but let me check the graphs in zabbix 2023-03-09 12:31:52 ncopa: https://zabbix.alpinelinux.org/zabbix.php?action=dashboard.view&dashboardid=13 2023-03-09 12:32:57 Not sure if there is a direct correlation 2023-03-09 12:33:36 I mean, this morning I did some download tests, and there I also clearly see the load increasing 2023-03-09 12:33:50 but just now with the packetloss, the load remains low 2023-03-09 12:34:21 Maybe related to the amount of packets versus bandwidth? 
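The segment-by-segment measurements ncopa proposes (and that mps suggests running with iperf just below), sketched with iperf3 — assuming the package can be installed on both ends (it exists for both alpine and openwrt):

    # on che-bld-1, behind the router
    iperf3 -s
    # on the router: is the LAN leg to the arm box fine?
    iperf3 -c 172.16.27.1 -t 30
    # from a host on the far side of the wg tunnel, to isolate the wireguard leg
    iperf3 -c <router wg address> -t 30 -R    # -R measures the download direction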
2023-03-09 12:55:23 ikke: I have an MT7621 router with openwrt on it 2023-03-09 13:00:52 though I never tested it for max speed because I didn't have issues with the network on it 2023-03-09 13:01:33 maybe I could test it with iperf to see results 2023-03-09 13:37:53 ikke: does not look like cpu is the problem 2023-03-09 13:39:17 between 11:29 and 12:59 there was 100% packet loss. pretty weird 2023-03-09 13:41:24 no idea whats going on here 2023-03-09 13:41:59 3.17 builders were uploading 2023-03-09 14:05:32 ncopa: the packetloss might also be a timeout being reached (meaning the latency was higher than the timeout) 2023-03-09 14:43:44 right 2023-03-09 14:45:35 fyi, I installed mosh on all 3 hosts 2023-03-09 14:46:04 might be more useful under high congestion 2023-03-09 14:47:21 started the builders again 2023-03-09 14:52:15 ncopa: I've changed the icmp timeout for vigir23 from 2 seconds to 5 seconds 2023-03-09 14:52:17 in zabbix 2023-03-09 14:52:36 i observed ping times of 3 seconds earlier today 2023-03-09 14:53:12 Sounds really like buffers filling up 2023-03-09 15:18:00 ncopa: this will become a larger problem for the next release :( 2023-03-09 16:20:42 is there anything we can do? 2023-03-09 16:21:01 like set rate-limiting or similar? 2023-03-09 16:21:15 or get a more powerful router? 2023-03-09 16:22:00 clandmeter said he might have something 2023-03-09 16:22:15 I tried ratelimiting with QoS 2023-03-09 16:22:28 But not sure if I configured it correctly 2023-03-09 16:27:59 i wonder if it would make sense to do it on the arm machines behind 2023-03-09 16:28:53 have you done performance tests between arm machines? 2023-03-09 16:29:01 and arm machine <-> router? 2023-03-09 16:30:17 <@ikke> That's the bottleneck 2023-03-09 16:30:24 how do you know its the router that is the bottleneck? 2023-03-09 16:30:59 im just double checking... 2023-03-09 16:32:23 It could be even further upstream, but I have no insight into it 2023-03-09 16:32:29 but whenever it's slow, everything is slow 2023-03-09 16:37:03 maybe we should shoot an email in telmich's direction, with our findings. 2023-03-09 16:37:27 I will be out tomorrow morning but might be able to do some tests between the boxes tomorrow 2023-03-09 16:37:51 I was thinking to do things like test from router -> arm machine 2023-03-09 16:37:57 test from router -> internet 2023-03-09 16:38:00 etc 2023-03-09 16:38:34 i also wonder if wwan0 is a wireless connection? 2023-03-09 16:43:07 according to the interface, it's lte 2023-03-09 16:43:18 Maybe that's unexpected 2023-03-09 18:23:30 ncopa: as soon as I stop the build containers, the latency drops 2023-03-09 21:58:26 ikke: there's still that consistent git pull bug 2023-03-09 21:58:44 where if you haven't updated it in a ~day or so, and just run `git pull` (or fetch i guess), it first hangs for like 45 seconds 2023-03-09 21:58:54 and it's not an ipv6 thing, same on v4 2023-03-09 22:31:56 psykose: need to figure out how to debug that 2023-03-09 22:32:03 yeah 2023-03-09 22:32:08 I did notice myself it does take longer, but not 45 seconds 2023-03-09 22:32:10 i don't have a good way, except..
wait a while then do it 2023-03-09 22:32:13 so my guess is 2023-03-09 22:32:18 there's some Initial Git Stuff 2023-03-09 22:32:32 if it's Up To Date, it just goes fast (mostish, up to a few changes, so something chunk/index related) 2023-03-09 22:32:40 past that it does Something for A While on the remote end 2023-03-09 22:32:57 We could try the new gitlab ssh thingy :P 2023-03-09 22:33:04 lol 2023-03-09 22:33:10 i mean, probably does improve it :p 2023-03-09 22:55:46 We'll get those now 2023-03-09 23:03:21 ye 2023-03-10 06:38:37 rebooted gbr-app-1 2023-03-10 13:26:27 can I connect to the new arm boxes via ipv6? or do I need to use vigir23 as jump box? 2023-03-10 13:28:16 Directly via ipv 2023-03-10 13:28:23 ipv6 2023-03-10 13:30:51 where do I find the ip addresses/hostnames? 2023-03-10 13:33:28 I didn't have time to add dns records yet. Do you have access to netbox? 2023-03-10 13:33:37 I think so 2023-03-10 13:33:44 Ips are in there 2023-03-10 13:34:05 che-bld-1, che-ci-1 2023-03-10 13:35:53 ok, i found them. thanks! 2023-03-10 13:36:38 arm servers are not accessible for me 2023-03-10 13:40:12 mps: yes, known issue 2023-03-10 13:41:16 ikke: ok 2023-03-10 13:42:02 ncopa: tried to setup wg from che-ci-1 to gbr-app-1, but no connectivity yet 2023-03-10 14:02:40 seems like my ssh key is not on che-ci-1 2023-03-10 14:03:30 and not on che-bld-1 2023-03-10 14:06:39 distfiles.a.o does not have ipv6? 2023-03-10 14:09:39 was thinking we could set DISTFILES_MIRROR to distfiles.alpinelinux.org but it does not help if it's not available via ipv6 2023-03-10 14:15:03 seems like https://build.alpinelinux.org/ is not updating? 2023-03-10 14:15:14 build-edge-x86_64 is building the kernel right now 2023-03-10 14:24:37 ok, not much i can do at this point 2023-03-10 14:33:10 ncopa: I did reboot the host, so maybe something is not working properly 2023-03-10 14:43:49 i was thinking we could try to create a static gre tunnel or something 2023-03-10 14:44:12 but seems like vigir23 only has busybox ip, and not iproute2 2023-03-10 14:44:34 Yeah 2023-03-10 14:46:32 im not sure it's possible to do wireguard over wireguard 2023-03-10 14:57:41 Me neither 2023-03-10 15:00:51 looks like it should be possible to do gre6 tunnel: https://openwrt.org/docs/guide-user/network/tunneling_interface_protocols#protocol_grev6_gre_tunnel_over_ipv6 2023-03-10 22:49:01 mps: if you're not interested in zig anymore, do you mind if i take maintenance 2023-03-10 22:54:55 psykose: sure, take it and thank you 2023-03-10 22:55:11 thanks :) 2023-03-11 05:02:41 ikke: could you show me config/common.conf and builder.common.conf too for lxc 2023-03-11 05:02:46 also i assume etc/lxc/armhf.common.conf is commented out intentionally 2023-03-11 05:04:31 want to create lxc myself with the full config copied to see if i can reproduce it that way 2023-03-11 05:04:47 that would let me play around a bit instead of pingponging one thing at a time and seeing if it's maybe fixed or not 2023-03-11 06:09:41 psykose: yes, those arch specific configs were used on the previous host for NUMA 2023-03-11 06:36:38 makes sense 2023-03-11 08:59:24 when can we expect the arm machines online 2023-03-11 09:06:06 ikke: could you give me those configs 2023-03-11 09:32:15 I'll do it when I have the opportunity 2023-03-11 09:32:32 sure thing 2023-03-11 12:55:26 "but seems like vigir23 only..." <- Feel free to install packages as needed on the vigir, I think it should have iproute2 in the repos 2023-03-11 13:07:06 telmich: hi, is it possible to get ipv4 connectivity on the physical interface?
I think we are currently missing it. 2023-03-11 13:07:34 on / via 2023-03-11 13:08:50 From what I understand the wg interface should provide that? 2023-03-11 15:55:39 /join !dklW6D8VSsGRZaDK:dendrite.k6.yokai.cafe 2023-03-11 16:13:37 "nico🇨🇭: hi, is it possible to..." <- There is no Ipv4 in Our native (physical) networks, ipv4 is only at the edge 2023-03-11 16:18:28 We only need the servers to be able to reach Ipv4 only services, something that is now not working 2023-03-12 09:47:58 iiuc the arm builders don't work? 2023-03-12 09:49:49 nothing does 2023-03-12 09:50:00 for arm that is 2023-03-12 09:50:28 anyone have an idea how to fix this network problem 2023-03-12 09:57:00 i assume we have to set up some 464xlat or whatever? 2023-03-12 09:57:13 it's ipv6 only so there has to be some service that lets you encapsulate ipv4 in it 2023-03-12 09:58:47 does anyone work on this 2023-03-12 09:59:35 i don't have access to anything, everyone else is busy, and it probably requires telmich to tell us how to do it on the specific networking setup 2023-03-12 09:59:37 no clue :D 2023-03-12 10:02:35 hm ok /o\ 2023-03-12 10:03:20 I'm away from home this weekend, so I don't have a lot of time 2023-03-12 10:03:42 And also waiting to see what ungleich can do 2023-03-12 10:54:56 hai, you should have nat64 already, which lets you access any v4 only services 2023-03-12 10:54:56 are you using their dns resolver? check if it returns "fake" v6 dns entries for e.g. github.com, which should point to their nat64 range 2023-03-12 12:06:35 nu[m]: ah, I see 2023-03-12 12:38:09 nu[m]: we were using our default setup, which would bypass their dns 2023-03-12 12:44:13 if you want to keep that you could consider configuring your dns to give out these extra v6 entries which point to their, or your, nat64 ranges (with bind it's trivial) 2023-03-12 12:44:13 but beware, this could reroute, or block some traffic of other dualstack deployments that use this dns if they end up preferring v6 over legacy 2023-03-12 12:46:04 ah, nvm, with bind you set explicitly the range for which you want to enable dns64 2023-03-12 12:46:14 We use dnsmasq 2023-03-12 12:46:24 Or do you mean what ungleich uses 2023-03-12 12:52:17 i dont see dns64 support for dnsmasq:/ 2023-03-12 12:54:03 from the limited info i gather that in order to fix your v6 only nodes you would need to start using their dns, or spawn a dns64 resolver 2023-03-12 12:55:06 We can use split dns. 2023-03-12 12:55:53 And use their dns as default 2023-03-12 13:01:24 give it a try, and see if u can wget github^^ 2023-03-12 14:54:54 ikke: edge-ppc64le builder is also missing 2023-03-12 15:06:30 ah right it was stuck 2023-03-12 15:06:39 Yes 2023-03-12 22:10:48 psykose: on which arch do you have issues on arm? 2023-03-12 22:21:06 is it on 32bits arm? 2023-03-12 22:31:02 I wonder if it could be related to https://www.kernel.org/doc/Documentation/arm64/legacy_instructions.txt 2023-03-12 22:31:50 now that the old server is gone i can't double check its settings. 2023-03-12 22:34:45 psykose, ikke: i set Enabled cp15_barrier support to 2 2023-03-12 22:35:33 https://github.com/rust-lang/rust/issues/60605 2023-03-12 22:36:18 if this solves the issue, we need to enable it on boot 2023-03-13 04:08:16 clandmeter: ah, I totally forgot we enabled that 2023-03-13 05:05:24 clandmeter: btw, I made a backup of /etc of use9 2023-03-13 05:05:31 usa9 2023-03-13 05:41:37 clandmeter: ooh, thanks..
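A sketch of making the workaround permanent (file name arbitrary); per the legacy_instructions doc linked above, the default of 1 traps and emulates the deprecated CP15 barrier instructions, while 2 lets the hardware execute them natively:

    # /etc/sysctl.d/90-cp15_barrier.conf — picked up at boot
    abi.cp15_barrier = 2

    # apply immediately, without a reboot:
    sysctl abi.cp15_barrier=2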
2023-03-13 05:41:43 this looks familiar 2023-03-13 05:44:12 if this fixes it you are a god :D 2023-03-13 05:44:18 i'd have never figured this out 2023-03-13 05:44:46 (i wonder why it doesn't reproduce on an rpi4 then? are the altra cpus so much newer they hang instead on these..) 2023-03-13 05:47:39 abi.cp15_barrier = 2 2023-03-13 05:47:43 that's what we had set indeed 2023-03-13 05:48:52 from what i'm reading it should trap+emulate (and then break, into a loop..) on all armv8 hardware 2023-03-13 05:49:07 really weird the rpi4 does not 2023-03-13 05:49:30 unless the rpi4 kernel has a patch for this to set =2 by default 2023-03-13 05:49:50 instead of =1 like mainline 2023-03-13 05:49:57 And I suppose we'd need to seperately enable that on the CI vms? 2023-03-13 05:50:01 yeah 2023-03-13 05:50:09 everything that boots a kernel needs this set for now 2023-03-13 05:50:18 for llvm output to work on the arch, i guess 2023-03-13 05:50:31 i'll track these bugs, maybe in a few years it's fixed in llvm and then we can drop it 2023-03-13 06:05:02 Regarding ipv6 and the builders, we need to make sure the builders have ipv6 connetivity 2023-03-13 06:05:07 dmvpn will remain an issue 2023-03-13 08:57:01 psykose: sorry that i couldn't help you sooner, just returned from holidays. 2023-03-13 08:57:11 no worries :) no deadlines or anything 2023-03-13 08:57:27 also isn't so much helping "me" as much as figuring out why our builders/ci are broken for rust on armhf :D 2023-03-13 08:57:37 and now it should be fixed, so that's very good 2023-03-13 08:57:42 thanks a lot again :) 2023-03-13 08:57:47 its better to "waste" time on better things ;-) 2023-03-13 08:57:52 true, true 2023-03-13 08:59:06 do we have an issue recorded for this issue? maybe we could mention it so we dont forget about it in the future. 2023-03-13 09:01:32 https://gitlab.alpinelinux.org/alpine/aports/-/issues/14667 2023-03-13 09:01:41 i had it there as a todo for myself, so i wrote out the rest now 2023-03-13 10:44:18 Nice thx 2023-03-13 11:24:16 clandmeter: so nat64 (or whatever is behind it) is working, but we need to use upstream dns, and we need to to be able to use ipv6 2023-03-13 11:24:24 So we need to give the containers ipv6 addresses 2023-03-13 11:28:43 sounds doable 2023-03-13 11:28:54 yes 2023-03-13 11:28:56 working on it 2023-03-13 11:28:59 hopefully the ipv6-container-firewall-stuff isn't very painful 2023-03-13 11:31:29 ikke: yes 2023-03-13 11:31:37 been reading a bit about it 2023-03-13 11:32:17 clandmeter: about nat64? 2023-03-13 11:32:26 yes 2023-03-13 11:32:34 nat64 and dns64 2023-03-13 11:32:36 right 2023-03-13 11:32:45 So on the host it's working 2023-03-13 11:32:51 I can connect to tpaste and github 2023-03-13 11:33:14 yes I tried from the router 2023-03-13 11:35:39 Can we make it work with dnsmasq? 
2023-03-13 11:35:46 clandmeter: should already work 2023-03-13 11:36:06 on che-bld-1 that is 2023-03-13 11:37:00 So all ipv4 traffic will flow via the nat64 gw iiuc 2023-03-13 11:37:28 You connect to the nat64 gw via ipv6 2023-03-13 11:37:40 yes 2023-03-13 11:37:54 via dns64 lookups 2023-03-13 11:39:07 Interesting concept 2023-03-13 11:39:09 I've assigned 2a0a:e5c1:517:1::/64 to the lxcbr0 2023-03-13 11:39:15 It’s new for me 2023-03-13 11:39:24 I knew the concept 2023-03-13 11:39:43 But I missed that we need to point the dns servers to the upstreaam dns servers 2023-03-13 11:41:08 And ipv6 addresses are kind of needed :) 2023-03-13 11:41:37 so the ipv4 addresses will need unused or for local connectivity only 2023-03-13 11:42:08 yes 2023-03-13 11:42:14 be… 2023-03-13 11:42:31 In the car to Groningen :| 2023-03-13 11:46:42 clandmeter: nice 2023-03-13 11:47:01 I started corerad with the prefix, and now all containers have a public IPv6 address :) 2023-03-13 11:48:41 no default route yet though 2023-03-13 11:55:02 one step closes, but not yet there 2023-03-13 11:55:06 closer* 2023-03-13 12:10:07 almost ! 2023-03-13 12:10:21 ikke: did you apply the arg for the armv6 workaround to CI vms too? 2023-03-13 12:12:34 Not yet, will do later 2023-03-13 12:13:09 Was first focusing on getting them connected ;-) 2023-03-13 12:22:26 :) 2023-03-13 12:38:35 ikke: also the latest-development on https://alpinelinux.org/ doesn't update anymore 2023-03-13 12:58:01 Updating again 2023-03-13 12:58:37 what was the issue 2023-03-13 13:02:08 mqtt-exec crashing when it couldn't reach msg.a.o after host reboot 2023-03-13 13:03:15 Something something supervised 2023-03-13 13:03:22 Or supervisord 2023-03-13 13:11:13 fun 2023-03-13 16:35:32 progress 2023-03-13 16:36:40 https://i.imgur.com/gfM2Bty.png 2023-03-13 16:48:37 pogress 2023-03-13 16:49:09 clandmeter: I installed corerad on che-bld-1 to provide IPv6 addresses to the containers 2023-03-13 16:49:21 and then enabling ipv6 forwarding 2023-03-13 16:50:57 yay! 2023-03-13 17:12:07 :) 2023-03-13 17:33:31 ikke: nice 2023-03-13 17:37:33 ikke: why corerad? 2023-03-13 18:09:58 clandmeter: alternative is radvd, but I've heard that one's poorly maintained 2023-03-13 18:10:57 clandmeter: oh, I also added a route on vigir23 2023-03-13 18:11:01 to forward the traffic 2023-03-13 18:24:24 clandmeter: wanted to try corerad to see how it works 2023-03-13 18:24:29 it's at least simple 2023-03-13 18:42:22 radvd is fine, just hasn't had a release in a while 2023-03-13 18:42:44 better than a million B's of ram into golang :D 2023-03-14 02:30:39 ikke: seems the networking is kinda broken again 2023-03-14 05:41:59 psykose: why do you say so? 2023-03-14 05:46:48 probably because of stuff like http://build.alpinelinux.org/buildlogs/build-edge-armv7/community/apache-ant/apache-ant-1.10.13-r0.log 2023-03-14 05:49:05 sadly no clue where it's trying to connect to 2023-03-14 05:49:26 but the builders still have network 2023-03-14 05:52:11 Things that explicitly require public ipv4 will not work 2023-03-14 09:32:07 that and them all being stfuc 2023-03-14 09:32:10 stuck* 2023-03-14 09:58:09 i.e. 
aarch64/armhf have not moved since i went to bed 2023-03-14 09:58:38 might just be a regular hang though 2023-03-14 09:59:05 armv7 though no idea 2023-03-14 09:59:25 you can try resolving `https://repo1.maven.org/maven2/org/apache/maven/resolver/maven-resolver-ant-tasks/1.4.0/maven-resolver-ant-tasks-1.4.0-uber.jar` 2023-03-14 09:59:27 from the machine 2023-03-14 09:59:57 repo1.maven.org has only an A address 2023-03-14 10:00:23 but does work, so i wonder if it's weird in some other way 2023-03-14 11:20:22 buildrepo seems stuck 2023-03-14 11:20:28 100%cpu 2023-03-14 11:21:00 timeout on pselect 2023-03-14 11:22:33 love that 2023-03-14 11:23:04 not sure why 2023-03-14 11:23:17 I mean, I can restart it, but not sure if it will return 2023-03-14 11:35:58 psykose: that file downloads without issue using curl 2023-03-14 11:36:12 love that 2023-03-14 11:44:20 is it meant to build with jdk8? 2023-03-14 11:45:28 it's explictly in makedepends 2023-03-14 11:46:37 I wonder if it's some ipv6 issue with jdk8 2023-03-14 12:00:23 the address in question is v4 only anyway 2023-03-14 12:00:47 also there's only jdk8 on 32-bit 2023-03-14 12:01:10 weird 2023-03-14 12:01:55 if you run the build by hand i assume it also fails 2023-03-14 12:08:13 maybe you could strace it? it's clearly that kind of error i guess 2023-03-14 12:08:26 would be noisy but would show what it's even connecting to 2023-03-14 12:08:50 and give some clues as to what is missing that curl either doesn't care about or works around 2023-03-14 13:04:54 think all these aws-c-* testsuites, step-cli, .. are also failing for the same reason 2023-03-14 13:05:14 something still doesn't quite work right on that networking even if ordinary v4/v6 connections work for fetching and stuff 2023-03-14 13:06:30 I did run it by hand and it failed 2023-03-14 13:06:39 Didn't have time to strace it. 2023-03-14 13:07:40 NAT64 + DNS64 should make it return an IPv6 address 2023-03-14 13:35:15 gonna guess these applications are like 2023-03-14 13:35:17 broken on ipv6 2023-03-14 13:35:25 so the whole ipv6 -> ipv4 translation works 2023-03-14 13:35:34 but when try do connect(ipv6) they are using some 90's code 2023-03-14 13:36:16 based on the builds so far this is mostly fine, but it's going to break a hundred random network'd testsuites on arm only 2023-03-14 15:03:26 some builders also seem to be retrying in a loop :D 2023-03-14 15:03:27 spinny 2023-03-14 15:04:16 the openjdk thing seems like https://bugs.openjdk.org/browse/JDK-8200719 but that file it's fixed is not in openjdk8 and any relevant similar things have the same patch already 2023-03-14 15:04:17 weird 2023-03-14 16:58:52 maybe JAVA_OPTS="-Djava.net.preferIPv6Addresses".. 
2023-03-14 16:58:54 can't test myself tho 2023-03-14 17:50:54 could you test that one 2023-03-14 17:51:58 Was about to do 2023-03-14 17:52:05 but then I ran into ssh issues :P 2023-03-14 17:52:20 waiting for the armv7 builder to idle 2023-03-14 18:03:03 that sounds like a while 2023-03-14 18:03:08 it probably repros on aarch64 for the same thing 2023-03-14 18:07:03 it does 2023-03-14 18:07:21 but that flag does not make a difference 2023-03-14 18:07:54 unlucky 2023-03-14 18:07:58 didn't expect it to but weh 2023-03-14 18:08:03 i guess strace time of whatever fails 2023-03-14 18:08:13 yeah 2023-03-14 18:08:14 and see if by change it's gethostbyname 2023-03-14 18:16:06 connect(45, {sa_family=AF_INET6, sin6_port=htons(443), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:151.101.240.209", &sin6_addr), sin6_scope_id=0}, 28 2023-03-14 18:16:18 no gethostbyname 2023-03-14 18:18:40 https://tpaste.us/a1Zb 2023-03-14 18:23:30 hrm 2023-03-14 18:23:45 Trying to figure out what exactly is timing out 2023-03-14 18:23:56 yeah java is not fun for that 2023-03-14 18:24:35 pid 77049 is writing that message 2023-03-14 18:24:49 tcpdump perhaps 2023-03-14 18:26:48 https://tpaste.us/qEP4 2023-03-14 18:27:20 it gets an ipv4 address back for some reason 2023-03-14 18:28:28 hm 2023-03-14 18:28:34 ok, but from who 2023-03-14 18:29:12 If I do getent hosts repo1.maven.org, it just gets an AAAA record 2023-03-14 18:29:49 172.16.27.1.53 is dnsmasq on the server 2023-03-14 18:31:13 Is it because it explicitly requests an A record? 2023-03-14 18:33:14 Yeah, the dns server will return A records if you ask for them 2023-03-14 18:33:27 ikke: 172.16.27.1.53 is DNS port on 172.16.27.1 so doesn't that mean it is talking to that IP? 2023-03-14 18:34:01 That's what it uses to query dns, yes 2023-03-14 18:34:41 https://tpaste.us/NBMq 2023-03-14 18:35:01 that looks weird 2023-03-14 18:35:35 But JAVA_OPTS="-Djava.net.preferIPv6Addresses" does not help to fix that 2023-03-14 18:35:43 yeah, it shouldn't affect anything 2023-03-14 18:35:49 but as a guess, maybe it was just that funny :p 2023-03-14 18:39:07 ikke: from the strace making the DNS request I don't see where it specifies either A or AAAA 2023-03-14 18:39:58 I guess it is one of the byte-encoded values 2023-03-14 18:40:33 yes, that was my guess 2023-03-14 18:40:41 Not versed in DNS 2023-03-14 18:41:15 maybe it's time to take a packet capture and load into wireshark? 2023-03-14 18:42:16 yeah, tried with pure tcpdump, but did not give enough detail 2023-03-14 18:42:36 or did it 2023-03-14 18:42:44 172.16.27.5.43790 > 172.16.27.1.53: [bad udp cksum 0x8e61 -> 0x82cf!] 9253+ A? repo1.maven.org. 
(33) 2023-03-14 18:42:52 seems like an explicit request for an A record 2023-03-14 18:50:53 I admit I know little about JVM but it seems it will do both A and AAAA lookups and java.net.preferIPv6Addresses only controls preference when it gets both A and AAAA results, if it gets only A result then obviously it can't prefer a non-existant AAAA result 2023-03-14 18:51:22 It does get an IPv6 result 2023-03-14 18:51:34 https://tpaste.us/qEP4 2023-03-14 18:51:39 Here you see what happens 2023-03-14 18:52:09 2 dns queries, to responses, connects to ipv4 and fails 2023-03-14 18:52:15 s/to/2 2023-03-14 18:54:47 ok, so looking at the strace I'm guessing that 6th line (sendto) is the A lookup and 7th line (sendto) is the AAAA lookup 2023-03-14 18:55:34 and that 9th line (recvfrom) is the A result and 12th line (recvfrom) is AAAA result 2023-03-14 18:55:57 yes 2023-03-14 18:56:43 and then 14th and 15th lines are it deciding to open a v4 socket rather than a v6 socket 2023-03-14 18:58:49 um, shouldn't this be defined in JAVA_OPTIONS, not JAVA_OPTS? 2023-03-14 19:00:00 But on the last lines you do see it 2023-03-14 19:00:04 ups 2023-03-14 19:00:17 minimal: doesn't help 2023-03-14 19:00:34 JAVA_OPTIONS="-Djava.net.preferIPv6Addresses" bootstrap/bin/ant -f fetch.xml -Ddest=optional 2023-03-14 19:01:03 ok, I just saw "JAVA_OPTS is not used by the JDK, but by a bunch of other apps" 2023-03-14 19:05:21 I guess doing a "ps" on the JDK/JRE would show if that setting actually made it to the runtime 2023-03-14 19:09:31 pretty sure it needs to be _JAVA_OPTIONS=-Djava.net.preferIPv6Addresses=true 2023-03-14 19:10:52 bingo 2023-03-14 19:11:13 hmm, now it fails later 2023-03-14 19:11:37 different domain 2023-03-14 19:13:28 oh wow 2023-03-14 19:13:30 actually progressed 2023-03-14 19:14:06 netrexx.org 2023-03-14 19:14:21 broken ipv6 :/ 2023-03-14 19:14:35 yeah, that's going to be really common 2023-03-14 19:14:41 we can't rely on out-6 working :p 2023-03-14 19:15:04 I'm trying to setup a gre tunnel to another server 2023-03-14 19:15:51 ikke: unrelated, i was thinking of doing a setup-apkcache on the builders and a nightly cron that just does rm /var/cache/apk/*.apk 2023-03-14 19:16:04 for a lot of things most of the build time is just installing the 530 packages, and local disk is way faster.. 2023-03-14 19:16:21 even if the http was on localhost it's a huge difference 2023-03-14 19:16:31 (to just have them in cache already) 2023-03-14 19:16:41 psykose: the builders use local repos anywaya 2023-03-14 19:16:51 are you 100% sure though 2023-03-14 19:16:56 it seems like weirdly slow sometimes 2023-03-14 19:17:10 https://tpaste.us/xynv 2023-03-14 19:17:14 things that build instantly for me take like over a minute for install-500-things-copy-1-file-purge 2023-03-14 19:17:28 hm 2023-03-14 19:17:32 guess that's not the cause then, nvm 2023-03-14 19:17:35 why are the builders called buildozer? 2023-03-14 19:17:41 bulldozer 2023-03-14 19:17:55 yes, but is there any cool story behind it? 2023-03-14 19:18:23 I don't know the history, but sounds like a name related to builder 2023-03-14 19:21:15 buildozer because it destroys builds? 
;-) 2023-03-14 19:38:00 I have a gre tunnel, now I need to setup NAT 2023-03-14 19:38:37 almost there 2023-03-14 19:42:08 I can ping 8.8.8.8 from the host, not yet from any containers 2023-03-14 19:43:29 hmm, curl is not working yet from the host 2023-03-14 19:47:50 sounds like dns again 2023-03-14 19:49:32 no, it's not a dns issue 2023-03-14 19:50:45 ICMP time exceeded in-transit 2023-03-14 19:52:19 ah 2023-03-14 19:52:32 Not sure why though 2023-03-14 19:55:32 I can ping 104.168.250.145 2023-03-14 19:55:37 but curl is not working 2023-03-14 19:55:43 not sure if it's a fw issue 2023-03-14 21:27:23 if ping works and curl doesn't and it's not a firewall/route issue it's probably an mtu issue 2023-03-15 05:39:04 Hello71: I don't even see the tcp handshake completing 2023-03-15 13:13:20 Good day everyone! I'm back in .ch and trying to see what the status is re arm servers 2023-03-15 13:13:50 currently just have a few dns issues per above 2023-03-15 13:13:53 the rest is ok i think 2023-03-15 13:16:25 telmich: wanted to send you a summary 2023-03-15 13:16:36 telmich: we got NAT64 working 2023-03-15 13:17:49 But some things are sadly still broken in an ipv6 only setup 2023-03-15 13:32:38 ikke: If you can send a summary, that would be optimal & easy to act upon, TBH 2023-03-15 13:37:01 Sure, will do. For the record, the last point is not a fault of ungleich, just the state of ipv6 support 2023-03-15 13:38:15 On ARM? 2023-03-15 13:41:05 nah, internet 2023-03-15 13:41:16 in this case we think it was an openjdk issue, no? 2023-03-15 13:41:19 or maybe it was just dns 2023-03-15 13:42:56 First it was jdk preferring ipv4 2023-03-15 13:43:24 ah right 2023-03-15 13:49:07 The last issue was a domain having AAAA records, but they were not reachable 2023-03-15 13:55:19 el clasico 2023-03-15 18:17:51 ikke: can you cycle edge-aarch64 2023-03-15 18:17:55 i am pretty sure it's deadlocked :D 2023-03-15 21:24:07 Network setup: router <--> host <--> docker-container 2023-03-15 21:24:43 When I try to ping the router from the docker container, I see an icmp echo reply on the router, but it never reaches the host 2023-03-15 21:26:28 ip6tables (which is relevant) does not show any rules (policy is accept) 2023-03-15 21:26:37 oh wait, forgetting something 2023-03-15 21:27:05 router <--> host <--> vm <--> container 2023-03-15 22:06:37 2RIP 2023-03-15 22:06:53 that feeling when echo 1 >forwarding removes your default route 2023-03-15 22:10:00 That was totally not me 2023-03-15 22:48:49 clandmeter: we need to verify what the docker team's change will mean for our infra (mainly regarding third-party images we use) 2023-03-16 06:36:28 🤨 2023-03-16 06:37:47 ok, apparently legit 2023-03-16 06:50:18 Still in Groningen and next week Germany. Then I'll have some time :| 2023-03-16 06:50:39 Did you see the email just now? 2023-03-16 07:46:48 hi! what is the status of the arm builders? are they okish now, so I can tag an edge release? 2023-03-16 08:49:16 ncopa: I think so 2023-03-16 08:49:45 clandmeter: I did see multiple emails 2023-03-16 09:00:05 "that feeling when echo 1 >..."
<- You need accept_ra=2, that is usually the thing you want on "VM hosts" or "container hosts" 2023-03-16 09:55:45 when can we expect access to the arm developers lxc 2023-03-16 09:56:22 need to config and test linux-edge for armv7 2023-03-16 09:56:48 for aarch64 I can do this on my local machine 2023-03-16 09:58:09 telmich: yes, I'm aware, thanks 2023-03-16 09:58:30 mps: I can give direct ipv6 access 2023-03-16 09:59:31 ikke: I don't have ipv6 on my internet link 2023-03-16 12:04:25 mps: oh, I thought you did 2023-03-16 12:04:42 mps: If you want to, I can give you ipv6 via wireguard 2023-03-16 12:10:04 telmich: one issue we are facing is that there is apparently no multipoint gre support for ipv6 in the kernel, so dmvpn is not working 2023-03-16 12:10:47 ikke: I can use ipv6 on our wg gateway if it can route ipv6 to lxcs 2023-03-16 12:12:22 telmich: thank you. lest wait some time to see if it could be solved by infra somehow 2023-03-16 12:12:37 s/lest/let/ 2023-03-16 12:12:48 lets* 2023-03-16 12:13:14 ikke: I am not familiar with dmvpn, but for gre wouldn't you setup one tunnel per connection and then potentially run bgp over it? 2023-03-16 12:18:15 Btw, is anyone in here located in London? I'll be visiting the UK IPv6 Council in April 2023-03-16 12:37:54 telmich: multipoint gre uses a single gre interface to make multiple connections (that's about all I know about it) 2023-03-16 12:39:07 telmich: the idea is that it can dynamically create connections between spokes instead of having to route everything to the hub, but without having to create a separate gre pair for each spoke 2023-03-16 12:52:34 ikke: Oh, wow, very interesting. I have never seen/used such gre interfaces. I thought gre in general can only do ptp, but seems my info is a bit outdated 2023-03-16 12:53:19 Yeah, it's very nice technology that originated from cisco, but apparently ipv6 support is lacking 2023-03-16 12:53:50 That is in the mainstream Linux kernel? 2023-03-16 12:55:08 yes, we just use mainstream 2023-03-16 12:55:25 It requires nhrp 2023-03-16 12:55:32 https://sudonull.com/post/201046-Creating-point-to-multipoint-tunnels-based-on-GRE-encapsulation-in-Linux-26 2023-03-16 13:05:12 That is quite interesting. It seems to be a similar approach to tailscale/headscale, potentially a bit older 2023-03-16 13:15:32 telmich: for alpine we have a complete solution, where a certificate is generated on the ca, which contains all information required to set up the vpn 2023-03-16 13:15:54 so on the spoke, you run the setup script with the cert, and it sets up everything 2023-03-16 13:36:27 clandmeter: did you get an e-mail from docker about the alpinelinux organization? 2023-03-16 13:55:57 strange, I can ping the container from the router, but not the router from the container 2023-03-16 13:58:05 router <-> server <-> vm <-> container 2023-03-16 14:01:28 ok, that's iptables 2023-03-16 14:50:24 hmm 2023-03-16 14:50:38 net.ipv6.conf.all.accept_ra = 2 2023-03-16 14:50:41 yet 2023-03-16 14:50:48 net.ipv6.conf.eth0.accept_ra = 1 2023-03-16 14:51:07 🤨 2023-03-16 14:53:29 something set the second after the first? 2023-03-16 14:53:45 or maybe if you set the second and then the first, it doesn't overwrite a manually set value?
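A sketch of the usual way out of this mismatch: rather than relying on `all` propagating to an interface that already exists, pin the per-interface key (plus `default`, which seeds interfaces created later); accept_ra=2 keeps accepting router advertisements even with forwarding enabled, per telmich's note above:

    # /etc/sysctl.d/ipv6.conf
    net.ipv6.conf.default.accept_ra = 2
    net.ipv6.conf.eth0.accept_ra = 2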
2023-03-16 14:54:16 I added the former in /etc/sysctl.d/ipv6.conf, and rebooted 2023-03-16 14:54:28 I'm not aware of anything setting the latter 2023-03-16 15:44:13 Hmm, it appears conntracking is not working for some reason :/ 2023-03-16 15:46:44 ncopa: would you have a moment to take a look with me in case I'm overlooking something? 2023-03-16 15:47:43 Everything works when I do ip6tables -I FORWARD -j ACCEPT 2023-03-16 15:48:05 ikke: i'll have a look 2023-03-16 15:48:15 where? 2023-03-16 15:48:32 The host is 2a0a:e5c1:517:cafe:da5e:d3ff:fee6:1f28 2023-03-16 15:48:36 (che-ci-1) 2023-03-16 15:48:48 need to find a machine with ipv6 2023-03-16 15:49:08 most alpine servers should have ipv6 2023-03-16 15:49:30 and my local wifi has (my cabled ethernet has not) 2023-03-16 15:49:40 ah ok 2023-03-16 15:50:12 so, what am I looking at? 2023-03-16 15:50:50 you should be able to do `ssh aarch64` on that server 2023-03-16 15:51:44 hmm, maybe asymmetric routing is the issue (which would break conntracking) 2023-03-16 15:52:25 asymmetric routing does break conntrack yes 2023-03-16 15:52:43 Network setup: router <-> server <-> vm <-> container 2023-03-16 15:53:10 on the vm, I added 2a0a:e5c1:517:2000::/64 as a subnet 2023-03-16 15:53:32 On the host, I added the route: 2a0a:e5c1:517:2000::/64 via 2a0a:e5c1:517:1::20 dev br0 2023-03-16 15:53:45 2a0a:e5c1:517:1::20 is an ip on that server 2023-03-16 15:53:52 I suspect that might be the issue 2023-03-16 16:03:49 Yeah, asymmetric routing 2023-03-16 16:05:35 ncopa: sorry, apparently only needed you as a rubber duck 2023-03-16 16:14:01 ikke: I'm glad it helped :) 2023-03-16 16:16:22 It's working now 2023-03-16 16:31:14 hmm, curious, the container has ipv6 access now, but the vm does not 2023-03-16 16:31:45 bizarre 2023-03-16 18:43:23 is the ci back 2023-03-16 18:46:58 Yes, finishing armhf 2023-03-16 18:47:02 and then all should be good 2023-03-16 18:47:32 :) 2023-03-16 20:17:16 psykose: making progress 2023-03-16 20:20:49 good to hear 2023-03-16 20:28:59 hmm 2023-03-16 20:32:57 I broke something.. 2023-03-16 20:37:44 I think dmvpn does not like it when 2 sites try to connect to the network with the same source ip :) 2023-03-16 20:51:17 But that means we will not get dmvpn any time soon 2023-03-16 21:48:09 clandmeter: I did manage to get everything working now (except dmvpn).
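(A quick way to double-check the asymmetric-routing suspicion above; the interface and address are illustrative, not the actual setup:)

    # watch one leg of the path: with asymmetric routing only one direction shows up here
    tcpdump -ni br0 'icmp6 and host 2a0a:e5c1:517:2000::10'
    # conntrack entries that never leave UNREPLIED are another telltale sign
    conntrack -L -f ipv6 | grep -i unreplied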
2023-03-16 22:02:15 euh 2023-03-16 22:02:52 [problem] 2023-03-16 22:04:53 why 2023-03-16 22:07:14 Should be recovering 2023-03-16 22:09:38 No idea what happened 2023-03-16 22:09:52 gbr-app-1 suddenly lost the dmvpn routes again 2023-03-17 02:50:27 ikke: seems that now apache-ant just hung forever instead of failing 2023-03-17 02:50:28 weird 2023-03-17 05:45:29 huh 2023-03-17 05:45:36 why does this keep happening 2023-03-17 05:54:26 java seems to deadlock for some reason 2023-03-17 05:59:48 nice 2023-03-17 06:00:02 love java 2023-03-17 06:07:30 strange, because yesterday on armv7 it would at least continue until the 404 2023-03-17 06:07:39 now if I try it manually on aarch64, it hangs as well 2023-03-17 06:08:04 don't think much changed overnight 2023-03-17 06:08:19 does it hang on some weird shit or just the network 2023-03-17 06:08:26 i guess the latter usually times out but not always 2023-03-17 06:10:15 I didn't see any network traffic 2023-03-17 06:15:15 so many threads 2023-03-17 06:30:58 psykose: I suspect an MTU issue 2023-03-17 06:31:04 could be 2023-03-17 06:46:27 Not sure why it stopped working, it worked yesterday, but I have to continue later 2023-03-17 06:47:58 so it _is_ an mtu issue, if I hard lower the MTU on the interface in the container, it works 2023-03-17 06:48:04 now how to fix it properly 2023-03-17 06:48:18 local.d echo 2023-03-17 06:48:21 : ) 2023-03-17 06:48:51 I mean, I know how I can set the mtu on the interfaces 2023-03-17 06:49:06 But it should not be necessary to do that on the container interface 2023-03-17 06:49:19 yeah, just jokin 2023-03-17 06:55:54 ikke: hi can we sync later today? 2023-03-17 06:56:05 Maybe via a call or so 2023-03-17 07:00:37 I have time during lunch 2023-03-17 09:12:13 maybe clamp-mss helps? 2023-03-17 09:17:12 this video explains the problem https://blog.ipspace.net/2013/01/tcp-mss-clamping-what-is-it-and-why-do.html 2023-03-17 10:41:52 I was thinking about it 2023-03-17 13:51:23 do we have access today to the arm lxcs?
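(A sketch of the clamp-mss idea from above, as it could look on the router that forwards the tunnel traffic, plus the stop-gap of lowering the container MTU; rule placement and the MTU value are illustrative, untested here:)

    # rewrite the MSS of forwarded TCP SYNs so segments fit the tunnel path MTU
    iptables  -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
    ip6tables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
    # stop-gap inside the container: hard-lower the interface MTU
    ip link set dev eth0 mtu 1300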
2023-03-17 14:11:55 Ipv6 only 2023-03-17 14:12:05 I gave you the address 2023-03-17 14:14:24 2a0a:e5c1:517:cafe:da5e:d3ff:fee6:1f28 2023-03-17 14:15:19 maybe I could try over wg.a.o as an intermediary router 2023-03-17 14:16:24 but first have to refresh my knowledge about ipv6 2023-03-17 14:18:08 ACTION never learns to keep things reachable on my premises 2023-03-17 14:18:17 That address is the host 2023-03-17 14:18:42 I dm'd the addresses of the containers 2023-03-17 14:19:22 Afaik, you should even be able to route ipv6 over wireguard 2023-03-17 14:19:29 clandmeter set that up, that is 2023-03-17 14:21:25 what are the addresses of the lxcs 2023-03-17 14:24:55 oh, ssh between x86_64 and riscv64 lxcs doesn't work again 2023-03-17 17:04:45 hmf, for some reason ipv4 forwarding is broken again 2023-03-17 21:10:43 ikke: we only have one job per arm arch now right 2023-03-17 21:11:02 oh, I probably forgot to increase concurrency 2023-03-17 21:11:04 yeah 2023-03-17 21:13:36 thanks 2023-03-17 21:13:46 should be fixed now 2023-03-17 21:14:04 it is 2023-03-17 21:14:18 hm 2023-03-17 21:14:29 did we ever profile why the containers take like 40 seconds to start sometimes 2023-03-17 21:39:17 I thought it had to do with https://i.imgur.com/6NhDTqI.png 2023-03-17 21:40:09 But they do not recommend lowering it <1.0 2023-03-17 21:40:14 https://docs.gitlab.com/ee/administration/polling.html 2023-03-17 21:40:22 and it's only to do with the interface 2023-03-17 21:48:02 ah 2023-03-17 21:48:07 ok, interface part makes sense 2023-03-17 21:48:19 i guess the reason it's 1m20s for things that take 5s locally is clone + some scripts 2023-03-17 21:48:24 maybe we should measure script overhead 2023-03-17 21:48:36 the clone used to be faster but we raised it for commit history right 2023-03-17 21:48:41 to like 100 depth 2023-03-17 21:51:07 yes 2023-03-17 21:51:19 otherwise older MRs would fail to resolve the merge base 2023-03-17 21:51:38 was that the reason? how did that work? 2023-03-17 21:51:43 my memory only recalls the commit count 2023-03-17 21:51:58 (can't have more commits than the depth) 2023-03-17 21:52:26 It knows what packages changed by looking at the changes since the merge base between the target branch and the source branch 2023-03-17 21:53:03 that just sounds like the same thing? 2023-03-17 21:53:05 having a too shallow clone causes master to be detached from the source branch 2023-03-17 21:53:14 hmm 2023-03-17 21:53:19 ahh i see 2023-03-17 21:53:29 doesn't that mean the depth has to be at least as many commits as the branch is behind 2023-03-17 21:54:34 https://gitlab.alpinelinux.org/alpine/aports/-/blob/master/.gitlab-ci.yml#L7 2023-03-17 21:55:16 aye 2023-03-18 09:18:55 ikke: I can't ssh to my x86_64 lxc (mps-edge-ax86_64 I think). It says the connection is refused. 2023-03-18 09:19:36 is it some firewall rule or the lxc needs restart or maybe only sshd restart 2023-03-18 09:26:04 The latter I guess 2023-03-18 11:17:56 hm, it is not a firewall problem. it is something with key exchange 2023-03-18 12:50:33 ikke: think edge-aarch64 was stuck on something for forever 2023-03-18 13:34:21 it's not stuck 2023-03-18 13:34:25 just sitting idle 2023-03-18 13:47:38 networking again? :p 2023-03-18 13:49:07 it doesn't get any jobs 2023-03-18 13:55:23 yes 2023-03-18 13:55:29 ipv4 broken, but tries to connect to ipv4 2023-03-18 13:57:51 it somehow lost its ipv6 address 2023-03-18 13:57:53 other containers still had it 2023-03-18 16:16:02 mps_: I've restarted your container, can you access it now?
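(To illustrate the shallow-clone limitation discussed above: if the merge base between master and an older MR branch lies beyond the clone depth, git cannot find it. The depth and branch name here are made up for the example:)

    git clone --depth=20 https://gitlab.alpinelinux.org/alpine/aports.git
    cd aports
    git fetch --depth=20 origin some-old-mr-branch    # hypothetical long-lived MR branch
    # exits non-zero when the common ancestor was cut off by the shallow depth
    git merge-base origin/master FETCH_HEAD || echo "merge base not in shallow history"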
2023-03-18 16:20:43 ikke: thanks. tcptraceroute shows that the sshd is not accessible or not started '3 172.16.26.16 [closed] 40.830 ms 40.992 ms 41.543 ms' 2023-03-18 16:21:35 interestingly, I can connect to the riscv64 lxc on the same host 2023-03-18 16:21:46 mps_: let me check 2023-03-18 16:22:17 "OpenSSL version mismatch. Built against 30000070, you have 30100000" 2023-03-18 16:22:36 upgraded the package in the container 2023-03-18 16:22:52 can you access it now? 2023-03-18 16:23:09 yes 2023-03-18 16:23:14 thank you 2023-03-18 16:23:46 hmm, how did this happen? I usually do 'apk upgrade' 2023-03-18 16:23:54 in one shot 2023-03-18 16:24:04 there were a few hours during which the upgrade was broken 2023-03-18 16:24:09 guess you got unlucky 2023-03-18 16:24:12 openssh had a too strict version check 2023-03-18 16:24:37 aha, this explains it 2023-03-18 16:41:04 ah, I think ipv4 is working, it's just that due to the host it's connected to, we cannot reach things like alpinelinux.org over ipv4 (due to the NATting) 2023-03-18 16:43:19 ikke: what corecount are the arm builders limited to 2023-03-18 16:43:33 no limits 2023-03-18 16:43:46 you sure? it stays at 80% (which is suspiciously 64/80) 2023-03-18 16:43:55 oh, yes 2023-03-18 16:43:57 in abuild.conf 2023-03-18 16:44:07 Not on container level 2023-03-18 16:44:12 yea 2023-03-18 16:44:32 Is that an issue? 2023-03-18 16:44:36 not quite 2023-03-18 16:44:40 just not 100% :-) 2023-03-18 16:44:46 heh 2023-03-18 20:19:42 not fantastic 2023-03-18 20:49:17 Hmmm 2023-03-18 20:49:53 sorry i was playing in the datacenter and tripped over a cable 2023-03-18 20:50:42 gonna reboot deu1/deu7 2023-03-18 20:55:18 psykose: don't you know the datacentre rules? no running, playing, eating, drinking, having fun... ;-) 2023-03-18 20:56:11 I spent so much of my life in DCs I think it affected my hearing 2023-03-18 20:56:35 yeah i'm young enough to know to buy some earplugs if i have to do that 2023-03-18 20:56:36 :p 2023-03-18 20:57:01 algitbot: ping 2023-03-18 21:34:03 the arm CI networking is like 20kb/s haha 2023-03-18 21:46:57 Who needs more anyway? 2023-03-19 03:21:13 :) 2023-03-19 03:21:24 strangely i can't connect to my aarch64 container either 2023-03-19 03:21:31 probably forgot to restart the sshd 2023-03-19 03:21:38 hrm, didn't think of that 2023-03-19 03:21:50 not only do you have to upgrade to new libcrypto, but the forks also fail? 2023-03-19 03:22:00 so you have to restart the whole process or you lock yourself out 2023-03-19 03:22:38 hm, no 2023-03-19 03:22:40 just upgrade is fine 2023-03-19 03:22:53 maybe it was indeed in between rebuilds 2023-03-19 15:28:14 psykose: I can reach 80/100 on aarch64 ci 2023-03-19 18:28:50 ikke: any chance to get login to my armv7 lxc 2023-03-19 18:30:17 mps: I see that you have 2a01:7e01:e001:46b::3/128 assigned for WG?
2023-03-19 18:30:36 yes 2023-03-19 18:30:56 is this address ok 2023-03-19 18:36:30 I think so 2023-03-19 18:37:10 ikke: it's still as broken as before and all the networking takes 15 minutes to install packages 2023-03-19 18:37:21 could you also restart my aarch64 container 2023-03-19 18:49:17 done 2023-03-19 18:52:28 hmm 2023-03-19 18:52:29 still refused 2023-03-19 18:52:44 ah, before it was a kex failure 2023-03-19 18:52:46 now it's refused 2023-03-19 18:52:51 did the ip change too 2023-03-19 19:13:42 same issue as mps had 2023-03-19 19:14:17 upgraded openssh-server 2023-03-19 19:14:26 wondering if the slow package install is that it tries ipv4 first 2023-03-19 19:21:37 probably 2023-03-19 19:22:42 care to do a tcpdump or strace? 2023-03-19 19:26:29 where from 2023-03-19 19:26:53 not sure how i'd do that on ci 2023-03-19 19:41:39 Not sure if it's the same, but I also noticed it in your container 2023-03-19 19:59:31 ah, hm 2023-03-19 20:00:10 tcpdump in the container is like all the host traffic :p 2023-03-19 20:00:26 let's try something more targeted 2023-03-19 20:01:25 curl -6 -LO https://dl-cdn.alpinelinux.org/alpine/edge/main/x86_64/clang15-static-15.0.7-r10.apk 2023-03-19 20:01:28 average 10MB/s 2023-03-19 20:01:40 (pretty slow, but it works) 2023-03-19 20:01:45 curl -4.. hangs forever 2023-03-19 20:02:30 penge 2023-03-19 20:02:47 not sure this has any v4 2023-03-19 20:02:57 ah, google.com works 2023-03-19 20:03:11 not sure why dl-cdn doesn't 2023-03-19 20:03:29 Hmm, MTU perhaps 2023-03-19 20:05:14 What if you set the mtu of eth0 to something like 1300 2023-03-19 20:07:23 ikke: how can I find the ip address of my armv7 lxc 2023-03-19 20:07:32 mps: I dm'd it to you twice 2023-03-19 20:07:47 hm, I didn't see it 2023-03-19 20:07:55 it works 2023-03-19 20:08:03 dl speed is like 5MB/s 2023-03-19 20:08:13 and what does "dm'd" mean 2023-03-19 20:08:19 private message 2023-03-19 20:08:34 mps: I just sent you one, don't you see that? 2023-03-19 20:08:47 hmm, didn't see private messages from you for a long time 2023-03-19 20:16:57 psykose: strangely enough I can download from dl-cdn from the aarch64 container just fine with the default mtu 2023-03-19 20:17:06 curl -4 ? 2023-03-19 20:17:35 what's weird i guess is everything works fine by default (i assume defaulted to v6) 2023-03-19 20:17:35 oh right 2023-03-19 20:17:41 but in CI it goes giga slow 2023-03-19 20:18:06 though 10MB/s is still giga slow, we're just doing relative here 2023-03-19 20:19:28 yeah, I'm seeing max 70/80mbit over the wan 2023-03-19 20:28:52 psykose: I've set clamp-mss, and now at least it works over v4 without setting mtu explicitly 2023-03-19 20:29:01 still ~80mbit max download 2023-03-19 20:29:41 4 and 6 seem the same speed now 2023-03-19 20:29:45 back on the 1500 mtu default 2023-03-19 20:30:00 maybe that fixes the ci, maybe not 2023-03-19 20:30:05 let's see 2023-03-19 22:27:57 hmm, no, still slow as shit 2023-03-19 22:28:06 e.g. https://gitlab.alpinelinux.org/alpine/aports/-/jobs/995269 2023-03-19 22:28:22 it's 8x the runtime of x86_64 so far 2023-03-19 22:28:27 all in the downloads 2023-03-20 06:13:36 I've added mss-clamping on the router now (in the same way as awall does).
I've reached ~140mbit (15MB/s), which I've not seen before 2023-03-20 06:25:28 ci is much faster too 2023-03-20 06:25:40 hm, but the ci /does/ do http:// repositories right 2023-03-20 06:25:49 maybe it makes sense to setup-apkcache in the ci containers then 2023-03-20 06:29:40 If the cache is persistent, we would also need something to clean up the cache 2023-03-20 06:30:17 non-persistent 2023-03-20 06:30:19 just per ci run 2023-03-20 06:30:26 cleaned at the end like any run 2023-03-20 06:31:04 so it only helps builds with multiple packages? 2023-03-20 06:31:14 i'd imagine it adds like two seconds of cleaning but anything that builds more than one package saves way more than two seconds 2023-03-20 06:31:54 and in general it isn't exactly big enough to meaningfully change any limits 2023-03-20 11:07:23 ikke: sorry to annoy you again but I lost ssh connection to my armv7 lxc and afaik you are the only one who has access to the host. I think libcrypto and/or openssh should be fixed/upgraded 2023-03-20 11:09:25 I'm not in a hurry 2023-03-20 11:14:17 mps: upgraded openssh-server and openssh-client 2023-03-20 11:14:27 (and restarted sshd) 2023-03-20 11:14:48 ikke: thank you again. I'm connected now 2023-03-20 11:16:41 btw, sftp doesn't work with 'jump server' '-J' or I don't know how to use it 2023-03-20 11:20:49 oh, it works somehow 2023-03-20 11:21:29 and network speed is very fast 2023-03-20 14:38:08 looks like the 6.2 kernel is problematic on armv7 2023-03-20 18:44:41 hm no, the network is still slow in arm CI 2023-03-22 11:09:35 the vigir unreachable was us, we recabled the connection to make it more robust 2023-03-22 11:20:15 telmich: ok 2023-03-22 11:20:52 I think it started using wwan0 again, I need to disable it permanently 2023-03-22 11:22:36 I disabled it in luci now 2023-03-22 11:33:40 I'll take out the sim card today, ikke 2023-03-22 11:33:46 telmich: alright 2023-03-22 11:38:50 That said, it's just done, because today I am actually in place5 for some network updates 2023-03-22 11:39:14 Kinda makes sense... the router saw that the wan port was gone and tried to get the next best connection up 2023-03-22 11:39:20 Even though I am surprised openwrt is that smart 2023-03-22 11:51:14 telmich: the download speeds are still fluctuating. At peaks we reach 150mbit but it's going up and down 2023-03-22 11:51:57 / # curl -o /dev/null https://dl-cdn.alpinelinux.org/alpine/edge/community/x86_64/xonotic-data-0.8.5-r0.apk 2023-03-22 11:52:00 % Total % Received % Xferd Average Speed Time Time Time Current 2023-03-22 11:52:02 Dload Upload Total Spent Left Speed 2023-03-22 11:52:04 32 1107M 32 362M 0 0 4227k 0 0:04:28 0:01:27 0:03:01 3311k 2023-03-22 11:53:48 telmich: hmm, testing with https://link.testfile.org/1GB and it seems stable 2023-03-22 11:53:57 13MB/s - 15MB/s 2023-03-22 11:54:14 I'll investigate 2023-03-22 15:54:00 ARM builds not quite fixed yet?
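(A sketch of the per-run apk cache idea floated above, using setup-apkcache from alpine-conf; this is not the actual CI config, just the mechanism:)

    # point apk at a cache directory for the lifetime of the CI container
    setup-apkcache /var/cache/apk
    # equivalent by hand: apk uses /etc/apk/cache if it is a symlink to a directory
    mkdir -p /var/cache/apk && ln -sf /var/cache/apk /etc/apk/cache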
2023-03-22 15:59:45 they were and now they're broken again 2023-03-22 16:13:17 gitlab 15 2023-03-22 16:13:49 sorry, 15.10 2023-03-22 16:27:26 psykose: ok, I now realize what's going on with CI 2023-03-22 16:27:37 git lab :3 2023-03-22 16:27:49 what's up 2023-03-22 16:28:08 I've added an ipv6 network for gitlab-runner 2023-03-22 16:28:18 but that of course does not affect any containers that are created by the runner 2023-03-22 16:29:34 ah 2023-03-22 16:29:35 :) 2023-03-22 16:29:45 so CI has been using ipv4 all along 2023-03-22 16:29:58 which is now broken because for some reason the default route was replaced on the CI host 2023-03-22 16:32:10 networking adventures are always so much fun 2023-03-22 16:32:12 aren't they 2023-03-22 16:36:37 Yeah, I'm always so excited when the next one crops up 2023-03-22 16:39:19 :3 2023-03-22 16:39:58 legend says when ikke was a small lad he got tangled in a bunch of networking cable 2023-03-22 16:40:00 was never the same again 2023-03-22 20:27:34 I fought and I overwon 2023-03-22 20:33:37 :) 2023-03-23 01:51:30 ikke: now the dev containers have really slow networking again :D 2023-03-23 05:51:01 Yeah, I'm kinda puzzled how it worked before and even more so why it's so poorly performing 2023-03-23 05:51:20 When I test downloading things manually, they are fast (in docker on the CI vm) 2023-03-23 05:55:15 yeah the CI is fast now 2023-03-23 05:55:18 flipped :D 2023-03-23 05:55:54 I haven't changed anything since yesterday evening 2023-03-23 05:56:05 I've set up ipv6 for docker globally 2023-03-23 06:04:27 ah no, armv7 ci is also slow still 2023-03-23 06:04:29 or maybe it's random 2023-03-23 06:17:56 Or depending on what the builders are doing 2023-03-23 06:19:37 nope 2023-03-23 06:19:38 idle 2023-03-23 06:19:52 hmm 2023-03-23 06:20:49 downloading with 17MB/s now, but it's fluctuating 2023-03-23 06:20:53 20MB/s 2023-03-23 20:57:51 ikke: seems armhf ci gets stuck on rust again somehow 2023-03-23 20:57:56 did we lose the workaround 2023-03-23 20:58:03 psykose: let me check 2023-03-23 21:02:04 psykose: https://tpaste.us/rjnQ 2023-03-23 21:05:48 hmm 2023-03-23 21:05:54 weird 2023-03-23 21:06:04 it behaves in the ci containers as if it's =1 2023-03-26 09:13:13 ikke: were there issues with GIT_STRATEGY=fetch for the gitlab ci? (apparently it's faster than clone) 2023-03-26 09:49:20 good morning 2023-03-26 09:49:45 finally have an hour to spend on alpine 2023-03-26 09:52:33 tasty hour 2023-03-26 09:52:55 4 weeks away from home is kind of... 2023-03-26 09:53:15 troublesome in some ways 2023-03-26 09:53:15 homesick yet? :D 2023-03-26 09:53:44 don't think there's much that needs fixing, except armhf rust ci hangs again despite sysctl thing being set 2023-03-26 09:53:48 more like, i need to fix a gazillion things... and preferably on monday :) 2023-03-26 09:53:52 maybe there's another gotcha 2023-03-26 09:54:12 yes i have one issue in my mailbox related to alpine 2023-03-26 09:54:41 does lxc do something that can affect it? i don't think sysctl stuff is scoped to anything so it should be global 2023-03-26 09:54:45 bonding with ice network driver. 2023-03-26 09:54:52 :) 2023-03-26 09:55:39 seems like alpine on some equinix hw is failing 2023-03-26 10:02:40 psykose: still having lxc issues? 2023-03-26 10:02:57 it's actually docker in ci so actually...
:p 2023-03-26 10:03:04 builders are still fine so it's weird 2023-03-26 10:03:07 i saw a msg from docker about free not being canceled 2023-03-26 10:04:05 oh, so there is a difference between docker and lxc? 2023-03-26 10:04:50 as i described above :D 2023-03-26 10:05:00 same rust thing as before just with the arg =2 2023-03-26 10:06:35 hmm 2023-03-26 10:07:36 but it was working after we set it right? 2023-03-26 10:08:10 yeah 2023-03-26 10:08:13 just randomly broke 2023-03-26 10:08:16 but it is still set 2023-03-26 10:10:34 any kernel updates? 2023-03-26 10:11:50 don't think so, but ikke would know 2023-03-26 10:12:19 yeah the uptime is 27 days 2023-03-26 10:12:42 i've been trying to talk to ikke, but i did not find the time. and now he seems to be offline :) 2023-03-26 10:12:59 indeed :D 2023-03-26 11:47:52 almost at 2 hours now... time to go :) 2023-03-26 11:50:54 enjoy :) 2023-03-26 12:13:12 psykose: fetch + shallow is problematic because the old shallow points remain 2023-03-26 12:13:29 yeah, that's why it's fast and why it is good :p 2023-03-26 12:13:32 what does that break 2023-03-26 12:14:07 git merge-base 2023-03-26 12:14:28 (git diff A...B) 2023-03-26 12:14:49 which we use to determine what packages have changes and what to build 2023-03-26 12:15:15 hm 2023-03-26 12:15:19 is there no other way to architect that 2023-03-26 12:15:35 the base doesn't change, and the current checkout is kept up to date, so it should work still? 2023-03-26 12:15:50 i don't see why it being fresh-cloned or fetched into current would affect that 2023-03-26 12:15:56 The merge base can be different for each MR 2023-03-26 12:16:11 ah because they all share it right 2023-03-26 12:16:14 yes 2023-03-26 12:16:41 That's why it's fast :P 2023-03-26 12:17:01 super easy solution: 2023-03-26 12:17:09 get pkglist from grep pkgname= 2023-03-26 12:17:09 :p 2023-03-26 12:17:14 /s 2023-03-26 12:17:30 ah but right what to grep, same issue 2023-03-26 12:17:36 yeah 2023-03-26 12:17:43 it's not getting a list of package names 2023-03-26 12:17:49 is there no other way to diff commits only 2023-03-26 12:17:49 it's about detecting what changed 2023-03-26 12:18:10 newer versions of gitlab did add some features regarding this 2023-03-26 12:18:32 that sounds fancy 2023-03-26 12:18:33 mostly regarding branch pipelines, but not sure if it would work for us 2023-03-26 12:19:09 And, I'm not sure if it works for new (read feature) branches 2023-03-26 12:20:20 Yeah, CI_COMMIT_BEFORE_SHA only works when you push new commits to existing branches 2023-03-26 12:20:45 So that's why we use merge request pipelines, where you know what the target branch is 2023-03-26 12:21:15 Then we fetch the target branch to make sure we have the commits and calculate the changed commits in the source branch 2023-03-26 12:21:21 from there it's easy to find the changed packages 2023-03-26 12:24:25 mhm 2023-03-26 12:24:27 ok, makes sense 2023-03-26 12:25:05 i do wonder if there's a way to do this without this stuff 2023-03-26 12:25:55 i.e.
when people pr things the gitlab webui shows only the commits, so that implies they either do the same calculations (regardless of any of these settings at all, obviously), or there is another way 2023-03-26 12:38:40 They do not expose that list to CI afaik 2023-03-26 12:38:55 but they have to do a similar operation 2023-03-26 12:39:30 the interface also does not need to use shallow clones, they do all operations on the original repositories 2023-03-26 12:40:12 https://docs.gitlab.com/ee/ci/variables/predefined_variables.html lists all variables that are present 2023-03-26 12:40:25 One option could be to use the API 2023-03-26 12:42:48 messy 2023-03-26 12:43:02 i mean current one is ok, i just perpetually wonder what is so slow 2023-03-26 12:43:10 if i run clone depth 500 it takes like 2s 2023-03-26 12:43:26 the containers on my computer start in 0.01s 2023-03-26 12:43:34 i can't imagine a git diff takes 40 seconds 2023-03-26 12:46:11 https://gitlab.alpinelinux.org/api/v4/projects/1/merge_requests//commits 2023-03-26 12:49:57 hm 2023-03-26 12:49:59 well it does work 2023-03-26 12:50:30 i assume we then tally up the id: and git-diff 2023-03-26 12:52:21 We would only really need the last commit id 2023-03-26 12:52:46 and then git diff --name-only ~ HEAD 2023-03-26 12:53:16 But, the question is if this is really the bottleneck 2023-03-26 12:53:19 like you said 2023-03-26 12:53:39 yeah 2023-03-26 12:53:49 maybe we could add some timestamps into the scripts i guess 2023-03-26 12:53:57 also, what did you want for the apkcircledep thing 2023-03-26 12:54:04 got a container with it published 2023-03-26 13:00:33 Do you just execute it and it returns an error when there are circular dependencies? 2023-03-26 13:01:13 And what if someone pushes a circular dependency to aports, would that affect all MRs? 2023-03-26 13:04:25 Does it make sense to run it scheduled and / or on branch pipelines? 
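(A condensed sketch of the merge-base approach described above; CI_MERGE_REQUEST_TARGET_BRANCH_NAME is one of GitLab's predefined CI variables, and the awk filter for repo/package pairs is illustrative, not the actual aports script:)

    git fetch origin "$CI_MERGE_REQUEST_TARGET_BRANCH_NAME"
    base=$(git merge-base "origin/$CI_MERGE_REQUEST_TARGET_BRANCH_NAME" HEAD)
    # repo/pkg pairs touched by the MR, e.g. "main/openssh"
    git diff --name-only "$base" HEAD | awk -F/ 'NF>2 {print $1"/"$2}' | sort -u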
2023-03-26 13:57:26 it would affect all mrs yes 2023-03-26 13:57:38 just like how if i push a syntax error it affects all mrs on aports scan :p 2023-03-26 13:58:00 i don't think that matters that much because it's not like we have 9000 mrs a day and pipeline-pass requirements 2023-03-26 13:58:26 it makes sense to run it scheduled, idk what a branch pipeline is 2023-03-26 13:58:40 the biggest use of it is not scheduled (i just do that myself) but on mr changes to catch circles before merge 2023-03-26 13:58:57 also would help some people with "why is the build order wrong" 2023-03-26 13:59:21 and yeah you just run it in aports root 2023-03-26 14:13:04 We now use MR pipelines, which are only created as soon as someone creates an MR for a branch 2023-03-26 14:13:17 branch pipelines run as soon as someone pushes a change to a branch 2023-03-26 14:13:53 So it might make sense to have some lightweight jobs that run on master / stable-* 2023-03-26 14:18:39 ah 2023-03-26 14:18:46 sure, but this check is really only useful on merge 2023-03-26 14:18:55 and as the occasional timed one, that opens an issue if it fails 2023-03-26 14:19:06 i like timed stuff that opens issues tbh :) 2023-03-26 14:24:47 It's useful for any change to the stable branches in addition to changes introduced in MRs 2023-03-26 14:26:37 and with a scheduled pipeline, afaik only the user who created the schedule will get a notification when the pipeline fails 2023-03-26 14:26:54 when a branch pipeline fails, the person who pushed to that branch gets the notification 2023-03-26 14:34:37 neither sounds that useful if i want an open issue instead 2023-03-26 14:35:05 You'd need a bot that would do that 2023-03-26 14:37:36 But getting a notification: Your push broke stuff would at least help 2023-03-26 14:37:59 better than nothin 2023-03-26 14:38:11 but i really would say it is still 50x more useful as ci output 2023-03-26 14:38:34 Sure, one does not preclude the other, but for CI, I would limit it to packages that actually changed 2023-03-26 14:38:36 the rest is just noise 2023-03-26 14:40:18 ok, let's not implement it 2023-03-26 14:40:46 why not? 2023-03-26 14:41:25 It's not that difficult to add logic that passes the build if the issues are not related to pushed changes 2023-03-26 14:41:37 it's not very easy to detect that 2023-03-26 14:42:00 nor do i care about implementing it for this hypothetical benefit to save one ci pipeline somewhere 2023-03-26 14:42:38 We can add it and fix it later if it turns out to be an issue 2023-03-26 14:42:43 I mean, add it to CI as is 2023-03-26 14:43:51 I'm just thinking about the users that have no clue why their pipeline fails / give a warning without them doing anything wrong 2023-03-26 16:13:20 psykose: You mean this right? "Queued: 50 seconds" 2023-03-26 16:13:22 for CI jobs 2023-03-26 16:13:26 nope 2023-03-26 16:13:33 oh 2023-03-26 16:13:40 purely duration 2023-03-26 16:13:45 these 1-commits are pretty ok 2023-03-26 16:14:01 though as an example for me the 37-second one is..
8 2023-03-26 16:14:29 well no, like 12 2023-03-26 16:14:35 it's alright :) 2023-03-26 16:14:44 i remember when it used to be like 60 seconds per pipeline base 2023-03-26 16:14:52 and then also the lint job in serial too 2023-03-26 16:15:00 those were darker days :p 2023-03-26 16:15:31 CI jobs should generally be able to start instantly as long as there are runners available with capacity 2023-03-26 16:16:01 i don't mean the pending to start time 2023-03-26 16:16:13 pending/queued 2023-03-26 16:16:33 yeah, but that's an issue I notice as well 2023-03-26 16:18:38 ah, yeah, sure 2023-03-26 16:18:44 i do notice some idles sometimes of a minute or so 2023-03-26 16:18:52 but usually when used 2023-03-26 16:19:03 so i assume there's post-complete cleanup time that takes a while? if that's how it works 2023-03-26 16:19:17 my knowledge of the whole model is quite opaque in what it's doing or how it schedules 2023-03-26 16:24:38 i read in https://github.com/ohwgiles/laminar the word 'fast'. this means we must move to it immediately 2023-03-26 16:29:14 psykose: do you have an example of a job you are describing? 2023-03-26 16:29:27 nope 2023-03-26 16:29:41 aside from 'anything with 1m queued when a runner is available for sure' 2023-03-26 16:29:42 i guess? 2023-03-26 16:29:45 it's not bad or anything 2023-03-26 18:11:12 ikke: oh btw, can i have an s390x container too 2023-03-26 18:11:24 (... that gives me one of each except ppc64le :p) 2023-03-26 18:15:01 gotta collect them all 2023-03-26 18:15:39 note that we do not have a lot of space left for s390x 2023-03-26 18:16:55 yep 2023-03-26 18:17:05 don't need much though 2023-03-26 18:18:46 psykose-edge-s390x.usa2.alpin.pw 2023-03-26 18:18:51 thanks 2023-03-26 18:20:15 looks like it's time for the 'ssh password prompt' debugging again 2023-03-26 18:34:22 ikke: fyi there are two debug systems in the equinix panel 2023-03-26 18:42:53 ikke: adding it to modules fixes the issue it seems 2023-03-26 18:43:02 OK, good 2023-03-26 18:43:11 or i'm just being lucky :D 2023-03-26 18:43:41 I suppose it should work, as that's what nlplug-findfs does 2023-03-26 18:44:04 yup 2023-03-26 20:57:34 ikke: https://gitlab.alpinelinux.org/alpine/infra/alpine-mksite/-/merge_requests/61 - should fix j0wi's issue #13 2023-03-27 20:23:26 ikke: do you know why it asks me for a password for the s390x vm 2023-03-27 20:24:11 same as last time 2023-03-27 20:24:28 Failed none for invalid user psykose from 172.16.252.8 2023-03-27 20:24:43 no, it's any user 2023-03-27 20:25:06 hm 2023-03-27 20:25:12 yes, same mistake as last time 2023-03-27 20:25:15 ah 2023-03-27 20:25:15 can you try now 2023-03-27 20:25:34 I'm used to starting at ~/, but when you attach you start at / 2023-03-27 20:27:21 ah 2023-03-27 20:27:23 yeah it works 2023-03-27 20:27:25 thanks :) 2023-03-29 09:47:08 i think we need to make releases today 2023-03-29 09:52:56 ncopa: 3.17 release? 2023-03-30 07:47:21 Disk space is going to be an issue for the next release 2023-03-30 07:49:40 what can we do about it? 2023-03-30 07:50:07 is it a problem on nld8 and nld9? 2023-03-30 07:54:17 usa2 as well 2023-03-30 07:54:49 i'm not sure how to solve the fundamental problem 2023-03-30 07:54:55 One option could be to drop community on the builders for eol (-1) versions? 2023-03-30 07:55:09 drop in what sense?
2023-03-30 07:55:15 where would it go 2023-03-30 07:55:33 It would still be on dl-master 2023-03-30 07:55:37 ah 2023-03-30 07:55:42 yeah 2023-03-30 07:55:48 well 2023-03-30 07:55:55 sometimes someone backports something to an eol version 2023-03-30 07:55:59 what happens then? 2023-03-30 07:56:02 Nothing 2023-03-30 07:56:10 which would be 'not great' 2023-03-30 07:56:12 hm 2023-03-30 07:56:21 what is the current age of repos on the builders 2023-03-30 07:56:29 -12? 2023-03-30 07:56:32 -20? 2023-03-30 07:57:09 Any 3.x builder at least 2023-03-30 07:57:57 we could do -5 for main and -2 for community i guess, though sometimes someone does backport something 4 back for community 2023-03-30 07:58:33 fundamentally though eventually we'll also have the issue on dl-master too since we have a bit of a flawed model wrt disk space 2023-03-30 07:59:45 Problem is that older releases are relatively small 2023-03-30 07:59:51 since there's releases every 6 months that duplicate the repo every time (as opposed to years like most distros do), and thanks to me we sure added a lot of stuff to the repos :D 2023-03-30 07:59:57 maybe we should just remove all the stuff again 2023-03-30 08:00:15 and we also have 8 architectures 2023-03-30 08:00:20 which is another 8x 2023-03-30 08:00:45 so every hello world someone packages in go is 50MB*8 every time 2023-03-30 08:01:37 That counts especially for the large noarch stuff 2023-03-30 08:01:55 that would be some nice savings yeah, but in scope of the whole repo.. it's probably 1% 2023-03-30 08:02:02 or less 2023-03-30 08:02:16 it's really just a thousand cuts thing 2023-03-30 08:02:19 and then there's dotnet 2023-03-30 08:02:33 eternal regrets 2023-03-30 08:04:15 other idea: move all games to a separate place 2023-03-30 08:09:52 that is another 1% 2023-03-30 08:10:00 but i agree if what you mean is "remove games from aports" 2023-03-30 08:10:33 "all games" is probably somewhat not needed though, could just be things with assets 2023-03-30 08:10:48 like the whoever that has a 1GB assets game, etc 2023-03-30 08:11:05 rust is 642MB 2023-03-30 08:11:10 it is 2023-03-30 08:11:38 also with go and rust gaining popularity, binaries are increasing in size 2023-03-30 08:11:52 the rust ones are mostly fine 2023-03-30 08:11:59 the go ones are 10x bigger on average 2023-03-30 08:12:08 a binary can easily be 100MB and when complaining about it the response is: "who cares?"
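(Rough back-of-the-envelope behind the "50MB*8" point above; the package sizes are illustrative:)

    50 MB go binary x 8 arches                =  400 MB per branch
    x 5 branches (edge + 4 stable)            =    2 GB across the mirror network
    1 GB assets pkg x 8 arches x 5 branches   =   40 GB for effectively noarch data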
2023-03-30 08:12:12 libpostal-data 2023-03-30 08:12:19 yeah i complain all the time 2023-03-30 08:12:29 and all i get is a shrug who cares indeed 2023-03-30 08:13:29 libpostal-data was changed to be a script 2023-03-30 08:13:32 you have to run it at runtime now 2023-03-30 08:14:34 oh 2023-03-30 08:14:35 nvm 2023-03-30 08:14:41 separate aport 2023-03-30 08:15:19 i agree with not hosting this kinda stuff (though admittedly 764MB compressed size 1x is fine, if we had actual noarch), but you have to make policy 2023-03-30 08:15:29 not just drop 12 packages at random that add up to 0.1% in the end 2023-03-30 08:16:07 we need to redesign the build infra, so it makes better use of disk space 2023-03-30 08:16:19 technically, we don't need all the packages to be on the build servers 2023-03-30 08:16:46 pretty much 2023-03-30 08:17:01 you could also solve noarch and save some space 2023-03-30 08:25:06 problem with solving noarch is that all build servers across arches would need coordination 2023-03-30 08:25:31 eg the arm build server might need to wait for x86_64 (or whatever creates the noarchs) 2023-03-30 08:43:26 it does yeah 2023-03-30 08:45:35 Reproducible builds might help with this as well 2023-03-30 09:18:52 ikke: seems my ci runner is gone 2023-03-30 09:19:00 i guess it was on the old server 2023-03-30 09:19:13 I had registered 2 with support of qemu/kvm 2023-03-30 09:19:23 x86 and arm64 2023-03-30 09:25:50 where is distfiles.a.o hosted? 2023-03-30 09:26:03 the build logs and distfiles cache 2023-03-30 09:27:06 deu5 2023-03-30 09:27:59 so we can delete distfiles on nld9-dev1? 2023-03-30 09:29:18 Yes 2023-03-30 09:29:34 Though they are synced once per day 2023-03-30 09:29:58 so we only delete files older than a week 2023-03-30 09:30:37 uh 2023-03-30 09:30:46 the whole point of distfiles is to keep them cached for rebuilds 2023-03-30 09:31:15 which is why i suggest we keep the cache for a week 2023-03-30 09:31:20 that they are synced 'once a day' instead of immediately shared after first download is an issue, deleting them older than a week too makes them quite pointless to keep at all 2023-03-30 09:31:30 if you want to keep them for a week just delete all of distfiles instead 2023-03-30 09:31:36 and cache nothing 2023-03-30 09:31:49 we also have the build logs there 2023-03-30 09:31:59 which we need to keep 2023-03-30 09:32:15 we also keep the distfiles for stable branches 2023-03-30 09:32:30 so we can rebuild stuff even if upstream disappears 2023-03-30 09:32:31 it would be nice if we just had one global distfiles 2023-03-30 09:33:29 i think the distfiles.a.o on deu5-dev1 is supposed to be the "one global distfiles" 2023-03-30 09:34:11 idk what is supposed to be or not supposed to be what 2023-03-30 09:34:20 currently it is not one distfiles, it's per-branch 2023-03-30 09:34:31 ah 2023-03-30 09:34:35 right 2023-03-30 09:34:47 The builders will fetch from distfiles when it's missing on the builder 2023-03-30 09:35:21 it would've been good if the builders didn't keep distfiles at all 2023-03-30 09:35:49 i think we can cache them for a few days 2023-03-30 09:35:52 in case of rebuilds 2023-03-30 09:36:05 on builder or global 2023-03-30 09:36:11 on builder 2023-03-30 09:36:29 sure, that makes sense 2023-03-30 09:36:40 a few days of 'hot cache' 2023-03-30 09:36:45 the most important thing is that we have a copy of sources for stable branches, for all packages 2023-03-30 09:36:59 The sync script should clean it up automatically 2023-03-30 09:37:22 we have 40G of distfiles and logs on
nld9-dev1 2023-03-30 09:37:25 currently 2023-03-30 09:37:36 also, we can compress the build logs 2023-03-30 09:37:42 we should compress the build logs 2023-03-30 09:39:38 also, we don't need to have the build logs in the same place as distfiles 2023-03-30 09:39:47 that was only done that way due to convenience 2023-03-30 09:40:47 the /var/cache/distfiles was a shared directory across the builders (so distfiles could be shared) so it was convenient to collect the logs there as well 2023-03-30 10:05:56 yeah 2023-03-30 10:13:30 check the sizes of build logs: https://build.alpinelinux.org/buildlogs/build-edge-aarch64/testing/qt6-qtwebengine/ 2023-03-30 10:25:40 i have cleaned up some old release candidate iso images on nld9-dev1 2023-03-30 10:26:12 usa2 only has 400G /var 2023-03-30 10:26:33 distfiles only 8.2G 2023-03-30 10:31:15 so, another question: can we add more diskspace to the builders? 2023-03-30 10:31:43 not sure 2023-03-30 10:32:45 could we use nfs or similar to make smarter use of diskspace? 2023-03-30 10:38:28 i have never heard of anyone having a good time with nfs 2023-03-30 10:39:02 i'd imagine it wouldn't work well here because of geographical limitations, since the builders are not all in the same place 2023-03-30 10:39:32 if they were all same-dc i'd say it would be ok to keep a readonly mount for distfiles or something, but across the atlantic that sounds like some weird issues happening sooner rather than later 2023-03-30 10:40:05 i'm just throwing out ideas 2023-03-30 10:40:14 yeah, that's good 2023-03-30 10:40:23 what if we did a proxy for abuild fetch? 2023-03-30 10:40:29 I think long term we want to avoid needing to have all the packages on a single builder 2023-03-30 10:40:31 and something 'central' that answers it 2023-03-30 10:40:46 but we need to do something short term within a few weeks for the 3.18 build 2023-03-30 10:40:59 the proxy can hold connections until answered for a bit 2023-03-30 10:41:12 so when you push a commit, you get 8 connections from builders instantly 2023-03-30 10:41:17 then it fetches a tarball, and gives it back 2023-03-30 10:41:29 not super easy but it doesn't sound insane? 2023-03-30 10:41:41 I don't think distfiles is the main issue 2023-03-30 10:41:54 it's the growing builders taking up diskspace 2023-03-30 10:41:54 no, i thought about an apk/build proxy years ago 2023-03-30 10:41:59 sure, i'm just talking about distfiles 2023-03-30 10:42:20 oof those are some huge logs 2023-03-30 10:42:35 each release I need to do more and more to keep enough diskspace 2023-03-30 10:42:51 the core of the issue is the multiplicative factor 2023-03-30 10:43:00 yeah we need a longterm solution 2023-03-30 10:43:10 we do releases often and each one is a copy of the huge packageset, multiplied by 8 because we have 8 architectures 2023-03-30 10:43:19 yup 2023-03-30 10:43:30 we are uniquely positioned as a distro that does not have a huge company, is very portable, and also has a stable branch, and also actually 4 of them, and we release often 2023-03-30 10:43:34 it's a unique issue :p 2023-03-30 10:43:38 so dotnet is 1G who cares? but for us it means 8GB 2023-03-30 10:43:48 4, but yeah 2023-03-30 10:43:51 and then release too 2023-03-30 10:44:10 ah no, 5 2023-03-30 10:44:10 and now we want to do cloud images in addition 2023-03-30 10:44:16 and there are even multiple dotnets 2023-03-30 10:44:17 yes 5 branches, 4 stable + edge 2023-03-30 10:44:33 We do have the t1 servers with plenty of space for mirroring 2023-03-30 10:44:47 yes, and same goes with modern software.
use llvm14 for this and llvm15 for that 2023-03-30 10:44:53 So if we keep the cloud images out of the builders, that should be fine 2023-03-30 10:45:07 we need 2-3 versions of llvm * number of arches * number of release branches 2023-03-30 10:45:45 i can't think of a solution that isn't `ap revdep go | xargs rm -r` 2023-03-30 10:45:46 same with openjdk 2023-03-30 10:46:05 and now also with rust 2023-03-30 10:46:16 for example the asahi kernel needs a specific version of rust 2023-03-30 10:46:22 i think we will see more and more of that 2023-03-30 10:46:55 webkit also has N different versions 2023-03-30 10:47:18 webkit is different because it's api versions 2023-03-30 10:47:26 for asahi i don't think the solution is to add more rust versions 2023-03-30 10:47:32 that one is easy enough to patch 2023-03-30 10:48:34 what i'm saying is that people expect support for multiple versions, and pin dependencies to specific versions (see python's pip for example) 2023-03-30 10:48:45 it's not uncommon that upstreams say "use version X" 2023-03-30 10:48:57 sadly that is more and more the case 2023-03-30 10:50:19 sure, but for asahi you can literally patch it with a bit of work (the rust-for-linux repo gets updated and you can cherry pick some commits early, and people discuss it somewhere there) 2023-03-30 10:50:26 for openjdk most of the versions are dead and EOL 2023-03-30 10:50:31 asahi was just an example 2023-03-30 10:50:33 we only keep them to not cp a binary each release 2023-03-30 10:50:42 yes 2023-03-30 10:50:42 openjdk is mostly for bootstrapping 2023-03-30 10:50:45 yes 2023-03-30 10:50:48 i think we can drop that 2023-03-30 10:50:56 those are things we can do something about 2023-03-30 10:50:57 the only cost is having to do `cp` 2023-03-30 10:51:09 we already do that- bootstrap involves e.g. copying gcc 2023-03-30 10:51:13 copying rust, etc 2023-03-30 10:51:18 yeah 2023-03-30 10:51:25 so, we can also copy 8+11+17(?) 2023-03-30 10:51:25 i suppose we can do that for openjdk as well 2023-03-30 10:51:27 the lts releases 2023-03-30 10:51:30 and drop the rest 2023-03-30 10:51:35 also lets us drop gcc6 for fun 2023-03-30 10:51:37 I've heard something about new jdk support in gcc? 2023-03-30 10:51:41 nah 2023-03-30 10:51:47 that was someone trying to revitalise gcj 2023-03-30 10:51:50 ok 2023-03-30 10:51:52 i don't know if they got anywhere 2023-03-30 10:52:06 if they do, it means maybe new-gcc->jdk-newer-than-7 i guess 2023-03-30 10:52:12 other question, do we have a copy of ancient.alpinelinux.org somewhere? 2023-03-30 10:52:17 with the really old stuff? 2023-03-30 10:52:31 I don't know what happened to ancient.a.o 2023-03-30 10:52:49 i think i might have made a copy of it somewhere but i don't remember 2023-03-30 10:52:51 clandmeter might know 2023-03-30 10:52:54 ok 2023-03-30 10:53:09 one thing we can do short term is delete old builders 2023-03-30 10:53:18 move them to ancient.a.o 2023-03-30 10:54:49 we can also move to yearly releases 2023-03-30 10:54:50 :p 2023-03-30 10:55:22 it might sound stupid but it's a very obvious gain 2023-03-30 10:55:52 yeah, but i don't know if i like it 2023-03-30 10:57:13 we can also drop armhf? 2023-03-30 10:57:18 not sure if that helps much 2023-03-30 10:58:24 not a lot 2023-03-30 10:58:27 not immediately, but like 10% for future releases?
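(On the armhf share question above, as rough arithmetic: with 8 architectures of roughly comparable size, dropping one frees about 1/8 ≈ 12.5% of each future release; armhf packages skew a bit smaller than average, which is where the ~10% guess comes from, and already-published branches are unaffected.)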
2023-03-30 10:58:32 let's do the low-hanging fruit first 2023-03-30 10:58:37 idk what share of the 8 it has 2023-03-30 10:58:40 the bottleneck is individual builders 2023-03-30 10:59:20 now that we have 2 arm servers, we have some space on arm again 2023-03-30 10:59:36 and right now qt6-qtwebengine build logs are insane. we can start with deleting those 2023-03-30 10:59:46 i'll fix the gen too 2023-03-30 10:59:58 do you mind setting up some script/alert automation somewhere for that? 2023-03-30 11:00:02 i am deleting build logs on nld9-dev1 as we speak 2023-03-30 11:00:10 like a daily 'check logs, warn on some 500MB logfiles' 2023-03-30 11:00:22 if nobody gets pinged these things go unnoticed 2023-03-30 11:02:12 it would be nice to compress them too 2023-03-30 11:03:53 parallel -j`nproc` gzip -9 {} ::: buildlogs/*/*/*/*.log 2023-03-30 11:03:56 could just run that daily 2023-03-30 11:07:27 pigz pls 2023-03-30 11:07:48 although, with parallel that makes less sense i guess 2023-03-30 11:11:33 i was thinking of changing the script that generates the logs and adding the compression there, right before the logfile is uploaded 2023-03-30 11:12:12 also, would be nice if we could add a gzip http header to the web server, so we can read the compressed logs online, directly in the web browser 2023-03-30 11:15:17 ikke: yeah, that was intentional 2023-03-30 11:15:45 ncopa: that's automatic no? every webserver reads the mime (or more dumbly, a .gz extension) and sets that 2023-03-30 11:15:50 I compressed one logfile here https://build.alpinelinux.org/buildlogs/build-edge-armhf/main/zstd/ 2023-03-30 11:16:02 it tries to download it when i click on it 2023-03-30 11:16:26 i think we may need to add .log.gz to mime or similar 2023-03-30 11:16:55 sure, just drop gzip -9 into the script that makes it 2023-03-30 11:17:14 or pigz -9 since it's not parallelised at a higher level 2023-03-30 11:20:38 yup 2023-03-30 11:31:00 i think we need to change the webserver config so it asks the client to decompress .log.gz. you can test it here: https://build.alpinelinux.org/buildlogs/build-edge-armhf/main/zstd/ 2023-03-30 11:48:10 doesn't work for me when i click it 2023-03-30 11:48:28 content-type: application/octet-stream 2023-03-30 11:48:29 wrong 2023-03-30 12:00:22 i suppose it should be 2023-03-30 12:00:23 content-type: text/plain 2023-03-30 12:00:32 content-encoding: gzip 2023-03-30 12:04:53 yes 2023-03-30 12:05:14 if you want something supported in browsers but with more compression you can try brotli too 2023-03-30 12:05:47 gzip is fine i suppose 2023-03-30 12:48:27 this might work for compressing the logfiles: https://tpaste.us/YYYr 2023-03-30 12:52:21 i think you have to handle failure too 2023-03-30 12:53:26 we currently don't 2023-03-31 05:43:38 gitlab security upgrades released 2023-03-31 05:44:49 Will upgrade gitlab tonight 2023-03-31 21:16:29 hit the same bug twice, something's wrong with this gitlab version 2023-03-31 21:16:29 https://ptrc.gay/DfWIkCPn 2023-03-31 21:16:43 i have to force-refresh to actually see changes 2023-03-31 21:17:08 hmm 2023-03-31 21:17:44 aaand now it's stuck on "this merge request needs to be rebased", even though it's up to date with master 2023-03-31 21:18:05 sounds like gitaly 2023-03-31 21:18:47 ah, the latter might have been in the middle of merging 51d64990d5da 2023-03-31 21:18:48 It shows barman is 1 commit behind 2023-03-31 21:18:57 yeah, because psykose was merging stuff 2023-03-31 21:19:11 but i did `git fetch --all` in the meantime and it was still up to date, just...
in the middle of merging, i suppose? 2023-03-31 21:20:35 See a bunch of fatal: Not a valid commit name e15451ca8f6c1a96f1776cbc086bcd9094fd1c49 messages in the log 2023-03-31 21:23:11 anyway, i can reproduce the first issue with every single merge request 2023-03-31 21:23:14 it always shows no changes 2023-03-31 21:25:19 so much for hoping this would be an unexciting upgrade 2023-03-31 21:27:47 https://gitlab.com/gitlab-org/gitlab/-/issues/368546 2023-03-31 21:29:41 though that's about gitlab.com 2023-03-31 21:46:01 Will need to continue tomorrow 2023-03-31 21:56:03 the bot is also broken at assigning
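(Looping back to the compressed build logs from 2023-03-30: a sketch of what the webserver change could look like, assuming nginx; the headers match the content-type/content-encoding pair worked out above, the file path is illustrative, and the location block has to end up inside the relevant server block:)

    # write a drop-in that serves pre-compressed logs readable in the browser
    cat > /etc/nginx/snippets/buildlogs-gz.conf <<'EOF'
    location ~ \.log\.gz$ {
        default_type text/plain;           # instead of application/octet-stream
        add_header Content-Encoding gzip;  # browser decompresses inline
        gzip off;                          # never double-compress these
    }
    EOF
    nginx -t && nginx -s reload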