2025-06-01 07:00:46 lotheac: it appears that if you use the internal loadbalancer 2025-06-01 07:01:44 You can still reach the individual endpoints. So we could probably firewall off everything except the api server? 2025-06-01 07:30:52 yes, possible 2025-06-01 07:31:08 even with a public lb 2025-06-01 19:11:23 lotheac: But if we use the public lb, we need to filter by public IP address that's allowed to access it, right? 2025-06-01 19:11:54 If we use internal loadbalancing, and then only expose the api through the loadbalancer, we could just keep the other stuff internal, right? 2025-06-01 22:47:28 ikke: right. that's probably the better option 2025-06-02 06:47:44 the ppc64le CI runner has some weird issues: cd: line 192: can't cd to /builds/alpine/aports: No such file or directory 2025-06-02 06:47:56 https://gitlab.alpinelinux.org/alpine/aports/-/jobs/1879841 2025-06-02 10:12:25 iirc alpinelinux uses zabbix, correct? 2025-06-02 10:12:44 how is it? 2025-06-02 10:16:03 its good 2025-06-02 10:34:05 ncopa: nice, is it lightweight? 2025-06-02 10:35:43 It can run on an rpi 2025-06-02 10:35:53 like an rpi2 2025-06-02 10:36:21 seems like what I want 2025-06-02 10:36:57 I've a server running alpine with a bunch of containers (also running alpine), and it's a pretty tiny server (single core, 2 GB RAM) 2025-06-02 10:37:09 and I've been looking for some monitoring solution 2025-06-02 15:12:45 f_: are you looking for something that just works or something that could be resume fodder? 2025-06-02 15:19:43 iggy: that mostly just works while being very extensible 2025-06-02 15:20:08 I'm currently trying out prometheus but will consider zabbix if it turns out prometheus is not what I want 2025-06-02 15:28:00 f_: rgr, yeah, I was going to say, there's probably more companies actively using prometheus/victoria metrics/loki/mimir/etc, but it can be a bit unwieldy... 
there's also signoz which I've heard is shiny 2025-06-02 17:04:22 iggy: well basically what I'm looking for is something that can do simple metrics and that can throw stuff on an IRC channel 2025-06-02 17:04:38 and that works on alpinelinux (obviously) 2025-06-02 17:05:07 Prometheus + alertmanager + alertmanager-irc-relay seems to be working okay for that 2025-06-02 17:05:39 (I set it up in the end, and also packaged alertmanager-irc-relay this morning) 2025-06-03 05:22:52 sent an email regarding the ppc64le builder, it has storage issues 2025-06-03 08:55:11 is it only fsck? 2025-06-03 08:55:34 I mean filesystem errors due to hard reset, or is it problems with the physical storage 2025-06-03 08:55:57 i think fsck *should* run on reboot, but it might be that it is not enabled 2025-06-03 09:32:34 rebooting gitlab? 2025-06-03 09:32:55 HTTP 502: Waiting for GitLab to boot 2025-06-03 09:34:45 Nope, not me 2025-06-03 14:01:55 ikke: for go-away, are you using a package for it or did you just build it from source and use that 2025-06-03 14:02:11 wondering if it makes sense to package it given I also want to deploy it :p 2025-06-03 14:02:27 (and having it packaged makes it less painful) 2025-06-03 14:02:29 f_: I used the upstream docker image 2025-06-03 14:02:46 ok 2025-06-03 14:03:10 But maybe packaging it would be good 2025-06-03 14:03:15 on it then :) 2025-06-03 14:11:08 really, the advantage of running alpine on the server is that now I get to submit and maintain APKBUILDs for things I need :) 2025-06-03 14:11:28 or more than I would if I were not using it a lot 2025-06-03 14:14:43 That's the best reason to package things 2025-06-03 14:16:29 Absolutely 2025-06-03 14:30:20 especially given I mostly like to deal with system containers rather than app containers 2025-06-03 14:43:23 I now have !85166 if interested 2025-06-03 15:32:17 3.21 and older package repos seem not to have updated today, is that related to the ppc64le borkage? 
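[editor's note] The Prometheus + alertmanager + alertmanager-irc-relay setup mentioned above could be wired together roughly like this on the Alertmanager side. This is a hedged sketch, not the actual deployed config: the listen address/port and the `#alpine-monitoring` channel path are assumptions and must match the relay's own configuration.

```yaml
# /etc/alertmanager/alertmanager.yml (sketch)
# Route every alert to alertmanager-irc-relay via its webhook endpoint.
route:
  receiver: irc

receivers:
  - name: irc
    webhook_configs:
      # assumed relay address and channel path; adjust to the relay's
      # http_host/http_port and configured channels
      - url: 'http://127.0.0.1:8000/#alpine-monitoring'
```

The relay then joins the channel and posts alert notifications as IRC messages.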
2025-06-03 15:33:20 I would not expect that 2025-06-03 15:35:31 The packages seem to have not been built 2025-06-03 15:35:36 I assume you refer to libarchive 3.8.1 2025-06-03 15:35:38 right 2025-06-03 15:36:32 only built for 3.22 and edge it seems 2025-06-03 15:36:51 https://build.alpinelinux.org/ "pulling git" 2025-06-04 05:16:11 qaqland: I ran git commit-graph write --changed-paths on my personal aports fork, and it does appear to have a huge impact on things like git log 2025-06-04 05:16:19 (local repo, not gitlab) 2025-06-04 09:06:18 ikke: yes, but it takes a lot of time to index 2025-06-04 10:36:25 .. 2025-06-04 10:42:26 gitlab.a.o isn't fronted by anubis or go-away? 2025-06-04 10:44:16 omni: not yet 2025-06-04 10:45:26 15k requests where each IP address just makes a single request 2025-06-04 15:45:36 durrendal: fyi, equinix gave a timeline of what they expect 2025-06-04 15:45:55 goal is within 3 months and at most, 6 months 2025-06-04 15:47:35 that should be plenty of time I imagine. I started working on a playbook a couple of weeks ago but got pulled away due to a work project 2025-06-04 15:47:37 https://gitlab.alpinelinux.org/durrendal/deploy-mirror 2025-06-04 15:48:02 fortunately that wrapped up this past Monday, and I just caught up with my packages. 
So I can actually focus on this :) 2025-06-04 15:51:13 Great 2025-06-04 15:54:48 My plan was to get a playbook together with the existing docker-compose deployment, validate it mocks the current functionality, and then see where we can iterate/improve from there 2025-06-04 15:57:52 sounds good 2025-06-04 16:49:08 With regards to https://gitlab.alpinelinux.org/alpine/infra/infra/-/issues/10851, would it be possible to exclude the RSS feeds, just like you did when applying the rate limit (at least that is what i remember from the previous discussion on this channel) 2025-06-04 16:50:51 cely: In fact, I'm already gathering a list of user agents (in a confidential comment) 2025-06-04 16:51:43 Any in particular you care about? 2025-06-04 16:55:02 Yes, glab and newsraft 2025-06-04 16:56:33 The API would be excluded anyway 2025-06-04 16:58:10 Ok, so that leaves newsraft, which uses a "newsraft/(Linux)" User Agent 2025-06-04 16:58:48 Err, that didn't get sent correctly 2025-06-04 16:59:10 "newsraft/[version] (Linux)" 2025-06-04 16:59:52 I'll try to exclude the rss / atom feeds as well, so that shouldn't come up anyway, but I've added it to the list 2025-06-04 17:01:03 Thanks 2025-06-04 18:05:21 lotheac: do you have experience with nftables? 2025-06-04 20:38:55 kunkku: I'm trying to setup dmvpn on a server, but it never seems to get up. The special thing about this server is that it has a private ip address as a subinterface inet 192.168.138.45/17 scope global eth0:1. In /var/log/messages, I see it makes requests from this private IP address: 08[NET] sending packet: from 192.168.138.45[500] to 172.105.69.172[500] (340 bytes). 2025-06-04 20:39:06 03[IKE] giving up after 5 retransmits 2025-06-04 20:40:07 Is that the cause of the issue, and if so, any way around it? 2025-06-05 00:58:18 ikke: yeah I do, what's up 2025-06-05 08:01:44 are the CI builders based on the respective targeted stable branch? 
(and edge for most cases) 2025-06-05 08:03:38 oh, you see that in the build log 2025-06-05 08:18:31 omni: right now, it downgrades to the release it builds for 2025-06-05 09:13:41 ok 2025-06-05 09:13:49 ppc64le package builders are not yet running? 2025-06-05 10:23:10 omni: should be running 2025-06-05 10:45:54 ok, I only see "pulling git" on build.a.o 2025-06-05 11:47:03 🤨 2025-06-05 12:05:04 looks like it's building for edge now 2025-06-05 12:10:26 3.21 as well 2025-06-05 12:41:27 all ppc64le builders (down to 3.19) have been fixed 2025-06-05 18:28:34 lotheac: I just thought about a flaw in my approach (internal LB + exposing API with external LB): worker nodes do not know to connect to the external LB address (since you cannot set externalAddress) when setting nodeLocalLoadbalancing 2025-06-05 18:42:05 I did not get dmvpn to work in this setup yet 2025-06-05 18:42:28 I suspect the private address subinterface is causing issues 2025-06-05 19:54:51 " The konnectivity agents rely on the load balancer to eventually provide connections to all controllers. The LB address is used to brute-force open connections until the agent has the desired number of connections to different controller nodes." 2025-06-05 20:01:03 Ok, now everything seems to work 2025-06-05 20:08:23 https://gitlab.alpinelinux.org/alpine/infra/k8s/ci-cplane-1/-/merge_requests/1 2025-06-06 01:31:10 ah, didn't realize you were intending to use both kinds of lb's :) 2025-06-06 01:45:40 left a few comments on your MR 2025-06-06 12:47:24 ikke: your MR works well, but I wish that when something is missing, the process return value was not zero 2025-06-06 12:49:20 another question is, how should we properly handle it if a missing package is found? 2025-06-06 12:53:22 qaqland: context? 
2025-06-06 12:53:54 sorry, it is https://gitlab.alpinelinux.org/alpine/infra/repo-tools/-/merge_requests/1 2025-06-06 12:54:59 qaqland: ah, yes, I can add the exit code 2025-06-06 18:29:13 Should the ppc64le builder be working normally by now? Since I get some weird LTO failure when I try to compile apk-tools (v3) on ppc64le. 2025-06-06 18:31:10 sertonix[m]: I do expect everything to be back to normal again, but maybe there are some remnants left (I had to recreate the git index of each builder for example) 2025-06-06 19:26:39 sertonix[m]: Where do you get issues, and what issue exactly? 2025-06-06 19:45:17 Here are the CI logs: https://gitlab.alpinelinux.org/sertonix/apk-tools/-/jobs/1886755 2025-06-06 19:45:17 Not sure how to describe that 2025-06-06 19:51:33 Superficially it appears like a build configuration error 2025-06-07 11:51:32 ikke: your options on the new DMVPN setup are: 2025-06-07 11:52:26 (1) make sure the private IP has the secondary flag set (and the public IP does not) 2025-06-07 11:53:11 (2) in /etc/network/interfaces, add "tunnel-local " below the GRE interface 2025-06-07 11:53:42 kunkku: ok, let me test that 2025-06-07 12:03:42 kunkku: tunnel-local works 2025-06-07 12:04:03 (I did not try the other one, since that's technically outside of my control) 2025-06-07 12:04:59 maybe setup-dmvpn could be improved to better handle multiple IPs 2025-06-07 12:05:41 For context, this is on linode, where we enabled a private IP for a loadbalancer to connect to 2025-06-07 12:05:46 linode then sets it up like that 2025-06-07 12:07:05 nhrpd seems to prefer the address with the shortest prefix length 2025-06-07 12:07:42 So a /17 over a /24? 2025-06-07 12:07:47 yes 2025-06-07 12:08:22 okay, that would explain it 2025-06-07 12:19:11 kunkku: thanks btw :) 2025-06-07 12:19:44 np 2025-06-07 17:59:32 raspbeguy: I suppose you are not using gbr-app-4 anymore? 
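[editor's note] kunkku's option (2) would look roughly like this in /etc/network/interfaces. All addresses here are illustrative placeholders, and the rest of the gre1 stanza is whatever setup-dmvpn generated; the fix from the discussion is only the added tunnel-local line, which pins the GRE source to the public address instead of the /17 private subinterface that nhrpd would otherwise prefer.

```
# /etc/network/interfaces (sketch; addresses are illustrative)
iface gre1 inet static
        address 172.16.250.11
        netmask 255.255.255.0
        # force the GRE tunnel source to the public address:
        tunnel-local 203.0.113.10
```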
2025-06-07 19:18:36 Anyone care to test https://gitlab-test.alpinelinux.org/alpine, I've put go-away in front of it 2025-06-07 19:22:53 didn't get a go-away screen, but could log in and load the aports repo 2025-06-07 19:24:33 Yeah, I may have to tweak it further, but I have tried to make it as unobtrusive as possible 2025-06-07 19:24:49 If you are logged in, you should never see anything 2025-06-07 19:26:09 wasn't logged in when initially loading gitlab-test.a.o in a separate browser instance 2025-06-07 19:26:17 but works fine! 2025-06-08 06:15:23 rsync is borked 😧 2025-06-08 11:12:46 restarting gitlab 2025-06-08 11:35:19 ikke: weird pitch, but hear me out - maybe the alerts should be on a different channel than one meant for communication? 2025-06-08 11:37:17 i feel like algitbot is dominating this one 2025-06-08 12:36:01 lotheac: I think having alerts in here is useful, but we could filter out the less important ones. 2025-06-08 12:36:19 arg 2025-06-08 12:39:14 i'm just trying to say that alerts mixed with human communication is unhelpful 2025-06-08 12:39:36 imho it helps to have a different channel for those things 2025-06-08 12:41:16 the battle for removing nonactionable or not-useful alerts is never-ending :) 2025-06-08 12:41:38 it's a fine line, and that's why it's difficult 2025-06-08 12:42:14 and that's why it's important to distinguish it from human communication 2025-06-08 12:42:23 imo 2025-06-08 12:43:14 But it also makes it easier to completely ignore / discard. 
I think there's also value for people here to be aware of issues 2025-06-08 12:45:45 Having the alerts here is for me at least a stimulus to actually do something with them 2025-06-08 12:51:41 for someone who can't do anything about them, any time i see something happening on this channel, my client happily tells me there are new messages, but i never know if it's useless or not 2025-06-08 12:52:10 so if i can't do anything about them, it stands to reason i should ignore algitbot to participate in this channel 2025-06-08 12:52:27 which is probably also not the intention :) 2025-06-08 13:18:45 that all said, i defer to you; i haven't been here very long :) 2025-06-08 13:32:15 Trying to figure out why nginx is not consistently showing the IP from the X-Forwarded-For header. It used to work fine 2025-06-08 13:32:41 I do see some requests showing the original IP 2025-06-08 13:34:33 what's the configuration? you're logging $proxy_add_x_forwarded_for or so? 2025-06-08 13:46:32 set_real_ip_from 172.16.0.0/12; 2025-06-08 13:46:36 real_ip_header X-Forwarded-For; 2025-06-08 13:46:41 real_ip_recursive on; 2025-06-08 13:47:29 i'm unfamiliar with this 2025-06-08 13:47:42 https://nginx.org/en/docs/http/ngx_http_proxy_module.html i usually just use these 2025-06-08 13:48:13 Yes, but in this case, nginx needs to read the header, not set it when forwarding to a proxy 2025-06-08 13:48:41 traefik -> go-away -> nginx 2025-06-08 13:49:56 $proxy_add_x_forwarded_for always reads the client header and appends to it 2025-06-08 13:50:24 i'm confused why the realip module is a thing at all and why it's necessary here 2025-06-08 13:51:53 Because nginx itself also should be able to read the X-Forwarded-For header and set the actual client ip header, not the ip of the server that sent to it 2025-06-08 13:51:54 oh, is it for ip-based access control? 
2025-06-08 13:52:10 no 2025-06-08 13:52:16 Just access logging 2025-06-08 13:52:51 Well, it was before I used go-away also for rate limiting 2025-06-08 13:53:21 i thought it generally (including by default) logs the requesting client as a list of ip's as indicated by x-forwarded-for 2025-06-08 13:54:10 By default, it ignores the X-Forwarded-For header, because it can be easily spoofed 2025-06-08 13:54:33 i might be wrong about the default logging... but the idea is that no server can trust that header field at all anyway 2025-06-08 13:55:03 You can if it's set by a trusted source 2025-06-08 13:55:04 so you log both the address connected to you (and add that to any x-forwarded-for going upstream) as well as whatever they told you 2025-06-08 13:55:20 That's at least what you do with the real_ip module 2025-06-08 13:55:53 the realip module documentation doesn't say anything about what happens if there are multiple addresses in the header that you give to it 2025-06-08 13:56:08 which is more or less normal for x-forwarded-for 2025-06-08 13:56:16 https://serverfault.com/questions/314574/nginx-real-ip-header-and-x-forwarded-for-seems-wrong gives some details 2025-06-08 13:56:44 That's where the recursive directive comes into play 2025-06-08 13:56:54 ah i see 2025-06-08 14:01:15 if it's just for logging, why is the realip module necessary anyway? seems like we're trying to figure out a corner-case behavior in it 2025-06-08 14:12:53 anyway, i guess i see what it's supposed to be doing. if you added go-away in between, is it possible that go-away is connecting to nginx from outside of the ip range you specified in set_real_ip_from? 
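[editor's note] The realip snippet quoted above, assembled into one place for reference (the /12 range is the one pasted in the log; these directives live in the http or server context):

```nginx
# Trust X-Forwarded-For only when the connection comes from the internal
# proxy range (traefik / go-away), and walk the address list recursively
# past trusted hops to recover the real client IP for access logging.
set_real_ip_from  172.16.0.0/12;
real_ip_header    X-Forwarded-For;
real_ip_recursive on;
```

With real_ip_recursive on, nginx strips trusted addresses from the right of the list until it reaches the first untrusted one, which becomes $remote_addr.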
2025-06-08 14:17:38 failing that, maybe go-away is not doing what we expect with x-forwarded-for 2025-06-08 14:20:00 X-Forwarded-For: 140.211.x.y:0, 172.25.0.2 2025-06-08 14:20:06 that's what nginx receives 2025-06-08 14:22:14 :0 looks weird 2025-06-08 14:22:47 yeah 2025-06-08 14:25:06 From a different deployment with go-away pointing to nginx, it does work fine 2025-06-08 14:26:26 is there a traefik in that other deployment? 2025-06-08 14:26:33 yes 2025-06-08 14:26:43 so, no strange interaction there then i guess 2025-06-08 14:26:58 Not the :0 port though 2025-06-08 14:27:44 The go-away version is different though, so maybe due to a change? 2025-06-08 14:27:52 maybe the :0 has always been there in this setup and something(tm) has always normalized it away, but go-away just does a string-copy 2025-06-08 14:27:55 speculating 2025-06-08 14:28:57 the real question is, why is it there anyway? there's not supposed to be any port specification in there anyway 2025-06-08 14:30:23 "The request header field value that contains an optional port is also used to replace the client port (1.11.0). The address and port should be specified according to RFC 3986. " 2025-06-08 14:31:02 can you link the reference? i'm reading https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/X-Forwarded-For and there's no such thing here 2025-06-08 14:31:25 it mentions "de-facto standard" :) 2025-06-08 14:31:29 https://nginx.org/en/docs/http/ngx_http_realip_module.html 2025-06-08 14:31:37 Yeah, it's never standardized 2025-06-08 14:31:51 Hence the X- prefix :P 2025-06-08 14:32:10 Let me test something 2025-06-08 14:32:19 not sure that we care about whether or not realip handles the port, but why is there a port to begin with? where is it added? 
2025-06-08 14:33:23 It must be go-away that adds it 2025-06-08 14:33:31 Just checked, go-away does not receive it with a port 2025-06-08 14:34:11 sounds like we should patch it there then to make it work like it used to :-) 2025-06-08 14:35:27 Just checked if it mattered if I explicitly set --backend-ip-header 2025-06-08 14:35:40 I didn't do that in the first deployment, I did do it here, but no difference 2025-06-08 14:35:56 strange 2025-06-08 14:37:34 checked the code, they just use the same header as the --client-ip-header if you omit it 2025-06-08 14:39:10 in that case it would be traefik appending the :0, no? 2025-06-08 14:43:10 If I check the requests that go-away receives, I don't see it 2025-06-08 14:43:28 I guess it comes from here? https://git.gammaspectra.live/git/go-away/src/branch/master/utils/http.go#L146 2025-06-08 14:43:30 https://git.gammaspectra.live/git/go-away/src/branch/master/utils/http.go#L130 lib/challenge/data.go calls this, i think it parses addresses it sees and serializes them later, giving potential for that :0 thing 2025-06-08 14:43:42 heh :P 2025-06-08 14:43:52 :-D 2025-06-08 14:45:51 yeah, seems like it was introduced in v0.6, and the other deployment runs v0.5.4 2025-06-08 14:46:30 https://git.gammaspectra.live/git/go-away/src/branch/master/lib/challenge/data.go#L356 i think the header values it sends upstream come from here, so might be in the impl of Addr() 2025-06-08 14:47:34 but that's netip.AddrPort, so stdlib stuff 2025-06-08 14:48:27 I see changes between v0.7 and master 2025-06-08 14:48:56 https://git.gammaspectra.live/git/go-away/src/tag/v0.7.0/lib/challenge/data.go 2025-06-08 14:49:26 https://git.gammaspectra.live/git/go-away/commit/aebbfa4eaa8eb6a0e8161ad9686fbc6154f3179c 2025-06-08 14:49:50 what version are we running? 
2025-06-08 14:49:54 v0.7.0 2025-06-08 14:50:50 aebbfa4 definitely looks like it could have the effect of adding that :0 2025-06-08 14:50:59 but it's not in 0.7.0 2025-06-08 14:51:14 My thought was the opposite 2025-06-08 14:51:22 "set client network address without original port on backend-ip-header option" 2025-06-08 14:53:20 true, commit-message wise that tracks, i just read the code and thought that the stringification of the netip.AddrPort might work differently 2025-06-08 14:54:16 https://pkg.go.dev/net/netip#AddrPort.Addr returns the IP only so, yeah 2025-06-08 14:54:31 your thought was right 2025-06-08 15:04:08 i'm thinking just backport aebbfa4 2025-06-08 18:39:09 It seems that Gitlab's "Delete source branch when merge request is accepted" option hasn't been working for some time, since I now have lots of merged branches piling up in my aports fork, despite me having that option enabled 2025-06-08 18:40:14 The oldest branch dating back to 5 April 2025 2025-06-08 19:28:32 Hmm interesting, haven't noticed it yet 2025-06-09 10:21:43 I got a GL sign-in notification where the IP was 172.25.0.2 2025-06-09 10:21:58 is this related to the above nginx issue? 2025-06-09 10:26:13 kunkku: yes 2025-06-09 11:07:15 GL has also sent me occasional notifications on algitbot not having push access to https://github.com/alpinelinux/dmvpn-tools.git 2025-06-09 11:08:03 I wonder if the GH repo settings should be reviewed 2025-06-09 11:11:25 Right 2025-06-09 11:23:46 :-/ 2025-06-09 12:35:52 ^ I ran into this 2025-06-09 12:36:12 https://gitlab.alpinelinux.org/funderscore/aports/-/jobs/1889646 "fatal: unable to create thread: Resource temporarily unavailable" 2025-06-09 12:36:47 or this might be a different runner.. 
sorry for the noise 2025-06-09 19:05:56 lotheac: deployed a new go-away image with the patch and now the client ip is correct again :-) 2025-06-09 19:06:18 https://gitlab.alpinelinux.org/alpine/infra/docker/go-away 2025-06-10 00:44:13 ikke: nice :) 2025-06-10 18:50:26 could the arm package builders be made to prefer IPv4? 2025-06-10 19:07:33 ikke, fyi go-away is in alpine testing now 2025-06-10 19:07:40 thanks to omni for merging the MR ^^ 2025-06-10 19:09:59 funderscore: nice. I'm building it myself right now because I need an unreleased patch on top of 0.7 2025-06-10 19:10:11 which patch? 2025-06-10 19:10:27 Or I suppose it is in the repo you linked earlier 2025-06-10 19:10:56 This one I suppose https://gitlab.alpinelinux.org/alpine/infra/docker/go-away/-/blob/master/patches/set-client-network-address-without-port.patch?ref_type=heads 2025-06-10 19:11:21 Yes 2025-06-10 19:11:27 alright ^^ 2025-06-10 19:13:11 hmm, it fails on our arm* builders, network issues 2025-06-10 19:14:38 ikke: hopefully the openrc init stuff goes upstream as well 2025-06-10 19:14:46 The OpenRC scripting 2025-06-10 19:15:13 and at some point the package moves to community hopefully :) 2025-06-10 19:15:25 I don't see why it can't 2025-06-10 19:15:39 It definitely can 2025-06-10 19:15:57 in the #go-away channel on Libera someone was interested in openrc start scripts (in gentoo) 2025-06-10 19:15:58 It has a huge impact on our traffic 2025-06-10 19:16:10 Ah, nice, wasn't aware there was an irc channel 2025-06-10 19:16:27 and bringing those to the repo was discussed a bit, they're definitely welcome 2025-06-10 19:16:38 yes there's #go-away on Libera.Chat 2025-06-11 11:51:45 ikke: do we dare apply version 93 of vastly config? its still a draft. I would appreciate if you could have a look at it and tell me when it is a good time to apply it. 
2025-06-11 11:52:38 Ok, I'll have a look later 2025-06-11 15:21:17 Additional benefit of go-away, it seems we receive a lot fewer spam account creations 2025-06-11 15:42:13 good job! 2025-06-11 15:44:14 gitlab server load last 7 days: https://imgur.com/a/PwOkFXm 2025-06-11 16:35:38 Running into this regularly, always around 1GiB on that tar download, but I can't reproduce from my network https://gitlab.alpinelinux.org/selfisekai/aports/-/jobs/1892847#L4982 2025-06-11 16:38:51 loongarch64 having any problems? https://gitlab.alpinelinux.org/alpine/aports/-/jobs/1892903 "tests/init.test:save_userdata_compressed -> /usr/include/c++/14.2.0/backward/auto_ptr.h:202: std::auto_ptr< >::element_type* std::auto_ptr< >::operator->() const [with _Tp = {anonymous}::global_state; 2025-06-11 16:38:51 element_type = {anonymous}::global_state]: Assertion '_M_ptr != 0' failed." 2025-06-11 16:40:12 lnl: checking on the host 2025-06-11 16:41:55 lnl: hmm, manually downloading seems to succeed. How often does it happen? 2025-06-11 16:44:26 checking the other attempt for the same ... 2025-06-11 16:44:33 lnl: ipv4 vs ipv6 issue perhaps 2025-06-11 16:44:58 ipv6 is a lot faster 2025-06-11 16:45:15 ...yeah same message on the other attempt 2025-06-11 16:47:44 there's no c++ here so the issue must be with the various compression binaries? 2025-06-11 16:51:05 correction - it's not in the same place -- first one https://gitlab.alpinelinux.org/alpine/aports/-/jobs/1892894 tests/init.test:userdata_type -> /usr/include/c++/14.2.0/backward/auto_ptr.h:202: std::auto_ptr< >::element_type* std::auto_ptr< >::operator->() const [with _Tp = 2025-06-11 16:51:05 {anonymous}::global_state; element_type = {anonymous}::global_state]: Assertion '_M_ptr != 0' failed. 
2025-06-11 16:55:09 lnl: So yes, I can reproduce it when downloading it via IPv4 2025-06-15 07:24:02 lotheac: I'm trying to set up the cplane cluster now as controller+worker and gre1 (dmvpn interface) as privateInterface, but it does not seem to work yet. It has problems communicating with the other nodes over dmvpn, even though on the host everything is reachable 2025-06-15 07:24:10 Have to continue troubleshooting later 2025-06-15 07:24:48 Error from server: Get "https://172.16.250.10:10250/containerLogs/kube-system/coredns-6c77c7d548-xbwgc/coredns": dial tcp 172.16.250.10:10250: i/o timeout 2025-06-15 09:32:13 hmm. that's the kubelet API it's trying to call. what is that log line from? trying to understand if it runs in the host netns or container 2025-06-15 09:32:54 if it's from a container netns, you'll need to have the host perform forwarding 2025-06-15 09:33:35 incidentally i'm at kubecon and related events from today to tuesday, so i might respond a bit slow :) 2025-06-15 09:34:01 i assume you can reach 172.16.250.10:10250 from the host netns? 
2025-06-15 09:38:17 i don't know offhand how it works with k0s, but if the host is performing NAT for container-initiated traffic, i'd just check that it's allowed to forward from the container networks to the dmvpn ip's (and maybe a good idea to double check that the ip ranges don't clash as well) 2025-06-15 09:39:39 if it isn't doing NAT, as in, if the traffic is going from a container IP to kubelet on that dmvpn IP, then we'd need to make sure that the hosts know how to route the return traffic 2025-06-15 10:29:15 lotheac: that log was from k0s kubectl logs pod/coredns-6c77c7d548-xbwgc -n kube-system 2025-06-15 10:29:36 But konnectivity is also logging that when trying to connect to the api 2025-06-15 10:31:06 lotheac: from the host ns, I can do nc -v 172.16.250.10 10250 2025-06-15 10:31:43 from the ns of a pod, I can ping the local gre1 interface, but not through the tunnel 2025-06-15 10:52:25 lotheac: ok, interesting. Checking with tcpdump, I do see the traffic reaching the other node, but the source address is the internal ns ip, so I suppose there's no route back 2025-06-15 10:52:32 So yes, what you mentioned 2025-06-15 11:55:02 if it isn’t doing nat, i think the pod (and maybe service) ip routes have to be added to the hosts manually 2025-06-15 11:56:18 not sure if there is any better way… but maybe the routes can be via localhost? :D 2025-06-15 11:56:34 not sure how this is supposed to work in k0s 2025-06-15 12:04:17 I recall that kuberouter expects to be able to do l2 traffic, but dmvpn only does l3 2025-06-15 12:07:12 i don’t see why l2 would be necessary 2025-06-15 12:08:47 this is a pod (container netns) trying to talk to (some, perhaps other) host netns and i would expect the host needs a route to respond on in all cases 2025-06-15 12:09:31 otherwise it would only work on the interface with the host default route and not the one you designated internal in k0s config, right? 
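[editor's note] The NAT option lotheac sketches above (masquerade pod-originated traffic leaving via the tunnel, so the far side sees the node's gre1 address and replies route back) could look roughly like this in nftables. This is a hedged sketch, not the deployed config: the 10.244.0.0/16 pod network and the "gre1" interface name are assumptions taken from the surrounding discussion.

```nft
# Masquerade pod-sourced traffic that leaves through the DMVPN tunnel,
# so return traffic targets the node's own tunnel address.
table ip k8s_dmvpn {
	chain postrouting {
		type nat hook postrouting priority srcnat; policy accept;
		ip saddr 10.244.0.0/16 oifname "gre1" masquerade
	}
}
```

Without NAT, the alternative discussed later in the log is distributing per-node pod-network routes (which is what kube-router's BGP peering normally does).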
2025-06-15 12:10:05 i mean, even in a case without dmvpn 2025-06-15 12:11:23 If it sends traffic from the pod's internal ip address, you cannot expect it to work over an l3-only network without NAT 2025-06-15 12:11:25 although yeah it’s possible that it’s relying on iptables mangling by kuberouter instead of routing tables, but in either case - why would that require l2 2025-06-15 12:12:03 in kubeadm clusters i _believe_ (not sure) it does nat always 2025-06-15 12:12:21 What CNI does it use with kubeadm? 2025-06-15 12:13:06 https://docs.k0sproject.io/v1.27.1+k0s.0/networking/ 2025-06-15 12:13:07 you get to choose 2025-06-15 12:13:22 i’m most familiar with cilium 2025-06-15 12:13:28 k0s has kuberouter and calico built-in 2025-06-15 12:14:18 right 2025-06-15 12:15:03 i wonder how this works without dmvpn 2025-06-15 12:15:15 I tried calico, and the worker nodes that are external seem to work, but konnectivity on the controller+worker nodes still have issues (didn't dig into it yet) 2025-06-15 12:16:25 I didn't add the pure worker nodes to dmvpn yet 2025-06-15 12:16:43 as in how does a node route traffic to a pod in another node if you have a non-dmvpn l2 network connecting the nodes? 
(even if they are both controllers) 2025-06-15 12:18:18 i mean that l2 network would not be the default route interface even in that case, why would the sender node do arp on that network if it had a dst addr it doesn’t know where to route 2025-06-15 12:19:44 and i would find it weird if they were doing something resembling proxyarp like that on the receiving node too :thinking: 2025-06-15 12:21:19 i could test this on tailscale - which is also l3-only - but it might not happen today 2025-06-15 12:21:43 https://www.kube-router.io/docs/how-it-works/ 2025-06-15 12:22:21 I don't know all the details, I do know that I had issues when I used linode vpc's, which got solved when I used vlan interfaces 2025-06-15 12:23:32 I don't have a lot of time today either 2025-06-15 12:27:25 wait, i guess i already did it on tailscale originally - the cluster worked fine, i just confused things in my head just now because that poc used a public lb 2025-06-15 12:28:42 but the pod->kubelet traffic must have been going over tailscale in that setup because i had no internal net between the nodes and port 10250 was firewalled closed 2025-06-15 12:29:14 i’ll recheck that when i get the chance though 2025-06-15 12:32:24 If I check traffic going over the vlan interface on another cluster, I see traffic going directly from pod network <-> pod network 2025-06-15 12:32:29 on different hosts 2025-06-15 12:34:53 from 10.244.5.x <-> 10.244.4.x. I do see routes for these on each node 2025-06-15 12:46:12 lotheac: So something is preventing these routes to be shared 2025-06-15 12:47:11 I switched back to kube-router. I see the local route on each node, but no routes for the other nodes 2025-06-15 13:41:05 that's weird 2025-06-15 13:43:31 the CNI can of course be doing things in different ways so it is not _necessarily_ a problem that you don't see routes to other nodes for the pod network 2025-06-15 13:44:10 eg. 
on cilium it looks like this with a pod network of 10.128.0.0/9: 2025-06-15 13:44:23 10.128.1.123 dev cilium_host proto kernel scope link 2025-06-15 13:44:23 10.128.3.0/24 via 10.128.1.123 dev cilium_host proto kernel src 10.128.1.123 mtu 1230 2025-06-15 13:44:23 10.128.1.0/24 via 10.128.1.123 dev cilium_host proto kernel src 10.128.1.123 2025-06-15 13:44:23 10.128.0.0/24 via 10.128.1.123 dev cilium_host proto kernel src 10.128.1.123 mtu 1230 2025-06-15 13:44:23 10.128.2.0/24 via 10.128.1.123 dev cilium_host proto kernel src 10.128.1.123 mtu 1230 2025-06-15 13:44:52 cilium may be doing ebpf shenanigans that are not visible to netfilter and least of all routing tables 2025-06-15 13:46:02 but i don't know how calico and kube-router have been implemented in this regard -- it's possible they use nft to do the dynamic part of "which node to send pod X's traffic to" without modifying routing tables 2025-06-15 13:47:25 cni config load failed: failed to load CNI config list file /etc/cni/net.d/10-kuberouter.conflist: error parsing configuration list: unexpected end of JSON input: invalid cni config: failed to load cni config\"" component=containerd stream=stderr 2025-06-15 13:49:11 jq does not have any issues reading those files 2025-06-15 13:49:49 if you just changed the cni, it's possible there are leftover files from the previous cni. 
(they might write to host paths) 2025-06-15 13:50:26 i'm not familiar with how the APIs for CNI itself work 2025-06-15 13:51:03 I rebooted the VMs 2025-06-15 13:51:05 but i suppose since you changed _to_ kube-router, and that's the file being complained about, something is off :) 2025-06-15 13:51:25 host /etc/cni/net.d i would expect to persist over reboots though ;) 2025-06-15 13:51:33 yup 2025-06-15 13:52:21 There were some calico routes left over, so that's why I rebooted 2025-06-15 13:52:28 alright 2025-06-15 13:54:04 Let me try to manually add the missing routes, see if that helps 2025-06-15 14:12:08 ah: Failed to start node BGP server: failed to start BGP server due to: listen tcp4 172.16.250.11:179: bind: address already in use 2025-06-15 14:15:06 I can override the bgp port 2025-06-15 14:22:27 improvement, but not yet healthy 2025-06-15 14:24:58 i recreated the cluster from my `cluster-ci` repo with one change: controller+worker nodes instead of controller, and i guess i did not test it enough originally - sure enough, i can't get pod logs either, timeout to kubelet api (from the apiserver i assume) 2025-06-15 14:25:24 (this is on tailscale) 2025-06-15 14:27:42 clientset.go:234] "cannot connect once" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp: lookup ci-cplane-1.alpinelinux.org: i/o timeout\"" 2025-06-15 14:27:44 although it works for pods on some other nodes *facepalm* 2025-06-15 14:27:52 It seems it cannot resolve dns 2025-06-15 14:31:25 was the bgp port collision because dmvpn also uses bgp? 2025-06-15 14:31:38 yes 2025-06-15 14:33:20 that makes sense. the dns thing doesn't so much - yet :) 2025-06-15 14:34:26 I tried with nsenter, but not sure if that's a good test because it'll probably still use the host's resolv.conf 2025-06-15 14:35:20 right, that could be a problem.
generally kube pods talk to coredns (which is also running as a pod) and have that in resolv.conf 2025-06-15 14:35:49 although generally they also should be able to connect to external ip's, so :) 2025-06-15 14:37:34 yeah, coredns is also failing, so I should probably look at that first 2025-06-15 14:37:45 if you are able to `kubectl exec` from somewhere, that's usually a good way to debug 2025-06-15 14:38:07 it will enter all the necessary kinds of namespaces 2025-06-15 14:38:12 problem is that these containers mostly are FROM scratch containers without any other files 2025-06-15 14:38:18 right 2025-06-15 14:38:41 there is also `kubectl debug` which can create a second container in the target pod with your desired image, which is good enough sometimes 2025-06-15 14:38:52 okay, good to know 2025-06-15 14:42:08 I found where the logs are stored on the host, which makes it easier to debug 2025-06-15 14:42:29 that it does :) 2025-06-15 14:46:13 okay, just for posterity... i messed up above when creating the cluster (forgot to update the lb address in k0sctl.conf before running k0sctl the first time).
recreating the nodes and the cluster, everything came up fine with tailscale https://termbin.com/v0ed 2025-06-15 14:52:25 the routing tables look like this on host https://termbin.com/ffrz -- empirically, pod ip's appear to be in 10.244.0.0/24 (or 10.244.1.0/24 and 10.244.2.0/24 for the builtin kube-system stuff, plus tailscale private ip's for kube-proxy and kube-router pods) 2025-06-15 14:52:46 but crucially the hosts have those routes to the pod network ranges 2025-06-15 14:53:03 lotheac: since changing the bgp port, the routes appear now 2025-06-15 14:53:13 right, i figured :) just wanted to double check 2025-06-15 14:58:33 Ok, I guess what's happening now is that coredns is put on the nodes that are not connected to dmvpn yet 2025-06-15 14:58:56 not necessarily - i'm also having a problem with dns resolution on my test cluster 2025-06-15 14:59:05 ok 2025-06-15 14:59:29 "kubectl get -A pod -o wide" will tell you the pod ip's and which nodes they are scheduled on 2025-06-15 14:59:52 I use k9s, which also shows that 2025-06-15 15:03:21 can you check logs of one of the kube-router pods? 2025-06-15 15:03:41 looks like i'm hitting an iptables issue on this end 2025-06-15 15:05:18 E0615 15:01:44.938646 3343 network_policy_controller.go:287] Aborting sync. Failed to run iptables-save: failed to call iptables-save: exit status 3 (Error: meta sreg is not an immediate 2025-06-15 15:07:52 Don't have that in the logs 2025-06-15 15:08:42 okay. then it's possible that we have dns resolution issues for different reasons :) my pod-to-pod traffic doesn't work at all, apparently 2025-06-15 15:09:23 (which would explain why dns doesn't work -- coredns cannot be reached) 2025-06-15 15:10:43 so - do check if connecting the coredns nodes to dmvpn helps 2025-06-15 15:10:54 i gotta run for now 2025-06-15 15:11:18 back in 7 or 8 hours 2025-06-15 15:11:24 lotheac: o/ 2025-06-15 15:11:31 o/ 2025-06-15 22:57:55 ikke: any progress? 
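The observation above — "the hosts have those routes to the pod network ranges" — amounts to one route per peer node, with that node's address as next hop. A minimal sketch of what an L3 CNI like kube-router effectively programs (the CIDRs and node IPs below are illustrative examples taken from the ranges in this log, not the real cluster inventory):

```shell
# Print the "ip route add" commands an L3 CNI programs on each host:
# every other node's pod subnet is routed via that node's address.
# Input lines: "<podCIDR> <nodeIP>".
gen_pod_routes() {
  while read -r cidr via; do
    echo "ip route add $cidr via $via"
  done
}

printf '%s\n' \
  '10.244.4.0/24 172.16.250.11' \
  '10.244.5.0/24 172.16.250.12' | gen_pod_routes
```

When these routes are missing on a node (as while the BGP port still collided with dmvpn's bgpd), pods on that node can only reach local pods.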
2025-06-16 04:50:30 lotheac: I was kinda thwarted by the fact that the worker nodes run debian (the provider did not support alpine (yet)), which makes dmvpn almost infeasible 2025-06-16 04:53:37 lotheac: backup plan is wg tunnels, which is also connected to dmvpn, but not redundant 2025-06-16 04:57:59 ah I see. well, the tunnels to worker nodes would not really need to be redundant 2025-06-16 04:58:13 assuming different tunnels for different workers 2025-06-16 04:58:17 yes 2025-06-16 04:58:26 But the relay is not redundant 2025-06-16 04:58:33 so if the relay is down, all the worker nodes are disconnected 2025-06-16 04:58:36 right, that just moves the spof 2025-06-16 04:59:26 although maybe we can (eventually) have workers sponsored by more than one provider, in which case not all of them would go down if that relay failed :-) 2025-06-16 04:59:41 (assuming dmvpn on those hypothetical other-provider workers) 2025-06-16 04:59:50 Yes, we do have multiple providers 2025-06-16 05:44:08 should we package dmvpn for debian? doesn’t need to be a publicly available package for this 2025-06-16 05:44:44 or is there some other obstacle that debian presents 2025-06-16 05:48:50 Not sure, but there are multiple components involved 2025-06-16 05:49:29 okay. i can have a look later in the week 2025-06-16 05:49:57 There's things like strongswan and quagga, both of which have patches.
Not sure how critical they are for dmvpn 2025-06-16 05:50:26 And then there's setup-dmvpn, which is very alpine (at least openrc) specific 2025-06-16 05:52:25 right right 2025-06-16 05:53:04 maybe wg makes more sense short term then 2025-06-16 05:53:41 especially if i understood correctly your implication that this provider might support alpine in the future 2025-06-16 05:55:31 I've just finished setting up wg 2025-06-16 05:55:53 The provider didn't make any promises, but would look into it 2025-06-16 06:00:13 I'm AFK now 2025-06-16 06:43:59 ah, got it 2025-06-16 06:44:04 cheers, talk later :) 2025-06-16 08:47:32 morning! I am pretty sure kube-router needs the nodes to be in same L2 net 2025-06-16 08:48:50 So in that case calico would be a better option? 2025-06-16 09:30:10 calico can do vxlan which works over l3 2025-06-16 09:31:08 so that may be a better option if we want a cluster with nodes in different subnets 2025-06-16 09:31:34 I could maybe ask kube-router devs if there are other options 2025-06-16 09:32:03 i was even thinking that it could be an idea to add dmvpn capabilities to kube-router, but not sure it's worth it 2025-06-16 10:36:30 I deployed just the controller+worker nodes with calico, and I think everything is working so far 2025-06-16 10:36:37 at least, everything shows healthy 2025-06-16 10:55:32 Deploying worker nodes, and they have issues (konnectivity not being able to resolve dns, for example) 2025-06-16 13:47:33 gitlab is sending 4x mails again xd 2025-06-16 13:48:57 Fun 2025-06-16 15:05:36 ncopa: do you know what the reason is for kube-router needs L2 connectivity to other nodes? 2025-06-16 15:05:42 needing* 2025-06-16 15:06:22 just trying to understand the constraints :) 2025-06-16 15:19:38 Gitlab appears to be sending me email notifications multiple times. What's interesting is that the "reply to" email is different in each duplicate 2025-06-16 16:28:58 "did you see the memo?
i'll go ahead and make sure you get another copy of that memo" -- lumbergh 2025-06-16 16:30:13 I mean, I get most of the emails sent 4 times... 2025-06-16 16:30:40 Actually, they are all sent 4 times 2025-06-16 16:36:56 trying to figure out why that's happening 2025-06-16 16:49:10 probably an issue with sidekiq 2025-06-16 19:46:50 lotheac: i don't know. I have assumed it was that kube-router alters the routing table instead of doing NAT (this internal address is reached via node address) which will not work if the node address is in a different subnet (in which case you also need to inform the router about the internal address) 2025-06-16 19:47:18 but tbh, it's only me assuming/guessing things. I haven't really studied how it actually works 2025-06-16 20:01:56 I think the firewall on the wg relay is blocking traffic from gre -> wg 2025-06-16 20:04:55 yup 2025-06-16 20:05:08 1 clientset.go:285] change detected in proxy server count (was: 0, now: 3, source: "KNP server response headers") :) 2025-06-16 23:54:29 ncopa: right, i haven't studied it either, but my assumption is the opposite -- ie. that it works on l3. but i don't really use kube-router anywhere so your assumption has maybe more basis than mine :) 2025-06-17 05:41:14 ikke: do you have the k0sctl config for calico somewhere?
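ncopa's guess — plain routing ("internal address is reached via node address") rather than NAT — only works when the next-hop node address is directly reachable. A toy check along those lines, assuming the /24 node subnets seen in this log (an illustration, not a general CIDR matcher):

```shell
# True when both addresses share the first three octets, i.e. the same
# /24 -- the case where a "via <node>" pod route needs no help from an
# intermediate router. For nodes in different subnets, the router in
# between must also carry the pod routes.
same_24() {
  [ "${1%.*}" = "${2%.*}" ]
}

same_24 172.16.250.11 172.16.250.12 && echo "same /24, direct route works"
same_24 172.16.250.11 172.16.252.34 || echo "different subnets, router needs the pod routes too"
```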
2025-06-17 05:44:23 lotheac: I've pushed it to the MR now 2025-06-17 05:44:45 I think it's just enough to change kuberouter for calico as provider 2025-06-17 05:44:58 Most options I set are default, except for mtu 2025-06-17 05:45:50 Still debugging: "could not read stream" err="rpc error: code = Unavailable desc = error reading from server: EOF" serverID="204f42e6e7ec7f04fa04df7316101ab4" agentID="172.16.252.34" 2025-06-17 05:45:56 from konnectivity agent 2025-06-17 06:28:52 there was just now a talk concerning bgp in this context here at kubecon 2025-06-17 06:29:17 so i wanna try things out once i have some time 2025-06-17 06:29:51 the gist was that running two bgpd’s, the one in the cni needs to peer with localhost 2025-06-17 06:30:54 i’m not really familiar with bgp, but seems like that’s what could be going on here as well 2025-06-18 12:53:35 is the loongarch64 ci runner having some trouble? 2025-06-18 12:54:45 i guess it works after a retry but i feel like the failures are more often than usual 2025-06-18 12:57:36 hmmm 2025-06-18 12:57:53 I remember seeing failures on the loongarch64 runner before but didn't think much of it 2025-06-18 13:10:49 achill: I did unpause a runner yesterday 2025-06-18 13:10:57 achill: what kind of failures? 
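The "peer with localhost" gist from the kubecon talk maps onto Calico's global BGPPeer resource, roughly as below. This is a hedged sketch following the pattern in Calico's BGP docs, not a tested config; the AS number is a private-range placeholder and would have to match whatever the dmvpn-side bgpd speaks.

```shell
# Write a global Calico BGPPeer that makes each node's BGP daemon peer
# with a daemon on the same host (127.0.0.1), e.g. the dmvpn routing
# daemon. asNumber 64512 is a placeholder from the private ASN range.
cat > bgppeer.yaml <<'EOF'
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: node-local-bgpd
spec:
  peerIP: 127.0.0.1
  asNumber: 64512
EOF
# Applying it would be: calicoctl apply -f bgppeer.yaml (needs a live cluster)
```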
2025-06-18 13:30:17 i saw a failure on loongarch64 this morning as well https://gitlab.alpinelinux.org/lotheac/aports/-/jobs/1900547 -- retry fixed it 2025-06-18 13:30:33 seems like one of the tests timed out 2025-06-18 13:31:27 https://gitlab.alpinelinux.org/fossdd/aports/-/jobs/1900848 2025-06-18 13:35:01 ah not this kind of error 2025-06-18 13:53:52 Ah ok, the image is missing and not allowed to pull it 2025-06-18 15:36:27 Turned out that was the old builder used to bootstrap loongarch 2025-06-20 05:20:49 I've reduced the sensitivity of these icmp ping loss checks 2025-06-24 10:11:07 ikke: this is that talk i mentioned; video is not yet available but slides are https://static.sched.com/hosted_files/kccncjpn2025/6f/KubeCon%20Japan%202025%20BGP%20Peering%20Patterns%20for%20Kubernetes%20Networking%20at%20Preferred%20Networks.pdf 2025-06-24 10:11:43 ikke: so the idea is... add a 127.0.0.1 peer to calico bgpd to connect the dmvpn network to the cluster network 2025-06-24 10:12:38 i don't have a dmvpn setup myself so i am not totally sure about this though :) 2025-06-24 10:12:57 it'd be something like this https://docs.tigera.io/calico/latest/networking/configuring/bgp#configure-a-global-bgp-peer 2025-06-24 10:19:39 Interesting 2025-06-24 10:19:54 I guess it means that each cluster would need to have a unique subnet? 2025-06-24 10:20:08 yeah, that's right 2025-06-24 10:20:49 does pod-to-pod connectivity work in the cluster with the current configuration? 2025-06-24 10:21:02 I haven't extensively tested that yet 2025-06-24 10:21:04 if the pods are on different nodes i mean 2025-06-24 10:21:17 i have a hunch that it might not work as is 2025-06-24 10:22:56 konnectivity-agent still complains about: "could not read stream" err="rpc error: code = Unavailable desc = error reading from server: EOF" serverID="3c5fa1d2e2bd51ad14e4422662393535" agentID="172.16.250.10" 2025-06-24 10:24:51 not totally sure if that is related. 
i don't think the error would be EOF if some address was not routable 2025-06-24 10:25:03 yeah, true 2025-06-24 10:27:49 ah, okay, we're using vxlan for calico which means it doesn't use bgp anyway 2025-06-24 10:28:12 or at least that's my reading of the docs 2025-06-24 10:33:40 so i guess it might be working already... at least aside from the konnectivity-agent thing 2025-06-24 10:39:03 lotheac: I have something you could help with, if this is something you like to do: https://gitlab.alpinelinux.org/alpine/infra/k8s/ci-cplane-1/-/merge_requests/5 2025-06-24 10:39:16 This deploys gitlab-runner based on the helm chart from gitlab 2025-06-24 10:39:36 The problem is that the entrypoint scripts they use only allow for a single runner 2025-06-24 10:39:56 They inject the entrypoint scripts as configmaps 2025-06-24 10:40:39 It would be nice if we could have some way to deploy multiple runners on the same instance 2025-06-24 10:40:42 could always just do helm template to dump a starting point and massage the manifests as needed 2025-06-24 10:40:56 i'll check it out 2025-06-24 10:41:02 That's what I use kustomize patches for 2025-06-24 10:41:23 in the meantime i was wondering about the git repo -> cluster deployment of the kube resources 2025-06-24 10:41:34 am i correct that you're doing things semi-manually? 2025-06-24 10:42:06 lotheac: The lower-level stuff, I still do with kapp deploy 2025-06-24 10:42:30 Creating namespaces, roles etc 2025-06-24 10:42:50 But deploying the apps is automated in CI 2025-06-24 10:42:57 i warmly recommend fluxcd.
give it credentials to read git, point it to the repo url, and things automatically happen when merged 2025-06-24 10:43:15 and then you can put the secrets (encrypted) in the repo as well with sops+age 2025-06-24 10:43:22 https://gitlab.alpinelinux.org/alpine/infra/k8s/ci-cplane-1/-/jobs/1901167 2025-06-24 10:44:17 lotheac: yeah, I have been looking at fluxcd 2025-06-24 10:45:00 i haven't used kapp, but seems pretty straightforward 2025-06-24 10:45:35 the only thing i'm wondering about is -- if gitlab runners apply the kube manifests, i guess that won't happen if the runners happen to not be working :) 2025-06-24 10:45:55 yeah. There are some challenges sometimes due to kubernetes updating things as well, which kapp then sees as changes 2025-06-24 10:46:08 right, flux is pretty good at that stuff 2025-06-24 10:46:53 gitlab has integration with fluxcd, but it does require an extra agent in gitlab as well, and some configuration, which is poorly documented 2025-06-24 10:48:11 lotheac: the advantage of using kustomize patches is that you can easily update the helm chart without having to manually redo all the changes 2025-06-24 10:49:41 imho it depends on what you're deploying 2025-06-24 10:49:58 and how it's maintained upstream 2025-06-24 10:50:02 Yes, of course 2025-06-24 10:50:19 If it's completely different from what the helm chart provides, then I would not even bother with the helm chart 2025-06-24 10:50:55 i'm using a weird combination of flux Kustomize and HelmRelease resources, sometimes with postRenderer patches, and other times just straight up kustomize of more-or-less manually-maintained manifests 2025-06-24 10:51:04 depending on the thing 2025-06-24 10:51:12 nod 2025-06-24 10:52:02 lotheac: So to give a bit more insight into what my goal is: I need at least two runners, one for x86_64 jobs and one for x86 jobs 2025-06-24 10:52:43 https://gitlab.alpinelinux.org/alpine/infra/k8s/ci-cplane-1/-/blob/master/k0sctl.yaml?ref_type=heads#L25-40 2025-06-24 10:52:50
okay, that makes sense... did you try just having two instances of the helm release? 2025-06-24 10:53:29 Yes, that's always a possibility 2025-06-24 10:53:29 i'm gonna take a closer look later, but I gotta be somewhere in a few minutes so :) 2025-06-24 10:53:34 yeah, no problem 2025-06-24 10:54:03 but anyways, yeah, seems like it won't be a problem anyhow. we just want to end up with two deployments with different nodeselectors 2025-06-24 10:54:09 (or daemonsets) 2025-06-24 10:54:27 Technically we only need a single gitlab-runner instance 2025-06-24 10:55:00 sounds like i don't understand how it works on that level yet, but i will take a look :D 2025-06-24 10:55:57 The node affinity stuff is configured in the kubernetes executor config 2025-06-24 10:56:10 https://docs.gitlab.com/runner/executors/kubernetes/#define-a-list-of-node-affinities 2025-06-24 10:56:26 that basically affects the pods the runner creates for CI jobs 2025-06-24 10:57:39 (toml is really bad for this kind of deeply nested config) 2025-06-24 10:58:55 okay I see, so ”runner” is the term for the thing that manages pods for jobs 2025-06-24 10:59:22 and you want one runner to be able to manage different kinds of pods 2025-06-24 10:59:39 One agent can manage multiple runners 2025-06-24 11:00:13 But the default helm chart only supports one runner per agent 2025-06-24 11:00:58 right, gotcha 2025-06-24 11:01:56 Later I'd also like to incorporate other runners, so having one agent per runner scales quite poorly 2025-06-24 13:20:20 ikke: are the alpinelinux.org/arch=x86 machines actually running a 32-bit OS, or 32-bit hardware? 2025-06-24 13:20:53 No 2025-06-24 13:21:24 We run the containers with linux32 2025-06-24 13:22:25 ok, i see 2025-06-24 13:23:15 i was wondering about gitlab-runner-helper -- which apparently needs to run on the runner nodes, and doesn't have an x86 build available 2025-06-24 13:23:42 kubernetes.io/arch is still amd64 on these nodes?
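The "deeply nested" kubernetes executor TOML under discussion looks roughly like this for one runner pinned to the x86 nodes. This is a sketch only: the name, url and image are placeholders, and `node_selector` is the simpler cousin of the affinity list linked above; only the `alpinelinux.org/arch` label is taken from this conversation.

```shell
# One [[runners]] entry whose job pods are scheduled onto the x86 nodes
# via the alpinelinux.org/arch node label.
cat > runner-x86.toml <<'EOF'
[[runners]]
  name = "ci-x86"
  url = "https://gitlab.alpinelinux.org"
  executor = "kubernetes"
  [runners.kubernetes]
    image = "alpine:latest"
    [runners.kubernetes.node_selector]
      "alpinelinux.org/arch" = "x86"
EOF
```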
2025-06-24 13:23:56 We do have our own runner-helper image 2025-06-24 13:23:59 ah 2025-06-24 13:24:05 lotheac: yes 2025-06-24 13:24:51 how does a job specify it wants to run on an x86 runner? i'm still a bit confused about the logic in the agent 2025-06-24 13:25:02 Through tags 2025-06-24 13:25:34 With the current workflow, we create a runner in advance in gitlab, where we set tags 2025-06-24 13:26:13 That gives a token, which we provide to the runner 2025-06-24 13:26:30 So jobs select they want a runner with the x86 tag, and then only runners that have that tag will pick up the job 2025-06-24 13:26:45 i can't find anything called "tag" in [[runners]] https://docs.gitlab.com/runner/configuration/advanced-configuration/#the-runners-section that would correspond to that 2025-06-24 13:27:34 It's not part of the runner configuration itself, but associated with the token the runner uses (previously, it could be provided through the register command) 2025-06-24 13:27:48 hello, did you have some dns problem?
2025-06-24 13:27:57 fred42: Not that I'm aware of 2025-06-24 13:28:00 lotheac: https://docs.gitlab.com/runner/register/ 2025-06-24 13:28:27 hi, at work we have some trouble with the dns of the cdn if I test the dns with dnslookup it fails with 1.1.1.1 but it's a success with 8.8.8.8 2025-06-24 13:28:27 nslookup alpinelinux.org 1.1.1.1 2025-06-24 13:28:27 Server: 1.1.1.1 2025-06-24 13:28:27 Address: 1.1.1.1#53 2025-06-24 13:28:29 Non-authoritative answer: 2025-06-24 13:28:31 Name: alpinelinux.org 2025-06-24 13:28:33 Address: 213.219.36.190 2025-06-24 13:28:35 ;; Got SERVFAIL reply from 1.1.1.1 2025-06-24 13:28:37 ** server can't find alpinelinux.org: SERVFAIL 2025-06-24 13:28:39 nslookup alpinelinux.org 8.8.8.8 2025-06-24 13:28:41 Server: 8.8.8.8 2025-06-24 13:28:43 Address: 8.8.8.8#53 2025-06-24 13:28:45 Non-authoritative answer: 2025-06-24 13:28:47 Name: alpinelinux.org 2025-06-24 13:28:49 Address: 213.219.36.190 2025-06-24 13:28:51 Name: alpinelinux.org 2025-06-24 13:28:53 Address: 2a01:7e00:e000:2fc::4 2025-06-24 13:28:53 ikke: thanks 2025-06-24 13:28:58 fred42: please use a paste service 2025-06-24 13:29:05 like you've been told before 2025-06-24 13:29:53 sorry I was disconnected and didn't see a reply in the other channel 2025-06-24 13:31:03 linode is hosting our dns 2025-06-24 13:39:51 here is a link to a paste service https://paste.centos.org/view/44885497 I will look at linode status 2025-06-24 13:59:45 this particular helm chart is starting to look like it's better to just hand-manage the resources :p lots of weirdness with startup shell scripts stored in a configmap and assumptions abound 2025-06-24 14:00:24 like you said - there's no way in this chart to register multiple runners from multiple tokens 2025-06-24 14:22:53 ikke: did you consider using the runner operator?
https://docs.gitlab.com/runner/configuration/configuring_runner_operator/ 2025-06-24 14:59:19 ikke: here's one idea https://gitlab.alpinelinux.org/alpine/infra/k8s/ci-cplane-1/-/merge_requests/6 2025-06-24 15:10:27 lotheac: no, I was not aware of the operator 2025-06-24 15:10:43 that one might be a better idea 2025-06-24 15:11:05 without looking at how it's implemented :) 2025-06-24 15:11:28 at least on a theoretical level it would make sense to have the operator manage the lifecycle of the runners 2025-06-24 15:12:36 it should be able to use the reconciliation loop to make sure they're registered etc 2025-06-24 15:12:58 https://gitlab.com/gitlab-org/gl-openshift/gitlab-runner-operator 2025-06-24 15:13:17 Does it work outside of openshift? 2025-06-24 15:14:03 i don't see why it wouldn't 2025-06-24 15:14:15 "The GitLab Runner operator aims to manage the lifecycle of GitLab Runner instances in your Kubernetes or Openshift container platforms" 2025-06-24 15:14:52 "It therefore will presumably run on any container platform that is derived from Kubernetes" 2025-06-24 15:14:54 Ok, I see 2025-06-24 15:17:48 the helm chart, as well as my MR, are quite hacky...
i'm kinda expecting/assuming the operator to bear the responsibility for whatever hacks it needs to do :p on paper the CRD looks pretty good, can even apply tags separately to each runner 2025-06-24 15:18:11 Yeah, managing the runners with CRDs looks nice 2025-06-24 15:18:59 But looking at the install instructions, they want you to install an Operator Lifecycle Manager with curl | bash 2025-06-24 15:19:16 yeah, no, bad instructions :D 2025-06-24 15:22:44 that seems to be some sort of meta-operator, heh 2025-06-24 15:22:50 yeah 2025-06-24 15:22:57 give me the sauce 2025-06-24 15:23:53 https://gitlab.com/gitlab-org/gl-openshift/gitlab-runner-operator/-/releases 2025-06-24 15:24:27 yup, that's it 2025-06-24 15:25:19 though it includes something that won't apply: apiVersion: operators.coreos.com/v1alpha1, kind: ClusterServiceVersion 2025-06-24 15:25:41 could just patch that out with kustomize 2025-06-24 15:26:10 node 2025-06-24 15:26:12 nod 2025-06-24 15:27:23 but... looks like that resource is what is actually responsible for creating the operator deployment 2025-06-24 15:27:33 so it needs some mangling 2025-06-24 15:28:22 ah, no, i guess i looked at a file that was meant for openshift only. operator.k8s.yaml looks better 2025-06-24 15:28:54 ah 2025-06-24 15:39:09 lotheac: thanks, I'll look at the operator when I have time 2025-06-24 15:39:20 cheers, no problem 2025-06-25 04:31:59 ikke: on my own cluster (without a gitlab that i can register my runners on), this creates reasonable looking pods https://gitlab.alpinelinux.org/alpine/infra/k8s/ci-cplane-1/-/merge_requests/7 2025-06-25 04:32:39 https://termbin.com/lxlx looking like this 2025-06-25 05:19:44 lotheac: thanks! Note that the runner pod itself does not necessarily need to run on the builder nodes. They can just run on the control plane nodes. 
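"Could just patch that out with kustomize" might look like the delete patch below — a sketch only: the manifest filename follows the `operator.k8s.yaml` mentioned above, but the resource name is a placeholder rather than one taken from the actual release.

```shell
# Kustomization that pulls in the operator manifest but drops the
# OLM-only ClusterServiceVersion via a strategic-merge delete patch
# matched by the target selector. Names here are placeholders.
cat > kustomization.yaml <<'EOF'
resources:
  - operator.k8s.yaml
patches:
  - target:
      group: operators.coreos.com
      version: v1alpha1
      kind: ClusterServiceVersion
    patch: |-
      $patch: delete
      apiVersion: operators.coreos.com/v1alpha1
      kind: ClusterServiceVersion
      metadata:
        name: placeholder
EOF
```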
2025-06-25 05:22:25 So I think we need https://docs.gitlab.com/runner/configuration/configuring_runner_operator/#customize-configtoml-with-a-configuration-template to specify the node selector / affinity for the build pods 2025-06-25 05:23:23 But I think this looks really nice, much better than trying to do it through the regular helm chart 2025-06-25 05:28:48 sure, i mean you can just add a kustomize patch to the Deployment from operator.k8s.yaml 2025-06-25 05:28:57 if you want to make it run on cplane ndoes 2025-06-25 05:28:58 nodes* 2025-06-25 05:29:36 for the builder pods -- those which are configured in the Runner -- I already added nodeSelectors in the example 2025-06-25 05:29:58 with runner.spec.podSpec 2025-06-25 05:32:08 at least that's how i think it works :-) 2025-06-25 05:35:17 Right, that would make sense 2025-06-25 05:36:34 hm, but i guess i'm mistaken: that podSpec patch was applied to the runner pod (which you can see on my termbin paste) 2025-06-25 05:37:03 not sure if it _also_ applies those patches to the actual builder pods 2025-06-25 05:38:14 Ok, so each runner still gets a separate pod, but I guess that's how the operator works 2025-06-25 05:38:20 yeah, seems so 2025-06-25 05:38:41 Then we would probably need the explicit config.toml template 2025-06-25 05:39:10 unless there is some hook from the operator into the build pods 2025-06-25 05:39:36 i think it's worth trying out as is and seeing what the actuar builder pods look like to verify our assumptions 2025-06-25 05:39:40 actual* 2025-06-25 05:39:48 yup, will try to test it this evening 2025-06-25 05:40:15 i don't think it's necessarily a huge problem to have a separate pod per runner... unless they consume a lot of resources 2025-06-25 05:40:22 No, should not 2025-06-25 05:40:28 but i would assume they are not that hungry 2025-06-25 05:41:44 CPU: 2, MEM: 217 2025-06-25 05:42:19 that's requests or actual usage? 2025-06-25 05:42:40 the runner pod in my example had resources: {} ie.
unlimited 2025-06-25 05:43:01 (but also zero, for scheduling purposes) 2025-06-25 05:43:58 actual usage 2025-06-25 05:44:07 reading a bit more of the docs i think you're right, we probably need the custom config.toml template -- but at least we can provide that separately to each runner 2025-06-25 05:44:14 that's pretty high cpu then! 2025-06-25 05:44:15 42 MEM/R% 2025-06-25 05:45:47 I think that's millicpu 2025-06-25 05:45:52 ah okay :D 2025-06-25 05:46:02 i see an integer and assume a whole cpu 2025-06-25 05:46:20 0 %CPU/R 2025-06-25 05:46:41 yeah, no biggie then 2025-06-25 11:56:04 ikke btw: tiny nit, but the display name in https://gitlab.alpinelinux.org/alpine/infra/k8s/ci-cplane-1 is missing an "l" (cpane instead of cplane) 2025-06-25 11:56:47 can be changed from settings/general 2025-06-25 11:57:00 Ah yes. I fixed my local repo but didn't rename the project yet 2025-06-25 12:39:30 Seems the x86_64 3.22 builder is stuck, it isn't trying to pull from git 2025-06-25 12:40:02 or it's down, alternatively 2025-06-25 13:08:27 I can check later 2025-06-25 13:45:20 thanks 2025-06-25 15:07:58 messagelib was hanging 2025-06-25 20:38:57 is there something wrong with the x86_64 builder again?
I get that it takes a while to build chromium, but it's been at it for multiple hours now 2025-06-25 20:40:02 for over 6 hours apparently 2025-06-26 14:52:45 lotheac: oh in fact, I already did rename the project as well 2025-06-26 15:46:50 lotheac: seems like the permissions that the operator needs are quite broad 2025-06-26 15:47:34 a clusterrole that allows reading all secrets, creating arbitrary role bindings 2025-06-26 15:47:48 I suppose that's the limit of rbac 2025-06-27 01:18:40 ikke: i don’t think it necessarily needs all that, could probably be reduced to a role on the target ns to watch for the Runner objs and to manage pods 2025-06-27 01:19:02 but i haven’t investigated 2025-06-27 01:19:49 any roles/rolebindings it needs for builder pods could theoretically be precreated 2025-06-27 03:46:27 ikke: It's late on my end but sharing before I forget, I have a first working cut of the mirror playbook. https://gitlab.alpinelinux.org/durrendal/deploy-mirror 2025-06-27 03:49:42 there's an example of the playbook's output in the readme, just to make it easier to see what it does. But I can also spin up a VM somewhere and push your ssh pubkey to it so you can poke the resulting system if that would be more helpful. 2025-06-27 04:23:03 ikke: yes, you renamed the project (the url is correct) but the display name is still missing the l 2025-06-27 04:23:35 i guess project name/url is separate from display name 2025-06-27 04:25:10 Right, fixed 2025-06-27 04:25:34 cheers :) 2025-06-27 11:28:45 btw i think the armhf CI has not a lot of disk space left 2025-06-27 12:39:51 is the aports mirror on codeberg in any way legitimate? 2025-06-27 12:40:38 https://codeberg.org/alpinemirrorbot?tab=activity 2025-06-27 13:02:04 oh that's mine lol 2025-06-27 13:02:12 i don't even remember what i was using it for 2025-06-27 13:02:20 i guess i can remove it and stop abusing codeberg's resources 2025-06-27 15:13:53 achill: cleaning it up 2025-06-28 14:44:56 durrendal: thanks.
Reason I asked is https://gitlab.alpinelinux.org/alpine/aports/-/issues/17298 2025-06-28 14:54:04 I'd be more than happy to help with documentation. It doesn't look like I can assign the issue to myself, but if you can assign it to me I'll roll with it! 2025-06-28 14:54:48 done 2025-06-28 14:55:40 Thanks! Do we have a target date for 3.23.0? 2025-06-28 14:55:49 ~november 2025-06-28 14:55:59 releases are always may and november 2025-06-28 15:00:52 Oh perfect I'll remember that going forward :) thanks ikke! 2025-06-28 15:01:25 The milestone also lists the date 2025-06-28 15:12:22 Oh it does! I entirely missed that