bug: ingress stuck in pending/deleting #982

Open · Signum opened this issue Aug 20, 2024 · 27 comments

Signum commented Aug 20, 2024

Describe the bug

First-time user of SwiftWave here. Trying it out on a Debian 12 VM in the local network. Installation was quick and flawless.

I have now created a new app running WordPress with MariaDB from the App Store. I figured that my ingress would not work properly, so I tried to remove it and create a new one with the proper FQDN. However, the old ingress has not been removed after several minutes, and the new one is still pending:

[screenshot]

The server's log shows:

2024/08/20 10:53:21 error while invoking function for queue [ingress_rule_apply]
POST /graphql | 10.6.56.60 | 200 
POST /graphql | 10.6.56.60 | 200 
POST /graphql | 10.6.56.60 | 200 
2024/08/20 10:53:25 error while invoking function for queue [ingress_rule_delete]

Are you working on this issue?

No

@Signum Signum added the bug Something isn't working label Aug 20, 2024
@tanmoysrt (Member)

Hi @Signum, have you enabled the proxy from the server list?
To check whether the proxy has been enabled successfully, visit the server IP; it should return a Bad Gateway error page.
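
(A minimal command-line version of that check, assuming the proxy listens on port 80 of the server; SERVER_IP is a placeholder:)

# Expect an HTTP 502 once HAProxy is up (its error backend denies with 502);
# "connection refused" means the proxy is not running yet
curl -sI http://SERVER_IP/ | head -n 1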

Signum (Author) commented Aug 20, 2024

Thanks for the swift response, @tanmoysrt.

It seems I had not:
[screenshot]

Enabling ingress proxy as "Active"…
[screenshot]

The ingresses went into "Failed" and after waiting 1-2 minutes, they are now gone.

Thanks. Continuing my journey.

@Signum Signum closed this as completed Aug 20, 2024
@HWiese1980

I have a domain in my list in "failed" state and it's not going away. What's going on?

@tanmoysrt (Member)

I have a domain in my list in "failed" state and it's not going away. What's going on?

Can you share a few details, e.g. whether the proxy is enabled for any server?
Also, please give some reproducible steps.

@HWiese1980

Reproducible steps are difficult because this is an experimental setup and I do not really know what led to this state.

I'll try to remember...

I added a server, proxy disabled. I added a domain for an app. Deployed the app from a GitLab repo, building from a Dockerfile, to that server. The domain got stuck in the "pending" state, probably because the proxy was not enabled. Because I didn't know that, and also wondered why the app was deployed but did not start (docker ps -a showed no app on the server), I added another server and redeployed there. Still the app was deploying but not starting. I tried to delete the domain, which got stuck in deleting. I enabled the proxy on the second server, and the domain went into the failed state. And that's where it's sitting now.

@tanmoysrt tanmoysrt reopened this Jan 7, 2025
@tanmoysrt (Member)

@HWiese1980
Can you share a screenshot of the current state? (Hide the domain name.)

@HWiese1980

Sure can.

[screenshot: Bildschirmfoto 2025-01-08 um 07 28 26]

@HWiese1980

This is from this morning. The domain has been in this state for more than 12 hours or so now.

@tanmoysrt (Member)

This is from this morning. The domain has been in this state for more than 12 hours or so now.

If the proxy is working, just try to recreate the rule.
See the docs: https://swiftwave.org/docs/dashboard/ingress-rules#recreate--fix

@HWiese1980

Nope, doesn't work. It remains in the "failed" state. I see no indication of errors in the system log. I initiated "Recreate & Fix" at around 14:10; here's the system log. The "No change in haproxy service" messages do not seem to be related to the action.

[CRONJOB] 2025/01/08 14:09:11 server_status_monitor.go:23: Triggering Server Status Monitor Job
[CRONJOB] 2025/01/08 14:09:13 server_status_monitor.go:23: Triggering Server Status Monitor Job
[CRONJOB] 2025/01/08 14:09:14 sync_proxy_state.go:130: No change in haproxy service
[CRONJOB] 2025/01/08 14:09:14 sync_proxy_state.go:178: No change in udpproxy service
[CRONJOB] 2025/01/08 14:09:14 sync_proxy_state.go:235: No change in exposed tcp ports of haproxy service
[CRONJOB] 2025/01/08 14:09:14 sync_proxy_state.go:255: No change in exposed udp ports of udpproxy service
[... "Triggering Server Status Monitor Job" repeated every ~2 seconds from 14:09:15 to 14:10:12 ...]
[CRONJOB] 2025/01/08 14:10:14 sync_proxy_state.go:130: No change in haproxy service
[CRONJOB] 2025/01/08 14:10:14 sync_proxy_state.go:178: No change in udpproxy service
[CRONJOB] 2025/01/08 14:10:14 sync_proxy_state.go:235: No change in exposed tcp ports of haproxy service
[CRONJOB] 2025/01/08 14:10:14 sync_proxy_state.go:255: No change in exposed udp ports of udpproxy service
[... "Triggering Server Status Monitor Job" repeated every ~2 seconds from 14:10:14 to 14:11:12 ...]
[CRONJOB] 2025/01/08 14:11:14 sync_proxy_state.go:130: No change in haproxy service
[CRONJOB] 2025/01/08 14:11:14 sync_proxy_state.go:178: No change in udpproxy service
[CRONJOB] 2025/01/08 14:11:14 sync_proxy_state.go:235: No change in exposed tcp ports of haproxy service
[CRONJOB] 2025/01/08 14:11:14 sync_proxy_state.go:255: No change in exposed udp ports of udpproxy service
[... "Triggering Server Status Monitor Job" repeated every ~2 seconds from 14:11:14 to 14:11:32 ...]

tanmoysrt (Member) commented Jan 8, 2025

Hi @HWiese1980
Try disabling the proxy, then re-enabling it.
After some time, try to recreate or delete the rule again.

Also, SwiftWave is on the latest version, right?

@HWiese1980

SwiftWave is v2.2.20-1.

I've switched the proxy off and on repeatedly, and also tried to delete/recreate/fix the ingress rule in various combinations. It just persists; I can't get rid of it.

A potentially "dangerous" force-delete action would be convenient. Or an overwrite option when creating a new rule with the same parameters, which simply (after a confirmation) overwrites all the configs of the existing rule and behaves as if it were a new rule.

@HWiese1980

I think I've tried every possible permutation of settings now, including waiting. The ingress rule just won't disappear.

@tanmoysrt (Member)

@HWiese1980
I will check whether there is a bug, because during recreation it should handle cases like a duplicate record, or even no record, in the proxy.

@tanmoysrt (Member)

@HWiese1980
Can you share the HAProxy config? (Please anonymize domain names.)

Go to your proxy server and run:

cat /var/lib/swiftwave/haproxy/haproxy.cfg

@HWiese1980

Sure. There are no domain names in the config.

global
  master-worker
  maxconn 100000
  chroot /var/lib/haproxy
  user haproxy
  group haproxy
  stats socket /var/run/haproxy.sock user haproxy group haproxy mode 660 level admin expose-fd listeners

defaults
  mode http
  option forwardfor
  maxconn 4000
  log global
  option tcp-smart-accept
  timeout http-request 10s
  timeout check 10s
  timeout connect 10s
  timeout client 1m
  timeout queue 1m
  timeout server 1m
  timeout http-keep-alive 10s
  retries 3
  errorfile 502 /etc/haproxy/errors/502.http
  errorfile 503 /etc/haproxy/errors/503.http

resolvers docker
  nameserver ns1 127.0.0.11:53
  hold valid    10s
  hold other    30s
  hold refused  30s
  hold nx       30s
  hold timeout  30s
  hold obsolete 30s
  timeout resolve 2s
  timeout retry 2s
  resolve_retries 5
  accepted_payload_size 8192

frontend fe_http
  mode http
  bind :80
  acl letsencrypt-acl path_beg /.well-known
  use_backend letsencrypt_backend if letsencrypt-acl
  default_backend error_backend

frontend fe_https
  mode http
  bind :443 ssl crt /etc/haproxy/ssl/ alpn h2,http/1.1
  http-request set-header X-Forwarded-Proto https
  acl letsencrypt-acl path_beg /.well-known
  use_backend letsencrypt_backend if letsencrypt-acl
  default_backend error_backend

backend error_backend
  mode http
  http-request deny deny_status 502

backend letsencrypt_backend
  option httpchk
  http-check send meth GET uri /healthcheck hdr Host "$SWIFTWAVE_SERVICE_ADDRESS"
  http-check expect status 200
  http-request set-header Host "$SWIFTWAVE_SERVICE_ADDRESS"
  server swiftwave_service_https "$SWIFTWAVE_SERVICE_ENDPOINT" check ssl verify required ca-file /etc/ssl/certs/ca-certificates.crt check-sni "$SWIFTWAVE_SERVICE_ADDRESS" sni str("$SWIFTWAVE_SERVICE_ADDRESS")
  server swiftwave_service_http "$SWIFTWAVE_SERVICE_ENDPOINT" check

program api
  command /dataplaneapi.sh
  no option start-on-reload

HWiese1980 commented Jan 10, 2025

This may be related to my setup.

My SwiftWave runs behind a Caddy reverse proxy. I had to set up Let's Encrypt while creating the application. I do not need to use Let's Encrypt from SwiftWave, because the Caddy reverse proxy in front of it takes care of certificate creation and TLS termination.

I was able to recreate the behavior by completely starting over with SwiftWave.

Remember: SwiftWave runs behind a reverse proxy (in my case Caddy). The internet-facing domain name is already configured in Caddy, including a wildcard for the SwiftWave ingress rules and a TLS certificate issued through Let's Encrypt.

Steps:

  1. Install SwiftWave using the docs
  2. Fail because GPG is not automatically installed
  3. Install GPG
  4. Install SwiftWave
  5. Remember that SwiftWave also needs rsync; install rsync
  6. initialize and start SwiftWave according to the docs (use public domain name when asked for a domain during init)
  7. check on local registry credentials (because using "local registry" does not work if I set up SwiftWave behind a reverse proxy that does not forward the corresponding port)
  8. configure remote registry with local registry credentials and 127.0.0.1:3334
  9. configure git credentials and git repo
  10. configure a server using SSH (server runs docker and portainer)
  11. check the server's docker (multiple haproxy containers, all in "Created", none running; two udpproxy containers, one running); see the diagnostic sketch after this list
  12. set up an application domain (TLS cert is indeed issued)
  13. deploy application (docker build finishes successfully, push works, I see "Failed to create new haproxy transaction" in the logs)
  14. set up an ingress rule: failed, undeletable
  15. destroying the application is stuck
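
(A generic Docker Swarm diagnostic sketch for step 11, not SwiftWave-specific; the service and container names below are placeholders, so adjust them to whatever actually exists on the server:)

# List swarm services and spot the proxy one
docker service ls

# Show the task history for that service, including the untruncated error column
docker service ps --no-trunc SERVICE_NAME

# Or inspect one of the containers stuck in "Created" for its error, if it recorded one
docker ps -a --filter name=haproxy --format '{{.ID}} {{.Status}}'
docker inspect --format '{{.State.Error}}' CONTAINER_ID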

@tanmoysrt (Member)

@HWiese1980
Caddy is on the same server?

@HWiese1980

My workhorse Caddy is in the same network, on the same host (Proxmox) but not in the same VM. And it is unfortunately not the only Caddy in my setup.

There are actually two Caddys running. One faces the internet (on a virtual server rented from a service provider) and acts exclusively as a bastion-host gateway, forwarding only incoming :80 and :443 traffic to my actual workhorse Caddy on the Proxmox host. The workhorse Caddy is the one that does the routing to the different VMs (including SwiftWave) and the TLS termination.

@HWiese1980

Ah, the connection between the bastion host gateway Caddy and the workhorse Caddy is done through Wireguard.

@HWiese1980

HWiese1980 commented Jan 10, 2025

It might have helped to set the management_node_address in /var/lib/swiftwave/config.yml (which is also set during swiftwave init) to 127.0.0.1. This is probably because I do not route port 3333 on the public domain name.

@HWiese1980

No, setting the management_node_address to 127.0.0.1 has not helped. I was able to destroy the app after doing so (maybe because I had to restart SwiftWave), but now I am back at the start: an ingress rule I cannot delete because it's stuck in "failed", and "Failed to create new haproxy transaction" in the deployment logs.

@HWiese1980

SwiftWave is trying to destroy the app again; meanwhile I'm running docker system prune on the target server over and over, watching udpproxy and a bunch of haproxy containers get recreated after a while each time. udpproxy starts; the multiple haproxy containers remain in "Created".

@tanmoysrt (Member)

@HWiese1980
So, after enabling HAProxy, just check with docker ps whether it's running.
If something is already listening on port 80 or 443 on the VM, HAProxy will not start.
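
(A minimal sketch of that check, assuming iproute2 and the Docker CLI are available on the VM:)

# Anything already bound to 80/443 on the host will keep HAProxy from binding
sudo ss -ltnp '( sport = :80 or sport = :443 )'

# Also check for containers that already publish those ports
docker ps --filter publish=80
docker ps --filter publish=443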

@HWiese1980

Aaah, yeah, that may be the reason. Good catch.

If this is the reason (and it looks like it), I would suggest somehow catching that error. A plain "failed" is a little ambiguous; I would have expected to see something like that in the logs.

@HWiese1980

Okay, this has fixed at least the issue with undeletable and stuck ingress rules. So this is kind of solved (aside from maybe some more detailed logging and UX).

Thank you for your support! It's been a nice learning experience.

tanmoysrt (Member) commented Jan 10, 2025

@HWiese1980
The issue is with Docker Swarm. It has no mechanism to report status back when something changes.
SwiftWave polls it on a fixed interval, but in most cases the poll simply reports the service as running.

That is because every time a container fails to start, the swarm service tries to start a new one.
Polling every second might work, but that would put a lot of pressure on the server (cAdvisor has the same issue on large servers with constant polling, google/cadvisor#2459).

Without proper integration with the Docker daemon event stream, it's tough to tackle.

Most of the time this issue doesn't appear, because people are doing this on a fresh instance.

In v3.0 the integration goes one level deeper and will not even require Swarm.
Hopefully all of these issues will be solved.
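
(For reference, and explicitly not how SwiftWave is implemented today: the daemon event stream mentioned above is already reachable from the CLI, so a rough sketch of watching for failing containers without tight polling could look like this:)

# Stream container exit events instead of polling service state;
# the format string pulls the container name and exit code out of each event
docker events \
  --filter type=container \
  --filter event=die \
  --format '{{.Actor.Attributes.name}} exited with code {{index .Actor.Attributes "exitCode"}}'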
