Sharing experiences: Failed attempt at a "super XMR" service

Hi guys!

I would like to share with you my failed attempt to create an XMR “super-service” using three XMR containers and an HAProxy container as a load balancer.
I chose HAProxy because it works very well as a load balancer for TCP traffic, which is exactly what XMR uses.

I think failed attempts at new things are also interesting to share, as they can open the way for new ideas. In fact, I have seen some here that, even though they did not work, helped me with other things.

Well, basically I have about 200 CMS instances published on the web. Unfortunately, we still install them manually on Windows Server with IIS (I am currently studying how to migrate these CMS instances to Docker), anyway…

Our clients are very restrictive about opening firewall ports. The only ports I can get all of them to open are 80 and 443, so I would lose the XMR service on its default port 9505. Our solution was to host the XMR service on an Ubuntu server, receiving CMS connections on port 8080 and publishing to the players on port 443. All my CMS instances have their XMR configuration pointed at this same server.
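
For context, each CMS just has its two XMR addresses pointed at that single server. It looked something like this (the hostname is only a placeholder, and I am quoting the CMS setting names from memory):

XMR Private Address (CMS -> XMR):     tcp://xmr.example.com:8080
XMR Public Address (Players -> XMR):  tcp://xmr.example.com:443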

The big problem is that, with the high number of devices, this service became too slow and eventually could not keep up anymore.

Our idea was to improve this service by running several XMR containers with a load balancer in front of them to distribute the requests, creating a kind of turbocharged XMR to handle the high demand.

The container configurations were as follows:

version: "3"

services:

  haproxy:
    image: haproxy:2.4
    volumes:
      - ./haproxy/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg
    ports:
      - "8080:8080"
      - "443:9505"
    networks:
      - my-network

  xmr1:
    image: xibosignage/xibo-xmr:0.9
    networks:
      - my-network
    environment:
      XMR_DEBUG: "true"
      XMR_HOST: haproxy
      XMR_PORT: 8080
      XMR_CALLBACK_HOST: host.docker.internal
      XMR_CALLBACK_PORT: 443

  xmr2:
    image: xibosignage/xibo-xmr:0.9
    networks:
      - my-network
    environment:
      XMR_DEBUG: "true"
      XMR_HOST: haproxy
      XMR_PORT: 8080
      XMR_CALLBACK_HOST: host.docker.internal
      XMR_CALLBACK_PORT: 443

  xmr3:
    image: xibosignage/xibo-xmr:0.9
    networks:
      - my-network
    environment:
      XMR_DEBUG: "true"
      XMR_HOST: haproxy
      XMR_PORT: 8080
      XMR_CALLBACK_HOST: host.docker.internal
      XMR_CALLBACK_PORT: 443

networks:
  my-network:
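
With the file above in place, bringing the stack up was just standard Docker Compose usage:

docker-compose up -d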

And the configuration of the haproxy.cfg file was as follows:

global
  log stdout format raw daemon

defaults
  log global
  mode tcp
  timeout connect 10s
  timeout client 30s
  timeout server 30s

frontend tcp-in
  bind *:8080
  default_backend tcp-out

backend tcp-out
  balance roundrobin
  server xmr1 xmr1:50001
  server xmr2 xmr2:50001
  server xmr3 xmr3:50001

listen xmr-response
  bind *:9505
  mode tcp
  balance roundrobin
  server xmr1 xmr1:9505 check
  server xmr2 xmr2:9505 check
  server xmr3 xmr3:9505 check
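
Before starting the stack, the file can be sanity-checked with HAProxy's built-in syntax check (using the path it is mounted at in the compose file above):

haproxy -c -f /usr/local/etc/haproxy/haproxy.cfg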

The logic was as follows: my CMS instances send commands to my XMR server on port 8080. My HAProxy container was mapped as 8080:8080 and, inside the Docker network, it directed those requests to port 50001 of each XMR container, one at a time (round robin).

The XMR responses to the players went through port 9505 of my HAProxy container, which had the port mapping 443:9505, meaning that whatever is served on port 9505 inside the container comes straight out on port 443 of my host (yes, XMR works on port 443).
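
Putting the two paths side by side, this is just my own setup above restated as a rough sketch:

CMS commands:       CMS    -> host:8080 -> haproxy frontend "tcp-in"       -> round robin -> xmr1/xmr2/xmr3 port 50001
Player connection:  Player -> host:443  -> haproxy listen "xmr-response"   -> round robin -> xmr1/xmr2/xmr3 port 9505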

The service actually worked: the requests sent by the CMS arrived perfectly at the XMR containers, balanced in (almost) real time, and the players also managed to connect to the XMR service. The problem was that the players' reactions to the commands were incredibly slow, for some reason that doesn't make sense to me, considering that the requests and connections arrive in real time (I was validating them through the HAProxy logs and the XMR debug output).

In the end, I gave up and just ran a standalone XMR container, mapping container port 50001 to port 8080 of my host and 9505 to port 443 of my host. It ended up working very well, and my service is now running perfectly.
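
For reference, the standalone setup that ended up working is essentially just this (a minimal sketch using the same image as above, with the environment variables omitted):

version: "3"

services:
  xmr:
    image: xibosignage/xibo-xmr:0.9
    ports:
      - "8080:50001"
      - "443:9505"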

(I should have tested this first before trying something more elaborate hahahaha)

That’s the story.
Does anyone here know why I got such a terrible result, despite the requests arriving in real time?
Has anyone tried this before?

Thank you, guys!


Update*: The only command that works is “Collect Now”… all the others are not working for some reason.

Update**: I just gave up :smiley:

That’s an interesting journey!

I am not a networking expert, but I think the problem will be that the CMS does not know which XMR server to send the message to for each display. Let's say Player A connects and binds to XMR 2 - how does the CMS know to also route the message to XMR 2?

I'm sure you'll end up with some issues sending non-HTTP data down these reserved ports. If it's all still working great, then good luck!

All XMR commands are sent in the same way, except for the QOS number (a 1 to 10 value indicating which messages should have priority).

I have a project in mind to rebuild XMR on a minimal Debian container with PHP 8.2, as I suspect the performance bottleneck is more related to PHP/ReactPHP than it is to ZMQ.

Even more interesting would be a rewrite in something like Go!