Internal server error

Victor_Araya_V · February 21, 2018, 3:48am

Hi guys! First of all, sorry for my English.

My issue is: I am getting a lot of “Internal server error” messages on my Xibo 1.8.5 with 23 displays running Android players (especially within the regions).

I have another instance of Xibo (1.7.9) on the same server (Hostgator VPS / CentOS 6.9 / 2 processors / 4 GB RAM / a lot of disk space).

Version 1.7.9 is working fine, with the same Android players and 19 displays. Its layouts load very fast, but the other instance of Xibo (1.8.5) is very slow loading any layout (regions), and very frequently I get this “Internal server error”.

The mysqld process sometimes has very high CPU usage (over 170%). I disabled all stats and error logs (so MySQL has fewer queries to handle), but the problem remains.

On version 1.8.5 I could not install ZeroMQ. Is it possible that this is the problem?

Thanks in advance.

dan · February 21, 2018, 11:33am

It is unlikely that a missing XMR would cause the issue.

It would be useful to track down the root cause of the internal server error, which means looking in the logs and possibly putting the CMS into test mode, which should expose more information.

We’d expect 1.8 to be faster than 1.7, so I am sure something is wrong.

Victor_Araya_V · February 21, 2018, 1:34pm

OK dan, thank you. I put the CMS in test mode and I’m getting about 3000 records in the log table every 20 minutes. What should I look for?

Thanks again.

dan · February 21, 2018, 2:34pm

Start by looking for anything with the log level ERROR - once you have those you can paste that in here.

3000 records with test mode is quite conservative - you can turn it off again as soon as you’ve seen an internal server error.

It would be even better if you could match the time of the internal server error to the relevant section of the log.

Victor_Araya_V · February 21, 2018, 6:32pm

what do you think about it?

dan · February 21, 2018, 10:22pm

The Player errors point towards you needing to check your XMR configuration. As you’ve not been able to install it correctly, you need to set the XMR public address to empty, otherwise the Players will continue to try to communicate via XMR.

The XMDS errors relate to deadlocks and would usually point to an underpowered MySQL server (i.e. it is struggling with the demand). We cache an awful lot of XMDS data, so it’s surprising that you have so many errors here.

You’d need to do some deadlock analysis to work out what exactly was causing the issue. As you have very high CPU load, it would be interesting to see which process that is specifically - if it’s MySQL, perhaps there are some runaway database tables causing an issue (you could look at the row counts of each table to see if there is anything unusual).

I’d suggest fixing problem 1, truncating the logs and seeing if the deadlock issues remain as a result.
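
The row-count check and the deadlock analysis mentioned above could look something like the following. This is only a sketch: it uses standard MySQL statements rather than anything Xibo-specific, and it assumes the CMS database is the currently selected schema. Note that table_rows is an estimate for InnoDB tables, so treat the numbers as a rough guide.

```sql
-- Approximate row counts and sizes for every table in the current schema,
-- largest first (assumes the Xibo database is the selected schema).
SELECT table_name,
       table_rows,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
  FROM information_schema.tables
 WHERE table_schema = DATABASE()
 ORDER BY table_rows DESC;

-- The "LATEST DETECTED DEADLOCK" section of this output shows the
-- statements involved in the most recent deadlock.
SHOW ENGINE INNODB STATUS;

-- While the CPU spike is happening, list the queries MySQL is running.
SHOW FULL PROCESSLIST;
```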

Victor_Araya_V · February 21, 2018, 10:57pm

Hi dan… mmm… The public address is empty (and it always has been). And the process that reaches more than 170% CPU usage is mysql…

It is not always so high… generally it is about 1%, but suddenly it goes up. I’m sure it corresponds to the Xibo database.

Why does XMR still insert an error record?

I made sure that the XMR public address is empty (the private address is tcp:://localhost:5555).

And this is another log error:

The same thing happens with other displays, not only “10”… just in case…

dan · February 22, 2018, 9:16am

“XMR address not configured” is fine; we do log that on an empty address (I’m not sure why actually, that should probably be an “info” event - but anyway that is not the cause of your problem).

The other message you’ve posted means that the displays are requesting files that do not exist in the CMS anymore, perhaps due to an outdated cache. I’d be inclined to try deleting the cache (by default in /library/cache).

Were you able to see if you have any large tables?

Victor_Araya_V · February 22, 2018, 11:48am

Thanks Dan. I just deleted the content inside the cache folder (I will keep an eye on the log). I don’t know if this screenshot is useful for the row counts:

dan · February 22, 2018, 12:11pm

17 thousand media records and 75 thousand required files is huge (75,000/19 is roughly 4,000 required files per display!!)

Do you really have 4000 files required for each display?!

Generating that list will be the cause of the CMS spike I am sure - I guess you’ve got a lot going on in your Layouts?

Victor_Araya_V · February 22, 2018, 12:38pm

Thanks again dan!

Clearly the layouts don’t have that many files. A typical layout in the CMS has about 5 images and 5 videos, a few Twitter widgets (10 or 15), weather, clock, date, and maybe a dataset.

Well, we already found the source of the problem! Now, what could be the solution?

dan · February 22, 2018, 1:43pm

This is most likely the problem - 10 or 15 searches will generate a few hundred tweets (I guess) and 2 images per tweet? What sort of search parameters / limits have you got set up on the Twitter feeds?

Are you running XTR to remove expired files?

Victor_Araya_V · February 22, 2018, 3:07pm

I think we do:

This is a typical search:

It looks for new tweets every 20 or 30 minutes… could that be the problem?

dan · February 22, 2018, 3:27pm

Regular maintenance hasn’t ever run by the looks of it (nothing in last run date). I am sure that will be part of the issue - perhaps it’s hitting deadlocks or something.

I’m not sure where the 17k media records are coming from - it looks like that Twitter search would only return 10 tweets - max 20 images.

Victor_Araya_V · February 22, 2018, 3:32pm

I guess you’re right:

What if I manually delete all the records from the “media” table whose name column begins with “twitter_photo_”?

How can I make sure the regular maintenance runs?

dan · February 22, 2018, 5:21pm

I just can’t see why there would be that many - would you mind providing an export of your layout for us to try? You can upload to transfer.xibo.org.uk and PM me the link.

Your regular maintenance will eventually time out and try again.

Can you run the following SQL?

SELECT COUNT(*)
  FROM `media`
 WHERE mediaId NOT IN (
    SELECT mediaId
      FROM `lkwidgetmedia`
 )
   AND type = 'module';

That will tell us how many of those are not in-use right now.

If that gives a low number then I really don’t know what to say - perhaps your layout export will shed some light on it.
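
If it helps to see where the media records are coming from, a breakdown along these lines might narrow it down. It is a sketch that assumes only the media table and the type and name columns mentioned in this thread; the backslashes are there because an underscore is a single-character wildcard in LIKE.

```sql
-- Media records grouped by type, largest group first.
SELECT type, COUNT(*) AS records
  FROM `media`
 GROUP BY type
 ORDER BY records DESC;

-- How many records are Twitter photos (name starts with "twitter_photo_").
SELECT COUNT(*) AS twitter_photos
  FROM `media`
 WHERE name LIKE 'twitter\_photo\_%';
```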

Victor_Araya_V · February 22, 2018, 5:38pm

this is the result:

COUNT(*)
79

dan · February 22, 2018, 7:56pm

Well, they are all used then - that is a lot of data!

I’d be interested to see one of your Layouts so that we can examine your use case in more detail. Better yet would be a ZIP of your DB and library so that we can restore here to see if there are any optimisations that can be made.

Victor_Araya_V · April 13, 2018, 12:48pm

Hi Dan, I would like to ask about this issue again… I finally had to do a new, clean installation…
CMS: 1.8.6 / on a VPS (Hostgator) / 28 screens.

Again I’m getting this Internal server error (error 500), mainly when I load a layout with several regions. Some regions quickly show error 500. When I reload the whole page, sometimes all the regions load fine. The elements are simple: clock, text, PDF or Twitter. Which ones fail seems to be random.

This is my database:

and:

result: COUNT(*): 48.

I think the problem could be in the ‘requiredfile’ table… It grows very fast. I guess it’s Twitter. I have different layouts (about 20), and each one requests 5 different tweets every 60 minutes. The problem is that the information is always kept (I think). Shouldn’t it be cleaned up automatically after a certain number of hours?

The tasks work perfectly, by the way.

Maybe this is important. When I run:

select count(*) from `requiredfile` where path like '%twitter_photo%'

the result is: COUNT(*) 31854.
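
To see what share of all required files those Twitter photos account for, a single query along these lines could be used. It is a sketch that relies only on the path column used above; in MySQL a comparison evaluates to 1 or 0, so summing it counts the matching rows.

```sql
-- Total required files next to the number that are Twitter photos.
SELECT COUNT(*) AS total_required_files,
       SUM(path LIKE '%twitter_photo%') AS twitter_photo_files
  FROM `requiredfile`;
```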

How can I clean this up?

Thanks again!

Victor_Araya_V · April 22, 2018, 6:01pm

Hi Dan, I would like to ask about this again… I have emptied the public address, but the players still try to connect and generate a lot of errors: “ERROR: Unable to start XMR queue: class java.lang.Exception/XMR address not configured.” or “XMR unresponsive, issue reconfigure.”

I have version 1.8.9 installed. This should not happen, right?