Proof of Play: How should Xibo collect this information?

dan · May 11, 2017, 7:38am

We feel that the current way Xibo collects and aggregates proof of play statistics is unsustainable. Since they were added to the CMS, we have chosen the most detail and most power, which results in a lot of records.

Current

Currently each Player will collect a “stat” for each play of each widget on a Layout and a “stat” for the play of the entire Layout. It will buffer these records locally, and when it has a sufficient quantity will send them to the CMS. The CMS will store them in the stat table.

The statistics reports query the stat table, aggregate (or not) the results and present them to the user.

Stats are periodically archived to CSV in the Library, or deleted, at the user's preference.

Issues

The principal issues with this are:

A layout containing a lot of short duration widgets generates a lot of data (it is common to have a text item or image that only lasts 10 seconds) - imagine a 3 region layout with a 10 second widget in each region - that is 4 stat records every 10 seconds which will be recorded.
Lots of stat records mean lots of bandwidth consumption between Player/CMS
If the Players go offline for periods of time, it is not uncommon to have a backlog of stat records that cannot possibly be sent in the time the player is connected.
Lots of stats means a big (really big in some cases) stat table
A big stats table is hard to query on DB servers that have a hardware spec suitable for the rest of Xibo - recording stats inflates the hardware requirement way beyond what is actually required
Stats take a lot of disk space
Stats have to be archived to mitigate the above - effectively throwing away all the hard CPU/bandwidth to get them there
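
To put a rough number on the first issue, here is a back-of-the-envelope sketch. It assumes the layout plays continuously all day and that the layout itself is also 10 seconds long; the function name and parameters are made up for illustration and are not part of Xibo.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def stat_records_per_day(regions: int, widget_seconds: int,
                         layout_seconds: int, displays: int = 1) -> int:
    """Estimate stat records per day for one continuously looping layout.

    Assumes one widget per region, each lasting widget_seconds, plus one
    layout-level stat for every complete play of the layout.
    """
    widget_plays = regions * (SECONDS_PER_DAY // widget_seconds)
    layout_plays = SECONDS_PER_DAY // layout_seconds
    return displays * (widget_plays + layout_plays)


# The example from the list above: 3 regions, 10 second widgets.
print(stat_records_per_day(3, 10, 10))                # 34560
print(stat_records_per_day(3, 10, 10, displays=100))  # 3456000
```

So one display running that layout all day produces about 35 thousand records, and a hundred displays produce about 3.5 million per day.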

The current method does however give you absolute detail, down to each individual play, which is very powerful.

Ideas for improvement

We have two ideas for how this could be improved - there may be better ideas, comments welcome.

Don’t store them in the database, store them in elastic search (or similar). This almost completely solves the issue of CMS resources being used - elastic search could be moved to much slower spinning disks, and the database is healthier and uses fewer resources. It is still slow to query, but we mitigate that by generating regular automatic reports with aggregated data.
Don’t store them to such detail. Have a minimum aggregation of 1 day and perform initial aggregation on the Player so that results are aggregated to the collection interval (each collection interval you get an aggregated view of what has played during that time). Further aggregate when the stats arrive at the CMS so that records are stored per day, per layout, per widget/media.
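
As a rough illustration of the second idea, the sketch below rolls individual play records up to one row per day, per layout, per widget. The `PlayRecord` fields and the `aggregate_by_day` function are hypothetical and do not reflect the actual stat table or Player code; the same roll-up could in principle run on the Player (per collection interval) and again on the CMS.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class PlayRecord:
    # Hypothetical shape of a single proof of play record.
    layout_id: int
    widget_id: Optional[int]  # None for a layout-level stat
    start: datetime
    duration_seconds: int


# (day, layout_id, widget_id) -> (number of plays, total seconds played)
DailyTotals = Dict[Tuple[date, int, Optional[int]], Tuple[int, int]]


def aggregate_by_day(records: Iterable[PlayRecord]) -> DailyTotals:
    """Collapse individual plays into one total per day/layout/widget.

    A play that spans midnight is counted entirely on the day it started.
    """
    totals = defaultdict(lambda: [0, 0])
    for record in records:
        key = (record.start.date(), record.layout_id, record.widget_id)
        totals[key][0] += 1
        totals[key][1] += record.duration_seconds
    return {key: (count, seconds) for key, (count, seconds) in totals.items()}


plays = [
    PlayRecord(1, 7, datetime(2017, 5, 11, 9, 0, 0), 10),
    PlayRecord(1, 7, datetime(2017, 5, 11, 9, 0, 10), 10),
    PlayRecord(1, 7, datetime(2017, 5, 12, 9, 0, 0), 10),
]
for key, value in sorted(aggregate_by_day(plays).items()):
    print(key, value)
```

Three individual plays become two stored rows here. The saving grows with the number of plays per day, at the cost of no longer knowing exactly when each play happened.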

Further to this - both options would also include the ability to set whether stats recording should be collected on the Layout/Widget/Media, allowing the user to ignore things like the default layout and only collect stats on what matters.

Both options would also involve deleting the current stats on upgrade.

Comments and suggestions welcome.

cslaughter · May 11, 2017, 4:44pm

I think a combination of the ideas for improvement is the way to go.

I would like the ability to see full stats for X days, which could be configured. Stats older than that are moved to elastic search.

In addition, having the ability to select the type of stats being recorded is huge. When you have hundreds of screens all sending data back on 10 second media plays with data you don’t need, that eats bandwidth, CPU and disk space. Anything that is not needed, not being sent, will save resources all around.

I personally would be fine with needing to delete the stats on upgrade if it was for the initial upgrade to improve the stats. This is because currently they are almost useless for us, so getting working functionality supersedes keeping the data we have limited access to. If the stats would need to be cleared before each upgrade thereafter… that could be a problem.

dan · May 12, 2017, 6:56am

That is essentially what we have already, except older stats are deleted (or archived to CSV)? By “full stats” you mean non-aggregated?

Elastic Search would be the back-end for storage in case one - they would go into elastic search and be queried from elastic search. It is more of a back-end issue rather than anything you’d notice from the user interface. The downside is that elastic search is much harder to configure than our current stack.

So I guess the question is why do you want to see individual records for X days?

To be precise - I think it would be an “opt-in” rather than an “opt-out” so that by default users do not get the records they might not want. Or perhaps a system of “On/Off/Inherit” where the default is inherit and the top level is off.
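
To sketch how “On/Off/Inherit” could resolve, assuming a chain that runs from the most specific item (for example Media or Widget) up through the Layout to a top-level default of Off: the names below are illustrative only, not an existing Xibo API.

```python
from enum import Enum
from typing import Sequence


class StatSetting(Enum):
    ON = "on"
    OFF = "off"
    INHERIT = "inherit"


def should_collect_stats(chain: Sequence[StatSetting],
                         top_level: StatSetting = StatSetting.OFF) -> bool:
    """Resolve whether stats are collected for an item.

    chain lists the settings from most specific to least specific,
    e.g. [widget_setting, layout_setting]. The first value that is not
    INHERIT wins; if everything inherits, the top level decides.
    """
    for setting in chain:
        if setting is not StatSetting.INHERIT:
            return setting is StatSetting.ON
    return top_level is StatSetting.ON


# Everything left at the default: nothing is recorded (opt-in).
print(should_collect_stats([StatSetting.INHERIT, StatSetting.INHERIT]))  # False
# Layout switched on, widget inherits from it.
print(should_collect_stats([StatSetting.INHERIT, StatSetting.ON]))       # True
# Layout on, but this widget (say, a clock) explicitly off.
print(should_collect_stats([StatSetting.OFF, StatSetting.ON]))           # False
```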

Just one upgrade - we could even leave the old stat table there I suppose, although it seems a little pointless. Our issue is that if we decide to operate on those records in any way, most upgrades will fail as there is too much data in that table to work with.

sgiunchi · May 12, 2017, 4:48pm

I second this. I don’t need to know how many times the clock has played, but I do need a report on how many times an advertisement has.

cslaughter · May 15, 2017, 6:27am

Yes. I just want to make sure that, if the functionality of stats is changed, access to the same level of stats would still be possible if wanted, for a short time span, while keeping the impact on the database at an acceptable level. (Still voting for an option to clear logs and stats before the database upgrade in a CMS upgrade.)

Many clients want to double check that ads are playing during X time. So having access for X number of days to see exactly where and when an ad showed is a plus in showing them they are getting what they paid for.

Works for me. I really do like the idea of only collecting data on what I want for stats.

Works for me as well.

I know this is only a feature idea at the moment, but is there any rough idea of when these changes might be seen in the CMS? Would it be 1.9 or 2.0?

dan · May 15, 2017, 6:57am

I think the problem is that any proposal that allows a user to collect full stats has the potential to arrive at the current situation, which of course is what we wanted to avoid.

We will have a think about how we can do what you’ve suggested.

We’d like to solve it for 1.9 as it is a pressing issue for many - particularly CMS instances with a lot of Displays.