Remote Dataset JSON Nested data

Hi,

I’ve got the following piece of JSON, from which I want to ingest $data.headline.title, $data.headline.translations.urls.url and $data.headline.image.url into a dataset as separate columns.

I tried with “data” as the Data Root, but I cannot get any values into the columns - probably because the arrays are nested.
Is there any way to do this?

In JSONPath format I see that e.g. $.data[*].headline.translations[0].urls[0].url should work; however, it doesn’t pick anything up in Xibo.

Any suggestions?
Thanks
Rene

{
  "start" : 0,
  "rows" : 2,
  "total" : 2,
  "data" : [
    {
      "headline" : {
        "headlineId" : "13097310",
        "status" : "publish",
        "priority" : 1,
        "pinned" : 1,
        "title" : "Blabla title1",
        "startTime" : 1548191454017,
        "endTime" : 1548796254017,
        "tenant" : "comms",
        "translations" : [
          {
            "locale" : "en_US",
            "abstractText" : "BlaBla text",
            "title" : "Blabla item title",
            "urls" : [
              {
                "contentType" : "text/html",
                "url" : "google.com",
                "mobile" : true
              }
            ]
          }
        ],
        "image" : {
          "layout" : "hdSquare",
          "height" : 300,
          "width" : 300,
          "imageContentId" : "050a8880",
          "url" : "blabla#com#ff#jpg",
          "altText" : "Blabla alt text",
          "backgroundColor" : "#000000",
          "foregroundTextColor" : "#ffffff",
          "imageLibraryId" : "04b2cb90"
        },
        "limitsTo" : [
          {}
        ]
      },
      "metrics" : {
        "rank" : 10000500
      }
    },
    {
      "headline" : {
        "headlineId" : "13097311",
        "status" : "publish",
        "priority" : 1,
        "pinned" : 1,
        "title" : "Blabla title1",
        "startTime" : 1548191454017,
        "endTime" : 1548796254017,
        "tenant" : "comms",
        "translations" : [
          {
            "locale" : "en_US",
            "abstractText" : "BlaBla text",
            "title" : "Blabla item title",
            "urls" : [
              {
                "contentType" : "text/html",
                "url" : "google#com",
                "mobile" : true
              }
            ]
          }
        ],
        "image" : {
          "layout" : "hdSquare",
          "height" : 300,
          "width" : 300,
          "imageContentId" : "050a8881",
          "url" : "blabla#com#ff#jpg",
          "altText" : "Blabla alt text",
          "backgroundColor" : "#000000",
          "foregroundTextColor" : "#ffffff",
          "imageLibraryId" : "04b2cb91"
        },
        "limitsTo" : [
          {}
        ]
      },
      "metrics" : {
        "rank" : 10000500
      }
    }
  ]
}
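
For reference, the path can be sanity-checked outside Xibo with a short script. This is a sketch using Python and the jsonpath-ng package (an assumption on my part - any JSONPath tester would do); feed.json stands for a local copy of the JSON above:

import json
from jsonpath_ng import parse  # pip install jsonpath-ng

with open("feed.json") as f:   # feed.json: a local copy of the payload above (assumed)
    doc = json.load(f)

# Note: no dots before the bracketed selectors.
expr = parse("$.data[*].headline.translations[0].urls[0].url")

# Prints one URL per element of the top-level "data" array.
for match in expr.find(doc):
    print(match.value)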

So your data root is data, as you suggest.

Then your columns are set up with the following remote data paths:

headline.title
headline.translations.0.urls.0.url
headline.image.url

That gives me the values in three separate columns, as expected.

Credit to @dan for his help with that.
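
For clarity, those dot-notation paths map onto the payload like this. This is a plain-Python sketch of the traversal, not Xibo’s actual parser; feed.json is a hypothetical local copy of the payload:

import json

with open("feed.json") as f:  # feed.json: a local copy of the payload (assumed)
    doc = json.load(f)

for row in doc["data"]:  # Data Root: data
    headline = row["headline"]
    print(headline["title"])                              # headline.title
    print(headline["translations"][0]["urls"][0]["url"])  # headline.translations.0.urls.0.url
    print(headline["image"]["url"])                       # headline.image.url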


Thanks @alex and @dan for the very comprehensive response!

I think I found what my problem is. My source is actually raw (minified) JSON, not prettified - with the raw version it doesn’t work, while with the prettified version it does. So it looks like the remote parser is not working when the raw/minified format is used.

The following is failing:

{"start":0,"rows":2,"total":2,"data":[{"headline":{"headlineId":"13097310","status":"publish","priority":1,"pinned":1,"title":"Blabla title1-RAW","startTime":1548191454017,"endTime":1548796254017,"tenant":"comms","translations":[{"locale":"en_US","abstractText":"BlaBla text","title":"Blabla item title","urls":[{"contentType":"text/html","url":"google.com","mobile":true}]}],"image":{"layout":"hdSquare","height":300,"width":300,"imageContentId":"050a8880","url":"blabla#com#ff#jpg","altText":"Blabla alt text","backgroundColor":"#000000","foregroundTextColor":"#ffffff","imageLibraryId":"04b2cb90"},"limitsTo":[{}]},"metrics":{"rank":10000500}},{"headline":{"headlineId":"13097311","status":"publish","priority":1,"pinned":1,"title":"Blabla title2-RAW","startTime":1548191454017,"endTime":1548796254017,"tenant":"comms","translations":[{"locale":"en_US","abstractText":"BlaBla text","title":"Blabla item title","urls":[{"contentType":"text/html","url":"google#com","mobile":true}]}],"image":{"layout":"hdSquare","height":300,"width":300,"imageContentId":"050a8881","url":"blabla#com#ff#jpg","altText":"Blabla alt text","backgroundColor":"#000000","foregroundTextColor":"#ffffff","imageLibraryId":"04b2cb91"},"limitsTo":[{}]},"metrics":{"rank":10000500}}]}

While this is working:

{
  "start": 0,
  "rows": 2,
  "total": 2,
  "data": [
    {
      "headline": {
        "headlineId": "13097310",
        "status": "publish",
        "priority": 1,
        "pinned": 1,
        "title": "Blabla title1-PRetty",
        "startTime": 1548191454017,
        "endTime": 1548796254017,
        "tenant": "comms",
        "translations": [
          {
            "locale": "en_US",
            "abstractText": "BlaBla text",
            "title": "Blabla item title",
            "urls": [
              {
                "contentType": "text/html",
                "url": "google.com",
                "mobile": true
              }
            ]
          }
        ],
        "image": {
          "layout": "hdSquare",
          "height": 300,
          "width": 300,
          "imageContentId": "050a8880",
          "url": "blabla#com#ff#jpg",
          "altText": "Blabla alt text",
          "backgroundColor": "#000000",
          "foregroundTextColor": "#ffffff",
          "imageLibraryId": "04b2cb90"
        },
        "limitsTo": [
          {}
        ]
      },
      "metrics": {
        "rank": 10000500
      }
    },
    {
      "headline": {
        "headlineId": "13097311",
        "status": "publish",
        "priority": 1,
        "pinned": 1,
        "title": "Blabla title2-PRetty",
        "startTime": 1548191454017,
        "endTime": 1548796254017,
        "tenant": "comms",
        "translations": [
          {
            "locale": "en_US",
            "abstractText": "BlaBla text",
            "title": "Blabla item title",
            "urls": [
              {
                "contentType": "text/html",
                "url": "google#com",
                "mobile": true
              }
            ]
          }
        ],
        "image": {
          "layout": "hdSquare",
          "height": 300,
          "width": 300,
          "imageContentId": "050a8881",
          "url": "blabla#com#ff#jpg",
          "altText": "Blabla alt text",
          "backgroundColor": "#000000",
          "foregroundTextColor": "#ffffff",
          "imageLibraryId": "04b2cb91"
        },
        "limitsTo": [
          {}
        ]
      },
      "metrics": {
        "rank": 10000500
      }
    }
  ]
}

I copied your block of JSON which you say doesn’t work into a file, and then swapped the dataset over to use that version instead.

I waited for it to sync, and the data came in as expected.

If it’s valid JSON, then it should import regardless of formatting. I’d suggest first running the version that doesn’t work through a JSON validator to check that it’s valid. If it is, then make sure it’s not in a subtly different format to the examples you’ve given us here.
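
To illustrate that whitespace makes no difference to a compliant parser, here’s a quick check (plain Python; the two short strings stand in for your raw and prettified payloads):

import json

minified = '{"start":0,"rows":2,"total":2,"data":[]}'
pretty = """{
  "start": 0,
  "rows": 2,
  "total": 2,
  "data": []
}"""

# Both strings decode to the same object - formatting is irrelevant to the parser.
assert json.loads(minified) == json.loads(pretty)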

Finally, make sure that you allow time for the dataset to be synced again after you make a change, as that won’t happen in real time (the background sync task needs to run, and the dataset needs to be eligible to be synced). You should see the last sync date/time change after that happens.

Thank you Alex for your patience.
It seems that it didn’t pick up the test files I worked with due to MD5 hash checking - I set logging to debug and am starting to see a bit more info now:

1927 722698b 2019-02-07 09:11 CONSOLE GET DEBUG / Skipping due to MD5

However, for the source I’m trying to fetch from (it’s on our intranet, so I can’t post a link), the log files are showing a problem.

When I tried via curl I see the following:
curl -I --head https://intraneturl/matches
HTTP/1.1 405 Method Not Allowed
Date: Thu, 07 Feb 2019 09:42:02 GMT
Connection: keep-alive
Allow: GET

Is there any reason why Xibo uses HEAD instead of GET?

Regards,
Rene

It uses HEAD, then GET.

The reason is that people will point that URL at any old thing - we’ve had people point it at streaming radio services, MPEG video streams, etc.

It does a HEAD request to get some idea of what the file at the end of the URL is, so it can avoid scenarios where it’s likely to download forever and fill the server’s disk with data it can’t use.
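
The general pattern is something like the following. This is a sketch using Python’s requests package to illustrate the idea - it is not Xibo’s actual code, and the URL is a placeholder:

import requests

url = "https://example.com/feed.json"  # placeholder URL

# Probe with HEAD first to learn the content type cheaply.
head = requests.head(url, allow_redirects=True, timeout=10)

if head.ok:
    content_type = head.headers.get("Content-Type", "")
    if "stream" in content_type:
        raise ValueError("Refusing to download an endless stream")

# A 405 on the HEAD is only informational - the GET still goes ahead.
body = requests.get(url, timeout=10).text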

Is there a reason your intranet server blocks HEAD requests?

In any case, it’s just an informational log line. The code continues to do the sync after that regardless, so I don’t think it’s the cause of your issue.

Skipping due to MD5 means that the feed didn’t change since the last time we polled and processed it, so it won’t be processed again (as it would result in the same data ending up in the dataset).
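
Conceptually the check is just a hash comparison. A minimal sketch (not the actual Xibo code; where the previous hash is stored is glossed over):

import hashlib

def feed_changed(body: bytes, last_md5: str) -> bool:
    """True if the fetched feed differs from the last processed copy."""
    return hashlib.md5(body).hexdigest() != last_md5

# When feed_changed(...) is False, processing is skipped and you see
# the "Skipping due to MD5" line in the log.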

Thanks @alex,
So it turns out that I’m not getting anything into the columns I added because the feed hasn’t changed, so it isn’t being polled and processed again.
It would be good to be able to fetch the feed while bypassing the cache - especially when the feed is not super-dynamic… otherwise you might need to wait two days before you can actually see whether the fields are now populated.
I set the fetch schedule to 1 minute during testing, but the effect, I guess, is that the feed synced before I got around to adding the columns, and it will never be processed again until the MD5 hash changes, so the columns stay empty until then.
Why HEAD is not working, I’ve got no idea… that’s just how the team that set up the feed configured their server, probably for no particular reason.

Regards,
Rene

So for now you could just clear the cache. That will force it to load it again.

You can simply empty the directory shared/cms/library/cache, assuming a Docker-based install.
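
If you want to script that, it’s simply the following (a sketch; the path assumes the default Docker layout, so double-check it before deleting anything):

import pathlib
import shutil

cache = pathlib.Path("shared/cms/library/cache")  # default Docker-install path (assumption)

# Remove the cached feed copies; the next sync fetches and processes the feed fresh.
for entry in cache.iterdir():
    if entry.is_dir():
        shutil.rmtree(entry)
    else:
        entry.unlink()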

Thanks @alex! Managed to clear the cache and all is working perfectly now - all items properly read and showing in my ticker again!
