Downloaded file failed MD5 check

I am not sure, but I think the files were read/written OK. (Win8 on eMMC!)

Try this script, and press F5 while it is running. You will see that the MD5 is not correct.

<?php
// $file = 'ticker_5af5fdeca25e46dbf6c04c2006435e18';
$file = 'in.jpg';
downloadFile('http://www.nasa.gov/sites/default/files/thumbnails/image/hall-thruster.jpg', $file);

$storedAs = 'out.jpg';
if (!@copy($file, $storedAs)) {
    print('Error storing file.');
}

// Calculate the MD5 and the file size
$md5      = md5_file($storedAs);
$fileSize = filesize($storedAs);

echo $md5;

function downloadFile($url, $savePath)
{
    // Use CURL to download a file
    // Open the file handle
    $fileHandle = fopen($savePath, 'w+');

    // Configure CURL with the file handle
    $httpOptions = array(
        CURLOPT_TIMEOUT => 50,
        CURLOPT_SSL_VERIFYPEER => true,
        CURLOPT_USERAGENT => 'test',
        CURLOPT_HEADER => false,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_URL => $url,
        CURLOPT_FILE => $fileHandle
    );

    $curl = curl_init();

    // Set our options
    curl_setopt_array($curl, $httpOptions);

    // Exec saves the file
    curl_exec($curl);

    // Close the curl connection
    curl_close($curl);

    // Close the file handle
    fclose($fileHandle);
}
?>

So I ran your file.

Your script outputs:
be8dbdc0e34a94d916f828c3cfea10f0

If I check the file with md5sum I get:
be8dbdc0e34a94d916f828c3cfea10f0

If I download the file myself I get:
be8dbdc0e34a94d916f828c3cfea10f0

So I don’t see the problem?

Are you repeatedly pressing F5 over and over so you’re triggering multiple downloads to run simultaneously and therefore to write to the same file? If so you will see md5 changing as multiple threads write at the same time.
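
Whether or not overlapping requests are the cause here, one way to make them harmless is to have each request write to its own temporary file and only move it into place once it is complete. A minimal sketch, assuming a hypothetical helper `saveAtomically()` and a caller-supplied writer callback (neither is part of Xibo):

```php
<?php
// Hypothetical helper, not part of Xibo: write via a uniquely named temporary
// file, then rename it to the final path once the write has finished.
function saveAtomically($finalPath, callable $writer)
{
    // tempnam() creates a uniquely named empty file in the target directory
    $tmpPath = tempnam(dirname($finalPath), 'dl_');
    if ($tmpPath === false) {
        return false;
    }

    $handle = fopen($tmpPath, 'wb');
    if ($handle === false) {
        @unlink($tmpPath);
        return false;
    }

    // The writer fills the handle, e.g. by handing it to cURL as CURLOPT_FILE
    $ok = (bool) $writer($handle);
    fclose($handle);

    // Only a completed download replaces the final file
    if (!$ok || !rename($tmpPath, $finalPath)) {
        @unlink($tmpPath);
        return false;
    }

    return true;
}
```

With this shape, two requests that overlap each write their own temporary file, and whichever finishes last simply replaces the other's complete copy.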

No, I press F5 only once or twice.

As I wrote, I think I have more than one problem.
I don’t have this problem on the Linux version; there I get the same result as you.
Maybe it only happens on Windows/XAMPP.

EDIT: I am going to try it on Win7/XAMPP…

We don’t want a cache per instance of the item - we want a system wide cache - that is why we use the URL in the hash so that we don’t download the same temporary image if the feed is used in multiple layouts. Including the mediaId will give an assignment specific cache and still won’t guarantee the single instance downloads (multiple players can reference the same layout).
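
To illustrate the idea of a system wide cache: if the file name is derived from the URL alone, every layout that references the same image resolves to the same cached file. This is only a sketch, assuming an MD5 of the URL and the `ticker_` prefix seen in the test script above; `cacheFileName()` is a hypothetical name, not the CMS function.

```php
<?php
// Hypothetical helper: the cache name depends only on the URL, so it is the
// same regardless of which layout, media item or player asks for the image.
function cacheFileName($url)
{
    return 'ticker_' . md5($url);
}

$a = cacheFileName('http://server/path/file.jpg?randomstring');
$b = cacheFileName('http://server/path/file.jpg?randomstring');
// $a and $b are identical, so the image only needs to be stored once
```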

flock / fopen

I thought that cURL would get a write lock on that file… but perhaps not, I can’t find anything conclusive either way.

I don’t see any harm in having flock implemented.
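
For reference, a minimal sketch of what taking a lock with `flock` around the write could look like. The helper name `writeWithLock()` is hypothetical, and whether a lock actually helps with this problem is not established in this thread.

```php
<?php
// Hypothetical helper: hold an exclusive lock while the file is rewritten.
function writeWithLock($path, $data)
{
    // 'c+b' opens for read/write without truncating, so the existing file
    // is not emptied before the lock is held
    $handle = fopen($path, 'c+b');
    if ($handle === false) {
        return false;
    }

    // Block until this process holds the exclusive lock
    if (!flock($handle, LOCK_EX)) {
        fclose($handle);
        return false;
    }

    // Now it is safe to discard the old content and write the new one
    ftruncate($handle, 0);
    rewind($handle);
    $written = fwrite($handle, $data);
    fflush($handle);

    flock($handle, LOCK_UN);
    fclose($handle);

    return $written === strlen($data);
}
```

Note the use of `c+b` rather than `w+`: with `w+` the file is truncated by `fopen` itself, before any lock can be requested.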

OK, good to know that having the same image in multiple feeds should be no problem.

OK, I don’t know why it is needed on some OSes/web servers.

flock problem: the same happens on Win7 with XAMPP.
Maybe an alternative solution: http://php.net/manual/en/function.curl-multi-exec.php

What version of XAMPP have you got installed?

xampp-win32-5.6.3-0-VC11-installer[1].exe
On Win7 the error appears less often than on Win8 (maybe because of a slower hard disk/system).

So if I install Win 7 (I don’t have access to 8 at the moment) and that XAMPP, and Xibo 1.7.3 and configure a player to talk to that CMS with that ticker feed I’ll see the issue?

I really don’t know. You don’t need to try; I have to do more tests…
It may be related to hardware (network hardware/network card, wireless or wired network, whatever); I don’t know.

I’ve got a Win7 machine setup with the same XAMPP and have managed to reproduce this in that setup. I’m not sure why though at this point.

I applied your proposed locking patch and if anything the problem got worse at that point.

With regard to the Android Player not liking the lg feed, it’s because some of the images in there are around 34MB in size. That’s more memory than we’d be allocated on most devices so it’s not possible for the Player to rescale an image that large.

I’m not sure why you would want to remove the CURLOPT_FOLLOWLOCATION cURL option?

That will break feeds that use a redirection when accessing the image (as I suspect the NASA feed does since the image URLs are http://server/path/file.jpg?randomstring)

OK, but I think all 30/60 images get downloaded. I will check whether there are redirects.
I removed it because of this:

CURLOPT_FOLLOWLOCATION cannot be activated when safe_mode is enabled or an open_basedir is set

Right. I see. Well safemode isn’t enabled on the XAMPP installation?

Which have you got turned on, out of interest? safe_mode or open_basedir?

I think we ought to introduce an install check for that so people stop installing on PHP with safe_mode turned on as it can break all kinds of things.
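
A rough sketch of what such an install check could look like. The function name `redirectBlockers()` and its optional parameters are hypothetical; the parameters exist only so that the check can be tried without changing php.ini.

```php
<?php
// Hypothetical install check: report which settings would stop
// CURLOPT_FOLLOWLOCATION from being activated.
function redirectBlockers($safeMode = null, $openBasedir = null)
{
    // Fall back to the live configuration when no value is passed in.
    // ini_get() returns false for settings that do not exist.
    if ($safeMode === null) {
        $safeMode = ini_get('safe_mode');
    }
    if ($openBasedir === null) {
        $openBasedir = ini_get('open_basedir');
    }

    $blockers = array();

    // ini values arrive as strings such as "1" or "On"
    $enabledValues = array('1', 'on', 'true', 'yes');
    if (in_array(strtolower((string) $safeMode), $enabledValues, true)) {
        $blockers[] = 'safe_mode';
    }

    // Any non-empty open_basedir means a restriction is in place
    if (trim((string) $openBasedir) !== '') {
        $blockers[] = 'open_basedir';
    }

    return $blockers;
}
```

An installer could call `redirectBlockers()` with no arguments and show a warning for each entry in the returned array.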

safe_mode is off; there is only the open_basedir restriction…
I have written to my provider to ask whether he can change the configuration (and install ca-certificates).
Yes, safe_mode is off on XAMPP.
And yes, that check is OK. safe_mode has been removed since PHP 5.4.
Locks in Symfony (I don’t know if that is the problem, but it is a good reference: https://github.com/symfony/Filesystem/blob/master/LockHandler.php)

As I said I applied your code and actually it made it happen more often.

I don’t think it’s a locking problem because the files in the CMS library are fine and aren’t corrupt at all. If it was locking then they’d be corrupted.

Yes, I know. I don’t know whether my solution is correct, but on XAMPP I got MD5 errors with my test script. I really don’t know; I only had a look at the code, where they write:

usleep(100); // Give some time for chmod() to complete

or

// On Windows, even if PHP doc says the contrary, LOCK_NB works, see
// PHP :: Doc Bug #54129 :: using LOCK_NB whilst aquiring a lock with flock() does work on windows

I am just about to look at this on Alex’s recreation - in summary here is what I know before looking:

  • Files are downloaded correctly to the CMS and can be opened
  • Not all files are downloaded correctly to the Player
  • The failing files are different each time

Yes, I can confirm that. (Screenshots in the first post.)
As I said before, it is not so easy; I think I have multiple problems. Sometimes there was the wrong md5sum in the DB! That is why I wrote the new function. Maybe that is another problem.

I think I have located part of the problem - there is a concurrency issue whereby the CMS adds the same resource to the media table twice. That then causes the file to be downloaded by the player twice, with the second download overwriting the first one.

I’ve added a manual comparison to XMDS which will prevent those duplicates arriving at the Player, which should fix the effect, even though it doesn’t fix the cause.

Please ignore the removal of white space - not sure how the white spaces got in there, but they are gone now :smile:

OK, I am going to test that.
Line 404 should read:

$pathsAdded[] = 'media_' . $path;

instead of

 $pathsAdded = 'media_' . $path;
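
The difference matters because `[]` appends to the array, whereas a plain assignment replaces the whole array with a single string, so a later lookup no longer has a list of earlier paths to search. A simplified sketch of the de-duplication idea follows; `uniqueMediaPaths()` and the sample paths are hypothetical and this is not the actual XMDS code.

```php
<?php
// Hypothetical, simplified version of the duplicate check: each path is
// remembered in $pathsAdded and skipped if it has been seen before.
function uniqueMediaPaths(array $paths)
{
    $pathsAdded = array();
    $result = array();

    foreach ($paths as $path) {
        $key = 'media_' . $path;

        // Skip anything that was already sent
        if (in_array($key, $pathsAdded, true)) {
            continue;
        }

        // Append with [] so earlier entries are kept
        $pathsAdded[] = $key;
        $result[] = $path;
    }

    return $result;
}

$files = uniqueMediaPaths(array('a.jpg', 'b.jpg', 'a.jpg'));
// $files now holds 'a.jpg' and 'b.jpg' once each
```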