Use PHP to delete file types in a directory and its subdirectories

I'm having my graphic design students upload files to a server. I want the following PHP code to recursively delete all .js and .php files in the uploaded directories and any subdirectories. Currently, the code only deletes the .js and .php files in the top-level directory, not in the subdirectories as well.

OK, I actually got the code to work myself while I was waiting for the bounty to be filled. So, instead, I'm now looking for someone to add the function of stripping out any code between <?php ?> and <script></script> tags inside of HTML files that get uploaded.

<?php

$ds = DIRECTORY_SEPARATOR;
$storeFolder1 = '../../testserverOne/';
$storeFolder2 = '../../testserverTwo/';
$rand = substr(md5(time()),0,8);

if (!empty($_FILES)) {

    $tempFile = $_FILES['file']['tmp_name'];

    $targetPath1 = dirname( __FILE__ ) . $ds . $storeFolder1 . $ds;
    $targetPath2 = dirname( __FILE__ ) . $ds . $storeFolder2 . $ds;

    $fullPath1 = $storeFolder1.rtrim($_POST['path'], "/.");
    $fullPath2 = $storeFolder2.rtrim($_POST['path'], "/.");

    $folder1 = substr($fullPath1, 0, strrpos($fullPath1, "/"));
    $folder2 = substr($fullPath2, 0, strrpos($fullPath2, "/"));

    // Clear the umask once so 0777 applies, create each missing folder, then restore the umask.
    $oldMask = umask(0);
    if (!is_dir($folder1)) {
        mkdir($folder1, 0777, true);
    }
    if (!is_dir($folder2)) {
        mkdir($folder2, 0777, true);
    }
    umask($oldMask);

   $directory1 = $storeFolder1;
    $iterator1 = new RecursiveDirectoryIterator($directory1);
    $directoryIterator1 = new RecursiveIteratorIterator($iterator1);
    foreach ($directoryIterator1 as $file1) {
        $extension1 = $file1->getExtension();
        // Parentheses matter here: && binds tighter than ||
        if (!$file1->isDir() && ($extension1 == 'php' || $extension1 == 'js')) {
            unlink($file1->getPathname());
        }
    }

    $directory2 = $storeFolder2;
    $iterator2 = new RecursiveDirectoryIterator($directory2);
    $directoryIterator2 = new RecursiveIteratorIterator($iterator2);
    foreach ($directoryIterator2 as $file2) {
        $extension2 = $file2->getExtension();
        // Parentheses matter here: && binds tighter than ||
        if (!$file2->isDir() && ($extension2 == 'php' || $extension2 == 'js')) {
            unlink($file2->getPathname());
        }
    }

   //array_map('unlink', glob($folder1."/*.js"));
  //array_map('unlink', glob($folder1."/*.php"));

    if (move_uploaded_file($tempFile, $fullPath1)) {
        copy($fullPath1, $fullPath2);
        die('Uploaded');
    } else {
        die('Error');
    }

}

?>
I've since got the code to work before any solutions were supplied, so I've updated the bounty with new parameters.
imokyourok 3 months ago
Tags
PHP


2 Solutions


To strip JavaScript and PHP code from HTML files, you can use this function:

function strip_code($file) {
  if (preg_match('~[.]html?$~', $file) > 0) {
    $data = file_get_contents($file);
    $patterns = [
      // PHP's PCRE has no "g" modifier; preg_replace() already replaces every match
      '~<[?](?:php)?.*?[?]>~ims',
      '~<script[^>]*>(.*?)</script>~ims',
    ];

    foreach ($patterns as $pattern) {
      $data = preg_replace($pattern, '', $data);
    }

    file_put_contents($file, $data, \LOCK_EX);
  }

  return $file;
}

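And call it after move_uploaded_file, once the file has its final name (the temporary upload name does not end in .html or .htm, so strip_code would skip it):

if (move_uploaded_file($tempFile, $fullPath1)) {
  strip_code($fullPath1);
  copy($fullPath1, $fullPath2);
  die('Uploaded');
} else {
  die('Error');
}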
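
To check the two patterns without touching any files, they can be run against a string first. This is only a minimal sketch: `strip_code_from_string` is a hypothetical helper that is not part of the code above, and the sample markup is made up. Note that the patterns carry no `g` flag, because `preg_replace` replaces all matches by default and PHP rejects `g` as an unknown modifier.

```php
<?php
// Hypothetical helper (not from the thread): removes PHP blocks and
// <script> elements from an HTML string using the same two patterns.
function strip_code_from_string(string $html): string
{
    $patterns = [
        '~<[?](?:php)?.*?[?]>~is',          // PHP blocks, long or short open tag
        '~<script\b[^>]*>.*?</script>~is',  // script elements and their contents
    ];

    $result = preg_replace($patterns, '', $html);

    // preg_replace() returns null on a regex error; fail loudly instead of
    // writing an empty result back to disk.
    if ($result === null) {
        throw new RuntimeException('preg_replace failed: ' . preg_last_error());
    }

    return $result;
}

$sample = "<p>Hi</p><?php echo 'x'; ?><script type=\"text/javascript\">alert(1);</script><p>Bye</p>";
echo strip_code_from_string($sample), PHP_EOL;
```

Regular expressions only approximate HTML parsing, so treat this as a first line of defense rather than a complete sanitizer.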

Here's a different approach. I took the liberty of improving your code and fixing some issues too:

<?php

$storeFolder1 = '../../testserverOne/';
$storeFolder2 = '../../testserverTwo/';

if (empty($_FILES) !== true) {
    $file = $_FILES['file']['tmp_name'];
    $path1 = $storeFolder1 . rtrim($_POST['path'], '/.');
    $path2 = $storeFolder2 . rtrim($_POST['path'], '/.');

    $mask = umask(0);

    if (is_dir(dirname($path1)) !== true) {
        mkdir(dirname($path1), 0777, true);
    }

    if (is_dir(dirname($path2)) !== true) {
        mkdir(dirname($path2), 0777, true);
    }

    umask($mask);

    $success = false;

    if (move_uploaded_file($file, $path1) === true) {
        // Strip the moved file, then copy it; $path2 does not exist yet, so it cannot be stripped first
        $success = copy(strip_code($path1), $path2);
    }

    if (true) { // there are most likely better ways to do what you want
        strip_files($storeFolder1);
        strip_files($storeFolder2);
    }

    die($success ? 'Uploaded' : 'Error');
}

function strip_code($file)
{
    if (preg_match('~[.]html?$~i', $file) > 0) {
        $data = file_get_contents($file);
        $patterns = [
            // PHP's PCRE has no "g" modifier; preg_replace() already replaces every match
            '~<[?](?:php)?.*?[?]>~ims',
            '~<script[^>]*>(.*?)</script>~ims',
        ];

        file_put_contents($file, preg_replace($patterns, '', $data), \LOCK_EX);
    }

    return $file;
}

function strip_files($path)
{
    if (is_dir($path) === true) {
        $it = new RecursiveDirectoryIterator($path);
        $files = new RecursiveIteratorIterator($it, RecursiveIteratorIterator::CHILD_FIRST);

        foreach ($files as $file) {
            // Test the current entry, not the root $path (which is a directory here)
            if ($file->isFile() || $file->isLink()) {
                if (preg_match('~[.](?:js|php)$~i', $file->getPathname()) > 0) {
                    unlink($file->getPathname());
                }
            }
        }
    } else if ((is_file($path) === true) || (is_link($path) === true)) {
        // $file is not defined in this branch; match against $path itself
        if (preg_match('~[.](?:js|php)$~i', $path) > 0) {
            unlink($path);
        }
    }
}

I don't think you should be recursively pruning files on every upload, but without understanding your use case it's hard to suggest alternative solutions.
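
If the target directories start out clean, one option is to refuse script files on the server at upload time instead of pruning afterwards. A minimal sketch, assuming a hypothetical helper named `is_blocked_upload` and a blocked list of just `php` and `js`; both the name and the list are illustrative, and a server configured to execute other extensions would need a longer list:

```php
<?php
// Hypothetical helper (not from the thread): true when the client-supplied
// path ends in a blocked extension, compared case-insensitively.
function is_blocked_upload(string $path, array $blocked = ['php', 'js']): bool
{
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return in_array($extension, $blocked, true);
}

// Possible use inside the upload handler, before move_uploaded_file():
// if (is_blocked_upload($_POST['path'])) {
//     die('Uploaded'); // skip the file without telling the client
// }
```

The check still runs server side, so nothing has to change in the upload form.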

I added the function and updated the call, but the script tags (and their contents) aren't being removed from the HTML files. This needs to work recursively as well.
imokyourok 3 months ago
@imokyourok: Your code only uploads one file at a time, so why does it need to work recursively? As for the files that need to be stripped of the tags, are you sure their extension is .html or .htm?
alixaxel 3 months ago
I did not know that the files were supposed to be stripped on transfer, so it wouldn't need to be recursive in that regard. The files I uploaded were .html, and it's still not working. But you bring up a good point: the solution needs to strip the tags from both .htm and .html files.
imokyourok 3 months ago
@imokyourok: The code that I shared already deals with HTML/HTM extensions, and it should work. Your code is a bit hard to follow, as there are several unused/unneeded variables, and the logic is not very clear either: you only upload a single file at a time, and every time any file is uploaded you recursively traverse two directories to prune PHP and JS files. Why? Assuming your directories are in a clean state (no JS/PHP files), you could simply not upload the file in the first place if it had either a JS or PHP extension. Besides, you're uploading the new file after you prune the directories of PHP/JS files, which means a PHP/JS file can live there indefinitely.
alixaxel 3 months ago
That being said, I suspect the HTML substitution is likely not working for you because of lack of permissions on your PHP temporary upload directory, I'll post a different approach shortly.
alixaxel 3 months ago
I won't be able to look at your solution till much later today. It's the end of the semester and I have lots of grading to do today. To answer your question, this script uses dropzone.js to allow my students to upload their HTML/CSS files and put them in a directory on my server for testing their design choices for performance, where to make breakpoints, etc. I don't want to put the file type upload limits on the user side. Instead of them seeing a file isn't uploaded, I'd rather take care of the sanitizing server side. Also, I believe you when you say the code is a bit of a mess. I'm a graphic design educator who cobbled this together. So cleaning the code up is appreciated as long as it serves the same functions. As I said, I will test this later today.
imokyourok 3 months ago
@imokyourok No worries, take your time. I was right in assuming that you don't need to prune the files on every file upload then. It might be good for you to try the updated version to rule out any issues you might be having with your current server configuration, and then I can make some changes to address the pruning. Good luck with the grading!
alixaxel 3 months ago
Just tried the code. It's not deleting the .php or .js files or stripping out any script tags.
imokyourok 3 months ago

Try this:

    <?php
    $ds = DIRECTORY_SEPARATOR;
    $storeFolder1 = './../testserverOne/';
    $storeFolder2 = './../testserverTwo/';

    if (!empty($_FILES)) {
        $tempFile = $_FILES['file']['tmp_name'];
        $targetPath1 = dirname( __FILE__ ) . $ds . $storeFolder1 . $ds;
        $targetPath2 = dirname( __FILE__ ) . $ds . $storeFolder2 . $ds;

        $fullPath1 = $storeFolder1.rtrim($_POST['path'], "/.");
        $fullPath2 = $storeFolder2.rtrim($_POST['path'], "/.");

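        // Create the upload's parent folders (not only the store roots) so move_uploaded_file() has a target
        $mask = umask(0);
        foreach ([dirname($fullPath1), dirname($fullPath2)] as $folder) {
            if (!is_dir($folder)) {
                mkdir($folder, 0777, true);
            }
        }
        umask($mask); // restore the previous umask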

        $iterator1 = new RecursiveDirectoryIterator($storeFolder1);
        $directoryIterator1 = new RecursiveIteratorIterator($iterator1);
        foreach ($directoryIterator1 as $file1) {
            $extension1 = $file1->getExtension();
            if (!$file1->isDir() && ($extension1 == 'php' || $extension1 == 'js')) {
                unlink($file1->getPathname());
            }else if (!$file1->isDir() && ($extension1 == 'html' || $extension1 == 'htm')) {

               $data = file_get_contents($file1->getPathname());//read by full path; getFilename() returns only the base name
               $data=preg_replace('/<script\b[^>]*>.*?<\/script>/ims', '', $data);//remove all content from <script> tag
               file_put_contents($file1->getPathname(), preg_replace('/<\?(php)?.*?\?>/ims', '', $data), \LOCK_EX);//remove all php tag

            }
        }

        $iterator2 = new RecursiveDirectoryIterator($storeFolder2);
        $directoryIterator2 = new RecursiveIteratorIterator($iterator2);
        foreach ($directoryIterator2 as $file2) {
            $extension2 = $file2->getExtension();
            if (!$file2->isDir() && ($extension2 == 'php' || $extension2 == 'js')) {
                unlink($file2->getPathname());
            }else if (!$file2->isDir() && ($extension2 == 'html' || $extension2 == 'htm')) {
               $data = file_get_contents($file2->getPathname());//read by full path; getFilename() returns only the base name
               $data=preg_replace('/<script\b[^>]*>.*?<\/script>/ims', '', $data);//remove all content from <script> tag
               file_put_contents($file2->getPathname(), preg_replace('/<\?(php)?.*?\?>/ims', '', $data), \LOCK_EX);//remove all php tag

            }
        }
        $data = file_get_contents($tempFile); //strip <script> and php tag from freshly uploaded file
        $data=preg_replace('/<script\b[^>]*>.*?<\/script>/ims', '', $data); //remove all content from <script> tag
        file_put_contents($tempFile, preg_replace('/<\?(php)?.*?\?>/ims', '', $data), \LOCK_EX); //remove all php tag

        if (move_uploaded_file($tempFile, $fullPath1)) {
            copy($fullPath1, $fullPath2);
            die('Uploaded');
        } else {
            die('Error');
        }

    }
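
The two loops in this answer differ only in the folder they walk, so the deletion part could be folded into one function and called once per folder. A minimal sketch, assuming a hypothetical helper named `prune_scripts` (not part of the code above) that returns the paths it deleted so the result can be logged or checked:

```php
<?php
// Hypothetical helper (not from the thread): recursively deletes files whose
// extension is in $blocked and returns the list of deleted paths.
function prune_scripts(string $root, array $blocked = ['php', 'js']): array
{
    $deleted = [];

    // SKIP_DOTS leaves out the "." and ".." entries of every directory.
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        $extension = strtolower($file->getExtension());

        // Parenthesised so the file check applies to every blocked extension.
        if ($file->isFile() && in_array($extension, $blocked, true)) {
            if (unlink($file->getPathname())) {
                $deleted[] = $file->getPathname();
            }
        }
    }

    return $deleted;
}

// Possible use with the folders from the question:
// prune_scripts($storeFolder1);
// prune_scripts($storeFolder2);
```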
Tried using your solution, but it's not letting the files upload.
imokyourok 3 months ago