[PHP] Sitemap generator for directory listing

Sitemaps are lovely, aren’t they? They’re not so easy to generate, though (or, *gosh*, to make manually). Being a programmer, I decided to write a little PHP script that generates one on the fly for a massive downloads site (almost 3,000 files) that I host for gamers. There’s a download link attached to this post, but here’s a look at the code.

It has one dependency, phpURI, which I use to build the URLs for the urlset/url/loc elements in the XML. It also uses PHP’s XMLWriter to generate the actual XML for the sitemap.

<?php
// BEGIN CONFIG OPTIONS
$default_change_freq = 'monthly';
$url_prefix = 'http://downloads.cncfps.com/';
$blacklist = array('cgi-bin');
// END CONFIG OPTIONS

require_once(realpath('../phpuri.php'));

// Keep visible files and non-blacklisted directories; skip dotfiles such as .htaccess.
$filter = function($info, $key, $iter) use ($blacklist)
{
    // Hidden entries start with a literal dot.
    if (preg_match('/^\./', $info->getFilename()))
    {
        return false;
    }

    // Descend into directories unless they are blacklisted.
    if($iter->hasChildren() && !in_array($info->getFilename(), $blacklist))
    {
        return true;
    }

    return $info->isFile();
};

$dirall = new RecursiveDirectoryIterator('./', RecursiveDirectoryIterator::SKIP_DOTS | RecursiveDirectoryIterator::KEY_AS_PATHNAME);
$dir = new RecursiveCallbackFilterIterator($dirall, $filter);
$files = new RecursiveIteratorIterator($dir);

header('Content-Type: application/xml');

$writer = new XMLWriter();
$writer->openURI('php://output');
$writer->setIndent(true);
$writer->startDocument('1.0', 'UTF-8');
$writer->startElementNS(NULL, 'urlset', 'http://www.sitemaps.org/schemas/sitemap/0.9');

$writer->startElement('url');
$writer->writeElement('loc', phpURI::parse($url_prefix)->join('sitemap.xml'));
$writer->writeElement('lastmod', date(DateTime::W3C));
$writer->writeElement('changefreq', 'always');
$writer->endElement(); // <url></url>

foreach($files as $file => $object)
{
    $writer->startElement('url');

    // Percent-encode each path segment (spaces, etc.). XMLWriter already escapes
    // &, < and > in element text, so htmlentities() here would double-escape them.
    $path = preg_replace('#^\./#', '', str_replace('\\', '/', $file));
    $path = implode('/', array_map('rawurlencode', explode('/', $path)));
    $furi = phpURI::parse($url_prefix)->join($path);

    $writer->writeElement('loc', $furi); // <loc></loc>
    $writer->writeElement('lastmod', date(DateTime::W3C, $object->getMTime())); // <lastmod></lastmod>

    // TODO: read filename.ext.txt for metadata
    $writer->writeElement('changefreq', $default_change_freq); // <changefreq></changefreq>

    $writer->endElement(); // </url>
}

$writer->endElement(); // </urlset>
$writer->endDocument(); // EOF
$writer->flush();

exit;
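If you haven’t used XMLWriter before, here’s the same pattern stripped down to a single function. It’s a sketch rather than part of the downloadable script: it writes to memory with openMemory() instead of php://output, and the URL and date are made-up sample values. It also shows that writeElement() escapes & in element text on its own.

```php
<?php
// Minimal XMLWriter sketch: build a sitemap in memory from loc => lastmod pairs.
function build_sitemap(array $entries)
{
    $writer = new XMLWriter();
    $writer->openMemory();
    $writer->setIndent(true);
    $writer->startDocument('1.0', 'UTF-8');
    // A null prefix makes the sitemap namespace the default xmlns.
    $writer->startElementNS(null, 'urlset', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    foreach ($entries as $loc => $lastmod) {
        $writer->startElement('url');
        // writeElement() escapes &, < and > in the text content itself.
        $writer->writeElement('loc', $loc);
        $writer->writeElement('lastmod', $lastmod);
        $writer->endElement(); // </url>
    }

    $writer->endElement(); // </urlset>
    $writer->endDocument();
    return $writer->outputMemory();
}

// Sample data only.
echo build_sitemap(array('http://example.com/get.php?id=1&v=2' => '2013-01-01T00:00:00+00:00'));
```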

Right now, it generates the sitemap on the fly, and only lists files found in the current directory (recursively). I do have plans to add support for per-file configuration (perhaps even per-directory configurations) via “filename.extension.txt” to change things like changefreq and priority per-file.
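The per-file configuration idea could look something like the following. This is purely a hypothetical sketch: the INI-style “key = value” format for “filename.extension.txt” and the read_file_meta() helper are assumptions, not something the script does today. It only accepts the changefreq values defined by the sitemaps.org protocol and clamps priority to the 0.0 to 1.0 range.

```php
<?php
// Hypothetical sketch: read per-file sitemap settings from "<file>.txt" (INI format assumed).
function read_file_meta($path, array $defaults)
{
    $meta = $defaults;
    $metaFile = $path . '.txt';
    if (!is_file($metaFile)) {
        return $meta; // no metadata file: keep the defaults
    }

    // RAW mode keeps values as plain strings (no yes/no/true/false conversion).
    $ini = parse_ini_file($metaFile, false, INI_SCANNER_RAW);
    if ($ini === false) {
        return $meta; // unreadable or malformed file
    }

    // changefreq values allowed by the sitemaps.org protocol.
    $freqs = array('always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never');
    if (isset($ini['changefreq']) && in_array(trim($ini['changefreq']), $freqs, true)) {
        $meta['changefreq'] = trim($ini['changefreq']);
    }

    // priority must lie between 0.0 and 1.0.
    if (isset($ini['priority']) && is_numeric(trim($ini['priority']))) {
        $meta['priority'] = max(0.0, min(1.0, (float) trim($ini['priority'])));
    }

    return $meta;
}
```

Inside the loop you would call something like read_file_meta($object->getPathname(), array('changefreq' => $default_change_freq)) and write the returned values. The filter would also need to skip the .txt files, or they would end up in the sitemap themselves.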

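If you want to run this as a cron script instead, simply change $writer->openURI('php://output'); to $writer->openURI('sitemap.xml'); and it will write the sitemap out to a file (note that the account running the script needs write permission there). Since the iterator starts at './', make sure cron runs the script from the downloads directory. If you do this, be sure to remove the hard-coded sitemap.xml entry from the script, because the directory scan will now list the generated sitemap.xml file itself.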

Of course, if you generate the sitemap on the fly, you’ll probably want to use URL rewriting to map /sitemap.xml to /sitemap.php. Here is the Apache .htaccess configuration for that:

<IfModule mod_rewrite.c>
    RewriteEngine On

    RewriteBase /
    RewriteRule ^sitemap\.xml$ sitemap.php [L]
</IfModule>

Download: sitemapgen.zip

Self-install/Self-setup PHP Installation

Recently, I decided to give IIS another try in preparation for learning WCF (Windows Communication Foundation). I also decided to try running PHP on IIS (7.5 in this case).

Unfortunately, I ran into some issues when I tried to install PHP 5.4 alongside IIS. The installation itself worked and PHP ran; however, I couldn’t figure out why PHP was looking in “C:\WINDOWS” for my php.ini rather than in the directory I had extracted PHP to. I only figured it out later, after downgrading to 5.3, which had an installer available rather than just a .zip download.

If you are setting up PHP with IIS, manually or via the installer, you need to log the web server’s account out of Windows and back in again. That is the cause of the issue above, where PHP looks for “php.ini” in “C:\WINDOWS”: the PHP 5.3 installer edits your Windows PATH environment variable to include “C:\PHP” (or wherever you installed PHP), and Windows doesn’t apply this change until you log out and log back in.

If you install PHP manually, edit your PATH variable yourself to include your PHP directory. See the screenshot for other useful details.
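A quick way to confirm which php.ini PHP actually loaded is a small diagnostic page. This is a sketch, not part of the original setup: it uses php_ini_loaded_file(), which returns the path of the loaded php.ini or false, plus a hypothetical helper, dir_on_path(), that checks whether PHP’s own directory (taken from PHP_BINARY, available since PHP 5.4) appears in PATH.

```php
<?php
// Diagnostic sketch: which php.ini was loaded, and is PHP's directory on PATH?
function dir_on_path($dir, $path, $sep = PATH_SEPARATOR)
{
    $want = rtrim($dir, '\\/');
    foreach (explode($sep, $path) as $entry) {
        // Windows paths are case-insensitive; ignore trailing slashes.
        if (strcasecmp(rtrim($entry, '\\/'), $want) === 0) {
            return true;
        }
    }
    return false;
}

$ini = php_ini_loaded_file();
echo 'Loaded php.ini: ', ($ini === false ? '(none)' : $ini), PHP_EOL;
echo 'PHP directory on PATH: ',
    (dir_on_path(dirname(PHP_BINARY), (string) getenv('PATH')) ? 'yes' : 'no'), PHP_EOL;
```

If a page served by IIS still reports a php.ini outside your PHP directory after you’ve edited PATH, the web server’s account most likely hasn’t picked up the change yet.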

This was done on Windows 7 Professional, 64-bit.

array_walk Usage

Alright, so I found the PHP docs on array_walk a bit unclear, in my opinion. So, I wrote a little example to figure out how it works.

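The basic structure for array_walk is:

array_walk( array &$array, callable $callback [, mixed $userdata] );

where the callback looks like function(&$value, $key [, $userdata]). Take $value by reference if you want to change the array in place; the optional $userdata is passed to the callback as its third argument.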

You can see my usage here, along with the output it produces.
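For reference, here’s a small self-contained example along the same lines (the array and the factor are made-up values). The first walk changes the values in place through a by-reference parameter and uses the optional third argument; the second only reads values and keys.

```php
<?php
$prices = array('apple' => 1.0, 'pear' => 2.5);

// In-place update: $value is taken by reference; $factor is the $userdata argument.
array_walk($prices, function (&$value, $key, $factor) {
    $value = $value * $factor;
}, 2);

// Read-only walk: no reference needed; collect "key=value" strings.
$pairs = array();
array_walk($prices, function ($value, $key) use (&$pairs) {
    $pairs[] = $key . '=' . $value;
});

echo implode(', ', $pairs), PHP_EOL; // apple=2, pear=5
```

Note that array_walk itself only returns true or false; any change to the array has to happen through the reference.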

WordPress Theme & JavaScript Bloginfo

Alright, I’m pretty sure this has been done before.

Scenario: I need to access the bloginfo() function from inside a script block in my custom theme.

Solution (partial): This takes only a little code. However, it only retrieves a fixed list of the standard bloginfo() values.

The following function goes inside your theme’s functions.php file:

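// Collects the standard bloginfo() values and returns them as a JSON object string.
function get_the_bloginfo_array()
{
    // Note: this list includes 'admin_email', which will be visible in the page source.
    static $_bloginfo = array( 'name', 'description', 'admin_email', 'url', 'wpurl',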
                        'stylesheet_directory', 'stylesheet_url', 'template_directory', 'template_url',
                        'atom_url', 'rss2_url', 'rss_url', 'pingback_url', 'rdf_url', 'comments_atom_url',
                        'comments_rss2_url', 'charset', 'html_type', 'language', 'text_direction', 'version' );
    $jsArray = array( );

    foreach ($_bloginfo as $key)
    {
        $jsArray[$key] = get_bloginfo($key);
    }

    return json_encode($jsArray);
}
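Then output it from a script block in your theme template (for example, header.php):

<script type="text/javascript">
var bloginfo = <?php echo get_the_bloginfo_array(); ?>;
</script>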
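One way to loosen the “partial” limitation is to let the caller choose the keys and the lookup function. This is a hypothetical variant, not the function above: bloginfo_json() and the $fake stand-in are illustrative names, and taking the lookup as a parameter is just what lets the sketch run outside WordPress. In a theme you would pass 'get_bloginfo'.

```php
<?php
// Hypothetical variant of get_the_bloginfo_array(): the caller picks the keys.
// $lookup returns a value for a key; in a theme that would be 'get_bloginfo'.
function bloginfo_json(array $keys, $lookup)
{
    $values = array();
    foreach ($keys as $key) {
        $values[$key] = call_user_func($lookup, $key);
    }
    return json_encode($values);
}

// Stand-in for get_bloginfo() so the sketch runs outside WordPress.
$fake = function ($key) {
    $data = array('name' => 'My Blog', 'url' => 'http://example.com');
    return isset($data[$key]) ? $data[$key] : '';
};

echo bloginfo_json(array('name', 'url'), $fake), PHP_EOL;
// {"name":"My Blog","url":"http:\/\/example.com"}
```

For values bloginfo() doesn’t cover, you could pass a small wrapper around get_option() as the lookup instead.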