dljfield.com

8th April 2019

Generating PDFs with wkhtmltopdf in Laravel

I don’t think there is a project I’ve worked on that hasn’t at some point required the production of PDFs using data in the system, from a simple payment receipt to big reports stuffed full of graphs.

A lot of options exist for actually rendering the PDF, but perhaps one of the most common is wkhtmltopdf, which is an open source PDF renderer. It has its limitations, but in my experience it sits at a nice middle ground between something like dompdf and Prince.

Probably The Wrong Way

Early attempts at using wkhtmltopdf from Laravel were managed by supplying URLs for it to call, and those URLs would be backed by simple controllers that rendered the HTML we wanted turned into a PDF.

Because of how wkhtmltopdf handles headers, footers, and cover pages, this ends up as four separate routes: one for the main content and then one each for the special pages.

These routes were all POST endpoints, as the data to be included had to be supplied as part of the request. This was handled by looping over the data and adding it to the command that was being built up as --post options.

foreach ($data as $key => $value) {
    $command[] = implode(' ', ['--post', escapeshellarg($key), escapeshellarg(json_encode($value))]);
}

Another thing worth considering is how accessible these routes are. Various options exist for locking them down, but the step taken in this early implementation was to encrypt the data before adding it to the command.

$data = [
    'data' => Crypt::encrypt([
        'some_data' => $generatedData,
        // payload expires 10 seconds from now
        'timestamp' => Carbon::now()->addSeconds(10)->timestamp,
    ]),
    // static secret
    'secret' => Config::get('pdf.secret'),
];

// decrypt this in the endpoint we've called

// now we can call the command-building loop
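
The checks on the receiving end aren’t shown above. A minimal sketch of what they might look like once the data field has been decrypted (for instance with Crypt::decrypt), assuming the payload shape above. The function name and the $now parameter are hypothetical and only there to make the example self-contained:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: decides whether a decrypted request payload is acceptable.
 * $payload is what decrypting the 'data' field would yield.
 */
function pdfRequestIsValid(array $payload, string $suppliedSecret, string $expectedSecret, int $now): bool
{
    // Constant-time comparison of the static secret.
    if (!hash_equals($expectedSecret, $suppliedSecret)) {
        return false;
    }

    // The timestamp marks when the payload expires (10 seconds after creation).
    $expiresAt = $payload['timestamp'] ?? null;

    return is_int($expiresAt) && $now <= $expiresAt;
}
```

In the endpoint this would be called with time() as $now, rejecting the request if it returns false.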

A few other aspects of this implementation aren’t so great, though they would be entirely fixable while keeping the same basic approach. The main one is that it is not well set up to handle multiple types of document. Each document requires its own set of routes, as the routes aren’t able to take any additional configuration. On top of that, the settings for the PDF itself are hardcoded, so it always produces a portrait file with 0cm margins and relies entirely on CSS for spacing.

This approach has worked, but seems unnecessarily convoluted. It requires several HTTP requests to your own application, encryption or some other security configuration, and throwing way too much data at the command line in the background.

A Slightly Better Way

The first thing I decided on was that the document needed to be set up as some kind of configuration, to make it easy to have multiple types and at least easier to add additional ones. It wasn’t necessary here to support dynamic, user-creatable templates, so the approach to this was to have regular Laravel configuration files, and for the HTML generation to be handled by regular Laravel Blade templates.

The configuration ended up looking something like this:

return [
    'document_types' => [
        'first_type' => [
            'template' => 'first_type.template',
            'header' => 'first_type.header',
            'footer' => 'first_type.footer',
            'page_height' => 10,
            'page_width' => 15,
            'margins' => [
                'top' => 0.5,
                'right' => 0.5,
                'bottom' => 0.5,
                'left' => 0.5
            ]
        ],

        'second_type' => [
            'template' => 'second_type.template',
            'page_size' => 'a4',
            'margins' => [
                'top' => 2,
                'right' => 2,
                'bottom' => 2,
                'left' => 2
            ]
        ]
    ]
];

Each key under document_types represents a document type that can be generated. The main content of the document is the template key, which holds a Laravel view path. After that everything is optional. The only real limitation on the configuration is that if you are setting the page size, you supply it either as the page_size key (which takes sizes like A4, Letter, etc.) or as a pair of keys called page_height and page_width, which are in centimetres.
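
That page size rule is easy to get wrong when adding a new document type, so it may be worth checking up front. A minimal sketch of such a check, assuming the configuration shape above; validateDocumentType is a hypothetical name, not something from the original implementation:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical validator for one entry of 'document_types'.
 * Returns a list of problems; an empty list means the entry is usable.
 */
function validateDocumentType(array $config): array
{
    $problems = [];

    if (empty($config['template'])) {
        $problems[] = 'template is required';
    }

    $hasNamedSize = isset($config['page_size']);
    $hasHeight = isset($config['page_height']);
    $hasWidth = isset($config['page_width']);

    // A named size and explicit dimensions are mutually exclusive.
    if ($hasNamedSize && ($hasHeight || $hasWidth)) {
        $problems[] = 'use either page_size or page_height/page_width, not both';
    }

    // Explicit dimensions only make sense as a pair.
    if ($hasHeight !== $hasWidth) {
        $problems[] = 'page_height and page_width must be supplied together';
    }

    return $problems;
}
```

Something like this could run when the generator class is constructed, so a bad configuration fails before any HTML is rendered.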

Having got the configuration, there are two stages to producing the PDF: generating the HTML and then generating the PDF itself.

This is the point at which the main divergence with the previous approach comes in. Rather than pass URLs for wkhtmltopdf to call that will render the HTML, I instead render the HTML first, export it to a file and get wkhtmltopdf to render that.

The first step—generating the HTML—is very easy with Laravel:

protected function generateTemplateHtml(string $template)
{
    return view('pdf.templates.' . $template, $this->data)->render();
}

This is called by a simple method on a class that is given the document configuration being used and the data being applied to it:

public function generate(): string
{
    $templates = [
        'main' => $this->generateTemplateHtml($this->configuration['template']),
    ];

    if ($header = $this->configuration['header'] ?? null) {
        $templates['header'] = $this->generateTemplateHtml($header);
    }

    if ($footer = $this->configuration['footer'] ?? null) {
        $templates['footer'] = $this->generateTemplateHtml($footer);
    }

    return $this->renderer->render($templates, $this->configuration);
}

The last line, return $this->renderer->render(…) is a call to the PDF renderer itself.

The render method itself looks like this:

public function render(array $templates, array $documentConfiguration): string
{
    $renderId = str_random(32);

    $this->createTemplateDirectory($renderId);

    $succeeded = $this->runCommand($this->createCommandComponents($templates, $documentConfiguration, $renderId));

    $this->cleanUpTemplateDirectory();

    if (!$succeeded) {
        throw new PdfGenerationFailed('Failed to generate a PDF using wkhtmltopdf.');
    }

    return $this->generatePath($renderId);
}

It gets given the pre-rendered HTML as the $templates parameter, and the $documentConfiguration parameter is, of course, the configuration for the PDF.

The $renderId is generated by a Laravel helper, which in this case produces a 32 character string. The point of this is to have a random ID to name temporary directories after, and to name the PDF itself. If a lot of PDFs are being generated at once on the same filesystem it may be better to use something like a UUID, but in this case it was unnecessary to do so.
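
If the Laravel helper isn’t available, a comparable ID can be produced with plain PHP. This is only a sketch of one option, and generateRenderId is a hypothetical name:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical framework-free replacement for the render ID helper.
 * 16 random bytes encode to 32 hexadecimal characters.
 */
function generateRenderId(): string
{
    return bin2hex(random_bytes(16));
}
```

The result only contains the characters 0-9 and a-f, which keeps it safe to use in directory and file names.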

This is followed by createTemplateDirectory, which sets up a temporary place to output the HTML that was passed in. It also sets the currentTemplatePath on the renderer, so it knows where to look for the HTML when it is generating the command that will be called.

createTemplateDirectory is then paired with cleanUpTemplateDirectory, which just removes the template directory once we’re done with it.

protected function createTemplateDirectory(string $renderId): void
{
    $this->currentTemplatePath = "/templates/{$renderId}/";

    Storage::disk('pdf')->makeDirectory($this->currentTemplatePath);
}

protected function cleanUpTemplateDirectory(): void
{
    Storage::disk('pdf')->deleteDirectory($this->currentTemplatePath);

    $this->currentTemplatePath = null;
}
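
One thing this pairing leaves open is what happens if something throws between the two calls, in which case the directory would be left behind. A hedged sketch of a try/finally wrapper that covers that case, written against plain PHP filesystem functions rather than the Storage facade; withTemporaryDirectory is a hypothetical helper, and it assumes the directory only ever contains flat files:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: creates a directory, runs $work with its path,
 * and removes the directory afterwards even if $work throws.
 */
function withTemporaryDirectory(string $directory, callable $work)
{
    mkdir($directory, 0700, true);

    try {
        return $work($directory);
    } finally {
        // Remove any files written during the work, then the directory itself.
        foreach (glob($directory . '/*') ?: [] as $file) {
            unlink($file);
        }

        rmdir($directory);
    }
}
```

In the renderer, the equivalent would be wrapping the runCommand call in try and calling cleanUpTemplateDirectory in finally.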

The line in between does a lot:

$succeeded = $this->runCommand($this->createCommandComponents($templates, $documentConfiguration, $renderId));

To start with, runCommand looks like this:

protected function runCommand(array $commandComponents): bool
{
    $commandArray = array_merge([config('pdf.binary')], $commandComponents);

    $commandOutput = [];

    exec(escapeshellcmd(implode(' ', $commandArray)), $commandOutput);

    return $this->commandSucceeded($commandOutput);
}

The $commandComponents is an array of all the options that are going to be applied when calling wkhtmltopdf. This is merged with the location of the wkhtmltopdf binary set in the app configuration, and then implode’d into a string.
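
Note that escapeshellcmd escapes shell metacharacters in the whole string but does not quote individual arguments, so a path containing a space would be split in two. A hedged alternative, assuming each option and value is kept as its own token rather than combined into one string as in the code above; buildCommand is a hypothetical name:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical variant of the command assembly: every token is quoted
 * individually, so spaces inside a path stay within a single argument.
 */
function buildCommand(string $binary, array $tokens): string
{
    $parts = [escapeshellarg($binary)];

    foreach ($tokens as $token) {
        $parts[] = escapeshellarg((string) $token);
    }

    return implode(' ', $parts);
}
```

With this shape the orientation would be passed as two tokens, '-O' and 'landscape', and the resulting string would go straight to exec without escapeshellcmd.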

The commandSucceeded method is extremely simple. It checks the output from wkhtmltopdf for 'Exit with code 1', as this is the best way that I’ve found to determine that wkhtmltopdf itself has failed. If it has failed, I log the output for the sake of debugging later.
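
The method itself isn’t shown here, so the following is only a sketch of how such a check might be written. It is a standalone function rather than a class method, and the $log callable is a hypothetical stand-in for whatever logger is in use:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical standalone version of commandSucceeded.
 * $log is any callable that records a message.
 */
function commandSucceeded(array $commandOutput, callable $log): bool
{
    foreach ($commandOutput as $line) {
        if (strpos((string) $line, 'Exit with code 1') !== false) {
            // Keep the full output so the failure can be investigated later.
            $log('wkhtmltopdf failed: ' . implode(PHP_EOL, $commandOutput));

            return false;
        }
    }

    return true;
}
```

Bear in mind that exec only captures standard output, so whether this message shows up depends on which stream it is written to. exec also accepts a third argument that receives the exit status of the command, which may be a more direct signal.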

The bulk of the real work in this class is done by createCommandComponents, whose result is passed straight to runCommand:

protected function createCommandComponents(array $templates, array $documentConfiguration, string $renderId): array
{
    $commandComponents = [];

    $commandComponents = array_merge($commandComponents, $this->processMargins($documentConfiguration));
    $commandComponents = array_merge($commandComponents, $this->processOrientation($documentConfiguration));
    $commandComponents = array_merge($commandComponents, $this->processSize($documentConfiguration));
    $commandComponents = array_merge($commandComponents, $this->processHeader(array_get($templates, 'header')));
    $commandComponents = array_merge($commandComponents, $this->processFooter(array_get($templates, 'footer')));
    $commandComponents = array_merge($commandComponents, $this->processContent($templates['main']));
    $commandComponents = array_merge($commandComponents, $this->processOutputFile($renderId));

    return $commandComponents;
}

This method builds up the command that is going to be run as an array. The first three process methods are for handling the configuration. This implementation only needed to handle configuration for margins, orientation, and size, but it would be trivial to add in other configuration the same way by slotting it into the correct place.

Each method is responsible for checking the configuration that is passed in to see whether it’s actually asking to be set. If nothing is being set it just returns an empty array.

The simplest, orientation, looks like this:

protected function processOrientation(array $documentConfiguration): array
{
    $orientation = $documentConfiguration['orientation'] ?? null;

    if (!$orientation) {
        return [];
    }

    return ["-O {$orientation}"];
}

The other two operate in the same way, but with some extra checks for their own logic. The margin configuration is itself an array, so each individual margin is checked and added by itself. Size can be either a single option or two, so there are some checks to see what we’re working with there.
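
A rough sketch of how those two might look, written as standalone functions. It uses wkhtmltopdf’s -T, -R, -B and -L margin flags along with -s, --page-height and --page-width, and it assumes that margins, like the page dimensions, are configured in centimetres:

```php
<?php
declare(strict_types=1);

/** Hypothetical sketch: one option per margin that is actually configured. */
function processMargins(array $documentConfiguration): array
{
    // wkhtmltopdf's short flags for each margin.
    $flags = ['top' => '-T', 'right' => '-R', 'bottom' => '-B', 'left' => '-L'];
    $components = [];

    foreach ($flags as $side => $flag) {
        if (isset($documentConfiguration['margins'][$side])) {
            $components[] = "{$flag} {$documentConfiguration['margins'][$side]}cm";
        }
    }

    return $components;
}

/** Hypothetical sketch: a named size, or an explicit height and width in centimetres. */
function processSize(array $documentConfiguration): array
{
    if (isset($documentConfiguration['page_size'])) {
        return ["-s {$documentConfiguration['page_size']}"];
    }

    if (isset($documentConfiguration['page_height'], $documentConfiguration['page_width'])) {
        return [
            "--page-height {$documentConfiguration['page_height']}cm",
            "--page-width {$documentConfiguration['page_width']}cm",
        ];
    }

    return [];
}
```

Both return an empty array when nothing relevant is configured, so they can be merged into the command unconditionally.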

The next three methods are responsible for telling wkhtmltopdf where to find the HTML files. This is done for headers, footers, and the main content. This implementation didn’t require cover pages, but the concept would be exactly the same.

The processContent method looks like this:

protected function processContent(string $mainContent): array
{
    $mainPath = $this->currentTemplatePath . 'main.html';

    Storage::disk('pdf')->put($mainPath, $mainContent);

    $commandPath = config('filesystems.disks.pdf.root') . $mainPath;

    return ["file://{$commandPath}"];
}

It puts the HTML file in the temporary templates directory and then returns the location to be passed to wkhtmltopdf, making sure to prefix it with file://, as wkhtmltopdf won’t be able to understand what it has been given otherwise.

The header and footer methods operate in exactly the same way, but also prepend --header-html or --footer-html respectively, as those files are passed as options.
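
For illustration, a framework-free sketch of the header variant. It writes with file_put_contents instead of the Storage facade, and the $templateDirectory parameter is a hypothetical stand-in for the temporary directory on the pdf disk:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical framework-free sketch of processHeader.
 * Returns an empty array when the document type has no header.
 */
function processHeader(?string $headerContent, string $templateDirectory): array
{
    if ($headerContent === null) {
        return [];
    }

    $headerPath = rtrim($templateDirectory, '/') . '/header.html';

    file_put_contents($headerPath, $headerContent);

    // The option name and the file URI travel together as one component.
    return ["--header-html file://{$headerPath}"];
}
```

The footer version would differ only in the file name and in using --footer-html.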

The final step is to set up where wkhtmltopdf is going to output the PDF it generates. This is done with processOutputFile, which looks like this:

protected function processOutputFile(string $renderId): array
{
    return [$this->generatePath($renderId)];
}

protected function generatePath(string $renderId): string
{
    return storage_path(config('pdf.storage_path')) . '/' . $renderId . '.pdf';
}

Once it has been confirmed that the PDF is generated, generatePath is called again and returned from render, just because it is convenient to do so. It could also just store the path it has already generated on the class and have that be returned.

Once the PDF has been generated, what happens to it depends on what is needed. The main thing is to make sure that the PDF is cleaned up once it is done with.

For example, in Laravel, you can clean up a file that has been sent to the browser like so:

return response()
    ->download($pathToFile)
    ->deleteFileAfterSend();

Moving On

This setup is obviously reliant on Laravel, using several helpers from the framework to make life easier. As such, it isn’t really capable of being used as a generic library at the moment. However, I think it wouldn’t be a tremendous amount of work to fill out the available options, replace the helpers and set it up as a package.

Some might also say that this could be made a great deal more object-oriented. They would be correct to say that it could be, but to me it isn’t worth doing. I find it easier to understand what is going on by having everything in the same place, and it remains nicely self-contained.

Another thing to consider with this approach is whether using the local filesystem is viable. Some environments may make it difficult, or inadvisable, to do. Most likely, in this sort of situation, the thing to do would be to store the templates and output on a central file store, but it would be necessary to make sure that the URI for it is something that wkhtmltopdf is able to understand, which means something accessible as file:// or http://.

For me, though, this solution has worked really well, and has been a great deal easier to debug and work with than the old system.