Convert HTML To Excel/Word/PDF – Check Out These Free PHP Scripts

PHP is a really powerful programming language. You can use it to convert plain HTML into a range of different filetypes like Excel, Word and PDF.

Blog archive Robin Metcalfe 22nd December 2017

Use the power of PHP to transform your documents

Take a look through these quick tutorials and I’ll explain how you can transform HTML code into Excel, Word or PDF documents using PHP. All code from the tutorial is available to download at the end of the article.

HTML has gone from being a simple way to mark up basic documentation to a fully-fledged interactive publishing medium. But it’s still useful to be able to convert HTML to Excel and other formats, especially for systems that involve a lot of data and require exportable reports.

For example, it’s often useful to be able to download or email PDF summaries of reports or invoices to clients. Or to offer the ability for customers on an ecommerce site to download their order details as a Word document.

To get started, I’m going to assume you’re already familiar with PHP and setting up a basic web app.

Also, I’m assuming you’ve got some basic familiarity with the command line, as I’m going to be making use of the fantastic PHP package manager, Composer.

Composer

Composer is, in a nutshell, a way of easily installing PHP code into your application without the headache of manually including external libraries with all their dependencies.

For a quick intro and installation guide, see the official Composer documentation. That’ll get you up and running.

Convert HTML To PDF

To install the library I’m using to convert HTML into PDF, DomPDF, run the following composer command from your project root:

composer require dompdf/dompdf

That’ll install all the required libraries I need to run DomPDF from within my simple little PHP app.

All I’ll be doing here is reading in the contents of my external HTML file, sample.html, placed within the same directory as my project files.

Then, using DomPDF’s own internal functionality, I stream the generated file to the user’s browser, ready for downloading.

Here’s the code:

<?php
require_once('vendor/autoload.php');

// reference the Dompdf namespace
use Dompdf\Dompdf;

$dompdf = new Dompdf();
// Enable the HTML5 parser to tolerate poorly formed HTML
$dompdf->set_option('isHtml5ParserEnabled', true);

// Load into DomPDF from the external HTML file
$content = file_get_contents('sample.html');

$dompdf->loadHtml($content);

// Render and download
$dompdf->render();
$dompdf->stream();

And the output, a downloadable PDF.

Generated PDF from DomPDF output
The result of running DomPDF on a chunk of HTML

Try it yourself:

Generate & Download PDF

You can also generate PDF documents from whole web pages – as long as you can grab a source of HTML, for example by using the PHP function file_get_contents, you can convert any accessible web page into PDF.

Convert HTML To Word

Although it’s a more archaic and less widely supported format, Microsoft Word documents remain a popular choice for saving/reading/printing documentation.

For this I’ll be using a different composer package, PHPWord. The approach is somewhat different than for generating PDFs.

First, install the package with composer.

composer require phpoffice/phpword

To get started, what’s happening in the following chunk of code is that I’m grabbing the HTML directly from the file sample.html and placing it within a DOMDocument object.

DOMDocument is a PHP class which allows for manipulation and extraction of data from HTML. Using this, it’s possible to search within HTML documents for specific pieces of data by id or by tag name, and, when paired with DOMXPath queries, by attributes like class, in much the same way that CSS selectors or JavaScript DOM operations work.

Here, I’m getting a hold of the main page title, and the body content using the id attributes set within the HTML. You’ll see why shortly.

require_once('vendor/autoload.php');

$data = file_get_contents('sample.html');
$dom = new DOMDocument();
$dom->loadHTML($data);

// Now, extract the content I want to insert into my docx template

// 1 - The page title
$documentTitle = $dom->getElementById('title')->nodeValue;

// 2 - The article body content
$documentContent = $dom->getElementById('content')->nodeValue;
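
getElementById only covers lookups by id. DOMDocument has no direct lookup by class, so one common approach is to pair it with DOMXPath. The following is a minimal sketch, assuming a small inline HTML string in place of sample.html; the markup, the class name intro and the variable names are made up for illustration.

```php
<?php
// Hypothetical inline markup standing in for sample.html
$html = '<html><body>'
      . '<h1 id="title">Quarterly report</h1>'
      . '<div id="content"><p class="intro">Summary text.</p><p>More detail.</p></div>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Lookup by id, as in the tutorial code
$title = $dom->getElementById('title')->nodeValue;

// Lookup by tag name returns a DOMNodeList
$paragraphCount = $dom->getElementsByTagName('p')->length;

// Lookup by class needs an XPath query; this expression matches a whole
// class token, so "intro" will not match "introduction"
$xpath = new DOMXPath($dom);
$introNodes = $xpath->query(
    '//*[contains(concat(" ", normalize-space(@class), " "), " intro ")]'
);
$intro = $introNodes->item(0)->nodeValue;

echo $title, "\n";
echo $paragraphCount, "\n";
echo $intro, "\n";
```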

In the next step, I’m going to make use of an existing Word document to structure and template my generated document.

Now, unlike with DomPDF, I can’t just take an HTML file and dump it straight into a Word document fully styled using PHPWord. It just doesn’t seem to work like that.

The approach I’m going to take is to use a template Word document, sample.docx, and replace the title and content areas within that document with appropriate content from my HTML file (which I grabbed above using the getElementById method).

First, take a look at the file sample.docx in Word. You’ll see that it’s very sparse, with only a few bits of text: ${title}, ${author} and ${content}. These single words, wrapped in curly braces and prefixed with a dollar symbol ($), are the placeholders I’m going to swap out for my HTML content.

PHPWord will be using this template document to construct my final document using the data I pass it.

Word template

The following lines are responsible for inserting that content into the Word template document.

// Load the template processor
$templateProcessor = new \PhpOffice\PhpWord\TemplateProcessor('sample.docx');

// Swap out my variables for the HTML content
$templateProcessor->setValue('author', "Robin Metcalfe");
$templateProcessor->setValue('title', $documentTitle);
$templateProcessor->setValue('content', $documentContent);

Using this approach, you can create a Word template styled as you require, complete with formatting, font styles, spacing etc. Then you can drop content straight into that template from an HTML file.

This is only a simple example, but you can use more complex methods to copy segments of the template, remove segments and more (cloneRow, cloneBlock and deleteBlock, for instance). Take a look at the TemplateProcessor class definition file within PHPWord to see what methods are available.

Finally, I prepare my headers to download the generated file, and stream the data to the user’s browser.

header("Content-Description: File Transfer");
header('Content-Disposition: attachment; filename="generated.docx"');
header('Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document');
header('Content-Transfer-Encoding: binary');
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Expires: 0');
$templateProcessor->saveAs("php://output");

Generated Word document from PHPWord output
The result of running PHPWord on a chunk of HTML.

Note, though, the lack of any paragraph spacing or additional styling. You’d need to apply additional styling rules to the PHPWord object itself in order to more fully control the output of the script.

PHPWord by itself won’t parse any CSS included within the HTML file.
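
Part of the reason the paragraphs run together is that nodeValue flattens an element and all of its children into a single string. A rough sketch of one way around that, using only PHP's built-in DOM classes, is to collect each paragraph separately. The function name extractParagraphs and the inline markup are assumptions for illustration, not part of the downloadable source.

```php
<?php
/**
 * Return the text of each <p> inside the element with the given id,
 * so paragraph boundaries are not lost the way they are with nodeValue.
 */
function extractParagraphs(string $html, string $elementId): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $container = $dom->getElementById($elementId);
    if ($container === null) {
        return [];
    }

    $paragraphs = [];
    foreach ($container->getElementsByTagName('p') as $p) {
        $text = trim($p->nodeValue);
        if ($text !== '') {
            $paragraphs[] = $text;
        }
    }
    return $paragraphs;
}

// Hypothetical markup standing in for sample.html
$html = '<html><body><div id="content">'
      . '<p>First paragraph.</p><p>Second paragraph.</p>'
      . '</div></body></html>';

$paragraphs = extractParagraphs($html, 'content');
print_r($paragraphs);
```

Each entry could then be given its own placeholder in the template, or joined with a separator of your choice before calling setValue. How line breaks end up rendered can depend on the PHPWord version, so it’s worth checking the generated file.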

Try it yourself:

Generate & Download HTML » Word

If you’re looking for more examples on how to use PHPWord, I wouldn’t recommend the official documentation; it’s fairly sparse and still needs a lot of work.

Instead, take a look inside the /vendor directory after installing PHPWord using composer, specifically in phpoffice/phpword/samples where you’ll find a load of example files covering a range of use cases.

Convert HTML To Excel

One of the most useful conversions I’ve used is converting HTML to Excel sheets using PHP, sometimes directly from HTML, but also straight from values held in PHP code.

In one instance, a client wanted to be able to download a spreadsheet of sales and performance metrics directly as an Excel sheet. No such functionality existed within the system, so I wrote some custom code for it using this technique to transform HTML to Excel directly.

Here’s a very quick example of how you can generate a simple spreadsheet using values provided in PHP.

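Let’s get started. As before, I’ll install my dependencies using Composer. Note that PHPExcel has since been deprecated in favour of its successor, PhpSpreadsheet (phpoffice/phpspreadsheet); the code below targets the original PHPExcel API.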

composer require phpoffice/phpexcel

Now, for the content of my PHP file. This one is a fairly basic example, and the result is a few cells in a single sheet populated with numbers. Nothing too fancy.

require_once('vendor/autoload.php');

/**
 * Step 1: Setup
 */

$objPHPExcel = new PHPExcel();

$objPHPExcel->getProperties()->setCreator("Robin Metcalfe")
                             ->setLastModifiedBy("Robin Metcalfe")
                             ->setTitle("Excel test")
                             ->setSubject("Solarise Design")
                             ->setDescription("A test document for outputting an Excel file with some basic values.")
                             ->setKeywords("office PHPExcel php")
                             ->setCategory("Test result file");

$sheet = $objPHPExcel->setActiveSheetIndex(0);

/**
 * Step 2: Setting the values
 */

// row 1
$sheet->setCellValue("A1", 'Column A');
$sheet->setCellValue("B1", 'Column B');

// row 2
$sheet->setCellValue("A2", '1');
$sheet->setCellValue("B2", '2');

// row 3
$sheet->setCellValue("A3", '3');
$sheet->setCellValue("B3", '4');

/**
 * Step 3: Output
 */
header('Content-Type: application/vnd.ms-excel');
header('Content-Disposition: attachment;filename="html-to-excel.xls"');
header('Cache-Control: max-age=0');
// If you're serving to IE 9, then the following may be needed
header('Cache-Control: max-age=1');

// If you're serving to IE over SSL, then the following may be needed
header ('Expires: Mon, 26 Jul 1997 05:00:00 GMT'); // Date in the past
header ('Last-Modified: '.gmdate('D, d M Y H:i:s').' GMT'); // always modified
header ('Cache-Control: cache, must-revalidate'); // HTTP/1.1
header ('Pragma: public'); // HTTP/1.0

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel5');
$objWriter->save('php://output');

Although there are quite a few lines of code, there are only three significant things happening here:

  • Creation of a new PHPExcel object, and some configuration work to add title, creator etc.
  • Setting the values of the cells within the Excel sheet
  • Output of the generated file to the user’s browser, along with some useful headers
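
The output step looks much the same in every download script in this article, so it can be wrapped in a small helper. This is only a sketch: the function name downloadHeaders is hypothetical, the MIME map covers just the formats used here, and the IE-specific headers from the script above are left out.

```php
<?php
/**
 * Build the response headers for a file download.
 * The extension-to-MIME map only covers the formats used in this article.
 */
function downloadHeaders(string $filename): array
{
    $mimeTypes = [
        'pdf'  => 'application/pdf',
        'docx' => 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'xls'  => 'application/vnd.ms-excel',
        'xlsx' => 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    ];

    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    $mimeType  = $mimeTypes[$extension] ?? 'application/octet-stream';

    // Strip characters that would break the quoted filename parameter
    $safeName = str_replace(['"', "\r", "\n"], '', basename($filename));

    return [
        'Content-Type: ' . $mimeType,
        'Content-Disposition: attachment; filename="' . $safeName . '"',
        'Cache-Control: max-age=0',
    ];
}

// Printed here for inspection; in a web request, each line would be
// passed to header() before the writer saves to php://output
foreach (downloadHeaders('html-to-excel.xls') as $line) {
    echo $line, "\n";
}
```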

See the result for yourself by generating an Excel file using this script:

Generate & Download Simple Excel File

A more complex example

But, to expand on the above, let’s explore how I can take data and transfer it directly from HTML to Excel format.

Here, I’m going to extract all visible tables within the HTML file, then use that data to create an Excel file containing two separate sheets.

The effect of this is that the script will locate all table data within a page and convert the HTML to an Excel file, with one sheet per table.

Neat, huh?

Generated Excel from PHPExcel output
The result of running PHPExcel on a chunk of HTML

I’m also going to be making use of the PHP class DOMDocument to extract the required data from my HTML file, as I did before with HTML to Word.

In the following chunk of code, I do the following:

  • First, grab the required data from my sample HTML file
  • Then I extract the data I want from the <table> element within the HTML file, looping through the rows contained in <tbody>, and grabbing the column headers from the <thead> element.
  • Next, I loop through the data generated in the previous step, and insert this into my PHPExcel object, which will build up the structure of the Excel file
  • Finally, I output the generated Excel file to the user’s browser.
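
The extraction steps above can be tried on their own, without PHPExcel, to see the shape of the data that ends up being written to the sheets. This is a minimal sketch: the function name extractTables and the inline table markup are made up for illustration, and, like the tutorial code, it assumes each table has both a <thead> and a <tbody>. It also asks libxml to collect parser warnings internally, since loadHTML reports malformed markup as PHP warnings rather than exceptions.

```php
<?php
/**
 * Collect headings and rows from every <table> that has both
 * a <thead> and a <tbody>.
 */
function extractTables(string $html): array
{
    // Collect parser warnings internally instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tables = [];
    foreach ($dom->getElementsByTagName('table') as $table) {
        $head = $table->getElementsByTagName('thead')->item(0);
        $body = $table->getElementsByTagName('tbody')->item(0);
        if ($head === null || $body === null) {
            continue; // skip tables without the expected structure
        }

        $headings = [];
        foreach ($head->getElementsByTagName('th') as $cell) {
            $headings[] = trim($cell->nodeValue);
        }

        $rows = [];
        foreach ($body->getElementsByTagName('tr') as $row) {
            $cells = [];
            foreach ($row->getElementsByTagName('td') as $cell) {
                $cells[] = trim($cell->nodeValue);
            }
            $rows[] = $cells;
        }

        $tables[] = ['headings' => $headings, 'rows' => $rows];
    }
    return $tables;
}

// Hypothetical table markup standing in for sample.html
$html = '<html><body><table>'
      . '<thead><tr><th>Month</th><th>Sales</th></tr></thead>'
      . '<tbody><tr><td>Jan</td><td>120</td></tr><tr><td>Feb</td><td>95</td></tr></tbody>'
      . '</table></body></html>';

$tables = extractTables($html);
print_r($tables);
```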

PHPExcel offers a range of additional methods and options to control styling, formatting and much more. Take a look through the class documentation and the samples within the vendor/phpoffice/phpexcel/Examples directory to find out more.

Generate & Download HTML » Excel

Here’s the code in full:

require_once('vendor/autoload.php');

// Pull in the HTML contents, ready to be parsed into a DOM
$data = file_get_contents('sample.html');

try {

    $dom = new DOMDocument();
    $dom->loadHTML( $data );

    // Get all tables in the document
    $tables = $dom->getElementsByTagName('table');

    // The array I'll store the table data in
    $tableData = array();

    foreach($tables as $tableN => $table) {
        // This requires properly formatted HTML table structure
        $head = $table->getElementsByTagName('thead')[0];
        $body = $table->getElementsByTagName('tbody')[0];

        // Sheet title for this table, plus empty containers so the later loops
        // still work if a table is missing its <thead> or <tbody>
        $tableData[] = array(
            'heading' => 'Table '.($tableN+1),
            'tableData' => array('headings' => array(), 'rows' => array())
        );

        if($head && $body) {

            foreach($head->getElementsByTagName('tr')[0]->getElementsByTagName('th') as $colN => $headCell) {
                $tableData[$tableN]['tableData']['headings'][] = $headCell->nodeValue;
            }

            foreach($body->getElementsByTagName('tr') as $rowN => $tableRow) {
                foreach($tableRow->getElementsByTagName('td') as $colN => $tableCell) {
                    $tableData[$tableN]['tableData']['rows'][$rowN][$colN] = $tableCell->nodeValue;
                }
            }

        }

    }

} catch(Exception $e) {
    // I failed...
    exit;
}

// Instantiate the PHPExcel object
$objPHPExcel = new PHPExcel();

$objPHPExcel->getProperties()->setCreator("Robin Metcalfe")
                             ->setLastModifiedBy("Robin Metcalfe")
                             ->setTitle("HTML Tables To Excel Test")
                             ->setSubject("Solarise Design")
                             ->setDescription("A test document for converting HTML tables into Excel.")
                             ->setKeywords("office PHPExcel php")
                             ->setCategory("Test result file");

$alphabet = range('A', 'Z');

foreach($tableData as $tableN => $data) {

    if($tableN > 0) {
        $objPHPExcel->createSheet($tableN);
    }

    $sheet = $objPHPExcel->setActiveSheetIndex($tableN);
    $objPHPExcel->getActiveSheet()->setTitle($data['heading']);

    foreach($data['tableData']['headings'] as $n => $heading) {
        $sheet->setCellValue("{$alphabet[$n]}1", $heading);
    }

    foreach($data['tableData']['rows'] as $rowN => $rowData) {
        foreach($rowData as $colN => $value) {
            $n = $rowN + 2;
            $sheet->setCellValue("{$alphabet[$colN]}{$n}", $value);
        }       

    }

}

// Resize columns to fit data on every sheet, just to tidy things up
foreach($objPHPExcel->getAllSheets() as $worksheet) {
    foreach(range('A','Z') as $columnID) {
        $worksheet->getColumnDimension($columnID)->setAutoSize(true);
    }
}

header('Content-Type: application/vnd.ms-excel');
header('Content-Disposition: attachment;filename="html-to-excel.xls"');
header('Cache-Control: max-age=0');
// If you're serving to IE 9, then the following may be needed
header('Cache-Control: max-age=1');

// If you're serving to IE over SSL, then the following may be needed
header ('Expires: Mon, 26 Jul 1997 05:00:00 GMT'); // Date in the past
header ('Last-Modified: '.gmdate('D, d M Y H:i:s').' GMT'); // always modified
header ('Cache-Control: cache, must-revalidate'); // HTTP/1.1
header ('Pragma: public'); // HTTP/1.0

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel5');
$objWriter->save('php://output');
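
One limitation of the script above is the $alphabet lookup, which only covers columns A to Z, so a table with more than 26 columns would run out of letters. A small helper can turn any zero-based column index into a column name instead. This is a sketch with a made-up function name, columnLetter; PHPExcel also ships its own helper for this, PHPExcel_Cell::stringFromColumnIndex(), which may be preferable in real code.

```php
<?php
/**
 * Convert a zero-based column index to a spreadsheet column name:
 * 0 => A, 25 => Z, 26 => AA, and so on.
 */
function columnLetter(int $index): string
{
    $name = '';
    $n = $index + 1; // work with a 1-based column number
    while ($n > 0) {
        $remainder = ($n - 1) % 26;
        $name = chr(65 + $remainder) . $name; // 65 is the ASCII code for "A"
        $n = intdiv($n - 1, 26);
    }
    return $name;
}

// Cell reference for the 27th column on row 2
echo columnLetter(26) . '2', "\n";
```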

Download Source Code

To get a copy of all files used within this article, download them here.

Once downloaded, you’ll need to run composer install to set up all the dependencies.

About the author

Robin is the dedicated developer behind Solarise.dev. With years of experience in web development, he's committed to delivering high-quality solutions and ensuring websites run smoothly. Always eager to learn and adapt to the ever-changing digital landscape, Robin believes in the power of collaboration and sharing knowledge. Outside of coding, he enjoys diving into tech news and exploring new tools in the industry.