PHP is a powerful, flexible programming language. You can use it to convert plain HTML into a range of different file types, such as Excel, Word and PDF.
HTML has gone from being a simple way to mark up basic documentation to a fully-fledged interactive publishing medium. But it’s still useful to be able to convert HTML to Excel and other formats, especially for systems that involve a lot of data and require exportable reports.
For example, it’s often useful to be able to download or email PDF summaries of reports or invoices to clients. Or to offer the ability for customers on an ecommerce site to download their order details as a Word document.
To get started, I’m going to assume you’re already familiar with PHP and setting up a basic web app.
I’m also assuming you’ve got some basic familiarity with the command line, as I’m going to be making use of the fantastic PHP package manager, Composer.
Composer is, in a nutshell, a way of easily installing PHP code into your application without the headache of manually including external libraries with all their dependencies.
If you haven’t used Composer before, its official introduction and installation guide will get you up and running.
To install Dompdf, the library I’m using to convert HTML into PDF, run the following Composer command from your project root:
composer require dompdf/dompdf
That’ll install all the libraries I need to run Dompdf from within my simple little PHP app.
All I’ll be doing here is reading in the contents of my external HTML file, sample.html, placed within the same directory as my project files.
Then, using Dompdf’s own internal functionality, I stream the generated file to the user’s browser, ready for downloading.
Here’s the code:
<?php
require_once('vendor/autoload.php');
// reference the Dompdf namespace
use Dompdf\Dompdf;
$dompdf = new Dompdf();
// Enable the HTML5 parser to tolerate poorly formed HTML
// (set via the Options object rather than the older set_option() helper)
$dompdf->getOptions()->set('isHtml5ParserEnabled', true);
// Load into DomPDF from the external HTML file
$content = file_get_contents('sample.html');
$dompdf->loadHtml($content);
// Render and download
$dompdf->render();
$dompdf->stream();
And the output, a downloadable PDF.
You can also generate PDF documents from whole web pages. As long as you can grab a source of HTML, for example by using the PHP function file_get_contents, you can convert any accessible web page into PDF.
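As a rough illustration of that idea, here’s a minimal sketch that splits the job into two steps: fetching the HTML, then rendering it. The function names fetchHtml() and htmlToPdf() are hypothetical helpers of my own, not part of Dompdf. The sketch assumes vendor/autoload.php has already been required, and that allow_url_fopen is enabled if you pass a URL rather than a local path.

```php
<?php
/**
 * Fetch HTML from a local path or a URL (hypothetical helper).
 * Throws if the source cannot be read.
 */
function fetchHtml(string $source): string
{
    // Suppress the PHP warning and check the return value instead
    $html = @file_get_contents($source);
    if ($html === false) {
        throw new RuntimeException("Could not read HTML from: {$source}");
    }
    return $html;
}

/**
 * Render an HTML string with Dompdf and return the raw PDF bytes.
 * Assumes Composer's autoloader has already been loaded.
 */
function htmlToPdf(string $html): string
{
    $dompdf = new \Dompdf\Dompdf();
    $dompdf->loadHtml($html);
    $dompdf->render();
    return $dompdf->output();
}

// Example usage (the URL is a placeholder):
// $pdf = htmlToPdf(fetchHtml('https://example.com/'));
// file_put_contents('page.pdf', $pdf);
```

Returning the bytes from output() rather than calling stream() makes it easy to save or email the PDF instead of downloading it. Bear in mind that pages relying on remote images or stylesheets will probably also need Dompdf’s isRemoteEnabled option switched on.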
Although it’s a more archaic and less widely supported format, Microsoft Word documents remain a popular choice for saving/reading/printing documentation.
For this I’ll be using a different Composer package, PHPWord. The approach is somewhat different than for generating PDFs.
First, install the package with composer.
composer require phpoffice/phpword
In the following chunk of code, I grab the HTML directly from the file sample.html and place it within a DOMDocument object.
DOMDocument is a PHP class which allows for manipulation and extraction of data from HTML. Using this, it’s possible to search within HTML documents for specific pieces of data by attributes like id or class, or even by tag name, in much the same way that CSS selectors or JavaScript DOM operations work.
Here, I’m getting hold of the main page title and the body content using the id attributes set within the HTML. You’ll see why shortly.
require_once('vendor/autoload.php');
$data = file_get_contents('sample.html');
$dom = new DOMDocument();
$dom->loadHTML($data);
// Now, extract the content I want to insert into my docx template
// 1 - The page title
$documentTitle = $dom->getElementById('title')->nodeValue;
// 2 - The article body content
$documentContent = $dom->getElementById('content')->nodeValue;
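To show those three kinds of lookup in isolation, here’s a small self-contained sketch. The inline markup is a made-up stand-in for sample.html, and the class name intro is purely illustrative. Note that DOMDocument has no built-in class lookup, so the sketch uses DOMXPath for that part. It also null-checks the id lookup, since getElementById() returns null when nothing matches.

```php
<?php
// Inline sample markup (hypothetical, stands in for sample.html)
$html = <<<HTML
<!DOCTYPE html>
<html>
<body>
  <h1 id="title">Quarterly report</h1>
  <div id="content">
    <p class="intro">Sales were up.</p>
    <p>Costs were flat.</p>
  </div>
</body>
</html>
HTML;

// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// 1 - Lookup by id
$titleNode = $dom->getElementById('title');
$title = $titleNode !== null ? trim($titleNode->nodeValue) : '';

// 2 - Lookup by tag name
$paragraphCount = $dom->getElementsByTagName('p')->length;

// 3 - Lookup by class, via an XPath query
$xpath = new DOMXPath($dom);
$introNodes = $xpath->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' intro ')]"
);
$intro = $introNodes->length > 0 ? trim($introNodes->item(0)->nodeValue) : '';

echo $title, PHP_EOL;
echo $paragraphCount, PHP_EOL;
echo $intro, PHP_EOL;
```

The libxml_use_internal_errors() call is worth keeping in real code too: loadHTML() reports malformed or unfamiliar markup as PHP warnings rather than exceptions, so a try/catch block on its own won’t silence them.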
In the next step, I’m going to make use of an existing Word document to structure and template my generated document.
Now, unlike with Dompdf, I can’t just take an HTML file and dump it straight into a Word document fully styled using PHPWord. It just doesn’t seem to work like that.
The approach I’m going to take is to use a template Word document, sample.docx, and replace the title and content areas within that document with the appropriate content from my HTML file (which I grabbed above using the getElementById method).
First, take a look at the file sample.docx in Word. You’ll see that it’s very sparse, with only a few bits of text: ${title}, ${author} and ${content}. These single words, wrapped in curly braces and prefixed with a dollar symbol ($), are the placeholders I’m going to swap out for my HTML content.
PHPWord will use this template document to construct my final document from the data I pass it.
The following lines are responsible for inserting that content into the Word template document.
// Load the template processor
$templateProcessor = new \PhpOffice\PhpWord\TemplateProcessor('template.docx');
// Swap out my variables for the HTML content
$templateProcessor->setValue('author', "Robin Metcalfe");
$templateProcessor->setValue('title', $documentTitle);
$templateProcessor->setValue('content', $documentContent);
Using this approach, you can create a Word template styled as you require, complete with formatting, font styles, spacing etc. Then you can drop content straight into that template from an HTML file.
This is only a simple example, but you can use more complex methods to copy segments of the template, remove segments and more. Take a look at the PHPWord TemplateProcessor class definition file to see what methods are available.
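If the placeholder idea is new to you, this plain-PHP sketch shows the general principle on an ordinary string. It’s an illustration only: fillPlaceholders() is a hypothetical helper, and PHPWord does its substitution inside the .docx file’s XML rather than on a simple string like this.

```php
<?php
/**
 * Replace ${name} placeholders in a string (hypothetical helper,
 * for illustration only).
 */
function fillPlaceholders(string $template, array $values): string
{
    $pairs = [];
    foreach ($values as $name => $value) {
        // Single quotes keep '${' literal rather than interpolating it
        $pairs['${' . $name . '}'] = (string) $value;
    }
    // Nothing to replace, so return the template untouched
    if ($pairs === []) {
        return $template;
    }
    return strtr($template, $pairs);
}

$template = 'Title: ${title} / Author: ${author}';
$filled = fillPlaceholders($template, [
    'title'  => 'Quarterly report',
    'author' => 'Robin Metcalfe',
]);

echo $filled, PHP_EOL;
```

Any placeholder without a matching value is simply left in place, which makes missing data easy to spot in the output.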
Finally, I prepare my headers to download the generated file, and stream the data to the user’s browser.
header("Content-Description: File Transfer");
header('Content-Disposition: attachment; filename="generated.docx"');
header('Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document');
header('Content-Transfer-Encoding: binary');
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Expires: 0');
$templateProcessor->saveAs("php://output");
Note, though, the lack of any paragraph spacing or additional styling. You’d need to apply additional styling rules to the PHPWord object itself in order to more fully control the output of the script.
PHPWord by itself won’t parse any CSS included within the HTML file.
If you’re looking for more examples of how to use PHPWord, I wouldn’t recommend the official documentation; it’s fairly sparse and still needs a lot of work.
Instead, take a look inside the /vendor directory after installing PHPWord using Composer, specifically in phpoffice/phpword/samples, where you’ll find a load of example files covering a range of use cases.
One of the most useful conversions I’ve used is HTML to Excel using PHP: sometimes directly from HTML, but also straight from values in PHP code.
In one instance, a client wanted to be able to download a spreadsheet of sales and performance metrics directly as an Excel sheet. No such functionality existed within the system, so I wrote some custom code for it using this technique to transform HTML to Excel directly.
Here’s a very quick example of how you can generate a simple spreadsheet using values provided in PHP.
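Let’s get started. Note that PHPExcel is now deprecated, and PhpSpreadsheet is its recommended successor for new projects. As before, I’ll install my dependencies using Composer: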
composer require phpoffice/phpexcel
Now, for the content of my PHP file. This one is a fairly basic example, and the result is a few cells in a single sheet populated with numbers. Nothing too fancy.
require_once('vendor/autoload.php');
/**
* Step 1: Setup
*/
$objPHPExcel = new PHPExcel();
$objPHPExcel->getProperties()->setCreator("Robin Metcalfe")
->setLastModifiedBy("Robin Metcalfe")
->setTitle("Excel test")
->setSubject("Solarise Design")
->setDescription("A test document for outputting an Excel file with some basic values.")
->setKeywords("office PHPExcel php")
->setCategory("Test result file");
$sheet = $objPHPExcel->setActiveSheetIndex(0);
/**
* Step 2: Setting the values
*/
// row 1
$sheet->setCellValue("A1", 'Column A');
$sheet->setCellValue("B1", 'Column B');
// row 2
$sheet->setCellValue("A2", '1');
$sheet->setCellValue("B2", '2');
// row 3
$sheet->setCellValue("A3", '3');
$sheet->setCellValue("B3", '4');
/**
* Step 3: Output
*/
header('Content-Type: application/vnd.ms-excel');
header('Content-Disposition: attachment;filename="html-to-excel.xls"');
header('Cache-Control: max-age=0');
// If you're serving to IE 9, then the following may be needed
header('Cache-Control: max-age=1');
// If you're serving to IE over SSL, then the following may be needed
header ('Expires: Mon, 26 Jul 1997 05:00:00 GMT'); // Date in the past
header ('Last-Modified: '.gmdate('D, d M Y H:i:s').' GMT'); // always modified
header ('Cache-Control: cache, must-revalidate'); // HTTP/1.1
header ('Pragma: public'); // HTTP/1.0
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel5');
$objWriter->save('php://output');
Although there are quite a few lines of code, only three significant things are happening here:
1. Setup: creating the PHPExcel object, plus some configuration work to add the title, creator etc.
2. Setting the values: populating the cells of the active sheet.
3. Output: sending the download headers and writing the file to the browser.
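Setting each cell by hand gets repetitive once there are more than a few values. One option is to hold the data as an array of rows and work out the cell coordinates in a loop. Here’s a minimal sketch of that idea; rowsToCells() is a hypothetical helper of my own, and it only handles columns A to Z.

```php
<?php
/**
 * Map a 2D array of rows to cell coordinates such as "A1"
 * (hypothetical helper; supports columns A to Z only).
 */
function rowsToCells(array $rows): array
{
    $letters = range('A', 'Z');
    $cells = [];
    foreach (array_values($rows) as $rowIndex => $row) {
        foreach (array_values($row) as $colIndex => $value) {
            if (!isset($letters[$colIndex])) {
                throw new OutOfRangeException('This sketch supports at most 26 columns');
            }
            // Spreadsheet rows are 1-based, so add 1 to the array index
            $cells[$letters[$colIndex] . ($rowIndex + 1)] = $value;
        }
    }
    return $cells;
}

$cells = rowsToCells([
    ['Column A', 'Column B'],
    [1, 2],
    [3, 4],
]);

// With a PHPExcel sheet in hand, each pair could then be written with:
// foreach ($cells as $coordinate => $value) {
//     $sheet->setCellValue($coordinate, $value);
// }
print_r($cells);
```

Keeping the coordinate logic separate from the PHPExcel calls also means it can be checked without generating a spreadsheet.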
But, to expand on the above, let’s explore how I can take data and transfer it directly from HTML to Excel format.
Here, I’m going to extract all the tables within the HTML file, then use that data to create an Excel file containing two separate sheets.
The effect is that the script locates all table data within a page and converts the HTML to an Excel file, with one sheet per table.
Neat, huh?
I’m also going to be making use of the PHP class DOMDocument to extract the required data from my HTML file, as I did before with HTML to Word.
In the following chunk of code, I do the following:
1. Locate each <table> element within the HTML file, looping through the rows contained in <tbody> and grabbing the column headers from the <thead> element.
2. Pass that data to a PHPExcel object, which will build up the structure of the Excel file.
PHPExcel offers a range of additional methods and options to control styling, formatting and much more. Take a look through the class documentation and the samples within the vendor/phpoffice/phpexcel/Examples directory to find out more.
Here’s the code in full:
require_once('vendor/autoload.php');
// Pull in the HTML contents and parse them into a DOM tree
$data = file_get_contents('sample.html');
try {
    $dom = new DOMDocument();
    $dom->loadHTML($data);
    // Get all tables in the document
    $tables = $dom->getElementsByTagName('table');
    // The array I'll store the table data in
    $tableData = array();
    foreach ($tables as $tableN => $table) {
        // This requires a properly formatted HTML table structure
        $head = $table->getElementsByTagName('thead')[0];
        $body = $table->getElementsByTagName('tbody')[0];
        // Sheet heading - simply "Table 1", "Table 2" and so on.
        // Start with empty headings/rows so later loops are safe
        // even when a table has no <thead> or <tbody>
        $tableData[] = array(
            'heading' => 'Table ' . ($tableN + 1),
            'tableData' => array('headings' => array(), 'rows' => array())
        );
        if ($head && $body) {
            foreach ($head->getElementsByTagName('tr')[0]->getElementsByTagName('th') as $colN => $headCell) {
                $tableData[$tableN]['tableData']['headings'][] = $headCell->nodeValue;
            }
            foreach ($body->getElementsByTagName('tr') as $rowN => $tableRow) {
                foreach ($tableRow->getElementsByTagName('td') as $colN => $tableCell) {
                    $tableData[$tableN]['tableData']['rows'][$rowN][$colN] = $tableCell->nodeValue;
                }
            }
        }
    }
} catch (Exception $e) {
    // I failed...
    exit;
}
// Instantiate the PHPExcel object
$objPHPExcel = new PHPExcel();
$objPHPExcel->getProperties()->setCreator("Robin Metcalfe")
->setLastModifiedBy("Robin Metcalfe")
->setTitle("HTML Tables To Excel Test")
->setSubject("Solarise Design")
->setDescription("A test document for converting HTML tables into Excel.")
->setKeywords("office PHPExcel php")
->setCategory("Test result file");
// Column letters, used to build cell references like "A1"
$alphabet = range('A', 'Z');
foreach ($tableData as $tableN => $data) {
    // The first sheet already exists; create one for each further table
    if ($tableN > 0) {
        $objPHPExcel->createSheet($tableN);
    }
    $sheet = $objPHPExcel->setActiveSheetIndex($tableN);
    $objPHPExcel->getActiveSheet()->setTitle($data['heading']);
    // Row 1 holds the column headings
    foreach ($data['tableData']['headings'] as $n => $heading) {
        $sheet->setCellValue("{$alphabet[$n]}1", $heading);
    }
    // Data rows start at row 2
    foreach ($data['tableData']['rows'] as $rowN => $rowData) {
        foreach ($rowData as $colN => $value) {
            $n = $rowN + 2;
            $sheet->setCellValue("{$alphabet[$colN]}{$n}", $value);
        }
    }
}
// Resize columns to fit data, just to tidy things up
// (this applies to the active sheet, i.e. the last one created)
foreach (range('A', 'Z') as $columnID) {
    $objPHPExcel
        ->getActiveSheet()
        ->getColumnDimension($columnID)
        ->setAutoSize(true);
}
header('Content-Type: application/vnd.ms-excel');
header('Content-Disposition: attachment;filename="html-to-excel.xls"');
header('Cache-Control: max-age=0');
// If you're serving to IE 9, then the following may be needed
header('Cache-Control: max-age=1');
// If you're serving to IE over SSL, then the following may be needed
header ('Expires: Mon, 26 Jul 1997 05:00:00 GMT'); // Date in the past
header ('Last-Modified: '.gmdate('D, d M Y H:i:s').' GMT'); // always modified
header ('Cache-Control: cache, must-revalidate'); // HTTP/1.1
header ('Pragma: public'); // HTTP/1.0
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel5');
$objWriter->save('php://output');
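One limitation of the script above is that the range('A', 'Z') lookup only covers 26 columns, so a wider table would run out of letters. Here’s a minimal sketch of a helper that carries on into AA, AB and beyond. columnLetters() is a hypothetical function of my own, not something from PHPExcel.

```php
<?php
/**
 * Convert a zero-based column index to spreadsheet letters
 * (hypothetical helper): 0 => A, 25 => Z, 26 => AA, 27 => AB, ...
 */
function columnLetters(int $index): string
{
    if ($index < 0) {
        throw new InvalidArgumentException('Column index must be zero or greater');
    }
    $letters = '';
    $n = $index + 1; // work with a 1-based column number
    while ($n > 0) {
        $remainder = ($n - 1) % 26;
        $letters = chr(65 + $remainder) . $letters; // 65 is ASCII "A"
        $n = intdiv($n - 1, 26);
    }
    return $letters;
}

foreach ([0, 25, 26, 27, 701, 702] as $index) {
    echo $index, ' => ', columnLetters($index), PHP_EOL;
}
```

In the script, a cell reference could then be built with something like columnLetters($colN) . $n in place of the $alphabet lookup.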
To get a copy of all files used within this article, download them here.
Once downloaded, you’ll need to run composer install to set up all the dependencies.
Robin is the dedicated developer behind Solarise.dev. With years of experience in web development, he's committed to delivering high-quality solutions and ensuring websites run smoothly. Always eager to learn and adapt to the ever-changing digital landscape, Robin believes in the power of collaboration and sharing knowledge. Outside of coding, he enjoys diving into tech news and exploring new tools in the industry.