PHP How-To: Convert HTML to PDF, Word & Excel Documents

Effortlessly transform web content into professional documents

Originally published: December 22nd, 2017. Updated on: April 29th, 2025.

PHP can do much more than generate web pages. One handy capability is converting HTML content into document formats such as PDF, Microsoft Word (.docx), and Microsoft Excel (.xlsx/.xls).

This is useful for:

  • Generating downloadable PDF invoices or reports.
  • Allowing users to download order details as a Word document.
  • Exporting HTML table data into an Excel spreadsheet.

Let's look at how you can achieve this using some popular free PHP libraries.

(Quick Link: Download Source Code for these examples.)

Prerequisites:

  • Basic familiarity with PHP and setting up a simple web application.
  • Command-line access for using Composer.
  • Composer installed (getcomposer.org).

Composer

We'll use Composer, the standard PHP package manager, to install the necessary libraries. Composer handles dependencies automatically, saving you the headache of manual library management. If you haven't used it before, check out the installation guide on their website.

Convert HTML To PDF using Dompdf

Dompdf is a popular library for converting HTML and CSS into PDF documents directly within PHP.

  1. Install Dompdf: From your project's root directory, run:

    composer require dompdf/dompdf
    
  2. PHP Code: This example reads HTML from a file (sample.html) and streams the generated PDF to the browser, where it opens as an inline preview. Set Attachment to 1 to force a download instead.

    <?php
    // Ensure Composer's autoloader is included
    require_once 'vendor/autoload.php';
    
    // Reference the Dompdf namespace
    use Dompdf\Dompdf;
    use Dompdf\Options;
    
    // Instantiate Dompdf with options
    $options = new Options();
    $options->set('isHtml5ParserEnabled', true); // Enable HTML5 parser
    $options->set('isRemoteEnabled', true); // Allow loading remote images/CSS (use with caution!)
    $dompdf = new Dompdf($options);
    
    // Load HTML content (example: from a file)
    // Ensure 'sample.html' exists in the same directory or provide correct path
    $html_content = file_get_contents('sample.html');
    if ($html_content === false) {
        die("Error: Could not read sample.html");
    }
    
    $dompdf->loadHtml($html_content);
    
    // (Optional) Set Paper Size and Orientation
    $dompdf->setPaper('A4', 'portrait'); // or 'landscape'
    
    // Render the HTML as PDF
    $dompdf->render();
    
    // Output the generated PDF to Browser
    // Giving it a filename 'document.pdf'
    $dompdf->stream("document.pdf", ["Attachment" => 0]); // Stream to browser, 0 = preview, 1 = download
    
    ?>
    

    Make sure you have a sample.html file in the same directory containing valid HTML. Dompdf will attempt to render it, including basic CSS.
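If the HTML comes from application data rather than a static file, it can be assembled as a string and passed to loadHtml() in place of the file contents. The following is a minimal sketch under that assumption; renderInvoiceHtml() and the item keys (description, quantity, unit_price) are hypothetical names chosen for illustration, not part of Dompdf.

```php
<?php
declare(strict_types=1);

/**
 * Build a small, self-contained HTML document from invoice line items.
 * Hypothetical helper for illustration; the item keys are assumptions.
 *
 * @param array<int, array{description: string, quantity: int, unit_price: float}> $items
 */
function renderInvoiceHtml(string $title, array $items): string
{
    $rows = '';
    $total = 0.0;

    foreach ($items as $item) {
        $lineTotal = $item['quantity'] * $item['unit_price'];
        $total += $lineTotal;

        // Escape every dynamic value so user data cannot inject markup.
        $rows .= sprintf(
            "<tr><td>%s</td><td>%d</td><td>%s</td></tr>\n",
            htmlspecialchars($item['description'], ENT_QUOTES, 'UTF-8'),
            $item['quantity'],
            number_format($lineTotal, 2, '.', '')
        );
    }

    return '<!DOCTYPE html><html><head><meta charset="utf-8">'
        . '<style>table { border-collapse: collapse; } td, th { border: 1px solid #999; padding: 4px; }</style>'
        . '</head><body>'
        . '<h1>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '</h1>'
        . "<table>\n<tr><th>Item</th><th>Qty</th><th>Total</th></tr>\n"
        . $rows
        . '<tr><td colspan="2">Total</td><td>' . number_format($total, 2, '.', '') . '</td></tr>'
        . "\n</table></body></html>";
}

// Example data (made up for this sketch).
$html = renderInvoiceHtml('Invoice', [
    ['description' => 'Design & build', 'quantity' => 2, 'unit_price' => 150.0],
    ['description' => 'Hosting', 'quantity' => 1, 'unit_price' => 49.5],
]);
```

The resulting $html string can be handed to $dompdf->loadHtml($html) exactly like the contents of sample.html. Keeping the CSS simple and inline tends to give the most predictable PDF output.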

    (Image placeholder: Generated PDF from DomPDF output)

    Try it yourself: (Link placeholder: Generate & Download PDF)

    You can also fetch HTML from a URL with file_get_contents (provided allow_url_fopen is enabled in php.ini) and convert entire web pages. Dompdf does not execute JavaScript, and complex layouts may not be reproduced exactly.
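A minimal sketch of loading HTML from either a local file or a URL, with an explicit check of the allow_url_fopen setting. The function name fetchHtml() and the 10-second timeout are assumptions made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Load HTML from a local path or an http(s) URL.
 * Hypothetical helper; the name and the 10-second timeout are assumptions.
 */
function fetchHtml(string $source): string
{
    $isUrl = preg_match('#^https?://#i', $source) === 1;

    // Remote reads only work when allow_url_fopen is enabled in php.ini.
    if ($isUrl && !filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        throw new RuntimeException('allow_url_fopen is disabled; cannot fetch ' . $source);
    }

    // Apply a timeout so a slow remote server cannot hang the request.
    $context = stream_context_create(['http' => ['timeout' => 10]]);

    $html = @file_get_contents($source, false, $context);
    if ($html === false) {
        throw new RuntimeException('Could not read HTML from ' . $source);
    }

    return $html;
}

// Demonstration with a temporary local file.
$tmp = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($tmp, '<p>Hello</p>');
$fetched = fetchHtml($tmp);
unlink($tmp);
```

Only pass URLs you trust to a helper like this. Fetching arbitrary user-supplied URLs from the server can expose internal resources.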

Convert HTML To Word (.docx) using PHPWord

PHPWord is a library for working with Microsoft Word documents. Converting complex HTML directly with full styling preservation is tricky with PHPWord. A common approach is to use a Word template (.docx) with placeholders and replace those placeholders with content extracted from HTML.

  1. Install PHPWord:

    composer require phpoffice/phpword
    
  2. Create a Template (template.docx): Create a simple Word document named template.docx. Inside it, use placeholders like ${title}, ${author}, ${content} where you want to insert data. You can style the template as needed in Word.

  3. PHP Code: This code extracts specific elements from sample.html (using their IDs) and inserts them into template.docx.

    <?php
    require_once 'vendor/autoload.php';
    
    // Use PHPWord classes
    use PhpOffice\PhpWord\TemplateProcessor;
    use PhpOffice\PhpWord\IOFactory;
    
    // --- Step 1: Extract Content from HTML ---
    $html_content = file_get_contents('sample.html');
    if ($html_content === false) {
        die("Error: Could not read sample.html");
    }
    
    // Use DOMDocument to parse HTML
    $dom = new DOMDocument();
    // Suppress warnings for potentially invalid HTML
    @$dom->loadHTML($html_content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    
    // Extract content using element IDs (ensure these IDs exist in sample.html)
    $titleElement = $dom->getElementById('title');
    $contentElement = $dom->getElementById('content');
    
    $documentTitle = $titleElement ? $titleElement->nodeValue : 'Default Title';
    // Attempt to get inner HTML for content (preserves basic tags like <p>, <b>)
    $documentContentHTML = '';
    if ($contentElement) {
        foreach ($contentElement->childNodes as $child) {
            $documentContentHTML .= $dom->saveHTML($child);
        }
    } else {
        $documentContentHTML = '<p>Default content.</p>';
    }
    
    
    // --- Step 2: Process Word Template ---
    try {
        // Ensure the template file exists
        $templateFile = 'template.docx';
        if (!file_exists($templateFile)) {
             die("Error: Template file '{$templateFile}' not found.");
        }
    
        $templateProcessor = new TemplateProcessor($templateFile);
    
        // Replace placeholders in the template
        $templateProcessor->setValue('title', htmlspecialchars($documentTitle)); // Basic text replacement
        $templateProcessor->setValue('author', 'Robin Metcalfe'); // Example static value
    
        // Replace the content placeholder. setValue() inserts plain text only, so HTML
        // tags in the value would show up literally or corrupt the document XML.
        // Html::addHtml() can render HTML, but it adds elements to a Section (or other
        // container) of a PhpWord document built in code, not to a template placeholder.
        // For this template-based example, convert the HTML to plain text instead:
        $plainContent = strip_tags(str_replace('</p>', "</p>\n", $documentContentHTML));
        // Decode HTML entities such as &nbsp;, then escape &, < and > for the document XML
        $plainContent = html_entity_decode($plainContent, ENT_QUOTES | ENT_HTML5, 'UTF-8');
        $templateProcessor->setValue('content', htmlspecialchars(trim($plainContent)));
    
    
        // --- Step 3: Output the Document ---
        $outputFilename = 'generated_document.docx';
    
        // Set headers for download
        header("Content-Description: File Transfer");
        header('Content-Disposition: attachment; filename="' . $outputFilename . '"');
        // Correct Content-Type for .docx
        header('Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document');
        header('Content-Transfer-Encoding: binary');
        header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
        header('Expires: 0');
    
        // Save the processed template to the output stream
        $templateProcessor->saveAs('php://output');
        exit;
    
    } catch (Exception $e) {
        die("Error processing template: " . $e->getMessage());
    }
    
    ?>
    

    This approach works well when you have a predefined document structure. Directly converting arbitrary HTML with complex CSS to a perfectly matching Word document is much harder.
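The HTML-to-plain-text step used for the content placeholder can be pulled out into a standalone helper. The sketch below is one possible version; htmlToTemplateText() is a hypothetical name and is not part of PHPWord. How newlines in the value are rendered in the final document may depend on the PHPWord version, so treat the line-break handling as an assumption to verify.

```php
<?php
declare(strict_types=1);

/**
 * Convert an HTML fragment to plain text that is safe to place in Word XML.
 * Hypothetical helper for illustration, not part of PHPWord.
 */
function htmlToTemplateText(string $html): string
{
    // Mark block boundaries with newlines before the tags are removed.
    $text = preg_replace('#</(p|div|h[1-6]|li)>|<br\s*/?>#i', "\n", $html) ?? $html;

    // Drop remaining tags, then turn entities such as &nbsp; into characters.
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of spaces, tabs and non-breaking spaces, then excess blank lines.
    $text = preg_replace('/[ \t\x{00A0}]+/u', ' ', $text) ?? $text;
    $text = preg_replace('/ ?\n ?/', "\n", $text) ?? $text;
    $text = preg_replace('/\n{2,}/', "\n", $text) ?? $text;

    // Escape &, < and > so the value is valid inside the document XML.
    return htmlspecialchars(trim($text), ENT_NOQUOTES, 'UTF-8');
}

$plain = htmlToTemplateText('<p>Fish &amp; chips</p><p>Caf&eacute;&nbsp;menu</p>');
```

The result could then be passed to $templateProcessor->setValue('content', $plain). Decoding entities first and escaping last avoids both double-escaped text and HTML-only entities that are not valid in XML.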

    (Image placeholder: Generated Word document from PHPWord output)

    Try it yourself: (Link placeholder: Generate & Download HTML » Word)

    For more complex scenarios, explore the PHPWord documentation and examples (often found in the vendor/phpoffice/phpword/samples directory after installation).

Convert HTML To Excel using PhpSpreadsheet

(Note: The original article used PHPExcel, which is deprecated. PhpSpreadsheet is its direct successor and should be used for new projects.)

PhpSpreadsheet is a widely used library for reading and writing spreadsheet files (Excel .xlsx, .xls, CSV, etc.) with PHP. You can populate spreadsheets directly from PHP data or extract data from HTML tables.

  1. Install PhpSpreadsheet:

    composer require phpoffice/phpspreadsheet
    
  2. Example 1: Simple Spreadsheet from PHP Data

    <?php
    require_once 'vendor/autoload.php';
    
    use PhpOffice\PhpSpreadsheet\Spreadsheet;
    use PhpOffice\PhpSpreadsheet\Writer\Xlsx; // Use Xlsx for modern Excel format
    
    // --- Step 1: Setup Spreadsheet ---
    $spreadsheet = new Spreadsheet();
    
    // Set document properties (optional)
    $spreadsheet->getProperties()
        ->setCreator("Robin Metcalfe")
        ->setLastModifiedBy("Robin Metcalfe")
        ->setTitle("Simple Excel Test")
        ->setSubject("PHP Spreadsheet Example")
        ->setDescription("Test document generated using PhpSpreadsheet.")
        ->setKeywords("office phpspreadsheet php")
        ->setCategory("Test file");
    
    // Get the active sheet
    $sheet = $spreadsheet->getActiveSheet();
    $sheet->setTitle('Simple Data'); // Set sheet title
    
    // --- Step 2: Setting Cell Values ---
    $sheet->setCellValue('A1', 'Column A');
    $sheet->setCellValue('B1', 'Column B');
    $sheet->setCellValue('A2', 1); // Numbers
    $sheet->setCellValue('B2', 2);
    $sheet->setCellValue('A3', 3);
    $sheet->setCellValue('B3', 4);
    $sheet->setCellValue('A4', 'Total:');
    $sheet->setCellValue('B4', '=SUM(B2:B3)'); // Formula example
    
    // --- Step 3: Output ---
    $outputFilename = "simple_excel.xlsx"; // Use .xlsx extension
    
    header('Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'); // Correct MIME type for .xlsx
    header('Content-Disposition: attachment;filename="' . $outputFilename . '"');
    header('Cache-Control: max-age=0');
    
    // Write the spreadsheet to output
    $writer = new Xlsx($spreadsheet);
    $writer->save('php://output');
    exit;
    
    ?>
    

    Try it yourself: (Link placeholder: Generate & Download Simple Excel File)
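For larger datasets, setting cells one by one gets tedious. A minimal sketch of turning a two-dimensional PHP array into a coordinate-to-value map follows; columnLetter() and rowsToCells() are hypothetical helpers written in plain PHP to show how column indexes map to Excel-style letters.

```php
<?php
declare(strict_types=1);

/** Convert a 1-based column index to its letter name (1 => A, 27 => AA). */
function columnLetter(int $index): string
{
    if ($index < 1) {
        throw new InvalidArgumentException('Column index must be 1 or greater.');
    }

    $letters = '';
    while ($index > 0) {
        $remainder = ($index - 1) % 26;
        $letters = chr(65 + $remainder) . $letters;
        $index = intdiv($index - 1, 26);
    }

    return $letters;
}

/** Flatten a two-dimensional array into a coordinate => value map. */
function rowsToCells(array $rows, int $startRow = 1): array
{
    $cells = [];
    foreach (array_values($rows) as $rowOffset => $row) {
        foreach (array_values($row) as $colOffset => $value) {
            $cells[columnLetter($colOffset + 1) . ($startRow + $rowOffset)] = $value;
        }
    }

    return $cells;
}

$cells = rowsToCells([
    ['Column A', 'Column B'],
    [1, 2],
    [3, 4],
]);
```

Each entry can be written with $sheet->setCellValue($coordinate, $value). PhpSpreadsheet also offers $sheet->fromArray($rows, null, 'A1'), which fills a block of cells from an array in a single call and is usually the simpler choice.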

  3. Example 2: Extracting HTML Tables to Excel Sheets

    This more complex example parses sample.html, finds all <table> elements, and creates a separate Excel sheet for each table.

    <?php
    require_once 'vendor/autoload.php';
    
    use PhpOffice\PhpSpreadsheet\Spreadsheet;
    use PhpOffice\PhpSpreadsheet\Writer\Xlsx;
    use PhpOffice\PhpSpreadsheet\Cell\Coordinate; // Helper for cell coordinates
    
    // --- Step 1: Extract Table Data from HTML ---
    $html_content = file_get_contents('sample.html');
    if ($html_content === false) {
        die("Error: Could not read sample.html");
    }
    
    $dom = new DOMDocument();
    @$dom->loadHTML($html_content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    
    $tables = $dom->getElementsByTagName('table');
    $allTableData = [];
    
    foreach ($tables as $tableIndex => $table) {
        $tableData = ['headings' => [], 'rows' => []];
        $head = $table->getElementsByTagName('thead');
        $body = $table->getElementsByTagName('tbody');
        $rows = $table->getElementsByTagName('tr'); // Fallback if no thead/tbody
    
        // Get Headings (from thead or first tr if no thead)
        if ($head->length > 0) {
            $headerRow = $head->item(0)->getElementsByTagName('tr');
            if ($headerRow->length > 0) {
                 $headerCells = $headerRow->item(0)->getElementsByTagName('th');
                 if ($headerCells->length === 0) { // Maybe uses <td> in header?
                      $headerCells = $headerRow->item(0)->getElementsByTagName('td');
                 }
                 foreach ($headerCells as $cell) {
                     $tableData['headings'][] = trim($cell->nodeValue);
                 }
            }
        } elseif ($rows->length > 0) { // Try first row as header
             $headerCells = $rows->item(0)->getElementsByTagName('td');
              if ($headerCells->length === 0) {
                   $headerCells = $rows->item(0)->getElementsByTagName('th');
              }
             foreach ($headerCells as $cell) {
                 $tableData['headings'][] = trim($cell->nodeValue);
             }
        }
    
        // Get Body Rows (from tbody or remaining tr if no tbody)
        $startRowIndex = ($head->length > 0 || count($tableData['headings']) > 0) ? 1 : 0; // Skip header row if found
        if ($body->length > 0) {
            $bodyRows = $body->item(0)->getElementsByTagName('tr');
            // Without a <thead>, the headings came from the first <tbody> row, so skip it here
            $firstBodyRow = ($head->length === 0 && count($tableData['headings']) > 0) ? 1 : 0;
            for ($i = $firstBodyRow; $i < $bodyRows->length; $i++) {
                 $rowData = [];
                 foreach ($bodyRows->item($i)->getElementsByTagName('td') as $cell) {
                     $rowData[] = trim($cell->nodeValue);
                 }
                 $tableData['rows'][] = $rowData;
             }
        } else { // Process all rows after potential header
            for ($i = $startRowIndex; $i < $rows->length; $i++) {
                 $row = $rows->item($i);
                 $rowData = [];
                 foreach ($row->getElementsByTagName('td') as $cell) {
                     $rowData[] = trim($cell->nodeValue);
                 }
                  $tableData['rows'][] = $rowData;
            }
        }
    
        if (!empty($tableData['headings']) || !empty($tableData['rows'])) {
            // Try to find a preceding heading for the sheet title
            $sheetTitle = "Table " . ($tableIndex + 1);
            $prev = $table->previousSibling;
            while($prev && $prev->nodeType !== XML_ELEMENT_NODE) { $prev = $prev->previousSibling; } // Find previous element
            if($prev && in_array(strtolower($prev->nodeName), ['h1','h2','h3','h4','h5','h6'])) {
                $sheetTitle = trim($prev->nodeValue);
            }
             $allTableData[] = ['title' => $sheetTitle, 'data' => $tableData];
        }
    } // End foreach table
    
    if (empty($allTableData)) {
        die("No valid table data found in sample.html");
    }
    
    // --- Step 2: Create Spreadsheet ---
    $spreadsheet = new Spreadsheet();
    $spreadsheet->removeSheetByIndex(0); // Remove default sheet
    
    foreach ($allTableData as $sheetIndex => $tableInfo) {
        // Create a new sheet
        $sheet = $spreadsheet->createSheet($sheetIndex);
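        // Strip characters Excel rejects in sheet titles; titles are limited to 31 characters
        $sheet->setTitle(mb_substr(str_replace(['*', ':', '/', '\\', '?', '[', ']'], '', $tableInfo['title']), 0, 31));
    
        // Write headings on row 1. Coordinate::stringFromColumnIndex() maps 1 => A, 2 => B, ...
        // (The older setCellValueByColumnAndRow() is deprecated and missing from current releases.)
        $colIndex = 1;
        foreach ($tableInfo['data']['headings'] as $heading) {
            $sheet->setCellValue(Coordinate::stringFromColumnIndex($colIndex++) . '1', $heading);
        }
        $columnCount = count($tableInfo['data']['headings']);
    
        $rowIndex = 2; // Start data on row 2
        foreach ($tableInfo['data']['rows'] as $rowData) {
            $colIndex = 1;
            foreach ($rowData as $cellValue) {
                $sheet->setCellValue(Coordinate::stringFromColumnIndex($colIndex++) . $rowIndex, $cellValue);
            }
            $columnCount = max($columnCount, count($rowData));
            $rowIndex++;
        }
    
        // Auto-size every column that received data (optional)
        for ($col = 1; $col <= $columnCount; $col++) {
             $sheet->getColumnDimension(Coordinate::stringFromColumnIndex($col))->setAutoSize(true);
        }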
    }
    
    // Set first sheet active
    $spreadsheet->setActiveSheetIndex(0);
    
    // --- Step 3: Output ---
    $outputFilename = "html_tables_to_excel.xlsx";
    
    header('Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet');
    header('Content-Disposition: attachment;filename="' . $outputFilename . '"');
    header('Cache-Control: max-age=0');
    
    $writer = new Xlsx($spreadsheet);
    $writer->save('php://output');
    exit;
    
    ?>
    

    This script extracts each table from sample.html into its own sheet in the generated Excel file, which suits data export tasks.
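Sheet titles taken from page headings can contain characters Excel does not accept, or exceed the 31-character limit. The sketch below shows one way to clean them before calling setTitle(); sanitizeSheetTitle() and its fallback title are hypothetical and not part of PhpSpreadsheet.

```php
<?php
declare(strict_types=1);

/**
 * Make a string safe to use as an Excel sheet title.
 * Hypothetical helper; the fallback title is an assumption.
 */
function sanitizeSheetTitle(string $title, string $fallback = 'Sheet'): string
{
    // Excel rejects these characters in sheet titles.
    $clean = str_replace(['*', ':', '/', '\\', '?', '[', ']'], '', $title);

    // Collapse whitespace left over from multi-line headings.
    $clean = trim(preg_replace('/\s+/u', ' ', $clean) ?? $clean);

    // Titles are limited to 31 characters; count characters, not bytes.
    $clean = rtrim(mb_substr($clean, 0, 31, 'UTF-8'));

    return $clean === '' ? $fallback : $clean;
}

$sheetTitle = sanitizeSheetTitle('Sales: Q1/Q2 [draft]');
```

Using mb_substr() rather than substr() avoids cutting a multi-byte character in half, which would otherwise leave invalid UTF-8 in the workbook.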

    (Image placeholder: Generated Excel file from PhpSpreadsheet output)

    Try it yourself: (Link placeholder: Generate & Download HTML » Excel)

Download Source Code {#download-code}

Get a copy of all files used in this article (including a basic sample.html).

(Link placeholder: Download Source Code .zip)

Once downloaded, navigate into the directory via command line and run composer install to set up the dependencies before running the PHP scripts.
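If a script is run before the dependencies are installed, the require_once line fails with a fairly unhelpful error. A small guard such as the following sketch can give a clearer message; findAutoloader() is a hypothetical helper name.

```php
<?php
declare(strict_types=1);

/**
 * Return the path to Composer's autoloader, or null when it is missing.
 * Hypothetical helper for illustration.
 */
function findAutoloader(string $projectDir): ?string
{
    $path = rtrim($projectDir, '/\\') . '/vendor/autoload.php';

    return is_file($path) ? $path : null;
}

$autoloader = findAutoloader(__DIR__);
if ($autoloader !== null) {
    require_once $autoloader;
    $status = 'Autoloader loaded.';
} else {
    // Dependencies have not been installed yet.
    $status = "Dependencies missing: run 'composer install' in this directory first.";
}

echo $status, PHP_EOL;
```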

About the Author: Robin Metcalfe

Robin is a freelance web strategist and developer based in Edinburgh, with over 15 years of experience helping businesses build effective and engaging online platforms using technologies like Laravel and WordPress.