SOLARISE
DEV

PHP can do much more than generate web pages. One particularly handy capability is converting HTML content into document formats such as PDF, Microsoft Word (.docx), and Microsoft Excel (.xlsx/.xls), which can save you or your clients a significant amount of time and effort.

Imagine effortlessly transforming your web data into professionally formatted documents. This can be great for:

  • Generating downloadable PDF invoices, detailed financial reports, or sleek product catalogs directly from your application's data.
  • Allowing users to download order confirmations, event tickets, or personalized certificates as a Word document for offline access or record-keeping.
  • Exporting HTML table data, user lists, or sales figures into an Excel spreadsheet for further analysis, data manipulation, or sharing with teams.
  • Automating the creation of regular compliance documents or internal reports in a standardized format.
  • Providing data exports in user-friendly formats for non-technical stakeholders.

Let's look at how you can achieve this using some popular, free PHP libraries. We'll dive into practical examples and explore the nuances of each method.

(Quick Link: Download Source Code & Example Files for these examples.)

Prerequisites: Getting Started

Before we jump into the code, make sure you have the following:

  • Basic familiarity with PHP and setting up a simple web application (e.g., using XAMPP, MAMP, Laravel Valet, or Docker).
  • Command-line access (Terminal on macOS/Linux, Command Prompt or PowerShell on Windows) for using Composer.
  • Composer installed globally on your system. If not, head over to getcomposer.org for installation instructions. It's an indispensable tool for modern PHP development!

The Magic of Composer

We'll use Composer, the de facto standard PHP package manager, to install the necessary libraries. Composer handles dependencies automatically, resolving version conflicts and saving you the headache of manual library management. If you haven't used it before, you're in for a treat! It simplifies project setup immensely. Just define your dependencies in a composer.json file, run composer install, and Composer takes care of the rest, downloading and organizing everything into a vendor directory and creating an autoloader.


Convert HTML To PDF using Dompdf

Dompdf is a highly popular and mature library for converting HTML and CSS into PDF documents directly within PHP. It's an HTML to PDF converter written in PHP, making it a great choice for integrating PDF generation seamlessly into your applications without external dependencies (like a headless browser).

Key features of Dompdf:

  • Supports most CSS 2.1 and some CSS3 properties, including @font-face.
  • Handles complex layouts, tables, and floats reasonably well.
  • Supports external stylesheets and images (if enabled).
  • Offers options for paper size, orientation, and encryption.

1. Install Dompdf:

From your project's root directory, open your terminal and run this command:

composer require dompdf/dompdf

This will download Dompdf and its dependencies into your vendor directory and update your composer.json and composer.lock files.

2. PHP Code for PDF Generation:

This example reads HTML from a file (sample.html – make sure this file exists in the same directory or provide the correct path) and streams the generated PDF to the browser. You can choose to have it display in the browser or trigger a download.
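A minimal sketch of that flow follows. It is not a definitive implementation: it assumes Dompdf was installed with Composer next to the script, that sample.html sits in the same directory, and the output filename document.pdf is only illustrative.

```php
<?php
// Minimal sketch, not a definitive implementation.
// Assumes Dompdf was installed with Composer next to this script.
$autoload = __DIR__ . '/vendor/autoload.php';
if (!is_file($autoload)) {
    die('Composer autoloader not found. Run "composer require dompdf/dompdf" first.');
}
require_once $autoload;

use Dompdf\Dompdf;
use Dompdf\Options;

// Read the HTML to convert; stop early if the file is missing or unreadable.
$html_file_path = __DIR__ . '/sample.html';
$html = @file_get_contents($html_file_path);
if ($html === false) {
    die('Could not read ' . $html_file_path);
}

$options = new Options();
$options->set('isRemoteEnabled', false); // enable only for HTML you trust

$dompdf = new Dompdf($options);
$dompdf->loadHtml($html);
$dompdf->setPaper('A4', 'portrait');
$dompdf->render();

// Attachment => false previews in the browser; true triggers a download.
$dompdf->stream('document.pdf', ['Attachment' => false]);

// To save on the server instead, replace the stream() call with:
// file_put_contents(__DIR__ . '/output_document.pdf', $dompdf->output());

exit;
```

Calling exit; after streaming keeps any later output from being appended to the PDF bytes.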


    

Make sure you have a sample.html file in the same directory containing valid HTML. Dompdf will attempt to render it, including basic CSS. For complex CSS or JavaScript-driven content, Dompdf might struggle, as it's primarily a CSS 2.1 renderer with some CSS3 support.

You can also fetch HTML from a URL using file_get_contents (if your server configuration enables allow_url_fopen) and convert entire web pages. Be aware, though, that complex layouts and modern CSS such as Flexbox or Grid will not be reproduced faithfully, and JavaScript-driven content will be missing entirely because Dompdf does not execute JavaScript.

Styling and CSS with Dompdf:

Dompdf does a decent job with CSS, but it has its limitations:

  • CSS 2.1 is best supported. Some CSS3 features like basic border-radius, box-shadow, and @font-face are supported, but advanced features like Flexbox, Grid, or complex animations are not.
  • Inline styles and <style> blocks within your HTML are generally well-handled.
  • External stylesheets can be linked if isRemoteEnabled is true, but ensure paths are correct and accessible.
  • Floats and tables are the primary methods for layout.
  • For custom fonts, ensure you use @font-face correctly and that Dompdf can access the font files (TTF or OTF). Older guides install fonts with a load_font.php utility; check whether your Dompdf version still ships it, since it may be distributed separately.

For best results, design your HTML/CSS with Dompdf's capabilities in mind, often meaning simpler, more traditional layouts.
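As a rough illustration of such a layout, the sketch below builds a small invoice page using only a table, an inline <style> block, and escaped text. The function name buildInvoiceHtml, the item fields, and the sample data are all hypothetical; DejaVu Sans is used because Dompdf bundles the DejaVu fonts.

```php
<?php
/**
 * Build a simple invoice page with a table-based layout and inline CSS.
 * (buildInvoiceHtml and the item fields are illustrative, not part of Dompdf.)
 */
function buildInvoiceHtml(string $customer, array $items): string
{
    $rows = '';
    $total = 0.0;
    foreach ($items as $item) {
        $lineTotal = $item['qty'] * $item['price'];
        $total += $lineTotal;
        // Escape every piece of text that ends up inside the HTML.
        $rows .= sprintf(
            "<tr><td>%s</td><td class=\"num\">%d</td><td class=\"num\">%s</td></tr>\n",
            htmlspecialchars($item['name'], ENT_QUOTES, 'UTF-8'),
            $item['qty'],
            number_format($lineTotal, 2)
        );
    }
    $customerHtml = htmlspecialchars($customer, ENT_QUOTES, 'UTF-8');
    $totalHtml = number_format($total, 2);

    return <<<HTML
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
  body { font-family: DejaVu Sans, sans-serif; font-size: 12px; }
  table { width: 100%; border-collapse: collapse; }
  th, td { border: 1px solid #999; padding: 4px 8px; }
  th { background: #eee; text-align: left; }
  .num { text-align: right; }
</style>
</head>
<body>
<h1>Invoice for {$customerHtml}</h1>
<table>
<tr><th>Item</th><th>Qty</th><th>Line total</th></tr>
{$rows}<tr><th colspan="2">Total</th><td class="num">{$totalHtml}</td></tr>
</table>
</body>
</html>
HTML;
}

// Hypothetical sample data.
$html = buildInvoiceHtml('Acme & Co', [
    ['name' => 'Widget', 'qty' => 4, 'price' => 2.5],
    ['name' => 'Gadget <Pro>', 'qty' => 1, 'price' => 10.0],
]);
// $html can now be passed to $dompdf->loadHtml($html).
```

Keeping the layout to tables and simple box properties stays within the CSS 2.1 feature set that Dompdf handles best.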

Command-Line Execution (PDF)

To run this PHP script from the command line and save the output as a PDF file:

  1. Save the code above as a PHP file (e.g., generate_pdf_cli.php).
  2. Ensure you have a sample.html file in the same directory, or update the $html_file_path variable in the script.
  3. Install Dompdf if you haven't already: composer require dompdf/dompdf
  4. Modify the script for CLI output:
    • Comment out or remove the $dompdf->stream(...) line.
    • Uncomment and adjust the lines for saving to a file:
      $output_pdf_path = __DIR__ . '/output_document.pdf'; // Define output path
      $output = $dompdf->output(); // Get PDF content
      if (file_put_contents($output_pdf_path, $output) === false) {
          error_log("CLI Error: Could not save PDF to " . $output_pdf_path);
          die("Error saving PDF file.");
      }
      echo "PDF saved to: " . $output_pdf_path . "\n";
    • Ensure the exit; line remains if you don't want further CLI output after the message.
  5. Run the script from your terminal: php generate_pdf_cli.php
  6. The PDF will be saved as output_document.pdf in the same directory.

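Note: Ensure PHP is correctly configured and in your system's PATH. Enable isRemoteEnabled only when you trust the HTML being rendered, because it lets Dompdf fetch the remote images and stylesheets that the HTML references.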


Convert HTML To Word (.docx) using PHPWord

PHPWord is a comprehensive library for working with Microsoft Word documents. It allows you to create new Word documents from scratch, or, more commonly for HTML conversion, to use an existing Word document (.docx) as a template and populate it with data.

Directly converting complex HTML with full styling preservation into a perfect .docx file is a significant challenge for most libraries, including PHPWord. The internal structures of HTML/CSS and Word documents are vastly different. Therefore, the most reliable approach is often to use a Word template with placeholders and replace those placeholders with content extracted (and often simplified) from your HTML.

Key features of PHPWord:

  • Create and manipulate Word documents (DOCX, ODT, RTF).
  • Template processing: Fill placeholders in existing DOCX files.
  • Add text, paragraphs, lists, tables, images, charts, headers, footers, and sections.
  • Rich text formatting capabilities.
  • Limited HTML import functionality (\PhpOffice\PhpWord\Shared\Html::addHtml()).

1. Install PHPWord:

In your project's root directory, run:

composer require phpoffice/phpword

2. Create a Word Template (template.docx):

Create a simple Word document named template.docx in your project directory. Inside this document, use placeholders where you want to insert data. Placeholders are typically in the format ${placeholder_name}.

For example, your template.docx might contain:

Document Title: ${title}

Author: ${author}

Main Content:

${content_block}

Report Date: ${report_date}

You can style this template (fonts, colors, headings) directly in Microsoft Word, and PHPWord will preserve that styling when it populates the placeholders.

3. PHP Code for Word Generation:

This code extracts specific elements from sample.html (using their IDs – ensure these IDs exist in your sample.html) and inserts their content into the template.docx.
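The sketch below shows one way this could look. It assumes PHPWord was installed with Composer next to the script, that sample.html contains elements with the IDs main-title and content, and that template.docx uses the placeholders shown earlier. The helper textById and the author value are illustrative, and only plain text is inserted, not formatted HTML.

```php
<?php
// Minimal sketch. Assumes `composer require phpoffice/phpword` was run here and
// that sample.html and template.docx sit next to this script.
require_once __DIR__ . '/vendor/autoload.php';

use PhpOffice\PhpWord\Settings;
use PhpOffice\PhpWord\TemplateProcessor;

/** Return the trimmed text of the element with the given id (illustrative helper). */
function textById(DOMDocument $dom, string $id, string $fallback = ''): string
{
    $node = (new DOMXPath($dom))->query('//*[@id="' . $id . '"]')->item(0);
    return $node !== null ? trim($node->textContent) : $fallback;
}

$html = @file_get_contents(__DIR__ . '/sample.html');
if ($html === false || trim($html) === '') {
    die('Could not read sample.html');
}

libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

try {
    Settings::setOutputEscapingEnabled(true); // escape &, < and > in inserted text

    $templateProcessor = new TemplateProcessor(__DIR__ . '/template.docx');
    $templateProcessor->setValue('title', textById($dom, 'main-title', 'Untitled'));
    $templateProcessor->setValue('author', 'Example Author'); // placeholder value
    $templateProcessor->setValue('content_block', textById($dom, 'content'));
    $templateProcessor->setValue('report_date', date('Y-m-d'));

    header('Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document');
    header('Content-Disposition: attachment; filename="output_document.docx"');
    $templateProcessor->saveAs('php://output');
} catch (Throwable $e) {
    error_log('Word generation failed: ' . $e->getMessage());
    http_response_code(500);
    echo 'Could not generate the Word document.';
}
exit;
```

Enabling output escaping matters here because text taken from HTML often contains characters such as & that would otherwise corrupt the document XML.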


    

This template-based approach works very well when you have a predefined document structure. Directly converting arbitrary HTML with complex CSS to a perfectly matching Word document is much harder and often yields imperfect results due to the fundamental differences between web and Word document rendering.

For more complex scenarios, such as dynamically adding multiple rows to a table in your Word template or inserting richly formatted text, explore the extensive PHPWord documentation and examples (often found in the vendor/phpoffice/phpword/samples directory after installation via Composer). Methods like cloneRow for tables and various text run formatting options are very powerful.
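As a small sketch of cloneRow, the example below fills a table with one row per item. It assumes template.docx contains a table row holding the placeholders ${item_name} and ${item_price}; those placeholder names, the sample items, and the output filename are hypothetical and are not part of the template shown earlier.

```php
<?php
// Sketch only. Assumes phpoffice/phpword is installed via Composer and that
// template.docx contains a table row with ${item_name} and ${item_price}.
require_once __DIR__ . '/vendor/autoload.php';

use PhpOffice\PhpWord\Settings;
use PhpOffice\PhpWord\TemplateProcessor;

Settings::setOutputEscapingEnabled(true); // escape &, < and > in inserted text

// Hypothetical data; in practice this might come from a database or parsed HTML.
$items = [
    ['name' => 'Widget', 'price' => '2.50'],
    ['name' => 'Gadget', 'price' => '10.00'],
];

$templateProcessor = new TemplateProcessor(__DIR__ . '/template.docx');

// Duplicate the row holding ${item_name} once per item.
// The clones get numbered placeholders: item_name#1, item_name#2, ...
$templateProcessor->cloneRow('item_name', count($items));
foreach ($items as $i => $item) {
    $n = $i + 1;
    $templateProcessor->setValue("item_name#{$n}", $item['name']);
    $templateProcessor->setValue("item_price#{$n}", $item['price']);
}

$templateProcessor->saveAs(__DIR__ . '/output_document.docx');
```

Because the row is cloned inside the template, any styling applied to that row in Word carries over to every generated row.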

Tips for PHPWord HTML Conversion:

  • Simplify your HTML: The cleaner and simpler your HTML, the better PHPWord's HTML processing will perform. Avoid complex CSS and JavaScript-dependent content.
  • Use basic tags: PHPWord handles tags like <p>, <b>, <i>, <ul>, <li>, <br>, and basic <table> structures reasonably well.
  • Consider WordML: For very precise control, you might need to convert your HTML to WordML (Word's underlying XML format) and insert that, which is a more advanced topic.
  • Iterative Testing: Always test with various HTML inputs to see how PHPWord renders them.

Command-Line Execution (Word)

To run this PHP script from the command line and save the output as a DOCX file:

  1. Save the code above as a PHP file (e.g., generate_word_cli.php).
  2. Create a sample.html file with elements having IDs like 'main-title' and 'content'.
  3. Create a Word template file named template.docx in the same directory. This template should contain placeholders like ${title}, ${author}, ${report_date}, and ${content_block}.
  4. Install PHPWord if you haven't already: composer require phpoffice/phpword
  5. Modify the script for CLI output:
    • Comment out or remove all header(...) lines and the $templateProcessor->saveAs('php://output'); line.
    • Add the following line to save the document to a file instead:
      $output_docx_path = __DIR__ . '/output_document.docx'; // Define output path
      $templateProcessor->saveAs($output_docx_path);
      echo "DOCX saved to: " . $output_docx_path . "\n";
    • Ensure the exit; line is removed or commented out if you want the "DOCX saved" message to appear.
  6. Run the script from your terminal: php generate_word_cli.php
  7. The DOCX file will be saved as output_document.docx.

Note: PHPWord's HTML import has limitations. For best results, use simple HTML structures or populate template variables with plain text extracted from HTML.


Convert HTML To Excel using PhpSpreadsheet

(Note: Older tutorials use PHPExcel, which is deprecated and no longer maintained. PhpSpreadsheet is its direct successor and should be used for all new projects, so the examples below use PhpSpreadsheet.)

PhpSpreadsheet is the leading PHP library for reading and writing spreadsheet files in various formats, including Excel .xlsx (OfficeOpenXML), .xls (older BIFF format), CSV, ODS, and more. You can populate spreadsheets directly from PHP data (arrays, database results) or extract data from HTML tables.

Key features of PhpSpreadsheet:

  • Read and write multiple spreadsheet formats (XLSX, XLS, CSV, ODS, HTML, PDF via libraries like Dompdf/tcPDF/mPDF).
  • Cell styling (fonts, colors, borders, number formats).
  • Formulas, charts, images, cell merging, data validation.
  • Rich API for manipulating worksheets, rows, columns, and cells.

1. Install PhpSpreadsheet:

In your project's root directory, execute:

composer require phpoffice/phpspreadsheet

2. Example 1: Simple Spreadsheet from PHP Data

This example demonstrates creating a basic Excel file (.xlsx) directly from PHP data, including setting some cell values and a simple formula.
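A minimal sketch of such a script is shown below. It assumes PhpSpreadsheet was installed with Composer next to the script; the sheet title, the sample rows, and the download filename are illustrative.

```php
<?php
// Minimal sketch. Assumes `composer require phpoffice/phpspreadsheet` was run here.
require_once __DIR__ . '/vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;

$spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();
$sheet->setTitle('Sales Summary');

// Illustrative data: a header row followed by three items.
// Strings starting with "=" are stored as formulas.
$rows = [
    ['Product', 'Units', 'Unit Price', 'Line Total'],
    ['Widget', 4, 2.50, '=B2*C2'],
    ['Gadget', 2, 10.00, '=B3*C3'],
    ['Gizmo', 1, 7.25, '=B4*C4'],
];
$sheet->fromArray($rows, null, 'A1');

// Grand total below the data.
$sheet->setCellValue('C5', 'Total');
$sheet->setCellValue('D5', '=SUM(D2:D4)');

// Basic styling: bold header row and auto-sized columns.
$sheet->getStyle('A1:D1')->getFont()->setBold(true);
foreach (range('A', 'D') as $column) {
    $sheet->getColumnDimension($column)->setAutoSize(true);
}

header('Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet');
header('Content-Disposition: attachment; filename="simple_example.xlsx"');
header('Cache-Control: max-age=0');

$writer = new Xlsx($spreadsheet);
$writer->save('php://output');
exit;
```

To write the file to disk instead, drop the header() calls and pass a file path to save().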


    

Command-Line Execution (Excel from HTML Tables)

To run the HTML-table extraction script (Example 2, below) from the command line and save the Excel output:

  1. Save the Example 2 code as generate_excel_cli.php (or similar).
  2. Ensure sample.html (containing HTML tables) is in the same directory.
  3. Install PhpSpreadsheet: composer require phpoffice/phpspreadsheet
  4. Modify the script for CLI output:
    • Comment out or remove all header(...) lines and the $writer->save('php://output'); line.
    • Add the following to save the Excel file:
      $output_excel_path = __DIR__ . '/html_tables_output.xlsx'; // Define output path
      $writer = new Xlsx($spreadsheet);
      $writer->save($output_excel_path);
      echo "Excel file saved to: " . $output_excel_path . "\n";
    • Ensure exit; is removed or commented if you want the "Excel file saved" message.
  5. Run from terminal: php generate_excel_cli.php
  6. The Excel file will be saved as html_tables_output.xlsx.

3. Example 2: Extracting HTML Tables to Excel Sheets

This more complex example demonstrates parsing an HTML file (sample.html), finding all <table> elements within it, and creating a separate Excel sheet for each table found. This is incredibly useful for data export tasks from web pages.
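One possible shape for this script is sketched below. It assumes PhpSpreadsheet was installed with Composer and that sample.html sits next to the script. The helper names extractTables and tablesToSpreadsheet are illustrative, and the parser is deliberately simple: it ignores colspan/rowspan and does not treat nested tables specially.

```php
<?php
// Sketch only, under stated assumptions: PhpSpreadsheet installed via Composer,
// sample.html next to this script. Helper names are illustrative.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
}

use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;

/** Return every <table> as a list of rows, each row a list of trimmed cell texts. */
function extractTables(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $tables = [];
    foreach ($dom->getElementsByTagName('table') as $table) {
        $rows = [];
        foreach ($table->getElementsByTagName('tr') as $tr) {
            $cells = [];
            foreach ($tr->childNodes as $cell) {
                if ($cell instanceof DOMElement && in_array($cell->tagName, ['th', 'td'], true)) {
                    $cells[] = trim($cell->textContent);
                }
            }
            if ($cells !== []) {
                $rows[] = $cells;
            }
        }
        $tables[] = $rows;
    }
    return $tables;
}

/** Build a workbook with one sheet per extracted table. */
function tablesToSpreadsheet(array $tables): Spreadsheet
{
    $spreadsheet = new Spreadsheet();
    foreach (array_values($tables) as $index => $rows) {
        $sheet = $index === 0 ? $spreadsheet->getActiveSheet() : $spreadsheet->createSheet();
        $sheet->setTitle('Table ' . ($index + 1));
        $sheet->fromArray($rows, null, 'A1');
        // Treat the first row as a header and make it bold.
        $sheet->getStyle('A1:' . $sheet->getHighestColumn() . '1')->getFont()->setBold(true);
    }
    $spreadsheet->setActiveSheetIndex(0);
    return $spreadsheet;
}

$html_file_path = __DIR__ . '/sample.html';
if (is_file($html_file_path) && class_exists(Spreadsheet::class)) {
    $spreadsheet = tablesToSpreadsheet(extractTables((string) file_get_contents($html_file_path)));

    header('Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet');
    header('Content-Disposition: attachment; filename="html_tables_output.xlsx"');
    $writer = new Xlsx($spreadsheet);
    $writer->save('php://output');
    exit;
} else {
    error_log('sample.html or PhpSpreadsheet is missing.');
}
```

Separating parsing from spreadsheet building makes the parsing step easy to check on its own. Note that cell text beginning with "=" is stored as a formula, so untrusted HTML should be cleaned first.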


    

This script is quite powerful: it extracts all tables from your sample.html and neatly organizes each one into a separate sheet in the generated Excel file. Pretty useful for data export tasks, especially when dealing with web pages that contain multiple data tables you need to analyze or archive.


Other Ideal Routes & Advanced Techniques

While the libraries above are excellent for many use cases, sometimes you need different approaches, especially for complex HTML/CSS or when requiring higher fidelity.

1. Headless Browsers (e.g., Puppeteer, Playwright)

For pixel-perfect PDF or image captures of web pages, especially those with complex JavaScript and CSS, headless browsers are the gold standard. Tools like Puppeteer (Node.js) or Playwright (Node.js, Python, Java, C#) can be controlled programmatically.

  • How it works with PHP: You'd typically set up a small Node.js microservice that uses Puppeteer/Playwright. Your PHP application would then make an HTTP request to this service, passing the URL or HTML content. The service generates the PDF and returns it. Alternatively, PHP can execute command-line interface (CLI) tools that wrap these libraries.
  • Pros: Extremely high fidelity, executes JavaScript, handles modern CSS (Flexbox, Grid) perfectly.
  • Cons: More complex setup, requires Node.js, can be resource-intensive.
  • Example Use Case: Generating a PDF of a dynamic single-page application (SPA) or a page with complex D3.js charts.

2. Pandoc

Pandoc is a universal document converter – a command-line tool that can convert files from numerous markup formats into others. It's incredibly powerful for converting HTML to DOCX, ODT, LaTeX, ePub, and many more.

  • How it works with PHP: You can call Pandoc from PHP using shell_exec() or proc_open().
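    <?php
    // Basic example of calling Pandoc from PHP.
    // Assumes Pandoc is installed and available on the system PATH.
    $htmlFile = 'input.html';
    $docxFile = 'output.docx';
    // Escape both paths so special characters cannot alter the shell command.
    $command = 'pandoc -s ' . escapeshellarg($htmlFile) . ' -o ' . escapeshellarg($docxFile) . ' 2>&1';
    exec($command, $outputLines, $exitCode);
    if ($exitCode !== 0) {
        error_log('Pandoc failed: ' . implode("\n", $outputLines));
    }
    ?>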
  • Pros: Excellent conversion quality for many formats, especially DOCX. Good support for citations, tables of contents, etc. Highly configurable.
  • Cons: Requires Pandoc to be installed on the server. Command-line interaction can have security implications if not handled carefully (sanitize inputs!).
  • Example Use Case: Converting Markdown or HTML documentation into a well-formatted Word document or ePub book.

3. Third-Party APIs & Services

Numerous commercial and free APIs exist for document conversion (e.g., Adobe PDF Services API, CloudConvert, Zamzar API).

  • How it works with PHP: You'd typically use Guzzle or cURL to send your HTML/URL to the API endpoint and receive the converted document.
  • Pros: Often handles complex conversions well, offloads processing from your server, can be very convenient.
  • Cons: Reliance on external service, potential costs, data privacy concerns (you're sending your data to a third party).

Security Considerations

When converting HTML, especially from user-supplied sources or remote URLs, be mindful of security:

  • Remote Content (isRemoteEnabled in Dompdf): If you allow libraries to fetch remote resources (images, CSS), ensure URLs are validated to prevent Server-Side Request Forgery (SSRF) attacks. Only allow fetching from trusted domains if possible.
  • HTML Sanitization: If the HTML input comes from users, sanitize it thoroughly to prevent Cross-Site Scripting (XSS) vulnerabilities. Use libraries like HTML Purifier.
  • Input Validation: Always validate any input that influences file paths, filenames, or library options to prevent path traversal or other injection attacks.
  • Resource Limits: Processing large or complex HTML documents can be resource-intensive. Set appropriate PHP execution time limits and memory limits.
  • Library Updates: Keep your chosen libraries and their dependencies up-to-date by running composer update regularly.
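
Two of the checks above, input validation for filenames and restricting remote URLs, can be sketched in plain PHP. The function names safeDownloadName and isAllowedRemoteUrl and the host cdn.example.com are hypothetical. This is a starting point rather than a complete defence: it does not cover redirects or DNS tricks, for instance.

```php
<?php
/** Reduce a user-supplied name to a safe basename with the expected extension. */
function safeDownloadName(string $requested, string $extension, string $fallback = 'document'): string
{
    // Normalise Windows separators, then drop any directory part and extension.
    $base = pathinfo(basename(str_replace('\\', '/', $requested)), PATHINFO_FILENAME);
    // Keep only letters, digits, underscore and hyphen.
    $base = trim(preg_replace('/[^A-Za-z0-9_-]+/', '_', $base), '_');
    if ($base === '') {
        $base = $fallback;
    }
    return $base . '.' . $extension;
}

/** Allow only http(s) URLs whose host is on an explicit allowlist. */
function isAllowedRemoteUrl(string $url, array $allowedHosts): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }
    return in_array(strtolower($parts['host']), $allowedHosts, true);
}

// Example usage with hypothetical values.
echo safeDownloadName('../../etc/passwd', 'pdf'), "\n";
echo var_export(isAllowedRemoteUrl('file:///etc/passwd', ['cdn.example.com']), true), "\n";
```

A check like isAllowedRemoteUrl would be applied to image and stylesheet URLs before they are written into HTML that is rendered with remote fetching enabled.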

Troubleshooting Common Issues

Here are some common problems you might encounter and how to address them:

  • Dompdf: CSS/Layout Issues:
    • Ensure HTML is well-formed. Simplify CSS. Check external resource paths. Use @font-face correctly.
  • PHPWord: HTML Content Not Rendering Correctly:
    • Use simple, clean HTML. Template-based approach is often more reliable.
  • PhpSpreadsheet: Performance with Large Files:
    • Use cell caching or read/write filters. CSV is faster for huge datasets.
  • General: Headers Already Sent Error:
    • Ensure no output before HTTP header calls. Use ob_start()/ob_end_clean() if needed. Call exit; after streaming.
  • Permissions Errors:
    • Ensure the web server has write permissions for temporary directories or font caches.
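
For the large-dataset case above, the CSV route can be sketched with PHP's built-in fputcsv, writing one row at a time so the full dataset never has to sit in memory. The names writeCsvRows and exampleRows are illustrative, and the generator stands in for a real data source such as a database cursor.

```php
<?php
/** Write rows to an open stream as CSV, one row at a time; returns rows written. */
function writeCsvRows($stream, iterable $rows): int
{
    $count = 0;
    foreach ($rows as $row) {
        // Delimiter, enclosure and escape are passed explicitly for predictable output.
        if (fputcsv($stream, $row, ',', '"', '\\') === false) {
            throw new RuntimeException('Failed to write CSV row ' . ($count + 1));
        }
        $count++;
    }
    return $count;
}

/** Example generator standing in for a large database result set. */
function exampleRows(int $n): Generator
{
    yield ['id', 'amount'];
    for ($i = 1; $i <= $n; $i++) {
        yield [$i, $i * 10];
    }
}

// Demo: write to an in-memory stream and print the result.
$out = fopen('php://memory', 'w+');
$written = writeCsvRows($out, exampleRows(3));
rewind($out);
echo stream_get_contents($out);
fclose($out);
```

In a web download, php://memory would be replaced with php://output after sending a text/csv Content-Type header.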

Conclusion: The Power Is In Your Hands

As you've seen, PHP, with the help of robust libraries like Dompdf, PHPWord, and PhpSpreadsheet, offers powerful tools for converting HTML into various document formats. Whether you're generating invoices, reports, or exporting data for analysis, these techniques can significantly streamline your workflows and enhance your web applications.

Remember to choose the right tool for the job: Dompdf for straightforward HTML-to-PDF, PHPWord for template-based Word document generation, and PhpSpreadsheet for all your Excel and spreadsheet needs. For more complex scenarios, exploring headless browsers or tools like Pandoc can provide even greater flexibility.

The key is to understand the capabilities and limitations of each library, write clean input HTML, and always consider security and performance. Happy converting!

Download Source Code & Example Files

To help you get started quickly, you can download all the example files used in this article. The ZIP archive includes the PHP scripts, a composer.json for easy dependency installation, the sample.html file, and a basic template.docx for the PHPWord example.

Download All Examples (.zip)

After downloading the ZIP, extract the archive, navigate into the project directory via your command line, and run composer install to set up all the dependencies. You can then run the individual PHP scripts from the command line (e.g., php generate_pdf.php).

Robin Metcalfe

Robin is a freelance web strategist and developer based in Edinburgh, with over 15 years of experience helping businesses build effective and engaging online platforms using technologies like Laravel and WordPress.
