How to turn web pages into PDFs with Puppeteer and NodeJS

What is Puppeteer, and why is it awesome?

In Google’s own words, Puppeteer is “a Node library which provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol.”

Set up the project environment

You can use Puppeteer on the backend and frontend to generate PDFs. In this tutorial, we are using a Node backend for the task.

Initialize NPM and set up the usual Express server to get started with the tutorial.

Make sure to install the Puppeteer NPM package before you start, for example by running npm install puppeteer in the project directory.

Convert web pages to PDF

Now we get to the exciting part of the tutorial. With Puppeteer, we only need a few lines of code to convert web pages into PDFs.

First, create a browser instance using Puppeteer’s launch function.

Then, we create a new page instance and visit the given page URL using Puppeteer.
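The two steps above can be sketched as follows. This is a minimal illustration rather than the article's original listing: the helper name openPage is hypothetical, and the puppeteer module is passed in as an argument (for example require('puppeteer')) so the function can also be tried with a stub.

```javascript
// Sketch only: `puppeteer` is injected by the caller, e.g. require('puppeteer').
// `openPage` is a hypothetical helper name, not part of the Puppeteer API.
async function openPage(puppeteer, url) {
  // launch() starts a headless browser and resolves to a Browser handle.
  const browser = await puppeteer.launch();
  try {
    // Each page is an independent tab inside that browser.
    const page = await browser.newPage();
    // Resolve only once the network has gone idle.
    await page.goto(url, { waitUntil: 'networkidle0' });
    return { browser, page };
  } catch (err) {
    // Do not leave a browser process behind if navigation fails.
    await browser.close();
    throw err;
  }
}
```

The caller receives both handles and is responsible for closing the browser when it is done.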

We have set the waitUntil option to networkidle0. With the networkidle0 option, Puppeteer considers navigation finished once there have been no network connections for at least 500 ms. It is a way to determine whether the site has finished loading. It’s not exact, and Puppeteer offers other options, but it is one of the most reliable for most cases.
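For comparison, here is a small sketch of the values the waitUntil option accepts. The WAIT_UNTIL table and the navigate helper are hypothetical names introduced for illustration; page is assumed to be a Puppeteer Page object.

```javascript
// Lifecycle events accepted by the waitUntil option of page.goto().
const WAIT_UNTIL = {
  load: 'the load event has fired',
  domcontentloaded: 'the DOMContentLoaded event has fired',
  networkidle0: 'no network connections for at least 500 ms',
  networkidle2: 'at most 2 network connections for at least 500 ms',
};

// Hypothetical helper: navigate with a checked waitUntil value.
async function navigate(page, url, waitUntil = 'networkidle0') {
  if (!Object.prototype.hasOwnProperty.call(WAIT_UNTIL, waitUntil)) {
    throw new Error(`Unsupported waitUntil value: ${waitUntil}`);
  }
  return page.goto(url, { waitUntil });
}
```

Pages that keep a connection open (polling, analytics, websockets) may never reach networkidle0, in which case networkidle2 or load is usually the more practical choice.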

Finally, we create the PDF from the crawled page content and save it to our device.

The print to PDF function is quite complicated and allows for a lot of customization, which is fantastic. Here are some of the options we used:
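The original option list is not reproduced here. As an illustration only, the sketch below uses a few commonly used page.pdf options (path, format, printBackground, margin). The specific values are assumptions, and savePdf is a hypothetical helper name.

```javascript
// Sketch: `page` is a Puppeteer Page that has already loaded the target URL.
// The option values are illustrative assumptions, not the article's settings.
async function savePdf(page, path) {
  return page.pdf({
    path,                  // write the PDF to this file; omit to only get the bytes back
    format: 'A4',          // paper size
    printBackground: true, // include CSS backgrounds, which are skipped by default
    margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' },
  });
}
```

page.pdf also resolves with the PDF bytes, so the same call can feed a file, an HTTP response, or both.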

When the PDF creation is over, close the browser connection with browser.close().
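One way to make sure the browser is always closed, even when navigation or PDF generation throws, is to wrap the work in try/finally. The sketch below assumes the puppeteer module is passed in by the caller; withBrowser is a hypothetical helper name.

```javascript
// Runs `task(browser)` and closes the browser afterwards, whether the task
// succeeded or threw. `puppeteer` is injected, e.g. require('puppeteer').
async function withBrowser(puppeteer, task) {
  const browser = await puppeteer.launch();
  try {
    return await task(browser);
  } finally {
    // Without this, a failed run would leave a Chromium process behind.
    await browser.close();
  }
}
```

A call might then look like withBrowser(puppeteer, async (browser) => { ... }), with the page handling inside the callback.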

Build an API to generate and serve PDFs from URLs

With the knowledge we have gathered so far, we can now create a new endpoint that receives a URL as a query string and streams the generated PDF back to the client.

Here is the code:
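The original listing is not reproduced here, so what follows is a minimal sketch of such an endpoint rather than the author's exact code. The names createPdfHandler and startServer are hypothetical. Both express and puppeteer are passed in as arguments, and the URL validation and error responses are assumptions added for safety.

```javascript
// Returns an Express-style handler for GET /pdf?target=<url>.
// `puppeteer` is injected by the caller, e.g. require('puppeteer').
function createPdfHandler(puppeteer) {
  return async function pdfHandler(req, res) {
    let target;
    try {
      target = new URL(req.query.target); // throws on a missing or malformed URL
    } catch (err) {
      res.status(400).send('Query parameter "target" must be a valid URL');
      return;
    }
    if (target.protocol !== 'http:' && target.protocol !== 'https:') {
      res.status(400).send('Only http and https URLs are supported');
      return;
    }

    let browser;
    try {
      browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto(target.href, { waitUntil: 'networkidle0' });
      // No `path` option: the PDF stays in memory and is never written to disk.
      const pdf = await page.pdf({ printBackground: true });
      res.set('Content-Type', 'application/pdf');
      // Buffer.from covers Puppeteer versions that return a plain Uint8Array.
      res.send(Buffer.from(pdf));
    } catch (err) {
      res.status(500).send('Could not generate the PDF');
    } finally {
      if (browser) await browser.close();
    }
  };
}

// Wiring sketch, with `express` also injected: registers the route and listens.
function startServer(express, puppeteer, port = 3000) {
  const app = express();
  app.get('/pdf', createPdfHandler(puppeteer));
  return app.listen(port);
}
```

Assuming both packages are installed, calling startServer(require('express'), require('puppeteer')) should expose the route on port 3000. Launching a browser per request keeps the sketch simple; a long-running service would more likely reuse one browser instance.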

If you start the server and visit the /pdf route with a target query param containing the URL we want to convert, the server will serve the generated PDF directly without ever storing it on disk.

URL example: http://localhost:3000/pdf?target=https://google.com

This request generates a PDF of the target page.

That’s it! You have completed the conversion of a web page to PDF. Wasn’t that easy?

As mentioned, Puppeteer offers many customization options, so make sure you play around with the opportunities to get different results.

Next, we can change the viewport size to capture websites under different resolutions.

Capture websites with different viewports

In the previously created PDF, we didn’t specify the viewport size for the web page Puppeteer is visiting, so Puppeteer used its default viewport size of 800×600 px.

However, we can precisely set the page’s viewport size before crawling the page.
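A minimal sketch of this step is shown below. The helper name openWithViewport and the 1680×1050 default are illustrative assumptions; browser is assumed to be a Puppeteer Browser obtained from launch().

```javascript
// Opens `url` in a new page with an explicit viewport.
// The default size here is an arbitrary example, not a Puppeteer default.
async function openWithViewport(browser, url, viewport = { width: 1680, height: 1050 }) {
  const page = await browser.newPage();
  // Set the size before navigating so the site lays out at this
  // resolution from the first render.
  await page.setViewport(viewport);
  await page.goto(url, { waitUntil: 'networkidle0' });
  return page;
}
```

Passing something like { width: 375, height: 667 } instead would capture the layout a narrow mobile screen receives.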

Conclusion

In today’s tutorial, we used Puppeteer, a Node API for headless Chrome, to generate a PDF of a given web page. Since you are now familiar with the basics of Puppeteer, you can use this knowledge in the future to create PDFs or even for other purposes like web scraping and UI testing.

This article was originally published on Live Code Stream by Juan Cruz Martinez (twitter: @bajcmartinez), founder and publisher of Live Code Stream, entrepreneur, developer, author, speaker, and doer of things.
