Efficiently Decoding HTML Entities in JavaScript

HTML entities are placeholders for special characters in HTML. Characters such as <, >, &, ", and ' have specific meanings in HTML and must be escaped (for example, & becomes &amp; and < becomes &lt;) so they do not disrupt the document’s structure. When you receive HTML-encoded text from a server or from user input, you often need to decode these entities before you can display or process the underlying characters correctly. This article explores two ways to decode HTML entities in JavaScript.
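The escaping described above can be sketched in a few lines. The helper name `escapeHTML` is hypothetical, and it covers only the five characters listed; it is a minimal illustration, not a complete encoder. `&` has to be replaced first, otherwise the ampersands in the entities produced for the other characters would be escaped a second time.

```javascript
// Hypothetical helper: escapes the five characters that are significant in HTML.
// '&' is handled first so the entities created below are not re-escaped.
function escapeHTML(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Example usage:
console.log(escapeHTML('<p title="Tom\'s">A & B</p>'));
// Output: &lt;p title=&quot;Tom&#39;s&quot;&gt;A &amp; B&lt;/p&gt;
```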

Decoding HTML Entities with Vanilla JavaScript

Vanilla JavaScript provides a simple way to decode HTML entities using the browser’s built-in DOMParser. This method is efficient and requires no external libraries. Its main limitations are that it needs a browser environment (DOMParser is not available in Node.js) and that it parses the input as a complete HTML document rather than as plain text.


function decodeHTMLEntities(text) {
  // Parse the string as an HTML document; script elements in it are not executed.
  const doc = new DOMParser().parseFromString(text, 'text/html');
  // textContent joins all text nodes: entities come back decoded, tags are dropped.
  return doc.documentElement.textContent;
}

// Example usage:
const encodedText = '<p>This is a paragraph with &amp; in it.</p>';
const decodedText = decodeHTMLEntities(encodedText);
console.log(decodedText); // Output: This is a paragraph with & in it.

This function parses the input string as HTML with DOMParser. Reading the textContent property of the root element then returns only the text, with every entity replaced by its character. Any HTML tags in the input are discarded in the process, so this approach fits cases where you want plain text.
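Where no DOM is available, it can help to see what decoding involves. The sketch below handles decimal and hexadecimal numeric references plus the five named entities from the introduction, and, unlike the DOMParser version, it leaves tags alone. The name `decodeBasicEntities` and its small lookup table are assumptions for illustration. HTML defines more than two thousand named references, and browsers also accept some legacy names without a trailing semicolon and remap certain control code points; this sketch does none of that, so for real input use DOMParser or a full decoder such as he.

```javascript
// Hypothetical minimal decoder: numeric references plus five named entities.
// Anything it does not recognize is left exactly as written.
const NAMED_ENTITIES = {
  amp: '&',
  lt: '<',
  gt: '>',
  quot: '"',
  apos: "'",
};

function decodeBasicEntities(text) {
  return text.replace(
    /&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));/g,
    (match, hex, dec, name) => {
      if (name !== undefined) {
        // Named references are case-sensitive; unknown names are kept as-is.
        return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)
          ? NAMED_ENTITIES[name]
          : match;
      }
      const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
      // As in the HTML parser, NUL, surrogates and out-of-range values become U+FFFD.
      if (
        codePoint === 0 ||
        codePoint > 0x10ffff ||
        (codePoint >= 0xd800 && codePoint <= 0xdfff)
      ) {
        return '\uFFFD';
      }
      return String.fromCodePoint(codePoint);
    }
  );
}

// Example usage: a single pass, so the tags survive and nothing is decoded twice.
const input = '<p>This is a paragraph with &amp; and &#39; in it.</p>';
console.log(decodeBasicEntities(input));
// Output: <p>This is a paragraph with & and ' in it.</p>
```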

Decoding HTML Entities with the he Library

For decoding outside the browser, or when the surrounding markup must be kept, the he library is a robust solution. It supports all standardized named character references from the HTML specification, is lightweight, and is available via npm or a CDN.

Installation (npm):


npm install he

Installation (CDN – jsDelivr):


<script src="https://cdn.jsdelivr.net/npm/he"></script>

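After installation, call the he.decode() function. In Node.js, load the library with require('he'); the CDN script instead defines a global he object.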


// Node.js; in the browser, the CDN script defines a global `he` instead.
const he = require('he');

const encodedText = '<p>This is a paragraph with &amp; and &#39; in it.</p>';
const decodedText = he.decode(encodedText);
console.log(decodedText); // Output: <p>This is a paragraph with & and ' in it.</p>

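he.decode() decodes named references such as &amp; as well as numeric references such as &#39; (single quote). Unlike the DOMParser approach, it only replaces the references and leaves the HTML tags in place.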

Choosing a Method

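Both methods decode HTML entities reliably. The vanilla JavaScript approach suits browser code that only needs plain text, and it avoids an external dependency. The he library is the better fit when your code also runs outside the browser (for example, in Node.js) or when the markup must be preserved. In either case, treat decoded user input as untrusted: decoding turns &lt;script&gt; back into <script>, so insert the result with textContent rather than innerHTML, or sanitize it first.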
