hget


Make HTML into static text

This simple tool converts any HTML-based website into plain text. It's useful if you want to fetch the individual links to the latest stories from your favorite news platform, or if you want to avoid JavaScript and HTML processing altogether. Either way, everything happens on the command line, and the interface is kept to a minimum.

Key Features:

A custom HTML parser that you can use to fetch specific content from a given website.

API

The API exports a function that takes in HTML and returns a formatted plain text string. The output uses terminal colors and formatting.

var hget = require('hget');
var html = '<p>Hello <b>Nico</b>!</p>';

hget(html);
// <- 'Hello Nico!'

You can also pass in a few options.

hget(html, options)

The options are as follows.

  • root sets the context root. It defaults to 'body', but you may want to use 'main' or something similar.
  • ignore can be a single selector or an array of selectors. Any elements that match the provided selectors will be removed from the document before rendering the terminal-printable output. Keep in mind that these selectors will be rooted in the root element.
  • html means that you'll get HTML back, instead of the default human-readable terminal output.
  • markdown means that you'll get Markdown back, instead of the default human-readable terminal output.
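
As a rough sketch of how these options might be put together, the snippet below builds an options object equivalent to a CLI-style call. Note that buildOptions is a hypothetical helper written for this example and is not part of hget. The snippet also assumes that html and markdown are boolean flags, and it only calls hget when the package is installed.

```javascript
// Hypothetical helper, not part of hget: turns CLI-style values into an
// options object shaped like the one described above.
function buildOptions(root, ignoreList, format) {
  var options = {};
  if (root) {
    options.root = root; // context root; hget defaults to 'body'
  }
  if (ignoreList) {
    // 'footer,.mm-count' becomes ['footer', '.mm-count']
    options.ignore = ignoreList.split(',').map(function (selector) {
      return selector.trim();
    }).filter(Boolean);
  }
  // Assumption: html and markdown are boolean flags.
  if (format === 'html' || format === 'markdown') {
    options[format] = true;
  }
  return options;
}

var options = buildOptions('article', 'footer,.mm-count,.at-meta', 'markdown');

// Only call hget when the package is actually installed.
var hget;
try {
  hget = require('hget');
} catch (err) {
  hget = null;
}

if (hget) {
  var page = '<article><h1>Title</h1><footer>skip me</footer></article>';
  console.log(hget(page, options));
}
```

Splitting on commas is a simplification: a selector that itself contains a comma, such as ':not(a,b)', would need to be passed as an array entry instead.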

CLI

Easy and flexible to use!

hget ponyfoo.com
hget file.html
cat file.html | hget

Ooh, the CLI also follows redirects.

Example usage

hget ponyfoo.com/articles/last --root article --ignore footer,.mm-count,.at-meta

Also, the output will be paged using $PAGER for convenience. You can turn this off using --no-paging.

It works well on most sites. Here are just the news links from EchoJS.

hget echojs.com --root "#newslist" --ignore "article>:not(h2)"

