Make HTML into static text
This simple tool converts any HTML page into plain text. It's handy if you want to pull the links to the latest stories from your favorite news site, or to read a page without dealing with its JavaScript and HTML. Either way, everything happens on the command line, and the interface is kept to a minimum.
Key Features:
A custom HTML parser that lets you extract a specific part of a page's content.
API
The API exports a function that takes in HTML and returns a formatted plain-text string. The string includes terminal colors and formatting.
var hget = require('hget');
var html = '<p>Hello <b>Nico</b>!</p>';
hget(html);
// <- 'Hello Nico!'
You can also pass in a few options.
hget(html, options)
The options are as follows.
root sets the context root; it defaults to 'body'. Maybe you want to use 'main' or something akin to that.
ignore can be a single selector or an array of selectors. Any elements that match the provided selectors will be removed from the document before rendering the terminal-printable output. Keep in mind that these selectors will be rooted in the root element.
html means that you'll get HTML back, instead of the default human-readable terminal output.
markdown means you'll get Markdown back, instead of the default human-readable terminal output.
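To make the options concrete, here is a minimal sketch of a call that combines them. The helper name articleAsMarkdown is hypothetical, and it takes the rendering function as an argument so the sketch does not depend on how hget is installed. Passing markdown as a boolean is an assumption; the description above only says that the option switches the output to Markdown.

```javascript
// Hypothetical helper, not part of hget. `render` is the function you
// would normally get from require('hget').
function articleAsMarkdown(render, html) {
  return render(html, {
    // Only look inside the <article> element instead of the default 'body'.
    root: 'article',
    // Remove these elements (matched inside the root) before rendering.
    ignore: ['footer', '.mm-count', '.at-meta'],
    // Assumption: a boolean flag asks for Markdown output.
    markdown: true
  });
}
```

With the module installed, the call would presumably look like articleAsMarkdown(require('hget'), html).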
CLI
Easy and flexible to use!
hget ponyfoo.com
hget file.html
cat file.html | hget
Example usage
Ooh, the CLI also follows redirects.
hget ponyfoo.com/articles/last --root article --ignore footer,.mm-count,.at-meta
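The flags in that command line up with the API options described earlier: --root corresponds to root, and the comma-separated --ignore value corresponds to an array of selectors. The sketch below shows one way to build the options object from the same values. The helper toOptions is hypothetical, and splitting on commas is an assumption about how such a list could be interpreted, not a description of what the CLI does internally.

```javascript
// Hypothetical helper, not part of hget: turns CLI-style values into an
// options object shaped like the one the API section describes.
function toOptions(root, ignoreList) {
  var options = {};
  if (root) {
    options.root = root;
  }
  if (ignoreList) {
    // Assumption: a comma separates selectors. This naive split would
    // break a selector that itself contains a comma.
    options.ignore = ignoreList
      .split(',')
      .map(function (selector) { return selector.trim(); })
      .filter(Boolean);
  }
  return options;
}

var options = toOptions('article', 'footer,.mm-count,.at-meta');
```

The resulting object can then be passed as the second argument, as in hget(html, options).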
Also, the output will be paged using $PAGER for convenience. You can turn this off using --no-paging.
It works well on most sites. Here are just the news links from EchoJS.
hget echojs.com --root "#newslist" --ignore "article>:not(h2)"