truncate-html

Truncate an html string (even one containing emoji characters) while keeping its tags intact. You can customize the ellipsis sign, ignore unwanted elements, and truncate html by words.

Notice: This is a Node.js module that depends on cheerio, so it can only run on Node.js. If you need a browser version, you may consider truncate or nodejs-html-truncate.

const truncate = require('truncate-html')
truncate('<p><img src="xxx.jpg">Hello from earth!</p>', 2, { byWords: true })
// => <p><img src="xxx.jpg">Hello from ...</p>

Installation

npm install truncate-html
or
yarn add truncate-html

Try it online

Click https://npm.runkit.com/truncate-html to try.

API

/**
 * truncate html
 * @method truncate(html, [length], [options])
 * @param  {String|CheerioStatic}         html    html string to truncate, or an existing cheerio instance (aka cheerio $)
 * @param  {Object|number}  length how many letters (words if `byWords` is true) you want to keep
 * @param  {Object|null}    options
 * @param  {Boolean}        [options.stripTags] remove all tags, default false
 * @param  {String}         [options.ellipsis] ellipsis sign, default '...'
 * @param  {Boolean}        [options.decodeEntities] decode html entities(e.g. convert `&amp;` to `&`) before
 *                                                   counting length, default false
 * @param  {String|Array}   [options.excludes] selector(s) of the elements you want to ignore
 * @param  {Number}         [options.length] how many letters (words if `byWords` is true)
 *                                           you want to keep
 * @param  {Boolean}        [options.byWords] if true, length means how many words to keep
 * @param  {Boolean|Number} [options.reserveLastWord] what to do when the cut falls in the middle of a word
 *                                1. by default, just cut at that position.
 *                                2. set it to true to keep the last word, exceeding `length` by at most 10 letters
 *                                3. set it to a positive number to decide how many letters may exceed `length` to keep the last word
 *                                4. set it to a negative number to remove the last word if the cut falls in the middle of it.
 * @param  {Boolean}        [options.trimTheOnlyWord] whether to trim the only word when `reserveLastWord` < 0
 *                                if reserveLastWord is set to a negative number and there is only one word in the html string,
 *                                 then with trimTheOnlyWord set to true, the extra letters will be cut if the word is longer
 *                                 than `length`.
 *                                see issue #23 for more details
 * @param  {Boolean}        [options.keepWhitespaces] keep whitespaces; by default continuous
 *                                spaces will be replaced with one space
 *                                set it to true to keep them, and continuous spaces will count as one
 * @return {String}
 */
truncate(html, [length], [options])
// and truncate.setup to change default options
truncate.setup(options)
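
The four `reserveLastWord` cases are easier to compare side by side. The following is only a plain-text sketch of the rules as documented above, not the library's implementation: `cutKeepingLastWord` is a hypothetical helper, it ignores tags and the ellipsis, and it assumes that a word longer than the allowance is cut at the allowance.

```javascript
// Plain-text model of the documented `reserveLastWord` cases.
// Hypothetical helper, not part of truncate-html; the ellipsis is left out on purpose.
function cutKeepingLastWord(text, length, reserveLastWord = false) {
  const chars = Array.from(text) // split by code point, so an emoji counts as one
  if (chars.length <= length) return text

  const head = chars.slice(0, length).join('')
  const tail = chars.slice(length).join('')
  const midWord = /\S$/.test(head) && /^\S/.test(tail)

  // Case 1: default behaviour, or the cut already falls between words.
  if (!reserveLastWord || !midWord) return head

  // Case 4: negative number, drop the partial last word.
  if (reserveLastWord < 0) return head.replace(/\S+$/, '')

  // Cases 2 and 3: let the last word run past `length` by a bounded amount.
  const maxExceed = reserveLastWord === true ? 10 : reserveLastWord
  const restOfWord = Array.from(tail.match(/^\S+/)[0])
  // Assumption: a word longer than the allowance is cut at the allowance.
  return head + restOfWord.slice(0, maxExceed).join('')
}

console.log(cutKeepingLastWord('Hello from earth!', 8))       // 'Hello fr'
console.log(cutKeepingLastWord('Hello from earth!', 8, true)) // 'Hello from'
console.log(cutKeepingLastWord('Hello from earth!', 8, 1))    // 'Hello fro'
console.log(cutKeepingLastWord('Hello from earth!', 8, -1))   // 'Hello '
```

With a negative value and a single long word, this model returns an empty string, which is the situation `trimTheOnlyWord` is meant to address.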

Default options

{
  byWords: false,
  stripTags: false,
  ellipsis: '...',
  decodeEntities: false,
  keepWhitespaces: false,
  excludes: '',
  reserveLastWord: false
}

You can change the default options with truncate.setup, e.g.:

truncate.setup({ stripTags: true, length: 10 })
truncate('<p><img src="xxx.jpg">Hello from earth!</p>')
// => Hello from...

Or use an existing cheerio instance:

import * as cheerio from 'cheerio'
import truncate from 'truncate-html'

truncate.setup({ stripTags: true, length: 10 })
// The truncate option `decodeEntities` has no effect on an existing instance;
// configure it in the cheerio options yourself.
const $ = cheerio.load('<p><img src="xxx.jpg">Hello from earth!</p>', {
  // set decodeEntities here if you need it
  decodeEntities: true
  // ...any other cheerio instance options
}, false) // the third parameter is the `isDocument` option; set it to false to get rid of extra wrappers, see cheerio's docs for details
truncate($)
// => Hello from...

Notice

TypeScript support

This lib is written in TypeScript and ships with a type definition file. If you encounter typing errors, you may need to update your tsconfig.json by adding "esModuleInterop": true to the compilerOptions; see #19.

About final string length

If the text content of the html string is shorter than options.length, no ellipsis is appended to the final html string. If it is longer, the length of the final text is options.length plus the length of options.ellipsis. If you set reserveLastWord to true or to a non-zero number, the final length will vary.
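
The rule above can be written as a small piece of arithmetic. This is a sketch only: `expectedTextLength` is a hypothetical helper that works on the text content (tags already stripped), and treating content of exactly options.length as "no ellipsis" is an assumption, since the text above only covers the shorter and longer cases.

```javascript
// Hypothetical helper: predicts the length of the truncated text content.
// Assumption: content whose length equals `limit` is left untouched.
function expectedTextLength(text, limit, ellipsis = '...') {
  const count = Array.from(text).length // count by code point
  return count <= limit ? count : limit + Array.from(ellipsis).length
}

console.log(expectedTextLength('This is a string for test.', 10))      // 13
console.log(expectedTextLength('This is a string for test.', 10, '~')) // 11
console.log(expectedTextLength('Hi', 10))                              // 2
```

The first call matches the `This is a ...` result shown in the examples: 10 kept characters plus a 3-character ellipsis.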

About html comments

All html comments <!-- xxx --> will be removed.

About dealing with non-alphabetic languages

Non-alphabetic languages, such as Chinese/Japanese/Korean, do not separate words with whitespace, so the options byWords and reserveLastWord only work well with alphabetic languages.

Also, cheerio, the only dependency of this project, has an issue when dealing with non-alphabetic languages; see Known Issues for details.
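
A quick way to see why word-based options struggle with CJK text is to count whitespace-separated words. This sketch assumes words are delimited by whitespace only, which is how the text above describes byWords; `countWords` is a hypothetical helper, not a library function.

```javascript
// Hypothetical helper: counts whitespace-separated words.
function countWords(text) {
  const trimmed = text.trim()
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length
}

console.log(countWords('Hello from earth'))       // 3
console.log(countWords('你好世界，这是一个测试')) // 1, the whole sentence is one "word"
```

Because the whole CJK sentence counts as a single word, a word limit of 2 or more would never truncate it.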

Using an existing cheerio instance

If you pass an existing cheerio instance, the truncate option decodeEntities will not work; set it in your own cheerio instance instead:

const cheerio = require('cheerio')
const truncate = require('truncate-html')

const html = '<p><img src="abc.png">This is a string</p> for test.'
const $ = cheerio.load(html, {
  decodeEntities: true
  // ...other cheerio options
}, false) // the third parameter is the `isDocument` option; set it to false to get rid of extra wrappers, see cheerio's docs for details
truncate($, 10)

Examples

var truncate = require('truncate-html')

// truncate html
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, 10)
// returns: <p><img src="abc.png">This is a ...</p>

// truncate string with emojis
var string = '<p>poo 💩💩💩💩💩</p>'
truncate(string, 6)
// returns: <p>poo 💩💩...</p>

// with options, remove all tags
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, 10, { stripTags: true })
// returns: This is a ...

// with options, truncate by words.
//  if you try to truncate a non-alphabetic language (like CJK)
//      it will not behave as you expect
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, 3, { byWords: true })
// returns: <p><img src="abc.png">This is a ...</p>

// with options, keep whitespaces
var html = '<p>         <img src="abc.png">This is a string</p> for test.'
truncate(html, 10, { keepWhitespaces: true })
// returns: <p>         <img src="abc.png">This is a ...</p>

// combine length and options
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, {
  length: 10,
  stripTags: true
})
// returns: This is a ...

// custom ellipsis sign
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, {
  length: 10,
  ellipsis: '~'
})
// returns: <p><img src="abc.png">This is a ~</p>

// exclude some elements (by selector); they will be removed before counting the content's length
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, {
  length: 10,
  ellipsis: '~',
  excludes: 'img'
})
// returns: <p>This is a ~</p>

// exclude more than one kind of element
var html =
  '<p><img src="abc.png">This is a string</p><div class="something-unwanted"> unwanted string inserted ( ´•̥̥̥ω•̥̥̥` ）</div> for test.'
truncate(html, {
  length: 20,
  stripTags: true,
  ellipsis: '~',
  excludes: ['img', '.something-unwanted']
})
// returns: This is a string for~

// handling encoded characters
var html = '<p>&nbsp;test for &lt;p&gt; encoded string</p>'
truncate(html, {
  length: 20,
  decodeEntities: true
})
// returns: <p> test for &lt;p&gt; encode...</p>

// when set decodeEntities false
var html = '<p>&nbsp;test for &lt;p&gt; encoded string</p>'
truncate(html, {
  length: 20,
  decodeEntities: false // this is the default value
})
// returns: <p>&nbsp;test for &lt;p...</p>

// and there may be a surprise when setting `decodeEntities` to true while handling CJK characters
var html = '<p>&nbsp;test for &lt;p&gt; 中文 string</p>'
truncate(html, {
  length: 20,
  decodeEntities: true
})
// returns: <p> test for &lt;p&gt; &#x4E2D;&#x6587; str...</p>
// to fix this, see below for instructions

For more usage examples, check truncate.spec.ts.
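
The emoji example works because an emoji is counted as one character. In plain JavaScript, `String.prototype.length` and `slice` count UTF-16 code units, so a naive cut can split an emoji in half. The sketch below shows the difference with a hypothetical `takeChars` helper; it is only an illustration and is not claimed to be how the library counts internally.

```javascript
// Hypothetical helper: takes the first `n` characters by code point, not by UTF-16 code unit.
function takeChars(text, n) {
  return Array.from(text).slice(0, n).join('')
}

const sample = 'poo 💩💩💩'

console.log('💩'.length)             // 2 (one emoji, two UTF-16 code units)
console.log(Array.from('💩').length) // 1 (one code point)
console.log(takeChars(sample, 5))    // 'poo 💩'
// A naive slice keeps only half of the first emoji (a lone surrogate):
console.log(sample.slice(0, 5) === 'poo 💩') // false
```

Note that `Array.from` splits by code point, so emoji made of several code points (for example with skin-tone modifiers) would still be split by this helper.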

truncate-html's Issues

Upgrade Cheerio to 0.22.0

The current version of cheerio you are using is 0.19.0. In the index.js file of cheerio for 0.19.0, line 11 reads:
exports.version = require('./package').version;
This does not work for webpack builds because it doesn't have .json at the end of package. In 0.22.0 you will see that this has been fixed:
https://github.com/cheeriojs/cheerio/blob/0.22.0/index.js

This would be a big help if you could add this as it would allow it to be used with webpack builds.

Cannot find name 'CheerioStatic'.

NodeJS app. npm install --save truncate-html

Works fine when working locally and in development. But as soon as I do "npm run build" it fails.

node_modules/truncate-html/dist/truncate.d.ts:52:21 - error TS2304: Cannot find name 'CheerioStatic'.

52     (html: string | CheerioStatic, length?: number, options?: IOptions): string;
                       ~~~~~~~~~~~~~

node_modules/truncate-html/dist/truncate.d.ts:53:21 - error TS2304: Cannot find name 'CheerioStatic'.

53     (html: string | CheerioStatic, options?: IOptions): string;
                       ~~~~~~~~~~~~~

No ellipsis if truncated on a tag boundary

When the truncation happens just before a tag boundary, no ellipsis is added.
Good:

> truncate('Hello <b>world</b>', 5);
'Hello...'

> truncate('Hello <b>world</b>', 7)
'Hello <b>w...</b>'

Bad:

> truncate('Hello <b>world</b>', 6)
'Hello '

Missing 'load' export in the embedded cheerio dependency

I just imported the latest version of the package and used truncate once in my component. I got that error when building.

Rollup: Missing Export: node_modules/truncate-html/dist/truncate.es.js:5:9
           'load' is not exported by node_modules/cheerio/index.js

      L4:   */
      L5:  import { load } from 'cheerio';

High-severity vulnerable dependency found when installing the package

I received the message below when running 'npm install truncate-html@latest':

css-what <5.0.1
Severity: high
Denial of Service - https://npmjs.com/advisories/1754
fix available via npm audit fix --force
Will install [email protected], which is a breaking change
node_modules/cheerio/node_modules/css-what
css-select <=3.1.2
Depends on vulnerable versions of css-what
node_modules/cheerio/node_modules/css-select
cheerio 0.19.0 - 1.0.0-rc.3
Depends on vulnerable versions of css-select
node_modules/cheerio
truncate-html >=0.0.2
Depends on vulnerable versions of cheerio
node_modules/truncate-html

Broken TypeScript typings.

When I import truncate-html using:

import * as truncate from "truncate-html";

Then I'm not able to call truncate("<p>my-html</p>") because the TypeScript compiler complains that:

Cannot invoke an expression whose type lacks a call signature.

I'm also not able to call truncate.setup().

When I import truncate-html using:

import truncate from "truncate-html";

Then the TypeScript compiler doesn't complain, but there are runtime errors after compilation because truncate is undefined when imported this way.

Preserving the last word for a single word over the character limit

When the text is just a single word and we want to preserve it, it currently allows up to the character limit + 10. Is there a way to make it just the character limit? The options I used to reproduce this are { length: limitLength, reserveLastWord: -1, ellipsis: '' }. The -1 is used for the general case where I want to cut off the last word over the limit.

Console.logs

Hey,

on lines 100 and 101 (91 and 92 in .coffee) you forgot console.log statements :)

console.log($1 + ' ~~~ ' + $2);
console.log($1.length + ' ~~~ ' + $2.length);

TypeError: $ is not a function

$ is not yet assigned when html is an object

lines 54-59 of truncate.js

if (typeof html === 'object') {
  html = $(html).html();
}
$ = cheerio.load("<div>" + html + "</div>", {
  decodeEntities: options.decodeEntities
});

This happens to me when my html is actually null, so perhaps there should be a check for null somewhere, as well as assigning the $ var further up.
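
Until such a check exists in the library, a caller-side guard is one possible workaround. This is a minimal sketch under assumptions: `safeTruncate` is a hypothetical wrapper, the truncate function is passed in as an argument, and returning an empty string for null or undefined input is a choice made here, not documented library behavior.

```javascript
// Hypothetical wrapper: skips the call entirely when there is no html to truncate.
// `truncateFn` is whatever truncate function you use, e.g. require('truncate-html').
function safeTruncate(truncateFn, html, ...rest) {
  // `== null` matches both null and undefined
  if (html == null) return ''
  return truncateFn(html, ...rest)
}

// Usage with a stand-in function, so the sketch runs without the library installed:
const fakeTruncate = (html, length) => html.slice(0, length)
console.log(safeTruncate(fakeTruncate, null, 10))          // ''
console.log(safeTruncate(fakeTruncate, 'Hello world', 5))  // 'Hello'
```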

Option to add a space after a closing tag

There is an issue when you have text like <p>some text</p><p>Hello World</p>:
you will get the output: some textHello World.
As you can see, there is no space between "text" and "Hello".
Of course, an easy fix is to use a regex to add a space after each closing tag when there is none, but it would be nice if this library could do it for me.
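
The regex workaround mentioned above could look roughly like this. It is a sketch, not a library feature: `addSpaceAfterClosingTags` is a hypothetical helper, and the list of tags it covers (p, div, li, h1 to h6) is an arbitrary assumption you would adjust to your markup.

```javascript
// Hypothetical pre-processing step: add a space after selected closing tags
// when the next character is not whitespace already.
function addSpaceAfterClosingTags(html) {
  return html.replace(/(<\/(?:p|div|li|h[1-6])>)(?=\S)/gi, '$1 ')
}

console.log(addSpaceAfterClosingTags('<p>some text</p><p>Hello World</p>'))
// '<p>some text</p> <p>Hello World</p>'
```

The result would then be passed to truncate in place of the original html. A closing tag at the very end of the string, or one already followed by whitespace, is left unchanged.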

Reuse existing cheerio instance, avoid reparsing

I'm already doing some processing using cheerio, and I'd like to truncate the result. Currently, truncate-html only accepts text and always invokes cheerio.load, so that's an extra HTML parse. Would be nice if I could truncate an existing cheerio $.

Option to not remove extra white spaces

It breaks pre blocks, by putting everything on a single line.

Would be nice to have this optional, at least. Solved by removing the replace call on line 84:

text = $(this).text().replace(/\s+/g, ' ');

Should I open a PR with it?
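
To illustrate the effect on pre blocks, here is a small sketch of the replace call quoted above next to an opt-out. The `collapseWhitespace` helper and its `keepWhitespaces` flag are hypothetical names used for illustration; this is not the library's code.

```javascript
// Hypothetical helper: collapses runs of whitespace unless asked to keep them.
function collapseWhitespace(text, keepWhitespaces = false) {
  return keepWhitespaces ? text : text.replace(/\s+/g, ' ')
}

const preText = 'line 1\n  line 2\n    line 3'

console.log(collapseWhitespace(preText))
// 'line 1 line 2 line 3' (newlines and indentation are lost)
console.log(collapseWhitespace(preText, true) === preText)
// true (the text is left as it was)
```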

Dependency vulnerabilities

According to a few node vulnerability websites (snyk.io and nodesecurity.io), this package has a few vulnerabilities caused by the package nth-check. Is there any reason those haven't been updated yet?
Thanks in advance
