metalsmith-related

A Metalsmith plugin that shows related documents for each document in a collection.

Uses Natural v0.5.1

Use

$ npm install metalsmith-related

Then in your build script:

Metalsmith  = require 'metalsmith'
markdown    = require 'metalsmith-markdown'
related     = require 'metalsmith-related'

Metalsmith(__dirname)
.use( related({
  'terms': 5
  'max': 5
  'threshold': 0
  'pattern': 'posts/**/*.md'
  'text': (doc) -> String doc.contents
}) )
.use( do markdown )
.build (err) -> throw err if err

You can specify which documents (like posts) will get processed, by providing a glob in the pattern option.

The terms option sets how many of each document's top terms are used to find similar documents. The max option caps the number of related documents returned. Only documents whose TF-IDF score against those terms is higher than threshold are included.
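To make the interplay of threshold and max concrete, here is a minimal TypeScript sketch of the selection step. The helper name selectRelated, the SelectOptions type and the sample scores are hypothetical and not part of the plugin; the sketch only assumes that every indexed document has already been given one numeric score.

```typescript
// Hypothetical helper, not part of metalsmith-related: given one score per
// indexed document, pick the related ones for the document at index `self`.
interface SelectOptions {
  max: number;       // cap on the number of related documents
  threshold: number; // a score must be strictly greater than this
}

function selectRelated(scores: number[], self: number, opts: SelectOptions): number[] {
  return scores
    .map((score, j) => ({ score, j }))
    // Skip the document itself and anything at or below the threshold.
    .filter(({ score, j }) => j !== self && score > opts.threshold)
    // Highest score first.
    .sort((a, b) => b.score - a.score)
    // Keep at most `max` entries.
    .slice(0, opts.max)
    .map(({ j }) => j);
}

// Document 0 is the current one; document 2 scores 0 and is filtered out,
// and document 1 is cut off by the cap of two.
const picked = selectRelated([9.1, 0.4, 0, 2.5, 1.2], 0, { max: 2, threshold: 0 });
// picked is [3, 4]
```

Raising threshold trims weak matches before the cap is applied, so a document can end up with fewer than max related entries.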

Pass a text function to control how the text of each document is extracted for analysis.
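As an illustration, a custom text function could fold front-matter fields into the analysed text. The TypeScript sketch below assumes hypothetical title and tags fields on the file object; only contents is taken from the default shown above.

```typescript
// Minimal shape of a file object for this sketch. `title` and `tags` are
// hypothetical front-matter fields, not something the plugin requires.
interface Doc {
  contents: { toString(): string };
  title?: string;
  tags?: string[];
}

// Put the title and tags in front of the body so their words are analysed too.
function textWithTitle(doc: Doc): string {
  const parts = [doc.title ?? "", (doc.tags ?? []).join(" "), String(doc.contents)];
  return parts.filter((part) => part.length > 0).join(" ");
}

const sample = textWithTitle({ title: "Hello", tags: ["a", "b"], contents: "Body text" });
// sample is "Hello a b Body text"
```

A function like this would be passed as the text option in place of the default.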

You can now access related documents under the related key as an array.

<ul id="posts">
{% for post in related %}
  <li>
    <h2><a href="/{{ post.path }}">{{ post.title }}</a></h2>
    <div class="date">{{ post.date | date('F jS, Y') }}</div>
  </li>
{% endfor %}
</ul>

Source

We depend on minimatch for globbing, natural's term frequency–inverse document frequency (TF-IDF) implementation, and lodash.

{ Minimatch } = require 'minimatch'
{ TfIdf }     = require 'natural'
_             = require 'lodash'

module.exports = (opts) ->
  opts ?= {}

These are the options that you can override. By default we look for Markdown documents and use the top 5 terms.

  opts.pattern ?= '**/*.md'
  opts.max ?= 5
  opts.terms ?= 5
  opts.threshold ?= 0
  opts.text ?= (doc) -> String doc.contents

  mm = new Minimatch opts.pattern

  tfidf = new TfIdf()
  index = []

  (files, metalsmith, done) ->

Save all matching files into the index.

    for key, doc of files when mm.match key
      index.push key
      tfidf.addDocument opts.text doc

And for each document in the index.

    for i in [ 0...index.length ]

Get the terms sorted by their importance.

      terms = tfidf.listTerms i

Save only the top ones.

      top = ( term for { term } in terms[ 0...Math.min opts.terms, terms.length ] )

Find similar documents using these terms and sort them by their TF-IDF score.

      related = _( { freq, j } for freq, j in tfidf.tfidfs top when j isnt i and freq > opts.threshold )
      .sortBy('freq')
      .map('j')
      .value()
      .reverse()
      .map (j) -> files[index[j]]

And save at most max of them under the related key.

      files[index[i]].related = related[ 0...Math.min opts.max, related.length ] if related.length

All done in sync.

    do done
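For readers unfamiliar with TF-IDF, the following TypeScript sketch walks through the same idea without any dependencies: take the top terms of one document, then score every document against them. It uses a textbook weighting (term count times log of N over document frequency) and a naive tokenizer, both of which are assumptions for illustration; natural's TfIdf has its own tokenizer, stopword handling and weighting, so its numbers will differ.

```typescript
// Naive tokenizer: lowercase and split on non-word characters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
}

// Textbook TF-IDF of `term` in `doc`, relative to the whole corpus `docs`.
function tfidf(term: string, doc: string[], docs: string[][]): number {
  const tf = doc.filter((t) => t === term).length;
  const df = docs.filter((d) => d.indexOf(term) !== -1).length;
  return df === 0 ? 0 : tf * Math.log(docs.length / df);
}

// Top `n` terms of document `i`, highest score first.
function topTerms(i: number, docs: string[][], n: number): string[] {
  const doc = docs[i] ?? [];
  const unique = doc.filter((t, k, all) => all.indexOf(t) === k);
  return unique
    .map((term) => ({ term, score: tfidf(term, doc, docs) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n)
    .map(({ term }) => term);
}

// Score every document against a list of terms by summing per-term TF-IDF.
function scoreAll(terms: string[], docs: string[][]): number[] {
  return docs.map((doc) => terms.reduce((sum, term) => sum + tfidf(term, doc, docs), 0));
}

const corpus = [
  "metalsmith plugin for static sites",
  "static sites with metalsmith and markdown",
  "baking bread at home",
].map(tokenize);

const topThree = topTerms(0, corpus, 3);
const corpusScores = scoreAll(topThree, corpus);
// The second document shares a term with the first and scores above 0;
// the bread document shares none and scores 0.
```

Note that terms unique to a document rank highest but match nothing else, which is why using too few terms on a small collection can leave a document with no related entries.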

metalsmith-related's Issues

How to make it work on macOS?

Got this error:

159 error In file included from ../src/bson.cc:41:
159 error ../src/bson.h:182:92: error: too few arguments to function call, single argument 'isolate' was not specified
159 error void WriteLengthPrefixedString(const Local& value) { WriteInt32(value->Utf8Length()+1); WriteString(value); }
159 error void WriteDouble(const Local& object, const Local&
159 error
~~~~~~~~~~~~~~~~~~ ^
159 error /Users/fmmsilva/.node-gyp/15.4.0/include/node/v8.h:2907:3: note: 'NumberValue' declared here
159 error V8_WARN_UNUSED_RESULT Maybe NumberValue(Local context) const;

Sort 'related' according to importance

Pretty good library. I was just about to write something like this myself but then I discovered your project. Thanks!

I'm wondering if related should be sorted according to the document's importance. TfIdf.tfidfs returns an array sorted by document ID but not by importance so your plugin will always only return the first 5 documents instead of the first 5 most important documents.

Unfortunately I don't know CoffeeScript. Otherwise I would have submitted a PR.

Cheers,
Michel
