
searchyll's People

Contributors

allizad, alranel, bfabio, drusellers, matthewdu, nz, robsears


searchyll's Issues

Error in Jekyll 3.1.6

I'm assuming the error is in searchyll, since I don't get it when the gem is not active.

jekyll 3.1.6 | Error: undefined method `[]' for nil:NilClass

I'm not familiar enough with Ruby to know how to dig deeper into the error.

I'm developing locally with ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin15], using Homebrew and rbenv.

Let me know if I can provide any more details.

Index Tags

Searchyll should index tags as well as categories.

Work with `pages`?

I am trying to use searchyll to add Elasticsearch (ES) capabilities to my Git-Pages website. My Git-Pages site is made up of "pages", not "posts", so I wonder if that's the root of the issue. (Also, I use collections in my _config.yml file.)

I've gotten as far as trying to add documents to my ES database via jekyll build, but I don't see my pages being added.

Below I have included details of what I've done, but an overview is:

  1. Updated my _config.yml:
    a. Added the searchyll gem.
    b. Added the elasticsearch settings.

  2. Updated my `_layouts/page.html` to include <article>...</article>

  3. Run elasticsearch locally (for now)

  4. Run jekyll build
    a. I can see the "indexing document" puts output.
    b. I added some additional puts to searchyll.rb just in case. All seems okay.

  5. In my elasticsearch output, I do not see any new messages
    a. I expected a message as each document is indexed into the ES database, but nope

  6. GET _search returns nothing
    a. Not surprising

  7. To test my ES:
    a. I manually PUT a document
    b. I saw a message in my elasticsearch output
    c. I manually GET it back

My environment:

  • Mac OS Sierra v 10.12.6
  • Gems 2.0.0
  • searchyll 0.10.1
  • jekyll 3.4.5
  • elasticsearch 5.6.0

Details:

#-------
$ cat _config.yml
[ snip ]
# stuff BA added
gems: [
  jekyll-paginate, jekyll-feed, rouge, searchyll
]
 
elasticsearch:
  url: http://localhost:9200
  index_name: CSG-Wiki
  default_type: "page"          # Optional. Default type is "post".
 
collections:
  general:
    title: General
    output: true
    permalink: /:collection/:path/:title.html
 
#-------
$ cat _layouts/page.html
---
layout: default
---
 
<div class="page">
  <h1 class="page-title">{{ page.title }}</h1>
    <!-- this will be sent to elasticsearch, along with full page metadata -->
    <article class="page-content">
      {{ content }}
    </article>
</div>
 
#-------
$ elasticsearch --verbose
[2017-09-18T08:04:30,005][INFO ][o.e.n.Node               ] [] initializing ...
[2017-09-18T08:04:30,078][INFO ][o.e.e.NodeEnvironment    ] [7_61xZT] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [126.1gb], net total_space [232.6gb], spins? [unknown], types [hfs]
[2017-09-18T08:04:30,078][INFO ][o.e.e.NodeEnvironment    ] [7_61xZT] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-09-18T08:04:30,090][INFO ][o.e.n.Node               ] node name [7_61xZT] derived from node ID [7_61xZTTSr6bdGqad_FYTQ]; set [node.name] to override
[2017-09-18T08:04:30,090][INFO ][o.e.n.Node               ] version[5.6.0], pid[42125], build[781a835/2017-09-07T03:09:58.087Z], OS[Mac OS X/10.12.6/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_131/25.131-b11]
[2017-09-18T08:04:30,090][INFO ][o.e.n.Node               ] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/usr/local/Cellar/elasticsearch/5.6.0/libexec]
[2017-09-18T08:04:30,706][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [aggs-matrix-stats]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [ingest-common]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [lang-expression]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [lang-groovy]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [lang-mustache]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [lang-painless]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [parent-join]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [percolator]
[2017-09-18T08:04:30,708][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [reindex]
[2017-09-18T08:04:30,708][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [transport-netty3]
[2017-09-18T08:04:30,708][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [transport-netty4]
[2017-09-18T08:04:30,708][INFO ][o.e.p.PluginsService     ] [7_61xZT] no plugins loaded
[2017-09-18T08:04:31,815][INFO ][o.e.d.DiscoveryModule    ] [7_61xZT] using discovery type [zen]
[2017-09-18T08:04:32,191][INFO ][o.e.n.Node               ] initialized
[2017-09-18T08:04:32,192][INFO ][o.e.n.Node               ] [7_61xZT] starting ...
[2017-09-18T08:04:32,358][INFO ][o.e.t.TransportService   ] [7_61xZT] publish_address {127.0.0.1:9300}, bound_addresses {[fe80::1]:9300}, {[::1]:9300}, {127.0.0.1:9300}
[2017-09-18T08:04:35,401][INFO ][o.e.c.s.ClusterService   ] [7_61xZT] new_master {7_61xZT}{7_61xZTTSr6bdGqad_FYTQ}{5kt0gbCuQZ2ZDrjkx6cImg}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-09-18T08:04:35,419][INFO ][o.e.h.n.Netty4HttpServerTransport] [7_61xZT] publish_address {127.0.0.1:9200}, bound_addresses {[fe80::1]:9200}, {[::1]:9200}, {127.0.0.1:9200}
[2017-09-18T08:04:35,419][INFO ][o.e.n.Node               ] [7_61xZT] started
[2017-09-18T08:04:35,529][INFO ][o.e.g.GatewayService     ] [7_61xZT] recovered [1] indices into cluster_state
[2017-09-18T08:04:35,675][INFO ][o.e.c.r.a.AllocationService] [7_61xZT] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[library][1]] ...]).
 
#-------
$ jekyll build
WARN: Unresolved specs during Gem::Specification.reset:
      rb-fsevent (>= 0.9.4, ~> 0.9)
      rb-inotify (>= 0.9.7, ~> 0.9)
WARN: Clearing out unresolved specs.
Please report a bug if this causes problems.
Configuration file: /Users/bfo7328/Documents/hca/project/wiki/_config.yml
            Source: /Users/bfo7328/Documents/hca/project/wiki
       Destination: /Users/bfo7328/Documents/hca/project/wiki/_site
Incremental build: disabled. Enable with --incremental
      Generating...
        indexing document /general/AE_job_desc.html
        indexing document /general/index.html
        indexing document /general/setup_elasticsearch.html
        indexing document /general/setup_phone_cisco_unity.html
        [ snip ]
        indexing document /unix/setup_linux_analytics_server.html
        indexing page /404.html
        indexing page /atom.xml
        indexing page /
        indexing page /feed.xml
       Old indices:
                    done in 5.697 seconds.
Auto-regeneration: disabled. Use --watch to enable.
 
#-------
$ elasticsearch --verbose
[ no new messages ]
 
#-------
$ curl -XGET localhost:9200/_search?pretty
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  }
}
 
#-------
$ curl -X PUT 'localhost:9200/library/books/1?pretty' -H 'Content-Type: application/json' -d'
{
  "title" : "A fly on the wall",
  "name"  : {
    "first": "Drosophila",
    "last" : "Melanogaster"
  },
  "publish_date" : "2015-06-21T23:39:40-0400",
  "price"        : 19.95
}
'
 
# output:
{
  "_index" : "library",
  "_type" : "books",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}
 
#-------
$ elasticsearch --verbose
[ new messages: ]
[2017-09-18T08:59:10,176][INFO ][o.e.c.m.MetaDataCreateIndexService] [7_61xZT] [library] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-09-18T08:59:10,252][INFO ][o.e.c.m.MetaDataMappingService] [7_61xZT] [library/qqN2Ig5uSQO7HBI94tp6fQ] create_mapping [books]
 
#-------
$ curl -XGET localhost:9200/_search?pretty
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "library",
        "_type" : "books",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "A fly on the wall",
          "name" : {
            "first" : "Drosophila",
            "last" : "Melanogaster"
          },
          "publish_date" : "2015-06-21T23:39:40-0400",
          "price" : 19.95
        }
      }
    ]
  }
}
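
One thing that may be worth ruling out, as an unconfirmed guess rather than a diagnosis: Elasticsearch requires index names to be lowercase, and the `index_name` in the `_config.yml` above is `CSG-Wiki`. The sketch below is a hypothetical pre-flight check (the method name `index_name_problems` is made up for illustration and is not part of searchyll) that flags a few of the documented naming restrictions before a build is attempted.

```ruby
# Hypothetical pre-flight check; not part of searchyll.
# Covers only a subset of Elasticsearch's index naming rules.
def index_name_problems(name)
  problems = []
  # Elasticsearch rejects index names containing uppercase letters.
  problems << "must be lowercase" unless name == name.downcase
  # Names may not begin with a hyphen, underscore, or plus sign.
  problems << "must not start with '-', '_' or '+'" if name.start_with?("-", "_", "+")
  # A few characters are forbidden anywhere in the name.
  problems << "contains a forbidden character" if name =~ %r{[\\/*?"<>|\s,#]}
  problems
end

puts index_name_problems("CSG-Wiki").inspect
puts index_name_problems("csg-wiki").inspect
```

If the check reports a problem for the configured name, trying a lowercase `index_name` is a cheap experiment before digging further.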

Do not print elasticsearch_url

As of now, the full URL is printed to console:

Jekyll::Hooks.register(:site, :pre_render) do |site|
  config = Searchyll::Configuration.new(site)
  if config.valid?
    puts "setting up indexer hook with url #{config.elasticsearch_url.inspect}"

However, if the URL contains authentication credentials (e.g. https://user:pass@host) and we're running in a CI/CD environment, those credentials get exposed in the build logs.
I propose that we avoid printing the full URL, or only print it when running with the jekyll --verbose option.
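
A third option, sketched here under assumptions, would be to mask the credentials before logging. The helper name `redacted_url` and the `[REDACTED]` marker are hypothetical and not part of searchyll; the sketch only uses Ruby's standard `URI` library.

```ruby
require "uri"

# Hypothetical helper; not part of searchyll.
# Returns the URL with any user:password part replaced by a marker.
def redacted_url(url)
  uri = URI.parse(url)
  # Nothing to hide if the URL carries no credentials.
  return url if uri.userinfo.nil?

  safe = "#{uri.scheme}://[REDACTED]@#{uri.host}"
  # Keep the port only when it differs from the scheme's default.
  safe += ":#{uri.port}" unless uri.port == uri.default_port
  safe += uri.path.to_s
  safe += "?#{uri.query}" if uri.query
  safe
end

puts redacted_url("https://user:pass@host")
puts redacted_url("http://localhost:9200")
```

The existing `puts` could then log `redacted_url(config.elasticsearch_url)` instead of the raw value, which keeps the host visible for debugging without leaking the password.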

Searchyll doesn't connect to local Elasticsearch url

I am a newbie with gems, so pardon me if this is pretty basic. I am using this blog post, https://blog.omc.io/elasticsearch-for-jekyll-part-2-5ff5e52912bf, to set up searchyll. I get the error below when I try to build my Jekyll site with searchyll.

Thanks in advance. This plugin is awesome!!

jekyll build
            Source: /<localpath>/advisor
       Destination: /<localpath>/advisor/_site
 Incremental build: disabled. Enable with --incremental
      Generating... 
jekyll 3.6.2 | Error:  Failed to open TCP connection to :80 (Connection refused - connect(2) for nil port 80)

The error disappears when I comment out the Searchyll plugin

Env
jekyll 3.6.2
MacBook Pro running High Sierra

Contents of _config.yml

title: Your awesome title
email: [email protected]
description: >- # this means to ignore newlines until "baseurl:"
  Write an awesome description for your new site here. You can edit this
  line in _config.yml. It will appear in your document head meta (for
  Google search results) and in your feed.xml site description.
baseurl: "" # the subpath of your site, e.g. /blog
url: "" # the base hostname & protocol for your site, e.g. http://example.com
twitter_username: jekyllrb
github_username:  jekyll

# Build settings
markdown: kramdown
theme: minima
plugins:
  - jekyll-feed
  - searchyll

elasticsearch:
    url: http:/127.0.0.1:9200
    index_name: jekyll-shakespeare

contents of Gemfile

source "https://rubygems.org"

gem "jekyll", "~> 3.6.2"

gem "minima", "~> 2.0"


group :jekyll_plugins do
  gem "jekyll-feed", "~> 0.6"
end

gem 'tzinfo-data', platforms: [:mingw, :mswin, :x64_mingw, :jruby]

source 'https://rubygems.org'
gem 'searchyll'
gem 'jekyll-compose', group: [:jekyll_plugins]
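
A possible explanation, not confirmed in this thread: the `url` value in the `_config.yml` above has a single slash after `http:`. Assuming the plugin hands that value to Ruby's standard `URI` parser, everything after the colon is read as a path, the host stays empty, and the port falls back to the HTTP default of 80, which would match the `nil port 80` error. The helper name `describe_url` below is made up for illustration.

```ruby
require "uri"

# Hypothetical helper showing how Ruby's URI library splits a URL.
def describe_url(url)
  uri = URI.parse(url)
  { host: uri.host, port: uri.port, path: uri.path }
end

# The value from the _config.yml above (single slash after "http:").
puts describe_url("http:/127.0.0.1:9200").inspect

# The same value with a double slash.
puts describe_url("http://127.0.0.1:9200").inspect
```

If that is the cause, changing the setting to `http://127.0.0.1:9200` would be the first thing to try.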

Batch size keeps growing

I had bundle exec jekyll serve running for a few hours, and I noticed the following output in console:

Increased batch to 94777449436701609667188297781665503819030517220054107171082413431520640121778638383647367110547101173319817390034308454741829257133119992736870009797795596146331645789887004672
Increased batch to 200593728672035931732768693399865394589774672587325838890939151914483838602731795427929701601580512710605213727191985246954165249313228158379774992635263303952590755845660674621440
Increased batch to 18368504659304856136350155220078100239893869168967793447586326566762192167108094577698989302780888185491791238388332456749221831046657710308578700526160122250397424242244190208
Increased batch to 277271443668342844774702843187183923013839381558598037623706775909080314800410578046290760507714583250559156411054944938133818412179429338058541858072772201074249307795750912
Increased batch to 9873695207846506853946225222129548294174173481472380761094969262719211345924829493206651477802892616369342958461547297909307914910857517697703872790440830012951016964156723804556492

I think the timestamp in indexer.rb needs to be renewed :)
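
To illustrate the idea, here is a minimal sketch of a batch sizer that renews its timestamp for every batch and also caps the size. It is not searchyll's actual indexer.rb code: the class name `BatchSizer`, its parameters, and the default values are all hypothetical.

```ruby
# Hypothetical helper; not searchyll's actual indexer code.
class BatchSizer
  attr_reader :size

  def initialize(initial: 50, max: 5_000, factor: 1.2, target_seconds: 1.0)
    @size = initial
    @max = max
    @factor = factor
    @target_seconds = target_seconds
    @started_at = nil
  end

  # Call right before sending a batch, so elapsed time is measured per batch
  # rather than from a timestamp taken once at startup.
  def start!(now = Time.now)
    @started_at = now
  end

  # Call after a batch completes. Grows the size after a fast batch and
  # shrinks it after a slow one, always staying within 1..max.
  def finish!(now = Time.now)
    elapsed = now - @started_at
    scaled = elapsed < @target_seconds ? @size * @factor : @size / @factor
    @size = scaled.round.clamp(1, @max)
  end
end

sizer = BatchSizer.new
sizer.start!
sizer.finish!
puts sizer.size
```

With an upper bound in place, a long-running `jekyll serve` could not push the batch size past `max`, even if the timing logic misbehaves.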

Use URL as page id?

page.name is currently used for populating the id field:

indexer << page.data.merge({
  id: page.name,
  url: page.url,
  text: nokogiri_doc.xpath("//article//text()").to_s.gsub(/\s+/, " ")
})

However, there's no guarantee that name contains a defined value, and Elasticsearch will complain if it doesn't.

Why don't we use page.url instead? It is unique enough to be used as an ID (definitely more so than name) and is guaranteed to always be defined.
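
A minimal sketch of the proposed change, using a stand-in `Page` struct instead of a real Jekyll page and a plain string instead of the Nokogiri extraction. The `Page` struct and the `document_for` method are hypothetical names used only for this illustration.

```ruby
# Stand-in for a Jekyll page; only the fields used here are modelled.
Page = Struct.new(:name, :url, :data)

# Builds the document sent to the indexer, keyed by URL instead of name.
def document_for(page, article_text)
  page.data.merge(
    id: page.url,                           # proposed: URL as the document id
    url: page.url,
    text: article_text.gsub(/\s+/, " ")     # collapse whitespace, as in the original
  )
end

page = Page.new(nil, "/general/index.html", { "title" => "General" })
puts document_for(page, "Hello\n  world").inspect
```

Here the page has no name at all, yet the resulting document still carries a usable id.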

Collaboration

First of all, thanks for creating Searchyll. It's great! We're using Searchyll to power the future search engine for our Developer Portal and I wonder whether this project is alive and up for a few improvements I'd like to see:

  1. Merge #50 so we can stop using our fork of Searchyll but instead start using the official Gem.
  2. Replace Travis with GitHub Actions (done, see SwedbankPay#5)
  3. Add CodeCov (also done in SwedbankPay#5)
  4. Update RuboCop (also done in SwedbankPay#5) and fix all problems that are auto-fixable (which amount to 185 of 199 offenses by my count).

I can rebase and submit pull requests for each of these things if you agree that they would benefit the project and would merge them. However, if this project is dead or the features are of no interest to you I would have to establish our fork as a more permanent solution that is going to diverge greatly from this repository over time. I would like to avoid that, if possible, though.

Build unit test suite

Now that we have a stable-ish internal API, let's create some specs. Might even want to start fresh and TDD it to flush out any inadvertent weirdness or dead code from our quick iteration on v1.

Add logging during index

It would be nice to see a sweet sweet log statement during build.

Configuration file: /Users/drusellers/dev/omc/runbook/_config.yml
            Source: /Users/drusellers/dev/omc/runbook
       Destination: /Users/drusellers/dev/omc/runbook/_site
 Incremental build: disabled. Enable with --incremental
  Indexing Content: <ES url>
      Generating...
                    done in 5.73 seconds.
 Auto-regeneration: disabled. Use --watch to enable.
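
As a rough sketch of how such a line could be produced, the helper below right-aligns a topic in a 20-column field, mirroring the alignment visible in the output above. The method name `build_log_line` is hypothetical. Inside a real plugin, `Jekyll.logger.info(topic, message)` would probably be the more natural call, assuming its formatting matches.

```ruby
# Hypothetical helper; pads the topic so it lines up with Jekyll's build output.
def build_log_line(topic, message)
  "#{topic} ".rjust(20) + message.to_s
end

puts build_log_line("Indexing Content:", "<ES url>")
puts build_log_line("Generating...", "")
```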
