cweiske / phinde
License: GNU Affero General Public License v3.0
Self-hosted search engine for your static blog
elasticsearch 5 will break with that; see elastic/elasticsearch#16036
currently it's "cweiske.de search"
Required for indieweb/chat.indieweb.org#13
I need a way to disable crawling of log line urls on chat.indieweb.org - they won't give new links that I need.
Required for indieweb/chat.indieweb.org#13
Especially useful for the "show full content" setting.
Cronjobs are running. Need to investigate.
AH01071: Got error 'PHP message: PHP Fatal error:
Uncaught GearmanException: Failed to set exception option
in src/phinde/Queue.php:11
schema seems to be missing.
I have a WebSub subscription that has been in status "subscribing" for 2 months. Calling bin/renew-subscriptions.php does not help.
especially the "noindex" value, required for indieweb/chat.indieweb.org#13.
http://search.cweiske.de/ has whitelisted URLs indexed only, not directly linked ones.
[snarfed] left you a message 2 weeks, 4 days ago: hi! just fyi, your search currently 500s on any search that ends with the / character, eg https://indiechat.search.cweiske.de/?q=xyz/ . hope it's an easy fix!
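One possible cause, which the report does not confirm, is that Elasticsearch's query string syntax reserves `/` as a regular-expression delimiter, so an unbalanced trailing slash fails to parse. The sketch below escapes slashes before the query is built. It is Python for illustration only, and the function name `escape_slashes` is hypothetical, not part of phinde.

```python
def escape_slashes(user_query: str) -> str:
    """Backslash-escape every '/' that is not already escaped."""
    out = []
    escaped = False  # True when the previous character was an unconsumed backslash
    for ch in user_query:
        if ch == "/" and not escaped:
            out.append("\\/")
        else:
            out.append(ch)
        # A backslash escapes the next character unless it was itself escaped.
        escaped = (ch == "\\") and not escaped
    return "".join(out)
```

With this, the failing example `xyz/` would be sent as `xyz\/`.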
The crawlblacklist did not seem to work and began indexing things
explicitly blacklisted. I even put an echo statement in the code to
confirm the regex matched the url, but it still indexed it anyway.
reported by dan via e-mail
"10:16gRegorLove joined the channel"
Currently the running workers will capture all crawl requests, not only the ones for their own configuration/Elasticsearch instance.
Required for indieweb/chat.indieweb.org#13.
The status page silently shows -1 when gearadmin is not installed instead of giving an error.
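A minimal sketch of failing loudly when the tool is missing, written in Python for illustration. `StatusError` and `require_tool` are hypothetical names, and the lookup function is injectable so the check can be tested without gearadmin installed.

```python
import shutil


class StatusError(RuntimeError):
    """Raised when the queue status cannot be determined."""


def require_tool(name: str = "gearadmin", which=shutil.which) -> str:
    """Return the full path of the tool, or raise instead of reporting -1."""
    path = which(name)
    if path is None:
        raise StatusError(f"{name} not found in PATH; cannot read queue status")
    return path
```

The status page could catch this error and display the message instead of a misleading count.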
It would be very useful to be able to jump to the line on the day page.
Right now the time link looks like https://chat.indieweb.org/dev/2017-05-24/1495641830720000
But if it was transformed to https://chat.indieweb.org/dev/2017-05-24#t1495641830720000, you could jump to that line on the day page and read context.
It doesn't seem to be completely reliable, so both links would probably be required.
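The transformation described above can be sketched as a small URL rewrite. This is an illustrative Python sketch; the function name `day_page_link` is hypothetical, and it assumes log line URLs always end in `/<YYYY-MM-DD>/<digits>` as in the example.

```python
import re
from typing import Optional

# Matches .../<YYYY-MM-DD>/<timestamp>, the shape of the example URL.
_LINE_URL = re.compile(r"^(?P<day>.+/\d{4}-\d{2}-\d{2})/(?P<ts>\d+)$")


def day_page_link(line_url: str) -> Optional[str]:
    """Turn a log line URL into a day page URL with a #t<timestamp> anchor."""
    m = _LINE_URL.match(line_url)
    if m is None:
        return None  # not a log line URL; caller keeps the original link only
    return f"{m.group('day')}#t{m.group('ts')}"
```

Since the anchor does not seem fully reliable, a result page would show this link next to the original one rather than replace it.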
So that the useless title in https://indiechat.search.cweiske.de/?q=nick%3Ajeena is not shown, only content.
Start page message should be customizable, too.
Required for indieweb/chat.indieweb.org#13
e.g. "url:https://chat.indieweb.org/dev/"
We already support this with author.name:cweiske, but we need an alias nick:cweiske
Required for indieweb/chat.indieweb.org#13
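One way to provide such an alias is to rewrite the prefix before the query reaches the search backend. The following Python sketch is illustrative only; `FIELD_ALIASES` and `expand_aliases` are hypothetical names.

```python
import re

# Hypothetical alias table: short user-facing prefix -> indexed field name.
FIELD_ALIASES = {"nick": "author.name"}

# An alias only counts at the start of a word, so "unick:" is left alone.
_ALIAS = re.compile(
    r"(?<!\S)(" + "|".join(map(re.escape, FIELD_ALIASES)) + r"):"
)


def expand_aliases(query: str) -> str:
    """Replace alias prefixes such as nick: with the real field name."""
    return _ALIAS.sub(lambda m: FIELD_ALIASES[m.group(1)] + ":", query)
```

A query like `bridgy nick:aaronpk` would then be searched as `bridgy author.name:aaronpk`.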
http://indiechat.search.cweiske.de/index.php?q=bridgy shows "circleci-bot" which isn't really relevant for us.
https://indiechat.search.cweiske.de/?q=nick%3Aaaronpk+date%3A2017-04-09 has some of those
aaronpk at 2017-04-09 21:37
aaronpk at 2017-04-09 20:28
When a crawl HTTP request times out, the crawler should try it again after some time automatically.
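A minimal sketch of automatic retries with a growing delay, in Python for illustration. The function name, the attempt count, and the delays are assumptions; `fetch` stands for whatever performs the HTTP request and is expected to raise `TimeoutError` on a timeout.

```python
import time
from typing import Callable


def fetch_with_retry(
    fetch: Callable[[str], str],
    url: str,
    attempts: int = 3,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url); on a timeout wait and retry, doubling the delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch(url)
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller see the timeout
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

In a queue-based setup the delay would more likely be implemented by re-queueing the crawl job for later than by sleeping inside the worker.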
probably because of multiple nick: words
The results on https://indiechat.search.cweiske.de/?q=nick%3Ajeena are pretty useless right now.
Required for indieweb/chat.indieweb.org#13
To quickly bootstrap chat.indieweb.org search, we cannot crawl everything because it's slow.
It's quicker to import a pre-generated list of URLs that shall be indexed.
Required for indieweb/chat.indieweb.org#13
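Reading such a pre-generated list could look like the sketch below. The file format (one URL per line, `#` for comments) and the name `read_url_list` are assumptions made for illustration in Python; the issue does not specify either.

```python
from typing import Iterable, Iterator


def read_url_list(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct http(s) URL once, skipping blanks and comments."""
    seen = set()
    for line in lines:
        url = line.strip()
        if not url or url.startswith("#"):
            continue
        if not url.startswith(("http://", "https://")):
            continue  # ignore anything that is not an absolute web URL
        if url in seen:
            continue
        seen.add(url)
        yield url
```

Each yielded URL would then be handed to the index queue directly, without crawling it for further links.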
Somehow <video> is not escaped:
https://indiechat.search.cweiske.de/?q=nick%3AbekoDiscord
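For illustration, escaping indexed text before it is placed into the result page can be sketched in Python as follows. `render_snippet` is a hypothetical name, and the sketch assumes the snippet contains no markup that should be preserved.

```python
import html


def render_snippet(text: str) -> str:
    """Escape &, <, > and quotes so indexed text cannot inject HTML."""
    return html.escape(text, quote=True)
```

A chat line containing `<video>` would then be shown literally as text instead of being interpreted by the browser.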
Seen on indiechat.search.cweiske.de
Currently workers crawl everything on chat.indieweb.org, but don't index anything.
# only for indexing:
$ ./bin/phinde-worker.php index
# only for crawling:
$ ./bin/phinde-worker.php crawl
Required for indieweb/chat.indieweb.org#13
support date range filters for indieweb/chat.indieweb.org#13:
before:2016-08-08
after:2016-07-07
date:2016-08-08
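The three keywords above could be separated from the free-text part of the query before searching. The following Python sketch is one possible approach; `split_date_filters` is a hypothetical name and how the resulting dates map onto a backend range filter is left open.

```python
import re
from datetime import date

_DATE_TOKEN = re.compile(r"^(before|after|date):(\d{4})-(\d{2})-(\d{2})$")


def split_date_filters(query: str):
    """Return (remaining_query, filters); filters maps keyword -> date."""
    filters = {}
    words = []
    for token in query.split():
        m = _DATE_TOKEN.match(token)
        if m:
            try:
                filters[m.group(1)] = date(
                    int(m.group(2)), int(m.group(3)), int(m.group(4))
                )
                continue
            except ValueError:
                pass  # not a real calendar date; keep it as a search word
        words.append(token)
    return " ".join(words), filters
```

For example, `bridgy after:2016-07-07 before:2016-08-08` splits into the search word `bridgy` plus an `after` and a `before` date.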
needed for the indieweb/chat.indieweb.org#13 instance; I don't want to follow and index all links there.
https://indiechat.search.cweiske.de/?q=nick%3Acweiske&filter%5Btags%5D=indieweb
click on the pager and the tag filter is gone.
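A pager that keeps the active filters can build its links from the current URL instead of from the search term alone. This Python sketch is illustrative; the function name `pager_url` and the `page` parameter name are assumptions, not phinde's actual names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def pager_url(current_url: str, page: int) -> str:
    """Return current_url with only the page number changed."""
    parts = urlsplit(current_url)
    # Keep every existing parameter (q, filter[tags], ...) except the old page.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    params.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

Applied to the URL above, the `filter[tags]=indieweb` parameter survives in every pager link.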
When subscribing to https://chat.indieweb.org/dev/ phinde tells me:
No hub URL found for topic
Reason is that the URL sends a redirect to the current date, and that one has no hub URL.
@aaronpk says "as soon as it sees a hub URL it can stop following the redirects".
This is not in the spec, but I need to implement that.
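The suggested behaviour, stopping at the first response that advertises a hub, can be sketched as below. This is an illustrative Python sketch under several assumptions: `find_hub` and the injected `fetch` callable are hypothetical, `fetch` returns a status code plus headers with lowercase names and does not follow redirects itself, and only the HTTP `Link` header is inspected (HTML `<link>` elements are ignored).

```python
import re
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urljoin

# Finds <URL>; rel="hub" inside a Link header value.
_HUB_LINK = re.compile(r'<([^>]+)>\s*;[^,]*\brel="?[^",]*\bhub\b', re.IGNORECASE)

Response = Tuple[int, Dict[str, str]]  # (status code, lowercase headers)


def find_hub(
    topic_url: str,
    fetch: Callable[[str], Response],
    max_redirects: int = 5,
) -> Optional[str]:
    """Follow redirects from topic_url, stopping at the first hub link."""
    url = topic_url
    for _ in range(max_redirects + 1):
        status, headers = fetch(url)
        match = _HUB_LINK.search(headers.get("link", ""))
        if match:
            # Hub found, even on a redirect response: stop following.
            return urljoin(url, match.group(1))
        if status in (301, 302, 303, 307, 308) and "location" in headers:
            url = urljoin(url, headers["location"])
            continue
        return None
    return None
```

With this approach a hub link sent along with the redirect itself is enough, and the dated target page never needs to be fetched.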
Pages do not get crawled when they get indexed before they are crawled, because of:
-- Crawling https://chat.indieweb.org/dev/2016-08-24
Not modified since last fetch
Required for indieweb/chat.indieweb.org#13