
yacy_search_server's Introduction

YaCy

Search Engine Software

What is YaCy?

YaCy is a full search engine application containing a server hosting a search index, a web application to provide a nice user front-end for searches and index creation and a production-ready web crawler with a scheduler to keep a search index fresh.

YaCy search portals can also be placed in an intranet environment, making it a replacement for commercial enterprise search solutions. A network scanner makes it easy to discover all available HTTP, FTP and SMB servers.

Running a personal search engine is a great tool for privacy; indeed, YaCy was created with privacy as the primary motivation for the project.

You can also use YaCy with a customized search page in your own web applications.

Large-Scale Web Search with a Peer-to-Peer Network

Each YaCy peer can be part of a large search network in which search indexes are exchanged with other YaCy installations over a built-in peer-to-peer network protocol.

This is the default operation that enables new users to instantly access a large-scale search cluster, operated only by YaCy users.

You can opt out of the YaCy cluster operation by choosing a different operation mode in the web interface. You can also opt out of the network for individual searches, which makes YaCy a completely privacy-aware tool: in this operation mode, search results are computed from the local index only.

Installation

We recommend compiling YaCy yourself and installing it from the git sources. Pre-compiled YaCy packages exist but are not generated on a regular basis. An automatically built latest developer release is available at release.yacy.net. To get a ready-to-run production package, run YaCy from Docker.

Compile and run YaCy from git sources

You need Java 11 or later to run YaCy and ant to build YaCy. The following command installs the requirements on Debian:

sudo apt-get install openjdk-11-jdk-headless ant

Then clone the repository and build the application:

git clone --depth 1 https://github.com/yacy/yacy_search_server.git
cd yacy_search_server
ant clean all

To start YaCy, run

./startYACY.sh

The administration interface is then available in your web browser at http://localhost:8090. Some of the web pages are protected and need an administration account; these pages are usually also available without a password from the localhost, but remote access needs a log-in. The default admin account name is admin and the default password is yacy. Please change it after installation using the http://<server-address>:8090/ConfigAccounts_p.html service.

Stop YaCy on the console with

./stopYACY.sh

Run YaCy using Docker

The Official YaCy Image is yacy/yacy_search_server:latest. It is hosted on Dockerhub at https://hub.docker.com/r/yacy/yacy_search_server

To install YaCy in intel-based environments, run:

docker run -d --name yacy_search_server -p 8090:8090 -p 8443:8443 -v yacy_search_server_data:/opt/yacy_search_server/DATA --restart unless-stopped --log-opt max-size=200m --log-opt max-file=2 yacy/yacy_search_server:latest

Then open http://localhost:8090 in your web browser.

To build a Docker image from the latest sources, see docker/Readme.md.

Help develop YaCy

  • clone https://github.com/yacy/yacy_search_server.git using built-in Eclipse features (File -> Import -> Git)
  • or download source from this site (download button "Code" -> download as Zip -> and unpack)
  • Open Help -> Eclipse Marketplace -> Search for "ivy" -> Install "Apache IvyDE"
  • right-click on the YaCy project in the package explorer -> Ivy -> resolve

This will build YaCy in Eclipse. To run YaCy:

  • Package Explorer -> YaCy: navigate to source -> net.yacy
  • right-click on yacy.java -> Run as -> Java Application

To join our development community, go to https://community.searchlab.eu

Send pull requests to https://github.com/yacy/yacy_search_server

APIs and attaching software

YaCy has many built-in interfaces, and they are all based on HTTP/XML and HTTP/JSON. You can discover these interfaces by looking for the orange "API" icon in the upper right corner of some web pages in the YaCy web interface. Click it, and you will see the XML/JSON version of the respective web page. You can also use the shell scripts provided in the /bin subdirectory. These shell scripts also call the YaCy web interface. By cloning some of those scripts you can easily create more shell API access methods.
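As an illustration of calling one of these HTTP/JSON interfaces from a script, the following minimal Python sketch builds a search URL. It assumes the yacysearch.json endpoint and the query, startRecord and maximumRecords parameters that appear in a query quoted in one of the issues further down this page; the function name and the default base address are hypothetical and chosen only for this example.

```python
from urllib.parse import urlencode


def build_search_url(query, start_record=0, maximum_records=10,
                     base="http://localhost:8090"):
    """Return a yacysearch.json URL for the given query and result window."""
    params = {
        "query": query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    # urlencode percent-encodes reserved characters such as ':' in the query
    return f"{base}/yacysearch.json?{urlencode(params)}"


url = build_search_url("site:buzzfeed.com", start_record=20, maximum_records=10)
print(url)
```

The resulting URL can be fetched with any HTTP client; the sketch deliberately stops before the network call so that it runs without a YaCy instance.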

License

This project is available as open source under the terms of the GPL 2.0 or later. However, some elements are licensed under the GNU Lesser General Public License. For accurate licensing and copyright information, please check the individual files. The (GPLv2+) source code used to build YaCy is distributed with the package (in /source and /htroot).

Contact

Visit the international YaCy forum, where you can start a discussion in your own language.

Questions and requests for paid customization and integration into enterprise solutions can be sent to the maintainer, Michael Christen, by e-mail (at [email protected]) with a meaningful subject including the word 'YaCy' to prevent it from getting stuck in the spam filter.

  • Michael Peter Christen

yacy_search_server's People

Contributors

alex-run, alexvouilloz, alsutton, apfelmaennchen, breznak, cominch, copro, dalethium, databasedictionary, f1ori, frankenstein91, icewindx, intari, ivanhercaz, jeremyrand, joestr, lofyer, luccioman, marcnause, mibeta, okybaca, orbiter, otteresk, quix0r, reger24, scarfmonster, sixcooler, stepanov-sergey, tangdou1, thkoch2001

yacy_search_server's Issues

Have YaCy sync the RTC on the PC

Take an average of the senior peers' clocks and apply the averaged time to all peers.
See http://westernblueskymining.tripod.com/ for a calculator.
Take note of the error corrections, e.g.
ua1 = ((u1 + u2 + u3 + u4 + u5 + u6 + u7 + u8 + u9 + u10) * .1) - .00000001490116#
sa1 = ((s1 + s2 + s3 + s4 + s5 + s6 + s7 + s8 + s9 + s10 + s11 + s12 + s13 + s14 + s15 + s16 + s17 + s18 + s19 + s20 + s21 + s22 + s23 + s24 + s25 + s26 + s27 + s28 + s29 + s30 + s31 + s32 + s33 + s34 + s35 + s36 + s37 + s38 + s39 + s40) * .025) - .00000001490116#
It is tested by feeding the same value into the formula.
It's a program I wrote to look at the stock market a few years ago.
Hope it's useful to you guys.
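The averaging idea can be sketched in a few lines of Python. This is only an illustration of the formulas quoted above, not code from YaCy or from the linked calculator: the function name is hypothetical, the readings are assumed to be seconds since the epoch, and the correction term is left as an optional parameter because its purpose is not explained in the issue.

```python
from statistics import fmean


def average_clock(peer_times, correction=0.0):
    """Average peer clock readings and subtract an optional fixed correction."""
    if not peer_times:
        raise ValueError("need at least one peer clock reading")
    return fmean(peer_times) - correction


# Same check as described in the issue: feed the same value in ten times
same = [1_700_000_000.0] * 10
print(abs(average_clock(same) - 1_700_000_000.0) < 1e-6)
```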

API for Ranking and Heuristics

It would be very nice if there were an API for getting the current settings from the Ranking and Heuristics settings, and for changing those settings.

Help to install YaCy on Debian

I managed to install YaCy using the Ubuntu FR tutorial, but it would be nice if the official website offered the command lines to install it, like:

sudo apt-get install openjdk-8-jre-headless
deb http://debian.yacy.net ./
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 1F968B3903D886E7
apt-get update
apt-get install yacy

Or something like that (I don't know the exact commands).

Errors/Timeouts in Crawl Start Expert

I get timeouts and errors when I enter a URL in CrawlStartExpert.html. The problem is caused by commit 06d0e2a.

In the console I get the following Warning:
W 2016/02/19 13:21:01 org.eclipse.jetty.servlet.ServletHandler javax.servlet.ServletException: /root/git/yacy_search_server/htroot/api/getpageinfo_p.xml
at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:833)
at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:318)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:542)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)

I also get multiple:
Caused by: java.io.IOException: Client can't execute: test.de duration=3001 for url http://test.de/

Reverting to Commit: caf9e98 fixes the problem.

Yacy Pagination bug

Going beyond a particular page in the search results returns empty results. This can be reproduced in the YaCy interface itself or through the API.
Example: if there are 16 pages of search results, going beyond some page, say 14, returns blank results through the API, and in the interface the pages do not load.

Please fix this.

Enhancement Request: weather and movie widgets

As a casual user, I would find it nice if YaCy provided information beyond purely tech sites. For example, it could show a nice visual when I type "weather ", "movies ", "sunset " or "convert 3 pounds to kilograms" and give some quick answers.

While it might be fluff compared to the main purpose of this project, it would make the program useful to casual users.

Bandwidth Usage

I would like to propose a code change to limit Bandwidth usage. Before I dive in head first, I would like some feedback about my plan.

My use case is as follows: I have a 1TB a month limit and I would like to limit Crawling to 500GB a month and Search to 250GB a month with the remaining bandwidth going to other services.

Basic Plan: A two tiered bandwidth limit (Protocol overhead excluded from count)

  • Crawling
  • Search

For Crawling, it looks like the bulk of the work to track bandwidth will be done in LoaderDispatcher.java's loadInternal(). openInputStreamInternal() is also going to "eat" bandwidth, but I do not have a plan to address it so far. Recommendations welcome.
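The two-tier accounting described in this plan could look roughly like the following sketch. It is written in Python purely for illustration (YaCy itself is Java), and the class and method names are hypothetical; it only shows the bookkeeping of payload bytes per category against a monthly cap, not how the counters would be wired into LoaderDispatcher.java.

```python
class BandwidthBudget:
    """Track payload bytes per category against a monthly cap."""

    def __init__(self, limits):
        self.limits = dict(limits)                    # category -> allowed bytes
        self.used = {name: 0 for name in self.limits}

    def record(self, category, num_bytes):
        """Add transferred payload bytes to a category's running total."""
        self.used[category] += num_bytes

    def allowed(self, category):
        """Return True while the category is still under its cap."""
        return self.used[category] < self.limits[category]

    def reset(self):
        """Start a new accounting period, e.g. at the beginning of a month."""
        for name in self.used:
            self.used[name] = 0


GB = 10**9
budget = BandwidthBudget({"crawl": 500 * GB, "search": 250 * GB})
budget.record("crawl", 499 * GB)
print(budget.allowed("crawl"))   # still under the crawl cap
budget.record("crawl", 2 * GB)
print(budget.allowed("crawl"))   # crawl cap exceeded
print(budget.allowed("search"))  # search tier is tracked independently
```

Once allowed("crawl") turns false, the peer would stop accepting crawl work until the next reset; how that is announced to the network is the open question raised in this issue.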

What is the best way to notify the "network" that I cannot Crawl / Search?
Crawl Restriction: sb.peers.mySeed().setFlagAcceptRemoteCrawl(false) & sb.peers.mySeed().setFlagAcceptRemoteIndex(false)

Any input/review is appreciated.

Thank you.

Ant dist error!

BUILD FAILED
/home/linker/gits/yacy/build.xml:470: Entry: yacy/libbuild/J7Zip-modified/target/classes/SevenZip/Compression/LZMA/Decoder$LiteralDecoder$Decoder2.class longer than 100characters.
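The message suggests that the archive step rejects entry names longer than 100 characters. As a small diagnostic aid, the following Python sketch lists the entries that exceed such a limit; the function name is hypothetical and the limit of 100 is taken from the error text above, not from the build file.

```python
def entries_over_limit(paths, limit=100):
    """Return the paths whose length exceeds the given character limit."""
    return [p for p in paths if len(p) > limit]


entries = [
    "yacy/build.xml",
    "yacy/libbuild/J7Zip-modified/target/classes/SevenZip/Compression/LZMA/"
    "Decoder$LiteralDecoder$Decoder2.class",
]
for path in entries_over_limit(entries):
    print(len(path), path)
```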

Corrupted database?

After a power outage, my YaCy instance can no longer start. The CPU usage maxes out at 100% and I am unable to connect to the web admin. Is there a way to repair the database?

Happy to provide log files but not sure which one to upload.

LibreJS

LibreJS is blocking the JavaScript because it doesn't detect the licenses. I'll try to fix it myself.

Scraper cannot load URL

Hello,

Now that my YaCy is generally able to parse and index PDF documents, I sometimes have issues with some documents. For example, when I want to parse a book, I get the following error:

Crawling of "http://localhost:8090/repository/Linux.pdf" failed. Reason: scraper cannot load URL: java.io.IOException: REJECTED EMPTY RESPONSE BODY 'HTTP/1.1 200 OK' for URL 'http://localhost:8090/repository/Linux.pdf'$/

I have already searched the web but did not find a solution. I am currently using jdk8-openjdk and related packages; could this be a problem for YaCy and the parser?

Thanks in advance and kind regards.

Maven custom plugins

I tried to build the project directly with Maven and got some errors because I didn't have a YaCy-specific plugin. Digging in the POM, I saw that you are using a custom plugin to get the git revision, but luckily the latest version of the official buildnumber plugin can do that. Here is the reference:

<groupId>org.codehaus.mojo</groupId>
<artifactId>buildnumber-maven-plugin</artifactId>

http://www.mojohaus.org/buildnumber-maven-plugin/create-mojo.html

I can help you configure it if needed ;)

lib missing

[INFO] ------------------------------------------------------------------------
[INFO] Building YaCy 1.83
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for net.yacy.extlib:J7Zip-modified:jar:1.02 is missing, no dependency information available
[WARNING] The POM for net.yacy.extlib:webcat:jar:0.1 is missing, no dependency information available
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.811 s
[INFO] Finished at: 2015-11-05T15:45:08+08:00
[INFO] Final Memory: 14M/180M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project yacycore: Could not resolve dependencies for project net.yacy:yacycore:jar:1.83: The following artifacts could not be resolved: net.yacy.extlib:J7Zip-modified:jar:1.02, net.yacy.extlib:webcat:jar:0.1: Failure to find net.yacy.extlib:J7Zip-modified:jar:1.02 in https://repo.maven.apache.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]

After I run "mvn install" in the libbuild dir.

Would there be a performance increase with these java options?

See http://www.minecraftforum.net/forums/support/server-support/server-administration/1937726-java-7-8-command-line-options-for-minecraft
My Minecraft CPU usage was around 40% at idle; now it is 3-4%.
I'm not too sure what all of them do, so someone would have to test them, because my YaCy server on Windows crawled at a much higher PPM than I have ever seen before.

Also, setting up the program to run in a small RAM disk would take some doing, since the data would have to be stored somewhere else.
Settings:

Java 7:
-Xmn2G -Xss4M -Xms4G -Xmx4G -XX:+UseLargePages -XX:PermSize=256M -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:+UseStringCache -XX:+OptimizeStringConcat -XX:+UseCompressedStrings -XX:+UseBiasedLocking -Xincgc -XX:MaxGCPauseMillis=10 -XX:SoftRefLRUPolicyMSPerMB=10000 -XX:+CMSParallelRemarkEnabled -XX:ParallelGCThreads=10 -Djava.net.preferIPv4Stack=true

Java 8:
-Xmn2G -Xss4M -Xms4G -Xmx4G -XX:+UseLargePages -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:+OptimizeStringConcat -XX:+UseBiasedLocking -Xincgc -XX:MaxGCPauseMillis=10 -XX:SoftRefLRUPolicyMSPerMB=10000 -XX:+CMSParallelRemarkEnabled -XX:ParallelGCThreads=10 -Djava.net.preferIPv4Stack=true

What do you think?

Using yacy as a large scale crawler

Hi everyone!

I'm currently testing YaCy as a backend that will return every page on certain large sites, which we will then process further on our end.

Right now my tests consist of crawling some very large news sites: Buzzfeed, the BBC, Wired and some others. I have looked at the documentation and come up with this query:

http://localhost:8090/yacysearch.json?query=site%3Abuzzfeed.com&nav=all&startRecord=40000&maximumRecords=1000&verify=false

Unfortunately, for large values of startRecord I start to get no results. Is there a way to change this query or the configuration of YaCy to fix this? Latency-wise, I don't mind if the query takes up to half an hour, and I don't mind whether there is pagination or not. If we can manage to do this, we will be adding upwards of 100,000,000 documents to the index in the next few weeks.
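For reference, stepping through a large result set in fixed-size windows can be sketched as follows. The helper name is hypothetical, and the sketch only computes the startRecord and maximumRecords values a client would send; whether YaCy actually returns results for large offsets is exactly the open question in this issue.

```python
def record_windows(total_records, page_size):
    """Yield (startRecord, maximumRecords) pairs that cover total_records."""
    for start in range(0, total_records, page_size):
        # The last window may be shorter than page_size
        yield start, min(page_size, total_records - start)


windows = list(record_windows(2500, 1000))
print(windows)  # [(0, 1000), (1000, 1000), (2000, 500)]
```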

Thanks for the help!

Http error 500 on News and Wiki

Hello,

In a fresh install of YaCy on Debian Jessie, I get HTTP ERROR 500 for these pages under Administration > System Status > Messages & Community Data:
Overview Incoming Processed Outgoing Published (/News.html)
Local Peer Wiki (/Wiki.html)
For information, the YaCy use case is "Search portal for your own web pages" for the moment.
Maybe this is normal because I have not configured anything, but I'm not sure it is the expected behaviour.

The error message is:

HTTP ERROR 500
Problem accessing /News.html. Reason:
Server Error
Caused by:
javax.servlet.ServletException: /usr/share/yacy/htroot/News.html
at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:850)
at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:316)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:542)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
YaCy 1.90 - powered by Jetty -

Suggestion: P2P index harvesting and archiving

Hi,
Regarding global index harvesting: would it be possible to add extra tuning options and p2p functionality so that all YaCy peers can (voluntarily) contribute their indexes to archiving p2p nodes that want to, and are able to, store huge amounts of data?

This would require an additional peer role alongside junior, senior and principal. An archivist role would be a nice addition to YaCy.

The suggestion is that crawling peers could activate an option to check the peer list for the archivist tag and send all p2p index transmissions to those peers in round-robin fashion, alongside the normal p2p index distribution. This is purely for preserving and protecting the p2p index from "erosion" when some nodes stop running YaCy before they have sent their index fully into the global index.

Archivist nodes would distribute the archived index to new and old peers at low priority, while their primary priority would be receiving as much of the global index as possible and sharing/syncing it between archivist nodes.

Edit: To be able to archive as much as possible, functionality is probably needed so that the p2p chunks are not indexed directly. Let's say that archivists receive index chunks for 24 hours and then go into an indexing mode, which deactivates index receiving; the archivist node then checks the received chunks for duplicates, indexes all transfers, and after that starts to receive new chunks. This is because indexing chunks takes quite a lot of CPU power, and those who contribute to archivist nodes might otherwise DDoS a node down quite easily.

Br,
Paraabeli

Support an upstream SOCKS proxy

Supporting an upstream SOCKS proxy would be very beneficial to users who want to use YaCy with Tor. One particular requirement to keep in mind is that Tor's stream isolation feature (which is very important for privacy in a situation like YaCy's) requires SOCKS authentication.

I've spent a few hours looking around, and the best candidate Java SOCKS client library I can find that supports authentication is https://github.com/fengyouchao/sockslib .

I'm not 100% sure that sockslib is the best choice (I'd probably want to ask some other Tor community members if there's something else they'd recommend), but before I expend any additional effort on this, would there be interest among the YaCy devs to support an upstream SOCKS proxy via sockslib? If so, would anyone like to do the work of integrating sockslib into YaCy, or should I attempt it myself and submit a PR?

Crawler ignores Cyrillic paths in robots.txt

I just indexed a site that uses the MediaWiki engine, and it has a robots.txt like:

User-agent: *
Disallow: /%D0%A8%D0%B0%D0%B1%D0%BB%D0%BE%D0%BD:
Disallow: /%D0%A1%D0%B2%D0%BE%D0%B9%D1%81%D1%82%D0%B2%D0%BE:
Disallow: /%D0%A1%D0%BB%D1%83%D0%B6%D0%B5%D0%B1%D0%BD%D0%B0%D1%8F:
Disallow: /%D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA:

But the crawler answers "robots exist: crawl allowed". That is strange, to say the least...
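One way such rules can be missed is when the crawler compares a decoded path against the percent-encoded rule, or the other way round. The following Python sketch shows the comparison with both sides percent-decoded first. It is a simplified illustration, not YaCy's matching code: the function name is hypothetical, and decoding everything ignores special cases such as an encoded slash.

```python
from urllib.parse import unquote


def is_disallowed(path, disallow_rules):
    """Return True if path starts with any Disallow rule after decoding both."""
    decoded_path = unquote(path)
    return any(decoded_path.startswith(unquote(rule))
               for rule in disallow_rules if rule)


rules = [
    "/%D0%A8%D0%B0%D0%B1%D0%BB%D0%BE%D0%BD:",
    "/%D0%A1%D0%BB%D1%83%D0%B6%D0%B5%D0%B1%D0%BD%D0%B0%D1%8F:",
]
# Encoded and decoded spellings of the same path should both be blocked
print(is_disallowed("/%D0%A8%D0%B0%D0%B1%D0%BB%D0%BE%D0%BD:Example", rules))
print(is_disallowed(unquote("/%D0%A8%D0%B0%D0%B1%D0%BB%D0%BE%D0%BD:Example"), rules))
print(is_disallowed("/Main_Page", rules))
```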

"W: http://debian.yacy.net/./Release.gpg: Signature by key 8BD752501CB62448A30EA3EA1F968B3903D886E7 uses weak digest algorithm (SHA1)"

After following http://www.yacy-websuche.de/wiki/index.php/De:DebianInstall to add the apt repository on Ubuntu 16.10, I get the warning W: http://debian.yacy.net/./Release.gpg: Signature by key 8BD752501CB62448A30EA3EA1F968B3903D886E7 uses weak digest algorithm (SHA1). It would be nice to remove it by signing with a stronger hash algorithm. There is no impact on the installation or functioning of YaCy.

Search results not clickable / do not open

Hello everyone,

I am running YaCy as an intranet search engine for documents. When YaCy shows the results of a search, the PDF files are not clickable.
That is, they do not open in an extra tab or a new window.

The result gives the title, the short text about the document and the whole link to the document, which looks like this: file://Z:\blablabla\blabla/bla/test.pdf#page=1
As I said, the link does not open in Firefox or Internet Explorer, but if I copy the link manually into Windows Explorer or into Firefox, then the PDF file can be opened.
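The link shown mixes backslashes and forward slashes and has only two slashes after file:, which may be part of the problem. The sketch below shows how a Windows path could be turned into a well-formed file URI; the function name is hypothetical and this is not how YaCy builds its links. Note that browsers may also refuse to follow file: links from pages served over HTTP, in which case a well-formed URI alone would not be enough.

```python
from urllib.parse import quote


def windows_path_to_file_uri(path):
    """Convert an absolute Windows path to a file:/// URI with forward slashes."""
    normalized = path.replace("\\", "/")
    # Keep '/' and the drive-letter ':' unescaped, percent-encode the rest
    return "file:///" + quote(normalized, safe="/:")


print(windows_path_to_file_uri(r"Z:\blablabla\blabla/bla/test.pdf"))
```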

How can I solve the problem?
Best regards

Unable to access yacy 403

HTTP ERROR: 403

Problem accessing /Status.html. Reason:

    proxy use not allowed (see Advanced Settings -> HTTP Networking -> Transparent Proxy; switched off).

Powered by Jetty://

YaCy server: listen on 127.0.0.1, not 0.0.0.0

I installed YaCy on my Debian machine. It is now listening on 0.0.0.0:8090.
I would like to make it listen on 127.0.0.1:8090 (or any other IP, but not 0.0.0.0).

I changed
port = 8090 to port = 127.0.0.1:8090, but no luck.

How can I configure it?

Test

Test

System Status
System
YaCy version 1.82/9000
Uptime: 45 days 09:37
Processors: 8
Load: 2.74
Threads: 137/17, peak:982, total:8262748
Protection
password-protected [Configure]
Address
Host: []:6070 | SSL: enabled (port 8443)
Public Address: http://sokrates.homeunix.net:6070
YaCy Address: http://endeavour.yacy
Proxy
Transparent off URL off
Remote: not used
Auto-popup on start-up
Enabled [Disable]
Tray-Icon
Experimental
Memory Usage
RAM used: 62.88 GB
RAM max: 67.98 GB
DISK used: (approx.) 378.92 GB
DISK free: 1,302.15 GB

Support limiting of the hdd folder size

If I see it right, the current settings only allow disabling certain things when there is not enough free space left.

It would be nice if there were an option: "Limit the DATA folder to x megabytes."

Of course I can use quotas etc., but native support would be great.
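The check behind such an option could be as simple as summing the file sizes below the DATA folder and comparing the total with the configured limit. The following Python sketch illustrates that idea only; the function names are hypothetical, one megabyte is assumed to be 10**6 bytes, and a temporary directory stands in for the DATA folder.

```python
import os
import tempfile


def folder_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def over_limit(root, limit_megabytes):
    """Return True if the folder uses more than limit_megabytes."""
    return folder_size_bytes(root) > limit_megabytes * 10**6


with tempfile.TemporaryDirectory() as data_dir:
    with open(os.path.join(data_dir, "index.bin"), "wb") as f:
        f.write(b"\0" * 2_000_000)          # 2 MB of dummy data
    exceeds_1mb = over_limit(data_dir, 1)
    exceeds_3mb = over_limit(data_dir, 3)
print(exceeds_1mb, exceeds_3mb)
```

Walking a large index on every check would be expensive, so a real implementation would probably run it periodically or keep a running total.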

Finished crawls still listed as running

The Status page claims the crawler is idling, and no crawling activity shows in the logs.

Yet the Crawler monitor lists the crawls as running.

YaCy version: 1.90/9000
Java version: 1.7.0_95
OS: FreeBSD

Deprecation warnings during build from Network and Protocol

Log at Git revision : 5f113be

[10:42:11] sudheeshsinganamalla:yacy_search_server git:(master) $ ant
Buildfile: /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/build.xml

buildGitRevTask:
   [delete] Deleting: /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/libbuild/GitRevTask.jar
      [jar] Building jar: /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/libbuild/GitRevTask.jar

determineGitRevision:

readBuildProperties:

init:
     [echo] YaCy Branch:
     [echo] YaCy Version number: 1.83
     [echo] YaCy Release number: 9785
   [delete] Deleting: /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/classes/net/yacy/peers/operation/yacyBuildProperties.java
     [copy] Copying 1 file to /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/classes/net/yacy/peers/operation
     [copy] Copying 1 file to /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/classes

compile-core:
    [javac] Compiling 1 source file to /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/classes
    [javac] Compiling 77 source files to /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/classes
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/source/net/yacy/cora/document/id/MultiProtocolURL.java:1170: warning: [cast] redundant cast to MultiProtocolURL
    [javac]         final MultiProtocolURL other = (MultiProtocolURL) obj;
    [javac]                                        ^
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/source/net/yacy/peers/Network.java:351: warning: [deprecation] getIP() in Seed has been deprecated
    [javac]                 String ip = seed.getIP();
    [javac]                                 ^
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/source/net/yacy/peers/Network.java:561: warning: [deprecation] getIP() in Seed has been deprecated
    [javac]             if ( sb.peers.mySeed().getPublicAddress(sb.peers.mySeed().getIP()) == null ) {
    [javac]                                                                      ^
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/source/net/yacy/peers/Network.java:675: warning: [deprecation] getIP() in Seed has been deprecated
    [javac]             sb.peers.lastSeedUpload_myIP = sb.peers.mySeed().getIP();
    [javac]                                                             ^
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/source/net/yacy/peers/Protocol.java:373: warning: [deprecation] getIP() in Seed has been deprecated
    [javac]             String ip = target.getIP();
    [javac]                               ^
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/source/net/yacy/peers/Protocol.java:1042: warning: [deprecation] getIP() in Seed has been deprecated
    [javac]         String ip = target.getIP();
    [javac]                           ^
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/source/net/yacy/peers/Protocol.java:1400: warning: [deprecation] getIP() in Seed has been deprecated
    [javac]         final String address = target.getPublicAddress(target.getIP());
    [javac]                                                              ^
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/source/net/yacy/peers/Protocol.java:1487: warning: [deprecation] peerDeparture(Seed,String) in PeerActions has been deprecated
    [javac]             seeds.peerActions.peerDeparture(targetSeed, errorCause); // disconnect unavailable peer
    [javac]                              ^
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/source/net/yacy/peers/Protocol.java:1494: warning: [deprecation] peerDeparture(Seed,String) in PeerActions has been deprecated
    [javac]             seeds.peerActions.peerDeparture(targetSeed, errorCause); // disconnect unavailable peer
    [javac]                              ^
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/source/net/yacy/peers/Protocol.java:1530: warning: [deprecation] peerDeparture(Seed,String) in PeerActions has been deprecated
    [javac]             seeds.peerActions.peerDeparture(targetSeed, errorCause); // disconnect unavailable peer
    [javac]                              ^
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/source/net/yacy/peers/Protocol.java:1553: warning: [deprecation] getIP() in Seed has been deprecated
    [javac]         String ip = targetSeed.getIP();
    [javac]                               ^
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/source/net/yacy/peers/Protocol.java:1625: warning: [deprecation] getIP() in Seed has been deprecated
    [javac]         String ip = targetSeed.getIP();
    [javac]                               ^
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/source/net/yacy/peers/Protocol.java:1696: warning: [deprecation] getIP() in Seed has been deprecated
    [javac]         String address = targetSeed.getPublicAddress(targetSeed.getIP());
    [javac]                                                                ^
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/source/net/yacy/peers/Protocol.java:1738: warning: [deprecation] getIP() in Seed has been deprecated
    [javac]             final Post post = new Post(target.getPublicAddress(target.getIP()), target.hash, "/yacy/idx.json", parts, 30000);
    [javac]                                                                      ^
    [javac] 14 warnings
      [jar] Building jar: /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/lib/yacycore.jar

compile:
    [javac] Compiling 14 source files
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/htroot/ConfigBasic.java:154: warning: [deprecation] myPublicLocalIP() in Domains has been deprecated
    [javac]                 host = Domains.myPublicLocalIP().getHostAddress();
    [javac]                               ^
    [javac] /Users/sudheeshsinganamalla/Documents/yacy/yacy_search_server/htroot/Status.java:230: warning: [deprecation] getIP() in Seed has been deprecated
    [javac]                 prop.put("peerAddress_address", sb.peers.mySeed().getPublicAddress(sb.peers.mySeed().getIP()));
    [javac]                                                                                                     ^
    [javac] 2 warnings

all:

BUILD SUCCESSFUL
Total time: 7 seconds

JSON api returns empty result for items (Python 3.5)

I can't work out how to get the description for search results, because the items array is empty by the time it reaches Python. I'm not sure whether this is a bug in Python or in the way YaCy returns JSON responses.

'query': query, 'contentdom': 'text', 'maximumRecords': '1'
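A minimal sketch of how such a request could be issued and unpacked, using only the Python standard library. The base URL (a local peer on port 8090), the `/yacysearch.json` path, and the `channels[0]["items"]` layout are assumptions that should be checked against the peer being queried; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local YaCy peer on port 8090; adjust to your installation.
YACY_BASE_URL = "http://localhost:8090"


def build_search_url(query, base_url=YACY_BASE_URL, maximum_records=1):
    """Build a search URL using the three parameters from the report."""
    params = {
        "query": query,
        "contentdom": "text",
        "maximumRecords": str(maximum_records),
    }
    return base_url + "/yacysearch.json?" + urlencode(params)


def extract_items(response):
    """Return the result items from a parsed response, or [] if absent.

    Assumption: results are stored under channels[0]["items"].
    """
    channels = response.get("channels") or []
    if not channels:
        return []
    return channels[0].get("items") or []


def search(query, **kwargs):
    """Fetch one page of results; requires a reachable YaCy peer."""
    with urlopen(build_search_url(query, **kwargs), timeout=30) as reply:
        payload = json.loads(reply.read().decode("utf-8"))
    return extract_items(payload)
```

Calling `search("yacy")` and printing each item's keys shows which fields (for example a description) are actually present. If `extract_items` returns an empty list, dumping the whole parsed payload helps tell an empty result set apart from a response with a different layout.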

HTTP/S Proxy support with Yacy Server

http_proxy = 
https_proxy =

These environment variables may already be set on the system. There should be a way to tell YaCy to reach the internet through a proxy, or YaCy should respect the http_proxy settings.
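One possible workaround, sketched under assumptions: translate the environment variables into the standard Java networking properties (`http.proxyHost`, `http.proxyPort` and their `https` counterparts) and append them to the JVM command line. Whether YaCy's HTTP client honors these properties is not confirmed here, and the function name is hypothetical.

```python
import os
from urllib.parse import urlsplit


def proxy_jvm_flags(environ=None):
    """Translate http_proxy/https_proxy into -D flags for a JVM."""
    environ = os.environ if environ is None else environ
    flags = []
    for scheme in ("http", "https"):
        # Check the lowercase name first, then the uppercase variant.
        value = environ.get(scheme + "_proxy") or environ.get(scheme.upper() + "_PROXY")
        if not value:
            continue
        # Accept both "host:port" and "http://host:port" forms.
        parts = urlsplit(value if "://" in value else "//" + value)
        if not parts.hostname:
            continue
        flags.append("-D{}.proxyHost={}".format(scheme, parts.hostname))
        if parts.port:
            flags.append("-D{}.proxyPort={}".format(scheme, parts.port))
    return flags
```

The returned list can be joined with spaces and added to the Java options in the start script. Proxy credentials and `no_proxy` exclusions are deliberately left out of this sketch.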

RemoteInstance uses deprecated DefaultHttpClient

https://github.com/yacy/yacy_search_server/blob/master/source/net/yacy/cora/federate/solr/instance/RemoteInstance.java is using https://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/impl/client/DefaultHttpClient.html and some related classes, which are deprecated in favor of https://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/impl/client/HttpClientBuilder.html .

This is already noted in a TODO comment, but I figure it's worth opening a GitHub issue for this so that anyone looking for potential tasks to do in the issue tracker will see this.

Less GC

I tried using G1GC and got good performance.
set javacmd=%javacmd% -Djava.net.preferIPv4Stack=true -XX:+UseG1GC -Djava.awt.headless=true -Dsolr.directoryFactory=solr.MMapDirectoryFactory -Dfile.encoding=UTF-8 -Djsse.enableSNIExtension=false

An idea that will need more testing and thought: a YaCy front-end app

A lightweight iframe program pointing to your own YaCy web server.
The EXE file is 1.5 MB for the web server.
You can scale the iframe to suit the device.

Results so far:
Windows: tested OK on XP and 7.
Linux (Ubuntu 14.04): there is some error ("Error 258"?) when compiling. Not reported yet.
Mac OS X: unknown.
Android: an app is possible, but it's beyond me at the moment.

What do you think?

Source

Simplify the inclusion of the jar files in build by including all jars in buildpath

At the moment we list the jars manually in the build path definition. Instead, the path could include every jar in the lib/ directory, which would be more elegant.

    <!-- define the classpath that should be used for compiling -->
    <!-- when changing paths here, please also update the paths in /addon/YaCy.app/Contents/Info.plist -->
    <path id="project.class.path">
      <pathelement location="${build}" />
      <fileset dir="${lib}">
        <include name="**/*.jar"/>
      </fileset>
    </path>

@Orbiter What are your views on this?
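Before switching to a wildcard, it may be worth checking that it selects the same jars as the manual list. The sketch below approximates what an `**/*.jar` include would pick up under a directory and compares it with a hand-maintained list. The function names are hypothetical, and the manual list has to be supplied by the caller.

```python
from pathlib import Path


def jars_matched_by_wildcard(lib_dir):
    """List jar files under lib_dir (recursively), relative to lib_dir."""
    root = Path(lib_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.jar")
        if p.is_file()
    )


def compare_with_manual_list(manual_jars, lib_dir):
    """Return (only_on_disk, only_in_manual_list) as two sorted lists."""
    found = set(jars_matched_by_wildcard(lib_dir))
    listed = set(manual_jars)
    return sorted(found - listed), sorted(listed - found)
```

An empty first list means the wildcard would not pull in any jar that the manual definition currently leaves out.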
