
Comments (20)

pandzel-zz avatar pandzel-zz commented on September 26, 2024

I have no problems harvesting in my environment. Both WAF and UNC work fine.
WAF can be tricky sometimes; could you provide more information about what the server uses to implement the WAF: IIS, Tomcat?
Also, did you declare any pattern when defining the input brokers?

from geoportal-server-harvester.

MapZombie avatar MapZombie commented on September 26, 2024

The WAF is on IIS 7

I didn't declare any patterns when defining input brokers. Is there anything in Tomcat 8 that would need to be flushed? I have older versions of harvester installed alongside the latest version.

from geoportal-server-harvester.

pandzel-zz avatar pandzel-zz commented on September 26, 2024

So, in this situation it is hard for me to tell exactly what happened. The only remaining option is to debug the software to find out what is going on.

However, first I would try to do the following:

  1. Pull the most recent code,
  2. Edit logging.properties from the geoportal-application\geoportal-harvester-war\src\main\resources by adding the following at the end of the file:

com.esri.geoportal.harvester.waf.level = FINE
com.esri.geoportal.harvester.unc.level = FINE

  3. Build and redeploy on Tomcat.

Then try to harvest the UNC folder again and check the log file for entries like:

UNC FILES in ...
UNC SUBFOLDERS in ...

This should give you an idea of what exactly the adaptor saw inside the folder. Similarly, you can try to harvest the WAF; expect a slightly different message in the log file.
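As a side note on why package-level entries such as `com.esri.geoportal.harvester.unc.level = FINE` can surface messages from individual classes: in `java.util.logging`, a logger without its own level inherits the level of its nearest named ancestor. The sketch below illustrates only that rule. It assumes the harvester names its loggers after its classes, which the log output suggests but the thread does not confirm, and `LevelInheritance` is a hypothetical name.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration, not part of geoportal-server-harvester.
class LevelInheritance {

    // Sets FINE on the parent logger (the programmatic equivalent of
    // "<parentName>.level = FINE" in logging.properties) and reports
    // whether a child logger with no level of its own now accepts FINE.
    static boolean fineEnabled(String parentName, String childName) {
        Logger parent = Logger.getLogger(parentName);
        parent.setLevel(Level.FINE);
        Logger child = Logger.getLogger(childName);
        return child.isLoggable(Level.FINE);
    }
}
```

Calling `LevelInheritance.fineEnabled("com.esri.geoportal.harvester.unc", "com.esri.geoportal.harvester.unc.UncFolder")` returns `true`. Whether the message actually reaches the log file also depends on the handler's own level.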

from geoportal-server-harvester.

MapZombie avatar MapZombie commented on September 26, 2024

With that debug logging enabled, the log appears to show subfolders only a couple of child levels down (2 levels from the root). If I create a new task that harvests at most 2 subfolder levels down, it appears to work fine.

For harvesting UNC path \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL
This is what the log shows:
14-Nov-2016 08:36:53.608 FINE [HARVESTING] com.esri.geoportal.harvester.unc.UncFolder.readContent UNC FILES in \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test: []
14-Nov-2016 08:36:53.608 FINE [HARVESTING] com.esri.geoportal.harvester.unc.UncFolder.readContent UNC SUBFOLDERS in \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test: [\\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test\SubfolderA, \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test\SubfolderB, \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test\SubfolderC, \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test\SubfolderD, \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test\SubfolderE, \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test\SubfolderF, \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test\SubfolderG]
Which returns 0 success, 0 fails.

However, if I set up the Broker to point directly to that failed subfolder, for example:
\\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test

This harvests everything 2-3 levels below.

from geoportal-server-harvester.

pandzel-zz avatar pandzel-zz commented on September 26, 2024

I need to have some sense of what's in your repository:

  1. What is an approximate number of metadata files you would expect to be harvested from the 'INTERNATIONAL' folder (counting all sub folders)?
  2. I presume that 'Test' folder has no files in it, just subfolders: SubfolderA...SubfolderG, is that right?
  3. How deep is that folder structure, i.e. should any files be expected in SubfolderA, or only deeper than that?

from geoportal-server-harvester.

MapZombie avatar MapZombie commented on September 26, 2024
  1. The approximate number is 28,000 XML files for 'INTERNATIONAL'
  2. Correct. There are no files only subfolders inside 'INTERNATIONAL' as well as 'Test'
  3. It is deeper than that. Files will be expected inside SubfolderA\SomeXMLs
    Also SubfolderA\Data\SomeMoreXMLS\

Pointing directly to "Test" folder I was able to harvest 8000 files.

from geoportal-server-harvester.

pandzel-zz avatar pandzel-zz commented on September 26, 2024

In other words:

When pointing to \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL it yields no records, but with \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test it gives you a bunch?

from geoportal-server-harvester.

MapZombie avatar MapZombie commented on September 26, 2024

Exactly!

from geoportal-server-harvester.

pandzel-zz avatar pandzel-zz commented on September 26, 2024

Is there any exception logged below the message about 0 harvested files?
Are any of these folders (INTERNATIONAL, Test) a link to the actual folder?

from geoportal-server-harvester.

pandzel-zz avatar pandzel-zz commented on September 26, 2024

The only cause I found (and tested) for that behavior is 'Test' being a symbolic link to the folder. In Windows one can create such a link with the 'mklink' command. The UNC broker used the old-fashioned File interface to operate on the content of the folder, and that didn't follow symbolic links. So, I switched the entire broker to use NIO interfaces, and that seems to solve the issue I was able to reproduce.
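For readers unfamiliar with the NIO calls involved, here is a minimal sketch of a link-aware directory walk. It is not the broker's actual code: `LinkAwareWalk`, `findXml` and `isLink` are hypothetical names, and the `.xml` filter is an assumption made for illustration. It uses `Files.walk` with `FileVisitOption.FOLLOW_LINKS`, which descends into symbolic links that point to directories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of geoportal-server-harvester.
class LinkAwareWalk {

    // Collects *.xml files below root, descending into symbolic links
    // that point to directories (FOLLOW_LINKS).
    static List<Path> findXml(Path root) {
        try (Stream<Path> paths = Files.walk(root, FileVisitOption.FOLLOW_LINKS)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".xml"))
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reports whether the entry itself is a symbolic link (the link is not followed).
    static boolean isLink(Path p) {
        return Files.isSymbolicLink(p);
    }
}
```

With `FOLLOW_LINKS`, a link that points back to one of its own ancestors makes the walk fail with a `FileSystemLoopException` wrapped in an `UncheckedIOException`, so a real harvester would need to decide how to handle that case.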

So, why don't you fetch latest code and give it another try?

from geoportal-server-harvester.

MapZombie avatar MapZombie commented on September 26, 2024

Is there any exception logged below the message about 0 harvested files?
Is any of these folders (INTERNATIONAL, Test) a link to the actual folder?

I'm not sure what you mean by a link to the actual folder. Do you mean, are these the actual folder names? Test is not the actual folder name. I have folders with country names inside the international folder; the rest of the path is mostly accurate except for the server name.

I just tried the latest code and am getting this:

22-Nov-2016 10:39:59.167 SEVERE [HARVESTING] com.esri.geoportal.harvester.support.ReportLogger.logError Error processing task: PROCESS:: status: working, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\servername\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://hostname:8088/geoportal, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false | Error reading data.
22-Nov-2016 10:39:59.168 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportLogger.completed Completed processing task: PROCESS:: status: completed, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\servername\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/geoportal, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false
22-Nov-2016 10:39:59.169 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportStatistics.completed Harvesting of PROCESS:: status: completed, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\servername\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://hostname:8088/geoportal, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false completed at Tue Nov 22 10:39:59 MST 2016. No. succeded: 0, no. failed: 1

from geoportal-server-harvester.

pandzel-zz avatar pandzel-zz commented on September 26, 2024

That log doesn't give much of a clue. Perhaps there is one more entry before these, with the full stack trace (at least that is what I get), which would be much more interesting to see. It should be just before that SEVERE log.
At this moment I am thinking: permissions?

from geoportal-server-harvester.

MapZombie avatar MapZombie commented on September 26, 2024

This is still an issue. I also have not had any success harvesting UNC paths.

Currently, only WAF harvests work, and only when pointed directly at folders that contain the XMLs.

from geoportal-server-harvester.

pandzel-zz avatar pandzel-zz commented on September 26, 2024

I would examine the error logs again; if your logging.properties is intact (out of the box, as we've defined it, with com.esri.geoportal.harvester.support.ErrorLogger.level = FINE), there must be more details logged just above the "SEVERE" entry in your post from Nov 22, 2016.

from geoportal-server-harvester.

MapZombie avatar MapZombie commented on September 26, 2024

This is the full log from today trying to harvest a root folder:

29-Mar-2017 09:45:00.820 INFO [http-apr-8088-exec-6] com.esri.geoportal.harvester.engine.defaults.DefaultProcessor.createProcess SUBMITTING: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false
29-Mar-2017 09:45:00.883 INFO [HARVESTING] com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$1 Started harvest: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false
29-Mar-2017 09:45:01.127 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportLogger.started Started processing task: PROCESS:: status: working, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false
29-Mar-2017 09:45:01.127 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportStatistics.started Harvesting of PROCESS:: status: working, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false started at Wed Mar 29 09:45:01 MDT 2017
29-Mar-2017 09:45:01.158 SEVERE [HARVESTING] com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$1 Error harvesting of PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false
 com.esri.geoportal.harvester.api.ex.DataInputException: Error reading data.
	at com.esri.geoportal.harvester.unc.UncBroker$UncIterator.hasNext(UncBroker.java:126)
	at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$1(DefaultProcessor.java:131)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.AccessDeniedException: \\calna1\gisdata$\GIS\GIS_Services\International
	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83)
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
	at sun.nio.fs.WindowsDirectoryStream.<init>(WindowsDirectoryStream.java:86)
	at sun.nio.fs.WindowsFileSystemProvider.newDirectoryStream(WindowsFileSystemProvider.java:518)
	at java.nio.file.Files.newDirectoryStream(Files.java:457)
	at java.nio.file.Files.list(Files.java:3451)
	at com.esri.geoportal.harvester.unc.UncFolder.readContent(UncFolder.java:69)
	at com.esri.geoportal.harvester.unc.UncBroker$UncIterator.hasNext(UncBroker.java:118)
	... 2 more

29-Mar-2017 09:45:01.158 SEVERE [HARVESTING] com.esri.geoportal.harvester.support.ReportLogger.logError Error processing task: PROCESS:: status: working, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false | Error reading data.
29-Mar-2017 09:45:01.158 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportLogger.completed Completed processing task: PROCESS:: status: completed, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false
29-Mar-2017 09:45:01.158 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportStatistics.completed Harvesting of PROCESS:: status: completed, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false completed at Wed Mar 29 09:45:01 MDT 2017. No. succeded: 0, no. failed: 1

Also of note: I do have full read/write access to the folder and its subfolders.

from geoportal-server-harvester.

pandzel-zz avatar pandzel-zz commented on September 26, 2024

Well, AccessDeniedException explains the root cause: Harvester runs within the Apache Tomcat process, which runs under a particular user account. That account has no access to the mentioned folder.
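One way to check this is to ask the JVM which account it runs under and to attempt the same `Files.list` call that appears in the stack trace. The sketch below is a hypothetical diagnostic (`AccessProbe` and `probe` are made-up names, not harvester code). To be meaningful it has to run inside the Tomcat process, for example from a throwaway JSP or servlet; run from an interactive command prompt, it would report your own account instead of the one Tomcat uses.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical diagnostic, not part of geoportal-server-harvester.
class AccessProbe {

    // Lists the folder with Files.list (the call seen in the stack trace)
    // and reports the outcome together with the account the JVM runs under.
    static String probe(Path folder) {
        String user = System.getProperty("user.name");
        try (Stream<Path> entries = Files.list(folder)) {
            return "user=" + user + " listed " + entries.count() + " entries in " + folder;
        } catch (AccessDeniedException e) {
            return "user=" + user + " ACCESS DENIED for " + e.getFile();
        } catch (IOException e) {
            return "user=" + user + " failed: " + e;
        }
    }
}
```

For example, `AccessProbe.probe(java.nio.file.Paths.get(rootFolder))`, where `rootFolder` holds the UNC root folder configured in the broker. If the reported user is a service account rather than your own login, your personal read/write permissions on the share do not apply to the harvest.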

from geoportal-server-harvester.

MapZombie avatar MapZombie commented on September 26, 2024

I do wish it were that simple, but I have full read/write access to the folder and all subfolders. I think it has to do with using "$" in the path.

I tested this with the same contents and permissions but on a local folder and it works.

from geoportal-server-harvester.

pandzel-zz avatar pandzel-zz commented on September 26, 2024

I am quite confident there is no issue with the $ sign (I tested it myself). The documentation for AccessDeniedException is quite laconic, yet it leaves no room for doubt.

from geoportal-server-harvester.

MapZombie avatar MapZombie commented on September 26, 2024

Thanks, I'll look into our structure further.

from geoportal-server-harvester.

MapZombie avatar MapZombie commented on September 26, 2024

Tomcat8 must be "Run As Administrator" to fix this.

from geoportal-server-harvester.
