Comments (20)
I have no problems harvesting my environment. Both WAF and UNC work fine.
WAF can be tricky sometimes; could you tell me which server is used to implement the WAF: IIS, Tomcat?
Also, did you declare any pattern when defining input brokers?
from geoportal-server-harvester.
The WAF is on IIS 7
I didn't declare any patterns when defining input brokers. Is there anything in Tomcat 8 that would need to be flushed? I have older versions of the harvester installed alongside the latest version.
from geoportal-server-harvester.
So, in this situation it is hard for me to tell exactly what happened. The only remaining option is to debug the software to find out what is going on.
However, first I would try to do the following:
- Pull the most recent code,
- Edit logging.properties in geoportal-application\geoportal-harvester-war\src\main\resources by adding the following at the end of the file:
com.esri.geoportal.harvester.waf.level = FINE
com.esri.geoportal.harvester.unc.level = FINE
- Build and redeploy on Tomcat.
Then try to harvest the UNC folder again and check the log file for entries like:
UNC FILES in ...
UNC SUBFOLDERS in ...
This can give you an idea of what exactly the adaptor saw inside the folder. Similarly, you can try to harvest the WAF; expect a slightly different message in the log file.
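As a rough illustration of why the two `.level = FINE` lines matter, here is a minimal java.util.logging sketch. It assumes nothing about the harvester's internals: the logger name `example.harvester.unc`, the class name, and the message text are made up for the demonstration. It only shows that a FINE message is dropped while the logger sits at INFO and gets through once the level is lowered to FINE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

final class FineLoggingSketch {

    /** Collects every record handed to it, so we can see what got through. */
    static final class CollectingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            messages.add(record.getLevel() + " " + record.getMessage());
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /** Logs one FINE message with the logger set to the given level. */
    static List<String> logFineAt(Level loggerLevel) {
        // Hypothetical logger name, used only for this sketch.
        Logger logger = Logger.getLogger("example.harvester.unc");
        CollectingHandler handler = new CollectingHandler();
        logger.addHandler(handler);
        logger.setLevel(loggerLevel);
        try {
            logger.fine("UNC FILES in (example folder): []");
        } finally {
            // Restore the logger so repeated calls do not interfere.
            logger.removeHandler(handler);
            logger.setLevel(null);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println("logger at INFO: " + logFineAt(Level.INFO));
        System.out.println("logger at FINE: " + logFineAt(Level.FINE));
    }
}
```

With the logger at INFO the collected list stays empty; at FINE it holds the single message. Handlers filter by their own level as well, so if the entries still do not show up, the handler levels in logging.properties are worth checking too.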
from geoportal-server-harvester.
So, with that debugging enabled, the log appears to show subfolders only a couple of child levels down (2 levels from the root). And if I create a new task that harvests at most 2 subfolder levels down, it appears to work fine.
For harvesting UNC path \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL
This is what the log shows:
14-Nov-2016 08:36:53.608 FINE [HARVESTING] com.esri.geoportal.harvester.unc.UncFolder.readContent UNC FILES in \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test: []
14-Nov-2016 08:36:53.608 FINE [HARVESTING] com.esri.geoportal.harvester.unc.UncFolder.readContent UNC SUBFOLDERS in \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test: [\\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test\SubfolderA, \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test\SubfolderB, \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test\SubfolderC, \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test\SubfolderD, \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test\SubfolderE, \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test\SubfolderF, \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test\SubfolderG]
Which returns 0 success, 0 fails.
However, if I set up the broker to point directly at that failed subfolder, for example:
\\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test
This harvests everything 2-3 levels below.
from geoportal-server-harvester.
I need to have some sense of what's in your repository:
- What is an approximate number of metadata files you would expect to be harvested from the 'INTERNATIONAL' folder (counting all sub folders)?
- I presume that 'Test' folder has no files in it, just subfolders: SubfolderA...SubfolderG, is that right?
- How deep is that folder structure, i.e. should any files be expected directly in SubfolderA, or only deeper than that?
from geoportal-server-harvester.
- The approximate number is 28,000 xmls for 'INTERNATIONAL'
- Correct. There are no files only subfolders inside 'INTERNATIONAL' as well as 'Test'
- It is deeper than that. Files will be expected inside SubfolderA\SomeXMLs
Also SubfolderA\Data\SomeMoreXMLS\
Pointing directly to "Test" folder I was able to harvest 8000 files.
from geoportal-server-harvester.
In other words:
When pointing to \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL it yields no records, but with \\HOSTNAME\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL\Test it gives you a bunch?
from geoportal-server-harvester.
Exactly!
from geoportal-server-harvester.
Is there any exception logged below the message about 0 harvested files?
Is any of these folders (INTERNATIONAL, Test) a link to the actual folder?
from geoportal-server-harvester.
The only cause I found (and tested) for that behavior is 'Test' being a symbolic link to a folder. In Windows, one can create such a link with the 'mklink' command. The UNC broker used the old-fashioned File interface to operate on the content of the folder, and that didn't follow symbolic links. So I switched the entire broker to the NIO interfaces, and that seems to solve the issue I was able to reproduce.
So, why don't you fetch the latest code and give it another try?
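For anyone who wants to check outside the harvester what an NIO-based traversal sees, here is a minimal sketch. It is not the broker's actual code: the class name, the method name, and the ".xml" filter are assumptions made for the illustration. It walks a folder tree with `Files.walk` and `FileVisitOption.FOLLOW_LINKS`, so that linked folders are descended into, and collects the XML files it finds.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class NioListingSketch {

    /**
     * Returns every regular file under root whose name ends with ".xml"
     * (case-insensitive), descending into symbolic links to folders.
     */
    static List<Path> listXmlFiles(Path root) throws IOException {
        // The stream holds open directory handles, so close it when done.
        try (Stream<Path> paths = Files.walk(root, FileVisitOption.FOLLOW_LINKS)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".xml"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the folder to inspect as the first argument; defaults to ".".
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> found = listXmlFiles(root);
        System.out.println(found.size() + " XML file(s) under " + root);
    }
}
```

Running it against both the root folder and the subfolder, under the same account as Tomcat, gives a quick comparison with the counts the harvester reports. Note that an unreadable folder makes the walk fail with an exception instead of being skipped.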
from geoportal-server-harvester.
Is there any exception logged below the message about 0 harvested files?
Is any of these folders (INTERNATIONAL, Test) a link to the actual folder?
I'm not sure what you mean by a link to the actual folder. Do you mean, are these the actual folder names? Test is not the actual folder name. I have folders with country names inside the international folder; the rest of the path is mostly accurate except for the server name.
Just tried the latest code and am getting this:
22-Nov-2016 10:39:59.167 SEVERE [HARVESTING] com.esri.geoportal.harvester.support.ReportLogger.logError Error processing task: PROCESS:: status: working, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\servername\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://hostname:8088/geoportal, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false | Error reading data.
22-Nov-2016 10:39:59.168 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportLogger.completed Completed processing task: PROCESS:: status: completed, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\servername\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/geoportal, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false
22-Nov-2016 10:39:59.169 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportStatistics.completed Harvesting of PROCESS:: status: completed, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\servername\GISDATA$\GIS\GIS_SERVICES\INTERNATIONAL, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://hostname:8088/geoportal, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false completed at Tue Nov 22 10:39:59 MST 2016. No. succeded: 0, no. failed: 1
from geoportal-server-harvester.
That log doesn't give much of a clue. Perhaps there is one more entry before those, with the full stack trace (at least that is what I am getting), which would be much more interesting to see. It should be just before that SEVERE log.
At this moment I am thinking: permissions?
from geoportal-server-harvester.
This is still an issue. I also have not had any success harvesting UNC paths.
Currently, only WAF harvests work, and only when pointed directly at folders that contain the XMLs.
from geoportal-server-harvester.
I would examine error logs again; if you have logging.properties intact (out of the box like we've defined it with com.esri.geoportal.harvester.support.ErrorLogger.level = FINE), there must be more details logged just above the "SEVERE" log from your post from Nov 22, 2016.
from geoportal-server-harvester.
This is the full log from today trying to harvest a root folder:
29-Mar-2017 09:45:00.820 INFO [http-apr-8088-exec-6] com.esri.geoportal.harvester.engine.defaults.DefaultProcessor.createProcess SUBMITTING: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false
29-Mar-2017 09:45:00.883 INFO [HARVESTING] com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$1 Started harvest: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false
29-Mar-2017 09:45:01.127 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportLogger.started Started processing task: PROCESS:: status: working, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false
29-Mar-2017 09:45:01.127 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportStatistics.started Harvesting of PROCESS:: status: working, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false started at Wed Mar 29 09:45:01 MDT 2017
29-Mar-2017 09:45:01.158 SEVERE [HARVESTING] com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$1 Error harvesting of PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false
com.esri.geoportal.harvester.api.ex.DataInputException: Error reading data.
    at com.esri.geoportal.harvester.unc.UncBroker$UncIterator.hasNext(UncBroker.java:126)
    at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$1(DefaultProcessor.java:131)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.AccessDeniedException: \\calna1\gisdata$\GIS\GIS_Services\International
    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83)
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
    at sun.nio.fs.WindowsDirectoryStream.<init>(WindowsDirectoryStream.java:86)
    at sun.nio.fs.WindowsFileSystemProvider.newDirectoryStream(WindowsFileSystemProvider.java:518)
    at java.nio.file.Files.newDirectoryStream(Files.java:457)
    at java.nio.file.Files.list(Files.java:3451)
    at com.esri.geoportal.harvester.unc.UncFolder.readContent(UncFolder.java:69)
    at com.esri.geoportal.harvester.unc.UncBroker$UncIterator.hasNext(UncBroker.java:118)
    ... 2 more
29-Mar-2017 09:45:01.158 SEVERE [HARVESTING] com.esri.geoportal.harvester.support.ReportLogger.logError Error processing task: PROCESS:: status: working, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false | Error reading data.
29-Mar-2017 09:45:01.158 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportLogger.completed Completed processing task: PROCESS:: status: completed, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false
29-Mar-2017 09:45:01.158 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportStatistics.completed Harvesting of PROCESS:: status: completed, title: PROCESSOR: DEFAULT[], SOURCE: UNC[unc-root-folder=\\calna1\gisdata$\GIS\GIS_Services\International, unc-pattern=], DESTINATIONS: [GPT[gpt-host-url=http://cal8783:8088/gp2, cred-username=gptadmin, cred-password=gptadmin, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: false completed at Wed Mar 29 09:45:01 MDT 2017. No. succeded: 0, no. failed: 1
Also of note: I do have full read/write access to the folder and its subfolders.
from geoportal-server-harvester.
Well, AccessDeniedException explains the root cause: the harvester runs within the Apache Tomcat process, which runs under a particular user account. That user has no access to the mentioned folder.
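One way to verify this independently of the harvester is a small check run under the same account as Tomcat. The sketch below is only an illustration (the class name, method name, and result strings are made up): it tries the same `Files.list` call that appears in the stack trace and reports which account the JVM runs as, read from the standard `user.name` system property.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

final class AccessCheckSketch {

    /** Tries to list the folder and describes the outcome for the current JVM user. */
    static String describeAccess(Path folder) {
        // Account of the running JVM process, not of the person logged in to the desktop.
        String user = System.getProperty("user.name");
        try (Stream<Path> entries = Files.list(folder)) {
            return "OK: user '" + user + "' listed " + entries.count() + " entries in " + folder;
        } catch (AccessDeniedException e) {
            return "DENIED: user '" + user + "' cannot list " + folder;
        } catch (NoSuchFileException e) {
            return "MISSING: " + folder + " is not visible to user '" + user + "'";
        } catch (IOException e) {
            return "ERROR: " + e;
        }
    }

    public static void main(String[] args) {
        // Pass the folder to test as the first argument; defaults to ".".
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(describeAccess(folder));
    }
}
```

The result only means something when the check runs under the Tomcat account; started from an interactive session, it reports the interactive user's rights, which may differ from those of the Tomcat process.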
from geoportal-server-harvester.
I do wish it was that simple, but I have full read/write access to the folder and all sub folders. I think it has to do with using "$" in the path.
I tested this with the same contents and permissions but on a local folder and it works.
from geoportal-server-harvester.
I am quite confident there is no issue with the $ sign (I tested it myself). The documentation for AccessDeniedException is quite laconic, yet leaves no room for doubt.
from geoportal-server-harvester.
Thanks, I'll look into our structure further.
from geoportal-server-harvester.
Tomcat8 must be "Run As Administrator" to fix this.
from geoportal-server-harvester.