Comments (10)

mhogeweg avatar mhogeweg commented on September 26, 2024

Are you going to submit a pull request to both harvester and catalog?

from geoportal-server-harvester.

pandzel-zz avatar pandzel-zz commented on September 26, 2024

I just wanted to point out that this implementation uses 'XOAI' libraries released under the https://raw.githubusercontent.com/DSpace/DSpace/master/LICENSE license. I would contact the Esri legal department for advice on whether we are allowed to mix such a library with our code.

valentinedwv avatar valentinedwv commented on September 26, 2024

I can work on splitting this into two change sets.
I need to bring them into line with the 2.5 release.

If there is a better OAI library or codebase to use, that would be preferred. For now, it only handles oai_dc records.

pandzel-zz avatar pandzel-zz commented on September 26, 2024

Correction: it turns out that the particular OAI library used here (developed by lyncode) is released under the Apache 2.0 license (at least, this is what the Maven repository tells me).

mhogeweg avatar mhogeweg commented on September 26, 2024

Then we can include this in the next release!

pandzel-zz avatar pandzel-zz commented on September 26, 2024

If you issue a pull request, we will incorporate your changes into the code base.

valentinedwv avatar valentinedwv commented on September 26, 2024

I put the code on a branch: https://github.com/CINERGI/geoportal-server-harvester/tree/oai_d1_harvest_2_5
I am having some issues with migrating to the 2.5 release.

valentinedwv avatar valentinedwv commented on September 26, 2024

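DataOne: there seem to be some XML parser conflicts. Judging by the trace, JAXB ends up with the Woodstox SAX parser factory (com.ctc.wstx.sax.WstxSAXParserFactory), which does not recognize the secure-processing feature: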

30-Mar-2017 12:15:01.350 SEVERE [HARVESTING] com.esri.geoportal.harvester.support.ErrorLogger.logError Error processing task: PROCESS:: status: working, title: PROCESSOR: DEFAULT[], SOURCE: DataOne[d1-cn-url=https://cn.dataone.org/cn, d1-format-identifier=http://www.isotc211.org/2005/gmd-noaa], DESTINATIONS: [FOLDER[folder-root-folder=D:\dev_odm\geoportal_metadata, folder-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: true | Error reading data.
 com.esri.geoportal.harvester.api.ex.DataInputException: Error reading data.
	at edu.sdsc.dataone.source.DataOneBroker$DataOneIterator.hasNext(DataOneBroker.java:159)
	at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$43(DefaultProcessor.java:132)
	at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess$$Lambda$608/18700217.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: org.xml.sax.SAXNotRecognizedException: Feature 'http://javax.xml.XMLConstants/feature/secure-processing' not recognized
	at com.sun.xml.internal.bind.v2.util.XmlFactory.createParserFactory(XmlFactory.java:128)
	at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.getXMLReader(UnmarshallerImpl.java:139)
	at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:157)
	at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:204)
	at org.dataone.service.util.TypeMarshaller.unmarshalTypeFromStream(TypeMarshaller.java:297)
	at org.dataone.client.rest.MultipartD1Node.deserializeServiceType(MultipartD1Node.java:870)
	at org.dataone.client.v1.impl.MultipartCNode.listObjects(MultipartCNode.java:451)
	at org.dataone.client.v1.impl.MultipartCNode.listObjects(MultipartCNode.java:473)
	at edu.sdsc.dataone.source.DataOneBroker$DataOneIterator.hasNext(DataOneBroker.java:136)
	... 3 more
Caused by: org.xml.sax.SAXNotRecognizedException: Feature 'http://javax.xml.XMLConstants/feature/secure-processing' not recognized
	at com.ctc.wstx.sax.WstxSAXParserFactory.setFeature(WstxSAXParserFactory.java:144)
	at com.sun.xml.internal.bind.v2.util.XmlFactory.createParserFactory(XmlFactory.java:121)
	... 11 more
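
To narrow down a conflict like the one in the trace above, it can help to check which SAXParserFactory the JAXP lookup actually returns on the harvester's classpath, and whether that factory accepts the secure-processing feature. The following is a minimal diagnostic sketch using only standard JDK classes. The class name SaxFactoryCheck and the method supportsSecureProcessing are hypothetical and are not part of the harvester code.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Hypothetical diagnostic helper; not part of geoportal-server-harvester.
class SaxFactoryCheck {

    // Returns true if the factory accepts the JAXP secure-processing feature,
    // false if it rejects it the way the factory in the stack trace does.
    static boolean supportsSecureProcessing(SAXParserFactory factory) {
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return true;
        } catch (SAXNotRecognizedException
                | SAXNotSupportedException
                | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Uses the same JAXP lookup that other code on the classpath would get.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        System.out.println("SAXParserFactory implementation: "
                + factory.getClass().getName());
        System.out.println("Accepts secure-processing: "
                + supportsSecureProcessing(factory));
    }
}
```

If this is run with the same classpath as the web application and prints a Woodstox class name together with false, the conflict is probably caused by a dependency that registers its own SAX factory. Possible workarounds, both untested here, are excluding that dependency or pointing the standard javax.xml.parsers.SAXParserFactory system property at a different implementation.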

valentinedwv avatar valentinedwv commented on September 26, 2024

OAI: I am finding that the 4.x library is not supported.
We probably need to migrate to a lighter library, or just roll our own:
https://github.com/xbib/oai

The OAI error looks like it is breaking deep in the OAI fetch routines. The buffer needs to be increased:

30-Mar-2017 12:23:40.442 INFO [HARVESTING] com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$43 Started harvest: PROCESSOR: DEFAULT[], SOURCE: OAI[oai-host-url=https://oai.datacite.org/oai, oai-metaFormat=oai_dc], DESTINATIONS: [FOLDER[folder-root-folder=D:\dev_odm\geoportal_metadata, folder-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: true
com.lyncode.xoai.serviceprovider.exceptions.InvalidOAIResponse: com.lyncode.xoai.serviceprovider.exceptions.HttpException: Error querying service. Returned HTTP Status Code: 502
	at com.lyncode.xoai.serviceprovider.handler.ListRecordHandler.nextIteration(ListRecordHandler.java:100)
	at com.lyncode.xoai.serviceprovider.lazy.ItemIterator.hasNext(ItemIterator.java:40)
	at com.lyncode.xoai.serviceprovider.lazy.ItemIterator.<init>(ItemIterator.java:30)
	at com.lyncode.xoai.serviceprovider.ServiceProvider.listRecords(ServiceProvider.java:65)
	at edu.sdsc.oai.source.OAIBroker.initialize(OAIBroker.java:93)
	at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.initialize(DefaultProcessor.java:98)
	at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$43(DefaultProcessor.java:128)
	at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess$$Lambda$608/18700217.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:745)
Caused by: com.lyncode.xoai.serviceprovider.exceptions.HttpException: Error querying service. Returned HTTP Status Code: 502
	at com.lyncode.xoai.serviceprovider.client.HttpOAIClient.execute(HttpOAIClient.java:46)
	at com.lyncode.xoai.serviceprovider.handler.ListRecordHandler.nextIteration(ListRecordHandler.java:69)
	... 8 more
30-Mar-2017 12:23:41.408 SEVERE [HARVESTING] com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$43 Error harvesting of PROCESSOR: DEFAULT[], SOURCE: OAI[oai-host-url=https://oai.datacite.org/oai, oai-metaFormat=oai_dc], DESTINATIONS: [FOLDER[folder-root-folder=D:\dev_odm\geoportal_metadata, folder-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: true
 com.esri.geoportal.harvester.api.ex.DataProcessorException: cannot connect to xOAI
	at edu.sdsc.oai.source.OAIBroker.initialize(OAIBroker.java:97)
	at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.initialize(DefaultProcessor.java:98)
	at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$43(DefaultProcessor.java:128)
	at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess$$Lambda$608/18700217.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:745)
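
Since the trace above reports an HTTP 502 from the remote service, one way to separate a server-side problem from a library problem is to issue the same OAI-PMH request outside XOAI. The sketch below only builds the request URLs defined by the OAI-PMH protocol (a first ListRecords call with metadataPrefix, and a follow-up call carrying only the resumptionToken) and classifies gateway-style status codes as worth retrying. The class OaiRequest and its methods are hypothetical, and the choice of 502, 503 and 504 as retryable is an assumption, not something the thread states.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for reproducing the harvest request by hand.
class OaiRequest {

    // Form-encodes a query parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // First page of a harvest: verb plus the requested metadata format.
    static String listRecordsUrl(String baseUrl, String metadataPrefix) {
        return baseUrl + "?verb=ListRecords&metadataPrefix=" + encode(metadataPrefix);
    }

    // Later pages: OAI-PMH requires the resumptionToken to be the only
    // argument besides the verb.
    static String resumeUrl(String baseUrl, String resumptionToken) {
        return baseUrl + "?verb=ListRecords&resumptionToken=" + encode(resumptionToken);
    }

    // Assumption: gateway and availability errors are treated as transient.
    static boolean isTransient(int httpStatus) {
        return httpStatus == 502 || httpStatus == 503 || httpStatus == 504;
    }

    public static void main(String[] args) {
        // Values taken from the log line above.
        System.out.println(listRecordsUrl("https://oai.datacite.org/oai", "oai_dc"));
    }
}
```

Opening the printed URL in a browser or with an HTTP client shows whether the endpoint itself answers with 502. If it does, a replacement or hand-rolled client would still need to retry or back off on such responses.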

pandzel-zz avatar pandzel-zz commented on September 26, 2024

OAI-PMH capabilities have been added with pull request #70.
