
Comments (6)

mhogeweg commented on June 24, 2024

Does this also happen when harvesting into a local folder?

from geoportal-server-harvester.

valentinedwv commented on June 24, 2024

I was harvesting to both a folder and a server.
With 6 million records, I was going to rewrite the folder output to break it into blocks of ~1k records (or make an S3 store endpoint).
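
A minimal sketch of the "~1k blocks" idea, not the harvester's actual folder broker: each record goes into a numbered subfolder derived from its position in the harvest, so no single directory holds more than about a thousand files. The function name `sharded_path`, the `BLOCK_SIZE` constant, and the `.xml` file naming are all assumptions made for illustration.

```python
from pathlib import Path

BLOCK_SIZE = 1000  # assumed block size, taken from the "~1k" figure above


def sharded_path(root: str, record_index: int, record_id: str) -> Path:
    """Return root/<block>/<record_id>.xml, with BLOCK_SIZE records per block."""
    if record_index < 0:
        raise ValueError("record_index must be non-negative")
    block = record_index // BLOCK_SIZE
    # Zero-pad the block number so the folders sort in numeric order.
    return Path(root) / f"{block:06d}" / f"{record_id}.xml"
```

With 6 million records this would give roughly 6000 subfolders. The record id is used as the file name unchanged here; real ids may need sanitizing before they are safe as file names.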


valentinedwv commented on June 24, 2024

I assumed it was a connection problem with the CSW server.

19-May-2017 12:36:53.488 INFO [HARVESTING] com.esri.geoportal.harvester.support.ProgressLogger.printStatusLog Harvesting of PROCESS:: status: working, title: PROCESSOR: DEFAULT[], SOURCE: CSW[csw-host-url=https://www.sciencebase.gov/catalog/csw, cred-username=, cred-password=*****, csw-profile-id=urn:ogc:CSW:2.0.2:HTTP:APISO:SCIENCBASE], DESTINATIONS: [GPT[gpt-host-url=http://localhost:8080/geoportal, cred-username=gptadmin, cred-password=*****, gpt-cleanup=false], FOLDER[folder-root-folder=/opt/tomcat/webapps/metadata/, folder-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: true progress: 141500
19-May-2017 12:38:28.398 SEVERE [HARVESTING] com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$43 Error harvesting of PROCESSOR: DEFAULT[], SOURCE: CSW[csw-host-url=https://www.sciencebase.gov/catalog/csw, cred-username=, cred-password=*****, csw-profile-id=urn:ogc:CSW:2.0.2:HTTP:APISO:SCIENCBASE], DESTINATIONS: [GPT[gpt-host-url=http://localhost:8080/geoportal, cred-username=gptadmin, cred-password=*****, gpt-cleanup=false], FOLDER[folder-root-folder=/opt/tomcat/webapps/metadata/, folder-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: true
 com.esri.geoportal.harvester.api.ex.DataInputException: Error reading data.
        at com.esri.geoportal.harvester.csw.CswBroker$CswIterator.next(CswBroker.java:179)
        at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$43(DefaultProcessor.java:136)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.http.client.HttpResponseException: Not Found
        at com.esri.geoportal.commons.csw.client.impl.Client.readMetadata(Client.java:155)
        at com.esri.geoportal.harvester.csw.CswBroker$CswIterator.next(CswBroker.java:174)
        ... 2 more

19-May-2017 12:38:28.398 SEVERE [HARVESTING] com.esri.geoportal.harvester.support.ErrorLogger.logError Error processing task: PROCESS:: status: working, title: PROCESSOR: DEFAULT[], SOURCE: CSW[csw-host-url=https://www.sciencebase.gov/catalog/csw, cred-username=, cred-password=*****, csw-profile-id=urn:ogc:CSW:2.0.2:HTTP:APISO:SCIENCBASE], DESTINATIONS: [GPT[gpt-host-url=http://localhost:8080/geoportal, cred-username=gptadmin, cred-password=*****, gpt-cleanup=false], FOLDER[folder-root-folder=/opt/tomcat/webapps/metadata/, folder-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: true | Error reading data.
 com.esri.geoportal.harvester.api.ex.DataInputException: Error reading data.
        at com.esri.geoportal.harvester.csw.CswBroker$CswIterator.next(CswBroker.java:179)
        at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$43(DefaultProcessor.java:136)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.http.client.HttpResponseException: Not Found
        at com.esri.geoportal.commons.csw.client.impl.Client.readMetadata(Client.java:155)
        at com.esri.geoportal.harvester.csw.CswBroker$CswIterator.next(CswBroker.java:174)
        ... 2 more

19-May-2017 12:38:28.399 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportLogger.completed Completed processing task: PROCESS:: status: completed, title: PROCESSOR: DEFAULT[], SOURCE: CSW[csw-host-url=https://www.sciencebase.gov/catalog/csw, cred-username=, cred-password=*****, csw-profile-id=urn:ogc:CSW:2.0.2:HTTP:APISO:SCIENCBASE], DESTINATIONS: [GPT[gpt-host-url=http://localhost:8080/geoportal, cred-username=gptadmin, cred-password=*****, gpt-cleanup=false], FOLDER[folder-root-folder=/opt/tomcat/webapps/metadata/, folder-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: true
19-May-2017 12:38:28.399 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportStatistics.completed Harvesting of PROCESS:: status: completed, title: PROCESSOR: DEFAULT[], SOURCE: CSW[csw-host-url=https://www.sciencebase.gov/catalog/csw, cred-username=, cred-password=*****, csw-profile-id=urn:ogc:CSW:2.0.2:HTTP:APISO:SCIENCBASE], DESTINATIONS: [GPT[gpt-host-url=http://localhost:8080/geoportal, cred-username=gptadmin, cred-password=*****, gpt-cleanup=false], FOLDER[folder-root-folder=/opt/tomcat/webapps/metadata/, folder-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: true completed at Fri May 19 12:38:28 UTC 2017. No. succeded: 283135, no. failed: 2


valentinedwv commented on June 24, 2024

One of the failures is a server issue: the request dies at record 166666.

https://www.sciencebase.gov/catalog/csw

<csw:GetRecords
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    maxRecords="1"
    startPosition="166666"
    outputFormat="application/xml"
    outputSchema="http://www.isotc211.org/2005/gmd"
    resultType="results" service="CSW" version="2.0.2">
    <csw:Query typeNames="csw:Record">
        <csw:ElementSetName>full</csw:ElementSetName>
        <csw:Constraint version="1.1.0">
            <ogc:Filter xmlns:ogc="http://www.opengis.net/ogc" xmlns="http://www.opengis.net/ogc"
            xmlns:gml="http://www.opengis.net/gml">
                <ogc:PropertyIsLike escape="" singleChar="_" wildCard="%">
                    <ogc:PropertyName>AnyText</ogc:PropertyName>
                    <ogc:Literal>well</ogc:Literal>
                </ogc:PropertyIsLike>
            </ogc:Filter>
        </csw:Constraint>
    </csw:Query>
</csw:GetRecords>
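
To check a single position without running a full harvest, the request above can be posted directly to the endpoint. The following is a rough sketch using only the Python standard library; the function names `build_request` and `probe` are made up for this example, and nothing here is part of the harvester itself. It assumes the server accepts the request as an `application/xml` POST body.

```python
import urllib.error
import urllib.request

CSW_URL = "https://www.sciencebase.gov/catalog/csw"  # endpoint from the comment above

# Same request as posted above, with startPosition left as a placeholder.
REQUEST_TEMPLATE = """<csw:GetRecords
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    maxRecords="1"
    startPosition="{start}"
    outputFormat="application/xml"
    outputSchema="http://www.isotc211.org/2005/gmd"
    resultType="results" service="CSW" version="2.0.2">
    <csw:Query typeNames="csw:Record">
        <csw:ElementSetName>full</csw:ElementSetName>
        <csw:Constraint version="1.1.0">
            <ogc:Filter xmlns:ogc="http://www.opengis.net/ogc">
                <ogc:PropertyIsLike escape="" singleChar="_" wildCard="%">
                    <ogc:PropertyName>AnyText</ogc:PropertyName>
                    <ogc:Literal>well</ogc:Literal>
                </ogc:PropertyIsLike>
            </ogc:Filter>
        </csw:Constraint>
    </csw:Query>
</csw:GetRecords>"""


def build_request(start_position: int) -> bytes:
    """Fill in startPosition and return the request body as UTF-8 bytes."""
    if start_position < 1:
        raise ValueError("CSW startPosition is 1-based")
    return REQUEST_TEMPLATE.format(start=start_position).encode("utf-8")


def probe(start_position: int, url: str = CSW_URL, timeout: float = 30.0) -> int:
    """POST a one-record GetRecords request and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=build_request(start_position),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 404 here would match the "Not Found" in the harvester stack trace.
        return err.code
```

Calling `probe(166666)` and then `probe(166667)` would show whether only that one position fails or whether everything after it fails too.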


pandzel-zz commented on June 24, 2024

Pull request #72 provides the ability to define an 'AnyText' literal for any CSW input broker.
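
For illustration only, this is roughly what a configurable 'AnyText' literal means at the request level. It is a sketch of building the constraint element in Python, not the code from the pull request; the function name `any_text_constraint` is hypothetical, and the `PropertyIsLike` attributes simply mirror the request posted earlier in this thread.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
OGC_NS = "http://www.opengis.net/ogc"

# Serialize with the usual csw:/ogc: prefixes instead of ns0:/ns1:.
ET.register_namespace("csw", CSW_NS)
ET.register_namespace("ogc", OGC_NS)


def any_text_constraint(literal: str) -> ET.Element:
    """Build a csw:Constraint whose filter matches AnyText against `literal`."""
    constraint = ET.Element(f"{{{CSW_NS}}}Constraint", {"version": "1.1.0"})
    flt = ET.SubElement(constraint, f"{{{OGC_NS}}}Filter")
    like = ET.SubElement(
        flt,
        f"{{{OGC_NS}}}PropertyIsLike",
        {"wildCard": "%", "singleChar": "_", "escape": ""},
    )
    ET.SubElement(like, f"{{{OGC_NS}}}PropertyName").text = "AnyText"
    # ElementTree escapes &, < and > in the literal when serializing.
    ET.SubElement(like, f"{{{OGC_NS}}}Literal").text = literal
    return constraint
```

`ET.tostring(any_text_constraint("well"), encoding="unicode")` gives a constraint that can be placed inside a `csw:Query`. Building the element through ElementTree rather than string concatenation keeps user-supplied search text from breaking the XML.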


zguo commented on June 24, 2024

Search text filter implemented in the harvester.

