
Comments (14)

drewr commented on June 16, 2024

This means Elasticsearch does not auto-recognize the date string you're supplying in the source document. Please see http://www.elasticsearch.org/guide/reference/mapping/date-format.html and supply the correct format string in the mapping for your date field.

FYI, issues are not meant to be opened for general ES usage questions. Please post to the mailing list if you're still having trouble.

from elasticsearch-mapper-attachments.

coreagile commented on June 16, 2024

So people who work on the elasticsearch-mapper-attachment plugin expect me to take an email pulled directly out of Outlook and CHANGE the dates inside of it? I assure you I didn't send any malformed dates into ElasticSearch. This was an email that Outlook created that I stored as an attachment.


drewr commented on June 16, 2024

Gosh, I totally overlooked that this was for the mapper plugin and not core ES. These GitHub notifications come from all over the place!

This is either an issue with Tika or our integration with it. I'll reopen so we can take a look.


coreagile commented on June 16, 2024

Thanks!


commented on June 16, 2024

I have the same problem when storing web pages.

For example, I get this error when trying to index http://www.unm.edu/

Is there any workaround to get this working?


spinscale commented on June 16, 2024

@scstarkey @tpatris

Can you provide us with some sample data that fails when you index it? My assumption is that Tika is extracting the date from your document but storing it in a format Elasticsearch does not expect.

You might be able to change the date formats inside the attachment plugin like this (just a wild guess, but worth a try):

{
    "person" : {
        "properties" : {
            "file" : {
                "type" : "attachment",
                "fields" : {
                    "date" : {"format" : "yyyy/MM/dd||date_optional_time||date_time"}
                }
            }
        }
    }
}

Note: The format above needs to be adapted to the dates in your documents; see http://www.elasticsearch.org/guide/reference/mapping/date-format/
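
To illustrate what the `||` separator in a format string is meant to do, here is a minimal Python sketch (not Elasticsearch code) that tries several patterns in order and returns the first match. The `parse_first_match` helper and the `strptime` patterns are assumptions made for illustration; they only approximate the Joda-style patterns used in the mapping.

```python
from datetime import datetime

# Hypothetical stand-ins for "yyyy/MM/dd||date_optional_time||date_time".
# These strptime patterns only approximate the Elasticsearch formats.
CANDIDATE_FORMATS = [
    "%Y/%m/%d",                # yyyy/MM/dd
    "%Y-%m-%d",                # date-only part of date_optional_time
    "%Y-%m-%dT%H:%M:%S.%f%z",  # roughly date_time
]


def parse_first_match(value, formats=CANDIDATE_FORMATS):
    """Return the datetime from the first pattern that matches, else raise."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # fall through to the next alternative
    raise ValueError(f"no configured format matches {value!r}")
```

A value that matches none of the alternatives still fails, which is why the list has to cover every date shape the extracted documents can contain.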

I hope this helps. In any case, please post your samples here so we can make sure we are not chasing a different bug.


commented on June 16, 2024

I cannot paste all of the HTML that I want to index here, but you can view it by pressing Ctrl+U in your browser on the page http://www.unm.edu/.

My error is:

{"error"=>"MapperParsingException[Failed to parse [content.date]]; nested: MapperParsingException[failed to parse date field [Tue, 14 May 2013 08:000:11 -0440], tried both date format [dateOptionalTime], and timestamp number]; nested: IllegalArgumentException[Invalid format: \"Tue, 14 May 2013 08:000:11 -0440\"]; ", "status"=>400}

My mapping is:


{
  "mappings": {
    "weblink": {
      "properties": {
        "tags": {
          "store": "yes",
          "analyzer": "keyword",
          "boost": 2,
          "type": "string"
        },
        "id": {
          "type": "integer"
        },
        "content": {
          "path": "full",
          "type": "attachment",
          "fields": {
            "content": {
              "store": "yes",
              "term_vector": "with_positions_offsets",
              "type": "string"
            },
            "author": {
              "store": "yes",
              "type": "string"
            },
            "title": {
              "store": "yes",
              "type": "string"
            },
            "keywords": {
              "store": "yes",
              "type": "string"
            },
            "name": {
              "store": "yes",
              "type": "string"
            },
            "date": {
              "format": "dateOptionalTime",
              "type": "date"
            },
            "content_type": {
              "store": "yes",
              "type": "string"
            }
          }
        },
        "library_id": {
          "type": "long"
        },
        "created_at": {
          "store": "yes",
          "format": "dateOptionalTime",
          "type": "date"
        },
        "user_id": {
          "type": "integer"
        },
        "type": {
          "type": "string"
        },
        "url": {
          "index": "not_analyzed",
          "omit_norms": true,
          "index_options": "docs",
          "type": "string"
        }
      }
    }
  }
}


spinscale commented on June 16, 2024

Hey,

Looking at the HTML source, this line in particular

<meta content="Thu, 16 May 2013 01:000:12 -0440" name="date" />

shows a custom date format, which needs to be configured explicitly, as mentioned in my last post. See http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html for possible options.

However, I am unsure about that format: the three-digit minutes field (000) looks wrong. Reloading the page gives me a different date, so there is some caching involved, and you should be able to get it working.
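
The doubt about the 000 can be checked outside Elasticsearch. The following is a minimal Python sketch, assuming the page intended an RFC 2822-style date; the `parses_as_rfc_2822` helper and the "corrected" value are hypothetical and only serve to show that the three-digit minutes field is what breaks parsing.

```python
from datetime import datetime

# RFC 2822-style pattern; an assumption about what the page meant to emit.
RFC_2822_LIKE = "%a, %d %b %Y %H:%M:%S %z"


def parses_as_rfc_2822(value):
    """Return True if value matches the RFC 2822-style pattern."""
    try:
        datetime.strptime(value, RFC_2822_LIKE)
        return True
    except ValueError:
        return False


# Value taken from the page's meta tag, with three-digit minutes.
as_served = "Thu, 16 May 2013 01:000:12 -0440"
# Hypothetical corrected value with two-digit minutes.
corrected = "Thu, 16 May 2013 01:00:12 -0440"

print(parses_as_rfc_2822(as_served))
print(parses_as_rfc_2822(corrected))
```

With this pattern, only the corrected value parses, which suggests that no standard format string will accept the date exactly as the page serves it.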


commented on June 16, 2024

Hello Alexander,

First, thanks for your answer, and sorry for the delay since my last message.

So... your answer doesn't actually solve our problem. Let me explain exactly what it is:

We are building a bookmarking tool. When a user bookmarks a URL, we index the full page. This means we don't know which date formats will be used in the page content.

So my question is: how can we get rid of this error without having to do anything specific about date formats?

Thanks


katta commented on June 16, 2024

Yes, is there a simple way to ignore or override the date if one doesn't care about the formats? I mean, can't we just store it as a string?

I tried the following mapping:

{
    "files-type": {
        "properties": {
            "content": {
                "type": "attachment",
                "fields": {
                    "content": {
                        "store": "yes",
                        "include_in_all": true,
                        "term_vector": "with_positions_offsets"
                    },                    
                    "date" : { "type": "string" }
                }
            }
        }
    }
}

But the attachment type seems to override the mapping I give explicitly and changes it back to:

  {
    "properties": {
            "content": {
                "fields": {
                    "author": {
                        "type": "string"
                    },
                    "content": {
                        "include_in_all": true,
                        "store": "yes",
                        "term_vector": "with_positions_offsets",
                        "type": "string"
                    },
                    "content_type": {
                        "type": "string"
                    },
                    "date": {
                        "format": "dateOptionalTime",
                        "type": "date"
                    },
                    "keywords": {
                        "type": "string"
                    },
                    "name": {
                        "type": "string"
                    },
                    "title": {
                        "type": "string"
                    }
                },
                "path": "full",
                "type": "attachment"
            }
        }
  }


dadoonet commented on June 16, 2024

Heya,

Jumping into this thread. In the next version, 1.9.0, the mapper attachment plugin will ignore metadata fields in case of error, unless you explicitly ask it to fail. See #38.

About the mapping, I will look at it. I just fixed something similar in #37 about using multifield.
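
As a rough illustration of the "ignore metadata fields in case of error" behavior, here is a minimal Python sketch. It is not the plugin's code: the `extract_metadata` function, the `ignore_errors` flag name, and the date pattern are all hypothetical and chosen only to show the idea.

```python
from datetime import datetime


def extract_metadata(raw, ignore_errors=True):
    """Copy metadata into a document, dropping a date that cannot be parsed.

    `raw`, `ignore_errors`, and this function are hypothetical; the pattern
    below stands in for whatever date format the mapping expects.
    """
    doc = dict(raw)
    value = doc.get("date")
    if value is None:
        return doc
    try:
        doc["date"] = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S").isoformat()
    except ValueError:
        if not ignore_errors:
            raise  # strict mode: fail the whole document
        del doc["date"]  # lenient mode: keep the rest of the document
    return doc
```

In lenient mode a page with a malformed date is still indexed, just without its date field; strict mode reproduces the failure reported in this thread.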


dadoonet commented on June 16, 2024

Did anyone test mapper 1.9? Closing this issue, but feel free to reopen if the error still occurs.


506764932 commented on June 16, 2024

Same question.
My mapping is:
"starttime": {
    "type": "date",
    "format": "yyyy/MM"
}
and my data is:
"starttime": "2015/01"
and the exception is:
MapperParsingException[failed to parse date field [1997/09], tried both date format[dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: "1997/09" is malformed at "/09"];
What should I do?


dadoonet commented on June 16, 2024

Sorry, but how is this related to the mapper attachment plugin? I mean, starttime is not generated by the mapper plugin, right?

That said, I'm pretty sure your mapping has not been applied, as the date parser is still using the default format (the exception reports dateOptionalTime, not yyyy/MM).

I'd open a thread on the mailing list and provide a full script that shows exactly what you are doing, so we can help you there.

If you think it's absolutely related to the mapper plugin, you can open a new issue and provide all the same details I just mentioned.

