Comments (3)
Thanks @thomasm82.
You are right indeed. With the fix for #37, the mapping is no longer overwritten, but it is still not usable.
May I ask how you fixed it in the mapper attachment plugin?
It might be worth a hack until this is fixed in elasticsearch core.
Let me know.
from elasticsearch-mapper-attachments.
Hi there,
thanks for your response.
I described part of it at the end of my post on the elasticsearch GitHub: elastic/elasticsearch#5402
Part 1 - My own ParseContext
In detail, I implemented my own ParseContext, which takes the original ParseContext as a delegate:
/*
 * Created: Mar 12, 2014 11:40:41 AM
 */
package org.elasticsearch.index.mapper.attachment;

import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Field;
import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.common.lucene.all.AllEntries;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentParser;
import org.elasticsearch.index.analysis.AnalysisService;
import org.elasticsearch.index.mapper.ContentPath;
import org.elasticsearch.index.mapper.DocumentMapper;
import org.elasticsearch.index.mapper.DocumentMapper.ParseListener;
import org.elasticsearch.index.mapper.DocumentMapperParser;
import org.elasticsearch.index.mapper.FieldMapper;
import org.elasticsearch.index.mapper.ParseContext;
import org.elasticsearch.index.mapper.SourceToParse;
import org.elasticsearch.index.mapper.object.RootObjectMapper;

public class AttachmentParseContext extends ParseContext {

    private ParseContext delegate;

    public AttachmentParseContext(ParseContext context) {
        /*
         * needs to be there as no default constructor is available in
         * superclass; we are delegating most calls to the original context
         * anyways
         */
        super(context.index(), context.indexSettings(), context
                .docMapperParser(), context.docMapper(), context.path());
        this.delegate = context;
    }

    /*
     * ................... START: OVERRIDING CODE ...................
     */

    /**
     * Unlike {@link ParseContext#externalValueSet()} this method also returns
     * <code>true</code>, if the external value is set without the boolean flag
     * indicating it. This is needed in order to support multi fields within an
     * attachment's field.
     *
     * @return <code>true</code> in case either the delegate's
     *         {@link ParseContext#externalValueSet()} returns
     *         <code>true</code> or the delegate's
     *         {@link ParseContext#externalValue()} is not <code>null</code>.
     */
    @Override
    public boolean externalValueSet() {
        return delegate.externalValueSet() || delegate.externalValue() != null;
    }

    /*
     * ................... END: OVERRIDING CODE ...................
     */

    /*
     * ................... START: DELEGATE METHODS ...................
     */

    public void addDoc(Document doc) {
        delegate.addDoc(doc);
    }

    public AllEntries allEntries() {
        return delegate.allEntries();
    }

    public AnalysisService analysisService() {
        return delegate.analysisService();
    }

    public Analyzer analyzer() {
        return delegate.analyzer();
    }

    public void analyzer(Analyzer analyzer) {
        delegate.analyzer(analyzer);
    }

    public void clearWithinCopyTo() {
        delegate.clearWithinCopyTo();
    }

    public void clearWithinNewMapper() {
        delegate.clearWithinNewMapper();
    }

    public Document doc() {
        return delegate.doc();
    }

    public float docBoost() {
        return delegate.docBoost();
    }

    public void docBoost(float docBoost) {
        delegate.docBoost(docBoost);
    }

    public DocumentMapper docMapper() {
        return delegate.docMapper();
    }

    public DocumentMapperParser docMapperParser() {
        return delegate.docMapperParser();
    }

    public List<Document> docs() {
        return delegate.docs();
    }

    public boolean equals(Object obj) {
        return delegate.equals(obj);
    }

    public Object externalValue() {
        return delegate.externalValue();
    }

    public void externalValue(Object externalValue) {
        delegate.externalValue(externalValue);
    }

    public boolean flyweight() {
        return delegate.flyweight();
    }

    public int hashCode() {
        return delegate.hashCode();
    }

    public String id() {
        return delegate.id();
    }

    public void id(String id) {
        delegate.id(id);
    }

    public String ignoredValue(String indexName) {
        return delegate.ignoredValue(indexName);
    }

    public void ignoredValue(String indexName, String value) {
        delegate.ignoredValue(indexName, value);
    }

    public boolean includeInAll(Boolean includeInAll, FieldMapper mapper) {
        return delegate.includeInAll(includeInAll, mapper);
    }

    public String index() {
        return delegate.index();
    }

    public Settings indexSettings() {
        return delegate.indexSettings();
    }

    public boolean isWithinCopyTo() {
        return delegate.isWithinCopyTo();
    }

    public boolean isWithinNewMapper() {
        return delegate.isWithinNewMapper();
    }

    public ParseListener listener() {
        return delegate.listener();
    }

    public boolean mappingsModified() {
        return delegate.mappingsModified();
    }

    public XContentParser parser() {
        return delegate.parser();
    }

    public ContentPath path() {
        return delegate.path();
    }

    public void reset(XContentParser parser, Document document,
            SourceToParse source, ParseListener listener) {
        delegate.reset(parser, document, source, listener);
    }

    public RootObjectMapper root() {
        return delegate.root();
    }

    public Document rootDoc() {
        return delegate.rootDoc();
    }

    public void setMappingsModified() {
        delegate.setMappingsModified();
    }

    public void setWithinCopyTo() {
        delegate.setWithinCopyTo();
    }

    public void setWithinNewMapper() {
        delegate.setWithinNewMapper();
    }

    public BytesReference source() {
        return delegate.source();
    }

    public void source(BytesReference source) {
        delegate.source(source);
    }

    public SourceToParse sourceToParse() {
        return delegate.sourceToParse();
    }

    public StringBuilder stringBuilder() {
        return delegate.stringBuilder();
    }

    public Document switchDoc(Document doc) {
        return delegate.switchDoc(doc);
    }

    public String toString() {
        return delegate.toString();
    }

    public String type() {
        return delegate.type();
    }

    public Field uid() {
        return delegate.uid();
    }

    public void uid(Field uid) {
        delegate.uid(uid);
    }

    public Field version() {
        return delegate.version();
    }

    public void version(Field version) {
        delegate.version(version);
    }

    /*
     * ................... END: DELEGATE METHODS ...................
     */
}
Part 2 - Integration with the plugin
To get my context used, I changed a single line (as far as I can remember) in the org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(ParseContext context) method, where I wrap the original context by calling context = new AttachmentParseContext(context):
@Override
public void parse(ParseContext context) throws IOException {
    byte[] content = null;
    String contentType = null;
    int indexedChars = defaultIndexedChars;
    String name = null;

    XContentParser parser = context.parser();
    XContentParser.Token token = parser.currentToken();
    if (token == XContentParser.Token.VALUE_STRING) {
        content = parser.binaryValue();
    } else {
        String currentFieldName = null;
        while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {
            if (token == XContentParser.Token.FIELD_NAME) {
                currentFieldName = parser.currentName();
            } else if (token == XContentParser.Token.VALUE_STRING) {
                if ("content".equals(currentFieldName)) {
                    content = parser.binaryValue();
                } else if ("_content_type".equals(currentFieldName)) {
                    contentType = parser.text();
                } else if ("_name".equals(currentFieldName)) {
                    name = parser.text();
                }
            } else if (token == XContentParser.Token.VALUE_NUMBER) {
                if ("_indexed_chars".equals(currentFieldName) || "_indexedChars".equals(currentFieldName)) {
                    indexedChars = parser.intValue();
                }
            }
        }
    }

    // Throw clean exception when no content is provided Fix #23
    if (content == null) {
        throw new MapperParsingException("No content is provided.");
    }

    Metadata metadata = new Metadata();
    if (contentType != null) {
        metadata.add(Metadata.CONTENT_TYPE, contentType);
    }
    if (name != null) {
        metadata.add(Metadata.RESOURCE_NAME_KEY, name);
    }

    String parsedContent;
    try {
        // Set the maximum length of strings returned by the parseToString method, -1 sets no limit
        parsedContent = tika().parseToString(new BytesStreamInput(content, false), metadata, indexedChars);
        LanguageIdentifier languageIdentifier = new LanguageIdentifier(parsedContent);
        String language = "en";
        if (languageIdentifier.isReasonablyCertain()) {
            language = languageIdentifier.getLanguage();
        }
        context.externalValue(language);
        languageMapper.parse(context);
    } catch (Throwable e) {
        // #18: we could ignore errors when Tika does not parse data
        if (!ignoreErrors) throw new MapperParsingException("Failed to extract [" + indexedChars + "] characters of text for [" + name + "]", e);
        return;
    }

    // The one changed line: wrap the original context
    context = new AttachmentParseContext(context);

    context.externalValue(parsedContent);
    contentMapper.parse(context);

    try {
        context.externalValue(name);
        nameMapper.parse(context);
    } catch(MapperParsingException e){
        if (!ignoreErrors) throw e;
        if (logger.isDebugEnabled()) logger.debug("Ignoring MapperParsingException catch while parsing name: {}", e.getMessage());
    }

    try {
        context.externalValue(metadata.get(Metadata.DATE));
        dateMapper.parse(context);
    } catch(MapperParsingException e){
        if (!ignoreErrors) throw e;
        if (logger.isDebugEnabled()) logger.debug("Ignoring MapperParsingException catch while parsing date: {}: {}", e.getMessage(), context.externalValue());
    }

    try {
        context.externalValue(metadata.get(Metadata.TITLE));
        titleMapper.parse(context);
    } catch(MapperParsingException e){
        if (!ignoreErrors) throw e;
        if (logger.isDebugEnabled()) logger.debug("Ignoring MapperParsingException catch while parsing title: {}: {}", e.getMessage(), context.externalValue());
    }

    try {
        context.externalValue(metadata.get(Metadata.AUTHOR));
        authorMapper.parse(context);
    } catch(MapperParsingException e){
        if (!ignoreErrors) throw e;
        if (logger.isDebugEnabled()) logger.debug("Ignoring MapperParsingException catch while parsing author: {}: {}", e.getMessage(), context.externalValue());
    }

    try {
        context.externalValue(metadata.get(Metadata.KEYWORDS));
        keywordsMapper.parse(context);
    } catch(MapperParsingException e){
        if (!ignoreErrors) throw e;
        if (logger.isDebugEnabled()) logger.debug("Ignoring MapperParsingException catch while parsing keywords: {}: {}", e.getMessage(), context.externalValue());
    }

    try {
        context.externalValue(metadata.get(Metadata.CONTENT_TYPE));
        contentTypeMapper.parse(context);
    } catch(MapperParsingException e){
        if (!ignoreErrors) throw e;
        if (logger.isDebugEnabled()) logger.debug("Ignoring MapperParsingException catch while parsing content_type: {}: {}", e.getMessage(), context.externalValue());
    }

    try {
        if (metadata.get(Metadata.CONTENT_LENGTH) != null) {
            // We try to get CONTENT_LENGTH from Tika first
            context.externalValue(metadata.get(Metadata.CONTENT_LENGTH));
        } else {
            // Otherwise, we use our byte[] length
            context.externalValue(content.length);
        }
        contentLengthMapper.parse(context);
    } catch(MapperParsingException e){
        if (!ignoreErrors) throw e;
        if (logger.isDebugEnabled()) logger.debug("Ignoring MapperParsingException catch while parsing content_length: {}: {}", e.getMessage(), context.externalValue());
    }
}
This way the external value is still reported as set whenever it is present, even if it has already been consumed (i.e. read).
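To show the effect in isolation, here is a minimal sketch that runs without Elasticsearch. It assumes that reading the external value clears the "set" flag, which is the behaviour the wrapper works around. SimpleContext, StickyContext and ExternalValueDemo are hypothetical stand-ins made up for this illustration, not the real ParseContext or plugin classes.

```java
// Hypothetical, simplified stand-in for ParseContext (NOT the Elasticsearch class).
// Assumption: reading the external value clears the "set" flag but keeps the value.
class SimpleContext {
    private Object externalValue;
    private boolean externalValueSet;

    void externalValue(Object value) {
        this.externalValue = value;
        this.externalValueSet = true;
    }

    Object externalValue() {
        externalValueSet = false; // the value counts as consumed after one read
        return externalValue;
    }

    boolean externalValueSet() {
        return externalValueSet;
    }
}

// Hypothetical wrapper that mirrors the idea of AttachmentParseContext:
// delegate everything, but also report "set" when a value is still present.
class StickyContext extends SimpleContext {
    private final SimpleContext delegate;

    StickyContext(SimpleContext delegate) {
        this.delegate = delegate;
    }

    @Override
    void externalValue(Object value) {
        delegate.externalValue(value);
    }

    @Override
    Object externalValue() {
        return delegate.externalValue();
    }

    @Override
    boolean externalValueSet() {
        return delegate.externalValueSet() || delegate.externalValue() != null;
    }
}

class ExternalValueDemo {
    public static void main(String[] args) {
        SimpleContext plain = new SimpleContext();
        plain.externalValue("parsed text");
        plain.externalValue(); // a first mapper reads the value
        System.out.println(plain.externalValueSet()); // prints "false"

        SimpleContext sticky = new StickyContext(new SimpleContext());
        sticky.externalValue("parsed text");
        sticky.externalValue(); // a first mapper reads the value
        System.out.println(sticky.externalValueSet()); // prints "true"
    }
}
```

With the plain context, a second reader (for example a sub-field of a multi field) no longer sees the value as set; with the wrapper it does, as long as the value itself is not null.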
Hope this helps,
Tom
Thanks for sharing this! I really appreciate it.
I created a PR against the core code based on it.
We'll see whether we need to patch the plugin in the meantime...
Keeping this open for the moment.