Comments (11)
Hey @rucciva, I've been thinking about this. An alternative would be to use the process_map processor, allowing you to extract fields, apply a date processor, and then place the result in a path. That way we can have a flow where the processor uses function interpolation for multiple args and a value.
It would look something like this:

```yaml
type: process_map
process_map:
  premap:
    duration: path.to.duration
    value: path.to.value
  processors:
    - type: date
      date:
        operator: add
        arg: ${!json_field:duration}
        value: ${!json_field:value}
  postmap:
    path.of.result: .
```
from connect.
Hey @rucciva, I've considered this before. To an extent some of this stuff can be done with the grok processor, but it obviously doesn't allow you to do conversions. What would be the specific processing steps you're after?
I'd consider both function interpolations and a dedicated processor; I'm not entirely sure which I would personally prefer yet.
In my case, I would like to:
- Convert a unix timestamp (in seconds, milliseconds, or nanoseconds) to an Elasticsearch date string.
- From the same date data, construct a time-based index, either daily, monthly, or yearly, depending on the approximate number of events.
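Sketched as a config, those two needs might look like the following. This uses the hypothetical unix_to_date operator style proposed later in this thread; nothing here is an existing API:

```yaml
processors:
  # 1) unix seconds -> Elasticsearch-friendly date string
  - type: text
    text:
      operator: unix_to_date            # hypothetical operator
      arg: "2006-01-02T15:04:05Z07:00"  # RFC 3339
  # 2) the same timestamp, truncated to build a daily index
  #    suffix such as logs-2018.10.24 (monthly: "2006.01")
  - type: text
    text:
      operator: unix_to_date            # hypothetical operator
      arg: "2006.01.02"
```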
Thinking about it, I could possibly add this as a text processor operation. It might look something like this:

```yaml
type: text
text:
  operator: date_to_unix
  arg: "Mon Jan 2 15:04:05 -0700 MST 2006" # Format to convert from
```

The arg field would follow this rule: https://golang.org/pkg/time/#Time.Format. Then I could add date_to_unix_nano, unix_to_date, unix_nano_to_date, etc.

```yaml
type: text
text:
  operator: unix_to_date
  arg: "Mon Jan 2 15:04:05 -0700 MST 2006" # Format to convert to
```
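For context on that arg format: Go layouts are written as how the fixed reference time (Mon Jan 2 15:04:05 MST 2006, unix 1136239445) would appear in the desired output. So, still sketching with the proposed operator, output granularity would be controlled purely by the layout string:

```yaml
type: text
text:
  operator: unix_to_date
  # Layout examples (each describes the same reference time):
  #   "2006-01-02"                -> date only, e.g. 2018-10-24
  #   "2006-01-02T15:04:05Z07:00" -> RFC 3339
  #   "Jan 2 15:04:05"            -> no year or zone
  arg: "2006-01-02"
```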
Hmm, why not create a separate processor instead of using the text processor?
IMHO, adding this operation to the text processor is inconsistent with the fact that decode, encode, and hash each get their own processor. I think the text processor is best suited to general string processing.
I also think that having a dedicated processor for date processing is good in case more specific date operations are needed in the future.
Yeah you might be right, I'll sit on this one for a little while if that's okay.
Yeah, it's completely okay. No need to rush.
Hi @Jeffail, any news on this?
IMHO, a separate processor is better than combining it with the text processor. I think the config might look a bit different, and I have two options in mind.
First:

```yaml
type: date
date:
  parts: []
  operator: some operator
  args:
    - interpolatable argument 1
    - interpolatable argument 2
    - ...
  value: interpolatable value that will replace the message part as the base value that is manipulated by the operator
```
I think a message containing only a date value is rare, so the date value will usually be a part of, or a field in, the message. This also implies that replacing the whole message with only a date value is a rare occasion. In that case process_field could come in handy to select only the date field to be processed, but in the process we would also lose reference to other parts of the message that might be needed as arguments.
In the case above, I'm thinking we could create a new field that contains both the value to be processed and the other values needed as arguments, then use process_field on the newly created field and select the value and argument fields using interpolation. For example, given a message like {"request_start_time": 1540352793, "duration": 5, ...}, when we need to compute request_finish_time, we first create a temporary field containing both request_start_time and duration, like this: {"request_start_time": 1540352793, "duration": 5, "request_finish_time": {"request_start_time": 1540352793, "duration": 5}, ...}, then process_field the request_finish_time field and apply the date processing:
```yaml
- type: process_field
  process_field:
    parts: []
    path: "request_finish_time"
    processors:
      - type: date
        date:
          operator: add
          args:
            - ${!json_field:duration}
          value: ${!json_field:request_start_time}
```
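If that worked as intended, the process_field step would replace the temporary object with the computed value, since 1540352793 + 5 = 1540352798. A sketch of the expected result (not output from a real run):

```yaml
{ "request_start_time": 1540352793, "duration": 5, "request_finish_time": 1540352798 }
```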
The second option would use target_path and source_path in the configuration, like so:
```yaml
type: date
date:
  parts: []
  source_path: some json path like that of process_field's path
  target_path: some json path like that of process_field's path
  operator: some operator
  args:
    - interpolatable argument 1
    - interpolatable argument 2
    - ...
```
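Filled in with the earlier example message, that second option might read as follows (still a hypothetical processor, not an existing one):

```yaml
type: date
date:
  parts: []
  source_path: request_start_time
  target_path: request_finish_time
  operator: add
  args:
    - ${!json_field:duration}
```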
Or we could just remove source_path and treat everything as arguments to the operator.
What do you think?
I like that process_map; it makes things look simpler, and reusable by other sub-processors.
Hey @rucciva, just to keep you up to date: I'm now thinking of solving this via the new AWK processor instead, by adding date-related functions.
This example gives a brief look at how it works: https://github.com/Jeffail/benthos/blob/master/docs/processors/README.md#json
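A hedged sketch of how the earlier use case might look with that approach, assuming AWK helper functions along the lines of timestamp_format and json_set (the function names and signatures here are an approximation, not a confirmed API):

```yaml
type: awk
awk:
  codec: json # exposes JSON fields as awk variables
  program: |
    {
      # Convert a unix-seconds field to a date string, then
      # derive a daily index name from the same timestamp.
      json_set("date", timestamp_format(request_start_time));
      json_set("index", "logs-" timestamp_format(request_start_time, "2006.01.02"));
    }
```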
Hi @Jeffail, this is very nice too. Thanks.
I think the issue can be closed