
telemetry's Issues

Disallow emitting and attaching to empty-list event

Right now an event name might be an arbitrary list of atoms. I think that an event with an empty-list name should not be allowed, since it doesn't convey any information about the source of the event.

Thoughts?

Reduce telemetry footprint

Currently the codebase has a behaviour and some indirection, and it could all be inlined into a single module (although I would keep the current function boundaries, since they decouple the events from the API). I know @arkgil wanted to keep the impl behaviour for a bit, but we need to eventually clean it up.

Telemetry should also use Erlang's Logger data in events

This would allow for example using Telemetry events as events in traces (not only as start-end events).


My original use-case was a little bit different though. I wanted to add support for the Server-Timing header in Plug, with timings for different parts of the query. It would be great to gather all events that happened within a single request and put them in the response header. However, right now there is no way to detect which events belong to the "current request", as there is no way to set metadata that would be propagated through all Telemetry events. It would be great to have either telemetry:set_process_metadata/1 or to use Erlang's Logger metadata, which would be added to the metadata provided in event dispatch. With such metadata I could just do set_process_metadata(#{?MODULE => ref}) and use that to detect events that happened within a single process.
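As a hedged workaround sketch until such an API exists: because handlers run synchronously in the process that calls :telemetry.execute/3, per-process Logger metadata set earlier in the request is already visible inside a handler. The :request_ref key and event name below are made up for illustration.

```elixir
# Tag the current process early in the request (e.g. in a Plug).
# :request_ref is a hypothetical metadata key, not a telemetry feature.
ref = make_ref()
Logger.metadata(request_ref: ref)

:telemetry.attach(
  "server-timing-handler",
  [:my_app, :query, :stop],
  fn _event, measurements, _metadata, _config ->
    # Handlers run in the emitting process, so this sees the same metadata.
    if Logger.metadata()[:request_ref] do
      IO.inspect(measurements, label: "timing for current request")
    end
  end,
  nil
)
```

This only works because dispatch is synchronous and in-process; it would not survive handing work off to another process.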

telemetry.beam failed: :badfile

Hey y'all!

Apologies if this is the wrong place to be posting this, but I am at my wits' end. If there is a better place to be asking this question, please point me in that direction if you have the time. I am new to Elixir/Erlang and I'm not sure about a lot of things at this point, but here's where I'm at:

Been playing around with Elixir / Phoenix / LiveView, etc. Recently I got a LiveView testing course and the sample project is running Erlang 23, and I'm only (very loosely) familiar with 24. Anyhoo, I downloaded asdf and started to get things sorted. It was a BEAR getting 23 correctly installed with the right version of openssl and blah blah blah. Anyhoo, I finally got 23 to build correctly, but when I try running any tests at all, I see this a BUNCH of times:

22:52:06.135 [error] Loading of {project_dir}/_build/test/lib/telemetry/ebin/telemetry.beam failed: :badfile

22:52:06.135 [error] beam/beam_load.c(1886): Error loading module telemetry:
  This BEAM file was compiled for a later version of the run-time system than 23.
  To fix this, please recompile this module with an 23 compiler.
  (Use of opcode 172; this emulator supports only up to 170.)

And then eventually the running of the test fails with this error message:

could not compile dependency :phoenix_live_dashboard, "mix compile" failed. You can recompile this dependency with "mix deps.compile phoenix_live_dashboard", update it with "mix deps.update phoenix_live_dashboard" or clean it with "mix deps.clean phoenix_live_dashboard"
** (MatchError) no match of right hand side value: {:error, {:phoenix, {:bad_return, {{Phoenix, :start, [:normal, []]}, {:EXIT, {:undef, [{:telemetry, :attach, [{Phoenix.Logger, [:phoenix, :channel_handled_in]}, [:phoenix, :channel_handled_in], #Function<8.82939286/4 in Phoenix.Logger."-fun.phoenix_channel_handled_in/4-">, :ok], []}, {Phoenix.Logger, :"-install/0-fun-7-", 2, [file: 'lib/phoenix/logger.ex', line: 111]}, {:maps, :fold_1, 3, [file: 'maps.erl', line: 233]}, {Phoenix.Logger, :install, 0, [file: 'lib/phoenix/logger.ex', line: 110]}, {Phoenix, :start, 2, [file: 'lib/phoenix.ex', line: 46]}, {:application_master, :start_it_old, 4, [file: 'application_master.erl', line: 277]}]}}}}}}
    (phoenix 1.5.6) lib/mix/tasks/compile.phoenix.ex:11: Mix.Tasks.Compile.Phoenix.run/1
    (mix 1.11.2) lib/mix/task.ex:394: Mix.Task.run_task/3
    (mix 1.11.2) lib/mix/tasks/compile.all.ex:90: Mix.Tasks.Compile.All.run_compiler/2
    (mix 1.11.2) lib/mix/tasks/compile.all.ex:70: Mix.Tasks.Compile.All.compile/4
    (mix 1.11.2) lib/mix/tasks/compile.all.ex:57: Mix.Tasks.Compile.All.with_logger_app/2
    (mix 1.11.2) lib/mix/tasks/compile.all.ex:35: Mix.Tasks.Compile.All.run/1
    (mix 1.11.2) lib/mix/task.ex:394: Mix.Task.run_task/3
    (mix 1.11.2) lib/mix/tasks/compile.ex:119: Mix.Tasks.Compile.run/1

So...

  1. Am I in the right place?
  2. How exactly does one "recompile this module with an 23 compiler" ?
  3. Do you have any other suggestions?

Thank you so much for your time!

Need for notification when handler is detached?

Right now the handler is detached whenever it fails. Additionally, handlers can be manually removed. However, there is no way to get notified when this happens, which means that one cannot attempt to attach the handler once again in the hope that it won't fail this time (similarly, we restart processes in the hope that this time everything will work fine).

I believe that such a mechanism should be added to the library. I imagine processes attaching handlers on spawn, listening for "detached" events, and crashing when they're received, to ultimately be restarted and start the cycle over again.

/cc @josevalim @chrismccord @studzien

Guidance around monotonic times

A number of integrations are using monotonic_time for calculating durations, and that's great since it's the only way to avoid bad times when "clock skew" happens.

http://erlang.org/doc/apps/erts/time_correction.html

However, monotonic_time does not mark a specific point in time! There are numerous integrations now that are using that time directly as part of a :start event, which means that they are providing an unusable start timestamp. Knowing the accurate start time of a span is critical for tracing.

I suggest we decide on a set of recommendations for marking & tracking time.

My initial suggestion is this:

Start

  • start_time via System.system_time()

Stop / Failure

  • duration via System.monotonic_time()
start_time = System.system_time()
start_time_mono = System.monotonic_time()

:telemetry.execute(
  [:event, :start],
  %{
    start_time: start_time
  },
  metadata
)

# do work ...

end_time_mono = System.monotonic_time()

:telemetry.execute(
  [:event, :stop],
  %{
    duration: end_time_mono - start_time_mono
  },
  metadata
)

Why not rely on each individual handler to mark time?

  • There may be multiple handlers, and they are called in sequence. Each one will "stamp" a different time
  • There may be any amount of work done between the point at which the library might consider the thing-being-instrumented "done" and when the telemetry execution is triggered
  • Time is hard, and doing this once in the library is better than N times for every handler that integrates with it
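A handler-side sketch consuming the convention proposed above: the duration arrives in native time units, so the handler converts it once for display. The event name is illustrative.

```elixir
# Duration from a :stop event is in native units (per the convention above);
# convert it to milliseconds before reporting.
def handle_event([:event, :stop], %{duration: duration}, _metadata, _config) do
  ms = System.convert_time_unit(duration, :native, :millisecond)
  IO.puts("span took #{ms}ms")
end
```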

What's the difference with gen_event?

I was reading the code and, as far as I can tell, it looks like a copy of gen_event: you can register handlers for every element propagated, and if a handler fails it is detached. Maybe I'm missing something, so I'm curious: what's the difference from gen_event?

Distributed Tracing - ie: mutable handlers

Howdy folks, I'm excited about this project!

I'm the author of the New Relic Elixir Agent (https://github.com/newrelic/elixir_agent/)...

One feature that is really powerful, but a little aside from the Telemetry use cases I'm seeing, is Distributed Tracing. This kind of feature differs in that, in addition to callbacks that fire when events happen, we need to make some kind of mutation.

For HTTP, this looks like adding an additional header to a request. In the New Relic agent, the header addition pretty much has to be done manually by the developer.

headers = headers ++ NewRelic.create_distributed_trace_payload(:http)

On the receiving end, the server needs to get access to read this header and take some action. We do this with a simple plug that users must add to the top of their pipeline. You could imagine reading the header and wanting to set some kind of value in conn.private, for example.

If we provided a clear convention or mechanism for this kind of thing, libraries might be willing to build this in along with the Telemetry events. Basically it boils down to enabling the Telemetry system to have a category of event that can transform some select pieces of data.

Here's a very rough sketch of what the interface could be like, just to explain the idea:

# Inside an HTTP client
#  ... this library wants to allow transforming headers & query params
%{headers: headers, params: params} =
  Telemetry.transform([:http, :client, :pre_request], %{params: params, headers: headers, url: url, body: body})

# Inside an HTTP server / Plug
%{conn: conn} = 
  Telemetry.transform([:http, :server, :receive], %{conn: conn})

Hopefully this is interesting to other folks too!

Add an API to stream events as they happen for discovery

It would be nice if we had an API that collected/logged all events published in a certain time period. This would make it easy to discover events. For example, in a Phoenix application I could say: call IO.inspect/1 for all events in the next 5 seconds.

I am not sure if this functionality belongs here. In order to not slow down telemetry, we could implement this using :erlang.trace. The goal here is debugging, so it is important to not slow down the main telemetry engine. We could also add this to telemetry_sampler.
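Until such an API exists, one way to prototype this kind of discovery (an assumption on my part, not a telemetry feature) is Erlang's :dbg tracing, which observes every :telemetry.execute/3 call without adding any code to the telemetry hot path:

```elixir
# Trace all calls to :telemetry.execute/3 on this node for discovery purposes.
:dbg.tracer()
:dbg.p(:all, :call)
:dbg.tpl(:telemetry, :execute, 3, [])

# ... exercise the application for a few seconds ...

:dbg.stop()
```

Tracing has its own overhead, so this is only suitable for short debugging sessions, not production.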

Support a debug mode and capture unhandled events in debug mode

It would be useful for debugging poorly configured handlers, or for getting the list of all events emitted in an application.

I initially thought about always persisting unhandled events to an ETS table, but I realize that the execute function could be called many times per second so we need to keep the overhead there to a minimum.

So I was thinking of doing it conditionally if the telemetry application is run in debug mode.

For now we could keep things simple and only persist unhandled event names.

Later we could expand the debug mode to do things that are no-nos when running in production.

The downside of this approach is that right now telemetry is so simple that it doesn't need configuration (and that is a good thing), and a debug mode would make it slightly more complex. It is also a possible footgun for users, who could unwittingly enable debug mode in production.

Making execution order of handlers guaranteed

I have a use case, where I want to include span_id and trace_id, which come from a "distributed tracing" library of choice into logs. If it's OpenTelemetry or Spandex doesn't really matter, in both cases, I am facing the same possible issue.

I want to have span_id and trace_id attached to the log lines generated when web request starts already, but in order to achieve it, the handlers which start the OpenTelemetry/Spandex span/trace need to be executed before the handlers attached by Phoenix.Logger to the same event.

This seems to work, as I can disable logger initialization on :phoenix application start, and then attach it manually after I have attached the OpenTelemetry/Spandex handlers, for example:

# config/config.exs
config :phoenix, :logger, false
# lib/application.ex
    SpandexPhoenix.Telemetry.install()
    Phoenix.Logger.install()

    # or

    OpenTelemetry.register_application_tracer(:my_app)
    OpentelemetryPhoenix.setup()
    Phoenix.Logger.install()

This works as I'd expect it to work but, the documentation of this project clearly says:

Note that you should not rely on the order in which handlers are invoked.

%% Note that you should not rely on the order in which handlers are invoked.

And this is precisely what I am relying on (and others too, without a doubt).

It seems that the handlers are executed in the order they were inserted into the underlying ETS table, as they are fetched with :ets.lookup, which does return them in insertion order.

https://github.com/beam-telemetry/telemetry/blob/main/src/telemetry_handler_table.erl#L48

Should we make this behavior official? I think we don't have to change anything in the project code itself; maybe we should just add a couple of tests to make sure the order is preserved, and remove the warnings from the documentation saying that the order is not guaranteed?

Configure CI

Since the project is now public, I think it would be nice to have some CI set up to run tests on each PR. I see that Phoenix itself is using TravisCI, should we use it for this project as well?

/cc @josevalim @chrismccord

span needs to pass context to the stop event

Opening this really quick, can't expand on it right this second. But the span function needs to be able to take some result from the start event to pass to the stop event so that the right span can be ended.

That or span needs to be able to work another way that isn't based on events.

Trying to figure out how to integrate with OpenTelemetry 1.0 without a lot of overhead for the user and so that it doesn't have the issue of not knowing what span you are end'ing when end is called in the stop event.

Add a README

Proposal:

Telemetry is a dynamic dispatching library for metrics and instrumentations. It is lightweight, small and can be used in any Elixir project.

In a nutshell, you register a custom module and function to be invoked for certain events, which are executed whenever such an event is emitted. Event names are a list of atoms. Each event is composed of a numeric value and can have metadata attached to it. Let's see an example:

...

Tracing events

It would be great to have built-in support for tracing events alongside metrics. It would provide a nice API to digest by OpenCensus, Spandex, and possibly other tracers out there. While this can currently be "improvised" by creating events in the form of [:event, :start] and [:event, :end], there are a few problems with that approach:

  • it is hard to match a :start with its :end
  • it relies on "best common practices" instead of enforcing a nomenclature for everyone

How to handle telemetry in tests

I have some code that attaches Telemetry handlers on startup of some supervisor. I start this supervisor in tests with start_supervised!, but the problem is that the telemetry handlers are never detached when the supervisor dies.

The code below shows the problem: test "two" will fail with the error {:error, :already_exists}. I don't really like to just ignore the outcome of Telemetry.attach; I could of course handle both :ok and {:error, :already_exists} as valid outcomes.

Is that the way to handle it, or is there a way to stop Telemetry cleanly after a test?

defmodule TestTelemetry do
  use ExUnit.Case, async: false

  test "one" do
    :ok = Telemetry.attach_many(
      "log-response-handler",
      [[:external_message, :consumed], [:external_message, :received]],
      LogResponseHandler,
      :handle_event,
      nil
    )
  end

  test "two" do
    :ok = Telemetry.attach_many(
      "log-response-handler",
      [[:external_message, :consumed], [:external_message, :received]],
      LogResponseHandler,
      :handle_event,
      nil
    )
  end
end
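One possible cleanup pattern (a sketch, not necessarily the recommended approach) is to detach the handler when each test exits, so the handler id is free for the next test:

```elixir
# Detach the handler from the example above after each test, so the
# "log-response-handler" id can be attached again in the next test.
setup do
  on_exit(fn -> Telemetry.detach("log-response-handler") end)
  :ok
end
```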

`:telemetry.span/3` documentation seems inconsistent

When providing StartMetadata and StopMetadata, these values will be sent independently to start and stop events. […] In general, StopMetadata should only provide values that are additive to StartMetadata so that handlers, such as those used for metrics, can rely entirely on the stop event.

It's not really clear from this whether StartMetadata values need to be supplied as StopMetadata as well if they should be available for both events. The code seems to suggest they should; the docs seem to suggest not to do that.

Expected user experience for installing a telemetry backend

I'm working on an instrumentation package for sending Ecto data to New Relic (https://github.com/binaryseed/new_relic_ecto). I'm the author of New Relic's Elixir agent (https://github.com/newrelic/elixir_agent/)

I'm wondering what the expected workflow is for users who would want to install the package.

For example, in this case, I'm wondering about the following:

  • Should I get clever and figure out the existing Repos at startup and auto-instrument them, or should I require the user to manually attach the handler?

If I knew the otp_app I could get the list of Repos & attach them. But generally do we want users to have to glue things together or should we try to make it as automatic as possible?

  • Should I provide a helper function for doing the attach, or do we want users interacting with :telemetry.attach directly?

Right now I'm thinking of at least providing a helper that translates from the Repo module to the Telemetry event for the user, and wraps up the details of how my handler is implemented. Here's how that is currently shaping up: binaryseed/new_relic_ecto#4

def attach(repo: repo) do
  Telemetry.attach(
    "new_relic_ecto",
    telemetry_prefix(repo) ++ [:query],
    NewRelic.Ecto.Telemetry,
    :handle_event,
    %{}
  )
end

Just hoping to get a sense of the desired user experience... Thanks!

Is telemetry distributed ?

I am wondering: if I have two nodes connected via libcluster and I run telemetry.attach on node A and telemetry.execute on node B, will node A execute the handle_event callback automatically?

Or how can I achieve this with telemetry?

Thank you.

Move to Erlang + Rebar3

This will allow this package to be used by everyone in the community for event dispatching, which will improve the coverage of events throughout the ecosystem. Right now we are planning to move only the core though.

@tsloughter, can you please review the current API and let us know if you would like to do any changes in the process?

Errors in readme

or rebar.config:

{deps, [{telemetry, "~> 0.4.1"}]}.

should be changed to:

{deps, [{telemetry, "0.4.1"}]}.

for Erlang, at the end of readme.

Moreover, there is no 0.4.1 release so 0.4.0 should be used.

And also

telemetry:execute(
  [web, request, done],
  #{latency => Latency},
  #{request_path => Path, status_code => Status}
)

uses request_path while

handle_event([web, request, done], #{latency := Latency}, #{path := Path,
                                                            status_code := Status}, _Config) ->
  ?LOG_INFO("[~s] ~p sent in ~p", [Path, Status, Latency]).

handler matches on the key path, which causes an error when copy-pasting the example from the README.

Elixir interface

Now that this library is implemented in Erlang, would it be possible to have an Elixir-native module to interface with Telemetry instead of :telemetry?

I'll PR if it's straightforward...

A flaky and odd compilation error

My version information:

...
"telemetry": {:hex, :telemetry, "0.4.1", "ae2718484892448a24470e6aa341bc847c3277bfb8d4e9289f7474d752c09c7f", [:rebar3], [], "hexpm", "4738382e36a0a9a2b6e25d67c960e40e1a2c95560b9f936d8e29de8cd858480f"},
...
 $ asdf current
elixir         1.9.4    (set by /Users/isanders/projects/.tool-versions)
erlang         22.2.1   (set by /Users/isanders/projects/.tool-versions)

My error looks like this:

===> Compiling telemetry
===> Uncaught error in rebar_core. Run with DEBUG=1 to see stacktrace or consult rebar3.crashdump
===> When submitting a bug report, please include the output of `rebar3 report "your command"`
** (Mix) Could not compile dependency :telemetry, "/Users/isanders/.asdf/installs/elixir/1.9.4/.mix/rebar3 bare compile --paths="/Users/isanders/projects/my_app/_build/test/lib/*/ebin"" command failed. You can recompile this dependency with "mix deps.compile telemetry", update it with "mix deps.update telemetry" or clean it with "mix deps.clean telemetry"
** (exit) 1
    (mix) lib/mix/tasks/cmd.ex:45: Mix.Tasks.Cmd.run/1
    (mix) lib/mix/task.ex:331: Mix.Task.run_task/3
    (mix) lib/mix/task.ex:365: Mix.Task.run_alias/3
    (mix) lib/mix/task.ex:292: Mix.Task.run/2
    (mix) lib/mix/cli.ex:79: Mix.CLI.run_task/2
    (elixir) lib/code.ex:813: Code.require_file/2

The rebar3.crashdump is the weirdest part:

 $ cat deps/telemetry/rebar3.crashdump
Error: {badmatch,{error,enoent}}
[{rebar_erlc_compiler,modify_erlcinfo,5,
                      [{file,"/home/tristan/Devel/rebar3/_build/prod/lib/rebar/src/rebar_erlc_compiler.erl"},
                       {line,477}]},
 {rebar_erlc_compiler,'-update_erlcinfo_fun/2-fun-0-',4,
                      [{file,"/home/tristan/Devel/rebar3/_build/prod/lib/rebar/src/rebar_erlc_compiler.erl"},
                       {line,463}]},
 {lists,foldl,3,[{file,"lists.erl"},{line,1263}]},
 {rebar_erlc_compiler,update_erlcinfo,3,
                      [{file,"/home/tristan/Devel/rebar3/_build/prod/lib/rebar/src/rebar_erlc_compiler.erl"},
                       {line,448}]},
 {rebar_erlc_compiler,'-update_erlcinfo_fun/2-fun-0-',4,
                      [{file,"/home/tristan/Devel/rebar3/_build/prod/lib/rebar/src/rebar_erlc_compiler.erl"},
                       {line,463}]},
 {lists,foldl,3,[{file,"lists.erl"},{line,1263}]},
 {rebar_erlc_compiler,update_erlcinfo,3,
                      [{file,"/home/tristan/Devel/rebar3/_build/prod/lib/rebar/src/rebar_erlc_compiler.erl"},
                       {line,448}]},
 {rebar_erlc_compiler,'-update_erlcinfo_fun/2-fun-0-',4,
                      [{file,"/home/tristan/Devel/rebar3/_build/prod/lib/rebar/src/rebar_erlc_compiler.erl"},
                       {line,463}]}]

I don't share a computer with @tsloughter, or even know him, but he is the best lead I have on this.

Question: Is there a more detailed usage example?

Hi! I saw Chris describe Telemetry in his ElixirConf talk and want to explore using it.

Is there a more detailed usage example available somewhere
or "demo" app with source we can read on GitHub to help get started using it?

Is it likely to be covered in Programming Elixir 1.6 or Programming Phoenix 1.4 ?
(I've tried searching the latest versions of both eBooks and not found a reference to Telemetry...)

Thanks!

P.S. LMK if this is more appropriate for the ElixirForum, happy to close this issue and move it to there ...

Safe ETS interactions

We've seen an issue in Absinthe where a mix task is run that calls code which is instrumented with telemetry, but the handler lookup fails. The ETS table hadn't been created since telemetry wasn't started.

Given that the goal is to have telemetry embedded everywhere, should we rescue the ArgumentError that arises in cases like this?

-spec list_for_event(telemetry:event_name()) -> [#handler{}].
list_for_event(EventName) ->
    ets:lookup(?MODULE, EventName).

absinthe-graphql/absinthe#807

Compiling 1 file (.ex)
** (ArgumentError) argument error
    (stdlib) :ets.lookup(:telemetry_handler_table, [:absinthe, :execute, :operation, :start])
    /Users/Jaap/Documents/development/family-five/server/deps/telemetry/src/telemetry_handler_table.erl:47: :telemetry_handler_table.list_for_event/1
    /Users/Jaap/Documents/development/family-five/server/deps/telemetry/src/telemetry.erl:114: :telemetry.execute/3
    lib/absinthe/phase/telemetry.ex:18: Absinthe.Phase.Telemetry.run/2
    lib/absinthe/pipeline.ex:366: Absinthe.Pipeline.run_phase/3
    lib/absinthe.ex:110: Absinthe.run/3
    lib/mix/tasks/compile.schema_json.ex:12: Mix.Tasks.Compile.SchemaJson.run/1

Can't start app in Mix release {:bad_return, {{:telemetry_app, :start...

Using Telemetry from git (sha: 9a9da7b8) the telemetry app won't work in a mix release. It works fine when starting the application with mix.

This is the error we get when trying to start the release:

[14:02:39.213] [error] module=proc_lib function=crash_report/4 
Process #PID<0.3570.0> terminating
** (exit) {:bad_return, {{:telemetry_app, :start, [:normal, []]}, {:EXIT, {:undef, [{:telemetry_app, :start, [:normal, []], []}, {:application_master, :start_it_old, 4, [file: 'application_master.erl', line: 293]}]}}}}
    (kernel 8.1.2) application_master.erl:142: :application_master.init/4
    (stdlib 3.16.1) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Initial Call: :application_master.init/4
Ancestors: [#PID<0.3569.0>]
Message Queue Length: 1
Messages: [{:EXIT, #PID<0.3571.0>, :normal}]
Links: [#PID<0.3569.0>, #PID<0.3496.0>]
Dictionary: []
Trapping Exits: true
Status: :running
Heap Size: 376
Stack Size: 28
Reductions: 225

Release 0.4.2

We have added span/3 to telemetry, so it probably makes sense to build a new release. WDYT?

@arkgil, would you please do the honors when you are around? Thank you!

Static API for discovering events

Originally posted by @bryannaegele in #12 (comment):

We've discussed a number of ways to discover telemetry events for a variety of use cases, most recently as it pertains to registering of event handlers for tracing purposes. Knowledge of event names is a predicate to registering handlers, so we are left with two potential avenues: compile-time introspection or explicit declaration.

Note that this is not intended to be the sole mechanism for event discovery. Issue #12 talks about a way to discover events as they happen. This discussion is about discovering events before they happen so they can be listened to in an unintrusive way.

Compile-time Introspection

This is arguably the best overall solution since it requires no additional steps by the user. In my initial investigation, introspection would be difficult to achieve, especially across languages.

Explicit Declaration

By requiring the user or library author to explicitly declare possible events, we have a straightforward mechanism to make use of that declaration at various steps and with a broad range of applications.

Behaviour

One option would be to provide a behaviour with a telemetry_events/0 function which returns a list of discoverable events.

Module Attribute

Introduce an attribute of telemetry_events with a list of discoverable events.

Each of the above is easily discoverable at runtime and compile-time. My initial idea was to use module attributes but, after further consideration, I'm leaning towards @josevalim's idea of going with a behaviour. It's just as easy to use for a user, feels more idiomatic, and provides flexibility going forward. It's probably more useful from a tooling perspective, as well.

Proposal

  • Introduce a telemetry behaviour with a telemetry_events/0 callback (default impl returns []).
  • Expand the API to add telemetry:register_event/1, telemetry:deregister_event/1, and telemetry:register_all_events/1 functions to support further runtime declaration flexibility and discoverability (ETS storage is probably fine).
    • Fire events for invocation of each of the above which a tracing or other interested library could listen for.
    • Validate event name declarations during registration.
    • Validate against duplicate event names. (Raise an exception? This situation is likely to cause handler detachments. We should at least warn.)
    • Store registering module names with the event name? This could be useful in debugging or warning against duplicate event names being registered from different namespaces.
  • Add telemetry:list_all_events/0 would return all registered events.
  • Introduce an event definition data structure for discovery and documentation. Includes event name, event description, available measurements with optional descriptions, metadata with optional descriptions.

/cc @arkgil @GregMefford @tsloughter @hauleth @binaryseed

A doubt about sync nature of execute

Hello, first of all thanks for creating a really cool and needed library.
I have a question: as I understand from the code, after calling :telemetry.execute the attached handler functions are invoked immediately, synchronously. Am I right?
If so, I have doubts whether in some cases (high-frequency usage, Ecto for example) this is the optimal solution. Can you please clarify this?
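For reference, one common pattern for keeping a synchronous handler cheap is to forward the event to a separate process and do the heavy lifting there. This is a hedged sketch: MyApp.MetricsCollector is a hypothetical process name, and attaching an anonymous function is used here only for brevity.

```elixir
# The handler runs synchronously in the emitting process, so it does the
# minimum: send a message. MyApp.MetricsCollector is a hypothetical
# registered process that aggregates events on its own schedule.
:telemetry.attach(
  "async-forwarder",
  [:my_app, :repo, :query],
  fn event, measurements, metadata, _config ->
    send(MyApp.MetricsCollector, {:telemetry, event, measurements, metadata})
  end,
  nil
)
```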

span/3 StartMetadata and StopMetadata semantics

I was working on adding some new telemetry events to Commanded to be used in the bridge library I'm developing for Otel. A point of confusion came for me when I needed to pass the metadata to :telemetry.span/3. In the PR, you'll see that I passed the metadata to the StartMetadata, but then passed an empty map for the StopMetadata.

After ruminating on this with @bryannaegele, it became obvious to me that I should be passing the same value to both start and stop. However, the documentation did not point me in this direction, and the API was surprising initially. I would have expected the StopMetadata to be merged into the StartMetadata.

I'm opening this issue to start a discussion on whether this needs a simple documentation update, or if the API needs to be reconsidered slightly.

Using telemetry as a dependency in an Elixir script

I've got a previously working Elixir script with Mix.install:

Mix.install([
  {:telemetry, "~> 1.1.0"},
])

I've recently upgraded to macOS Ventura (on M1) and tried to run the script (elixir test.exs), and I get this error:

** (ErlangError) Erlang error: :enoent
(elixir 1.14.2) lib/system.ex:940: System.shell("/Users/yrashk/.mix/elixir/1-14/rebar3 bare compile --paths /Users/yrashk/Library/Caches/mix/installs/elixir-1.14.2-erts-13.1.1/c0c70753ffb5bc2d1e26727be812c93b/_build/dev/lib/*/ebin", [stderr_to_stdout: true, into: %Mix.Shell{callback: #Function<0.74950258/1 in Mix.Shell.IO.cmd/2>}, env: [{"ERL_LIBS", "/Users/yrashk/Library/Caches/mix/installs/elixir-1.14.2-erts-13.1.1/c0c70753ffb5bc2d1e26727be812c93b/_build/dev/lib"}, {"REBAR_BARE_COMPILER_OUTPUT_DIR", "/Users/yrashk/Library/Caches/mix/installs/elixir-1.14.2-erts-13.1.1/c0c70753ffb5bc2d1e26727be812c93b/_build/dev/lib/telemetry"}, {"REBAR_CONFIG", "/Users/yrashk/Library/Caches/mix/installs/elixir-1.14.2-erts-13.1.1/c0c70753ffb5bc2d1e26727be812c93b/_build/dev/lib/telemetry/mix.rebar.config"}, {"REBAR_PROFILE", "prod"}, {"TERM", "dumb"}]])
$ elixir --version
Erlang/OTP 25 [erts-13.1.1] [source] [64-bit] [smp:10:10] [ds:10:10:10] [async-threads:1] [jit] [dtrace]

Not quite sure what went wrong here... I've tried deleting all the caches; it didn't help.

Instrumentation hooks for trace context propagation

I'm interested in seeing telemetry extended with an API to enable an additional type of instrumentation - the mutation of an outbound request / query with Distributed Trace context information.

The general API could be implemented as an instrumentation hook that a library could build into their code that enables telemetry handlers to mutate specific data structures.

Taking HTTP requests as an example, we could imagine an HTTP client library adding telemetry hooks:

defmodule HttpClient do
  def request(url, body, headers) do
    extra_headers =
      :telemetry.decorate(
        [:httpoison, :request, :headers],
        %{url: url, body: body, headers: headers}
      )

    validate(extra_headers)
    do_request(url, body, headers ++ extra_headers)
  end
end

Then a handler could attach to [:httpoison, :request, :headers], and do some work in the handler...

defmodule MyTelemetryHandler do
  def handle([:httpoison, :request, :headers], request) do
    context = get_trace_context(request)
    generate_distributed_tracing_headers(context)
  end
end

This would provide a general mechanism for outbound instrumentation, without hard-wiring to any vendor-specific instrumentation details.
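As a concrete sketch of the annotation-collection step, `decorate` could fold over the attached handlers and merge whatever extra headers each one returns. Everything below (the `DecorateSketch` module, the handler shape, the fake trace header) is hypothetical, mirroring this proposal rather than any existing telemetry API:

```elixir
# Hypothetical sketch: collect annotations from several handlers by
# concatenating the keyword lists they return. Not the telemetry API.
defmodule DecorateSketch do
  def decorate(handlers, event, data) do
    # Each handler sees the event data and returns extra headers to append.
    Enum.flat_map(handlers, fn handler -> handler.(event, data) end)
  end
end

# A handler that injects a (fake) distributed-tracing header:
trace_handler = fn [:httpoison, :request, :headers], _data ->
  [{"traceparent", "00-abc123-def456-01"}]
end

extra =
  DecorateSketch.decorate(
    [trace_handler],
    [:httpoison, :request, :headers],
    %{url: "https://example.com", body: "", headers: []}
  )

IO.inspect(extra)
```

With multiple handlers attached, their returned headers would simply be concatenated, which is one possible answer to the "collecting annotations from multiple handlers" question below.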

Very open to discussions about strategy and naming!

Issues to sort out:

  • Clarify naming for the various parts of the puzzle
  • Account for collecting annotations from multiple handlers
  • Have a mechanism or pattern for validating the data returned from handlers

For context, this issue has been discussed before in #24 and beam-telemetry/telemetry_metrics#46

Automatically detach when attaching process ends

I have a use case where a GenServer needs to monitor a connection. The connection automatically reconnects when it disconnects, and it publishes telemetry events for it.

The problem is my GenServer can crash and get restarted, but :telemetry doesn't clean up the event handler.

Task.async(fn -> :ok = :telemetry.attach(:foo, [:foo], &handler/4, :foo) end)
|> Task.await()

:telemetry.list_handlers([:foo]) != []

Is there a performant way to have :telemetry Process.monitor/1 anything that calls attach/4 and automatically detach when those processes end?

There are ways to do this on the user side of :telemetry, but they seem very hacky: the config argument of attach could be {handler_id, pid_of_caller}, and the handler could then call Process.alive?(pid_of_caller) and detach itself when that returns false. It also seems less performant than just having a process monitor.
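For discussion, here is a rough sketch of the monitor-based cleanup a library could do internally. `detach_fun` stands in for `:telemetry.detach/1` so the snippet runs without the dependency; the `HandlerJanitor` name and API are made up:

```elixir
# Hypothetical janitor process: monitors the process that attached a
# handler and calls the detach function when that process exits.
defmodule HandlerJanitor do
  use GenServer

  def start_link(detach_fun), do: GenServer.start_link(__MODULE__, detach_fun)

  def watch(janitor, handler_id, owner_pid) do
    GenServer.call(janitor, {:watch, handler_id, owner_pid})
  end

  @impl true
  def init(detach_fun), do: {:ok, %{detach: detach_fun, refs: %{}}}

  @impl true
  def handle_call({:watch, handler_id, pid}, _from, state) do
    # One monitor per watched owner; remember which handler it maps to.
    ref = Process.monitor(pid)
    {:reply, :ok, put_in(state.refs[ref], handler_id)}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    # Owner died: detach its handler and drop the monitor ref.
    {handler_id, refs} = Map.pop(state.refs, ref)
    state.detach.(handler_id)
    {:noreply, %{state | refs: refs}}
  end
end
```

Usage, with a detach function that just reports back to the caller:

```elixir
parent = self()
{:ok, janitor} = HandlerJanitor.start_link(fn id -> send(parent, {:detached, id}) end)

Task.async(fn -> :ok = HandlerJanitor.watch(janitor, :my_handler, self()) end)
|> Task.await()
```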

Thanks for the help.

Subscribing to prefix vs subscribing to event

This is a follow-up to the discussion on the Ecto mailing list and @josevalim's comment here.

@GregMefford proposed that maybe instead of subscribing to prefixes of events like we do now, we should subscribe directly to event names which would allow us to perform a simple :ets.lookup to find matching handlers.


I really like the simplicity of the proposed approach. I think that we were considering it when discussing the initial implementation, and I'm not sure why we settled on the prefixes 🤔

I think that the difference boils down to the following: in place of

Telemetry.attach(..., [:phoenix], ..., ...)

there would be

Telemetry.attach(..., Phoenix.events(), ..., ...)

To me the second example looks much more readable and better expresses the intent (i.e. subscribing to all the events published by Phoenix). However, it has a limitation: Phoenix (or any other project using this library) would need to know all the event names at the time of attaching the handler. Let's imagine that Phoenix emits an event with the execution time of the controller action. With prefixes, the event name could be constructed dynamically:

Telemetry.execute([:phoenix, :latency, YourApp.Controller, :show], 12.01)

With static event names we would need to use the event metadata:

Telemetry.execute([:phoenix, :controller, :latency], 12.01,
                  %{controller: YourApp.Controller, action: :show})
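The exact-name dispatch being proposed can be sketched with a duplicate_bag ETS table, where attaching inserts under the full event name and executing is a single :ets.lookup. The table and function names below are illustrative, not telemetry internals:

```elixir
# Sketch of exact event-name dispatch: handlers are stored under the
# full event name, so dispatch is one :ets.lookup instead of a prefix scan.
table = :ets.new(:handlers_sketch, [:duplicate_bag, :public])

attach = fn event, handler -> :ets.insert(table, {event, handler}) end

execute = fn event, measurement ->
  for {^event, handler} <- :ets.lookup(table, event) do
    handler.(event, measurement)
  end
end

parent = self()
attach.([:phoenix, :controller, :latency], fn event, m -> send(parent, {event, m}) end)

# Static event name; the dynamic parts (controller, action) would travel
# in metadata instead of the event name.
execute.([:phoenix, :controller, :latency], 12.01)
```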

What do you think?

/cc @josevalim @GregMefford @studzien @chrismccord @cloud8421

Nested payload for measurement?

I am wondering: can Telemetry be used to send an event with a nested payload?

:telemetry.execute(
  [:web, :request, :done],
  %{latency: latency},
  %{request_path: path, status_code: status}
)

But what if I need to emit an event whose measurement is a deeply nested map? I'm not sure the allowed type of a measurement is documented. Or is this only a limitation of the available reporters?
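Since reporters generally expect measurement values to be numbers, one option (a sketch, not anything telemetry provides) is to flatten the nested map into single-level dotted keys before calling execute:

```elixir
# Sketch: flatten a nested measurement map into "a.b.c"-style keys so
# each leaf value is a plain number a reporter can handle.
defmodule Flatten do
  def flatten(map, prefix \\ []) do
    map
    |> Enum.flat_map(fn
      {k, %{} = nested} -> flatten(nested, prefix ++ [k])
      {k, v} -> [{Enum.join(prefix ++ [k], "."), v}]
    end)
    |> Map.new()
  end
end

Flatten.flatten(%{latency: %{db: 5, view: 7}, total: 12})
#=> %{"latency.db" => 5, "latency.view" => 7, "total" => 12}
```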

Add telemetry:span/3

This is a spin off from the discussion in #56. @hauleth mentioned about providing telemetry:span/3 and I replied that I don't think we can come up with a generally useful API and we should have a separate discussion about it. So instead of me being grumpy, I thought I would open up said discussion and send a proposal:

telemetry:span(EventPrefix, StartMetadata, fun(() -> {Result, StopMetadata})) -> Result

This will emit three events:

  • EventPrefix ++ [start] with measurement #{system_time := erlang:system_time()} and StartMetadata
  • EventPrefix ++ [exception] with measurement #{duration := erlang:monotonic_time()} and #{kind := throw | error | exit, reason := term(), stacktrace := list()} as metadata
  • EventPrefix ++ [stop] with measurement #{duration := erlang:monotonic_time()} and StopMetadata

The only limitation of this approach is that we can't configure measurements, but I would say configuring metadata is good enough and, despite my initial comment, this should cover a good number of scenarios.
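For concreteness, here is a minimal Elixir sketch of the proposal. emit/3 just messages the current process so the snippet runs standalone (in telemetry itself it would be :telemetry.execute/3), and I've taken duration to mean the difference of monotonic times, which is an assumption about the spec above:

```elixir
# Sketch of the proposed span/3: emits start, then stop on success or
# exception on raise/throw/exit, re-raising with the original stacktrace.
defmodule SpanSketch do
  def span(event_prefix, start_metadata, fun) do
    start_time = :erlang.monotonic_time()
    emit(event_prefix ++ [:start], %{system_time: :erlang.system_time()}, start_metadata)

    try do
      {result, stop_metadata} = fun.()
      emit(event_prefix ++ [:stop], %{duration: :erlang.monotonic_time() - start_time}, stop_metadata)
      result
    catch
      kind, reason ->
        emit(
          event_prefix ++ [:exception],
          %{duration: :erlang.monotonic_time() - start_time},
          %{kind: kind, reason: reason, stacktrace: __STACKTRACE__}
        )

        :erlang.raise(kind, reason, __STACKTRACE__)
    end
  end

  # Stand-in for :telemetry.execute/3 so the sketch is self-contained.
  defp emit(event, measurements, metadata) do
    send(self(), {:event, event, measurements, metadata})
  end
end

:done = SpanSketch.span([:my_app, :work], %{input: 1}, fn -> {:done, %{output: 2}} end)
```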

Shall we ship with this approach?
