phobos-issues's Introduction

Phobos® Issue Tracker and Bug Reports

This repository is the public issue and bug tracker for Phobos® - Enterprise DevOps Suite for Akka.NET.

If you have any of the following:

  1. Bug reports;
  2. Feature requests;
  3. Documentation / tutorial requests; or
  4. Compatibility or platform requests

please file them here by either reviewing current issues or submitting new ones.

Other Resources

Here are some other Phobos resources that could help you:

  1. All supported and planned Phobos monitoring and tracing integrations;
  2. Phobos integration Docker files and setup instructions; and
  3. Phobos code samples.

phobos-issues's People

Contributors

aaronontheweb

phobos-issues's Issues

When using akka.serialize-messages = on together with Phobos it throws errors in the application when trying to fetch the `actor hierarchy`

When I have Phobos 1.1.3 and Petabridge.CMD 0.8.5 I get the following errors when I want to fetch the actor hierarchy via pbm. Also no hierarchy is shown except /user.

Output of the pbm console:

petabridge.cmd (0.8.5.0)
Copyright 2017 - 2021, Petabridge®.

successfully connected to [::ffff:127.0.0.1]:9110
Commands downloaded from server. type help to see what's available
[127.0.0.1:9110] pbm> actor hierarchy
/user

Errors thrown in the application running the actorsystem:

[15:57:42.356 +02:00 Error] [Petabridge.Cmd.Host.Default.Actor.ActorTracer] [RequestId: ] Swallowing exception during message send
System.Runtime.Serialization.SerializationException: Failed to serialize and deserialize payload object [Phobos.Tracing.SpanEnvelope]. Envelope: [<Phobos.Tracing.SpanEnvelope> from [akka://blueprint-service/user/petabridge.cmd/127.0.0.1%3A46166/actor/handler/$a/$J#202894395]], Actor type: [Petabridge.Cmd.Host.Default.Actor.ActorTracer]
---> System.InvalidCastException: Unable to cast object of type 'Surrogate' to type 'Akka.Actor.ActorPath'.
at lambda_method(Closure , Stream , DeserializerSession )
at lambda_method(Closure , Stream , DeserializerSession )
at Hyperion.Serializer.Deserialize[T](Stream stream)
at Akka.Serialization.HyperionSerializer.FromBinary(Byte[] bytes, Type type)
at Akka.Serialization.Serialization.Deserialize(Byte[] bytes, Int32 serializerId, String manifest)
at Phobos.Tracing.Serialization.WrappedPayloadSupport.PayloadFrom(Payload payload)
at Phobos.Tracing.Serialization.TraceEnvelopeSerializer.WithTraceFromProto(Byte[] bytes)
at Akka.Serialization.Serialization.Deserialize(Byte[] bytes, Int32 serializerId, String manifest)
at Akka.Actor.ActorCell.SerializeAndDeserializePayload(Object obj)
at Akka.Actor.ActorCell.SerializeAndDeserialize(Envelope envelope)
--- End of inner exception stack trace ---
at Akka.Actor.ActorCell.SerializeAndDeserialize(Envelope envelope)
at Akka.Actor.ActorCell.SendMessage(Envelope message)
[15:57:42.357 +02:00 Error] [Petabridge.Cmd.Host.Default.Actor.ActorTracer] [RequestId: ] Swallowing exception during message send

Tracing: Self.Forward causes entirely new trace to be made

Version: 1.0.3

Expected Behavior:

When an actor in the middle of processing a valid trace calls Self.Forward, it should be a continuation of the current trace in progress.

Actual Behavior:

When Self.Forward is called, an entirely new trace is created.

Getting the current actor settings from Phobos to disable tracing throws an exception

This line of code
var defaultSettings = PhobosSettings.For(actorSystem).ActorSettings.WithTracing(false)

throws the following exception when PhobosProviderSelection.Local is used during bootstrapping

Unable to cast object of type 'Akka.Actor.LocalActorRefProvider' to type 'Phobos.Actor.IPhobosActorRefProvider'

UPDATE
This is also happening when PhobosProviderSelection.Cluster is used

Unable to filter out noise from OpenTelemetry traces

I am currently investigating how we can best use OTel and Phobos in our software solution. During the implementation I can't really figure out how to filter out the messages below without adding an endless list of separate message filters.

The trace-actor-lifecycle option is disabled, but there are still plenty of unnecessary internal Akka messages.

[image]

Using .NET 6, Akka.NET Cluster, Open Telemetry and Phobos.

How to propagate external context to Phobos?

Our app receives messages from RabbitMQ and forwards them to Akka actors. Here's the code enriched with OpenTelemetry context propagation:

let extractTraceContext (properties: IBasicProperties) (key: string) =
    try
        let result, value = properties.Headers.TryGetValue key
        if result then
            logDebugf mailbox $"UNDO: extracted header {value} for {key}"
            seq { Encoding.UTF8.GetString(value :?> byte []) }
        else
            logDebugf mailbox $"UNDO: no header for {key}"
            Seq.empty
    with
    | _ -> Seq.empty

base.HandleBasicDeliver(consumerTag, deliveryTag, redelivered, exchange, routingKey, properties, body)
let args = BasicDeliverEventArgs(consumerTag, deliveryTag, redelivered, exchange, routingKey, properties, body)

// Extract the PropagationContext of the upstream parent from the message headers
let parentContext =
    traceContenxtPropagator.Extract(PropagationContext(), properties, (fun props key -> extractTraceContext props key))
// Inject extracted info into current context
Baggage.Current <- parentContext.Baggage
// start an activity
use activity = activitySource.StartActivity("mq_receive", ActivityKind.Consumer, parentContext.ActivityContext)
logDebugf mailbox $"UNDO: started activity {activity} for {activitySource.Name}"

target <! Delivered
    { DeliveryTag = args.DeliveryTag
      Redelivered = args.Redelivered
      BasicProperties = args.BasicProperties
      Body = body.ToArray() }

The "target" in the code above is the actor that will process the message just received. The incoming message comes with "traceparent" header set by the message publisher. I can see the distributed trace that includes my activity "mq_receive".

But the trace stops there, nothing is propagated with the "Delivered" message that is sent to "target" actor. I am obviously missing something I need to do to reach Phobos tracing.
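
For reference, the extraction step on its own can be sketched in C# with only `System.Diagnostics` (no OpenTelemetry SDK, RabbitMQ client, or Phobos involved). The class and method names are hypothetical, and .NET 5 or later is assumed. Whether Phobos then continues the trace into the "target" actor is exactly the open question of this issue; the sketch only shows how a "traceparent" value maps onto a parent context.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of Phobos or OpenTelemetry.
public static class TraceParentExample
{
    // Parses a W3C "traceparent" header value of the form
    // "00-<32 hex trace id>-<16 hex span id>-<2 hex flags>".
    public static bool TryParseTraceParent(string header, out ActivityContext parent) =>
        ActivityContext.TryParse(header, null, out parent);

    // Starts an Activity whose parent is the remote span described by the header.
    // Returns null when the header is missing or malformed.
    public static Activity? StartChild(string operationName, string header)
    {
        if (!TryParseTraceParent(header, out var parent))
            return null;

        return new Activity(operationName)
            .SetParentId(parent.TraceId, parent.SpanId, parent.TraceFlags)
            .Start();
    }
}
```

The returned activity shares the publisher's trace ID and records the publisher's span as its parent. It should be stopped or disposed once the message has been handed off.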

Don't finish trace upon stashing

We should try to preserve traces when stashing, where possible. This might need a custom IStash implementation, though.

When using akka.serialize-messages = on together with Phobos it sometimes throws this error during shutdown

[21:00:03.435 +02:00 Error] [Phobos.Actor.Instrumentation.Actors.EventStreamMonitorActor] [RequestId: ] Swallowing exception during message send
System.Runtime.Serialization.SerializationException: Failed to serialize and deserialize payload object [Phobos.Tracing.SpanEnvelope]. Envelope: [<Phobos.Tracing.SpanEnvelope> from [akka://blueprint-service/user/blueprint-actor_2bc365a3-12e4-4727-aaef-083b4f6f82d4#156300447]], Actor type: [Phobos.Actor.Instrumentation.Actors.EventStreamMonitorActor]
---> Microsoft.CSharp.RuntimeBinder.RuntimeBinderException: The best overloaded method match for 'Hyperion.SerializerFactories.ArraySerializerFactory.WriteValues<System.ValueType>(System.ValueType[], System.IO.Stream, System.Type, Hyperion.ValueSerializers.ValueSerializer, Hyperion.SerializerSession)' has some invalid arguments

Error when reporting App Metrics StatsD metrics to Datadog

Hi,
not sure if the bug is related to Phobos, AppMetrics or Dogstatsd but we are getting this error:

Jan 30 20:19:45 *** agent[3330389]: 2021-01-30 20:19:45 EST | CORE | ERROR | (pkg/dogstatsd/server.go:438 in errLog) | Dogstatsd: error parsing metric message '"Akka.NET.akka.messages.recv.meter.value:606|m"': invalid metric type: "m"

Thanks!
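
To make the agent's complaint concrete: the DogStatsD datagram format accepts the type codes `c`, `g`, `ms`, `h`, `s`, and `d`, and the `m` suffix in the rejected line is not one of them. Below is a minimal C# sketch of that check, with hypothetical names. It only inspects the `name:value|type` prefix of a line and is not how the Datadog agent itself is implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical validator for the "<name>:<value>|<type>" part of a DogStatsD line.
public static class DogStatsdLine
{
    // Type codes from the DogStatsD datagram format:
    // count, gauge, timer, histogram, set, distribution.
    private static readonly HashSet<string> KnownTypes =
        new HashSet<string> { "c", "g", "ms", "h", "s", "d" };

    // Extracts the type code and reports whether it is one of the known codes.
    public static bool TryGetMetricType(string line, out string metricType)
    {
        metricType = string.Empty;
        var parts = line.Split('|');
        if (parts.Length < 2 || !parts[0].Contains(':'))
            return false; // not even "name:value|type"

        metricType = parts[1];
        return KnownTypes.Contains(metricType);
    }
}
```

Run against the line from the log above, the check fails with `metricType` set to `m`, which matches the "invalid metric type" message.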

When will Phobos support OpenTelemetry?

Excuse me.

Of course, I have read this:

What About Phobos and OpenTelemetry?
In the near future Phobos will support the emerging OpenTelemetry standard, but as we noted earlier this summer - OpenTelemetry is still somewhat half-baked for .NET as of late 2020. We expect that by the time .NET 6 is released that OpenTelemetry will be widely available and ready for production use, at which point we will ensure that Phobos supports it.

But OpenTelemetry for C# is currently at version 1.0.
So I'm curious: when will Phobos support OpenTelemetry?

I need a roadmap for my boss.

P.S. Can Phobos support Splunk?

Broken trace when recovering persistent actor in sharded cluster

We've noticed an issue with Phobos tracing in our sharded cluster when a persistent child actor is started as a command is sent to it. The tracing appears to be working as expected until the child actor starts:

akka.msg.recv_MyActorShardEnvelope
 - akka.msg.recv_MyActorShardEnvelope
    - akka.actor.start

8<-------- break in tracing --------

akka.msg.recv_RequestRecoveryPermit

Trace Details

akka.msg.recv_ShardEnvelope

akka.actor.path         = /system/sharding/myactor
akka.actor.recv.msgType = MyContracts.MyShardEnvelope
akka.actor.recv.sender  = akka.tcp://myactorsystem@myapi:8081/temp/vu
akka.actor.type         = Akka.Cluster.Sharding.ShardRegion

akka.msg.recv_ShardEnvelope

akka.actor.path         = /system/sharding/myactor/myactor-id
akka.actor.recv.msgType = MyContracts.MyShardEnvelope
akka.actor.recv.sender  = akka.tcp://myactorsystem@myapi:8081/temp/vu
akka.actor.type         = Akka.Cluster.Sharding.Shard

MyActor receives the message MyShardEnvelope which has the real command inside and sends that command to MyChildActor:

akka.actor.start

akka.actor.path         = akka://myactorsystem/system/sharding/myactor/myactor-id/mychildactor-id#123456789
akka.actor.type         = MyApplication.MyChildActor

The trace appears to break here.

akka.msg.recv_RequestRecoveryPermit

akka.actor.path         = /system/recoveryPermitter
akka.actor.recv.msgType = Akka.Persistence.RequestRecoveryPermit
akka.actor.recv.sender  = akka://myactorsystem/system/sharding/myactor/myactor-id/mychildactor-id/$a
akka.actor.type         = Akka.Persistence.RecoveryPermitter

Add support for labels for metrics

Based on the Prometheus specs, this seems like a useful idea that might help simplify some of the work Phobos.Monitoring has to do internally today - it could also make things easier for end-users too.
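
As an illustration of what labels look like on the wire, here is a small C# sketch that renders one sample line in the Prometheus text exposition format. The class name, metric name, and label name are hypothetical, and this is not how Phobos.Monitoring produces its output.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical formatter for a single Prometheus text-format sample.
public static class PrometheusText
{
    // Renders: name{label1="value1",label2="value2"} value
    public static string FormatSample(
        string name, IEnumerable<KeyValuePair<string, string>> labels, long value)
    {
        var rendered = labels
            .Select(l => l.Key + "=\"" + Escape(l.Value) + "\"")
            .ToList();
        var labelPart = rendered.Count == 0 ? "" : "{" + string.Join(",", rendered) + "}";
        return name + labelPart + " " + value.ToString(CultureInfo.InvariantCulture);
    }

    // Label values escape backslash, double quote, and line feed.
    private static string Escape(string labelValue) =>
        labelValue.Replace("\\", "\\\\").Replace("\"", "\\\"").Replace("\n", "\\n");
}
```

With labels, a single metric name can carry a dimension such as the actor type, instead of that dimension being baked into many separate metric names.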

Tracing: change filtering defaults

Need to test changing filtering such that it totally excludes filtered messages from active traces, rather than merely not starting new traces for them. This is a more aggressive filtering change, but it should align more closely with customer expectations.
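
The difference between the two policies can be written down as a small decision function. This is a sketch with hypothetical names (`FilterMode`, `TraceFilterPolicy`), not Phobos's actual filtering code.

```csharp
// Hypothetical model of the two filtering policies discussed in this issue.
public enum FilterMode
{
    NoStart, // filtered messages never start a trace, but still join an active one
    Exclude  // filtered messages are never recorded
}

public static class TraceFilterPolicy
{
    // Decides whether a message should be recorded as a span.
    public static bool ShouldRecord(bool isFiltered, bool traceActive, FilterMode mode)
    {
        if (!isFiltered)
            return true; // unfiltered messages are always recorded

        // A filtered message is only recorded under NoStart, and only
        // when it arrives as part of a trace that is already in progress.
        return mode == FilterMode.NoStart && traceActive;
    }
}
```

The only case where the two modes disagree is a filtered message arriving inside an active trace: `NoStart` records it, `Exclude` drops it.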

OpenTelemetry: remove `PhobosSetup`

OpenTelemetry eliminates the need to pass around references to a shared ITracer or IMetricsRoot object in order for all traces and metrics to be properly correlated and aggregated. Therefore we can very likely eliminate the PhobosConfigBuilder and some of the other objects from Phobos 1.x.

Limiting granularity of Phobos distributed traces

I tried to use a Phobos-instrumented system with Honeycomb APM and quickly ran into a problem with the rate limit. Honeycomb sets a maximum rate of 2000 events per second for a dataset (perhaps a higher rate for Enterprise accounts, I didn't check) and our application exceeded that. While this is understandable because each incoming request results in a complex processing workflow across the Akka cluster nodes, the traffic is in fact relatively low.

I wonder whether you think Phobos tracing can be extended with activity/actor filtering so an Akka application can be configured to sample just subsets of actors/activities. I haven't thoroughly thought about filtering principles; I'm just trying to figure out how to deal with challenges like the one described above, when Phobos traces hit a rate limit set by the APM provider.
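
One common way to reduce event volume without cutting traces in half is ratio-based sampling keyed on the trace ID, so every span of a given trace gets the same keep-or-drop decision. The sketch below uses only `System.Diagnostics`; the class name is hypothetical, and it is a different technique from the per-actor filtering asked about here, shown only as a point of comparison.

```csharp
using System;
using System.Buffers.Binary;
using System.Diagnostics;

// Hypothetical trace-ID ratio sampler; not a Phobos API.
public static class TraceIdRatioSampling
{
    // Keeps roughly `ratio` of all traces, assuming trace IDs are uniformly random.
    // The decision depends only on the trace ID, so it is the same on every node.
    public static bool ShouldSample(ActivityTraceId traceId, double ratio)
    {
        if (ratio >= 1.0) return true;
        if (ratio <= 0.0) return false;

        // Interpret the first 8 bytes of the 16-byte trace ID as an unsigned integer.
        Span<byte> bytes = stackalloc byte[16];
        traceId.CopyTo(bytes);
        ulong prefix = BinaryPrimitives.ReadUInt64BigEndian(bytes);

        // Keep the trace when the prefix falls in the lowest `ratio` share of the range.
        return prefix < ratio * ulong.MaxValue;
    }
}
```

Because the decision is deterministic per trace, a ratio of 0.1 would cut the event rate to about a tenth while leaving the sampled traces complete.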

1.4.*: `propagate-settings-to-children` isn't propagated when using Akka.DI or Akka.DependencyInjection

Reproduction:

phobos{
	tracing{
		trace-all-system-actors = off
		trace-all-user-actors = off
	}

	monitoring{
		monitor-all-system-actors = off
		monitor-all-user-actors = off
	}
}

Create a parent actor with:

akka.actor.deployment{
   /parent{
       phobos{
         propagate-settings-to-children = on
         tracing.enabled = on
         monitoring.enabled = on
       }
   }
}

Then create some child actors using Akka.DI or Akka.DependencyInjection.

Expected

Child actors would have tracing and monitoring.

Actual

Child actors do not have tracing and monitoring.

The type initializer for 'Phobos.Actor.Configuration.PhobosProviderSelection' threw an exception

After having upgraded akka modules to 1.4.28 the following line threw an exception

bootstrap = bootstrap.WithActorRefProvider(
  akkaSettings.EnableCluster ? PhobosProviderSelection.Cluster : PhobosProviderSelection.Local);

And this is the full exception

[14:10:30.727 +01:00 Fatal] [Microsoft.AspNetCore.Hosting.WebHost] Application startup exception
System.TypeInitializationException: The type initializer for 'Phobos.Actor.Configuration.PhobosProviderSelection' threw an exception.
---> System.TypeLoadException: Method 'CreateFutureRef' in type 'Phobos.Actor.PhobosActorRefProvider' from assembly 'Phobos.Actor, Version=1.2.3.0, Culture=neutral, PublicKeyToken=null' does not have an implementation.
at Phobos.Actor.Configuration.PhobosProviderSelection..cctor()
--- End of inner exception stack trace ---
at Agrifirm.Core.BuildingBlocks.AkkaSupport.AgrifirmAkkaService.StartActorSystem() in AgrifirmAkkaService.cs:line 93
at Agrifirm.Core.BuildingBlocks.AkkaSupport.AgrifirmAkkaService..ctor(IServiceProvider sp, IHostApplicationLifetime lifetime) in AgrifirmAkkaService.cs:line 36
at Agrifirm.Service.BusinessPartner.AkkaBootstrapService..ctor(IServiceProvider serviceProvider, IHostApplicationLifetime lifetime) in AkkaBootstrapService.cs:line 21
at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitConstructor(ConstructorCallSite constructorCallSite, RuntimeResolverContext context)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitCache(ServiceCallSite callSite, RuntimeResolverContext context, ServiceProviderEngineScope serviceProviderEngine, RuntimeResolverLock lockType)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitRootCache(ServiceCallSite singletonCallSite, RuntimeResolverContext context)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.Resolve(ServiceCallSite callSite, ServiceProviderEngineScope scope)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.DynamicServiceProviderEngine.<>c__DisplayClass1_0.<RealizeService>b__0(ServiceProviderEngineScope scope)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope.GetService(Type serviceType)
at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService(IServiceProvider provider, Type serviceType)
at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService[T](IServiceProvider provider)
at Agrifirm.Core.BuildingBlocks.DependencyInjection.<>c__01.b__0_0(IServiceProvider sp) in DependencyInjection.cs:line 44
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitFactory(FactoryCallSite factoryCallSite, RuntimeResolverContext context)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitCache(ServiceCallSite callSite, RuntimeResolverContext context, ServiceProviderEngineScope serviceProviderEngine, RuntimeResolverLock lockType)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitRootCache(ServiceCallSite singletonCallSite, RuntimeResolverContext context)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitConstructor(ConstructorCallSite constructorCallSite, RuntimeResolverContext context)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitCache(ServiceCallSite callSite, RuntimeResolverContext context, ServiceProviderEngineScope serviceProviderEngine, RuntimeResolverLock lockType)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitRootCache(ServiceCallSite singletonCallSite, RuntimeResolverContext context)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.Resolve(ServiceCallSite callSite, ServiceProviderEngineScope scope)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.DynamicServiceProviderEngine.<>c__DisplayClass1_0.b__0(ServiceProviderEngineScope scope)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope.GetService(Type serviceType)
at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService(IServiceProvider provider, Type serviceType)
at Agrifirm.Core.BuildingBlocks.DependencyInjection.<>c__DisplayClass1_1.b__4(IServiceProvider sp) in DependencyInjection.cs:line 93
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitFactory(FactoryCallSite factoryCallSite, RuntimeResolverContext context)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitCache(ServiceCallSite callSite, RuntimeResolverContext context, ServiceProviderEngineScope serviceProviderEngine, RuntimeResolverLock lockType)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitRootCache(ServiceCallSite singletonCallSite, RuntimeResolverContext context)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitConstructor(ConstructorCallSite constructorCallSite, RuntimeResolverContext context)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitCache(ServiceCallSite callSite, RuntimeResolverContext context, ServiceProviderEngineScope serviceProviderEngine, RuntimeResolverLock lockType)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitRootCache(ServiceCallSite singletonCallSite, RuntimeResolverContext context)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.Resolve(ServiceCallSite callSite, ServiceProviderEngineScope scope)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.DynamicServiceProviderEngine.<>c__DisplayClass1_0.b__0(ServiceProviderEngineScope scope)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope.GetService(Type serviceType)
at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService(IServiceProvider provider, Type serviceType)
at Agrifirm.Core.BuildingBlocks.DependencyInjection.<>c__DisplayClass1_1.b__4(IServiceProvider sp) in DependencyInjection.cs:line 93
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitFactory(FactoryCallSite factoryCallSite, RuntimeResolverContext context)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitCache(ServiceCallSite callSite, RuntimeResolverContext context, ServiceProviderEngineScope serviceProviderEngine, RuntimeResolverLock lockType)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitRootCache(ServiceCallSite singletonCallSite, RuntimeResolverContext context)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.Resolve(ServiceCallSite callSite, ServiceProviderEngineScope scope)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.DynamicServiceProviderEngine.<>c__DisplayClass1_0.b__0(ServiceProviderEngineScope scope)
at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngine.GetService(Type serviceType)
at Microsoft.Extensions.DependencyInjection.ServiceProvider.GetService(Type serviceType)
at Microsoft.Extensions.Internal.ActivatorUtilities.ConstructorMatcher.CreateInstance(IServiceProvider provider)
at Microsoft.Extensions.Internal.ActivatorUtilities.CreateInstance(IServiceProvider provider, Type instanceType, Object[] parameters)
at Microsoft.AspNetCore.Builder.UseMiddlewareExtensions.<>c__DisplayClass5_0.b__0(RequestDelegate next)
at Microsoft.AspNetCore.Builder.ApplicationBuilder.Build()
at Microsoft.AspNetCore.Hosting.WebHost.BuildApplication()

Metrics: properly track messages transmitted through DistributedPubSub

Issue report from a customer:

if Phobos will instrument actors used for distributed pub-sub across a cluster with associated messages? I can see Subscribe and SubscribeAck messages but I don’t see any other message types in akka_net_akka_messages_recv_total metric (using Prometheus) that my application sends using distributed pub sub mechanism, so I wasn’t too sure if that’s correct behaviour or if I need to implement something explicitly to record such messages.

Need to validate that DistributedPubSub messages don't get blackholed by the metrics system.

OpenTelemetry Plans: Phobos 2.0

With the launch of Phobos 2.0.0-beta1 we've put our first major stake in the ground around OpenTelemetry and Akka.NET. This issue is to track the scope of what we're working on leading up to Phobos 2.0's eventual RTM release:

  • Full tracing support, including propagation over Akka.Remote - all of the same features from Phobos 1.x;
  • Full metrics support, although we'll be revisiting some of the metrics types from 1.x in order to bring them in line with OpenTelemetry;
  • Baggage support, including propagation over Akka.Remote;
  • Updated Phobos Dashboards to reflect the changes in OTel metrics: https://github.com/petabridge/phobos-dashboards; and
  • Separate API documentation for both Phobos 1.x and 2.x.

See the Phobos 2.0.0 milestone here for updates.


Phobos 1.x Support Plans

Per our original Phobos 2.0 announcement:

So what’s in-store for Phobos 1.x in the context of 2.x coming around the corner?

Our plan is to concurrently support Phobos 1.x and 2.x all the way through the Akka.NET v1.4 and 1.5 lifecycles, which should last through the end of 2023.

We will continue to add updates to Phobos 1.x for fixing core actor tracing and metrics, but we will likely not address any APM-vendor-specific issues. I.e. “my graphs in Datadog look different than they do in Jaeger.” We’re going to direct users to Phobos 2.0 as OpenTelemetry enjoys official, ongoing support in ways that App.Metrics and older driver implementations will not. New customers should use Phobos 2.0 by default going forward once it becomes available.

OpenTelemetry: fix `Activity.Current` Leaks

In our internal issue tracker we've found some cases where active traces are leaking, thus breaking correlation inside our system. Happens most frequently around akka.actor.spawn and some of the new startup tracking that was added in v1.4 but ported into 2.0.

This issue tracks the status of that leak and others found in the wild.
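
For readers unfamiliar with the failure mode, here is a generic C# illustration of how an `Activity` that is started but never stopped stays in `Activity.Current`. It uses only `System.Diagnostics`, the names are hypothetical, and it is not a diagnosis of where Phobos leaks.

```csharp
using System.Diagnostics;

// Hypothetical demo of a leaked versus a properly scoped Activity.
public static class ActivityLeakExample
{
    // Starts an Activity and never stops it, so it remains Activity.Current
    // for whatever code runs next on the same execution context.
    public static void Leaky()
    {
        new Activity("leaky").Start();
    }

    // Disposing the Activity stops it and restores the previous Activity.Current.
    public static void Scoped()
    {
        using (new Activity("scoped").Start())
        {
            // traced work would go here
        }
    }

    // Name of the ambient activity, or "(none)" when there is no current activity.
    public static string CurrentName() => Activity.Current?.OperationName ?? "(none)";
}
```

Starting from a context with no ambient activity, `CurrentName()` still reports no activity after `Scoped()`, but reports `leaky` after `Leaky()`. Any span started afterwards would be parented to the leaked activity, which is the kind of broken correlation described above.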

Need to add Phobos 2.0 documentation to website

Need to:

  • split API reference documentation, in clearly demarcated areas, between Phobos 1.x and 2.x
  • split conceptual documentation for configuration into 1.x and 2.x usages
  • split QuickStart tutorial into 1.x and 2.x usages

OpenTelemetry: need to add `Baggage` support

The changes between OpenTracing and OpenTelemetry involve a major shift in how Baggage is handled: it's now a standard piece of the OTel spec, not something optional left up to the ITracer implementation as it was in OpenTracing. Therefore, we will need to support it.
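
For context, standardized baggage travels between processes as a W3C `baggage` header: comma-separated `key=value` members, with percent-encoded values and optional `;property` metadata. The C# sketch below parses such a header into a dictionary. The class name is hypothetical, the parser is deliberately lenient, and it says nothing about how Phobos carries baggage over Akka.Remote.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, lenient parser for a W3C "baggage" header value.
public static class BaggageHeader
{
    public static Dictionary<string, string> Parse(string header)
    {
        var result = new Dictionary<string, string>();
        foreach (var member in header.Split(','))
        {
            // Drop optional ";property" metadata that may follow the value.
            var keyValue = member.Split(';')[0];
            var eq = keyValue.IndexOf('=');
            if (eq <= 0)
                continue; // skip empty or malformed members

            var key = keyValue.Substring(0, eq).Trim();
            // Values are percent-encoded on the wire.
            var value = Uri.UnescapeDataString(keyValue.Substring(eq + 1).Trim());
            result[key] = value;
        }
        return result;
    }
}
```

A production implementation would also enforce the size and member-count limits from the specification, which this sketch ignores.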

Phobos trace and span are not properly propagated to Serilog contexts

From a customer:

Datadog can connect the logs for a trace if the logging context is enriched with dd.trace_id and dd.span_id, so I wrote an extension function for that

using OpenTracing.Util;
using Serilog.Core;
using Serilog.Events;

// Adds the active span's trace and span IDs to every Serilog event.
public class OpenTracingContextLogEventEnricher : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        var tracer = GlobalTracer.Instance;
        if (tracer?.ActiveSpan == null)
            return;

        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("dd.trace_id", tracer.ActiveSpan.Context.TraceId));
        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("dd.span_id", tracer.ActiveSpan.Context.SpanId));
    }
}

Unfortunately it did not work as I expected. Akka.net does not add those context properties to the Serilog logs, hence the logs from akka level don’t connect for any trace, though it works nicely in MVC API level.

An example of how this gets used in concert with the Akka.Logger.Serilog package inside a running Akka.NET actor (pseudo-code):

public sealed class MyTestActor : UntypedActor
{
    // UntypedActor exposes its message handler as a protected OnReceive override.
    protected override void OnReceive(object message)
    {
        using (var span = Context.GetInstrumentation().Tracer.BuildSpan("SomeSubSpan").StartActive())
        {
            // do some handling

            // call the `EnrichLogging` extension method from earlier
            Context.EnrichLogging().Info("We did a thing");
        }
    }
}

Expected Results

We'd expect to see the Serilog structured logs contain the dd.trace_id and dd.span_id properties populated with the values of the current ITracer.ActiveSpan.Context.TraceId and ITracer.ActiveSpan.Context.SpanId properties respectively.

Actual Results

Those properties are never added to the Serilog context.

Possible Issues

Some thoughts on what could cause this issue:

  • the GlobalTracer.Instance in this case is not set, thus all of the properties return null;
  • for whatever reason, there is no ActiveSpan in this context;
  • the formatting of the LogEvent by the Serilog logger tries to access the ActiveSpan outside of its scope, so none is available.

Work-arounds

When setting up the ITracer you're going to pass into Phobos, explicitly set that tracer as the GlobalTracer.Instance:

var tracer = new MockTracer(new ActorScopeManager());
GlobalTracer.Register(tracer);

var phobosConfigBuilder = new PhobosConfigBuilder()
    .WithTracing(t => t.SetTracer(tracer));

var bootstrap = BootstrapSetup.Create()
    .WithActorRefProvider(PhobosProviderSelection.Cluster) // configures Phobos for Akka.Cluster
    .WithConfig(HoconConfig);

// need this to launch our ActorSystem
ActorSystemSetup phobosSetup = PhobosSetup.Create(phobosConfigBuilder).And(bootstrap);
ActorSystem actorSystem = ActorSystem.Create("MySys", phobosSetup);

If the issue is that the Serilog logger is trying to access contextual properties that aren't available when the Serilog logger runs asynchronously, we can work around that issue by closing over the properties from inside the Akka.NET actor, rather than having the Serilog ILogEventEnricher handle it. That way the appropriate span data can be captured during actor execution.
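
The "close over the properties" idea can be sketched as follows. To stay self-contained, this C# example uses `System.Diagnostics.Activity` as a stand-in for the OpenTracing active span, assumes .NET 5 or later, and uses hypothetical names and property keys. The point is only that the IDs are copied into plain strings while the span is still current.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: snapshot trace identifiers at the call site.
public static class TraceCapture
{
    // Reads the IDs while the span is still current and returns plain strings,
    // so later (possibly asynchronous) log formatting does not need the ambient span.
    public static IReadOnlyDictionary<string, string> CaptureCurrent()
    {
        var properties = new Dictionary<string, string>();
        var current = Activity.Current;
        if (current != null)
        {
            properties["trace_id"] = current.TraceId.ToString();
            properties["span_id"] = current.SpanId.ToString();
        }
        return properties;
    }
}
```

An actor would call `CaptureCurrent()` inside its message handler and attach the returned values to the log call, so they survive even if the logger formats the event after the span's scope has ended.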

Phobos 1.4.x - OpenTracing: Tracing breaks when using Ask().PipeTo()

We've discovered an issue in Phobos (1.4.0 and 1.4.1) that can be seen in Jaeger when using the Ask().PipeTo() method:

myOtherActor.Ask(myMessage, TimeSpan.FromSeconds(7))
    .PipeTo(Self, sender,
        success => new MessageHandled(success),
        exception => new MessageFailed(exception));

We see a span for the actor that's asking, and the next span comes with a warning: "invalid parent span IDs=effe6a359f66ddc6; skipping clock skew adjustment".

[image]

However, if we modify the code to use async/await instead, it works fine:

var success = await myOtherActor.Ask(myMessage, TimeSpan.FromSeconds(7));
Context.Self.Forward(new MessageHandled(success));

Epic: Akka.Streams Support

Similar to how we completed adding end-to-end tracing for Akka.Persistence in v1.4, we need to improve how tracing is handled inside Akka.Streams for users going forward.

This includes:

  • Breaking up long-running Akka.Streams traces into smaller operations;
  • Trying to trace the graph stages themselves, rather than the actors running them - this is a stretch as it requires having access to the materializer and the graph interpreter, which aren't as easy to reach;
  • Adding extension methods to make it easier to pass trace context around flows, sources, and sinks; and
  • Standardizing practices such as petabridge/Phobos.Samples#17 to make it easier to propagate trace context to Alpakka / Akka.Streams.Kafka

This issue will take a backseat to #35 in terms of prioritization and whatever we do we'll add support for it to 1.x and 2.x.

OpenTelemetry support

It has been over a year since this post on OpenTelemetry support in Phobos. Are there any plans to move to OpenTelemetry soon?

We are working on a distributed system and using OpenTelemetry wherever we can. Currently I'm investigating whether I will be able to smoothly integrate our Akka.Net microservice, traced by Phobos using OT. I need to deal with different context propagation formats, trace collectors and different instrumentation libraries.

I guess that's all possible but if Phobos could just support OpenTelemetry, then things would greatly simplify for us.

Need to cut down on the number of messages captured by /system actors

In Phobos 1.0 and higher we automatically record data for the following /system actors even though tracing for /system actors is normally disabled by default:

  • Cluster.Sharding actors
  • DistributedPubSub actors

The reason why we do this is that these actors in particular interact with a ton of /user actors and we need to capture important interactions that occur there. However a lot of noise like this ends up being picked up:

  • Scheduled messages, i.e. GossipTick
  • System messages that have no bearing on /user actors

I'm not sure what the right approach is for cutting down the noise here, but I suspect it's probably going to involve making DistributedPubSub more "pass through" or including a list of specific /system messages that we don't want included inside traces.
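
The "list of specific /system messages" option could look roughly like the following C# sketch. All names are hypothetical, and the two message classes are stand-ins defined only for the demo (the real `GossipTick` lives inside Akka.NET). This is one possible shape for such a list, not a Phobos feature.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical exclusion list keyed on the message's CLR type name.
public sealed class TraceExclusionList
{
    private readonly HashSet<string> _excludedTypeNames;

    public TraceExclusionList(IEnumerable<string> excludedTypeNames)
    {
        _excludedTypeNames = new HashSet<string>(excludedTypeNames, StringComparer.Ordinal);
    }

    // True when the message should be recorded in a trace.
    public bool ShouldTrace(object message)
    {
        return !_excludedTypeNames.Contains(message.GetType().Name);
    }
}

// Stand-in message types for the demo only.
public sealed class GossipTick { }
public sealed class OrderPlaced { }
```

Matching on the short type name keeps the list easy to write in configuration, at the cost of possible collisions between same-named types in different namespaces. Matching on the full name would avoid that.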

Akka.Persistence.PersistentActor tracing support for Phobos

Bug report from a customer sent via email:

Hello, Whilst looking at how we trace requests through our actor systems using Phobos with Datadog, we're stuck at the point at which commands are persisted in our persistent actors. It seems as though one trace ends at the point commands are persisted and another one begins in the callback handler on success.

For example:

Command((Actor1Command c) =>
{
    Persist(c, h =>
    {
        Logger.Info("LOGGING ADAPTER: Actor1Command {Id} received", c.Id);
        _msgs.Add(c.Id);
        _ddActor2.Tell(new Actor2Command { Id = Guid.NewGuid() });
    });
});

The first trace ends with the following:

{
  "akka": {
    "actor": {
      "path": "/system/akka.persistence.journal.sql-server",
      "recv": {
        "msgType": "Akka.Persistence.WriteMessages",
        "sender": "akka://ddactorsystem/system/sharding/ddactor1/1/1"
      },
      "type": "Akka.Persistence.SqlServer.Journal.SqlServerJournal"
    }
  }
}

The second trace begins with these:

{
  "akka": {
    "actor": {
      "path": "/user/$a",
      "recv": {
        "msgType": "Akka.Persistence.Journal.AsyncWriteJournal+Desequenced",
        "sender": "akka://ddactorsystem/deadLetters"
      },
      "type": "Akka.Persistence.Journal.AsyncWriteJournal+Resequencer"
    }
  }
}

{
  "akka": {
    "actor": {
      "path": "/system/sharding/ddactor1/1/1",
      "recv": {
        "msgType": "Akka.Persistence.WriteMessageSuccess",
        "sender": "akka.tcp://[email protected]:4055/deadLetters"
      },
      "type": "Dd.Actors.DdActor1"
    }
  }
}

Of course, this could all be down to how I have configured the actor system.

Tracing: leaking undisposed `IScope`s

Looks as though we have some issues where a series of deeply rooted IScopes never get properly disposed, which retains memory for a prolonged period of time.

Cases we need to investigate:

  1. User creates a new active IScope inside the actor, and the original IScope created by Akka.NET never gets disposed
  2. Sending messages to other actors inside an await block
  3. Making an await call while an active IScope is inside a using block
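For reference while investigating case 1, here is a minimal sketch of the nesting pattern that is expected to behave correctly. It assumes the OpenTracing NuGet package and uses its `MockTracer` with the default scope manager, not Phobos' `ActorScopeManager`, so it shows the intended contract rather than reproducing the leak. `ScopeDisposalSketch` and `OuterScopeRestored` are hypothetical names, and the "receive" scope merely stands in for the one Akka.NET opens around message processing.

```csharp
using System;
using OpenTracing;
using OpenTracing.Mock;

public static class ScopeDisposalSketch
{
    // Opens an outer scope, then a user-created inner scope, and reports whether
    // the outer scope is the active one again once the inner scope is disposed.
    public static bool OuterScopeRestored(ITracer tracer)
    {
        // Stand-in for the scope the framework creates around message processing.
        using (IScope outer = tracer.BuildSpan("receive").StartActive(true))
        {
            // User code creates its own active scope; `using` guarantees it is disposed.
            using (IScope inner = tracer.BuildSpan("user-work").StartActive(true))
            {
                inner.Span.SetTag("example", true);
            }

            // With the inner scope disposed, the outer one should be active again.
            return ReferenceEquals(tracer.ScopeManager.Active, outer);
        }
    }

    public static void Main()
    {
        var tracer = new MockTracer();
        Console.WriteLine(OuterScopeRestored(tracer));
        Console.WriteLine(tracer.FinishedSpans().Count);
    }
}
```

If the inner scope were created without `using` and never disposed, the outer scope would no longer be the active one when its own `Dispose` runs, which is the situation described in case 1.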

Migrating from OpenTracing.NET to OpenTelemetry.NET

We have been using Phobos with OpenTracing, where the trace exporter is typically configured like this (in F#):

        let configureTracing (serviceProvider: IServiceProvider) =

            let settings = connectionString "JaegerPhobos" |> DbUtils.parseConnectionString
            let host = settings.["Host"]
            let port = settings.["Port"] |> Int32.Parse
            let loggerFactory = serviceProvider.GetRequiredService<ILoggerFactory>()
            let logReporter = LoggingReporter(loggerFactory)

            let remoteReporter =
                RemoteReporter
                    .Builder()
                    .WithLoggerFactory(loggerFactory)
                    .WithMaxQueueSize(100)
                    .WithFlushInterval(TimeSpan.FromSeconds(1.))
                    .WithSender(UdpSender(host, port, 0))
                    .Build()

            let assemblyName =
                Reflection.Assembly.GetEntryAssembly().GetName()
                    .Name
            let tracer =
                Tracer
                    .Builder(assemblyName)
                    .WithReporter(CompositeReporter(remoteReporter, logReporter))
                    .WithSampler(ConstSampler(true))
                    .WithScopeManager(ActorScopeManager())
                    .Build()

            tracer :> ITracer

The ITracer object is then passed to the Phobos bootstrapper.

This is not how OpenTelemetry exporters are configured in .NET 6. Here is some newer sample code (in C#):

services.AddOpenTelemetryTracing(builder => {
    builder
        .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("MyServiceName"))
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddSqlClientInstrumentation(o => {
            o.SetDbStatementForText = true;
            o.RecordException = true;
        })
        .AddJaegerExporter(o => {
            o.AgentHost = openTracingConfiguration?.Host ?? "localhost";
            o.AgentPort = openTracingConfiguration?.Port ?? 6831;
        });
});

I've found this post that suggests using TracerShim from OpenTelemetry.Shims.OpenTracing, and this workaround works. However, it would be great if Phobos supported OpenTelemetry DI out of the box, without using TracerShim.
