armory-plugins / armory-observability-plugin
Spinnaker plugin for enabling, configuring, and customizing observability features.
License: Apache License 2.0
I'm seeing a lot of other JVM metrics in there, but not that one, and we use it for the dashboards:
https://s.armory.io/P8uyr2JO
The Jedis connection pool metrics show up for Kayenta but not for other services like Fiat.
Fiat:
spinnaker@spin-fiat-bb6bcb649-mfldg:/$ wget -q -O - localhost:7003/aop-prometheus | grep -i pool | grep TYPE
spinnaker@spin-fiat-bb6bcb649-mfldg:/$
Kayenta
bash-5.0$ wget -q -O - localhost:8090/aop-prometheus | grep -i pool | grep TYPE
# TYPE threadpool_maximumPoolSize gauge
# TYPE threadpool_blockingQueueSize gauge
# TYPE threadpool_corePoolSize gauge
# TYPE redis_connectionPool_numWaiters gauge
# TYPE redis_connectionPool_minIdle gauge
# TYPE redis_connectionPool_numActive gauge
# TYPE threadpool_poolSize gauge
# TYPE threadpool_activeCount gauge
# TYPE redis_connectionPool_maxIdle gauge
# TYPE redis_connectionPool_numIdle gauge
Getting NaNs for:
pollingMonitor_newItems
pollingMonitor_itemsOverThreshold
When the Observability plugin is configured for all services (regardless of whether the plugin's NewRelic config is enabled) and the Kayenta configuration has an active NewRelic canary metric store, the plugin's apiKey configuration is loaded as the apiKey of the Kayenta canary store.
I am trying to use https://github.com/armory-plugins/armory-observability-plugin#plugin-configuration with Prometheus. We are running Spinnaker version 1.21.3 with an Operator-based installation.
I have added the required configuration for enabling the plugin to the SpinnakerService.yaml file:
spinnaker:
  extensibility:
    plugins:
      Armory.ObservabilityPlugin:
        config.metrics:
          additionalTags:
            customerEnvName: test
            customerName: testcust
          prometheus:
            enabled: true
            meterRegistryConfig.armoryRecommendedFiltersEnabled: true
        enabled: true
    repositories:
      armory-observability-plugin-releases:
        url: https://raw.githubusercontent.com/armory-plugins/armory-observability-plugin-releases/master/repositories.json
Ideally this should add the plugin, but we are getting an exception in the Operator:
com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "config.metrics" (class com.netflix.spinnaker.halyard.config.model.v1.plugins.Plugin), not marked as ignorable (5 known properties: "version", "enabled", "id", "extensions", "uiResourceLocation"]) at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: com.netflix.spinnaker.halyard.config.model.v1.node.DeploymentConfiguration["spinnaker"]->com.netflix.spinnaker.halyard.config.model.v1.node.Spinnaker["extensibility"]->com.netflix.spinnaker.halyard.config.model.v1.node.Extensibility["plugins"]->java.util.LinkedHashMap["Armory.ObservabilityPlugin"]->com.netflix.spinnaker.halyard.config.model.v1.plugins.Plugin["config.metrics"])
I also tried adding version 1.0.0 for the plugin, but got the same exception.
Is the plugin not supported on 1.21.3? Any suggestions if I am missing something or doing anything wrong?
Seeing NaN reported for several metric values...
Example:
| Element | Value |
|---|---|
| jvm_memory_used{app_kubernetes_io_name="clouddriver",armSpinSvcVer="2.22.11",container="clouddriver",customerEnvName="testing",customerName="jasonmcintosh",endpoint="7002",id="CodeHeap 'non-nmethods'",instance="10.1.15.14:7002",job="clouddriver",lib="aop",libVer="v1.1.3",memtype="NON_HEAP",namespace="spinnaker",ossSpinSvcVer="6.11.1-20201012160101",pod="spin-clouddriver-8878c977b-nnxjm",spinSvc="clouddriver",spinnakerRelease="1.22.2",version="6.11.1-20201012160101"} | NaN |
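When chasing NaN values like the ones reported here, it can help to list every affected series straight from the scraped text. The following is a minimal sketch, not part of the plugin: the class and method names are hypothetical, and it assumes each sample line ends with its value (no trailing timestamp).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper, not part of the plugin. */
final class NanSampleFinder {

    /** Returns the series (name plus labels) of every sample whose value is NaN. */
    static List<String> findNanSeries(String exposition) {
        List<String> nanSeries = new ArrayList<>();
        for (String rawLine : exposition.split("\n")) {
            String line = rawLine.trim();
            // Skip blank lines and "# HELP" / "# TYPE" comment lines.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Assumption: the value is the last whitespace-separated token.
            int lastSpace = line.lastIndexOf(' ');
            if (lastSpace > 0 && "NaN".equals(line.substring(lastSpace + 1))) {
                nanSeries.add(line.substring(0, lastSpace).trim());
            }
        }
        return nanSeries;
    }

    public static void main(String[] args) {
        String sample = "# TYPE jvm_memory_used gauge\n"
                + "jvm_memory_used{id=\"CodeHeap 'non-nmethods'\"} NaN\n"
                + "jvm_memory_used{id=\"Metaspace\"} 1024.0\n";
        findNanSeries(sample).forEach(System.out::println);
    }
}
```

The output of a `wget -q -O - localhost:7002/aop-prometheus` call could be fed to `findNanSeries` to see which meters are affected.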
Spinnaker code doesn't just use TimeUnit.SECONDS throughout; various other units such as TimeUnit.MILLISECONDS and TimeUnit.NANOSECONDS are also used, for example:
registry.timer(connectionAcquiredId).record(elapsedAcquiredNanos, TimeUnit.NANOSECONDS)
In the plugin, we only use seconds, and thus our metric values can be off.
I don't know if we can reliably determine the unit of time and include in our metric name per Prometheus Best Practices.
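If the unit passed to `record(...)` were available at the point where a value is exported, converting to seconds would be straightforward. The sketch below only illustrates that conversion; `DurationNormalizer` is a hypothetical helper, not plugin code, and it does not answer the open question of how to discover the unit reliably.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of the plugin: normalizes recorded durations to seconds. */
final class DurationNormalizer {
    private static final double NANOS_PER_SECOND = 1_000_000_000.0;

    /** Converts an amount recorded in the given unit to fractional seconds. */
    static double toSeconds(long amount, TimeUnit unit) {
        // Going through nanoseconds keeps the sub-second part that
        // TimeUnit.toSeconds(...) would truncate.
        return unit.toNanos(amount) / NANOS_PER_SECOND;
    }

    public static void main(String[] args) {
        System.out.println(toSeconds(1_500, TimeUnit.MILLISECONDS));
        System.out.println(toSeconds(250_000_000L, TimeUnit.NANOSECONDS));
    }
}
```

With a helper like this, a value recorded as `TimeUnit.NANOSECONDS` and one recorded as `TimeUnit.MILLISECONDS` would land on the same scale before being published under a `_seconds` name.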
Potential options:
_seconds) and require users of the plugin to discover the unit through Spinnaker source code? I am happy to work through the spinnaker-mixin dashboards and alerts.
I've configured DataDog using the supplied documentation and left stepInSeconds at its default of 30s.
However, I've started getting some 'Payload too large' errors in Orca:
2022-12-02 20:41:22.224 ERROR 1 --- [trics-publisher] i.m.datadog.DatadogMeterRegistry : [] failed to send metrics to datadog: {"status":"error","code":413,"errors":["Payload too large"],"statuspage":"http://status.datadoghq.com","twitter":"http://twitter.com/datadogops","email":"[email protected]"}
2022-12-02 20:42:22.208 ERROR 1 --- [trics-publisher] i.m.datadog.DatadogMeterRegistry : [] failed to send metrics to datadog: {"status":"error","code":413,"errors":["Payload too large"],"statuspage":"http://status.datadoghq.com","twitter":"http://twitter.com/datadogops","email":"[email protected]"}
2022-12-02 20:43:22.214 ERROR 1 --- [trics-publisher] i.m.datadog.DatadogMeterRegistry : [] failed to send metrics to datadog: {"status":"error","code":413,"errors":["Payload too large"],"statuspage":"http://status.datadoghq.com","twitter":"http://twitter.com/datadogops","email":"[email protected]"}
A few things I've noticed:
stepInSeconds is not referenced by the RegistryConfig, and Step is not configured
Exposing Prometheus metrics on the same port but a different path to a Spinnaker service's API endpoint is simple.
However, it exposes a service's internal APIs, potentially unauthenticated, to the Prometheus system and to the operators of Prometheus.
If Prometheus is compromised, an attacker could leverage Prometheus' service discovery abilities to find Spinnaker endpoints and access, say, the Fiat service to modify permissions.
Prometheus acts like Everyone Else in the diagrams here: https://spinnaker.io/setup/security/authorization/#requirements
Personally, I prefer a different port by default: convention over configuration.
It is simpler for example Prometheus configuration files, issues, etc.
Avoid collision with:
Process:
9710 (I'm happy to PR this if acceptable)
Endpoint with a Custom Port - eg revert?: 53fed00 (/prometheus path is good too but anything ok)
PodMonitor to suit this port (I'm happy to PR this if acceptable)
It would be really nice if, at a minimum, we ensured that all errors have an error GUID that shows up in API responses and is displayed in Deck, so that logs can be correlated to errors quickly.
I want to do a spike to see what is possible with this plugin, Backstopper, and Deck.
Not sure if this would be effectively solved by enabling distributed tracing or not?
CC: @nicmunroe
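We likely need to allow access without auth by using HttpSecurity to add this as a path. Per the docs, something like:
import org.springframework.boot.actuate.autoconfigure.security.servlet.EndpointRequest;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.WebSecurityConfigurerAdapter;

@Configuration(proxyBeanMethods = false)
public class ActuatorSecurity extends WebSecurityConfigurerAdapter {
    @Override
    protected void configure(HttpSecurity http) throws Exception {
        // Permit unauthenticated access to every actuator endpoint.
        http.requestMatcher(EndpointRequest.toAnyEndpoint())
            .authorizeRequests((requests) -> requests.anyRequest().permitAll());
    }
}
This will likely be needed to allow access to Gate. I'd assume this should be configurable/configured.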
I tried armory-observability-plugin, but it didn’t work well.
$ hal deploy apply
- Get current deployment
Failure
Validation in Global:
! ERROR Could not translate your halconfig: Unrecognized field
"config.metrics" (class
com.netflix.spinnaker.halyard.config.model.v1.plugins.Plugin), not marked as
ignorable (5 known properties: "version", "enabled", "id", "extensions",
"uiResourceLocation"])
at [Source: UNKNOWN; line: -1, column: -1] (through reference chain:
com.netflix.spinnaker.halyard.config.model.v1.node.Halconfig["deploymentConfigurations"]->java.util.ArrayList[0]->com.netflix.spinnaker.halyard.config.model.v1.node.DeploymentConfiguration["spinnaker"]->com.netflix.spinnaker.halyard.config.model.v1.node.Spinnaker["extensibility"]->com.netflix.spinnaker.halyard.config.model.v1.node.Extensibility["plugins"]->java.util.LinkedHashMap["armory.observability-plugin"]->com.netflix.spinnaker.halyard.config.model.v1.plugins.Plugin["config.metrics"])
I think the config example in the README is wrong, because config.metrics must come under extensions as follows.
However, I couldn't find the value for extensions. (Ref: https://spinnaker.io/guides/user/plugins/)
spinnaker:
  extensibility:
    plugins:
      Armory.ObservabilityPlugin:
        enabled: true
        extensions:
          Armory.observabilityPlugin:
            config.metrics:
              additionalTags:
                customerName: armory
                customerEnvName: production
              prometheus:
                enabled: true
                meterRegistryConfig.armoryRecommendedFiltersEnabled: true
java.lang.IllegalArgumentException: Prometheus requires that all meters with the same name have the same set of tag keys. There is already an existing meter named 'kubernetes_api_seconds' containing tag keys [account, action, applicationName, armoryAppVersion, customerEnvName, customerName, hostname, kinds, lib, libVersion, namespace, ossAppVersion, spinnakerRelease, success, version]. The meter you are attempting to register has keys [account, action, applicationName, armoryAppVersion, customerEnvName, customerName, hostname, kinds, lib, libVersion, namespace, ossAppVersion, reason, spinnakerRelease, success, version].
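The two key lists in the exception differ only by the `reason` tag. One possible workaround is to pad every meter of a given name to the same key set before it is registered. The sketch below shows only that padding step; `TagKeyAligner` and the `none` placeholder value are assumptions for illustration, not existing plugin behavior.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper, not part of the plugin: pads tags to a fixed key set. */
final class TagKeyAligner {
    /** Assumed placeholder for tags a meter does not carry. */
    static final String PLACEHOLDER = "none";

    /** Returns a sorted copy of tags that contains every key in requiredKeys. */
    static SortedMap<String, String> align(Map<String, String> tags, Set<String> requiredKeys) {
        SortedMap<String, String> aligned = new TreeMap<>(tags);
        for (String key : requiredKeys) {
            // Existing values win; only missing keys receive the placeholder.
            aligned.putIfAbsent(key, PLACEHOLDER);
        }
        return aligned;
    }

    public static void main(String[] args) {
        Set<String> keys = Set.of("account", "action", "reason", "success");
        Map<String, String> withoutReason = Map.of("account", "prod", "action", "deploy", "success", "true");
        System.out.println(align(withoutReason, keys));
    }
}
```

Applied to the case above, the successful `kubernetes_api_seconds` meter would carry `reason="none"` and so share its key set with the failing one.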
Without any Accept header, the aop-prometheus endpoint returns a successful response:
orca> wget -S http://localhost:8083/aop-prometheus
Connecting to localhost:8083 ([::1]:8083)
HTTP/1.1 200
Content-Type: text/plain; version=0.0.4;charset=utf-8
Content-Length: 31287
...
When the Accept header is set, a 406 error response is returned:
orca> wget -S --header 'Accept: text/plain; version=0.0.4' http://localhost:8083/aop-prometheus
Connecting to localhost:8083 ([::1]:8083)
HTTP/1.1 406
wget: server returned error: HTTP/1.1 406
The prometheus spring-boot WebEndpoint doesn't have this issue. Looking at the code for the aop endpoint, the Accept header may not have been implemented due to issues with plugins/WebEndpoint.
I'm new to Spinnaker metrics and am using v1.26.7.
I set the configuration below in halyard and the related profiles:
management:
  endpoints:
    web:
      exposure.include: health,info,aop-prometheus
spinnaker:
  extensibility:
    plugins:
      Armory.ObservabilityPlugin:
        enabled: true
        version: v1.4.1
        config:
          metrics:
            additionalTags:
              customerEnvName: preprod
            prometheus:
              enabled: true
    repositories:
      armory-observability-plugin-releases:
        url: https://raw.githubusercontent.com/armory-plugins/armory-observability-plugin-releases/master/repositories.json
I got an error when I hit the endpoint:
$ curl -k http://localhost:8083/aop-prometheus
{"timestamp":1676514581933,"status":403,"error":"Forbidden","message":"Access Denied"}⏎
I also found this line in its log:
2023-02-16 02:26:53.306 WARN 1 --- [ main] o.r.ArmoryObservabilityCompositeRegistry : None of the supported Armory Observability Plugin registries where enabled defaulting a Simple Meter Registry which Spectator will use.
Do I have to do something more to make this work?
More logs here:
2023-02-16 02:26:47.925 INFO 1 --- [ main] c.n.s.config.PluginsAutoConfiguration : Enabling spinnaker-official and spinnaker-community plugin repositories
2023-02-16 02:26:47.956 INFO 1 --- [ main] org.pf4j.AbstractPluginManager : No plugins
2023-02-16 02:26:51.303 INFO 1 --- [ main] org.pf4j.util.FileUtils : Expanded plugin zip 'Armory.ObservabilityPlugin-armory-observability-plugin-v1.4.1.zip' in 'Armory.ObservabilityPlugin-armory-observabi
lity-plugin-v1.4.1'
2023-02-16 02:26:51.306 INFO 1 --- [ main] org.pf4j.util.FileUtils : Expanded plugin zip 'echo.zip' in 'echo'
2023-02-16 02:26:51.311 INFO 1 --- [ main] org.pf4j.AbstractPluginManager : Plugin '[email protected]' resolved
2023-02-16 02:26:52.173 INFO 1 --- [ main] org.pf4j.util.FileUtils : Expanded plugin zip 'Aws.LambdaDeploymentPlugin-aws-lambda-deployment-plugin-spinnaker-1.0.5.zip' in 'Aws.LambdaDeploymentPlugin-aws-la
mbda-deployment-plugin-spinnaker-1.0.5'
2023-02-16 02:26:52.174 WARN 1 --- [ main] c.n.s.k.p.bundle.PluginBundleExtractor : Downloaded plugin bundle 'Aws.LambdaDeploymentPlugin-aws-lambda-deployment-plugin-spinnaker-1.0.5.zip' does not have plugin for service
: echo
2023-02-16 02:26:52.175 INFO 1 --- [ main] org.pf4j.AbstractPluginManager : Start plugin '[email protected]'
2023-02-16 02:26:52.181 INFO 1 --- [ main] o.s.c.a.ConfigurationClassPostProcessor : Cannot enhance @Configuration bean definition 'application' since its singleton instance has been created too early. The typical cause
is a non-static @Bean method with a BeanDefinitionRegistryPostProcessor return type: Consider declaring such methods as 'static'.
2023-02-16 02:26:52.182 INFO 1 --- [ main] o.s.c.a.ConfigurationClassPostProcessor : Cannot enhance @Configuration bean definition 'com.netflix.spinnaker.kork.PlatformComponents' since its singleton instance has been cre
ated too early. The typical cause is a non-static @Bean method with a BeanDefinitionRegistryPostProcessor return type: Consider declaring such methods as 'static'.
2023-02-16 02:26:52.518 INFO 1 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'gcsSecretEngine' of type [com.netflix.spinnaker.kork.secrets.engines.GcsSecretEngine] is not eligible for getting processed by al
l BeanPostProcessors (for example: not eligible for auto-proxying)
2023-02-16 02:26:52.519 INFO 1 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'secretsManagerSecretEngine' of type [com.netflix.spinnaker.kork.secrets.engines.SecretsManagerSecretEngine] is not eligible for g
etting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2023-02-16 02:26:52.522 INFO 1 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 's3SecretEngine' of type [com.netflix.spinnaker.kork.secrets.engines.S3SecretEngine] is not eligible for getting processed by all
BeanPostProcessors (for example: not eligible for auto-proxying)
2023-02-16 02:26:52.522 INFO 1 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'noopSecretEngine' of type [com.netflix.spinnaker.kork.secrets.engines.NoopSecretEngine] is not eligible for getting processed by
all BeanPostProcessors (for example: not eligible for auto-proxying)
2023-02-16 02:26:52.523 INFO 1 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'secretEngineRegistry' of type [com.netflix.spinnaker.kork.secrets.SecretEngineRegistry] is not eligible for getting processed by
all BeanPostProcessors (for example: not eligible for auto-proxying)
2023-02-16 02:26:52.523 INFO 1 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'secretManager' of type [com.netflix.spinnaker.kork.secrets.SecretManager] is not eligible for getting processed by all BeanPostPr
ocessors (for example: not eligible for auto-proxying)
2023-02-16 02:26:52.560 INFO 1 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.security.config.annotation.configuration.ObjectPostProcessorConfiguration' of type [org.springframework.secur
ity.config.annotation.configuration.ObjectPostProcessorConfiguration] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2023-02-16 02:26:52.668 INFO 1 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'objectPostProcessor' of type [org.springframework.security.config.annotation.configuration.AutowireBeanFactoryObjectPostProcessor
] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2023-02-16 02:26:52.671 INFO 1 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.security.access.expression.method.DefaultMethodSecurityExpressionHandler@7e32442d' of type [org.springframewo
rk.security.access.expression.method.DefaultMethodSecurityExpressionHandler] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2023-02-16 02:26:52.675 INFO 1 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.security.config.annotation.method.configuration.GlobalMethodSecurityConfiguration' of type [org.springframewo
rk.security.config.annotation.method.configuration.GlobalMethodSecurityConfiguration] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2023-02-16 02:26:52.684 INFO 1 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'methodSecurityMetadataSource' of type [org.springframework.security.access.method.DelegatingMethodSecurityMetadataSource] is not
eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2023-02-16 02:26:53.019 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8089 (http)
2023-02-16 02:26:53.030 INFO 1 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2023-02-16 02:26:53.030 INFO 1 --- [ main] org.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/9.0.31]
2023-02-16 02:26:53.112 INFO 1 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2023-02-16 02:26:53.112 INFO 1 --- [ main] o.s.web.context.ContextLoader : Root WebApplicationContext: initialization completed in 6852 ms
2023-02-16 02:26:53.306 WARN 1 --- [ main] o.r.ArmoryObservabilityCompositeRegistry : None of the supported Armory Observability Plugin registries where enabled defaulting a Simple Meter Registry which Spectator will use.
2023-02-16 02:26:53.314 INFO 1 --- [ main] a.p.o.r.AddDefaultTagsRegistryCustomizer : Adding default tags to registry: SimpleMeterRegistry
2023-02-16 02:26:53.315 WARN 1 --- [ main] i.a.p.observability.service.TagsService : You can ignore the following warning if you are not running an Armory Wrapper Spinnaker Service for Spinnaker >= 2.19
2023-02-16 02:26:53.315 WARN 1 --- [ main] i.a.p.observability.service.TagsService : Failed to load META-INF/build-info.properties, msg: inStream parameter is null
2023-02-16 02:26:53.321 INFO 1 --- [ main] i.a.p.observability.service.TagsService : Adding default tag hostname: spin-echo-6d8f667f84-pjfps to default tags list.
2023-02-16 02:26:53.321 INFO 1 --- [ main] i.a.p.observability.service.TagsService : Adding default tag lib: aop to default tags list.
2023-02-16 02:26:53.321 INFO 1 --- [ main] i.a.p.observability.service.TagsService : Adding default tag customerEnvName: preprod to default tags list.
2023-02-16 02:26:53.321 INFO 1 --- [ main] i.a.p.observability.service.TagsService : Adding default tag spinSvc: echo to default tags list.
2023-02-16 02:26:53.321 INFO 1 --- [ main] i.a.p.observability.service.TagsService : Adding default tag version: 1.0.0 to default tags list.
2023-02-16 02:26:53.321 INFO 1 --- [ main] i.a.p.observability.service.TagsService : Adding default tag libVer: v1.4.1 to default tags list.
2023-02-16 02:26:53.322 INFO 1 --- [ main] i.a.p.o.r.AddFiltersRegistryCustomizer : Adding Meter Filters to registry: SimpleMeterRegistry
Armory Observability Plugin version: v1.1.1-RC2
Spinnaker version: 1.22.1
We are using nri-prometheus, New Relic's OpenMetrics Prometheus integration, to scrape Prometheus metrics from endpoints.
The integration is unable to parse metrics from the orca endpoint, due to this error:
text format parsing error in line 289: second TYPE line for metric name \"stage_invocations_total\", or TYPE reported after samples
Manually inspecting the metrics endpoint confirms that stage_invocations_total is defined multiple times, possibly once per application.
This appears to start happening when you trigger a pipeline on 2 different applications.
I also confirmed this issue exists in plugin version v1.0.0.
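To confirm which metric names trip the parser, the scraped text can be checked for repeated TYPE lines. This is a small diagnostic sketch under the assumption that TYPE lines have the usual `# TYPE <name> <type>` shape; `ExpositionChecker` is a hypothetical name, not part of the plugin or of nri-prometheus.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical diagnostic helper: finds metric names declared by more than one TYPE line. */
final class ExpositionChecker {

    /** Returns metric names with a second (or later) TYPE line, in order of detection. */
    static Set<String> duplicateTypeNames(String exposition) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String line : exposition.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            // A TYPE line looks like: "# TYPE stage_invocations_total counter"
            if (parts.length >= 4 && parts[0].equals("#") && parts[1].equals("TYPE")) {
                if (!seen.add(parts[2])) {
                    duplicates.add(parts[2]);
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String sample = "# TYPE stage_invocations_total counter\n"
                + "stage_invocations_total{application=\"app1\"} 1.0\n"
                + "# TYPE stage_invocations_total counter\n"
                + "stage_invocations_total{application=\"app2\"} 1.0\n";
        System.out.println(duplicateTypeNames(sample));
    }
}
```

Running it against the orca endpoint output before and after triggering a pipeline in a second application would show whether the duplicate declaration appears at that point.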
The Prometheus server expects tag keys in snake_case; otherwise they are dropped during the scrape. Specifically, for CloudFoundry okhttp metrics we have seen that some tag keys do not follow this format (for example target.host, which contains a dot):
# HELP cf_okhttp_requests_seconds
# TYPE cf_okhttp_requests_seconds summary
cf_okhttp_requests_seconds_count{host="host",lib="aop",libVer="v1.4.2-SNAPSHOT",method="POST",spinSvc="clouddriver",status="IO_ERROR",target.host="host",target.port="443",target.scheme="https",uri="none",} 21.0
cf_okhttp_requests_seconds_sum{host="host",lib="aop",libVer="v1.4.2-SNAPSHOT",method="POST",spinSvc="clouddriver",status="IO_ERROR",target.host="host",target.port="443",target.scheme="https",uri="none",} 6.755297792
# HELP cf_okhttp_requests_seconds_max
# TYPE cf_okhttp_requests_seconds_max gauge
cf_okhttp_requests_seconds_max{host="host",lib="aop",libVer="v1.4.2-SNAPSHOT",method="POST",spinSvc="clouddriver",status="IO_ERROR",target.host="host",target.port="443",target.scheme="https",uri="none",} 0.981907154
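Prometheus label names traditionally have to match `[a-zA-Z_][a-zA-Z0-9_]*`, which keys such as `target.host` do not. A minimal sketch of a sanitizing step follows; `LabelNames.sanitize` is a hypothetical helper shown for illustration, not the plugin's actual naming convention.

```java
/** Hypothetical helper, not part of the plugin: makes tag keys valid Prometheus label names. */
final class LabelNames {

    /** Replaces characters outside [a-zA-Z0-9_] with '_' and guards against a leading digit. */
    static String sanitize(String key) {
        String cleaned = key.replaceAll("[^a-zA-Z0-9_]", "_");
        // A label name must not be empty and must not start with a digit.
        if (cleaned.isEmpty() || Character.isDigit(cleaned.charAt(0))) {
            cleaned = "_" + cleaned;
        }
        return cleaned;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("target.host"));
        System.out.println(sanitize("target.port"));
        System.out.println(sanitize("libVer"));
    }
}
```

Keys that are already valid, such as `libVer` or `spinSvc`, pass through unchanged, so only the dotted okhttp keys would be rewritten.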
2020-08-21 16:11:25.000 WARN 1 --- [ main] o.s.boot.actuate.endpoint.EndpointId : Endpoint ID 'aop-prometheus' contains invalid characters, please migrate to a valid format.
Should change the endpoint-id at some future release...
Hi,
Could you also support DataDog as target like Newrelic?
There's existing DataDog support in the micrometer project https://github.com/micrometer-metrics/micrometer/tree/master/implementations/micrometer-registry-datadog/src/main/java/io/micrometer/datadog
Best regards,
Johannes