cryostat-agent's Introduction

cryostat-agent

Discovery agent plugin for Cryostat.

Cryostat has a Discovery API to allow service locator bridges ("Discovery Plugins") to inform Cryostat about the presence of connectable JVM applications. These Discovery Plugins may be implemented in a way that bridges a service locator system to Cryostat's API, or the Discovery Plugin may be implemented on individual application instances so that the applications may self-publish themselves to the Cryostat Discovery API. This agent implements a Discovery Plugin as an attachable JVM agent that can be included in a target application to enhance it for self-publishing its location to Cryostat.

REQUIREMENTS

Run Requirements

  • JDK11+
  • a Cryostat server instance

The Cryostat project follows semantic versioning. Generally, each Cryostat Agent minor version is developed against, and compatible with, a specific Cryostat server minor release version. Version combinations outside of this matrix may work but are neither tested nor supported. If you run into any issues, please check whether a version upgrade is available and ensure that your Agent and server versions match this matrix.

Cryostat Release   Server version   Agent version
< 2.3.0            ≤ v2.2.1         N/A
2.3.0              v2.3.0           v0.2.0
2.3.1              v2.3.1           v0.2.3
2.4.0              v2.4.0           v0.3.0

Build Requirements

  • Git
  • OpenJDK11+
  • Maven 3+

Run Requirements:

  • An OpenJDK11+ application JVM to attach this agent to
  • Configuration for the application JVM to load this agent
  • A Cryostat server instance

Run

An example for configuring a Quarkus application to use this agent and enable JMX:

JAVA_OPTIONS="-Dcom.sun.management.jmxremote.port=9091 -Dcom.sun.management.jmxremote.rmi.port=9091 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -javaagent:/deployments/app/cryostat-agent-${CRYOSTAT_AGENT_VERSION}.jar"

This assumes that the agent JAR has been included in the application image within /deployments/app/ or mounted as a volume at the same location.

The agent JAR may also be loaded and dynamically attached to an already-running JVM. In this case, the agent JAR must again already be included in the application image or mounted as a volume into the container at the same location. If this requirement is met, the host JVM application may be started first, without the Cryostat Agent, using whatever its own standard launcher process looks like. Once that application is running, the Cryostat Agent can be launched as a separate process and asked to dynamically attach to the host application:

$ java -jar /path/to/cryostat-agent.jar -Dcryostat.agent.baseuri=http://cryostat.local

In this dynamic attachment mode, the agent configuration options can be specified using the -D/property flag. These must be placed after -jar /path/to/cryostat-agent.jar so that they are passed as arguments to the agent launcher. If they are placed before -jar, the java process treats them as system properties for the agent launcher itself rather than passing them on to the injected instances.

Smart triggers can be specified using --smartTrigger.

The optional PID is a positional argument. It may be omitted, or set to one of the following: 0, to request that the Agent launcher attempt to find exactly one candidate JVM application to dynamically attach to, exiting if zero or more than one application is found; *, to request that the Agent launcher attempt to dynamically attach to every JVM application it finds; or a specific PID, to request that the Agent launcher attempt to dynamically attach only to that one PID.
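The three PID argument forms can be illustrated with a minimal sketch. PidSelector and its select method are hypothetical names used only for illustration; they are not taken from the Agent's source, and the real launcher may discover and filter candidate JVMs differently.

```java
import java.util.List;

/** Hypothetical helper illustrating the PID argument rules; not the Agent's actual code. */
final class PidSelector {

    /**
     * Resolves the positional PID argument against the PIDs of the candidate JVMs.
     *
     * @param pidArg     "0", "*", or a specific PID
     * @param candidates PIDs of the JVM applications the launcher found
     * @return the PIDs that the launcher should try to attach to
     */
    static List<Long> select(String pidArg, List<Long> candidates) {
        if (pidArg.equals("*")) {
            // Attach to every JVM application that was found.
            return List.copyOf(candidates);
        }
        if (pidArg.equals("0")) {
            // Exactly one candidate is required; anything else is an error.
            if (candidates.size() != 1) {
                throw new IllegalStateException(
                        "expected exactly one candidate JVM, found " + candidates.size());
            }
            return List.of(candidates.get(0));
        }
        // Otherwise the argument names one specific PID.
        return List.of(Long.parseLong(pidArg));
    }

    public static void main(String[] args) {
        System.out.println(select("*", List.of(101L, 202L)));
        System.out.println(select("202", List.of(101L, 202L)));
    }
}
```

In this sketch the "exit" behaviour for 0 is modelled as an exception so that the caller decides how to terminate.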

Harvester

The various cryostat.agent.harvester.* properties may be used to configure cryostat-agent to start a new Flight Recording using a given event template on Agent initialization, and to periodically collect this recorded data and push it to the Agent's associated Cryostat server. The Agent will also attempt to push the tail end of this recording on JVM shutdown so that the cause of an unexpected JVM shutdown might be captured for later analysis.

Smart Triggers

cryostat-agent supports Smart Triggers that listen to the values of the MBean Counters and can start recordings based on a set of constraints specified by the user.

The general form of a Smart Trigger expression is as follows:

[constraint1(&&/||)constraint2...constraintN;durationConstraint]~recordingTemplateNameOrLabel

Either the filename or label XML tag of the ${templateName}.jfc may be used to specify the event template to use. For example, the JDK distribution ships with a default.jfc file containing the top-level <configuration label="Continuous"> element. This template may be specified in the Smart Trigger definition as any of default.jfc, default, or Continuous.

An example for listening to CPU Usage and starting a recording using the Profiling template when the process CPU load exceeds 0.2 (the metric is a fraction between 0 and 1, so this is 20%):

[ProcessCpuLoad>0.2]~profile

An example for watching for the Thread Count to exceed 20 for longer than 10 seconds and starting a recording using the Continuous template:

[ThreadCount>20;TargetDuration>duration("10s")]~Continuous

The first part of the condition before the semicolon is a Common Expression Language expression for testing various MBean metrics. The second part after the semicolon references a special variable, TargetDuration, which tracks the length of time that the first part of the condition has tested true for. This is converted to a java.time.Duration object and compared to duration("10s"), a special construct that is also converted into a java.time.Duration object representing the time threshold before this trigger activates. The duration() construct requires a String argument, which may be enclosed in single ' or double " quotation marks.
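The TargetDuration behaviour can be sketched as a small tracker that is updated on every evaluation. This is an illustration only: TargetDurationTracker is a hypothetical class, and it assumes that the tracked duration resets to zero whenever the condition tests false, which the text above does not state explicitly.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of TargetDuration tracking; not the Agent's actual implementation. */
final class TargetDurationTracker {
    // Time at which the condition first tested true in the current streak, or null.
    private Instant firstTrue;

    /** Returns how long the condition has been continuously true as of {@code now}. */
    Duration update(boolean conditionMet, Instant now) {
        if (!conditionMet) {
            firstTrue = null; // assumption: a false result resets the streak
            return Duration.ZERO;
        }
        if (firstTrue == null) {
            firstTrue = now;
        }
        return Duration.between(firstTrue, now);
    }

    /** Mirrors "condition;TargetDuration>duration(threshold)". */
    boolean shouldFire(boolean conditionMet, Instant now, Duration threshold) {
        return update(conditionMet, now).compareTo(threshold) > 0;
    }

    public static void main(String[] args) {
        TargetDurationTracker tracker = new TargetDurationTracker();
        Instant start = Instant.now();
        Duration threshold = Duration.ofSeconds(10);
        System.out.println(tracker.shouldFire(true, start, threshold));
        System.out.println(tracker.shouldFire(true, start.plusSeconds(11), threshold));
    }
}
```

Because the tracker only sees the condition at evaluation times, a brief dip below the threshold between two evaluations would go unnoticed.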

Smart Triggers may define more complex conditions that test multiple metrics:

[(ProcessCpuLoad>0.5||SystemCpuLoad>0.25)&&HeapMemoryUsagePercent>0.1;TargetDuration>duration('1m')]~Continuous

These may be passed as an argument to the Cryostat Agent, for example:

JAVA_OPTIONS="-javaagent:-Dcryostat.agent.baseuri=http://cryostat.local!/deployments/app/cryostat-agent-${CRYOSTAT_AGENT_VERSION}.jar=[ProcessCpuLoad>0.2]~profile"

(note the '!' separator between system properties overrides and Smart Triggers)

or as a configuration property:

CRYOSTAT_AGENT_SMART_TRIGGER_DEFINITIONS="[ProcessCpuLoad>0.2&&TargetDuration>duration(\"1m\")]~default.jfc"

-Dcryostat.agent.smart-trigger.definitions="[ProcessCpuLoad>0.2&&TargetDuration>duration(\"1m\")]~default.jfc"

Multiple Smart Trigger definitions may be specified and separated by commas, for example:

[ProcessCpuLoad>0.2]~profile,[ThreadCount>30]~Continuous
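To make the definition syntax concrete, here is a minimal sketch that splits a comma-separated list of definitions into the expression, the optional duration constraint, and the template name. SmartTriggerParser and Definition are hypothetical names, and the regular expression is an assumption that covers only the forms shown in this document; the Agent's real parser may accept more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for "[expression;durationConstraint]~template" definitions. */
final class SmartTriggerParser {

    static final class Definition {
        final String expression;
        final String durationConstraint; // empty when no ';' part is present
        final String template;

        Definition(String expression, String durationConstraint, String template) {
            this.expression = expression;
            this.durationConstraint = durationConstraint;
            this.template = template;
        }
    }

    // "[" body "]~" template, where the template name runs up to the next comma.
    private static final Pattern DEFINITION = Pattern.compile("\\[(.+?)\\]~([^,\\s]+)");

    static List<Definition> parse(String definitions) {
        List<Definition> parsed = new ArrayList<>();
        Matcher m = DEFINITION.matcher(definitions);
        while (m.find()) {
            String body = m.group(1);
            int semi = body.indexOf(';');
            String expression = semi < 0 ? body : body.substring(0, semi);
            String duration = semi < 0 ? "" : body.substring(semi + 1);
            parsed.add(new Definition(expression, duration, m.group(2)));
        }
        return parsed;
    }

    public static void main(String[] args) {
        for (Definition d : parse("[ProcessCpuLoad>0.2]~profile,[ThreadCount>30]~Continuous")) {
            System.out.println(d.expression + " -> " + d.template);
        }
    }
}
```

Matching whole definitions rather than splitting on every comma keeps the sketch from breaking on commas inside an expression, although it would still mishandle a semicolon inside a quoted string.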

NOTE: Smart Triggers are evaluated on a polling basis. The poll period is configurable (see list below). This means that your conditions are subject to sampling biases.

Harvester Integration

Any Flight Recordings created by Smart Trigger will also be tracked by the Harvester system. This data will be captured in a JFR Snapshot and pushed to the server on the Harvester's usual schedule. By defining Smart Triggers and a Harvester period without a Harvester template, you can achieve a setup where dynamically-started Flight Recordings begin when trigger conditions are met, and their data is then periodically captured until the recording is manually stopped or the host JVM shuts down.

Configuration

cryostat-agent uses smallrye-config for configuration. Below is a list of configuration properties that can be used to influence how cryostat-agent runs and how it advertises itself to a Cryostat server instance. Properties that require configuration are indicated with a checked box.

  • cryostat.agent.baseuri [java.net.URI]: the URL location of the Cryostat server backend that this agent advertises itself to.
  • cryostat.agent.baseuri-range [String]: a String representing the io.cryostat.agent.ConfigModule.UriRange enum level that restricts the acceptable hosts specified in the cryostat.agent.baseuri property. This is used to control the server locations that this Cryostat Agent instance is willing to register itself with. Default dns_local, which means any IP or hostname that is or resolves to localhost, a link-local IP address, an IP address from a private range block, or a hostname ending in .local will be accepted. If a cryostat.agent.baseuri is specified with a host outside of this range then the Agent will refuse to start. Acceptable values are: loopback, link_local, site_local, dns_local, and public. Each higher/more relaxed level implies that each lower level is also acceptable.
  • cryostat.agent.callback [java.net.URI]: a URL pointing back to this agent, ex. "https://12.34.56.78:1234/". Cryostat will use this URL to perform health checks and request updates from the agent. This reflects the externally-visible IP address/hostname and port where this application and agent can be found.
  • cryostat.agent.api.writes-enabled [boolean]: Control whether the agent accepts "write" or mutating operations on its HTTP API. Requests for remote operations such as dynamically starting Flight Recordings will be rejected unless this is set. Default false.
  • cryostat.agent.instance-id [String]: a unique ID for this agent instance. This will be used to uniquely identify the agent in the Cryostat discovery database, as well as to unambiguously match its encrypted stored credentials. The default is a random UUID string. It is not recommended to override this value.
  • cryostat.agent.hostname [String]: the hostname for this application instance. This will be used for the published JMX connection URL. If not provided then the default is to attempt to resolve the localhost hostname.
  • cryostat.agent.realm [String]: the Cryostat Discovery API "realm" that this agent belongs to. This should be unique per agent instance. The default is the value of cryostat.agent.app.name.
  • cryostat.agent.authorization [String]: Authorization header value to include with API requests to the Cryostat server, ex. Bearer abcd1234. Default None.
  • cryostat.agent.webclient.ssl.trust-all [boolean]: Control whether the agent trusts all certificates presented by the Cryostat server. Default false. This should only be overridden for development and testing purposes, never in production.
  • cryostat.agent.webclient.ssl.verify-hostname [boolean]: Control whether the agent verifies hostnames on certificates presented by the Cryostat server. Default true. This should only be overridden for development and testing purposes, never in production.
  • cryostat.agent.webclient.connect.timeout-ms [long]: the duration in milliseconds to wait for HTTP requests to the Cryostat server to connect. Default 1000.
  • cryostat.agent.webclient.response.timeout-ms [long]: the duration in milliseconds to wait for HTTP requests to the Cryostat server to respond. Default 1000.
  • cryostat.agent.webserver.host [String]: the internal hostname or IP address for the embedded webserver to bind to. Default 0.0.0.0.
  • cryostat.agent.webserver.port [int]: the internal port number for the embedded webserver to bind to. Default 9977.
  • cryostat.agent.webserver.credentials.user [String]: the username used for Basic authorization on the embedded webserver. Default user.
  • cryostat.agent.webserver.credentials.pass.length [int]: the length of the generated password used for Basic authorization on the embedded webserver. Default 24.
  • cryostat.agent.webserver.credentials.pass.hash-function [String]: the name of the hash function to use when generating passwords. Default SHA-256.
  • cryostat.agent.app.name [String]: a human-friendly name for this application. Default cryostat-agent.
  • cryostat.agent.app.jmx.port [int]: the JMX RMI port that the application is listening on. The default is to attempt to determine this from the com.sun.management.jmxremote.port system property.
  • cryostat.agent.registration.retry-ms [long]: the duration in milliseconds between attempts to register with the Cryostat server. Default 5000.
  • cryostat.agent.exit.signals [[String]]: a comma-separated list of signals that the agent should handle. When any of these signals is caught the agent initiates an orderly shutdown, deregistering from the Cryostat server and potentially uploading the latest harvested JFR data. Default INT,TERM.
  • cryostat.agent.exit.deregistration.timeout-ms [long]: the duration in milliseconds to wait for a response from the Cryostat server when attempting to deregister at shutdown time. Default 3000.
  • cryostat.agent.harvester.period-ms [long]: the length of time between JFR collections and pushes by the harvester. This also controls the maximum age of data stored in the buffer for the harvester's managed Flight Recording. Every period-ms the harvester will upload a JFR binary file to the cryostat.agent.baseuri archives. Default -1, which indicates no scheduled harvest uploading will be performed.
  • cryostat.agent.harvester.template [String]: the name of the .jfc event template configuration to use for the harvester's managed Flight Recording. Defaults to the empty string, so that no recording is started.
  • cryostat.agent.harvester.max-files [String]: the maximum number of files pushed by the agent that Cryostat will keep. This value is supplied with the harvester's push requests and instructs Cryostat to prune, in a FIFO manner, the oldest JFR files within the attached JVM target's storage for as long as the number of stored recordings is greater than this limit. Default 2147483647 (Integer.MAX_VALUE).
  • cryostat.agent.harvester.upload.timeout-ms [long]: the duration in milliseconds to wait for HTTP upload requests to the Cryostat server to complete and respond. Default 30000.
  • cryostat.agent.harvester.exit.max-age-ms [long]: the JFR maxage setting, specified in milliseconds, to apply to recording data uploaded to the Cryostat server when the JVM this Agent instance is attached to exits. This ensures that tail-end data is captured between the last periodic push and the application exit. Exit uploads only occur when the application receives SIGINT/SIGTERM from the operating system or container platform.
  • cryostat.agent.harvester.exit.max-size-b [long]: the JFR maxsize setting, specified in bytes, to apply to exit uploads as described above.
  • cryostat.agent.harvester.max-age-ms [long]: the JFR maxage setting, specified in milliseconds, to apply to periodic uploads during the application lifecycle. Defaults to 0, which is interpreted as 1.5x the harvester period (cryostat.agent.harvester.period-ms).
  • cryostat.agent.harvester.max-size-b [long]: the JFR maxsize setting, specified in bytes, to apply to periodic uploads during the application lifecycle. Defaults to 0, which means unlimited.
  • cryostat.agent.smart-trigger.definitions [String[]]: a comma-separated list of Smart Trigger definitions to load at startup. Defaults to the empty string: no Smart Triggers.
  • cryostat.agent.smart-trigger.evaluation.period-ms [long]: the length of time between Smart Trigger evaluations. Default 1000.
  • rht.insights.java.opt-out [boolean]: for the Red Hat build of Cryostat, set this to true to disable data collection for Red Hat Insights. Defaults to false. Red Hat Insights data collection is always disabled for community builds of Cryostat.
  • rht.insights.java.debug [boolean]: for the Red Hat build of Cryostat, set this to true to enable debug logging for the Red Hat Insights Java Agent. Defaults to false. Red Hat Insights data collection is always disabled for community builds of Cryostat.
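The level ordering described for cryostat.agent.baseuri-range can be sketched with the address checks that java.net.InetAddress already provides. UriRangeSketch is a hypothetical class written for illustration; it is not the io.cryostat.agent.ConfigModule.UriRange enum, whose actual checks may differ.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical sketch of the baseuri-range levels; not the Agent's real UriRange enum. */
final class UriRangeSketch {

    // Declared from most restrictive to most relaxed, so ordinal order encodes the levels.
    enum Range { LOOPBACK, LINK_LOCAL, SITE_LOCAL, DNS_LOCAL, PUBLIC }

    /** Classifies a host into the most restrictive level that would accept it. */
    static Range classify(String host) {
        InetAddress addr;
        try {
            // IP literals are parsed without a DNS lookup; hostnames are resolved.
            addr = InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            addr = null;
        }
        if (addr != null) {
            if (addr.isLoopbackAddress()) return Range.LOOPBACK;
            if (addr.isLinkLocalAddress()) return Range.LINK_LOCAL;
            if (addr.isSiteLocalAddress()) return Range.SITE_LOCAL;
        }
        if (host.endsWith(".local")) return Range.DNS_LOCAL;
        return Range.PUBLIC;
    }

    /** Each more relaxed level also accepts every more restrictive level. */
    static boolean accepts(Range configured, String host) {
        return classify(host).compareTo(configured) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(classify("127.0.0.1"));
        System.out.println(accepts(Range.DNS_LOCAL, "10.0.0.5"));
    }
}
```

With the default dns_local level, a private-range address such as 10.0.0.5 is accepted in this sketch, while a public address is not.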

These properties can be set by JVM system properties or by environment variables. For example, the property cryostat.agent.baseuri can be set using -Dcryostat.agent.baseuri=https://mycryostat.example.com:1234/ or CRYOSTAT_AGENT_BASEURI=https://mycryostat.example.com:1234/. See here for more detail.
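The relationship between a property name and its environment variable can be shown with a short sketch. It follows the usual MicroProfile Config convention of replacing every character that is not alphanumeric or an underscore with an underscore and then upper-casing the result. EnvNames is a hypothetical helper, not part of cryostat-agent.

```java
import java.util.Locale;

/** Hypothetical helper showing the property-to-environment-variable naming convention. */
final class EnvNames {

    /** Converts e.g. "cryostat.agent.baseuri" to "CRYOSTAT_AGENT_BASEURI". */
    static String toEnvName(String property) {
        return property.replaceAll("[^A-Za-z0-9_]", "_").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toEnvName("cryostat.agent.baseuri"));
        System.out.println(toEnvName("cryostat.agent.smart-trigger.definitions"));
    }
}
```

The same rule explains why cryostat.agent.smart-trigger.definitions is set through CRYOSTAT_AGENT_SMART_TRIGGER_DEFINITIONS: both the dots and the hyphen become underscores.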

cryostat-agent's Issues

[Bug] `POST` to Cryostat failure

Logs from the Cryostat backend:

Caused by: io.cryostat.net.web.http.api.v2.ApiException: No recording submission
	at io.cryostat.net.web.http.api.beta.RecordingsFromIdPostHandler.handleAuthenticated(RecordingsFromIdPostHandler.java:201)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:102)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:72)
	at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1284)
	at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:173)
	at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:140)
	at io.vertx.ext.web.handler.impl.BodyHandlerImpl.handle(BodyHandlerImpl.java:93)
	at io.vertx.ext.web.handler.impl.BodyHandlerImpl.handle(BodyHandlerImpl.java:46)
	at io.cryostat.net.web.http.api.beta.RecordingsFromIdPostBodyHandler.handleAuthenticated(RecordingsFromIdPostBodyHandler.java:117)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:102)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:72)
	at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1284)
	at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:173)
	at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:140)
	at io.vertx.ext.web.handler.impl.CorsHandlerImpl.handle(CorsHandlerImpl.java:189)
	at io.vertx.ext.web.handler.impl.CorsHandlerImpl.handle(CorsHandlerImpl.java:41)
	at io.cryostat.net.web.http.generic.CorsEnablingHandler.handle(CorsEnablingHandler.java:131)
	at io.cryostat.net.web.http.generic.CorsEnablingHandler.handle(CorsEnablingHandler.java:63)
	at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1284)
	at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:173)
	at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:140)
	at io.vertx.ext.web.handler.impl.LoggerHandlerImpl.handle(LoggerHandlerImpl.java:189)
	at io.vertx.ext.web.handler.impl.LoggerHandlerImpl.handle(LoggerHandlerImpl.java:48)
	at io.cryostat.net.web.http.generic.RequestLoggingHandler.handle(RequestLoggingHandler.java:123)
	at io.cryostat.net.web.http.generic.RequestLoggingHandler.handle(RequestLoggingHandler.java:65)
	at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1284)
	at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:173)
	at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:140)
	at io.vertx.ext.web.impl.RouterImpl.handle(RouterImpl.java:68)
	at io.vertx.ext.web.impl.RouterImpl.handle(RouterImpl.java:37)
	at io.cryostat.net.HttpServer$HandlerDelegate.handle(HttpServer.java:168)
	at io.vertx.core.impl.EventLoopContext.emit(EventLoopContext.java:55)
	at io.vertx.core.impl.DuplicatedContext.emit(DuplicatedContext.java:158)
	at io.vertx.core.http.impl.Http2ServerRequest.dispatch(Http2ServerRequest.java:122)
	at io.vertx.core.http.impl.Http2ServerStream.onHeaders(Http2ServerStream.java:96)
	at io.vertx.core.http.impl.Http2ServerConnection.onHeadersRead(Http2ServerConnection.java:155)
	at io.vertx.core.http.impl.Http2ConnectionBase.onHeadersRead(Http2ConnectionBase.java:202)
	at io.vertx.core.http.impl.Http2ServerConnection.onHeadersRead(Http2ServerConnection.java:44)
	at io.netty.handler.codec.http2.Http2FrameListenerDecorator.onHeadersRead(Http2FrameListenerDecorator.java:48)
	at io.netty.handler.codec.http2.Http2EmptyDataFrameListener.onHeadersRead(Http2EmptyDataFrameListener.java:63)
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:409)
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:337)
	at io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onHeadersRead(Http2InboundFrameLogger.java:56)
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader$2.processFragment(DefaultHttp2FrameReader.java:476)
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readHeadersFrame(DefaultHttp2FrameReader.java:484)
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:253)
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159)
	at io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173)
	at io.netty.handler.codec.http2.DecoratingHttp2ConnectionDecoder.decodeFrame(DecoratingHttp2ConnectionDecoder.java:63)
	at io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:393)
	at io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:453)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:519)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:458)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:280)
	at io.vertx.core.http.impl.VertxHttp2ConnectionHandler.channelRead(VertxHttp2ConnectionHandler.java:408)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)
Feb 06, 2023 6:59:43 PM org.slf4j.impl.JDK14LoggerAdapter fillCallerData
WARNING: 127.0.0.1 - - [Mon, 6 Feb 2023 18:59:43 GMT] 2ms "POST /api/beta/recordings/AAl-2I7mLgO4H2DtGz1gApO8nIVGgvrGmA2Q6AC482A= HTTP/2.0" 400 97 bytes "-" "Java-http-client/17.0.5"

The Agent logs just show a generic 400 error response when the harvester attempts to push a file to storage.

[Task] Remove Vert.x dependency

To slim down the built JAR, simplify the implementation, and have fewer rebuilds due to dependency updates, the Vert.x dependency should be removed. The JDK has a workable HTTP client built in that can be used instead of vertx-web's client, and the Vert.x EventBus usage can be replaced by a simpler callback pattern or a simple internal message queue. The Vert.x HTTP server can be replaced by either the non-JDK-API com.sun.net.httpserver.HttpServer or some other embedded HTTP server implementation.
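As a rough illustration of what the replacement could look like, the sketch below serves one endpoint with com.sun.net.httpserver.HttpServer and calls it with java.net.http.HttpClient, using nothing outside the JDK. The class name JdkHttpSketch and the /health path are made up for this example and do not reflect the Agent's actual API.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: a JDK-only HTTP server and client, with no Vert.x involved. */
final class JdkHttpSketch {

    /** Starts a loopback-only server; port 0 asks the OS for any free port. */
    static HttpServer startServer(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/health", exchange -> {
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Performs a blocking GET with the JDK's built-in client and returns the body. */
    static String get(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Starts the server, requests /health once, and stops the server again. */
    static String roundTrip() {
        HttpServer server = null;
        try {
            server = startServer(0);
            int port = server.getAddress().getPort();
            return get(URI.create("http://127.0.0.1:" + port + "/health"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```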

[Task] Set up CI and automations

Like other Cryostat projects, we should have CI and various automations around PRs to help ensure code quality. The CI can also produce the agent fat-JAR as an artefact and publish that somewhere for consumption.

[Bug] Server stored credentials should include JVM ID for uniqueness

In #133, the previous stored credentials match expression used only the target's JVM ID, which caused a conflict with credentials actually stored for JMX connection use. The new way uses the target's callback URL as the target.connectUrl, which will be an agent HTTP API URL and not conflict with JMX credentials. However, in some scenarios like unclean shutdowns of the Agent and/or Cryostat, old stale Credentials entries may be left behind in the encrypted keyring, which may have the same connectUrl as the (potentially restarted, or at least re-registering) Agent instance. This will cause another conflict and prevent Agent registration. To help mitigate this, Agents can store their credentials with an expression matching on both the callback/connectUrl as well as the JVM ID.
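A combined expression of the kind proposed here might be built as in the sketch below. The exact match expression syntax accepted by Cryostat is an assumption in this example, and CredentialsMatchExpression is a hypothetical helper; only the target.connectUrl and target.jvmId field names come from the issue text.

```java
/** Hypothetical builder for a match expression on both connectUrl and JVM ID. */
final class CredentialsMatchExpression {

    /** Escapes backslashes and single quotes so the value can sit inside '...'. */
    private static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Requires both fields to match, so a stale entry with only one of them equal is ignored. */
    static String build(String callbackUrl, String jvmId) {
        return "target.connectUrl == " + quote(callbackUrl)
                + " && target.jvmId == " + quote(jvmId);
    }

    public static void main(String[] args) {
        System.out.println(build("http://agent.local:9977/", "exampleJvmId"));
    }
}
```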

[Bug] Agent blocks graceful shutdown if it didn't start cleanly

Description:

smoketest.sh: add this sample app config

    podman run \
        --name quarkus-test-agent-0 \
        --pod cryostat-pod \
        --env JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/deployments/app/cryostat-agent.jar" \
        --env QUARKUS_HTTP_PORT=10009 \
        --env ORG_ACME_CRYOSTATSERVICE_ENABLED="false" \
        --rm -d quay.io/andrewazores/quarkus-test:latest

Run sh smoketest.sh, let everything spin up.

podman logs quarkus-test-agent-0:

Starting the Java application using /opt/jboss/container/java/run/run-java.sh ...
INFO exec  java -Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/deployments/app/cryostat-agent.jar -XX:+UseParallelGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:+ExitOnOutOfMemoryError -cp "." -jar /deployments/quarkus-run.jar 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/deployments/app/cryostat-agent.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/deployments/lib/main/org.jboss.slf4j.slf4j-jboss-logmanager-1.1.0.Final.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Slf4jLoggerFactory]
__  ____  __  _____   ___  __ ____  ______ 
 --/ __ \/ / / / _ | / _ \/ //_/ / / / __/ 
 -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \   
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/   
2022-12-23 20:38:58,013 INFO  [io.cry.age.Agent] (cryostat-agent) Cryostat Agent starting...
2022-12-23 20:39:00,678 INFO  [io.cry.cor.net.JFRConnectionToolkit] (cryostat-agent) Computed self JVM ID: 7Rprt-Rwnu9Pc4qkKe7G1nDZw6_s6fT2Ry--P8vNZXw=
2022-12-23 20:39:00,746 SEVERE [io.cry.age.Agent] (cryostat-agent) Agent startup failure: java.util.NoSuchElementException: SRCFG00040: The config property cryostat.agent.baseuri is defined as the empty String ("") which the following Converter considered to be null: io.smallrye.config.ImplicitConverters$ConstructorConverter
	at io.smallrye.config.SmallRyeConfig.convertValue(SmallRyeConfig.java:284)
	at io.smallrye.config.SmallRyeConfig.getValue(SmallRyeConfig.java:239)
	at io.smallrye.config.SmallRyeConfig.getValue(SmallRyeConfig.java:167)
	at io.cryostat.agent.ConfigModule.provideCryostatAgentBaseUri(ConfigModule.java:91)
	at io.cryostat.agent.ConfigModule_ProvideCryostatAgentBaseUriFactory.provideCryostatAgentBaseUri(ConfigModule_ProvideCryostatAgentBaseUriFactory.java:36)
	at io.cryostat.agent.ConfigModule_ProvideCryostatAgentBaseUriFactory.get(ConfigModule_ProvideCryostatAgentBaseUriFactory.java:27)
	at io.cryostat.agent.ConfigModule_ProvideCryostatAgentBaseUriFactory.get(ConfigModule_ProvideCryostatAgentBaseUriFactory.java:10)
	at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
	at io.cryostat.agent.MainModule_ProvideCryostatClientFactory.get(MainModule_ProvideCryostatClientFactory.java:48)
	at io.cryostat.agent.MainModule_ProvideCryostatClientFactory.get(MainModule_ProvideCryostatClientFactory.java:10)
	at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
	at io.cryostat.agent.MainModule_ProvideRegistrationFactory.get(MainModule_ProvideRegistrationFactory.java:55)
	at io.cryostat.agent.MainModule_ProvideRegistrationFactory.get(MainModule_ProvideRegistrationFactory.java:10)
	at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
	at io.cryostat.agent.DaggerAgent_Client.registration(DaggerAgent_Client.java:117)
	at io.cryostat.agent.Agent.main(Agent.java:99)
	at io.cryostat.agent.Agent.lambda$agentmain$5(Agent.java:122)
	at java.base/java.lang.Thread.run(Thread.java:833)

2022-12-23 20:39:03,896 INFO  [io.quarkus] (main) quarkus-test 1.0.0-SNAPSHOT on JVM (powered by Quarkus 2.7.2.Final) started in 5.819s. Listening on: http://0.0.0.0:10009
2022-12-23 20:39:03,897 INFO  [io.quarkus] (main) Profile prod activated. 
2022-12-23 20:39:03,897 INFO  [io.quarkus] (main) Installed features: [cdi, rest-client, rest-client-jackson, resteasy, smallrye-context-propagation, vertx]

Ctrl-c to tear down smoketest.sh:

...
^C+ cleanup
+ podman pod stop cryostat-pod
WARN[0010] StopSignal SIGTERM failed to stop container quarkus-test-agent-0 in 10 seconds, resorting to SIGKILL

Expected:

If any required configuration property is not set, the Agent should gracefully handle this and fail to start. SIGTERMing the JVM should result in a normal clean shutdown.

[Bug] `max-files` config results in HTTP 400 from server

Jan 30, 2023 4:19:47 PM io.cryostat.core.log.Logger warn
WARNING: HTTP 400: maxFiles must be a positive integer
io.vertx.ext.web.handler.HttpException: Bad Request
Caused by: io.cryostat.net.web.http.api.v2.ApiException: maxFiles must be a positive integer
	at io.cryostat.net.web.http.api.beta.RecordingsFromIdPostHandler.handleAuthenticated(RecordingsFromIdPostHandler.java:190)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:102)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:72)
	at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1284)
	at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:173)
	at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:140)
	at io.vertx.ext.web.handler.impl.BodyHandlerImpl$BHandler.doEnd(BodyHandlerImpl.java:355)
	at io.vertx.ext.web.handler.impl.BodyHandlerImpl$BHandler.uploadEnded(BodyHandlerImpl.java:321)
	at io.vertx.ext.web.handler.impl.BodyHandlerImpl$BHandler.lambda$null$0(BodyHandlerImpl.java:250)
	at io.vertx.core.impl.future.FutureImpl$3.onSuccess(FutureImpl.java:141)
	at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60)
	at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211)
	at io.vertx.core.impl.future.Composition$1.onSuccess(Composition.java:62)
	at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60)
	at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211)
	at io.vertx.core.impl.future.Composition$1.onSuccess(Composition.java:62)
	at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60)
	at io.vertx.core.impl.future.SucceededFuture.addListener(SucceededFuture.java:88)
	at io.vertx.core.impl.future.Composition.onSuccess(Composition.java:43)
	at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60)
	at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211)
	at io.vertx.core.impl.future.PromiseImpl.tryComplete(PromiseImpl.java:23)
	at io.vertx.core.impl.future.PromiseImpl.onSuccess(PromiseImpl.java:49)
	at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)
Jan 30, 2023 4:19:47 PM org.slf4j.impl.JDK14LoggerAdapter fillCallerData
WARNING: 127.0.0.1 - - [Mon, 30 Jan 2023 16:19:47 GMT] 36ms "POST /api/beta/recordings/92biA-rcHLQhcbkOrujsPQY_rQY7ZBCZlQZS76TBt28= HTTP/1.1" 400 139 bytes "-" "Java-http-client/17.0.5"

[Bug] Agent registration flow after Cryostat restarts

The Agent's startup registration flow is as follows:

  1. Start up and check all required environment variables. Fail if any are not defined.
  2. Generate authorization credentials for its own webserver.
  3. Create a matchExpression tailored to itself (target.jvmId == '$myHashId'), publish a stored credential to Cryostat using this expression and the generated credentials. Retry periodically until this succeeds.
  4. Register as a discovery plugin. Retry periodically until this succeeds.
  5. Publish a description of itself as the single node under its plugin registration.

This is mostly resilient on the Cryostat side against conditions where communication between the Agent and Cryostat breaks down, or where Cryostat goes down temporarily, because of the callback URI ping mechanism. Cryostat attempts to call the plugin back to ensure that 1. the plugin is still there, 2. the plugin (Agent) refreshes its registration token before expiry, and 3. Cryostat's stored credentials for the plugin are still valid.

However, this can be problematic on the Agent side when Cryostat goes down temporarily and the Agent stays up. In this case, Cryostat will call back the plugin with the old callback and old stored credentials, which will fail, prompting Cryostat to deregister the plugin. Cryostat will then forget that this plugin ever existed and will no longer ping its callback. Meanwhile, the Agent considers itself registered, published, and up-to-date, and will sit idle forever, never renewing its registration with Cryostat. If the user stops the Agent (that is, its attached application) and restarts it, the flow resumes from the top and everything should be registered and published again. A properly resilient system, however, would reach this state again whenever either Cryostat or the Agent goes down temporarily and reappears later.

Moreover, if the Agent does notice that it is no longer registered with Cryostat and attempts to re-register itself, it will fail at Step 3, because it will regenerate an identical matchExpression, which Cryostat will reject because it expects matchExpressions to be unique. Cryostat should communicate this specific failure mode (ex. via response status code) in a way that the agent recognizes so that it can continue to step 4. Other failure modes should still block progression and probably lead to retrying step 3.
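
The decision described above could be sketched roughly as follows. The issue does not say how Cryostat would signal a duplicate matchExpression, so the 409 status code and every class and method name here are assumptions for illustration, not the Agent's actual code.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch of the step 3 to step 4 decision. */
final class CredentialStep {

    /** What the Agent should do after one attempt to publish its stored credential. */
    enum Outcome { PROCEED_TO_REGISTRATION, RETRY_LATER }

    // Assumption: Cryostat reports a duplicate matchExpression as 409 Conflict.
    static final int ASSUMED_DUPLICATE_STATUS = 409;

    /** Maps the HTTP status of the credential POST to the next action. */
    static Outcome classify(int httpStatus) {
        boolean created = httpStatus >= 200 && httpStatus < 300;
        boolean alreadyStored = httpStatus == ASSUMED_DUPLICATE_STATUS;
        return (created || alreadyStored)
                ? Outcome.PROCEED_TO_REGISTRATION
                : Outcome.RETRY_LATER;
    }

    /** Repeats the POST up to maxAttempts times; returns true once step 4 may begin. */
    static boolean submitUntilAccepted(IntSupplier postCredential, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (classify(postCredential.getAsInt()) == Outcome.PROCEED_TO_REGISTRATION) {
                return true;
            }
            // A real implementation would wait for the retry period here.
        }
        return false;
    }
}
```

Any other failure status still blocks progression, which matches the requirement that only the recognized duplicate case skips ahead to step 4.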

[Story] Agent credentials storage conflicts with authenticated JMX

The Agent generates its HTTP webserver credentials and then stores them in the Cryostat server's encrypted keyring along with the other stored credentials. Cryostat consults these stored credentials and their matchExpressions to determine what credentials to pass when initiating a JMX connection. If the Agent is installed into a target application that is configured with JMX authentication, then the Agent will store credentials that also match the target's JMX definition, since the matchExpression used by the agent is target.jvmId == 'abcd1234'. This conflict results in Cryostat failing to establish a JMX connection to the target since the Agent's webserver credentials are incorrectly passed.

[Task] Intercept `SIGTERM` and upload recording before clean shutdown

Implementation to support cryostatio/cryostat#1013

The Agent should be able to handle SIGTERM and perhaps other related signals which may be sent by the host OS or container platforms when the JVM should exit. There will normally be some grace period before the platform SIGKILLs the JVM to allow for graceful cleanup. The Agent should attempt to send an exit recording file dump to Cryostat within this window. The time required to send the dump is variable (size of recording, speed of link between Agent and Cryostat) so this will be a best-effort attempt, but there may be techniques to try to get it done as quickly as possible. For example, perhaps there should be a configuration to allow attempting to zip these files before sending them, in case the deploying user knows the network link may be slow but the JVM may have the CPU to spare. The Agent may also expose configuration knobs to the user allowing them to configure the specific maxAge/maxSize for the data within these dump files to try to get the upload time within the window. The graceful shutdown period is also likely to be a configurable knob on the platform that the user can experiment with.
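
A minimal sketch of the best-effort, time-bounded upload follows. It assumes a plain JVM shutdown hook (which runs when the JVM exits in response to SIGTERM) rather than a raw signal handler, and the class name, method names, and grace-period parameter are hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: an exit upload bounded by a grace period. */
final class ExitUpload {

    /** Runs the upload on a daemon thread and waits at most the grace period for it. */
    static boolean runWithin(Runnable upload, Duration grace) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "exit-upload");
            t.setDaemon(true); // never keep the JVM alive past the grace period
            return t;
        });
        Future<?> pending = worker.submit(upload);
        try {
            pending.get(grace.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            pending.cancel(true); // give up; the platform may SIGKILL soon
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pending.cancel(true);
            return false;
        } finally {
            worker.shutdownNow();
        }
    }

    /** Registers the bounded upload as a JVM shutdown hook. */
    static void install(Runnable upload, Duration grace) {
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> runWithin(upload, grace), "exit-upload-hook"));
    }
}
```

The grace value passed to install would need to stay below the platform's own termination grace period, since the JVM cannot observe that limit directly.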

[Task] Push recordings to per-agent API endpoint

When the Agent pushes JFR file data to Cryostat, it should do so at an API endpoint that corresponds to the specific Agent instance. That is, some unique identifier relating to the Agent, which both the Agent and Cryostat agree upon, should be included as an endpoint path parameter or otherwise in the request metadata, so that Cryostat can collect the pushed files and link them to the originating Agent instance for later queries.

Depends on cryostatio/cryostat#1299

[Story] Separate TLS keystore/truststore from host application

The Agent contains an HTTP webserver that exposes its API for Cryostat. This webserver should support HTTPS, and it should be possible to configure the Agent to use a TLS keystore and certificate for securing this HTTPS server. It should be possible for this TLS keystore to be separate from any keystore the attached target application might already have.

The Agent also contains an HTTP client that it uses for communicating with Cryostat, to register itself as a discovery plugin, publish information about itself, etc. This HTTP client should support the HTTPS endpoint that the Cryostat server may expose. Likewise, it should be possible to configure the Agent to add the Cryostat server's TLS certificate to the Agent's truststore, and to configure this Agent truststore so that it is separate from the truststore of the attached target application.
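
One way to keep the Agent's truststore separate is to build a dedicated SSLContext from an Agent-only store instead of touching the JVM-wide defaults. The sketch below uses only standard JDK classes; the class and method names are hypothetical and the store location would come from Agent configuration that this story does not yet define.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical sketch: TLS material owned by the Agent, not the host application. */
final class AgentTls {

    /** Loads a store file that belongs to the Agent only. */
    static KeyStore load(Path file, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(file)) {
            store.load(in, password);
        }
        return store;
    }

    /** Builds a client context that trusts only the given store. */
    static SSLContext clientContext(KeyStore agentTrustStore)
            throws GeneralSecurityException {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(agentTrustStore);
        SSLContext ctx = SSLContext.getInstance("TLS");
        // No key managers: this context is for the Agent's outbound client only.
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }
}
```

Because the context is passed explicitly to the Agent's HTTP client, the javax.net.ssl system properties and the default SSLContext used by the host application stay untouched.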

Related: cryostatio/cryostat-operator#595

[Bug] Discovery self-node has wrong annotation port number when publishing JMX URL

In Registration.java:

        URI uri = callback;
        if (preferJmx && jmxPort > 0) {
            uri =
                    URI.create(
                            String.format(
                                    "service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi",
                                    hostname, jmxPort));
        }
        DiscoveryNode.Target target =
                new DiscoveryNode.Target(
                        realm,
                        uri,
                        appName,
                        jvmId,
                        pid,
                        hostname,
                        uri.getPort(),
                        javaMain,
                        startTime);

The uri.getPort() here returns the correct expected port number when the registration connectUrl is the agent's own HTTP connection. When the registration uses JMX, however, (ie when registration.prefer-jmx config property is set), then this returns -1 instead of the actual JMX port number.
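
The -1 comes from how java.net.URI parses the JMX service URL: the part after the "service:" scheme does not begin with a slash, so the URI is opaque and has no authority component to read a port from. The sketch below demonstrates this and shows one possible fix, which is to publish the configured JMX port directly. The class and helper names are hypothetical.

```java
import java.net.URI;

/** Hypothetical sketch of choosing the port annotation for the self-node. */
final class JmxPortDemo {

    /** Uses the configured JMX port when JMX is preferred, else the URI's own port. */
    static int annotationPort(URI uri, boolean preferJmx, int jmxPort) {
        return (preferJmx && jmxPort > 0) ? jmxPort : uri.getPort();
    }

    public static void main(String[] args) {
        URI http = URI.create("http://localhost:9977/");
        URI jmx = URI.create("service:jmx:rmi:///jndi/rmi://localhost:9091/jmxrmi");

        System.out.println(http.getPort());                   // 9977
        System.out.println(jmx.isOpaque());                   // true
        System.out.println(jmx.getPort());                    // -1
        System.out.println(annotationPort(jmx, true, 9091));  // 9091
    }
}
```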

[Bug] Interplay of Cryostat discovery ping period and harvester period results in reset intervals and timing skew

Ah yes, I think that makes sense with the way the agent's internal state machinery works. The discovery ping request tells the agent to re-register itself, which really means de-register and start the registration flow over. But, the harvester also intentionally stops itself when the agent has become deregistered, and starts on successful registration. So if the discovery ping period is shorter than the harvester period you'll end up with the agent probably never even attempting to push harvested files, or if it does then perhaps it gets interrupted part way through sometimes.

There's probably some approach I could take like adding an internal state for "re-registering" to allow this to smoothly transition over, but I'm not sure how worthwhile it is.

This does point out an interesting side-effect of the discovery ping though, because it means that the harvester period will result in files being pushed with that periodicity on one interval, but the interval will reset at the next re-registration time. If the discovery ping period is relatively long and the harvester period is relatively short then this won't be very noticeable, but as these two periods become closer in value then the skew could become noticeable, up to the point you've identified where the skew actually can prevent any harvesting from happening at all.

I think that's a separate issue to work on, maybe for next development cycle.

Originally posted by @andrewazores in #86 (comment)

[Epic] Two-way communications protocol

The agent currently exposes only a readonly HTTP API that the Cryostat server can use to query basic information such as a list of active recording descriptors, the JFR event types and templates available, and MBean metrics data.

The agent should also implement mutation requests for actions such as dynamically starting Flight Recordings, including with custom event templates supplied by the server with the request.

The readonly API should remain implemented and available. We should explore whether the mutable (write/update) API endpoints are always available, or if they are gated behind an additional opt-in property that requires the user to explicitly enable them.

The goal after this project is completed is that the Cryostat backend's client should be able to make all of its requests to an HTTP-registered Agent instance equally as it would for a JMX target.

[Epic] Collect JFR data locally and push to Cryostat

The Agent should provide the capability to start Flight Recordings on the attached JVM, similar to what the flags conventionally do:

-XX:+FlightRecorder -XX:StartFlightRecording=duration=60s,filename=myrecording.jfr

i.e. there should be configuration properties that the Agent picks up and uses to start a recording, with a given event template (".jfc profile") name and maxAge/maxSize settings. Rather than providing a local filesystem path where the recording should be dumped, the Agent should push the recording to Cryostat over HTTP.

The Agent should have a configuration to push the latest data to Cryostat periodically, similar to what Cryostat's Automated Rules do but with an HTTP push instead of a JMX pull data flow.

The Agent should also be able to intercept SIGTERM (and other signals?) and send the latest recording data to Cryostat before propagating the signal and allowing the JVM to continue clean shutdown.

[Task] Devise a trigger condition syntax and implement processing from config

Idea 1

The syntax should easily map to SmallRye Config, so it should be something that can be expressed and understood when expressed as an environment variable name, as well as when expressed as a system property or properties file entry.

The syntax must also be able to specify:

  • the metric to observe
  • a condition about that metric
  • a trigger value for the condition
  • optional parameters such as a duration threshold that the condition must persist for to trigger
  • the name of a .jfc event template file that should be used when the trigger starts a recording

For example, <Process CPU Load (%), Value Greater Than, 0.2, For 30 Seconds> is a 4-tuple that could be expressed in such a syntax, which should cause a self-explanatory recording trigger. This might be expressed in a rudimentary way as PROCESSCPU_GT_20_30S=profiling.jfc, as a very rough example. This simple environment variable could be split on _ characters. The first field would be used to look up in a table which metric should be observed, and then implementation-specific details for each metric would yield an observation function. The second field, GT, would be matched against supported operations, in this case the > comparator, to apply to the result of the observation function and the parsed value of the third field, 20. In this case this third field could be interpreted as hundredths, i.e. 0.20. The optional fourth field, 30S, would be interpreted as the duration threshold, so the implementation would need to maintain samples over time and check the condition across all of the samples before triggering the recording.
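
The splitting scheme described above could be sketched as follows. This is only an illustration of Idea 1 under stated assumptions: the class name, the two supported operators, and the choice of Duration.parse for the fourth field are all hypothetical, and the sample-window bookkeeping for the duration threshold is left out.

```java
import java.time.Duration;
import java.util.Locale;

/** Hypothetical parser for entries like PROCESSCPU_GT_20_30S=profiling.jfc. */
final class TriggerSpec {
    final String metric;      // lookup key for the metric table, e.g. "PROCESSCPU"
    final String operator;    // "GT" or "LT" in this sketch
    final double threshold;   // third field read as hundredths: 20 becomes 0.20
    final Duration sustained; // Duration.ZERO when the optional fourth field is absent
    final String template;    // .jfc event template name

    private TriggerSpec(String metric, String operator, double threshold,
            Duration sustained, String template) {
        this.metric = metric;
        this.operator = operator;
        this.threshold = threshold;
        this.sustained = sustained;
        this.template = template;
    }

    /** Splits the variable name on '_' and interprets each field in order. */
    static TriggerSpec parse(String name, String value) {
        String[] fields = name.split("_");
        if (fields.length < 3 || fields.length > 4) {
            throw new IllegalArgumentException("expected METRIC_OP_VALUE[_DURATION]: " + name);
        }
        String op = fields[1].toUpperCase(Locale.ROOT);
        if (!op.equals("GT") && !op.equals("LT")) {
            throw new IllegalArgumentException("unsupported operator: " + op);
        }
        double threshold = Integer.parseInt(fields[2]) / 100.0;
        // Duration.parse expects ISO-8601, so "30S" is prefixed to give "PT30S".
        Duration sustained =
                fields.length == 4 ? Duration.parse("PT" + fields[3]) : Duration.ZERO;
        return new TriggerSpec(fields[0], op, threshold, sustained, value);
    }

    /** Applies the comparator to a single observed sample. */
    boolean matches(double observed) {
        return operator.equals("GT") ? observed > threshold : observed < threshold;
    }
}
```

For instance, TriggerSpec.parse("PROCESSCPU_GT_20_30S", "profiling.jfc") yields a threshold of 0.20 and a sustained duration of 30 seconds. A limitation of this naming scheme is that metric names themselves cannot contain underscores.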


Idea 2

Rather than specifying the trigger conditions as SmallRye Config properties, a more freeform syntax could be used and passed to the agent as an argument.

https://docs.oracle.com/javase/8/docs/api/java/lang/instrument/package-summary.html

$ java -javaagent:/path/to/cryostat-agent.jar=argumentstring

where argumentstring could be more flexible and easier to parse than environment variable names, since more characters should be permissible to help delineate fields. This could also be used to pass a path to the agent that contains something like a JSON file containing a serialized form of the triggers to be used.

[Bug] Discovery ping causes deregistration and bad side effects

  • #90 : timing skew can become total failure to upload periodic recordings when the discovery ping period is shorter than the upload period
  • discovery ping signal causes Agent to deregister and re-register. The deregistration is handled internally as a full deregistration as if the Agent is shutting down. This cancels the periodic archive timer (which starts anew on re-registration) and also seems to prevent onexit uploads

This can probably be resolved by adding additional internal state to the Agent. Currently the lifecycle can go from REGISTERED (and PUBLISHED) to DEREGISTERED, and back again. Adding state transition chains like REGISTERED -> REFRESHING -> REGISTERED (on success) and REGISTERED -> REFRESHING -> DEREGISTERED (on failure) should allow more nuanced handling of the discovery ping signal so that these bad effects can be avoided until distinctly necessary.
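
The proposed transitions can be written down as a small state table. The enum below is a hypothetical sketch, not the Agent's actual lifecycle type: it omits PUBLISHED for brevity, and the harvesterMayRun rule is an assumption about how the extra state would be used.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical lifecycle sketch including the proposed REFRESHING state. */
enum AgentState {
    DEREGISTERED, REGISTERED, REFRESHING;

    /** States that may directly follow this one. */
    Set<AgentState> next() {
        switch (this) {
            case DEREGISTERED:
                return EnumSet.of(REGISTERED);
            case REGISTERED:
                // A discovery ping moves to REFRESHING; shutdown moves to DEREGISTERED.
                return EnumSet.of(REFRESHING, DEREGISTERED);
            case REFRESHING:
                // Success returns to REGISTERED; failure falls back to DEREGISTERED.
                return EnumSet.of(REGISTERED, DEREGISTERED);
            default:
                throw new AssertionError(this);
        }
    }

    boolean canMoveTo(AgentState target) {
        return next().contains(target);
    }

    /** Periodic and on-exit uploads are only cancelled on a full deregistration. */
    boolean harvesterMayRun() {
        return this != DEREGISTERED;
    }
}
```

With this shape, a discovery ping no longer passes through DEREGISTERED, so the periodic archive timer has no reason to be cancelled during a successful refresh.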

[Story] Agent dual-registration

Currently, the Agent implements a Cryostat Discovery Plugin by creating a plugin registration (and corresponding REALM node), and defines a single target JVM node within that Realm, representing itself. This will normally have an Agent HTTP connectUrl, unless the config property for prefer-jmx is set, in which case the Agent will publish a JMX ServiceURL for itself if the JVM it's attached to appears to have JMX enabled.

Why not publish both URLs? The Agent HTTP URL will always be available and publish-able as one target JVM node. After #163 this URL may also support write/mutation operations. A simple check endpoint can be added to return that status so that the available features can be queried without first trying to perform a write operation and seeing if it fails. A second target JVM node under the same Realm could be optionally published as well, if the host JVM appears to have JMX enabled. Then the prefer-jmx config can be dropped.

[Story] Ability to configure Agent to listen to MBean metrics and dynamically start recording

As a user, I would like a way to configure the Agent to listen to selected MBean metrics. When some condition is met regarding that metric, the Agent should dynamically start a JFR recording. The event template used for the recording should also be a related configuration option. Ideally, the trigger can have multiple conditions, ex. CPU usage above X% for Y duration, AND physical memory usage above A% for B duration. Multiple configurations can be stacked together, which would logically OR the conditions. The configurations must be provided to the Agent by some static means ex. environment variables, not by dynamic methods such as accepting a configuration file over the Agent HTTP API. This probably means continuing to use the SmallRye Config properties, which allow env var, system property, or property file specification. A syntax for the conditions will need to be devised that can fit into the properties format.

It should also be possible to configure both the existing periodic upload as well as smart triggers. For example, as a user, I might want to have an always-on recording with the Continuous template that uploads every 30 minutes, as well as smart triggers that start a Profiling recording when some conditions are met and uploads every 5 minutes, and both of these scenarios should run concurrently.

[Task] Add configuration for scheduled recording `maxage`/`maxsize`

Currently, only recordings pushed on application exit have the maxage/maxsize properties set. Recordings pushed on a periodic basis during normal application runtime do not, so the file size can grow large and is likely to contain duplicate data from the last periodic push. This wastes network bandwidth and archival disk storage space.

[Task] CI fails to download `cryostat-core` dependency

For example:

#125
(https://github.com/cryostatio/cryostat-agent/actions/runs/5070291951/jobs/9105123984?pr=125)

The project's pom.xml is configured to search the GitHub Maven Packages repository, but authentication is failing.

<repositories>

The CI workflow does have a step that is supposed to set up the Maven settings.xml that provides the credentials, however:

servers: '[{"id": "github", "username": "dummy", "password": "${env.GITHUB_TOKEN_REF}"}]'

And the same or very similar configuration is used in the main Cryostat repository, which works and is able to download the cryostat-core dependency:

https://github.com/cryostatio/cryostat/blob/2b0cd5adacf8f928f8666f36663b0a5b31e0e76b/.github/workflows/ci-jobs.yml#L53

[Epic] Smart triggers

The Agent should listen to metrics and events within the attached JVM and use these events to decide when to automatically start/stop JFR recording.

[Task] Target JDK11 (or JDK8?)

For better compatibility with existing applications in production, the agent should target and be built with JDK11 or even JDK8, rather than the current JDK17.

[Bug] Discovery plugin ping reregistration triggers recording cancellation

2022-12-23 20:47:55,799 INFO  [io.cry.age.Registration] (cryostat-agent-worker-0) Registration retry period: 5000(ms)
2022-12-23 20:48:01,551 INFO  [io.cry.age.Registration] (cryostat-agent-worker-1) Registered as f2c4a745-db78-495a-acbe-327d8b8a87bf
2022-12-23 20:48:01,555 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-1) JFR Harvester starting
2022-12-23 20:48:01,926 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-1) JFR Harvester started
2022-12-23 20:48:01,938 INFO  [io.cry.age.Registration] (cryostat-agent-worker-1) publishing self as http://localhost:9977/
2022-12-23 20:48:02,148 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-2) cryostat-agent(1) RUNNING
2022-12-23 20:48:02,270 INFO  [io.cry.age.Registration] (cryostat-agent-worker-2) Publish success
2022-12-23 20:52:59,255 INFO  [io.cry.age.Registration] (cryostat-agent-worker-1) Registered as f2c4a745-db78-495a-acbe-327d8b8a87bf
2022-12-23 20:52:59,256 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-1) JFR Harvester starting
2022-12-23 20:52:59,257 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-1) JFR Harvester started
2022-12-23 20:52:59,264 INFO  [io.cry.age.Registration] (cryostat-agent-worker-1) publishing self as http://localhost:9977/
2022-12-23 20:52:59,288 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-2) cryostat-agent(1) STOPPED
2022-12-23 20:52:59,289 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-2) cryostat-agent(1) CLOSED
2022-12-23 20:52:59,302 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-2) Uploading quarkus-test-agent_default_20221223T205259Z.jfr
2022-12-23 20:57:57,387 INFO  [io.cry.age.Agent] (SIGTERM handler) Caught SIGTERM(15)

I triggered this by setting up the main Cryostat smoketest.sh with the default discovery ping period and an Agent harvester period of 300000 ms (5 minutes). The Agent never ended up pushing any harvested JFR files to Cryostat. As the logs reveal, the re-registration occurred, which restarted the harvester. This ended up cancelling the running recording without restarting it.

The harvester should not be restarted when the Agent re-registers as a discovery plugin with Cryostat - if the harvester is already running and already has periodic tasks set up to push harvested files, those should continue as usual with no interruption to the push schedule.
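
One way to get this behaviour is to make the harvester's start operation idempotent, so that a second start during re-registration is a no-op. The following is a minimal sketch under that assumption; the class name and constructor parameters are hypothetical and do not reflect the actual Harvester class.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a harvester whose start() leaves a running schedule alone. */
final class IdempotentHarvester {
    private final ScheduledExecutorService scheduler;
    private final Runnable pushTask;
    private final long periodMs;
    private ScheduledFuture<?> schedule; // null while stopped

    IdempotentHarvester(ScheduledExecutorService scheduler, Runnable pushTask, long periodMs) {
        this.scheduler = scheduler;
        this.pushTask = pushTask;
        this.periodMs = periodMs;
    }

    /** Starts periodic pushes; returns false and changes nothing if already running. */
    synchronized boolean start() {
        if (schedule != null) {
            return false; // re-registration must not reset the push interval
        }
        schedule = scheduler.scheduleAtFixedRate(
                pushTask, periodMs, periodMs, TimeUnit.MILLISECONDS);
        return true;
    }

    /** Cancels periodic pushes; meant for real shutdown or a failed re-registration. */
    synchronized void stop() {
        if (schedule != null) {
            schedule.cancel(false);
            schedule = null;
        }
    }
}
```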

[Task] Implement scheduled periodic recording uploads

The Agent should expose configuration properties for periodic pushes of captured JFR data to the Cryostat backend. After the Agent has successfully registered with Cryostat and started a Flight Recording locally, it should periodically push the current JFR buffer contents to Cryostat as a full .jfr file. In the future there can also be work done to stream the data, perhaps on a per-event basis or at least on a per-chunk basis, but that is out of scope for this initial implementation.

[Bug] SEVERE: recordings serialization failure java.lang.NullPointerException: Cannot invoke "java.time.Duration.toMillis()"

May 05, 2023 8:23:29 PM io.cryostat.agent.remote.RecordingsContext handle
SEVERE: recordings serialization failure
java.lang.NullPointerException: Cannot invoke "java.time.Duration.toMillis()" because the return value of "jdk.jfr.Recording.getMaxAge()" is null
	at io.cryostat.agent.remote.RecordingsContext$RecordingInfo.<init>(RecordingsContext.java:126)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
	at io.cryostat.agent.remote.RecordingsContext.getRecordings(RecordingsContext.java:99)
	at io.cryostat.agent.remote.RecordingsContext.handle(RecordingsContext.java:78)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:95)
	at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:71)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:98)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:851)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:95)
	at io.cryostat.agent.WebServer$CompressionFilter.doFilter(WebServer.java:267)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:98)
	at io.cryostat.agent.WebServer$RequestLoggingFilter.doFilter(WebServer.java:212)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:98)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:818)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
