
angela's Introduction


A distributed control framework to handle a Terracotta cluster and test clients

What is the purpose of Angela?

One major obstacle to testing a client/server system is setting up the environment. Not only must the server be installed, but so must the clients, and both may have to be installed on different remote hosts.

Angela is meant to tackle this problem and ease the setup of a distributed environment.

It also helps with the control of that distributed environment (e.g. starting/stopping some of the components, fetching some remote files, monitoring, injecting network failures).

The current implementation is targeted for Terracotta, which is a distributed data management platform.

Angela also supports Ehcache 2 and 3, which are implementations of a distributed cache.

Angela can be extended to handle other distributed software.

Initial setup

To run tests on a node, Angela expects a directory at /data/angela in which to store all its metadata. Make sure that this directory exists or can be created before running any tests. For more details on what that directory is used for, refer to Angela Directory Structure.
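A test harness can check this precondition before any test starts. The following is a minimal sketch using only the JDK; the class AngelaDirCheck and its method are hypothetical and not part of Angela, and only the path /data/angela comes from the text above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Angela: checks that a root directory
// exists (creating it if needed) and is writable.
final class AngelaDirCheck {

  static boolean ensureWritableDir(Path dir) {
    try {
      Files.createDirectories(dir); // does nothing if the directory already exists
    } catch (IOException e) {
      return false; // could not be created (permissions, a regular file in the way, ...)
    }
    return Files.isDirectory(dir) && Files.isWritable(dir);
  }

  public static void main(String[] args) {
    // Default taken from the text above; pass another path as the first argument.
    Path root = Paths.get(args.length > 0 ? args[0] : "/data/angela");
    System.out.println(root + " usable: " + ensureWritableDir(root));
  }
}
```

Running the check on every host involved in a test gives an early, readable failure instead of an installation error later on.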

Tsa Cluster example

Given the following cluster configuration:

  <servers>
    <server host="localhost" name="Server1">
      <logs>logs1</logs>
      <tsa-port>9510</tsa-port>
      <tsa-group-port>9530</tsa-group-port>
    </server>
  </servers>

We expect the TSA to contain one Terracotta server running on localhost, and this will be automatically resolved by Angela. We can now ask Angela to set up such a cluster:

    ConfigurationContext configContext = customConfigurationContext() (1)
        .tsa(tsa -> tsa (2)
            .topology(new Topology( (3)
                distribution(version(EHCACHE_VERSION), PackageType.KIT, LicenseType.EHCACHE_OS), (4)
                tcConfig(version(EHCACHE_VERSION), getClass().getResource("/tc-config-a.xml")))) (5)
        );

    ClusterFactory factory = new ClusterFactory("GettingStarted::configureCluster", configContext); (6)
    Tsa tsa = factory.tsa() (7)
        .startAll(); (8)

    factory.close(); (9)
  1. Create a custom configuration context that is going to hold all the configurable bits

  2. Define the TSA config

  3. Specify the Terracotta cluster topology

  4. Specify the Terracotta distribution: version, package type (KIT) and license

  5. Specify the Terracotta cluster config

  6. Create a Tsa logical instance that serves as an endpoint to call functionalities regarding the Tsa lifecycle

  7. Install the Tsa from the distribution on the appropriate server(s) (localhost in this case)

  8. Start all servers from the Tsa

  9. Stop all Terracotta servers and clean up the installation

Tsa API

      Tsa tsa = factory.tsa() (1)
          .startAll(); (2)

      TerracottaServer active = tsa.getActive(); (3)
      Collection<TerracottaServer> actives = tsa.getActives(); (4)
      TerracottaServer passive = tsa.getPassive(); (5)
      Collection<TerracottaServer> passives = tsa.getPassives(); (6)

      tsa.stopAll(); (7)

      tsa.start(active); (8)
      tsa.start(passive);

      tsa.stop(active); (9)
      Callable<TerracottaServerState> serverState = () -> tsa.getState(passive); (10)
      Awaitility.await()
          .pollInterval(1, SECONDS)
          .atMost(15, SECONDS)
          .until(serverState, is(TerracottaServerState.STARTED_AS_ACTIVE));
  1. Install all Terracotta servers for the given topology

  2. Start all Terracotta servers

  3. Get the reference of the active server. Null is returned if there is none. An exception is thrown if there is more than one.

  4. Get the references of all active servers. Get an empty collection if there are none.

  5. Get the reference of the passive server. Null is returned if there is none. An exception is thrown if there is more than one.

  6. Get the references of all passive servers. Get an empty collection if there are none.

  7. Stop all Terracotta servers

  8. Start one Terracotta server

  9. Stop one Terracotta server

  10. Get the current state of the Terracotta server

Client array example

    ConfigurationContext configContext = customConfigurationContext()
        .clientArray(clientArray -> clientArray (1)
            .clientArrayTopology(new ClientArrayTopology( (2)
                distribution(version(EHCACHE_VERSION), PackageType.KIT, LicenseType.EHCACHE_OS), (3)
                newClientArrayConfig().host("localhost-1", "localhost").host("localhost-2", "localhost")) (4)
            )
        );
    ClusterFactory factory = new ClusterFactory("GettingStarted::runClient", configContext);
    ClientArray clientArray = factory.clientArray(0); (5)
    ClientArrayFuture f = clientArray.executeOnAll((context) -> System.out.println("Hello")); (6)
    f.get(); (7)

    factory.close();
  1. Define the client array config

  2. Define the client array topology

  3. Specify the distribution from which to install the client jars

  4. Specify the list of hosts that are going to be used by this client array (two clients, both on localhost in this case)

  5. Create a client array on the remote servers

  6. Execute the lambda on all the remote clients

  7. Wait until all the clients finish their execution

Full example: see the EhcacheTest class.

IMPORTANT: settings.xml

You can run all the Maven commands with -s settings.xml to use the project's settings.xml and isolate downloaded libraries inside. Change the repo location to point to your default m2 home if needed.

Example: ./mvnw -s settings.xml clean install

How to build

mvn clean install

Run specific tests

mvn test -f integration-test/pom.xml -Dtest=<test-name>

Be careful not to cd directly into the module: you would not use the right kit version!

Things to know

  • Angela looks for JDKs in $HOME/.m2/toolchains.xml, the standard Maven toolchains file. See https://maven.apache.org/guides/mini/guide-using-toolchains.html to get its format and learn more about it.
  • Angela uses SSH to connect to remote hosts, so every non-localhost machine name is expected to be accessible via ssh, with everything already configured for passwordless authentication.
  • Angela spawns a small controlling app on every remote host. That app is very sensitive to network latency and uses lots of random ports. In a nutshell, this means that testing across WANs or firewalls just doesn't work.
  • Angela expects a writeable /data folder (or at least a pre-created, writeable /data/angela folder) on every machine she runs on, i.e.: the one running the test as well as all the remote hosts.

Updates Feb. 2022

Angela system properties

Corresponding class: AngelaProperties

System Property | Default value | Description
--- | --- | ---
angela.rootDir | /data/angela | Root dir where Angela puts installation, work directories and any file that is needed
angela.kitInstallationDir | | Use this property to use a local build instead of downloading a kit build
angela.kitCopy | false | Forces a kit copy instead of using a common kit install for multiple tests. Useful for parallel execution of tests that change files in the kit install (e.g. tmc.properties)
angela.skipUninstall | false | Do not clean the work directory (used to have access to logs after the end of a test, for debugging test issues)
angela.distribution | |
angela.additionalLocalHostnames | "" | Define additional hostnames or IP addresses to be considered as local, separated by commas. Used in case the test is faking some local hostnames
angela.igniteLogging | false | Display Ignite logging (used to help debugging the behaviour of Angela)
angela.agent.debug | false | Put a remote agent in debug mode
angela.tms.fullLogging | false |
angela.tsa.fullLogging | false |
angela.voter.fullLogging | false |
angela.ssh.userName | System.getProperty("user.name") |
angela.ssh.userName.keyPath | |
angela.ssh.strictHostKeyChecking | true |
angela.ssh.port | 22 |
angela.java.resolver | toolchain | Can be set to "user"
angela.java.home | System.getProperty("java.home") |
angela.java.version | 1.8 |
angela.java.vendor | zulu |
angela.java.opts | -Djdk.security.allowNonCaAnchor=false |
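
These are ordinary JVM system properties, so they can be passed with -D on the command line. The sketch below shows how such a property could be resolved against its documented default using only the JDK. The class AngelaPropsDemo is hypothetical; the actual lookup lives in AngelaProperties and may differ in detail.

```java
// Hypothetical demo class; only the property names and defaults come from
// the table above.
final class AngelaPropsDemo {

  // Returns the value given with -Dname=value, or the default when the
  // property is unset or empty.
  static String resolve(String name, String defaultValue) {
    String specified = System.getProperty(name);
    return specified == null || specified.isEmpty() ? defaultValue : specified;
  }

  public static void main(String[] args) {
    System.out.println("angela.rootDir  = " + resolve("angela.rootDir", "/data/angela"));
    System.out.println("angela.ssh.port = " + Integer.parseInt(resolve("angela.ssh.port", "22")));
    System.out.println("angela.kitCopy  = " + Boolean.parseBoolean(resolve("angela.kitCopy", "false")));
  }
}
```

For example, `java -Dangela.rootDir=/tmp/angela AngelaPropsDemo.java` prints the overridden root dir and the defaults for the two other properties.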

Concepts

  1. GroupId: a UUID determined at the AngelaOrchestrator level. All Ignite agents will be part of the same group.
  2. AgentGroup: a class representing the cluster of all Ignite nodes
  3. AgentID: identifies an Ignite agent on the cluster in the form: name#pid@hostname#port
  4. Agent types: there can be 3 types of agents:
    • orchestrator-agent: the agent started in the test JVM locally to control the other ones
    • remote-agent: the agent started via SSH on a remote host
    • others: agents spawned from another agent (orchestrator or remote) to execute jobs for a client Id either locally or on a remote host
  5. AgentController: the agent controller has been cleared of any Ignite related code. It now ONLY contains the methods called statically from Ignite closures. This class is installed statically.
  6. Executors: these are the main refactoring. The whole com layer has been refactored into these implementations:
    • IgniteFreeExecutor: a local implementation bypassing any Ignite launching
    • IgniteLocalExecutor: an implementation using Ignite but only locally. It won't spawn remote agents through SSH. All Angela configs specifying a remote host will be executed on the local machine. New Ignite agents can still be spawned to execute client jobs.
    • IgniteSshRemoteExecutor: this is the default implementation which will spawn agents remotely if a non-local hostname is specified in a configuration
  7. Agent: an agent now decides its own port to start with (thanks to the port mapper) and exposes its agentId. It also registers 3 attributes: angela.version, angela.nodeName and angela.group, and needs to be started with angela.instanceName (agent name or type) and angela.group (the group it will be part of).
  8. Closing: closing an executor will communicate to all spawned Ignite agents to also close themselves. Angela was not relying on Ignite to communicate closure, but was relying on killing spawned clients through SSH with their PID. Executor.shutdown(agentId) can now close any spawned agent.
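
To make the AgentID layout from item 3 concrete, here is a minimal parsing sketch. The class AgentIdSketch and the sample value are hypothetical; Angela's own AgentID class may be structured differently, and only the name#pid@hostname#port format comes from the list above.

```java
// Hypothetical value class illustrating the name#pid@hostname#port layout.
final class AgentIdSketch {
  final String name;
  final long pid;
  final String hostname;
  final int port;

  AgentIdSketch(String name, long pid, String hostname, int port) {
    this.name = name;
    this.pid = pid;
    this.hostname = hostname;
    this.port = port;
  }

  // Splits on the last '@', then on the last '#' of each half.
  static AgentIdSketch parse(String text) {
    int at = text.lastIndexOf('@');
    if (at < 0) {
      throw new IllegalArgumentException("Missing '@' in: " + text);
    }
    String left = text.substring(0, at);
    String right = text.substring(at + 1);
    int leftHash = left.lastIndexOf('#');
    int rightHash = right.lastIndexOf('#');
    if (leftHash < 0 || rightHash < 0) {
      throw new IllegalArgumentException("Missing '#' in: " + text);
    }
    return new AgentIdSketch(
        left.substring(0, leftHash),
        Long.parseLong(left.substring(leftHash + 1)),
        right.substring(0, rightHash),
        Integer.parseInt(right.substring(rightHash + 1)));
  }

  @Override
  public String toString() {
    return name + "#" + pid + "@" + hostname + "#" + port;
  }

  public static void main(String[] args) {
    // Made-up sample value that follows the documented format.
    AgentIdSketch id = parse("orchestrator-agent#4242@localhost#40000");
    System.out.println(id.name + " runs as pid " + id.pid + " on " + id.hostname + ":" + id.port);
  }
}
```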

Angela API Usage

Mode | Spawned servers | Inline servers
--- | --- | ---
Ignite-free mode | X | X
Ignite-local mode | X | X
Ignite-remote mode (default) | X | X

First create an AngelaOrchestrator through the JUnit rule or the AngelaOrchestrator builder API.

If you are not using JUnit, use the AngelaOrchestrator.builder() API instead. There are several examples in this project in the test module.

@Rule public transient AngelaOrchestratorRule angelaOrchestratorRule = new AngelaOrchestratorRule();

Then derive the cluster factories:

try (ClusterFactory factory = angelaOrchestratorRule.newClusterFactory("ConfigToolTest::testFailingClusterToolCommand", configContext)) {
    // [...]
}

Ignite-free mode:

@Rule public transient AngelaOrchestratorRule angelaOrchestratorRule = new AngelaOrchestratorRule().igniteFree();

Can be used in conjunction with RuntimeOption.INLINE_SERVERS to use inline mode for servers

Ignite local only mode:

@Rule public transient AngelaOrchestratorRule angelaOrchestratorRule = new AngelaOrchestratorRule().igniteLocal();

Only one local Ignite controller is started, plus other local Ignite agents spawned to execute client jobs.

Can be used in conjunction with RuntimeOption.INLINE_SERVERS to use inline mode for servers

Ignite with remote support (default)

@Rule public transient AngelaOrchestratorRule angelaOrchestratorRule = new AngelaOrchestratorRule().igniteRemote();

// or

@Rule public transient AngelaOrchestratorRule angelaOrchestratorRule = new AngelaOrchestratorRule().igniteRemote(executor -> {
  executor.setStrictHostKeyChecking(false);
  executor.setPort(2222);
  executor.setRemoteUserName("testusername");
  executor.setTcEnv(...);
});

Can be used in conjunction with RuntimeOption.INLINE_SERVERS to use inline mode for servers

What about inline mode?

Inline mode will spawn tc nodes within the test JVM. It can be activated with:

distribution(version(Versions.EHCACHE_VERSION), KIT, TERRACOTTA_OS, RuntimeOption.INLINE_SERVERS)

Programmatic SPI

An executor can be obtained from an orchestrator:

Executor executor = angelaOrchestratorRule.getExecutor();

Getting an executor Ignite-free

  UUID group = UUID.randomUUID();
  Agent agent = Agent.local(group);
  Executor executor = new IgniteFreeExecutor(agent);

Getting a local Ignite executor:

  UUID group = UUID.randomUUID();
  PortAllocator portAllocator = new DefaultPortAllocator();
  Agent agent = Agent.igniteOrchestrator(group, portAllocator);
  AgentID agentID = agent.getAgentID();
  Executor executor = new IgniteLocalExecutor(agent);

Getting a standard Ignite executor supporting SSH agent install:

  PortAllocator portAllocator = new DefaultPortAllocator();
  UUID group = UUID.randomUUID();
  Agent agent = Agent.igniteOrchestrator(group, portAllocator);
  AgentID agentID = agent.getAgentID();
  Executor executor = new IgniteSshRemoteExecutor(agent)
      .setStrictHostKeyChecking(false)
      .setPort(...);

angela's People

Contributors

akomakom, albinsuresh, anthonydahanne, aurbroszniowski, chrisdennis, cljohnso, devlg75, esebasti, garywkeim, gibsong, jhouserizer, lorban, mathieucarbou, mobasherul, myronkscott, prasantaksahoo, ramsai1729, tmesic99


angela's Issues

Infra issue with Angela system tests and Linux Java 11?

Check why system tests are constantly failing only on the Linux Java 11 build.

I ran on my laptop both:

./mvnw verify -f integration-test/ -Dfast -Djava.test.version=1.8 -Djava.test.vendor=zulu

and

./mvnw verify -f integration-test/ -Dfast -Djava.test.version=1.11 -Djava.test.vendor=zulu

I was able to confirm that tests are running with the correct JVM and both commands are successful.

But the Linux build on Jenkins that runs the tests with Java 11 is failing.

Move EE stuff in angela-ee

  • SecureTcConfig
  • EnterpriseTcConfig
  • big memory xml files
  • code and methods related to EE (i.e. tms, cluster tool, etc)

TSA needs JAVA_OPTS

TSA needs JAVA_OPTS that are not shared with command-line tools and clients.

Provide an alternative to toolchains.xml

We should find another way to find the JDK. With Gradle, there is no need to have a toolchains.xml file, or even any Maven binaries or repository on a computer.

Caused by: java.lang.RuntimeException: Missing JDK with version [1.8] and one vendor in [[zulu]] config in toolchains.xml. Available JDKs: [JDK{home='/Users/mathieu/.jenv/versions/openjdk64-9', version='1.9', vendor='openjdk', valid=false}, JDK{home='/Users/mathieu/.jenv/versions/oracle64-1.8.0.144', version='1.8', vendor='Oracle Corporation', valid=false}, JDK{home='/Users/mathieu/.jenv/versions/zulu64-1.8.0.212', version='1.8', vendor='zulu', valid=false}]
	at org.terracotta.angela.common.util.JavaLocationResolver.resolveJavaLocations(JavaLocationResolver.java:102)
	at org.terracotta.angela.common.util.JavaLocationResolver.resolveJavaLocation(JavaLocationResolver.java:69)
	at org.terracotta.angela.common.distribution.DistributionController.buildEnv(DistributionController.java:68)
	at org.terracotta.angela.common.distribution.Distribution107Controller.createTsa(Distribution107Controller.java:72)
	at org.terracotta.angela.common.TerracottaServerInstance.create(TerracottaServerInstance.java:94)
	at org.terracotta.angela.agent.AgentController.createTsa(AgentController.java:278)
	at org.terracotta.angela.client.Tsa.lambda$create$4baf3c22$1(Tsa.java:227)
	at org.apache.ignite.internal.processors.closure.GridClosureProcessor$C4.execute(GridClosureProcessor.java:1944)
	at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:568)
	at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6820)
	at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:562)

Perhaps this method could be refactored to take an optional property file as its config and, if none is provided (e.g. on our local dev computers), fall back to the default JDK of the computer?

  private static List<JDK> findJDKs() {
    try {
      List<JDK> jdks = findJDKs(new File(System.getProperty("user.home") + File.separator + ".m2" + File.separator + "toolchains.xml").toURI().toURL());
      return Collections.unmodifiableList(jdks);
    } catch (MalformedURLException e) {
      throw new RuntimeException(e);
    }
  }
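
The fallback idea can be sketched as follows. This is only an illustration of the proposal, not Angela's resolver: the class JdkFallbackSketch and its Source enum are hypothetical, and using the java.home system property as "the default JDK of the computer" is an assumption.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: use toolchains.xml when it exists, otherwise fall
// back to the JDK running the current JVM.
final class JdkFallbackSketch {

  enum Source { TOOLCHAINS, JAVA_HOME }

  // Same location as in the findJDKs() method quoted above.
  static Path defaultToolchainsFile() {
    return Paths.get(System.getProperty("user.home"), ".m2", "toolchains.xml");
  }

  static Source chooseSource(Path toolchainsFile) {
    return Files.isRegularFile(toolchainsFile) ? Source.TOOLCHAINS : Source.JAVA_HOME;
  }

  public static void main(String[] args) {
    Path toolchains = defaultToolchainsFile();
    if (chooseSource(toolchains) == Source.TOOLCHAINS) {
      System.out.println("Would resolve JDKs from " + toolchains);
    } else {
      // Assumption: the JVM running the test is an acceptable default JDK.
      System.out.println("Would fall back to " + System.getProperty("java.home"));
    }
  }
}
```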

Angela not stopping if test times out

You can reproduce by running the tests in PR: Terracotta-OSS/terracotta-platform#609

Basically, the test times out:

org.junit.runners.model.TestTimedOutException: test timed out after 60000 milliseconds

	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
	at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
	at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
	at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:134)
	at org.terracotta.angela.client.util.IgniteClientHelper.executeRemotely(IgniteClientHelper.java:62)
	at org.terracotta.angela.client.ConfigTool.executeCommand(ConfigTool.java:44)
	at org.terracotta.dynamic_config.test_support.DynamicConfigIT.configToolInvocation(DynamicConfigIT.java:441)
	at org.terracotta.dynamic_config.system_tests.activated.AttachCommand1x3IT.test_attach_to_activated_cluster_with_offline_node(AttachCommand1x3IT.java:63)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.terracotta.dynamic_config.test_support.util.NodeOutputRule$1.evaluate(NodeOutputRule.java:60)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)

When that happens, not all server processes are shut down:

~/.../terracotta/terracotta-platform ( attach {4} U:1 ⊘ )
❯  jps
13218 Launcher
13556 TCServerMain
13253 TCServerMain
1397 
13704 Jps
12472 TCServerMain
11707 GradleDaemon

Dependency convergence issue: Angela depends on different transitive versions

Angela should use the maven enforcer plugin (see tc-platform).

Downstream projects are unable to verify / deploy / release at the moment:

[WARNING] 
Dependency convergence error for org.apache.commons:commons-lang3:3.8.1 paths to dependency are:
+-org.terracotta.dynamic-config.testing:dynamic-config-testing-support:5.7-SNAPSHOT
  +-org.terracotta:angela:3.0.11
    +-org.terracotta:angela-client-internal:3.0.11
      +-org.terracotta:angela-common:3.0.11
        +-org.zeroturnaround:zt-process-killer:1.10
          +-org.apache.commons:commons-lang3:3.8.1
and
+-org.terracotta.dynamic-config.testing:dynamic-config-testing-support:5.7-SNAPSHOT
  +-org.terracotta:angela:3.0.11
    +-org.terracotta:angela-client-internal:3.0.11
      +-io.rest-assured:json-path:3.3.0
        +-io.rest-assured:rest-assured-common:3.3.0
          +-org.apache.commons:commons-lang3:3.4

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence failed with message:
Failed while enforcing releasability the error(s) are [
Dependency convergence error for org.apache.commons:commons-lang3:3.8.1 paths to dependency are:
+-org.terracotta.dynamic-config.testing:dynamic-config-testing-support:5.7-SNAPSHOT
  +-org.terracotta:angela:3.0.11
    +-org.terracotta:angela-client-internal:3.0.11
      +-org.terracotta:angela-common:3.0.11
        +-org.zeroturnaround:zt-process-killer:1.10
          +-org.apache.commons:commons-lang3:3.8.1
and
+-org.terracotta.dynamic-config.testing:dynamic-config-testing-support:5.7-SNAPSHOT
  +-org.terracotta:angela:3.0.11
    +-org.terracotta:angela-client-internal:3.0.11
      +-io.rest-assured:json-path:3.3.0
        +-io.rest-assured:rest-assured-common:3.3.0
          +-org.apache.commons:commons-lang3:3.4
]

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <version>1.4.1</version>
  <executions>
    <execution>
      <id>enforce-java</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <dependencyConvergence>
            <uniqueVersions>true</uniqueVersions>
          </dependencyConvergence>
          <requireJavaVersion>
            <version>[${java.version},)</version>
          </requireJavaVersion>
          <requireMavenVersion>
            <version>[${maven.version},)</version>
          </requireMavenVersion>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>

Voter backlog

  1. Implement voter for Distribution102Controller as well.
  2. Add logic to change voter script location for EE kit as it's different from the OSS kit.
  3. Add support for -f option (or do the appropriate changes in angela corresponding to latest kit changes).
  4. Change addVoter API to voters instead.
  5. Change voter to voter(id, hostname).withServer().withServer()

Angela needs to support starting a server with a tc property

This is required to be able to use the l2.classloader.compatibility property.

I need 2 methods:

  • .tcProperty(key, value)
  • .tcProperties(Properties) (or Map<String, String>)

Test subclasses will need to be able to provide the tc properties, so that the nodes start with some properties already set.

Example:

  protected TerracottaServer createNode(int stripeId, int nodesId) {
    String uniqueId = combine(stripeId, nodesId);
    return server("node-" + uniqueId, "localhost")
        .tsaPort(getNodePort(stripeId, nodesId))
        .tsaGroupPort(getNodeGroupPort(stripeId, nodesId))
        .configRepo("terracotta" + uniqueId + "/repository")
        .logs("terracotta" + uniqueId + "/logs")
        .dataDir("main:terracotta" + uniqueId + "/data-dir")
        .offheap("main:512MB,foo:1GB")
        .tcProperty("l2.classloader.compatibility", "true")
        .metaData("terracotta" + uniqueId + "/metadata");
  }
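
One possible shape for the two requested methods is sketched below on a stand-alone class. ServerSketch is hypothetical and is not Angela's TerracottaServer; how the collected properties would then be handed to the server process is left out on purpose.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical fluent builder showing tcProperty(key, value) and
// tcProperties(Map) side by side.
final class ServerSketch {
  private final String name;
  private final Map<String, String> tcProperties = new LinkedHashMap<>();

  ServerSketch(String name) {
    this.name = name;
  }

  // Adds or overrides a single tc property.
  ServerSketch tcProperty(String key, String value) {
    tcProperties.put(key, value);
    return this;
  }

  // Adds or overrides several tc properties at once.
  ServerSketch tcProperties(Map<String, String> properties) {
    tcProperties.putAll(properties);
    return this;
  }

  Map<String, String> getTcProperties() {
    return Collections.unmodifiableMap(tcProperties);
  }

  public static void main(String[] args) {
    ServerSketch node = new ServerSketch("node-1-1")
        .tcProperty("l2.classloader.compatibility", "true");
    System.out.println(node.name + " -> " + node.getTcProperties());
  }
}
```

Returning `this` from both methods keeps them chainable with the other builder calls shown in the example above.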

License copy mechanism is broken for BigMemory

The license is copied during installation. Since the change that uses a shared install for all Angela instances, the license is no longer present in that shared location but only in the work instance location, so the Terracotta server fails to start.
If we copied the license to the shared location, we could get clashes between the different license types used by different tests (this is unlikely, since we use the same license 99% of the time, but the tests verifying invalid licenses could be a problem).
It would therefore be good to be able to pass the license location to the Terracotta server start script.
However, the start script cannot currently take a different license location; we need to use a Java property (see https://itrac.eur.ad.sag/browse/TAB-8127).
The current workaround is to force a copy of the Terracotta install into the Terracotta work instance directory with -Dangela.kitCopy=true. The license is then present there and the server starts.

The Terracotta server was in state STOPPED and was expected to reach one of the states

Since the new Angela version with the "stop" change, some builds are randomly failing.

Example: https://dev.azure.com/TerracottaCI/terracotta/_build/results?buildId=4133&view=logs&j=feb91b3b-c1e7-5841-2b9b-cf68bbf6cff5&t=dae69362-8928-54a2-bd53-da7b3ca8e43a

[ERROR] Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 136.111 s <<< FAILURE! - in org.terracotta.dynamic_config.system_tests.diagnostic.SetCommand1x2IT
[ERROR] testCluster_setOffheap(org.terracotta.dynamic_config.system_tests.diagnostic.SetCommand1x2IT)  Time elapsed: 10.659 s  <<< ERROR!
class org.apache.ignite.IgniteException: Remote job threw user exception (override or implement ComputeTask.result(..) method if you would like to have automatic failover for this exception): Failed to execute job due to unexpected runtime exception [jobId=38c43c97171-d228f81b-7e8a-48c6-8e6f-7b6fae784e38, ses=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=org.terracotta.angela.client.Tsa$$Lambda$287/0x000000010048ac40, dep=GridDeployment [ts=1586889327600, depMode=SHARED, clsLdr=jdk.internal.loader.ClassLoaders$AppClassLoader@2cdf8d8a, clsLdrId=24743c97171-787e621d-ae89-4c50-9ac9-adc1bd9dbf21, userVer=0, loc=true, sampleClsName=org.terracotta.angela.common.util.IgniteCommonHelper, pendingUndeploy=false, undeployed=false, usage=1], taskClsName=org.terracotta.angela.client.Tsa$$Lambda$287/0x000000010048ac40, sesId=28c43c97171-d228f81b-7e8a-48c6-8e6f-7b6fae784e38, startTime=1586889333812, endTime=9223372036854775807, taskNodeId=d228f81b-7e8a-48c6-8e6f-7b6fae784e38, clsLdr=jdk.internal.loader.ClassLoaders$AppClassLoader@2cdf8d8a, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=false, topPred=null, subjId=d228f81b-7e8a-48c6-8e6f-7b6fae784e38, mapFut=IgniteFuture [orig=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=380656297]], execName=null], jobId=38c43c97171-d228f81b-7e8a-48c6-8e6f-7b6fae784e38], err=The Terracotta server was in state STOPPED and was expected to reach one of the states: STARTED_AS_ACTIVE STARTED_AS_PASSIVE STARTED_IN_DIAGNOSTIC_MODE START_SUSPENDED but died before reaching it.]
	at org.apache.ignite.compute.ComputeTaskAdapter.result(ComputeTaskAdapter.java:102)
	at org.apache.ignite.internal.processors.task.GridTaskWorker$5.apply(GridTaskWorker.java:1053)
	at org.apache.ignite.internal.processors.task.GridTaskWorker$5.apply(GridTaskWorker.java:1046)
	at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6852)
	at org.apache.ignite.internal.processors.task.GridTaskWorker.result(GridTaskWorker.java:1046)
	at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:864)
	at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1081)
	at org.apache.ignite.internal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:1316)
	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
	at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
	at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
	at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: class org.apache.ignite.compute.ComputeUserUndeclaredException: Failed to execute job due to unexpected runtime exception [jobId=38c43c97171-d228f81b-7e8a-48c6-8e6f-7b6fae784e38, ses=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=org.terracotta.angela.client.Tsa$$Lambda$287/0x000000010048ac40, dep=GridDeployment [ts=1586889327600, depMode=SHARED, clsLdr=jdk.internal.loader.ClassLoaders$AppClassLoader@2cdf8d8a, clsLdrId=24743c97171-787e621d-ae89-4c50-9ac9-adc1bd9dbf21, userVer=0, loc=true, sampleClsName=org.terracotta.angela.common.util.IgniteCommonHelper, pendingUndeploy=false, undeployed=false, usage=1], taskClsName=org.terracotta.angela.client.Tsa$$Lambda$287/0x000000010048ac40, sesId=28c43c97171-d228f81b-7e8a-48c6-8e6f-7b6fae784e38, startTime=1586889333812, endTime=9223372036854775807, taskNodeId=d228f81b-7e8a-48c6-8e6f-7b6fae784e38, clsLdr=jdk.internal.loader.ClassLoaders$AppClassLoader@2cdf8d8a, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=false, topPred=null, subjId=d228f81b-7e8a-48c6-8e6f-7b6fae784e38, mapFut=IgniteFuture [orig=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=380656297]], execName=null], jobId=38c43c97171-d228f81b-7e8a-48c6-8e6f-7b6fae784e38], err=The Terracotta server was in state STOPPED and was expected to reach one of the states: STARTED_AS_ACTIVE STARTED_AS_PASSIVE STARTED_IN_DIAGNOSTIC_MODE START_SUSPENDED but died before reaching it.]
	at org.apache.ignite.internal.processors.job.GridJobWorker.handleThrowable(GridJobWorker.java:689)
	at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:621)
	at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:491)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
	at org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1191)
	at org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1923)
	... 7 more
Caused by: java.lang.RuntimeException: The Terracotta server was in state STOPPED and was expected to reach one of the states: STARTED_AS_ACTIVE STARTED_AS_PASSIVE STARTED_IN_DIAGNOSTIC_MODE START_SUSPENDED but died before reaching it.
	at org.terracotta.angela.common.TerracottaServerInstance.waitForState(TerracottaServerInstance.java:161)
	at org.terracotta.angela.agent.AgentController.waitForTsaInState(AgentController.java:295)
	at org.terracotta.angela.client.Tsa.lambda$start$7b0bf020$1(Tsa.java:250)
	at org.apache.ignite.internal.processors.closure.GridClosureProcessor$C4.execute(GridClosureProcessor.java:1944)
	at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:568)
	at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6820)
	at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:562)
	... 11 more

Angela is copying the KIT for each test

With the infra we had before, we didn't need to copy the KIT.

The KIT files were copied by Gradle, and we then created one working directory per test using these KIT files.

The only file that could be moved to the KIT was the logback-ext.xml file for logging. But this file is placed in the test/resources folder, so it can be considered common to all tests. We could then create the KIT once with this file and reuse the same KIT files for all the tests, as long as the working directories are not the same.

This would help improve the work dir size and test speed (#10 and #11)

Add custom TSA logging

TSA uses custom logging to take JFR recordings if available. Install a logback-test.xml file in the working directory of the 107 distribution of TSA.

Be able to assert on each specific node output

Currently we are doing assertions on the log output of the nodes, but with multiple nodes this is harder because system out contains a mix of all these logs. So it is hard (or impossible) to determine which logs belong to which process.

It would be nice to be able, like the ConfigToolExecutionResult, to have access to a buffer of log lines per node, to avoid mistakes and ensure the assertions are correctly done on the right data.
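As a rough illustration of the request above, a per-node buffer could look like the following sketch. The class name `NodeLogBuffers` and its methods are hypothetical and are not part of Angela; the node names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not an Angela class): keeps the output of each node in its own buffer.
final class NodeLogBuffers {
  private final Map<String, List<String>> linesByNode = new ConcurrentHashMap<>();

  // Called by whatever consumes a node's output, once per line.
  void append(String nodeName, String line) {
    linesByNode
        .computeIfAbsent(nodeName, n -> Collections.synchronizedList(new ArrayList<String>()))
        .add(line);
  }

  // Returns a snapshot of the lines seen so far for one node (empty if the node is unknown).
  List<String> linesOf(String nodeName) {
    List<String> lines = linesByNode.get(nodeName);
    if (lines == null) {
      return Collections.emptyList();
    }
    synchronized (lines) {
      return new ArrayList<>(lines);
    }
  }

  // Convenience check for assertions on a single node's output.
  boolean contains(String nodeName, String fragment) {
    return linesOf(nodeName).stream().anyMatch(l -> l.contains(fragment));
  }

  public static void main(String[] args) {
    NodeLogBuffers buffers = new NodeLogBuffers();
    buffers.append("node-1-1", "Terracotta Server instance has started up as ACTIVE");
    buffers.append("node-1-2", "some unrelated line");
    System.out.println(buffers.contains("node-1-1", "started up as ACTIVE")); // true
    System.out.println(buffers.contains("node-1-2", "started up as ACTIVE")); // false
  }
}
```

With such a buffer, a test asserts on `linesOf("node-1-1")` only, so lines printed by other nodes cannot produce a false positive.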

We should be able to remove any JDK constraint easily

Setting angela.java.vendor to '' or null does not work: the value falls back to the default, Zulu.

We should be able to tell Angela not to apply any constraint on the JDK vendor name and/or version.

The issue is this code in AngelaProperties:

public String getValue() {
  String specifiedValue = getSpecifiedValue();
  return specifiedValue == null || specifiedValue.isEmpty() ? getDefaultValue() : specifiedValue;
}

Applying the default shouldn't be the responsibility of this class, but of the caller. This class should instead return an Optional and let the caller decide what to do, possibly applying a different behavior depending on whether the property is set or not (like removing the constraint on the JDK version and/or vendor):

public Optional<String> getValue() {
  String specifiedValue = getSpecifiedValue();
  return specifiedValue == null || specifiedValue.isEmpty() ? Optional.empty() : Optional.of(specifiedValue);
}
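To make the intent concrete, here is a minimal sketch of how two callers could use an Optional-returning accessor differently. `OptionalProperty` is a hypothetical stand-in, not the real AngelaProperties class, and reading the value from a system property is an assumption made for the example.

```java
import java.util.Optional;

// Hypothetical stand-in for an Angela property; not the real AngelaProperties class.
final class OptionalProperty {
  private final String name;
  private final String defaultValue;

  OptionalProperty(String name, String defaultValue) {
    this.name = name;
    this.defaultValue = defaultValue;
  }

  // Returns only what the user specified; an unset or empty property yields Optional.empty().
  Optional<String> getValue() {
    String specified = System.getProperty(name);
    return specified == null || specified.isEmpty() ? Optional.empty() : Optional.of(specified);
  }

  String getDefaultValue() {
    return defaultValue;
  }

  public static void main(String[] args) {
    OptionalProperty vendor = new OptionalProperty("angela.java.vendor", "zulu");

    // Caller A keeps the previous behavior and falls back to the default.
    String withDefault = vendor.getValue().orElse(vendor.getDefaultValue());

    // Caller B treats an absent value as "no constraint on the JDK vendor".
    String constraint = vendor.getValue()
        .map(v -> "vendor must be " + v)
        .orElse("any vendor accepted");

    System.out.println(withDefault + " / " + constraint);
  }
}
```

The point of the design is that the decision moves to the call site: the same empty Optional can mean "use Zulu" in one place and "accept any JDK" in another.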

Log specific TSA startup issue : java.net.BindException: Address already in use

In the Distribution...Controller classes, we check a certain number of console output messages in order to change the states of the cluster.
e.g.

      compile("^.*\\QTerracotta Server instance has started up as ACTIVE\\E.*$"),
            mr -> stateRef.set(TerracottaServerState.STARTED_AS_ACTIVE))

One error that is frequent is when a Terracotta server tries to start but can't because another instance is already started on the same port:

ERROR - com.tc.net.groups.GroupException: java.net.BindException: Address already in use

We could add a check on that message in order to log a specific error. This would make it obvious when a test fails because of this particular problem.
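A minimal sketch of such a check, using the same quoted-pattern style as the trigger shown earlier. `StartupFailureDetector` and `diagnose` are hypothetical names, and how the check would be wired into the Distribution...Controller classes is left open.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative, not Angela's API.
final class StartupFailureDetector {
  // Matches any console line that mentions the bind failure quoted above.
  private static final Pattern ADDRESS_IN_USE =
      Pattern.compile("^.*\\Qjava.net.BindException: Address already in use\\E.*$");

  // Returns a targeted error message when the line reveals a port conflict.
  static Optional<String> diagnose(String serverName, String consoleLine) {
    if (ADDRESS_IN_USE.matcher(consoleLine).matches()) {
      return Optional.of("[" + serverName + "] failed to start: a port is already in use by another process");
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    String line = "ERROR - com.tc.net.groups.GroupException: java.net.BindException: Address already in use";
    diagnose("Server1", line).ifPresent(System.err::println);
  }
}
```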

System.err from a process output is not redirected

We have some tests where the server is failing at startup.

The server writes its system err output to the same log file as the normal logs.

But Angela does not correctly redirect the error stream coming from the node process into the server logger, which makes it impossible to distinguish which error comes from which node.

    WatchedProcess<TerracottaServerState> watchedProcess = new WatchedProcess<>(new ProcessExecutor()
        .command(createTsaCommand(terracottaServer.getServerSymbolicName(), terracottaServer.getId(), topology, proxiedPorts, installLocation, startUpArgs))
        .directory(installLocation)
        .environment(env)
        .redirectError(System.err)
        .redirectOutput(serverLogOutputStream), stateRef, TerracottaServerState.STOPPED);

Ideally the err stream should be redirected to ExternalLoggers.tsaLogger.info(...), or to another logger, while keeping the same server name prefix (i.e. [node-1-1]).
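One possible shape for this, sketched with plain JDK classes: an OutputStream that splits the bytes into lines, adds the server name prefix, and hands each line to a sink. `PrefixingLineStream` is a hypothetical name, and the sink is a plain `Consumer<String>` here; in Angela it would presumably be a call to the server logger.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical helper (not an Angela class): prefixes each line with the server name.
final class PrefixingLineStream extends OutputStream {
  private final String prefix;
  private final Consumer<String> sink;
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

  PrefixingLineStream(String serverName, Consumer<String> sink) {
    this.prefix = "[" + serverName + "] ";
    this.sink = sink;
  }

  // Buffers bytes until a newline, then forwards the completed line; carriage returns are dropped.
  @Override
  public synchronized void write(int b) {
    if (b == '\n') {
      emit();
    } else if (b != '\r') {
      buffer.write(b);
    }
  }

  // Flushes a trailing line that was not terminated by a newline.
  @Override
  public synchronized void close() {
    if (buffer.size() > 0) {
      emit();
    }
  }

  private void emit() {
    String line = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    buffer.reset();
    sink.accept(prefix + line);
  }

  public static void main(String[] args) throws IOException {
    try (OutputStream err = new PrefixingLineStream("node-1-1", System.out::println)) {
      err.write("java.net.BindException: Address already in use\n".getBytes(StandardCharsets.UTF_8));
    }
  }
}
```

Since the snippet above already passes an OutputStream to `.redirectError(...)`, an instance of such a class could be passed there instead of `System.err`.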

Angela performance (slow)

Our PR build only includes our project tests (dynamic config) and cluster tool tests.
PR build time usually takes 15min.

With all our tests now converted to Angela, it takes 30min.

That means our tests are now more than 100% slower (since the cluster tool tests were kept as-is).

This can be caused by several things, but let's keep this issue open so that we can link to it...

Test failure causes all tests to fail

If one test times out (JUnit rule), then all the tests fail with:

java.lang.RuntimeException: Error starting agent localhost

	at org.terracotta.angela.agent.Agent$Node.init(Agent.java:202)
	at org.terracotta.angela.agent.Agent$Node.<init>(Agent.java:122)
	at org.terracotta.angela.agent.Agent$Node.<init>(Agent.java:117)
	at org.terracotta.angela.client.ClusterFactory.init(ClusterFactory.java:139)
	at org.terracotta.angela.client.ClusterFactory.tsa(ClusterFactory.java:208)
	at org.terracotta.dynamic_config.test_support.DynamicConfigIT.before(DynamicConfigIT.java:130)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.terracotta.dynamic_config.test_support.util.NodeOutputRule$1.evaluate(NodeOutputRule.java:60)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)
Caused by: class org.apache.ignite.IgniteException: Ignite instance with this name has already been started: localhost
	at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:1029)
	at org.apache.ignite.Ignition.start(Ignition.java:326)
	at org.terracotta.angela.agent.Agent$Node.init(Agent.java:199)
	... 20 more
Caused by: class org.apache.ignite.IgniteCheckedException: Ignite instance with this name has already been started: localhost
	at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1143)
	at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
	at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:603)
	at org.apache.ignite.Ignition.start(Ignition.java:323)
	... 21 more

Ignite daemon dead or terracotta process dead

On the tc-platform build (https://github.com/Terracotta-OSS/terracotta-platform/pull/) with Azure, we often run into build failures caused by the ignite process failing (dead), or tc nodes not found by the ignite process.
These are cryptic errors that are hard to troubleshoot since there is no easy way to pick up the logs in Azure.
So I am opening an issue here to make all three of you aware that there are some random build failures caused by the Angela process handling mechanism.
@akomakom : I don't know if it is possible in Azure to "pin" a build that failed and look at the content, logs, etc.
We could put everything in trace mode, but it won't help if we do not have access to the workspace of a failed build.

I am unable to reproduce locally... (Azure only issue perhaps... ?)

Idea: `angela.kit.version` ?

Like the JDK and others, would it be worth having a property like angela.kit.version that we can set? For the moment, we are forced to pass it from Gradle and read it back ourselves in the test code:

  systemProperty 'angela.kit.version', "$tcDefaultVersion"
    return customConfigurationContext()
        .tsa(tsa -> {
          CustomTsaConfigurationContext topology = tsa
              .clusterName("tc-cluster")
              .topology(new Topology(distribution(version(System.getProperty("angela.kit.version")), KIT, TERRACOTTA), dynamicCluster(stripes)));

Make Angela projects work with Java 11

I have activated the Java 11 Linux and Windows builds, but they fail with a CNFE (ClassNotFoundException).

Example:

> ./mvnw clean install -Dfast -DskipTests
> ./mvnw verify -f integration-test/ -Dfast -Djava.test.version=1.11 -Djava.test.vendor=zulu -Dit.test=AngelaRuleIT

All tests fail with:

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.terracotta.angela.AngelaRuleIT
[ERROR] WARNING: An illegal reflective access operation has occurred
[ERROR] WARNING: Illegal reflective access by org.apache.ignite.internal.util.GridUnsafe$2 (file:/Users/matc/.m2/repository/org/apache/ignite/ignite-core/2.8.1/ignite-core-2.8.1.jar) to field java.nio.Buffer.address
[ERROR] WARNING: Please consider reporting this to the maintainers of org.apache.ignite.internal.util.GridUnsafe$2
[ERROR] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
[ERROR] WARNING: All illegal access operations will be denied in a future release
Agent is ready
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.595 s <<< FAILURE! - in org.terracotta.angela.AngelaRuleIT
[ERROR] org.terracotta.angela.AngelaRuleIT.testNodeStartup  Time elapsed: 6.267 s  <<< ERROR!
java.lang.NoClassDefFoundError: org/terracotta/angela/KitResolver
        at org.terracotta.angela.agent.kit.LocalKitManager.<init>(LocalKitManager.java:55)
        at org.terracotta.angela.client.Tsa.<init>(Tsa.java:90)
        at org.terracotta.angela.client.ClusterFactory.tsa(ClusterFactory.java:130)
        at org.terracotta.angela.client.support.junit.AngelaRule$1.get(AngelaRule.java:270)
        at org.terracotta.angela.client.support.junit.AngelaRule.tsa(AngelaRule.java:228)
        at org.terracotta.angela.client.support.junit.AngelaRule.startNode(AngelaRule.java:152)
        at org.terracotta.angela.client.support.junit.AngelaRule.startNode(AngelaRule.java:144)
        at org.terracotta.angela.client.support.junit.AngelaRule.startNodes(AngelaRule.java:138)
        at org.terracotta.angela.client.support.junit.AngelaRule.before(AngelaRule.java:107)
        at org.terracotta.angela.client.support.junit.ExtendedTestRule$1.evaluate(ExtendedTestRule.java:35)
        at org.terracotta.angela.client.support.junit.ExtendedTestRule$1.evaluate(ExtendedTestRule.java:38)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:364)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:237)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:158)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:428)
        at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
        at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:562)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:548)
Caused by: java.lang.ClassNotFoundException: org.terracotta.angela.KitResolver
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
        ... 29 more

Change logic to create non-random consecutive and shorter temporary directories

The temporary directory name is currently built from a UUID, which:

  • is long
  • is not consecutive (because random)

It would be nice if temp dirs were created with a shorter name that starts with a timestamp, so that when running tests locally it is really fast to find the right folders. The most recently run tests would be at the end.

Temp dirs could be created this way:

root = Files.createTempDirectory(parent, System.currentTimeMillis() + "-");

This creates something like:

image

The most recently run tests will be at the end.

Compared to now:

image

Also, this would help a lot on windows...
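A small runnable sketch of the proposal, assuming the parent directory may not exist yet. `TimestampedTempDirs` is a hypothetical name. `Files.createTempDirectory` appends its own random suffix after the prefix, so names stay unique while still sorting by creation time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not an Angela class) illustrating timestamp-prefixed temp dirs.
final class TimestampedTempDirs {
  // Creates "<millis>-<random suffix>" under the given parent, creating the parent if needed.
  static Path create(Path parent) {
    try {
      Files.createDirectories(parent);
      return Files.createTempDirectory(parent, System.currentTimeMillis() + "-");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // The parent location is an arbitrary choice for this example.
    Path parent = Paths.get(System.getProperty("java.io.tmpdir"), "angela-sketch");
    Path first = create(parent);
    Path second = create(parent);
    // Directory names start with the creation time, so a name-sorted listing is chronological.
    System.out.println(first.getFileName());
    System.out.println(second.getFileName());
  }
}
```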

Angela kit copy in angela/kits is losing file attributes

I have upgraded to 3.0.15 and get this error:

class org.apache.ignite.IgniteException: Remote job threw user exception (override or implement ComputeTask.result(..) method if you would like to have automatic failover for this exception): Failed to execute job due to unexpected runtime exception [jobId=33a42140171-c2aa1600-661c-4f51-a7ae-05e689db602e, ses=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=org.terracotta.angela.client.Tsa$$Lambda$106/1941847582, dep=GridDeployment [ts=1584911242976, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, clsLdrId=e8542140171-7b466052-abc9-46a9-b30f-45d4bce20336, userVer=0, loc=true, sampleClsName=org.terracotta.angela.common.util.IgniteCommonHelper, pendingUndeploy=false, undeployed=false, usage=1], taskClsName=org.terracotta.angela.client.Tsa$$Lambda$106/1941847582, sesId=23a42140171-c2aa1600-661c-4f51-a7ae-05e689db602e, startTime=1584911244387, endTime=9223372036854775807, taskNodeId=c2aa1600-661c-4f51-a7ae-05e689db602e, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=false, topPred=null, subjId=c2aa1600-661c-4f51-a7ae-05e689db602e, mapFut=IgniteFuture [orig=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=144814010]], execName=null], jobId=33a42140171-c2aa1600-661c-4f51-a7ae-05e689db602e], err=Cannot start process [/Users/mathieu/workspace/terracotta/terracotta-platform/dynamic-config/testing/system-tests/target/angela/kits/5.7-SNAPSHOT/platform-kit-5.7-SNAPSHOT/server/bin/start-tc-server.sh, --node-repository-dir, terracotta1-1/repository, -f, /Users/matc/workspace/terracotta/terracotta-platform/dynamic-config/testing/system-tests/target/test-data/1584911238204-3376473518706484161/single-stripe.properties]]

	at org.apache.ignite.compute.ComputeTaskAdapter.result(ComputeTaskAdapter.java:102)
	at org.apache.ignite.internal.processors.task.GridTaskWorker$5.apply(GridTaskWorker.java:1053)
	at org.apache.ignite.internal.processors.task.GridTaskWorker$5.apply(GridTaskWorker.java:1046)
	at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6852)
	at org.apache.ignite.internal.processors.task.GridTaskWorker.result(GridTaskWorker.java:1046)
	at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:864)
	at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1081)
	at org.apache.ignite.internal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:1316)
	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
	at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
	at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
	at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: class org.apache.ignite.compute.ComputeUserUndeclaredException: Failed to execute job due to unexpected runtime exception [jobId=33a42140171-c2aa1600-661c-4f51-a7ae-05e689db602e, ses=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=org.terracotta.angela.client.Tsa$$Lambda$106/1941847582, dep=GridDeployment [ts=1584911242976, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, clsLdrId=e8542140171-7b466052-abc9-46a9-b30f-45d4bce20336, userVer=0, loc=true, sampleClsName=org.terracotta.angela.common.util.IgniteCommonHelper, pendingUndeploy=false, undeployed=false, usage=1], taskClsName=org.terracotta.angela.client.Tsa$$Lambda$106/1941847582, sesId=23a42140171-c2aa1600-661c-4f51-a7ae-05e689db602e, startTime=1584911244387, endTime=9223372036854775807, taskNodeId=c2aa1600-661c-4f51-a7ae-05e689db602e, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=false, topPred=null, subjId=c2aa1600-661c-4f51-a7ae-05e689db602e, mapFut=IgniteFuture [orig=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=144814010]], execName=null], jobId=33a42140171-c2aa1600-661c-4f51-a7ae-05e689db602e], err=Cannot start process [/Users/mathieu/workspace/terracotta/terracotta-platform/dynamic-config/testing/system-tests/target/angela/kits/5.7-SNAPSHOT/platform-kit-5.7-SNAPSHOT/server/bin/start-tc-server.sh, --node-repository-dir, terracotta1-1/repository, -f, /Users/matc/workspace/terracotta/terracotta-platform/dynamic-config/testing/system-tests/target/test-data/1584911238204-3376473518706484161/single-stripe.properties]]
	at org.apache.ignite.internal.processors.job.GridJobWorker.handleThrowable(GridJobWorker.java:689)
	at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:621)
	at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:491)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
	at org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1191)
	at org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1923)
	... 7 more
Caused by: java.lang.RuntimeException: Cannot start process [/Users/mathieu/workspace/terracotta/terracotta-platform/dynamic-config/testing/system-tests/target/angela/kits/5.7-SNAPSHOT/platform-kit-5.7-SNAPSHOT/server/bin/start-tc-server.sh, --node-repository-dir, terracotta1-1/repository, -f, /Users/matc/workspace/terracotta/terracotta-platform/dynamic-config/testing/system-tests/target/test-data/1584911238204-3376473518706484161/single-stripe.properties]
	at org.terracotta.angela.common.distribution.WatchedProcess.<init>(WatchedProcess.java:38)
	at org.terracotta.angela.common.distribution.Distribution107Controller.createTsa(Distribution107Controller.java:106)
	at org.terracotta.angela.common.TerracottaServerInstance.create(TerracottaServerInstance.java:93)
	at org.terracotta.angela.agent.AgentController.createTsa(AgentController.java:279)
	at org.terracotta.angela.client.Tsa.lambda$create$4baf3c22$1(Tsa.java:227)
	at org.apache.ignite.internal.processors.closure.GridClosureProcessor$C4.execute(GridClosureProcessor.java:1944)
	at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:568)
	at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6820)
	at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:562)
	... 11 more
Caused by: org.zeroturnaround.exec.ProcessInitException: Could not execute [/Users/mathieu/workspace/terracotta/terracotta-platform/dynamic-config/testing/system-tests/target/angela/kits/5.7-SNAPSHOT/platform-kit-5.7-SNAPSHOT/server/bin/start-tc-server.sh, --node-repository-dir, terracotta1-1/repository, -f, /Users/matc/workspace/terracotta/terracotta-platform/dynamic-config/testing/system-tests/target/test-data/1584911238204-3376473518706484161/single-stripe.properties] in /Users/mathieu/workspace/terracotta/terracotta-platform/dynamic-config/testing/system-tests/target/angela/work/DiagnosticIT-20200322-090718-0-tsa/5.7-SNAPSHOT with environment {JAVA_HOME=/Users/matc/.jenv/versions/1.8, JAVA_OPTS=-Djdk.security.allowNonCaAnchor=false}. Error=13, Permission denied
	at org.zeroturnaround.exec.ProcessInitException.newInstance(ProcessInitException.java:80)
	at org.zeroturnaround.exec.ProcessExecutor.invokeStart(ProcessExecutor.java:1002)
	at org.zeroturnaround.exec.ProcessExecutor.startInternal(ProcessExecutor.java:970)
	at org.zeroturnaround.exec.ProcessExecutor.start(ProcessExecutor.java:942)
	at org.terracotta.angela.common.distribution.WatchedProcess.<init>(WatchedProcess.java:36)
	... 19 more
Caused by: java.io.IOException: Cannot run program "/Users/mathieu/workspace/terracotta/terracotta-platform/dynamic-config/testing/system-tests/target/angela/kits/5.7-SNAPSHOT/platform-kit-5.7-SNAPSHOT/server/bin/start-tc-server.sh" (in directory "/Users/mathieu/workspace/terracotta/terracotta-platform/dynamic-config/testing/system-tests/target/angela/work/DiagnosticIT-20200322-090718-0-tsa/5.7-SNAPSHOT"): error=13, Permission denied
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at org.zeroturnaround.exec.ProcessExecutor.invokeStart(ProcessExecutor.java:997)
	... 22 more
Caused by: java.io.IOException: error=13, Permission denied
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 23 more

Angela is losing file attributes when copying the kit that we have unpacked with Maven:

~/.../terracotta/terracotta-platform ( angela {4} U:1 ⊘ )
❯  cd dynamic-config/testing/system-tests/
/Users/matc/workspace/terracotta/terracotta-platform/dynamic-config/testing/system-tests

~/.../testing/system-tests ( angela {4} U:1 ⊘ )
❯  ls -al target/platform-kit-5.7-SNAPSHOT/server/bin/
total 16
drwxr-xr-x  4 matc  staff   128 Mar 22 17:06 .
drwxr-xr-x  5 matc  staff   160 Mar 22 17:06 ..
-rwxrwxr-x  1 matc  staff  2306 Mar 18 20:00 start-tc-server.bat
-rwxrwxr-x  1 matc  staff  2182 Mar 18 20:00 start-tc-server.sh

~/.../testing/system-tests ( angela {4} U:1 ⊘ )
❯  ls -al target/angela/kits/5.7-SNAPSHOT/platform-kit-5.7-SNAPSHOT/server/bin/
total 16
drwxr-xr-x  4 matc  staff   128 Mar 22 17:07 .
drwxr-xr-x  5 matc  staff   160 Mar 22 17:07 ..
-rw-r--r--  1 matc  staff  2306 Mar 22 17:07 start-tc-server.bat
-rw-r--r--  1 matc  staff  2182 Mar 22 17:07 start-tc-server.sh
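The report does not show how Angela copies the kit, so the following is only a sketch of one way to copy a directory tree while asking the JDK to keep file attributes such as the executable bit. `KitCopy`, `copyTree` and `demo` are hypothetical names; `StandardCopyOption.COPY_ATTRIBUTES` is the standard NIO option, and which attributes it preserves depends on the platform and file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper (not Angela's implementation): recursive copy that keeps file attributes.
final class KitCopy {
  static void copyTree(Path source, Path target) {
    // Files.walk visits a directory before its content, so parents exist before files are copied.
    try (Stream<Path> paths = Files.walk(source)) {
      paths.forEach(p -> {
        Path dest = target.resolve(source.relativize(p).toString());
        try {
          if (Files.isDirectory(p)) {
            Files.createDirectories(dest);
          } else {
            Files.copy(p, dest, StandardCopyOption.COPY_ATTRIBUTES, StandardCopyOption.REPLACE_EXISTING);
          }
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Copies a throwaway "kit" containing one executable script and reports whether the copy is executable.
  static boolean demo() {
    try {
      Path source = Files.createTempDirectory("kit-src");
      Path script = Files.createFile(source.resolve("start-tc-server.sh"));
      script.toFile().setExecutable(true);
      Path target = Files.createTempDirectory("kit-dst").resolve("copy");
      copyTree(source, target);
      return Files.isExecutable(target.resolve("start-tc-server.sh"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("copied script is executable: " + demo());
  }
}
```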

Simplify Ignite exception

When executing a remote task, a typical Ignite exception is too verbose, and looks like this:

java.util.concurrent.CompletionException: class org.apache.ignite.IgniteException: Remote job threw user exception (override or implement ComputeTask.result(..) method if you would like to have automatic failover for this exception): Failed to execute job due to unexpected runtime exception [jobId=6a09ffe4171-e54dd9fe-441d-4a77-bf16-38120f7ca898, ses=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=org.terracotta.angela.client.Tsa$$Lambda$279/2139172958, dep=GridDeployment [ts=1586168305461, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, clsLdrId=1909ffe4171-e54dd9fe-441d-4a77-bf16-38120f7ca898, userVer=0, loc=true, sampleClsName=org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl, pendingUndeploy=false, undeployed=false, usage=2], taskClsName=org.terracotta.angela.client.Tsa$$Lambda$279/2139172958, sesId=5a09ffe4171-e54dd9fe-441d-4a77-bf16-38120f7ca898, startTime=1586168306345, endTime=9223372036854775807, taskNodeId=e54dd9fe-441d-4a77-bf16-38120f7ca898, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=false, topPred=null, subjId=e54dd9fe-441d-4a77-bf16-38120f7ca898, mapFut=IgniteFuture [orig=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=2089045485]], execName=null], jobId=6a09ffe4171-e54dd9fe-441d-4a77-bf16-38120f7ca898], err=TC server died while waiting on state-change condition]

	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
	at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1643)
	at java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1632)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: class org.apache.ignite.IgniteException: Remote job threw user exception (override or implement ComputeTask.result(..) method if you would like to have automatic failover for this exception): Failed to execute job due to unexpected runtime exception [jobId=6a09ffe4171-e54dd9fe-441d-4a77-bf16-38120f7ca898, ses=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=org.terracotta.angela.client.Tsa$$Lambda$279/2139172958, dep=GridDeployment [ts=1586168305461, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, clsLdrId=1909ffe4171-e54dd9fe-441d-4a77-bf16-38120f7ca898, userVer=0, loc=true, sampleClsName=org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl, pendingUndeploy=false, undeployed=false, usage=2], taskClsName=org.terracotta.angela.client.Tsa$$Lambda$279/2139172958, sesId=5a09ffe4171-e54dd9fe-441d-4a77-bf16-38120f7ca898, startTime=1586168306345, endTime=9223372036854775807, taskNodeId=e54dd9fe-441d-4a77-bf16-38120f7ca898, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=false, topPred=null, subjId=e54dd9fe-441d-4a77-bf16-38120f7ca898, mapFut=IgniteFuture [orig=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=2089045485]], execName=null], jobId=6a09ffe4171-e54dd9fe-441d-4a77-bf16-38120f7ca898], err=TC server died while waiting on state-change condition]
	at org.apache.ignite.compute.ComputeTaskAdapter.result(ComputeTaskAdapter.java:102)
	at org.apache.ignite.internal.processors.task.GridTaskWorker$5.apply(GridTaskWorker.java:1062)
	at org.apache.ignite.internal.processors.task.GridTaskWorker$5.apply(GridTaskWorker.java:1055)
	at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7037)
	at org.apache.ignite.internal.processors.task.GridTaskWorker.result(GridTaskWorker.java:1055)
	at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:862)
	at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1146)
	at org.apache.ignite.internal.processors.job.GridJobWorker.finishJob(GridJobWorker.java:941)
	at org.apache.ignite.internal.processors.job.GridJobWorker.finishJob(GridJobWorker.java:789)
	at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:639)
	at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:499)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: class org.apache.ignite.compute.ComputeUserUndeclaredException: Failed to execute job due to unexpected runtime exception [jobId=6a09ffe4171-e54dd9fe-441d-4a77-bf16-38120f7ca898, ses=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=org.terracotta.angela.client.Tsa$$Lambda$279/2139172958, dep=GridDeployment [ts=1586168305461, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, clsLdrId=1909ffe4171-e54dd9fe-441d-4a77-bf16-38120f7ca898, userVer=0, loc=true, sampleClsName=org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl, pendingUndeploy=false, undeployed=false, usage=2], taskClsName=org.terracotta.angela.client.Tsa$$Lambda$279/2139172958, sesId=5a09ffe4171-e54dd9fe-441d-4a77-bf16-38120f7ca898, startTime=1586168306345, endTime=9223372036854775807, taskNodeId=e54dd9fe-441d-4a77-bf16-38120f7ca898, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=false, topPred=null, subjId=e54dd9fe-441d-4a77-bf16-38120f7ca898, mapFut=IgniteFuture [orig=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=2089045485]], execName=null], jobId=6a09ffe4171-e54dd9fe-441d-4a77-bf16-38120f7ca898], err=TC server died while waiting on state-change condition]
	at org.apache.ignite.internal.processors.job.GridJobWorker.handleThrowable(GridJobWorker.java:697)
	at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:629)
	... 5 more
Caused by: java.lang.RuntimeException: TC server died while waiting on state-change condition
	at org.terracotta.angela.common.TerracottaServerInstance.waitForState(TerracottaServerInstance.java:148)
	at org.terracotta.angela.agent.AgentController.waitForTsaInState(AgentController.java:297)
	at org.terracotta.angela.client.Tsa.lambda$start$7b0bf020$1(Tsa.java:255)
	at org.apache.ignite.internal.processors.closure.GridClosureProcessor$C4.execute(GridClosureProcessor.java:1944)
	at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:576)
	at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7005)
	at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:570)
	... 5 more

Is it possible to simplify it and keep only the root cause (the last block of the stack trace)? For example:

Caused by: java.lang.RuntimeException: TC server died while waiting on state-change condition
	at org.terracotta.angela.common.TerracottaServerInstance.waitForState(TerracottaServerInstance.java:148)
	at org.terracotta.angela.agent.AgentController.waitForTsaInState(AgentController.java:297)
	at org.terracotta.angela.client.Tsa.lambda$start$7b0bf020$1(Tsa.java:255)
	at org.apache.ignite.internal.processors.closure.GridClosureProcessor$C4.execute(GridClosureProcessor.java:1944)
	at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:576)
	at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7005)
	at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:570)
	... 5 more
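A minimal sketch of how the root cause could be extracted before reporting, assuming the remote failure arrives as a normal chain of `getCause()` links. `RootCause` is a hypothetical helper, and the intermediate exception in the example only imitates the Ignite wrappers.

```java
import java.util.concurrent.CompletionException;

// Hypothetical helper (not an Angela class): finds the innermost cause of an exception chain.
final class RootCause {
  // Follows getCause() until the last throwable of the chain is reached.
  static Throwable of(Throwable t) {
    Throwable current = t;
    while (current.getCause() != null) {
      current = current.getCause();
    }
    return current;
  }

  public static void main(String[] args) {
    RuntimeException root = new RuntimeException("TC server died while waiting on state-change condition");
    // Stand-ins for the wrapper exceptions added around the remote failure.
    Throwable wrapped = new CompletionException(new IllegalStateException("Remote job threw user exception", root));
    // Prints: TC server died while waiting on state-change condition
    System.out.println(RootCause.of(wrapped).getMessage());
  }
}
```

Rethrowing or logging only `RootCause.of(e)` would keep the last block of the trace and drop the verbose wrappers.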

ClientToServerDisruptor NPE when using a Dynamic Config

When using a dynamic config, the Tsa.licenseAll() method (which updates the Tsa ports with proxied ports when a disruption controller is used) is never called.
Therefore the Tsa is not updated with proxied ports, and the disruption controller instance throws an NPE.
See this failing test:

  public void testReconnectFailureHandling() throws Exception {
    ConfigurationContext configContext = customConfigurationContext()
        .tsa(tsa -> tsa
            .topology(
                new Topology(
                    distribution(version(VERSION), PackageType.KIT, LicenseType.TERRACOTTA),
                    true,
                    dynamicCluster(
                        stripe(
                            server("Server1", "localhost")
                                .tsaPort(9517)
                                .tsaGroupPort(9537)
                                .configRepo("terracotta1/repository")
                                .logs("logs1")
                                .metaData("terracotta1/metadata")
                                .failoverPriority("availability")
                        )
                    )
                )
            ).license(new License(getLicenseURL()))
        );

    try (ClusterFactory factory = new ClusterFactory("testReconnectFailureHandling", configContext)) {
      Tsa tsa = factory.tsa();
      tsa.startAll();
      tsa.attachAll();
      tsa.activateAll();

      final ClientToServerDisruptor disruptor = tsa.disruptionController().newClientToServerDisruptor();
      final URI terracottaServerURI = disruptor.uri();

    }
  }
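
The failure mode described above can be illustrated in isolation. The sketch below is not Angela's code: the class ProxiedPortSketch, its methods, and the port 19517 are all hypothetical, and it assumes the NPE comes from looking up a proxied port that was never registered. It only shows how skipping a registration step can surface later as a NullPointerException at the point of use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Angela.
public class ProxiedPortSketch {
  // Assumed stand-in for a server-name -> proxied-port mapping.
  private final Map<String, Integer> proxiedPorts = new HashMap<>();

  // Assumed stand-in for the update step that is skipped with a dynamic config.
  public void registerProxiedPort(String serverName, int port) {
    proxiedPorts.put(serverName, port);
  }

  // Map.get returns null for a missing key; auto-unboxing that null
  // to int throws a NullPointerException.
  public int proxiedPortOf(String serverName) {
    return proxiedPorts.get(serverName);
  }

  public static void main(String[] args) {
    // Path where the registration step ran: the lookup succeeds.
    ProxiedPortSketch registered = new ProxiedPortSketch();
    registered.registerProxiedPort("Server1", 19517);
    System.out.println("proxied port: " + registered.proxiedPortOf("Server1"));

    // Path where the registration step was skipped: the lookup fails.
    ProxiedPortSketch skipped = new ProxiedPortSketch();
    try {
      skipped.proxiedPortOf("Server1");
    } catch (NullPointerException e) {
      System.out.println("NPE: proxied port was never registered");
    }
  }
}
```

If the real cause has this shape, a fix would either run the port update on the dynamic config path as well, or fail fast with a descriptive exception when no proxied port is registered.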

Adapt to terracotta-utilities classes

Once the general-purpose classes are available in the terracotta-utilities repository, Angela can remove its duplicate classes and use the ones from terracotta-utilities. This includes classes such as PortChooser, the file utilities, and TimeBudget.
