Comments (9)
While testing MovingBlocks/Terasology#4611 , the test run failed with:
org.gradle.api.internal.tasks.testing.TestSuiteExecutionException: Could not complete execution for Gradle Test Executor 22.
at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.stop(SuiteTestClassProcessor.java:63)
[…]
at org.gradle.api.internal.tasks.testing.worker.TestWorker.stop(TestWorker.java:133)
[…]
at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
at org.terasology.engine.rendering.assets.texture.PNGTextureFormat.load(PNGTextureFormat.java:35)
at org.terasology.engine.rendering.assets.texture.PNGTextureFormat.load(PNGTextureFormat.java:19)
at org.terasology.assets.module.UnloadedAssetData$AssetSourceResolver.load(UnloadedAssetData.java:292)
at org.terasology.assets.module.UnloadedAssetData.load(UnloadedAssetData.java:180)
at org.terasology.assets.module.ModuleAssetDataProducer.getAssetData(ModuleAssetDataProducer.java:228)
at org.terasology.assets.AssetType.lambda$reload$2(AssetType.java:359)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at org.terasology.assets.AssetType.reload(AssetType.java:357)
at org.terasology.assets.AssetType.getNormalAsset(AssetType.java:386)
at org.terasology.assets.AssetType.getAsset(AssetType.java:262)
at org.terasology.assets.management.AssetManager.getAsset(AssetManager.java:223)
at org.terasology.engine.rendering.assets.font.FontMaterialProducer.getAssetData(FontMaterialProducer.java:61)
from moduletestingenvironment.
I wonder if we can formulate this issue less vague - "very slow" is subjective (which is not me saying, it's not very slow 😉 ) and makes it hard to assess when this issue would actually be resolved.
Which durations do we expect? What amount of improvement would be sufficient? A.k.a what's the "Definition of Done" for this issue?
from moduletestingenvironment.
from moduletestingenvironment.
Probably the real achievement here would be to understand what is taking so much time and why.
But if we want a range to aim for, I'd say 5 seconds per test would make it merely "really slow" instead of the "extraordinarily slow" it is now. That'd be about two minutes for the suite of 22 tests.
from moduletestingenvironment.
After saying that, it did run in under two minutes on my branch just now. (though not every test was under 5s)
Does that branch speed it up so much? Or is it faster because I only have iota modules, and it would be slower in an omega workspace? 🤔
from moduletestingenvironment.
Or is it faster because I only have iota modules
We've definitely seen issues in the past caused/aggravated by the presence of other modules, even if they aren't activated for whatever is going on. But on the other hand it is slow in Jenkins which has the minimum amount of modules.
If the tests in the MTE itself are just more memory hungry, are we sure they'll take advantage of all available memory? I think the current build agents are configured to offer up to 6 GB, but I have no idea if Gradle is allowing that (especially if it launches in different contexts / peripheral ones like module test cases rather than a plain engine build)
from moduletestingenvironment.
The devil is in the detail here: The tests are not running out of heap space, but out of native memory space. The eBay team wrote a SRE Case Study on this, and there a couple of other blog posts on troubleshooting it.
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
It does look like we accumulate more and more used memory in the native space each time we start up a new engine for an MTE test, seemingly without cleaning up that memory in between. I can run a couple of tests locally just fine, but once crossing a specific threshold the test executor will crash eventually.
In a bit more detail: it all starts from StateLoading
, which tries to push a screen, and then we do a bunch of asset (re-)loading. This happens each time we start an MTE test.
...
at org.terasology.engine.core.modes.StateLoading.init(StateLoading.java:139)
at org.terasology.engine.core.TerasologyEngine.switchState(TerasologyEngine.java:591)
at org.terasology.engine.core.TerasologyEngine.processPendingState(TerasologyEngine.java:580)
at org.terasology.engine.core.TerasologyEngine.tick(TerasologyEngine.java:484)
at org.terasology.moduletestingenvironment.ModuleTestingEnvironment.createHost(ModuleTestingEnvironment.java:475)
at org.terasology.moduletestingenvironment.ModuleTestingEnvironment.setup(ModuleTestingEnvironment.java:167)
at org.terasology.moduletestingenvironment.MTEExtension.beforeAll(MTEExtension.java:93)
...
It does look like we have a native memory leak in there, which becomes apparent here because we start and stop the engine a bunch of times... Eventually, the test executor will crash with the above mentioned OOM.
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
at org.terasology.engine.rendering.assets.texture.AWTTextureFormat.convertToTextureData(AWTTextureFormat.java:78)
at org.terasology.engine.rendering.assets.texture.AWTTextureFormat.load(AWTTextureFormat.java:50)
at org.terasology.engine.rendering.assets.texture.AWTTextureFormat.load(AWTTextureFormat.java:25)
at org.terasology.gestalt.assets.format.producer.UnloadedAssetData$AssetSourceResolver.load(UnloadedAssetData.java:311)
at org.terasology.gestalt.assets.format.producer.UnloadedAssetData.load(UnloadedAssetData.java:181)
at org.terasology.gestalt.assets.format.producer.AssetFileDataProducer.getAssetData(AssetFileDataProducer.java:227)
at org.terasology.gestalt.assets.AssetType.lambda$reload$2(AssetType.java:360)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at org.terasology.gestalt.assets.AssetType.reload(AssetType.java:358)
...
from moduletestingenvironment.
It seems likely that some of the things causing problems for re-using the same process for multiple MTE tests might also have been causing problems for switching between games in the same Terasology run. There was some work on that in MovingBlocks/Terasology#4829; we should review that and see if MTE is doing the appropriate dispose steps.
from moduletestingenvironment.
Follow-ups:
[closing because I can't transfer issues between @Terasology and @MovingBlocks]
from moduletestingenvironment.
Related Issues (20)
- Mystery issue in large (module-wise) workspaces with MTE tests HOT 1
- Cannot run MTE tests in a standalone module build environment (like Jenkins)
- RFC: Make run* helper timeouts more humane.
- Guard against infinite loops - add timeout somehow if a test gets stuck? HOT 3
- Tests can be overly resource hungry or otherwise get out of control - add hardening somehow? HOT 2
- Package as a library instead of a module HOT 1
- ReuseEngineTest failure HOT 1
- CrashReporter should not be enabled HOT 4
- "Connection refused" errors when attempting parallel test execution HOT 1
- provide in-memory world implementation (MapWorldProvider) HOT 1
- "For complete docs please see the" broken link
- SimpleFarming transitive dependencies irregularities HOT 4
- test attempts to run despite module resolution failure HOT 2
- add ability to use a WorldGenerator defined in src/test (not src/main) HOT 2
- add ability to use assets defined in src/test HOT 1
- Annotation-based dependency definition seems quirky HOT 6
- createClient fails while initializing engine state HOT 5
- engine lifecycle / test case isolation HOT 4
- reducing the number of annotations HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from moduletestingenvironment.