linkedin / dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
License: BSD 2-Clause "Simplified" License
There are a few portions of the codebase that do not yet play well with Hadoop 3.0.
The TestDynamometerInfra test failed; it may need to be revised.
> Configure project :
Building version '0.1.4' (value loaded from 'version.properties' file).
> Task :dynamometer-infra:test
com.linkedin.dynamometer.TestDynamometerInfra > classMethod FAILED
java.net.UnknownHostException at TestDynamometerInfra.java:146
3 tests completed, 1 failed
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':dynamometer-infra:test'.
> There were failing tests. See the report at: file:///home/hxh/hadoop/dynamometer-0.1.3/dynamometer-infra/build/reports/tests/test/index.html
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
* Get more help at https://help.gradle.org
BUILD FAILED in 46s
45 actionable tasks: 7 executed, 38 up-to-date
Currently the test fails by default (if no system properties are specified) because 2.7.4 can no longer be downloaded from the Apache mirror. Ideally #10 should be used to solve this, but in the meantime, bump the version up to 2.7.5.
Due to the way blocks are injected into the simulated DataNodes in Dynamometer, it is sometimes possible for a DataNode to send its initial block report completely empty. This is because the blocks are injected after the DataNode is started, due to the setup of a MiniDFSCluster. If this happens, another block report will not be sent until the block report interval has passed, which can be a very long time. This can result in (a) test timeouts and (b) long setup times.
We can add the ability for the ApplicationMaster to monitor which nodes have not reported full block reports, and trigger block reports on those nodes.
Though the project has some tests, it is essentially just one monolithic integration test. We need more fine-grained unit tests to be able to more easily diagnose issues as they arise, and to increase the possibility of catching breakages.
When waiting for the Dynamometer infra application to be "ready" to use, the Client and ApplicationMaster wait for the following criteria to be met:
These percentage-based thresholds are currently hard-coded, but it would be useful if they were configurable to give a user more control over how strict the readiness condition is.
[mr@redhat143 dynamometer-fat-0.1.5]$ ./bin/start-workload.sh -Dauditreplay.input-path=hdfs:///dyno/audit_input_logs/ -Dauditreplay.output-path=hdfs:///dyno/audit_output_logs/ -Dauditreplay.log-start-time.ms=1554247070151 -Dauditreplay.num-threads=1 -nn_uri hdfs://redhat142:9000/ -start_time_offset 1m -mapper_class_name AuditReplayMapper
2019-04-03 08:08:56,771 INFO com.linkedin.dynamometer.workloadgenerator.WorkloadDriver: The workload will start at 1554250196743 ms (2019/04/03 08:09:56 CST)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/ZDH/parcels/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mr/dynamometer-fat-0.1.5/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mr/dynamometer-fat-0.1.5/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mr/dynamometer-fat-0.1.5/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2019-04-03 08:09:04,118 INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl: Timeline service address: http://redhat143:8188/ws/v1/timeline/
2019-04-03 08:09:06,712 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to process : 1
2019-04-03 08:09:07,156 INFO org.apache.hadoop.mapreduce.JobSubmitter: number of splits:1
2019-04-03 08:09:07,591 INFO org.apache.hadoop.mapreduce.JobSubmitter: Submitting tokens for job: job_1554243591539_0010
2019-04-03 08:09:07,799 INFO org.apache.hadoop.conf.Configuration: found resource resource-types.xml at file:/etc/zdh/yarn/conf.zdh.yarn/resource-types.xml
2019-04-03 08:09:08,703 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1554243591539_0010
2019-04-03 08:09:08,819 INFO org.apache.hadoop.mapreduce.Job: The url to track the job: http://redhat142:8088/proxy/application_1554243591539_0010/
2019-04-03 08:09:08,820 INFO org.apache.hadoop.mapreduce.Job: Running job: job_1554243591539_0010
2019-04-03 08:09:28,561 INFO org.apache.hadoop.mapreduce.Job: Job job_1554243591539_0010 running in uber mode : false
2019-04-03 08:09:28,564 INFO org.apache.hadoop.mapreduce.Job: map 0% reduce 0%
[mr@redhat143 ~]$ hdfs dfs -ls /dyno/audit_output_logs
Found 2 items
-rw-r----- 3 mr users 0 2019-04-03 16:49 /dyno/audit_output_logs/_SUCCESS
-rw-r----- 3 mr users 295 2019-04-03 16:49 /dyno/audit_output_logs/part-r-00000
[mr@redhat143 ~]$ hdfs dfs -cat /dyno/audit_output_logs/part-r-00000
mr,READ,OPEN,-1,-3812586358584700876
mr,WRITE,CREATE,-1,-3089982714344429856
mr,WRITE,DELETE,67108863,357792779
mr,WRITE,MKDIRS,8796093022207,943151732492469
mr,WRITE,RENAME,70368744177663,521322769738249
mr,WRITE,SETPERMISSION,-1,-6855717651319934733
mr,WRITE,SETREPLICATION,16777215,161654944
I have some questions:
I'm looking forward to your reply. Thanks!
Master started failing as of Travis Build #76 on 2018-10-05. Build #75 on 2018-09-28 was fine, and no commits went in between the two.
The Hadoop tarball to use during testing is specified and downloaded from an Apache mirror. Generally only the latest version of each branch is available, but right now, versions must be fully specified (e.g. 2.7.4), so they will go out of date as new maintenance releases come into existence. It should be possible to specify a wildcard, e.g. 2.7.*
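As a sketch of how wildcard resolution could work against a mirror's directory listing (the resolver name and the version list are hypothetical, not part of Dynamometer):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical sketch: given the versions a mirror advertises, resolve a
// wildcard spec like "2.7.*" to the latest matching maintenance release.
class HadoopVersionResolver {
  static Optional<String> resolve(String spec, List<String> available) {
    // "2.7.*" becomes the regex "2\.7\.\d+".
    Pattern p = Pattern.compile(spec.replace(".", "\\.").replace("*", "\\d+"));
    return available.stream()
        .filter(v -> p.matcher(v).matches())
        // Compare the final (maintenance) component numerically.
        .max(Comparator.comparingInt(
            v -> Integer.parseInt(v.substring(v.lastIndexOf('.') + 1))));
  }
}
```

For example, resolving "2.7.*" against ["2.7.4", "2.7.7", "2.8.5"] would pick 2.7.7.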
In Hadoop 3.0/CDH5.7 and above,
HDFS-9260 (Improve the performance and GC friendliness of NameNode startup and full block reports) changed the internal representation of block replicas, as well as the block report processing logic in NameNode.
After HDFS-9260, the NN expects block replicas to be reported in ascending order of block id. If a block id is out of order, the NN discards it silently. Because the simulated DataNode in Dynamometer uses a hash map to store block replicas, the replicas are not reported in order. The Dynamometer cluster then sees missing blocks gradually increase several minutes after the NN starts.
I suggest changing SimulatedBPStorage.blockMap to a TreeMap sorted by block id. I will supply a patch for the proposed change.
Credit: @fangyurao for identifying the issue, and help verifying the fix.
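The proposed fix can be sketched as follows; this is a simplified stand-in for Dynamometer's SimulatedBPStorage, not its actual implementation. Backing the block map with a TreeMap keyed on block id makes iteration, and therefore block reports, come out in ascending-id order, which is what the post-HDFS-9260 NameNode expects:

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified stand-in for SimulatedBPStorage: replacing a HashMap-backed
// block map with a TreeMap keyed on block id guarantees that iteration
// (and hence the block report) is in ascending block-id order.
class SimulatedBPStorageSketch {
  // TreeMap iterates keys in their natural (ascending) order.
  private final Map<Long, String> blockMap = new TreeMap<>();

  void addBlock(long blockId, String replicaState) {
    blockMap.put(blockId, replicaState);
  }

  // Block ids come out sorted regardless of insertion order.
  long[] reportBlockIds() {
    return blockMap.keySet().stream().mapToLong(Long::longValue).toArray();
  }
}
```

With a HashMap, the same insertion sequence could be reported in effectively arbitrary order, triggering the silent-discard behavior described above.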
Failure to compile with the release dynamometer-0.1.0 package.
root@hadoop:/home/hxh/hadoop/dynamometer-0.1.0# gradle build
Parallel execution with configuration on demand is an incubating feature.
FAILURE: Build failed with an exception.
* Where:
Build file '/home/hxh/hadoop/dynamometer-0.1.0/build.gradle' line: 18
* What went wrong:
Plugin [id: 'org.shipkit.java', version: '2.1.3'] was not found in any of the following sources:
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
* Get more help at https://help.gradle.org
BUILD FAILED in 32s
Currently audit workload traces are performed at the same rate at which they were initially recorded. It would be useful to be able to specify a factor by which to adjust their rate to artificially simulate heavier or lighter workloads of the same characteristics.
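As a sketch of how such a factor could work (the name rateFactor is hypothetical, not an existing Dynamometer configuration): dividing each command's recorded offset from the trace start by the factor compresses or stretches the replay timeline.

```java
// Hypothetical sketch of a replay rate multiplier. With rateFactor = 2.0
// the recorded inter-command gaps shrink by half, doubling the replay
// rate; rateFactor = 0.5 halves it.
class ReplayRateScaler {
  private final double rateFactor;

  ReplayRateScaler(double rateFactor) {
    this.rateFactor = rateFactor;
  }

  // Maps a command's recorded offset (ms since trace start) to the
  // offset at which it should be replayed.
  long scaledOffsetMs(long recordedOffsetMs) {
    return Math.round(recordedOffsetMs / rateFactor);
  }
}
```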
The integration test checks for the presence of the truncate command since it was previously necessary for the infrastructure application setup. It is no longer necessary, so we should remove this check.
The metrics files produced during Dynamometer execution may include values in scientific notation, like 4.2E8. The parse-metrics.sh script will currently truncate these to end prior to the E, causing incorrect results.
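One possible fix is to parse the value as a floating-point number rather than splitting at the E. A minimal sketch, shown in Java for clarity (the helper name is hypothetical; the real fix would live in the shell script):

```java
// Hypothetical sketch: parse metric values that may be in scientific
// notation (e.g. "4.2E8") instead of truncating at the 'E', which would
// incorrectly yield 4.2.
class MetricValueParser {
  static long parseMetric(String raw) {
    // Double.parseDouble understands scientific notation directly.
    return Math.round(Double.parseDouble(raw.trim()));
  }
}
```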
There are two glaring issues with the accuracy of audit replay:
Tests are no good if they don't get run! Let's put in place Travis CI which is free for open source projects.
We can provide a hook in the root build.gradle for the build environment to have specific overrides. This allows the project to be built in more environments and adapt more gracefully to company-specific tooling.
Especially given that a NameNode may have special hardware requirements, it is useful to be able to specify a node label for the containers that the infrastructure application will be launched on. This can be used to, e.g., apply a node label to a special node reserved for running the NameNode.
The TestDynamometerInfra test relies on some system properties to change behaviors in the test. Gradle must be explicitly told to pass system properties through, so attempting to specify these when running the tests via Gradle does not work.
Though Dynamometer supports executing a few different versions of Hadoop, it will currently only run on a 2.7 cluster. This tracks adding support for the rest of the 2.x line.
This class was necessary for the old method of setting up DataNodes, which involved laying out fake blocks on disk; we had to know the layout version in that case. Now that SimulatedMultiStorageFSDataset is used instead, this logic is no longer necessary.
This might be related to #52.
In https://github.com/linkedin/dynamometer/blob/master/dynamometer-infra/src/main/java/com/linkedin/dynamometer/Client.java#L327 the code tries to determine whether a help option was entered, but internally it uses a GnuParser (deprecated as of commons-cli 1.3), and during the flatten process the parser pulls the substring -h out of the option -hadoop_binary_path and thinks it is a request for help information.
Note this only happens when you put -hadoop_binary_path as the first option, since other options make the flatten end early.
When running the replay with audit logs, warnings like the following come up:
18/11/06 01:23:36 WARN audit.AuditReplayThread: Unsupported/invalid command: AuditReplayCommand(absoluteTimestamp=1539650537967, ugi=xxx, command=getEZForPath, src=/certain/path, dest=null, sourceIP=x.x.x.x
The block generation job has custom output logic to allow each reducer to output to multiple block files.
When speculative execution is enabled, this can result in two copies of the same block file being generated (one of which may be incomplete). This can be worked around by setting mapreduce.reduce.speculative = false.
When a reducer attempt fails, the partial output files will not be cleaned up. I'm not aware of an easy workaround for this beyond manually cleaning up the files after the job completes.
We should have each reducer use a staging directory and only move the output files when it completes.
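The staging-directory approach can be sketched as follows. This is a simplified stand-in (names are hypothetical, and the real job would use the HDFS FileSystem API rather than java.nio), but the commit protocol is the same: write everything into a private staging directory, then move the files into the final output directory only on successful completion, so a failed or speculative attempt leaves nothing behind.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of a per-reducer commit step: block files are
// written to stagingDir during the task, then promoted to finalDir only
// when the attempt succeeds.
class StagedOutputCommitter {
  static void commit(Path stagingDir, Path finalDir) throws IOException {
    Files.createDirectories(finalDir);
    // Collect first so we are not mutating the directory mid-listing.
    List<Path> staged;
    try (Stream<Path> s = Files.list(stagingDir)) {
      staged = s.collect(Collectors.toList());
    }
    for (Path src : staged) {
      // ATOMIC_MOVE fails outright rather than leaving a half-copied file.
      Files.move(src, finalDir.resolve(src.getFileName()),
          StandardCopyOption.ATOMIC_MOVE);
    }
  }
}
```

A failed attempt simply never calls commit, so its partial files stay confined to (and can be cleaned up with) its staging directory.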
There is a description like this in the README document:
The audit trace replay accepts one input file per mapper, and currently supports two input formats, configurable via the auditreplay.command-parser.class configuration.
Where do we need to configure the auditreplay.command-parser.class?
When using this format you must also specify auditreplay.log-start-time.ms.
How should we specify auditreplay.log-start-time.ms?
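For reference, auditreplay.log-start-time.ms is an epoch timestamp in milliseconds (as in the -Dauditreplay.log-start-time.ms=1554247070151 invocation shown earlier). One way to derive it from the timestamp of the first audit-log line, assuming the common log4j date pattern (the helper name and pattern are assumptions, not part of Dynamometer):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: convert the timestamp of the first audit-log line
// into epoch milliseconds for auditreplay.log-start-time.ms. The
// "yyyy-MM-dd HH:mm:ss,SSS" pattern is the common Hadoop log4j format.
class AuditLogStartTime {
  static long toEpochMs(String logTimestamp, ZoneId zone) {
    DateTimeFormatter fmt =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    return LocalDateTime.parse(logTimestamp, fmt)
        .atZone(zone)
        .toInstant()
        .toEpochMilli();
  }
}
```

Note that the zone must match the one the audit logs were written in, or the replay offsets will be shifted.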
Hi, I want to use Dynamometer via its scripts, but after the YARN app is submitted it always fails inside the container for reasons I don't know. Does anyone have any suggestions?
2022-06-15 18:36:40,223 DEBUG retry.RetryInvocationHandler: Exception while invoking call #30 ClientNamenodeProtocolTranslatorPB.getBlockLocations over null. Not retrying because try once and fail.
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/kjh/.dynamometer/application_1655275704983_0013/nn_info.prop
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:156)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1990)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:768)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:442)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1562)
at org.apache.hadoop.ipc.Client.call(Client.java:1508)
at org.apache.hadoop.ipc.Client.call(Client.java:1405)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy9.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:327)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:869)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:858)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:847)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1015)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:318)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:330)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:906)
at org.apache.hadoop.tools.dynamometer.DynoInfraUtils.waitForAndGetNameNodeProperties(DynoInfraUtils.java:236)
at org.apache.hadoop.tools.dynamometer.Client.lambda$monitorInfraApplication$3(Client.java:902)
at java.lang.Thread.run(Thread.java:748)
Below is the Dynamometer run command:
./dynamometer-infra/bin/start-dynamometer-cluster.sh -hadoop_binary_path /home/kjh/Downloads/hadoop-3.2.2.tar.gz -conf_path /usr/local/hadoop/etc/hadoop -fs_image_dir hdfs:///dyno/fsimage -block_list_path hdfs:///dyno/blocks
Currently, within a single workload replay mapper, numerous threads are started to replay commands. Commands are partitioned by their source path and then directed to a thread corresponding to their partition. This results in issues with, for example, skewed paths, where single threads get backed up and commands are executed far later than initially intended. Instead, we should simply let all of the threads share a single DelayQueue to spread the load among all of them.
This has the disadvantage that some operations occurring on the same source path may occur out of order, but this has not proved to be an issue in our experience.
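The shared-queue scheme can be sketched as follows. This is a simplified stand-in, assuming each command becomes eligible for execution once its recorded delay expires; whichever thread happens to be free picks it up, so a burst on one skewed path is spread across every worker:

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: all replay threads poll one shared DelayQueue
// instead of each owning a per-partition queue.
class SharedQueueReplay {
  static class Command implements Delayed {
    final long fireAtMs;
    Command(long fireAtMs) { this.fireAtMs = fireAtMs; }
    public long getDelay(TimeUnit unit) {
      return unit.convert(fireAtMs - System.currentTimeMillis(),
          TimeUnit.MILLISECONDS);
    }
    public int compareTo(Delayed o) {
      return Long.compare(getDelay(TimeUnit.MILLISECONDS),
          o.getDelay(TimeUnit.MILLISECONDS));
    }
  }

  static int replay(int numThreads, int numCommands)
      throws InterruptedException {
    DelayQueue<Command> queue = new DelayQueue<>();
    long now = System.currentTimeMillis();
    for (int i = 0; i < numCommands; i++) {
      queue.add(new Command(now + i)); // commands fire ~1 ms apart
    }
    AtomicInteger executed = new AtomicInteger();
    Thread[] workers = new Thread[numThreads];
    for (int t = 0; t < numThreads; t++) {
      workers[t] = new Thread(() -> {
        try {
          // Every thread polls the same queue; poll blocks until a
          // command's delay expires, and the idle timeout ends the run.
          Command c;
          while ((c = queue.poll(200, TimeUnit.MILLISECONDS)) != null) {
            executed.incrementAndGet();
          }
        } catch (InterruptedException ignored) {
        }
      });
      workers[t].start();
    }
    for (Thread w : workers) w.join();
    return executed.get();
  }
}
```

DelayQueue orders commands by expiry regardless of source path, which is exactly the trade-off noted above: per-path ordering is lost, but no single thread can back up.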
I got a failure when I tried to launch the dyno-cluster.
[root@ftp0 hadoop]# start-dynamometer-cluster.sh -hadoop_binary_path hadoop-2.7.2.tar.gz -conf_path /root/hadoop/hadoop-2.7.2/etc/hadoop/conf -fs_image_dir hdfs:///dyno/fsimage -block_list_path hdfs:///dyno/blocks1
Console log:
19/07/25 11:47:43 INFO dynamometer.Client: Running Client
19/07/25 11:47:43 INFO client.RMProxy: Connecting to ResourceManager at ftp0/192.168.103.159:8032
19/07/25 11:47:43 INFO dynamometer.Client: Got Cluster metric info from ASM, numNodeManagers=3
19/07/25 11:47:43 INFO dynamometer.Client: Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
19/07/25 11:47:43 INFO dynamometer.Client: Max mem capabililty of resources in this cluster 9000
19/07/25 11:47:43 INFO dynamometer.Client: Max virtual cores capabililty of resources in this cluster 50
19/07/25 11:47:43 INFO dynamometer.Client: Set the environment for the application master
19/07/25 11:47:43 INFO dynamometer.Client: Using resource FS_IMAGE directly from current location: hdfs://ftp0:9000/dyno/fsimage/fsimage_0000000000000108883
19/07/25 11:47:43 INFO dynamometer.Client: Using resource FS_IMAGE_MD5 directly from current location: hdfs://ftp0:9000/dyno/fsimage/fsimage_0000000000000108883.md5
19/07/25 11:47:43 INFO dynamometer.Client: Using resource VERSION directly from current location: hdfs:/dyno/fsimage/VERSION
19/07/25 11:47:43 INFO dynamometer.Client: Uploading resource CONF_ZIP from [/root/hadoop/hadoop-2.7.2/etc/hadoop/conf] to hdfs://ftp0:9000/user/root/.dynamometer/application_1564026451259_0001/conf.zip
19/07/25 11:47:44 INFO dynamometer.Client: Uploading resource START_SCRIPT from [file:/tmp/hadoop-unjar5145675343523534600/start-component.sh] to hdfs://ftp0:9000/user/root/.dynamometer/application_1564026451259_0001/start-component.sh
19/07/25 11:47:44 INFO dynamometer.Client: Uploading resource HADOOP_BINARY from [hadoop-2.7.2.tar.gz] to hdfs://ftp0:9000/user/root/.dynamometer/application_1564026451259_0001/hadoop-2.7.2.tar.gz
19/07/25 11:47:44 INFO dynamometer.Client: Uploading resource DYNO_DEPS from [/root/dynamometer/build/distributions/dynamometer-0.1.7/bin/../lib/dynamometer-infra-0.1.7.jar] to hdfs://ftp0:9000/user/root/.dynamometer/application_1564026451259_0001/dependencies.zip
19/07/25 11:47:44 INFO dynamometer.Client: Completed setting up app master command: [$JAVA_HOME/bin/java, -Xmx1741m, com.linkedin.dynamometer.ApplicationMaster, --datanode_memory_mb 2048, --datanode_vcores 1, --datanodes_per_cluster 1, --datanode_launch_delay 0s, --namenode_memory_mb 2048, --namenode_vcores 1, --namenode_metrics_period 60, 1><LOG_DIR>/stdout, 2><LOG_DIR>/stderr]
19/07/25 11:47:44 INFO dynamometer.Client: Submitting application to RM
19/07/25 11:47:44 INFO impl.YarnClientImpl: Submitted application application_1564026451259_0001
19/07/25 11:47:45 INFO dynamometer.Client: Track the application at: http://ftp0:8088/proxy/application_1564026451259_0001/
19/07/25 11:47:45 INFO dynamometer.Client: Kill the application using: yarn application -kill application_1564026451259_0001
19/07/25 11:48:00 INFO dynamometer.Client: NameNode can be reached via HDFS at: hdfs://ftp1:9002/
19/07/25 11:48:00 INFO dynamometer.Client: NameNode web UI available at: http://ftp1:50077/
19/07/25 11:48:00 INFO dynamometer.Client: NameNode can be tracked at: http://ftp1:8042/node/containerlogs/container_1564026451259_0001_01_000002/root/
19/07/25 11:48:00 INFO dynamometer.Client: Waiting for NameNode to finish starting up...
19/07/25 11:48:07 INFO dynamometer.Client: Infra app exited unexpectedly. YarnState=FINISHED. Exiting from client.
19/07/25 11:48:07 INFO dynamometer.Client: Attempting to clean up remaining running applications.
19/07/25 11:48:07 ERROR dynamometer.Client: Application failed to complete successfully
After that, I went to look at the container log under Hadoop.
[root@ftp0 container_1564026451259_0001_01_000001]# pwd
/root/hadoop/hadoop-2.7.2/logs/userlogs/application_1564026451259_0001/container_1564026451259_0001_01_000001
[root@ftp0 container_1564026451259_0001_01_000001]# ls
stderr stdout
stdout is empty!
stderr :
19/07/25 11:47:51 INFO dynamometer.ApplicationMaster: Setting up container launch context for containerid=container_1564026451259_0001_01_000002, isNameNode=true
19/07/25 11:47:51 INFO dynamometer.ApplicationMaster: Completed setting up command for namenode: [./start-component.sh, namenode, hdfs://ftp0:9000/user/root/.dynamometer/application_1564026451259_0001, 1><LOG_DIR>/stdout, 2><LOG_DIR>/stderr]
19/07/25 11:47:51 INFO dynamometer.ApplicationMaster: Starting NAMENODE; track at: http://ftp1:8042/node/containerlogs/container_1564026451259_0001_01_000002/root/
19/07/25 11:47:51 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1564026451259_0001_01_000002
19/07/25 11:47:51 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ftp1:34334
19/07/25 11:47:51 INFO dynamometer.ApplicationMaster: NameNode container started at ID container_1564026451259_0001_01_000002
19/07/25 11:48:00 INFO dynamometer.ApplicationMaster: NameNode information: {NM_HTTP_PORT=8042, NN_HOSTNAME=ftp1, NN_HTTP_PORT=50077, NN_SERVICERPC_PORT=9022, NN_RPC_PORT=9002, CONTAINER_ID=container_1564026451259_0001_01_000002}
19/07/25 11:48:00 INFO dynamometer.ApplicationMaster: NameNode can be reached at: hdfs://ftp1:9002/
19/07/25 11:48:00 INFO dynamometer.ApplicationMaster: Waiting for NameNode to finish starting up...
19/07/25 11:48:05 INFO dynamometer.ApplicationMaster: Got response from RM for container ask, completedCnt=1
19/07/25 11:48:05 INFO dynamometer.ApplicationMaster: Got container status for NAMENODE: containerID=container_1564026451259_0001_01_000002, state=COMPLETE, exitStatus=1, diagnostics=Exception from container-launch.
Container id: container_1564026451259_0001_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zer...
19/07/25 11:48:05 INFO dynamometer.ApplicationMaster: NameNode container completed; marking application as done
19/07/25 11:48:06 INFO dynamometer.ApplicationMaster: NameNode has started!
19/07/25 11:48:06 INFO dynamometer.ApplicationMaster: Looking for block listing files in hdfs:/dyno/blocks1
19/07/25 11:48:06 INFO dynamometer.ApplicationMaster: Requesting 2 DataNode containers with 2048MB memory, 1 vcores,
19/07/25 11:48:06 INFO dynamometer.ApplicationMaster: Finished requesting datanode containers
19/07/25 11:48:06 INFO dynamometer.ApplicationMaster: Application completed. Stopping running containers
19/07/25 11:48:06 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ftp1:34334
19/07/25 11:48:07 INFO dynamometer.ApplicationMaster: Application completed. Signalling finish to RM
19/07/25 11:48:07 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
19/07/25 11:48:07 INFO dynamometer.ApplicationMaster: Application Master failed. exiting
19/07/25 11:48:07 INFO impl.AMRMClientAsyncImpl: Interrupted while waiting for queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)
Thanks in advance!
The Client launches two jobs, the infrastructure job and the workload job, which right now pull from the same configuration. It can be useful to be able to have configurations which apply to only one of the two. Provide a command line option to specify configurations which apply only to the workload job, and not the infrastructure job, so that specific overrides can be applied.
Currently each DataNode is launched as a separate process/JVM, and we fool it into thinking it has all of its necessary blocks by creating the files as 0-length. It would be much more efficient to launch all of the DataNodes in the same JVM using MiniDFSCluster, and to use SimulatedFSDataset to store the block metadata only in-memory, saving us from having to create millions of sparse files on disk.
Currently, the NameNode's name and edit dir are stored within the NodeManager local storage. Given the potentially performance critical nature of these storages (writing edit logs to disk can have significant performance impact), it can be desirable to configure these to e.g. a dedicated disk. Provide a way to override the default.
After running start-dynamometer-cluster.sh and replaying the prod audit log for some time, some simulated DataNodes (containers) lost their connection to the RM, and when the YARN application is killed these containers are still running and keep sending their blocks to the NameNode.
In this case, since the DataNodes have gone through some changes during the replay while the NameNode started from a fresh fsimage, the errors below show up on the WebHDFS page after the NameNode starts up.
Safe mode is ON. The reported blocks 1526116 needs additional 395902425 blocks to reach the threshold 0.9990 of total blocks 397826363. The number of live datanodes 3 has reached the minimum number 0. Name node detected blocks with generation stamps in future. This means that Name node metadata is inconsistent. This can happen if Name node metadata files have been manually replaced. Exiting safe mode will cause loss of 7141 byte(s). Please restart name node with right metadata or use "hdfs dfsadmin -safemode forceExit" if you are certain that the NameNode was started with the correct FsImage and edit logs. If you encountered this during a rollback, it is safe to exit with -safemode forceExit.
Checking the Datanodes tab on the WebHDFS page, a list of a couple of DataNodes shows up.
start-dynamometer-cluster.sh command:
./start-dynamometer-cluster.sh --hadoop_binary_path hadoop-2.7.2.tar.gz --conf_path /opt/hadoop/wz/dynamome --conf_path /opt/hadoop/wz/dynamometer/bin/conf/ --fs_image_dir hdfs:///dyno/fsimage --block_list_path
check the NameNode's starting log on the AM node :
2019-01-08 16:32:38,311 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2019-01-08 16:32:38,315 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode [-D, fs.defaultFS=hdfs://host-xx-xx:9002, -D, dfs.namenode.rpc-address=host-xx-xx:9002, -D, dfs.namenode.servicerpc-address=host-xx-xx:9022, -D, dfs.namenode.http-address=host-xx-xx:50077, -D, dfs.namenode.https-address=host-xx-xx:0, -D, dfs.namenode.name.dir=file:///opt/huawei/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1546852874867_0024/container_1546852874867_0024_01_000002/dyno-node/name-data, -D, dfs.namenode.edits.dir=file:///opt/huawei/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1546852874867_0024/container_1546852874867_0024_01_000002/dyno-node/name-data, -D, dfs.namenode.checkpoint.dir=file:///opt/huawei/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1546852874867_0024/container_1546852874867_0024_01_000002/dyno-node/checkpoint, -D, dfs.namenode.safemode.threshold-pct=0.0f, -D, dfs.permissions.enabled=true, -D, dfs.cluster.administrators="", -D, dfs.block.replicator.classname=com.linkedin.dynamometer.BlockPlacementPolicyAlwaysSatisfied, -D, hadoop.security.impersonation.provider.class=com.linkedin.dynamometer.AllowAllImpersonationProvider, -D, hadoop.tmp.dir=/opt/huawei/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1546852874867_0024/container_1546852874867_0024_01_000002/dyno-node, -D, hadoop.security.authentication=simple, -D, hadoop.security.authorization=false, -D, dfs.http.policy=HTTP_ONLY, -D, dfs.client.read.shortcircuit=false]
2019-01-08 16:32:38,318 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/***********************************************************
SHUTDOWN_MSG: Shutting down NameNode at host-xx-xx
************************************************************/
In start-component.sh, line 277:
${HADOOP_HOME}/sbin/hadoop-daemon.sh start namenode $namenodeConfigs $NN_ADDITIONAL_ARGS;
It seems that the NameNode can't recognize the parameters ($namenodeConfigs).
$namenodeConfigs looks like this:
read -r -d '' namenodeConfigs <<EOF
-D fs.defaultFS=hdfs://${nnHostname}:${nnRpcPort}
-D dfs.namenode.rpc-address=${nnHostname}:${nnRpcPort}
-D dfs.namenode.servicerpc-address=${nnHostname}:${nnServiceRpcPort}
-D dfs.namenode.http-address=${nnHostname}:${nnHttpPort}
-D dfs.namenode.https-address=${nnHostname}:0
-D dfs.namenode.name.dir=file://${nameDir}
-D dfs.namenode.edits.dir=file://${editsDir}
Did I make a usage error with start-dynamometer-cluster.sh?
Given that permission checking can be a fairly heavy operation, it is not ideal that Dynamometer currently disables permissions to let the workload replay user execute all operations. Instead, this job should proxy as the user who initially performed each operation.
I got this:
sunchao@HOST:~/dynamometer$ bin/start-dynamometer-cluster.sh -help
18/07/30 21:54:29 INFO dynamometer.Client: Initializing Client
18/07/30 21:54:29 FATAL dynamometer.Client: Error running Client
org.apache.commons.cli.MissingOptionException: Missing required option: [-hadoop_binary_path Location of Hadoop binary to be deployed (archive). One of this or hadoop_version is required., -hadoop_version Version of Hadoop (like '2.7.4' or '3.0.0-beta1') for which to download a binary. If this is specified, a Hadoop tarball will be downloaded from an Apache mirror. By default the Berkeley OCF mirror is used; specify dyno.apache-mirror as a configuration or system property to change which mirror is used. The tarball will be downloaded to the working directory. One of this or hadoop_binary_path is required.]
at org.apache.commons.cli.Parser.checkRequiredOptions(Parser.java:299)
at org.apache.commons.cli.Parser.parse(Parser.java:231)
at org.apache.commons.cli.Parser.parse(Parser.java:85)
at com.linkedin.dynamometer.Client.init(Client.java:323)
at com.linkedin.dynamometer.Client.run(Client.java:228)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at com.linkedin.dynamometer.Client.main(Client.java:220)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
while trying to get the help message from the script. It seems I need to pass an argument for hadoop_binary_path, but is that really necessary for the -help option?
upload-fsimage.sh failed at the end because the VERSION file already exists in the target HDFS dir:
sunchao@HOSTNAME:~/dynamometer$ bin/upload-fsimage.sh XXX hdfs:///app/dyno/fsimage /home/sunchao/fsimage
Using fsimage: fsimage_XXX
Creating temporary XML fsimage file at /tmp/tmp.IUfFlwXXdk/fsimage_XXX.xml
Created temporary XML fsimage file
Uploading /home/sunchao/fsimage/current/fsimage_XXX...
Uploading /tmp/tmp.IUfFlwXXdk/fsimage_XXX.xml...
Uploading /home/sunchao/fsimage/current/fsimage_XXX.md5...
Uploading /home/sunchao/fsimage/current/VERSION...
copyFromLocal: `hdfs:///app/dyno/fsimage/VERSION': File exists
Error while uploading /home/sunchao/fsimage/current/VERSION; exiting
Perhaps we can check whether VERSION exists before converting to XML? Or maybe skip the upload if it already exists? This is only a minor issue, though, since the fsimage file itself is already uploaded.
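One possible shape for that check, as a sketch (`upload_if_absent` is a hypothetical helper, not part of upload-fsimage.sh): probe the destination with `hdfs dfs -test -e` and skip the copy when it already exists.

```shell
# Hypothetical helper (not in upload-fsimage.sh): skip the copy when the
# destination path already exists in HDFS.
upload_if_absent() {
  local src="$1" dest="$2"
  if hdfs dfs -test -e "$dest"; then
    echo "Skipping $src; $dest already exists"
  else
    hdfs dfs -copyFromLocal "$src" "$dest" || return 1
  fi
}
```

Usage would look like `upload_if_absent "$imageDir/VERSION" "$remoteDir/VERSION"` for each file in the upload loop.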
start-dynamometer-cluster.sh command:
./bin/start-dynamometer-cluster.sh -conf_path /root/dynamometer/dynamometer0.1.7/myconf -fs_image_dir hdfs:///dyno/fsimage -block_list_path hdfs:///dyno/blocks -hadoop_binary_path /root/dynamometer/dynamometer0.1.7/hadoop-2.8.3.tar.gz
the console error info:
19/07/18 11:23:37 INFO impl.YarnClientImpl: Submitted application application_1563419715675_0002
19/07/18 11:23:38 INFO dynamometer.Client: Track the application at: http://centos-node1:8088/proxy/application_1563419715675_0002/
19/07/18 11:23:38 INFO dynamometer.Client: Kill the application using: yarn application -kill application_1563419715675_0002
19/07/18 11:23:58 INFO dynamometer.Client: NameNode can be reached via HDFS at: hdfs://centos-node2:9002/
19/07/18 11:23:58 INFO dynamometer.Client: NameNode web UI available at: http://centos-node2:50077/
19/07/18 11:23:58 INFO dynamometer.Client: NameNode can be tracked at: http://centos-node2:8042/node/containerlogs/container_1563419715675_0002_01_000002/root/
19/07/18 11:23:58 INFO dynamometer.Client: Waiting for NameNode to finish starting up...
19/07/18 11:24:09 INFO dynamometer.Client: Infra app exited unexpectedly. YarnState=FINISHED. Exiting from client.
19/07/18 11:24:09 INFO dynamometer.Client: Attempting to clean up remaining running applications.
19/07/18 11:24:09 ERROR dynamometer.Client: Application failed to complete successfully
Then I used `$?` to check the exit status of the last command:
[root@centos-node1 dynamometer0.1.7]# echo $?
2
Prompted by the previous question, I checked the application's stderr:
cat: metricsTailPIDFile: No such file or directory
./start-component.sh: line 299: 2207 Terminated sleep 1
and checked the application's stdout:
starting namenode, logging to /export/server/hadoop-2.8.3/logs/userlogs/application_1563419715675_0002/container_1563419715675_0002_01_000002/hadoop-root-namenode-centos-node2.out
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Going to attempt to upload metrics to: hdfs://centos-node1:9000/user/root/.dynamometer/application_1563419715675_0002/namenode_metrics
Unable to upload metrics to HDFS
Started namenode at pid 2124
Waiting for parent process (PID: 2005) OR namenode process to exit
Cleaning up namenode at pid 2124
Deleting any remaining files
Is there something wrong with my configuration?
Currently within `waitForNameNodeReadiness`, while waiting for the NameNode to receive enough block reports to be ready for use, the AppMaster polls the NameNode to discover which DataNodes haven't sent block reports yet and triggers reports on those DataNodes. This helps when a DataNode sent its initial block report before all of its blocks were injected, in which case a fuller report wouldn't otherwise be sent until the block report interval expired (which can be very long). Right now the polling stops as soon as the block thresholds are met, but it would be better for it to continue even after the thresholds are met, until all DataNodes have actually reported.
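For reference, the same nudge can be applied by hand with `hdfs dfsadmin -triggerBlockReport` (HDFS-7278, available since Hadoop 2.7). A rough sketch, assuming the standard `Name: host:port (hostname)` lines in the `dfsadmin -report` output:

```shell
# Sketch: manually trigger a block report on every live DataNode.
# Assumes the "Name: host:port (hostname)" lines of `hdfs dfsadmin -report`.
trigger_block_reports() {
  hdfs dfsadmin -report -live | awk '/^Name:/ {print $2}' | while read -r dn; do
    hdfs dfsadmin -triggerBlockReport "$dn"
  done
}
```

This is only a stopgap for manual runs; the AppMaster-side polling described above is the proper fix.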
Dynamometer does NOT support negative block IDs, which results in blocks with negative IDs never being reported by any simulated DataNode.
A change has been made to XMLParser.java in our branch of Dynamometer so that negative block IDs are also handled.
Due to the change above, we have to change SimulatedMultiStorageFSDataset.java as well.
In Dynamometer, each DataNode has more than one SimulatedStorage to manage, and the following Map is maintained by each SimulatedStorage in a simulated DataNode. Moreover, a SimulatedStorage can be involved in multiple blockpools:
Map<blockpool id, Map<block, block information>>
To access a given block (associated with a blockpool ID) on a simulated DataNode, we have to
(i) determine which SimulatedStorage the given block belongs to according to its block ID, and then
(ii) use the associated blockpool ID to retrieve the Map<block, block information> corresponding to the block to be accessed.
The SimulatedStorages managed by a DataNode are arranged in an ArrayList, and each SimulatedStorage in the list is accessed by a non-negative integer index strictly less than the size of the list. To determine which SimulatedStorage a given block belongs to, the original Dynamometer simply uses (block ID % number of simulated storages) as the index into that ArrayList. Hence, once we have a negative block ID, the remainder is negative and an ArrayIndexOutOfBoundsException is triggered. Changes have been made in SimulatedMultiStorageFSDataset.java so that negative block IDs are handled properly.
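The indexing bug can be demonstrated with plain shell arithmetic, which truncates toward zero just like Java's `%` operator; a floor-mod (what `Math.floorMod` computes in Java) keeps the index in range:

```shell
# Java's % (like bash's) truncates toward zero, so a negative block ID
# yields a negative index:
echo $(( -5 % 3 ))              # -2: out of bounds for an ArrayList
# A floor-mod always lands in [0, n):
echo $(( ((-5 % 3) + 3) % 3 ))  # 1: a valid storage index
```

So replacing the raw remainder with a floor-mod of the block ID is one way to make the storage lookup safe for negative IDs.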
I tried to build the latest Dynamometer.
When it runs TestDynamometerInfra, it seems to get stuck at this point.
Does anything need to be pre-installed for the build?
kevin@kevin-pc:~/git/dynamometer(master)$ ./gradlew build
Parallel execution is an incubating feature.
Configure project :
Building version '0.1.7' (value loaded from 'version.properties' file).
Task :dynamometer-workload:compileJava
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Task :dynamometer-blockgen:javadoc
/home/kevin/git/dynamometer/dynamometer-blockgen/src/main/java/com/linkedin/dynamometer/blockgenerator/XMLParserMapper.java:26: warning - Tag @link: reference not found: org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer
1 warning
Task :dynamometer-workload:javadoc
/home/kevin/git/dynamometer/dynamometer-workload/src/main/java/com/linkedin/dynamometer/workloadgenerator/TimedInputFormat.java:29: warning - WorkloadDriver#START_TIMESTAMP_MS (referenced by @value tag) is an unknown reference.
1 warning
<============-> 94% EXECUTING [6m 27s]
IDLE
IDLE
:dynamometer-infra:test > 2 tests completed
:dynamometer-infra:test > Executing test com.linkedin.dynamometer.TestDynamometerInfra
IDLE
IDLE
To keep the workload replay job tasks from timing out, we should add a background thread that occasionally updates progress.
Right now only running via the command line is supported, but we should also support running via a workflow scheduler. We can start with Azkaban; some of the code added should be applicable to other schedulers as well.
Hi,
I collected all the prerequisites (fsimage, audit log) and prepared a local environment (an accompanying HDFS, a separate YARN manager) according to the Dynamometer README, then tried to run the workload scripts. Hadoop versions tried: 2.7.4 and 2.8.4.
${DYN_HOME}/bin/upload-fsimage.sh 0894 ${HDFS_PATH}/fsimage \
${BASE_DIR}/fsimage-${HADOOP_VERSION}
fsimage - passed
${DYN_HOME}/bin/generate-block-lists.sh \
-fsimage_input_path ${HDFS_PATH}/fsimage/fsimage_0000000000000000894.xml \
-block_image_output_dir ${HDFS_PATH}/blocks \
-num_reducers 10 -num_datanodes 3
generate-block-lists - passed
${DYN_HOME}/bin/start-dynamometer-cluster.sh "" \
-hadoop_binary_path file://${BASE_DIR}/hadoop-${HADOOP_VERSION}.tar.gz \
-conf_path file://${BASE_DIR}/conf.zip \
-fs_image_dir ${HDFS_PATH}/fsimage \
-block_list_path ${HDFS_PATH}/blocks
start-dynamometer-cluster: appears to be working according to the output:
...
19/10/18 03:56:56 INFO dynamometer.Client: NameNode has started!
19/10/18 03:56:56 INFO dynamometer.Client: Waiting for 2 DataNodes to register with the NameNode...
19/10/18 03:57:02 INFO dynamometer.Client: Number of live DataNodes = 2.00; above threshold of 2.00; done waiting after 6017 ms.
19/10/18 03:57:02 INFO dynamometer.Client: Waiting for MissingBlocks to fall below 0.010199999...
19/10/18 03:57:02 INFO dynamometer.Client: Number of missing blocks: 102.00
19/10/18 04:00:03 INFO dynamometer.Client: Number of missing blocks = 0.00; below threshold of 0.01; done waiting after 180082 ms.
19/10/18 04:00:03 INFO dynamometer.Client: Waiting for UnderReplicatedBlocks to fall below 1.02...
19/10/18 04:00:03 INFO dynamometer.Client: Number of under replicated blocks: 102.00
${DYN_HOME}/bin/start-workload.sh \
-Dauditreplay.log-start-time.ms=1000 \
-Dauditreplay.input-path=file://${BASE_DIR}/audit_logs-${HADOOP_VERSION} \
-Dauditreplay.output-path=${RESULTS_DIR} \
-Dauditreplay.num-threads=1 \
-nn_uri hdfs://$HOSTNAME:9000/ \
-start_time_offset 1m \
-mapper_class_name AuditReplayMapper
start-workload: it started and never finished, repeating 'map > map' for a couple of hours:
19/10/18 04:07:53 INFO workloadgenerator.WorkloadDriver: The workload will start at 1571396933516 ms (2019/10/18 04:08:53 PDT)
19/10/18 04:07:54 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
19/10/18 04:07:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
19/10/18 04:07:55 INFO input.FileInputFormat: Total input files to process : 1
19/10/18 04:07:55 INFO mapreduce.JobSubmitter: number of splits:1
19/10/18 04:07:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local579807884_0001
19/10/18 04:07:55 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
19/10/18 04:07:55 INFO mapreduce.Job: Running job: job_local579807884_0001
19/10/18 04:07:55 INFO mapred.LocalJobRunner: OutputCommitter set in config null
19/10/18 04:07:55 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/10/18 04:07:55 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
19/10/18 04:07:55 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
19/10/18 04:07:55 INFO mapred.LocalJobRunner: Waiting for map tasks
19/10/18 04:07:55 INFO mapred.LocalJobRunner: Starting task: attempt_local579807884_0001_m_000000_0
19/10/18 04:07:55 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/10/18 04:07:55 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
19/10/18 04:07:55 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
19/10/18 04:07:55 INFO mapred.MapTask: Processing split: file:/home/rscherba/ws/hadoop/dynamometer-test/audit_logs-2.8.4/hdfs-audit.log:0+251649
19/10/18 04:07:55 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
19/10/18 04:07:55 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
19/10/18 04:07:55 INFO mapred.MapTask: soft limit at 83886080
19/10/18 04:07:55 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
19/10/18 04:07:55 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
19/10/18 04:07:55 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
19/10/18 04:07:55 INFO audit.AuditReplayMapper: Starting 1 threads
19/10/18 04:07:55 INFO audit.AuditReplayThread: Start timestamp: 1571396933516
19/10/18 04:07:55 INFO audit.AuditReplayThread: Sleeping for 57526 ms
19/10/18 04:07:56 INFO mapreduce.Job: Job job_local579807884_0001 running in uber mode : false
19/10/18 04:07:56 INFO mapreduce.Job: map 0% reduce 0%
19/10/18 04:08:07 INFO mapred.LocalJobRunner: map > map
19/10/18 04:13:07 INFO mapred.LocalJobRunner: map > map
19/10/18 04:18:07 INFO mapred.LocalJobRunner: map > map
19/10/18 04:23:07 INFO mapred.LocalJobRunner: map > map
19/10/18 04:28:07 INFO mapred.LocalJobRunner: map > map
19/10/18 04:33:07 INFO mapred.LocalJobRunner: map > map
19/10/18 04:38:07 INFO mapred.LocalJobRunner: map > map
19/10/18 04:43:07 INFO mapred.LocalJobRunner: map > map
...
How long should the Dynamometer workload run? How do the run script arguments affect the test run? And how can I check the logs for signs that something is wrong in the configuration?
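One thing worth double-checking (this is an assumption based on the property name, not something confirmed by the logs above): `auditreplay.log-start-time.ms` is meant to be the absolute epoch-millisecond timestamp at which the audit log begins, so a small constant like `1000` would skew every replayed operation's scheduled time. With GNU date, the value can be derived from the first audit log line:

```shell
# Sketch; assumes GNU date and the standard hdfs-audit.log timestamp prefix
# "YYYY-MM-DD HH:MM:SS,mmm". first_line is a placeholder example entry.
first_line='2019-10-18 03:50:00,123 INFO FSNamesystem.audit: allowed=true ...'
stamp=$(echo "$first_line" | cut -d' ' -f1-2 | tr ',' '.')
log_start_ms=$(TZ=UTC date -d "$stamp" +%s%3N)
echo "$log_start_ms"
```

The result would then be passed as `-Dauditreplay.log-start-time.ms=$log_start_ms` instead of `1000`.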
To xkrogen:
Good afternoon! The NameNode was missing all blocks and no DataNodes were registered during a manual workload launch. These commands were used:
1. Execute the Block Generation Job:
./generate-block-lists.sh -fsimage_input_path hdfs://cluster/user/qa/dyno/fsimage/fsimage_0000000000282000135.xml -block_image_output_dir hdfs://cluster/user/qa/dyno/blocks -num_reducers 1 -num_datanodes 1
2. Manual Workload Launch:
./start-dynamometer-cluster.sh --hadoop_binary_path hadoop-2.7.3-1.2.7.tar.gz --conf_path /home/hdfs/Dynamometer/dynamometer-0.1.0-SNAPSHOT/bin/hadoop --fs_image_dir hdfs://cluster/user/qa/dyno/fsimage --block_list_path hdfs://cluster/user/qa/dyno/blocks
#1 used direct access to `DFSClient` to perform more accurate listing operations. To access the `DFSClient` from within `DistributedFileSystem`, a utility was added in the `o.a.h.hdfs` package to access the package-private `dfs` field. In the `hadoopRuntime` (default) configuration, the `o.a.h` package is excluded, so though this works fine in the bundled integration test, it fails when run from the generated zip.
I noticed that `DistributedFileSystem` exposes a public `getClient()` method which we can use instead. It's marked `@VisibleForTesting`, but it is still less hacky than using a workaround to access a package-private field.
Currently the default ACLs for viewing container logs are used, meaning only the launching user can view them. We can simply piggyback off the MapReduce configuration for the same purpose, `mapreduce.job.acl-view-job`.
In the README it says:
./bin/start-workload.sh \
  -Dauditreplay.input-path hdfs:///dyno/audit_logs/ \
  -Dauditreplay.num-threads 50 \
  -nn_uri hdfs://namenode_address:port/ \
  -start_time_offset 5m \
  -mapper_class_name AuditReplayMapper
However, it seems both `-Dauditreplay.input-path` and `-Dauditreplay.num-threads` are not valid options. The only valid options are `nn_uri`, `start_time_offset`, `start_timestamp_ms`, and `mapper_class_name`?
The Dynamometer job makes use of resources that live inside its JAR, such as the `start-component.sh` script. If the JAR is first unpacked and the individual files are added to the classpath, as with the `hadoop jar` command, this currently works fine. However, it does not work properly if the JAR is not unpacked, since the file does not actually exist anywhere on disk (it is inside an archive). We should support this so Dynamometer can run properly from a normal JAR.