craylabs / smartredis
SmartSim Infrastructure Library Clients.
License: BSD 2-Clause "Simplified" License
CommandList is used to group multiple commands to be executed by RedisCluster or Redis. The Command objects contained in the CommandList are dynamically allocated, and currently only pointers to the dynamically created Command objects are stored. The copy and move operators for CommandList are not implemented; these operators should be defined and implemented.
As issue #43 is addressed, the CommandList object should be completed. The default operators defined by the compiler may not produce the intended behavior (i.e. copy operations would be a shallow copy of the underlying Command objects). Default move operators should be sufficient, but should be specified as such in the header with default.
I followed the instructions here: https://www.craylabs.org/docs/installation.html#smartredis to install SmartRedis. I think all of the libraries were successfully built, and the examples compiled with no errors. However, when I run smartredis/examples/serial/cpp/build/smartredis_mnist, I get the error below:
terminate called after throwing an instance of 'std::runtime_error'
what(): The environment variable SSDB must be set to use the client.
Aborted (core dumped)
Did I miss something when installing SmartRedis?
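The error indicates that the client reads the database address from the SSDB environment variable, which must be set before the client is constructed. A minimal sketch of the fix (the address below is an assumption; use the host:port of your actual running database):

```python
import os

# SSDB tells the SmartRedis client where the Redis database is listening.
# "127.0.0.1:6379" is an illustrative address, not a required value.
os.environ["SSDB"] = "127.0.0.1:6379"

# With SSDB set (and a database actually reachable at that address),
# constructing the client should no longer raise the
# "SSDB must be set" runtime error:
# from smartredis import Client
# client = Client(cluster=False)
```

Equivalently, `export SSDB="127.0.0.1:6379"` in the shell before launching the example has the same effect.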
Description
Compiled clients have unpack_tensor() to make memory management better (i.e. the client or dataset does not allocate memory and hold it for the entire life of the object). However, the Python client does not have an equivalent. It seems like we could implement an analog of unpack_tensor() if the user allocated a numpy array ahead of time. This could probably be accomplished with pybind11::array::mutable_data(), which appears to return a C pointer to the underlying numpy array data. Since numpy defaults to C-contiguous memory, we should just be able to fill that (although none of this has been tested).
However, it would be more Pythonic to not require that the user allocate the numpy array, but instead just return a numpy array that relies on Python reference counting to control scope. The path to implement this is a little less clear.
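The pre-allocated-array pattern relies on numpy's default C-contiguous layout. A small sketch of what a Python analog of unpack_tensor() could do (the function name and signature here are hypothetical, not an existing API):

```python
import numpy as np

def unpack_tensor_sketch(raw: bytes, out: np.ndarray) -> None:
    """Hypothetical analog of unpack_tensor(): fill a user-allocated,
    C-contiguous numpy array in place from a raw byte buffer."""
    if not out.flags["C_CONTIGUOUS"]:
        raise ValueError("output array must be C-contiguous")
    flat = np.frombuffer(raw, dtype=out.dtype)
    if flat.size != out.size:
        raise ValueError("buffer size does not match output array")
    # Assign through a flat view so no extra array is allocated.
    out.reshape(-1)[:] = flat

# The user allocates the array ahead of time, as in the compiled clients.
out = np.empty((2, 3), dtype=np.float64)
unpack_tensor_sketch(np.arange(6, dtype=np.float64).tobytes(), out)
```

Because `out.reshape(-1)` is a view for a C-contiguous array, the copy lands directly in the user's buffer, mirroring what mutable_data() would allow on the C++ side.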
If the C++ client fails to connect to a database, the failed connection is not raised as a std::runtime_error to the Python client. This should be changed, as all other errors are raised as std::runtime_error to the Python client.
Attempt to connect the Python client to a nonexistent database. A std::runtime_error should be raised.
All
Fortran has built-in support for complex numeric types, and these are used in some simulations. We should plumb support for these through SmartRedis.
Fortran users who do simulations with complex types will benefit
Paralleling existing numeric types should get us most of the way there once we figure out the in-memory representation of the complex types. There may need to be some back-end support to handle them if the data types are larger than what is supported today.
Although this functionality is essentially the same as another _exists() command, the differences in naming have tripped up new users and new developers.
Newer users and developers will find the new routine easier to find when looking for this functionality. Further, if the implementation for dataset existence ever diverges, we'll have a start on retraining users to use the new approach.
This will require doc update as well.
How should this feature be implemented? What will need to be done/included?
Description
Currently, metadata in a DataSet is serialized using protobuf and associated with the key dataset_name.meta. However, this single key-value pair does not allow the user to view the names of the metadata fields using the redis CLI. Moreover, protobuf is a very large dependency to include, given that the metadata types (e.g. protobuf message types) are relatively simple and fixed.
Justification
Users will be able to view metadata fields using the redis-cli. Additionally, build/test/release will be simplified with the removal of a large dependency.
Implementation Strategy
Change the DataSet representation in the database to use hash field commands (e.g. HSET) for metadata fields. For example, a command would look like HSET dataset_name.meta meta_field_1 serialized_value. This would still use protobuf for each metadata field.
Description
The Fortran client misses the calls needed to run ensembles in a flexible way, using the current features which are available in the other clients.
Justification
This will make it easier to run ensembles in Fortran.
Implementation Strategy
Steps needed:
Functions which still need a Fortran interface:
void use_tensor_ensemble_prefix(bool use_prefix);
void use_model_ensemble_prefix(bool use_prefix);
void set_data_source(std::string source_id);
Functions for which a Fortran interface exists (you might want to quickly review them):
bool poll_model(const std::string& name,
int poll_frequency_ms,
int num_tries);
bool poll_tensor(const std::string& name,
int poll_frequency_ms,
int num_tries);
bool model_exists(const std::string& name);
bool tensor_exists(const std::string& name);
Ideally, a test would look like a simplified version of tests/cpp/client_test_ensemble.cpp, but we don't need to run models and scripts. Just check that the keys are prefixed as expected for a tensor, a model, a script, and a dataset, if the env vars SSKEYIN and SSKEYOUT are set.
Description
Add SILC_INSTALL_PATH to setup_env.sh to make it easier for outside applications to reference the silc static library.
Give the python client access to the redis command FLUSHALL by exposing it through the Pybindings. Here is the documentation for FLUSHALL. This is a portion of issue #1 .
Users will be able to delete all the keys of all the existing databases from the python client.
The C++ client has this functionality so just needs to be exposed through the Pybindings and pyclient.cpp file to the python client layer. This will be a non-keyed command that takes in an address of a db node. As with issue #90, we may want to add a function to pick a db node that is active so that we don't have to initialize a new connection.
This enables users from all around the world to work in their native language/character set.
Users from non-English speaking countries will gain access to full character sets.
The easiest approach is to add parallel routines to the interface for Unicode support, then push the plumbing all the way through. Making sure that strings are tracked by type will be a good safety feature.
Description
The python client.run_model seg faults if the model name is wrong. This is because the C++ client throws an uncaught redis error stemming from a redis-plus-plus throw. To provide better behavior, we should have the C++ client throw a std::runtime_error to the python client for all redis errors. This would enable easier error catching in the python client.
Currently, the order of include paths in CMakeLists.txt
is:
include_directories(SYSTEM
install/include
include
src/fortran
)
However, this means that install/include will be searched before include. This is problematic for developers: when repeatedly building the library after modifying header files, the files in install/include will be used for the build. Currently, the install directory needs to be deleted to build the library after modifying header files.
The order of includes should be:
include_directories(SYSTEM
include
install/include
src/fortran
)
make lib
// modify header file
make lib
The modified header file in include should be used for the build.
All OS and systems.
Currently, warnings are printed if an IOError or TimeoutError is encountered when executing a command. This can lead to too much output when many clients are active, making it hard to read. We will only produce output on failed execution after all attempts have been made.
This will improve user experience.
We will remove the warning output.
Currently the Python client uses SmartRedis::get_tensor() inside of PyClient::get_tensor(). This means that the SmartRedis::Client is holding onto dynamically allocated memory even though it is copying that memory into a PyArray. This could cause out-of-memory errors if PyClient::get_tensor() is used inside of loops with large arrays. Currently there is no analog to unpack_tensor() in the Python client.
It is likely users will use PyClient::get_tensor() in a loop, and we need to make sure memory management is sufficient in the Python client.
There are two routes to solve this problem:
1. Implement an unpack_tensor() function in Python that mimics the C++, C, and Fortran clients. This does not seem very Pythonic, however, as Python users won't really want to create a numpy array ahead of time.
2. Change PyClient::get_tensor() to release the memory allocated by Client after it has been copied to the numpy array.
It seems like option 2 is the better behavior for Python users, but it would mean that get_tensor() would have different memory management behavior in C/C++/Fortran and Python. We would need to call this out explicitly in the documentation.
This ticket will involve changing the PyClient code and adding protected methods to SmartRedis::Client to allow for the release of memory.
Description
The Python client should have the same metadata functionality that C/C++/Fortran has.
Description
Currently, the compiled client tests have CPU specified in the source file. It might be worthwhile to specify CPU or GPU via the setup_test_env.sh file or some other mechanism so the user or developer can switch between CPU and GPU.
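One lightweight mechanism is an environment variable read at test setup time. A sketch of the idea (the variable name SMARTREDIS_TEST_DEVICE is purely illustrative, not an existing setting):

```python
import os

# Hypothetical switch: tests default to CPU unless the environment says GPU,
# e.g. set via setup_test_env.sh: export SMARTREDIS_TEST_DEVICE=GPU
device = os.environ.get("SMARTREDIS_TEST_DEVICE", "CPU").upper()
if device not in ("CPU", "GPU"):
    raise ValueError(f"unsupported test device: {device}")
```

The compiled tests could read the same variable at runtime, or a build flag could bake the value in, so the source files no longer hard-code CPU.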
Description
When executing run_model and run_script with a Redis cluster, data movement is needed to make sure that the input and output tensors conform to the hash slot data locality requirements for Redis. As a result, the run_model() and run_script() commands actually consist of multiple Redis commands. Pipelining should make these multiple commands more efficient. Note that this is not a feature to expose pipelining to the user.
Justification
Pipelining the run_model and run_script API calls should increase performance.
Implementation Strategy
This feature can be implemented with the pipelining features in underlying redis libraries.
Description
Assorted code cleanup:
The Python client returns None instead of True or False.
Any Python client that runs key_exists(), model_exists(), or tensor_exists() will run into this.
The function should return a boolean answer.
This bug applies to all systems and is an internal Python client error.
RedisAI supports ONNX as a backend. However, only PyTorch is tested in the SmartRedis unit tests: test_model_methods_torch.py. We should have tests that ensure ONNX functionality is correct in the client.
There needs to be SmartRedis tests that ensure ONNX functionality is implemented correctly.
The API functions below should be tested. A potential test case could use a scikit-learn random forest model. See the RedisAI examples for scikit-learn usage.
Client.set_model_from_file()
Client.set_model()
Client.run_model()
Client.get_model()
Currently the Client will look for the .tensors metadata field in the DataSet when Client.get_dataset() is called. However, a check should be performed to ensure that the field exists. Currently, a runtime_error is thrown if the field does not exist, meaning that a DataSet must contain tensors. This is not the intended behavior: a DataSet should be able to contain only tensors, only metadata, or nothing. The DataSet should have a user-facing function that can be used to check if a field exists in the DataSet. This can be used internally as well when Client.get_dataset() is executed.
Perform a client.get_dataset() call on a DataSet that does not contain any tensors.
The DataSet should allow for only tensors, only metadata, nothing, or tensors and metadata.
This behavior affects all systems
Currently the CommandReply copy constructor and copy assignment operator are not implemented. However, as issue #1 is addressed, CommandReply should be fully implemented. While passing CommandReply by reference is encouraged, copy operations should be enabled and allowed.
This is a potential requirement for issue #1 as CommandReply objects are more broadly utilized for Python client commands. Additionally, if CommandReply is ever exposed to the user for custom command execution, all operators should be implemented.
Currently the inverse CRC16 calculation is used to distribute models and scripts to all of the Redis cluster nodes by prefixing each model name and script name with a two character hash tag that forces the key into a hash slot owned by the database node. However, certain prefixes calculated contain special characters (i.e. "}") that prematurely conclude the hash tag, resulting in erroneous placement. Currently this is avoided by picking a different hash slot for the database node. However, to achieve full control over key placement, we need to extend the inverse prefix calculation to all hash slots.
This is important for multi-key commands where we need to group multiple keys into the same hash slot, but do not have the flexibility to pick from a range of hash slots.
First an investigation into the frequency of problematic prefixes must be made. After that, a solution strategy can be formulated.
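Redis cluster key placement uses CRC16 (XMODEM variant) modulo 16384, hashing only the substring inside the first `{...}` when a non-empty hash tag is present. The following sketch shows both the slot calculation and why a "}" inside a computed prefix prematurely closes the tag:

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC-16/XMODEM (polynomial 0x1021, initial value 0), as used by Redis.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # Only the substring between the first '{' and the next '}' is hashed,
    # provided it is non-empty; otherwise the whole key is hashed.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# Keys sharing a hash tag land in the same slot:
assert hash_slot("{p1}.model") == hash_slot("{p1}.script")
# A '}' inside the prefix ends the tag early: both keys hash only "p",
# so the intended two-character prefix is effectively truncated.
assert hash_slot("{p}1}.model") == hash_slot("{p}anything")
```

Any inverse-prefix scheme therefore has to exclude "}" (and a leading "{" edge case) from generated prefixes, or the extended calculation must verify the resulting tag round-trips to the intended slot.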
Setup a few workflows for github actions
We don't want to have to manually build pip wheels for every OS. We also don't want to include code that doesn't build or doesn't pass pylint or black standards. Automating this will be a huge improvement to our contributor workflow.
Write Github Actions for each of the above items. Eventually we will want a deployer as well.
Currently the copy, delete, and rename operations for tensors and DataSet objects are not implemented in the Python client. These need to be implemented.
This is a requested feature in the Python client.
These functions will follow the same behavior as the C++, C, and Fortran clients.
Description
Make the python client "pip-installable"
Justification
Users should be able to easily pull down the python client for rapid testing and usage. This will also simplify the usage of the SILC python client within the infrastructure library.
Implementation Strategy
Most likely, there will need to be some custom build step for the silc client. We currently build the client through a command in the makefile (make pyclient). This will need to be called during the pip install. For an example of how to do this, see how I did this for Arkouda: Bears-R-Us/arkouda#153
Currently the DataSet ack key is set to the name of the DataSet. In the case where the client is used in conjunction with SmartSim ensemble capabilities (e.g. SSKEYIN and/or SSKEYOUT are set), there will be an error because the ack key will take the form ensemble_name.dataset_name while all other keys are of the form {dataset_name}.meta or {dataset_name}.tensor_name. On clusters, this will cause a CROSSSLOT error because the ack key will not be in the same hash slot as the other fields.
A short-term fix for this problem could be just to separate the manipulations of the ack key for rename and delete such that there is no CROSSSLOT error. This works for the current implementation because DataSet put and get commands are not internally pipelined, so we do not need to ensure everything goes to the same hash slot or database node. However, in the future we should have all DataSet entities go to the same hash slot. Possible future (more robust) solutions involve the DataSet metadata. If either of the more robust fixes is implemented, the process for checking the existence of a DataSet will need to be changed.
Export the following variables before running the client_test_delete_dataset test:
export SSKEYIN="test_ensemble"
export SSKEYOUT="test_ensemble"
This will produce: terminating with uncaught exception of type sw::redis::ReplyError: CROSSSLOT Keys in request don't hash to the same slot
The ack key should work correctly with and without the ensemble functionality activated.
All systems
The Command object and RedisServer object were originally written for keyed commands. However, as the ability to execute commands like INFO is added to the Python client (see #1 ), these objects need to handle non-keyed commands that might address a single database node or the entire cluster.
This is a prerequisite feature in order to be able to accomplish #1 , which will give users a better view of database characteristics and performance.
The design changes to Command and RedisServer should allow for the following functionality:
Description
setup_env.sh does not set LD_LIBRARY_PATH correctly for dependencies other than protobuf. We should construct the full path and then set LD_LIBRARY_PATH.
Give the python client access to the redis command CLUSTER INFO by exposing it through the Pybindings. Here is the documentation for CLUSTER INFO. This is a portion of issue #1 .
Users will be able to obtain information about Redis Cluster vital parameters in the python client.
The C++ client has this functionality, so it just needs to be exposed through the Pybindings and the pyclient.cpp file to the python client layer. This will be a non-keyed command that takes in the address of a db node in the cluster. We may want to add a function to pick a db node that is active so that we don't have to initialize a new connection.
With the new project name, all references to SILC/silc must be changed to SmartRedis/smartredis.
RedisAI supports TensorFlow as a backend. However, only PyTorch is tested in the SmartRedis unit tests: test_model_methods_torch.py. We should have tests that ensure TensorFlow functionality is correct in the client, specifically for input parameters that are specific to TensorFlow models (i.e. inputs and outputs in client.set_model()).
There needs to be SmartRedis tests that ensure TensorFlow functionality is implemented correctly.
The following API functions should be tested:
Client.set_model_from_file()
Client.set_model()
Client.run_model()
Client.get_model()
Additionally, tests should ensure that TensorFlow-specific parameters (i.e. inputs and outputs in client.set_model()) are implemented correctly for ensembles with prefixing for the above functions.
As pipelines inside of SmartRedis are added to improve efficiency, code simplification would increase if CommandReply could be constructed from sw::redis::QueuedReplies and if the CommandReply [] operator could index into the sw::redis::QueuedReplies. This would allow merging and reuse of code between pipeline and non-pipeline commands.
Internal pipelines improve client efficiency, and to accelerate the implementation of pipelines, CommandReply should work with QueuedReplies.
All of the CommandReply operators and constructors should work with QueuedReplies.
Description
On some systems, when running the Fortran test suite, client_test_misc_tensor fails and the following error is returned by RedisAI:
WRONGTYPE Operation against a key holding the wrong kind of value
How to reproduce
Run the Fortran test suite on some systems.
Notice that this can easily be reproduced by first storing a string variable and then trying to store a tensor with the same key. This is happening in the Fortran suite (across two tests), and some changes to the keys used are needed.
Expected behavior
The Fortran suite should run without errors.
System
Description
There are certain commands, like AI.INFO, that would be great to have access to in the python client, but that are non-trivial to include in the other compiled clients (C, C++, Fortran). For this reason, the custom command structure used from Redis Plus Plus should be exposed through the Pybindings such that the python client can call arbitrary Redis and RedisAI commands.
Justification
Users will be able to obtain useful information like INFO, AI.INFO, keys, and other helpful Redis commands in the python client.
Implementation Strategy
The C++ client has this functionality and it just needs to be exposed through the pyclient.cpp
file to the python client layer.
The current error handling scheme consists of throwing exceptions whenever a problem is detected. While this works fine for C++ and Python clients, C and Fortran clients have no way to detect, catch, or interpret an exception. All exceptions will thus be immediately fatal. We need a softer approach that is suitable for these clients.
See above; there is essentially no ability to recover from errors in the C or Fortran clients today.
The standard approach in C libraries would be creating an enumeration of potential errors that can be returned as a response code to API calls, with something like ERROR_NONE to indicate that an operation proceeded to normal completion. The API layers could then catch exceptions thrown in the higher layers, translate, and return the error. The text associated with exceptions could be saved and returned through a new get_last_error_message() command. I believe that the Fortran client could be channeled through the C client interface to allow exception catching.
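The pattern above can be sketched as follows (names such as SRError, api_call, and get_last_error_message are illustrative, not the actual SmartRedis API):

```python
from enum import IntEnum

class SRError(IntEnum):
    ERROR_NONE = 0     # operation proceeded to normal completion
    ERROR_RUNTIME = 1  # an exception was caught and translated

_last_error_message = ""

def get_last_error_message() -> str:
    # Text of the most recently translated exception.
    return _last_error_message

def api_call(func, *args):
    # C-style API shim: catch exceptions thrown in the higher layers at the
    # boundary, save the message, and return an enumerated response code.
    global _last_error_message
    try:
        func(*args)
        return SRError.ERROR_NONE
    except RuntimeError as exc:
        _last_error_message = str(exc)
        return SRError.ERROR_RUNTIME

def _failing():
    raise RuntimeError("database unreachable")

code = api_call(_failing)
```

In the real C layer the shim would live in the extern "C" wrapper functions; routing the Fortran client through that same C interface would give it the same recoverable error codes.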
SmartRedis should have a base Dockerfile that users can look at and use as a template for including with their containerized jobs.
Any user wanting to run a containerized workload with SmartRedis will be able to use the Dockerfile to build an image and include that image in their container.
Write a Dockerfile for SmartRedis that installs the library (e.g. to /usr/local/lib).
Description
Originally, bad_function in test_errors.py had only the line:
raise Exception
However, with the new version of RedisAI, we get an error from RedisAI saying:
undefined value Exception:
File "<string>", line 3
def bad_function(data):
"""Bad function which only raises an exception"""
raise Exception
~~~~~~~~~ <--- HERE
We got around this by returning False from the function instead (the exception is not actually required by the tests). However, we need to understand why this is the case.
Currently we are pinning to RedisAI 1.0.2. However, there is a build failure in 1.0.2 related to dlpack not being pinned to v0.3. At this point, we should update to 1.2-RC1.
The examples
directory contains simple programs showing how one can use the client in each supported language. In particular, the C++ and Fortran examples use MPI, which is not used in tests.
The documentation does not mention the examples anywhere, and we want to fix this by adding the examples to the documentation, in such a way that if examples are changed, the documentation is automatically updated.
PyBind is not pinned to a version, and as a result, our github actions are failing on MacOSX. We should only pin to stable, final versions.
Any PR will fail the checks on MacOSX because of a change in PyBind.
Github actions pass.
Description
Currently every compiled test builds the entire client from source. It would be much more efficient to build the client as a library and use that library for each client test. This would dramatically improve the time needed to test the library.
Right now we are not pausing between caught IOError attempts. This causes reliability issues when a database is momentarily unavailable because of high traffic. We should add a sleep to each catch block in RedisCluster.run() and Redis.run(). This sleep might be best as an adaptive function that waits longer on each attempt.
Description
Add the ability to rename, copy, and delete DataSet objects from the C++ client.
Building the serial Fortran client example smartredis_put_get_3D.F90 located in examples/serial/fortran with PGI compilers (nvfortran specifically, version 21.3.0), I get the following error message:
Scanning dependencies of target smartredis_dataset
[ 10%] Building Fortran object CMakeFiles/smartredis_dataset.dir/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/fortran_c_interop.F90.o
[ 20%] Building Fortran object CMakeFiles/smartredis_dataset.dir/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90.o
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
0 inform, 0 warnings, 1 severes, 0 fatal for get_meta_scalars_i32
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
0 inform, 0 warnings, 2 severes, 0 fatal for get_meta_scalars_i64
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
0 inform, 0 warnings, 3 severes, 0 fatal for get_meta_scalars_float
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
0 inform, 0 warnings, 4 severes, 0 fatal for get_meta_scalars_double
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
0 inform, 0 warnings, 4 severes, 0 fatal for add_meta_scalar_i32
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
0 inform, 0 warnings, 4 severes, 0 fatal for add_meta_scalar_i64
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
0 inform, 0 warnings, 4 severes, 0 fatal for add_meta_scalar_float
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
0 inform, 0 warnings, 4 severes, 0 fatal for add_meta_scalar_double
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
0 inform, 0 warnings, 4 severes, 0 fatal for add_meta_string
CMakeFiles/smartredis_dataset.dir/build.make:107: recipe for target 'CMakeFiles/smartredis_dataset.dir/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90.o' failed
make[2]: *** [CMakeFiles/smartredis_dataset.dir/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90.o] Error 2
CMakeFiles/Makefile2:96: recipe for target 'CMakeFiles/smartredis_dataset.dir/all' failed
make[1]: *** [CMakeFiles/smartredis_dataset.dir/all] Error 2
Makefile:102: recipe for target 'all' failed
make: *** [all] Error 2
Please note that this specific branch of SmartRedis was cloned with the following command: git clone https://github.com/CrayLabs/SmartRedis.git --depth=1 --branch v0.1.1 smartredis-0.1.1. The CMakeLists.txt file was used to build the example code, with the only modification being that CMAKE_FC_COMPILER is set to the desired nvfortran compiler path.
The error is produced on the ThetaGPU machine at ALCF. From a clean environment obtained by executing module purge, CMake version 3.19.5 and the NVIDIA HPC SDK version 21.3 are loaded. The CMakeLists.txt file available in the example directory (ensuring nvfortran is used) is then used to configure and build the smartredis_put_get_3D.F90 example code, producing the compiler error pasted above. No PrgEnv is loaded since none is available on ThetaGPU.
The expected behavior is for the Fortran client to build correctly.
The README, while not factually incorrect, does not supply a whole lot of information. In addition, we should have the same badges we use for SmartSim available in the SmartRedis repository.
Give the python client access to AI.INFO by exposing it through the Pybindings. Here is the documentation for AI.INFO. It returns general information about the execution of a given model or script. This is a portion of issue #1 .
Users will be able to obtain the information from the AI.INFO RedisAI command in the python client.
The C++ client has this functionality, and it just needs to be exposed through the Pybindings and the pyclient.cpp file to the python client layer. The implementation will be similar to the INFO command implementation.
Description
When scaling on horizon, at 200 client nodes (80 clients per node) and 16 db nodes, we get an error thrown by the redis database:
bool cask_cudnn::compareHandles<cask_cudnn::ConvolutionShader>(cask_cudnn::ConvolutionShader const*, cask_cudnn::ConvolutionShader const*) ()
from /lus/cls01029/arigazzi/smartsim-dev/tm_improve/SmartSim/third-party/RedisAI/install-gpu/backends/redisai_torch/lib/libtorch_cuda.so
Decreasing the threads per queue to 1 seemed to fix this problem.
This was done with RedisAI v1.0.2 and Redis v6. Similar issues were seen with RedisAI v1.2RC.
Description
Currently we throw std::runtime_error for all internal C++ client errors, including those that result from redis-plus-plus and hiredis errors. This was done initially for convenience so that we could easily handle them in the Python client. However, as client functionality grows, we need better visibility for users and developers into the errors being thrown so that better exception handling can be performed.
Implementation Strategy
Custom error classes should be implemented in core C++ client, and those errors should be wrapped into the Python client binding.
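On the Python side, one possible shape for the wrapped error classes is a small hierarchy rooted in RuntimeError, so existing `except RuntimeError` handlers keep working while new code can catch specific failures (all class names below are illustrative, not the shipped API):

```python
class SmartRedisError(RuntimeError):
    """Base class; subclassing RuntimeError preserves backward compatibility
    with code that currently catches RuntimeError."""

class RedisReplyError(SmartRedisError):
    """The database returned an error reply (e.g. WRONGTYPE)."""

class RedisConnectionError(SmartRedisError):
    """The client could not reach the database."""

# Old-style handlers still catch the new, more specific errors:
try:
    raise RedisConnectionError("could not connect to database")
except RuntimeError as exc:
    caught = type(exc).__name__
```

The pybind layer would then translate each custom C++ exception type to its corresponding registered Python class instead of mapping everything to RuntimeError.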