Comments (6)

bjosv commented on June 24, 2024

I'm not sure that I understand what you are aiming for, but maybe the existing test cases can give some hints?
As I understand it, redis-cli --pipe just pushes the requests and then counts the responses until it gets the reply to a final ECHO; maybe you can do something similar? See the link to the redis-cli details.
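
A rough sketch of that pattern with hiredis-cluster could look something like the following: append a batch of commands, count them, then read exactly that many replies back (the node address and key names here are just examples):

#include <hircluster.h>
#include <stdio.h>

int main(void) {
    // Connect via any node; the library discovers the rest of the cluster.
    redisClusterContext *cc = redisClusterContextInit();
    redisClusterSetOptionAddNodes(cc, "127.0.0.1:7000"); // example address
    redisClusterConnect2(cc);
    if (cc == NULL || cc->err) {
        fprintf(stderr, "Connect error: %s\n", cc ? cc->errstr : "OOM");
        return 1;
    }

    // Queue a batch of commands without waiting for replies ...
    int appended = 0;
    for (int i = 0; i < 1000; i++) {
        redisClusterAppendCommand(cc, "SET key:%d value:%d", i, i);
        appended++;
    }

    // ... then read back exactly as many replies as were appended.
    redisReply *reply;
    while (appended-- > 0) {
        if (redisClusterGetReply(cc, (void **)&reply) != REDIS_OK || reply == NULL) {
            fprintf(stderr, "Reply error: %s\n", cc->errstr);
            continue;
        }
        freeReplyObject(reply);
    }

    redisClusterFree(cc);
    return 0;
}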

DaveLanday commented on June 24, 2024

Hi @bjosv, thank you for the reply, and I apologize if I was unclear. Previously I was using --pipe for mass insertion, simply piping the output of my awk script as the link you posted describes. Unfortunately, redis-cli does not support --pipe when cluster mode is enabled, which is now a requirement for this project. My basic question is whether pipelining will give me functionality and performance similar to --pipe, or whether there is any tool in the hiredis-cluster library that would let me achieve similar results for pushing, say, ~2.4 million key/values.

Thank you for pointing me to the test cases. I have written some C/C++ code that calls my awk script and pipelines the output to redisClusterAppendCommand; I am hoping to test it today.

zuiderkwast commented on June 24, 2024

Please try. It's interesting to see how hiredis-cluster handles it. What if there is a redirect (a failover or slot migration happening in the middle of the pipelined import)? Maybe it can handle that too; this is also interesting to find out.

If it works, it would be nice to have a simple program based on hiredis-cluster that can import data from a CSV file or something like that.
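
Roughly what I have in mind, as a sketch only (the key,field,value CSV layout, the file name and the node address are all invented for illustration):

#include <hircluster.h>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

int main() {
    redisClusterContext *cc = redisClusterContextInit();
    redisClusterSetOptionAddNodes(cc, "127.0.0.1:7000"); // example address
    redisClusterConnect2(cc);
    if (cc == NULL || cc->err) {
        fprintf(stderr, "Connect error: %s\n", cc ? cc->errstr : "OOM");
        return 1;
    }

    // Assume each CSV line looks like: key,field,value
    std::ifstream csv("data.csv");
    std::string line;
    int appended = 0;
    while (std::getline(csv, line)) {
        std::istringstream ss(line);
        std::string key, field, value;
        if (!std::getline(ss, key, ',') || !std::getline(ss, field, ','))
            continue; // skip malformed lines
        std::getline(ss, value); // rest of the line is the value

        // %s keeps spaces in the value from splitting it into extra arguments.
        redisClusterAppendCommand(cc, "HSET %s %s %s",
                                  key.c_str(), field.c_str(), value.c_str());
        appended++;
    }

    // Drain the pipelined replies and count failures.
    int failures = 0;
    redisReply *reply;
    while (appended-- > 0) {
        if (redisClusterGetReply(cc, (void **)&reply) != REDIS_OK || reply == NULL) {
            failures++;
            continue;
        }
        freeReplyObject(reply);
    }
    fprintf(stderr, "failed replies: %d\n", failures);

    redisClusterFree(cc);
    return 0;
}

A real importer would probably read replies every N commands rather than appending the whole file first, to keep memory bounded.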

DaveLanday commented on June 24, 2024

@zuiderkwast I was able to get it to work with a small awk script that pushed ~2800 strings to a list.
I essentially collected the output of the script in a string and then parsed it. Each command in the string was separated by a newline character, so it was simple to count and parse the commands. I used a batch size of 1000 and did something similar to the following to collect my set of commands:

std::string result; // Stores the result from the pipeline
std::string delimiter = "\n"; // Each command will be separated by a newline
size_t pos=0; // Position of the next redis command
std::string rcmd; // Redis HSET command
unsigned int tot_cmds = 0; // Total number of commands read from the script
std::array<char, 5000> buffer; // Holds each line of the parsed file. We assume that each line is at most 5000 characters long
std::unique_ptr<FILE, decltype(&pclose)> pipe(popen(cmd, "r"), pclose); // Read from the open pipeline ...
if (!pipe) {
    throw std::runtime_error("popen() failed!");
}
while (fgets(buffer.data(), buffer.size(), pipe.get()) != nullptr) { // fgets stops at '\n' or EOF, so each iteration reads one complete command into the buffer and we count it ...
    result += buffer.data(); 
    tot_cmds += 1;
}
// Make sure the result ends with a newline so the delimiter search also finds the last command:
result += "\n";

Furthermore, my pipelining loop was modeled after this hiredis example:

unsigned int remainder = tot_cmds % batch_size; // Commands left over after the full batches
  if ( tot_cmds > batch_size ){
      for ( i=0; i < tot_cmds/batch_size; i++ ) { // for each batch, do the following...
          n_cmd=0; // Reset the number of commands pushed for each batch ...
          for ( j=0; j < batch_size; j++ ) {
              pos = result.find(delimiter);
              rcmd = result.substr(0, pos);
              redisClusterAppendCommand(conn, rcmd.c_str()); // Push the current command to redis ...
              result.erase(0, pos + delimiter.length()); // We modify the string in place ...
              n_cmd+=1;
          }
          while ( n_cmd-- > 0 ) {
              status = redisClusterGetReply(conn, (void **)&reply);
              
              // Error handle the commands, we either didn't format it correctly, or the data could not be pushed for some other reason:
              if ( !reply || status==REDIS_ERR ) {
                  fprintf (stderr, "%s\n", conn->errstr);
                  tot_failures+=1;
              } else {
                  tot_successes+=1;
              }
              freeReplyObject(reply);
          }
      }
      if ( remainder != 0 ) {
          n_cmd=0;
          for ( i=0; i < remainder; i++ ) { // Handle any remaining commands in the final partial batch ...
              pos = result.find(delimiter);
              rcmd = result.substr(0, pos);
              redisClusterAppendCommand(conn, rcmd.c_str());
              result.erase(0, pos + delimiter.length()); // We modify the string in place ...
              n_cmd+=1;
          }
          while ( n_cmd-- > 0 ) {
              status = redisClusterGetReply(conn, (void **)&reply);
              // Error handle the commands, we either didn't format it correctly, or the data could not be pushed for some other reason:
              if ( !reply || status==REDIS_ERR ) {
                  fprintf (stderr, "%s\n", conn->errstr);
                  tot_failures+=1;
              } else {
                  tot_successes+=1;
              }
              freeReplyObject(reply);
          }
      }
  } else { // If we don't have to worry about batching, just pipeline all the data because it should be safe to do so ...
      while ( (pos = result.find(delimiter)) != std::string::npos ) {
          rcmd = result.substr(0, pos);
          redisClusterAppendCommand(conn, rcmd.c_str());
          result.erase(0, pos + delimiter.length());
      }
      while ( tot_cmds-- > 0 ) {
          status = redisClusterGetReply(conn, (void **)&reply);

          // Error handle the commands, we either didn't format it correctly, or the data could not be pushed for some other reason:
          if (  !reply || status==REDIS_ERR ) {
              fprintf (stderr, "%s\n", conn->errstr);
              tot_failures+=1;
          } else {
              tot_successes+=1;
          }
          freeReplyObject(reply);
      }
  }

This isn't my full code, just a relevant reference. Below is the output I received from ElastiCache:

CONNECTED SUCCESSFULLY!
Total Replies: 2285
 error replies: 0, successful replies: 2285
real	0m0.293s
user	0m0.102s
sys	0m0.066s

Also, I just want to say @bjosv, I took your advice and changed my build environment away from Debian. If you recall, I was trying to statically compile in a Debian image and then copy the binary over to a busybox:glibc image for run time (keep it lightweight, ya know?). Unfortunately, since libdl.so.2 is involved (due to requiring the OpenSSL libs), the program isn't strictly static: it requires libdl.so.2 to be present on the machine and in the correct path. I kept running into an issue where, even if I copied the shared library to the correct directory in the busybox image, the program would crash. I ended up building on an Alpine Linux Docker image, which resulted in lightweight build and run-time images and no errors!

bjosv commented on June 24, 2024

Great that you managed to solve the image issues!

DaveLanday commented on June 24, 2024

UPDATE: my example above seems to work for RPUSH commands to the same list object, but HSET commands are throwing a segfault. I am trying to debug it.
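
One thing I plan to check is that I pass each generated line straight to redisClusterAppendCommand as the format string, so a stray % in the HSET data would be treated as a conversion specifier. If that turns out to be the problem, I may switch to the argv variant and split each line into arguments myself. A rough sketch of what I mean (splitting on single spaces is an assumption that only holds for my generated data):

#include <hircluster.h>
#include <sstream>
#include <string>
#include <vector>

// Split one generated line (e.g. "HSET user:42 name Dave") into arguments and
// queue it with the argv variant, so the data is never parsed as a format string.
static void append_line(redisClusterContext *conn, const std::string &line) {
    std::vector<std::string> args;
    std::istringstream ss(line);
    std::string token;
    while (ss >> token)              // assumes no argument contains a space
        args.push_back(token);
    if (args.empty())
        return;

    std::vector<const char *> argv;
    std::vector<size_t> argvlen;
    for (const std::string &a : args) {
        argv.push_back(a.c_str());
        argvlen.push_back(a.size());
    }
    redisClusterAppendCommandArgv(conn, (int)argv.size(), argv.data(), argvlen.data());
}

The replies would still be read with redisClusterGetReply exactly as in the batching loop above.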
