
Comments (13)

msalle commented on September 3, 2024

Hi Jose,
first of all, the package names you are trying to install are not correct. I'd say you need, for example, globus-gridftp-server-progs. See
https://packages.ubuntu.com/search?suite=focal&searchon=names&keywords=globus
for the Ubuntu package names. I don't know what type of service you're trying to install, so I can't really say which ones you'd need.
Secondly, the links in https://gridcf.org/gct-docs/6.2/admin/quickstart/index.html#q-prereq are wrong. Thanks for pointing that out, we'll make an issue of that and fix it.
Thirdly, for installing from source, you should use the source installer: look for "Download URL for GCT full source tarball" on the release page https://github.com/gridcf/gct/releases/tag/v6.2.20210826. Starting from that tarball should work fine. There is no need to run autoreconf; you can run configure directly.
You can also do a git clone and check out the tag, but the zip or tarball from the release page is currently not supported due to the issue you bumped into.

from gct.

msalle commented on September 3, 2024

Actually, for the first point see gridcf/gct-docs#14 (comment)


ortegajosant commented on September 3, 2024

Hi, thank you for your response.
So, is it necessary to install all the packages listed in https://packages.ubuntu.com/search?suite=focal&searchon=names&keywords=globus?
I guess installing globus-gram-job-manager globus-gridftp-server-progs globus-simple-ca globus-gsi-cert-utils-progs myproxy myproxy-server myproxy-admin will pull in all the dependencies needed, or should I add another package?
I think that adding this updated information to the documentation could also help to avoid confusion.


maarten-litmaath commented on September 3, 2024

Hola Jose,
please be aware that the GCT is maintained at best-effort level and for some components we do not really have the expertise to help debug things. What is more, the biggest users of this middleware are steadily moving away from it --> the signs are not good for new projects to start depending on it. You may want to look for alternative SW stacks to implement the functionality you need...


ortegajosant commented on September 3, 2024

> Hola Jose, please be aware that the GCT is maintained at best-effort level and for some components, we do not really have the expertise to help debug things. What is more, the biggest users of this middleware are steadily moving away from it --> the signs are not good for new projects to start depending on it. You may want to look for alternative SW stacks to implement the functionality you need...

Oh, that's good information, thank you for that.


ortegajosant commented on September 3, 2024

@maarten-litmaath Do you know which SW they are migrating to?


maarten-litmaath commented on September 3, 2024

Hola Jose,
the biggest users are the CERN LHC experiments on WLCG plus related communities on EGI, OSG and NorduGrid. Years ago we already stopped using GRAM and today we rely on HTCondor and ARC for grid jobs. The GridFTP protocol has mostly been phased out on WLCG in the last 12 months, being replaced with HTTP+WebDAV and Xrootd. That leaves us with GSI and MyProxy for X509, which we have started replacing with JWT for job submissions, but it will take a few years still to weed out X509 completely. Now, the GCT is being maintained by just a few people whose organizations still need to care about some parts of this SW for the time being --> as soon as the SW becomes mostly irrelevant for a contributing institution, its support efforts will probably be stopped.

What are you trying to set up?


ortegajosant commented on September 3, 2024

Hi! Currently I'm working on a project to process a lot of tasks, something like "serverless" with a lot of containers. It requires reading and writing a lot of files continuously, most of them actually small, so we want to improve the data transmission. I don't know yet whether using something like "gridftp" will help 🤔


fscheiner commented on September 3, 2024

@ortegajosant:
That depends. Maybe first define what your file size range is and what distribution you expect for a fixed number of files. Also what transfer performance do you want to achieve and what is your hardware capable of?
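
As a starting point for answering the file size question, here is a minimal sketch that counts files per size class under a directory. The size classes and the function names (`classify`, `size_histogram`, `file_sizes`) are assumptions chosen for illustration; they do not come from the GCT or any of its tools.

```python
from collections import Counter
from pathlib import Path

# Hypothetical size classes as (label, upper bound in bytes).
# Adjust them to whatever granularity matters for your workload.
SIZE_CLASSES = [
    ("<= 64 KiB", 64 * 1024),
    ("<= 1 MiB", 1024 ** 2),
    ("<= 64 MiB", 64 * 1024 ** 2),
    ("<= 1 GiB", 1024 ** 3),
]


def classify(size: int) -> str:
    """Return the label of the first size class that can hold `size` bytes."""
    for label, upper in SIZE_CLASSES:
        if size <= upper:
            return label
    return "> 1 GiB"


def size_histogram(sizes) -> Counter:
    """Count how many of the given file sizes fall into each size class."""
    return Counter(classify(s) for s in sizes)


def file_sizes(root: str) -> list:
    """Collect the sizes of all regular files below `root`."""
    return [p.stat().st_size for p in Path(root).rglob("*") if p.is_file()]
```

Calling `size_histogram(file_sizes("/path/to/sample/data"))` on a representative sample directory (the path is a placeholder) gives a rough picture of how many files are small enough for per-file latency to dominate.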

The GridFTP server (globus-gridftp-server) and client (globus-url-copy aka "guc") as provided by the GCT offer a variety of options to optimize transfers of small files:

  • data channel caching (guc option -fast) - this keeps already established data channel connections open across the transfer of multiple files. This effectively hides the latency for connection establishment, which would otherwise add up for lots of small files.
  • pipelining (guc option -pp, or -ppq n to also set the queue depth, IIRC) - this allows initiating the transfer of n files in a single operation without waiting for the transfer of each file to complete before initiating the transfer of the next file, effectively hiding latency due to control channel communication. NOTICE: I do not recommend this functionality, as it doesn't work well with the optional reliability functionality, which logs the transfer progress and allows continuing a failed transfer from the point where it was interrupted according to the logged progress. Another point against it is that the "pipelined" file transfers happen sequentially, whereas with concurrency (see below) they happen in parallel.
  • concurrency (guc option -cc n) - this also allows initiating the transfer of n files in a single operation without waiting for the transfer of each file to complete before initiating the transfer of the next file. But it works differently from pipelining in that it opens a control channel for each of the files to be transferred at once. As the connection establishment happens at the same time for all of them, the latency is effectively hidden for multiple files. In contrast to pipelining, this works well with the optional reliability functionality.
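
To make the options above concrete, here is a small sketch that assembles a guc command line from them. It only uses the options named in this comment (-cc, -fast, -pp); the helper name `guc_command` and the example.org URLs are made up for illustration, and any further options a real multi-file transfer needs are left out.

```python
import shlex


def guc_command(src, dst, concurrency=None, fast=False, pipelining=False):
    """Build an argv list for globus-url-copy from the options discussed above.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    argv = ["globus-url-copy"]
    if concurrency is not None:
        argv += ["-cc", str(concurrency)]  # n concurrent file transfers
    if fast:
        argv.append("-fast")               # data channel caching
    if pipelining:
        argv.append("-pp")                 # pipelining (not recommended above)
    argv += [src, dst]
    return argv


# Placeholder endpoints; replace them with your own GridFTP URLs.
cmd = guc_command(
    "gsiftp://source.example.org/data/",
    "gsiftp://dest.example.org/data/",
    concurrency=16,
    fast=True,
)
print(shlex.join(cmd))
```

Keeping the command as an argv list means it can be passed to `subprocess.run` without shell quoting issues once the endpoints are real.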

By default, GridFTP data channels are neither encrypted nor checksummed. If you need checksumming and encryption, expect a drop in performance, as this requires more CPU power. During my testing in a past project on a 10 Gbps capable connection between Stuttgart and Karlsruhe, using six K10 Opteron cores against eight E5 Xeon cores (i.e. 16 hardware threads; sorry, I don't remember the clock rates, but they were neither the fastest nor the slowest specs), we achieved:

  • 868 MiB/s (using -cc 16 -tcp-bs 4M -p 8 -fast) - no checksumming nor encryption
  • 583 MiB/s (using -cc 16 -tcp-bs 4M -p 16 -fast -cd -dcsafe) - with checksumming enabled
  • 280 MiB/s (using -cc 16 -tcp-bs 4M -p 16 -fast -cd -dcpriv) - with encryption enabled

...for transferring 32 files of 20055224320 bytes each (i.e. roughly 598 GiB in total). More modern hardware can of course allow for better performance, reducing the drop.
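
The figures above can be put into perspective with a little arithmetic. The sketch below assumes the three rates are average aggregate rates over the whole transfer, which the comment does not state explicitly; the variable and function names are illustrative only.

```python
FILE_COUNT = 32
FILE_SIZE_BYTES = 20_055_224_320
MIB = 1024 ** 2
GIB = 1024 ** 3

# Measured rates in MiB/s, as quoted above.
RATES = {
    "no checksumming nor encryption": 868,
    "checksumming (-dcsafe)": 583,
    "encryption (-dcpriv)": 280,
}
BASELINE = RATES["no checksumming nor encryption"]

total_bytes = FILE_COUNT * FILE_SIZE_BYTES
total_gib = total_bytes / GIB


def duration_seconds(rate_mib_s: float) -> float:
    """Time to move the whole data set at a constant rate (assumption)."""
    return total_bytes / MIB / rate_mib_s


def drop_percent(rate_mib_s: float) -> float:
    """Throughput loss relative to the unprotected baseline, in percent."""
    return 100 * (1 - rate_mib_s / BASELINE)


print(f"total: {total_gib:.1f} GiB")
for label, rate in RATES.items():
    minutes = duration_seconds(rate) / 60
    print(f"{label}: {minutes:.1f} min, {drop_percent(rate):.1f} % below baseline")
```

Under that assumption, enabling checksumming costs roughly a third of the throughput and enabling encryption roughly two thirds on the hardware described.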

For testing GridFTP performance I recommend using my own tool tgftp (a wrapper for guc), as it allows for batch testing. This includes per-test pre- and post-command execution, e.g. for gathering system and configuration info (like hardware information or the congestion control protocol) before testing, or for processing log file values after testing.

And if you prefer a more advanced client than guc, have a look at gtransfer, which is a wrapper around tgftp and provides many useful features, like defining guc options per connection or per file size class, bash completion (incl. remote directory browsing), and host aliases to avoid the need to provide "full" URLs for source and destination addresses. In addition, any data transfer can be interrupted (either voluntarily by hitting Ctrl+C or due to something out of your control) and continued afterwards from where it stopped by re-issuing the very same command.


fscheiner commented on September 3, 2024

Should be fixed with gridcf/gct-docs#27.


fscheiner commented on September 3, 2024

@ortegajosant:
Is your initial request solved with the changes from gridcf/gct-docs#27 in place now? Please see https://gridcf.org/gct-docs/6.2/admin/quickstart/index.html again for the updated content.


ortegajosant commented on September 3, 2024

Thank you @fscheiner


fscheiner commented on September 3, 2024

You're welcome.

