Comments (12)

msalle commented on July 29, 2024

Hi Maarten, all,

I tend to agree with you, and we should certainly focus on security fixes and the really urgent fixes. If it is easy to change the default and add a flag, I think we could do that, but I'm worried it could be quite involved to find exactly how and where to do it: it could be buried quite deeply inside the GSI code itself.

from gct.

maarten-litmaath commented on July 29, 2024

Hi all,
I suspect this is a leftover from an old potential use case
in which the server would act as a hub for another one
and thus need to have the credential at its disposal.

I do not think we have such a use case today and I agree
we then should remove that unnecessary work.

I suggest we just comment out the code for now and
not make it optional. Then see if anyone complains... :-)

msalle commented on July 29, 2024

You're saying no-one is doing 3rd party transfers any more? It would be necessary for those, I'd say?

maarten-litmaath commented on July 29, 2024

Hi all,
I assumed 3rd party transfers would e.g. go through FTS
instead of the globus-url-copy CLI, but Mischa is right:
we cannot know that. We then would need to consider
making the CLI smarter, as only it knows whether the
server will be used for a 3rd party transfer or not.

paulmillar commented on July 29, 2024

In case this isn't clear, FTP (and, by extension, GridFTP) supports third-party transfers without requiring delegation. This is because FTP has separate TCP connections for the "control" and "data" channels.

Somewhat simplified, the procedure is:

  • The client (e.g., FTS) tells destination "be passive; prepare to receive data"
  • The destination server replies "OK, send me the data by connecting to IP address a.b.c.d and on TCP port x".
  • The client (FTS) then tells the source server "be active; send me data, connecting to IP address a.b.c.d and TCP port x".
  • The source server then connects directly to the destination server, and transfers the data.

The port number is used as a secret with which to authorise the transfer; the transfer is not otherwise protected.
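
To make the sequence above concrete, here is a minimal sketch in Python of the control-channel bookkeeping a client would do, using only the plain FTP commands from RFC 959 (PASV, PORT, STOR, RETR). The function names, file paths and the example address are made up for illustration; the code does no network I/O, and GridFTP's own extensions are not shown.

```python
import re


def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Extract (host, port) from a 227 reply to PASV.

    Example reply: '227 Entering Passive Mode (192,0,2,10,195,80)'
    """
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if not reply.startswith("227") or match is None:
        raise ValueError(f"not a valid PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    # The port is sent as two bytes: high byte first, then low byte.
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


def port_command(host: str, port: int) -> str:
    """Build the PORT command that points a server at host:port."""
    fields = host.split(".") + [str(port // 256), str(port % 256)]
    return "PORT " + ",".join(fields)


def third_party_steps(pasv_reply: str, src_path: str, dst_path: str):
    """Commands the client sends after the destination answered PASV.

    Returns (server, command) pairs in the order they would be issued.
    """
    host, port = parse_pasv_reply(pasv_reply)
    return [
        ("source", port_command(host, port)),  # "be active, connect there"
        ("destination", f"STOR {dst_path}"),   # destination waits for data
        ("source", f"RETR {src_path}"),        # source connects and sends
    ]
```

Note that the client never sees the data itself, and nothing in these commands carries a credential: whoever knows the address and port from the 227 reply can connect.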

If the data channel is encrypted (via TLS), then it would be possible for the data channel to have mutual authentication, with the data-channel client (the source service, in the above example) using an X.509 credential during the TLS handshake. This would require the data-channel client to have some kind of X.509 credential with which to authenticate, which could be the user's credential (if the client delegated) but it could also be something else, such as the host credential.

On a more practical point, I think the easiest solution would be to make delegation optional, and disabled by default.

I suspect the vast majority of globus-url-copy commands are for direct (2nd party) transfers. I don't know of any real deployments where these require delegation.

The default could be made more sophisticated; for example, enabled if the command is a third-party transfer with a secure data channel and disabled otherwise.
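
As a rough sketch of what such a default might look like (in Python rather than the C of globus-url-copy, and with entirely hypothetical names; this is not how the existing code is structured):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransferRequest:
    """Hypothetical summary of what the user asked the client to do."""
    source_is_remote: bool        # source URL points at a server
    destination_is_remote: bool   # destination URL points at a server
    secure_data_channel: bool     # data channel authentication requested


def is_third_party(request: TransferRequest) -> bool:
    # Two servers involved means the data never passes through the client.
    return request.source_is_remote and request.destination_is_remote


def should_delegate(request: TransferRequest,
                    user_choice: Optional[bool] = None) -> bool:
    """Delegation default: off, unless the heuristic or the user says otherwise."""
    if user_choice is not None:
        # An explicit (hypothetical) command-line flag always wins.
        return user_choice
    return is_third_party(request) and request.secure_data_channel
```

With this, a direct upload or download never delegates unless the user asks for it, while a third-party transfer with a secure data channel delegates by default.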

The underlying problem is that GSI delegation is just broken. It happens too early and is controlled by the wrong agent. In general, only the server (not the client) knows whether or not delegation is needed for a particular operation. Moreover, the server can only say whether delegation is needed once it knows what the client wants to do. This happens only after the GSI handshake has completed. :-(

IMHO, the correct approach would be to define a specific error code to mean "requires delegation" and have separate FTP commands to allow the client to delegate a credential. However, I don't think anyone has the energy to implement this.

maarten-litmaath commented on July 29, 2024

Hi Paul,
why would only the server know that? The client knows if the arguments
that were supplied denote 2 servers --> 3rd party transfer...

paulmillar commented on July 29, 2024

Hi Maarten,

Yes, the client knows whether it is asking for an encrypted third-party transfer, but it still doesn't know whether the server needs the client to delegate the credential.

Here are some counter-examples where the client is requesting an encrypted third-party transfer, but it does not require delegation:

  • The data-channel server does not require any client credential during the TLS handshake.
  • The data-channel server will accept the data-channel client's own credential.
  • The data-channel client has previously received (and cached) a delegated credential from this user, which is still valid.
  • The operation has failed (or will fail); e.g., user is not authorised, no such file.
  • The data-channel client can authenticate with its own (server) credential and the DN of the user. The data-channel server accepts the DN because of a strong trust relationship.

The last example is a specific example of a more general scenario, where delegation is avoided by having a high level of trust between the two services. Globus (Online)'s sharing use-case is one example. xcache can also support something similar for accessing embargoed data.

As you mentioned earlier, there's also the counter-example where "normal" (non-third-party) transfers could require delegation: the FTP server is acting as a proxy in front of some other service(s), which requires authentication. Therefore (almost?) all operations would require delegation.

So, in general, only the server will know whether delegation is needed. The client can make intelligent guesses, but they are still only guesses.

(As an aside, this server-tells-client-when-to-delegate model is how delegation works in HTTP-TPC: the client attempts a transfer and, if a credential is needed, the server tells the client that it must delegate and then retry the HTTP-TPC request).
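
A toy, in-memory sketch of that "try first, delegate only when told to" flow, in Python. Every name here (`DelegationRequired`, `FakeServer`, `transfer_with_lazy_delegation`) is invented for illustration; it models neither the real HTTP-TPC messages nor any GridFTP API.

```python
class DelegationRequired(Exception):
    """Stands in for a server reply meaning 'delegate, then retry'."""


class FakeServer:
    """Minimal stand-in for a server that may or may not need a credential."""

    def __init__(self, needs_delegation: bool):
        self.needs_delegation = needs_delegation
        self.has_credential = False

    def delegate(self, credential: str) -> None:
        # A real server would run a delegation protocol here.
        self.has_credential = True

    def transfer(self, source: str, destination: str) -> str:
        # Only the server can decide, and only once it knows the request.
        if self.needs_delegation and not self.has_credential:
            raise DelegationRequired()
        return f"copied {source} -> {destination}"


def transfer_with_lazy_delegation(server: FakeServer, source: str,
                                  destination: str, credential: str):
    """Return (result, delegated): delegate only if the server asks for it."""
    try:
        return server.transfer(source, destination), False
    except DelegationRequired:
        server.delegate(credential)
        return server.transfer(source, destination), True
```

The point of the sketch is that the client makes no guess at all: the cost of delegation is paid only on the requests where the server actually reports that it needs a credential.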

maarten-litmaath commented on July 29, 2024

Hi all,
we need to be careful with the efforts we still put into GridFTP
while we know it will have become irrelevant in WLCG and for
many other VOs, particularly big ones, in the next 1-2 years.

We have survived with this misfeature for 15+ years at scale
--> we probably can tolerate such cases for a few more,
particularly at much lower scales than before.

Hence I propose we just do nothing about this. Comments?

fscheiner commented on July 29, 2024

@paulmillar @msalle @maarten-litmaath
Another use case for guc delegating credentials is most likely the (experimental) multicasting feature of the Globus GridFTP server, specifically when instructing receiving servers to forward data to other servers. See https://gridcf.org/gct-docs/latest/gridftp/user/index.html#gridftp-user-experimental and maybe this paper for details.

paulmillar commented on July 29, 2024

I also found more details in GCT v6.2 GridFTP : Developer’s Guide.

@fscheiner Do you know of a community that's using GridFTP Multicasting? It's certainly an interesting idea, albeit one with some drawbacks.

IIRC, the client (guc) chooses whether to delegate, by sending a 0 or a D over the established TLS connection, as part of the GSI handshake. So, if the intention is to support GridFTP Multicast then the client could delegate when multicasting, and refrain from delegating if no multicasting is expected.

fscheiner commented on July 29, 2024

> I also found more details in GCT v6.2 GridFTP : Developer’s Guide.

Thanks for that. I seemed to remember that multicasting illustration, but couldn't remember where it was used in the documentation, so referenced the paper about this functionality instead.

> @fscheiner Do you know of a community that's using GridFTP Multicasting?

Actually not. But I could imagine it could be a good way to distribute images to a webserver cluster.

> It's certainly an interesting idea, albeit one with some drawbacks.

What drawbacks do you see? I guess it would be more efficient to replicate the data packets on the routers, though I assume this won't work when the data channel is encrypted and it would also nullify the possible network overlay which would be useful to "connect" GridFTP servers located in private networks to GridFTP servers located in public networks. And w/o local writes on the intermediate servers, this is more efficient than what I enabled with multi-step transfers with gtransfer.

> IIRC, the client (guc) chooses whether to delegate, by sending a 0 or a D over the established TLS connection, as part of the GSI handshake. So, if the intention is to support GridFTP Multicast then the client could delegate when multicasting, and refrain from delegating if no multicasting is expected.

I.e. delegation when using (1) 3rd party transfers and/or (2) multicasting.

paulmillar commented on July 29, 2024

Well, the main drawback is that the client doing the upload needs to know all the places where the file should be written. In some cases, that might make sense (e.g., a CDN-like data placement model), but in others (e.g., a grid job that's just completed) it is not necessarily the best approach.

Of course, the client could call out to some external service to learn where the files should be placed. However, in that case, the external agent could manage the transfers itself (e.g., Rucio) without involving the client. There's not much benefit to this multicast solution.

Some features become a little harder and perhaps a little more fragile. For example, if there was a data placement policy of "one copy in Europe (don't care which site) and one copy in the USA (don't care which site)" then it would require an early binding (this job knows to send its output to Fermilab and DESY; the next job knows to send its output to MIT and KIT, ...) It would be hard to change this binding once the job starts without the client calling out to an external service.

Another problem I see is how errors are handled. Suppose a transfer fails: how is this reported back? Which agent (if any) should retry the transfer? This might be described in the paper, but I didn't see it while skimming through.

Another aspect is that file-level checksums are usually calculated over the entire file's contents. That means that, if you care about data integrity, you will probably have to implement a store-and-forward model to be sure there's no data corruption. Relying on TLS or using mode-X does offer an alternative, albeit with some limitations. So, if the implementation uses the store-and-forward model, then I think almost all the benefits are gone: the same thing could be achieved by managing the transfers with an external data-placement service.
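
To illustrate why a whole-file checksum pushes an intermediate server towards store-and-forward, here is a small Python sketch. It assumes Adler-32 as the file-level checksum (via the standard `zlib` module); the function name is made up, and this is not taken from any GridFTP implementation.

```python
import zlib
from typing import Iterable


def receive_and_verify(chunks: Iterable[bytes], expected_adler32: int) -> bytes:
    """Store all incoming chunks, and release them only if the checksum matches."""
    running = 1            # Adler-32 starts at 1 for empty input
    stored = bytearray()   # the "store" part of store-and-forward
    for chunk in chunks:
        running = zlib.adler32(chunk, running)  # checksum can be updated per chunk...
        stored.extend(chunk)
    # ...but the verdict is only known after the last byte has arrived.
    if running != expected_adler32:
        raise ValueError("checksum mismatch; refusing to forward the file")
    return bytes(stored)
```

A server that instead forwards each chunk as it arrives has already passed on the data by the time it learns whether the file was intact.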

IIRC, there are some cool projects that implement multicast file delivery, which use FEC to achieve some reliability. Although the kind of multi-hop file delivery that Globus developed is certainly useful in some use-cases, I think you get more-or-less the same benefit using a higher-level data placement service (like Rucio). That approach also gives the users an additional level of flexibility (error-handling, dynamic changes to placement rules, etc).

As usual, this is just my 2c-worth :-)
