Comments (25)

nicolas33 avatar nicolas33 commented on August 29, 2024

imapfw is not ready to download emails.

I'd be happy to know more about your use case to get the bigger picture. What do you mean by "hook a module"? What module(s) would you hook?

from imapfw.

Rafiot avatar Rafiot commented on August 29, 2024

My idea is the following:

  • The users install and configure imapfw to connect to their imap server
  • The users configure their email client (Thunderbird, Outlook, ...) to connect to imapfw
  • imapfw acts as a transparent proxy for all messages except the ones with attachments. In that case, it gives the whole source of the message to my script, which checks the sanity of the attachment and returns a message (after having changed the attachment if needed)
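The routing rule in the last bullet (relay everything, except messages with attachments) can be sketched with Python's standard `email` package. This is only an illustration, not imapfw code: `has_attachment` and `process` are hypothetical names, and `sanitize` stands for whatever script receives the full message source.

```python
from email import message_from_bytes, policy


def has_attachment(raw: bytes) -> bool:
    """Return True if the raw message carries at least one attachment."""
    # policy.default makes the parser return an EmailMessage,
    # which provides iter_attachments().
    msg = message_from_bytes(raw, policy=policy.default)
    return any(True for _ in msg.iter_attachments())


def process(raw: bytes, sanitize) -> bytes:
    """Relay messages without attachments untouched; hand the others
    to the `sanitize` callable (hypothetical) and return its result."""
    return sanitize(raw) if has_attachment(raw) else raw
```

Here `sanitize` takes the whole message source as bytes and returns the (possibly modified) source, which matches the description above.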

The other thing I'm not sure about is how an IMAP server would handle a message modified by the client. Do you have an idea?


nicolas33 avatar nicolas33 commented on August 29, 2024

Thanks.

> The other thing I'm not sure about is how an IMAP server would handle a message modified by the client. Do you have an idea?

Emails are mapped to their UID. Changing the email invalidates the UID. So, the changed email must be removed from the server and re-uploaded as a new email.
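A minimal sketch of that remove-and-re-upload cycle, assuming a connection object with the interface of the standard library's `imaplib.IMAP4` (this is not imapfw's API). `replace_message` is a hypothetical helper. The new copy is appended before the old one is deleted, so a failure in between leaves a duplicate rather than a lost email.

```python
import imaplib


def replace_message(conn: imaplib.IMAP4, mailbox: str, uid: str,
                    new_raw: bytes) -> None:
    """Replace message `uid` in `mailbox` with `new_raw` (hypothetical helper)."""
    typ, _ = conn.select(mailbox)
    if typ != "OK":
        raise RuntimeError(f"cannot select {mailbox!r}")

    # Upload the changed email first; the server assigns it a new UID.
    typ, _ = conn.append(mailbox, None, None, new_raw)
    if typ != "OK":
        raise RuntimeError("APPEND failed, original left untouched")

    # Only then flag the original as deleted and expunge it.
    typ, _ = conn.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
    if typ != "OK":
        raise RuntimeError(f"cannot flag UID {uid} as deleted")
    # Note: EXPUNGE removes every message flagged \Deleted in the mailbox,
    # not just this one.
    conn.expunge()
```

Flags and the internal date of the original are not carried over in this sketch; a real implementation would probably want to fetch and re-apply them.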


nicolas33 avatar nicolas33 commented on August 29, 2024

Forgot to say: our (WIP) documentation is available on the website.


Rafiot avatar Rafiot commented on August 29, 2024

Thanks, I'll look at the doc and the code.

Do you think having such a module is doable with imapfw in the future? I'm comfortable working on a project under heavy development but I simply want to make sure it is possible before I start allocating time on it.


nicolas33 avatar nicolas33 commented on August 29, 2024

I think it is possible in two ways:

  • if you need the "real" IMAP server to take changes into account, the proxy has to propagate the changes (download, make the change, delete old on server, create new on server).
  • if there's no need for the IMAP server to be aware of the changes, the proxy could be an IMAP server like Dovecot or internal (must be implemented). In this case, imapfw would sync both IMAP servers (and possibly make changes on the emails). Since Dovecot works on Maildir, imapfw could also sync the "real" IMAP server to the Dovecot database (Maildir).


nicolas33 avatar nicolas33 commented on August 29, 2024

There's another possible way: write a real proxy. This would be the hard path but very interesting, BTW.
In this case, imapfw should create a socket and allow triggers on IMAP requests.


Rafiot avatar Rafiot commented on August 29, 2024

Solution 1 seems to be the best one in my case (I don't really understand the difference with solution 3).

The goal is to be able to propose such a solution for occasional webmail users too, so the changes need to be somehow synchronized back to the "real" IMAP server.


nicolas33 avatar nicolas33 commented on August 29, 2024

There are different levels at which a proxy could act:

  • high-level (solution 1): the proxy exposes/tunes emails "on demand" when they are first downloaded and propagates changes back on "real". This could mean long waiting time responses for the client since it has to wait for a new UID (re-upload) on updates. In this case, discovering new emails is triggered by the client. The proxy implements almost a full IMAP server but the emails aren't stored locally. The database is the "real" server. In this case, the provided UIDs could be different from "real".
  • high-level (solution 2): imapfw syncs emails to a local maildir and then exposes them via IMAP, which requires local disk space to store the emails. Discovering and updating emails is done by the proxy without a connected client.
  • low-level (solution 3): proxy works on IMAP requests and commands. The proxy doesn't interpret the IMAP requests from the client. They are blindly relayed to the server. The exposed UIDs are the "real" UIDs. However, downloaded emails can be updated and propagated back on "real" before they are exposed to the client. This would mean long delays for plenty of IMAP commands because any discovered UID requires each email to be checked first in case they need changes. This likely requires a local database of known (already checked) UIDs and regexp on IMAP commands from the client.

There could be other ways of working for your proxy. For example, a monitor could regularly request "real" for new emails so they are checked and updated if needed. The proxy would only expose pre-validated UIDs.
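The "only expose pre-validated UIDs" idea boils down to filtering the UID list of the "real" server against the set of already checked UIDs. A minimal sketch, with hypothetical function names and plain Python collections standing in for whatever storage the proxy would actually use:

```python
def visible_uids(server_uids, checked_uids):
    """UIDs the proxy may expose to the client, in server order."""
    checked = set(checked_uids)
    return [uid for uid in server_uids if uid in checked]


def pending_uids(server_uids, checked_uids):
    """UIDs the monitor still has to check before they can be exposed."""
    checked = set(checked_uids)
    return [uid for uid in server_uids if uid not in checked]
```

For example, with UIDs `[1, 2, 3, 5]` on the server and `{1, 3}` already checked, the client would see `[1, 3]` while `[2, 5]` wait for the monitor.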


nicolas33 avatar nicolas33 commented on August 29, 2024

BTW, I wonder whether the update on emails is best done on the server at delivery time (MDA). I think most IMAP servers allow this kind of thing. ,-)


Rafiot avatar Rafiot commented on August 29, 2024

Definitely, a postfix script will also happen, and it is the cleanest way, but the goal is to support users with no specific knowledge, infrastructure or support team at hand (webmail users working in small organisations receiving all kinds of ransomware).

Just to make sure I got it right:

(every time I say mail client, I mean Thunderbird/Outlook/...)

Solution 3

The mail client uses imapfw as an actual proxy and connects to it to get the emails; imapfw is the only one connecting to the IMAP server.
Every email passing through is sent to the sanitizing module. If it has an attachment, it is sanitized (optionally: the original email is sent to quarantine), the sanitized email is tagged as sanitized, pushed back to the server and passed to the email client.

Solution 1

imapfw acts as a mail client and modifies the emails on demand

Downside: the mail client still connects to the remote IMAP server so it will still receive unprocessed malicious attachments if imapfw didn't have time to update the email.

Solution 2

imapfw does the same as solution 1 but with a local storage.

Downside: If the email client connects directly to imapfw, it will only see the emails in the local storage and not the ones on the server.

Solution 3 is most definitely the best one, because even if it is a bit slower at fetching the emails, the mail client will still do all the caching it was doing before (let's say the last 30 days and all the subjects) so the extra hop isn't critical.


nicolas33 avatar nicolas33 commented on August 29, 2024

> Solution 3
>
> The mail client uses imapfw as an actual proxy and connects to it to get the emails; imapfw is the only one connecting to the IMAP server.

Same goes for 1 & 2.

> Every email passing through is sent to the sanitizing module. If it has an attachment, it is sanitized (optionally: the original email is sent to quarantine), the sanitized email is tagged as sanitized, pushed back to the server and passed to the email client.

Same goes for 1 & 2.

> Solution 1
>
> imapfw acts as a mail client and modifies the emails on demand

imapfw is an IMAP client in all the alternatives.

> Downside: the mail client still connects to the remote IMAP server

No, the mail client connects to the proxy.

> so it will still receive unprocessed malicious attachments if imapfw didn't have time to update the email.

Same goes for 2 & 3.

> Solution 2
>
> imapfw does the same as solution 1 but with a local storage.

No, it doesn't do the same as solution 1. Solution 1 is about using IMAP as a language for the remote database. Solution 2 is about syncing both server and proxy.

> Downside: If the email client connects directly to imapfw, it will only see the emails in the local storage and not the ones on the server.

True, the latest emails on the server must be processed at regular intervals to update the local and remote databases.

> Solution 3 is most definitely the best one, because even if it is a bit slower at fetching the emails, the mail client will still do all the caching it was doing before (let's say the last 30 days and all the subjects) so the extra hop isn't critical.

Caching can be done with solution 1, too.

However, I don't think solution 3 will be "a bit" slower. I think this can become a lot slower. For example, if the mail client only requests the list of UIDs, the proxy must first download ALL the unknown emails to process them and then return the correct list of numbers.

I don't know which option is the best. I'd say it depends on what users expect. Solutions 1 and 3 are hard because IMAP is client side while the purpose is to apply changes on the server. Each solution has downsides. Proxying IMAP is "easy" as long as no modifications are made to the emails.


Rafiot avatar Rafiot commented on August 29, 2024

Ok, I understand now.

What the users expect is to receive their messages in their email client without changing their habits. They also have multiple devices (PCs, phone, ...) and use a webmail from time to time, so having a local cache isn't the goal, and we need to sync the changes back to the server (so the other clients also have the sanitized version).


Rafiot avatar Rafiot commented on August 29, 2024

Now a very practical question: is it something you think will be doable with imapfw in the near-ish future? Or should I look at another library?

I'd be very happy to participate in the development but right now, I don't really understand where I should look in the code, as the framework seems very extensive.


nicolas33 avatar nicolas33 commented on August 29, 2024

I can't tell how much time this would require since it depends on contributions (myself included). Also, this depends on your own knowledge of Python and how "production ready" you expect it to be.

For now, imapfw is still at an early stage so you should expect to write quite some code. OTOH, this means you have more degrees of freedom to implement what you want.

I think imapfw has the best extensibility compared to any other library due to the design and the Python metaprogramming capabilities.

I'd say imapfw can be a good long-term solution if you have enough time to spend on the code.

For a start, I'd first look at the screencast. Next, you should look at the code and ask me on Gitter. You can ask any question, as many as you want, so you can get a better picture of the current state and a better overview of what you could do with imapfw.


nicolas33 avatar nicolas33 commented on August 29, 2024

> What the users expect is to receive their messages in their email client without changing their habits. They also have multiple devices (PCs, phone, ...) and use a webmail from time to time, so having a local cache isn't the goal, and we need to sync the changes back to the server (so the other clients also have the sanitized version).

Pushing back improves safety but most email clients will need 2 different accounts (one for the real remote and another for the proxy), so the clients' local caching won't be useful when switching between the two.

Whatever the solution, accessing the real server directly would expose the user to unchecked emails.


Rafiot avatar Rafiot commented on August 29, 2024

Great, I'll dig into imapfw more and look for a way to implement it. I have decent skills in Python so I'll definitely contribute as much as I can.

FYI, I wrote a quick&dirty script that takes a mail as input and returns a sanitized version: https://github.com/Rafiot/PyCIRCLean/blob/mail/bin/mail.py

It is far from being production ready, but this is the idea.

Regarding the 2 different accounts, I still don't get it :/

To me, it should work that way:
[Figure: imapfw]

Of course, if the users use any other device at the same time they use the proxy, they will get the un-sanitized version, but as soon as the sanitizing is done, the only version that stays on the server is the sanitized one.
And on the machine where the proxy is running, they only get to see the mail when the sanitizing is done.


nicolas33 avatar nicolas33 commented on August 29, 2024

[Figure: sanitize]


nicolas33 avatar nicolas33 commented on August 29, 2024

Things are worse because the clients can connect more than once and this should not trigger the same checks twice.


nicolas33 avatar nicolas33 commented on August 29, 2024

The more I think about this, the more I'm convinced you should use both a proxy and a monitor. The proxy would only hide unchecked UIDs while the monitor (IDLE mode?) would sanitize the emails.

[Figure: sanitize-proxy-monitor]
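One pass of such a monitor could look like the sketch below. Everything here is hypothetical: `fetch`, `sanitize` and `replace` are callables supplied by the caller, and `replace` is assumed to return the UID the server assigned to the re-uploaded email. The point it illustrates is that the new UID must be recorded as checked too, otherwise the sanitized copy would be scanned again on the next pass.

```python
def monitor_pass(server_uids, checked, fetch, sanitize, replace):
    """Sanitize every UID not yet in `checked`; return the updated set.

    fetch(uid) -> bytes, sanitize(bytes) -> bytes and
    replace(uid, bytes) -> new_uid are hypothetical callables.
    """
    for uid in server_uids:
        if uid in checked:
            continue  # already handled, never scan twice
        raw = fetch(uid)
        clean = sanitize(raw)
        if clean != raw:
            # The email changed: the old UID disappears from the server
            # and the sanitized copy gets a new one.
            checked.add(replace(uid, clean))
        else:
            checked.add(uid)
    return checked
```

The proxy side then only needs read access to `checked` to decide which UIDs it may show to the client.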


Rafiot avatar Rafiot commented on August 29, 2024

Okay, my idea was to have no database at the proxy's level and just look at the content of each email passing through, but your approach is probably more efficient.

I would still prefer, or at least have the possibility, to look at the original email in a quarantine folder but that's a detail at that point.


nicolas33 avatar nicolas33 commented on August 29, 2024

> Okay, my idea was to have no database at the proxy's level and just look at the content of each email passing through, but your approach is probably more efficient.

But you have to know which emails are already checked to avoid scanning them more than once.

> I would still prefer, or at least have the possibility, to look at the original email in a quarantine folder but that's a detail at that point.

This is something I'd suggest at some point. Blindly trusting a sanitizer is crazy. ,-)
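Remembering which emails are already checked needs very little state. A minimal sketch using the standard `sqlite3` module follows; the `CheckedStore` class and its schema are assumptions, not part of imapfw. Entries are keyed by mailbox, UIDVALIDITY and UID, because in IMAP a UID is only meaningful together with the mailbox's UIDVALIDITY value.

```python
import sqlite3


class CheckedStore:
    """Persistent set of already sanitized emails (hypothetical helper)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS checked ("
            " mailbox TEXT, uidvalidity INTEGER, uid INTEGER,"
            " PRIMARY KEY (mailbox, uidvalidity, uid))"
        )

    def is_checked(self, mailbox, uidvalidity, uid):
        row = self.db.execute(
            "SELECT 1 FROM checked"
            " WHERE mailbox = ? AND uidvalidity = ? AND uid = ?",
            (mailbox, uidvalidity, uid),
        ).fetchone()
        return row is not None

    def mark_checked(self, mailbox, uidvalidity, uid):
        # "with" commits on success; INSERT OR IGNORE makes repeats harmless.
        with self.db:
            self.db.execute(
                "INSERT OR IGNORE INTO checked VALUES (?, ?, ?)",
                (mailbox, uidvalidity, uid),
            )
```

If the server ever changes UIDVALIDITY for a mailbox, none of the old rows match any more and every email is simply checked again.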


Rafiot avatar Rafiot commented on August 29, 2024

Sounds great, I now have a beta version of the mail parsing script: https://github.com/CIRCL/PyCIRCLean/blob/mail/bin/mail.py (it needs some refactoring)
I tested it on junk mails (~50k) and it works properly. Now we need to get the proxy together :)

Can you tell me what imapfw can do and can't do right now based on your last graph? This will help me to prepare my roadmap.


nicolas33 avatar nicolas33 commented on August 29, 2024

You should look at the code. For IMAP sessions, see https://github.com/OfflineIMAP/imapfw/blob/master/imapfw/imap/imap.py#L108


Rafiot avatar Rafiot commented on August 29, 2024

A very simple script to process a directory of emails: https://github.com/Rafiot/imapfw/blob/msghook/rascals/dev.messagehook.rascal
