mftp's Issues

Dockerise

Dockerise the codebase for easy deployment.

Authorisation (only via insti account)

So, here's the plan.

  • User provides their kgpian.iitkgp.ac.in email.
  • Receives an OTP and verifies the ownership of the email.
  • Creates a password and completes user creation.
  • Persist the session for 15 days. After that it expires and the user has to log in again.
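
A minimal sketch of the checks in this plan, using only the Python standard library. The function names, the 6-digit OTP length, and the idea of passing timestamps around explicitly are assumptions made for illustration; sending the OTP email and storing the password are not shown.

```python
import hmac
import secrets
from datetime import datetime, timedelta, timezone

ALLOWED_DOMAIN = "kgpian.iitkgp.ac.in"   # from the plan above
SESSION_LIFETIME = timedelta(days=15)    # from the plan above
OTP_DIGITS = 6                           # assumption: the plan does not give a length


def is_institute_email(email: str) -> bool:
    """Accept only addresses whose domain is exactly the institute domain."""
    local, sep, domain = email.strip().rpartition("@")
    return (
        sep == "@"
        and bool(local)
        and "@" not in local
        and domain.lower() == ALLOWED_DOMAIN
    )


def generate_otp() -> str:
    """Return a zero-padded numeric OTP from a cryptographically secure source."""
    return f"{secrets.randbelow(10 ** OTP_DIGITS):0{OTP_DIGITS}d}"


def verify_otp(expected: str, supplied: str) -> bool:
    """Compare in constant time so timing does not reveal partial matches."""
    return hmac.compare_digest(expected.encode(), supplied.encode())


def is_session_valid(created_at: datetime, now: datetime) -> bool:
    """A session is valid until 15 days after it was created."""
    return now < created_at + SESSION_LIFETIME
```

For example, `is_session_valid(created, datetime.now(timezone.utc))` would be the check run on each request, assuming `created` was stored as a timezone-aware UTC timestamp.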

Security Perspective

  • Rate limit based either on anonymous session or client IP.
  • Associate an account with at most 3 devices to prevent account sharing.
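
Both points can be sketched in memory as follows. This is only an illustration: the class and function names are hypothetical, the limiter is a sliding window (one of several possible algorithms), and a real deployment would presumably keep this state in the database rather than in process memory.

```python
import time
from collections import defaultdict, deque

MAX_DEVICES = 3  # from the issue


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each key.

    The key can be an anonymous session ID or a client IP.
    """

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def register_device(devices: set, device_id: str) -> bool:
    """Add a device to an account's set unless the cap is already reached."""
    if device_id in devices:
        return True
    if len(devices) >= MAX_DEVICES:
        return False
    devices.add(device_id)
    return True
```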

Cronjob as alternative to service

Implement mftp as a cronjob, as an alternative to the continuously running service (an infinite loop). Both modes should be supported.

Basic idea

  • mftp cronjob disable
  • mftp cronjob enable
  • mftp cronjob enable 30
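
A rough sketch of how these subcommands could be parsed and turned into a crontab entry. Several things here are assumptions, not part of the issue: the optional number is treated as an interval in minutes, the default interval of 2 minutes is arbitrary, and the command path is a placeholder. Installing or removing the entry in the user's crontab is left out.

```python
import argparse

DEFAULT_INTERVAL = 2                      # assumption: the issue gives no default
MFTP_COMMAND = "python3 /path/to/mftp.py"  # hypothetical placeholder path


def build_parser() -> argparse.ArgumentParser:
    """Parser for `mftp cronjob enable [interval]` and `mftp cronjob disable`."""
    parser = argparse.ArgumentParser(prog="mftp")
    commands = parser.add_subparsers(dest="command", required=True)
    cronjob = commands.add_parser("cronjob")
    actions = cronjob.add_subparsers(dest="action", required=True)
    enable = actions.add_parser("enable")
    enable.add_argument("interval", nargs="?", type=int, default=DEFAULT_INTERVAL)
    actions.add_parser("disable")
    return parser


def cron_line(interval_minutes: int, command: str = MFTP_COMMAND) -> str:
    """Build a crontab entry that runs `command` every `interval_minutes`."""
    if not 1 <= interval_minutes <= 59:
        raise ValueError("interval must be between 1 and 59 minutes")
    return f"*/{interval_minutes} * * * * {command}"


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.action == "enable":
        print(cron_line(args.interval))
```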

Why this? and Why not only this?

Cronjob will be best for:

  • Internal servers, which do not require an OTP for login
  • Small devices like Raspberry Pis, which are limited in resources

Service will be best for:

  • External servers, which require an OTP for login
  • Machines with enough resources

Keyword based emails from MFTP

This is there as a TODO in the README.

I have understood most of the code now, and I think it will be easy to implement this now. How do you want to do this? This is what I have so far:

  • Have a new collection in the DB that has a list of keywords and the email ID to fire an email to
  • At every diffed notice, search the Notice text for the list of keywords and also the PDF attachment (using some PDF search library)
  • If the keyword matches, then send that email also, otherwise stop.

I have a feeling that all this will take some time? Say 10 users, 4 keywords per user, that's 40 things to search for in the PDF. I am not sure if there's a bottleneck here, and if there is what it is.
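
The matching step could look roughly like the sketch below. It assumes the notice text and the text extracted from the PDF are already available as strings (the PDF extraction itself is not shown), and the function names and the shape of the subscription data are hypothetical. Plain substring checks are cheap, so 40 keywords per notice is unlikely to be the bottleneck; PDF text extraction is the more probable cost, though that has not been measured here.

```python
from collections import defaultdict


def build_keyword_index(subscriptions):
    """Map each lower-cased keyword to the set of emails subscribed to it.

    `subscriptions` is an iterable of (email, keywords) pairs, as they might
    be read from the proposed keywords collection.
    """
    index = defaultdict(set)
    for email, keywords in subscriptions:
        for keyword in keywords:
            index[keyword.casefold()].add(email)
    return index


def recipients_for(text: str, index) -> set:
    """Return every email with at least one keyword present in `text`."""
    haystack = text.casefold()
    matched = set()
    for keyword, emails in index.items():
        if keyword in haystack:
            matched |= emails
    return matched
```

Building the index once per run means each distinct keyword is searched only once per notice, even if several users share it.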

@amrav What do you think?

hosting the project with a web UI

Is it possible to host this project, so that people may actually use it?
I was thinking of creating a web UI where people can subscribe to get emails from CDC Notice Board. This will be different from the already existing CDC Notify app in that the attachments in the notice will be attached to the email.
I would like to know if this is a feasible goal.

Notice PDFs not opening

mftp/update.py

Line 22 in 3c69b05

ERP_ATTACHMENT_URL = 'https://erp.iitkgp.ac.in/TrainingPlacement/TPJNFDescriptionShow?filepath='

Something has probably been changed here (the URL of the notice PDFs maybe?), and PDFs in our mail are incorrectly encoded.

MFTP doctor

Get notified of any abnormality; basically a sanity check. The script will add and remove a cronjob that keeps checking the status of mftp and its logs, and reports any abnormality it detects.

  • Implement mftp-doctor script
  • Dockerise it
  • Incorporate into script
    • mftp doctor enable
    • mftp doctor enable 10
    • mftp doctor disable

Note

Use ntfy for sending notifications of doctor.

PWA interface

Google groups suck (see this). Need to develop a Progressive Web App with the following necessary features:

  • Chrome and Safari Support
  • Notifications
  • Attachment display
  • #78
  • Filtering. For example, Internship and Placement
  • Searching. Based on anything, like, company name, date etc. Basically notification content.

Move from Python 2 to 3

Heroku-18 (currently used, and the only stack version that supports Python 2) has reached its End-of-Life.
For newer builds to take place we need to upgrade to Heroku-22, so the code has to be migrated to Python 3.

Personally tweaked features

Features available to individual hosters:

  • Apply feature (w/ and w/o ntfy)
  • Open companies list on start of the day
  • #8
  • Shortlist notifier
    • From Mail Body
    • From Attachment
    • Handle multiple instances of shortlists

Duplicate Mails

Duplicate mails are sent repeatedly, with attachments of 0 bytes.

Switch From Ping Mechanism To Heroku Scheduler

Currently, we have a cron job on the metakgp Digital Ocean server that pings the URL https://mftp.herokuapp.com/ and thereby fetches new notices. This has been working, but sometimes Heroku idles the application instance so that the ping does not reach it, or the metakgp cron job itself fails.

Solution

It would be better to use the Heroku scheduler add-on and modify the script to work with it and set a periodic frequency for the script (or function) to run and send new notices. This would make us totally independent of the cron job as well.

README update

Issue

  • The README still mentions mongolab in the setup instructions. This needs to be updated, as we don't use it anymore.

MFTP Revamp

The work is being done on the revamp branch.
This is the ToDo list in order of priority.

  • Revamp Notice fetching with - better logic and less overhead!
  • MongoDB integration
  • Beautiful enough!? email formatting
  • Sending Emails
  • Sending Emails over LAN (campus network)
  • MFTP as a service: Implement all of the functionalities of a service
  • Fix currently known BUGs:
    • Attachment missing randomly - Testing on production required
    • Emails not getting sent - Testing on production required
    • Power-cut and resetting of static IP configuration
    • False positive for the session status - says dead but was alive
    • MongoDB DNS resolution error, fails after powercut. This only fails because it's the first thing to do with internet.
  • Create a proper README.md
  • Error Handling
    • Extensive logging

Project dependencies may have API risk issues

Hi, in mftp, inappropriate dependency version constraints can cause risks.

Below are the dependencies and version constraints that the project is using:

backports-abc==0.4
backports.ssl-match-hostname==3.4.0.2
beautifulsoup4==4.4.1
certifi==2015.11.20.1
docopt==0.4.0
futures==3.0.3
pymongo==3.4
requests==2.8.1
singledispatch==3.4.0.3
six==1.10.0
tornado==4.3
wheel==0.24.0
python-dotenv==0.5.1

The == constraint introduces a risk of dependency conflicts because the allowed version range is too strict.
Constraints with no upper bound, or *, introduce a risk of missing-API errors because the latest version of a dependency may have removed some APIs.

After further analysis of this project, the version constraint of the pymongo dependency can be changed to >=3.0,<=4.1.1.

This change reduces dependency conflicts as much as possible, and allows the latest version possible without causing errors in the project.

The current project calls all of the following methods.

Methods called from pymongo:
bson.json_util.loads
pymongo.MongoClient.get_default_database
pymongo.MongoClient.close
bson.json_util.dumps
pymongo.MongoClient
All methods called:
insert_from_file
further_defaulters.append
mc_old.get_default_database.notices.find
defaulters.append
start_database_export
pymongo.MongoClient.close
further_repeated.append
pymongo.MongoClient.get_default_database
open
mc_new.get_default_database.notices.insert
os.path.dirname
bson.json_util.dumps
pymongo.MongoClient
dotenv.load_dotenv
argparse.ArgumentParser.add_argument
bson.json_util.loads
argparse.ArgumentParser.add_mutually_exclusive_group
os.path.join
len
parser.add_mutually_exclusive_group.add_argument
f.write
argparse.ArgumentParser
argparse.ArgumentParser.parse_args
f.read
export_db
format
print
repeated_notices.append
insert_notice

@developer
Could you please help me check this issue?
May I open a pull request to fix it?
Thank you very much.

Filtered subscribers

Implement an algorithm to send notifications to subscribers (a topic in the case of ntfy, or separate Google groups) based on filters such as:

  • Internship
  • Placement
  • PPO

Change subject from "Notice: CV submission" to "Placement: CV submission"

It would be better if the subject is changed from Notice: CV Submission - Company to Internship: CV Submission - Company or Placement: Urgent - CV Verification.

This is possible since the first line mentions type:placement or type:internship. It would reduce confusion among people sitting for placements, who might mistake internship notices for placement notices.
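
A small sketch of the subject rewrite. The exact format of the first line is not shown in this issue, so the regular expression below (the word `type`, a colon, then `placement` or `internship`, case-insensitive) is an assumption, as is the function name. Notices without a recognisable type keep the current `Notice:` prefix.

```python
import re

# Assumed format of the marker on the first line, e.g. "type:placement".
_TYPE_RE = re.compile(r"type\s*:\s*(placement|internship)", re.IGNORECASE)


def subject_for(notice_text: str, subject: str) -> str:
    """Prefix the subject with the notice type found on the first line."""
    first_line = notice_text.lstrip().split("\n", 1)[0]
    match = _TYPE_RE.search(first_line)
    prefix = match.group(1).capitalize() if match else "Notice"
    return f"{prefix}: {subject}"
```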

KeyError when parsing notices

MFTP currently fails to run, probably because of some changes to ERP's TNP noticeboard. The error log is:

Unhandled error occured : 
Traceback (most recent call last): 
  File "main.py", line 27, in func 
    update.check_notices() 
  File "/app/erp.py", line 78, in wrapped_func 
    *args, **kwargs) 
  File "/app/erp.py", line 93, in wrapped_func 
    func(session=session, sessionData=sessionData, *args, **kwargs) 
  File "/app/update.py", line 48, in check_notices 
    m = re.search(r'ViewNotice\("(.+?)","(.+?)"\)', a.attrs['onclick'])
KeyError: 'onclick'
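
One way to avoid the crash is to skip anchors that have no `onclick` attribute instead of indexing `attrs` directly. The sketch below works on plain attribute dictionaries so that it runs without BeautifulSoup; the function name is hypothetical, and the regular expression is the one from the traceback. It only guards against the missing attribute. If the noticeboard markup has changed more broadly, the pattern itself may also need updating.

```python
import re

# Same pattern as in update.py, per the traceback above.
VIEW_NOTICE_RE = re.compile(r'ViewNotice\("(.+?)","(.+?)"\)')


def extract_notice_ids(anchor_attrs):
    """Return the two ViewNotice arguments for every anchor that has them.

    `anchor_attrs` is an iterable of attribute dicts, e.g. `a.attrs` for each
    anchor tag found on the page.
    """
    ids = []
    for attrs in anchor_attrs:
        onclick = attrs.get("onclick")  # None instead of KeyError
        if onclick is None:
            continue
        match = VIEW_NOTICE_RE.search(onclick)
        if match:
            ids.append((match.group(1), match.group(2)))
    return ids
```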

Add attachments to the emails

A few of the notices on the ERP notice board have a PDF file attached to them. It would be nice to have these attachments also available in the MFTP forums.

[IMPROVEMENT] Failure Monitoring

Context

If and when #38 is implemented using Heroku Scheduler, we will need a better way to get notified when the service is not working. Here are key things to note:

  1. When we are running on the Heroku scheduler, there is a "very rare" chance of the job not running or getting missed.
  2. The above is taken care of by Heroku itself, and we do not have to worry about the same. So, the fault we should be notified of is when the script runs but fails.

Requirement

The notification should be localized (it can pinpoint the exact place that threw the error) and minimal (so that maintainers do not mark it as spam and multiple maintainers can be added). A good way would be to use the existing mailing mechanism set up with the Mailgun REST API. We should also aim to keep the configuration and reconfiguration overhead as low as possible; if feasible, the configuration should be passed from GitHub itself.
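
As a sketch of what "localized and minimal" could mean in code: a decorator that catches a failure, reduces it to one line naming the file, line and function that raised, hands that line to a notifier, and re-raises. The names `format_failure` and `notify_on_failure` are hypothetical, and `notify` stands for whatever sends the message (for instance the existing Mailgun mail function, which is not shown here).

```python
import functools
import traceback


def format_failure(exc: BaseException) -> str:
    """Summarise an exception as one line, pointing at the frame that raised."""
    frames = traceback.extract_tb(exc.__traceback__)
    if frames:
        last = frames[-1]
        where = f"{last.filename}:{last.lineno} in {last.name}"
    else:
        where = "unknown location"
    return f"[mftp] {type(exc).__name__}: {exc} at {where}"


def notify_on_failure(notify):
    """Decorator factory: call `notify(message)` when the wrapped job fails."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                notify(format_failure(exc))
                raise  # keep the failure visible in the scheduler logs
        return wrapper
    return decorator
```

Wrapping only the top-level job function keeps this to one message per failed run.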

NOTE: this should ideally be done after #38

When secret_answer entered by user is wrong, it should throw a better error

The r.history array becomes empty and thus this line throws this error:

Traceback (most recent call last):
  File "update.py", line 132, in <module>
    check_notices()
  File "path/erp.py", line 77, in wrapped_func
    r.history[1].headers['Location']).group(1)
IndexError: list index out of range

Instead, before accessing r.history[1], the code should ensure that r.history has at least 2 entries. If it has fewer than 2, it should ask the user to check their secret answer settings (probably a typo in one of the answers).
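
A minimal sketch of that check. The exception class and function name are hypothetical, and the function only returns the `Location` header; the regular expression that erp.py applies to it afterwards is not reproduced here.

```python
class SecretAnswerError(Exception):
    """Raised when the login did not produce the expected redirects."""


def redirect_location(history) -> str:
    """Return the Location header of the second redirect, or fail clearly.

    `history` is expected to be `r.history` from the login request.
    """
    if len(history) < 2:
        raise SecretAnswerError(
            "Login did not redirect as expected. Please check your secret "
            "answer settings (there may be a typo in one of the answers)."
        )
    return history[1].headers["Location"]
```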
