roznoshchik / lurnby

A tool for active reading and personal knowledge management

License: BSD 3-Clause "New" or "Revised" License

Dockerfile 0.07% Python 27.09% JavaScript 21.70% HTML 25.40% CSS 7.82% Shell 0.02% Mako 0.03% SCSS 17.86% Procfile 0.01%

reading spaced-repetition-system personal-knowledge-management personal-knowledge-system

lurnby's Introduction

Hi there I'm RR 👋

I work in JavaScript and Python. I use Flask/Django, or Koa / Express / NestJS on the backend. I work with SQL using SQLAlchemy, Sequelize, Mikro-ORM, Prisma and MongoDB using Mongoose. On the frontend, I work with Angular, React, jQuery, Vanilla JS, and SCSS. I have a love/hate relationship with TypeScript.

I am passionate about building things from the ground up and love constantly learning new things. I prefer working with mission-driven organizations that are actively working to improve life for planet and people.

In my spare time I am addicted to playing beach volleyball, sleeping, and coding interesting side projects. In a past life I was a Product Manager, Researcher, and Illustrator.

lurnby's Issues

Markdown support for all text area inputs.

Would be nice to add markdown support to the different text area inputs in the site.

PDF parsing is poor

The current PDF library leaves a lot to be desired.

It only works for simple PDFs with plain images and text.

Anything more complex, with graphs, charts, etc., comes through very poorly.

One idea is to just work with PDFs as images, and then possibly run OCR on the text content.

But there is a lot that needs to be explored there to render things properly so that it works with lurnby.

DB Connection Timeout Issue

If you leave a screen open for too long, the database connection closes before the app connection does. Anyone who then tries to do something on that page gets an error that looks like this:

The log for this issue is:

Exception on /app/articles [GET]
Traceback (most recent call last):
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1283, in _execute_context
   self.dialect.do_execute(
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
   cursor.execute(statement, parameters)
psycopg2.OperationalError: terminating connection due to administrator command
SSL connection has been closed unexpectedly


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
 File "/app/.heroku/python/lib/python3.9/site-packages/flask/app.py", line 2447, in wsgi_app
   response = self.full_dispatch_request()
 File "/app/.heroku/python/lib/python3.9/site-packages/flask/app.py", line 1952, in full_dispatch_request
   rv = self.handle_user_exception(e)
 File "/app/.heroku/python/lib/python3.9/site-packages/flask_cors/extension.py", line 165, in wrapped_function
   return cors_after_request(app.make_response(f(*args, **kwargs)))
 File "/app/.heroku/python/lib/python3.9/site-packages/flask/app.py", line 1821, in handle_user_exception
   reraise(exc_type, exc_value, tb)
 File "/app/.heroku/python/lib/python3.9/site-packages/flask/_compat.py", line 39, in reraise
   raise value
 File "/app/.heroku/python/lib/python3.9/site-packages/flask/app.py", line 1948, in full_dispatch_request
   rv = self.preprocess_request()
 File "/app/.heroku/python/lib/python3.9/site-packages/flask/app.py", line 2242, in preprocess_request
   rv = func()
 File "/app/app/__init__.py", line 75, in before_request_func
   if current_user.is_authenticated:
 File "/app/.heroku/python/lib/python3.9/site-packages/werkzeug/local.py", line 432, in __get__
   obj = instance._get_current_object()
 File "/app/.heroku/python/lib/python3.9/site-packages/werkzeug/local.py", line 554, in _get_current_object
   return self.__local()  # type: ignore
 File "/app/.heroku/python/lib/python3.9/site-packages/flask_login/utils.py", line 26, in <lambda>
   current_user = LocalProxy(lambda: _get_user())
 File "/app/.heroku/python/lib/python3.9/site-packages/flask_login/utils.py", line 346, in _get_user
   current_app.login_manager._load_user()
 File "/app/.heroku/python/lib/python3.9/site-packages/flask_login/login_manager.py", line 329, in _load_user
   user = self._load_user_from_remember_cookie(cookie)
 File "/app/.heroku/python/lib/python3.9/site-packages/flask_login/login_manager.py", line 372, in _load_user_from_remember_cookie
   user = self._user_callback(user_id)
 File "/app/app/models.py", line 135, in load_user
   return User.query.get(int(id))
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 1021, in get
   return self._get_impl(ident, loading.load_on_pk_identity)
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 1138, in _get_impl
   return db_load_fn(self, primary_key_identity)
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/orm/loading.py", line 287, in load_on_pk_identity
   return q.one()
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 3436, in one
   ret = self.one_or_none()
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 3405, in one_or_none
   ret = list(self)
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 3481, in __iter__
   return self._execute_and_instances(context)
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 3506, in _execute_and_instances
   result = conn.execute(querycontext.statement, self._params)
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1020, in execute
   return meth(self, multiparams, params)
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
   return connection._execute_clauseelement(self, multiparams, params)
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement
   ret = self._execute_context(
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1323, in _execute_context
   self._handle_dbapi_exception(
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1517, in _handle_dbapi_exception
   util.raise_(
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
   raise exception
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1283, in _execute_context
   self.dialect.do_execute(
 File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
   cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) terminating connection due to administrator command
SSL connection has been closed unexpectedly

[SQL: SELECT "user".id AS user_id, "user".goog_id AS user_goog_id, "user".firstname AS user_firstname, "user".username AS user_username, "user".email AS user_email, "user".password_hash AS user_password_hash, "user".admin AS user_admin, "user".test_account AS user_test_account, "user".deleted AS user_deleted, "user".suggestion_id AS user_suggestion_id, "user".account_created_date AS user_account_created_date, "user".last_active AS user_last_active, "user".last_action AS user_last_action, "user".tos AS user_tos, "user".token AS user_token, "user".token_expiration AS user_token_expiration, "user".preferences AS user_preferences, "user".add_by_email AS user_add_by_email, "user".review_count AS user_review_count
FROM "user"
WHERE "user".id = %(param_1)s]
[parameters: {'param_1': 9}]
(Background on this error at: http://url2468.lurnby.com/ls/click?upn=O95o0jN-2F92mJdpf7ZhEbgLXtFoI7wvvm0Nlb71SAqYc7I4OlCr9vuEBttZ6PLGT1WNJK_-2FZKyCIFGBlWZyR9dtbwHIdOuvcuQq8Y2fMr-2B-2FbeShjLyosNNioPDJzhSgpKKo74YFbAenDCCE7-2Bkqr8yx5SEv084ovLO1u39hYHllO4yfmmzrwczMZ24bimgvQg0j9VfPKhsZ6407LPtIKOe92SaPIyJNjOYcQUjO5GidlprUpwSgFJX1-2FmYyDnyXm5ii6-2BzSv3dLzcrWZIMfMen51r0Zw-3D-3D

I saw an issue with Recommended Systems that was extremely similar, and the solution was outlined here: https://blog.stigok.com/2021/02/28/sqlalchemy-postgres-ssl-eof-detected.html

Ultimately, the fix was to include a pool_pre_ping setting in the config.py file. It looks like this:

SQLALCHEMY_ENGINE_OPTIONS = {"pool_pre_ping": True}

Here's how it works:

The reasons for the SSL SYSCALL error: EOF detected is that the client (ORM) thinks that the TCP connection is still up, but the server has already hung up without saying so. The client then starts sending a query down the pipe and when it does, it notices the connection is broken, resulting in a sudden EOF.

What pool_pre_ping does is to test the connection before attempting to execute the actual query. This comes with an extra round-trip for all queries, but at least in my small-scale application this doesn’t matter at all. Behind the scenes, it sends a query similar to SELECT 1 to sort of ping the database. If it succeeds it follows up with the actual query you wanted to send – if it fails it recycles the connection along with all other connections established earlier than the connection it tried, and establishes a new one before sending the query again.
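As a rough illustration, the setting could sit in a config class like the one below. This is only a sketch: the `Config` class name, the `DATABASE_URL` environment variable, the SQLite fallback, and the `pool_recycle` value are assumptions and are not taken from the lurnby code. `pool_recycle` is a separate SQLAlchemy engine option that retires connections after a given number of seconds, which may reduce how often the pre-ping finds a dead connection.

```python
import os


class Config:
    # Assumed: the database URL comes from the environment, as on Heroku.
    SQLALCHEMY_DATABASE_URI = os.environ.get("DATABASE_URL", "sqlite:///app.db")

    # Passed through to SQLAlchemy's create_engine() by Flask-SQLAlchemy.
    SQLALCHEMY_ENGINE_OPTIONS = {
        # Test each pooled connection before handing it out.
        "pool_pre_ping": True,
        # Assumed value: replace connections older than five minutes.
        "pool_recycle": 300,
    }
```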

Decouple from Amazon

For storing image content, the app currently sends images to Amazon S3. This is fine for the web-app version, but it isn't necessary if the app is meant to run locally.

There should be a flag somewhere that determines whether this is a web app or an offline app, and that removes the Amazon dependency in the offline case.
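One way such a flag might look is sketched below. Everything here is hypothetical: the `LURNBY_OFFLINE` environment variable, the `LocalImageStore` class, and the `make_image_store` helper do not exist in the app, and the S3 branch is left as a placeholder for the existing upload code.

```python
import os
from pathlib import Path


class LocalImageStore:
    """Saves images under a local directory instead of uploading them."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, filename, data):
        # Keep only the final path component so files stay inside root.
        target = self.root / Path(filename).name
        target.write_bytes(data)
        return str(target)


def make_image_store(offline=None, root="uploads"):
    """Pick a storage backend from an explicit argument or an env flag."""
    if offline is None:
        # Hypothetical flag name; "1" means run without Amazon S3.
        offline = os.environ.get("LURNBY_OFFLINE", "0") == "1"
    if offline:
        return LocalImageStore(root)
    # The hosted app would return its existing S3-backed store here.
    raise NotImplementedError("plug the existing S3 upload code in here")
```

With a single factory like this, the rest of the app would only call `store.save(...)` and would not need to know which backend is active.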

Deleting articles

I could not find a button to do it. The closest thing I found was "Archive", where the article disappears into an undisclosed location.

Offtopic: Great app! I am thinking of replacing Pocket with it.

Creating a Lurnby api

For some of the planned features for lurnby, including offline support and native mobile apps, it's important to first separate the data from the application to allow for multiple clients. This is a sketch of the api

User

Method	Endpoint	Description
POST	`/user`	create new user
GET	`/user/<id>`	get user info
GET	`/user/<id>/email`	enable add by email
GET	`/user/<id>/senders`	get approved senders
PUT	`/user/<id>/senders`	update approved senders
GET	`/user/<id>/export`	export all of the user's data
GET	`/user/<id>/preferences`	get user communication preferences
PUT	`/user/<id>/preferences`	update user communication preferences
PUT	`/user/<id>`	update user
DELETE	`/user/<id>`	delete user

Auth

Method	Endpoint	Description
POST	`/authorize`	log in / receive tokens
POST	`/refresh`	refresh tokens

Articles

Method	Endpoint	Description
GET	`/article`	get articles
POST	`/article`	create new article
GET	`/article/<id>`	get article
PUT	`/article/<id>`	update article
DELETE	`/article/<id>`	delete article
GET	`/article/<id>/notes`	get article notes
PUT	`/article/<id>/notes`	update article notes
GET	`/article/<id>/highlights`	get article highlights
GET	`/article/<id>/export`	export article

Highlights

Method	Endpoint	Description
GET	`/highlight`	get highlights
GET	`/highlight/export`	export highlights
POST	`/highlight`	create new highlight
GET	`/highlight/<id>`	get highlight
PUT	`/highlight/<id>`	update highlight
DELETE	`/highlight/<id>`	delete highlight
GET	`/highlight/review`	get highlights for review

Tags

Method	Endpoint	Description
GET	`/tag`	get tags
POST	`/tag`	create new tag
GET	`/tag/<id>`	get tag
PUT	`/tag/<id>`	update tag
DELETE	`/tag/<id>`	delete tag
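To make the shape of the sketch concrete, here is a framework-free illustration of how the Tags endpoints might map onto handlers. It is not the planned implementation: the in-memory `_tags` store, the handler names, the `dispatch` helper, the response bodies, and the status codes are all assumptions, and a real version would sit behind Flask routes and the database.

```python
import re
from itertools import count

_tags = {}          # in-memory stand-in for the database table
_next_id = count(1)


def list_tags(body=None):
    return 200, list(_tags.values())


def create_tag(body):
    tag = {"id": next(_next_id), "name": body["name"]}
    _tags[tag["id"]] = tag
    return 201, tag


def get_tag(tag_id, body=None):
    tag = _tags.get(tag_id)
    return (200, tag) if tag else (404, {"error": "not found"})


def update_tag(tag_id, body):
    if tag_id not in _tags:
        return 404, {"error": "not found"}
    _tags[tag_id]["name"] = body["name"]
    return 200, _tags[tag_id]


def delete_tag(tag_id, body=None):
    if _tags.pop(tag_id, None) is None:
        return 404, {"error": "not found"}
    return 204, None


# (method, path pattern, handler); <id> becomes a numeric capture group.
ROUTES = [
    ("GET", r"/tag", list_tags),
    ("POST", r"/tag", create_tag),
    ("GET", r"/tag/(\d+)", get_tag),
    ("PUT", r"/tag/(\d+)", update_tag),
    ("DELETE", r"/tag/(\d+)", delete_tag),
]


def dispatch(method, path, body=None):
    """Find the first route matching method and path, then call its handler."""
    for route_method, pattern, handler in ROUTES:
        match = re.fullmatch(pattern, path)
        if route_method == method and match:
            ids = [int(group) for group in match.groups()]
            return handler(*ids, body)
    return 404, {"error": "no such route"}
```

For example, `dispatch("POST", "/tag", {"name": "python"})` creates a tag, and `dispatch("GET", "/tag/1")` reads it back.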

Incomplete Unit tests

The unit tests haven't been updated in a long time and need to be written from scratch. Currently the app is tested manually when changes are made and then tested again in a staging environment, but human error and all that.

Need to first brainstorm what the unit tests should be and then write them.

add by email fails if there is more than 1 recipient.

def add_by_email():
    recipient = request.form['to']
    if '<' in recipient:
        recipient = recipient.split('<')[1][:-1]

This is the code used to get the recipient of the email. When someone emails something to Lurnby and request.form['to'] contains more than one address, the function fails because it doesn't pull out the right email.

A better solution would likely be to use regex to pull out the email that has @add-article.lurnby.com as the ending.
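A minimal sketch of that regex approach follows. The helper name `find_lurnby_recipient` and the exact character class for the local part are assumptions; only the `@add-article.lurnby.com` suffix comes from the issue.

```python
import re

ADD_ARTICLE_DOMAIN = "add-article.lurnby.com"

# Local part, then the exact domain, not followed by more domain characters.
_RECIPIENT_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@" + re.escape(ADD_ARTICLE_DOMAIN) + r"(?![\w.-])",
    re.IGNORECASE,
)


def find_lurnby_recipient(to_header):
    """Return the first add-article address in a To header, or None."""
    match = _RECIPIENT_RE.search(to_header)
    return match.group(0).lower() if match else None
```

Inside `add_by_email`, this would replace the `split('<')` logic, for example `recipient = find_lurnby_recipient(request.form['to'])`, with a check for `None` when no Lurnby address is present.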

Import lurnby web to lurnby self hosted

For users who want to self-host lurnby, it would be useful to be able to import the data exported from the current website, https://www.lurnby.com.

My current use case is that I'm testing your application and would like to self-host it later.

Medium Articles (any articles behind paywall)

Currently can't parse articles that are behind a paywall.

This includes medium articles that are rate limited, as well as articles from Bloomberg properties like CityLab.

The alternative for now is to copy & paste and add the articles manually.

Video demo

In the future, one option is to connect via API, or to look into how a user might be authenticated to these sites.

Titles in Highlights Email are Tiny

See screencap here from a gmail edition of the "Recent Highlights Email" (also on mobile):

After changing the h6 tag to h4, it looks better:

Separate recommendation for consideration:

Group highlights from individual articles together?

Offline Mode for web hosted lurnby

Currently lurnby.com doesn't work offline. If you try to access it offline, the service worker just shows a standard "this app doesn't work offline" message.

But the idea is that it should also work offline to some degree, although I am not sure exactly how much.

A simple idea is that it should cache the x most recent articles so that you could read them offline. Or it should cache x most recent highlights so that review is possible.

In the case of articles, I think that becomes a bit challenging when figuring out how to also allow highlighting in offline mode.
Highlights actually change the text of the article so to create a highlight object, you would need to:

Capture highlighted text
Capture notes added to highlight
Capture any tags/topics
Capture the precise location in the text
Add to some sort of queue that then updates the db when network access returns.

One possible solution is that when a highlight is created offline, it gets a temporary ID and is rendered to the screen as normal. A JavaScript object takes the place of the DB.

Once network functionality is regained and an actual ID is generated by the db, the article text gets updated so that the highlight points to the proper place.
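The temporary-ID flow could be sketched roughly as follows. In the browser this would be JavaScript; Python is used here only to illustrate the logic. The `OfflineHighlightQueue` class, the `tmp-` prefix, the `save_to_server` callback, and the assumption that highlights are marked in the article HTML with an `id="..."` attribute are all hypothetical.

```python
import uuid


class OfflineHighlightQueue:
    """Holds highlights created offline until they can be saved."""

    def __init__(self):
        self.pending = {}  # temp_id -> highlight data

    def add(self, text, position, note="", tags=()):
        temp_id = f"tmp-{uuid.uuid4().hex}"
        self.pending[temp_id] = {
            "text": text,
            "position": position,
            "note": note,
            "tags": list(tags),
        }
        return temp_id

    def sync(self, article_html, save_to_server):
        """Save each pending highlight and swap temp IDs for real ones."""
        for temp_id in list(self.pending):
            # save_to_server returns the ID generated by the db.
            real_id = save_to_server(self.pending[temp_id])
            article_html = article_html.replace(
                f'id="{temp_id}"', f'id="{real_id}"'
            )
            # Removed only after a successful save, so failures stay queued.
            del self.pending[temp_id]
        return article_html
```

If `save_to_server` raises partway through, the remaining highlights stay in `pending` and can be retried on the next sync.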

Finding Epub Images

Epubs seem to have very limited consistency with how they organize their internal file structure.

I haven't figured out a great way of finding the image folder.

images = soup.find_all('img')
if images:
    for img in images:
        img["loading"] = "lazy"
        filename = img['src']
        filename = filename.replace("../", path+"/")

        if not os.path.exists(filename):
            filename = f"{path}/{img['src']}"

        if not os.path.exists(filename):
            filename = f"{path}/EPUB/media/{img['src']}"

        if not os.path.exists(filename):
            filename = f"{path}/EPUB/images/{img['src']}"

        if not os.path.exists(filename):
            filename = img['src']
            filename = filename.replace("../", path+"/OEBPS/")

Whenever I encounter an epub whose images don't load, I need to load up the epub, look at the folder structure and then manually add in the branching path.

I'm sure there's a better way to search the epub to locate the image folder itself which would work for any yet undiscovered filepaths.
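One possible direction, sketched under assumptions: resolve each `src` relative to the directory of the chapter file that references it, and fall back to a lookup by file name across the whole extracted epub. The helper names `index_files_by_name` and `resolve_image`, and the `chapter_dir` argument, are hypothetical. The fallback picks the first file with a matching name, so it could choose the wrong file when two images share a name.

```python
import os


def index_files_by_name(epub_root):
    """Map each file name in the extracted epub to the first path found."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(epub_root):
        for name in filenames:
            index.setdefault(name, os.path.join(dirpath, name))
    return index


def resolve_image(src, chapter_dir, index):
    """Return a path on disk for an img src, or None if nothing matches."""
    # First try the src as a path relative to the chapter's own folder.
    candidate = os.path.normpath(os.path.join(chapter_dir, src))
    if os.path.exists(candidate):
        return candidate
    # Otherwise fall back to any file in the epub with the same name.
    return index.get(os.path.basename(src))
```

The index would be built once per epub, so new folder layouts would not need a new branch in the loop.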

Export should offer html or plaintext options

Since switching over the highlights and most text inputs to support html, the actual content now being exported is html. This means that it's a bit limiting what you can use the content for if you export it from lurnby.

As a user I would want to specify when choosing my export if highlights and notes should be exported as html content or parsed for their plaintext versions.
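For the plaintext option, a standard-library sketch of stripping the HTML might look like this. The function name `html_to_plaintext` and the set of tags treated as line breaks are assumptions, and the app may already have an HTML parser that would be a better fit.

```python
from html.parser import HTMLParser

# Tags that should start a new line in the plaintext output (assumed set).
BLOCK_TAGS = {"p", "div", "br", "li", "blockquote", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_plaintext(html):
    """Drop tags, decode entities, and collapse whitespace line by line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

The export code could then call this on each highlight and note only when the user picks the plaintext option.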

Add TinyMCE support to manual article entry.

This should be an easy quick fix. And should be present with or without markdown support.

Highlighting images and media doesn't work yet.

Currently Lurnby's highlighting only works for text content. Ideally you should be able to highlight images, graphs, and charts.

The existing libraries all seem to rely on making a screenshot by recreating the DOM and I'm not positive if that would work in all use cases such as pdfs and the like.

In any case, would need to do a few things for that.

Open up an option to take a screenshot instead of creating a highlight in reader mode.
Create a new db field for storing image location in the Highlight model.
Change how the highlight displays if it's an image or if it's text.

Need a more comprehensive help section

Lurnby has a lot of things that aren't obvious. There are some videos that show the different functionality in action, but it would be much better to also have a focused getting started guide that linked you to guides on the different features and functions.

These would prob be a combo of text + gif.

Allow external highlights

All highlights are currently connected to an article inside of the app. But this is a limitation that shouldn't be there as you should be able to import highlights from anywhere to start using with the platform.

The easiest approach might be to add a boolean external=True field and an externalSource=... field to the highlight model.

Then the highlight page should have a create highlight button to allow for manual creation.
It should also have an import highlights button that uses something like this import code as base.

Also the web extensions can then be updated to allow for sending highlights and not just for sending the url to the app.

Email Sender Identification

I received an email from Lurnby the other day that looked like this:

Looks like the sender name is coming through as "team" because of "[email protected]" in the Config.py file.

To fix this, you can format the name as Name <[email protected]>. Reference: https://stackoverflow.com/questions/44385652/add-senders-name-in-the-from-field-of-the-email-in-python

I did this in RecSys and it made my emails more user friendly & better branded.
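A small sketch of building that `Name <address>` value with the standard library is shown below. The display name, the `team@example.com` address, and the `DEFAULT_SENDER` variable are placeholders, not the values from the Config.py file.

```python
from email.utils import formataddr

# Placeholder values; the real address lives in the app's config.
SENDER_NAME = "Lurnby"
SENDER_EMAIL = "team@example.com"

# formataddr handles quoting if the display name has special characters.
DEFAULT_SENDER = formataddr((SENDER_NAME, SENDER_EMAIL))
```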