tobi-de / leerming

An implementation of the `Leitner box` that can generate flashcards using llms from documents, youtube videos and web page links.

Home Page: https://leerming.com

License: BSD 3-Clause "New" or "Revised" License

Dockerfile 0.32% Python 37.98% HTML 40.87% Shell 0.10% CSS 19.26% JavaScript 1.46%

anki django flashcards gpt-3 learning leitnerbox llm openai

leerming's Introduction

Unlocking Understanding, One Card at a Time

Note

Alpha quality software!

Description

Leerming is an open-source Django-based web app that follows the Leitner box method. Create flashcards effortlessly from PDFs, videos, and web links. Supercharge your learning experience.

Leitner Box Method

The Leitner box method is a simple yet effective technique for learning and retaining information. It works by organizing flashcards into different boxes or levels. As you study, correctly answered flashcards move to higher boxes, while incorrect ones move down. This spaced repetition system helps reinforce your memory over time.

For a more detailed explanation of the Leitner box method, check out Wikipedia.

Leitner Box Algorithm Implementation

Flashcards are organized into seven distinct levels. Each card starts at Level 1. The transition between levels is based on performance during reviews.
Each level corresponds to a specific number of days between reviews. For example, Level 1 cards are reviewed daily, while Level 2 cards are reviewed every two days. The exact mapping can be found in the codebase here.
During a review, when a card is answered correctly, it moves up to the next level. Once a card reaches Level 7, it is marked as mastered.
On the other hand, if a card is answered incorrectly during a review, it is downgraded to Level 1, regardless of its previous level. This ensures that challenging material is revisited frequently, while mastered content is reviewed less frequently.
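The level transitions described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `Card` class, the function names, and the interval values for levels 3 to 7 are assumptions made for the example (only the daily and two-day intervals for levels 1 and 2 are stated above).

```python
from dataclasses import dataclass
from datetime import date, timedelta

MAX_LEVEL = 7

# Days between reviews per level. Levels 1 and 2 follow the description
# above; the values for levels 3 to 7 are hypothetical placeholders.
REVIEW_INTERVAL_DAYS = {1: 1, 2: 2, 3: 4, 4: 7, 5: 14, 6: 30, 7: 60}


@dataclass
class Card:
    """Hypothetical stand-in for a flashcard record."""
    level: int = 1
    mastered: bool = False


def review(card: Card, answered_correctly: bool) -> Card:
    """Apply the result of one review to a card."""
    if answered_correctly:
        # A correct answer moves the card up one level, capped at level 7.
        card.level = min(card.level + 1, MAX_LEVEL)
        card.mastered = card.level == MAX_LEVEL
    else:
        # An incorrect answer sends the card back to level 1.
        card.level = 1
        card.mastered = False
    return card


def next_review_date(card: Card, reviewed_on: date) -> date:
    """Return the date of the next review for the card's current level."""
    return reviewed_on + timedelta(days=REVIEW_INTERVAL_DAYS[card.level])
```

For example, a new card answered correctly on a given day moves to Level 2 and is due again two days later.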

Card Generation from Documents

Leerming can currently generate flashcards from web pages, YouTube videos, PDF files and Microsoft Word documents.

Text Extraction: Uploaded documents, regardless of their original format, undergo automated text extraction, transforming the content into a common text format.
Text Segmentation and Storage: The extracted text is divided into smaller, manageable chunks. For each chunk, we generate embeddings using OpenAI's models. These embeddings, along with the original text content, are then stored in a PostgreSQL database equipped with pgvector. This step is executed by a dedicated worker process.
Key Question as Focal Point: Users provide a key question that serves as the central topic for generating flashcards. They also select one of their uploaded documents.
Chunk Matching with L2Distance: Leerming identifies document chunks that are closest to the user's key question using L2Distance, ensuring the relevance of the generated flashcards.
Prompt Generation with a Large Language Model (LLM): Using the key question and the identified document chunks, Leerming builds a prompt and sends it to the LLM, which generates the flashcards.
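The matching and prompt steps can be illustrated with a small in-memory sketch. In Leerming the distance comparison happens in PostgreSQL through pgvector; the pure-Python version below only shows the idea. The function names, the `(text, embedding)` tuple layout, and the prompt wording are all assumptions made for this example.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_chunks(
    question_embedding: list[float],
    chunks: list[tuple[str, list[float]]],
    limit: int = 3,
) -> list[str]:
    """Return the texts of the `limit` chunks nearest to the question."""
    ranked = sorted(chunks, key=lambda c: l2_distance(question_embedding, c[1]))
    return [text for text, _ in ranked[:limit]]


def build_prompt(key_question: str, chunk_texts: list[str]) -> str:
    """Assemble a prompt from the key question and the selected chunks.

    The wording is a hypothetical placeholder, not the project's prompt.
    """
    context = "\n---\n".join(chunk_texts)
    return (
        "Generate flashcards that help answer the question below, "
        "using only the provided context.\n\n"
        f"Question: {key_question}\n\nContext:\n{context}"
    )
```

A smaller distance means the chunk embedding is closer to the question embedding, so sorting in ascending order puts the most relevant chunks first.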

Local Development Setup

Requirements

Ensure you have the following prerequisites in place:

A PostgreSQL database with the pgvector extension. If you use Docker, a suitable image is available.
Rye for streamlined dependency management. While not mandatory, it simplifies the process. You can use the requirements-dev.lock in the project root with any tool that supports the Python requirements.txt format.
An OpenAI API key. You can get one at https://platform.openai.com/account/api-keys.

Setup and Run

Follow these steps to set up and run Leerming locally:

Clone the repository: git clone https://github.com/tobi-de/leerming.git
Navigate to the project directory: cd leerming
Create and activate a virtual environment: rye shell
Install dependencies: rye sync
Create a .env file by copying from .env.template and fill it out: cp .env.template .env
Apply migrations: python manage.py migrate
Create the cache table: python manage.py createcachetable
Install Watson for full-text search: python manage.py installwatson
Create a superuser: python manage.py makesuperuser
Start the development server: python manage.py runserver

leerming's Issues

use text grades (easy, good, difficult) instead of number-based levels?

New
    Levels 1 and 2

Familiar
    Levels 3 and 4

Advanced
    Levels 5, 6, and 7
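
A possible mapping from numeric level to text grade, following the grouping above with English labels (New, Familiar, Advanced). The function name is hypothetical and the sketch is only one way to express the grouping.

```python
def grade_label(level: int) -> str:
    """Map a Leitner level (1 to 7) to a text grade."""
    if not 1 <= level <= 7:
        raise ValueError("level must be between 1 and 7")
    if level <= 2:
        return "New"
    if level <= 4:
        return "Familiar"
    return "Advanced"
```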

if content is too long, automatically split and choose relevant section

related to #20

import-export

apply soft size limit to flashcard form

when the user types a question longer than 200 characters, remind them to keep it short; same for answers longer than 150 characters

alpine or hyperscript for this

add tests

the mvp is mostly done at this point, tests are needed

setup auto backup using pghoard

localize date based on user timezone

New User Flow

Landing Page.
Sign Up View.
- User clicks on the "Sign Up" button.
Verification Email Sent View.
- User is informed that a verification email has been sent.
Email Verification View.
- User clicks on the verification link from the email.
Login View.
- User logs in after email verification.
Profile Setup View.
- User sets up their profile with review_days and review_time.
Dashboard View.
- User is redirected to their dashboard, which initially displays an empty list of cards

send cards to a friend

seconds should not be taken into account with the schedule manager
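
One way to ignore seconds is to truncate both timestamps to the minute before comparing them. The helper names below are hypothetical, and the sketch assumes the schedule manager compares plain `datetime` values.

```python
from datetime import datetime


def to_minute_precision(moment: datetime) -> datetime:
    """Drop seconds and microseconds from a datetime."""
    return moment.replace(second=0, microsecond=0)


def is_due(scheduled: datetime, now: datetime) -> bool:
    """Return True when the scheduled minute has been reached."""
    return to_minute_precision(scheduled) <= to_minute_precision(now)
```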

Edit Cards

Dashboard View.
Search or Scroll to Locate Card View.
- User locates a specific card.
Select Card to Edit View.
- User selects the card they want to edit.
Edit Card Content View.
- User edits card content (Front/Back or Fill in the Gap).
Update Card Rank View.
- User updates the card's rank.
User can delete a card

Review Session

Notification.
- User receives a notification at the scheduled review time.
Session Start View.
- User clicks on the notification to start the session.
Review Card View.
- User goes through cards one by one.
Answer Card View.
- User attempts to answer the question.
- Buttons: "Correct" and "Incorrect."
Next Card View.
- User proceeds to the next card.
Session End View.
- User completes all the cards for the current session.
Score and Mastered Cards View.
- User views session score and mastered cards.

No loading animation during the first milliseconds of document upload submit

admin action to create schedule task for users

Create New Cards Manually

Dashboard View.
Create Card View.
- User clicks on the "Create New Card" button.
Select Card Type View.
- User chooses between "Front/Back Card" or "Fill in the Gap Card."
Fill Card Form View.
- User fills in card details (Front/Back or Fill in the Gap).
Save Card View.
- User saves the card.
Navigate Between Cards View.
- User can navigate between created cards.
View and Edit Card View.
- User can view answers and edit cards.

User should be able to start a new session anytime of the day, but only once per day

Maybe the number of times per day could be a user preference
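
A rough sketch of the check, with the daily limit as a parameter so it could later come from a user preference. The function name and the idea of passing past session dates as a list are assumptions for illustration.

```python
from datetime import date


def can_start_session(
    previous_sessions: list[date], today: date, max_per_day: int = 1
) -> bool:
    """Allow a new session while today's count is below the daily limit."""
    sessions_today = sum(1 for day in previous_sessions if day == today)
    return sessions_today < max_per_day
```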

Create Cards from Document

Dashboard View.
Create Card View.
- User clicks on the "Create New Card" button.
Select Card Type View.
- User chooses between "Front/Back Card" or "Fill in the Gap Card."
Document Upload View.
- User selects a document to create cards from.
Generate Cards View.
- User generates cards from the document.
Navigate Between Generated Cards View.
- User can navigate between generated cards.
View and Edit Card View.
- User can view answers and edit cards.

notifications system, could be used to communicate with users and log system changes

category / subject to group cards

multi format support for document

Basic block of text

limit the size of the input
send the complete block to create card

Pdf or any other kind of doc (docx, html, txt, etc..)

Get the text content from the file - https://llamahub.ai/l/file-unstructured
Split the content into multiple documents - recursive text splitter
Generate embedding for each document
Save each document and their embedding in a UserDocumentChunk table
Create a UserUpoadedDocument with the filename and use it to group all UserDocumentChunk together
Use pgvector to get relevant documents to generate cards based on the user's "Central question"
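
The splitting step could look roughly like the sketch below. It is a simplified stand-in for a recursive text splitter, not the splitter from any particular library: it tries coarse separators first (paragraphs, then lines, then words) and only cuts mid-word as a last resort. The function name, the default chunk size, and the separator order are assumptions.

```python
def split_text(
    text: str,
    chunk_size: int = 200,
    separators: tuple[str, ...] = ("\n\n", "\n", " "),
) -> list[str]:
    """Split text into chunks of at most `chunk_size` characters."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to fixed-size cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            # The piece still fits in the chunk being built.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(split_text(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then get its own embedding and be stored as one row, grouped under the uploaded document.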

HTML

either save the content as an HTML file and use the same steps as for PDF, since https://llamahub.ai/l/file-unstructured supports HTML, or use a specific loader for HTML: https://llamahub.ai/l/web-unstructured_web

Youtube video

Get the video transcript using https://llamahub.ai/l/youtube_transcript
Steps 2 to 6 are the same as for PDFs and other documents

continuing a session on another device results in a 500

Since review sessions are stored in the Django session, they are not available from one device to another

improve prompts

take inspiration from existing QA tools

currently there is no scheduled task

create multiple fill_in_the_gap cards by specifying multiple answers separated by a comma
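
A sketch of how this might work: parse the comma-separated answers, then produce one card per answer by blanking its first occurrence in the text. The function names, the dictionary card shape, and the blank marker are hypothetical.

```python
def parse_answers(raw: str) -> list[str]:
    """Split a comma-separated string into unique, non-empty answers."""
    answers: list[str] = []
    for part in raw.split(","):
        answer = part.strip()
        if answer and answer not in answers:
            answers.append(answer)
    return answers


def make_gap_cards(text: str, raw_answers: str, blank: str = "____") -> list[dict]:
    """Create one fill-in-the-gap card per answer found in the text."""
    cards = []
    for answer in parse_answers(raw_answers):
        if answer not in text:
            # Skip answers that do not appear in the source text.
            continue
        cards.append({"question": text.replace(answer, blank, 1), "answer": answer})
    return cards
```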

split based on token count

basic landing page

Something really simple so that early users understand what the app is about

google sign in

enable http2 on nginx

tmp file does not exist

in prod, the worker process and the main process do not run in the same container, so the worker does not have access to the temp file and fails to create documents

admin link on UI

User Preferences

Profile Settings View.
- User accesses their profile settings. (short_name, full_name, review_days and review_time)
Notification Preferences View.
- User customizes notification preferences (method, frequency).