Contributors: ogallagher

quizcard-generator's Issues

Configure special treatment of prefixes and suffixes

I'm not sure what to do about them, but in cases where grammatical prefixes (ex. pre-, re-, un-) and suffixes (ex. -들, -에, -가, -는) occur frequently I've noticed some unusual results.

  • Since choices for a given word test are currently aiming for similarity, the choices end up being the same word with different conjugations/particles attached.
  • Different forms of the same word can disproportionately fill the number of tests generated.

Publish as package at npmjs.com

  • fix cli arg aliases, or abandon aliases for now
  • .npmignore files
  • move temp logger import to quizcard_cli.ts
  • #23
  • include .d.ts type declaration files in npm package
  • Update install instructions to reference npm package

Configure choice count and variation per cloze

Choice count is the number of choices from which to choose the correct word.

  • configure choice count in cli
  • configure choice count in web ui

Choice variation controls how large the edit distances between the choices may be.

  • update Word.get_closest_words to accept variation/randomness
  • configure choice variation in cli
  • configure choice variation in web ui
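
A minimal sketch of how choice count and choice variation could interact, assuming variation is a number between 0 and 1 that widens the pool of candidate words beyond the closest ones. The names `levenshtein` and `pickChoices` are hypothetical and do not reflect the actual signature of `Word.get_closest_words`.

```typescript
// Classic dynamic-programming Levenshtein distance, compared per code point.
function levenshtein(a: string, b: string): number {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev: number[] = [];
  for (let j = 0; j <= t.length; j++) prev.push(j);
  for (let i = 1; i <= s.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j]! + 1, curr[j - 1]! + 1, prev[j - 1]! + cost));
    }
    prev = curr;
  }
  return prev[t.length]!;
}

// Hypothetical helper: returns the answer plus (choiceCount - 1) distractors.
// variation = 0 keeps only the closest words; variation = 1 allows any word.
function pickChoices(
  answer: string,
  vocabulary: string[],
  choiceCount: number,
  variation: number,
  random: () => number = Math.random,
): string[] {
  const ranked = vocabulary
    .filter((w) => w !== answer)
    .map((w) => ({ w, d: levenshtein(answer, w) }))
    .sort((x, y) => x.d - y.d);
  const needed = Math.max(0, choiceCount - 1);
  // Assumption: variation scales how many extra, more distant words join the pool.
  const extra = Math.floor(variation * Math.max(0, ranked.length - needed));
  const pool = ranked.slice(0, needed + extra).map((r) => r.w);
  const distractors: string[] = [];
  while (distractors.length < needed && pool.length > 0) {
    const i = Math.floor(random() * pool.length);
    distractors.push(...pool.splice(i, 1));
  }
  return [answer, ...distractors];
}
```

Passing `random` as a parameter keeps the selection reproducible in tests; the cli and web ui would only need to forward the two numbers.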

Add live demo with web UI to readme

Near the top of the documentation, include a video demonstrating wordsearch...tld/quizcard-generator, with an explanation that it's a hosted web UI implemented on top of the CLI tool.

Custom tags for Anki note export

Tags are quite easy to bulk edit within Anki after export, but it is just as easy to provide an input for them at the quizcard generator step.

  • tag cli opt declaration
  • tag cli opt implementation

i18n of webpage

  • Frontend language selector button, locale cookie.
  • Frontend script fetches translation file for selected locale.
  • Localization of element title attributes.
  • Localization of element innerText attributes.
  • Handle Spanish (spa)
  • Handle English (eng)
  • Ensure no translatable strings are used as data/control values

Exclude non-word characters from Anki clozes

If the token (coughing?) is parsed as a word with key_string='coughing' raw_string='(coughing)', then the corresponding Anki note cloze should be generated as ({{c1::coughing}}?), with ( and ?) being outside of the cloze text.
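
A minimal sketch of that cloze formatting, assuming the function receives the full token text and the word's key string. The name `formatCloze` and its parameters are hypothetical; the key string is matched case-insensitively so that a lowercased key still finds capitalized raw text.

```typescript
// Wrap only the word itself in the cloze, leaving surrounding
// punctuation from the token outside of the {{cN::...}} markers.
function formatCloze(tokenText: string, keyString: string, clozeIndex: number): string {
  const start = tokenText.toLowerCase().indexOf(keyString.toLowerCase());
  if (start < 0) {
    // Fallback when the key string is not found: cloze the whole token.
    return `{{c${clozeIndex}::${tokenText}}}`;
  }
  const before = tokenText.slice(0, start);
  const inner = tokenText.slice(start, start + keyString.length);
  const after = tokenText.slice(start + keyString.length);
  return `${before}{{c${clozeIndex}::${inner}}}${after}`;
}
```

For the example above, `formatCloze("(coughing?)", "coughing", 1)` yields `({{c1::coughing}}?)`.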

Use Korean NLP library for filtering testable words by part of speech

I plan to use the konlpy Python package, with a driver script that quizgen can call to fetch part of speech (POS) tags for a given sentence. #26 (comment)

  • python konlpy cli driver that accepts a string and returns the token POS tags

  • quizgen accepts source text language opt

  • if source text language is Korean

    • when building each sentence, pass the sentence text to the konlpy driver
    • parse the POS tags and assign them to Word instances
    • In Word, a new member root_string has the subset of key_string that excludes unimportant parts of speech (particles/ornaments). Another member stores the ornaments.
  • What to do with word root_string and ornaments?

I don't think words should use root_string as the unique identifier, because a word test should include the particles as part of the test; they are sometimes what makes an answer invalid or valid, depending on the correct overall part of speech.

I could use root_string to count occurrences of a word (across multiple instances of Word with different key_string), which could be used to limit the number of tests of the same word. Likewise, it could be used to enable tests of words otherwise considered to occur too infrequently.

I could use root_string for edit distance, so that words with different particles and the same root would be stored as edit distance zero. This could then be used to exclude words with the same root from choices for a test (ex. 가방은, 가방이). But again, sometimes testing different parts of speech for the same root is desirable for testing grammar instead of vocabulary.
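
A minimal sketch of both uses of root_string, assuming each word already carries the root extracted from the POS tags. The `WordSketch` shape and both function names are hypothetical and only illustrate the idea.

```typescript
// Hypothetical, simplified stand-in for the Word class.
interface WordSketch {
  keyString: string;   // full form, e.g. "가방은"
  rootString: string;  // form without particles, e.g. "가방"
  occurrences: number; // how often this exact keyString appears
}

// Sum occurrences across all forms that share the same root.
function countByRoot(words: WordSketch[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const w of words) {
    counts.set(w.rootString, (counts.get(w.rootString) ?? 0) + w.occurrences);
  }
  return counts;
}

// Drop candidate choices that share a root with the tested word.
function excludeSameRoot(target: WordSketch, candidates: WordSketch[]): WordSketch[] {
  return candidates.filter((c) => c.rootString !== target.rootString);
}
```

Making `excludeSameRoot` optional per test would leave room for grammar-focused tests, where same-root choices are the point.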

Configure max tests of the same word

There is certainly benefit in testing a word in multiple contexts, but there should be a way to configure the maximum number of tests of any one word.

  • use word test limit in anki generator
  • Configure with cli opt
  • Configure with web ui
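
A minimal sketch of applying such a limit, assuming the generator holds a flat list of candidate tests in source order. `WordTest` and `limitTestsPerWord` are hypothetical names.

```typescript
// Hypothetical shape: one test of one word in one sentence.
interface WordTest {
  word: string;
  sentenceIndex: number;
}

// Keep at most maxPerWord tests for each word, preserving the original order.
function limitTestsPerWord(tests: WordTest[], maxPerWord: number): WordTest[] {
  const seen = new Map<string, number>();
  return tests.filter((t) => {
    const used = seen.get(t.word) ?? 0;
    if (used >= maxPerWord) return false;
    seen.set(t.word, used + 1);
    return true;
  });
}
```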

Configure note prologues and epilogues

Prologues and epilogues are additional text from neighboring notes' sentences, which will not be rendered with any testable words.

  • add prologue and epilogue fields to note type
  • store sentence prologues and epilogues in quizgen
  • configure lengths of prologue and epilogue from cli
  • document new fields in readme
  • update notes export column list comment for 2 new columns
  • configure lengths from web UI
  • show prologue and epilogue in web card preview
  • update card template to display prologue and epilogue fields around the question text.
    Eventually, showing and hiding these will be done via control tags #32.
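
A minimal sketch of collecting a prologue and epilogue for one sentence, assuming the configured lengths are measured in tokens and that sentences are already tokenized. `NoteContext` and `buildContext` are hypothetical names.

```typescript
interface NoteContext {
  prologue: string;
  epilogue: string;
}

// Take the last prologueLen tokens before the sentence and the first
// epilogueLen tokens after it, crossing sentence boundaries if needed.
function buildContext(
  sentences: string[][],
  index: number,
  prologueLen: number,
  epilogueLen: number,
): NoteContext {
  // Flattening everything is wasteful for large inputs, but keeps the sketch short.
  const before = ([] as string[]).concat(...sentences.slice(0, index));
  const after = ([] as string[]).concat(...sentences.slice(index + 1));
  return {
    // slice(-0) would return the whole array, so a zero length is handled explicitly.
    prologue: prologueLen > 0 ? before.slice(-prologueLen).join(" ") : "",
    epilogue: after.slice(0, Math.max(0, epilogueLen)).join(" "),
  };
}
```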

Use template library for Anki cards

I like the sound of nunjucks, by Mozilla and inspired by Jinja2 (which I have used a lot).

This will allow me to develop the card templates more fully with components in separate files.

  • enable anki card templating
  • update usage documentation for compiling templates before using in Anki

Customize number of sentences per note

Currently, every generated Anki note represents one sentence, but allowing more sentences per note will give more context for each card.

  • opt for sentence tokens max
  • web UI input for sentence tokens max
  • Web input for tokens max seems not to work
  • web UI input for sentence words min
  • set web input default value for words min to 3 (same as placeholder)
  • #33
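
A minimal sketch of one way a tokens max could produce multi-sentence notes, assuming consecutive sentences are merged greedily until the next one would exceed the limit. This interpretation and the name `groupByTokenMax` are assumptions, not the implemented behavior.

```typescript
// Group consecutive tokenized sentences so that each group holds at most
// tokensMax tokens. A single sentence longer than the limit keeps its own group.
function groupByTokenMax(sentences: string[][], tokensMax: number): string[][][] {
  const groups: string[][][] = [];
  let current: string[][] = [];
  let count = 0;
  for (const sentence of sentences) {
    if (current.length > 0 && count + sentence.length > tokensMax) {
      groups.push(current);
      current = [];
      count = 0;
    }
    current.push(sentence);
    count += sentence.length;
  }
  if (current.length > 0) groups.push(current);
  return groups;
}
```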

Fix web server null note error quizcard_webserver.js:34

Example error logs

101.INFO: parsed 458 sentences, 2026 words
323.INFO: max choices = 5
147.INFO: calculations complete for 택시-운전사_대본_1.txt
215.INFO: generate null anki notes
219.DEBUG: word_frequency_ordinal_max or min = undefined || undefined
222.DEBUG: choice variation = 0.7
223.DEBUG: prologue=10 epilogue=8
/home/dh_yx2fag/wordsearch.dreamhosters.com/quizcard_webserver.js:34
                    (note) => note.toString(
                                   ^
TypeError: Cannot read properties of null (reading 'toString')
    at /home/dh_yx2fag/wordsearch.dreamhosters.com/quizcard_webserver.js:34:36
    at Array.map (<anonymous>)
    at /home/dh_yx2fag/wordsearch.dreamhosters.com/quizcard_webserver.js:33:40
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
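
The trace shows `note.toString` being called on a null entry inside `Array.map`. A minimal defensive sketch, assuming the notes array can contain null entries; `NoteLike` and `renderNotes` are hypothetical names, and this guards the symptom without explaining why a null note was produced.

```typescript
// Hypothetical minimal shape of a generated note.
interface NoteLike {
  toString(): string;
}

// Skip null entries before converting notes to text.
function renderNotes(notes: Array<NoteLike | null>): string[] {
  return notes
    .filter((note): note is NoteLike => note !== null)
    .map((note) => note.toString());
}
```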

Configure min words, max tokens per sentence

There is already a QuizCardGenerator.sentence_word_count_min property (not configurable), so there can be another like sentence_word_count_max if we're counting testable words, or sentence_token_count_max if counting tokens.

Use Anki tags to customize card render

If the current note's tags are available within the DOM on render, then I can support variations of fill-blanks, as well as other layouts of the same information, without needing to create duplicate notes for associating to different card templates.

  • Prefix all quizgen generated tags with qg- except those specified by the user, and those for identifying the source text.

Initial render control tags:

  • If you add the show-logging tag, then render logs can be shown in the card body.
  • If you remove the show-choices tag, then the multiple choices for the cloze are not shown.
  • If you remove the show-source-file and show-source-line tags, that section will be hidden.
  • If you remove the show-prologue tag, the preceding sentence will be hidden, and similarly for show-epilogue.
  • show-randomized shuffles the choices
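
A minimal sketch of reading the control tags at render time, assuming the note's tags reach the card script as one space-separated string and that a control tag may or may not carry the qg- prefix. `CONTROL_TAGS` and `parseControlTags` are hypothetical names.

```typescript
// The render control tags listed above.
const CONTROL_TAGS = [
  "show-logging",
  "show-choices",
  "show-source-file",
  "show-source-line",
  "show-prologue",
  "show-epilogue",
  "show-randomized",
] as const;
type ControlTag = (typeof CONTROL_TAGS)[number];

// Return the set of control tags present; unrelated tags are ignored.
function parseControlTags(tagField: string): Set<ControlTag> {
  const active = new Set<ControlTag>();
  for (const raw of tagField.split(/\s+/)) {
    // Assumption: control tags may be written with or without the qg- prefix.
    const tag = raw.startsWith("qg-") ? raw.slice(3) : raw;
    const match = CONTROL_TAGS.find((t) => t === tag);
    if (match !== undefined) active.add(match);
  }
  return active;
}
```

The card script could then toggle sections with checks such as `parseControlTags(tags).has("show-choices")`.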

Document

  • Explain render control tags in readme

Words should have different raw text per location

Low priority

Currently, whatever raw text was present at the first occurrence of a word is reused at every other location when it's time to export.

To fix this (ex. ... hello. "Hello?"), move raw_string to be a child of the Word.locations collection. When formatting the sentence for export, the location corresponding to the current sentence and token number will determine which value is chosen.
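
A minimal sketch of that structure, assuming a location is identified by sentence index and token index. The interfaces and `rawStringAt` are hypothetical and simplified from whatever `Word.locations` actually holds.

```typescript
// Hypothetical location entry that owns its own raw text.
interface WordLocationSketch {
  sentenceIndex: number;
  tokenIndex: number;
  rawString: string;
}

interface LocatedWordSketch {
  keyString: string;
  locations: WordLocationSketch[];
}

// Look up the raw text for one occurrence, falling back to the key string.
function rawStringAt(word: LocatedWordSketch, sentenceIndex: number, tokenIndex: number): string {
  const match = word.locations.find(
    (loc) => loc.sentenceIndex === sentenceIndex && loc.tokenIndex === tokenIndex,
  );
  return match !== undefined ? match.rawString : word.keyString;
}
```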

i18n of cli opt descriptions

  • cli driver determines env locale
  • lang opt to override locale
  • cli driver updates option descriptions before loading them for the help message.
  • cli driver mandatory arg prompts are also localized

Document likely pitfalls and tips in web UI

  • notes are uniquely identified by source file name and source file line number.

When the updated notes are imported into Anki, they are uniquely identified with the euid column, which will not change as long as the quizcard-generator input/source file name and line number to which the note belongs don't change.

  • If the input file is too large, the server will throw a cryptic error (likely because a memory limit was exceeded). In this scenario, one has to split the file into segments. Note that each segment should have a different source file name to prevent overwriting notes from the previous segment.

  • Notes that are generated without any testable words are assigned the not-testable tag, but are still included in the notes file. This allows you to then decide case by case, after import to Anki, whether to edit them or delete them.

Fix card choice shuffle

The structure of the active choice list does not guarantee that the list element (<ul/>) is the root.
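
A minimal sketch of a fix, assuming the shuffle can search for the list element below whatever root it is given instead of expecting the root to be the list. `ElementLike` is a hypothetical structural type (a real DOM `Element` has both properties), and both function names are illustrative.

```typescript
// Minimal structural view of a DOM element, so the logic can run without a browser.
interface ElementLike {
  tagName: string;
  children: ArrayLike<ElementLike>;
}

// Depth-first search for the first <ul>, starting at the root itself.
function findListElement(root: ElementLike): ElementLike | null {
  if (root.tagName.toLowerCase() === "ul") return root;
  for (let i = 0; i < root.children.length; i++) {
    const found = findListElement(root.children[i]!);
    if (found !== null) return found;
  }
  return null;
}

// Fisher-Yates shuffle that returns a new array and leaves the input untouched.
function shuffledCopy<T>(items: T[], random: () => number = Math.random): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = out[i]!;
    out[i] = out[j]!;
    out[j] = tmp;
  }
  return out;
}
```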

i18n of readme

  • Host quizgen readme in webpage using md to HTML compiler.
  • Frontend language select button
  • Frontend language cookie
  • i18n frontend script that detects browser language, fetches corresponding localization file, and translates page strings.
    For this I plan to use roddeh-i18n since it's referenced from an MDN extensions API article, though it doesn't seem like the most ubiquitous library choice.
  • fix i18n.create().extend() in fork

Reduce runtime and memory usage of word edit distances

  • Make edit distances a single shared structure, rather than owned individually by each word.
    This might not actually reduce memory usage at all, because I still need to store the distance between every pair of words a and b, in both directions. Even as a single structure, it will still have n^2 keys (n = word count).

  • Ignore edit distances beyond a configurable threshold.
    For the threshold, I can use the same formula as the configurable choice variation.

  • Begin calculating edit distances while parsing the source text.

  • Move word edit distances structure into storage as temp files? Maybe this is overkill, for a future enhancement. It may be better to move other structures into storage first, like the lists of sentences and words.
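
A minimal sketch of a single shared structure, assuming edit distance is symmetric so each pair can be stored once under an order-independent key, and that distances above the threshold are simply not stored. `EditDistanceStore` is a hypothetical name; this would need roughly n(n-1)/2 keys at most rather than n^2, and fewer with a tight threshold.

```typescript
// Shared store of pairwise edit distances, keyed independently of argument order.
class EditDistanceStore {
  private readonly distances = new Map<string, number>();
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Sort the pair so (a, b) and (b, a) map to the same key.
  private static key(a: string, b: string): string {
    return a < b ? `${a}\u0000${b}` : `${b}\u0000${a}`;
  }

  // Distances beyond the threshold are ignored to save memory.
  set(a: string, b: string, distance: number): void {
    if (a === b || distance > this.threshold) return;
    this.distances.set(EditDistanceStore.key(a, b), distance);
  }

  // Returns undefined when the pair was never stored or exceeded the threshold.
  get(a: string, b: string): number | undefined {
    if (a === b) return 0;
    return this.distances.get(EditDistanceStore.key(a, b));
  }

  get size(): number {
    return this.distances.size;
  }
}
```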
