ogallagher / quizcard-generator
Given a source document, generate quiz/flash cards
Home Page: https://wordsearch.dreamhosters.com/quizcard-generator
License: MIT License
I'm not sure what to do about them, but in cases where grammatical prefixes (ex. pre-, re-, un-) and suffixes (ex. -들, -에, -가, -는) occur frequently I've noticed some unusual results.
Once the full control flow has been confirmed, document the steps to follow in readme.
Allow exclusion of words by their line number. It would also then make sense to optionally show line numbers in the web ui source editor.
Any text that populates AnkiNote.text should have any double quotes it contains escaped in export.
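A minimal sketch of one way to do the escaping, assuming the export uses CSV-style quoting (the field is wrapped in double quotes and any quote inside it is doubled). The function name `escapeExportField` is hypothetical and not part of the quizcard-generator API.

```typescript
// Hypothetical helper, not taken from the project source.
// Assumption: CSV-style quoting, where a field is wrapped in double quotes
// and each double quote inside the field is written twice.
function escapeExportField(text: string): string {
  return '"' + text.replace(/"/g, '""') + '"';
}

// A sentence containing quoted speech survives as a single field.
console.log(escapeExportField('She said "hello" twice.'));
```

If the export format uses a different escape convention (for example backslashes), only the replacement string would need to change.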
Use istanbul/nyc to integrate code coverage into the tests.
In the Anki fill-blanks card front template, update the script to shuffle the order of the active choice list elements.
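One common way to shuffle a list uniformly is the Fisher-Yates algorithm. The sketch below works on a plain array rather than on DOM elements; the name `shuffled` and the injectable `random` parameter are illustrative assumptions, not the template's actual script.

```typescript
// Hypothetical helper: returns a shuffled copy and leaves the input untouched.
// The random source is injectable so the behavior can be made deterministic in tests.
function shuffled<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const result = items.slice();
  // Fisher-Yates: walk from the end, swapping each slot with a random earlier (or same) slot.
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = result[i];
    result[i] = result[j];
    result[j] = tmp;
  }
  return result;
}

console.log(shuffled(['choice a', 'choice b', 'choice c', 'choice d']));
```

In a card template, the shuffled order could then be applied by re-appending the list item elements in the new order.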
.npmignore files
quizcard_cli.ts
.d.ts type declaration files in npm package
Choice count is the number of choices from which to choose the correct word.
Choice variation is the edit distance between the choices.
Word.get_closest_words to accept variation/randomness
Near the top of the documentation, include a video demonstrating wordsearch...tld/quizcard-generator, with an explanation that it's a hosted web UI implemented on top of the CLI tool.
It would be quite easy to bulk edit tags within Anki after export as well, but also easy to provide an input for them at the quizcard generator step.
I'm using these options in the wordsearch generator web driver, so information about them should be exposed for external use.
I'm seeing --exclude-word=this,some,--exclude-word=this,some instead of --exclude-word=this --exclude-word=some
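The expected form repeats the flag once per value. A small sketch of building the argument list that way, assuming the values arrive as an array; `toRepeatedFlags` is a hypothetical name, not a function from the web driver.

```typescript
// Hypothetical helper: emit one "--flag=value" argument per value,
// instead of joining the values (or the whole flag) with commas.
function toRepeatedFlags(flag: string, values: readonly string[]): string[] {
  return values.map((value) => `--${flag}=${value}`);
}

const excludeWordArgs = toRepeatedFlags('exclude-word', ['this', 'some']);
console.log(excludeWordArgs.join(' '));
```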
title
attributes.innerText
attributes.spa
eng
If the token (coughing?) is parsed as a word with key_string='coughing' raw_string='(coughing)', then the corresponding Anki note cloze should be generated as ({{c1::coughing}}?), with ( and ?) being outside of the cloze text.
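The behavior described above can be sketched as wrapping only the part of the raw token that matches the key string. This is an illustration under assumptions: `clozeToken` is a hypothetical function, and it falls back to wrapping the whole token when the key cannot be located.

```typescript
// Hypothetical helper, not the project's implementation.
// Wraps only the portion of rawString that matches keyString in Anki cloze markup.
function clozeToken(rawString: string, keyString: string, clozeNumber: number): string {
  const wrap = (text: string): string => `{{c${clozeNumber}::${text}}}`;
  // Try an exact match first.
  let start = rawString.indexOf(keyString);
  if (start < 0) {
    // Fall back to a case-insensitive match, but only when lowercasing
    // does not change string lengths (so indices still line up).
    const lowerRaw = rawString.toLowerCase();
    const lowerKey = keyString.toLowerCase();
    if (lowerRaw.length === rawString.length && lowerKey.length === keyString.length) {
      start = lowerRaw.indexOf(lowerKey);
    }
  }
  // Assumption: if the key is not found at all, wrap the whole token.
  if (start < 0) return wrap(rawString);
  const end = start + keyString.length;
  return rawString.slice(0, start) + wrap(rawString.slice(start, end)) + rawString.slice(end);
}

console.log(clozeToken('(coughing?)', 'coughing', 1)); // ({{c1::coughing}}?)
```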
I plan to use the konlpy Python package, with a driver script that quizgen can call to fetch part of speech (POS) tags for a given sentence. #26 (comment)
python konlpy cli driver that accepts a string and returns the token POS tags
quizgen accepts source text language opt
if source text language is Korean
Word instances
In Word, a new member root_string has the subset of key_string that excludes unimportant parts of speech (particles/ornaments). Another member stores the ornaments.
What to do with word root_string and ornaments?
I don't think words should use root_string as the unique identifier, because a word test should include the particles as part of the test; they are sometimes what makes an answer invalid or valid, depending on the correct overall part of speech.
I could use root_string to count occurrences of a word (across multiple instances of Word with different key_string), which could be used to limit the number of tests of the same word. Likewise, it could be used to enable tests of words otherwise considered to occur too infrequently.
I could use root_string for edit distance, so that words with different particles and the same root would be stored as edit distance zero. This could then be used to exclude words with the same root from choices for a test (ex. 가방은, 가방이). But again, sometimes testing different parts of speech for the same root is desirable for testing grammar instead of vocabulary.
There is certainly benefit in testing a word in multiple contexts, but there should be a way to configure the maximum number of tests of any same word.
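A sketch of such a cap, assuming each candidate test carries a string key that identifies the word (this could be key_string, or a root-like key if different forms of one word should share a limit). The `WordTest` shape and `limitTestsPerWord` are hypothetical names for illustration only.

```typescript
// Hypothetical shape of a candidate test: which word it tests and where.
interface WordTest {
  wordKey: string;
  sentenceIndex: number;
}

// Keep at most maxPerWord tests for each word key, preserving input order.
function limitTestsPerWord(tests: readonly WordTest[], maxPerWord: number): WordTest[] {
  const countByKey = new Map<string, number>();
  const kept: WordTest[] = [];
  for (const test of tests) {
    const count = countByKey.get(test.wordKey) ?? 0;
    if (count < maxPerWord) {
      kept.push(test);
      countByKey.set(test.wordKey, count + 1);
    }
  }
  return kept;
}

const candidateTests: WordTest[] = [
  { wordKey: '가방', sentenceIndex: 0 },
  { wordKey: '가방', sentenceIndex: 3 },
  { wordKey: '택시', sentenceIndex: 4 },
  { wordKey: '가방', sentenceIndex: 7 },
];
console.log(limitTestsPerWord(candidateTests, 2).map((t) => t.sentenceIndex));
```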
Prologues and epilogues are additional text from neighboring notes' sentences, which will not be rendered with any testable words.
I like the sound of nunjucks, by Mozilla and inspired by Jinja2 (which I have used a lot).
This will allow me to develop the card templates more fully with components in separate files.
Currently, every generated Anki note represents one sentence, but allowing more sentences per note will give more context for each card.
For example, instead of "keep most common 100 testable words", you can "keep most common 10% of testable words".
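As a sketch of how one option could accept either form, the helper below treats a value ending in % as a share of the total and anything else as an absolute count. The name `resolveKeepCount` and the choice to round percentages up are assumptions, not the CLI's documented behavior.

```typescript
// Hypothetical helper: turn "100" or "10%" into a number of words to keep.
function resolveKeepCount(limit: string, totalWords: number): number {
  const trimmed = limit.trim();
  const isPercent = trimmed.endsWith('%');
  const numericPart = isPercent ? trimmed.slice(0, -1).trim() : trimmed;
  const value = Number(numericPart);
  if (numericPart === '' || !Number.isFinite(value) || value < 0) {
    throw new Error(`invalid limit: ${limit}`);
  }
  // Assumption: percentages round up so a small percentage still keeps at least one word.
  const count = isPercent ? Math.ceil((totalWords * value) / 100) : Math.floor(value);
  // Never ask for more words than exist.
  return Math.min(count, totalWords);
}

console.log(resolveKeepCount('100', 2026)); // 100
console.log(resolveKeepCount('10%', 2026)); // 203
```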
Example error logs
101.INFO: parsed 458 sentences, 2026 words
323.INFO: max choices = 5
147.INFO: calculations complete for 택시-운전사_대본_1.txt
215.INFO: generate null anki notes
219.DEBUG: word_frequency_ordinal_max or min = undefined || undefined
222.DEBUG: choice variation = 0.7
223.DEBUG: prologue=10 epilogue=8
/home/dh_yx2fag/wordsearch.dreamhosters.com/quizcard_webserver.js:34
(note) => note.toString(
^
TypeError: Cannot read properties of null (reading 'toString')
at /home/dh_yx2fag/wordsearch.dreamhosters.com/quizcard_webserver.js:34:36
at Array.map (<anonymous>)
at /home/dh_yx2fag/wordsearch.dreamhosters.com/quizcard_webserver.js:33:40
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
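The stack trace points at a map over notes where at least one entry is null. A defensive sketch of the serialization step is shown below; `NoteLike` and `serializeNotes` are hypothetical names, and this only avoids the crash, it does not explain why null notes were generated in the first place.

```typescript
// Minimal stand-in for a note: anything that can render itself as a string.
interface NoteLike {
  toString(): string;
}

// Skip null/undefined entries before calling toString on each note.
function serializeNotes(notes: ReadonlyArray<NoteLike | null | undefined>): string[] {
  return notes
    .filter((note): note is NoteLike => note != null)
    .map((note) => note.toString());
}

const sampleNotes: Array<NoteLike | null> = [{ toString: () => 'note 1' }, null, { toString: () => 'note 2' }];
console.log(serializeNotes(sampleNotes));
```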
I do not want to include the entire value of --input-file-content in the notes opts comment.
There is already a QuizCardGenerator.sentence_word_count_min property (not configurable), so there can be another like sentence_word_count_max if we're counting testable words, or sentence_token_count_max if counting tokens.
I added the empty field, so now we can populate it with a translation service.
Main being package.json#main, quizcard_generator.js. I'm specifically thinking of the CLI option keys.
If the current note's tags are available within the DOM on render, then I can support variations of fill-blanks, as well as other layouts of the same information, without needing to create duplicate notes for associating to different card templates.
qg- except those specified by the user, and those for identifying the source text.
Initial render control tags:
If the note has the show-logging tag, then render logs can be shown in the card body.
If the note lacks the show-choices tag, then the multiple choices for the cloze are not shown.
If the note lacks the show-source-file and show-source-line tags, that section will be hidden.
If the note lacks the show-prologue tag, the preceding sentence will be hidden, and similarly for show-epilogue.
show-randomized shuffles the choices
Document
Low priority
Currently, whatever raw text was present at the first occurrence of a word is used everywhere when it's time to export.
To fix this (ex. ... hello. "Hello?"), move raw_string to be a child of the Word.locations collection. When formatting the sentence for export, the location corresponding to the current sentence and token number will determine which value is chosen.
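A rough sketch of that data layout follows. The class `WordSketch`, the `WordLocation` fields, and the fallback to key_string are illustrative assumptions; only the names raw_string, key_string, and locations come from the description above.

```typescript
// Each occurrence of a word remembers its own raw text.
interface WordLocation {
  sentence: number;
  token: number;
  raw_string: string;
}

// Hypothetical stand-in for Word, reduced to what this example needs.
class WordSketch {
  readonly locations: WordLocation[] = [];

  constructor(readonly key_string: string) {}

  addLocation(sentence: number, token: number, raw_string: string): void {
    this.locations.push({ sentence, token, raw_string });
  }

  // Look up the raw text for one specific occurrence.
  // Assumption: fall back to key_string when the location is unknown.
  rawStringAt(sentence: number, token: number): string {
    const found = this.locations.find((loc) => loc.sentence === sentence && loc.token === token);
    return found ? found.raw_string : this.key_string;
  }
}

const helloWord = new WordSketch('hello');
helloWord.addLocation(0, 3, 'hello.');
helloWord.addLocation(1, 0, '"Hello?"');
console.log(helloWord.rawStringAt(1, 0)); // "Hello?"
```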
lang opt to override locale
When the updated notes are imported into Anki, they are uniquely identified with the euid column, which will not change as long as the quizcard-generator input/source file name and line number to which the note belongs don't change.
If the input file is too large, the server will throw a cryptic error (likely because of exceeded memory limit). In this scenario, one has to split the file into segments. Note each segment should have a different source file name to prevent overwriting notes from the previous segment.
Notes that are generated without any testable words are assigned the not-testable tag, but are still included in the notes file. This allows you to then decide case by case, after import to Anki, whether to edit them or delete them.
The structure of the active choice list does not guarantee the list element (<ul/>) to be the root.
i18n.create().extend() in fork
This embeds the options used to generate the file within the file header comments section, making it easier to reproduce and adjust later.
This is what Anki uses to determine whether to insert or update the note when comparing with existing (ex. previously imported) notes.
Make edit distances a single shared structure, rather than owned individually by each word.
This might not actually reduce memory usage at all, because I still need to store the distance between every word a and b, bidirectionally. If the data structure is singular, it will still have n^2 keys (n = word count).
Ignore edit distances beyond a configurable threshold.
For the threshold, I can use the same formula as the configurable choice variance.
Begin calculating edit distances while parsing the source text.
Move word edit distances structure into storage as temp files? Maybe this is overkill, for a future enhancement. It may be better to move other structures into storage first, like the lists of sentences and words.
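The shared-structure and threshold ideas above can be sketched together. This example assumes Levenshtein distance over UTF-16 code units as the edit distance (the project's actual metric may differ) and uses hypothetical names `levenshtein` and `SharedDistances`. Storing each unordered pair under one canonical key means at most n(n-1)/2 entries instead of n^2, and pairs beyond the threshold are not stored at all.

```typescript
// Classic two-row Levenshtein distance (insert, delete, substitute each cost 1).
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical shared store: one entry per unordered word pair, within a threshold.
class SharedDistances {
  private readonly distances = new Map<string, number>();

  constructor(private readonly maxDistance: number) {}

  // Order the pair so (a, b) and (b, a) share a key; NUL is assumed absent from words.
  private static pairKey(a: string, b: string): string {
    return a < b ? `${a}\u0000${b}` : `${b}\u0000${a}`;
  }

  add(a: string, b: string): void {
    if (a === b) return; // distance zero is implied, no need to store it
    const distance = levenshtein(a, b);
    if (distance <= this.maxDistance) {
      this.distances.set(SharedDistances.pairKey(a, b), distance);
    }
  }

  // Returns undefined for pairs that were never added or exceeded the threshold.
  get(a: string, b: string): number | undefined {
    if (a === b) return 0;
    return this.distances.get(SharedDistances.pairKey(a, b));
  }

  get size(): number {
    return this.distances.size;
  }
}

const distanceStore = new SharedDistances(2);
distanceStore.add('cat', 'cut');
distanceStore.add('kitten', 'sitting');
console.log(distanceStore.get('cut', 'cat'), distanceStore.get('sitting', 'kitten'), distanceStore.size);
```

A lookup that returns undefined can be treated as "too far apart to be a useful choice", which is the case the threshold is meant to drop.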
There should be more variety between cards and less ambiguity between choices, if multiple words can be tested simultaneously in the same card, each choice being a combination of the tested word candidates.