ambuda-org / vidyut

45.0 5.0 21.0 1.8 MB

Infrastructure for Sanskrit software. For Python bindings, see `vidyut-py`.

Rust 96.55% Makefile 0.29% Shell 0.11% Python 0.97% HTML 0.88% JavaScript 1.15% PowerShell 0.06%

nlp rust sanskrit

vidyut's Issues

[prakriya] lit forms of वयँ (01.0547) are wrong

In the लिट् forms of वयँ (01.0547) dhatu, वय् is shown as undergoing samprasarana through the sutra ग्रहिज्यावयिव्यधिवष्टिविचतिवृश्चतिपृच्छतिभृज्जतीनां ङिति च 6/1/16. As a result, the generated forms are wrong.

The sutra only applies to वय् adesha of वेञ् dhatu but not to the वय् dhatu (See Kashika and Kaumudi).

So, as shown in Kaumudi and Madhaviyadhatuvritti (ववये - वादित्वान्नैत्त्वाभ्यासलोपौ) the forms of वयँ (वय्) dhatu should be:

eka	dvi	bahu
ववये	ववयाते	ववयिरे
ववयिषे	ववयाथे	ववयिध्वे/ववयिढ्वे
ववये	ववयिवहे	ववयिमहे

But the current forms are:

(AFAIK, ऊये, ऊयाते, etc. cannot be optional forms for the वयँ (वय्) dhatu either. But please do check.)

[installation] `aoeu` command not found in `create_all_data.sh`

When I tried to run `make create_all_data`, the following error occurred at this line of the `create_all_data.sh` script.

As far as I could debug, it's a stray line: I was able to run the script successfully (I think) after removing the `aoeu` line.

[prakriya] ढत्व in ध्वम् and षीध्वम् is not possible if the anga is not ending with इण्

Due to the sutra विभाषेटः 8.3.79, the ध् in ध्वम् (लिट/लुङ्) and षीध्वम् (आशीर्लिङ्) is optionally replaced by ढ्, if the anga ends with इण् and is followed by इट् (aagama).

If the anga does not end with इण्, then the sutra cannot apply.

But I noticed that vidyut applies this sutra for many dhatus whose anga is not इण्-अन्त। Some of the examples are:

स्पर्धँ: पस्पर्धिढ्वे , पस्पर्धिध्वे / स्पर्धिषीढ्वम् , स्पर्धिषीध्वम् / अस्पर्धिढ्वम् , अस्पर्धिध्वम्
राजृँ: रेजिढ्वे , रेजिध्वे , रराजिढ्वे , रराजिध्वे / राजिषीढ्वम् , राजिषीध्वम् / अराजिढ्वम् , अराजिध्वम्
भाषँ: बभाषिढ्वे , बभाषिध्वे / भाषिषीढ्वम् , भाषिषीध्वम् / अभाषिढ्वम् , अभाषिध्वम्

AFAIK this is not correct. Please correct me if I am wrong.
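The condition described above can be written as a small check. This is a minimal sketch, not vidyut's implementation: it assumes the anga is a plain SLP1 string taken before इट् is added, and the names `ends_with_in` and `dha_is_optional` are hypothetical. The sound list expands इण् (इ उ ऋ ऌ ए ओ ऐ औ ह य व र ल) with the long vowels, which the pratyāhāra also covers.

```rust
/// Sounds of the pratyāhāra इण् in SLP1, with long ई ऊ ॠ ॡ included.
const IN_SOUNDS: &str = "iIuUfFxXeoEOhyvrl";

/// Returns true if the last sound of `anga` (SLP1, before इट्) belongs to इण्.
pub fn ends_with_in(anga: &str) -> bool {
    anga.chars().next_back().map_or(false, |c| IN_SOUNDS.contains(c))
}

/// Simplified condition for 8.3.79 विभाषेटः: the ध् of ध्वम् / षीध्वम् may
/// optionally become ढ् only if इट् is present AND the anga ends in इण्.
pub fn dha_is_optional(anga: &str, has_it: bool) -> bool {
    has_it && ends_with_in(anga)
}
```

With this check, the anga पस्पर्ध् (`pasparD`) or भाष् (`BAz`) fails the test, while an anga like लव् (`lav`, as in लविषीढ्वम्) passes.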

`sanskrit/data.git/bin/make_data.py`: Python 3.6.9 reports a syntax error on line 58 (`list[str]`). Is there a minimum Python version?

Capacity to link to a particular verb form

https://ambuda-org.github.io/vidyullekha/?tab=tin&dhatu=02.0040

I was looking at the derivation of verb form ‘ayAni’ of ‘iR gatO’. When I copied the link, I got the above link.

Request -
Kindly allow the user to share the link to a particular verb form. This may require specification of kartari/karmaRi, sannanta, lakAra, vacana etc.

All I want is that the user should be able to share unique link for each verb form.
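One way to give each form a unique link is to put every input in the query string. A minimal sketch, assuming a form is fully described by the dhatu code plus prayoga, sanAdi, lakAra, purusha and vacana: only `tab` and `dhatu` appear in the existing URL above, so the other parameter names, the `FormLink` struct and `share_url` are hypothetical.

```rust
/// Hypothetical description of one verb form in the vidyullekha UI.
pub struct FormLink<'a> {
    pub dhatu: &'a str,          // e.g. "02.0040", as in the existing `dhatu=` parameter
    pub prayoga: &'a str,        // hypothetical, e.g. "kartari"
    pub sanadi: Option<&'a str>, // hypothetical, e.g. Some("san"); omitted when None
    pub lakara: &'a str,         // hypothetical, e.g. "lot"
    pub purusha: &'a str,        // hypothetical, e.g. "uttama"
    pub vacana: &'a str,         // hypothetical, e.g. "eka"
}

/// Percent-encodes every byte except RFC 3986 unreserved characters.
fn percent_encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Builds a link whose query string identifies exactly one form.
pub fn share_url(base: &str, f: &FormLink<'_>) -> String {
    let mut params: Vec<(&str, &str)> =
        vec![("tab", "tin"), ("dhatu", f.dhatu), ("prayoga", f.prayoga)];
    if let Some(s) = f.sanadi {
        params.push(("sanadi", s));
    }
    params.extend([("lakara", f.lakara), ("purusha", f.purusha), ("vacana", f.vacana)]);
    let query: Vec<String> = params
        .iter()
        .map(|(k, v)| format!("{}={}", k, percent_encode(v)))
        .collect();
    format!("{}?{}", base, query.join("&"))
}
```

Parsing the same parameters back on page load would then restore the selected form.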

Thank you.

[prakriya] Some forms of जनँ जनने (03.0025) are wrong

Forms of जनँ जनने (03.0025) are shown as getting ई आदेश in अपित् places through the sutra, ई हल्यघोः 6/4/113.

But this is not possible due to the more specific sutra, जनसनखनां सञ्झलोः 6/4/42. Due to this, the dhatu's forms in अपित् places are wrong: जजेतः, जजेताम्, अजजेताम्, जजेयाताम्.

The correct forms are shown in Siddhanta Kaumudi:

एषामाकारोऽन्तादेशः स्याज्झलादौ सनि झलादौ क्ङिति च । जजातः । जज्ञति । जजंसि । जजान । जजन्यात् । जजायात् । जन्यात् । जायात् ।

[prakriya] Improve quality for यङन्तs

We have basic tests in basic_tinantas, but these need more work.

The difficulty here is somewhere between #16 and #17. In particular, I'm not sure how to derive the अजादिs (e.g. I can understand अशश्यते but not अशाश्यते, so I think I'm missing a rule). But the main difficulty will be implementing the right order for various changes: dhatu substitutions, dvitva, samprasarana. (Our current order works for basic verbs in kartari prayoga but is failing here.)

[prakriya] Some lit forms of टुओँश्वि (01.1165) are not generated

The following forms should also be generated for टुओँश्वि (01.1165) dhatu in lit lakara:

शिश्वियतुः
शिश्वियुः
शिश्वियथुः
शिश्विय
शिश्वियिव
शिश्वियिम

Ref: Siddhanta Kaumudi & Kashika

prakriya: separate parasmaipada / atmanepada forms in debugger

Show them separately, as sites like ashtadhyayi.com do. This is the format most sites use; without it, the debugger output looks like an error.
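If each generated form carried its pada, the debugger could render two tables by splitting the list once. A minimal sketch, with hypothetical `Pada` and `Form` types standing in for whatever data the debugger actually receives:

```rust
/// Hypothetical pada tag attached to each generated form.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Pada {
    Parasmai,
    Atmane,
}

/// Hypothetical generated form.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Form {
    pub text: String,
    pub pada: Pada,
}

/// Splits forms into (parasmaipada, atmanepada), preserving their original order.
pub fn split_by_pada(forms: Vec<Form>) -> (Vec<Form>, Vec<Form>) {
    forms.into_iter().partition(|f| f.pada == Pada::Parasmai)
}
```

Each half can then be rendered as its own table, with a heading such as "परस्मैपद" or "आत्मनेपद".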

`Install.sh`: the last step of the script, which runs Vidyut on "saMskftam", fails because data-dir is not provided.

Missing vArtika - ६.१.६४.व्२

There may be more such issues where the vArtika is not displayed properly.

[kosha] Consider using `Option` instead of `None` variants in `semantics`

Option is more idiomatic but might lead to a lot of friction for callers. We should experiment with it and see how it feels.
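To make the trade-off concrete, here is a minimal sketch with a hypothetical `Linga` enum (the real `semantics` types may differ). The first style keeps absence as a variant; the second moves it into `Option`, so callers must handle `Some`/`None` explicitly, which is where the extra friction would come from.

```rust
/// Current style (hypothetical): "no value" is one of the enum's own variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LingaWithNone {
    Pum,
    Stri,
    Napumsaka,
    None,
}

/// Proposed style: the enum lists only real values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Linga {
    Pum,
    Stri,
    Napumsaka,
}

/// A field that may be absent becomes `Option<Linga>`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pratipadika {
    pub linga: Option<Linga>,
}

/// Converts the old representation to the new one.
pub fn to_option(l: LingaWithNone) -> Option<Linga> {
    match l {
        LingaWithNone::Pum => Some(Linga::Pum),
        LingaWithNone::Stri => Some(Linga::Stri),
        LingaWithNone::Napumsaka => Some(Linga::Napumsaka),
        LingaWithNone::None => None,
    }
}

/// Caller-side friction: every use site has to decide what `None` means.
pub fn describe(p: &Pratipadika) -> &'static str {
    match p.linga {
        Some(Linga::Pum) => "pum",
        Some(Linga::Stri) => "stri",
        Some(Linga::Napumsaka) => "napumsaka",
        None => "unknown",
    }
}
```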

[prakriya] Optimize the `tripadi` module

Profiling indicates that the tripadi module is slow.

Many of the rules in the tripadi need to iterate over every character in the string so that they can apply various sandhi changes. Currently, we create a new CompactString for each of these rules. My rough guess is that we create a dozen such strings for each word we derive, even if none of the rules has scope to apply. CompactString shouldn't heap-allocate in most cases, but the copying required here is still slow.

Once profiling confirms that this is a problem, we should avoid the extra copies here. Two approaches come to mind:

1. Instead of creating a new string, iterate over the Term strings and manage indices carefully.
2. Store one copy of the string and rebuild it only if a rule applies. The code would follow the basic pattern of ItPrakriya, e.g., by extending the Prakriya struct with new data and helper methods.

I think (2) is generally cleaner, and it has the side effect of improving our APIs.
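Approach (2) could look roughly like the sketch below: the joined string is cached and rebuilt only after some term actually changes. This is a simplified illustration, not the real `Prakriya`; `Term`, `text`, `set_text`, `run_rule` and the `rebuilds` counter are hypothetical names, and the counter exists only to show when rebuilding happens.

```rust
/// Hypothetical stand-in for a term in a derivation.
struct Term {
    text: String,
}

/// Hypothetical, simplified derivation state that caches the joined text.
struct Prakriya {
    terms: Vec<Term>,
    cached_text: String,
    dirty: bool,
    rebuilds: usize, // how many times the cached string was actually rebuilt
}

impl Prakriya {
    fn new(terms: &[&str]) -> Self {
        Prakriya {
            terms: terms.iter().map(|t| Term { text: t.to_string() }).collect(),
            cached_text: String::new(),
            dirty: true,
            rebuilds: 0,
        }
    }

    /// Returns the full text, rebuilding it only if some term changed.
    fn text(&mut self) -> &str {
        if self.dirty {
            self.cached_text.clear();
            for t in &self.terms {
                self.cached_text.push_str(&t.text);
            }
            self.dirty = false;
            self.rebuilds += 1;
        }
        &self.cached_text
    }

    /// Replaces the text of term `i` and marks the cache as stale.
    fn set_text(&mut self, i: usize, s: &str) {
        self.terms[i].text = s.to_string();
        self.dirty = true;
    }
}

/// Runs one rule: `find` inspects the cached text and returns an edit
/// (term index, new text) only if the rule applies.
fn run_rule(p: &mut Prakriya, find: impl Fn(&str) -> Option<(usize, String)>) -> bool {
    let edit = find(p.text());
    match edit {
        Some((i, s)) => {
            p.set_text(i, &s);
            true
        }
        None => false,
    }
}
```

Rules that find nothing to do only read the cached string, so the cost of a full rebuild is paid once per actual change rather than once per rule.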

Be able to call from python

We ought to be able to call vidyut from Python.

http://saidvandeklundert.net/learn/2021-11-18-calling-rust-from-python-using-pyo3/ seems to have some tips.

[vidyullekha] 'प्रयोग' and 'सनादि' options should reset when switching between dhatus

Current behaviour

As shown in the GIF below, when switching from one dhatu to another, the 'प्रयोग' and 'सनादि' options are reset in the UI, but the forms are still displayed with the previously selected options.

Currently, they can be reset only by selecting another option and then selecting the original option.

Expected behaviour

The options should reset to default ("कर्तरि" and "none") when switching between dhatus.

Unify install scripts for *nix/Windows

See #51 and #36 for context. Ideally, we should have a single install script that all systems can use.

[prakriya] Add support for समुह्यात्

Implement https://ashtadhyayi.com/sutraani/7-4-23 in the angasya module to support this form.
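A minimal sketch of what the rule checks, assuming SLP1 strings and a simplified `Term` with hypothetical flags (the angasya module's real types will differ). The conditions "y-initial" and "कित्/ङित्" are carried over from 7.4.22 अयङ् यि क्ङिति, and the sketch assumes the upasarga directly precedes the dhatu.

```rust
/// Hypothetical, simplified term: text plus the few flags this rule needs.
struct Term {
    text: String,
    is_upasarga: bool,
    is_dhatu: bool,
    is_kit_or_nit: bool,
}

impl Term {
    fn new(text: &str, is_upasarga: bool, is_dhatu: bool, is_kit_or_nit: bool) -> Self {
        Term { text: text.to_string(), is_upasarga, is_dhatu, is_kit_or_nit }
    }
}

/// 7.4.23 उपसर्गाद्ध्रस्व ऊहतेः: after an upasarga, the ऊ of ऊह् becomes short
/// before a y-initial कित्/ङित् suffix. Returns true if the rule applied.
fn apply_7_4_23(terms: &mut [Term]) -> bool {
    for i in 0..terms.len().saturating_sub(2) {
        let applies = terms[i].is_upasarga
            && terms[i + 1].is_dhatu
            && terms[i + 1].text == "Uh"
            && terms[i + 2].is_kit_or_nit
            && terms[i + 2].text.starts_with('y');
        if applies {
            terms[i + 1].text = "uh".to_string();
            return true;
        }
    }
    false
}

/// Joins all term texts into the current form.
fn join(terms: &[Term]) -> String {
    terms.iter().map(|t| t.text.as_str()).collect()
}
```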

lakAras in selected transliteration

Currently, lakAra names are shown in a fixed transliteration.
Please show them in the selected transliteration (in the given example, Devanagari).
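A minimal sketch of the idea: keep one label per lakAra and script, and pick it from the current transliteration setting. The `Lakara` and `Script` enums and `lakara_label` are hypothetical; a real fix would more likely pass the labels through whatever transliterator the UI already uses rather than a hand-written table.

```rust
/// Hypothetical lakAra enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Lakara {
    Lat,
    Lit,
    Lut,
    Lrt,
    Let,
    Lot,
    Lan,
    VidhiLin,
    AshirLin,
    Lun,
    Lrn,
}

/// Hypothetical output-script setting.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Script {
    Devanagari,
    Iast,
}

/// Returns the display label of `l` in the selected `script`.
pub fn lakara_label(l: Lakara, script: Script) -> &'static str {
    let (deva, iast) = match l {
        Lakara::Lat => ("लट्", "laṭ"),
        Lakara::Lit => ("लिट्", "liṭ"),
        Lakara::Lut => ("लुट्", "luṭ"),
        Lakara::Lrt => ("लृट्", "lṛṭ"),
        Lakara::Let => ("लेट्", "leṭ"),
        Lakara::Lot => ("लोट्", "loṭ"),
        Lakara::Lan => ("लङ्", "laṅ"),
        Lakara::VidhiLin => ("विधिलिङ्", "vidhiliṅ"),
        Lakara::AshirLin => ("आशीर्लिङ्", "āśīrliṅ"),
        Lakara::Lun => ("लुङ्", "luṅ"),
        Lakara::Lrn => ("लृङ्", "lṛṅ"),
    };
    match script {
        Script::Devanagari => deva,
        Script::Iast => iast,
    }
}
```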

Cargo not found when running in MSYS2

make not found when running in Windows

[prakriya] Convert `Box<dyn Error>` to `ArgumentError`

ArgumentError is our standard error type for bad or invalid input. I think we can return this in most places, if not all of them.
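A minimal sketch of the conversion, using a hypothetical `ArgumentError` with a single message field (the crate's real type may carry more data) and a hypothetical `parse_gana` function. Because the type implements `std::error::Error`, callers that still return `Box<dyn Error>` keep working through `?`.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical shape of `ArgumentError`: just a human-readable message.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ArgumentError {
    message: String,
}

impl ArgumentError {
    pub fn new(message: impl Into<String>) -> Self {
        ArgumentError { message: message.into() }
    }
}

impl fmt::Display for ArgumentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid argument: {}", self.message)
    }
}

impl Error for ArgumentError {}

/// Before: `fn parse_gana(s: &str) -> Result<u8, Box<dyn Error>>`.
/// After: callers get a concrete error type they can match on.
pub fn parse_gana(s: &str) -> Result<u8, ArgumentError> {
    let n = s
        .parse::<u8>()
        .map_err(|_| ArgumentError::new(format!("gana is not a number: {}", s)))?;
    if (1..=10).contains(&n) {
        Ok(n)
    } else {
        Err(ArgumentError::new(format!("gana out of range: {}", n)))
    }
}

/// Existing code that returns `Box<dyn Error>` still compiles unchanged.
pub fn legacy_caller(s: &str) -> Result<u8, Box<dyn Error>> {
    Ok(parse_gana(s)?)
}
```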

[Feature Request] Support for non-Devanagari transliteration

Hi,

Great project!! Really interesting and inspiring.

Are you interested in adding support for non-Devanagari transliteration, e.g. one or all of IAST, ITRANS, and SLP1? It would be useful for folks who have difficulty reading Devanagari.

I would love to contribute to developing this feature. Maybe for a start, we could implement a small transient popup that shows the transliteration(s) near the pointer on extended mouse hover?
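For the transliteration itself, SLP1 maps to IAST one character at a time, which makes a small lookup easy to sketch. This hypothetical `slp1_to_iast` covers only the basic letters (no accents, candrabindu or avagraha) and passes every other character through unchanged; converting from Devanagari would need a real transliterator, since it has to handle inherent vowels and the virāma.

```rust
/// Converts basic SLP1 letters to IAST; unmapped characters pass through as-is.
pub fn slp1_to_iast(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        let mapped = match c {
            'A' => "ā",
            'I' => "ī",
            'U' => "ū",
            'f' => "ṛ",
            'F' => "ṝ",
            'x' => "ḷ",
            'X' => "ḹ",
            'E' => "ai",
            'O' => "au",
            'K' => "kh",
            'G' => "gh",
            'N' => "ṅ",
            'C' => "ch",
            'J' => "jh",
            'Y' => "ñ",
            'w' => "ṭ",
            'W' => "ṭh",
            'q' => "ḍ",
            'Q' => "ḍh",
            'R' => "ṇ",
            'T' => "th",
            'D' => "dh",
            'P' => "ph",
            'B' => "bh",
            'S' => "ś",
            'z' => "ṣ",
            'M' => "ṃ",
            'H' => "ḥ",
            // a i u e o k g c j t d n p b m y r l v s h are identical in both schemes.
            _ => {
                out.push(c);
                continue;
            }
        };
        out.push_str(mapped);
    }
    out
}
```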

prakriya: Clicking on a dhātu should scroll to top

To reproduce:

Go to https://ambuda-org.github.io/vidyullekha/
Scroll a couple of pages and click on a dhātu
The resulting page will be scrolled to the bottom (the Lrn forms etc), instead of being at the top (Lat).

(I guess it's just retaining scroll position and rendering new contents, but it's a bit confusing every time.)

How to access forms with upasarga

Reference - drdhaval2785/prakriya#119
https://ashtadhyayi.com/dhatu/02.0030?type=ting says that this root takes Atmanepadi suffixes when 'A' is added as a prefix.

I wanted to check on the demo at https://ambuda-org.github.io/vidyullekha/?tab=tin&dhatu=02.0030, but could not find out a way to check with a given upasarga.

Question - How do I access a verb form with a given upasarga on the demo frontend?

Update public APIs according to best practices

https://rust-lang.github.io/api-guidelines/about.html

When we publish on cargo, we should aim for a minimal API that is maximally permissive. This is a great first issue if you know some Rust already.
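As an illustration of what "maximally permissive" can mean, here is a sketch applying three of the guidelines (C-COMMON-TRAITS, C-GOOD-ERR, C-GENERIC) to a hypothetical `Vacana` type; it is not vidyut's actual API.

```rust
use std::error::Error;
use std::fmt;
use std::str::FromStr;

/// C-COMMON-TRAITS: derive the common traits so callers can copy, compare,
/// hash, sort and debug-print values.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum Vacana {
    Eka,
    Dvi,
    Bahu,
}

/// C-GOOD-ERR: a dedicated error type that implements `Display` and `Error`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ParseVacanaError(String);

impl fmt::Display for ParseVacanaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown vacana: {:?}", self.0)
    }
}

impl Error for ParseVacanaError {}

impl FromStr for Vacana {
    type Err = ParseVacanaError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "eka" => Ok(Vacana::Eka),
            "dvi" => Ok(Vacana::Dvi),
            "bahu" => Ok(Vacana::Bahu),
            _ => Err(ParseVacanaError(s.to_string())),
        }
    }
}

/// C-GENERIC: accept anything string-like (`&str`, `String`, `&String`, ...).
pub fn parse_vacana(s: impl AsRef<str>) -> Result<Vacana, ParseVacanaError> {
    s.as_ref().parse()
}
```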

[Prakriya] Some lit lakara forms of गुपूँ रक्षणे (01.0461) dhatu missing

गुपूँ रक्षणे (01.0461) is वेट्। Its non-इट् forms in lit lakara viz. जुगोप्थ, जुगुप्व and जुगुप्म are not generated.

Reference: Siddhanta Kaumudi

[prakriya] Form इयष्ठ of यजँ (01.1157) dhatu is not generated

यजँ (01.1157) dhatu has two forms in lit lakara parasmaipadi: इयजिथ and इयष्ठ. Refer Siddhanta Kaumudi and Madhaviya Dhatu Vritti.

But इयष्ठ is not generated.

prakriya: show final form at the end

Initial feedback from https://groups.google.com/g/sanskrit-programmers/c/si3PmK_hShQ/m/FHXJJ1ZzCAAJ

Also at the end of the prakriya, it would make sense to show the final form as the last step.

It is probably simplest to append the final form again as another row in the table (repeating the form shown above the table).

prakriya demo: Set up instructions

I tried to work on the demo web app, but things are broken:

➜  ~/w/ambuda/vidyut/vidyut-prakriya git:(main) ✗ make debugger 
wasm-pack build --target web
[INFO]: 🎯  Checking for the Wasm target...
[INFO]: 🌀  Compiling to Wasm...
    Finished release [optimized + debuginfo] target(s) in 0.05s
[INFO]: License key is set in Cargo.toml but no LICENSE file(s) were found; Please add the LICENSE file(s) to your project directory
[INFO]: ⬇️  Installing wasm-bindgen...
[INFO]: Optimizing wasm binaries with `wasm-opt`...
[INFO]: ✨   Done in 0.96s
[INFO]: 📦   Your wasm pkg is ready to publish at /Users/shreevatsa/w/ambuda/vidyut/vidyut-prakriya/pkg.
cp pkg/* www/static/wasm
cp: www/static/wasm is not a directory
make: *** [debugger] Error 1

After creating that directory:

➜  ~/w/ambuda/vidyut/vidyut-prakriya git:(main) ✗ mkdir -p www/static/wasm
➜  ~/w/ambuda/vidyut/vidyut-prakriya git:(main) ✗ make debugger           
wasm-pack build --target web
[INFO]: 🎯  Checking for the Wasm target...
[INFO]: 🌀  Compiling to Wasm...
    Finished release [optimized + debuginfo] target(s) in 0.05s
[INFO]: License key is set in Cargo.toml but no LICENSE file(s) were found; Please add the LICENSE file(s) to your project directory
[INFO]: ⬇️  Installing wasm-bindgen...
[INFO]: Optimizing wasm binaries with `wasm-opt`...
[INFO]: ✨   Done in 0.97s
[INFO]: 📦   Your wasm pkg is ready to publish at /Users/shreevatsa/w/ambuda/vidyut/vidyut-prakriya/pkg.
cp pkg/* www/static/wasm
cp data/* www/static/data
cp: www/static/data is not a directory
make: *** [debugger] Error 1

Ok, creating that one too:

➜  ~/w/ambuda/vidyut/vidyut-prakriya git:(main) ✗ mkdir -p www/static/data
➜  ~/w/ambuda/vidyut/vidyut-prakriya git:(main) ✗ make debugger           
wasm-pack build --target web
[INFO]: 🎯  Checking for the Wasm target...
[INFO]: 🌀  Compiling to Wasm...
    Finished release [optimized + debuginfo] target(s) in 0.05s
[INFO]: License key is set in Cargo.toml but no LICENSE file(s) were found; Please add the LICENSE file(s) to your project directory
[INFO]: ⬇️  Installing wasm-bindgen...
[INFO]: Optimizing wasm binaries with `wasm-opt`...
[INFO]: ✨   Done in 0.96s
[INFO]: 📦   Your wasm pkg is ready to publish at /Users/shreevatsa/w/ambuda/vidyut/vidyut-prakriya/pkg.
cp pkg/* www/static/wasm
cp data/* www/static/data
cd www && source env/bin/activate && python app.py
/bin/sh: env/bin/activate: No such file or directory
make: *** [debugger] Error 1

Indeed there's no env/bin/activate inside www so maybe it needs to be done in the other order?

➜  ~/w/ambuda/vidyut/vidyut-prakriya git:(main) ✗ source env/bin/activate
(env) ➜  ~/w/ambuda/vidyut/vidyut-prakriya git:(main) ✗ cd www && python app.py
Traceback (most recent call last):
  File "/Users/shreevatsa/w/ambuda/vidyut/vidyut-prakriya/www/app.py", line 1, in <module>
    from flask import Flask, render_template
ModuleNotFoundError: No module named 'flask'

Maybe there needs to be a requirements.txt and all that, as this app is no longer just a standalone index.html file :-)

wrong closing tags in vidyut-prakriya's `index.html`

There are wrong closing tags in the following lines of the vidyut-prakriya's index.html:

L213: should be </a> instead of </div>
L231: should be <input ... /> instead of <input>...</input> since input is a self-closing tag.
L244: should be </div> instead of </ol>

See the lint errors in the online Prettier playground.

[prakriya] तातङ् is added even before तुह्योस्तातङ्ङाशिष्यन्यतरस्याम् is applied

[prakriya] श्लु is not shown in prakriya when जुहोत्यादिभ्यः श्लुः is applied

Install script fails to clone sanskrit/data.git with git protocol

Environment: WSL, with no GitHub-specific settings. The git protocol fails; switching to HTTPS works fine.

prakriya: Nice urls for easy sharing

May be nice to encode state in the URL (general motivation, how to do it with Alpine.js), for easily sharing a specific prakriya or dhātu view with others.

prakriya: Show sūtra text

Show the sūtra text too, and not just a link to ashtadhyayi.com. This may mean fetching, saving, and serving the sūtras, just like dhatupatha.tsv.

This was initial feedback from two different people at https://groups.google.com/g/ambuda-discuss/c/n7toiAOQ6wY and https://groups.google.com/g/sanskrit-programmers/c/si3PmK_hShQ/m/J-n1GytzCAAJ so likely more people will have the same feedback.

[prakriya] Prakriya doesn't show all rules involved in the process?

In the derivation of बुध्यात् (बुधँ अवगमने 01.0994):

पुगन्तलघूपधस्य च has prApti but is not shown.
क्ङिति च prohibits पुगन्तलघूपधस्य च but is not shown.

Without at least क्ङिति च being shown, it is difficult to understand why there is no guNa in the form.

Dr Dhaval's Prakriyāpradarśinī shows क्ङिति च:

If this is intended, I understand.

[prakriya] Handling rule conflicts

We currently have reasonable support for karmani prayoga, and I'll also add support for sanAdi pratyayas by the end of the year. We have experimental support for various krdantas and basic support for subantas.

Currently, how are rule conflicts handled in prakriyA simulation? The regular interpretation of विप्रतिषेधे परं कार्यम्, augmented by a web of paribhAShA-s?

Would it be simple to implement an option to resolve such rule conflicts using the simpler framework described in Rishi Rajpopat's thesis, which recently made the news and fascinated or surprised many? This would be enormously valuable in validating the claims made therein, and would likely advance our understanding of what pANini intended and of the drawbacks therein.
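To make the comparison concrete, here is a deliberately simplified sketch of two ways to pick a winner among conflicting candidates: by later sūtra number, or by the operand further to the right, which is a rough reading of the proposal in Rajpopat's thesis. `Candidate`, `Strategy` and `resolve` are hypothetical, and neither strategy models apavāda, nitya, antaraṅga or the other paribhAShA-s mentioned above.

```rust
/// A hypothetical candidate rule application found during one derivation step.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Candidate {
    /// Sūtra number as (adhyāya, pāda, sūtra); tuples compare lexicographically.
    pub sutra: (u8, u8, u16),
    /// Index of the term the rule would modify, counted from the left.
    pub target: usize,
}

#[derive(Clone, Copy, Debug)]
pub enum Strategy {
    /// "para" read as the rule that comes later in the Ashtadhyayi.
    LaterSutra,
    /// "para" read as the rule whose operand is further to the right.
    RightOperand,
}

/// Picks one candidate according to `strategy`; `None` if there are no candidates.
/// On ties, `max_by_key` keeps the last maximal candidate.
pub fn resolve(cands: &[Candidate], strategy: Strategy) -> Option<Candidate> {
    match strategy {
        Strategy::LaterSutra => cands.iter().copied().max_by_key(|c| c.sutra),
        Strategy::RightOperand => cands.iter().copied().max_by_key(|c| c.target),
    }
}
```

With both strategies behind one switch, a test suite could be run twice and the resulting forms diffed, which is one way to check where the two readings disagree.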

[kosha] Refactor `Kosha` logic into a new `MultiFst` class

MultiFst should handle duplicate keys and do most of the heavy lifting. Kosha should then be a thin wrapper over MultiFst.

Use cases:

can use MultiFst for other kinds of linguistic data
can create Python bindings directly for MultiFst which can be handy for some applications (e.g. creating lightweight FST structs for word lists.)

Technical blockers

to avoid double allocation, get_all should return an iterator instead of a Vec.
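A minimal sketch of the duplicate-key idea and an iterator-returning `get_all`. It uses a `BTreeMap` as a stand-in for the FST (FST maps generally require unique keys), storing each value under `(key, n)` so that duplicates stay distinct; `MultiMap` and its methods are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `MultiFst`: an ordered map that allows duplicate keys.
/// Each value is stored under (key, n), where n counts earlier duplicates.
#[derive(Default)]
pub struct MultiMap {
    map: BTreeMap<(String, u32), u64>,
}

impl MultiMap {
    /// Adds `value` under `key`, keeping any existing values for the same key.
    pub fn insert(&mut self, key: &str, value: u64) {
        let n = self.get_all(key).count() as u32;
        self.map.insert((key.to_string(), n), value);
    }

    /// Lazily yields all values for `key` in insertion order, without building a Vec.
    pub fn get_all<'a>(&'a self, key: &str) -> impl Iterator<Item = u64> + 'a {
        let lo = (key.to_string(), 0);
        let hi = (key.to_string(), u32::MAX);
        self.map.range(lo..=hi).map(|(_, v)| *v)
    }
}
```

Callers that only need the first match, or a count, can stop early, and callers that need a `Vec` can still `collect()`.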

fetch_training_data.py: cloning sanskrit.git over the git protocol fails in WSL Ubuntu with no git-specific settings; cloning over HTTPS works fine.

Add option to store sutras / derivation status for all roots

It would help to store the data for future use to display step by step derivation.
May store as JSON / any other convenient format.

[prakriya] Add a better test suite for krdantas

Online data for tinantas and subantas is quite reasonable. But as far as I'm aware, there are no high-quality lists of krdantas. I have started a basic test suite in basic_krdantas, but we should add more cases here.

This task requires some knowledge of व्याकरण or a willingness to go through various grammar books, etc. to determine which forms we should expect.

description typo

should change from "Sanksrit software" -> "Sanskrit software"

prakriya: side-by-side view for different forms for the same inputs

When multiple forms are generated (e.g. "भवतात् , भवताद् , भवतु"), may be nice to see all their prakriya-s side by side. (Basically make the entire table cell a link/action target, rather than an individual generated form inside a table cell.)

(nice to have but a little trickier on the frontend side)

(Above quoted from #19 — may fit well with #23 if all the inputs (lakāra, puruṣa, vacana…) are the "state".)

[prakriya] Improve quality for णिजन्तs

We have basic tests in basic_tinantas, but these need more work.

The major issue is that we need to correctly implement पुगन्तलघूपधस्य च to account for पुगन्त. Otherwise there are minor issues, e.g. for जागृ guna.
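A simplified sketch of the two conditions in पुगन्तलघूपधस्य च (7.3.86), assuming SLP1 strings and that the caller has already checked that a sārvadhātuka or ārdhadhātuka suffix follows. The function name and flags are hypothetical; `next_is_kit_or_nit` models the blocking by क्ङिति च (1.1.5), and special cases such as जागृ are not handled.

```rust
/// SLP1 vowels, used to check that the anga ends in a consonant.
const VOWELS: &str = "aAiIuUfFxXeEoO";

/// Guna substitute for an इक् vowel in SLP1.
fn guna(c: char) -> Option<&'static str> {
    match c {
        'i' | 'I' => Some("e"),
        'u' | 'U' => Some("o"),
        'f' | 'F' => Some("ar"),
        'x' => Some("al"),
        _ => None,
    }
}

/// Applies guna to the upadha (penultimate sound) of `anga` if the anga is
/// पुगन्त (`ends_with_puk`) or its upadha is a short (laghu) vowel.
pub fn pugantalaghupadha(anga: &str, ends_with_puk: bool, next_is_kit_or_nit: bool) -> String {
    let chars: Vec<char> = anga.chars().collect();
    // क्ङिति च blocks guna; we also need at least an upadha and a final sound.
    if next_is_kit_or_nit || chars.len() < 2 {
        return anga.to_string();
    }
    let last = chars[chars.len() - 1];
    let i = chars.len() - 2; // index of the upadha
    let upadha = chars[i];
    // This sketch only handles angas that end in a consonant.
    if VOWELS.contains(last) {
        return anga.to_string();
    }
    // Laghu here: a short vowel followed only by the single final consonant.
    let is_laghu = matches!(upadha, 'i' | 'u' | 'f' | 'x');
    if !(ends_with_puk || is_laghu) {
        return anga.to_string();
    }
    match guna(upadha) {
        Some(sub) => {
            let mut out: String = chars[..i].iter().collect();
            out.push_str(sub);
            out.push(last);
            out
        }
        None => anga.to_string(),
    }
}
```

For example, बुध् gives बोध् unless a कित्/ङित् suffix follows, and ह्री + पुक् (`hrIp`) gives ह्रेप् even though its ई is long.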

prakriya: add search box for dhatus

Similar to sites like ashtadhyayi.com. A list of 2000 dhatus is too much to reasonably scroll through, especially since they are in their aupadeshika forms.

[kosha] Explore different bitfield orderings

In packing, I chose a bitfield ordering more or less on a hunch, but I don't think our current ordering works very well, because the modular_bitfield crate uses an endianness different from what I expected.

I think a better ordering or approach here could potentially decrease the size of the FST. My guess is that we might save up to 10% on size, which means more of the FST can be kept in the processor cache.

A good PR here should quantify the size decrease when using a different bitfield ordering.

[prakriya] Improve quality for सन्नन्तs

We have basic tests in basic_tinantas, but these need much more work.

As compared to our णिजन्तs, our सन्नन्तs have more quality issues and are not quite as reliable. The major issue is that a lot of small सन् rules haven't been implemented yet, but I'm not sure which ones.

[prakriya] Some lit forms of अक्षूँ (01.0742) and few other ऊदित् dhatus are not generated

Expected

अक्षूँ (01.0742) is ऊदित् dhatu. So, due to स्वरतिसूतिसूयतिधूञूदितो वा 7/2/44, it optionally gets इडागम when followed by a वलादि आर्धधातुक suffix. In case of lit lakara, it should have the following forms:

eka	dvi	bahu
आनक्ष	आनक्षतुः	आनक्षुः
आनष्ठ , आनक्षिथ	आनक्षथुः	आनक्ष
आनक्ष	आनक्षिव , आनक्ष्व	आनक्षिम , आनक्ष्म

Similarly, तक्षूँ (01.0743) and त्वक्षूँ (01.0744).

Current

Additional Info

Acc. to Ashtadhyayi Sahajabodha - Part II of Pushpa Dikshit (p.304), all the dhatus given below should also behave similarly. But I have not checked them yet.

Here is a related issue in Dhaval Patel's SanskritVerb project.

द्प्.०१.०९३४

https://ambuda-org.github.io/vidyullekha/?dhatu=01.0985&tab=tin

This page shows an incorrect sUtra number in at least one place.

[Prakriya] स्कन्दिर् dhatu should have forms चस्कन्त्थ and चस्कन्थ in लिट्

स्कन्दिर् (01.1134) is वेट् when followed by थल् due to the भारद्वाजनियम।

Currently only चस्कन्दिथ is generated. (See image below).

But चस्कन्त्थ and चस्कन्थ should also be generated. See Siddhanta Kaumudi for reference.

Explore using `CompactString` in segmenter

CompactString is a memory-efficient string that can store up to 24 bytes on the stack before making a heap allocation. It's mostly a drop-in replacement for String, and you can learn more about it here.

We've had success improving performance by using CompactString in vidyut-prakriya, and it might also improve performance in vidyut-cheda.

A PR here would experiment with CompactString and quantify the performance change.