ambuda-org / vidyut
Infrastructure for Sanskrit software. For Python bindings, see `vidyut-py`.
In the लिट् forms of वयँ (01.0547) dhatu, वय् is shown as getting samprasarana through the sutra, ग्रहिज्यावयिव्यधिवष्टिविचतिवृश्चतिपृच्छतिभृज्जतीनां ङिति च 6/1/16. Due to this the generated forms are wrong.
The sutra only applies to वय् adesha of वेञ् dhatu but not to the वय् dhatu (See Kashika and Kaumudi).
So, as shown in Kaumudi and Madhaviyadhatuvritti (ववये - वादित्वान्नैत्त्वाभ्यासलोपौ), the forms of वयँ (वय्) dhatu should be:
eka | dvi | bahu |
---|---|---|
ववये | ववयाते | ववयिरे |
ववयिषे | ववयाथे | ववयिध्वे/ववयिढ्वे |
ववये | ववयिवहे | ववयिमहे |
But the current forms are:
(AFAIK, ऊये, ऊयाते, etc. cannot be optional forms for the वयँ (वय्) dhatu either. But please do check.)
When I tried to run `make create_all_data`, the following error occurs at this line in the `create_all_data.sh` script.
I am assuming that it's a stray line, as far as I could debug, given that I was able to run the script successfully (I think) after removing the `aoeu` line.
Due to the sutra विभाषेटः 8.3.79, the ध् in ध्वम् (लिट्/लुङ्) and षीध्वम् (आशीर्लिङ्) is optionally replaced by ढ्, if the anga ends with इण् and is followed by इट् (aagama).
If the anga does not end with इण्, then the sutra cannot apply.
But I noticed that vidyut applies this sutra for many dhatus whose anga is not इण्-अन्त। Some of the examples are:
AFAIK this is not correct. Please correct me if I am wrong.
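The condition described above can be sketched as a small predicate. This is only an illustration under stated assumptions: the function name, the SLP1 encoding of the anga, and the choice to list long vowels explicitly in the इण् set are all hypothetical and not taken from vidyut's code.

```rust
/// SLP1 sounds treated as इण् here (i u f x e o E O h y v r l), with long
/// vowels listed explicitly. This set is an assumption of the sketch.
const IN: &str = "iIuUfFxXeEoOhyvrl";

/// Returns true only if both conditions described in the issue hold:
/// the anga ends in an इण् sound and it is followed by the इट् augment.
pub fn vibhasha_itah_applies(anga: &str, followed_by_it: bool) -> bool {
    let ends_in_in = anga.chars().last().map_or(false, |c| IN.contains(c));
    ends_in_in && followed_by_it
}
```

With this check, `vay` followed by इट् qualifies (as in ववयिध्वे/ववयिढ्वे), while an anga such as `paW`, which does not end in इण्, does not.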
https://ambuda-org.github.io/vidyullekha/?tab=tin&dhatu=02.0040
I was looking at the derivation of verb form ‘ayAni’ of ‘iR gatO’. When I copied the link, I got the above link.
Request -
Kindly allow the user to share the link to a particular verb form. This may require specification of kartari/karmaRi, sannanta, lakAra, vacana etc.
All I want is that the user should be able to share unique link for each verb form.
Thank you.
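As a rough sketch of what a unique link per verb form could carry: only the `tab` and `dhatu` parameters appear in the demo's current URL; the other parameter names (`prayoga`, `lakara`, `purusha`, `vacana`) and the `FormLink` type are hypothetical, and the demo frontend itself may build links quite differently.

```rust
/// Hypothetical description of one verb form. Only `dhatu` corresponds to a
/// parameter seen in the demo's current URLs; the rest are assumed names.
pub struct FormLink<'a> {
    pub dhatu: &'a str,
    pub prayoga: &'a str,
    pub lakara: &'a str,
    pub purusha: &'a str,
    pub vacana: &'a str,
}

impl FormLink<'_> {
    /// Builds a shareable URL. Values are assumed to be plain ASCII codes,
    /// so no percent-encoding is performed in this sketch.
    pub fn to_url(&self, base: &str) -> String {
        format!(
            "{}?tab=tin&dhatu={}&prayoga={}&lakara={}&purusha={}&vacana={}",
            base, self.dhatu, self.prayoga, self.lakara, self.purusha, self.vacana
        )
    }
}
```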
Forms of जनँ जनने (03.0025) are shown as getting ई आदेश in अपित् places through the sutra, ई हल्यघोः 6/4/113.
But this is not possible due to the more specific sutra, जनसनखनां सञ्झलोः 6/4/42. Due to this, the dhatu's forms in अपित् places are wrong: जजेतः, जजेताम्, अजजेताम्, जजेयाताम्.
The correct forms are shown in Siddhanta Kaumudi:
एषामाकारोऽन्तादेशः स्याज्झलादौ सनि झलादौ क्ङिति च । जजातः । जज्ञति । जजंसि । जजान । जजन्यात् । जजायात् । जन्यात् । जायात् ।
We have basic tests in `basic_tinantas`, but these need more work.
The difficulty here is somewhere between #16 and #17. In particular, I'm not sure how to derive the अजादिs (e.g. I can understand अशश्यते but not अशाश्यते, so I think I'm missing a rule). But the main difficulty will be implementing the right order for various changes: dhatu substitutions, dvitva, samprasarana. (Our current order works for basic verbs in kartari prayoga but is failing here.)
The following forms should also be generated for टुओँश्वि (01.1165) dhatu in lit lakara:
Ref: Siddhanta Kaumudi & Kashika
As done on sites like ashtadhyayi.com. This is the expected format on most sites. Otherwise, the debugger output feels like an error.
`Option` is more idiomatic but might lead to a lot of friction for callers. We should experiment with it and see how it feels.
Profiling indicates that the `tripadi` module is slow.
Many of the rules in the `tripadi` need to iterate over every character in the string so that they can apply various sandhi changes. Currently, we create a new `CompactString` for each of these rules. My rough guess is that we create a dozen such strings for each word we derive, even if none of the rules have scope to apply. `CompactString` shouldn't heap allocate in most cases, but the copy work required here is still slow.
Once we confirm that this is a problem with profiling, we should avoid the extra copies here. Two approaches that come to mind:
1. Instead of creating a new string, iterate over the `Term` strings and manage indices carefully.
2. Store one copy of the string and rebuild it only if a rule applies. The code would follow the basic pattern of `ItPrakriya`, e.g., by extending the `Prakriya` struct with new data and helper methods.
I think (2) is generally cleaner, and it has the side effect of improving our APIs.
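A minimal sketch of approach (2), assuming a simplified model where a derivation is just a list of term strings. The `Derivation` type, its methods, and the rule signature are hypothetical stand-ins, not the actual `Prakriya` API, and plain `String` is used in place of `CompactString`.

```rust
/// Hypothetical stand-in for a derivation: term strings plus one cached
/// copy of their concatenation.
pub struct Derivation {
    terms: Vec<String>,
    cached: String,
    rebuilds: usize,
}

impl Derivation {
    pub fn new(terms: &[&str]) -> Self {
        let terms: Vec<String> = terms.iter().map(|t| t.to_string()).collect();
        let cached = terms.concat();
        Self { terms, cached, rebuilds: 0 }
    }

    /// The full text of the derivation, borrowed from the cache.
    pub fn text(&self) -> &str {
        &self.cached
    }

    /// How many times the cache has been rebuilt.
    pub fn rebuilds(&self) -> usize {
        self.rebuilds
    }

    /// Runs `rule` against the cached text. If the rule applies, it returns
    /// the index of the term to replace and its new value. The cache is
    /// rebuilt only in that case, reusing the existing buffer.
    pub fn apply(&mut self, rule: impl Fn(&str) -> Option<(usize, String)>) -> bool {
        match rule(&self.cached) {
            Some((index, new_term)) => {
                self.terms[index] = new_term;
                self.cached.clear();
                for term in &self.terms {
                    self.cached.push_str(term);
                }
                self.rebuilds += 1;
                true
            }
            None => false,
        }
    }
}
```

Rules that have no scope to apply only read the cached text, so they cost one scan and no copy. Whether this is actually faster should still be confirmed by profiling.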
Ought to be able to call from Python.
http://saidvandeklundert.net/learn/2021-11-18-calling-rust-from-python-using-pyo3/ seems to have some tips.
As shown in the GIF below, when switching from one dhatu to another the 'प्रयोग' and 'सनादि' options are reset in the UI but the forms are displayed with the same options that were previously selected.
Currently, they can be reset only by selecting another option and then selecting the original option.
The options should reset to default ("कर्तरि" and "none") when switching between dhatus.
Implement https://ashtadhyayi.com/sutraani/7-4-23 in the `angasya` module to support this form.
`ArgumentError` is our standard error type for bad or invalid input. I think we can return this in most places, if not all of them.
Hi,
Great project!! Really interesting and inspiring.
Are you interested in adding support for non-Devanagari transliteration, maybe one or all of IAST, ITRANS, SLP1? This would be useful for folks who have difficulty reading Devanagari.
I would love to contribute to developing this feature. Maybe for a start, we could implement a small transient popup that shows the transliteration(s) near the pointer on extended mouse hover?
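To give a sense of the work involved, here is a small sketch of one direction, SLP1 to IAST, as a per-character mapping. The function name is hypothetical, it is not part of vidyut, and it does not handle Vedic accents or other marks outside the basic SLP1 sound set.

```rust
/// Converts SLP1 text to IAST. Characters that are the same in both schemes
/// (and anything unrecognized, such as spaces) are copied through unchanged.
pub fn slp1_to_iast(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        let mapped = match c {
            'A' => "ā", 'I' => "ī", 'U' => "ū",
            'f' => "ṛ", 'F' => "ṝ", 'x' => "ḷ", 'X' => "ḹ",
            'E' => "ai", 'O' => "au",
            'M' => "ṃ", 'H' => "ḥ",
            'K' => "kh", 'G' => "gh", 'N' => "ṅ",
            'C' => "ch", 'J' => "jh", 'Y' => "ñ",
            'w' => "ṭ", 'W' => "ṭh", 'q' => "ḍ", 'Q' => "ḍh", 'R' => "ṇ",
            'T' => "th", 'D' => "dh",
            'P' => "ph", 'B' => "bh",
            'S' => "ś", 'z' => "ṣ",
            other => {
                out.push(other);
                continue;
            }
        };
        out.push_str(mapped);
    }
    out
}
```

For example, the SLP1 strings `ayAni` and `iR gatO` become `ayāni` and `iṇ gatau`.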
To reproduce:
(I guess it's just retaining scroll position and rendering new contents, but it's a bit confusing every time.)
Reference - drdhaval2785/prakriya#119
https://ashtadhyayi.com/dhatu/02.0030?type=ting says that this root takes Atmanepadi suffixes when 'A' is added as prefix.
I wanted to check on the demo at https://ambuda-org.github.io/vidyullekha/?tab=tin&dhatu=02.0030, but could not find a way to check with a given upasarga.
Question - How do I access a verb form with a given upasarga on the demo frontend?
https://rust-lang.github.io/api-guidelines/about.html
When we publish on cargo, we should aim for a minimal API that is maximally permissive. This is a great first issue if you know some Rust already.
गुपूँ रक्षणे (01.0461) is वेट्। Its non-इट् forms in lit lakara viz. जुगोप्थ, जुगुप्व and जुगुप्म are not generated.
Reference: Siddhanta Kaumudi
यजँ (01.1157) dhatu has two forms in lit lakara parasmaipadi: इयजिथ and इयष्ठ. Refer Siddhanta Kaumudi and Madhaviya Dhatu Vritti.
But इयष्ठ is not generated.
Initial feedback from https://groups.google.com/g/sanskrit-programmers/c/si3PmK_hShQ/m/FHXJJ1ZzCAAJ
Also at the end of the prakriya, it would make sense to show the final form as the last step.
Probably simplest to append the final form again, as another row in the table (repeating the form shown above the table).
I tried to work on the demo web app, but things are broken:
➜ ~/w/ambuda/vidyut/vidyut-prakriya git:(main) ✗ make debugger
wasm-pack build --target web
[INFO]: 🎯 Checking for the Wasm target...
[INFO]: 🌀 Compiling to Wasm...
Finished release [optimized + debuginfo] target(s) in 0.05s
[INFO]: License key is set in Cargo.toml but no LICENSE file(s) were found; Please add the LICENSE file(s) to your project directory
[INFO]: ⬇️ Installing wasm-bindgen...
[INFO]: Optimizing wasm binaries with `wasm-opt`...
[INFO]: ✨ Done in 0.96s
[INFO]: 📦 Your wasm pkg is ready to publish at /Users/shreevatsa/w/ambuda/vidyut/vidyut-prakriya/pkg.
cp pkg/* www/static/wasm
cp: www/static/wasm is not a directory
make: *** [debugger] Error 1
After creating that directory:
➜ ~/w/ambuda/vidyut/vidyut-prakriya git:(main) ✗ mkdir -p www/static/wasm
➜ ~/w/ambuda/vidyut/vidyut-prakriya git:(main) ✗ make debugger
wasm-pack build --target web
[INFO]: 🎯 Checking for the Wasm target...
[INFO]: 🌀 Compiling to Wasm...
Finished release [optimized + debuginfo] target(s) in 0.05s
[INFO]: License key is set in Cargo.toml but no LICENSE file(s) were found; Please add the LICENSE file(s) to your project directory
[INFO]: ⬇️ Installing wasm-bindgen...
[INFO]: Optimizing wasm binaries with `wasm-opt`...
[INFO]: ✨ Done in 0.97s
[INFO]: 📦 Your wasm pkg is ready to publish at /Users/shreevatsa/w/ambuda/vidyut/vidyut-prakriya/pkg.
cp pkg/* www/static/wasm
cp data/* www/static/data
cp: www/static/data is not a directory
make: *** [debugger] Error 1
Ok, creating that one too:
➜ ~/w/ambuda/vidyut/vidyut-prakriya git:(main) ✗ mkdir -p www/static/data
➜ ~/w/ambuda/vidyut/vidyut-prakriya git:(main) ✗ make debugger
wasm-pack build --target web
[INFO]: 🎯 Checking for the Wasm target...
[INFO]: 🌀 Compiling to Wasm...
Finished release [optimized + debuginfo] target(s) in 0.05s
[INFO]: License key is set in Cargo.toml but no LICENSE file(s) were found; Please add the LICENSE file(s) to your project directory
[INFO]: ⬇️ Installing wasm-bindgen...
[INFO]: Optimizing wasm binaries with `wasm-opt`...
[INFO]: ✨ Done in 0.96s
[INFO]: 📦 Your wasm pkg is ready to publish at /Users/shreevatsa/w/ambuda/vidyut/vidyut-prakriya/pkg.
cp pkg/* www/static/wasm
cp data/* www/static/data
cd www && source env/bin/activate && python app.py
/bin/sh: env/bin/activate: No such file or directory
make: *** [debugger] Error 1
Indeed there's no `env/bin/activate` inside `www`, so maybe it needs to be done in the other order?
➜ ~/w/ambuda/vidyut/vidyut-prakriya git:(main) ✗ source env/bin/activate
(env) ➜ ~/w/ambuda/vidyut/vidyut-prakriya git:(main) ✗ cd www && python app.py
Traceback (most recent call last):
File "/Users/shreevatsa/w/ambuda/vidyut/vidyut-prakriya/www/app.py", line 1, in <module>
from flask import Flask, render_template
ModuleNotFoundError: No module named 'flask'
Maybe there needs to be a `requirements.txt` and all that, as this app is no longer just a standalone `index.html` file :-)
There are wrong closing tags in the following lines of vidyut-prakriya's `index.html`:
- `</a>` instead of `</div>`
- `<input ... />` instead of `<input>...</input>`, since `input` is a self-closing tag.
- `</div>` instead of `</ol>`
See the lint errors on Prettier playground online.
Environment used was WSL, with no GitHub-specific settings. The Git protocol fails; switching to https works fine.
May be nice to encode state in the URL (general motivation, how to do it with Alpine.js), for easily sharing a specific prakriya or dhātu view with others.
Show the sūtra text too, and not just link to ashtadhyayi.com. This may mean fetching and saving (and serving) the sūtras, just like dhatupatha.tsv.
This was initial feedback from two different people at https://groups.google.com/g/ambuda-discuss/c/n7toiAOQ6wY and https://groups.google.com/g/sanskrit-programmers/c/si3PmK_hShQ/m/J-n1GytzCAAJ so likely more people will have the same feedback.
In the derivation of बुध्यात् (बुधँ अवगमने 01.0994):
Without the presence of at least the क्ङिति च, it is difficult to understand why there is no guNa in the form.
Dr Dhaval's Prakriyāpradarśinī shows क्ङिति च:
If this is intended, I understand.
We currently have reasonable support for karmani prayoga, and I'll also add support for sanAdi pratyayas by the end of the year. We have experimental support for various krdantas and basic support for subantas.
Currently, how are rule conflicts handled in prakriyA simulation? The regular interpretation of विप्रतिषेधे परं कार्यम्, augmented by a web of paribhAShA-s?
Would it be simple to implement an option to resolve such rule conflicts by means of the simpler framework described in Rishi Rajpopat's thesis, which recently entered the news and fascinated / surprised many? This would be enormously valuable in validating the claims made therein, and will likely lead to advances in our understanding of what pANini intended + drawbacks therein.
`MultiFst` should handle duplicate keys and do most of the heavy lifting. `Kosha` should then be a thin wrapper over `MultiFst`.
Use cases:
- `MultiFst` for other kinds of linguistic data
- `MultiFst`, which can be handy for some applications (e.g. creating lightweight FST structs for word lists)
Technical blockers:
- `get_all` should return an iterator instead of a `Vec`
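A minimal sketch of the two ideas above, duplicate keys and a `get_all` that returns an iterator. Everything here is an assumption for illustration: `MultiMap` is a hypothetical stand-in built on the standard library's `BTreeMap` rather than on an FST, and the counter-byte suffix is just one possible way to store duplicates in a structure that only allows unique keys.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a unique-key map that supports duplicate keys
/// by appending a one-byte duplicate counter to each key.
pub struct MultiMap {
    inner: BTreeMap<Vec<u8>, u64>,
}

impl MultiMap {
    pub fn new() -> Self {
        Self { inner: BTreeMap::new() }
    }

    /// Internal key: the caller's key followed by the duplicate counter.
    fn make_key(key: &str, n: u8) -> Vec<u8> {
        let mut k = key.as_bytes().to_vec();
        k.push(n);
        k
    }

    /// Inserts `value` under the first unused counter for `key`.
    pub fn insert(&mut self, key: &str, value: u64) {
        let n = (0..=u8::MAX)
            .find(|&n| !self.inner.contains_key(&Self::make_key(key, n)))
            .expect("too many duplicates for one key");
        self.inner.insert(Self::make_key(key, n), value);
    }

    /// Lazily yields every value stored under `key`, in insertion order,
    /// without collecting them into a `Vec` first.
    pub fn get_all<'a>(&'a self, key: &'a str) -> impl Iterator<Item = u64> + 'a {
        (0..=u8::MAX).map_while(move |n| self.inner.get(&Self::make_key(key, n)).copied())
    }
}
```

Callers that do want a `Vec` can still call `.collect()`, while callers that only need the first match or a count avoid the allocation.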
It would help to store the data for future use to display step by step derivation.
May store as JSON / any other convenient format.
Online data for tinantas and subantas is quite reasonable. But as far as I'm aware, there are no high-quality lists of krdantas. I have started a basic test suite in `basic_krdantas`, but we should add more cases here.
This task requires some knowledge of व्याकरण or else a willingness to go through various grammar books, etc. to determine which forms we should expect.
should change from "Sanksrit software" -> "Sanskrit software"
When multiple forms are generated (e.g. "भवतात् , भवताद् , भवतु"), may be nice to see all their prakriya-s side by side. (Basically make the entire table cell a link/action target, rather than an individual generated form inside a table cell.)
(nice to have but a little trickier on the frontend side)
(Above quoted from #19 — may fit well with #23 if all the inputs (lakāra, puruṣa, vacana…) are the "state".)
We have basic tests in `basic_tinantas`, but these need more work.
The major issue is that we need to correctly implement पुगन्तलघूपधस्य च to account for पुगन्त. Otherwise there are minor issues, e.g. for जागृ guna.
Similar to sites like ashtadhyayi.com. A list of 2000 dhatus is too much to reasonably scroll through, especially since they are in their aupadeshika forms.
In `packing`, I chose a bitfield ordering more or less on a hunch, but I don't think our current ordering works very well because our `modular_bitfield` crate uses an endianness that's different from what I expected.
I think a better ordering or approach here could potentially decrease the size of the FST. My guess is that we might save up to 10% on size, which means more of the FST can be kept in the processor cache.
A good PR here should quantify the size decrease when using a different bitfield ordering.
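When experimenting with orderings, one option is to pack by hand with shifts so that the bit layout is explicit and does not depend on a crate's conventions. The sketch below is only illustrative: the `Entry` fields, their widths, and the function names are hypothetical and not taken from the `packing` module, and it makes no claim about which ordering yields a smaller FST.

```rust
/// Hypothetical fields of one packed entry. Names and widths are
/// illustrative; values are assumed to fit within their widths.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Entry {
    pub kind: u32,    // 2 bits
    pub vacana: u32,  // 2 bits
    pub stem_id: u32, // 20 bits
}

const KIND_BITS: u32 = 2;
const VACANA_BITS: u32 = 2;
const STEM_BITS: u32 = 20;

/// Ordering A: `kind` in the lowest bits, `stem_id` in the highest.
pub fn pack_kind_low(e: Entry) -> u32 {
    e.kind | (e.vacana << KIND_BITS) | (e.stem_id << (KIND_BITS + VACANA_BITS))
}

/// Ordering B: `stem_id` in the lowest bits, `kind` in the highest.
pub fn pack_stem_low(e: Entry) -> u32 {
    e.stem_id | (e.vacana << STEM_BITS) | (e.kind << (STEM_BITS + VACANA_BITS))
}

/// Inverse of `pack_kind_low`.
pub fn unpack_kind_low(bits: u32) -> Entry {
    Entry {
        kind: bits & ((1 << KIND_BITS) - 1),
        vacana: (bits >> KIND_BITS) & ((1 << VACANA_BITS) - 1),
        stem_id: (bits >> (KIND_BITS + VACANA_BITS)) & ((1 << STEM_BITS) - 1),
    }
}
```

The same entry produces very different integers under the two orderings, which is the property a PR would measure against the resulting FST size.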
We have basic tests in `basic_tinantas`, but these need much more work.
As compared to our णिजन्तs, our सन्नन्तs have more quality issues and are not quite as reliable. The major issue is that a lot of small सन् rules haven't been implemented yet, but I'm not sure which ones.
अक्षूँ (01.0742) is ऊदित् dhatu. So, due to स्वरतिसूतिसूयतिधूञूदितो वा 7/2/44, it optionally gets इडागम when followed by a वलादि आर्धधातुक suffix. In case of lit lakara, it should have the following forms:
eka | dvi | bahu |
---|---|---|
आनक्ष | आनक्षतुः | आनक्षुः |
आनष्ठ , आनक्षिथ | आनक्षथुः | आनक्ष |
आनक्ष | आनक्षिव , आनक्ष्व | आनक्षिम , आनक्ष्म |
Similarly, तक्षूँ (01.0743) and त्वक्षूँ (01.0744).
Acc. to Ashtadhyayi Sahajabodha - Part II of Pushpa Dikshit (p.304), all the dhatus given below should also behave similarly. But I have not checked them yet.
Here is a related issue in Dhaval Patel's SanskritVerb project.
स्कन्दिर् (01.1134) is वेट् when followed by थल् due to the भारद्वाजनियम।
Currently only चस्कन्दिथ is generated. (See image below).
But चस्कन्त्थ and चस्कन्थ should also be generated. See Siddhanta Kaumudi for reference.
`CompactString` is a memory-efficient string that can store up to 24 bytes on the stack before making a heap allocation. It's mostly a drop-in replacement for `String`, and you can learn more about it here.
We've had success improving performance by using `CompactString` in `vidyut-prakriya`, and it might also improve performance in `vidyut-cheda`.
A PR here would experiment with `CompactString` and quantify the performance change.