david-allison / manx-corpus-search
License: MIT License
Requested by Max
You could spell out what happens when you put just * in the Search box, either on the Main page, or on the Result-Browse page.
I feel we should show this as placeholder text below the search box. Remove or hide it when results are shown.
Jul 07 14:05:59 ubuntu-s-1vcpu-1gb-lon1-01 systemd[1]: manx-corpus.service: Main process exited, code=killed, status=9/KILL
Jul 07 14:05:59 ubuntu-s-1vcpu-1gb-lon1-01 systemd[1]: manx-corpus.service: Failed with result 'signal'.
Jul 07 14:06:09 ubuntu-s-1vcpu-1gb-lon1-01 systemd[1]: manx-corpus.service: Scheduled restart job, restart counter is at 1.
Jul 07 14:06:09 ubuntu-s-1vcpu-1gb-lon1-01 systemd[1]: Stopped Manx Corpus Search.
Jul 07 14:06:09 ubuntu-s-1vcpu-1gb-lon1-01 systemd[1]: Started Manx Corpus Search.
I noticed that it won't be able to find a definition for 'goll-mygeayrt', but it can for 'goll' and 'mygeayrt'.
Dictionary definitions were tested as a 'nice to have' - it seems they're useful enough to stay (I've had a few positive comments on them).
Currently, we don't provide the server with enough context to provide a translation. We work on the 'word' level, but the client side does not have enough context to either:
The request should be (context, selection) rather than (word). From here, on the server we can process the selection and more accurately determine the correct phrase/word to return.
Test case above (could also be handled at the dictionary-level), but we lose out on phrase selection, and defining recursive/contextual structures.
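A minimal sketch of how a (context, selection) request could be resolved on the server, written in Python for illustration only. The names DefinitionRequest, expand_selection, candidates and define are hypothetical, the dictionary is a plain dict with placeholder entries, and the fallback order (whole hyphenated token first, then its parts) is an assumption rather than the project's actual behaviour.

```python
import re
from dataclasses import dataclass

# Hypothetical request shape: the surrounding text plus the selected range.
@dataclass(frozen=True)
class DefinitionRequest:
    context: str  # the line or sentence around the selection
    start: int    # offset of the first selected character in context
    end: int      # offset just past the last selected character

# Letters, digits, apostrophes and hyphens are treated as part of a token.
TOKEN_CHAR = re.compile(r"[\w'’-]")

def expand_selection(request):
    """Grow the selection outwards until it covers the whole token."""
    text, start, end = request.context, request.start, request.end
    while start > 0 and TOKEN_CHAR.match(text[start - 1]):
        start -= 1
    while end < len(text) and TOKEN_CHAR.match(text[end]):
        end += 1
    return text[start:end]

def candidates(request):
    """Return lookup candidates: the full token, then its hyphen-separated parts."""
    token = expand_selection(request)
    result = [token]
    parts = [part for part in token.split("-") if part]
    if len(parts) > 1:
        result.extend(parts)
    return result

def define(request, dictionary):
    """Return (headword, definition) for the first candidate found, else None."""
    for candidate in candidates(request):
        definition = dictionary.get(candidate.lower())
        if definition is not None:
            return candidate, definition
    return None
```

With this shape, selecting only 'goll' inside 'goll-mygeayrt' still lets the server try the compound first and fall back to the individual words when the compound has no entry.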
I have added ManxCorpus to the Manx-English dictionaries accessible via Multidict
https://multidict.net/multidict/?sl=gv&tl=en&dict=MxCorp&word=snaue
Very easy, since the URL
https://manxcorpus.com/?q=snaue
will search for the Manx word “snaue”.
However, I have bother doing English searches, since I can’t find any URL which will search ManxCorpus for the English word “crawl”, for example. What I would hope for is something like
https://manxcorpus.com/?q=crawl&lang=en
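As an illustration, a link-building helper on the Multidict side might look like the sketch below. The q parameter is taken from the working URL above; the lang parameter is only the one proposed in this request and is not known to be supported, and build_search_url is a hypothetical name.

```python
from urllib.parse import urlencode

BASE_URL = "https://manxcorpus.com/"

def build_search_url(query, lang=None):
    """Build a search link; lang is the proposed (not yet existing) parameter."""
    params = {"q": query}
    if lang is not None:
        params["lang"] = lang
    return f"{BASE_URL}?{urlencode(params)}"

print(build_search_url("snaue"))        # Manx search, as it works today
print(build_search_url("crawl", "en"))  # English search, as proposed
```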
It would be useful to have some clear marking when documents are the original text (as in most of the eighteenth century religious texts which are translations), and when they are new translations from the Manx (Ned Beg for example).
If I search for 'lumlane' and nothing appears, suggest 'lum-lane' (2 results)
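One possible way to produce such a suggestion is to index known words by a hyphen-free key and consult that index when a query returns nothing. The sketch below is in Python for illustration; hyphen_key, build_suggestion_index and suggest are hypothetical names, and the input is assumed to be a mapping from indexed word to result count.

```python
def hyphen_key(word):
    """Normalise a word so that hyphenated and unhyphenated spellings collide."""
    return word.replace("-", "").lower()

def build_suggestion_index(result_counts):
    """Group known words (with their result counts) by hyphen-free key."""
    index = {}
    for word, count in result_counts.items():
        index.setdefault(hyphen_key(word), []).append((word, count))
    return index

def suggest(query, index):
    """Return (word, count) alternatives that differ from the query only by hyphens."""
    return [
        (word, count)
        for word, count in index.get(hyphen_key(query), [])
        if word.lower() != query.lower()
    ]

index = build_suggestion_index({"lum-lane": 2})
print(suggest("lumlane", index))
```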
Fix the commandline in publish.yml
Make the file executable
The main aim is to allow anyone to deploy a fork of the site to DigitalOcean
jq
installversion:'2.7.102.0',appVersion:'5.2.6',uq:'20210325071104',viewPointVersion:'4.28.18017.37845',lastModificationDateString:'2021-03-25-19-11-04'
COM:"Calf of Man Bird Observatory Report",
CEC:"Camp Echo",
CHU:"Camp Humor",
KCZ:"Camp Zeitung",
CTG:"Castletown Gazette",
DSC:"Das Schleierlicht",
MGN:"German Gymnastics Association",
GFL:"Green Final",
HNS:"Holiday News",
IDT:"Isle of Man Daily Times",
IME:"Isle of Man Examiner",
IMT:"Isle of Man Times",
WAC:"Isle of Man Weekly Advertising Circular",
IWG:"Isle of Man Weekly Gazette",
JMM:"Journal of The Manx Museum",
LAE:"Lager Echo",
DLA:"Lager Laterne",
LAZ:"Lager Zeitung",
LAU:"Lager-Ulk",
MNA:"Manks Advertiser",
MNM:"Manks Mercury",
TMC:"Manx Cat",
TFP:"Manx Free Press",
MNB:"Manx Liberal",
MMNT:"Manx Museum and National Trust Report",
MNP:"Manx Patriot",
MRS:"Manx Rising Sun",
MNS:"Manx Star",
TMS:"Manx Sun",
TMN:"Manxman",
MDP:"Mona Daily Programme",
MNH:"Mona's Herald",
PCG:"Peel City Guardian",
PSL:"Peel Sentinel",
QUT:"Quousque Tandem",
RCE:"Ramsey Chronicle",
RYC:"Ramsey Courier",
RWN:"Ramsey Weekly News",
TRS:"Rising Sun",
TTS:"TT Special",
UNU:"Unter Uns",
WER:"Werden"
A newspaper image seems to be in the format: https://www.imuseum.im/Olive/APA/IsleofMan/get/image.ashx?kind=block&href=MNH%2F1833%2F10%2F25&id=Ar0010000&ext=.png
&id=Ar0080001&ext=.png
Where MNH%2F1833%2F10%2F25 = MNH/1833/10/25
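The URL format observed above can be reproduced with standard percent-encoding. This is a sketch in Python, assuming the month and day are always zero-padded to two digits (only one example date is available) and that the block id is already known; newspaper_image_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

IMAGE_ENDPOINT = "https://www.imuseum.im/Olive/APA/IsleofMan/get/image.ashx"

def newspaper_image_url(code, year, month, day, block_id):
    """Build a block image URL from a newspaper code, an issue date and a block id."""
    # Assumption: the href is CODE/YYYY/MM/DD with zero-padded month and day.
    href = f"{code}/{year:04d}/{month:02d}/{day:02d}"
    params = {"kind": "block", "href": href, "id": block_id, "ext": ".png"}
    # urlencode percent-encodes "/" as %2F, matching the observed URL.
    return f"{IMAGE_ENDPOINT}?{urlencode(params)}"

print(newspaper_image_url("MNH", 1833, 10, 25, "Ar0010000"))
```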
==Problems (currently)==
Unknown if these are solvable
sk link which is unique to each page
sk seems to be generated client-side, so we can hopefully reverse-engineer it
JH would prefer for notes to not be visible while reading the corpus.
I agree. It would be sensible to hide notes by default and toggle them if [1] is clicked.
If a linkage between notes and the text can't be found, then the notes should be shown as-is.
We only have 1 trivial warning: might as well get to 0 and keep it that way
'Dhelby' returns results.
'Clague' seems case insensitive
TODO: Add Notes to the main page of the document
Originally posted by @david-allison-1 in david-allison/manx-search-data#475 (comment)
aall -> results
aall -> no results
Helps merchandise features
Takes up RAM, more finicky and verbose than using C#, and we no longer need it now that we've moved to Lucene
Maybe make the help more prominent if nobody has searched, or provide example queries
Requested by Max
Done in the backend - just needs a UI
Initially: en + gv
Then: Different editions (such as in Carrey yn Pheccagh): Two English sources and 1 Manx
We have a lot of corpus information from the Newspaper archives on https://www.imuseum.im/newspapers/.
Ideal:
manifest.json.txt
to a data structure. Add an arbitrary validation check on manx-search-data
This is extremely inefficient and low-hanging fruit; it can lead to timeouts when searching for a wildcard on a given work
TODO: Add iMuseum link
Originally posted by @david-allison-1 in david-allison/manx-search-data#475 (comment)
Now we have a domain, we should focus more on SEO. Lots of low hanging fruit here:
I can't see how to initiate a new issue - but some Ned Beg pieces have '???' in the text - indicating that I wasn't sure of the translation and wanted to discuss with other contributors. When I use ??? in the search field I find other items, and the markers I was looking for are not highlighted.
Originally posted by @robteare in david-allison/manx-search-data#322 (comment)
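A likely cause is that ? is being treated as a single-character wildcard, so '???' matches any three-letter word instead of the literal marker. One option, sketched below in Python for illustration, is to backslash-escape wildcard characters when a literal search is wanted. Backslash escaping is the convention in Lucene's classic query syntax, but whether the site's parser honours it is an assumption, and escape_wildcards is a hypothetical helper.

```python
# Characters to escape; the backslash itself is included so it stays literal.
WILDCARD_CHARS = "\\*?"

def escape_wildcards(query):
    """Prefix each wildcard character with a backslash so it is matched literally."""
    return "".join("\\" + ch if ch in WILDCARD_CHARS else ch for ch in query)

print(escape_wildcards("???"))
```

Escaping alone would not fix highlighting if the analyzer drops punctuation at index time; that would need a separate check.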
This needs a test (may be hard if we don't have dictionaries on disk)
Originally posted by @david-allison in #149 (comment)
Request by Rob: Just use the slider in the Advanced Options
Requested by Max
Currently, manifest.json.txt refers to a single work.
In some cases, we'll have works translated by two different people (eg: Psalms1610).
These may want to be in the same CSV for alignment and comparison (?).
If this is the case, then we'll need to expand manifest.json.txt
Currently: dotnet publish --framework net8.0
Highlighting likely won't be perfect, but we can make it a lot better:
https://github.com/david-allison-1/manx-corpus-search/runs/2550569740?check_suite_focus=true
2021-05-10T23:29:10.8274884Z Determining projects to restore...
2021-05-10T23:29:11.7157857Z Restored /var/www/manx-corpus/CorpusSearch/CorpusSearch.csproj (in 472 ms).
2021-05-10T23:29:12.5325316Z Restored /var/www/manx-corpus/CorpusSearch.Test/CorpusSearch.Test.csproj (in 800 ms).
2021-05-10T23:29:24.2855022Z CorpusSearch -> /var/www/manx-corpus/CorpusSearch/bin/Debug/net5.0/CorpusSearch.dll
2021-05-10T23:29:24.2866213Z CorpusSearch -> /var/www/manx-corpus/CorpusSearch/bin/Debug/net5.0/CorpusSearch.Views.dll
2021-05-10T23:29:24.4409503Z v15.7.0
2021-05-10T23:29:24.4426333Z Restoring dependencies using 'npm'. This may take several minutes...
2021-05-10T23:53:16.3627646Z ##[error]The operation was canceled.
Unsure what's happening here. Works when I ssh into the machine
I like it that the Search throws up Cregeen entry and dictionary entry first. Presumably dictionary here means Phil Kelly’s dictionary, while “Dictionary” top right main page = Cregeen Aa-orderit. It would be helpful to make this clear.
From Max
Using the same terminology as http://bible.learnmanx.com/
If set, send the query with *query*
Initially on YouTube
Requested by Max