approach0 / search-engine
A math-aware search engine.
Home Page: http://approach0.xyz
License: MIT License
I really like the search, but right now it's way too broad. It gives too many useless results.
One thing that would greatly help is a "forced title" setting that forces the title to match the keywords and expressions.
What's nice about the loose math expression search is that variable names don't matter (for example
Broken MathJax in entry 8, copy-pasted for reference:
...Put it this way \int {\frac{x}{{\sqrt { ... t {\frac{{2ax + b}}{{\sqrt {a{x^2} + bx + c} }}dx} - \frac{b}{{2a}}\int {\frac{{dx}}{{\sqrt {a{x^2} + bx + c} }}}
\displaystyle \frac{c}{a} - \frac{{{b^2}}}{{4{a^2}}} < 0 = - ... 2}.
1- https://approach0.xyz/search/?q=test&p=1%22%3E%3Csvg/onload=alert(/test/)%3E
2- https://approach0.xyz/search/?q=%24test%22%3E%3Csvg/onload=alert(/test/)%3E&p=1
Fix: sanitize the input from the parameters q and p at L149-150 of index.php (try htmlspecialchars($string, ENT_QUOTES, 'UTF-8'); and check here).
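For reference, a minimal C sketch of the escaping that htmlspecialchars($string, ENT_QUOTES, 'UTF-8') applies to the five HTML-special characters, which is what neutralizes the "><svg/onload=...> payloads above. The name html_escape is hypothetical; the real fix belongs in the PHP of index.php, and unlike PHP this sketch does not validate UTF-8.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of `in` with
 * & < > " ' replaced by HTML entities, mirroring htmlspecialchars()
 * with ENT_QUOTES. Caller frees the result; NULL on allocation failure.
 * Does NOT validate UTF-8 (PHP does). */
char *html_escape(const char *in)
{
    size_t len = 0;
    const char *p;

    /* First pass: compute the escaped length. */
    for (p = in; *p; p++) {
        switch (*p) {
        case '&':  len += 5; break;   /* &amp;  */
        case '<':  len += 4; break;   /* &lt;   */
        case '>':  len += 4; break;   /* &gt;   */
        case '"':  len += 6; break;   /* &quot; */
        case '\'': len += 6; break;   /* &#039; */
        default:   len += 1; break;
        }
    }

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, substituting entities for special characters. */
    char *q = out;
    for (p = in; *p; p++) {
        const char *ent = NULL;
        switch (*p) {
        case '&':  ent = "&amp;";  break;
        case '<':  ent = "&lt;";   break;
        case '>':  ent = "&gt;";   break;
        case '"':  ent = "&quot;"; break;
        case '\'': ent = "&#039;"; break;
        }
        if (ent != NULL) {
            size_t n = strlen(ent);
            memcpy(q, ent, n);
            q += n;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```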
1- https://approach0.xyz/demo/search-relay.php?p=1&q[]=test
2- https://approach0.xyz/demo/?q[]=test&p=1
Case 1 is caused by strlen($qry_str) at L45 of search-relay.php when q is passed as an array (q[]=test). To fix it, use is_scalar when checking $_GET['q'].
MathJax v3 is coming out, and I am thinking of updating the render library A0 currently uses. This should give a better search experience, since rendering a page of search results currently takes too long, and v3 feels much faster than the previous version. KaTeX is also fast, but when I was choosing A0's search-result render library, I found it could not handle as much content as robustly as MathJax.
In the new infrastructure based on Docker Swarm, the SSH connection randomly breaks, which causes the search daemons to restart. Does anyone know how to narrow the problem down further?
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,2]<stdout>: [1,2]<stdout>:[ 0] (in memo) prefix/VAR/BASE (pf=138766737, ipf=1.86)
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,2]<stdout>: [1,2]<stdout>:[ 1] (in memo) prefix/NUM/SUPSCRIPT (pf=6633044, ipf=4.90)
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,2]<stdout>: [1,2]<stdout>:[ 2] (on disk) prefix/VAR/BASE/HANGER (pf=9006427, ipf=4.60)
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,2]<stdout>: [1,2]<stdout>:[ 3] (in memo) prefix/NUM/SUPSCRIPT/HANGER (pf=138766395, ipf=1.86)
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,2]<stdout>: [1,2]<stdout>:[ 4] (on disk) prefix/VAR/BASE/HANGER/SIGN (pf=6569091, ipf=4.91)
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,2]<stdout>: [1,2]<stdout>:[ 5] (on disk) prefix/VAR/BASE/HANGER/TIMES (pf=8928046, ipf=4.60)
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,2]<stdout>: [1,2]<stdout>:[ 6] (on disk) prefix/NUM/SUPSCRIPT/HANGER/SIGN (pf=15976531, ipf=4.02)
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,2]<stdout>: [ 7] (on disk) prefix/VAR/BASE/HANGER/SIGN/ADD (pf=30797021, ipf=3.37)
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,2]<stdout>: [ 8] (on disk) prefix/VAR/BASE/HANGER/TIMES/SIGN (pf=3410477, ipf=5.57)
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,2]<stdout>: [ 9] (on disk) prefix/NUM/SUPSCRIPT/HANGER/SIGN/ADD (pf=11096355, ipf=4.39)
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,2]<stdout>: [ 10] (on disk) prefix/VAR/BASE/HANGER/TIMES/SIGN/ADD (pf=9205922, ipf=4.57)
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,1]<stdout>:merge time cost: 2411 msec.
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,0]<stdout>:merge time cost: 3240 msec.
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,1]<stdout>:Query handle cost: 3390 msec.
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,1]<stdout>:
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,3]<stdout>:merge time cost: 4611 msec.
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,2]<stdout>:merge time cost: 13286 msec.
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,3]<stdout>:Query handle cost: 13932 msec.
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,3]<stdout>:
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,2]<stdout>:Query handle cost: 13937 msec.
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,2]<stdout>:
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,0]<stdout>:Query handle cost: 13984 msec.
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,0]<stdout>:
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | Connection to blue-shard3 closed by remote host.
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | --------------------------------------------------------------------------
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | ORTE was unable to reliably start one or more daemons.
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | This usually is caused by:
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 |
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | * not finding the required libraries and/or binaries on
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | one or more nodes. Please check your PATH and LD_LIBRARY_PATH
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | settings, or configure OMPI with --enable-orterun-prefix-by-default
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 |
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | * lack of authority to execute on one or more specified nodes.
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | Please verify your allocation and authorities.
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 |
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | Please check with your sys admin to determine the correct location to use.
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 |
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | * compilation of the orted with dynamic libraries when static are required
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | (e.g., on Cray). Please check your configure cmd line and consider using
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | one of the contrib/platform definitions for your system type.
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 |
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | * an inability to create a connection back to mpirun due to a
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | lack of common network interfaces and/or no route found between
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | them. Please check network connectivity (including firewalls
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | and network routing requirements).
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | --------------------------------------------------------------------------
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,2]<stdout>:node[2] closing index...
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,1]<stdout>:node[1] closing index...
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,3]<stdout>:node[3] closing index...
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,0]<stdout>:
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,0]<stdout>:shutdown httpd...
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,0]<stdout>:node[0] closing index...
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | [1,0]<stderr>:Caught signal: 15
blue_mpirun.1.alj956wxat8g@calabash-admin-vJesW0sEF6 | + set +x
I have built and pulled the forked version of MathQuill from @TheSil, see: 8c51528
Now \lVert, \lvert, \langle and \rangle can be entered successfully.
And \dots is handled correctly after updating KaTeX.
Anyone is welcome to keep adding things to the list and send a pull request to the forked MathQuill.
@TheSil Thank you again for your help. Feel free to keep working on your forked repo and expand the "subs" there if you find that other things are missing from that list. I would be more than willing to build your source and merge it into the Approach0 online demo.
Add timing output to help us identify which part is the bottleneck when a query is slow.
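A minimal sketch of per-stage timing in C, assuming POSIX clock_gettime() with CLOCK_MONOTONIC is available. The helpers stage_begin/stage_end are hypothetical, and the printed format only imitates the "merge time cost: ... msec." lines that already appear in the daemon logs.

```c
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime() */
#include <stdio.h>
#include <time.h>

/* Milliseconds elapsed since `start`, using a monotonic clock so that
 * wall-clock adjustments (NTP, manual changes) do not skew the result. */
static double ms_since(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - start->tv_sec) * 1e3 +
           (double)(now.tv_nsec - start->tv_nsec) / 1e6;
}

/* Hypothetical helper: record the start time of a query stage. */
static void stage_begin(struct timespec *t)
{
    clock_gettime(CLOCK_MONOTONIC, t);
}

/* Hypothetical helper: print and return the cost of the named stage. */
static double stage_end(const struct timespec *t, const char *name)
{
    double ms = ms_since(t);
    printf("%s time cost: %.0f msec.\n", name, ms);
    return ms;
}
```

Wrapping each stage (posting-list lookup, merge, result rendering) this way makes it easy to see which one dominates a slow query.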
The webpage says
Opps! Server is down right now, but will be back shortly. (return code #101)
@t-k- Dear sir, thanks a lot for your contribution. Since the notes aren't very detailed, I wonder how I can use your code if I want to search formulas from my local database.
I will be grateful for your reply.
Currently we use the dir_search_podfs() function (podfs is short for post-order DFS) to search directories. We need to add a function (maybe named dir_search_bfs) that does BFS over directories. This should improve the effectiveness of search results, because we are more likely to get more results within a small depth delta.
relevant files: $PROJECT/dir-util/dir-util.[ch]
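A minimal sketch of what a dir_search_bfs() could look like, assuming a callback-style interface. The signature, dir_visit_fn, and the max_depth parameter are hypothetical; dir-util.[ch] was not consulted, so this is not the existing dir_search_podfs() API.

```c
#define _DEFAULT_SOURCE            /* expose strdup() on glibc */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Called once per directory in breadth-first order; return non-zero to stop. */
typedef int (*dir_visit_fn)(const char *path, int depth, void *arg);

/* FIFO queue node holding a directory still to be visited. */
struct dir_qnode {
    char *path;
    int depth;
    struct dir_qnode *next;
};

/* Hypothetical dir_search_bfs(): visit `root`, then every sub-directory at
 * depth 1, then depth 2, and so on, up to `max_depth` (0 = root only).
 * Returns the number of directories visited. On allocation failure the
 * affected entry is simply skipped, to keep the sketch short. */
int dir_search_bfs(const char *root, int max_depth, dir_visit_fn fn, void *arg)
{
    struct dir_qnode *head = malloc(sizeof *head), *tail = head;
    int visited = 0, stop = 0;

    if (head == NULL || (head->path = strdup(root)) == NULL) {
        free(head);
        return 0;
    }
    head->depth = 0;
    head->next = NULL;

    while (head != NULL) {
        struct dir_qnode *cur = head;          /* dequeue the oldest entry */
        head = cur->next;
        if (head == NULL)
            tail = NULL;

        if (!stop) {
            visited++;
            if (fn != NULL && fn(cur->path, cur->depth, arg))
                stop = 1;                      /* just drain the queue from now on */
        }

        DIR *dir = (!stop && cur->depth < max_depth) ? opendir(cur->path) : NULL;
        struct dirent *ent;
        while (dir != NULL && (ent = readdir(dir)) != NULL) {
            if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
                continue;
            size_t n = strlen(cur->path) + strlen(ent->d_name) + 2;
            char *child = malloc(n);
            struct stat st;
            if (child == NULL)
                continue;
            snprintf(child, n, "%s/%s", cur->path, ent->d_name);
            struct dir_qnode *q = NULL;
            if (stat(child, &st) == 0 && S_ISDIR(st.st_mode))
                q = malloc(sizeof *q);
            if (q == NULL) {                   /* not a directory, or no memory */
                free(child);
                continue;
            }
            q->path = child;                   /* enqueue at the tail */
            q->depth = cur->depth + 1;
            q->next = NULL;
            if (tail != NULL)
                tail->next = q;
            else
                head = q;
            tail = q;
        }
        if (dir != NULL)
            closedir(dir);
        free(cur->path);
        free(cur);
    }
    return visited;
}
```

Because the queue is FIFO, every directory at depth d is visited before any directory at depth d+1, so results within a small depth delta come first. The max_depth bound also stops the walk from looping forever if symlinks form a cycle, since stat() follows links.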
I have not used the AoPS crawler for several months. However, today (since I am playing with Docker Swarm) I ran into an AoPS crawler issue:
./crawler-artofproblemsolving.com.py -n 0 -o 3650 -c 3
WARNING: Couldn't write lextab module <module 'slimit.lextab' from '/usr/local/lib/python3.7/dist-packages/slimit/lextab.py'>. Won't overwrite existing lextab module
WARNING: yacc table file version is out of date
WARNING: Token 'BLOCK_COMMENT' defined, but not used
WARNING: Token 'CLASS' defined, but not used
WARNING: Token 'CONST' defined, but not used
WARNING: Token 'ENUM' defined, but not used
WARNING: Token 'EXPORT' defined, but not used
WARNING: Token 'EXTENDS' defined, but not used
WARNING: Token 'IMPORT' defined, but not used
WARNING: Token 'LINE_COMMENT' defined, but not used
WARNING: Token 'LINE_TERMINATOR' defined, but not used
WARNING: Token 'SUPER' defined, but not used
WARNING: There are 10 unused tokens
WARNING: Couldn't create <module 'slimit.yacctab' from '/usr/local/lib/python3.7/dist-packages/slimit/yacctab.py'>. Won't overwrite existing tabmodule
[curl] /community/
Traceback (most recent call last):
File "./crawler-artofproblemsolving.com.py", line 483, in <module>
main(sys.argv)
File "./crawler-artofproblemsolving.com.py", line 466, in main
extra_opt)
File "./crawler-artofproblemsolving.com.py", line 350, in crawl_category_topics
for category, topic, e in list_category_topics(category, newest, oldest, c):
File "./crawler-artofproblemsolving.com.py", line 264, in list_category_topics
session = parsed['AoPS.session']
TypeError: 'NoneType' object is not subscriptable
It looks like AoPS has changed its API. It could also be that my network is blocking AoPS; I have not tested that yet.
@TheSil If you get time, please help me and see if you can reproduce this issue, thanks!
Dear author, excuse me, I have two questions:
If I want to feed the operator tree generated by tex-parser into my Python program for other applications, what should I do? Maybe I can use the pre-order and in-order outputs to reconstruct the original tree in Python, but how should I modify the code to get the in-order output? Thanks for your help.
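Pre-order plus in-order output is enough to rebuild a tree only if the tree is binary (in-order is not well defined for nodes with more than two children) and every node has a unique label; operator-tree tokens such as + repeat, so node IDs would have to be used instead. Under those assumptions, a minimal recursive sketch in C follows (the same logic ports directly to Python). The names op_node and build_tree are hypothetical and not part of tex-parser.

```c
#include <stdlib.h>

/* Binary tree node labelled with a unique integer ID (hypothetical). */
struct op_node {
    int id;
    struct op_node *left, *right;
};

/* Rebuild a tree from its pre-order and in-order ID sequences, both of
 * length n. The first pre-order ID is the root; its position k in the
 * in-order sequence means k IDs belong to the left subtree and the rest
 * to the right subtree. Returns NULL for n == 0 or inconsistent input. */
struct op_node *build_tree(const int *pre, const int *in, int n)
{
    if (n <= 0)
        return NULL;

    int k = 0;                                  /* root position in `in` */
    while (k < n && in[k] != pre[0])
        k++;
    if (k == n)
        return NULL;                            /* root not found: bad input */

    struct op_node *root = malloc(sizeof *root);
    if (root == NULL)
        return NULL;
    root->id = pre[0];
    root->left = build_tree(pre + 1, in, k);
    root->right = build_tree(pre + 1 + k, in + k + 1, n - k - 1);
    return root;
}

/* Release a tree built by build_tree(). */
void free_tree(struct op_node *t)
{
    if (t == NULL)
        return;
    free_tree(t->left);
    free_tree(t->right);
    free(t);
}
```

For a + b * c with IDs + = 1, a = 2, * = 3, b = 4, c = 5, the pre-order is 1 2 3 4 5 and the in-order is 2 1 4 3 5. Each call scans linearly for the root, so the worst case is O(n^2); a map from ID to in-order position would make it O(n).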
I thought I would create an issue for this. Sometimes it would be nice to filter which sources I want to display (some other MSE users expressed this request).
In the server-side PHP script, log the client IP for later traffic analysis (in the future we may add an IP-blocking function based on this).
relevant files: demo/web/search-relay.php
Without HSTS, or with HSTS that is not preloaded, a user must first connect to the website to receive the HSTS policy, and only later connections are forced onto HTTPS; that insecure first step does not happen if the domain is on the HSTS preload list. Right now there are problems getting the domain added to the HSTS preload list (which is built into all major browsers, from Chromium to Firefox):
https://securityheaders.io/?q=approach0.xyz&followRedirects=on
https://hstspreload.org/?domain=approach0.xyz
Error 1: No HSTS header is present on the response.
The website doesn't have HSTS to begin with :]
Error 2: Too many redirects: there are more than 3 redirects starting from http://approach0.xyz.
Error 3: Insecure redirect: http://approach0.xyz
redirects to an insecure page on redirect #2: http://approach0.xyz/search
Error 4: Insecure redirect: https://approach0.xyz
redirects to an insecure page: http://approach0.xyz/search
Even apart from HSTS, fixing these redirect errors will make page loads faster.
The search engine can't match, e.g., this:
\arcsin{\frac{a}{b}}
with this:
\arcsin{\left(\frac{a}{b}\right)}
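C library functions called from the base library can be replaced with simple wrapper functions. For example:
for printf, if the library uses printf frequently, it is recommended to replace it with something like the following (needs <stdio.h> and <stdarg.h>):
static inline int lib_printf(const char *fmt, ...) { va_list ap; va_start(ap, fmt); int n = vprintf(fmt, ap); va_end(ap); return n; }
The benefit is that when others use the library, they can customize whether it produces output.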
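To make that benefit concrete, a minimal sketch of a lib_printf wrapper whose output the host application can redirect or silence at run time. The hook type lib_print_fn and the setter lib_set_print are hypothetical names, not existing Approach0 code.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical output hook with vprintf()'s signature; NULL = silent. */
typedef int (*lib_print_fn)(const char *fmt, va_list ap);

/* Default behaviour: print to stdout, like the plain printf() calls did. */
static lib_print_fn lib_print_hook = vprintf;

/* Let the application redirect the library's output, or pass NULL to mute it. */
void lib_set_print(lib_print_fn fn)
{
    lib_print_hook = fn;
}

/* Every printf() inside the library goes through this wrapper instead. */
int lib_printf(const char *fmt, ...)
{
    int n = 0;
    if (lib_print_hook != NULL) {
        va_list ap;
        va_start(ap, fmt);
        n = lib_print_hook(fmt, ap);
        va_end(ap);
    }
    return n;
}
```

A caller that wants the library quiet only needs lib_set_print(NULL); one that wants a log file can pass its own function that calls vfprintf().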
Hello! I was trying to access http://approach0.xyz/ecir2019/ to reproduce the ECIR 2019 NTCIR math formula retrieval results.
When I was trying to access the above webpage, I got a "502 bad gateway" error.
Can I download the resources elsewhere, especially the corpus.txt file, so that I can run the indexer?
Thanks!
I'd recommend having some ability to force searches to return only pages where certain words are found. For instance, when I searched for
Dieudonne absolutely convergent series
I only want to see results that contain the word "dieudonne", because I'm specifically looking for an explanation of one of his theorems, but I get many search results that don't mention him. Good support for exact and approximate matches, "and", "or", and other advanced search features would be helpful.
Thanks for developing this!
Best wishes,
Adam
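A "must contain this word" filter, as requested here and in the "forced title" suggestion above, is commonly implemented by intersecting the candidate documents with the required term's posting list. A minimal sketch in C, assuming both lists hold ascending document IDs; intersect_sorted is a hypothetical helper, not part of Approach0.

```c
#include <stddef.h>

/* Hypothetical helper: write the IDs present in both ascending lists
 * `a` and `b` to `out` (capacity at least the shorter list's length)
 * and return how many were written. Linear merge, O(na + nb). */
size_t intersect_sorted(const unsigned *a, size_t na,
                        const unsigned *b, size_t nb, unsigned *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;                    /* candidate lacks the required term */
        } else if (a[i] > b[j]) {
            j++;
        } else {
            out[k++] = a[i];        /* in both lists: keep the document */
            i++;
            j++;
        }
    }
    return k;
}
```

For example, candidates 2 5 8 13 21 intersected with a required term's list 1 5 13 30 leave only documents 5 and 13.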
Visiting http://approach0.xyz does not redirect to https://approach0.xyz/search/; instead, Chrome downloads a file called "download" with the following content:
<?php
header("HTTP/1.1 301 Moved Permanently");
header("Location: https://approach0.xyz/search/");
?>
In IE, the page just displays the code above as plain text.
Even though we provide users with a MathQuill math edit box, users who do not know TeX still need a "table" to look up symbols and functions, and we will make these symbols clickable:
when the user clicks a symbol or function in the table, it is sent to the query box.
relevant files: demo/web/index.html
With the Debian debian:buster image, A0 drops the part of a string that follows certain UTF-8 characters.
Example:
# docker run -it -p 8921:8921 -v `pwd`/../indexerd/tmp:/mnt/index a0 searchd.out -i /mnt/index -c0 -C0
$ curl -X POST http://localhost:8921/search --header "Content-Type: application/json" -d '{"ip":"127.0.0.1","page":1,"kw":[{"type":"tex","str":"1+2+\u2026+100"}]}'
Output:
[inverted lists]
[0] (level 2) 9.90 `1+2+' (TeX, upp=9.90, th=0.77)
[ 0] (on disk) prefix/ONE/SIGN (pf=4, ipf=4.90)
[ 1] (on disk) prefix/NUM/SIGN (pf=8, ipf=4.20)
[ 2] (on disk) prefix/ONE/SIGN/ADD (pf=4, ipf=4.90)
[ 3] (on disk) prefix/NUM/SIGN/ADD (pf=8, ipf=4.20)
Here the TeX string is wrong: it should be `1+2+…+100', which is what we get outside the container.
Currently only in-memory term posting list and Indri index are compressed.
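For context, one common way to compress on-disk posting lists is to store the gaps between ascending document IDs in variable-byte form. The sketch below illustrates that technique only; it is not the codec Approach0 uses, and vbyte_encode/vbyte_decode are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode ascending doc IDs as gaps, each gap in variable-byte form:
 * 7 payload bits per byte, high bit set on every byte except the last.
 * `out` needs room for up to 5 bytes per ID. Returns bytes written. */
size_t vbyte_encode(const uint32_t *ids, size_t n, uint8_t *out)
{
    size_t pos = 0;
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t gap = ids[i] - prev;   /* dense lists give small gaps */
        prev = ids[i];
        while (gap >= 0x80) {
            out[pos++] = (uint8_t)((gap & 0x7F) | 0x80);  /* more bytes follow */
            gap >>= 7;
        }
        out[pos++] = (uint8_t)gap;      /* last byte: high bit clear */
    }
    return pos;
}

/* Inverse of vbyte_encode(); assumes well-formed input and stops at a
 * truncated trailing gap. Returns the number of IDs decoded. */
size_t vbyte_decode(const uint8_t *in, size_t len, uint32_t *ids)
{
    size_t n = 0, pos = 0;
    uint32_t prev = 0;
    while (pos < len) {
        uint32_t gap = 0;
        int shift = 0;
        while (pos < len && (in[pos] & 0x80)) {
            gap |= (uint32_t)(in[pos++] & 0x7F) << shift;
            shift += 7;
        }
        if (pos == len)
            break;                      /* truncated input */
        gap |= (uint32_t)in[pos++] << shift;
        prev += gap;
        ids[n++] = prev;
    }
    return n;
}
```

With IDs 3, 130, 131, 20000 the gaps are 3, 127, 1, 19869, which take 1 + 1 + 1 + 3 = 6 bytes instead of 16 bytes as raw 32-bit integers.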