penrose / arxiv-miner
analysis of arxiv equations and figures to guide Penrose development
heyo @dormaayan! I noticed that we have a function that returns a boolean saying whether a paper is a math paper, based on finding the "math." tag in its subjects:
# Given a paper name, return whether it is a math paper or not
def isMathPaper(paperName):
    return subjects[paperName].count("math.") > 0
Is there any reason to limit our set at this point? The categories are pretty fuzzy overall, maybe just look at all of them?
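For what it's worth, here is a rough sketch of what "look at all of them" could mean: tally the top-level category of every paper instead of filtering on "math.". The `subjects` dict below is a made-up stand-in, and I'm assuming each value is a space-separated string of tags; the notebook's real structure may differ. `top_level_categories` and `category_counts` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical stand-in for the notebook's `subjects` mapping.
# Assumption: each value is a space-separated string of arXiv tags.
subjects = {
    "paperA": "math.AG math.NT",
    "paperB": "nucl-th",
    "paperC": "cs.LG math.OC",
}

def top_level_categories(paper_name):
    """Return the set of top-level archives (e.g. 'math', 'cs') for a paper."""
    tags = subjects[paper_name].split()
    return {tag.split(".")[0] for tag in tags}

def category_counts():
    """Count how many papers fall under each top-level category."""
    counts = Counter()
    for name in subjects:
        counts.update(top_level_categories(name))
    return counts
```

With a tally like this we could still slice out the math papers later without throwing the others away up front.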
Hey @dormaayan ! You might be sleeping, but I have another question for you when you are back! I see that we have a number of pages file, and we read from there with csvreader:
npagesFile = "npages.csv"
but this was derived outside of the notebook (and I don't see where). Did you get this from the metadata via the arxiv API? E.g., here is the arxiv_comment:
{'affiliation': 'None',
'arxiv_comment': '4 pages, 1 figure, presented at the 37th International Symposium on\n Multiparticle Dynamics in Berkeley',
'arxiv_primary_category': {'scheme': 'http://arxiv.org/schemas/atom',
'term': 'nucl-th'},
Or did you derive it via some length metric of the actual LaTeX (or something else)? I'm going to do the entire thing (on a per-paper basis) in one schwoop, so I won't have any summary csv file (and it wouldn't be feasible, given the number of papers!)
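If the page count does come from the metadata, one way to get it per paper would be a regex over the arxiv_comment string. This is only a sketch under that assumption: `pages_from_comment` is a hypothetical helper, not something in the notebook, and plenty of comments won't state a page count at all, so it returns None in that case.

```python
import re

# Matches e.g. "4 pages" or "1 page"; assumes the count is written in digits.
PAGES_RE = re.compile(r"(\d+)\s*pages?\b", re.IGNORECASE)

def pages_from_comment(comment):
    """Return the page count stated in an arxiv_comment, or None if absent."""
    if not comment:
        return None
    match = PAGES_RE.search(comment)
    return int(match.group(1)) if match else None

# The comment text from the metadata entry quoted above.
entry = {
    "arxiv_comment": "4 pages, 1 figure, presented at the 37th International "
                     "Symposium on\n Multiparticle Dynamics in Berkeley",
}
n_pages = pages_from_comment(entry["arxiv_comment"])
```

Since this runs on one entry at a time, it would fit the per-paper approach without needing a separate npages.csv.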
Is the following tex the only way that a figure can be represented?
thing = "\\begin{tikzpicture}"
I have used the same (with figure) in some papers, and I saw this commented out:
#thing = "\\begin{figure}"
@dormaayan are you only interested in the first tag? There are definitely figure tags in there. If it's the case that the figures (graphics) aren't included this would not represent the true number of figures in the paper, no?
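To make the question concrete, here is a sketch of counting several markers at once instead of a single `thing`. The choice of markers (figure, tikzpicture, includegraphics) is my assumption about what we would want to count, and the comment stripping is a simplification: it treats any unescaped % as the start of a comment and ignores verbatim environments.

```python
import re

# Markers to count; this set is an assumption, not the notebook's definition.
FIGURE_PATTERNS = {
    "figure": re.compile(r"\\begin\{figure\*?\}"),
    "tikzpicture": re.compile(r"\\begin\{tikzpicture\}"),
    "includegraphics": re.compile(r"\\includegraphics\b"),
}

def strip_comments(tex):
    """Drop LaTeX comments: an unescaped % through the end of the line."""
    return re.sub(r"(?<!\\)%.*", "", tex)

def count_figure_markers(tex):
    """Return a dict of marker name -> number of occurrences outside comments."""
    body = strip_comments(tex)
    return {name: len(pat.findall(body)) for name, pat in FIGURE_PATTERNS.items()}

sample = r"""
\begin{figure}
  \includegraphics[width=\linewidth]{plot.pdf}
\end{figure}
% \begin{figure} commented out
\begin{figure*}
  \begin{tikzpicture}
    \draw (0,0) -- (1,1);
  \end{tikzpicture}
\end{figure*}
"""
marker_counts = count_figure_markers(sample)
```

Note that a tikzpicture nested inside a figure shows up under both keys, so the counts should be read as markers rather than as distinct figures.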
Now that we have Stanford servers for our usage, we might want to ask Paul (who provided us with what we have now) for the full arXiv, including the actual figures, which are currently missing.
It might be in our interest to also explore the figures themselves in the near future.