Giter Club home page Giter Club logo

Comments (15)

goodlux avatar goodlux commented on September 17, 2024 1

I'm running it on this directory:
https://github.com/pytorch/pytorch/tree/master/caffe2/operators
I can't tell which file it is choking on though. Also, trying to use different Doxygen directives from
"exhaleDoxygenStdin", but I'm not sure what format put these in.

from exhale.

goodlux avatar goodlux commented on September 17, 2024 1

Hi @svenevs! Circling back to this; I wanted to reach out to you about some related questions, but this probably isn't the best thread. How can I contact you directly? My email: [email protected]

Thanks!

from exhale.

ddemidov avatar ddemidov commented on September 17, 2024 1

Could the issue be solved by changing how the file names are generated? I got the same error when generating docs for a heavily templated library. Changing node file names from mangled name to SHA1 of the mangled names seems to solve the issue for me:

diff --git a/exhale/graph.py b/exhale/graph.py
index 938679b..7d823f8 100644
--- a/exhale/graph.py
+++ b/exhale/graph.py
@@ -18,6 +18,7 @@ import sys
 import codecs
 import itertools
 import textwrap
+import hashlib
 from bs4 import BeautifulSoup
 
 try:
@@ -2021,7 +2022,10 @@ class ExhaleRoot(object):
         # create the file and link names
         node.file_name = os.path.join(
             self.root_directory,
-            "{kind}_{name}.rst".format(kind=node.kind, name=html_safe_name)
+            "{sha}.rst".format(sha=hashlib.sha1(
+                "{kind}_{name}".format(
+                    kind=node.kind, name=html_safe_name).encode()
+                ).hexdigest())
         )
         node.link_name = "{kind}_{name}".format(
             kind=utils.qualifyKind(node.kind).lower(),

from exhale.

ddemidov avatar ddemidov commented on September 17, 2024 1

The project in question is https://github.com/ddemidov/amgcl; I've just setup a branch that tests exhale here: https://github.com/ddemidov/amgcl/tree/docs-exhale, ddemidov/amgcl@1d399ab.

from exhale.

ddemidov avatar ddemidov commented on September 17, 2024 1

Thank you for the pointers! I can confirm that #35 works, and that letting exhale run doxygen makes things better (see ddemidov/amgcl@954bcbf).

from exhale.

svenevs avatar svenevs commented on September 17, 2024

Oh God. It seems that somehow the name of the file is set as the documentation of the function?

It appears to be for a function, specifically some kind of operator in namespace caffe2? Does that seem to match anything? If so, please post the c++ code (including method documentation).

from exhale.

svenevs avatar svenevs commented on September 17, 2024

Like you may be able to find the documentation with selective grepping. Most of the underscores there are from spaces, so maybe grep -ri "Our operator expects that" include/ or something. Just don't make the phrase too long because if the docstrings you search for is split across lines grep won't find it.

from exhale.

goodlux avatar goodlux commented on September 17, 2024

Also, I'm running this on a mac, so the long filename might be an OS dependent issue?

from exhale.

svenevs avatar svenevs commented on September 17, 2024

Technically this is operating system dependent, but every filesystem I know of has a maximum file length encoded, typically 1024 or 2048 characters.

  1. The problem comes from here: https://github.com/pytorch/pytorch/blob/1ca6e77615ac0020df97ca237b1af542ad1d4bf7/caffe2/operators/percentile_op.cc#L85-L106

  2. The circumstances of your code base may not be amenable to Exhale. Doxygen will process all of these files, so both .h files and .cc files. There are a few things at play here:

    • Doxygen has trouble with macros that are not called with a semicolon. So even simple macros get skipped, and you'll (very very unfortunately) need to predefine all of the metaprogramming you are doing.
    • What makes this hard is I don't really know which version of OPERATOR_SCHEMA is going to be used. And even if you did, enumerating all of the expansions for every item would probably be an exercise in futility.

To be honest, I don't think Doxygen can handle your framework, meaning Exhale / Breathe cannot either. You can add a line GENERATE_HTML = YES to exhaleDoxygenStdin. Next to the xml/ directory there should be an html directory, you should open that up and see what doxygen is doing, but most likely those docs are also completely broken.

Some possible workarounds:

  1. At the expense of having no documentation, skip anything that is breaking. Exhale inserts two compile-time directives to help with this:

    exhale/exhale/configs.py

    Lines 898 to 899 in d5a6adb

    PREDEFINED = DOXYGEN_DOCUMENTATION_BUILD
    PREDEFINED += DOXYGEN_SHOULD_SKIP_THIS

    There's no anchor for it, but scroll to the PREDEFINED section at the bottom of the "mastering" doxygen docs. The idea is to #ifdef away code that breaks docs compilation (e.g., forward declarations).

  2. That won't really save you, because in the end the metaprogramming here with raw string literal docs is great for python, but not for C++. Specifically, Doxygen will never pick those up as documentation strings, since docstrings have to be comments starting before the declaration (/**, /!, or ///). Changing the metaprogramming for adding documentation is technically possible, but would be a huge pain in the ass and probably break all of the python documentation (since even if you got it working, you won't be able to get that into raw string literals now for the python docs).

    So basically the main friction here is that the current documentation is setup for python help strings, and as a consequence no C++ documentation parser will succeed in extracting those. Traditionally, you have all of the docstrings in .h files. So you could add to exhaleDoxygenStdin something like EXCLUDE_PATTERNS = *.cc *.cu to make it only process the .h files.

    But then you'd need to add documentation to all of the .h files.

I hope that wasn't too much information. The main problem you have here is the tension between "documentability" in C++ vs Python help strings. Solving the C++ problem, unfortunately, will result in a large amount of duplicated documentation, one for the C++ docstring, and one for Python. I think that will ultimately be a recipe for disaster.

Although I hate to discourage you from using Exhale,

  1. Metaprogramming and Doxygen is hard to get right, and the way I create filenames creates problems for SFINAE stuff. I need to use proper C++ name mangling for the filenames, or else SFINAE templates will also generate names that are too long :/
  2. Where very large projects are concerned, Breathe will actually be a bottleneck for you (especially on ReadTheDocs). There's an extensive discussion about that on breathe/#315, but the gist: the breathe parser needs to be updated because of memory leaks with minidom. I've actually started work on updating it, but it's a beast of an update that I won't be able to fix for quite some time.

All that said, I'm happy to help you hack through things if you want to keep going with it! Feel free to ask more questions / clarifications on the above!

from exhale.

goodlux avatar goodlux commented on September 17, 2024

This is definitely not too much info, all very helpful. There's a lot of complexity and history in this code, but you definitely got to the heart of the problem -- passing the docstrings through C++ code, while at the same time still being able to extract the information out. That is the question I'm really trying to resolve...or maybe it is actually two questions.

  1. How can you document a function completely outside of the c++ codebase (say in markdown or XML) then import that information into c++ and have it as part of the c++ documentation.
  2. ...And then...have that same documentation pass through to python (pybind) for code completion and docstring on the command line as well as python docs.

There hasn't been a solid effort yet to document these newer features...it is still in a very formative. There is a version of these docs in Doxygen already

https://caffe2.ai/doxygen-c/html/classes.html

But it isn't the most user-friendly interface.

I was hoping to use the Breathe/Exhale for the parts of the c++ code that weren't passing dynamic code around though marcos...the major structural parts, or parts that wouldn't be implemented in python.

I like the idea of using the PREDEFINED forward declarations to block off much of the code and start by documenting just the most important well-defined parts. It looks great when I run it on just single input file (pool_op.cc), and I'd like to get to the point where there is that kind of clarity across the project. Also using just the .h files sounds promising. but I wasn't sure...I think there might be an issue with lack of maintainability there...will people actually keep up the header files if they are working in .cc?

As far as hacking through this, I did manage to get a bit further, past percentile_op.cc and a few others like it, but then some different issues popped up. I think I might start by limiting the code that is ingested into Doxygen, then adding in slowly.

So I will definitely have some more questions as I go...Thank you!

from exhale.

svenevs avatar svenevs commented on September 17, 2024

Hi @ddemidov,

Thanks for resurfacing this, I need to fix this ASAP. Is your project public? I want to keep human friendly filenames where possible, but for simplicity templates will get special cased because the current scheme is absurdly naive. Doxygen already produces a unique hash, so I want to use that.

I'd like to test using your code if possible to make sure the intended fix actually works before doing a release.

from exhale.

svenevs avatar svenevs commented on September 17, 2024

Excellent, thank you very much! I should be able to solve this today once and for all. It keeps falling off my radar and shouldn't because the fixes won't be hard.

Edit: ran into some snags that probably aren't worth solving. I'm going to think on it tonight but will probably just abandon human friendly names. Things like classes and functions are straightforward to link to with rst links (:class:, :func:). I will probably just use the doxygen refid for file names for everything except files, directories, and namespaces since exhale does not use the breathe counterpart for those (and therefore no official crossref targets exist).

from exhale.

svenevs avatar svenevs commented on September 17, 2024

@ddemidov see #35 for the proposed fix, the reason it is not as simple as the sha is for reasons you would never have known about (what I do / do not use from Breathe, and what is actually linkable via sphinx).

I wanted to mention that while I was able to get past the error for you on your project, there are some other things going on with your code.

  1. What you told exhale in conf.py about "doxygenStripFromPath" is not what you told Doxygen (since you are running it on your own via the Doxyfile). This results in things like directories having the absolute path, which is probably not what you want (especially on RTD...).

    I suggest you take a look at whether or not Exhale running Doxygen for you would work out. It seems like you are testing out potential doc system changes, using "exhaleDoxygenStdin" is the encouraged approach. It also already adds the \rst and \endrst to ALIASES for you automatically 😉

  2. I must will point that unfortunately the state of affairs with Exhale and templates (because of Breathe) is really frustrating. You will get over 1000 warnings about the same thing. It's "safe" to ignore in that things will get documented at show up somewhere, but they may have weird template <> template <...> things going on because of Breathe. I am working with Breathe to fix this, but TBH I may just abandon it and solve it differently.

    So in reality, this means that actual documentation warnings that you would ordinarily want to squash will be harder to find. I wish this were not the case, but it is what it is at least for now 😢

  3. Function overloads. I want to solve that too, as I just need to parse parameters. Your framework is large / I couldn't tell if this affects you, but I will fix that before the 0.2.0 release (which has the filename too long fixes as well). Just wanted to note that this may come up and it is a known problem to look out for (#13)

Basically, Exhale was able to create a tree view for your project, but the file hierarchy was wrong (mis-coordination between Doxygen and Exhale), and it is unclear to me whether or not the treeView was actually correct. While I am hopeful that Exhale will work for your project, it may be too "impolished" at this point in time. Please feel free to raise additional issues with concrete examples of things Exhale messes up with your framework. I cannot guarantee how quickly I will be able to solve things depending on the problem, but maybe in the future Exhale might be more polished / ready for heavily templated code like yours.

from exhale.

svenevs avatar svenevs commented on September 17, 2024

Cool! My absurd test cases don't work on Windows because of their antiquated filesystem. I'm gonna have to introduce some added trickery to fix that, but I'm hoping to finish it before I sleep tonight...

from exhale.

svenevs avatar svenevs commented on September 17, 2024

@goodlux FYI this issue being closed is a win / lose for you. Win in the sense that PyTorch definitely needs the improved filename logic. Lose in the sense that the original issue readily identified that something was going seriously wrong with what Exhale thought the name of things was (the whole using the documentation as the name of the documented object thing) will no longer be readily obvious as of current master and imminent 0.2.0 release.

Just pointing it out so that you know there may now be "hidden bugs" because I fallback on the sha1 approach suggested here: #33 (comment)

from exhale.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.