Comments (13)
It could potentially happen frequently, if you were adding thousands of slides for example.
The problem with cacheing values like that is that the XML is the actual authoritative source and the two could easily become out-of-sync. So there's no alternative to reading in the list every time I'm afraid. For example where would you store the cached list? A class has module lifetime so if you loaded more than one presentation on a single run it would be wrong. And an instance like Slide or even Presentation is just a proxy. You can easily have more than one Slide instance pointing to the same slide element subtree in the XML. So there's no reliable location for this sort of thing other than the XML.
Also, the fall-back algorithm (which is what requires sorting) would only be engaged when the max()+1
algorithm failed, which has taken these ten years to show up for the first time. So I don't expect we'd get any complaints :)
from python-pptx.
Let's keep it open @sasensi. We might add this in a later release and seems maybe easier for folks to find an open issue than a closed one if they're seeing this problem :)
from python-pptx.
having the same issue - can you share where did you update this code as i cant seem to get this error fixed.
In my case, I made sure to run the code posted here: #972 (comment) to patch the library before I use it.
And then the error was gone.
from python-pptx.
Hmm, interesting.
So this is slide-ids, right, which means there will usually be a modest number, say less that 100, and as many as 1000 would be extremely rare. So a linear-time algorithm is likely to be fast enough.
The max(sldIds) + 1
approach reduces the chance of using a deleted slide id. I vaguely remember there being a problem with that but can't remember it specifically. Perhaps someone was deleting slides and then a new slide was being assigned the same id and that caused some sort of problem.
So the max(sldIds) + 1
approach is probably preferable, but is fallible as we see here. So there could be a test and fallback to the next_available_id_gt_256
algorithm and that should solve this problem at least.
I'd say something like this would be a good approach and more efficient than the original:
used_ids = sorted(int(id_str) for id_str in self.xpath("./p:sldId/@id"))
id_count = len(used_ids)
for idx, n in enumerate(range(256, 256+id_count)):
if used_ids[idx] != n:
return n
return 256+id_count
case 1: all slide-ids are used in order
---------------------------------------
used_ids = [256, 257, 258, 259]
ns = [256, 257, 258, 259]
matches all, returns 256+4=260
case 2: there is a gap in slide-ids
-----------------------------------
used_ids = [256, 258, 259, 260]
ns = [256, 257, 258, 259]
mismatch on used_ids[1], returns n=257
It's not a lot more efficient, but at least is O(NlogN) instead of O(N^2).
from python-pptx.
Is the sort going to happen frequently, @scanny? In which case maintaining a sorted list might be cheaper.
from python-pptx.
Thank you for your insightful answers, seems like a combination of the 2 approaches is the ultimate solution 😄
So do you think that this can be fixed as part of the library in a soon future or should I better go with a monkey patch for now ?
As mentioned initially, I can also work on a PR if it helps ?
from python-pptx.
I think a monkey-patch is a great first step. The code would be the same as in a PR and it would be a cut-and-paste solution anyone who was facing this problem could add to their own code for an immediate solution.
A PR is probably not worth the trouble I expect. Although if you wanted to add the tests for it that might be worthwhile. Generally all a PR does for me is give me some idea of an approach that appears to work as a starting point for the actual implementation. To maintain the industrial-grade quality of the package any change needs to go in with full tests and documentation where required, and PRs generally lack those broader software engineering aspects.
from python-pptx.
Ok, thank you, so I went with this monkey patch that tries the current implementation and fallback to the old one (your improved version):
from pptx.oxml import CT_SlideIdList
from pptx.oxml.simpletypes import ST_SlideId
def _next_id_patched(self) -> int | None: # noqa: ANN001
id_str_lst = self.xpath("./p:sldId/@id")
next_id = max([255] + [int(id_str) for id_str in id_str_lst]) + 1
if ST_SlideId.validate(next_id):
return next_id
used_ids = sorted(int(id_str) for id_str in id_str_lst)
id_count = len(used_ids)
for idx, n in enumerate(range(256, 256 + id_count)):
if used_ids[idx] != n:
return n
return 256 + id_count
CT_SlideIdList._next_id = property(_next_id_patched) # noqa: SLF001
Do you prefer to keep the issue opened or closed ?
from python-pptx.
having the same issue - can you share where did you update this code as i cant seem to get this error fixed.
from python-pptx.
that works ! thanks.
from python-pptx.
Any forensics on this? Like an idea how these particular presentations get slides with large ids like this?
I'm vaguely suspecting some plugin that assigns ids starting from the maximum number downward as a simple-minded way to avoid collisions with existing slides.
Can you describe the provenance of PPTX files where you've noticed this behavior?
from python-pptx.
I'm advancing this to a shortlist item given it appears it's not as much of a fluke as originally suspected :)
from python-pptx.
Any forensics on this? Like an idea how these particular presentations get slides with large ids like this?
I'm vaguely suspecting some plugin that assigns ids starting from the maximum number downward as a simple-minded way to avoid collisions with existing slides.
Can you describe the provenance of PPTX files where you've noticed this behavior?
i spent an hr trying to read the ID's and found related reason is that if you have a lot of master slides then an old ppt can have indexes that go beyond the max limit. manual copy paste of slides from 1 ppt to another doesnt refresh the slide ID and you really need VBA code to reset the slide ID numbers. even if you only have 5 slides in the deck but 10 master slides the indexing gets screwed up. make sense?
from python-pptx.
Related Issues (20)
- Is it possible to add support for "/select,file:///..." URIs? HOT 3
- Convert PPTX slide to Image HOT 1
- How to embed an .h3d (Hyperworks output file) file? HOT 3
- Line char None values
- How to set shink in sharp when text box overflow in Python pptx HOT 1
- Fit shape to text?
- Hide a shape HOT 2
- MSO_AUTO_SIZE.TEXT_TO_FIT_SHAPE attribute does not take effect HOT 4
- Removing video results in corrupted file. HOT 6
- Create multiple slides with concurrent.futures HOT 1
- run function on click HOT 1
- How to access and create media objects HOT 7
- copying slide-masters HOT 1
- Make xlsxwriter an optional dependency HOT 4
- There is a bug for load image under the package "langchain.document_loaders import UnstructuredImageLoader" HOT 1
- Add OLE shapes
- auto_size not working
- notes_placeholder and notes_text_frame are None when a note exists HOT 1
- Minor Gridlines format settings in charts
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from python-pptx.