ruby-docx / docx
a ruby library/gem for interacting with .docx files
License: MIT License
Hello @chrahunt, some of my colleagues edit docx files with WPS Office. The resulting docx files are successfully processed by your gem, but cannot be saved. There is an exception:
/var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/file.rb:262:in `block in read': undefined method `read' for Zip::NullInputStream:Module (NoMethodError)
from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/entry.rb:483:in `get_input_stream'
from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/file.rb:230:in `get_input_stream'
from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/file.rb:262:in `read'
from /vagrant/lib/docx/document.rb:111:in `block (2 levels) in save'
from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/entry_set.rb:42:in `block in each'
from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/entry_set.rb:41:in `each'
from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/entry_set.rb:41:in `each'
from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/central_directory.rb:182:in `each'
from /vagrant/lib/docx/document.rb:104:in `block in save'
from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/output_stream.rb:53:in `open'
from /vagrant/lib/docx/document.rb:103:in `save'
Apparently the rubyzip gem processes these docx files differently from docx files produced by MS Office.
Looks like there's an issue when replacing text:
doc.bookmarks['abookmark'].insert_after("some text")
Following the example you supply this gives the error above.
gem 'nokogiri', ">1.5"
gem 'rubyzip', ">0.9"
rails 3.2.1
ruby 1.9.2
I wondered if this gem could be used to insert more complex elements into a document like tables.
Thanks
I'm trying to figure out how to replace the paragraphs with translated text. I can't quite figure out how the 'replace_entry' function works. Is that the best way to go about it? The documentation is pretty scarce, it would be awesome if someone could give me some pointers!
Is there some way to replace the content of inspect?
If I did something like this
d = Docx::Document.open('example.docx')
d.paragraphs.each do |p|
p.node.inspect = p.node.inspect.gsub('Old text', 'Next text')
end
The Install paragraph of README.md has code for specifying version '~> 0.2.07'
but the master branch has been bumped to version 3.
Before I submit a pull request, is there any reason for this?
Additionally, I believe the require option :require => ["docx"]
is unnecessary as the gem's main file is identical to the gem name.
From bookmark.rb:
# Insert text before bookmarkStart node
def insert_text_before(text)
  text_run = get_run_after
  text_run.text = "#{text}#{text_run.text}"
end

# Insert text after bookmarkStart node
def insert_text_after(text)
  text_run = get_run_before
  text_run.text = "#{text_run.text}#{text}"
end
As you can see, insert_text_before does what insert_text_after should do, and vice versa.
Thanks
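A minimal sketch of the swap described above, runnable without the gem. `FakeRun` and `BookmarkSketch` are hypothetical stand-ins for the gem's text run and bookmark classes; only the two method bodies mirror the snippet quoted from bookmark.rb, with the helper calls exchanged.

```ruby
# Hypothetical stand-in for a text run: just an object with a mutable #text.
FakeRun = Struct.new(:text)

# Hypothetical stand-in for the bookmark class, holding the runs on either
# side of the bookmarkStart node.
class BookmarkSketch
  def initialize(run_before, run_after)
    @run_before = run_before
    @run_after = run_after
  end

  # Insert text before bookmarkStart: append to the run that precedes it.
  def insert_text_before(text)
    text_run = get_run_before
    text_run.text = "#{text_run.text}#{text}"
  end

  # Insert text after bookmarkStart: prepend to the run that follows it.
  def insert_text_after(text)
    text_run = get_run_after
    text_run.text = "#{text}#{text_run.text}"
  end

  private

  def get_run_before
    @run_before
  end

  def get_run_after
    @run_after
  end
end

before = FakeRun.new('Hello ')
after = FakeRun.new('world')
bookmark = BookmarkSketch.new(before, after)
bookmark.insert_text_before('[A]')
bookmark.insert_text_after('[B]')
puts "#{before.text}|#{after.text}"
```

With the helpers exchanged, both insertions end up adjacent to the bookmark, one on each side.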
Is there support for .doc also?
doc = Docx::Document.open("example.doc")
[7] pry(LambdaFunctions::LambdaHandler)> File.open("test.doc", 'wb') do |f|
[7] pry(LambdaFunctions::LambdaHandler)* f.write raw
[7] pry(LambdaFunctions::LambdaHandler)* end
=> 69632
[8] pry(LambdaFunctions::LambdaHandler)> doc = Docx::Document.open("test.doc")
Errno::ENOENT: No such file or directory - word/document.xml
from /Users/staguilar/.rvm/gems/ruby-2.5.3/gems/rubyzip-1.2.3/lib/zip/file.rb:361:in `get_entry'
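A .docx file is a ZIP package, while the legacy .doc format is an OLE2 compound file, which may be why the lookup of word/document.xml fails here. The sketch below checks the leading bytes before a file is handed to the gem. The method name `word_container_kind` and its return symbols are made up for illustration; it uses only the Ruby standard library.

```ruby
# ZIP records start with "PK" (local file header, or the end-of-central-
# directory record in an empty archive).
ZIP_MAGIC = 'PK'.b.freeze
# Signature of an OLE2 compound file, the container used by legacy .doc.
OLE2_MAGIC = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1].pack('C*').freeze

# Classifies raw file content by its first bytes.
# Returns :empty, :zip, :ole2 or :unknown (hypothetical labels).
def word_container_kind(bytes)
  data = bytes.to_s.b
  return :empty if data.empty?
  return :zip if data.start_with?(ZIP_MAGIC)
  return :ole2 if data.start_with?(OLE2_MAGIC)

  :unknown
end

puts word_container_kind("PK\x03\x04".b)
puts word_container_kind(OLE2_MAGIC)
```

A result of `:zip` only says the container looks right; it does not guarantee that the package contains the Word parts the gem reads.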
I'm getting the error Errno::ENOENT: No such file or directory - word/styles.xml when I try to open a .docx file, in the following line.
[4] pry(#)> doc = Docx::Document.open("./tmp/511676831_1.docx")
Errno::ENOENT: No such file or directory - word/styles.xml
Here is the file I'm trying to open:
https://www.dropbox.com/s/guhjbcevwu8nrb7/511676831_1.docx?dl=0
Given I have uploaded two versions of a docx document to the web app
When I click "View Difference"
Then I should see reds and greens of text deleted and added between the documents.
Is that possible with this gem?
What would be the easiest way to delete a paragraph? The spec shows deleting the text content of a paragraph using #blank! but the paragraph still remains within the document. I'm thinking about a method like #delete! which would remove the entire w:p node.
I'm thinking about something within paragraph.rb like:
def delete!
self.node.remove
end
Dewayne
o-*
I'm looking to output the contents of an array into a document line by line, but so far have not been able to figure it out. Everything so far is displayed on a single line.
docx is the most popular gem which can manipulate docx documents. However, I have some questions which are not described in the documentation.
docx to read or write?
I have the following code:
Gemfile
gem 'docx', '~> 0.2.07'
Rake task
require 'docx'
docx = Docx::Document.open(template_docx)
...
And I get the following error:
LoadError: cannot load such file -- zip
Am I doing something wrong?
I need to get the current state as an IO - ready to be uploaded. I can see the stream method when I look at the Document class but get the following error when I try and call it:
NoMethodError:
undefined method `stream' for #<Docx::Document:0x000055fa7bb472a8>
Example of what I am trying to do:
doc = Docx::Document.open(downloaded_file)
# make changes to doc.zip
stream = doc.stream
I'm trying to create a .docx template using Rails, but I'm having a problem loading the 'docx' class.
NoMethodError in ClientsController#create
`undefined method `close' for nil:NilClass`
Controller
def templater
  require "docx"
  doc = Docx::Document.open("base.docx")
  doc.paragraphs.each do |p|
    p.each_text_run do |tr|
      tr.substitute('_placeholder', 'teste')
    end
  end
  doc.save('base-edit.docx')
end
Gemfile
gem 'docx', :require => ["docx"]
docx gem version: generic gem docx (last version)
It would be useful if there were a function to simply identify when macros are present. That way this library could be used to help reject uploaded documents that contain macros, for security reasons.
This OWASP page discusses how to do this in Java:
https://www.owasp.org/index.php/Protect_FileUpload_Against_Malicious_File
Note this is not a request to support macros beyond being able to identify if a macro is present.
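One rough way to approximate such a check: macro-enabled Word packages conventionally store the VBA project in a part named vbaProject.bin (usually word/vbaProject.bin), so the list of entry names in the package can be scanned for it. The method name `macro_part?` is hypothetical, and the sketch assumes the entry names have already been read from the archive, for example with rubyzip. It is a heuristic, not a security guarantee.

```ruby
# File name of the part that conventionally holds the VBA project.
MACRO_PART = 'vbaproject.bin'

# Returns true if any entry name in the package looks like a VBA project part.
# +entry_names+ is assumed to be an array of paths inside the ZIP package.
def macro_part?(entry_names)
  entry_names.any? { |name| File.basename(name.to_s).downcase == MACRO_PART }
end

puts macro_part?(['word/document.xml', 'word/vbaProject.bin'])
puts macro_part?(['word/document.xml', 'word/styles.xml'])
```

An upload handler could reject the file when this returns true. A document without this part may still be unsafe in other ways.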
1/ My Word document has no bookmarks. How can I write data to it?
2/ I have the word
How can I keep the format when I read it?
1/ I want to write (append) data and then save a new file
2/ Keep format
Thanks!
docx gem version: [e.g 0.5.0]
Hi,
there is a rails engine called doccex - https://github.com/mustardseeddatabase/doccex.
Is this a fork of this library? And/or what are the different basic characteristics of your project?
Thanks
As discussed in #103, one bug led to another.
With the file referenced in that PR, a crash occurs when getting the file's text:
Source:
Docx::Document.open(file) do |doc|
return doc.to_s
end
Trace:
NoMethodError:
undefined method `xpath' for nil:NilClass
# /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:94:in `hyperlink_relationships'
# /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:88:in `hyperlinks'
# /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:48:in `document_properties'
# /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:189:in `parse_paragraph_from'
# /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:61:in `block in paragraphs'
# /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/nokogiri-1.11.3-x86_64-darwin/lib/nokogiri/xml/node_set.rb:239:in `block in each'
# /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/nokogiri-1.11.3-x86_64-darwin/lib/nokogiri/xml/node_set.rb:238:in `upto'
# /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/nokogiri-1.11.3-x86_64-darwin/lib/nokogiri/xml/node_set.rb:238:in `each'
# /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:61:in `map'
# /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:61:in `paragraphs'
# /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:110:in `to_s'
Hi, I am getting the following error while trying to read an empty docx file created using the File.new("file.docx", "w") command:
error: Zip::Error (File file.docx has zero size. Did you mean to pass the create flag?)
Can anybody tell me what is happening?
I am reading a Docx file saved as a Blob field in a MySQL database. The output from the MySQL table is a binary string, as extracted from the "Event" of Logstash. I am able to write the binary string to a file and then read it using Docx. However, if I pass the data directly to Docx, it gives an error.
Steps to reproduce the behavior or put a short code to reproduce the bug.
require 'docx'
# I WRITE THE BINARY STRING TO A DOCX FILE AND READ IT
File.binwrite('c:\path\filename.doc', event.get('Blob field'))
doc = Docx::Document.new('/path/to/your/docx/filename.docx')
# ERROR: THIS DOES NOT WORK
doc = Docx::Document.new(event.get('Blob field'))
# TRIED TO CONVERT THE DATA TO A STRINGIO, BUT DID NOT WORK
file_to_read = StringIO.new(event.get('Blob field'))
doc = Docx::Document.new(file_to_read)
## Expected behavior
Is there a way to pass a StringIO directly to Docx, or any other way to avoid writing the file to disk and then reading it?
Sorry for the wrong Label
## Environment
- Ruby version: [e.g 2.7.1]
- `docx` gem version: [e.g 0.5.0]
- Windows
The version of the rubyzip gem that is currently required is vulnerable to directory traversal attacks, see this issue. This also affects this gem.
Updating the rubyzip to version 1.2.1 does not break anything afaik, so it's no effort.
I have this create action to extract data from doc and docx files (if available) using the docx gem and the msworddoc-extractor gem
if @subject.save
  if @subject.odoc.present?
    @odoc_url = @subject.odoc.url
    if File.extname(URI.parse(@odoc_url).path) == ".docx"
      @subject.homework = ""
      doc = Docx::Document.open(@odoc_url)
      doc.paragraphs.each do |p|
        @subject.homework = @subject.homework + p.to_html
      end
    else
      MSWordDoc::Extractor.load(@odoc_url) do |doc|
        @subject.homework = doc.whole_contents
      end
    end
    @subject.save
  end
Now, doc files work fine. My problem is with doc = Docx::Document.open(@odoc_url)
When I use the code on my local machine it works fine. When I push to production I get an error: Zip::Error: File s3.amazonaws.com/~~~ not found
I'm not really sure how to load the file so that it is accessible to the docx gem.
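The error suggests the URL is being treated as a local file name. One possible workaround, sketched under that assumption, is to download the remote file into a temporary file first and pass the local path on. `with_local_copy` and its `fetch:` parameter are hypothetical names; `fetch:` exists only so the sketch can be exercised without network access, and its default uses open-uri from the standard library.

```ruby
require 'open-uri'
require 'tempfile'

# Downloads +url+ into a temporary file, yields the local path, and removes
# the file afterwards. Returns whatever the block returns.
# +fetch+ is a hypothetical injection point returning the response body.
def with_local_copy(url, fetch: ->(u) { URI.parse(u).open(&:read) })
  # Keep the original extension so the temp file still ends in .docx.
  extension = File.extname(URI.parse(url).path)
  Tempfile.create(['download', extension]) do |file|
    file.binmode
    file.write(fetch.call(url))
    file.flush
    yield file.path
  end
end

# Offline demonstration with a stubbed fetcher.
size = with_local_copy('https://example.com/files/report.docx',
                       fetch: ->(_url) { 'stub body' }) do |path|
  File.size(path)
end
puts size
```

In the controller above, the block body would be the place to call Docx::Document.open(path), assuming the gem is given a path on the local disk.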
Is it possible to create a password-protected document using this library?
Hello,
Targeting a bookmark that is located in the footer or the header of a given .docx doesn't work for me. If I move those bookmarks out of these special document parts, they are replaced normally, but a bookmark located there is not found by my Ruby on Rails app.
Any ideas? Is this a feature to come?
Thanks in advance for your answer.
I wrote this simple code:
require 'docx'
# Create a Docx::Document object for our existing docx file
doc = Docx::Document.open("a.docx")
# Retrieve and display paragraphs
doc.paragraphs.each do |p|
puts p
end
...and keep getting "undefined method `close' for nil:NilClass".
I am sure the file is there, and tried many different locations for the file, so I'm suspecting this gem is not maintained.
Can anyone guide me? Thank you
When I try to read the doc by doing
doc = Docx::Document.open("tmp/document.docx")
I get an error saying "NameError: uninitialized constant Zip::File"
I have both gem 'zip' and gem 'rubyzip' in my Gemfile.
What could be the cause, and what is the solution?
I found a .docx file that appears to be completely valid (it opens in Word) but raises an error ENOENT when opening with this gem.
Docx::Document.new with this file: weird_docx.docx
The code looks for word/document.xml and word/document2.xml but not word/document22.xml, which is what's inside this doc.
Isn't Word just obnoxious?
I need to insert a picture in place of a bookmark.
After upgrading the gem from 0.4 to 0.6, my code no longer works because I call Document.open with a Pathname.
See the example:
require 'docx'
pn = Pathname.new("/home/me/mydoc.docx")
doc = Docx::Document.new(pn)
=> undefined method `close' for nil:NilClass
N/A
document object could be opened with a Pathname as in version 0.4
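Until a Pathname is accepted again, one possible workaround is to convert path-like objects to a plain String before calling the gem. This is a sketch under the assumption that a String path is still handled correctly in 0.6; the helper name `docx_path_argument` is made up for illustration.

```ruby
require 'pathname'

# Converts path-like objects (anything responding to #to_path, such as
# Pathname) to a String and passes every other argument through unchanged.
def docx_path_argument(source)
  source.respond_to?(:to_path) ? source.to_path : source
end

pn = Pathname.new('/home/me/mydoc.docx')
puts docx_path_argument(pn).class
puts docx_path_argument('/home/me/mydoc.docx')
```

The call site would then read Docx::Document.new(docx_path_argument(pn)), or simply Docx::Document.new(pn.to_s).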
docx gem version: [0.6.0]
I am getting the following errors when using docx in my Rails app. I have a Rails model called Tag.
/Users/me/.rvm/gems/jruby-1.7.4@neo4jtest/gems/docx-0.2.03/lib/docx/elements/bookmark.rb:10 warning: already initialized constant TAG
/Users/me/.rvm/gems/jruby-1.7.4@neo4jtest/gems/docx-0.2.03/lib/docx/elements/text.rb:7 warning: already initialized constant TAG
/Users/me/.rvm/gems/jruby-1.7.4@neo4jtest/gems/docx-0.2.03/lib/docx/containers/text_run.rb:17 warning: already initialized constant TAG
/Users/me/.rvm/gems/jruby-1.7.4@neo4jtest/gems/docx-0.2.03/lib/docx/containers/paragraph.rb:12 warning: already initialized constant TAG
I am guessing this can be fixed with better use of namespaces.
Hello.
I'm trying to use the gem to make a simple edit to some paragraphs in a docx file. The processing goes well, but when attempting to call #save on the document object, I get a Zlib buffer error, with the following stack trace:
["ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/inflater.rb:44:in `inflate'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/inflater.rb:44:in `internal_produce_input'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/inflater.rb:15:in `sysread'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/input_stream.rb:82:in `sysread'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/ioextras/abstract_input_stream.rb:33:in `read'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/file.rb:264:in `block in read'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/entry.rb:501:in `get_input_stream'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/file.rb:232:in `get_input_stream'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/file.rb:264:in `read'", "ruby-2.3.4/gems/docx-0.3.0/lib/docx/document.rb:110:in `block (2 levels) in save'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/entry_set.rb:38:in `block in each'"]
Any ideas on what the problem may be?
Please see title.
Why do I run into this problem when I use the docx gem?
The following code:
require 'docx'
d = Docx::Document.open('111.doc')
d.each_paragraph do |p|
puts d
end
error: `get_entry': No such file or directory - word/document.xml (Errno::ENOENT)
Note that I ran this program as the root user.
Dear all,
I am a new Rails developer.
Can I install this gem in Rails 4?
I have put it in my Gemfile. When I run bundle install, I get a message that my rubyzip is not compatible. This is the error message:
In Gemfile:
docx (>= 0) ruby depends on
rubyzip (~> 0.9) ruby
roo (>= 0) ruby depends on
rubyzip (1.1.0)
Please let me know what the problem is.
Sorry for my bad English.
Thanks
Some word documents I was parsing use Content Control fields to create forms like the following screenshot shows:
I cannot provide the example word doc I was using but I can possibly make an example if necessary.
The following example would not print the value "Approved" that was in the Content Control field above:
# Retrieve and display paragraphs as html
doc = Docx::Document.open('example.docx')
doc.paragraphs.each do |p|
puts p.to_html
end
However if I modified paragraph.rb to look like the following, the Content Control fields are parsed:
@node.xpath('w:r|w:hyperlink/w:r|w:sdt/w:sdtContent/w:r').map { |r_node| Containers::TextRun.new(r_node, @document_properties) }
Is this possible if I have the document password? If not, can someone point me to where/how I can add this functionality?
Hi!
There's a security vulnerability in nokogiri fixed in version 1.10.4. This gem requires ~>1.8, holding back upgrades of nokogiri. Could you release a version requiring the newer nokogiri, or just >1.8 open-endedly? Thanks!
Hello,
Thanks for making this gem!
I'm working on a web app which requires reading content from doc files through the docx gem. It just prints the content regardless of page; I'd like to retrieve content page by page.
How can I achieve this with docx gem?
Thanks,
Ankit
Hi, I'm looking for a library that does a simple task:
Thanks in advance
Hi
I installed the docx gem in my project, and I tried to open a docx file using Docx::Document.open("test.docx"). Then I got an error - "Zip::ZipError: zip end of central directory signature not found". I know it's related to my environment, but I don't know how to solve this error. PS: I use Ubuntu.
Thx
Hi, is there any way to set the color of the text that is inserted using insert_text_after?
If not, I can try to add it, but can someone give me a hint of where to look or generally what to do? (I don't know anything about docx format).
thanks
Joel
Is it possible to hook into the change tracking functionality with this gem?
I don't think there is a way to add a header and footer directly but if I open a document that already has a header and footer, how would I access it? I tried referencing a bookmark that I placed inside the header but it doesn't seem to be included in doc.bookmarks. Any ideas?
I'm trying to convert a docx file into PDF. The process I thought about was as follows: convert the docx file into an HTML file, and from HTML into PDF. However, using this process the outcome wasn't what I expected.
testing.pdf
This is what it looks like after the process mentioned above. Here is a link to the original docx file:
https://www.dropbox.com/s/f1klwguv4r9iyje/testing.docx?dl=0
I think Word documents use XML, so it might improve how documents are displayed if I converted the file from docx to XML and then into PDF. (You might have better direction on this.)
So far I have doc = Docx::Document.open('testing.docx')
When I try to get the XML from the document I get nil.
[61] pry(#<PDFProducer>)> doc.xml
=> nil
Can one get XML from the word document? Or am I wrong in my assumption that word documents use XML?
When I create a brand new Word online file and save (download) it to my PC and then try and open it with this library I get the following error:
No such file or directory - word/document.xml (Errno::ENOENT)
/var/task/vendor/bundle/ruby/2.5.0/gems/rubyzip-2.2.0/lib/zip/file.rb:397:in `get_entry'
/var/task/vendor/bundle/ruby/2.5.0/gems/rubyzip-2.2.0/lib/zip/file.rb:255:in `get_input_stream'
/var/task/vendor/bundle/ruby/2.5.0/gems/rubyzip-2.2.0/lib/zip/file.rb:287:in `read'
/var/task/vendor/bundle/ruby/2.5.0/gems/docx-0.4.0/lib/docx/document.rb:26:in `initialize'
/var/task/vendor/bundle/ruby/2.5.0/gems/docx-0.4.0/lib/docx/document.rb:50:in `new'
/var/task/vendor/bundle/ruby/2.5.0/gems/docx-0.4.0/lib/docx/document.rb:50:in `open'
/var/task/event.rb:87:in `stamp_docx_properties'
/var/task/event.rb:26:in `block in handle'
/var/task/event.rb:20:in `each'
/var/task/event.rb:20:in `handle'
Upon inspection of the docx file on my system it seems to have a word/document2.xml file instead of a word/document.xml file.
To replicate this blank docx file navigate to https://www.office.com/launch/word and create a blank document then save it to your local PC.
I am not sure why docx files created in Word Online have word/document2.xml instead of word/document.xml, but could you please add support for this?
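A tolerant lookup for the main document part could be sketched as follows. The package's relationship file (_rels/.rels) is what actually names the main part; this sketch does not read it and instead pattern-matches on entry names, which would cover word/document.xml, word/document2.xml and word/document22.xml. The method name `main_document_entry` is hypothetical, and the entry names are assumed to have been listed from the archive beforehand.

```ruby
# Matches word/document.xml as well as numbered variants such as
# word/document2.xml or word/document22.xml.
DOCUMENT_PART = %r{\Aword/document\d*\.xml\z}.freeze

# Returns the entry name of the main document part, preferring the
# conventional word/document.xml, or nil if nothing matches.
def main_document_entry(entry_names)
  return 'word/document.xml' if entry_names.include?('word/document.xml')

  entry_names.find { |name| name.match?(DOCUMENT_PART) }
end

puts main_document_entry(['[Content_Types].xml', 'word/document2.xml'])
puts main_document_entry(['word/styles.xml', 'word/document22.xml'])
```

A caller would still need to handle the nil case, for example by raising the same Errno::ENOENT the gem raises today.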
Hello! Office Open XML distinguishes between paragraphs and tables, but every table cell also appears as a paragraph. Is there any way to access only those paragraphs that are not inside a table (MWE where table cells also appear in the list of paragraphs)?
After extensive testing of a document that won't open when saved with this gem, I've been able to pinpoint that a docx containing an aligned image will break after saving.
Here is an example docx with just one aligned image:
www.docx
Here is the resulting docx when saved with this gem:
www-broken.docx
Hi, folks!
Is there a way to fetch the page numbers of a docx file while reading it?