open-xml-powertools's Introduction

[ARCHIVED] Open-XML-PowerTools

This repository is no longer maintained by Microsoft. It has been archived and can still be forked and cloned for use and continued development.

If you're looking for a fork of this project that is actively maintained, try the following:

https://github.com/EricWhiteDev/Open-Xml-PowerTools

Open XML PowerTools provides guidance and example code for programming with Open XML documents (DOCX, XLSX, and PPTX). It is based on, and extends the functionality of, the Open XML SDK.

It supports scenarios such as:

  • Splitting DOCX/PPTX files into multiple files.
  • Combining multiple DOCX/PPTX files into a single file.
  • Populating content in template DOCX files with data from XML.
  • High-fidelity conversion of DOCX to HTML/CSS.
  • High-fidelity conversion of HTML/CSS to DOCX.
  • Searching and replacing content in DOCX/PPTX using regular expressions.
  • Managing tracked-revisions, including detecting tracked revisions, and accepting tracked revisions.
  • Updating Charts in DOCX/PPTX files, including updating cached data, as well as the embedded XLSX.
  • Comparing two DOCX files, producing a DOCX with revision tracking markup, and enabling retrieving a list of revisions.
  • Retrieving metrics from DOCX files, including the hierarchy of styles used, the languages used, and the fonts used.
  • Writing XLSX files using far simpler code than directly writing the markup, including a streaming approach that enables writing XLSX files with millions of rows.
  • Extracting data (along with formatting) from spreadsheets.

Copyright (c) Microsoft Corporation 2012-2017 Licensed under the MIT License. See License in the project root for license information.

News

New Release! Version 4.4.

This version has a completely re-written WmlComparer.cs, which now supports nested tables and text boxes. WmlComparer.cs is a module that compares two DOCX files and produces a DOCX with revision tracking markup. It enables retrieving a list of revisions.

Open-Xml-PowerTools Content

There is a lot of content about Open-Xml-PowerTools at the Open-Xml-PowerTools Resource Center at OpenXmlDeveloper.org

Build Instructions

Prerequisites:

  • Visual Studio 2017 Update 5 or .NET CLI toolchain

Build

With Visual Studio:

  • Open OpenXmlPowerTools.sln in Visual Studio
  • Rebuild the project
  • Build the solution. To validate the build, open the Test Explorer. Click Run All.
  • To run an example, set the example as the startup project, and press F5.

With .NET CLI toolchain:

  • Run dotnet build OpenXmlPowerTools.sln

Change Log

Version 4.3 : June 13, 2016

  • New WmlComparer module

Version 4.2 : December 11, 2015

  • New SmlDataRetriever module
  • New SmlCellFormatter module

Version 4.1.3 : November 2, 2015

  • DocumentAssembler: Fix bug associated with duplicate bookmarks.
  • DocumentAssembler: Enable processing of content controls / metadata in footer rows.
  • DocumentAssembler: Avoid processing content controls used for purposes other than the DocumentAssembler template, including page numbers in footers, etc.

Version 4.1.2 : October 31, 2015

  • HtmlToWmlConverter: Handle unknown elements by recursively processing descendants

Version 4.1.1 : October 21, 2015

  • Fix to AddTypes.ps1 to compile WmlToHtmlConverter.cs instead of HtmlConverter.cs
  • Fix to MettricsGetter.ps1 to correctly report whether a document contains tracked revisions
  • Added some unit tests for PresentationBuilder

Version 4.1.0 : September 27, 2015

  • New HtmlToWmlConverter module
  • HtmlConverter generates non-breaking spaces as the #00a0 Unicode character, not the &nbsp; entity.

Version 4.0.0 : August 6, 2015

  • New DocumentAssembler module
  • New SpreadsheetWriter module
  • New Cmdlet: Complete-DocxTemplateFromXml
  • Fix DocumentBuilder: deal with headers / footers more rationally
  • Enhance DocumentBuilder: add option to discard headers / footers from section (but keep layout of section)
  • Fix RevisionAccepter: deal with w:moveTo immediately before a table
  • New test document library in the TestFiles directory
  • XUnit tests
  • Cleaned up build system
  • Build using the open source Open-Xml-SDK and the new System.IO.Packaging by default
  • Back port to .NET 3.5
  • Rename the PowerShell module to Open-Xml-PowerTools

Version 3.1.11 : June 30, 2015

  • Updated projects and solutions to build with the open source Open XML SDK and new System.IO.Packaging

Version 3.1.10 : June 14, 2015

  • Changed Out-Xlsx Cmdlet to C# implementation
  • Fix Add-DocxText

Version 3.1.09 : April 20, 2015

  • Fix OpenXmlRegex: PowerPoint 2007 and xml:space issues, causing 2007 to not open PPTX's

Version 3.1.08 : March 13, 2015

  • Added Out-Xlsx Cmdlet

Version 3.1.07 : February 9, 2015

  • Added Merge-Pptx Cmdlet
  • Added New-Pptx Cmdlet
  • Added New-PmlDocument
  • Fixed help for Merge-Docx
  • Don't throw duplicate attribute exception when running FormattingAssembler.AssembleFormatting twice on same document.

Version 3.1.06 : February 7, 2015

  • Added Expand-DocxFormatting Cmdlet
  • Cmdlets do not keep a handle to the current directory, preventing deletion of the directory.
  • Added additional tests to Test-OxPtCmdlets

Version 3.1.05 : January 29, 2015

  • Added GetListItemText_zh_CN.cs
  • Fixed GetListItemText_fr_FR.cs
  • Partially fixed GetListItemText_ru_RU.cs
  • Fixed GetListItemText_Default.cs
  • Added better support in ListItemRetriever.cs
  • Added FileUtils class in PtUtil.cs

Version 3.1.04 : December 17, 2014

  • Added Get-DocxMetrics Cmdlet
  • Added New-WmlDocument Cmdlet
  • Added MetricsGetter.cs module
  • Added MettricsGetter01.cs module, along with sample documents
  • Reworked Add-DocxText, new style of using it with New-WmlDocument

Version 3.1.03 : December 9, 2014

  • Added ChartUpdater.cs module
  • Added ChartUpdater01.cs module, along with sample documents
  • Added Test-OxPtCmdlets Cmdlet

Version 3.1.02 : December 1, 2014

  • Added Add-DocxText Cmdlet

Version 3.1.01 : November 23, 2014

  • Added Convert-DocxToHtml Cmdlet
  • Added Chinese and Hebrew sample documents
  • Cmdlets in this release:
    • Clear-DocxTrackedRevision
    • Convert-DocxToHtml
    • ConvertFrom-Base64
    • ConvertFrom-FlatOpc
    • ConvertTo-Base64
    • ConvertTo-FlatOpc
    • Get-OpenXmlValidationErrors
    • Merge-Docx
    • New-Docx
    • Test-OpenXmlValid

Version 3.1.00 : November 13, 2014

  • Changed installation process - no longer requires compilation using Visual Studio
  • Added ConvertTo-FlatOpc Cmdlet
  • Added ConvertFrom-FlatOpc Cmdlet
  • Changed parameters for Test-OpenXmlValid, Get-OpenXmlValidationErrors
  • Removed the unnecessary 1/2 second sleep when doing Word automation in the New-Docx Cmdlet

Version 3.0.00 : October 29, 2014

  • New release of cmdlets that are written as 'Advanced Functions' instead of in C#.

Procedures for enhancing Open-Xml-PowerTools

There are a variety of things to do when adding a new CmdLet to Open-Xml-PowerTools:

  • Write the new CmdLet. Put it in the Cmdlets directory
  • Modify Open-Xml-PowerTools.psm1
    • Call the new Cmdlet script to make the function available
    • Modify Export-ModuleMember function to export the Cmdlet and any aliases
  • Update Readme.txt, describing the enhancement
  • Add a new test to Test-OpenXmlPowerToolsCmdlets.ps1

Procedures for enhancing the core C# modules

  • Modify the code
  • Write xUnit tests
  • Write an example if necessary
  • Run xUnit tests

open-xml-powertools's People

Contributors

bkreminski-moldev, brettbailey, ericwhitedev, flensrocker, jawn, kimsey0, navaneethadev, phenx, sebastiaanlubbers, tarunchopra, thomasbarnekow, tomjebo, twsouthwick, zache

open-xml-powertools's Issues

DocumentAssembler: Repeat-Element doesn't find matching EndRepeat

I had an "In-Paragraph Repetition" like

<Repeat Select="./car"/><Content Select="@name" />, <EndRepeat />

and I split it into a multi-paragraph version (by pressing Enter at the end of each content control). Now it looks like this:

<Repeat Select="./car"/>
<Content Select="@name" />
<EndRepeat />

Now, for some reason, DocumentAssembler doesn't find the corresponding <EndRepeat /> element, even though the logic is correct from my point of view.

I wanted to attach the DOCX file, but the upload was denied, so here is a screenshot instead.
[screenshot: matchingendrepeatnotfound]

I already figured out that this situation can be "repaired" by inserting the content controls for Repeat and EndRepeat again. After doing so, I got the following elements:
[screenshot: matchingendrepeatnotfound2]

Please note that the first and the second picture differ visually. In the second picture there are small << and >> symbols in the blue content control markers. I guess they show whether the content control represents a complete paragraph.

From my point of view, both versions are logically the same and should behave identically.

This issue is one of the biggest pitfalls for template creators; I have stumbled over it many times. What makes it worse is that it seems to be closely related to #29, because the error is not reported as expected; instead, an invalid DOCX is created.

This is because in this case I (again) got the following invalid XML-result:

<w:p w:rsidRDefault='00F26444' w:rsidR='00F26444' w:rsidP='008C221B'>
  <w:p>
    <w:r>
      <w:rPr>
        <w:color w:val='FF0000' />
        <w:highlight w:val='yellow' />
      </w:rPr>
      <w:t>Repeat does not have matching EndRepeat</w:t>
    </w:r>
  </w:p>
</w:p>

(see the w:p in w:p)
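A check for this particular kind of invalid output can be sketched with LINQ to XML. This is only a minimal sketch, not part of Open-Xml-PowerTools: the class and method names are made up for illustration, and it assumes you have already loaded the part's XML (for example, word/document.xml) into an XElement. It flags only a w:p that is the direct child of another w:p, which is the situation shown above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of Open-Xml-PowerTools.
public static class NestedParagraphCheck
{
    // WordprocessingML main namespace (the same URI appears in the error messages in this document).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns true if any w:p element is the direct child of another w:p element.
    public static bool HasDirectlyNestedParagraph(XElement root)
    {
        return root.DescendantsAndSelf(W + "p")
                   .Any(p => p.Parent != null && p.Parent.Name == W + "p");
    }

    public static void Main()
    {
        // Reduced version of the invalid markup from the report above.
        XElement bad = XElement.Parse(
            "<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
            "<w:p><w:p><w:r><w:t>Repeat does not have matching EndRepeat</w:t></w:r></w:p></w:p>" +
            "</w:body>");

        Console.WriteLine(HasDirectlyNestedParagraph(bad));
    }
}
```

A check like this could run in a unit test on each assembled document, so that a template that produces nested paragraphs is caught before the document reaches a user.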

PresentationBuilder fails for PPTX that contain MP4

From a forum post:

Please take a look at the code I got from PresentationBuilder::CopyRelatedMedia and the definition of DocumentFormat.OpenXml.Packaging.MediaDataPartType.

Since MediaDataPartType doesn't have mp4 in its enumeration, PowerTools GetMediadataPartTypeFromContentType uses the default value - MediaDataPartType.Wmv - as mp4 media type. That is why mp4 file cannot be played correctly.

I can play my mp4 in my original pptx. I converted my mp4 to an .avi file and inserted it into my original pptx as well. Both the .mp4 and the .avi play there. After the copied version is generated, only the .avi file plays; the .mp4 gives the error "Codec Unavailable".

From Eric White: This needs an adjustment to GetMediadataPartTypeFromContentType.

        if (temp.DataPart == null)
        {
            var ct = oldPart.ContentType;
            MediaDataPartType mdpt = GetMediadataPartTypeFromContentType(ct);
            MediaDataPart newPart = newContentPart.OpenXmlPackage.CreateMediaDataPart(mdpt);
            newPart.FeedData(oldPart.GetStream());

private static MediaDataPartType GetMediadataPartTypeFromContentType(string ct)
{
    // Any content type not listed below (for example, MP4) falls back to Wmv.
    MediaDataPartType mdpt = MediaDataPartType.Wmv;

    if (ct == "audio/aiff")
        mdpt = MediaDataPartType.Aiff;
    else if (ct == "video/x-ms-asf-plugin")
        mdpt = MediaDataPartType.Asx;
    else if (ct == "video/avi")
        mdpt = MediaDataPartType.Avi;
    else if (ct == "audio/midi")
        mdpt = MediaDataPartType.Midi;
    else if (ct == "audio/mp3")
        mdpt = MediaDataPartType.Mp3;
    else if (ct == "audio/mpeg")
        mdpt = MediaDataPartType.MpegAudio;
    else if (ct == "audio/mpegurl")
        mdpt = MediaDataPartType.MpegUrl;
    else if (ct == "video/mpeg")
        mdpt = MediaDataPartType.MpegVideo;
    else if (ct == "video/mpg")
        mdpt = MediaDataPartType.Mpg;
    else if (ct == "audio/ogg")
        mdpt = MediaDataPartType.OggAudio;
    else if (ct == "video/ogg")
        mdpt = MediaDataPartType.OggVideo;
    else if (ct == "video/quicktime")
        mdpt = MediaDataPartType.Quicktime;
    else if (ct == "video/vc1")
        mdpt = MediaDataPartType.VC1;
    else if (ct == "audio/wav")
        mdpt = MediaDataPartType.Wav;
    else if (ct == "audio/x-ms-wma")
        mdpt = MediaDataPartType.Wma;
    else if (ct == "video/x-ms-wmv")
        mdpt = MediaDataPartType.Wmv;
    else if (ct == "video/x-ms-wmx")
        mdpt = MediaDataPartType.Wmx;
    else if (ct == "video/x-ms-wvx")
        mdpt = MediaDataPartType.Wvx;
    else if (ct == "audio/unknown")
        mdpt = MediaDataPartType.Wav;
    return mdpt;
}

namespace DocumentFormat.OpenXml.Packaging
{
    // Summary:
    //     Defines part media types.
    public enum MediaDataPartType
    {
        // Audio Interchange File Format (.aiff)
        Aiff = 0,
        // MIDI Audio (.mid)
        Midi = 1,
        // MP3 (.mp3)
        Mp3 = 2,
        // MP3 Playlist File (.m3u)
        MpegUrl = 3,
        // WAV audio (.wav)
        Wav = 4,
        // Windows Media Audio File (.wma)
        Wma = 5,
        // Mpeg audio (.mpeg)
        MpegAudio = 6,
        // Ogg Vorbis (.ogg)
        OggAudio = 7,
        // Advanced Stream Redirector File (.asx)
        Asx = 8,
        // Audio Video Interleave File (.avi)
        Avi = 9,
        // MPEG 1 System Stream (.mpg)
        Mpg = 10,
        // MPEG 1 System Stream (.mpeg)
        MpegVideo = 11,
        // Windows Media File (.wmv)
        Wmv = 12,
        // Windows Media Player A/V Shortcut (.wmx)
        Wmx = 13,
        // Windows Media Redirector (.wvx)
        Wvx = 14,
        // QuickTime video (.mov)
        Quicktime = 15,
        // Ogg Stream (.ogg)
        OggVideo = 16,
        // VC-1 Stream (.wmv)
        VC1 = 17,
    }
}
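One possible direction for the adjustment is to stop defaulting silently and instead report content types that the enumeration does not cover, so that the caller can keep the original content type. The sketch below is not the PowerTools fix. The class and method names are hypothetical, the table covers only a subset of the types above, and the enum member names are held as strings so that the sketch compiles without the Open XML SDK. If I recall correctly, the SDK also has string-based overloads of CreateMediaDataPart that take a content type, which a caller could use for the fallback, but please verify that against your SDK version.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Open-Xml-PowerTools.
public static class MediaTypeLookup
{
    // Subset of the content types from the excerpt above, matched case-insensitively.
    // Values are MediaDataPartType member names, kept as strings for this sketch.
    private static readonly Dictionary<string, string> Known =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "video/avi", "Avi" },
            { "video/mpeg", "MpegVideo" },
            { "video/quicktime", "Quicktime" },
            { "video/x-ms-wmv", "Wmv" },
            { "audio/mp3", "Mp3" },
            { "audio/wav", "Wav" }
        };

    // Returns false for unknown content types instead of pretending they are Wmv.
    public static bool TryGetKnownType(string contentType, out string memberName)
    {
        memberName = null;
        return contentType != null && Known.TryGetValue(contentType, out memberName);
    }

    public static void Main()
    {
        foreach (string ct in new[] { "video/avi", "video/mp4" })
        {
            string name;
            Console.WriteLine(TryGetKnownType(ct, out name)
                ? ct + " maps to MediaDataPartType." + name
                : ct + " is not in the enumeration; keep the original content type");
        }
    }
}
```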

Operation is not supported.

Hi,

When I run the demo WmlToHtmlConverter01.exe, PtOpenXmlExtensions.GetXDocument throws the exception 'Operation is not supported.'. The environment is Mono on Mac. How can I fix that?

Thanks,

Edison

HTML Converter: Images position

The DOCX-to-HTML converter places each image in an img element inside a span element in the generated HTML. All spans have white-space: pre-wrap (because of the setting GeneralCss = "span { white-space: pre-wrap; }";), so images are shifted to the right when the image is on the left side of the document, and shifted down when the image is on the right side of the document. If the white-space: pre-wrap CSS is removed from all the spans, the HTML has no spaces between elements.

I fixed that by adding a new CSS class that overrides white-space: pre-wrap with white-space: nowrap for the spans that contain an img element. I apply the class with jQuery after the HTML has been generated.

Excess enumeration of RowDfn.Cells property in SpreadsheetWriter.Write

RowDfn.Cells is an IEnumerable, and during execution of the SpreadsheetWriter.Write method it is enumerated twice. This causes excess memory usage, and it causes bugs when the implementation of the IEnumerable`1 interface is stateful.

For example, IEnumerable might use a database connection which might be disposed upon disposing the underlying IEnumerator.
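The effect is easy to reproduce with plain LINQ. The sketch below does not use SpreadsheetWriter at all; CountingSource is a hypothetical stand-in for a stateful source, and it simply counts how many times enumeration starts. Buffering the sequence once with ToList() is shown as one possible remedy, under the assumption that the cells of a single row are cheap to hold in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a stateful source such as a database reader.
public sealed class CountingSource
{
    // Number of times an enumeration of Cells() has started.
    public int Starts { get; private set; }

    public IEnumerable<int> Cells()
    {
        Starts++; // runs once per enumeration, because iterator bodies execute lazily
        for (int i = 1; i <= 3; i++)
            yield return i;
    }
}

public static class SinglePassDemo
{
    // Enumerates the sequence twice, as described in the report above.
    public static int CountPlusSumTwoPasses(IEnumerable<int> cells)
    {
        return cells.Count() + cells.Sum();
    }

    // Buffers the sequence once, then works on the in-memory copy.
    public static int CountPlusSumOnePass(IEnumerable<int> cells)
    {
        List<int> buffered = cells.ToList();
        return buffered.Count + buffered.Sum();
    }

    public static void Main()
    {
        var twice = new CountingSource();
        CountPlusSumTwoPasses(twice.Cells());
        Console.WriteLine("Two-pass version started enumeration " + twice.Starts + " time(s)");

        var once = new CountingSource();
        CountPlusSumOnePass(once.Cells());
        Console.WriteLine("Buffered version started enumeration " + once.Starts + " time(s)");
    }
}
```

Buffering per row keeps the streaming property for the sheet as a whole, since only one row's cells are held at a time.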

Merge Document Issue (Urgent Please)

Hi,

I am using the latest DocumentBuilder version, but I still have a document merge issue.

[screenshot: image2 1]

The content of the second Word template is merged with the content of the previous Word template. I need the second Word template to start on a new page.
How can I resolve this problem?
(Is there any code for this, or is a Word setting needed?)

Repeated Header and footer count

Actually, I need to get the character count of a DOCX (including headers and footers),
but the Open XML SDK does not give a count for the header and footer parts.
I have already read the header and footer text, but I would like to know the number of times the header and footer text is used in the DOCX, so that I can multiply and get the correct character count for the header and footer text.

Stream writing XLSX

Hi, the readme page mentions the following capability:

 Writing XLSX files using far simpler code than directly writing the markup, including a streaming approach that enables writing XLSX files with millions of rows.

I can't seem to find an example of this. Can anyone provide any guidance on how to do this? What I'm hoping to do is generate a large XLSX file in a Web API service and stream the output without buffering the file in memory.

The exception: Message=CssParser error: line 1 col 3: invalid identity

Here's the caller code:

    protected override IEnumerable<OpenXmlElement> GenerateNewContent(HtmlField contentItem, OpenXmlElement content, IResourceManager resourceManager)
    {
        var xElement = PrepareHtml(contentItem.HtmlString);

        var usedAuthorCss = HtmlToWmlConverter.CleanUpCss((string)xElement.Descendants().FirstOrDefault(d => d.Name.LocalName.ToLower() == "style"));

        var settings = HtmlToWmlConverter.GetDefaultSettings();

        var doc = HtmlToWmlConverter.ConvertHtmlToWml(contentItem.DefaultCss ?? string.Empty, usedAuthorCss, contentItem.UserCss ?? string.Empty, xElement, settings, null, null);
        using (var mstream = new MemoryStream(doc.DocumentByteArray))
        {
            using (var innerdoc = WordprocessingDocument.Open(mstream, false))
            {
                var paragraphs = innerdoc.MainDocumentPart.Document.Body.Elements<Paragraph>().Select(_ => _.CloneNode(true)).ToList();

                foreach (var source in innerdoc.MainDocumentPart.HyperlinkRelationships)
                {
                    resourceManager.AddHyperlinkRelationship(source.Uri, source.IsExternal, source.Id);
                }

                return paragraphs;
            }
        }
    }

And we use this CSS file:

html, address,
blockquote,
body, dd, div,
dl, dt, fieldset, form,
frame, frameset,
h1, h2, h3, h4,
h5, h6, noframes,
ol, p, ul, center,
dir, hr, menu, pre   { display: block; unicode-bidi: embed }
li              { display: list-item }
head            { display: none }
table           { display: table }
tr              { display: table-row }
thead           { display: table-header-group }
tbody           { display: table-row-group }
tfoot           { display: table-footer-group }
col             { display: table-column }
colgroup        { display: table-column-group }
td, th          { display: table-cell }
caption         { display: table-caption }
th              { font-weight: bolder; text-align: center }
caption         { text-align: center }
body            { margin: 8px }
h1              { font-size: 2em; margin: .67em 0 }
h2              { font-size: 1.5em; margin: .75em 0 }
h3              { font-size: 1.17em; margin: .83em 0 }
h4, p,
blockquote, ul,
fieldset, form,
ol, dl, dir,
menu            { margin: 1.12em 0 }
h5              { font-size: .83em; margin: 1.5em 0 }
h6              { font-size: .75em; margin: 1.67em 0 }
h1, h2, h3, h4,
h5, h6, b,
strong          { font-weight: bolder }
blockquote      { margin-left: 40px; margin-right: 40px }
i, cite, em,
var, address    { font-style: italic }
pre, tt, code,
kbd, samp       { font-family: monospace }
pre             { white-space: pre }
button, textarea,
input, select   { display: inline-block }
big             { font-size: 1.17em }
small, sub, sup { font-size: .83em }
sub             { vertical-align: sub }
sup             { vertical-align: super }
table           { border-spacing: 2px; }
thead, tbody,
tfoot           { vertical-align: middle }
td, th, tr      { vertical-align: inherit }
s, strike, del  { text-decoration: line-through }
hr              { border: 1px inset }
ol, ul, dir,
menu, dd        { margin-left: 40px }
ol              { list-style-type: decimal }
ol ul, ul ol,
ul ul, ol ol    { margin-top: 0; margin-bottom: 0 }
u, ins          { text-decoration: underline }
br:before       { content: "\A"; white-space: pre-line }
center          { text-align: center }
:link, :visited { text-decoration: underline }
:focus          { outline: thin dotted invert }

/* Begin bidirectionality settings (do not change) */
BDO[DIR="ltr"]  { direction: ltr; unicode-bidi: bidi-override }
BDO[DIR="rtl"]  { direction: rtl; unicode-bidi: bidi-override }

*[DIR="ltr"]    { direction: ltr; unicode-bidi: embed }
*[DIR="rtl"]    { direction: rtl; unicode-bidi: embed }

The exception:

OpenXmlPowerTools.OpenXmlPowerToolsException was unhandled by user code
  HResult=-2146233088
  Message=CssParser error: line 1 col 3: invalid identity.  CSS => C:\Inetpub\wwwroot\lookbook\Website\_res\css\docx-default.css
  Source=OpenXmlPowerTools
  StackTrace:
       at OpenXmlPowerTools.HtmlToWml.CSS.CssParser.ParseText(String content)

How can we overcome this issue?

Line is not formatting in converted HTML

Hi,
I am using this DLL to convert my Word documents to HTML files. The conversion works, but if there is a drawn line in the Word document, the line does not appear in the converted HTML file. Is that a bug, or do I need to add some code for that?
Kindly let me know.

Thanks,

HtmlToWmlCssApplier double conversion

Hi!
The ComputeAbsoluteLength method in the HtmlToWmlCssApplier class converts string values to double with this code (line 1424):

!double.TryParse(value, out decValue) 

Unfortunately, this uses CultureInfo.CurrentCulture, but the ComputeAbsoluteLength method assumes that the double value should contain a dot, not a comma.
If you run this code on a machine with a CultureInfo that expects a comma in the double value, this code will fail.
Example:
If I run this on a machine with hu_HU culture info, this code can't convert "2.38" to 2.38, because it expects a comma, which is obviously wrong.

Changing the TryParse calls to this should solve the problem:

!double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out decValue)
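The difference can be demonstrated in isolation. The sketch below is only an illustration with made-up class and member names. Instead of loading the hu-HU culture, it builds a NumberFormatInfo with a decimal comma by hand, so that the result does not depend on which culture data is installed on the machine.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class, not part of Open-Xml-PowerTools.
public static class InvariantParseDemo
{
    // A format with a decimal comma, standing in for cultures such as Hungarian.
    public static readonly NumberFormatInfo CommaDecimal = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = " "
    };

    // Parses with an explicit format provider instead of the current culture.
    public static bool TryParseWith(string value, IFormatProvider provider, out double result)
    {
        return double.TryParse(value, NumberStyles.Float, provider, out result);
    }

    public static void Main()
    {
        double d;

        // CSS lengths use a dot, so a comma-decimal format rejects them.
        bool withComma = TryParseWith("2.38", CommaDecimal, out d);
        Console.WriteLine("Comma-decimal format parsed \"2.38\": " + withComma);

        // The invariant culture always expects a dot.
        bool invariant = TryParseWith("2.38", CultureInfo.InvariantCulture, out d);
        Console.WriteLine("Invariant culture parsed \"2.38\": " + invariant);
    }
}
```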

HtmlToWml - span tag generates new paragraph

If I use 'span' tag in my HTML code, the converter creates a new paragraph on each 'span' instead of just changing a text style inside a paragraph.

Example: my HTML code looks like

test <span style="text-decoration:line-through;">text</span> paragraph

It generates one line of text in a web browser:
test text paragraph

But HtmlToWml converter generates _three lines_ of text like this:
test
text
paragraph

How to make text decoration changes inside a paragraph?

input.txt
input_css.txt
example_output.docx

RevisionAccepter throws Internal error 20, found element {http://schemas.openxmlformats.org/wordprocessingml/2006/main}moveFromRangeStart

I have a document with a moveFromRangeStart element and when calling the RevisionAccepter.AcceptRevisions method it give me the following error:
Internal error 20, found element {http://schemas.openxmlformats.org/wordprocessingml/2006/main}moveFromRangeStart
at OpenXmlPowerTools.RevisionAccepter.b__9c(XElement ce)
at System.Linq.Enumerable.WhereEnumerableIterator`1.MoveNext()
at System.Linq.Enumerable.Any[TSource](IEnumerable`1 source)
at OpenXmlPowerTools.RevisionAccepter.AllParaContentIsDeleted(XElement p)
at OpenXmlPowerTools.RevisionAccepter.b__9a(BlockContentInfo b)
at System.Linq.Enumerable.Any[TSource](IEnumerable`1 source, Func`2 predicate)
at OpenXmlPowerTools.RevisionAccepter.AllContentDeleted(IGrouping`2 g)
at OpenXmlPowerTools.RevisionAccepter.<>c__DisplayClass94.<AcceptDeletedAndMoveFromParagraphMarksTransform>b__8b(IGrouping`2 g, Int32 i)
at System.Linq.Enumerable.d__52.MoveNext()
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at OpenXmlPowerTools.RevisionAccepter.AcceptDeletedAndMoveFromParagraphMarksTransform(XNode node)
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at OpenXmlPowerTools.RevisionAccepter.AcceptDeletedAndMoveFromParagraphMarksTransform(XNode node)
at OpenXmlPowerTools.RevisionAccepter.AcceptDeletedAndMoveFromParagraphMarks(XElement element)
at OpenXmlPowerTools.RevisionAccepter.AcceptRevisionsForPart(OpenXmlPart part)
at OpenXmlPowerTools.RevisionAccepter.AcceptRevisions(WordprocessingDocument doc)

I took the latest code from master. Basically I found a "fix" but I am not sure whether that is correct (it looks too simple) -> I just added this Tag to the RevisionAccepter.IsRunContent method (returning false).

Can someone please advise or confirm that fix?

Thanks a lot.

HtmlToWml - font tag support

I have an HTML fragment from a SharePoint rich text field and I need to paste it into a Word document. The converter works well, but it ignores the 'font' tag entirely.

<div class="ExternalClassEE9830F7214044AA993ADA217D2B5BBE"><font color="#ff91d2"><b>paragraph 1</b></font></div>

It has black font color in .docx after conversion.
The full html fragment is attached.
html.txt

DocumentAssembler: Unit tests for generated documents that won't open.

I've hit a number of issues where a document has been assembled correctly, no errors raised but the document just won't open.

It's usually because of missing XML tags, which can be rectified by putting spaces in the template. There is a big risk that an end user may tweak the template wording and accidentally remove some white space, causing us to email documents to customers that won't open.

I would like to write small unit tests to isolate the issue. (and then fix properly)

Not sure how to do it though :-( Is there a schema anywhere we could validate the output against?

using DocumentAssembler, word template tags

Hi Eric,
thanks for the DocumentAssembler introduction video; it is very useful.

But how can I define image content in my .docx template?
For regular text content is, for example: "
For Table is "


There are Repeat and Conditional tags too.

And for an image? Is it something like media?

Where can I find all available tags?

I have searched the OpenXmlPowerToolsExamples project without success.

Are those tags specific to Open-Xml-PowerTools, or are they from XPath, or from Word? I'm confused.

thanks for your time.

HtmlToWmlConverter fails on embedded images

If the HTML contains an element such as:

<img width="125" height="119" alt="cid:[email protected]" src="data:image/png;base64,/9j/4AAQSkZJRgABAQEAeAB4AAD/2wBDAAgGBgcGBQg ...

It tries to instantiate a Bitmap using that src value as a file name, which fails:

<Error>
<Message>An error has occurred.</Message>
<ExceptionMessage>
The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.
</ExceptionMessage>
<ExceptionType>System.IO.PathTooLongException</ExceptionType>
<StackTrace>
 at System.IO.PathHelper.GetFullPathName()
 at System.IO.Path.NormalizePath(String path, Boolean fullCheck, Int32 maxPathLength, Boolean expandShortPaths)
 at System.IO.Path.GetFullPathInternal(String path)
 at System.IO.Path.GetFullPath(String path)
 at System.Drawing.IntSecurity.UnsafeGetFullPath(String fileName)
 at System.Drawing.IntSecurity.DemandReadFileIO(String fileName)
 at System.Drawing.Bitmap..ctor(String filename)
 at OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.TransformImageToWml(XElement element, HtmlToWmlConverterSettings settings, WordprocessingDocument wDoc)
 at OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.Transform(XNode node, HtmlToWmlConverterSettings settings, WordprocessingDocument wDoc, NextExpected nextExpected, Boolean preserveWhiteSpace)
 at OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.<>c__DisplayClass93.<Transform>b__75(XNode n)
 at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
 at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
 at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
 at OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.Transform(XNode node, HtmlToWmlConverterSettings settings, WordprocessingDocument wDoc, NextExpected nextExpected, Boolean preserveWhiteSpace)
 at OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.<>c__DisplayClassa5.<GenerateNextExpected>b__a1(XNode n)
 at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
 at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
 at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
 at OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.GenerateNextExpected(XNode node, HtmlToWmlConverterSettings settings, WordprocessingDocument wDoc, String styleName, NextExpected nextExpected, Boolean preserveWhiteSpace)
 at OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.Transform(XNode node, HtmlToWmlConverterSettings settings, WordprocessingDocument wDoc, NextExpected nextExpected, Boolean preserveWhiteSpace)
 at OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.<>c__DisplayClass93.<Transform>b__5a(XNode n)
 at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
 at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
 at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
 at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
 at OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.UpdateMainDocumentPart(WordprocessingDocument wDoc, XElement html, HtmlToWmlConverterSettings settings)
 at OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.ConvertHtmlToWml(String defaultCss, String authorCss, String userCss, XElement xhtml, HtmlToWmlConverterSettings settings, WmlDocument emptyDocument, String annotatedHtmlDumpFileName) 

Since I am performing these conversions inside an IIS app pool, it would be nice to remove the file system calls entirely and transform a MemoryStream instead of creating a file somewhere.
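Decoding such a src value in memory needs only the base class library. The following is a minimal sketch, not the converter's code: the class and method names are hypothetical, and it handles only the base64 form of a data URI. The resulting bytes can be wrapped in a MemoryStream and handed to whatever image loader is in use, with no file system access involved.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Open-Xml-PowerTools.
public static class DataUri
{
    // Decodes "data:<media type>;base64,<payload>" into its media type and bytes.
    // Returns false for anything else, including non-base64 data URIs.
    public static bool TryDecodeBase64(string src, out string mediaType, out byte[] bytes)
    {
        const string scheme = "data:";
        const string marker = ";base64";
        mediaType = null;
        bytes = null;

        if (src == null || !src.StartsWith(scheme, StringComparison.OrdinalIgnoreCase))
            return false;

        int comma = src.IndexOf(',');
        if (comma < 0)
            return false;

        // Everything between "data:" and the first comma is the header.
        string header = src.Substring(scheme.Length, comma - scheme.Length);
        if (!header.EndsWith(marker, StringComparison.OrdinalIgnoreCase))
            return false;

        try
        {
            bytes = Convert.FromBase64String(src.Substring(comma + 1));
        }
        catch (FormatException)
        {
            return false; // payload is not valid base64
        }

        mediaType = header.Substring(0, header.Length - marker.Length);
        return true;
    }

    public static void Main()
    {
        string mediaType;
        byte[] bytes;
        if (TryDecodeBase64("data:image/png;base64,AQID", out mediaType, out bytes))
        {
            // The stream lives in memory only; no temporary file is created.
            using (var stream = new MemoryStream(bytes))
            {
                Console.WriteLine(mediaType + ", " + stream.Length + " bytes");
            }
        }
    }
}
```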

Support for SVG

Is supporting inline svg in HtmlToWmlConverter anywhere on this project roadmap? I wonder how difficult it might be to convert svg to wml, or if it's even feasible.

The Ignorable attribute contains an undefined prefix

When the example document NumberedListTest.docx is passed through SimplifyMarkup, the resulting docx lists w15 in the Ignorable attribute in word/document.xml, but it misses a namespace declaration for w15.
If the resulting document is subsequently passed through Expand-DocxFormatting.ps1, the same problem occurs both in word/document.xml and word/styles.xml.
These problems are signaled by both the Word executable and Get-OpenXmlValidationErrors.ps1.
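The condition can also be detected directly on a part's root element. The sketch below is an illustration with made-up names rather than anything from Open-Xml-PowerTools. It assumes the part's XML has been loaded into an XElement, and it reports every prefix listed in mc:Ignorable that has no namespace declaration in scope on that element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of Open-Xml-PowerTools.
public static class IgnorableCheck
{
    // Markup compatibility namespace, which defines the Ignorable attribute.
    private static readonly XNamespace MC =
        "http://schemas.openxmlformats.org/markup-compatibility/2006";

    // Returns the prefixes named in mc:Ignorable that are not declared on or above the element.
    public static List<string> UndeclaredIgnorablePrefixes(XElement root)
    {
        XAttribute ignorable = root.Attribute(MC + "Ignorable");
        if (ignorable == null)
            return new List<string>();

        return ignorable.Value
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(prefix => root.GetNamespaceOfPrefix(prefix) == null)
            .ToList();
    }

    public static void Main()
    {
        // w14 is declared, w15 is not, mirroring the problem described above.
        XElement root = XElement.Parse(
            "<w:document" +
            " xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'" +
            " xmlns:mc='http://schemas.openxmlformats.org/markup-compatibility/2006'" +
            " xmlns:w14='http://schemas.microsoft.com/office/word/2010/wordml'" +
            " mc:Ignorable='w14 w15'/>");

        Console.WriteLine(string.Join(", ", UndeclaredIgnorablePrefixes(root)));
    }
}
```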

Converting Headers and Footers to HTML

Hi, guys,

I'm trying to convert a docx file to an HTML document, and want a way to get the header and footer from the document, each in separate functions. Right now it looks like I can only get the body using the ConvertToHtml function and passing in the wDoc.

I can't find a way to do this in Open-Xml-PowerTools. I think the best option would be a third overload of ConvertToHtml that, in addition to the ones taking a WmlDocument or WordprocessingDocument, accepts an XElement, which could represent an XDocument root or even just a subset of the entire document.

Let me know if this makes sense, or if I missed some functionality in the library.

Thanks!

Issues when converting links with HtmlToWmlConverter

I found a couple of issues with converting <a href=""></a> links to Wml.

If the href attribute does not exist or is not a valid URI, the link text is removed from the converted document. In my opinion it would be better to keep the link text. Example:

Testing <a href="#">link</a> with # => Testing with #

Linked images are removed from the output. Example:

An image <a href="http://www.example.com"><img src="/img/image.png" alt="" /></a> => An image

A link in a table cell generates an invalid Word document. Example:

<table><tr><td><a href="http://www.example.com">example</a></td></tr></table>
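
One possible workaround for the first case is to pre-process the XHTML before conversion, as in the sketch below. It uses only System.Xml.Linq; the class and method names are hypothetical, "valid" is assumed to mean "parses as an absolute URI", and whether HtmlToWmlConverter then keeps the unwrapped text has not been verified here.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical pre-processing helper, not part of Open-Xml-PowerTools.
public static class AnchorCleaner
{
    // Replaces every <a> whose href is missing or is not an absolute URI
    // with its own child nodes, so the link text stays in the document.
    public static void UnwrapInvalidAnchors(XElement xhtml)
    {
        var invalid = xhtml
            .Descendants()
            .Where(e => e.Name.LocalName == "a" && !HasAbsoluteHref(e))
            .Reverse()   // innermost anchors first
            .ToList();   // materialize before mutating the tree

        foreach (XElement a in invalid)
            a.ReplaceWith(a.Nodes());
    }

    static bool HasAbsoluteHref(XElement a)
    {
        var href = (string)a.Attribute("href");
        Uri uri;
        return href != null && Uri.TryCreate(href, UriKind.Absolute, out uri);
    }
}
```

Calling AnchorCleaner.UnwrapInvalidAnchors(html) on the XElement before it is handed to the converter leaves anchors with absolute URLs untouched.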

Add snk file to System.IO.Packaging, DocumentFormat.OpenXml and PowerTools. Strong-name signed assemblies must specify a public key in their InternalsVisibleTo declarations.

Hi,

I have added the latest OpenXml SDK (2.6) and PowerTools 4.0 to my project, and I have added an snk file to my project. I got an "Unable to emit assembly" error, so I tried to add an snk file to System.IO.Packaging, DocumentFormat.OpenXml and PowerTools. For System.IO.Packaging there is no problem, but for DocumentFormat.OpenXml I got the errors below after adding the snk file.

Friend assembly reference 'SdbBuilder' is invalid. Strong-name signed assemblies must specify a public key in their InternalsVisibleTo declarations.

Friend assembly reference 'ofapiTest' is invalid. Strong-name signed assemblies must specify a public key in their InternalsVisibleTo declarations.

The file name is ofapi.cs.

How can I solve the above errors?

How can I add an snk file to all 3 projects? I need strong-named DLLs.
Please help me resolve the issue.

ZipArchive.CreateEntry not found.

I'm working on a simple app to merge presentations based on the PresentationBuilder example. I have it working fine on Windows, but when I push it to my Linux box and compile it with Mono, it throws a "System.MissingMethodException: Method 'ZipArchive.CreateEntry' not found." error. Any idea what is going on?

Unhandled Exception:
System.MissingMethodException: Method 'ZipArchive.CreateEntry' not found.
  at System.IO.Packaging.ZipPackage.Dispose (Boolean disposing) <0x415c39c0 + 0x00043> in <filename unknown>:0
  at System.IO.Packaging.Package.System.IDisposable.Dispose () <0x415c2780 + 0x001bb> in <filename unknown>:0
  at OpenXmlPowerTools.OpenXmlPowerToolsDocument.GetDocumentType (System.Byte[] bytes) <0x41590d60 + 0x00518> in <filename unknown>:0
  at OpenXmlPowerTools.OpenXmlPowerToolsDocument.GetDocumentType () <0x41590d30 + 0x0001b> in <filename unknown>:0
  at OpenXmlPowerTools.PmlDocument..ctor (System.String fileName) <0x41590b80 + 0x0002c> in <filename unknown>:0
  at MergePresentatons01.MergePresentatons01.Main (System.String[] args) <0x4158cd70 + 0x006c0> in <filename unknown>:0
[ERROR] FATAL UNHANDLED EXCEPTION: System.MissingMethodException: Method 'ZipArchive.CreateEntry' not found.
  at System.IO.Packaging.ZipPackage.Dispose (Boolean disposing) <0x415c39c0 + 0x00043> in <filename unknown>:0
  at System.IO.Packaging.Package.System.IDisposable.Dispose () <0x415c2780 + 0x001bb> in <filename unknown>:0
  at OpenXmlPowerTools.OpenXmlPowerToolsDocument.GetDocumentType (System.Byte[] bytes) <0x41590d60 + 0x00518> in <filename unknown>:0
  at OpenXmlPowerTools.OpenXmlPowerToolsDocument.GetDocumentType () <0x41590d30 + 0x0001b> in <filename unknown>:0
  at OpenXmlPowerTools.PmlDocument..ctor (System.String fileName) <0x41590b80 + 0x0002c> in <filename unknown>:0
  at MergePresentatons01.MergePresentatons01.Main (System.String[] args) <0x4158cd70 + 0x006c0> in <filename unknown>:0

DocumentAssembler: Formatting support?

I'm using DocumentAssembler to inject text and tables. For some of the content I'd like to be able to insert formatted text with at least new lines/paragraphs and tabs/indents into the table cells and elements. It looks like EvaluateXPathToString removes all structure from the XML though.

Is it possible to add support for e.g. HTML or basic replacements (\n, \t etc.)?

In the meantime I think I can probably do a post-processing step after injecting my text with markup (inject CDATA maybe), then search/replace e.g. \r\n or

for a new paragraph. Any hints on how to do that would be appreciated.
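
A rough sketch of such a post-processing step, limited to line breaks: it splits every w:t element that contains a newline character into alternating w:t and w:br elements inside the same run. It uses only System.Xml.Linq, the class and method names are hypothetical, and it would have to be applied to the main document part's XML after DocumentAssembler has run. Tabs could be handled the same way with w:tab; new paragraphs would need the surrounding w:p to be split and are not covered.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical post-processing helper, not part of Open-Xml-PowerTools.
public static class LineBreakExpander
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces each w:t whose text contains '\n' with a sequence of
    // w:t and w:br elements, one w:br per newline.
    public static void ExpandLineBreaks(XElement root)
    {
        var targets = root.Descendants(W + "t")
            .Where(t => t.Value.Contains("\n"))
            .ToList(); // materialize before mutating the tree

        foreach (XElement t in targets)
        {
            string[] lines = t.Value.Replace("\r\n", "\n").Split('\n');
            var replacement = new List<XElement>();
            for (int i = 0; i < lines.Length; i++)
            {
                if (i > 0)
                    replacement.Add(new XElement(W + "br"));
                if (lines[i].Length > 0)
                    replacement.Add(new XElement(W + "t",
                        new XAttribute(XNamespace.Xml + "space", "preserve"),
                        lines[i]));
            }
            t.ReplaceWith(replacement);
        }
    }
}
```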

Thanks for the great library!

Converting to HTML and back to WML causes loss of font type on bulleted lists

 If you convert to HTML and then back to WML the font type is lost in some cases.  If I create a bulleted list in Word my configuration uses “Symbol” size 11 font for the bullets by default.  It uses “Calibri (Body)” size 11 for the text following the bullet.

 If I convert this to HTML and then back to WML using the WmlToHtmlConverter and then the HtmlToWmlConverter both the bullets and the text turn to “Times New Roman” size 12.  This causes the bullets to appear as empty rectangle blocks instead of the traditional filled circle.

Libraries used:

DocumentFormat.OpenXml 2.6.0.0
OpenXmlPowerTools 4.2.0.0
System.Management.Automation 1.0.0.0 (6.1.7601.17515)

Code to reproduce

using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;
using System.IO;

namespace HtmlConversionIssue
{
    class Program
    {
        static void Main(string[] args)
        {
            var documentBytes = File.ReadAllBytes("BulletedList.docx");
            using (var memoryStream = new MemoryStream(documentBytes.Length))
            {
                memoryStream.Write(documentBytes, 0, documentBytes.Length);
                using (var wordDocument = WordprocessingDocument.Open(memoryStream, true))
                {
                    RevisionAccepter.AcceptRevisions(wordDocument);
                    var convertedHtml = WmlToHtmlConverter.ConvertToHtml(wordDocument, new WmlToHtmlConverterSettings());
                    var convertedWml = HtmlToWmlConverter.ConvertHtmlToWml(string.Empty, string.Empty, string.Empty, convertedHtml, HtmlToWmlConverter.GetDefaultSettings());

                    convertedWml.SaveAs("ConvertedAndBackAgain.docx");
                }
            }
        }
    }
}
[BulletedList.docx](https://github.com/OfficeDev/Open-Xml-PowerTools/files/291519/BulletedList.docx)

Fails when converting html table to docx

Running this code:

        var result = HtmlToWmlConverter.ConvertHtmlToWml(
            "", "", "",
            XElement.Parse("<html><body><table><tr><td>asger</td></tr></table></body></html>"),
            HtmlToWmlConverter.GetDefaultSettings());

Throws this exception:

System.NullReferenceException

Object reference not set to an instance of an object.

   at OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.GetTableGrid(XElement element, HtmlToWmlConverterSettings settings)
   at OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.Transform(XNode node, HtmlToWmlConverterSettings settings, WordprocessingDocument wDoc, NextExpected nextExpected, Boolean preserveWhiteSpace)
   at OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.<>c__DisplayClass20_0.<Transform>b__3(XNode n)
   at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
   at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
   at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
   at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
   at OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.UpdateMainDocumentPart(WordprocessingDocument wDoc, XElement html, HtmlToWmlConverterSettings settings)
   at OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.ConvertHtmlToWml(String defaultCss, String authorCss, String userCss, XElement xhtml, HtmlToWmlConverterSettings settings, WmlDocument emptyDocument, String annotatedHtmlDumpFileName)
   at OpenXmlPowerTools.HtmlToWmlConverter.ConvertHtmlToWml(String defaultCss, String authorCss, String userCss, XElement xhtml, HtmlToWmlConverterSettings settings)

Can anybody help me spot what I am doing wrong? :)

Ability to process data from clipboard

Maybe it is already supported, but I could not find it?

As an API user
I want to be able to translate text from the clipboard, representing a chunk of text copied / cut from an MS Word document, into HTML
So that I can then use that HTML on a web page

Actually, it would be perfect if I could pass just a string with OXML (so that I take care of reading the clipboard myself); I just described the real-life use case we need to support.

DocumentAssembler: typo in Repeat element leads to invalid docx result

I had the following situation:

[Screenshot: repeatissue31]

Please note the typo in the Repeat statement: "Secelt" should be "Select".

This typo leads to an invalid DOCX result. For a template creator this means the corresponding error message cannot be viewed, because the result document (which contains the error message) cannot be opened in Word. Another big pitfall! I had to inspect the document at the XML level before I found the reason.

I would expect DocumentAssembler to not produce invalid documents in any case.

This issue could again be related to #29, but I'm not sure. In contrast to #29, the Repeat and EndRepeat elements are not "in paragraph" (the small << and >> symbols in the blue markers are visible).

DocumentAssembler and Page Number

I was trying to process a document with DocumentAssembler that had a page number configured in the footer. DocumentAssembler tries to process the content of the content control that Word inserted as XML and replaces it with the Invalid Xml message. This might be a more general problem, in that DocumentAssembler appears to want to process all content controls in the document?

HtmlToWmlConverter.ConvertHtmlToWml(...) throws OpenXmlPowerToolsException: did not return a double

I'm trying to parse some HTML, in a first pass without regard for CSS in the HTML, keeping it as simple as possible. I've tried feeding an empty string, and the "defaultCss" as defined in the example: https://github.com/OfficeDev/Open-Xml-PowerTools/blob/vNext/OpenXmlPowerToolsExamples/HtmlToWmlConverter02/HtmlToWmlConverter02.cs, but to no avail. I keep getting the same exception, supplied here with the stack trace:

[OpenXmlPowerToolsException: did not return a double?]
OpenXmlPowerTools.HtmlToWml.CssApplier.ComputeAbsoluteLength(XElement element, CssExpression assignedValue, HtmlToWmlConverterSettings settings, CssExpression lengthForPercentage) +2086
OpenXmlPowerTools.HtmlToWml.CssApplier.<.cctor>b__210(XElement element, CssExpression assignedValue, HtmlToWmlConverterSettings settings) +74
OpenXmlPowerTools.HtmlToWml.CssApplier.GetComputedPropertyValue(PropertyInfo propertyInfo, XElement element, String propertyName, HtmlToWmlConverterSettings settings) +418
OpenXmlPowerTools.HtmlToWml.CssApplier.SetAllValues(XElement xHtml, HtmlToWmlConverterSettings settings) +236
OpenXmlPowerTools.HtmlToWml.CssApplier.ApplyAllCss(String defaultCss, String authorCss, String userCss, XElement newXHtml, HtmlToWmlConverterSettings settings, CssDocument& defaultCssDoc, CssDocument& authorCssDoc, CssDocument& userCssDoc, String annotatedHtmlDumpFileName) +315
OpenXmlPowerTools.HtmlToWml.HtmlToWmlConverterCore.ConvertHtmlToWml(String defaultCss, String authorCss, String userCss, XElement xhtml, HtmlToWmlConverterSettings settings, WmlDocument emptyDocument, String annotatedHtmlDumpFileName) +189
OpenXmlPowerTools.HtmlToWmlConverter.ConvertHtmlToWml(String defaultCss, String authorCss, String userCss, XElement xhtml, HtmlToWmlConverterSettings settings) +21
HtmlPrintingDemo.Models.DocxGeneration.GenerateDocx(String htmlContent) in C:\Code\POC\HtmlPrinting\HtmlPrintingDemo\Models\DocxGeneration.cs:29
HtmlPrintingDemo.Controllers.PrintController.Docx(String templateId, String entityId, Example example) in C:\Code\POC\HtmlPrinting\HtmlPrintingDemo\Controllers\PrintController.cs:64

The calling code looks like this:

 public static DocxGeneration GenerateDocx(string htmlContent)
        {
            //See http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2015/10/12/screen-cast-introducing-the-htmltowmlconverter-module.aspx

            //XHTML can only have 1 root - so we wrap in html/body
            XElement html = XElement.Parse(string.Format("<html><body>{0}</body></html>",htmlContent));
            var settings = HtmlToWmlConverter.GetDefaultSettings();

            //set directory for images? etc....

            //var usedAuthorCss = HtmlToWmlConverter.CleanUpCss((string)html.Descendants().FirstOrDefault(d => d.Name.LocalName.ToLower() == "style") ?? dummyCss);
            var usedAuthorCss = HtmlToWmlConverter.CleanUpCss(dummyCss);
            var userCss = HtmlToWmlConverter.CleanUpCss(dummyCss);

            var document = HtmlToWmlConverter.ConvertHtmlToWml(defaultCss, defaultCss, defaultCss, html, settings);
            return new DocxGeneration(document.DocumentByteArray.AsStream());

        }

        const string defaultCss =
           @"html, address,
blockquote,
body, dd, div,
dl, dt, fieldset, form,
frame, frameset,
h1, h2, h3, h4,
h5, h6, noframes,
ol, p, ul, center,
dir, hr, menu, pre { display: block; unicode-bidi: embed }
li { display: list-item }
head { display: none }
table { display: table }
tr { display: table-row }
thead { display: table-header-group }
tbody { display: table-row-group }
tfoot { display: table-footer-group }
col { display: table-column }
colgroup { display: table-column-group }
td, th { display: table-cell }
caption { display: table-caption }
th { font-weight: bolder; text-align: center }
caption { text-align: center }
body { margin: auto; }
h1 { font-size: 2em; margin: auto; }
h2 { font-size: 1.5em; margin: auto; }
h3 { font-size: 1.17em; margin: auto; }
h4, p,
blockquote, ul,
fieldset, form,
ol, dl, dir,
menu { margin: auto }
a { color: blue; }
h5 { font-size: .83em; margin: auto }
h6 { font-size: .75em; margin: auto }
h1, h2, h3, h4,
h5, h6, b,
strong { font-weight: bolder }
blockquote { margin-left: 40px; margin-right: 40px }
i, cite, em,
var, address { font-style: italic }
pre, tt, code,
kbd, samp { font-family: monospace }
pre { white-space: pre }
button, textarea,
input, select { display: inline-block }
big { font-size: 1.17em }
small, sub, sup { font-size: .83em }
sub { vertical-align: sub }
sup { vertical-align: super }
table { border-spacing: 2px; }
thead, tbody,
tfoot { vertical-align: middle }
td, th, tr { vertical-align: inherit }
s, strike, del { text-decoration: line-through }
hr { border: 1px inset }
ol, ul, dir,
menu, dd { margin-left: 40px }
ol { list-style-type: decimal }
ol ul, ul ol,
ul ul, ol ol { margin-top: 0; margin-bottom: 0 }
u, ins { text-decoration: underline }
br:before { content: ""\A""; white-space: pre-line }
center { text-align: center }
:link, :visited { text-decoration: underline }
:focus { outline: thin dotted invert }
/* Begin bidirectionality settings (do not change) */
BDO[DIR=""ltr""] { direction: ltr; unicode-bidi: bidi-override }
BDO[DIR=""rtl""] { direction: rtl; unicode-bidi: bidi-override }
*[DIR=""ltr""] { direction: ltr; unicode-bidi: embed }
*[DIR=""rtl""] { direction: rtl; unicode-bidi: embed }
";

        const string userCss = @"";

        private const string dummyCss = "";

Any help is much appreciated.

Save IEnumerable<object> as *.xlsx using a stream approach due to large number of records.

Hello,
First, I want to thank Eric White and all the other developers for the hard work and for sharing it with others!

I would like to request a method to which I can pass an arbitrary IEnumerable<T> and a file path. The method should create a spreadsheet using a (true) streaming approach. I already watched your screencast (Using Open XML and LINQ to XML in a Streaming Fashion to Create Huge Spreadsheets) and tried to build an export function as you pointed out in your SpreadsheetWriter example, but I still get the famous System.OutOfMemoryException. Furthermore, I don't understand how your approach differs from all the others. You still build up the workbook containing all the data in memory.

A method that reads a few rows and persists them to disk without holding everything in memory would be great.

Unfortunately, I don't know how to do this. Here is my most recent approach based on your SpreadsheetWriter. It works fine for small spreadsheets.

Usage:

IEnumerable<Tbl_ANALYSIS_MILD_Movements>  testData = Db.Instance.RetrieveMildMovements();
ExcelExport.ExportData<Tbl_ANALYSIS_MILD_Movements>(testData, testData.Count(), "Test.xlsx");

Source Code:

using System.Collections.Generic;
using OpenXmlPowerTools;
using System;
using System.Reflection;


namespace ExcelExportTest.ViewModel
{
    public class ExcelExport
    {
        public static void ExportData<T>(IEnumerable<T> data, int totalRows, string destination)
        {
            Type dataType = typeof(T);

            var columnDefinitons = GetColumnDefinitons(dataType);
            var rows = GetRows<T>(columnDefinitons, data, totalRows);

            var worksheetDfn = new WorksheetDfn
            {
                TableName = dataType.Name,
                Name = dataType.Name,
                ColumnHeadings = columnDefinitons,
                Rows = rows
            };

            var workbookDfn = new WorkbookDfn { Worksheets = new[] { worksheetDfn } };

            SpreadsheetWriter.Write(destination, workbookDfn);
        }

        private static List<CellDfn> GetColumnDefinitons(Type dataType)
        {
            List<CellDfn> columnDefinitons = new List<CellDfn>();
            PropertyInfo[] properties = dataType.GetProperties();

            foreach (PropertyInfo column in properties)
            {
                columnDefinitons.Add(new CellDfn()
                {
                    CellDataType = CellDataType.String,
                    Value = column.Name,
                    Bold = true
                });
            }

            return columnDefinitons;
        }

        private static RowDfn[] GetRows<T>(List<CellDfn> columnHeadings, IEnumerable<T> data, int totalRows)
        {
            if (columnHeadings == null) throw new ArgumentNullException();
            if (data == null) throw new ArgumentNullException();

            var dataType = typeof(T);
            PropertyInfo[] properties = dataType.GetProperties();

            var rows = new OpenXmlPowerTools.RowDfn[totalRows];
            var currentRow = 0;
            foreach (var dataEntry in data)
            {
                var currentColumn = 0;
                var row = new OpenXmlPowerTools.RowDfn();

                OpenXmlPowerTools.CellDfn[] cells = new OpenXmlPowerTools.CellDfn[columnHeadings.Count];

                foreach (var column in columnHeadings)
                {
                    var property = dataType.GetProperty(column.Value.ToString());
                    var value = property.GetValue(dataEntry);

                    var tempValue = string.Empty;
                    if (value != null)
                    {
                        tempValue = value.ToString();
                    }

                    cells[currentColumn] = new OpenXmlPowerTools.CellDfn
                    {
                        Value = tempValue,
                        CellDataType = GetOpenXmlExcelType(properties[currentColumn].PropertyType)
                    };

                    currentColumn++;
                }

                row.Cells = cells;

                rows[currentRow] = row;

                currentRow++;
            }

            return rows;
        }

        private static CellDataType GetOpenXmlExcelType(Type type)
        {
            if (type == typeof(short) || type == typeof(short?) ||
                type == typeof(decimal) || type == typeof(decimal?) ||
                type == typeof(double) || type == typeof(double?) ||
                type == typeof(float) || type == typeof(float?) ||
                type == typeof(int) || type == typeof(int?) ||
                type == typeof(long) || type == typeof(long?)) return CellDataType.Number;

            if (type == typeof(bool) || type == typeof(bool?)) return CellDataType.Boolean;
            if (type == typeof(DateTime) || type == typeof(DateTime?)) return CellDataType.Date;

            return CellDataType.String;
        }
    }
}

Questions:

  • Did I understand your SpreadsheetWriter example correctly, or am I missing the point?
  • How can I improve my code to work with huge collections (700,000 rows, 24 columns)?

Thank you for your advice and keep up the good work!
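
One observation on the code above: both testData.Count() and the pre-sized RowDfn[] array force the whole collection into memory before anything is written. A lazily evaluated row sequence avoids that on the producer side. The sketch below shows the idea with plain string arrays and only BCL types; the class and method names are hypothetical. Whether SpreadsheetWriter.Write consumes WorksheetDfn.Rows without buffering is an assumption that would need to be checked against the library source.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical helper illustrating deferred row production.
public static class LazyRows
{
    // Validates eagerly, then returns a deferred sequence.
    public static IEnumerable<string[]> ToRows<T>(IEnumerable<T> data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        return Iterate(data);
    }

    // Yields one string[] per item; a row is built only when the consumer
    // asks for it, so no array of all rows is ever allocated here.
    static IEnumerable<string[]> Iterate<T>(IEnumerable<T> data)
    {
        PropertyInfo[] properties = typeof(T).GetProperties();
        foreach (T item in data)
        {
            var cells = new string[properties.Length];
            for (int i = 0; i < properties.Length; i++)
            {
                object value = properties[i].GetValue(item, null);
                cells[i] = value == null ? string.Empty : value.ToString();
            }
            yield return cells;
        }
    }
}
```

The same pattern, a yield return inside the foreach, could produce RowDfn objects instead of string arrays, provided the Rows member accepts any IEnumerable<RowDfn> rather than requiring an array.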

Table/Document format gets changed after accept all changes using OpenXml Power Tools 4.1.3

Hello Team

I am using OpenXml Power Tools 4.1.3 to accept revisions in a document. When I use the acceptRevision() method from OpenXml Power Tools, my document/table gets disturbed and shows its contents in a single column. If I accept revisions using Word's 'Accept all changes' menu, the document is not disturbed, so this seems to be a bug in OpenXml Power Tools 4.1.3.

I have attached a sample document to replicate the issue. See the link below:

http://1drv.ms/1PloF8g

Please help me resolve this problem.

Thanks
Prasad

DocumentAssembler: Support for Images

It would be nice if there were an easier way to insert images into an assembled document. Currently we are using some bookmark replacement code found on @EricWhiteDev's blog, but to do this we have to save the WmlDocument to disk and then open it again as a WordprocessingDocument to run the snippet before saving again.

If we add it to DocumentAssembler, there needs to be some way of controlling one or both of height and width.

DocumentAssembler - Having an Else construct for Conditional

With the newly added NotMatch, it is possible to have two alternative pieces of content based on a match. It would be cleaner, markup-wise, to have an Else.

We have both block-level and run-level conditionals, and Else would be applicable in both scenarios.

WmlToHtmlConverter text boxes

I noticed in this post, back in 2014, that there were plans to support converting MS Word text boxes to HTML. At the moment text boxes seem to be more or less ignored. So I was wondering whether there has been any update since that post on if or when text boxes will be supported, or whether anyone knows of a workaround for converting text boxes.

Thank you for any info you can give me and for the great tool!

DocumentAssembler: Error-Message of Repeat-Element might produce invalid DOCX

In the following case a Repeat-Element produces invalid DOCX:

If there's a Repeat-Element in a paragraph, mixed with other elements in the same paragraph, AND the XPath evaluation for Repeat returns no result nodes.

In this case Repeat produces an Error-Message like <w:p><w:r>...Error-message...</w:r></w:p> WITHIN the context of a <w:p> element and I would get the following invalid result:

<w:p><w:p><w:r>...Error-message...</w:r></w:p></w:p>

The use case is a template with "In-Paragraph Repetition" like this one:

[Screenshot: repeatissue29]

In most cases this is not a problem, because in most cases the Repeat-Element IS the current paragraph (small << and >> symbols in the blue markers of the content controls). So the Repeat-Element (and the corresponding <w:p>) is just replaced by <w:p><w:r>...Error-message...</w:r></w:p> and everything is fine.

I guess this is because https://github.com/OfficeDev/Open-Xml-PowerTools/blob/vNext/OpenXmlPowerTools/DocumentAssembler.cs#L608
is not aware of the current context.
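
Until this is addressed in DocumentAssembler itself, the invalid nesting can at least be detected, and crudely repaired, in a post-processing pass over the document XML. The following is a sketch under stated assumptions, not the library's behavior: it uses only System.Xml.Linq, the class and method names are hypothetical, and the repair keeps only the w:r children of the inner paragraph, dropping its paragraph properties.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of Open-Xml-PowerTools.
public static class NestedParagraphCheck
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Finds w:p elements whose direct parent is also a w:p.
    // Paragraphs nested via other containers are not reported.
    public static List<XElement> FindDirectlyNested(XElement root)
    {
        return root.Descendants(W + "p")
            .Where(p => p.Parent != null && p.Parent.Name == W + "p")
            .ToList();
    }

    // Replaces each directly nested w:p with its own runs, so the
    // error text ends up inside the outer paragraph.
    public static void UnwrapDirectlyNested(XElement root)
    {
        List<XElement> nested = FindDirectlyNested(root);
        nested.Reverse(); // innermost first
        foreach (XElement inner in nested)
            inner.ReplaceWith(inner.Elements(W + "r"));
    }
}
```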
