Comments (48)

Sicos1977 avatar Sicos1977 commented on June 28, 2024

Can you try the latest source code that I just pushed to GitHub? I put in a fallback mechanism so that the IFilterReader uses the old IPersistFile interface when the IPersistStream interface fails:

        var iPersistStream = iFilter as NativeMethods.IPersistStream;
        Exception iPersistStreamException = null;

        // IPersistStream is assumed on 64-bit systems
        if (iPersistStream != null)
        {
            try
            {
                iPersistStream.Load(new IStreamWrapper(stream));
                NativeMethods.IFILTER_FLAGS flags;
                if (iFilter.Init(iflags, 0, IntPtr.Zero, out flags) == NativeMethods.IFilterReturnCode.S_OK)
                    return iFilter;
            }
            catch (Exception exception)
            {
                Marshal.ReleaseComObject(iFilter);
                iPersistStreamException = exception;
            }
        }

        if (iPersistStreamException != null)
        {
            if (string.IsNullOrWhiteSpace(fileName))
                throw new IFOldFilterFormat("The IFilter does not support the IPersistStream interface, supply a filename to use the IFilter", iPersistStreamException);

            // If we get here we probably are using an old IFilter so try to load it the old way
            // ReSharper disable once SuspiciousTypeConversion.Global
            var persistFile = iFilter as IPersistFile;
            if (persistFile != null)
            {
                persistFile.Load(fileName, 0);
                NativeMethods.IFILTER_FLAGS flags;
                if (iFilter.Init(iflags, 0, IntPtr.Zero, out flags) == NativeMethods.IFilterReturnCode.S_OK)
                    return iFilter;
            }
        }

from ifiltertextreader.

Sicos1977 avatar Sicos1977 commented on June 28, 2024

My version is based on the same old utility that you found on CodeProject. I reused most of the code... so I really don't know why my version doesn't work on your server. I just tested my version on a Windows Server 2012 R2 machine and it works there without any problems.

I didn't do anything special in the program... the only thing I did was upgrade some interfaces so that they work with the IPersistStream interface. It is really hard for me to guess what is going wrong on your side. Can you post some screenshots?

[Screenshot: capture]

Sicos1977 avatar Sicos1977 commented on June 28, 2024

Which Adobe IFilter did you install? I have this one: http://www.adobe.com/support/downloads/detail.jsp?ftpID=5542

And is your SQL Server instance 32-bit or 64-bit?

win32nipuh avatar win32nipuh commented on June 28, 2024

I want to understand why it behaves this way. I tested my SQL CLR function with IFilterTextReader and it behaves the same as the demo app.
OK, this is the RTF file:

[Screenshot: ifilter_1]

win32nipuh avatar win32nipuh commented on June 28, 2024

This is the PDF.
And SQL Server works fine with the RTF and PDF IFilters (the same ones?).

[Screenshot: ifilter_2]

Sicos1977 avatar Sicos1977 commented on June 28, 2024

Where did you get the IFilter Tester (the one on the right)? It looks like an old version that I made some time ago.

The most important question now is: is the program running in 32-bit or 64-bit mode?
I ask because a program running in 32-bit mode needs 32-bit IFilters, and a program running in 64-bit mode needs 64-bit IFilters.

win32nipuh avatar win32nipuh commented on June 28, 2024
  1. PDF IFilter; this is what SQL Server shows:

.pdf 6C337B26-3E38-4F98-813B-FBA18BAB64F5 C:\Windows\system32\glcndFilter.dll 6.2.9200.16451 Microsoft Corporation
(https://support.microsoft.com/en-us/kb/2791465)

  2. IFilterTester:
    Using IFilter in C#
    Eyal Post, 19 Mar 2006

http://www.codeproject.com/Articles/13391/Using-IFilter-in-C

  3. I built the application as is: Any CPU.
    But I also use IFilterTextReader.dll in my SQL CLR function in SQL Server 2012 (EE x64). It behaves like the demo app there too: it gives me the same exception on PDF.

win32nipuh avatar win32nipuh commented on June 28, 2024

Ok, I will test it.

Btw, I found this info:
"c:\windows\system32\glcndFilter.dll which is the default PDF ifilter in 2012 server"

win32nipuh avatar win32nipuh commented on June 28, 2024

I replaced this piece of code and tested:

  1. rtf - is hanging too
  2. pdf -
    at IFilterTextReader.FilterLoader.LoadAndInitIFilter(Stream stream, String extension, Boolean disableEmbeddedContent, String fileName) in f:_Samples\RedisCLR\ConsoleTest\IFilterTextReader\FilterLoader.cs:line 162
    at IFilterTextReader.FilterReader..ctor(String fileName, String extension, Boolean disableEmbeddedContent, Boolean includeProperties) in f:_Samples\RedisCLR\ConsoleTest\IFilterTextReader\FilterReader.cs:line 138
    at IFilterTextViewer.MainForm.SelectButton_Click(Object sender, EventArgs e) in f:_Samples\RedisCLR\ConsoleTest\IFilterTextViewer\MainForm.cs:line 112
    COM object that has been separated from its underlying RCW cannot be used.

I can try replacing the PDF filter with Adobe's native one, but if SQL Server works with glcndFilter.dll, then it is interesting why the app does not work...
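
One possible reading of the RCW error above: in the fallback code posted at the top of this thread, the catch block calls Marshal.ReleaseComObject(iFilter), and the same iFilter is then cast to IPersistFile afterwards. Below is a minimal sketch of a load-with-fallback helper that releases the COM object only after every strategy has failed. The ComFallback class and the strategy delegates are hypothetical names for illustration, not part of IFilterTextReader.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class ComFallback
{
    // Tries each load strategy in order against the same COM object.
    // The object is released only when all strategies have failed, so a
    // later strategy never touches an RCW that was already released.
    // On success the caller keeps ownership of the object.
    public static bool TryLoad(object comObject,
                               IEnumerable<Func<object, bool>> strategies,
                               out Exception lastError)
    {
        lastError = null;

        foreach (var strategy in strategies)
        {
            try
            {
                if (strategy(comObject))
                    return true;
            }
            catch (Exception exception)
            {
                // Remember the failure and move on to the next strategy
                lastError = exception;
            }
        }

        // Nothing worked, so it is now safe to release the COM object
        if (comObject != null && Marshal.IsComObject(comObject))
            Marshal.ReleaseComObject(comObject);

        return false;
    }
}
```

In the loader, the first strategy would wrap the IPersistStream path and the second the IPersistFile path, each returning true when Init reports S_OK.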

Sicos1977 avatar Sicos1977 commented on June 28, 2024

You can also try removing all of the _job code... it is needed when Adobe IFilters are used:

    public MainForm()
    {
        InitializeComponent();

        // Add the current process to the sandbox
        _job.AddProcess(Process.GetCurrentProcess().Handle);
    }

win32nipuh avatar win32nipuh commented on June 28, 2024

var persistFile = iFilter as IPersistFile;
if (persistFile != null) //<----------FilterLoader.cs:line 162

win32nipuh avatar win32nipuh commented on June 28, 2024

I removed the _job call, but I get the same error...

Sicos1977 avatar Sicos1977 commented on June 28, 2024

Try to remove the complete iPersistStream code so that it only uses the old interface.... then it is the same as Eyal's code.

        // ReSharper disable once SuspiciousTypeConversion.Global
        var iPersistStream = iFilter as NativeMethods.IPersistStream;

        // IPersistStream is assumed on 64-bit systems
        if (iPersistStream != null)
        {
            iPersistStream.Load(new IStreamWrapper(stream));
            NativeMethods.IFILTER_FLAGS flags;
            if (iFilter.Init(iflags, 0, IntPtr.Zero, out flags) == NativeMethods.IFilterReturnCode.S_OK)
                return iFilter;
        }
        else

Just another question... where are you from? The USA?

win32nipuh avatar win32nipuh commented on June 28, 2024

Also, glcndFilter.dll is the default PDF IFilter for Windows Server 2012 and Windows 8 too.
I checked my server, and everything is as described here:

https://ryancr.wordpress.com/category/computers-and-internet/windows-8/

b) Default value at HKEY_CLASSES_ROOT\.pdf\PersistentHandler should be {1AA9BF05-9A97-48c1-BA28-D9DCE795E93C}

c) Default value at HKEY_CLASSES_ROOT\CLSID\{1AA9BF05-9A97-48c1-BA28-D9DCE795E93C}\PersistentAddinsRegistered\{89BCB740-6119-101A-BCB7-00DD010655AF} should be {6C337B26-3E38-4F98-813B-FBA18BAB64F5}

d) If you’re running Windows 8x:

Default value at HKEY_CLASSES_ROOT\CLSID\{6C337B26-3E38-4F98-813B-FBA18BAB64F5}\InProcServer32 should be %systemroot%\system32\glcndFilter.dll
In an administrative command prompt, run: regsvr32 %systemroot%\system32\glcndFilter.dll and confirm you get “DllRegisterServer in C:\WINDOWS\system32\glcndFilter.dll succeeded.”

win32nipuh avatar win32nipuh commented on June 28, 2024

OK, I removed the code you mentioned.
Now the result is:

  1. pdf
    at IFilterTextReader.FilterReader..ctor(String fileName, String extension, Boolean disableEmbeddedContent, Boolean includeProperties) in f:_Samples\RedisCLR\ConsoleTest\IFilterTextReader\FilterReader.cs:line 138
    at IFilterTextViewer.MainForm.SelectButton_Click(Object sender, EventArgs e) in f:_Samples\RedisCLR\ConsoleTest\IFilterTextViewer\MainForm.cs:line 112
    There is no IFilter installed for the extension '.pdf'

  2. rtf: works correctly!

Sicos1977 avatar Sicos1977 commented on June 28, 2024

Can you replace the IFilterReader constructors with this code? It will tell you whether it is looking for a 32-bit or a 64-bit IFilter:

    #region Constructor en Destructor
    /// <summary>
    /// Creates an TextReader object for the given <paramref name="fileName"/>
    /// </summary>
    /// <param name="fileName">The file to read</param>
    /// <param name="extension">Overrides the file extension of the <paramref name="fileName"/>, 
    /// the extension is used to determine the <see cref="NativeMethods.IFilter"/> that needs to
    /// be used to read the <paramref name="fileName"/></param>
    /// <param name="disableEmbeddedContent">When set to <c>true</c> the <see cref="NativeMethods.IFilter"/>
    /// doesn't read embedded content, e.g. an attachment inside an E-mail msg file. This parameter is default set to <c>false</c></param>
    /// <param name="includeProperties">When set to <c>true</c> the metadata properties of
    /// a document are also returned, e.g. the summary properties of a Word document. This parameter
    /// is default set to <c>false</c></param>
    public FilterReader(string fileName, 
                        string extension = "",
                        bool disableEmbeddedContent = false,
                        bool includeProperties = false)
    {
        try
        {
            _fileName = fileName;
            _fileStream = File.OpenRead(fileName);

            if (string.IsNullOrWhiteSpace(extension))
                extension = Path.GetExtension(fileName); 

            _filter = FilterLoader.LoadAndInitIFilter(_fileStream, extension, disableEmbeddedContent, fileName);

            if (_filter == null)
            {
                if (string.IsNullOrWhiteSpace(extension))
                    throw new IFFilterNotFound("There is no " + (Environment.Is64BitProcess ? "64 bits" : "32 bits") +
                                               "IFilter installed for the file '" + Path.GetFileName(fileName) + "'");

                throw new IFFilterNotFound("There is no " + (Environment.Is64BitProcess ? "64 bits" : "32 bits") +
                                           "IFilter installed for the extension '" + extension + "'");
            }

            _includeProperties = includeProperties;
        }
        catch (Exception)
        {
            Dispose();
            throw;
        }
    }

    /// <summary>
    /// Creates an TextReader object for the given <see cref="Stream"/>
    /// </summary>
    /// <param name="stream">The file stream to read</param>
    /// <param name="extension">The extension for the <paramref name="stream"/></param>
    /// <param name="disableEmbeddedContent">When set to <c>true</c> the <see cref="NativeMethods.IFilter"/>
    /// doesn't read embedded content, e.g. an attachment inside an E-mail msg file. This parameter is default set to <c>false</c></param>
    /// <param name="includeProperties">When set to <c>true</c> the metadata properties of
    /// a document are also returned, e.g. the summary properties of a Word document. This parameter
    /// is default set to <c>false</c></param>
    public FilterReader(Stream stream,
                        string extension,
                        bool disableEmbeddedContent = false,
                        bool includeProperties = false)
    {
        if (string.IsNullOrWhiteSpace(extension))
            throw new ArgumentException("The extension cannot be empty", "extension");

        _filter = FilterLoader.LoadAndInitIFilter(stream, extension, disableEmbeddedContent);

        if (_filter == null)
            throw new IFFilterNotFound("There is no " + (Environment.Is64BitProcess ? "64 bits" : "32 bits") +
                                       "IFilter installed for the stream with the extension '" + extension + "'");

        _includeProperties = includeProperties;
    }

win32nipuh avatar win32nipuh commented on June 28, 2024

Hmm... I build the library for .NET 3.5 because SQL Server requires this version.
.NET 3.5 does not have Environment.Is64BitProcess,
so I need to replace it somehow.

win32nipuh avatar win32nipuh commented on June 28, 2024

I used this solution to detect 64-bit:

[DllImport("kernel32.dll", SetLastError = true, CallingConvention = CallingConvention.Winapi)]
[return: MarshalAs(UnmanagedType.Bool)]
public static extern bool IsWow64Process([In] IntPtr hProcess, [Out] out bool lpSystemInfo);

private bool Is64Bit()
{
    if (IntPtr.Size == 8 || (IntPtr.Size == 4 && Is32BitProcessOn64BitProcessor()))
    {
        return true;
    }
    else
    {
        return false;
    }
}

private bool Is32BitProcessOn64BitProcessor()
{
    bool retVal;

    IsWow64Process(Process.GetCurrentProcess().Handle, out retVal);

    return retVal;
}

The result is:

at IFilterTextReader.FilterReader..ctor(String fileName, String extension, Boolean disableEmbeddedContent, Boolean includeProperties) in f:_Samples\RedisCLR\ConsoleTest\IFilterTextReader\FilterReader.cs:line 141
at IFilterTextViewer.MainForm.SelectButton_Click(Object sender, EventArgs e) in f:_Samples\RedisCLR\ConsoleTest\IFilterTextViewer\MainForm.cs:line 112
There is no 64 bitsIFilter installed for the extension '.pdf'
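
A note on the check above, offered as a sketch rather than a correction of what was actually run: Is64Bit() also returns true for a 32-bit process running under WOW64, so it reports the bitness of the operating system, not of the process. Since a process needs IFilters of its own bitness (as mentioned earlier in this thread), a process-level check may be more useful in the error message. On .NET 3.5, where Environment.Is64BitProcess is missing, IntPtr.Size gives that answer. The Bitness class name is hypothetical.

```csharp
using System;

public static class Bitness
{
    // True when the current process itself runs as 64-bit.
    // IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit process,
    // including a 32-bit process running under WOW64 on 64-bit Windows.
    public static bool Is64BitProcess()
    {
        return IntPtr.Size == 8;
    }

    // Text fragment in the style of the exception messages in this thread
    public static string Describe()
    {
        return Is64BitProcess() ? "64 bits" : "32 bits";
    }
}
```

With this check, an 'Any CPU' build that ends up running as 32-bit reports "32 bits", which matches the IFilters the process can actually load.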

Sicos1977 avatar Sicos1977 commented on June 28, 2024

Well, then you have no 64-bit IFilter... install the Adobe IFilter I mentioned a few posts back.

win32nipuh avatar win32nipuh commented on June 28, 2024

OK, but then how does SQL Server work with PDF?

And what should I do about RTF? Use your latest changes?

Sicos1977 avatar Sicos1977 commented on June 28, 2024

Yes, use the latest changes...
I have to go now... it is already 7 PM over here.

Bye

win32nipuh avatar win32nipuh commented on June 28, 2024

thank you,
I will try to research.

win32nipuh avatar win32nipuh commented on June 28, 2024

Hi, the research continues...

I downloaded the SearchFilterView utility from this site:
http://www.nirsoft.net

I ran it, and it shows me all IFilters, including glcndFilter.dll.

[Screenshot: ifilter_3]

win32nipuh avatar win32nipuh commented on June 28, 2024

Another note:

When I build as 'Any CPU', the app finally reports:
There is no 32 bits IFilter installed for the extension '.pdf'

After rebuilding it as x64, I get:
There is no 64 bits IFilter installed for the extension '.pdf'

But I don't think bitness is the cause; this is only the final message.

win32nipuh avatar win32nipuh commented on June 28, 2024

Hi, it seems I can finally read PDF via the standard MS PDF filter glcndFilter.dll...

win32nipuh avatar win32nipuh commented on June 28, 2024

Could you analyze and fix the code?
This approach works for glcndFilter.dll:
http://stackoverflow.com/questions/7313828/using-ifilter-in-c-sharp-and-retrieving-file-from-database-rather-than-file-syst

I also tested it with txt, ppt and docx, and it works there too.

Sicos1977 avatar Sicos1977 commented on June 28, 2024

How are you trying to read from the database? Through a SqlDataReader?

win32nipuh avatar win32nipuh commented on June 28, 2024

I am working with your demo app now, not with the database. I added the code from the link above to your demo app and called its method, and it works (!) with glcndFilter.dll on W2012 and on W8. By the way, the same problem was reproduced on W8.

I want to get the application working first and then go back to my SQLCLR experiments.

win32nipuh avatar win32nipuh commented on June 28, 2024

Simply put, I made this test:

    public static NativeMethods.IFilter LoadAndInitIFilter(Stream stream,
                                                           string extension,
                                                           bool disableEmbeddedContent,
                                                           string fileName = "")
    {
        string dllName, filterPersistClass;
        FilterTester.ParseIFilter(extension, stream); // <------ that is their function

and I received the text from the PDF file.

But naturally their code needs to be reviewed and somehow integrated into your classes, because the NativeMethods overlap, etc.

Sicos1977 avatar Sicos1977 commented on June 28, 2024

Could you paste your source here so that I can see what you did?
The only difference I can see between the Stack Overflow approach and mine is that I wrap the .NET stream in an IStream wrapper, whereas on Stack Overflow they just copy everything to memory.

win32nipuh avatar win32nipuh commented on June 28, 2024

I kept it very simple:
I created a new class IFileTest and copy-pasted their classes into it.

I think this should probably become a third way in your IFilterLoader class, in addition to the two existing ways (Persist Stream, Persist File),
because their code does not work as-is with other IFilters, only with glcndFilter.

win32nipuh avatar win32nipuh commented on June 28, 2024

It seems this is the key part:

// Copy the content to global memory
byte[] buffer = new byte[s.Length];
s.Read(buffer, 0, buffer.Length);
IntPtr nativePtr = Marshal.AllocHGlobal(buffer.Length);
Marshal.Copy(buffer, 0, nativePtr, buffer.Length);

// Create a COM stream
System.Runtime.InteropServices.ComTypes.IStream comStream;
NativeMethods.CreateStreamOnHGlobal(nativePtr, true, out comStream);

// Load the contents to the iFilter using IPersistStream interface
var persistStream = (IPersistStream)filter;
persistStream.Load(comStream);
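
A hedged variation on the snippet above, for illustration only: a single Stream.Read call is not guaranteed to fill the buffer, and CreateStreamOnHGlobal can allocate its own memory when it is passed a null handle. The sketch below therefore lets the COM stream own its memory and copies the .NET stream into it in chunks. It is Windows-only, and the ComStreamFactory class name and the chunk size are assumptions, not code from the Stack Overflow answer or from IFilterTextReader.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.ComTypes;

public static class ComStreamFactory
{
    [DllImport("ole32.dll")]
    private static extern int CreateStreamOnHGlobal(IntPtr hGlobal,
                                                    bool fDeleteOnRelease,
                                                    out IStream ppstm);

    // Copies a .NET stream into an in-memory COM IStream and rewinds it
    public static IStream FromStream(Stream source)
    {
        IStream comStream;

        // IntPtr.Zero lets COM allocate (and later free) the memory block
        int hr = CreateStreamOnHGlobal(IntPtr.Zero, true, out comStream);
        if (hr != 0)
            Marshal.ThrowExceptionForHR(hr);

        // Read in a loop, because Read may return fewer bytes than requested
        var chunk = new byte[64 * 1024];
        int read;
        while ((read = source.Read(chunk, 0, chunk.Length)) > 0)
            comStream.Write(chunk, read, IntPtr.Zero);

        // Rewind (0 = STREAM_SEEK_SET) so the consumer starts at the beginning
        comStream.Seek(0, 0, IntPtr.Zero);
        return comStream;
    }
}
```

The returned stream could then be handed to IPersistStream.Load in place of comStream in the snippet above. The whole file still ends up in memory, so the trade-off is the same.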

win32nipuh avatar win32nipuh commented on June 28, 2024

By the way, please check this article and the messages below it about
"Support for PDF file indexing":
http://www.codeproject.com/Articles/31944/Implementing-a-TextReader-to-extract-various-files

It is about Adobe, but it looks like the same problem and the same approach.
And who knows what is inside glcndFilter.dll ;-)

Sicos1977 avatar Sicos1977 commented on June 28, 2024

I changed the constructor of the IFilterReader: I added an option so that you can choose to load everything into memory first before passing it to the IFilter. This way you can set it from your own code.

Just get the latest version from GitHub; it includes this change.

Sicos1977 avatar Sicos1977 commented on June 28, 2024

Any luck?

win32nipuh avatar win32nipuh commented on June 28, 2024

Yes, I have tested this version, thank you.

  1. PDF filter:
    It works if I use the 'Read into Memory' flag. It seems this is the only way that works for glcndFilter.
    Now I need to think about how to use it in different environments, for example when either the Adobe PDF filter or glcndFilter is installed. Maybe call it normally and, if an exception is thrown, repeat the call with the 'Read into Memory' flag.

  2. RTF still does not work in this version.
    RTF also does not work on another machine with W7, where the filter is:

rtffilt.dll RTF Filter 2008.0.7600.16385 (win7_rtm.090713-1255) 2008.0.7600.16385 Microsoft Corporation C:\Windows\system32\rtffilt.dll {2e2294a9-50d7-4fe7-a09f-e6492e185884} {e2403e98-663b-4df6-b234-687789db8560} 7/14/2009 6:53:58 AM
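
The retry idea from item 1 can be sketched as follows. This is only an illustration: the extract callback is a hypothetical stand-in for whatever code constructs the reader and reads the text, and its bool argument stands for the 'Read into Memory' option, whose real parameter name is not shown in this thread.

```csharp
using System;

public static class ReadWithFallback
{
    // Runs the extraction in streaming mode first and, if that throws,
    // retries once with the "read into memory" option switched on.
    // Catching every exception is a simplification for this sketch;
    // real code would probably catch only the IFilter-related ones.
    public static string Extract(Func<bool, string> extract)
    {
        try
        {
            return extract(false);
        }
        catch (Exception)
        {
            return extract(true);
        }
    }
}
```

The cost is that a file that needs the in-memory path is opened twice, once for the failed attempt and once for the retry.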

Sicos1977 avatar Sicos1977 commented on June 28, 2024

For the PDF, just read the registry and see what filter is installed and then use the correct flag (read file into memory).

I still find it strange that you have all these problems... because I can read every file for which I installed an IFilter... I wonder what is different on your server. That includes EML, RTF, DOC(X), XLS, etc...
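
The registry lookup suggested above could look roughly like this. It is a sketch that follows only the key chain quoted earlier in this thread (extension, PersistentHandler, PersistentAddinsRegistered, InprocServer32). Extensions registered through other routes are not covered, and the FilterLookup class name is hypothetical. It is Windows-only, and a 32-bit process sees the 32-bit registry view, which fits the bitness discussion above.

```csharp
using System;
using Microsoft.Win32;

public static class FilterLookup
{
    // Interface id of IFilter, as it appears under PersistentAddinsRegistered
    private const string IFilterIid = "{89BCB740-6119-101A-BCB7-00DD010655AF}";

    // Returns the default value of a key under HKEY_CLASSES_ROOT, or null
    private static string ReadDefault(string subKey)
    {
        using (RegistryKey key = Registry.ClassesRoot.OpenSubKey(subKey))
        {
            return key == null ? null : key.GetValue(null) as string;
        }
    }

    // Returns the path of the IFilter DLL for an extension such as ".pdf",
    // or null when one of the keys in the chain is missing
    public static string FindFilterDll(string extension)
    {
        string handler = ReadDefault(extension + @"\PersistentHandler");
        if (string.IsNullOrEmpty(handler))
            return null;

        string filterClsid = ReadDefault(
            @"CLSID\" + handler + @"\PersistentAddinsRegistered\" + IFilterIid);
        if (string.IsNullOrEmpty(filterClsid))
            return null;

        string dll = ReadDefault(@"CLSID\" + filterClsid + @"\InprocServer32");
        return dll == null ? null : Environment.ExpandEnvironmentVariables(dll);
    }
}
```

A caller could then compare the file name of the returned path with glcndFilter.dll and switch on the read-into-memory option only in that case.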

win32nipuh avatar win32nipuh commented on June 28, 2024

I tested PDF (glcndFilter) on 2 Windows 8 and 2 Windows Server 2012 machines. The problem reproduced on all of them. It works now in the new version.

Ah, OK: the RTF problem reproduces on W7 and W2012, BUT only on 3 files. I opened these files in Notepad and saw that they contain plain text, with no RTF formatting. That is probably the reason.

I will test the latest version in my SQL CLR function again.
Thank you very much for your help. Your library is the best one available today.

win32nipuh avatar win32nipuh commented on June 28, 2024

You wrote: "For the PDF, just read the registry and see what filter is installed"

Maybe it makes sense to have a public function in FilterReader that returns the file name of the IFilter? It detects it internally anyway.

Sicos1977 avatar Sicos1977 commented on June 28, 2024

I have a very nice class that can detect file types... I wrote it for a project with this kind of issue: wrong extensions, no extensions, etc.

I created a Gist; you can find it here: https://gist.github.com/Sicos1977/d968f30e23171b76abaa

For the CheckCompoundFileStorage method to work, you need to add this NuGet package: https://www.nuget.org/packages/CompoundFileStorage/

Sicos1977 avatar Sicos1977 commented on June 28, 2024

Returning the name of the IFilter gets you into a chicken-and-egg problem. You don't know which flag to set before you have the IFilter name, but you only get the IFilter name after the flag has been set. So it is probably better to leave this outside the IFilterTextReader class. Also, when you have an MSG file with attachments, you will hit all kinds of IFilters.

Sicos1977 avatar Sicos1977 commented on June 28, 2024

Where are you from, if I may ask? Europe?

win32nipuh avatar win32nipuh commented on June 28, 2024

Yes, Germany

Sicos1977 avatar Sicos1977 commented on June 28, 2024

Nice... the Netherlands over here... so we are almost neighbours :-)

win32nipuh avatar win32nipuh commented on June 28, 2024

Yes, right ;-)
I have visited Amsterdam a few times for the IBC exhibition.

win32nipuh avatar win32nipuh commented on June 28, 2024

OK, thanks again.

By the way, I have sent a PDF file (5 MB) to your email. Maybe you can check whether performance can be improved.

On my server, your demo app parses it in about 10 seconds.
Naturally, that is via the f*g MS PDF filter :-)

Sicos1977 avatar Sicos1977 commented on June 28, 2024

Sorry, but I can't speed anything up; I already optimized the code to be as fast as possible. If you use the Adobe IFilters, then you have to live with that speed. It's not the fastest IFilter there is. If you need a fast one and are willing to spend a few hundred euros, you can get PDFlib TET. It is very fast but costs a lot.

win32nipuh avatar win32nipuh commented on June 28, 2024

Hi,
I have installed the Adobe PDF filter and tested it on the same PDF file that I sent.
It also takes about 9-10 seconds, regardless of the 'Read into memory' flag.
That is OK.
