Giter Club home page Giter Club logo

Comments (8)

paulcwarren avatar paulcwarren commented on May 15, 2024

Hi rmotapar,
Postgresql is a little complicated in this regard. From this it seems there are two column types for storing content; bytea and OID. bytea is stored inline. OID is a pointer to content stored externally (to the table). Both have pros and cons. The con of bytea - that we use by default - is that it is not streamed out - hence your problem I think. But let's test that theory.

If I read the documentation correctly then I think you should be able to use OID column instead of bytea. You could do this by overridding the sql that is used by Spring Boot, at start-up, to initialize the database schema. The schema that we use currently is here. If you set the following property spring.content.jpa.schema to point to a resource in your classpath that declares the column as an OID rather than bytea then I think the existing code with the getBinaryInputStream may return a real input stream that streams content rather than loading it into memory.

If that fails then there is a backup plan. We can define a PostgresqlBlobResource that uses their LargeObjectManager but that would involve code changes.

Can you give this a try and let us know how you get on?

from spring-content.

rmotapar avatar rmotapar commented on May 15, 2024

Thank you very much for the prompt response Paul! I did change the schema and tried uploading a file but it failed with the below error:

org.postgresql.util.PSQLException: ERROR: column "content" is of type oid but expression is of type bytea
  Hint: You will need to rewrite or cast the expression.
  Position: 42

In the latest driver (PgPreparedStatement), setBinaryStream method assumes that we have a bytea column.
public void setBinaryStream(int parameterIndex, InputStream value) throws SQLException { preparedParameters.setBytea(parameterIndex, value); }

It looks like postgresql used to support setBinaryStream method for oid columns prior to driver version 7.2. They do mention in the documentation here that one could set compatible=7.1 in order to allow setBinaryStream to use oid column. I did try setting this property on the connection string but it did not work (expected based on the setBinaryStreamcode above)

So it looks like we will have to use setBlob and getBlob methods instead of setBinaryStream and getBinaryStream when the column is of oid type. I will make these updates in spring-content-jpa locally and see if streaming works.

Please do let me know if you have any other thoughts. Thanks again for the help!

from spring-content.

rmotapar avatar rmotapar commented on May 15, 2024

Hi Paul,
I updated spring-content-jpa locally to use setBlob and getBlob methods and I am now able to successfully stream files to/from postgresql. I did notice that uploading the file via setBlob when using oid column is pretty slow when compared to uploading to bytea column via setBinaryStream (the performance went down by almost half). Downloading a file via streaming was pretty fast (500 MB file in 15 seconds ).

Here are the changes I did locally:

  1. Replaced getBinaryStream and setBinaryStream calls in AbstractBlobResource with getBlob and setBlob respectively.
  2. In the getInputStream method, had to add conn.setAutoCommit(false);. Without this, the driver complains with org.postgresql.util.PSQLException: Large Objects may not be used in auto-commit mode. Strangely, I did not get this error when uploading the file
  3. Disabled some unit tests as they were failing

To do this in a clean way, I am thinking we will need to create PostgresqlBlobResource that you mentioned in your previous post. Please let us know your thoughts!

Thanks,
Ramesh

from spring-content.

paulcwarren avatar paulcwarren commented on May 15, 2024

Excellent feedback Ramesh. That is very helpful.

If it requires code changes then how would you like to proceed? Would you be prepared to PR your changes? I can, of course, help you massage them towards something that we can merge. You would also have to sign our CLA before I could accept it. This is probably the quickest route.

Or you could commit your changes on a fork and I can massage and merge them in myself. You will probably take longer. I am also on vacation atm. You wouldn't get any attribution but you wouldnt have to sign the CLA.

Up to you.

I like to keep track of use cases where I can as it helps shape where we go so out of interest would you mind explaining your use case and also which Store interface you are using?

from spring-content.

paulcwarren avatar paulcwarren commented on May 15, 2024

Ping. How would you like to proceed with this @rmotapar?

from spring-content.

rmotapar avatar rmotapar commented on May 15, 2024

Hi Paul,
Sorry for the delay in responding. We are now planning to use the file system as our storage instead of JPA. We are in a bit of a hurry so the team decided that using filesystem as storage should be enough at this time. Our main requirement was to be able to stream attachments and also to make it easier to change the storage mechanism in the future if needed and spring-content provides both of these so we are pretty happy.

Regarding our use case, it deals with sending attachments for review to one or more reviewers (with a bit of workflow around the reviewers) who can add in-line comments to existing attachments or add new ones to the process. We can call it a document review or collaboration process.

Thanks again for providing this project. Would really appreciate if you can provide any thoughts on whether this will eventually become an official spring project.

Thanks,
Ramesh

from spring-content.

paulcwarren avatar paulcwarren commented on May 15, 2024

Hi Ramesh,
Sounds reasonable. Avoiding vendor lock-in and being able to move between storage is one of the central concepts behind Spring Content. So as there is no immediate urgency I'm going to leave this open and we'll fix it after SpringOne 2018 then (currently prepping slide deck for that).

Would you mind if I asked who you work for? The reason I ask is because I would very much like to include your logo in my SpringOne session. Totally understand if you cant, or dont want to say, but thought I would ask.

As for becoming an official Spring project. Certainly, from our perspective, we have always seen this as a project in incubation that would eventually graduate to the spring org in gtihub. We have also had conversations of that nature with the Spring team too. But it would help to make the case for graduation directly to the Spring team. Perhaps, raise an issue against Spring, Spring Data or Spring Cloud stating that you are a user and would like to see it graduate. I expect that they will close it but they will track the asks. Spring is very community driven in this sense.

from spring-content.

paulcwarren avatar paulcwarren commented on May 15, 2024

This commit refactored the blob resource implementations to better support chunking of content through the memory-space of the JVM. From a PostgreSQL perspective, we switched over to using OID field type and the Large Object API. Whilst this has delivered content that "streams" it does, as you note, come at a significant performance hit. None-the-less, overall this is probably preferable to using bytea fields with their 2GB size limit and loading the entire BLOB into memory as it passes through the JVM.

I am going to close this but if you see any issues please feel free to re-open.

from spring-content.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.