Giter Club home page Giter Club logo

Comments (16)

laurenz avatar laurenz commented on September 17, 2024 1

I did some code review, and I think I found the problem. oracle_fdw allocates LOB locators whenever they are needed, but only frees them at the end of the transaction, rather than when the statement is closed. That will leak memory in a DO statement with a loop. The whole matter is made worse by array prefetching, because we need way more LOB descriptors.

The fix will have to redesign how I allocate and free those descriptors, so it's not a one-liner. But at least I know what to do.

from oracle_fdw.

laurenz avatar laurenz commented on September 17, 2024 1

Commit 4ccfadd fixes the problem for me.

Could you grab the latest code and see if it works for you as well?

from oracle_fdw.

laurenz avatar laurenz commented on September 17, 2024

That looks like a double free.

I need to be able to reproduce the problem in order to fix that.

Can you send me the CREATE TABLE statement in Oracle and the CREATE FOREIGN TABLE statement in PostgreSQL?
I also need to know exactly what sequence of SQL statements leads to this error.
Please also tell me all extension (and their configuration) you have in use, particularly ones that influence the planner or the executor (like auto_explain).
Also, any non-default parameters and anything else you can think of are interesting.

from oracle_fdw.

JosefMachytkaNetApp avatar JosefMachytkaNetApp commented on September 17, 2024

Hey, here it is:

Tables:

  • source Oracle table:
SQL> desc AJIADM.CHANGEITEM
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                        NOT NULL NUMBER(18)
 GROUPID                                            NUMBER(18)
 FIELDTYPE                                          VARCHAR2(255 CHAR)
 FIELD                                              VARCHAR2(255 CHAR)
 OLDVALUE                                           CLOB
 OLDSTRING                                          CLOB
 NEWVALUE                                           CLOB
 NEWSTRING                                          CLOB
  • PostgreSQL fdw table:
migration=# \dS test."CHANGEITEM"
                            Foreign table "test.CHANGEITEM"
  Column   |          Type          | Collation | Nullable | Default |     FDW options
-----------+------------------------+-----------+----------+---------+----------------------
 ID        | bigint                 |           | not null |         | (key 'true')
 GROUPID   | bigint                 |           |          |         |
 FIELDTYPE | character varying(255) |           |          |         |
 FIELD     | character varying(255) |           |          |         |
 OLDVALUE  | text                   |           |          |         |
 OLDSTRING | text                   |           |          |         | (strip_zeros 'true')
 NEWVALUE  | text                   |           |          |         |
 NEWSTRING | text                   |           |          |         | (strip_zeros 'true')
Server: oradb
FDW options: (schema 'AJIADM', "table" 'CHANGEITEM')

Notes from tests:

  • Instance has 8 CPUs/ 32 GB RAM,
  • I tried to fiddle with PG settings shared_buffer and work_mem - tested shared_buffers up to 8GB, but no influence on crashes, tested work_mem up 64MB, no influence.
  • I also fiddled with oracle_fdw settings max_long, prefetch and lob_prefetch but nothing helped.
  • With higher settings PG just survives a bit longer, reading more row, but crashes eventually anyway...
  • Columns OLDSTRING / NEWSTRING contain ASCII 0 characters so I set strip_zero.
  • There are also other tables with CLOB Oracle columns, crashing PostgreSQL the same way and these do not contain ASCII 0 characters so this seems to not be related.
  • PostgreSQL crashes on simple select from that FDW table. In my tests I used INSERT ... SELECT and even with batch processing it crashes eventually.

Environment:

  • Source database - Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production / Version 19.12.0.0.0 (on the different instance)
  • Oracle Instant client 21.11 + oracle-instantclient-tools 21.11.0.0.0-2
  • Target PG 14.9, oracle_fdw 2.6.0-1.pgdg110+1, no additional extensions, no unusual settings, beside of shared_buffers nad work_mem all is in default.
  • On the same instance we also tested ora2pg so it has also oracle-libs package installed Version: 1.pgdg110+1

I can test what ever you would want and send you results.
Thank you very much

from oracle_fdw.

laurenz avatar laurenz commented on September 17, 2024

I created such an Oracle table and filled it five rows, where each of the CLOBs was 1000 kB in size.

Then I ran SELECT * FROM "CHANGEITEM";, and there was no error.

What is the query that you are using?

from oracle_fdw.

JosefMachytkaNetApp avatar JosefMachytkaNetApp commented on September 17, 2024

I tested:

  • plain SELECT * FROM ...
  • script looping over ID and doing select * from ... where ID=...
  • INSERT INTO ... (list of columns) SELECT list of columns with PG casts FROM ...`

All of them crash. On that script looping over ID and selecting just 1 row at the time I can see that with higher memory settings it reaches higher IDs. But then crashes anyway. I have the feeling that some internal buffer is filled after some time or something like this.

But I guess 5 rows are not enough, tables which crash have millions of dozens of millions rows on Oracle.
Small tables with just a few rows even having CLOB columns do not cause any problems.

from oracle_fdw.

laurenz avatar laurenz commented on September 17, 2024

I tried with a table with 20000 rows and CLOBs of 50MB each. Then I ran in psql:

\set SINGLELINE on
\set FETCH_COUNT 30
SELECT "ID", length("OLDVALUE"), length("OLDSTRING"), length("NEWVALUE"), length("NEWSTRING") FROM "CHANGEITEM";

And it completed without errors, and I could not see an increase in memory usage (the query fetches the CLOBs but doesn't display them).

I had another look at the call stack, and I see an AbortCurrentTransaction in there. That means that the transaction is rolling back. Did you do an explicit ROLLBACK? If not, then the statement must have failed. Can you show me the complete output you get, with all messages? Is there anything interesting before the crash in the PostgreSQL server log?

I think we have to identify the error before I can make any progress.

from oracle_fdw.

philflorent avatar philflorent commented on September 17, 2024

Hi,
Don't forget to check if you obtain an ORA-4030 or another PGA error at Oracle side (such an error is logged and can be found via adrci) . We obtain a crash at PostgreSQL side in this case. PostgreSQL should not crash but at least it can be easily mitigated.
Best regards,
Phil

from oracle_fdw.

JosefMachytkaNetApp avatar JosefMachytkaNetApp commented on September 17, 2024

I will see what I can do...
Unfortunately I do not have direct access to the Oracle instance, so I cannot run adrci there from the command line. Checking if I can run it remotely...

I switched ON all logging on PostgreSQL and set it to DEBUG5 - I attached log from the crash of the script which selects rows one by one using ID. It selects successfully some 200 000 rows one by one and then it crashes. This extract from the last log shows at the beginning 2 last successful runs of the script before the crash and then the whole info about the crash.
20231030_script_crash.txt

Now I want to try to catch log from INSERT - SELECT, but it is incredibly slow with all that logging....

from oracle_fdw.

laurenz avatar laurenz commented on September 17, 2024

It is not necessary to log debug messages, the normal error messages will do.
If you want, you can set the level to DEBUG2. That will not cause a lot of noise, but includes some oracle_fdw debug messages.

from oracle_fdw.

laurenz avatar laurenz commented on September 17, 2024

I tried running

DO
$$DECLARE
  _id bigint;
  _lenold bigint;
  _lennew bigint;
BEGIN
   FOR _id IN SELECT "ID" FROM "CHANGEITEM" LOOP
      SELECT length("OLDSTRING"), length("NEWSTRING") INTO _lenold, _lennew FROM "CHANGEITEM" WHERE "ID" = _id;
   END LOOP;
END;$$;

and the OOM killer kills my PostgreSQL server process.

Will have to try tomorrow without memory overcommit.

from oracle_fdw.

laurenz avatar laurenz commented on September 17, 2024

Looks like when my Oracle client goes OOM, it causes a segmentation violation. The stack trace looks different from yours.

There may well be a memory leak somewhere in my code. Not easy to determine if all I get is a crash.

Can you avoid the loop and the many small SELECT statements? That might be a workaround.

from oracle_fdw.

JosefMachytkaNetApp avatar JosefMachytkaNetApp commented on September 17, 2024

Thank you very much for update.
I tested latest master version from today. It is definitely much more stable and subjectively also much quicker. With default settings it still crashes but when I now set much higher lob_prefetch for the table, it works.

Details:

  • Unfortunately with default settings it still crashes on INSERT - SELECT command on that "CHANGEITEM" table with 4 CLOB columns. I am now doing batch inserts with 1M rows in each separately committed insert. First batch is successfully committed but second batch crashes.

  • Here is the command of the second crashing batch:
    INSERT INTO public.changeitem (id,groupid,fieldtype,field,oldvalue,oldstring,newvalue,newstring) SELECT "ID"::numeric,"GROUPID"::numeric,"FIELDTYPE"::text,"FIELD"::text,"OLDVALUE"::text,"OLDSTRING"::text,"NEWVALUE"::text,"NEWSTRING"::text FROM jira_fdw."CHANGEITEM" WHERE "ID" BETWEEN 1010113 AND 2010113

  • Here is log from the crash:
    20231101_crash_insertselect_changeitem_1305.txt

  • I tested again setting different lob_prefetch for the table and when I set it now to '67108864' then all works. At least until now - problematic table "CHANGEITEM" was already fully migrated.

The whole migration takes several hours so will see and will keep you informed. But even this result is truly amazing.
Thank you very much!

from oracle_fdw.

JosefMachytkaNetApp avatar JosefMachytkaNetApp commented on September 17, 2024

Today's tests of the master branch were all successful. Will continue tomorrow on other tasks.

from oracle_fdw.

JosefMachytkaNetApp avatar JosefMachytkaNetApp commented on September 17, 2024

All my additional tests of the master branch were successful. I am closing this issue.
Thank you very much.

from oracle_fdw.

laurenz avatar laurenz commented on September 17, 2024

Thanks for the feedback.
I am still a little worried about the problem you reported with smaller lob_prefetch, but I guess I'll wait until somebody else reports a similar problem.

from oracle_fdw.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.