gwenn / sqlite-regex-replace-ext Goto Github PK
View Code? Open in Web Editor NEWSQLite regex_replace() function
SQLite regex_replace() function
Compiling succeeded but did not produce a good library with this line in the icu_replace.sh script:
gcc -fPIC -O2 -Wall -shared -o icu_replace.so icu_replace.c `icu-config --ldflags`
I changed it to this:
gcc -fPIC -O2 -Wall -shared -o icu_replace.so icu_replace.c $(pkg-config --libs icu-i18n)
Ubuntu Focal (20.04) gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Dev package:
ii libicu-dev:amd64 66.1-2ubuntu2 amd64 Development files for International Components for Unicode
Just to be complete:
/sqlite-regex-replace-ext-master>ldd icu_replace.so
linux-vdso.so.1 (0x00007fff1de86000)
libicui18n.so.66 => /usr/lib/x86_64-linux-gnu/libicui18n.so.66 (0x00007f5a67d91000)
libicuuc.so.66 => /usr/lib/x86_64-linux-gnu/libicuuc.so.66 (0x00007f5a67bab000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5a679b9000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f5a677d8000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5a67689000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5a6766e000)
libicudata.so.66 => /usr/lib/x86_64-linux-gnu/libicudata.so.66 (0x00007f5a65bab000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5a65b88000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5a65b82000)
/lib64/ld-linux-x86-64.so.2 (0x00007f5a680ac000)
I get an out of memory error on master, but the same script works on the same data using regex_replace build with commit of 2020-Nov-18: https://github.com/gwenn/sqlite-regex-replace-ext/tree/458fe6978349b503095ed7a5ecf637f0b0ec4d38
Here's a minimal working example:
out-of-memory.zip
It contains:
data.csv
libSqlite3Icu.so
libSqlite3IcuReplace-458fe6978349b503095ed7a5ecf637f0b0ec4d38.so
libSqlite3IcuReplace-master.so
script-458fe6978349b503095ed7a5ecf637f0b0ec4d38.sql
script-master.sql
When I run
sqlite3 < script-458fe6978349b503095ed7a5ecf637f0b0ec4d38.sql
it works, when I run
sqlite3 < script-master.sql
it gives out of memory error:
The two sql scripts differ only with the regex_replace extension:
-- -*- mode: sql -*-
.load ./libSqlite3Icu.so
-- https://github.com/gwenn/sqlite-regex-replace-ext/tree/master - gives out of memory:
.load ./libSqlite3IcuReplace-master.so
.mode csv
.separator ',' "\n"
.header on
.import data.csv data
select
case regex_replace(' \(.+', "column", '')
when 'foo' then 'bar'
else ''
end as "column no comments"
from data
;
Both scripts run on the latest sqlite (3.35.4), I think regression is introduced by the February commits to this repo:
https://github.com/gwenn/sqlite-regex-replace-ext/commits/master
I got incorrect termination of the result string when the Replacement was exactly twice +1 as long as the input. I tried to figure out where exactly the problem occurs, but in the end I wasn't able to find it and due to time constraints, I had to move on with the quick fix, which was to increase the default size of zOutput
to 4 * nInput + 2
. Also, I noticed that nInput
should be determined before reading the actual string zInput
, because otherwise, according to the SQLite docs, the call for nInput
might invalidate the pointer to zInput
.
The error occured for me for the inputs ERROR
as input, ^ERROR$
as program and GLOBAL.DIAG
as replacement string, however only on the second+ run. I checked however with gdb, for each run (context disgarded by sqlite), the program is recompiled and the error still occurs.
Only applies to the ICU regex replace, the glib had all kinds of other weird behaviour for me.
Just wanted to put this out as a warning, sorry I can't be of more help right now to fix the problem.
Was able to compile and load extension to sqlite db, but the regex_replace() function just doesn't work. Seems to always return the second parameter.
Perhaps providing some usage examples and/or instructions would help.
Shell / Tcl scripts ?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.