tcdi / plrust Goto Github PK
View Code? Open in Web Editor NEWA Rust procedural language handler for PostgreSQL
License: PostgreSQL License
A Rust procedural language handler for PostgreSQL
License: PostgreSQL License
When 2 of the same data type are passed in, the names don't match the ordinal number correctly.
CREATE OR REPLACE FUNCTION test(a int, b int)
RETURNS int AS
$$
Some(a.unwrap())
$$ LANGUAGE plrust;
=# select test(1, 2);
test
------
2
(1 row)
We can imagine a list of requirements for "is this crate safe to build as a dependency?" in a very low-trust user environment. Throwing some darts at the wall, meeting the following requirements would make it easy to justify building a given crate as a dependency for a PL/Rust function:
build.rs
unsafe
codeWe may want to also figure out how to more finely grade crates than "yes or no" and generate, essentially, a "safety recommendation" for a given crate as to whether it should be added to the passlist. In a higher-trust environment, as determined by the installing superuser DBA (who, remember, is the ultimate human source of trust for both PL/Rust and credentials to the database thus what they say is trustworthy is), the DBA may want to dial a threshold for automatic acceptance.
Currently we support minimal logging. If udf calls println!
, it is logged with log level log
and this could or could not show up on the client depending on the setting of client_min_messages.
CREATE OR REPLACE FUNCTION logging() RETURNS INTEGER
IMMUTABLE STRICT
LANGUAGE PLRUST AS
$$
println!("print");
eprintln!("eprint");
Ok(Some(0))
$$;
postgres=# select logging();
LOG: print
WARNING: eprint
logging
---------
0
(1 row)
postgres=#
postgres=# set client_min_messages = 'debug1';
SET
postgres=# select logging();
LOG: print
WARNING: eprint
logging
---------
0
(1 row)
We can consider adding enhancement to support for logging at different levels
notice!('notice');
warning!('warning');
error!('error');
log!('log');
Seriously it's annoying that they constantly have to be restarted because it means the test doesn't complete. Then GH only allows kicking it off again after all jobs have finished. This was okay at first but at some point it has started to happen way more than bearable.
Options, I think, are:
This includes, and includes rationales for
cargo check
first and then cargo rustc
later? Answer: maybe don't?CARGO_TARGET_DIR
?[profile.release]
debug-assertions = true
codegen-units = 1
lto = "fat"
opt-level = 3
panic = "unwind"
We need to support Postgres' SRF api.
In the style of https://matklad.github.io/2021/02/06/ARCHITECTURE.md.html
OK is installed
test=# \dx
List of installed extensions
Name | Version | Schema | Description
---------+---------+------------+------------------------------
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
plrust | 1.0 | plrust | plrust: Created by pgx
(2 rows)
but when run
CREATE EXTENSION IF NOT EXISTS plrust;
CREATE OR REPLACE FUNCTION sum_array(a BIGINT[]) RETURNS BIGINT
IMMUTABLE STRICT
AS
$$
[dependencies]
# Add Cargo.toml dependencies here.
[code]
Some(a.into_iter().map(|v| v.unwrap_or_default()).sum())
$$
LANGUAGE PLRUST;
SELECT sum_array(ARRAY[1,2,3]);
|
v
NOTICE: extension "plrust" already exists, skipping
ERROR: failed to build function shared library: Os { code: 2, kind: NotFound, message: "No such file or directory" }
CONTEXT: src/plrust.rs:88:10
SQL state: XX000
My configuration in potgresql.conf
#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------
# Add settings for extensions here
plrust.pg_config = '/usr/bin/pg_config'
plrust.work_dir = '/var/lib/postgresql/plrust/scratch'
In scratch I can see
postgres@DESKTOP-4D4BT6J:~/plrust/scratch$ ls -R
.:
fn24576_2200_32794
./fn24576_2200_32794:
Cargo.toml src
./fn24576_2200_32794/src:
lib.rs
postgres@DESKTOP-4D4BT6J:~/plrust/scratch$ tail fn24576_2200_32794/Cargo.toml
pgx = "*"
pgx-macros = "*"
# Add Cargo.toml dependencies here.
[profile.release]
panic = "unwind"
opt-level = 3
lto = "fat"
codegen-units = 1
postgres@DESKTOP-4D4BT6J:~/plrust/scratch$ tail fn24576_2200_32794/src/lib.rs
use pgx::*;
#[pg_extern]
fn plrust_fn_32794(a: Array<i64>) -> Option<i64> {
Some(a.into_iter().map(|v| v.unwrap_or_default()).sum())
}
Happened in #159. There's no obvious sensible reason why it changed. I should not have had to comment out that assert from snooze()
.
Similar to the issue for transaction semantics around CREATE OR REPLACE FUNCTION
for concurrent transactions, there's an issue with staleness in LOADED_SYMBOLS.
https://github.com/tcdi/plrust/blob/main/src/plrust.rs#L26-L28
If a session A executes a PL/Rust function foo, it loads/caches the symbol into LOADED_SYMBOLS
.
However if another session B were to run CREATE OR REPLACE foo
we never unload the symbols from session A meaning it'll execute with a stale function definition for the remainder of session A.
Initial thoughts are to check the TopTransactionXid and store it alongside LOADED_SYMBOLS? That way if the TopTransactionXid changes, it'll always LOAD the symbol again and prevent staleness to a certain extent
We also need to forbid:
no_mangle
, link
, link_section
, etc (need to look to see if this is a complete list of stable ones). These are unsafe, but do not require the unsafe keyword (although this may change).extern
blocks, as they allow confusing LLVM about the actual signature of functions, and grant access to functions we would like users to not even have the address of. They may become unsafe extern
in a future Rust edition, but I've only seen this suggested and not proposed.Originally posted by @thomcc in #63 (comment)
pgcentralfoundation/pgrx#834 has landed and completely blows up previous assumptions regarding how PL/Rust has to work with respect to PGX (but in a good way, mostly???). But it means everything in postgrestd and PL/Rust has to reevaluate in light of this new information.
After some discussions with @JohnHVancouver, I'm not sure if we should continue down the path of using our new plrust.plrust_proc
table for storing shared objects or if we should hijack the prosrc
column from the pg_proc
system catalog?
pg_proc.prosrc
\sf
commandplrust.plrust_proc
tableI'm not sure what the right answer is. When I write this out, it's kinda clear we ought to continue with our own table. So if that's the answer, how do we handle mutating the table such that plrust can do what it needs but a regular user from SQL can't?
Not everyone is happy about having a development toolchain on their database machine.
PL/Rust should support some sort of remote compilation facility.
local
or remote
builds, and not require any development tools locally if remote
is specified.remote
is specified, PL/Rust should provide GUC settings to configure the remote.What does that remote compilation process look like? Do we send just the user's string of Rust code, or does PL/Rust generate the entire crate and ship it off?
Could we utilize well known environments like AWS Lambda, Cloudflare workers, or Fastly's compute edge to create these artifacts?
What does this API look like?
This is a request for comments, we'd like to hear user (or potential user) opinions and get feedback!
As highlighted by @jim-mlodgenski in #9, PL/Rust is currently an "Untrusted" language. This is mostly due to immaturity of the project.
It's our intent to provide a "Trusted" language handler with this repo. For a review of the distinction between trusted and untrusted, see the PostgreSQL PL/Perl docs.
PL/Rust should be restricted from using certain operations such as:
Rust is considerably different than Perl, Python, Node (via PL/v8) in that it is not interpreted, nor does it have a runtime we can embed in PL/Rust.
Rust creates either target-specific shared objects or binaries, or it creates portable wasm
artifacts.
The process of building the code can potentially mutate the build environment (via
build.rs
or proc macros), but this is a topic for a separate issue.
The shared objects (or binaries) are roughly equivalent to C code, and get executed as such. Due to it's nature, Rust also naturally includes abundant unsafe support, unlike these other interpreted languages. A Rust function from a shared library can do as it pleases with the memory of the process, for example.
At this time, PL/Rust is loaded as a PostgreSQL extension, and inside of that process it finds and dyloads the produced cdylib
.
Doing this, library initialization routines occur, and all normal precautions around calling unknown foreign functions must be followed. In general, this process is not trusted.
Since PL/Rust created libraries are ultimately loaded by the postmaster
binary, they can do whatever any other pgx
extension can do, or any C
function.
Ultimately, there are two paths we've been discussing:
#[forbid(unsafe)]
and disabling std
, ensuring users only use core
and alloc
, along with other allowlisted also #[no_std]
crates (so the user cannot simply extern crate std;
.wasmtime
(or some other runtime) and use wasm's sandboxing features from inside the postmaster process. (Note this path does not require any disallowing of unsafe code, as the function will already be sandboxed, so it's free to do whatever it wants to it's memory.)There are several factors contributing to a decision:
I'd like to see what it would look like to design (develop?) and otherwise maintain a custom rust compilation target for Postgres that could give us a nearly "safe" (as Rust defines it) and "trusted" (as Postgres defines a procedural language) plrust.
As I know very little about this topic, please take these points with a grain of salt, but I suppose we need:
postmaster
process (ie, no ability to reach into Postgres' memory space, despite us running in it)And some other points of consideration:
malloc
/free
or can we instead use Postgres' palloc/
pfree` functions?Please feel free to add other ideas here.
Hey there,
This is really cool! All the examples I see in lib.rs
are immutable functions. Is there any support for selecting/inserting/updating rows?
It's worth examining possible usage of
unsafe
codecargo cvecheck
, but I digress...unsafe
code that we will have to depend on anyways, via cargo miri
or other meansand even if none of these are directly usable by the PL/Rust build tooling, or cannot be used in continuous integration and testing scenarios, to provide possible advisories regarding using them in concert.
Some attributes are implicitly unsafe
, without containing the unsafe
token in Rust. I will refer to these as "unsafe attributes". We probably need to replicate the work in #128 to make sure we detect these.
There is an issue with OUT parameters being in the front of the argument list. When creating a function like this:
CREATE FUNCTION one_out(OUT o integer, i integer) AS
$$
Some(100 + i.unwrap())
$$
LANGUAGE plrust;
I get a compilation error:
error[E0425]: cannot find value `i` in this scope
However, if I switch the order of the arguments, it works fine.
CREATE FUNCTION one_out(i integer, OUT o integer) AS
$$
Some(100 + i.unwrap())
$$
LANGUAGE plrust;
When attempting to create a plrust package with the Dockerfile below I get a segfault.
FROM docker.io/library/postgres:15
RUN apt update
RUN apt -y install postgresql-server-dev-15 git curl build-essential pkg-config
USER postgres
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | bash -s -- -y
ENV PATH="/var/lib/postgresql/.cargo/bin:${PATH}"
RUN cargo install cargo-pgx --version 0.6.0-alpha.1 --locked
RUN cargo pgx init --pg15 /usr/bin/pg_config
WORKDIR /var/lib/postgresql
RUN git clone https://github.com/tcdi/plrust.git
RUN cd plrust && git checkout 2831168a8a2be5e3f955b3b1ed39e7699f620aa2
RUN cd plrust && cargo pgx package
Output below:
Compiling pgx-macros v0.6.0-alpha.1
Compiling pgx v0.6.0-alpha.1
Compiling plrust v0.0.0 (/var/lib/postgresql/plrust)
Finished release [optimized] target(s) in 3m 30s
Installing extension
Copying control file to target/release/plrust-pg15/usr/share/postgresql/15/extension/plrust.control
Copying shared library to target/release/plrust-pg15/usr/lib/postgresql/15/lib/plrust.so
Discovering SQL entities
Discovered 4 SQL entities: 0 schemas (0 unique), 3 functions, 0 types, 0 enums, 1 sqls, 0 ords, 0 hashes, 0 aggregates, 0 triggers
Segmentation fault (core dumped)
The command '/bin/sh -c cd plrust && cargo pgx package' returned a non-zero code: 139
Dmesg
[21089.258062] cargo-pgx[118625]: segfault at 0 ip 00007f6084adeaf6 sp 00007fff476b7f08 error 4 in libc-2.31.so[7f6084991000+15a000]
In https://github.com/tcdi/plrust/blob/23d005ec2e1df3dd4f52e7af7f0ee243346aaeed/src/generation.rs I notice a lot of discussion about dlclose
. Do we need to support this? dlclose
ing libstd-using code is not really supported (on any platform), and is broken/unsound in a number of ways.
If we need to support this, we could probably seal off the relevant parts in postgrestd -- the main issue is thread locals, although there are a lot of related problems as well.
It seems that the #[cfg(feature="postgrestd")]
implementation of pg_try
turns it into a no-op, which seems to be done so the guard doesn't inject additional setjmp
calls (for performance reasons, as I understand it).
But pg_try
can still be called directly in safe code, so making it a no-op is confusing.
For trusted plrust, we can't allow pg_try
to actually work, so I suppose we should block it?
What would make sense to me is if pgx had a feature to control error handling (maybe error_abort
or something), and a separate feature (maybe trusted_mode
) that would disable things like PANIC and pg_try
that we can't allow in a trusted PL. Making the trusted-ness rely on the error behavior seems confusing to me, unless I'm missing something.
In cases where the .so file is not found, such as cases where the .so file is only created on writer but not on replica, or cases where the machines somehow switches to use a CPU on another architecture (e.g. x86 -> aarch64, vice versa). We'd need to use plrust.recompile_function
to recompile all the existing functins.
We can introduce a boolean guc which controls whether plrust would recompile a function if it found that the .so file is missing.
create or replace function fn1(i int) returns int strict language plrust as $$
[code]
notice!("{}", "fn1 started");
let cmd = format!("select fn2({})", i);
Spi::execute(|client|
{
client.select(&cmd, None, None);
});
notice!("{}", "fn1 finished");
Some(1)
$$;
create or replace function fn2(i int) returns int strict language plrust as $$
[code]
notice!("{}", "fn2 started");
notice!("{}", "fn2 finished");
Some(2)
$$;
select fn1(1);
NOTICE: fn1 started
ERROR: already borrowed: BorrowMutError
CONTEXT: src/plrust.rs:50:56
SQL statement "select fn2(1)"
The custom Github Actions runners for arm64 (at the time of this writing) take about 30 minutes to build, which seems a bit excessive. In the past, there was some work done for other various projects (such as PGX) to decrease their build times by implementing various caching mechanisms. The same thing should be done here, but albeit in a different manner considering the arm64 runners are hosted in a completely different environment than the out-of-the-box Github Actions runners.
Some code that can be written for trusted language handlers might still soundly call unsafe
code, technically speaking (in Rust terms), but PL/Rust should probably provide a suite of blessed functions that do the typical things we expect safely. There are many ways to lock down Safe Rust, but we should probably develop a plan for allowing easily auditing and eliminating unsafe
code, including in dependencies (see #26), and thereby prevent PL/Rust code from using unsafe
to escape any boxes we put it in.
For dependencies, I believe this may be easy to do with cargo build --build-plan
or --unit-graph
or a similar functionality and simply using rustc --forbid unsafe_code
.
Slightly more complicated is the code handed directly to PL/Rust. We can't simply ban unsafe
code in that because using pgx
expands to include unsafe
code currently! We instead probably need to audit the code for unsafe
while it is flowing through the slightly misleadingly named pgx-utils
, which is PGX's SQL generation proc macro. We could hypothetically make sure everything we expand is safe code but that might be misleading and/or unsound so I am not inclined to presume that is on the table. Instead, we should find some approach to auditing the way pgx
expands.
unsafe
depsunsafe
in user fn itselfIn PL/Rust if you create a function like so and execute it:
CREATE OR REPLACE FUNCTION test_open_rel() RETURNS text
IMMUTABLE STRICT
LANGUAGE PLRUST AS
$$
[dependencies]
# Add Cargo.toml dependencies here.
[code]
use pgx::*;
let rel = rel::PgRelation::with_lock(1259, 0);
Some("foo".to_string())
$$;
it'll crash the database when assertions are enabled. Here's the logs from engine:
2022-11-03 08:55:30.605 UTC [17140] LOG: database system is ready to accept connections
TRAP: FailedAssertion("lockmode != NoLock || IsBootstrapProcessingMode() || CheckRelationLockedByMe(r, AccessShareLock, true)", File: "re
lation.c", Line: 70, PID: 17154)
postgres: hsuchen plrust 127.0.0.1(58986) SELECT(ExceptionalCondition+0x90)[0x907e90]
postgres: hsuchen plrust 127.0.0.1(58986) SELECT(relation_open+0xa7)[0x4a2327]
Presumably it's failing the lockmode != NoLock
portion.
The quick fix would be to cfg[plrust] and check for lockmode == 0 but I'm not sure if that's the right thing to do. I wonder if there's a use case in PL/Rust to attempt to lock something, and come back to re-lock it again with NoLock? I suppose if you're calling things in one transaction across multiple functions and you might've locked it earlier?
target_postgrestd
causes some problems for cross compilation, because we use {x86_64,aarch64}-unknown-linux-postgres
for target triples when std is postgrestd.
We should find a way to work around this in a satisfying and non-hacky way, but as a stop gap, when under target_postgrestd
, we should set the following environment vars:
Set CARGO_TARGET_{arch.to_uppercase()}_UNKNOWN_LINUX_POSTGRES_LINKER
to whatever value we would set CARGO_TARGET_{arch.to_uppercase()}_UNKNOWN_LINUX_GNU_LINKER
linker to if `target_postgrestd were inactive.
For example CARGO_TARGET_AARCH64_UNKNOWN_LINUX_POSTGRES_LINKER=aarch64-linux-gnu-gcc
.
Set BINDGEN_EXTRA_CLANG_ARGS_{rust_postgrestd_target}
to -target {real_rust_target} -ccc-gcc-name {target_linker} -isystem {target_header_location}
, where for now we can just use /usr/{real_target_without_unknown}/include
for the target_header_location
. For example
let real_rust_target = "aarch64-unknown-linux-gnu";
let postgrestd_target = real_target.replace("gnu", "postgres"); // `aarch64-unknown-linux-postgres`
let toolchain_prefix = real_target.replace("-unknown", ""); // `aarch64-linux-gnu`
let linker = format!("{toolchain_prefix}-gcc"); // `aarch64-linux-gnu-gcc`
let header_location = format!("/usr/{toolchain_prefix}/include");
// You'd probably attach this to the Command in reality.
let env_to_set: &[(String, String)] = &[
(
format!("CARGO_TARGET_{}_LINKER", postgrestd_target.to_uppercase().replace('-', "_")),
linker.clone(),
),
(
format!("BINDGEN_EXTRA_CLANG_ARGS_{}", postgrestd_target),
format!("-target {real_rust_target} -ccc-gcc-name {linker} -isystem {header_location}"),
),
];
This is not very robust and is extremely specific to how e.g. Debian's cross-compile toolchains are set up, but it gets the point across (and allows anybody who takes the time to set one up in the same manner to have things work).
A BorrowMutError occurs if a function errored once previously preventing it from being executed. This would prevent more complex functions from being ran again in the same session once it's errored.
It occurs here https://github.com/tcdi/plrust/blob/main/src/plrust.rs#L50
loaded_symbols
is still borrowed after an error. Related to error handling in pgx?
plrust=# CREATE OR REPLACE FUNCTION raise_error() RETURNS TEXT
IMMUTABLE STRICT
LANGUAGE PLRUST AS
$$
[dependencies]
# Add Cargo.toml dependencies here.
[code]
pgx::error!("foo error");
Some("hi".to_string())
$$;
CREATE FUNCTION
plrust=# select raise_error();
ERROR: foo error
CONTEXT: <unknown>:0:0
plrust=# select raise_error();
ERROR: already borrowed: BorrowMutError
CONTEXT: src/plrust.rs:50:56
plrust=#
is trusted or untrusted extension?
Normal Rust control flow should always simply gracefully shutdown. We should test to make sure this happens. Now that pgcentralfoundation/pgrx#613 has landed we can actually do this.
We aren't going to worry about the specific case of someone deliberately trying to signal an abort, though. That's on them.
create or replace procedure p1() language plrust as $$
[code]
Some(())
$$;
CREATE PROCEDURE
call p1();
ERROR: returned Option<T> was NULL
PL/Rust doesn't currently support user defined types
plrust=# CREATE TYPE compfoo AS (f1 int, f2 text);
CREATE OR REPLACE FUNCTION test_type(a compfoo) RETURNS TEXT
ERROR: type "compfoo" already exists
plrust-# IMMUTABLE STRICT
plrust-# LANGUAGE PLRUST AS
plrust-# $$
plrust$# [dependencies]
plrust$#
plrust$#
plrust$# [code]
plrust$# Some("hi")
plrust$# $$;
NOTICE: HELLO
NOTICE: user_deps is: {}
ERROR:
0: Oid `163863` was not mappable to a Rust type
PL/Rust has two distinct functionalities: compiling the code and executing the code. When executing doesn't have the required artifacts available, we could attempt to invoke the compilation side to get that code and thus "lazy-load" functions, but these are otherwise fairly separate events. Splitting these out as far as possible throughout the plrust
crate seems desirable, especially if we want to be able to enable remote compilation for clarity's sake if nothing else.
plrust
be a heavily feature-flagged crate with different modules.Right now PL/Rust keeps produced shared objects on the local filesystem in the directory specified by the plrust.work_dir
GUC.
We'd like to have an option for PL/Rust to keep these artifacts in a specific system catalog (configurable via GUC).
This system catalog would store the artifacts as a bytea
in a table like:
CREATE TABLE plrust.artifacts (
name text,
data bytea,
);
The setting could be like plrust.artifact_store
and it could support directory
or table
.
If directory
is specified, we could have a GUC plrust.artifact_directory
(replacing plrust.work_dir
).
If table
is specified, we could have a GUC plrust.artifact_table
. (taking a string)
We should also consider what the migration story looks like between them, and how PL/Rust does (or does not) facilitate that.
These are largely internal, but are still somewhat cryptic, yet understanding them is very important to us as we develop them.
user_crate
's main moduleuser_crate
:
mod build;
mod crate_variant;
mod crating;
mod loading;
mod ready;
mod target;
mod verify;
Hi,
I am getting the below error while executing cargo pgx init
in postgres13.
root@test:~/plrust# cargo pgx init
Discovered Postgres v13.3, v12.7, v11.12, v10.17
Downloading Postgres v10.17 from https://ftp.postgresql.org/pub/source/v10.17/postgresql-10.17.tar.bz2
Downloading Postgres v11.12 from https://ftp.postgresql.org/pub/source/v11.12/postgresql-11.12.tar.bz2
Downloading Postgres v13.3 from https://ftp.postgresql.org/pub/source/v13.3/postgresql-13.3.tar.bz2
Downloading Postgres v12.7 from https://ftp.postgresql.org/pub/source/v12.7/postgresql-12.7.tar.bz2
Untarring Postgres v11.12 to /root/.pgx/11.12
Configuring Postgres v11.12
[error] "/root/.pgx/11.12/configure" "--prefix=/root/.pgx/11.12/pgx-install" "--with-pgport=28811" "--enable-debug" "--enable-cassert"
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking which template to use... linux
checking whether NLS is wanted... no
checking for default port number... 28811
checking for block size... 8kB
checking for segment size... 1GB
checking for WAL block size... 8kB
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking for gawk... no
checking for mawk... mawk
checking whether gcc supports -Wdeclaration-after-statement, for CFLAGS... yes
checking whether gcc supports -Wendif-labels, for CFLAGS... yes
checking whether g++ supports -Wendif-labels, for CXXFLAGS... yes
checking whether gcc supports -Wmissing-format-attribute, for CFLAGS... yes
checking whether g++ supports -Wmissing-format-attribute, for CXXFLAGS... yes
checking whether gcc supports -Wformat-security, for CFLAGS... yes
checking whether g++ supports -Wformat-security, for CXXFLAGS... yes
checking whether gcc supports -fno-strict-aliasing, for CFLAGS... yes
checking whether g++ supports -fno-strict-aliasing, for CXXFLAGS... yes
checking whether gcc supports -fwrapv, for CFLAGS... yes
checking whether g++ supports -fwrapv, for CXXFLAGS... yes
checking whether gcc supports -fexcess-precision=standard, for CFLAGS... yes
checking whether g++ supports -fexcess-precision=standard, for CXXFLAGS... no
checking whether gcc supports -funroll-loops, for CFLAGS_VECTOR... yes
checking whether gcc supports -ftree-vectorize, for CFLAGS_VECTOR... yes
checking whether gcc supports -Wunused-command-line-argument, for NOT_THE_CFLAGS... no
checking whether gcc supports -Wformat-truncation, for NOT_THE_CFLAGS... yes
checking whether gcc supports -Wstringop-truncation, for NOT_THE_CFLAGS... no
checking whether the C compiler still works... yes
checking how to run the C preprocessor... gcc -E
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking allow thread-safe client libraries... yes
checking whether to build with ICU support... no
checking whether to build with Tcl... no
checking whether to build Perl modules... no
checking whether to build Python modules... no
checking whether to build with GSSAPI support... no
checking whether to build with PAM support... no
checking whether to build with BSD Authentication support... no
checking whether to build with LDAP support... no
checking whether to build with Bonjour support... no
checking whether to build with OpenSSL support... no
checking whether to build with SELinux support... no
checking whether to build with systemd support... no
checking whether to build with XML support... no
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ld used by GCC... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for ranlib... ranlib
checking for strip... strip
checking whether it is possible to strip libraries... yes
checking for ar... ar
checking for a BSD-compatible install... /usr/bin/install -c
checking for tar... /bin/tar
checking whether ln -s works... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for bison... no
checking for flex... no
checking for perl... /usr/bin/perl
configure: using perl 5.26.1
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking if compiler needs certain flags to reject unknown flags... no
checking for the pthreads library -lpthreads... no
checking whether pthreads work without any flags... no
checking whether pthreads work with -Kthread... no
checking whether pthreads work with -kthread... no
checking for the pthreads library -llthread... no
checking whether pthreads work with -pthread... yes
checking for joinable pthread attribute... PTHREAD_CREATE_JOINABLE
checking if more special flags are required for pthreads... no
checking for PTHREAD_PRIO_INHERIT... yes
checking pthread.h usability... yes
checking pthread.h presence... yes
checking for pthread.h... yes
checking for strerror_r... yes
checking for getpwuid_r... yes
checking for gethostbyname_r... yes
checking whether strerror_r returns int... no
checking for main in -lm... yes
checking for library containing setproctitle... no
checking for library containing dlopen... -ldl
checking for library containing socket... none required
checking for library containing shl_load... no
checking for library containing getopt_long... none required
checking for library containing crypt... -lcrypt
checking for library containing shm_open... -lrt
checking for library containing shm_unlink... none required
checking for library containing clock_gettime... none required
checking for library containing fdatasync... none required
checking for library containing sched_yield... none required
checking for library containing gethostbyname_r... none required
checking for library containing shmget... none required
checking for library containing readline... no
configure: WARNING:
*** Without Bison you will not be able to build PostgreSQL from Git nor
*** change any of the parser definition files. You can obtain Bison from
*** a GNU mirror site. (If you are using the official distribution of
*** PostgreSQL then you do not need to worry about this, because the Bison
*** output is pre-generated.)
configure: WARNING:
*** Without Flex you will not be able to build PostgreSQL from Git nor
*** change any of the scanner definition files. You can obtain Flex from
*** a GNU mirror site. (If you are using the official distribution of
*** PostgreSQL then you do not need to worry about this because the Flex
*** output is pre-generated.)
configure: error: readline library not found
If you have readline already installed, see config.log for details on the
failure. It is possible the compiler isn't looking in the proper directory.
Use --without-readline to disable readline support.
Currently when we read the .so from the table, we write a temporary file in the work_dir
https://github.com/tcdi/plrust/blob/main/plrust/src/plrust_proc.rs#L81-L82
and then use Library::new to instantiate the library and get symbols from it.
https://github.com/tcdi/plrust/blob/main/plrust/src/user_crate/ready.rs#L33-L34
It seems unnecessary to re-write it into disk, wondering if something like mmap with dlopen would work better here
Hi,
Is this extension support on Postgres 13?
Rustup's ability to allow pinning a rustc version is convenient, but it is becoming increasingly unwise, as the Rust ecosystem slips forward, to keep our code pinned against a nightly version. The main blocker is fixing postgrestd so it allows advancement.
This is one of the more likely routes for state leakage that could plausibly break with SQL roles, and proved instrumental in techniques like:
const
would remain acceptable as it has more restrictions on how it is used.
The shell code in init_array is always executed first leading to arbitrary shell execution potentially.
-- Repro removed temporarily
Is there something in Cargo/rustc or a more "Rust" way to detect and prevent these things?
Hello,
Currently in the plrust.work_dir = 'scratch'
we put everything in there.
This makes it difficult to have additional restrictions in place for these directories as it's hard to distinguish between a plrust_foo.so
vs plrust_fn_oid_49152
(the cargo project) vs the target
directory.
Similar to wanting to dump temporary code/artifacts in /tmp
and the final artifact in /usr/bin
we're thinking of one of the following changes:
Keep one configuration parameter plrust.work_dir = 'scratch'
and create ~/scratch/compile
where compile
has something like ~/scratch/compile/target
and ~/scratch/compile/package
. While the final plrust.so
would live in ~/scratch/artifacts
Move to two configuration parameters plrust.work_dir
and plrust.artifact_dir
where work_dir would have similar semantics today, and artifact_dir just contains the final artifact.
We are rapidly approaching (really, have passed already several times over) the point where it does not make sense to continue pinning a version, even for the convenience of anyone's security processes.
core::ffi::CStr
. It allows removing dependencies (core_cstr
), so it reduces the amount of code that anyone has to worry about in the supply chain, but by pinning to 1.61 we are losing that opportunity.std::thread::scope
. It allows removing a large dependency in the build system for pgx-pg-sys
, the rayon
dependency which brings in about a dozen other crates, so it significantly reduces the amount of code that anyone has to worry about in the supply chain, but by pinning to 1.61 we are losing that opportunity.#[link_section]
is not independently detected as unsafe
until then, only #[no_mangle]
but that is, again, still an unsupported version in security terms, so realistically, moving from an unsupported version to an unsupported version is not really acceptable.cargo pgx
for the specific pinned rustc version. These pointless rebuilds slow down every single patch I might write for PL/Rust. It can easily consume an hour or more on pure faffing about when switching back and forth, partly assisted by sometimes forgetting and only hitting the secondary issues caused by not doing so, thus having to debug it, and only then remembering to rebuild and reinstall, etc. This is already a test suite that takes too long to run and debug, this makes it even more unconscionable.unsafe
detection upstream in rustc and submitting patches for what we find, and that would only come to stable ~2 versions after it merges.std::backtrace::Backtrace
. It's intended to be a key component of newer, more sophisticated and better-scoped error-handling in Rust. Even if it's mostly going to be a diagnostic improvement, it's a lot to not have.In #25 we discussed a bit about creating sandboxing for the user-written PL/Rust code. Part of that security posture is related to dependencies.
Right now, a user could go
CREATE EXTENSION IF NOT EXISTS plrust;
CREATE OR REPLACE FUNCTION sum_array(a BIGINT[]) RETURNS BIGINT
IMMUTABLE STRICT
LANGUAGE PLRUST AS
$$
[dependencies]
some_malicious_crate = "1"
[code]
Some(a.into_iter().map(|v| v.unwrap_or_default()).sum())
$$;
SELECT sum_array(ARRAY[1,2,3]);
Resulting in the build.rs
or proc macros inside some_malicious_crate
running on the database host. This is clearly not desirable.
Unfortunately, denying dependencies with a build.rs
or proc macros would eliminate many very useful crates.
Instead, we'd like to create a way to configure an allow list of crates.
Obvious options for this are GUCs and tables.
Unfortunately, GUCs cannot be arrays. We could however have the GUC take a path perhaps.
Instead, this kind of potentially large dataset is typically stored in tables. So perhaps we could have a GUC which defines a table name which PL/Rust draws this data.
plrust.allowed_crate_table = "plrust.allowed_crates"
Where the table is
CREATE TABLE plrust.allowed_crates (
name text,
version text,
features text
);
Given this list, PL/Rust
could also pre-download and even pre-build these dependencies for use by users.
For the sake of deployment cases, it may be useful to be able to point PL/Rust
at an existing CARGO_HOME
(eg ~/.cargo/
) or existing registry. This would permit, for example, cases where a user wanted to deploy PL/Rust in an environment without internet.
While #27 is no longer on the table, in order to make #70 happen, and to make #26 easy, PL/Rust needs builds that are slightly more finely staged and offer more user input, so the user can easily recover things like the Cargo.toml and Cargo.lock that PL/Rust is going to use. In addition, we would like to run the build under various debug or other settings of their choice. Currently, we hardcode all build settings to run once the crate is provisioned:
plrust/src/user_crate/state_provisioned.rs
Lines 49 to 61 in 37c2c49
For correctness purposes, it is important that after crate generation and before the build that PL/Rust guarantees that "nothing happens" from the perspective of actually running the build. However, this is also probably the perfect time for events which examine the crate without actually fully building and running it.
We want to be careful about exposing control over the build options that are input, as compiler options are implicitly "unsafe". Thus, unlimited access to the compiler's command flags may allow instructing the compiler to run the build under parameters that can hypothetically violate any safety properties we have embedded in the PL/Rust design.
Normally, TempDir
deletes the directory you made with it when you drop it. This crate uses a Lazy<TempDir>
for its tests so that parallel access means the directory is initialized the moment it is needed. However, it is then held in a static TEMP_DIR
which means it is simply fixed in memory for the lifetime of the program. This tends to result in the Drop implementation for TempDir not executing properly... accumulating gigabytes of test output shockingly quickly and filling up drives. Either pgx-tests
or plrust
need to learn how to clean up after themselves.
In a conversation with @eeeebbbbrrrr in talking about #81 we discovered that there are some opportunities to have plrust clean up artifacts at various points
<root level scratch directory>/target
)data_directory
GUC) so that any built artifacts can also be dropped in the event of a drop database
callA declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.