Comments (18)
There was quite a bit of a Zulip discussion regarding a workaround for your particular case of hitting this problem. TLDR, until the SDK has a release that includes ed25519_verify, you can copy paste this into your code and get the exactly same functionality as the SDK will provide.
mod sys {
extern "C" {
pub fn ed25519_verify(
sig_len: u64,
sig_ptr: u64,
msg_len: u64,
msg_ptr: u64,
pub_key_len: u64,
pub_key_ptr: u64,
) -> u64;
}
}
pub fn ed25519_verify(signature: &[u8; 64], message: &[u8], public_key: &[u8; 32]) -> bool {
unsafe {
sys::ed25519_verify(
signature.len() as _,
signature.as_ptr() as _,
message.len() as _,
message.as_ptr() as _,
public_key.len() as _,
public_key.as_ptr() as _,
) == 1
}
}
from nearcore.
This is the part where it fails in this contract.
from nearcore.
Could it be that you were testing this on a arm-based mac? We don't use the same runtime in production and in developer mode on a mac because our production compiler backend doesn't support arm.
from nearcore.
I hope there is a better way to fix it. Maybe some simple way to avoid sign extending the 32bit value into a 64bit operand? My x86 knowledge never was that great and is definitely rusty these days, so not sure what our options are, or even if my reasoning here is correct.
Hm, so in finite-wasm the gas parameter is expected to be a 64 bit integer in the first place so the root cause is probably
nearcore/runtime/near-vm/compiler-singlepass/src/codegen_x64.rs
Lines 351 to 360 in 4b878ec
Simply replacing u32::try_from
with i32::try_from
would be a sufficient fix (plus whatever casts are necessary to stuff the resulting value into Imm32
), I believe.
Strongly agreed with regards to not releasing fixes on friday, though, especially as this would require a protocol version bump.
from nearcore.
Thanks for reporting this issue @robert-zaremba
Can you please also provide the smart contact code that triggers this problem?
from nearcore.
I can confirm this is a unintended change in the runtime behavior.
I don't know the exact details, yet. But here is a branch that shows that wasmer2 (what we had before the upgrade) and near-vm (what we use since the latest upgrade) behave differently.
https://github.com/jakmeier/nearcore/tree/reproduce_9393
Everything is contained in this commit: jakmeier@1874b9d
command to run test:
cargo test --package near-vm-runner --lib -- tests::runtime_errors::test_i_am_human_arithmetic_trap --exact --nocapture
result:
thread 'tests::runtime_errors::test_i_am_human_arithmetic_trap' panicked at 'Inconsistent VM Output:
NearVm:
VMOutcome: balance 8000000000000000000002 storage_usage 12 return data None burnt gas 712380595914 used gas 712380595914
Err: WebAssembly trap: An arithmetic exception, e.g. divided by zero.
Wasmer2:
VMOutcome: balance 8000000000000000000002 storage_usage 12 return data None burnt gas 7543360465350 used gas 7543360465350
Err: Smart contract panicked: signature error: invalid signature
I will search further to find out the root cause...
from nearcore.
@jakmeier would be great to assure that workspace-rs
uses exactly the same runtime.
from nearcore.
I have a PoC fix ready, see this commit: jakmeier@1c6863e
The good news upfront: This is not a security critical problem, it can't be abused. There are just some unlucky contracts that will always fail to execute some branches.
The fix is in x86 code generation for gas accounting.
The trap is hit here when we overflow the addition:
nearcore/runtime/near-vm/compiler-singlepass/src/codegen_x64.rs
Lines 396 to 397 in 4b878ec
However, the input values were small enough that they shouldn't overflow. But the cost_location
(type Imm32(u32)
) was around 3Ggas, which is larger than i32::MAX
.
So, I think what happens is that when we write this immediate value into x86 code, it just writes it as a 32bit value.
But for the addition, we specifically use 64bit addition, so the immediate value gets sign-extended into a 64-bit integer.
That would make sense if it was a signed integer, but for for an unsigned integer it makes no sense at all, so we end up with a huge number with the 32 most significant bits all set to 1.
To test my theory I added a condition that emits 3 smaller adds if the immediate has the most significant bit set. This way, sign extending doesn't matter. And voilà, it no longer hits the trap but still computes the correct gas cost!
I hope there is a better way to fix it. Maybe some simple way to avoid sign extending the 32bit value into a 64bit operand? My x86 knowledge never was that great and is definitely rusty these days, so not sure what our options are, or even if my reasoning here is correct.
Let's discuss this next week @nagisa, @Ekleog-NEAR, @akhi3030
from nearcore.
@robert-zaremba I'm really sorry that you are hitting this problem. We messed this one up, our testing should have caught it. Unfortunately, it didn't. And it was only reported to us when it was already in mainnet, so our agility is very much limited at this point.
We will come up with a solution that fixes it in mainnet as soon as safely possible. I hope you understand that rushing out a fix in our compiler-backend right as the weekend starts is not a very safe option and it would put all other users at risk of potential new bugs. The safe timline will need to be discussed with the team, which is OOO today except for myself. I will post an ETA here when we I have it.
Please let me know if you need help to come up with a workaround in your smart contract to avoid the problem. That's probably the fastest way to get the NDC onboarding flow going again, if you can't wait for the fix to land in mainnet.
from nearcore.
Thanks @jakmeier for quickly handling the problem.
I think every hashing can hit the trap at some point. We will be testing one workaround today, but it would be good to solve it in the runtime ASAP.
from nearcore.
Simply replacing u32::try_from with i32::try_from would be a sufficient fix
@nagisa why downcasting u64
value into i32
value is better than into u32
value? This is counter intuitive. If the value is too big, then with i32
we end up with a negative value, which could be a source of more serious exploits (long running transactions, maybe even infinite transactions).
from nearcore.
@nagisa Ahh, I see. Yes, your fix is much more reasonably than what I produced :D
I just finished reviewing your PR, I think we can move forward with it asap.
@robert-zaremba Since this requires a protocol upgrade, it will take a couple of days at least. Maybe by the end of the week (Sunday), we could have it effective in mainnet. But that assumes release engineers can pick it up quickly and that testing runs through smoothly, plus that validators are quick to upgrade their binaries.
I think every hashing can hit the trap at some point. We will be testing one workaround today, but it would be good to solve it in the runtime ASAP.
Hashing in the host function (in ed25519_verify
, or even just direct calls to sha256
) cannot trigger this, the gas accounting works differently there.
why downcasting u64 value into i32 value is better than into u32 value?
This isn't C, this is Rust :) If the value is too big, it will return an error, which we handle properly. The old code tried to cast to an u32
which has a range of numbers where it technically succeeds (Rust returns Ok(_)
) but the generated code is invalid.
from nearcore.
@jakmeier wasm doesn't support unsigned integers?
from nearcore.
@jakmeier wasm doesn't support unsigned integers?
What WASM supports or doesn't support is completely irrelevant here. The conversion from u64
to u32
or i32
in Rust is independent of the target. Also, the target here is x86, since the code doing the conversion is the compiler itself, not something that happens inside the VM runtime. The error happens at runtime, yes, but only because the compiler generated bad code.
On a side note, I would say the error "WebAssembly trap"
is somewhat misleading, too. It's not a trap in the user provided WASM code. But actually in the x86 code that we inserted between the user's code. There is no line in the original WASM that we could point to where this is caused. In fact, I would say the jump to the exit code taken in this scenario can only ever happen if our runtime has bugs, so it should probably say something like "Gas accounting error" instead.
from nearcore.
@robert-zaremba did you manage to get a workaround going?
Since this is not a vulnerability, the normal flow to deploy this patch would be to have it included in the next testnet release and have it run for a month there before we let it loose on mainnet. But this would put the ETA more like 5-6 weeks from now.
from nearcore.
@jakmeier not really. I've checked:
ed25519_dalek
(the original lib - hits the trap)near_crypto
(both ed25519 and secp256k1).- k256) as well as everything in https://github.com/RustCrypto/elliptic-curves and https://docs.rs/ecdsa/latest/ecdsa/ (doesn't compile)
- https://docs.rs/secp256k1 (doesn't compile)
The compilation issue is related to the wasm support (random normally is not allowed in smart contracts):
error: the wasm*-unknown-unknown targets are not supported by default, you may need to enable the "js" feature. For more information see: https://docs.rs/getrandom/#webassembly-support
--> /home/robert/.cargo/registry/src/index.crates.io-6f17d22bba15001f/getrandom-0.2.10/src/lib.rs:285:9
|
285 | / compile_error!("the wasm*-unknown-unknown targets are not supported by \
286 | | default, you may need to enable the \"js\" feature. \
287 | | For more information see: \
288 | | https://docs.rs/getrandom/#webassembly-support");
| |________________________________________________________________________^
error[E0433]: failed to resolve: use of undeclared crate or module `imp`
--> /home/robert/.cargo/registry/src/index.crates.io-6f17d22bba15001f/getrandom-0.2.10/src/lib.rs:341:9
from nearcore.
Note that 5-6 weeks is not acceptable for us :(
from nearcore.
The bot automatically closed this. I'm reopening it till we find a solution for @robert-zaremba.
from nearcore.
Related Issues (20)
- [stateless_validation] Integration with in-memory trie HOT 2
- [stateless_validation][tracking issue] Security concerns Round-1 HOT 3
- [stateless_validation] Top contracts storage HOT 1
- State sync makes GCS snapshotting more risky
- [stateless_validation] Proper error handling for `on_chunk_completed`
- [State Sync] Fix external state sync files monitoring
- StateTransitionData disk usage on mainnet
- [ProjectTracking] [State Sync] Improvements
- [Forknet] Support split storage nodes in mirror mocknet test HOT 1
- [stateless_validation] Solve 2x Latency Problem
- [stateless_validation] test_chunk_forwarding_optimization fails with --features statelessnet_protocol HOT 2
- Neard 1.37.0-rc.2 Process Crash - Stack overflow HOT 1
- Recent node; Lots of errors gc stop height.
- estimator: I/O estimation does not count syscall failures
- Consider adding validator rewards to RPC /validators
- Remove MockEpochManager HOT 1
- Improve split archival node migration
- Remove `is_chunk_producer_for_epoch` from EpochManagerAdapter
- Lychee lints CI check failing HOT 1
- [Forknet] Pre-release testing of resharding
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from nearcore.