Giter Club home page Giter Club logo

Comments (18)

jakmeier avatar jakmeier commented on August 16, 2024 2

There was quite a bit of a Zulip discussion regarding a workaround for your particular case of hitting this problem. TLDR, until the SDK has a release that includes ed25519_verify, you can copy paste this into your code and get the exactly same functionality as the SDK will provide.

mod sys {
  extern "C" {
      pub fn ed25519_verify(
          sig_len: u64,
          sig_ptr: u64,
          msg_len: u64,
          msg_ptr: u64,
          pub_key_len: u64,
          pub_key_ptr: u64,
      ) -> u64;
  }
}

pub fn ed25519_verify(signature: &[u8; 64], message: &[u8], public_key: &[u8; 32]) -> bool {
    unsafe {
        sys::ed25519_verify(
            signature.len() as _,
            signature.as_ptr() as _,
            message.len() as _,
            message.as_ptr() as _,
            public_key.len() as _,
            public_key.as_ptr() as _,
        ) == 1
    }
}

from nearcore.

amityadav0 avatar amityadav0 commented on August 16, 2024 1

https://github.com/near-ndc/i-am-human/blob/6fa2bc56ab77cb82b39139fb170ed56ad4f75e24/contracts/oracle/src/lib.rs#L138

This is the part where it fails in this contract.

from nearcore.

jakmeier avatar jakmeier commented on August 16, 2024 1

Could it be that you were testing this on a arm-based mac? We don't use the same runtime in production and in developer mode on a mac because our production compiler backend doesn't support arm.

from nearcore.

nagisa avatar nagisa commented on August 16, 2024 1

I hope there is a better way to fix it. Maybe some simple way to avoid sign extending the 32bit value into a 64bit operand? My x86 knowledge never was that great and is definitely rusty these days, so not sure what our options are, or even if my reasoning here is correct.

Hm, so in finite-wasm the gas parameter is expected to be a 64 bit integer in the first place so the root cause is probably

fn emit_gas_const(&mut self, cost: u64) {
if let Ok(cost) = u32::try_from(cost) {
self.emit_gas(Location::Imm32(cost));
} else {
let cost_reg = self.machine.acquire_temp_gpr().unwrap();
self.assembler.emit_mov(Size::S64, Location::Imm64(cost), Location::GPR(cost_reg));
self.emit_gas(Location::GPR(cost_reg));
self.machine.release_temp_gpr(cost_reg);
}
}

Simply replacing u32::try_from with i32::try_from would be a sufficient fix (plus whatever casts are necessary to stuff the resulting value into Imm32), I believe.

Strongly agreed with regards to not releasing fixes on friday, though, especially as this would require a protocol version bump.

from nearcore.

jakmeier avatar jakmeier commented on August 16, 2024

Thanks for reporting this issue @robert-zaremba

Can you please also provide the smart contact code that triggers this problem?

from nearcore.

jakmeier avatar jakmeier commented on August 16, 2024

I can confirm this is a unintended change in the runtime behavior.

I don't know the exact details, yet. But here is a branch that shows that wasmer2 (what we had before the upgrade) and near-vm (what we use since the latest upgrade) behave differently.

https://github.com/jakmeier/nearcore/tree/reproduce_9393

Everything is contained in this commit: jakmeier@1874b9d

command to run test:

cargo test --package near-vm-runner --lib -- tests::runtime_errors::test_i_am_human_arithmetic_trap --exact --nocapture

result:

thread 'tests::runtime_errors::test_i_am_human_arithmetic_trap' panicked at 'Inconsistent VM Output:
NearVm:
VMOutcome: balance 8000000000000000000002 storage_usage 12 return data None burnt gas 712380595914 used gas 712380595914
Err: WebAssembly trap: An arithmetic exception, e.g. divided by zero.

Wasmer2:
VMOutcome: balance 8000000000000000000002 storage_usage 12 return data None burnt gas 7543360465350 used gas 7543360465350
Err: Smart contract panicked: signature error: invalid signature

I will search further to find out the root cause...

from nearcore.

robert-zaremba avatar robert-zaremba commented on August 16, 2024

@jakmeier would be great to assure that workspace-rs uses exactly the same runtime.

from nearcore.

jakmeier avatar jakmeier commented on August 16, 2024

I have a PoC fix ready, see this commit: jakmeier@1c6863e

The good news upfront: This is not a security critical problem, it can't be abused. There are just some unlucky contracts that will always fail to execute some branches.

The fix is in x86 code generation for gas accounting.
The trap is hit here when we overflow the addition:

self.assembler.emit_add(Size::S64, cost_location, Location::GPR(current_burnt_reg));
self.assembler.emit_jmp(Condition::Carry, self.special_labels.integer_overflow);

However, the input values were small enough that they shouldn't overflow. But the cost_location (type Imm32(u32)) was around 3Ggas, which is larger than i32::MAX.

So, I think what happens is that when we write this immediate value into x86 code, it just writes it as a 32bit value.
But for the addition, we specifically use 64bit addition, so the immediate value gets sign-extended into a 64-bit integer.
That would make sense if it was a signed integer, but for for an unsigned integer it makes no sense at all, so we end up with a huge number with the 32 most significant bits all set to 1.

To test my theory I added a condition that emits 3 smaller adds if the immediate has the most significant bit set. This way, sign extending doesn't matter. And voilà, it no longer hits the trap but still computes the correct gas cost!

I hope there is a better way to fix it. Maybe some simple way to avoid sign extending the 32bit value into a 64bit operand? My x86 knowledge never was that great and is definitely rusty these days, so not sure what our options are, or even if my reasoning here is correct.

Let's discuss this next week @nagisa, @Ekleog-NEAR, @akhi3030

from nearcore.

jakmeier avatar jakmeier commented on August 16, 2024

@robert-zaremba I'm really sorry that you are hitting this problem. We messed this one up, our testing should have caught it. Unfortunately, it didn't. And it was only reported to us when it was already in mainnet, so our agility is very much limited at this point.

We will come up with a solution that fixes it in mainnet as soon as safely possible. I hope you understand that rushing out a fix in our compiler-backend right as the weekend starts is not a very safe option and it would put all other users at risk of potential new bugs. The safe timline will need to be discussed with the team, which is OOO today except for myself. I will post an ETA here when we I have it.

Please let me know if you need help to come up with a workaround in your smart contract to avoid the problem. That's probably the fastest way to get the NDC onboarding flow going again, if you can't wait for the fix to land in mainnet.

from nearcore.

robert-zaremba avatar robert-zaremba commented on August 16, 2024

Thanks @jakmeier for quickly handling the problem.
I think every hashing can hit the trap at some point. We will be testing one workaround today, but it would be good to solve it in the runtime ASAP.

from nearcore.

robert-zaremba avatar robert-zaremba commented on August 16, 2024

Simply replacing u32::try_from with i32::try_from would be a sufficient fix

@nagisa why downcasting u64 value into i32 value is better than into u32 value? This is counter intuitive. If the value is too big, then with i32 we end up with a negative value, which could be a source of more serious exploits (long running transactions, maybe even infinite transactions).

from nearcore.

jakmeier avatar jakmeier commented on August 16, 2024

@nagisa Ahh, I see. Yes, your fix is much more reasonably than what I produced :D
I just finished reviewing your PR, I think we can move forward with it asap.

@robert-zaremba Since this requires a protocol upgrade, it will take a couple of days at least. Maybe by the end of the week (Sunday), we could have it effective in mainnet. But that assumes release engineers can pick it up quickly and that testing runs through smoothly, plus that validators are quick to upgrade their binaries.

I think every hashing can hit the trap at some point. We will be testing one workaround today, but it would be good to solve it in the runtime ASAP.

Hashing in the host function (in ed25519_verify, or even just direct calls to sha256) cannot trigger this, the gas accounting works differently there.

why downcasting u64 value into i32 value is better than into u32 value?

This isn't C, this is Rust :) If the value is too big, it will return an error, which we handle properly. The old code tried to cast to an u32 which has a range of numbers where it technically succeeds (Rust returns Ok(_)) but the generated code is invalid.

from nearcore.

robert-zaremba avatar robert-zaremba commented on August 16, 2024

@jakmeier wasm doesn't support unsigned integers?

from nearcore.

jakmeier avatar jakmeier commented on August 16, 2024

@jakmeier wasm doesn't support unsigned integers?

What WASM supports or doesn't support is completely irrelevant here. The conversion from u64 to u32 or i32 in Rust is independent of the target. Also, the target here is x86, since the code doing the conversion is the compiler itself, not something that happens inside the VM runtime. The error happens at runtime, yes, but only because the compiler generated bad code.

On a side note, I would say the error "WebAssembly trap" is somewhat misleading, too. It's not a trap in the user provided WASM code. But actually in the x86 code that we inserted between the user's code. There is no line in the original WASM that we could point to where this is caused. In fact, I would say the jump to the exit code taken in this scenario can only ever happen if our runtime has bugs, so it should probably say something like "Gas accounting error" instead.

from nearcore.

jakmeier avatar jakmeier commented on August 16, 2024

@robert-zaremba did you manage to get a workaround going?

Since this is not a vulnerability, the normal flow to deploy this patch would be to have it included in the next testnet release and have it run for a month there before we let it loose on mainnet. But this would put the ETA more like 5-6 weeks from now.

from nearcore.

robert-zaremba avatar robert-zaremba commented on August 16, 2024

@jakmeier not really. I've checked:

The compilation issue is related to the wasm support (random normally is not allowed in smart contracts):

error: the wasm*-unknown-unknown targets are not supported by default, you may need to enable the "js" feature. For more information see: https://docs.rs/getrandom/#webassembly-support
   --> /home/robert/.cargo/registry/src/index.crates.io-6f17d22bba15001f/getrandom-0.2.10/src/lib.rs:285:9
    |
285 | /         compile_error!("the wasm*-unknown-unknown targets are not supported by \
286 | |                         default, you may need to enable the \"js\" feature. \
287 | |                         For more information see: \
288 | |                         https://docs.rs/getrandom/#webassembly-support");
    | |________________________________________________________________________^

error[E0433]: failed to resolve: use of undeclared crate or module `imp`
   --> /home/robert/.cargo/registry/src/index.crates.io-6f17d22bba15001f/getrandom-0.2.10/src/lib.rs:341:9

from nearcore.

robert-zaremba avatar robert-zaremba commented on August 16, 2024

Note that 5-6 weeks is not acceptable for us :(

from nearcore.

akhi3030 avatar akhi3030 commented on August 16, 2024

The bot automatically closed this. I'm reopening it till we find a solution for @robert-zaremba.

from nearcore.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.