
Comments (6)

gupta-ak commented on July 27, 2024

For "Performance tuning of allocation granularity within enclave":
I wrote the following program to test memory allocation granularity from 1 byte to 1 GB:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <openenclave/internal/tests.h> /* OE_TEST (Open Enclave test macro) */

/* Maximum allocations per size (the Release run later in this thread used 10000000). */
#define ITERS 1000000

/* Elapsed time between two timespecs, in nanoseconds. */
static inline size_t _GetTimespecDuration(struct timespec* start, struct timespec* end)
{
    return 1000000000ULL * (end->tv_sec - start->tv_sec) + (end->tv_nsec - start->tv_nsec);
}

/* Times `iters` mallocs of `size` bytes, then the matching frees.
 * iters is capped so the total bytes allocated never exceed maxSize. */
static inline void _TestMalloc(size_t size, size_t maxSize)
{
    struct timespec tp0, tp1;
    size_t iters = (maxSize / size < ITERS) ? (maxSize / size) : ITERS;
    void** data = malloc(sizeof(void*) * iters);
    OE_TEST(data != NULL);

    printf("FOR SIZE: %zu\n", size);

    OE_TEST(clock_gettime(CLOCK_REALTIME, &tp0) == 0);
    for (size_t i = 0; i < iters; i++)
    {
        data[i] = malloc(size);
    }
    OE_TEST(clock_gettime(CLOCK_REALTIME, &tp1) == 0);

    size_t dur = _GetTimespecDuration(&tp0, &tp1);
    printf("---malloc: %zu ns for %zu iters (%zu ns/alloc)\n", dur, iters, dur / iters);

    OE_TEST(clock_gettime(CLOCK_REALTIME, &tp0) == 0);
    for (size_t i = 0; i < iters; i++)
    {
        free(data[i]);
    }
    OE_TEST(clock_gettime(CLOCK_REALTIME, &tp1) == 0);

    dur = _GetTimespecDuration(&tp0, &tp1);
    printf("---free: %zu ns for %zu iters (%zu ns/free)\n", dur, iters, dur / iters);

    free(data);
}

/* Runs _TestMalloc for every power of two from 1 byte to 1 GB. */
static inline void RunMallocBenchmark(void)
{
    size_t pow2s[] = {
        1 << 0,  /* 1 */
        1 << 1,  /* 2 */
        1 << 2,  /* 4 */
        1 << 3,  /* 8 */
        1 << 4,  /* 16 */
        1 << 5,  /* 32 */
        1 << 6,  /* 64 */
        1 << 7,  /* 128 */
        1 << 8,  /* 256 */
        1 << 9,  /* 512 */
        1 << 10, /* 1K */
        1 << 11, /* 2K */
        1 << 12, /* 4K */
        1 << 13, /* 8K */
        1 << 14, /* 16K */
        1 << 15, /* 32K */
        1 << 16, /* 64K */
        1 << 17, /* 128K */
        1 << 18, /* 256K */
        1 << 19, /* 512K */
        1 << 20, /* 1M */
        1 << 21, /* 2M */
        1 << 22, /* 4M */
        1 << 23, /* 8M */
        1 << 24, /* 16M */
        1 << 25, /* 32M */
        1 << 26, /* 64M */
        1 << 27, /* 128M */
        1 << 28, /* 256M */
        1 << 29, /* 512M */
        1 << 30  /* 1G */
    };
    size_t length = sizeof(pow2s) / sizeof(pow2s[0]);

    for (size_t i = 0; i < length; i++)
    {
        _TestMalloc(pow2s[i], pow2s[length - 1]);
    }
}
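The per-size iteration counts in the output follow from the cap in _TestMalloc: at most ITERS allocations, and never more than 1 GB in total for one size. With ITERS = 1000000, sizes up to 1024 bytes get the full 1000000 iterations, 2048 bytes gets 2^30 / 2048 = 524288, and 1 GB gets a single allocation. A minimal sketch of that arithmetic (iterations_for is a hypothetical helper, not part of the original program):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the cap in _TestMalloc:
 * at most maxIters allocations, and at most maxBytes in total. */
size_t iterations_for(size_t size, size_t maxBytes, size_t maxIters)
{
    assert(size > 0);
    size_t byBytes = maxBytes / size; /* allocations that fit in maxBytes */
    return byBytes < maxIters ? byBytes : maxIters;
}
```

With ITERS raised to 10000000 (as in the Release run), the byte cap kicks in at 128 bytes instead: 2^30 / 128 = 8388608.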

For enclave memory, I saw the following behavior:
DEBUG

  • Roughly constant time for sizes 1-32 bytes at ~2200 ns per alloc.
  • Sublinear growth for sizes 64 to 256 bytes, at 2400 to 3400 ns per alloc.
  • Linear growth beyond 256 bytes. From 2K onward, malloc time doubles with each doubling of the allocation size, so allocating 1 GB took roughly 5.6 seconds.

RELEASE

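  • Roughly constant for 1-16 bytes at ~220 ns per alloc (about 10x faster than Debug), rising to ~470 ns at 64 bytes.
  • Linear growth from 256 bytes to 4K, at 1.4 us to 21 us per alloc.
  • Roughly constant time (~20 us per alloc) for 4K and larger, drifting down to ~12 us at 4 MB.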

The Debug malloc is significantly slower because oe_debug_malloc does a memset(X, X, size) over the whole block, so its cost scales linearly with the requested size (free shows the same scaling in the Debug numbers). The Release malloc calls dlmalloc directly, so it scales much better.
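To illustrate why a fill-on-allocate debug allocator scales with the request size, here is a minimal sketch. debug_malloc_sketch and DEBUG_FILL_BYTE are hypothetical names, and this is not the actual oe_debug_malloc implementation; it only shows that touching every byte makes each call O(size), whereas a plain forward to the underlying allocator does not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fill pattern; the value oe_debug_malloc actually uses may differ. */
#define DEBUG_FILL_BYTE 0xAA

/* Sketch of a debug allocation wrapper: the memset writes every byte of
 * the block, so the cost of each call grows linearly with `size`. */
void* debug_malloc_sketch(size_t size)
{
    void* p = malloc(size);
    if (p != NULL)
    {
        memset(p, DEBUG_FILL_BYTE, size);
    }
    return p;
}
```

Allocating 1 GB through such a wrapper writes 1 GB of memory before returning, which is consistent with the multi-second Debug timings at the large sizes.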

In contrast, host memory is significantly faster: the slowest host allocations (~2 us) are comparable to the fastest Debug enclave allocations.

  • Roughly constant time from 1 to 64 bytes at ~20-35 ns per alloc.
  • Linear growth from 128 bytes to 4K, at ~50-950 ns per alloc.
  • Sublinear(?) growth from 8K to 4MB: not perfectly consistent, but it gradually rises from ~950 ns to ~2000 ns.
  • At >= 4MB, malloc is essentially constant time at around 2000 ns per alloc (with some noise, up to ~3.7 us at 128 MB).

The Release enclave and host allocators scale similarly, with the Release enclave consistently about an order of magnitude slower than host memory (roughly 6x to 23x depending on size).
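As a quick check of the "order of magnitude" claim, this sketch computes the Release-enclave/host ratio for a few sizes, using per-allocation times copied from the raw output in this thread. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Per-allocation times in ns, copied from the Host and Release-enclave output. */
struct sample
{
    size_t size;       /* allocation size in bytes */
    double host_ns;    /* host malloc, ns/alloc */
    double enclave_ns; /* Release enclave malloc, ns/alloc */
};

static const struct sample samples[] = {
    {16, 21, 219},
    {1024, 238, 5393},
    {65536, 1065, 21118},
    {4194304, 1969, 11718},
};

/* How many times slower the Release enclave allocation is than the host one. */
double slowdown(const struct sample* s)
{
    return s->enclave_ns / s->host_ns;
}
```

For these rows the ratio comes out at roughly 10x (16 B), 23x (1 KB), 20x (64 KB), and 6x (4 MB).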

from openenclave.

gupta-ak commented on July 27, 2024

DEBUG

Here's the raw data I got from my program:

===Running Host malloc benchmark test.
FOR SIZE: 1
---malloc: 35387821 ns for 1000000 iters (35 ns/alloc)
---free: 9371273 ns for 1000000 iters (9 ns/free)
FOR SIZE: 2
---malloc: 16118481 ns for 1000000 iters (16 ns/alloc)
---free: 8508984 ns for 1000000 iters (8 ns/free)
FOR SIZE: 4
---malloc: 21835504 ns for 1000000 iters (21 ns/alloc)
---free: 8442885 ns for 1000000 iters (8 ns/free)
FOR SIZE: 8
---malloc: 22242299 ns for 1000000 iters (22 ns/alloc)
---free: 8525184 ns for 1000000 iters (8 ns/free)
FOR SIZE: 16
---malloc: 21983702 ns for 1000000 iters (21 ns/alloc)
---free: 8504285 ns for 1000000 iters (8 ns/free)
FOR SIZE: 32
---malloc: 24989061 ns for 1000000 iters (24 ns/alloc)
---free: 9331273 ns for 1000000 iters (9 ns/free)
FOR SIZE: 64
---malloc: 33905941 ns for 1000000 iters (33 ns/alloc)
---free: 12931925 ns for 1000000 iters (12 ns/free)
FOR SIZE: 128
---malloc: 48431844 ns for 1000000 iters (48 ns/alloc)
---free: 31925767 ns for 1000000 iters (31 ns/free)
FOR SIZE: 256
---malloc: 72416418 ns for 1000000 iters (72 ns/alloc)
---free: 46122574 ns for 1000000 iters (46 ns/free)
FOR SIZE: 512
---malloc: 127830866 ns for 1000000 iters (127 ns/alloc)
---free: 53170079 ns for 1000000 iters (53 ns/free)
FOR SIZE: 1024
---malloc: 238691464 ns for 1000000 iters (238 ns/alloc)
---free: 65725709 ns for 1000000 iters (65 ns/free)
FOR SIZE: 2048
---malloc: 240127144 ns for 524288 iters (458 ns/alloc)
---free: 59507693 ns for 524288 iters (113 ns/free)
FOR SIZE: 4096
---malloc: 252927770 ns for 262144 iters (964 ns/alloc)
---free: 44060903 ns for 262144 iters (168 ns/free)
FOR SIZE: 8192
---malloc: 118985887 ns for 131072 iters (907 ns/alloc)
---free: 22272798 ns for 131072 iters (169 ns/free)
FOR SIZE: 16384
---malloc: 61688563 ns for 65536 iters (941 ns/alloc)
---free: 11542944 ns for 65536 iters (176 ns/free)
FOR SIZE: 32768
---malloc: 32182463 ns for 32768 iters (982 ns/alloc)
---free: 6147416 ns for 32768 iters (187 ns/free)
FOR SIZE: 65536
---malloc: 17463363 ns for 16384 iters (1065 ns/alloc)
---free: 3595151 ns for 16384 iters (219 ns/free)
FOR SIZE: 131072
---malloc: 9164176 ns for 8192 iters (1118 ns/alloc)
---free: 2544866 ns for 8192 iters (310 ns/free)
FOR SIZE: 262144
---malloc: 5764722 ns for 4096 iters (1407 ns/alloc)
---free: 1464880 ns for 4096 iters (357 ns/free)
FOR SIZE: 524288
---malloc: 2920261 ns for 2048 iters (1425 ns/alloc)
---free: 770590 ns for 2048 iters (376 ns/free)
FOR SIZE: 1048576
---malloc: 1565779 ns for 1024 iters (1529 ns/alloc)
---free: 474193 ns for 1024 iters (463 ns/free)
FOR SIZE: 2097152
---malloc: 868888 ns for 512 iters (1697 ns/alloc)
---free: 349595 ns for 512 iters (682 ns/free)
FOR SIZE: 4194304
---malloc: 504093 ns for 256 iters (1969 ns/alloc)
---free: 172797 ns for 256 iters (674 ns/free)
FOR SIZE: 8388608
---malloc: 251396 ns for 128 iters (1964 ns/alloc)
---free: 278796 ns for 128 iters (2178 ns/free)
FOR SIZE: 16777216
---malloc: 124498 ns for 64 iters (1945 ns/alloc)
---free: 152898 ns for 64 iters (2389 ns/free)
FOR SIZE: 33554432
---malloc: 64899 ns for 32 iters (2028 ns/alloc)
---free: 76799 ns for 32 iters (2399 ns/free)
FOR SIZE: 67108864
---malloc: 41200 ns for 16 iters (2575 ns/alloc)
---free: 35099 ns for 16 iters (2193 ns/free)
FOR SIZE: 134217728
---malloc: 29399 ns for 8 iters (3674 ns/alloc)
---free: 20000 ns for 8 iters (2500 ns/free)
FOR SIZE: 268435456
---malloc: 7200 ns for 4 iters (1800 ns/alloc)
---free: 10900 ns for 4 iters (2725 ns/free)
FOR SIZE: 536870912
---malloc: 4200 ns for 2 iters (2100 ns/alloc)
---free: 7000 ns for 2 iters (3500 ns/free)
FOR SIZE: 1073741824
---malloc: 2500 ns for 1 iters (2500 ns/alloc)
---free: 5100 ns for 1 iters (5100 ns/free)

===Running enclave malloc benchmark test.
FOR SIZE: 1
---malloc: 2218000000 ns for 1000000 iters (2218 ns/alloc)
---free: 2006000000 ns for 1000000 iters (2006 ns/free)
FOR SIZE: 2
---malloc: 2169000000 ns for 1000000 iters (2169 ns/alloc)
---free: 1996000000 ns for 1000000 iters (1996 ns/free)
FOR SIZE: 4
---malloc: 2151000000 ns for 1000000 iters (2151 ns/alloc)
---free: 2023000000 ns for 1000000 iters (2023 ns/free)
FOR SIZE: 8
---malloc: 2163000000 ns for 1000000 iters (2163 ns/alloc)
---free: 2011000000 ns for 1000000 iters (2011 ns/free)
FOR SIZE: 16
---malloc: 2182000000 ns for 1000000 iters (2182 ns/alloc)
---free: 2012000000 ns for 1000000 iters (2012 ns/free)
FOR SIZE: 32
---malloc: 2257000000 ns for 1000000 iters (2257 ns/alloc)
---free: 2097000000 ns for 1000000 iters (2097 ns/free)
FOR SIZE: 64
---malloc: 2435000000 ns for 1000000 iters (2435 ns/alloc)
---free: 2267000000 ns for 1000000 iters (2267 ns/free)
FOR SIZE: 128
---malloc: 2775000000 ns for 1000000 iters (2775 ns/alloc)
---free: 2634000000 ns for 1000000 iters (2634 ns/free)
FOR SIZE: 256
---malloc: 3437000000 ns for 1000000 iters (3437 ns/alloc)
---free: 3299000000 ns for 1000000 iters (3299 ns/free)
FOR SIZE: 512
---malloc: 4813000000 ns for 1000000 iters (4813 ns/alloc)
---free: 4626000000 ns for 1000000 iters (4626 ns/free)
FOR SIZE: 1024
---malloc: 7465000000 ns for 1000000 iters (7465 ns/alloc)
---free: 7291000000 ns for 1000000 iters (7291 ns/free)
FOR SIZE: 2048
---malloc: 6705000000 ns for 524288 iters (12788 ns/alloc)
---free: 6651000000 ns for 524288 iters (12685 ns/free)
FOR SIZE: 4096
---malloc: 6182000000 ns for 262144 iters (23582 ns/alloc)
---free: 6153000000 ns for 262144 iters (23471 ns/free)
FOR SIZE: 8192
---malloc: 5928000000 ns for 131072 iters (45227 ns/alloc)
---free: 5906000000 ns for 131072 iters (45059 ns/free)
FOR SIZE: 16384
---malloc: 5702000000 ns for 65536 iters (87005 ns/alloc)
---free: 5688000000 ns for 65536 iters (86791 ns/free)
FOR SIZE: 32768
---malloc: 5618000000 ns for 32768 iters (171447 ns/alloc)
---free: 5644000000 ns for 32768 iters (172241 ns/free)
FOR SIZE: 65536
---malloc: 5595000000 ns for 16384 iters (341491 ns/alloc)
---free: 5574000000 ns for 16384 iters (340209 ns/free)
FOR SIZE: 131072
---malloc: 5561000000 ns for 8192 iters (678833 ns/alloc)
---free: 5570000000 ns for 8192 iters (679931 ns/free)
FOR SIZE: 262144
---malloc: 5566000000 ns for 4096 iters (1358886 ns/alloc)
---free: 5552000000 ns for 4096 iters (1355468 ns/free)
FOR SIZE: 524288
---malloc: 5530000000 ns for 2048 iters (2700195 ns/alloc)
---free: 5538000000 ns for 2048 iters (2704101 ns/free)
FOR SIZE: 1048576
---malloc: 5596000000 ns for 1024 iters (5464843 ns/alloc)
---free: 5543000000 ns for 1024 iters (5413085 ns/free)
FOR SIZE: 2097152
---malloc: 5532000000 ns for 512 iters (10804687 ns/alloc)
---free: 5535000000 ns for 512 iters (10810546 ns/free)
FOR SIZE: 4194304
---malloc: 5546000000 ns for 256 iters (21664062 ns/alloc)
---free: 5534000000 ns for 256 iters (21617187 ns/free)
FOR SIZE: 8388608
---malloc: 5532000000 ns for 128 iters (43218750 ns/alloc)
---free: 5572000000 ns for 128 iters (43531250 ns/free)
FOR SIZE: 16777216
---malloc: 5550000000 ns for 64 iters (86718750 ns/alloc)
---free: 5518000000 ns for 64 iters (86218750 ns/free)
FOR SIZE: 33554432
---malloc: 5530000000 ns for 32 iters (172812500 ns/alloc)
---free: 5519000000 ns for 32 iters (172468750 ns/free)
FOR SIZE: 67108864
---malloc: 5515000000 ns for 16 iters (344687500 ns/alloc)
---free: 5522000000 ns for 16 iters (345125000 ns/free)
FOR SIZE: 134217728
---malloc: 5520000000 ns for 8 iters (690000000 ns/alloc)
---free: 5530000000 ns for 8 iters (691250000 ns/free)
FOR SIZE: 268435456
---malloc: 5530000000 ns for 4 iters (1382500000 ns/alloc)
---free: 5513000000 ns for 4 iters (1378250000 ns/free)
FOR SIZE: 536870912
---malloc: 5528000000 ns for 2 iters (2764000000 ns/alloc)
---free: 5632000000 ns for 2 iters (2816000000 ns/free)
FOR SIZE: 1073741824
---malloc: 5608000000 ns for 1 iters (5608000000 ns/alloc)
---free: 5521000000 ns for 1 iters (5521000000 ns/free)


anakrish commented on July 27, 2024

@gupta-ak

  1. Are you testing this against a release build of the enclave? I ask especially since the times observed are in nanoseconds.
  2. I thought performance counters didn't work within an enclave. How are you measuring time?


gupta-ak commented on July 27, 2024

This is the debug build. I can try the release build.

The time is measured with clock_gettime. Inside the enclave it doesn't actually have nanosecond precision, so I'm doing 1 million iterations to compensate for the clock's coarse resolution.
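Every enclave total in this thread is a multiple of 1,000,000 ns, which suggests the in-enclave clock effectively ticks once per millisecond. A minimal sketch for sizing the iteration count against the clock granularity: clock_getres is standard POSIX, but whether it reports the effective granularity inside an enclave is an assumption worth checking, and clock_resolution_ns / min_iters_for_ticks are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Reported resolution of `clock` in ns, or 0 if clock_getres fails.
 * Inside an enclave this may not match the effective granularity,
 * so treat it as a hint rather than ground truth. */
uint64_t clock_resolution_ns(clockid_t clock)
{
    struct timespec res;
    if (clock_getres(clock, &res) != 0)
    {
        return 0;
    }
    return (uint64_t)res.tv_sec * 1000000000ULL + (uint64_t)res.tv_nsec;
}

/* Smallest iteration count whose expected total time spans at least
 * `ticks` clock ticks, given an estimated cost per operation in ns. */
uint64_t min_iters_for_ticks(uint64_t res_ns, uint64_t est_op_ns, uint64_t ticks)
{
    assert(est_op_ns > 0);
    uint64_t target_ns = res_ns * ticks;
    return (target_ns + est_op_ns - 1) / est_op_ns; /* ceiling division */
}
```

For example, at 1 ms granularity and ~220 ns per allocation, covering 100 ticks needs min_iters_for_ticks(1000000, 220, 100) = 454546 iterations.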


gupta-ak commented on July 27, 2024

@anakrish The results for Release are a lot better. It turns out that oe_debug_malloc does a memset on the entire block, which is why malloc was scaling with the requested size. It's still roughly 10x slower than the host, but it no longer scales linearly at large sizes (above ~4K). For this run, ITERS was raised to 10000000.

===Running enclave malloc benchmark test.
FOR SIZE: 1
---malloc: 2215000000 ns for 10000000 iters (221 ns/alloc)
---free: 2241000000 ns for 10000000 iters (224 ns/free)
FOR SIZE: 2
---malloc: 2194000000 ns for 10000000 iters (219 ns/alloc)
---free: 2240000000 ns for 10000000 iters (224 ns/free)
FOR SIZE: 4
---malloc: 2196000000 ns for 10000000 iters (219 ns/alloc)
---free: 2262000000 ns for 10000000 iters (226 ns/free)
FOR SIZE: 8
---malloc: 2207000000 ns for 10000000 iters (220 ns/alloc)
---free: 2257000000 ns for 10000000 iters (225 ns/free)
FOR SIZE: 16
---malloc: 2199000000 ns for 10000000 iters (219 ns/alloc)
---free: 2231000000 ns for 10000000 iters (223 ns/free)
FOR SIZE: 32
---malloc: 3395000000 ns for 10000000 iters (339 ns/alloc)
---free: 3076000000 ns for 10000000 iters (307 ns/free)
FOR SIZE: 64
---malloc: 4684000000 ns for 10000000 iters (468 ns/alloc)
---free: 4731000000 ns for 10000000 iters (473 ns/free)
FOR SIZE: 128
---malloc: 6665000000 ns for 8388608 iters (794 ns/alloc)
---free: 6778000000 ns for 8388608 iters (808 ns/free)
FOR SIZE: 256
---malloc: 6102000000 ns for 4194304 iters (1454 ns/alloc)
---free: 6165000000 ns for 4194304 iters (1469 ns/free)
FOR SIZE: 512
---malloc: 5797000000 ns for 2097152 iters (2764 ns/alloc)
---free: 5881000000 ns for 2097152 iters (2804 ns/free)
FOR SIZE: 1024
---malloc: 5656000000 ns for 1048576 iters (5393 ns/alloc)
---free: 5713000000 ns for 1048576 iters (5448 ns/free)
FOR SIZE: 2048
---malloc: 5586000000 ns for 524288 iters (10654 ns/alloc)
---free: 5666000000 ns for 524288 iters (10807 ns/free)
FOR SIZE: 4096
---malloc: 5534000000 ns for 262144 iters (21110 ns/alloc)
---free: 5618000000 ns for 262144 iters (21430 ns/free)
FOR SIZE: 8192
---malloc: 2784000000 ns for 131072 iters (21240 ns/alloc)
---free: 2809000000 ns for 131072 iters (21430 ns/free)
FOR SIZE: 16384
---malloc: 1403000000 ns for 65536 iters (21408 ns/alloc)
---free: 1411000000 ns for 65536 iters (21530 ns/free)
FOR SIZE: 32768
---malloc: 704000000 ns for 32768 iters (21484 ns/alloc)
---free: 710000000 ns for 32768 iters (21667 ns/free)
FOR SIZE: 65536
---malloc: 346000000 ns for 16384 iters (21118 ns/alloc)
---free: 5000000 ns for 16384 iters (305 ns/free)
FOR SIZE: 131072
---malloc: 167000000 ns for 8192 iters (20385 ns/alloc)
---free: 7000000 ns for 8192 iters (854 ns/free)
FOR SIZE: 262144
---malloc: 82000000 ns for 4096 iters (20019 ns/alloc)
---free: 2000000 ns for 4096 iters (488 ns/free)
FOR SIZE: 524288
---malloc: 40000000 ns for 2048 iters (19531 ns/alloc)
---free: 0 ns for 2048 iters (0 ns/free)
FOR SIZE: 1048576
---malloc: 19000000 ns for 1024 iters (18554 ns/alloc)
---free: 0 ns for 1024 iters (0 ns/free)
FOR SIZE: 2097152
---malloc: 9000000 ns for 512 iters (17578 ns/alloc)
---free: 0 ns for 512 iters (0 ns/free)
FOR SIZE: 4194304
---malloc: 3000000 ns for 256 iters (11718 ns/alloc)
---free: 0 ns for 256 iters (0 ns/free)

For the remaining sizes (8 MB and up), the clock is too imprecise and the number of iterations isn't high enough to give meaningful numbers; the 0 ns free times above have the same cause.
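One way to avoid the 0 ns readings is to grow the batch until the elapsed time is well above the clock granularity, instead of using a fixed ITERS. The sketch below keeps the original allocate-all-then-free-all structure and only times the malloc loop. time_malloc_batch and now_ns are hypothetical names, and it uses CLOCK_MONOTONIC, which may need to be swapped for CLOCK_REALTIME (as in the original program) if the enclave runtime does not support it.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Times batches of `iters` mallocs of `size` bytes, doubling the batch
 * (capped at max_iters) until one batch takes at least min_elapsed_ns.
 * Returns average ns per malloc (or -1.0 if the pointer array can't be
 * allocated) and stores the final batch size in *iters_out. */
double time_malloc_batch(size_t size, uint64_t min_elapsed_ns, size_t max_iters, size_t* iters_out)
{
    assert(max_iters >= 1);
    size_t iters = 1;
    for (;;)
    {
        void** data = malloc(sizeof(void*) * iters);
        if (data == NULL)
        {
            return -1.0;
        }

        uint64_t t0 = now_ns();
        for (size_t i = 0; i < iters; i++)
        {
            data[i] = malloc(size);
        }
        uint64_t dt = now_ns() - t0;

        /* Frees are outside the timed region; free(NULL) is a no-op. */
        for (size_t i = 0; i < iters; i++)
        {
            free(data[i]);
        }
        free(data);

        if (dt >= min_elapsed_ns || iters >= max_iters)
        {
            *iters_out = iters;
            return (double)dt / (double)iters;
        }
        iters = (iters * 2 < max_iters) ? iters * 2 : max_iters;
    }
}
```

Passing maxBytes / size as max_iters keeps the same total-memory cap as _TestMalloc, and a min_elapsed_ns of around 100 clock ticks (e.g. 100 ms at millisecond granularity) keeps the quantization error near 1%.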


gupta-ak commented on July 27, 2024

Should be done now.
