mattconte / tlsf Goto Github PK

Two-Level Segregated Fit memory allocator implementation.

C 100.00%

tlsf's Introduction

tlsf

Two-Level Segregated Fit memory allocator implementation. Written by Matthew Conte ([email protected]). Released under the BSD license.

Features

O(1) cost for malloc, free, realloc, memalign
Extremely low overhead per allocation (4 bytes)
Low overhead per TLSF management of pools (~3kB)
Low fragmentation
Compiles to only a few kB of code and data
Support for adding and removing memory pool regions on the fly

Caveats

Currently, assumes architecture can make 4-byte aligned accesses
Not designed to be thread safe; the user must provide this

Notes

This code was based on the TLSF 1.4 spec and documentation found at:

http://www.gii.upv.es/tlsf/main/docs

It also leverages the TLSF 2.0 improvement to shrink the per-block overhead from 8 to 4 bytes.

History

2016/04/10 - v3.1

Code moved to github
tlsfbits.h rolled into tlsf.c
License changed to BSD

2014/02/08 - v3.0

This version is based on improvements from 3DInteractive GmbH
Interface changed to allow more than one memory pool
Separated pool handling from control structure (adding, removing, debugging)
Control structure and pools can still be constructed in the same memory block
Memory blocks for control structure and pools are checked for alignment
Added functions to retrieve control structure size, alignment size, min and max block size, overhead of pool structure, and overhead of a single allocation
Minimal Pool size is tlsf_block_size_min() + tlsf_pool_overhead()
Pool must be empty when it is removed, in order to allow O(1) removal

2011/10/20 - v2.0

64-bit support
More compiler intrinsics for ffs/fls
ffs/fls verification during TLSF creation in debug builds

2008/04/04 - v1.9

Add tlsf_heap_check, a heap integrity check
Support a predefined tlsf_assert macro
Fix realloc case where block should shrink; if adjacent block is in use, execution would go down the slow path

2007/02/08 - v1.8

Fix for unnecessary reallocation in tlsf_realloc

2007/02/03 - v1.7

tlsf_heap_walk takes a callback
tlsf_realloc now returns NULL on failure
tlsf_memalign optimization for 4-byte alignment
Usage of size_t where appropriate

2006/11/21 - v1.6

ffs/fls broken out into tlsfbits.h
tlsf_overhead queries per-pool overhead

2006/11/07 - v1.5

Smart realloc implementation
Smart memalign implementation

2006/10/11 - v1.4

Add some ffs/fls implementations
Minor code footprint reduction

2006/09/14 - v1.3

Profiling indicates heavy use of blocks of size 1-128, so implement small block handling
Reduce pool overhead by about 1kb
Reduce minimum block size from 32 to 12 bytes
Realloc bug fix

2006/09/09 - v1.2

Add tlsf_block_size
Static assertion mechanism for invariants
Minor bugfixes

2006/09/01 - v1.1

Add tlsf_realloc
Add tlsf_walk_heap

2006/08/25 - v1.0

First release

tlsf's People

Contributors

Stargazers

Watchers

Forkers

shadercoder samplecount andryblack alexshilucky lubing521 zeromus timatxia ak-mdufour sunfirefox wanglehui biojppm chenbd lamarqua alnsn graydon velvitonator minad alenwesker rmoswela jwki an4967 menzi11 mkitby forevertsai kong123456 abc00 anlofw loveyakamoz indrajithbandara corefan puppytodd jcarrano hichard wloop andylamp akritid czllong1989 stangelandcl jiejietop zydzy1988 badembed yuanzhaoming hyx-jetbrains xieshren charliexp 123zmz123 p-fly nickzhuang0613 adg7660 ablakatovmikhail xiaboyang qilezhang felixqin mooaccel tnodir ulipe guipengliao ryan191019 churbis vladserhiienko arthurfait myotest wa2301 kawbuma verdantnibble maplebeats peteryusw shixiongfei yulei zhuyu1997 feiyunwill jinminglee jackrambor mus1afa euphy-wong magetron alexshi126 donny9 xiaoxiang781216 fastshift yangjie11 mfkiwl eecsmap leidafu superymk zqp2013 xiaokaih tutubinary llhe andrei-pokrovsky catoffly plasmacel philippe44 touowang manmuqingshan xiaolitongxue666 bigcirclelaw junglereal forestliwei hqusxylixiang

tlsf's Issues

calls printf without ifdef

The code calls printf in a few places without an ifdef.
I'm considering using this in an embedded system where printf either doesn't exist,
or can't be called in certain contexts.
Is this something you would be interested in as a pull request,
and if so what macro name would you prefer and which direction?
Thanks

block_size_max can be larger

tlsf/tlsf.c

Line 341 in deff9ab

static const size_t block_size_max = tlsf_cast(size_t, 1) << FL_INDEX_MAX;

The calculation of block_size_max doesnt take into account the fact that we still have the full range of second-level indices for FL_INDEX_MAX first-level index. Lets take a look on indices of current block_size_max block:
fl==FL_INDEX_MAX
sl==0

We can observe that we don`t use a range 2^fl - (2^(fl + 1) - 1) effectively, as we set block_size_max to the smallest possible size in this range, although we still can use all possible second-level ranges for fl=FL_INDEX_MAX, but the last one, easily without being afraid to get out of bounds. To summarize, indices we really wanna have for the block of size block_size_max are:
fl==FL_INDEX_MAX
sl==SL_INDEX_MAX

The correct calculation for block_size_max now has to be:
block_size_max = (1 << FL_INDEX_MAX) + ((1 << (FL_INDEX_MAX - SL_INDEX_COUNT_LOG2)) *(SL_INDEX_COUNT - 1))

This calculation provides us largest possible block size as smallest possible block for fl==FL_INDEX_MAX, sl==SL_INDEX_MAX, which will not get out of bounds for both mapping_search and mapping_insert. We are able to verify that this size is really the largest possible block size by checking that the next possible block size will "overflow" fl index.

Status of this project

Hi Matt,

at first, thanks for your work! I wonder what the status of this project is. Are there further developments planned? I tried to integrate this in a project of mine and did modifications to the API (expose less details, support returning free memory to the OS). I found tests by @jserv, see https://github.com/jserv/tlsf-bsd/tree/master/tests. Do you have your own testsuite? I also saw that @alnsn was working on improvements.

Are you interested in merging changes or do you want to keep things as is?

modified api:

typedef struct tlsf_s* tlsf_t;

typedef void* (*tlsf_map_t)(size_t* size, void* user);
typedef void  (*tlsf_unmap_t)(void* mem, size_t size, void* user);

tlsf_t tlsf_create(tlsf_map_t map, tlsf_unmap_t unmap, void* user);
void   tlsf_destroy(tlsf_t t);
void*  tlsf_malloc(tlsf_t t, size_t size);
void*  tlsf_calloc(tlsf_t t, size_t size);
void*  tlsf_realloc(tlsf_t t, void* mem, size_t size);
void   tlsf_free(tlsf_t t, void* mem);

See https://github.com/minad/tlsf (Sorry if the diff is in an unusable state due to dos2unix and formatting bikeshedding, but this can be reverted)

tlsf_create_with_pool crashes with this memory size

calling tlsf_create_with_pool with this exact size crashes on my machine:

size_t size = tlsf_block_size_max() + tlsf_size() + tlsf_pool_overhead();
char* mem = (char*)malloc(size);
auto t = tlsf_create_with_pool(mem, size);

The sizes which are 8 bytes bigger or smaller do not crash.

crashes in insert_free_block on this line:

current->prev_free = block;

Thread 1: EXC_BAD_ACCESS (code=2, address=0x100000019)

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x100000019)
    frame #0: 0x0000000100002ee8 gc2 test`insert_free_block(control=0x0000000106800000, block=0x0000000106801980, fl=25, sl=0) at tlsf.c:612
  * frame #1: 0x0000000100001b5c gc2 test`block_insert(control=0x0000000106800000, block=0x0000000106801980) at tlsf.c:638
    frame #2: 0x00000001000019b3 gc2 test`tlsf_add_pool(tlsf=0x0000000106800000, mem=0x0000000106801988, bytes=4294967312) at tlsf.c:1018
    frame #3: 0x00000001000020b9 gc2 test`tlsf_create_with_pool(mem=0x0000000106800000, bytes=4294973848) at tlsf.c:1100
    frame #4: 0x0000000100003fcc gc2 test`main(argc=1, argv=0x00007ffeefbff7e8) at main.cpp:301
    frame #5: 0x00007fff78e10115 libdyld.dylib`start + 1
    frame #6: 0x00007fff78e10115 libdyld.dylib`start + 1

Feature: query requested size from block (with recipe)

For a port of TLSF to our systems programming language, I had the requirement to be able to recover the requested size from a block, rather than the adjusted size. I implemented it in my port, and so far it turns out to work well, with no new spatial overhead added. I thought I'd let you know how it works, in case you (maintainer or user) are interested in replicating it:

block.size now stores the requested size shifted by two bits - for 64-bit that's no problem, but for 32-bit we'd truncate two bits (though I see the implementation has a 30-bit limit there anyway).
block_set_size() now accepts the requested size rather than the adjusted size.
block_size() transparently aligns the returned size by ALIGN_SIZE; since in all previous adjustments, adjust == align_up(size, ALIGN_SIZE), this effectively has block_size() work as it did before.
block_absorb() uses block_set_size(block_size(prev) + block_size(block) + block_header_overhead) rather than offsetting prev.size directly, which fixes the operation for the new format and also quantizes the block size of free blocks.
block_split() receives both adjusted and requested size; adjust is used for all calculations; request is passed to block_set_size().
likewise, block_trim_free(), block_trim_used(), block_prepare_used() forward the requested size along with the adjusted size.
block_trim_free_leading() just passes the same remaining size to both adjusted and requested size.
malloc() and memalign() pass the requested size to block_prepare_used() along with the adjusted size.
tlsf_realloc() passes the requested size to block_trim_used() along with the adjusted size.
a new API function tlsf_block_request_size() retrieves the requested size from block.size, without aligning it.

[Feature request] More effective reallocation API

Current reallocation way unconditionally moves entire old allocation to new memory area if there's no room to grow in place.
Growing dynamic arrays seem a bit ineffective in this case, when element needs to be inserted or prepended, rather than appended.
One of following additional functions could be used when data movement needs to be performed in special way.

void* realloc_need_move   (ptr, size);
/* Test if realloc requires data move due to lack of space to grow in place.
 * If size is smaller than actual allocation - return false unconditionally */

void* realloc_try_inplace (ptr, size);
/* Try to realloc without data move.
 * Return NULL if can't realloc without data move. */

void* realloc_or_malloc   (ptr, size);
/* Almost like default realloc, but if data move is required, just do new malloc,
 * without data movement and deletion of old area. */

void* realloc_with_offsets (ptr, size, offsets);
/* Like original realloc, but also add offsets. This should be move effective when reallocated data are moved.
 * shifts - array of two interleaved elements: offset, [pos, offset,...]
 * - positive offset shifts subsequent data towards end, resulting to gap
 * - negative offset does same shift but backward, overwriting previous elements
 * - zero offset terminates offsets list
 * Optimizations could be possible e.g., when one offset is followed by same offset in reverse direction. */

I know at least one real example, where add/remove of elements in dynamic array always done by manual malloc with special data move. Of course, it's not even tlsf user.

why is it thread-unsafe?

From the code, I can't find it is thread-unsafe, why is it thead-unsafe?

What is the practical meaning of SL_INDEX_COUNT_LOG2?

Hi,

I made a test with an 64 bit i5 CPU using 1MB pool size with different SL_INDEX_COUNT_LOG2 settings and allocation sizes.

The algorithm:

    size_t alloc_size = 1; //or 8 or 1024;
    void * p;
    uint32_t c = 0;
    do {
        p = lv_mem_alloc(alloc_size); //Just a wrapper around tlsf_alloc
        c++;
    }while(p);

    //Uses lv_tlsf_walk_pool to get some memory info
    lv_mem_monitor_t m;
    lv_mem_monitor(&m);

    printf("count: %d, free: %d\n", c, m.free_size);
    return 0;

The results:
alloc_size=1

SL_INDEX_COUNT_LOG=5 count: 32415, free: 0
SL_INDEX_COUNT_LOG=1 count: 32510, free: 0

alloc_size=8

SL_INDEX_COUNT_LOG=5 count: 32415, free: 0
SL_INDEX_COUNT_LOG=1 count: 32510, free: 0

alloc_size=1024

SL_INDEX_COUNT_LOG=5 count: 1006, free: 80
SL_INDEX_COUNT_LOG=1 count: 1009, free: 24

So with SL_INDEX_COUNT_LOG=1 always more blocks were allocated, I guess it's because of the smaller control_t overhead.

Hence, SL_INDEX_COUNT_LOG=1 always seems better than 5, but I probably I miss something. Could you comment on this?

How to change malloc memory alignment, 64-bits system must align 16-bytes, 32-bits system must align 8-bytes.

The memory address allocated by the malloc function must be aligned twice void *, which means that for 64-bit system, 16-byte alignment is required, and for 32-bit system requires 8-byte alignment.

If you simply modify the macro definition, TLSF will crash directly. How to do this modification?

diff --git a/tlsf.c b/tlsf.c
index af57573..6e87539 100644
--- a/tlsf.c
+++ b/tlsf.c
@@ -211,15 +211,15 @@ enum tlsf_public
        SL_INDEX_COUNT_LOG2 = 5,
 };

-/* Private constants: do not modify. */
+/* Private constants: do not modify. why ??? */
 enum tlsf_private
 {
 #if defined (TLSF_64BIT)

-       /* All allocation sizes and addresses are aligned to 8 bytes. */
-       ALIGN_SIZE_LOG2 = 3,

+       /* All allocation sizes and addresses are aligned to 8(->16) bytes. */
+       ALIGN_SIZE_LOG2 = 4,
         #else

-       /* All allocation sizes and addresses are aligned to 4 bytes. */
-       ALIGN_SIZE_LOG2 = 2,

+       /* All allocation sizes and addresses are aligned to 4(->8) bytes. */
+       ALIGN_SIZE_LOG2 = 3,
         #endif
        ALIGN_SIZE = (1 << ALIGN_SIZE_LOG2),

Question on block_size_min

Firstly, this is very cool - thank you for sharing it!!

In tlsf.c, the definition of block_size_min seems too small.
My reasoning is that a free block must have the size, next_free and prev_free pointers at the start, plus hold the prev_phys_block pointer at the end (from the next block in memory).
So it seems it should be sizeof(block_header_t) exactly?

I think it probably works because when asking for even 1 byte it is rounded up to 4 bytes, and hence provides a structure of 16 bytes.

/*
** A free block must be large enough to store its header minus the size of
** the prev_phys_block field, and no larger than the number of addressable
** bits for FL_INDEX.
*/
#define block_size_min  (sizeof(block_header_t) - sizeof(block_header_t*))

security issue: prevent BadAlloc

see: https://msrc-blog.microsoft.com/2021/04/29/badalloc-memory-allocation-vulnerabilities-could-affect-wide-range-of-iot-and-ot-devices-in-industrial-medical-and-enterprise-networks/

in case tlsf_alloc is called with 0xffffffffu on a 32bit system the align_up() function will set the adjusted size to 0 and the tlfs_max() will set it to the minimum which was in my case 12 bytes. Therefore asking TLSF for SIZE_T_MAX (0xffffffff) will not fail with returning NULL but will return a pointer to a 12 byte memory block.

tlsf_alloc is kind of easy to fix, but the _realloc brother is less obvious. So fixing it in adjust_request_size() isn't really working.

Feature: Allocate at Fixed Address

Would it be a good fit for tlsf to support attempting to allocate memory at a desired address within the available range? (Sort of mimicking the behavior of mmap with MAP_FIXED) Or is that categorically impossible / prohibitively expensive?

Background: For a compiler project where I need a deterministic allocator operating on a fixed address range to improve offline caching behavior, the recursive allocation features of TLSF are quite useful to compartmentalize the allocations of individual modules, and reproduce local allocation behavior on every run. But if the subranges themselves are allocated in a different order, then the address mapping changes, busting the cache, even though nothing else within the modules has changed.

How I'd expect this feature to work would be to have a function such as void* tlsf_malloc_at(tlsf_t tlsf, void *ptr, size_t bytes);, which allocates size bytes at ptr if possible; if that fails, it tries any address larger than ptr (allowing us to allocate by desired starting offset); if that fails, it regresses to regular tlsf_malloc.

Example/Test requested

Hi Matt,

If this will be the case, it is possible to have in a future release a couple of examples on how
to use the library? Especially in regards to the initialization step.

Required pool size for a single allocation

Hi, I was wondering if it's possible to calculate the required pool size for a single allocation.

I use the tlsf in my allocator and add new pools with a fixed size when running out of memory. But for some large allocations which are greater than my pool size I want to add a pool for that allocation with as little overhead as possible. But I haven't found a formula for this.

Some test on x64:

  pool size      max alloc size          delta
----------------------------------------------
  1,048,576           1,032,192         16,384
  1,048,592           1,048,576             16
134,217,728         132,120,576      2,097,152
134,217,744         134,217,728             16

Any ideas? :)

Thanks
kim

Fix docs URL

URL has changed to: http://www.gii.upv.es/tlsf/main/docs.html

How can I get free heap size?

First，it,s celebratory that tlsf work well in SDRAM with STM32H750.

But how can I know the free size just like esp_get_free_heap_size();
I try to log info after malloc, but the value not change.Or I need to count that in my way?
log_d("tlsf size:%d", tlsf_size()); log_d("tlsf align size:%d", tlsf_align_size()); log_d("tlsf block size_min:%d", tlsf_block_size_min()); log_d("tlsf block size max:%d", tlsf_block_size_max());

No license in tlfs.c

We are considering using tlsf.c in the RTEMS kernel however an issue that has been raised is the lack of license details in the source file. Having a license means the source clearly identifies its origin when in the RTEMS source code. A version with this add would be a big help. Can we add a BSD-2 license with a copyright of:

Thanks
Chris

question on maximum block size

hello matt,

the maximum block size for 32bit is calculated like this:

FL_INDEX_MAX = 30,
[...]
static const size_t block_size_max = 1 << FL_INDEX_MAX;

which evaluates to 1gb maximum size that can be allocated..
but in theory, 32bit would allow up to 4 gb (1 << 32)

Why is this limitation, and would it be safe to increase this for example to 1 << 31 for 32bit application so that block_size_max is then 2gb?