
string_theory's Introduction

String Theory


Introduction

String Theory is a flexible modern C++ library for string manipulation and storage. It stores data internally as UTF-8, for ease of use with existing C/C++ APIs. It can also handle conversion to and from UTF-16, UTF-32, and Latin-1, and has a variety of methods to simplify text manipulation.

In addition, String Theory includes a powerful and fast type-safe string formatter (ST::format), which can be extended with custom type formatters by end-user code.
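The key idea behind a type-safe formatter is that C++11 variadic templates let each argument keep its static type, so the formatter never has to trust a printf-style conversion character. The sketch below is a minimal illustration of that technique under stated assumptions. It is not ST::format's implementation: the `sketch::format` name is hypothetical, only the bare `{}` placeholder is supported, and user types are made formattable through `operator<<` rather than through String Theory's own custom-formatter mechanism.

```cpp
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

namespace sketch {

// Base case: every argument has been consumed. Copy the remaining text,
// but reject a leftover "{}" placeholder instead of silently printing it.
inline void format_into(std::ostringstream& out, const char* fmt) {
    for (; *fmt; ++fmt) {
        if (fmt[0] == '{' && fmt[1] == '}')
            throw std::invalid_argument("format: too few arguments");
        out << *fmt;
    }
}

// Recursive case: copy text up to the next "{}", write the first argument
// using the operator<< selected by its static type, then recurse on the rest.
template <typename T, typename... Rest>
void format_into(std::ostringstream& out, const char* fmt,
                 const T& value, const Rest&... rest) {
    for (; *fmt; ++fmt) {
        if (fmt[0] == '{' && fmt[1] == '}') {
            out << value;
            format_into(out, fmt + 2, rest...);
            return;
        }
        out << *fmt;
    }
    throw std::invalid_argument("format: too many arguments");
}

// Entry point: formats all arguments into a new std::string.
template <typename... Args>
std::string format(const char* fmt, const Args&... args) {
    std::ostringstream out;
    format_into(out, fmt, args...);
    return out.str();
}

}  // namespace sketch

// Hypothetical user type: providing operator<< makes it formattable here,
// standing in for the custom formatters that ST::format supports.
struct point { int x, y; };

inline std::ostream& operator<<(std::ostream& os, const point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}
```

For example, `sketch::format("{} + {} = {}", 1, 2, 3)` yields `"1 + 2 = 3"`, and a mismatched argument count throws instead of reading garbage off the stack the way a mismatched `sprintf` call can.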

You can find the full documentation online at https://github.com/zrax/string_theory/wiki.

Why another string library?

String Theory was originally developed to replace the half-dozen or so string types and string manipulation mini-libraries in the Plasma game engine. Because of the state of the code, it was designed primarily to reduce coding errors, provide an easy to use set of manipulation functionality with minimal surprises, handle Unicode text without a lot of overhead, and have reasonable performance. Many existing string libraries provide some subset of those features, but were hard to integrate well with Plasma, or didn't meet all of our needs. Therefore, plString (and later plFormat) were born. After it had matured a while, it seemed that other projects could benefit from the string library, and so it was ported out into its own stand-alone library, which is String Theory.

String Theory's features

String Theory is designed to provide:

  • Minimal surprises. Strings are immutable objects, so you never have to wonder whether your .replace() will create a copy or modify the original: it always returns a copy, even if the new string is identical.
  • UTF-8 by default. You don't have to remember what encoding your string data came in as; by the time ST::string is constructed, its data is assumed to already be in the UTF-8 encoding. This also allows easy re-use by other character-based APIs, since you don't have to first down-convert the string data from UTF-16 or UTF-32 in order to use it.
  • Easy conversion to Unicode formats. String Theory provides conversion between UTF-8, UTF-16, UTF-32 and Latin-1. In addition, it can check raw character input for validity with several mechanisms (throwing C++ exceptions, replacing invalid characters, or simply ignoring them).
  • Type-safe formatting. sprintf and friends are notoriously unsafe, and are one of the most common sources of bugs in string code. ST::format uses C++11's variadic templates to provide a type-safe way to format strings. String Theory also provides a mechanism to create custom formatters for end-user code, in order to extend ST::format's capabilities.
  • Good performance. String Theory is optimized to be reasonably fast on a variety of compilers and systems. For ST::string, this ends up being slightly slower than C++'s std::string due to the extra encoding work. However, in my tests ST::string_stream tends to be faster than or at least on par with std::stringstream, and ST::format is in the same order of magnitude as an equivalent snprintf.
  • Reentrance. Another side-effect of immutable strings is that ST::string is a fully reentrant string object with no locking necessary.
  • Cross Platform. String Theory is supported on any platform that provides a reasonably modern C++ compiler. Additional features from newer compilers are detected and enabled when supported, but not required.
  • Minimal dependencies. Currently, String Theory has no run-time dependencies aside from the C/C++ standard libraries and runtime. Additional tools may however be necessary for building String Theory or its tests.
  • Well tested. String Theory comes with an extensive suite of unit tests to ensure it works as designed.
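To make the three validity-handling modes concrete, the sketch below decodes UTF-8 into UTF-32 under a hypothetical `invalid_utf8` policy enum. It is an illustration, not String Theory's decoder; the real library takes an `ST::utf_validation_t` parameter (it appears in the `ST::string::_set_utf8` frame of the AddressSanitizer trace further down this page), whose enumerators are not reproduced here. The "pass through" behavior, copying the offending byte's value as a code point, is an assumption chosen for the example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical policy names mirroring "exceptions, replacement, or ignore".
enum class invalid_utf8 { throw_exception, substitute, pass_through };

inline std::u32string decode_utf8(const std::string& in, invalid_utf8 policy) {
    // Smallest code point allowed for each sequence length (rejects overlongs).
    static const char32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else                            { len = 0; cp = 0; }  // bad lead byte

        // The whole sequence must fit and every trailing byte must be 10xxxxxx.
        bool ok = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates and values beyond U+10FFFF.
        if (ok && (cp < min_for_len[len] || cp > 0x10FFFF ||
                   (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;

        if (ok) {
            out.push_back(cp);
            i += len;
            continue;
        }
        switch (policy) {
        case invalid_utf8::throw_exception:
            throw std::runtime_error("invalid UTF-8 at byte " + std::to_string(i));
        case invalid_utf8::substitute:
            out.push_back(U'\uFFFD');  // U+FFFD REPLACEMENT CHARACTER
            ++i;                       // resynchronize on the next byte
            break;
        case invalid_utf8::pass_through:
            out.push_back(static_cast<char32_t>(lead));  // keep the raw byte value
            ++i;
            break;
        }
    }
    return out;
}
```

Substituting one replacement character per bad byte is the simplest resynchronization rule; other libraries group a malformed sequence into a single U+FFFD instead.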

What String Theory is NOT

  • A full Unicode library. If you need more Unicode support than just basic UTF data conversion, you probably want to use something like ICU instead.
  • A faster version of std::string. String Theory was never designed to be faster than STL, and because of its design goal to always use UTF-8 data internally, it may be slower for some use cases. However, practical tests have shown that ST::string performs at least on par with STL in many use cases, and ST::format is usually significantly faster than many other type-safe alternatives such as boost::format.
  • A regular expression library. C++11 provides a regex library which should be usable with ST::string, and I don't have a compelling reason at this point to introduce another regular expression library to String Theory.
  • A library for working with theoretical physics. Just in case you got this far and were still uncertain :).
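As a sketch of pairing C++11 `<regex>` with UTF-8 text, the function below scans a NUL-terminated UTF-8 buffer for `$TOKEN$`-style placeholders. With an ST::string you would presumably hand over its `c_str()` data; that pairing, and the `find_placeholders` name, are assumptions for illustration. Note that `std::regex` matches bytes, not code points, so a `.` consumes one byte of a multi-byte UTF-8 character; patterns that only use ASCII, like this one, are unaffected.

```cpp
#include <cstring>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every "$UPPERCASE$" placeholder in a UTF-8 buffer.
inline std::vector<std::string> find_placeholders(const char* utf8) {
    static const std::regex token(R"(\$[A-Z]+\$)");
    std::vector<std::string> found;
    const char* end = utf8 + std::strlen(utf8);
    // cregex_iterator walks all non-overlapping matches in [utf8, end).
    for (std::cregex_iterator it(utf8, end, token), last; it != last; ++it)
        found.push_back(it->str());
    return found;
}
```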

Platform Support

string_theory supports a variety of platforms and compilers. As of March 2023, string_theory is tested and working on:

  • GCC 12 (Arch Linux x86_64 and ARMv7)
  • GCC 11 (Ubuntu 22.04 x86_64)
  • GCC 9 (Ubuntu 20.04 x86_64)
  • GCC 7 (Ubuntu 18.04 x86_64)
  • Clang 15 (Arch Linux x86_64 and ARMv7)
  • Clang 14 (Ubuntu 22.04 x86_64)
  • Clang 10 (Ubuntu 20.04 x86_64)
  • Clang 6 (Ubuntu 18.04 x86_64)
  • AppleClang 14.0 (macOS Monterey)
  • AppleClang 13.1 (macOS Monterey)
  • AppleClang 12.0 (macOS Big Sur)
  • MSVC 2022 (x64 and x86)
  • MSVC 2019 (x64 and x86)
  • MSVC 2017 (x64 and x86)
  • MinGW-w64 GCC 12 (x86_64 and i686)
  • MinGW-w64 GCC 8 (x86_64)

As of string_theory 3.0, support for some older compilers has been dropped. You'll need a compiler that supports most of C++11.

Contributing to String Theory

String Theory is Open Source software, licensed under the MIT license. Contributions are welcome, and may be submitted as issues and/or pull requests on GitHub: http://github.com/zrax/string_theory.

Some ideas for areas to contribute:

string_theory's People

Contributors

dgelessus, dpogue, hoikas, zrax


string_theory's Issues

Cmake Overrides Directory

On Windows, when using string_theory as a dependency, its scripts override any string_theory directory specified in the CMake GUI. The overridden value is the local build directory. This seems to be a major blocker to using string_theory in CWE master.

MSVC++2017 filesystem

S_T does not correctly determine the presence of std::filesystem when compiling with VC++2017. Attempting to define ST_HAVE_CXX17_FILESYSTEM causes the build to fail because S_T is not compiled with C++17 features enabled. Defining ST_HAVE_EXPERIMENTAL_FILESYSTEM allows the build to proceed with support for std::experimental::filesystem, however.

Adding the following lines to CMakeLists.txt allows one to manually define ST_HAVE_CXX17_FILESYSTEM and build with std::filesystem support. However, the autodetection incorrectly undefs that and uses ST_HAVE_EXPERIMENTAL_FILESYSTEM instead.

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

Visual Studio 2022: warning C4146 in st_string_priv.h

Including string_theory\st_string.h generates the following warning (with compiler option /W3 at least):

string_theory\st_string_priv.h(157,40): warning C4146: unary minus operator applied to unsigned type, result still unsigned

Moving the unary minus (-) inside the static_cast should fix this issue.
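The issue does not quote the expression at line 157, so the helper below is hypothetical; it only illustrates the warning. C4146 fires when unary minus is applied to an unsigned operand, as in `-static_cast<U>(v)`. Moving the minus inside the cast silences it, but if the operand is `int` or a wider signed type, negating its minimum value overflows. This sketch instead subtracts from zero in the unsigned domain, which is well-defined modular arithmetic and involves no unary minus at all.

```cpp
#include <type_traits>

// Hypothetical helper: magnitude of a signed integer as its unsigned counterpart.
template <typename T>
typename std::make_unsigned<T>::type unsigned_magnitude(T value) {
    static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                  "signed integer types only");
    using U = typename std::make_unsigned<T>::type;
    // -static_cast<U>(value) would trigger C4146; static_cast<U>(-value)
    // overflows for the minimum int/long/long long. U(0) - U(value) avoids both.
    return value < 0 ? static_cast<U>(U(0) - static_cast<U>(value))
                     : static_cast<U>(value);
}
```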

Hex Format Prefix Inconsistent

When using the {#x} format specifier, e.g. ST::printf(fp, "\t{#x},", *codes++);, the 0x prefix is not applied if the value passed in is 0. Other values correctly receive the prefix.
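The issue does not say where this behavior comes from, but for comparison, C's printf family behaves the same way by specification: with the `#` flag, only a nonzero result of an `x` conversion gets the `0x` prefix. The sketch below (helper names are hypothetical) shows that standard behavior next to a variant that always emits the prefix by writing it literally.

```cpp
#include <cstdio>
#include <string>

// Standard C behavior: "%#x" prefixes 0x only for nonzero values.
inline std::string c_alternate_hex(unsigned value) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%#x", value);
    return buf;
}

// Always-prefixed variant: write "0x" literally, then the plain hex digits.
inline std::string prefixed_hex(unsigned value) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "0x%x", value);
    return buf;
}
```

So `c_alternate_hex(0)` gives `"0"` while `prefixed_hex(0)` gives `"0x0"`; whether {#x} should follow C or always prefix is the design question this issue raises.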

Heap buffer overflow when using ST::string::replace

Testcase:

#include <string_theory/string>

const ST::string sAmount = "$AMOUN$";
ST::string amount = "$8,565";

ST::string no_heap_buffer_overflow()
{
    // This line does not produce an address violation:
    return (ST::string("$MERCNAME$")).replace(sAmount, amount);
}

ST::string heap_buffer_overflow()
{
    // But this line does:
    return (ST::string("Insurance Claim: $MERCNAME$")).replace(sAmount, amount);
}

int main()
{
    no_heap_buffer_overflow();
    heap_buffer_overflow();
}

==55732==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000005c at pc 0x000000270606 bp 0x7ffcf5d0fd00 sp 0x7ffcf5d0f4a8
READ of size 7 at 0x60300000005c thread T0
#0 0x270605 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long) (/tmp/test+0x270605) (BuildId: 483272504e84bd37)
#1 0x270b2d in memcmp (/tmp/test+0x270b2d) (BuildId: 483272504e84bd37)
#2 0x3308bb in std::char_traits<char>::compare(char const*, char const*, unsigned long) /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/char_traits.h:399:9
#3 0x3307c4 in _ST_PRIVATE::compare_cs(char const*, char const*, unsigned long) /usr/local/include/string_theory/st_string_priv.h:50:16
#4 0x330103 in _ST_PRIVATE::find_cs(char const*, unsigned long, char const*, unsigned long) /usr/local/include/string_theory/st_string_priv.h:127:17
#5 0x32d840 in ST::string::replace(ST::string const&, ST::string const&, ST::case_sensitivity_t) const /usr/local/include/string_theory/st_string.h:2286:31
#6 0x32d198 in heap_buffer_overflow() /tmp/test.cpp:15:55
#7 0x32d399 in main /tmp/test.cpp:21:5
#8 0x7f78d9f44b49 in __libc_start_call_main (/lib64/libc.so.6+0x27b49) (BuildId: 245240a31888ad5c11bbc55b18e02d87388f59a9)
#9 0x7f78d9f44c0a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x27c0a) (BuildId: 245240a31888ad5c11bbc55b18e02d87388f59a9)
#10 0x2541b4 in _start (/tmp/test+0x2541b4) (BuildId: 483272504e84bd37)

0x60300000005c is located 0 bytes after 28-byte region [0x603000000040,0x60300000005c)
allocated by thread T0 here:
#0 0x32a911 in operator new[](unsigned long) (/tmp/test+0x32a911) (BuildId: 483272504e84bd37)
#1 0x32e584 in ST::buffer::buffer(char const*, unsigned long) /usr/local/include/string_theory/st_charbuffer.h:122:37
#2 0x32df57 in ST::string::_set_utf8(char const*, unsigned long, ST::utf_validation_t) /usr/local/include/string_theory/st_string.h:101:17
#3 0x32d59e in ST::string::string(char const*, unsigned long, ST::utf_validation_t) /usr/local/include/string_theory/st_string.h:165:13
#4 0x32d17e in heap_buffer_overflow() /tmp/test.cpp:15:12
#5 0x32d399 in main /tmp/test.cpp:21:5
#6 0x7f78d9f44b49 in __libc_start_call_main (/lib64/libc.so.6+0x27b49) (BuildId: 245240a31888ad5c11bbc55b18e02d87388f59a9)
#7 0x7f78d9f44c0a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x27c0a) (BuildId: 245240a31888ad5c11bbc55b18e02d87388f59a9)
#8 0x2541b4 in _start (/tmp/test+0x2541b4) (BuildId: 483272504e84bd37)

Compiled with clang++ test.cpp -fsanitize=address -Wall -g3 -o test
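The trace shows a 7-byte comparison (the length of "$AMOUN$") inside find_cs that runs past the end of a 28-byte allocation, which is consistent with the buffer for "Insurance Claim: $MERCNAME$" (27 characters plus the terminator). The usual invariant for a naive search is to try only start positions that leave at least needle-length bytes. The function below is a hypothetical, bounds-checked sketch of that invariant, not the library's fix.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical case-sensitive search: byte offset of the first match, or -1.
inline std::ptrdiff_t find_cs_checked(const char* haystack, std::size_t haystack_len,
                                      const char* needle, std::size_t needle_len) {
    if (needle_len == 0)
        return 0;
    if (needle_len > haystack_len)
        return -1;  // also keeps the unsigned subtraction below from wrapping
    // Every start position i satisfies i + needle_len <= haystack_len,
    // so memcmp never reads past the end of the haystack.
    const std::size_t last_start = haystack_len - needle_len;
    for (std::size_t i = 0; i <= last_start; ++i) {
        if (haystack[i] == needle[0] &&
            std::memcmp(haystack + i, needle, needle_len) == 0)
            return static_cast<std::ptrdiff_t>(i);
    }
    return -1;
}
```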

MSVC++2015 experimental_filesystem

MSVC++2015 supports std::experimental::filesystem, but this is not correctly detected and enabled by string_theory's CMake scripts. Everything works if I manually #define ST_HAVE_EXPERIMENTAL_FILESYSTEM in st_config.h.

ST::string::to_path() returns wrong type

On gcc-8, calling ST::string::to_path() returns a std::experimental::filesystem::v1::__cxx11::path object. Assigning the result to a variable of type std::filesystem::path causes a compiler error, because that type is an alias for std::filesystem::__cxx11::path.

A newly generated st_config.h reveals this pertinent section:

#if (__cplusplus > 201402L) || (defined(_MSVC_LANG) && (_MSVC_LANG > 201402L))
#define ST_HAVE_CXX17_STRING_VIEW
/* #undef ST_HAVE_EXPERIMENTAL_STRING_VIEW */
/* #undef ST_HAVE_CXX17_FILESYSTEM */
#define ST_HAVE_EXPERIMENTAL_FILESYSTEM
#endif
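The excerpt above keys feature selection off the language level (`__cplusplus`, or `_MSVC_LANG` on MSVC), which is why a library compiled in C++14 mode can only pick the experimental filesystem even when its clients use C++17. The preprocessor sketch below mirrors that logic with `__has_include`; the `SKETCH_*` macro names and `filesystem_flavor` are hypothetical, not string_theory's configuration. Header presence alone is not a guarantee: older GCC releases also needed an extra link library (`-lstdc++fs`) for these headers.

```cpp
#include <string>

// MSVC reports the real language level in _MSVC_LANG (see the excerpt above).
#if defined(_MSVC_LANG)
#  define SKETCH_CXX_LEVEL _MSVC_LANG
#else
#  define SKETCH_CXX_LEVEL __cplusplus
#endif

#if defined(__has_include)
// std::filesystem is only usable when this translation unit is C++17 or newer.
#  if SKETCH_CXX_LEVEL >= 201703L
#    if __has_include(<filesystem>)
#      define SKETCH_HAVE_CXX17_FILESYSTEM 1
#    endif
#  endif
// Otherwise fall back to the Filesystem TS header, if one is shipped.
#  if !defined(SKETCH_HAVE_CXX17_FILESYSTEM)
#    if __has_include(<experimental/filesystem>)
#      define SKETCH_HAVE_EXPERIMENTAL_FILESYSTEM 1
#    endif
#  endif
#endif

// Reports which flavor this translation unit would select.
inline std::string filesystem_flavor() {
#if defined(SKETCH_HAVE_CXX17_FILESYSTEM)
    return "std::filesystem";
#elif defined(SKETCH_HAVE_EXPERIMENTAL_FILESYSTEM)
    return "std::experimental::filesystem";
#else
    return "none";
#endif
}
```

Because the answer depends on the mode each translation unit is compiled in, a library and its client can disagree, which matches the to_path() type mismatch described above.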

ST::format() fails tests on Solaris 10/GCC 5.5

Attempting to build string_theory on Solaris 10 using GCC 5.5.

The first failures are in format.decimal:

[ RUN      ] format.decimal
FanCode/H-uru/string_theory/test/test_format.cpp:173: Failure
      Expected: ST::string::from_literal("" "xx127xx" "", sizeof("xx127xx") - 1)
      Which is: ST::string{"xx127xx"}
To be equal to: ST::format("xx{}xx", std::numeric_limits<int8_t>::max())
      Which is: ST::string{"xxx"}
FanCode/H-uru/string_theory/test/test_format.cpp:174: Failure
      Expected: ST::string::from_literal("" "xx+127xx" "", sizeof("xx+127xx") - 1)
      Which is: ST::string{"xx+127xx"}
To be equal to: ST::format("xx{+}xx", std::numeric_limits<int8_t>::max())
      Which is: ST::string{"xxx"}
FanCode/H-uru/string_theory/test/test_format.cpp:175: Failure
      Expected: ST::string::from_literal("" "xx-128xx" "", sizeof("xx-128xx") - 1)
      Which is: ST::string{"xx-128xx"}
To be equal to: ST::format("xx{}xx", std::numeric_limits<int8_t>::min())
      Which is: ST::string{"xx�xx"}
FanCode/H-uru/string_theory/test/test_format.cpp:176: Failure
      Expected: ST::string::from_literal("" "xx-128xx" "", sizeof("xx-128xx") - 1)
      Which is: ST::string{"xx-128xx"}
To be equal to: ST::format("xx{+}xx", std::numeric_limits<int8_t>::min())
      Which is: ST::string{"xx�xx"}
[  FAILED  ] format.decimal (0 ms)

=========
Other tests failing:
[  FAILED  ] 5 tests, listed below:
[  FAILED  ] format.decimal
[  FAILED  ] format.hex
[  FAILED  ] format.hex_upper
[  FAILED  ] format.octal
[  FAILED  ] format.binary

Will keep this issue updated as debugging progresses. Insights appreciated.
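The output is consistent with the int8_t values being written as characters rather than numbers: 127 is the invisible DEL control character, and -128 becomes the lone byte 0x80, which is not valid UTF-8 and is displayed as �. That would happen if the platform defines int8_t as plain char (Solaris headers may do so) and the formatter has a separate overload for char; iostreams show the same effect, since `std::cout << std::int8_t(65)` prints "A". A common defensive pattern, sketched below with hypothetical names, is to promote 8-bit integers before formatting them as numbers.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// True where int8_t is plain char; on typical glibc systems it is signed char.
constexpr bool int8_is_plain_char = std::is_same<std::int8_t, char>::value;

// Hypothetical helper: always format an integral value as a decimal number.
template <typename T>
std::string to_decimal(T value) {
    static_assert(std::is_integral<T>::value, "integral types only");
    // Unary + promotes char, signed char and int8_t to int, so no
    // character-specific overload can be selected.
    return std::to_string(+value);
}
```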

MSVC++2013 Build Broken

3>D:\string_theory\src\st_string.cpp(284): error C2511: 'ST::string &ST::string::operator +=(const char16_t *)' : overloaded member function not found in 'ST::string'
3>        D:\string_theory\include\st_string.h(115) : see declaration of 'ST::string'
3>D:\string_theory\src\st_string.cpp(285): error C2671: 'ST::string::+=' : static member functions do not have 'this' pointers
3>D:\string_theory\src\st_string.cpp(286): error C2671: 'ST::string::+=' : static member functions do not have 'this' pointers
3>D:\string_theory\src\st_string.cpp(290): error C2511: 'ST::string &ST::string::operator +=(const char32_t *)' : overloaded member function not found in 'ST::string'
3>        D:\string_theory\include\st_string.h(115) : see declaration of 'ST::string'
3>D:\string_theory\src\st_string.cpp(291): error C2671: 'ST::string::+=' : static member functions do not have 'this' pointers
3>D:\string_theory\src\st_string.cpp(292): error C2671: 'ST::string::+=' : static member functions do not have 'this' pointers
3>D:\string_theory\src\st_string.cpp(308): error C2511: 'ST::string &ST::string::operator +=(char16_t)' : overloaded member function not found in 'ST::string'
3>        D:\string_theory\include\st_string.h(115) : see declaration of 'ST::string'
3>D:\string_theory\src\st_string.cpp(309): error C2671: 'ST::string::+=' : static member functions do not have 'this' pointers
3>D:\string_theory\src\st_string.cpp(310): error C2671: 'ST::string::+=' : static member functions do not have 'this' pointers
3>D:\string_theory\src\st_string.cpp(314): error C2511: 'ST::string &ST::string::operator +=(char32_t)' : overloaded member function not found in 'ST::string'
3>        D:\string_theory\include\st_string.h(115) : see declaration of 'ST::string'
3>D:\string_theory\src\st_string.cpp(315): error C2671: 'ST::string::+=' : static member functions do not have 'this' pointers
3>D:\string_theory\src\st_string.cpp(316): error C2671: 'ST::string::+=' : static member functions do not have 'this' pointers
3>D:\string_theory\src\st_string.cpp(1527): error C2668: 'ST::operator +' : ambiguous call to overloaded function
3>        D:\string_theory\include\st_string.h(1162): could be 'ST::string ST::operator +(wchar_t,const ST::string &)'
3>        D:\string_theory\include\st_string.h(1161): or       'ST::string ST::operator +(char,const ST::string &)'
3>        D:\string_theory\include\st_string.h(1154): or       'ST::string ST::operator +(const ST::string &,wchar_t)'
3>        D:\string_theory\include\st_string.h(1153): or       'ST::string ST::operator +(const ST::string &,char)'
3>        D:\string_theory\include\st_string.h(1126): or       'ST::string ST::operator +(const wchar_t *,const ST::string &)'
3>        D:\string_theory\include\st_string.h(1121): or       'ST::string ST::operator +(const ST::string &,const wchar_t *)'
3>        D:\string_theory\include\st_string.h(1119): or       'ST::string ST::operator +(const char *,const ST::string &)'
3>        D:\string_theory\include\st_string.h(1118): or       'ST::string ST::operator +(const ST::string &,const char *)'
3>        D:\string_theory\include\st_string.h(1117): or       'ST::string ST::operator +(const ST::string &,const ST::string &)'
3>        while trying to match the argument list '(const ST::string, char32_t)'
3>D:\string_theory\src\st_string.cpp(1531): error C2244: 'operator +' : unable to match function definition to an existing declaration
3>        D:\string_theory\src\st_string.cpp(1530) : see declaration of 'operator +'
3>D:\string_theory\src\st_string.cpp(1531): fatal error C1903: unable to recover from previous error(s); stopping compilation
3>st_stringstream.cpp
3>Generating Code...
2>C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\Lib.exe /OUT:"D:\string_theory\build\vc++2013\test\gtest-1.8.0\RelWithDebInfo\gtest.lib" /NOLOGO  /machine:X86 "gtest.dir\RelWithDebInfo\gtest-all.obj"
3>Done building project "string_theory.vcxproj" -- FAILED.

Cmake less than 3.5 deprecation

Just something I wanted to make an issue for:

CMake Deprecation Warning at external/string_theory/CMakeLists.txt:21 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.

When using modern versions of CMake, you will see this warning every time you build string_theory, which is quite annoying when using the repo as a git submodule through add_subdirectory.

Release tarballs fail to build due to missing gtest submodule

GitHub tarballs are generated using git-archive, which does not include the contents of git submodules. In our case, that means the gtest folder is empty.

This causes problems if you try to run cmake with the contents of a downloaded tarball because string_theory defaults to trying to build tests, and fails when it can't find gtest.

Our options appear to be going back to vendoring gtest in the repo (rather than as a submodule) or adding some detection of the gtest folder before trying to enable tests.

RelWithDebInfo cannot be mixed with Release in string_theory on MSVC

Mixing Release string_theory and RelWithDebInfo clients or vice versa yields errors in MSVC14:

  • mismatch detected for '_ITERATOR_DEBUG_LEVEL': value '2' doesn't match value '0' in <filename.obj> [...]
  • unresolved external symbol __imp__invalid_parameter
  • unresolved external symbol __imp__CrtDbgReportW

Choose the C++ standard

Hi there, I tried out 2.3 and I really like this library.

In my case I need to enforce C++11, but CMakeLists.txt chooses the latest available.
I get around it by building as an external project and replacing the CMakeLists.txt in the patch step, with stuff related to newer standards removed.
Ideally there would be an option that lets me restrict the features to C++11.

Relicense under the MIT license

string_theory was originally ported out of the plString and plFormat classes and associated tests from the H-uru/Plasma project. Since H-uru/Plasma is licensed under the GPLv3, string_theory is considered a derivative work under the terms of the GPL, which means that string_theory currently also exists under the GPLv3 license.

In order to allow greater flexibility in the use of string_theory by other applications and libraries, I would like to relicense string_theory under a more permissive license. The MIT license allows for maximum reuse by being compatible with a large variety of open- and closed-source software and licenses, while still maintaining the copyright and license for derivatives and end users.

This proposal is for the immediate relicensing of string_theory to the MIT license. For more information about the license terms, please see https://en.wikipedia.org/wiki/MIT_License. In order to go ahead with this proposal, we need permission from everyone who has contributed to the included code to relicense their code under the new terms. The complete list includes:

Please respond below with a statement clearly indicating if you will allow your code to be relicensed under the MIT license.
