fast-query-parsers's Introduction

Fast Query Parsers

This library provides ultra-fast, Rust-based parsers for query strings and url-encoded form data. Litestar uses these parsers, but they are developed independently and can be used on their own.

Installation

pip install fast-query-parsers

Usage

The library exposes two functions: parse_query_string and parse_url_encoded_dict.

parse_query_string

This function is used to parse a query string into a list of key/value tuples.

from fast_query_parsers import parse_query_string

result = parse_query_string(b"value=1&value=2&type=dollar&country=US", "&")
# [("value", "1"), ("value", "2"), ("type", "dollar"), ("country", "US")]

The first argument is a byte string containing the query string to parse; the second is the separator used between key/value pairs.
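To illustrate what the separator argument means, here is a minimal sketch using the standard library's parse_qsl, which is the baseline the benchmarks compare against. It uses only the stdlib; that parse_query_string treats a ";" separator the same way is an assumption, not something this sketch verifies.

```python
from urllib.parse import parse_qsl

# Pairs separated by "&", as in the example above.
ampersand_result = parse_qsl("value=1&value=2&type=dollar&country=US", separator="&")
print(ampersand_result)
# [('value', '1'), ('value', '2'), ('type', 'dollar'), ('country', 'US')]

# The same idea with ";" as the separator between key/value pairs.
semicolon_result = parse_qsl("a=1;b=2", separator=";")
print(semicolon_result)
# [('a', '1'), ('b', '2')]
```

Note that the stdlib call here receives a str, whereas parse_query_string takes a byte string as its first argument.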

Benchmarks

Query string parsing is about 3 times faster than the standard library for a plain query string, and more than 5 times faster for a urlencoded one:

stdlib parse_qsl parsing query string: Mean +- std dev: 2.86 us +- 0.03 us
.....................
parse_query_string parsing query string: Mean +- std dev: 916 ns +- 13 ns
.....................
stdlib parse_qsl parsing urlencoded query string: Mean +- std dev: 8.30 us +- 0.10 us
.....................
parse_query_string urlencoded query string: Mean +- std dev: 1.50 us +- 0.03 us
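The figures above come from pyperf. As a rough way to get comparable numbers without it, the sketch below uses the standard library's timeit instead; it is a swapped-in technique, so absolute values will differ from the pyperf output. The helper name time_per_call and the call count are illustrative choices, and the fast_query_parsers import is guarded so the script still reports the stdlib timing when the library is not installed.

```python
import timeit
from urllib.parse import parse_qsl

QUERY = "value=1&value=2&type=dollar&country=US"


def time_per_call(func, number=100_000):
    """Hypothetical helper: mean time per call in microseconds."""
    total_seconds = timeit.timeit(func, number=number)
    return total_seconds / number * 1e6


timings = {
    "stdlib parse_qsl": time_per_call(lambda: parse_qsl(QUERY, separator="&")),
}

try:
    from fast_query_parsers import parse_query_string
except ImportError:
    # Library not installed: only the stdlib timing is reported.
    parse_query_string = None

if parse_query_string is not None:
    encoded = QUERY.encode()
    timings["parse_query_string"] = time_per_call(
        lambda: parse_query_string(encoded, "&")
    )

for name, micros in timings.items():
    print(f"{name}: {micros:.2f} us per call")
```

For publishable numbers, pyperf remains the better tool because it controls for warm-up and run-to-run variance.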

parse_url_encoded_dict

This function parses url-encoded form data into a dictionary, converting the values into the Python equivalents of JSON types.

from urllib.parse import urlencode

from fast_query_parsers import parse_url_encoded_dict

encoded = urlencode(
    [
        ("value", "10"),
        ("value", "12"),
        ("veggies", '["tomato", "potato", "aubergine"]'),
        ("nested", '{"some_key": "some_value"}'),
        ("calories", "122.53"),
        ("healthy", "true"),
        ("polluting", "false"),
        ("json", "null"),
    ]
).encode()

result = parse_url_encoded_dict(encoded, parse_numbers=True)

# result == {
#     "value": [10, 12],
#     "veggies": ["tomato", "potato", "aubergine"],
#     "nested": {"some_key": "some_value"},
#     "calories": 122.53,
#     "healthy": True,
#     "polluting": False,
#     "json": None,
# }

Unlike the standard library function parse_qs, this function handles type conversions correctly. Additionally, it does not nest all values inside lists.
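For contrast, this is what the standard library's parse_qs returns for a reduced version of the same input. The sketch uses only the stdlib: every value stays a string, and every value is wrapped in a list, even when the key appears only once.

```python
from urllib.parse import parse_qs, urlencode

encoded = urlencode(
    [
        ("value", "10"),
        ("value", "12"),
        ("healthy", "true"),
        ("json", "null"),
    ]
)

stdlib_result = parse_qs(encoded)
print(stdlib_result)
# {'value': ['10', '12'], 'healthy': ['true'], 'json': ['null']}
```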

Note: the second argument passed to parse_url_encoded_dict, parse_numbers, controls whether numbers are parsed. If True, numeric values are parsed into an int or float as appropriate; otherwise they are kept as strings. The default is True.

Benchmarks

Url-encoded parsing is more than 2 times faster than the standard library, without accounting for the parsing of values:

stdlib parse_qs parsing url-encoded values into dict: Mean +- std dev: 8.99 us +- 0.09 us
.....................
parse_url_encoded_dict parse url-encoded values into dict: Mean +- std dev: 3.77 us +- 0.08 us

To actually mimic the parsing done by parse_url_encoded_dict we will need a utility along these lines:

from collections import defaultdict
from contextlib import suppress
from json import loads, JSONDecodeError
from typing import Any, DefaultDict, Dict, List
from urllib.parse import parse_qsl


def parse_url_encoded_form_data(encoded_data: bytes) -> Dict[str, Any]:
    """Parse an url encoded form data into dict of parsed values"""
    decoded_dict: DefaultDict[str, List[Any]] = defaultdict(list)
    for k, v in parse_qsl(encoded_data.decode(), keep_blank_values=True):
        with suppress(JSONDecodeError):
            v = loads(v) if isinstance(v, str) else v
        decoded_dict[k].append(v)
    return {k: v if len(v) > 1 else v[0] for k, v in decoded_dict.items()}

With the above, the benchmarks look like this:

python parse_url_encoded_form_data parsing url-encoded values into dict: Mean +- std dev: 19.7 us +- 0.1 us
.....................
parse_url_encoded_dict parsing url-encoded values into dict: Mean +- std dev: 3.69 us +- 0.03 us
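The per-value step of the pure-Python utility above can be isolated to show how the fallback works: a value is JSON-decoded when possible and left as the raw string otherwise. This is a minimal sketch of that one step; the helper name decode_value is hypothetical and not part of the library.

```python
from contextlib import suppress
from json import JSONDecodeError, loads
from typing import Any


def decode_value(raw: str) -> Any:
    """Hypothetical helper: JSON-decode a value, falling back to the raw string."""
    with suppress(JSONDecodeError):
        return loads(raw)
    # Reached only when loads() raised JSONDecodeError.
    return raw


print(decode_value("122.53"))   # 122.53
print(decode_value("true"))     # True
print(decode_value("null"))     # None
print(decode_value("tomato"))   # tomato (not valid JSON, kept as a string)
```

Because every value goes through a JSON decoding attempt in Python, this step accounts for much of the extra time in the benchmark above compared with a bare parse_qs call.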

Contributing

All contributions are of course welcome!

Repository Setup

  1. Run cargo install to set up the Rust dependencies and poetry install to set up the Python dependencies.
  2. Install the pre-commit hooks with pre-commit install (requires pre-commit).

Building

Run poetry run maturin develop --release --strip to install a release wheel (without debugging info). This wheel can be used in tests and benchmarks.

Benchmarking

There are basic benchmarks in place using pyperf. To run them, execute poetry run python benchmarks.py.

fast-query-parsers's People

Contributors

alti3, baseplate-admin, bollwyvl, dependabot[bot], goldziher, jacobcoffee, provinzkraut

Forkers

dmvinson bollwyvl

fast-query-parsers's Issues

Cargo.lock is outdated

Hi, when trying to package starlite with nix I ran into issues building this dependency. The cargo metadata command that runs as part of the maturin build process fails because --frozen is passed, but the lock file is outdated. Running cargo metadata in a fresh clone of this repo updates the lock file too:

diff --git a/Cargo.lock b/Cargo.lock
index 633867c..318953f 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -31,7 +31,7 @@ checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd"

 [[package]]
 name = "fast_query_parsers"
-version = "0.2.0"
+version = "0.3.0"
 dependencies = [
  "lazy_static",
  "pyo3",

Could a new version of the lockfile be merged in order to fix this issue? For now I'm manually patching it during builds.
