Giter Club home page Giter Club logo

Comments (4)

sabberworm avatar sabberworm commented on May 3, 2024

Thanks for your interest in the project.

I have no qualms with making private methods protected. Send me a pull request and I’ll merge it.

As for the streaming parser (still not a real streaming parser as it will only work for streams to files – which are rewindable and advancable at will): that would be a wonderful addition and a very welcome change. I see two main obstacles (as the project currently stands):

  • consumeExpression uses inputLeft, which gets the entire rest of the string. We would have to get rid of both these methods. This means changing the handling of unicode escapes to match one character at a time.
  • Some methods (like consume) are too monolithic to be easily overridden by subclasses. These would have to be split into smaller methods first.

On a related note: What this project currently does not do is the parsing of selectors and the like. It even breaks fundamentally on something like body[data-some-token="{"] because it will try to start the declaration block at the next { it sees. To really be “code that could actually parse CSS3”, this project will have to either start parsing selectors or at least adapt consumeUntil to respect the nesting of parens and quotes (except when parsing comments, where it is also currently used).

from php-css-parser.

slusarz avatar slusarz commented on May 3, 2024

Re the streaming parser: you are actually incorrect about the need for the stream to be a file resource. We use PHP temporary streams all over the place. Tremendously useful for things like reading data from an IMAP server. Since mail attachments can, and frequently are, large data chunks - and often base64 encoded - they would normally take 10-20 MB to store in a PHP string and then another 10-20 MB to decode using the built-in PHP base 64 decoding function. By instead feeding the IMAP server output directly into a PHP temporary stream, we guarantee that no more than 2 MB of memory is ever used. And the base64 decoding is done on-the-fly, so there is no need to duplicate data. Thus we can handle 10+MB attachments with only a 2MB hit to memory - useful since we often have 1000's of connections a second on a single machine.

Now for CSS parsing, the odds that someone has a 2MB+ CSS data string is very small so memory savings is not a killer feature in this regard. The flip side is that you are treating a string like a stream currently anyway - doing tokenization character by character - so using streams just makes sense from a coding perspective.

So there's two things that could be done:

  1. Have separate classes for parsing string data and stream data, and use whichever class reflects the input from the consumer. --OR--
  2. Maintain the single parsing class, and rewrite it so that the parsing methods handle stream data. If string data is passed into the constructor, just convert it to a stream internally ($fp = fopen('php://temp', 'r+'); fwrite($fp, $data); rewind($fp);) This seems to be the correct way of doing things because it doesn't require a rewrite of Parser (from a consumer's perspective), you don't have to change the definitions of any of the internal methods, and there is less code to maintain.

However, the problem with 2 is that, by allowing non-ASCII input, you do have to write your stream to correctly parse multibyte characters. This is annoying. (Aside: we do string tokenization for IMAP data, but the nice thing about IMAP data is that it is ASCII only so I don't have to worry about this). It shouldn't be that hard to find a stream class/code example that handles UTF-8 stream data. But this becomes progressively more difficult if you want to support other multibyte charsets.

From our project's perspective, we don't need any of the multibyte stuff (or, at the most, anything outside of UTF-8) since we require UTF-8 for everything and don't use non-ASCII stuff in any of our CSS files. So if this is going to be too complicated to add upstream, I'll stick with the conversion of private -> protected and do what we need in our code. However, since you have shown interest in having stream handling code here, it is worth my time to work with you and see if we can come up with something.

from php-css-parser.

sabberworm avatar sabberworm commented on May 3, 2024

There are different definitions of what a streaming parser is but I was thinking of a stream that possibly may not yet be read till the very end. Such parsers may stop at any given moment and wait for more data to become available. They can either do this using asynchronous I/O or on a thread by waiting. Since neither is possible with PHP, we can’t do this kind of streaming parser.

But of course you’re right: streams whose data is available at any given time are not limited to point to files.

I think if we’re going to implement the changes necessary to deal with a stream instead of a string, we might as well do it in the base class.

from php-css-parser.

unclecheese avatar unclecheese commented on May 3, 2024

I've submitted a PR for the private visibility. That was bringing me down, too!

from php-css-parser.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.