htmlkit's People

Contributors

iabudiab, pendowski, tali, vladvlasov256

htmlkit's Issues

some tags in XHTML are not closed after being parsed

When I parse an XHTML document, I find that the tags in the <head> are not being closed:

<head>
 <title>page</title>
 <meta charset="utf-8"></meta>
<link href="css/template.css" rel="stylesheet" type="text/css"></link>
</head>

the innerHTML returns:

<head>
 <title>page</title>
 <meta charset="utf-8">
<link href="css/template.css" rel="stylesheet" type="text/css">
</head>

and for example if I query [document querySelector:@"link"].outerHTML I get
<link href="css/template.css" rel="stylesheet" type="text/css">

[document querySelector:@"meta"].innerHTML
<object returned empty description>

[document querySelector:@"meta"].outerHTML
<meta charset="utf-8">

Title is correct, though.
My XHTML is no longer valid and the WebView fails to parse it. Is there a way to avoid this loss of information?
thanks!

nextSiblingElement returns itself for any HTMLElement

HTMLNode.m, line 128

- (HTMLElement *)nextSiblingElement
{
	HTMLNode *node = self.previousSibling;
	while (node && node.nodeType != HTMLNodeElement) {
		node = node.nextSibling;
	}
	return node.asElement;
}

Because the iteration starts at the previous sibling, the "next sibling element" it finds is the element itself. Possibly a typo.
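A minimal sketch of the intended traversal, assuming the fix is simply to start from nextSibling instead of previousSibling. The Node class and the MakeNode and NextSiblingElement helpers are hypothetical stand-ins written for this example, not HTMLKit types:

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for HTMLNode: just enough state to show the traversal.
@interface Node : NSObject
@property (nonatomic, copy) NSString *name;
@property (nonatomic) BOOL isElement;
@property (nonatomic, strong) Node *nextSibling;
@end

@implementation Node
@end

static Node *MakeNode(NSString *name, BOOL isElement) {
    Node *node = [Node new];
    node.name = name;
    node.isElement = isElement;
    return node;
}

// Start from the NEXT sibling and skip forward over non-element nodes.
static Node *NextSiblingElement(Node *node) {
    Node *candidate = node.nextSibling;
    while (candidate != nil && !candidate.isElement) {
        candidate = candidate.nextSibling;
    }
    return candidate;
}

int main(void) {
    @autoreleasepool {
        // Sibling chain: <p>, a text node, <h1>
        Node *first = MakeNode(@"p", YES);
        Node *text = MakeNode(@"#text", NO);
        Node *second = MakeNode(@"h1", YES);
        first.nextSibling = text;
        text.nextSibling = second;

        // The text node is skipped, so the element after <p> is <h1>.
        NSLog(@"%@", NextSiblingElement(first).name);
        // <h1> is the last sibling, so there is no next element (nil).
        NSLog(@"%@", NextSiblingElement(second).name);
    }
    return 0;
}
```

Starting from the node's own nextSibling means the node can never be returned as its own sibling, whatever kind of node precedes it.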

Large memory consumption (spikes) by HTMLParser

If you load a lot of HTML into HTMLParser, parsing uses upwards of 500-600MB at peak. More autorelease pools need to be added. These are the worst offenders I found:

- (void)HTMLInsertionModeInBody:(HTMLToken *)token

- (NSString *)innerHTML

HTMLTokenizer nextObject

Also, with a large number of nodes, always creating an NSMutableOrderedSet is not ideal; that alone uses about 40-50MB.

HTMLKit erroneous HTMLText serialization and entity decoding

Let's say you have the following html:

<html>
  <body>
    <div> &lt;example&gt; </div>
  </body>
</html>

Then on:

HTMLDocument *doc = [HTMLDocument documentWithString:html];

and printing out the HTML via doc.rootElement.outerHTML, you get partially decoded entities:

<html><head></head><body> <div>&lt;example></div> </body></html>

where &lt; is left as is, and &gt; is decoded.

Not sure what the correct thing to do here is.

Best way to loop through the HTML contents?

I've created a small testing app with HTMLKit inside it. I want to present a UITableView with all the HTML in it. I load the HTML as follows:

NSString *htmlString = @"<div><div><p>Test!</p></div><p>Test!</p><h1>HTMLKit</h1><p>Hello there!</p></div>";
        
 // Via parser
HTMLParser *parser = [[HTMLParser alloc] initWithString:htmlString];
HTMLDocument *document = [parser parseDocument];
HTMLElement *head = document.body;
[self.items addObject:[self addElement:head]];
HTMLElement *body = document.body;
[self.items addObject:[self addElement:body]];

And these are the addElement related functions:

- (Entry *)addElement:(HTMLElement *)element {
    Entry *entry = [[Entry alloc] init];
    
    if (element.childElementsCount > 0) {
        entry.tags = [self enumerate:element];
    }
    entry.tag = element.outerHTML;
    
    return entry;
}

- (NSArray *)enumerate:(HTMLElement *)element {
    NSMutableArray *items = [@[] mutableCopy];
    for (int i = 0; i < element.childElementsCount; i++) {
        HTMLElement *child = [element childElementAtIndex:i];
        
        [items addObject:[self addElement:child]];
    }
    
    return items;
}

This produces the result shown in the screenshot below:
[Screenshot: screen shot 2017-05-15 at 15 06 58]

Ideally, I'd like for <body> to not contain the <div><div>..etc. What's the best way to achieve this with HTMLKit?

Custom self closing tags are not detected properly.

Hi,

I have an issue with HTMLKit. When I parse HTML content that contains a few self-closing custom tags, they are not detected properly.

Here is the sample code:

[Screenshot 2020-05-11 at 11 24 22 AM]

NSArray *images = [self.document querySelectorAll:@"MyTag"];

Please see the resulting array in the image below:

[Screenshot 2020-05-11 at 11 24 27 AM]

HTMLKit uses a non-public API?

Hi, I have an app that uses HTMLKit. When I submitted the app to the App Store for review, the App Store reported that "querySelectAll: " references a non-public API. Is that really true? If so, is it possible to fix it?

HTMLKit erroneous HTMLElement attribute value serialization

This is basically the same bug as in #16 but with HTMLElement attribute values:

Serializing the following HTML:

<body key="& testing 0x00A0"></body>

would produce:

<body key="&amp; testing 0x00A0"></body>

Whereas it should be:

<body key="&amp; testing &nbsp;"></body>
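For reference, the escaping the expected output implies can be sketched with Foundation alone. This is an illustration of the HTML serialization rule for attribute values (escape "&" first, then U+00A0, then the double quote), not HTMLKit's actual code. EscapeAttributeValue is a hypothetical helper name, and the example assumes the 0x00A0 above stands for a literal non-breaking space character:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of HTMLKit: escapes a string for use inside
// a double-quoted attribute value.
static NSString *EscapeAttributeValue(NSString *value) {
    NSMutableString *escaped = [value mutableCopy];
    NSString *nbsp = [NSString stringWithFormat:@"%C", (unichar)0x00A0];

    // "&" must go first, otherwise the entities added below get escaped again.
    [escaped replaceOccurrencesOfString:@"&" withString:@"&amp;"
                                options:NSLiteralSearch
                                  range:NSMakeRange(0, escaped.length)];
    [escaped replaceOccurrencesOfString:nbsp withString:@"&nbsp;"
                                options:NSLiteralSearch
                                  range:NSMakeRange(0, escaped.length)];
    [escaped replaceOccurrencesOfString:@"\"" withString:@"&quot;"
                                options:NSLiteralSearch
                                  range:NSMakeRange(0, escaped.length)];
    return [escaped copy];
}

int main(void) {
    @autoreleasepool {
        // "& testing " followed by a real U+00A0 character.
        NSString *input = [NSString stringWithFormat:@"& testing %C", (unichar)0x00A0];
        // Logs: <body key="&amp; testing &nbsp;"></body>
        NSLog(@"<body key=\"%@\"></body>", EscapeAttributeValue(input));
    }
    return 0;
}
```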

:gt(0) can't get elements

When I write the code as follows, it works:

CSSSelector * gtSelector(NSInteger index)
{
	NSString *name = [NSString stringWithFormat:@":gt(%ld)", (long)index];
	return namedBlockSelector(name, ^BOOL(HTMLElement * _Nonnull element) {
		NSUInteger elementIndex = [element.parentElement indexOfChildNode:element];

		if (index > 0) {
			return elementIndex > index;
		} else {
			return (elementIndex > index && elementIndex < element.parentElement.childNodesCount);
		}
	});
}

Swift 2.3 - Build Issue

I installed your library via CocoaPods as you suggested, but after I tried to include it in my project through my bridging header, I keep getting a bunch of errors saying that header files such as HTMLElement.h do not exist, even though I can see them in the pod folder. Any idea why?

I'm using Xcode 8 and Swift 2.3.

UPDATE
To make it work (for now), I had to mark each of these "missing" files as "public" in the pod.

Implement HTML escaping for arbitrary string input

This looks like a powerful library for navigating HTML nodes. However, what would be the simplest way to obtain cleaned-up 'plain text' from HTML input? I'd like it to preserve any 'invalid' non-HTML tags such as John Do <[email protected]> and not try to parse them as NSAttributedString's initWithHTML does.

[FR] Add support for innerText

Would be great if we could get support for https://www.w3schools.com/jsref/prop_node_innertext.asp

Example html:
nbsp.zip

Text Content shows nbsp; nbsp; nbsp; ... (the same token repeated dozens of times) on a hidden div.

Inner Text is much cleaner:

Introducing you to:
 JLL IDEAs
 PropTech Challenge

 
Hi,

JLL IDEAs - A PropTech Innovation Challenge in partnership with AGNIi and Startup India is inviting applications for exciting technologies, ideas, products and tools which have a use case in Real Estate in areas like Sustainability, Smart Buildings, Real Estate Valuation, Occupancy and Space Planning, Transport/Urban mobility, asset management etc.

Top 3 Winners -
- Cash awards up to INR 25 Lacs

Top 10 winners -
- Co-working space for 1 year
- Opportunity to get business orders from JLL, our partners and clients
- Access to mentorship from top Real Estate experts

Last Date: Extended till 30th September 2019

Good luck!

Team - Startup India
Apply Now


You received this email because you subscribed to our list. You can unsubscribe at any time.

Retain cycle in HTMLNodeIterator

When you do a simple doc.body, the document gets retained by the iterator in root and referenceNode and the iterator gets retained by the document in attachNodeIterator - hence neither gets deallocated.

- (instancetype)initWithNode:(HTMLNode *)node
				 showOptions:(HTMLNodeFilterShowOptions)showOptions
					  filter:(id<HTMLNodeFilter>)filter
{
	self = [super init];
	if (self) {
		_root = node;     // retains doc
		_filter = filter;     // retains doc
		_whatToShow = showOptions;
		_referenceNode = _root;
		_pointerBeforeReferenceNode	= YES;
		[_root.ownerDocument attachNodeIterator:self]; // doc retains iterator
	}
	return self;
}

A possible fix is to replace the nodeIterator array with an NSHashTable that holds its objects weakly.
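The weak-table idea can be sketched in isolation. This is only an illustration of how a weak NSHashTable behaves, assuming ARC. Plain NSObject instances stand in for HTMLNodeIterator, and LiveIteratorCountAfterRelease is a hypothetical helper name:

```objc
#import <Foundation/Foundation.h>

// Returns how many "iterators" the table still reports after the only strong
// reference to one has gone away. Assumes the file is compiled with ARC.
static NSUInteger LiveIteratorCountAfterRelease(void) {
    // Stand-in for the document's collection of attached iterators.
    NSHashTable<NSObject *> *attachedIterators = [NSHashTable weakObjectsHashTable];

    @autoreleasepool {
        NSObject *iterator = [NSObject new];  // stand-in for HTMLNodeIterator
        [attachedIterators addObject:iterator];
        NSLog(@"attached while in use: %lu",
              (unsigned long)attachedIterators.allObjects.count);
    }

    // The table never retained the iterator, so nothing kept it alive.
    return attachedIterators.allObjects.count;
}

int main(void) {
    @autoreleasepool {
        NSLog(@"attached after release: %lu",
              (unsigned long)LiveIteratorCountAfterRelease());
    }
    return 0;
}
```

With a weak table the document can still enumerate the iterators that are alive, but it no longer keeps them alive itself, which is what breaks the cycle described above.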

Circular references between parent and child cause infinite recursion

Example Stacktrace:

Crashed: com.app.imap.download
0    CoreFoundation                 0x19d7b39f4 __CFStringChangeSizeMultiple + 244
1    CoreFoundation                 0x19d7ae61c __CFStringAppendBytes + 620
2    CoreFoundation                 0x19d79fa78 __CFStringAppendFormatCore + 12648
3    CoreFoundation                 0x19d7b2224 _CFStringAppendFormatAndArgumentsAux2 + 48
4    CoreFoundation                 0x19d6dbcf8 -[__NSCFString appendFormat:] + 100
5    App                            0x1029faec4 -[HTMLElement outerHTML] (HTMLElement.m:158)
6    Foundation                     0x19e16d7d0 -[NSObject(NSKeyValueCoding) valueForKey:] + 268
7    Foundation                     0x19e1c9940 -[NSArray(NSKeyValueCoding) valueForKey:] + 392
8    App                            0x102a00848 -[HTMLNode innerHTML] (HTMLNode.m:727)
9    App                            0x1029fb16c -[HTMLElement outerHTML] (HTMLElement.m:182)
10   Foundation                     0x19e16d7d0 -[NSObject(NSKeyValueCoding) valueForKey:] + 268
...
2638 Foundation                     0x19e16d7d0 -[NSObject(NSKeyValueCoding) valueForKey:] + 268
2639 Foundation                     0x19e1c9940 -[NSArray(NSKeyValueCoding) valueForKey:] + 392
2640 App                            0x102a00848 -[HTMLNode innerHTML] (HTMLNode.m:727)
2641 App                            0x1029fb16c -[HTMLElement outerHTML] (HTMLElement.m:182)
2642 App                            0x1026c74d8 -[MRLinkParser parseUnsubscribeWithDocument:] (MRLinkParser.m:123)

Maybe put in a check for that? Not sure how to reproduce but seeing this out in the wild.

Recursion crash

frame #8386: 0x00007fff36033e0f CoreFoundation`__RELEASE_OBJECTS_IN_THE_ARRAY__ + 122
frame #8387: 0x00007fff35f32e3e CoreFoundation`-[__NSArrayM dealloc] + 289
frame #8388: 0x00007fff35fd592d CoreFoundation`-[__NSOrderedSetM dealloc] + 157
frame #8389: 0x00007fff6c50dd16 libobjc.A.dylib`object_cxxDestructFromClass(objc_object*, objc_class*) + 83
frame #8390: 0x00007fff6c5076c3 libobjc.A.dylib`objc_destructInstance + 94
frame #8391: 0x00007fff6c50762b libobjc.A.dylib`_objc_rootDealloc + 62
...
frame #8512: 0x00007fff36033e0f CoreFoundation`__RELEASE_OBJECTS_IN_THE_ARRAY__ + 122
frame #8513: 0x00007fff35f32e3e CoreFoundation`-[__NSArrayM dealloc] + 289
frame #8514: 0x00007fff35fd592d CoreFoundation`-[__NSOrderedSetM dealloc] + 157
frame #8515: 0x00007fff6c50dd16 libobjc.A.dylib`object_cxxDestructFromClass(objc_object*, objc_class*) + 83
frame #8516: 0x00007fff6c5076c3 libobjc.A.dylib`objc_destructInstance + 94
frame #8517: 0x00007fff6c50762b libobjc.A.dylib`_objc_rootDealloc + 62
frame #8518: 0x00007fff6c52152a libobjc.A.dylib`AutoreleasePoolPage::releaseUntil(objc_object**) + 134
frame #8519: 0x00007fff6c507c30 libobjc.A.dylib`objc_autoreleasePoolPop + 175
frame #8520: 0x00007fff35aa998b CoreData`developerSubmittedBlockToNSManagedObjectContextPerform + 411
frame #8521: 0x0000000103c8f84f libdispatch.dylib`_dispatch_client_callout + 8
frame #8522: 0x0000000103c96df1 libdispatch.dylib`_dispatch_lane_serial_drain + 777
frame #8523: 0x0000000103c97ba8 libdispatch.dylib`_dispatch_lane_invoke + 438
frame #8524: 0x0000000103ca5045 libdispatch.dylib`_dispatch_workloop_worker_thread + 676
frame #8525: 0x0000000103d1b0b3 libsystem_pthread.dylib`_pthread_wqthread + 290
frame #8526: 0x0000000103d1af1b libsystem_pthread.dylib`start_wqthread + 15

NSInvalidArgumentException raised when trying to change an attribute of a cloned element

It is impossible to change an attribute of a cloned element if the original element had attributes.

An example:

HTMLElement *element = [HTMLElement new];
element.elementId = @"originalId";
    
HTMLElement *clone = [element cloneNodeDeep:YES];
NSString *cloneId = @"cloneId";
clone.elementId = cloneId;

The last line raises the NSInvalidArgumentException:
_[_NSSingleEntryDictionaryI setObject:forKeyedSubscript:]: unrecognized selector sent to instance 0x7fcbd143e150

The root cause is the HTMLElement copy method:

- (id)copyWithZone:(NSZone *)zone
{
	...
	copy->_attributes = [_attributes copy];
	...
}

If _attributes is not nil, the copy will be an immutable NSDictionary rather than an HTMLOrderedDictionary.
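The difference between -copy and -mutableCopy can be shown with Foundation alone. In this sketch NSMutableDictionary stands in for HTMLOrderedDictionary, and CloneAttributes is a hypothetical helper. Whether HTMLOrderedDictionary itself supports -mutableCopy is an assumption that would need checking against the library:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: clones an attribute dictionary so that the clone
// stays writable. -copy would return an immutable dictionary instead.
static NSMutableDictionary<NSString *, NSString *> *CloneAttributes(
    NSDictionary<NSString *, NSString *> *attributes) {
    return [attributes mutableCopy];
}

int main(void) {
    @autoreleasepool {
        // NSMutableDictionary is a stand-in for HTMLOrderedDictionary here.
        NSMutableDictionary<NSString *, NSString *> *original =
            [@{@"id": @"originalId"} mutableCopy];

        NSMutableDictionary<NSString *, NSString *> *clone = CloneAttributes(original);
        clone[@"id"] = @"cloneId";  // succeeds because the clone is mutable

        // The original keeps its own value; only the clone changed.
        NSLog(@"original: %@, clone: %@", original[@"id"], clone[@"id"]);
    }
    return 0;
}
```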

textContent strips <br/>s

let element:HTMLElement = HTMLElement(tagName: "div")
element.innerHTML = "Line<br/>Breaks"
print("\(element.textContent)")

output:

LineBreaks

desired:

Line\nBreaks

At least this is how NSAttributedString's initWithHTML works. Is there anything I need to do to get this to work properly?

Xcode12.5: 'new' is unavailable

I am getting this error at build time in Xcode 12.5:
HTMLParser.m: 'new' is unavailable 'init' has been explicitly marked unavailable here


Not being able to use HTMLKit via Xcode Package Manager

When I try to add HTMLKit as a dependency I get an error message at the Resolving package dependencies stage:

because HTMLKit >=0.9.4 contains incompatible tools version and root depends on HTMLKit 3.1.0..<4.0.0, version solving failed.

Tested on Xcode 11.4 and 12 beta.

Colons are stripped from attributes

The problem manifests itself when reading an HTML file and outputting it again as an HTML string.

debug.html

<!DOCTYPE html>
    <head>
        <title>debug</title>
    </head>
    <body>
        <svg id="draw_area" width="600" height="800" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1">
            <image id="overlay_img" xlink:href="foo.png" x="0" y="0" width="600" height="800" />
        </svg>
    </body>
</html>

    NSString *dpath = [[NSBundle mainBundle] pathForResource:@"debug" ofType:@"html"];
    NSString *dcontent = [NSString stringWithContentsOfFile:dpath
                                   encoding:NSUTF8StringEncoding error:nil];
    
    HTMLDocument* ddocument = [HTMLDocument documentWithString:dcontent];
    dcontent = [[ddocument documentElement] innerHTML];

After the above code runs, the variable dcontent contains:

<head>
        <title>debug</title>
    </head>
    <body>
        <svg id="draw_area" width="600" height="800" xmlns="http://www.w3.org/2000/svg" xmlns xlink="http://www.w3.org/1999/xlink" version="1.1">
            <image id="overlay_img" xlink href="foo.png" x="0" y="0" width="600" height="800"></image>
        </svg>
    

</body>

Note how xmlns:xlink is converted to xmlns xlink and xlink:href to xlink href after outputting the content as HTML again. This breaks the SVG.

Occasional Internal Consistency Error

Every once in a while, I get a crash on an "internal consistency" error.

My app is fetching some HTML pages in a background process and pulling out some interesting stuff. A few times, I've had the app (in the simulator) crash on the following:

2019-07-13 21:30:37.026686-0400 dreamwidth[65452:90258127] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: '*** -[NSHashTable NSHashTable {
[10] <HTMLNodeIterator: 0x60000eae1680>
}
] count underflow'
*** First throw call stack:
(
	0   CoreFoundation                      0x000000010af6a6fb __exceptionPreprocess + 331
	1   libobjc.A.dylib                     0x00000001098abac5 objc_exception_throw + 48
	2   CoreFoundation                      0x000000010af6a555 +[NSException raise:format:] + 197
	3   Foundation                          0x000000010827f9d7 hashProbe + 407
	4   Foundation                          0x000000010827fe5c -[NSConcreteHashTable removeItem:] + 49
	5   myapp                               0x0000000106d01e3d -[HTMLDocument detachNodeIterator:] + 93
	6   myapp                               0x0000000106d115b7 -[HTMLNodeIterator dealloc] + 87

I suspect that it's something to do with timing: it doesn't happen all the time, but when it does, it crashes my app. I suspect that the HTML document is being deallocated when this happens.

I've tried to avoid iterating on, say, .childNodes to avoid creating node iterators, but the crash still happens every so often.

Revert workaround in #12 once fixed in Xcode

Xcode 8.3 has an issue with the default modulemap file in the sources folder. The workaround in #12 was to rename the file and point Xcode to it. This causes the SwiftPM build to emit many warnings because of the generated umbrella header.

The workaround should be reverted once Xcode fixes the issue.

Redefinition of module 'HTMLKit' Xcode 8.3

This error appeared after updating Xcode.


The Source/include project folder contains the file module.modulemap. After removing this file, the project builds without errors.

Wrong declaration of a block in the HTMLNodeFilterBlock

File HTMLNodeFilter.m:

@interface HTMLNodeFilterBlock ()
{
	BOOL (^ _block)(HTMLNode *);
}
@end

The return type of _block must be HTMLNodeFilterValue instead of BOOL. Possibly it was a typo.

This issue has no effect on simulators, but it leads to wrong behavior on devices.

Example:

HTMLDocument *document = [HTMLDocument documentWithString:@"<div id=\"id\"></div>"];
    
NSString *divId = @"id";
HTMLNodeFilterBlock *filter = [HTMLNodeFilterBlock filterWithBlock:^HTMLNodeFilterValue(HTMLNode * _Nonnull node) {
        HTMLElement *element = (HTMLElement *)node;
        return [element.elementId isEqualToString:divId] ? HTMLNodeFilterAccept : HTMLNodeFilterSkip;
}];
    
HTMLNodeIterator *iterator = [document nodeIteratorWithShowOptions:HTMLNodeFilterShowElement filter:filter];
    
HTMLElement *element = (HTMLElement*)iterator.nextObject;

On a simulator, the element is the div. But on a device, the element is the html.
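One plausible explanation for the difference, offered as an assumption rather than a confirmed diagnosis: on 64-bit iOS devices BOOL is a true boolean type, while on Intel-based simulators it is a signed char. A filter value that passes through a boolean collapses to 1, whereas a signed char keeps the original number. The sketch below uses a hypothetical SketchFilterValue enum in place of HTMLNodeFilterValue, with values assumed to mirror the DOM NodeFilter constants (accept = 1, reject = 2, skip = 3).

```objc
#import <Foundation/Foundation.h>
#include <stdbool.h>

// Hypothetical stand-in for HTMLNodeFilterValue. The numeric values are an
// assumption based on the DOM NodeFilter constants.
typedef NS_ENUM(NSUInteger, SketchFilterValue) {
	SketchFilterAccept = 1,
	SketchFilterReject = 2,
	SketchFilterSkip = 3
};

// What a caller would see if the value is narrowed to a true boolean,
// as BOOL is on 64-bit iOS devices.
static NSUInteger ValueSeenThroughBool(SketchFilterValue value)
{
	bool narrowed = (bool)value; // every non-zero value becomes 1
	return (NSUInteger)narrowed;
}

// What a caller would see if the value is narrowed to a signed char,
// as BOOL is on Intel-based simulators.
static NSUInteger ValueSeenThroughSignedChar(SketchFilterValue value)
{
	signed char narrowed = (signed char)value; // small values are preserved
	return (NSUInteger)narrowed;
}
```

Under these assumptions a "skip" result read through a boolean is indistinguishable from "accept", which would match the first element (html) being returned on a device.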

Retain cycle in HTMLRange

HTMLRanges are being retained by the document when attached on initialization. This is basically the same issue as in #4.

Collision with Category Names

The NSString+HTMLKit.h and NSCharacterSet+HTMLKit.h categories contain generically named category methods. They collide with other methods I am already using. It would be great if you could prefix the methods to reduce potential collisions. Perhaps the prefix could be "html_" or something similar. Thanks!

Implement DOM Ranges

Could we please discuss implementing the DOM range feature? I would be pleased to contribute to this part of the project.

Help in retrieving the attributes of a node

Hello,

First of all, this is a very nice library you have created. I am using it for a feature in my app where I need to load an HTML string, select nodes with particular tags, and fetch the name and value attributes from those tags. This is how far I have got:

HTMLParser *parser = [[HTMLParser alloc]initWithString : htmlString];
HTMLElement *htmlElement = [[HTMLElement alloc] initWithTagName:@"//input[@type = 'hidden']"];
NSArray *nodes = [parser parseFragmentWithContextElement : htmlElement];

for(HTMLNode *node in nodes)
{
   // Here I want to have the attributes of the node 
  // e.g. something like node.Attributes["name"].Value and  node.Attributes["value"].Value

}

It would be great if you could help me out and guide me in the right direction as I am quite new to iOS.
Thank you.
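A possible approach, sketched under assumptions: the string passed to initWithTagName: above is an XPath expression, while the queries elsewhere on this page (for example querySelector:@"link") use CSS selectors on a parsed document. The sketch below assumes that HTMLKit also offers querySelectorAll: and that HTMLElement exposes an attributes dictionary. Neither is shown on this page, so both should be checked against the library headers. HiddenInputValues is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>
#import <HTMLKit/HTMLKit.h>

// Hypothetical helper: collects the name/value pairs of all hidden inputs.
static NSDictionary<NSString *, NSString *> *HiddenInputValues(NSString *htmlString)
{
	// Parse the whole string into a document, as in the other examples here.
	HTMLDocument *document = [HTMLDocument documentWithString:htmlString];
	NSMutableDictionary<NSString *, NSString *> *result = [NSMutableDictionary dictionary];

	// Assumption: querySelectorAll: returns the matching elements for a CSS selector.
	for (HTMLElement *input in [document querySelectorAll:@"input[type=\"hidden\"]"]) {
		// Assumption: attributes is a dictionary keyed by attribute name.
		NSString *name = input.attributes[@"name"];
		NSString *value = input.attributes[@"value"];
		if (name != nil) {
			// Inputs without a value attribute are stored with an empty string.
			result[name] = value ?: @"";
		}
	}
	return result;
}
```

Called with the page's HTML string, the helper would return a dictionary that maps each hidden input's name to its value.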
