
soup's Introduction

soup


Web Scraper in Go, similar to BeautifulSoup

soup is a small web scraper package for Go, with its interface highly similar to that of BeautifulSoup.

Exported variables and functions implemented so far:

var Headers map[string]string // Set headers as a map of key-value pairs, an alternative to calling Header() individually
var Cookies map[string]string // Set cookies as a map of key-value pairs, an alternative to calling Cookie() individually
func Get(string) (string, error) {} // Takes the url as an argument, returns HTML string
func GetWithClient(string, *http.Client) (string, error) {} // Takes the url and a custom HTTP client as arguments, returns HTML string
func Post(string, string, interface{}) (string, error) {} // Takes the url, bodyType, and payload as arguments, returns HTML string
func PostForm(string, url.Values) (string, error) {} // Takes the url and body; bodyType is set to "application/x-www-form-urlencoded"
func Header(string, string) {} // Takes key,value pair to set as headers for the HTTP request made in Get()
func Cookie(string, string) {} // Takes key, value pair to set as cookies to be sent with the HTTP request in Get()
func HTMLParse(string) Root {} // Takes the HTML string as an argument, returns the root of the constructed DOM
func Find([]string) Root {} // Element tag, (attribute key-value pair) as arguments; pointer to the first occurrence returned
func FindAll([]string) []Root {} // Same as Find(), but pointers to all occurrences returned
func FindStrict([]string) Root {} // Element tag, (attribute key-value pair) as arguments; pointer to the first occurrence with exactly matching values returned
func FindAllStrict([]string) []Root {} // Same as FindStrict(), but pointers to all occurrences returned
func FindNextSibling() Root {} // Pointer to the next sibling of the Element in the DOM returned
func FindNextElementSibling() Root {} // Pointer to the next element sibling of the Element in the DOM returned
func FindPrevSibling() Root {} // Pointer to the previous sibling of the Element in the DOM returned
func FindPrevElementSibling() Root {} // Pointer to the previous element sibling of the Element in the DOM returned
func Children() []Root {} // Find all direct children of this DOM element
func Attrs() map[string]string {} // Map returned with all the attributes of the Element as lookup to their respective values
func Text() string {} // Full text inside a non-nested tag returned; for a nested tag, only the text before the first child element is returned
func FullText() string {} // Full text inside a nested/non-nested tag returned
func SetDebug(bool) {} // Sets the debug mode to true or false; false by default
func HTML() string {} // Returns the HTML code for the specific element
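
For instance, a minimal sketch using the request helpers above (the URL, header, and cookie values are placeholders):

soup.Headers = map[string]string{
	"User-Agent": "my-scraper/1.0", // sent with every subsequent request
}
soup.Cookie("session", "abc123") // added to the cookies sent as well
resp, err := soup.Get("https://example.com")
if err != nil {
	// handle the request error
}
doc := soup.HTMLParse(resp)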

Root is a struct containing three fields:

  • Pointer, containing a pointer to the current HTML node
  • NodeValue, containing the current HTML node's value, i.e. the tag name for an ElementNode or the text in the case of a TextNode
  • Error, containing an error in a struct if one occurs, else nil. A detailed text explanation of the error can be accessed using the Error() function. A field Type in this struct, of type ErrorType, denotes the kind of error that took place, and will be one of the following:
    • ErrUnableToParse
    • ErrElementNotFound
    • ErrNoNextSibling
    • ErrNoPreviousSibling
    • ErrNoNextElementSibling
    • ErrNoPreviousElementSibling
    • ErrCreatingGetRequest
    • ErrInGetRequest
    • ErrReadingResponse
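
For instance, a minimal sketch of branching on the kind of error, assuming the struct and constants are exported as soup.Error and the Err* values above:

res := doc.Find("div", "id", "missing")
if res.Error != nil {
	if e, ok := res.Error.(soup.Error); ok && e.Type == soup.ErrElementNotFound {
		// the query matched nothing; this is not a parse failure
	}
}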

Installation

Install the package using the command

go get github.com/anaskhan96/soup

Example

Example code is given below to scrape the "Comics I Enjoy" section (text and its links) from xkcd.

More Examples

package main

import (
	"fmt"
	"github.com/anaskhan96/soup"
	"os"
)

func main() {
	resp, err := soup.Get("https://xkcd.com")
	if err != nil {
		os.Exit(1)
	}
	doc := soup.HTMLParse(resp)
	links := doc.Find("div", "id", "comicLinks").FindAll("a")
	for _, link := range links {
		fmt.Println(link.Text(), "| Link :", link.Attrs()["href"])
	}
}

Contributions

This package was developed in my free time. However, contributions from everybody in the community are welcome, to make it a better web scraper. If you think there should be a particular feature or function included in the package, feel free to open up a new issue or pull request.

soup's People

Contributors

akmubi, algogrit, anaskhan96, aykevl, bdwyertech, bigzhu, cptth, danilopolani, darccio, deepsheth, dustmop, enrico204, ghostlandr, gyga8k, natiiix, photonios, rafikkasmi, salmondx


soup's Issues

FindAll Regex

Hi, I'm just starting out with Go, so this question might be dumb.

Is there a way, with this library, to FindAll by regular expression?

If it is not implemented, will it be? Or am I looking at the wrong package?

thanks
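
There is no built-in regex matching, but one workaround is to filter the results of FindAll() with the standard regexp package; a sketch (the pattern and markup are made up):

re := regexp.MustCompile(`^/item/\d+$`)
for _, link := range doc.FindAll("a") {
	// keep only anchors whose href matches the pattern
	if re.MatchString(link.Attrs()["href"]) {
		fmt.Println(link.Text(), link.Attrs()["href"])
	}
}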

Feature Request: User Defined Headers

Loving this library so far :-)

It would be really useful to be able to define our own headers, like user-agent for example.

Then I'd be able to use this for sites that require auth :-)
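
This is now possible with the Header() function and the Headers variable listed in the README above; for example:

soup.Header("User-Agent", "Mozilla/5.0 (compatible; my-bot/1.0)")
resp, err := soup.Get("https://example.com") // placeholder URL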

how to get element's parent node

example: get p's NodeValue text

<p>
    <a class="btn" herf=""> </a>
    text
</p>

Is there any way of getting a's parent p as a Root type? I don't see any support for this in the DFS.
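
The package does not expose a parent accessor, but Root exports its Pointer (*html.Node), whose Parent field can be wrapped back into a Root by hand; a sketch based on the field layout documented in the README:

a := doc.Find("a", "class", "btn")
if a.Error == nil && a.Pointer != nil && a.Pointer.Parent != nil {
	p := soup.Root{Pointer: a.Pointer.Parent, NodeValue: a.Pointer.Parent.Data}
	fmt.Println(p.FullText()) // includes the trailing "text"
}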

catchPanic() prints to stdout on some semi-valid cases

libraries should not catch panic() and then print to stdout.

If I have the following code

th := row.Find("th")
if th.Error == nil && th.Text() == "Service Expiry Date" {
    ...
}

and the call to row.Find("th") returns a structure with a nil FirstChild member of the Pointer struct, then soup will panic, catch the panic, and print to stdout.

By catching the panic, soup makes it super hard to figure out what is causing it, because all I got when running my program was this:

2017/12/18 11:17:56 Error occurred in Text() : runtime error: invalid memory address or nil pointer dereference

however if I comment out the defer catchPanic("Text()") call, then I get a much more helpful error:

DEBUG: DEBUG: &{0xc0422408c0 <nil> <nil> 0xc042240930 0xc042240a10 3 th th  [{ align right} { width 200}]}
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x28 pc=0x688cdb]

goroutine 1 [running]:
github.com/anaskhan96/soup.Root.Text(0xc0422409a0, 0x7a7f94, 0x2, 0x0, 0x0, 0xc0420123f0, 0x63)
        C:/Users/chrome/Development/go/src/github.com/anaskhan96/soup/soup.go:219 +0xbb
main.fetchDate(0xc042052400, 0x1f)
        C:/Users/chrome/Development/go/src/gitlab.corp.xxx.com/se/shcheck/cmd/shcheck/main.go:96 +0x8e6
main.main()
        C:/Users/chrome/Development/go/src/gitlab.corp.xxx.com/se/shcheck/cmd/shcheck/main.go:112 +0x23d

Then, if I look at soup.go line 219, I see

	k := r.Pointer.FirstChild
checkNode:
	if k.Type != html.TextNode {

and it is now clear to me that r.Pointer.FirstChild is nil, and I need to check that it is not nil before calling Text().

However, you should be checking that value in your library and returning (string, error), in my opinion.
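
Until such a check lands in the library, a defensive guard on the exported fields avoids the panic; a sketch:

th := row.Find("th")
if th.Error == nil && th.Pointer != nil && th.Pointer.FirstChild != nil &&
	th.Text() == "Service Expiry Date" {
	// safe: Text() now has a child node to read from
}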

FindNextSibling bug

From the source code I see FindNextSibling calls r.Pointer.NextSibling.NextSibling, which wrongly assumes NextSibling has another NextSibling, and crashes when it does not.

e.g.

const html = `<html>

  <head>
      <title>DOM Tutorial</title>
  </head>

  <body>
      <a>DOM Lesson one</a><p>Hello world!</p>
  </body>

</html>`

func main() {
	doc := soup.HTMLParse(html)
	link := doc.Find("a")
	next := link.FindNextSibling()
	fmt.Println(next.Text())
}

// $ panic: runtime error: invalid memory address or nil pointer dereference

This also applies for FindPrevSibling.

BTW, I suggest there should be both FindNextSibling and FindNextElementSibling, as the spec describes. (This might be another issue; I guess what you want to implement here is FindNextElementSibling.)

How to use selectors?

BeautifulSoup allows you to use select() with CSS selectors. Is there any such thing in this library?
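
There is no CSS selector engine in this package; the closest equivalent is chaining Find()/FindAll() calls. For instance, the selector div#comicLinks a becomes the following (the same pattern as the xkcd example above):

links := doc.Find("div", "id", "comicLinks").FindAll("a")

Checking the intermediate Error before chaining further avoids a panic when the outer element is missing.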

[BUG]: Search classes with spaces fails every time (even in the weather example you provided)

Hi, I tried your weather example and it always throws an "invalid memory address". I tried to reproduce the same bug with another website, and it can actually only search for classes without any spaces inside them. I don't know why, but your parser stopped understanding spaces.
I added an fmt.Println() call to print the result of the only class search with spaces (grid); here's the code:

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"

	"github.com/anaskhan96/soup"
)

func main() {
	fmt.Printf("Enter the name of the city : ")
	city, _ := bufio.NewReader(os.Stdin).ReadString('\n')
	city = city[:len(city)-1]
	cityInURL := strings.Join(strings.Split(city, " "), "+")
	url := "https://www.bing.com/search?q=weather+" + cityInURL
	resp, err := soup.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	doc := soup.HTMLParse(resp)
	grid := doc.Find("div", "class", "b_antiTopBleed b_antiSideBleed b_antiBottomBleed")
	fmt.Println("Print grid:", grid)
	heading := grid.Find("div", "class", "wtr_titleCtrn").Find("div").Text()
	conditions := grid.Find("div", "class", "wtr_condition")
	primaryCondition := conditions.Find("div")
	secondaryCondition := primaryCondition.FindNextElementSibling()
	temp := primaryCondition.Find("div", "class", "wtr_condiTemp").Find("div").Text()
	others := primaryCondition.Find("div", "class", "wtr_condiAttribs").FindAll("div")
	caption := secondaryCondition.Find("div").Text()
	fmt.Println("City Name : " + heading)
	fmt.Println("Temperature : " + temp + "˚C")
	for _, i := range others {
		fmt.Println(i.Text())
	}
	fmt.Println(caption)
}

And that's the output:

Enter the name of the city : New York
Print grid: {<nil>  element `div` with attributes `class b_antiTopBleed b_antiSideBleed b_antiBottomBleed` not found}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x61d1f5]

goroutine 1 [running]:
github.com/anaskhan96/soup.findOnce(0x0, 0xc42005be68, 0x3, 0x3, 0xc420050000, 0x4aa247, 0xc420261e00)
	/home/fef0/go/src/github.com/anaskhan96/soup/soup.go:304 +0x315
github.com/anaskhan96/soup.Root.Find(0x0, 0x0, 0x0, 0x6e1e60, 0xc420242070, 0xc42005be68, 0x3, 0x3, 0x0, 0x0, ...)
	/home/fef0/go/src/github.com/anaskhan96/soup/soup.go:120 +0x8d
main.main()
	/home/fef0/Code/Go/Test/Test.go:26 +0x4e3
exit status 2

As you can see in the second line, it was impossible to find the grid, but in fact that happens only because there are spaces in the class name.
I hope you can fix this as soon as possible, bye for now!
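
One workaround, assuming the installed version matches single class tokens (see the "Find by single class" issue below), is to search for one class instead of the whole attribute string; a sketch:

grid := doc.Find("div", "class", "b_antiTopBleed")
if grid.Error != nil {
	log.Fatal(grid.Error) // still not found; the page markup may have changed
}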

Find by single class

Currently, Find("a", "class", "message") only works if the element is <a class="message"></a>, but not on <a class="message input-message"></a>, even though both have the class message.

Could this be added?

Crashed with SIGSEGV

Trying to run the test weather.go on my machine, I got this.

Enter the name of the city : Brisbane
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x665715]

goroutine 1 [running]:
github.com/anaskhan96/soup.findOnce(0x0, 0xc0000bdea8, 0x3, 0x3, 0x0, 0x70207e, 0x13)
/home/stevek/go/src/github.com/anaskhan96/soup/soup.go:345 +0x315
github.com/anaskhan96/soup.Root.Find(0x0, 0x0, 0x0, 0x75c820, 0xc000364040, 0xc0000bdea8, 0x3, 0x3, 0x0, 0x0, ...)
/home/stevek/go/src/github.com/anaskhan96/soup/soup.go:121 +0x82
main.main()
/home/stevek/tmp/go-lang/src/weather.go:24 +0x49d
exit status 2

fatal error: concurrent map iteration and map write

fatal error: concurrent map iteration and map write

goroutine 833 [running]:
runtime.throw(0x7161cb, 0x26)
        /root/.gvm/gos/go1.15.5/src/runtime/panic.go:1116 +0x72 fp=0xc000069938 sp=0xc000069908 pc=0x437312
runtime.mapiternext(0xc000069a10)
        /root/.gvm/gos/go1.15.5/src/runtime/map.go:853 +0x554 fp=0xc0000699b8 sp=0xc000069938 pc=0x412574 
github.com/anaskhan96/soup.setHeadersAndCookies(0xc0002e4600)
        /root/go/pkg/mod/github.com/anaskhan96/[email protected]/soup.go:145 +0x87 fp=0xc000069b28 sp=0xc0000699b8 pc=0x6831a7
github.com/anaskhan96/soup.GetWithClient(0xc000288660, 0x24, 0xc0004b3da0, 0x0, 0x0, 0x0, 0x0)
        /root/go/pkg/mod/github.com/anaskhan96/[email protected]/soup.go:117 +0x18b fp=0xc000069be0 sp=0xc000069b28 pc=0x682cab
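
Because Headers and Cookies are package-level maps that every request reads, concurrent use needs external synchronization; a minimal sketch that serializes header writes and requests together:

var mu sync.Mutex

func fetch(url string) (string, error) {
	mu.Lock()
	defer mu.Unlock()
	soup.Header("User-Agent", "my-bot/1.0") // writes the shared Headers map
	return soup.Get(url)                    // reads the shared maps internally
}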

log.Fatal

Hi,

Is it possible for you to replace the log.Fatal instances with something that returns an error instead?

It feels unfair for the entire program to shut down because soup couldn't find an element, and so on. I would rather handle the error myself when it cannot find something or cannot parse the HTML, etc.

Thank you, been using this. :)

Navigating to Parent

In order for proper selection it would be awesome to be able to navigate to the current element's Parent, then keep going through siblings and all (a workaround via the exported Pointer is sketched under the parent-node issue above).

Right now it is quite hard to properly find what I am looking for from a strict top-down view.

Proposal: Add an "Empty" func to Root that would make it easier to tell when a query didn't return results

Right now I suppose you would do this by checking if Error was non-nil and then checking whether the error contains "not found", which you would only know about if you read the source code of this project 😄

I think what I am proposing is to add something that does that check for you in the library. Maybe something like:

func (r Root) Empty() bool {
	if r.Error == nil {
		return false
	}
	// r.Error is an error value, so use Error() to get its message text
	return strings.Contains(r.Error.Error(), "not found")
}

Is this something other people would see as valuable? I would use it sorta like this:

main := doc.Find("section", "class", "gramb")
if main.Empty() {
  return errors.New("No results for this query")
}
defs := main.FindAll("span", "class", "ind")
// Other processing here

Right now I'm just checking if main.Error is non-nil and returning no results. It would just be nice (I think) to have a cleaner interface around it.

If you think this is worth doing I'd love to take a crack at it!

Thanks for this library, it's immensely helpful to my side project 😄

&nbsp; causes no text to be returned

An odd issue I'm having while trying to use soup to parse Fmylife's site for FMLs occurs when I get an FML that has the &nbsp; entity in it:

<p class="block">
<a href="/article/today-on-the-bus-i-saw-my-ex-girlfriend-get-on-despite-several-seats-being-open-she-specifically_190836.html">
<span class="icon-piment"></span>&nbsp;
[Insert FML text here] FML
</a>
</p>

When I try to get the text, it returns a blank string and nothing else.

I usually call it using .Find("p", "class", "block").Find("a").Text(), and if the markup doesn't have the &nbsp; entity, it returns fine.
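
One workaround, per the README above, is FullText(), which gathers the text of all nested nodes instead of stopping at the first text segment; a sketch:

text := doc.Find("p", "class", "block").Find("a").FullText()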

soup.HTMLParse() returning nil

This method was previously working, but for some reason it returns nil every single time now.

//example
t, _ := soup.Get("https://google.com")
fmt.Println(soup.HTMLParse(t)) //prints {address <nil>}

invalid memory address or nil pointer dereference when chaining methods

package main

import (
	"fmt"
	"log"
	"net/http"
	"time"

	"github.com/anaskhan96/soup"
)

func main() {
	go func() {
		http.ListenAndServe(":12345", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, "OK")
		}))
	}()

	time.Sleep(time.Second)

	resp, err := soup.Get("http://127.0.0.1:12345/")
	if err != nil {
		log.Println("Error:", err.Error())
		return
	}

	doc := soup.HTMLParse(resp)
	r := doc.Find("Semething").Find("SomethingElse")
	fmt.Println(r.Error)
}

Hello, if I try to chain the Find and FindAll methods on non-existent tags like in the example above, I get a panic

$ go run .
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x66ce1b]

goroutine 1 [running]:
github.com/anaskhan96/soup.findOnce(0x6b64c0?, {0xc00011fe50?, 0x1, 0x1}, 0x2?, 0x0)
        /home/alex/go/pkg/mod/github.com/anaskhan96/[email protected]/soup.go:502 +0xfb
github.com/anaskhan96/soup.Root.Find({0x0, {0x0, 0x0}, {0x766ee0, 0xc000238030}}, {0xc00011fe50?, 0x1, 0x1})
        /home/alex/go/pkg/mod/github.com/anaskhan96/[email protected]/soup.go:268 +0xa5
main.main()
        /home/alex/test/play3/main.go:24 +0x1ca
exit status 2

I believe that both findOnce and findAllofem should check whether n (*html.Node) is nil before proceeding with the processing.
Am I understanding this correctly?

Thanks,
Alex
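
Until that check is added, testing the intermediate Error breaks the chain safely; a sketch:

first := doc.Find("Semething")
if first.Error != nil {
	log.Println("not found:", first.Error)
	return
}
r := first.Find("SomethingElse")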

Is it possible to search by some attribute?

Let us assume we have HTML such as this:

<body>
<div class="container">
    <div this-attr="don't care about its value at all"></div>
</div>
</body>

And if we search like this:
doc.Find("div", "this-attr")
it yields an error (I think that is expected).

Function findOnce accesses the second argument 🤔

	if uni == true {
		if n.Type == html.ElementNode && matchElementName(n, args[0]) {
			if len(args) > 1 && len(args) < 4 {
				for i := 0; i < len(n.Attr); i++ {
					attr := n.Attr[i]
					searchAttrName := args[1]
					searchAttrVal := args[2]
					if (strict && attributeAndValueEquals(attr, searchAttrName, searchAttrVal)) ||
						(!strict && attributeContainsValue(attr, searchAttrName, searchAttrVal)) {
						return n, true
					}
				}
			} else if len(args) == 1 {
				return n, true
			}
		}
	}
	uni = true
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		p, q := findOnce(c, args, true, strict)
		if q != false {
			return p, q
		}
	}
	return nil, false
}

So my question is whether it's possible or not. Thank you!
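
Attribute-presence search isn't supported directly (as the excerpt shows, a call with two arguments goes on to read args[2]), but filtering FindAll() results through Attrs() works as a workaround; a sketch:

for _, div := range doc.FindAll("div") {
	if _, ok := div.Attrs()["this-attr"]; ok {
		fmt.Println("found:", div.HTML())
	}
}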

Find Or

I have a case where I want an element that may be either a div or a p. I don't know how to do this; it's probably not possible with the existing lib, and we would need something like a FindOr.
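
A FindOr could be approximated in user code by trying tags in order; a sketch with a hypothetical findFirst helper:

func findFirst(root soup.Root, tags ...string) soup.Root {
	var res soup.Root
	for _, tag := range tags {
		res = root.Find(tag)
		if res.Error == nil {
			return res
		}
	}
	return res // carries the last "not found" error
}

el := findFirst(doc, "div", "p")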

Should Text() return all sibling text?

For example:

<div align="center">
<a href="search_3.asp?action=up">up</a>
&nbsp;
<a href="search_3.asp?action=down">down</a>
(2021-9-20~2021-9-26)
</div>

Currently, div.Text() only returns &nbsp;; wouldn't it be better if it returned &nbsp;(2021-9-20~2021-9-26)?

Remove global variable in find.go

nodeLinks is a global variable in the file find.go which is initialised to a new slice of capacity 10 whenever FindAll() in soup.go is called. This creates problems when FindAll() is called concurrently from the driver program, as nodeLinks keeps getting reinitialised without fetching the nodes for either of the calls.

InnerHTML

Is there any way to get the equivalent of .HTML() but excluding the element's own markup (just like JS's .innerHTML), without having to resort to regex?

An example:

element.HTML() yields <p><a href="square-cover-art.jpeg">My <em>wacky</em> label with <strong>bold</strong> and <code>code</code> and stuff “hmmm”</a></p>

I want to get <a href="square-cover-art.jpeg">My <em>wacky</em> label with <strong>bold</strong> and <code>code</code> and stuff “hmmm”</a>

I guess I could iterate over element.Children() and concatenate each child's .HTML(), but I think having a .InnerHTML() would make things nicer (and a tad better when it comes to performance I guess)

I'm willing to make a PR :)
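
The workaround described above, written out as a hypothetical helper built only on the documented Children() and HTML():

func innerHTML(r soup.Root) string {
	var sb strings.Builder
	for _, child := range r.Children() {
		sb.WriteString(child.HTML()) // render each child, omitting r's own tag
	}
	return sb.String()
}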

How to find an element with particular value?

Hi!

I have HTML files that don't use ids. With BeautifulSoup it is easy to find such an element using find("Some text"):

<span style="color: #012345">Some text</span>

Is the only way to find this to use FindAll("span") and then iterate through all the found spans? In that case, how can I check whether a particular span element contains text? I wouldn't like to disable debugging, since, I guess, an empty span is not necessarily a critical error.
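
Iterating over FindAll() and matching on Text() is one way to do it; guarding against empty spans avoids both the panic and the debug output; a sketch:

for _, span := range doc.FindAll("span") {
	if span.Pointer != nil && span.Pointer.FirstChild != nil &&
		strings.Contains(span.Text(), "Some text") {
		fmt.Println("matched:", span.HTML())
	}
}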

Crash accessing results of FindAll("span")

package main

import (
        "fmt"
        "github.com/anaskhan96/soup"
        "os"
)

func main() {
        resp, err := soup.Get("https://slashdot.com")
        if err != nil {
                os.Exit(1)
        }
        doc := soup.HTMLParse(resp)
        spans := doc.FindAll("span")
        for _, span := range spans {
                fmt.Println(span.Text())
        }
}

Result:

$ ./test-span
Slashdot
Stories
Polls
Deals
 Login
 Sign up
RSS
Facebook
Google+
Twitter
Newsletter
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x61aad7]

goroutine 1 [running]:
github.com/anaskhan96/soup.Root.Text(0xc420439030, 0x6bbff4, 0x4, 0x0, 0x0, 0x0, 0xa)
	go/src/github.com/anaskhan96/soup/soup.go:257 +0xa7
main.main()
	test-span.go:17 +0x1dd

Empty strings throw errors on Text()

When doing a FindAll("td") and then calling Text() on a result, a nil pointer error is thrown whenever an empty/nil value is encountered in the slice:

"runtime error: invalid memory address or nil pointer dereference
errorString"

An error object should be returned instead, or an empty string.

Find() returns a Root whose Pointer field may be nil

rsp, err := soup.Get(pSp.pageQueue[pSp.pageIndex])
if err != nil {
	log.Printf("get page : %s, err : %s", pSp.pageQueue[pSp.pageIndex], err)
}
doc := soup.HTMLParse(rsp)
pageExist := doc.Find("div", "class", "page")

pageExist is of type Root:

type Root struct {
	Pointer   *html.Node
	NodeValue string
	Error     error
}

Its Pointer may be nil, so I don't suggest chaining like this, which can sometimes panic:

doc.Find("div", "class", "tags").FindAll("span", "class", "tag-item")

Better to use it like this:

pageExist := doc.Find("div", "class", "page")
if pageExist.Pointer == nil {
	return // or do something else
}
aLinks := pageExist.FindAll("a")

how can I get the nested element?

I want to get an element that contains another element, like this:
html:

<div id="view">
hello
<p>hello</p>
</div>

go:

doc := soup.HTMLParse(html)
text := doc.Find("div", "id", "view").Text()
fmt.Println(text)

In this sample, it just outputs "hello". I want it to output "hello<p>hello</p>". How can I do that?
Thanks for having a look.
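
The output wanted here is effectively innerHTML (see the InnerHTML issue above); concatenating each child's HTML() produces it, modulo surrounding whitespace; a sketch:

div := doc.Find("div", "id", "view")
var out string
for _, child := range div.Children() {
	out += child.HTML()
}
fmt.Println(out) // hello<p>hello</p>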

Hello~ I'm a student at KwangWoon University in Seoul, Korea

Hello! I'm a Korean university student. If it's no trouble to you, I would like to contribute to 'soup'. I'm not a professional programmer, so I can't commit difficult code...

But I can contribute to your repository's More Examples of Soup!
Or I can translate some guidelines.

If you are okay with it, I can do it for a month, and I will open a pull request for you.
Is it okay for me to write examples of 'soup' or do some translation of the guidelines for your repository?

Please reply to me!! :) I will work very hard!!

I'm not very skillful at English, so I'm sorry if you can't read this English.

Sincerely, lionking6792

Anything akin to BeautifulSoup's Comment?

Just curious if soup has anything similar to how BeautifulSoup lets you parse HTML comments in Python?

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#comments-and-other-special-strings

Trying to parse some HTML where some data is commented out, and I am able to do the following in Python:

from bs4 import BeautifulSoup, Comment
comments = soup.find_all(text=lambda text: isinstance(text, Comment))
comments_soup = BeautifulSoup(comments[0], 'lxml')  # e.g. re-parse the first comment

Is there anything close to that here? Or any chance of adding something like it?
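
Nothing equivalent is built in, but since Root exposes the underlying *html.Node, a recursive walk over the parse tree can collect comment nodes; a sketch using golang.org/x/net/html (which soup builds on), where htmlStr is a hypothetical input string:

// import "golang.org/x/net/html"
func collectComments(n *html.Node, out *[]string) {
	if n == nil {
		return
	}
	if n.Type == html.CommentNode {
		*out = append(*out, n.Data) // the comment text, without the <!-- --> markers
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		collectComments(c, out)
	}
}

var comments []string
collectComments(soup.HTMLParse(htmlStr).Pointer, &comments)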

Versioning

Thanks for making an awesome package.

Go Modules can't get the latest version of this package because of the version tag. It can only get 1.0.1, not 1.1.
Can you add new tag named 1.1.0?

How to get http status code?

I want to get the status code from the HTTP request, to make sure the response from the website returns 200 OK before continuing the process.
Sometimes the website returns 404.

Any way to get the HTTP status code?
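
soup.Get() discards the *http.Response, so one approach is to make the request with net/http directly and hand the body to HTMLParse(); a sketch:

resp, err := http.Get("https://example.com") // placeholder URL
if err != nil {
	log.Fatal(err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
	log.Fatalf("unexpected status: %d", resp.StatusCode)
}
body, err := io.ReadAll(resp.Body)
if err != nil {
	log.Fatal(err)
}
doc := soup.HTMLParse(string(body))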

findOnce breaks after the first child node

If the element is not found in the first child node, the value is returned, and the loop has no effect.

I think this should be if q {

for c := n.FirstChild; c != nil; c = c.NextSibling {
	p, q := findOnce(c, args, true, strict)
	if !q {
		return p, q
	}
}

The relevant line, soup/soup.go line 504 at commit cb47551, reads:

if q != false {

Check if element exists without triggering warnings in console?

I'm curious if there's a way to check whether an element exists and get a Boolean back for whether it does, rather than having the console just output something like

2017/06/06 11:21:52 Error occurred in Find() : Element `div` with attributes `class title` not found
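
In current versions, a failed Find() reports through the Error field of the returned Root (see the Root description in the README), which gives a boolean-style check without console output; a sketch:

exists := doc.Find("div", "class", "title").Error == nil
if !exists {
	// element is absent; decide what to do without any logging
}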

Element not found in the examples file weather.go

Hello, help me please.
Code:

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"

	"github.com/anaskhan96/soup"
)

func main() {
	fmt.Printf("Enter the name of the city : ")
	city, _ := bufio.NewReader(os.Stdin).ReadString('\n')
	city = city[:len(city)-1]
	cityInURL := strings.Join(strings.Split(city, " "), "+")
	url := "https://www.bing.com/search?q=weather+" + cityInURL
	resp, err := soup.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	doc := soup.HTMLParse(resp)
	fmt.Println(doc)
	grid := doc.Find("div", "class", "b_antiTopBleed b_antiSideBleed b_antiBottomBleed")
	fmt.Println("grid = ", grid)
	heading := grid.Find("div", "class", "wtr_titleCtrn").Find("div").Text()


GOROOT=C:\Go #gosetup
GOPATH=C:\Users\User\go\src\soup;C:\Users\User\go #gosetup
C:\Go\bin\go.exe build -o C:\Users\User\AppData\Local\Temp___go_build_weather_go.exe C:\Users\User\go\src\soup\weather.go #gosetup
C:\Users\User\AppData\Local\Temp___go_build_weather_go.exe #gosetup
Enter the name of the city : moscow
{0xc0002600e0 html }
grid = { element `div` with attributes `class b_antiTopBleed b_antiSideBleed b_antiBottomBleed` not found}
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x8 pc=0x6628dc]

goroutine 1 [running]:
github.com/anaskhan96/soup.findOnce(0x0, 0xc000471e70, 0x3, 0x3, 0xc000190000, 0xc000471b58, 0x468093)
C:/Users/User/go/src/soup/src/github.com/anaskhan96/soup/soup.go:392 +0x31c
github.com/anaskhan96/soup.Root.Find(0x0, 0x0, 0x0, 0x760a00, 0xc000153820, 0xc000471e70, 0x3, 0x3, 0x0, 0x0, ...)
C:/Users/User/go/src/soup/src/github.com/anaskhan96/soup/soup.go:167 +0x94
main.main()
C:/Users/User/go/src/soup/weather.go:27 +0x5ef

Process finished with exit code 2

go.mod: no matching versions for query "v1.2"

It seems go mod/get is unable to understand shorter version strings

I forked your repo, changed the tag to 1.2.1, and go get worked again.
Currently I'm only able to get v1.1.1.

Could you release a new version with 3 digits?
