Comments (5)

lipoja commented on May 26, 2024

Thank you for using this library. And bigger THANKS for reporting this issue! I will check it and try to fix it soon.

from urlextract.

dominikstraessle commented on May 26, 2024

I've got the same issue with the following error message:

ValueError                                Traceback (most recent call last)
<ipython-input-158-447ea8a54a6f> in <module>()
      1 text = '[img: http://www.newsisfree.com/images/fark/smh.com.au.gif ([smh.com.au])]'
----> 2 urls = list(set(url_extractor.find_urls(text)))
      3 urls

~/.local/lib/python3.6/site-packages/urlextract/urlextract_core.py in find_urls(self, text, only_unique)
    754         urls = self.gen_urls(text)
    755         urls = set(urls) if only_unique else urls
--> 756         return list(urls)
    757 
    758     def has_urls(self, text):

~/.local/lib/python3.6/site-packages/urlextract/urlextract_core.py in gen_urls(self, text)
    737             validated = self._validate_tld_match(text, tld, offset + tld_pos)
    738             if tld_pos != -1 and validated:
--> 739                 tmp_url = self._complete_url(text, offset + tld_pos, tld)
    740                 if tmp_url:
    741                     yield tmp_url

~/.local/lib/python3.6/site-packages/urlextract/urlextract_core.py in _complete_url(self, text, tld_pos, tld)
    558         complete_url = self._remove_enclosure_from_url(
    559             complete_url, tld_pos-start_pos, tld)
--> 560         if not self._is_domain_valid(complete_url, tld):
    561             return ""
    562 

~/.local/lib/python3.6/site-packages/urlextract/urlextract_core.py in _is_domain_valid(self, url, tld)
    630         # <scheme>://<authority>/<path>?<query>#<fragment>
    631 
--> 632         host = url_parts.gethost()
    633         if not host:
    634             return False

~/.local/lib/python3.6/site-packages/uritools/split.py in gethost(self, default, errors)
    155             return _ip_literal(host[1:-1])
    156         elif host.startswith(self.LBRACKET) or host.endswith(self.RBRACKET):
--> 157             raise ValueError('Invalid host %r' % host)
    158         # TODO: faster check for IPv4 address?
    159         return _ipv4_address(host) or uridecode(host, 'utf-8', errors).lower()

ValueError: Invalid host 'smh.com.au]'

Code:

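from urlextract import URLExtract

url_extractor = URLExtract()  # assumed setup; not shown in the original report
text = '[img: http://www.newsisfree.com/images/fark/smh.com.au.gif ([smh.com.au])]'
urls = list(set(url_extractor.find_urls(text)))
urls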

I'm using Python 3.6.5 and IPython 6.4.0
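
For readers following the traceback: the last frame rejects a host that has a bracket on only one side, which is what remains after the extractor trims `[smh.com.au]` down to `smh.com.au]`. The sketch below mimics that check in plain Python. `check_host` is a hypothetical, simplified stand-in written for illustration, not the actual uritools code.

```python
def check_host(host):
    """Simplified, hypothetical version of the bracket check in the traceback."""
    # A host wrapped in matching brackets is treated as a bracketed IP literal.
    if host.startswith("[") and host.endswith("]"):
        return host[1:-1]
    # A bracket on only one side cannot be a valid host.
    if host.startswith("[") or host.endswith("]"):
        raise ValueError("Invalid host %r" % host)
    return host.lower()


try:
    check_host("smh.com.au]")
except ValueError as exc:
    message = str(exc)  # same wording as the reported error

clean = check_host("SMH.com.au")  # a host without brackets passes
```

In this sketch only the trailing `]` triggers the error, so the failure depends on how the candidate URL was trimmed before validation.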

lipoja commented on May 26, 2024

@dominikstraessle Hi, could you tell me what version of urlextract you are using? I cannot reproduce your error with the provided text and the current version (0.9) of urlextract.

OK, I managed to reproduce the error. Thanks.

lipoja commented on May 26, 2024

The issue should be fixed; I've added both texts to the test files.

@thoppe Right now urlextract does not return the text "et.al.[10]" as a URL. If you are parsing something specific, you might use your own settings that fit your needs, for example by setting the stop characters with set_stop_chars_right and set_stop_chars_left.
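
To illustrate what stop characters do, here is a minimal standalone sketch. It is not urlextract's implementation: `expand_around` and `DEFAULT_STOP_CHARS` are hypothetical names, and the default set is an assumption chosen for the example. The idea is that a candidate URL grows left and right from the TLD position until it meets a stop character, so adding brackets to the stop sets keeps them out of the result.

```python
# Hypothetical illustration of left/right stop characters; NOT urlextract's code.
DEFAULT_STOP_CHARS = set(" \t\n\r")  # assumed default: whitespace only


def expand_around(text, pos, stop_left, stop_right):
    """Return the substring around index `pos`, bounded by stop characters."""
    start = pos
    while start > 0 and text[start - 1] not in stop_left:
        start -= 1
    end = pos
    while end < len(text) and text[end] not in stop_right:
        end += 1
    return text[start:end]


text = "see ([smh.com.au]) for details"
pos = text.index(".com")  # position of the TLD-like match

# With whitespace-only stop characters the brackets stay attached.
loose = expand_around(text, pos, DEFAULT_STOP_CHARS, DEFAULT_STOP_CHARS)

# Adding opening brackets on the left and closing ones on the right trims them.
strict = expand_around(
    text,
    pos,
    DEFAULT_STOP_CHARS | set("(["),
    DEFAULT_STOP_CHARS | set(")]"),
)
```

Here `loose` keeps the surrounding `([` and `])`, while `strict` is just the bare domain. With urlextract itself, the equivalent adjustment would go through the set_stop_chars_left and set_stop_chars_right methods named above.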

thoppe commented on May 26, 2024
