Comments (14)

thp commented on June 30, 2024

What is your system locale and what does your shell script output?

from urlwatch.

marbon87 commented on June 30, 2024

» locale
LANG=de_DE.UTF-8
LC_CTYPE=de_DE.UTF-8
LC_NUMERIC=de_DE.UTF-8
LC_TIME=de_DE.UTF-8
LC_COLLATE=de_DE.UTF-8
LC_MONETARY=de_DE.UTF-8
LC_MESSAGES=de_DE.UTF-8
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

The shell script prints a text file via cat.

thp commented on June 30, 2024

OK, thanks. And which encoding is the text file in? (The file utility might give a clue, or enca or something similar.)

marbon87 commented on June 30, 2024

The file utility says the following:

» file file.txt
file.txt: UTF-8 Unicode text

thp commented on June 30, 2024

Thanks, I can work with that. It probably makes the most sense for urlwatch to treat shell output as being in whatever encoding the system locale is set to; that is likely the best assumption we can make there.
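
As a rough illustration of that idea (not urlwatch's actual implementation), here is a minimal sketch that decodes a command's output using the locale's preferred encoding. The helper name run_and_decode and its encoding parameter are hypothetical, and errors="replace" is an assumption about how undecodable bytes should be handled.

```python
import locale
import subprocess
import sys


def run_and_decode(command, encoding=None):
    """Run a command and decode its stdout (hypothetical helper)."""
    if encoding is None:
        # Ask the locale which encoding it prefers, e.g. UTF-8 under
        # de_DE.UTF-8; the exact name for a POSIX/C locale varies.
        encoding = locale.getpreferredencoding(False)
    raw = subprocess.run(command, stdout=subprocess.PIPE, check=True).stdout
    # Assumption: replace undecodable bytes instead of raising.
    return raw.decode(encoding, errors="replace")


# The child writes the UTF-8 bytes for "Sapperlöt" directly to stdout.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(b'Sapperl\\xc3\\xb6t')"]

as_utf8 = run_and_decode(child, encoding="utf-8")
as_ascii = run_and_decode(child, encoding="ascii")
print(ascii(as_utf8))
print(ascii(as_ascii))
```

Decoded as UTF-8 the text comes back intact; decoded as ASCII, each of the two bytes of the umlaut becomes a replacement character, which is the kind of damage a POSIX locale can cause.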

thp commented on June 30, 2024

What does Python3's sys.getdefaultencoding() give?

% python3
Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
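
For comparison, a small sketch that prints the other encodings Python consults. On Python 3, sys.getdefaultencoding() reports 'utf-8' independent of the locale, so it does not by itself show which encoding is used for terminal or pipe output. The function name describe_encodings is made up for this example.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings relevant to reading and printing text."""
    return {
        # Encoding used for implicit str/bytes conversions; 'utf-8' on Python 3.
        "default": sys.getdefaultencoding(),
        # Encoding derived from the locale (LANG, LC_ALL, LC_CTYPE).
        "preferred": locale.getpreferredencoding(False),
        # Encoding that print() uses when writing to stdout.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, value in describe_encodings().items():
    print("{}: {}".format(name, value))
```

Running this once from an interactive shell and once from cron should show whether the two environments differ.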

thp commented on June 30, 2024

I cannot reproduce this issue here on Ubuntu:

% hexdump -C ~/foo
00000000  23 21 2f 62 69 6e 2f 73  68 0a 0a 65 63 68 6f 20  |#!/bin/sh..echo |
00000010  22 53 61 70 70 65 72 6c  c3 b6 74 3f 22 0a        |"Sapperl..t?".|
0000001e
% set | egrep -a "^(LC_|LANG)"
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_ADDRESS=de_AT.UTF-8
LC_IDENTIFICATION=de_AT.UTF-8
LC_MEASUREMENT=de_AT.UTF-8
LC_MONETARY=de_AT.UTF-8
LC_NAME=de_AT.UTF-8
LC_NUMERIC=de_AT.UTF-8
LC_PAPER=de_AT.UTF-8
LC_TELEPHONE=de_AT.UTF-8
LC_TIME=de_AT.UTF-8
% ./urlwatch --list
1: ~/foo
% ./urlwatch
===========================================================================
01. CHANGED: ~/foo
===========================================================================

---------------------------------------------------------------------------
CHANGED: ~/foo
---------------------------------------------------------------------------
--- @   Fri, 12 Feb 2016 10:13:27 +0100
+++ @   Fri, 12 Feb 2016 10:14:08 +0100
@@ -1 +1 @@
-Sapperlöt
+Sapperlöt?

---------------------------------------------------------------------------


-- 
urlwatch 2.1, Copyright 2008-2016 Thomas Perl
Website: http://thp.io/2008/urlwatch/
watched 1 URLs in 0 seconds

marbon87 commented on June 30, 2024

>>> sys.getdefaultencoding()
'utf-8'

marbon87 commented on June 30, 2024

Can you give me your test script?

thp commented on June 30, 2024

By the way, the byte \xfc is 'ü' in latin-1; on its own it is not valid utf-8:

>>> b'\xfc'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 0: invalid start byte
>>> b'\xfc'.decode('latin-1')
'ü'
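
The same point from the encoding side, as a small sketch: 'ü' is the single byte 0xfc in latin-1 but the two bytes 0xc3 0xbc in utf-8. The helper decode_with_fallback is hypothetical and only illustrates trying utf-8 first; because latin-1 accepts every byte, it works as a last resort but can silently produce the wrong characters.

```python
text = "\u00fc"  # the character 'ü'

utf8_bytes = text.encode("utf-8")      # two bytes: c3 bc
latin1_bytes = text.encode("latin-1")  # one byte: fc


def decode_with_fallback(data, encodings=("utf-8", "latin-1")):
    """Try each encoding in order; return (text, encoding_used)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the encodings could decode the data")


print(utf8_bytes, latin1_bytes)
print(decode_with_fallback(latin1_bytes))
print(decode_with_fallback(utf8_bytes))
```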

marbon87 commented on June 30, 2024

It seems to have something to do with wrong locales. The urlwatch job is executed by cron, and there I have the following locale settings:

LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

I will try to set the locale in the script, but I am not sure whether that also works for urlwatch.
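
One related option, sketched here under assumptions: when a Python program is started from a wrapper, the wrapper can pass an explicit environment to the child. The example uses PYTHONIOENCODING, a standard CPython environment variable that overrides the stdio encoding. It is a Python-specific alternative, not the locale approach itself, and whether it suits a given urlwatch setup is untested here. The function name run_with_utf8_stdio is hypothetical.

```python
import os
import subprocess
import sys


def run_with_utf8_stdio(args):
    """Run a Python child process with UTF-8 stdio (hypothetical helper)."""
    env = dict(os.environ)
    # Assumption: the child is a Python program, so PYTHONIOENCODING applies.
    env["PYTHONIOENCODING"] = "utf-8"
    result = subprocess.run(args, env=env, stdout=subprocess.PIPE, check=True)
    return result.stdout


# The child prints U+00FC; with UTF-8 stdio this arrives as bytes c3 bc.
out = run_with_utf8_stdio([sys.executable, "-c", "print('\\xfc')"])
print(out.strip())
```

Setting LANG or LC_ALL to a UTF-8 locale in the crontab or wrapper script is the more general route, provided that locale is installed on the system.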

marbon87 commented on June 30, 2024

Now I get the mail with the changes, but also an exception in the cron log:

Traceback (most recent call last):
 File "/usr/bin/urlwatch", line 376, in <module>
   main(parser.parse_args())
 File "/usr/bin/urlwatch", line 343, in main
   report.finish()
 File "/usr/lib/python3.5/site-packages/urlwatch/handler.py", line 128, in finish
   ReporterBase.submit_all(self, self.job_states, duration)
 File "/usr/lib/python3.5/site-packages/urlwatch/reporters.py", line 81, in submit_all
   cls(report, cfg, job_states, duration).submit()
 File "/usr/lib/python3.5/site-packages/urlwatch/reporters.py", line 296, in submit
   print(self._green(line))
UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 6: ordinal not in range(128)
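
The traceback above comes from print() writing a non-ASCII character to a stdout whose encoding is ASCII. A minimal sketch of one way to make such output robust, assuming escaped characters are acceptable in the log: wrap the binary stream with an error handler that escapes what it cannot encode. The name make_safe_writer is hypothetical and this is not what urlwatch does; an in-memory buffer stands in for sys.stdout.buffer.

```python
import io


def make_safe_writer(binary_stream, encoding="ascii"):
    """Wrap a binary stream so unencodable characters are escaped."""
    return io.TextIOWrapper(
        binary_stream,
        encoding=encoding,
        errors="backslashreplace",  # write \xfc instead of raising
        newline="\n",               # keep line endings untranslated
    )


# Plain ASCII encoding fails, as in the traceback.
try:
    "\u00fc".encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

buffer = io.BytesIO()  # stand-in for sys.stdout.buffer
writer = make_safe_writer(buffer)
print("ge\u00e4ndert: gr\u00fc\u00dfe", file=writer)
writer.flush()
print(strict_failed, buffer.getvalue())
```

The report text is preserved in escaped form rather than lost, and the run no longer ends with an exception.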

marbon87 commented on June 30, 2024

OK, I solved it by disabling detailed output to the console in the config:

    details: false

thp commented on June 30, 2024

Opened #51 as a possible fix for users in case somebody else runs into this.
