Comments (4)
I'm running the query below:
spyql "SELECT count_agg(*) AS n FROM json('file.json')"
and it fails with:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2854: character maps to <undefined>
from spyql.
Hi @LavanyaDS. This is an encoding problem, probably the one described here (http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html#files-in-a-typical-platform-specific-encoding):
UnicodeDecodeError may be thrown when reading such files (if the data is not actually in the encoding returned by locale.getpreferredencoding())
One way to get the preferred encoding is:
$ spyql "IMPORT locale SELECT locale.getpreferredencoding()"
locale_getpreferredencoding
UTF-8
Most likely, the file's encoding does not match your system's preferred encoding. How spyql should support these cases is not completely obvious to me. Maybe spyql should allow specifying the encoding and/or choosing how to handle decoding errors.
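To illustrate the kind of mismatch described above, here is a minimal sketch in plain Python. It assumes the file is UTF-8 and the preferred encoding is cp1252 (a "charmap"-based Windows codec). Neither is confirmed in this thread; it is only one combination that produces the same kind of error for byte 0x9d.

```python
# Assumption (not confirmed in this thread): the file is UTF-8 and the
# system's preferred encoding is cp1252.
# A right double quotation mark (U+201D) is three bytes in UTF-8: e2 80 9d.
utf8_bytes = "\u201d".encode("utf-8")

try:
    # Decoding UTF-8 bytes as cp1252 fails: byte 0x9d is undefined in cp1252.
    utf8_bytes.decode("cp1252")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Decoding with the encoding the bytes were written in works.
# ascii() keeps the output printable on any console.
print(ascii(utf8_bytes.decode("utf-8")))
```

If this is what is happening, the position in the original error (2854) would point at a non-ASCII character in the file, such as a curly quote.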
One thing you can try is reading from the standard input:
$ spyql "SELECT count_agg(*) AS n FROM json" < file.json
Other things you can try:
- changing the system's preferred encoding
- re-encoding the file
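Re-encoding the file can be done with a few lines of Python. The sketch below is only an illustration: `reencode_file` is a hypothetical helper (not part of spyql), and the encodings in the demo (UTF-8 as the source, cp1252 as the target) are assumptions that you would replace with the real ones.

```python
import os
import tempfile


def reencode_file(src_path, dst_path, src_encoding, dst_encoding, errors="strict"):
    """Copy src_path to dst_path, decoding with src_encoding and
    encoding with dst_encoding. `errors` controls what happens to
    characters the target encoding cannot represent."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding=dst_encoding, errors=errors, newline="") as dst:
        for line in src:
            dst.write(line)


# Demo on a temporary file (assumed encodings: UTF-8 in, cp1252 out).
original_text = '{"quote": "\u201chello\u201d"}\n'
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "file.json")
    dst_path = os.path.join(tmp, "file_reencoded.json")
    with open(src_path, "w", encoding="utf-8", newline="") as f:
        f.write(original_text)

    reencode_file(src_path, dst_path, "utf-8", "cp1252")

    # The re-encoded file now decodes cleanly with the target encoding.
    with open(dst_path, "r", encoding="cp1252", newline="") as f:
        roundtrip = f.read()

print(roundtrip == original_text)
```

In practice the target would be whatever `locale.getpreferredencoding()` reports on your machine. With `errors="strict"` the copy fails loudly if the target encoding cannot represent a character, while `errors="replace"` substitutes a placeholder at the cost of losing data.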
Please let me know how it went and whether you need further help (I will need to know which OS you are using).
@LavanyaDS Were you able to solve the problem?
We can prioritise support for different encodings and for handling encoding errors.
@LavanyaDS There is one other option: setting the env variable PYTHONIOENCODING. With it, you can set the encoding and the behavior on encoding errors.
http://docs.python.org/3/using/cmdline.html#envvar-PYTHONIOENCODING
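Note that PYTHONIOENCODING only affects stdin, stdout and stderr, so it helps when the data is piped through standard input rather than opened by file name. Below is a minimal sketch of its effect using a child Python process. The helper `run_with_io_encoding` is hypothetical, and the value `utf-8:replace` assumes the data is UTF-8.

```python
import os
import subprocess
import sys


def run_with_io_encoding(code, data, io_encoding):
    """Run `code` in a child Python process with PYTHONIOENCODING set,
    feeding the raw bytes `data` to its standard input."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", code],
        input=data,
        env=env,
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


# The child reports the stdin encoding, its error handler, and how many
# characters it read from stdin.
child_code = (
    "import sys; "
    "print(sys.stdin.encoding, sys.stdin.errors, len(sys.stdin.read()))"
)

# Three UTF-8 bytes (e2 80 9d) that form a single character, U+201D.
utf8_quote = "\u201d".encode("utf-8")

report = run_with_io_encoding(child_code, utf8_quote, "utf-8:replace")
print(report)
```

The format is `encodingname:errorhandler`, and either part can be omitted. In a shell, the equivalent would be to set the variable before running the command that reads from standard input.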