hipstermojo / paperoni
An article extractor in Rust
License: MIT License
Hey, love your project!
While testing it on one site, I noticed that the resulting epub contains an img src reference to the site's error 404 status page.
Reproduction is trivial: a page containing
<img src="nonexistent"/>
produces an OEBPS/index.html entry like this: <img src="a2dafe1d007e2607f549726f0e7c9886.html"></img>
and an OEBPS/a2dafe1d007e2607f549726f0e7c9886.html containing the error 404 page.
Probably the HTML fetching code in fetch_html in http.rs should be extracted into a helper function and used in download_images too, as it handles both error status codes and redirection.
I'm not sure what the right course of action is when the image is missing. Perhaps insert the original (broken) URL as the source, so that if the problem is transient and the image URL starts to work, readers can load it then.
PS: this bug is only a manifestation of another bug I'm still hunting. It looks as if the image URL gets mangled somehow, and that's what causes the 404. I'll open another issue as soon as I figure out what's happening.
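To make the suggested fallback concrete, here is a minimal sketch in Rust. It is not paperoni's actual code: `ImageSource` and `resolve_image_src` are hypothetical names, and the sketch assumes redirects have already been followed, so any status outside 2xx counts as a failed download.

```rust
/// What to write into an `<img src>` after attempting a download.
/// Hypothetical type for illustration; not part of paperoni.
#[derive(Debug, PartialEq)]
enum ImageSource {
    /// Download succeeded: point at the file stored inside the epub.
    Local(String),
    /// Download failed: keep the original remote URL so a reader
    /// can still load the image if the failure was transient.
    Remote(String),
}

/// Decide the final `src` from the HTTP status of the image request.
/// Assumes redirects were already followed, so only 2xx is a success.
fn resolve_image_src(status: u16, original_url: &str, local_name: &str) -> ImageSource {
    if (200..300).contains(&status) {
        ImageSource::Local(local_name.to_string())
    } else {
        // Never store the body of an error page as if it were an image.
        ImageSource::Remote(original_url.to_string())
    }
}
```

With a check like this, a 404 response would leave the original URL in place instead of saving the error page under a hashed `.html` name.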
I tried cargo (+nightly) install paperoni --version 0.6.1-alpha1 and cargo (+nightly) install --path ., but it says:
C:\Windows\SYSTEM32\cmd.exe - cargo +nightly install --path . Installing paperoni v0.6.1-alpha1 (D:\home\paperoni)
Updating crates.io index
Compiling paperoni v0.6.1-alpha1 (D:\home\paperoni)
Building [=======================> ] 382/383: paperoni(bin)
C
error: linking with `x86_64-w64-mingw32-gcc` failed: exit code: 1
|
= note: "x86_64-w64-mingw32-gcc" "-fno-use-linker-plugin" "-Wl,--dynamicbase" "-Wl,--disable-auto-image-base" "-m64" "-Wl,--high-entropy-va" "C:\\Users\\scillidan\\scoop\\persist\\rustup\
\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\self-contained\\crt2.o" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\ni
ghtly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\rsbegin.o" "C:\\Users\\SCILLI~1\\AppData\\Local\\Temp\\rustcOWz1yl\\symbols.o" "D:\\home\\paperoni\\target\\release\\d
eps\\paperoni-6e3f06bd8efa48db.paperoni.569cee38-cgu.0.rcgu.o" "D:\\home\\paperoni\\target\\release\\deps\\paperoni-6e3f06bd8efa48db.paperoni.569cee38-cgu.1.rcgu.o" "D:\\home\\paperoni\\tar
get\\release\\deps\\paperoni-6e3f06bd8efa48db.paperoni.569cee38-cgu.10.rcgu.o" "D:\\home\\paperoni\\target\\release\\deps\\paperoni-6e3f06bd8efa48db.paperoni.569cee38-cgu.11.rcgu.o" "D:\\ho
me\\paperoni\\target\\release\\deps\\paperoni-6e3f06bd8efa48db.paperoni.569cee38-cgu.12.rcgu.o" "D:\\home\\paperoni\\target\\release\\deps\\paperoni-6e3f06bd8efa48db.paperoni.569cee38-cgu.1
3.rcgu.o" "D:\\home\\paperoni\\target\\release\\deps\\paperoni-6e3f06bd8efa48db.paperoni.569cee38-cgu.14.rcgu.o" "D:\\home\\paperoni\\target\\release\\deps\\paperoni-6e3f06bd8efa48db.papero
ni.569cee38-cgu.15.rcgu.o" "D:\\home\\paperoni\\target\\release\\deps\\paperoni-6e3f06bd8efa48db.paperoni.569cee38-cgu.2.rcgu.o" "D:\\home\\paperoni\\target\\release\\deps\\paperoni-6e3f06b
d8efa48db.paperoni.569cee38-cgu.3.rcgu.o" "D:\\home\\paperoni\\target\\release\\deps\\paperoni-6e3f06bd8efa48db.paperoni.569cee38-cgu.4.rcgu.o" "D:\\home\\paperoni\\target\\release\\deps\\p
aperoni-6e3f06bd8efa48db.paperoni.569cee38-cgu.5.rcgu.o" "D:\\home\\paperoni\\target\\release\\deps\\paperoni-6e3f06bd8efa48db.paperoni.569cee38-cgu.6.rcgu.o" "D:\\home\\paperoni\\target\\r
elease\\deps\\paperoni-6e3f06bd8efa48db.paperoni.569cee38-cgu.7.rcgu.o" "D:\\home\\paperoni\\target\\release\\deps\\paperoni-6e3f06bd8efa48db.paperoni.569cee38-cgu.8.rcgu.o" "D:\\home\\pape
roni\\target\\release\\deps\\paperoni-6e3f06bd8efa48db.paperoni.569cee38-cgu.9.rcgu.o" "D:\\home\\paperoni\\target\\release\\deps\\paperoni-6e3f06bd8efa48db.3u2oyh1wkukqqmhx.rcgu.o" "-L" "D
:\\home\\paperoni\\target\\release\\deps" "-L" "D:\\home\\paperoni\\target\\release\\build\\wepoll-ffi-566b0ea5c26b07a2\\out" "-L" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.cargo\\reg
istry\\src\\github.com-1ecc6299db9ec823\\winapi-x86_64-pc-windows-gnu-0.4.0\\lib" "-L" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.cargo\\registry\\src\\github.com-1ecc6299db9ec823\\win
dows_x86_64_gnu-0.36.1\\lib" "-L" "D:\\home\\paperoni\\target\\release\\build\\curl-sys-883e38784dd70c3f\\out\\build" "-L" "D:\\home\\paperoni\\target\\release\\build\\libnghttp2-sys-e2b80c
b9428da129\\out\\i\\lib" "-L" "D:\\home\\paperoni\\target\\release\\build\\libz-sys-51ef1bb56c5bf4be\\out\\lib" "-L" "D:\\home\\paperoni\\target\\release\\build\\libz-sys-51ef1bb56c5bf4be\\
out\\lib" "-L" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib" "-Wl,-Bstatic" "D:\\home\\paperon
i\\target\\release\\deps\\libsurf-941a87862f03cf61.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libencoding_rs-6d74480f08bc7936.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libhtt
p_client-99e7c2e95ae90adb.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libhttp_types-bf255262bd492fd9.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libserde_qs-1dc99c4759284ce2.rli
b" "D:\\home\\paperoni\\target\\release\\deps\\libanyhow-d052505ef171e716.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libserde_urlencoded-b3ab6941713a2540.rlib" "D:\\home\\paperoni\\t
arget\\release\\deps\\libserde_json-9595b45a4a8b4770.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libryu-1bb4d87da5d581f2.rlib" "D:\\home\\paperoni\\target\\release\\deps\\librand-ee48
e7202ecf5c7a.rlib" "D:\\home\\paperoni\\target\\release\\deps\\librand_pcg-df7f14b0c9258021.rlib" "D:\\home\\paperoni\\target\\release\\deps\\librand_chacha-a6e8af39a2f5accd.rlib" "D:\\home
\\paperoni\\target\\release\\deps\\librand_core-f670257ea51c0fe0.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libgetrandom-68f0723244c0a290.rlib" "D:\\home\\paperoni\\target\\release\\
deps\\libinfer-4a9e4b400e1e98c9.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libcookie-b9dc88837c828cd2.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libhkdf-9acc0d44450110e0.rlib"
"D:\\home\\paperoni\\target\\release\\deps\\libhmac-e35eff58f5a461b3.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libcrypto_mac-fe54f22d8776ead8.rlib" "D:\\home\\paperoni\\target\\rel
ease\\deps\\libsha2-b9fd8f49ed65bcc1.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libcpufeatures-9fc9ed5dd92493c3.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libdigest-dc0c606987
902ccd.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libblock_buffer-c8d8f10d63513bda.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libaes_gcm-48771181c75228bf.rlib" "D:\\home\\pape
roni\\target\\release\\deps\\libghash-61c0bc28789868fa.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libpolyval-fa9b1adbbfa25811.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libcpu
id_bool-edb052b4c344f447.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libuniversal_hash-fc225984447af799.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libsubtle-0f165c6d74565e16.rl
ib" "D:\\home\\paperoni\\target\\release\\deps\\libctr-54cfb23bf8191968.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libaes-c67a5f32b80a286b.rlib" "D:\\home\\paperoni\\target\\release\
\deps\\libaes_soft-7e187c12c5d4ee17.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libopaque_debug-c2191d7fbe4b0d3d.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libcipher-293cfc4277
1f7969.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libaead-bfae4f918ca54ba9.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libgeneric_array-4a2bb9f543ef551b.rlib" "D:\\home\\papero
ni\\target\\release\\deps\\libtypenum-18b79e7d651700ab.rlib" "D:\\home\\paperoni\\target\\release\\deps\\librand-667ba3d5738049d8.rlib" "D:\\home\\paperoni\\target\\release\\deps\\librand_c
hacha-ac1861df96876e75.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libppv_lite86-65b19aa911940063.rlib" "D:\\home\\paperoni\\target\\release\\deps\\librand_core-290302d26cf176da.rlib"
"D:\\home\\paperoni\\target\\release\\deps\\libtime-92498302bd83e567.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libtime_macros-29da1f989b29fbd1.rlib" "D:\\home\\paperoni\\target\\re
lease\\deps\\libstandback-a2b954a01a9e1ba0.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libisahc-46d4d85288b2f2bd.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libsluice-2c6b4d2521
2ef222.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libtracing_futures-45ea5518f8eeb3de.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libtracing-3c2920162aa6adf1.rlib" "D:\\home\\p
aperoni\\target\\release\\deps\\libtracing_core-d1dc8ce030d48f55.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libpin_project-5c765da71849b8a9.rlib" "D:\\home\\paperoni\\target\\release
\\deps\\libhttp-f7befd5f142295d9.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libitoa-07fdda6d2c542af4.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libbytes-bdd84e023ffce305.rlib"
"D:\\home\\paperoni\\target\\release\\deps\\libfnv-f5458277a26775b1.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libbytes-1fcf76527294d030.rlib" "D:\\home\\paperoni\\target\\release\\
deps\\libflume-be9c9ec8df1533a8.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libspinning_top-2be2b4131ac57b22.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libcurl-a27132c48d797bc1
.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libcurl_sys-fd8dae3040406c95.rlib" "D:\\home\\paperoni\\target\\release\\deps\\liblibz_sys-129c1e69ae8d7916.rlib" "D:\\home\\paperoni\\tar
get\\release\\deps\\liblibnghttp2_sys-463b9290be936b7a.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libcrossbeam_utils-d93b578e78e4f21a.rlib" "D:\\home\\paperoni\\target\\release\\deps
\\libmd5-78980f1132112e88.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libderive_builder-3f3f0a5e0b68822e.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libdirectories-57ec9746dd5fd
3f6.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libdirs_sys-5319742fc9ff8253.rlib" "D:\\home\\paperoni\\target\\release\\deps\\liburl-11d4fa7aa775e612.rlib" "D:\\home\\paperoni\\targe
t\\release\\deps\\libidna-9a09dce00c5e3085.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libunicode_normalization-4f3af3f88d455c41.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libt
inyvec-282fcc9c51425297.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libtinyvec_macros-15dd4d62e08289d8.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libunicode_bidi-1c89d5f2d4e737
17.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libform_urlencoded-512cf539ab91fc00.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libpercent_encoding-9157cd0de8f01711.rlib" "D:\\ho
me\\paperoni\\target\\release\\deps\\libfutures-0aa611515e484a17.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libfutures_executor-459860b2650da301.rlib" "D:\\home\\paperoni\\target\\re
lease\\deps\\libfutures_util-9f7d4854ac92ea7e.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libfutures_channel-138ac52f9e2c19bf.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libfutu
res_sink-3ecac82732589a0f.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libfutures_task-11d44f9b7dd6505c.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libpin_utils-2a33a223f5e45755.
rlib" "D:\\home\\paperoni\\target\\release\\deps\\libasync_std-3697a8e51839ceb9.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libasync_global_executor-288ce09e38e4d13c.rlib" "D:\\home\\
paperoni\\target\\release\\deps\\libblocking-5a39e93c076a4804.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libatomic_waker-96bb34526222fa84.rlib" "D:\\home\\paperoni\\target\\release\\
deps\\libnum_cpus-469f0db9ef0bd23e.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libasync_executor-8f8cd993bccee6f0.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libasync_task-02e64
460c597395a.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libasync_io-593ef4d94c425d56.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libslab-0f5c8875e092ddcd.rlib" "D:\\home\\papero
ni\\target\\release\\deps\\libpolling-39c097c08e84c4ac.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libwepoll_ffi-f5568ea5a6a8d168.rlib" "D:\\home\\paperoni\\target\\release\\deps\\lib
socket2-a0281c32fc5938a2.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libfutures_lite-55bf2fe328a0b319.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libfastrand-7cc3aea138fcb154.rl
ib" "D:\\home\\paperoni\\target\\release\\deps\\libwaker_fn-8b2fc604cb7d4f31.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libparking-c91d73b1b6211785.rlib" "D:\\home\\paperoni\\target\
\release\\deps\\libfutures_io-60618bfe0ac76350.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libasync_channel-e28a913838e465c7.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libconcu
rrent_queue-effa9135397a43e3.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libcache_padded-d49296ffa48d11b9.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libasync_lock-737b3f16905da
ff3.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libevent_listener-584356e966bf8b41.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libpin_project_lite-26c55270422aeafc.rlib" "D:\\ho
me\\paperoni\\target\\release\\deps\\libfutures_core-7553c50dad5aaec9.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libkv_log_macro-67eabfc1fbfada7e.rlib" "D:\\home\\paperoni\\target\\r
elease\\deps\\libbase64-2623bdbb38cb76a5.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libkuchiki-495efbcf67a1c557.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libselectors-ec08a01
20521f20f.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libthin_slice-fd010b16451b24df.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libservo_arc-51428185e6d18ed3.rlib" "D:\\home\\p
aperoni\\target\\release\\deps\\libstable_deref_trait-e3dcf1cf1d61086f.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libnodrop-5839ea9cf90d05c5.rlib" "D:\\home\\paperoni\\target\\releas
e\\deps\\libfxhash-93bbfdab7af6d970.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libcssparser-626295b0a9a9df5a.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libitoa-f3ea2c1f79e20bf
9.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libdtoa_short-2f6931e4e2b9ba1d.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libdtoa-ceb60a25af145f5c.rlib" "D:\\home\\paperoni\\targ
et\\release\\deps\\libmatches-673d332bed71cd66.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libhtml5ever-bd1c562db35297d1.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libmarkup5ev
er-143f7f1ef5ea939b.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libstring_cache-39b7e105166aad09.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libprecomputed_hash-cd22d168187bf8a1
.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libphf_shared-182db4deb80648f6.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libparking_lot-9fcf82f1b7bdc6b2.rlib" "D:\\home\\paperoni
\\target\\release\\deps\\libparking_lot_core-abecfde63fbf61b6.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libwindows_sys-eb260f5d66f7c241.rlib" "D:\\home\\paperoni\\target\\release\\d
eps\\libphf-21702447219f6c21.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libphf_shared-3ce254eb735eb12e.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libsiphasher-6a9458c8e4b0c6c3
.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libtendril-43097fc0bfea7dd6.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libutf8-b58eb774f763e1fa.rlib" "D:\\home\\paperoni\\target\\
release\\deps\\libfutf-5e00f70bd211fa86.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libdebug_unreachable-405d3bae4156f0f2.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libmac-a3a2
1fe81790397f.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libepub_builder-4d57c25648e1a411.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libzip-f969dbab4d5225eb.rlib" "D:\\home\\pa
peroni\\target\\release\\deps\\libbyteorder-2e790e8fd0adb0ad.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libcrc32fast-92d382e81b74c0bf.rlib" "D:\\home\\paperoni\\target\\release\\deps
\\libuuid-32d5ba993cbb91ff.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libgetrandom-803cb8b47b65a4f2.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libtempdir-d479cd3e443e6879.rlib
" "D:\\home\\paperoni\\target\\release\\deps\\libremove_dir_all-cd7759012ba7705b.rlib" "D:\\home\\paperoni\\target\\release\\deps\\librand-6368db5252a48ebf.rlib" "D:\\home\\paperoni\\target
\\release\\deps\\libmustache-11d6dcb77e66cfdb.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libserde-dcbe26d07793b373.rlib" "D:\\home\\paperoni\\target\\release\\deps\\liblog-03e60c55c5
900028.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libhtml_escape-85eef5e0d9d12637.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libutf8_width-97fff62e5d3ee15f.rlib" "D:\\home\\pa
peroni\\target\\release\\deps\\liberror_chain-ea7af8eafc046cb7.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libbacktrace-7df18df9c752a9c0.rlib" "D:\\home\\paperoni\\target\\release\\de
ps\\libobject-d06d66cb43930848.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libaddr2line-21b586fa8959438b.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libgimli-94f88a511481939c.rl
ib" "D:\\home\\paperoni\\target\\release\\deps\\librustc_demangle-75e4730839c04250.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libitertools-e108faf68296b97e.rlib" "D:\\home\\paperoni\
\target\\release\\deps\\libeither-7f0f8876d9f0765e.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libflexi_logger-46ff6e540b6c50c0.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libgl
ob-2309bbbf46d0e011.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libyansi-d9418d92cdd41a5d.rlib" "D:\\home\\paperoni\\target\\release\\deps\\liblog-2592442ae2b45a1e.rlib" "D:\\home\\pa
peroni\\target\\release\\deps\\libvalue_bag-50e8b2dc53064c39.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libthiserror-78fb1545069ccd2e.rlib" "D:\\home\\paperoni\\target\\release\\deps
\\libclap-5300119fbcb515c4.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libyaml_rust-0870ffa9b7924b75.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libvec_map-c920af88a15282c2.rlib
" "D:\\home\\paperoni\\target\\release\\deps\\libtextwrap-7c88b45723a62d1b.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libunicode_width-812e654aff2d58dc.rlib" "D:\\home\\paperoni\\tar
get\\release\\deps\\libstrsim-d929cd944081eeee.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libchrono-77e277cec1c61992.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libnum_integer-
49a0edf4ea76e455.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libnum_traits-ff64e9202a6e2cc7.rlib" "D:\\home\\paperoni\\target\\release\\deps\\liblibc-936218c68b4b8bbd.rlib" "D:\\home\
\paperoni\\target\\release\\deps\\libtime-31c04ec7a5ed58ef.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libindicatif-86b0ad211e4e5b36.rlib" "D:\\home\\paperoni\\target\\release\\deps\\
libregex-2ceb1f31594e9516.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libaho_corasick-a8dca5bb2b873a4e.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libmemchr-a9faeb6c6dbe5c59.rli
b" "D:\\home\\paperoni\\target\\release\\deps\\libregex_syntax-12f6c7f4220418a8.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libconsole-4dc14ba48380af28.rlib" "D:\\home\\paperoni\\targ
et\\release\\deps\\libterminal_size-67fc02fe8e2f58fa.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libencode_unicode-649c125072d7b684.rlib" "D:\\home\\paperoni\\target\\release\\deps\\l
ibonce_cell-475b613bd2d1434b.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libnumber_prefix-db23409d4773d83e.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libcomfy_table-7a3280394f5
db496.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libstrum-6e97da475b0e4aa7.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libcrossterm-ff5264640a880de3.rlib" "D:\\home\\paperoni\\
target\\release\\deps\\libparking_lot-9d61e76bddda9149.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libparking_lot_core-bb3ef75db8879b4c.rlib" "D:\\home\\paperoni\\target\\release\\dep
s\\libsmallvec-836c4d0812f4724d.rlib" "D:\\home\\paperoni\\target\\release\\deps\\liblock_api-518ab45b93c70f2c.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libscopeguard-0452d9f37d2542
3f.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libinstant-3ac8d4963e6d96f1.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libcfg_if-f034564c707c0810.rlib" "D:\\home\\paperoni\\targ
et\\release\\deps\\libbitflags-fcf9e03ff1b10fb5.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libcrossterm_winapi-f72ee33b0193f02e.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libc
olored-b0097d4ee55cde61.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libatty-247d29dcd19754da.rlib" "D:\\home\\paperoni\\target\\release\\deps\\libwinapi-63360c72750a2d37.rlib" "D:\\ho
me\\paperoni\\target\\release\\deps\\liblazy_static-2502d935f698c842.rlib" "-Wl,--start-group" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-
gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\libstd-4edeb2b547c3769d.rlib" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\
\x86_64-pc-windows-gnu\\lib\\libpanic_unwind-1ec8c03660bafa69.rlib" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-p
c-windows-gnu\\lib\\libobject-a1587f883db5c56c.rlib" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\
lib\\libmemchr-c47ca0e2f91f3c80.rlib" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\libaddr2li
ne-8b3acd7654cf134f.rlib" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\libgimli-a17b37a6d55f6
0e6.rlib" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\librustc_demangle-5706c0c644fb9483.rli
b" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\libstd_detect-1ae2683e9a90e88e.rlib" "C:\\Use
rs\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\libhashbrown-f4eb45558630f23f.rlib" "C:\\Users\\scillida
n\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\libminiz_oxide-7fa805f5a02308a4.rlib" "C:\\Users\\scillidan\\scoop\\
persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\libadler-970338302df777b4.rlib" "C:\\Users\\scillidan\\scoop\\persist\\rustup\
\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\librustc_std_workspace_alloc-edefbb04e1133746.rlib" "C:\\Users\\scillidan\\scoop\\persist\\rus
tup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\libunwind-5dc216d7b02b1410.rlib" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\t
oolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\libcfg_if-1013dfef24609176.rlib" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nigh
tly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\liblibc-b3414282f43d4367.rlib" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-win
dows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\liballoc-63ed9b2d46d87edf.rlib" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\r
ustlib\\x86_64-pc-windows-gnu\\lib\\librustc_std_workspace_core-043dcb5cef4e65e2.rlib" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib
\\rustlib\\x86_64-pc-windows-gnu\\lib\\libcore-0e5c5feeef4bd6da.rlib" "-Wl,--end-group" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\li
b\\rustlib\\x86_64-pc-windows-gnu\\lib\\libcompiler_builtins-6737bc89b09eace9.rlib" "-Wl,-Bdynamic" "-ladvapi32" "-lws2_32" "-lcrypt32" "-lwindows" "-lbcrypt" "-lwinapi_advapi32" "-lwinapi_
bcrypt" "-lwinapi_cfgmgr32" "-lwinapi_credui" "-lwinapi_crypt32" "-lwinapi_cryptnet" "-lwinapi_fwpuclnt" "-lwinapi_gdi32" "-lwinapi_kernel32" "-lwinapi_msimg32" "-lwinapi_mswsock" "-lwinapi
_ncrypt" "-lwinapi_ntdll" "-lwinapi_ole32" "-lwinapi_opengl32" "-lwinapi_secur32" "-lwinapi_shell32" "-lwinapi_synchronization" "-lwinapi_user32" "-lwinapi_winspool" "-lwinapi_ws2_32" "-lad
vapi32" "-luserenv" "-lkernel32" "-lws2_32" "-lbcrypt" "-lgcc_eh" "-l:libpthread.a" "-lmsvcrt" "-lmingwex" "-lmingw32" "-lgcc" "-lmsvcrt" "-luser32" "-lkernel32" "-Wl,--nxcompat" "-nostartf
iles" "-L" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib" "-L" "C:\\Users\\scillidan\\scoop\\pe
rsist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\self-contained" "-o" "D:\\home\\paperoni\\target\\release\\deps\\paperoni-6e3f06
bd8efa48db.exe" "-Wl,--gc-sections" "-no-pie" "-Wl,-O1" "-nodefaultlibs" "C:\\Users\\scillidan\\scoop\\persist\\rustup\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86
_64-pc-windows-gnu\\lib\\rsend.o"
= note: D:\home\paperoni\target\release\deps\libcurl_sys-fd8dae3040406c95.rlib(url.o):url.c:(.text$Curl_init_userdefined+0xb): undefined reference to `__imp___acrt_iob_func'
D:\home\paperoni\target\release\deps\libcurl_sys-fd8dae3040406c95.rlib(url.o):url.c:(.text$Curl_open+0x73): undefined reference to `__imp___acrt_iob_func'
D:\home\paperoni\target\release\deps\libcurl_sys-fd8dae3040406c95.rlib(cookie.o):cookie.c:(.text$Curl_cookie_init+0x4e): undefined reference to `__imp___acrt_iob_func'
D:\home\paperoni\target\release\deps\libcurl_sys-fd8dae3040406c95.rlib(cookie.o):cookie.c:(.text$Curl_flush_cookies+0x88): undefined reference to `__imp___acrt_iob_func'
D:\home\paperoni\target\release\deps\libcurl_sys-fd8dae3040406c95.rlib(doh.o):doh.c:(.text$dohprobe+0x35e): undefined reference to `__imp___acrt_iob_func'
D:\home\paperoni\target\release\deps\libcurl_sys-fd8dae3040406c95.rlib(mprintf.o):mprintf.c:(.text$curl_mprintf+0x29): more undefined references to `__imp___acrt_iob_func' follow
= help: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified
= note: use the `-l` flag to specify native libraries to link
= note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo (see https://doc.rust-lang.org/cargo/reference/build-scripts.html#cargorustc-link-libkindname)
error: could not compile `paperoni` due to previous error
error: failed to compile `paperoni v0.6.1-alpha1 (D:\home\paperoni)`, intermediate artifacts can be found at `D:\home\paperoni\target`
I installed nuwen-mingw-gcc, and it looks like the error is related to that.
I'm a beginner. I read the hints in the error output, but I still couldn't get it to build :(
Produced with version 0.4.0-alpha1.
Some pages have some of their img src attributes changed by the readability code, resulting in broken URLs that cause 404 errors during fetching.
One page where I've noticed this bug is https://blindsquirrel.substack.com/p/etsy-the-power-of-special
Here the first image ("Strong & Competitive Market Position") gets mangled. Its img node is this (it's quite funky and has a data-attrs attribute, but the src URL works):
<img src="https://cdn.substack.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba6067a-991d-47a0-b1ea-b24adf135f82_1099x454.png" data-attrs="{"src":"https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7ba6067a-991d-47a0-b1ea-b24adf135f82_1099x454.png","height":454,"width":1099,"resizeWidth":null,"bytes":177952,"alt":null,"title":null,"type":"image/png","href":null}" alt="">
Then, somehow, during readability processing the src attribute gets changed to the value below. It appears to be the same as the data-attrs attribute value, just url-decoded (the \ escaping is introduced during logging):
{\"src\":\"https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7ba6067a-991d-47a0-b1ea-b24adf135f82_1099x454.png\",\"height\":454,\"width\":1099,\"resizeWidth\":null,\"bytes\":177952,\"alt\":null,\"title\":null,\"type\":\"image/png\",\"href\":null}
As it no longer looks like an absolute URL (it starts with {"src": "https:// instead of https://), fix_relative_uris will prepend the original base URL, https://blindsquirrel.substack.com/p/, resulting in this broken URL:
https://blindsquirrel.substack.com/p/%7B%22src%22:%22https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7ba6067a-991d-47a0-b1ea-b24adf135f82_1099x454.png%22,%22height%22:454,%22width%22:1099,%22resizeWidth%22:null,%22bytes%22:177952,%22alt%22:null,%22title%22:null,%22type%22:%22image/png%22,%22href%22:null%7D, new_url=https://blindsquirrel.substack.com/p/%7B%22src%22:%22https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7ba6067a-991d-47a0-b1ea-b24adf135f82_1099x454.png%22,%22height%22:454,%22width%22:1099,%22resizeWidth%22:null,%22bytes%22:177952,%22alt%22:null,%22title%22:null,%22type%22:%22image/png%22,%22href%22:null%7D
The weird thing is that not all images have this problem. Here's an img node that does not get mangled:
<img src="https://cdn.substack.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9f4b3eaa-0537-403d-b9c8-a91f88724cc8_959x617.png" data-attrs="{"src":"https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/9f4b3eaa-0537-403d-b9c8-a91f88724cc8_959x617.png","height":617,"width":959,"resizeWidth":576,"bytes":null,"alt":"Focus on Marketplace \nin Six Core Geographies \nCore markets have our largest concentrations ofbuyers and \nsellers and present the most significant growth opportunities \nTOP GEOGRAPHIES: \nWe are building local marketplaces globally and deepening \nlocal Etsy communities globally \nLarge and Growing Addressable Market \nEstimated 2018 Total Retail and Online TAM \nSl.7T \neat \n049B \nto ALL \nretail top geqraphies \n-SIOOB \nGER \"ANY \n8.9B \n2018 GLOBAL GMS \nETSY MARKET SHARE ","title":null,"type":null,"href":null}" alt="Focus on Marketplace
in Six Core Geographies
Core markets have our largest concentrations ofbuyers and
sellers and present the most significant growth opportunities
TOP GEOGRAPHIES:
We are building local marketplaces globally and deepening
local Etsy communities globally
Large and Growing Addressable Market
Estimated 2018 Total Retail and Online TAM
Sl.7T
eat
049B
to ALL
retail top geqraphies
-SIOOB
GER "ANY
8.9B
2018 GLOBAL GMS
ETSY MARKET SHARE ">
The only difference I've noticed is that every img node that fails has an alt="" attribute and every good img has a non-empty alt attribute.
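One possible guard against this kind of mangling, sketched in Rust under stated assumptions: before any attribute value replaces src, check that it could plausibly be a URL. The function names `is_plausible_src` and `choose_src` are hypothetical and are not taken from paperoni. The sketch does not claim to identify which readability step copies data-attrs into src; it only shows how a JSON blob could be rejected as a candidate.

```rust
/// Returns true if `value` could plausibly be used as an image URL.
/// Hypothetical guard for illustration; not paperoni's actual code.
fn is_plausible_src(value: &str) -> bool {
    let v = value.trim();
    if v.is_empty() {
        return false;
    }
    // JSON blobs such as the `data-attrs` value start with `{` or `[`.
    if v.starts_with('{') || v.starts_with('[') {
        return false;
    }
    // A usable URL contains no raw whitespace or double quotes.
    !v.chars().any(|c| c.is_whitespace() || c == '"')
}

/// Keep the current `src` unless the candidate passes the guard.
fn choose_src<'a>(current_src: &'a str, candidate: &'a str) -> &'a str {
    if is_plausible_src(candidate) {
        candidate
    } else {
        current_src
    }
}
```

With such a check, the data-attrs value from the failing node would be rejected and the working cdn.substack.com URL would stay in src, so fix_relative_uris would have nothing to prepend.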
Would it be possible to include the page title (the first header) in the epub chapters? Maybe as an option?
I'm using --merge more now, but it's a bit confusing at times when I'm reading the generated epub and a new "chapter" starts without the original title for context. Even in single-URL ebooks, I sometimes miss the title at the top of the document.
Hey! I'm interested in a Rust library that can perform the "mozilla readability" algorithm. I noticed that paperoni is not using any third party dependency to do the algorithm, does that mean that you implemented it yourself?
Is it possible to use paperoni as a library?
I noticed there is a Rust version of the readability library: https://github.com/kumabook/readability
Have you considered using that and contributing to it?
Cheers, great work!
Hi! I've been using paperoni for a few weeks to load web articles on my e-reader, and I'm very grateful that you made this and gave it to the world!
One feature I'm missing is the ability to have multiple URLs bundled together as a single epub file. E.g. for multi-part blog articles or actual books that are published on the web with a URL per chapter.
Do you think it could be interesting to add this to paperoni?
I've done a bit of Rust, so if you're interested but don't have time to implement it, I might be able to contribute with some guidance to navigate the code.
I recently installed paperoni 0.5.0 (upgrading from 0.4.1), and I suspect the new CSS is interacting with the epub rendering on my e-reader (reMarkable 2) in such a way that the text becomes so small it is barely readable anymore. In addition, though it's a minor issue, the spacing around the titles has become a bit extreme, at least to my taste.
Here's an example of the first two pages of an article converted to epub by paperoni and rendered on the tablet.
My understanding is that consistent epub rendering is hard as every device/manufacturer/app does things differently (a bit like the web of two decades ago...)
Maybe paperoni could have an option to disable the custom CSS to work around issues like this?
Hi, is this project still being developed or maintained? I see from your comment in #22 that you had intended to do a release in late Jan 2022. You appear not to have made any changes since.
I had been thinking about developing just this sort of tool in Rust before I found paperoni, and would be happy to work with you on it or, if you are no longer interested in the project, to adopt it.