Contents of our config file:
# pywb config file
# ========================================
#
# Settings for each collection
collections:
    # <name>: <cdx_path>
    # collection will be accessed via /<name>
    # <cdx_path> is a string or list of:
    #  - string or list of one or more local .cdx files
    #  - string or list of one or more local dirs with .cdx files
    #  - a string value indicating remote http cdx server
    ArtBase: /Users/rhiz/Desktop/my_archive/cdx/
    # ex with filtering: filter CDX lines by filename starting with 'dupe'
    #pywb-filt: {'index_paths': './sample_archive/cdx/', 'filters': ['filename:dupe*']}
# indicate if cdx files are sorted by SURT keys -- eg: com,example)/
# SURT keys are recommended for future indices, but non-SURT cdxs
# are also supported
#
# * Set to true if cdxs start with surts: com,example)/
# * Set to false if cdxs start with urls: example.com)/
#
# default:
# surt_ordered: true
# list of path prefixes pywb uses to 'resolve' WARC and ARC filenames
# in the cdx to their absolute path
#
# if path is:
# * local dir, use path as prefix
# * local file, lookup prefix in tab-delimited sorted index
# * http:// path, use path as remote prefix
# * redis:// path, use redis to lookup full path for w:<warc> as key
archive_paths: /Users/rhiz/Desktop/my_archive/warcs/
# The following are default settings -- uncomment to change
# Set to '' to disable the ui
# ==== UI: HTML/Jinja2 Templates ====
# template for <head> insert into replayed html content
#head_insert_html: ui/head_insert.html
# template for 'calendar' query,
# eg, a listing of captures in response to a ../*/<url>
#
# may be a simple listing or a more complex 'calendar' UI
# if omitted, will list raw cdx in plain text
#query_html: ui/query.html
# template for search page, which is displayed when no search url is entered
# in a collection
#search_html: ui/search.html
# template for home page.
# if no other route is set, this will be rendered at /, /index.htm and /index.html
#home_html: ui/index.html
# error page template for formatting error message and details
# if omitted, a text response is returned
#error_html: ui/error.html
# ==== Other Paths ====
# list of host names that pywb will be running from to detect
# 'fallthrough' requests based on referrer
#
# eg: an incorrect request for http://localhost:8080/image.gif with a referrer
# of http://localhost:8080/pywb/index.html, pywb can correctly redirect
# to http://localhost:8080/pywb/image.gif
#
#hostpaths: ['http://localhost:8080']
# Rewrite urls with absolute paths instead of relative
#absoulte_paths: true
# List of route names:
# <route>: <package or file path>
# default route static/default for pywb defaults
static_routes:
    static/default: pywb/static/
# ==== New / Experimental Settings ====
# Not yet production ready -- used primarily for testing
# Enable simple http proxy mode
enable_http_proxy: true
# enable cdx server api for querying cdx directly (experimental)
enable_cdx_api: true
# custom rules for domain specific matching
# set to false to disable
#domain_specific_rules: rules.yaml
# Memento support, enable
enable_memento: true
# Use lxml parser, if available
use_lxml_parser: false
# Replay content in an iframe
framed_replay: true
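The `surt_ordered` comments in the config contrast SURT keys (`com,example)/`) with plain URL keys. As a rough illustration only, the hypothetical `to_surt_key` helper below builds a simplified SURT-style key by dropping the scheme and port, lowercasing the host, removing a leading `www.`, reversing the host labels, and appending `)` plus the path and query. It is not pywb's canonicalizer: real SURT canonicalization (for example, the `surt` Python package) also normalizes ports, query-argument order, and more.

```python
from urllib.parse import urlsplit


def to_surt_key(url: str) -> str:
    """Hypothetical, simplified SURT-style key for illustration only."""
    parts = urlsplit(url)
    # .hostname is already lowercased and excludes any port number.
    host = parts.hostname or ""
    # Simplification: treat www.example.com and example.com as the same host.
    if host.startswith("www."):
        host = host[len("www."):]
    # Reverse the dot-separated host labels and join them with commas.
    reversed_host = ",".join(reversed(host.split(".")))
    path = parts.path or "/"
    query = "?" + parts.query if parts.query else ""
    return f"{reversed_host}){path}{query}"


print(to_surt_key("http://www.example.com/"))  # com,example)/
```

Because the reversed host comes first, keys for a domain and its subdomains share the same leading characters (`com,example`), so a SURT-sorted index keeps them close together.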
from pywb.
The issue was caused by improper encoding detection. To solve this issue, and potentially others, HTML rewriting is being switched to operate on raw bytes, as suggested by @despens.
Since most encodings are ASCII-compatible, this should lead to better results. UTF-16 and other rare encodings that are not ASCII-compatible will still need to be detected and properly decoded, but in general this approach looks like it will work.
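The raw-bytes approach can be sketched in Python as follows, assuming a deliberately tiny rewriter that only prefixes absolute URLs in `href`/`src` attributes. Because the markup and the URLs are ASCII, a bytes regex works unchanged for any ASCII-compatible encoding (UTF-8, Latin-1, Windows-1252) without knowing which one the page uses; only content that starts with a UTF-16 or UTF-32 byte-order mark is decoded first. The names `rewrite_html` and `detect_non_ascii_compatible` and the `/pywb/20140101000000/` prefix are hypothetical; this is not the code from 70b7e29.

```python
import codecs
import re

# Hypothetical, simplified rewriter: matches href="http(s)://..." and src=... only.
ATTR_URL = re.compile(rb'(\b(?:href|src)\s*=\s*["\'])(https?://)', re.IGNORECASE)

# BOMs of encodings that are NOT ASCII-compatible and need real decoding.
# UTF-32 BOMs are checked first because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
NON_ASCII_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_non_ascii_compatible(data: bytes):
    """Return a codec name if data starts with a UTF-16/32 BOM, else None."""
    for bom, name in NON_ASCII_BOMS:
        if data.startswith(bom):
            return name
    return None


def rewrite_html(data: bytes, prefix: bytes) -> bytes:
    """Prefix absolute URLs in href/src attributes, working on raw bytes."""
    codec = detect_non_ascii_compatible(data)
    if codec is None:
        # ASCII-compatible input: no decoding needed, and non-ASCII text
        # elsewhere in the page is copied through byte-for-byte.
        return ATTR_URL.sub(lambda m: m.group(1) + prefix + m.group(2), data)
    # Rare case: decode, rewrite as text, then re-encode (output gets a BOM).
    text = data.decode(codec)
    text_re = re.compile(ATTR_URL.pattern.decode("ascii"), re.IGNORECASE)
    str_prefix = prefix.decode("ascii")
    rewritten = text_re.sub(lambda m: m.group(1) + str_prefix + m.group(2), text)
    return rewritten.encode(codec)


page = '<a href="http://example.com/">café</a>'.encode("latin-1")
print(rewrite_html(page, b"/pywb/20140101000000/"))
```

BOM-less UTF-16 is not detected here; handling it would need another signal, such as a declared charset, which this sketch leaves out.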
Implemented in 70b7e29, to be part of the 0.4.7 release.
Related Issues (20)
- Canonicalize non-GET URLs with native JSON values
- Some pages cannot be replayed successfully
- Not replaying XHR POST request from legacy collection
- XML files not replaying with included XSL
- PYWB stripping out part of URLs on timeline page <url>#/<something>
- Pywb failing to handle self-redirects from OutbackCDX
- String not set as translatable in template
- Indexing Errors with YouTube JSON in POST Request Payload
- switch_locale not adding locale if missing from URL
- No search results (by domain) if default_locale set
- Strings not translatable in VueUI by default
- URL of the zoom image in VueUI not using static_prefix
- warc.gz files created by grab-site throw multiple errors when adding to a collection
- `this` rewriting can affect dynamically generated content on the page
- Invited card
- Allow for catch-all wildcard *, in ACLs
- relative imports in .js are not loaded
- Replaying webpack content
- pywb record inserting domain and collection name into recorded URL on specific sites
- Vimeo HLS/m3u8 download support