Giter Club home page Giter Club logo

Comments (3)

protonpopsicle avatar protonpopsicle commented on May 28, 2024

contents of our config file:

# pywb config file
# ========================================
#
# Settings for each collection

collections:
    # <name>: <cdx_path>
    # collection will be accessed via /<name>
    # <cdx_path> is a string or list of:
    #  - string or list of one or more local .cdx file
    #  - string or list of one or more local dirs with .cdx files
    #  - a string value indicating remote http cdx server
    ArtBase: /Users/rhiz/Desktop/my_archive/cdx/

    # ex with filtering: filter CDX lines by filename starting with 'dupe'
    #pywb-filt: {'index_paths': './sample_archive/cdx/', 'filters': ['filename:dupe*']}

# indicate if cdx files are sorted by SURT keys -- eg: com,example)/
# SURT keys are recommended for future indices, but non-SURT cdxs
# are also supported
#
#   * Set to true if cdxs start with surts: com,example)/
#   * Set to false if cdx start with urls: example.com)/
#
# default:
# surt_ordered: true

# list of paths prefixes for pywb look to 'resolve'  WARC and ARC filenames
# in the cdx to their absolute path
#
# if path is:
#   * local dir, use path as prefix
#   * local file, lookup prefix in tab-delimited sorted index
#   * http:// path, use path as remote prefix
#   * redis:// path, use redis to lookup full path for w:<warc> as key

archive_paths: /Users/rhiz/Desktop/my_archive/warcs/

# The following are default settings -- uncomment to change
# Set to '' to disable the ui

# ==== UI: HTML/Jinja2 Templates ====

# template for <head> insert into replayed html content
#head_insert_html: ui/head_insert.html

# template to for 'calendar' query,
# eg, a listing of captures  in response to a ../*/<url>
#
# may be a simple listing or a more complex 'calendar' UI
# if omitted, will list raw cdx in plain text
#query_html: ui/query.html

# template for search page, which is displayed when no search url is entered
# in a collection
#search_html: ui/search.html

# template for home page.
# if no other route is set, this will be rendered at /, /index.htm and /index.html
#home_html: ui/index.html


# error page temlpate for may formatting error message and details
# if omitted, a text response is returned
#error_html: ui/error.html

# ==== Other Paths ====

# list of host names that pywb will be running from to detect
# 'fallthrough' requests based on referrer
#
# eg: an incorrect request for http://localhost:8080/image.gif with a referrer
# of http://localhost:8080/pywb/index.html, pywb can correctly redirect
# to http://localhost:8080/pywb/image.gif
#

#hostpaths: ['http://localhost:8080']

# Rewrite urls with absolute paths instead of relative
#absoulte_paths: true

# List of route names:
# <route>: <package or file path>
# default route static/default for pywb defaults
static_routes:
          static/default: pywb/static/

# ==== New / Experimental Settings ====
# Not yet production ready -- used primarily for testing

# Enable simple http proxy mode
enable_http_proxy: true

# enable cdx server api for querying cdx directly (experimental)
enable_cdx_api: true

# custom rules for domain specific matching
# set to false to disable
#domain_specific_rules: rules.yaml

# Memento support, enable
enable_memento: true

# Use lxml parser, if available
use_lxml_parser: false

# Replay content in an iframe
framed_replay: true

from pywb.

ikreymer avatar ikreymer commented on May 28, 2024

Issue was caused by improper encoding detection. To solve this issue, and potentially others, switching to just using raw bytes for html rewriting, as suggested by @despens
Since most encodings are ascii compatible, this should lead to better results. Will need to detect UTF-16 and other rare encodings and properly decode them, but in general seems like this will work.

from pywb.

ikreymer avatar ikreymer commented on May 28, 2024

Implemented in 70b7e29, to be part of 0.4.7 release

from pywb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.