
redarc's Introduction

BTC: bc1qefrtlq8yw0qeljyf4pfj7qrg8fd6edaswhrh4l

ETH: 0xf31Ba658fff0D85829991Ed292f55234234B00d5

redarc's People

Contributors

ryebreadgit, yakabuff

redarc's Issues

Anonymous/Username Option.

Add an option to enable/disable usernames in the browsable index.

The default should stay anonymous, but add an option to enable usernames. Ideally it could be turned on or off without having to reroll.

Add r/liberal, r/conservative to UI?

A lot of academic research on Reddit involves political stuff. I know the UI is mainly supposed to be a demo, but if you are going to add a few more subreddits those might be interesting contenders.

Thanks so much for making and maintaining this -- really awesome project and super impressive progress so far!

KeyError: 'gilded'

Traceback (most recent call last):
  File "/home/red/redarc_docker/redarc/scripts/load_sub.py", line 33, in <module>
    gilded = sub_dict['gilded']
KeyError: 'gilded'

When entering AskReddit_submissions
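
One way to tolerate dump records that lack optional fields is to read them with a default instead of indexing the dictionary directly. The sketch below is not the project's `load_sub.py`: `extract_fields` and the default value are assumptions for illustration, and only `sub_dict` and the `gilded` key come from the traceback above.

```python
import json

# Hypothetical defaults for fields that some dump records may omit.
OPTIONAL_DEFAULTS = {"gilded": 0}


def extract_fields(sub_dict):
    """Return the optional fields of one record, using a default for missing keys."""
    return {key: sub_dict.get(key, default) for key, default in OPTIONAL_DEFAULTS.items()}


# A record without 'gilded' no longer raises KeyError.
record = json.loads('{"id": "abc123", "title": "example"}')
fields = extract_fields(record)
print(fields["gilded"])
```

Whether 0 is the right fallback for `gilded` depends on how the column is used downstream.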

Elasticsearch Documentation

Can we get some more documentation on elasticsearch? docker-compose-es.yml seems to get it running, but setting the host to http://localhost doesn't seem to work as I get "curl: (7) Failed to connect to localhost port 9200 after 2 ms: Connection refused" when trying to use es_batch.sh. I noticed that the docker container created by compose doesn't mention port 9200 anywhere so I tried adding them in, but get the same error.

EDIT:
Changing the ports in compose to "127.0.0.1:9200:9200" seems to get the connection working. But now I'm getting this error,

split: illegal option -- -
usage: split [-l line_count] [-a suffix_length] [file [prefix]]
       split -b byte_count[K|k|M|m|G|g] [-a suffix_length] [file [prefix]]
       split -n chunk_count [-a suffix_length] [file [prefix]]
       split -p pattern [-a suffix_length] [file [prefix]]
split: illegal option -- -
usage: split [-l line_count] [-a suffix_length] [file [prefix]]
       split -b byte_count[K|k|M|m|G|g] [-a suffix_length] [file [prefix]]
       split -n chunk_count [-a suffix_length] [file [prefix]]
       split -p pattern [-a suffix_length] [file [prefix]]
Warning: Couldn't read data from file "./torrents_es_sub.*", this makes an
Warning: empty POST.
{"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}Warning: Couldn't read data from file "./torrents_es_com.*", this makes an
Warning: empty POST.
{"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}%

EDIT 2:
I had to remove --verbose from the split commands in es_batch.sh. It seems to run fine, but the search still doesn't work. I get: Error 500. Something went wrong or searching is disabled

EDIT 3:
The data appears to be in the volume in the es01 container:
(screenshot: Untitled)

The file size of 19.2MB matches the processed submissions size, but 86.9MB does not match the processed comments size, which is 94.1MB. I'm not sure if this is the problem.
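
The `split: illegal option -- -` errors in the first EDIT look like what a `split` without GNU-style long options prints when it is given `--verbose` (the usage text appears to match BSD/macOS `split`). A platform-independent alternative is to chunk the file in Python. This is a minimal sketch and not what `es_batch.sh` does: the function name, the chunk size, and the file names are all assumptions.

```python
import os
import tempfile


def split_lines(path, prefix, lines_per_chunk):
    """Write consecutive chunks of `lines_per_chunk` lines to prefix0000, prefix0001, ..."""
    chunk_paths = []
    out = None
    with open(path, "r", encoding="utf-8") as src:
        for index, line in enumerate(src):
            if index % lines_per_chunk == 0:
                # Start a new chunk file every `lines_per_chunk` lines.
                if out is not None:
                    out.close()
                chunk_path = f"{prefix}{len(chunk_paths):04d}"
                chunk_paths.append(chunk_path)
                out = open(chunk_path, "w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths


# Usage with a throwaway five-line file (hypothetical names).
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "bulk.ndjson")
with open(source, "w", encoding="utf-8") as f:
    for i in range(5):
        f.write('{"n": %d}\n' % i)

chunks = split_lines(source, os.path.join(workdir, "chunk_"), 2)
print(len(chunks))
```

If the input is in Elasticsearch bulk format, where each document takes an action line followed by a source line, keep `lines_per_chunk` even so that a pair is never split across two files.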

Error with npm ci

First off, thanks so much for starting this project. Much appreciated!

I have followed your instructions for Docker installation. The first 3 commands work fine. But then when running the $ docker build . -t redarc command, I get the below error.

`ERROR [12/16] RUN npm ci 0.5s

[12/16] RUN npm ci:
#0 0.512 npm ERR! code EUSAGE
#0 0.514 npm ERR!
#0 0.514 npm ERR! The npm ci command can only install with an existing package-lock.json or
#0 0.514 npm ERR! npm-shrinkwrap.json with lockfileVersion >= 1. Run an install with npm@5 or
#0 0.514 npm ERR! later to generate a package-lock.json file, then try again.
#0 0.515 npm ERR!
#0 0.515 npm ERR! Clean install a project
#0 0.515 npm ERR!
#0 0.515 npm ERR! Usage:
#0 0.515 npm ERR! npm ci
#0 0.515 npm ERR!
#0 0.515 npm ERR! Options:
#0 0.515 npm ERR! [-S|--save|--no-save|--save-prod|--save-dev|--save-optional|--save-peer|--save-bundle]
#0 0.515 npm ERR! [-E|--save-exact] [-g|--global] [--global-style] [--legacy-bundling]
#0 0.515 npm ERR! [--omit <dev|optional|peer> [--omit <dev|optional|peer> ...]]
#0 0.515 npm ERR! [--strict-peer-deps] [--no-package-lock] [--foreground-scripts]
#0 0.515 npm ERR! [--ignore-scripts] [--no-audit] [--no-bin-links] [--no-fund] [--dry-run]
#0 0.515 npm ERR! [-w|--workspace [-w|--workspace ...]]
#0 0.515 npm ERR! [-ws|--workspaces] [--include-workspace-root] [--install-links]
#0 0.515 npm ERR!
#0 0.515 npm ERR! aliases: clean-install, ic, install-clean, isntall-clean
#0 0.515 npm ERR!
#0 0.515 npm ERR! Run "npm help ci" for more info
#0 0.516
#0 0.516 npm ERR! A complete log of this run can be found in:
#0 0.516 npm ERR! /root/.npm/_logs/2023-06-01T13_12_09_578Z-debug-0.log


Dockerfile:16

14 |
15 | RUN mv config_default.json config.json
16 | >>> RUN npm ci
17 |
18 | WORKDIR /redarc/redarc-frontend

ERROR: failed to solve: process "/bin/sh -c npm ci" did not complete successfully: exit code: 1`

Documentation Clarification Needed

I'm running into a number of errors attempting to get this running. If any of them are real errors and not my own mistakes, then I will create separate issues for them. For now, I am assuming this is my own misunderstanding of the instructions, hence this issue requesting better documentation.

  1. Instructions to download the contents of git into working directory missing from docker setup (realized this when I read what docker build does).
  2. Instructions on where to place the reddit dumps missing.
  3. List of prerequisites missing (and/or not everything installed by scripts)
  4. Is elasticsearch something I need to set up entirely separately and direct redarc to? I'm confused as to how this works.

I can't for the life of me figure this second one out. I tried searching the codebase for references to the submissions zst files and couldn't find anything.

For number 3, I found I needed the following to get the first script running:
python3
python3-pip
pip install psycopg2-binary

I ran the first script on reddit/submissions/2023-09.zst. I am unsure if that's what I'm supposed to do. Anyway, I tried running it from both docker exec inside the redarc container and from outside the container. Either way, I would get some sort of connection error. Wrong password or connection refused, depending on... I don't know. Oddly, it seems to be attempting to connect to localhost? That's not where the postgres db is. And the working directory is only in the redarc container. Maybe I'm misunderstanding this.

The web frontend does load. But obviously as above, there's no subreddits listed.

Cheers and thanks for the excellent frontend.

Correct API search capitalization

When I first set up, I was checking submissions?author=-Archivist and it worked,
but now I have to use submissions?author=-archivist (lower case A).

The same goes for subreddits: subreddit=AskReddit now has to be subreddit=askreddit.

I think this is a breaking change. Reddit ignores capitalization and returns pages for both upper and lower case, so your API should too, for when people copy and paste names as they appear on Reddit. Examples:

https://old.reddit.com/user/-archivist/
https://old.reddit.com/user/-Archivist/
https://old.reddit.com/r/askreddit/ (actually corrects/redirs)
https://old.reddit.com/r/AskReddit/

The suggested fix would ideally be to support both, given the inconsistencies in the ps data.
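
One way to accept either capitalization is to fold both the stored value and the query parameter to a common case before comparing. The sketch below only illustrates the idea on in-memory rows; `normalize_name`, `filter_by_author`, and the row shape are assumptions, not the project's API code. In a SQL backend the equivalent would be comparing `LOWER(column)` with the lowercased parameter.

```python
def normalize_name(value):
    """Fold an author or subreddit name to a canonical form for comparison."""
    return value.strip().casefold()


def filter_by_author(rows, author):
    """Return rows whose author matches `author`, ignoring case."""
    wanted = normalize_name(author)
    return [row for row in rows if normalize_name(row["author"]) == wanted]


# Hypothetical rows; the stored capitalization is preserved in the result.
rows = [
    {"author": "-Archivist", "id": 1},
    {"author": "someone_else", "id": 2},
]
print(filter_by_author(rows, "-archivist"))
```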

Using API

Sorry, this may not be an issue, just a very basic question. I have now successfully built with Docker, and I do see the Redarc site when opening localhost. However, I'm not sure how exactly to query things. Based on your API instructions, I tried things like this:

http://localhost/api/search/comments?body=love

But this would always result in an 'Internal Server Error'. Any help would be much appreciated!
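
For reference, a query URL of the shape shown above can be built with properly encoded parameters as follows. This is only a sketch: the `/api/search/comments` path and the `body` parameter are taken from the URL in this issue, `build_search_url` is a hypothetical helper, and whether the endpoint needs further parameters or a running search backend is not stated here.

```python
from urllib.parse import urlencode


def build_search_url(base, kind, **params):
    """Build a search URL such as <base>/api/search/<kind>?key=value."""
    return f"{base}/api/search/{kind}?{urlencode(params)}"


url = build_search_url("http://localhost", "comments", body="love")
print(url)
```

`urlencode` also takes care of spaces and special characters in the search text.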

Add loading notifications.

Add a loading indicator to the browsable indexes. When the database is busy, I've noticed the page looks like it's doing nothing for 4-10 seconds when loading a thread. It would be nice to notify users that content is loading rather than leaving them thinking nothing is happening.

Add dark mode.

Everything has to have a dark mode these days 🤣

Collapse removed/deleted comments.

Add toggle to hide/collapse removed/deleted comments in threads to prevent having to scroll through heavily nuked threads to find remaining comments.

Don't remove them altogether, since remaining comments are sometimes nested among removed ones.
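
The rule described here (drop a removed comment only when nothing below it survives, otherwise keep it collapsed) can be sketched as a recursive pass over the comment tree. The data shape, the `[removed]`/`[deleted]` body markers, and the `collapsed` flag are assumptions for illustration, not the frontend's actual model.

```python
# Assumed body texts that mark a comment as removed or deleted.
REMOVED_MARKERS = {"[removed]", "[deleted]"}


def is_removed(comment):
    return comment["body"] in REMOVED_MARKERS


def prune_removed(comment):
    """Return a pruned copy of the tree, or None if the comment and all its replies are removed."""
    kept_replies = []
    for reply in comment.get("replies", []):
        pruned = prune_removed(reply)
        if pruned is not None:
            kept_replies.append(pruned)
    if is_removed(comment) and not kept_replies:
        return None
    # A removed comment with surviving replies is kept, but flagged as collapsed.
    return {**comment, "replies": kept_replies, "collapsed": is_removed(comment)}


thread = {
    "body": "[removed]",
    "replies": [
        {"body": "[deleted]", "replies": []},
        {"body": "still here", "replies": []},
    ],
}
result = prune_removed(thread)
print(result)
```

The removed parent stays in the result because a reply under it survived, while the deleted leaf reply is dropped.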

/applications/postgres-docker isn't shared properly

After running docker pull postgres, I tried to run this with the given code, but I got the error below:

The path /applications/postgres-docker is not shared from the host and is not known to Docker. You can configure shared paths from Docker -> Preferences... -> Resources -> File Sharing.

It looks like docker pull postgres does not work properly, but I'm not sure yet.
Please help me solve this problem. Thanks.

Add alt image for thumbnail toggle.

Add an alternate image when the thread has no thumbnail and make it more clear what this is. Change 'toggle' to 'toggle thumbnail'? (also note to users that it's external?)

Roadmap

  • fix indexer/optimize index strategy
  • replace elasticsearch with postgres fts
  • change date to unix timestamps + refactor app
  • rename folders: ingest, frontend, api
  • refactor api. replace express api with python
  • subreddit watcher
  • image downloader queue
  • image api/view image on frontend
  • turn each worker into separate docker container
  • image_downloader container
  • subreddit_worker container
  • reddit_worker container
  • index_worker container
  • redis container
  • rotating logs
  • permalinks
  • image file name
  • image backfill script
  • thread backfill script
  • tweak dockerignores
  • pin deps
  • add subreddits to archive via web ui
  • unlist subreddits via web ui
  • add postgres POSTGRES_PASSWORD envar to .env file
