kleingebaeck's Introduction

Kleingebäck - kleinanzeigen.de Backup

The German version of the README can be found here.

This tool can be used to back up ads on the German classified ads site https://kleinanzeigen.de.

It downloads all ads of one user (or only the specified ones) into a directory, with each ad in its own subdirectory. The backup contains a text file Adlisting.txt with the ad's contents, such as title, body and price. All images are downloaded as well.

Screenshots

This is the index of my kleinanzeigen.de account:

[Screenshot: ad index]

Here I download my ads on the command line:

[Screenshot: download]

And this is the backup directory after the download:

[Screenshot: backup directory]

Here's a directory for one ad:

[Screenshot: ad directory]

The same thing under Windows:

Downloading ads:

[Screenshot: download on Windows]

Backup directory after download:

[Screenshot: backup directory on Windows]

And one ad listing directory:

[Screenshot: ad directory on Windows]

Installation

The tool doesn't need authentication and doesn't have any dependencies. Just download the binary for your platform from the releases page and you're good to go.

Installation using a pre-compiled binary

Go to the latest release page and look for your OS and platform. There are two options to install the binary:

  1. Directly download the binary for your platform, e.g. kleingebaeck-linux-amd64-0.0.5, rename it to kleingebaeck (or whatever you prefer!) and put it into your bin dir (e.g. $HOME/bin or, as root, /usr/local/bin).

Be sure to verify the checksum of the binary file. To do this, also download the matching kleingebaeck-linux-amd64-0.0.5.sha256 file and run:

cat kleingebaeck-linux-amd64-0.0.5.sha256 && sha256sum kleingebaeck-linux-amd64-0.0.5

You should see the same SHA256 hash.

  2. You may also download a binary tarball for your platform, e.g. kleingebaeck-linux-amd64-0.0.5.tar.gz, unpack and install it. GNU Make is required for this:
tar xvfz kleingebaeck-linux-amd64-0.0.5.tar.gz
cd kleingebaeck-linux-amd64-0.0.5
sudo make install

Installation from source

You will need the Go toolchain in order to build from source. GNU Make will also help but is not strictly necessary.

If you want to compile the tool yourself, use git clone to clone the repository. Then execute go mod tidy to fetch all dependencies. Then just enter go build or, if you have GNU Make installed, make.

To install after building either copy the binary or execute sudo make install.

Using the docker image

A pre-built docker image is available, which you can use to test the app without installing it. To download:

docker pull ghcr.io/tlinden/kleingebaeck:latest

To execute kleingebaeck inside the image and download ads to a local directory, do something like this:

mkdir myads
docker run -u `id -u $USER` -v ./myads:/backup ghcr.io/tlinden/kleingebaeck:latest -u XXX -v
ls -l myads/ein-buch-mit-leeren-seiten
total 792
drwxr-xr-x 2 scip root   4096 Jan 23 12:58 ./
drwxr-xr-x 3 scip scip   4096 Jan 23 12:58 ../
-rw-r--r-- 1 scip root 131650 Jan 23 12:58 1.jpg
-rw-r--r-- 1 scip root  81832 Jan 23 12:58 2.jpg
-rw-r--r-- 1 scip root 134050 Jan 23 12:58 3.jpg
-rw-r--r-- 1 scip root   1166 Jan 23 12:58 Adlisting.txt

We map the local user to the one inside the image so that the file permissions match. The local directory myads is mapped to /backup inside the container; you need to create it before executing docker run.

The options -u XXX -v are kleingebaeck options, replace XXX with your actual kleinanzeigen.de user id.

A list of available images is here

Commandline options:

Usage: kleingebaeck [-dvVhmoc] [<ad-listing-url>,...]
Options:
-u --user    <uid>      Backup ads from user with uid <uid>.
-d --debug              Enable debug output.
-v --verbose            Enable verbose output.
-o --outdir  <dir>      Set output dir (default: current directory)
-l --limit   <num>      Limit the ads to download to <num>, default: load all.
-c --config  <file>     Use config file <file> (default: ~/.kleingebaeck).
   --ignoreerrors       Ignore HTTP errors, may lead to incomplete ad backup.
-m --manual             Show manual.
-h --help               Show usage.
-V --version            Show program version.

If one  or more <ad-listing-url>'s  are specified, only  backup those,
otherwise backup all ads of the given user.

Configfile

You can create a config file to save typing. By default ~/.kleingebaeck is used, but you can specify a different one with -c as well.

The format is simple:

user = 1010101
loglevel = verbose
outdir = "test"

Environment Variables

Kleingebaeck can also be configured using environment variables. Just prefix the config variable names with KLEINGEBAECK_ and write them in upper case, e.g.:

% KLEINGEBAECK_OUTDIR=/backup kleingebaeck -v

Usage

To set up the tool, you need to look up your user id on kleinanzeigen.de. Go to your ad overview page while NOT logged in:

https://www.kleinanzeigen.de/s-bestandsliste.html?userId=XXXXXX

The XXXXXX part is your user id.
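If you'd rather extract the id programmatically, Go's net/url package can read the userId query parameter. A small sketch; userIDFromURL is a hypothetical helper, not something kleingebaeck provides, and 1010101 is the placeholder id from the config example.

```go
package main

import (
	"fmt"
	"net/url"
)

// userIDFromURL extracts the userId query parameter from an ad overview URL.
func userIDFromURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	id := u.Query().Get("userId")
	if id == "" {
		return "", fmt.Errorf("no userId parameter in %q", raw)
	}
	return id, nil
}

func main() {
	id, err := userIDFromURL("https://www.kleinanzeigen.de/s-bestandsliste.html?userId=1010101")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(id) // 1010101
}
```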

Put it into the config file as outlined above. Also specify an output directory. Then just execute kleingebaeck.

Inside the output directory you'll find a new subdirectory for each ad. Every directory contains a file Adlisting.txt, which will look somewhat like this:

Title: A book I sell
Price: 99 € VB
Id: 1919191919
Category: Sachbücher
Condition: Sehr Gut
Created: 10.12.2023

This is the description text.

Pay with paypal.

You can change the formatting using the template config variable. The supplied sample config contains the default template.

All images will be stored in the same directory.

Tool Behavior

There are a bunch of things you might want to know about the behavior of the kleingebäck tool:

  • all HTML pages and images are always downloaded
  • we use a (customizable) user agent
  • we respect HTTP cookies
  • in case of an error, the tool retries 3 times, waiting longer before each retry
  • image downloads are parallelized, using small time offsets to look more natural
  • already existing images are not overwritten by the same images on subsequent downloads

The latter needs to be elaborated a bit more:

If you publish an ad on kleinanzeigen.de and post images, the site reduces those images in size (by compressing and downsizing them). These reduced images are what kleingebäck downloads. However, you may still own the original images and may want to put them into the backup directory so that you have everything for one ad together.

You can easily do that, because kleingebäck won't overwrite those original images. It computes an image hash using goimagehash and compares the distance between the hashes of the existing and the downloaded image. This measures the similarity of images: a resized image is still very similar to the original one. We accept a maximum distance of 5; anything above leads to an overwrite.

This works with resized, cropped and otherwise manipulated images, as long as the image still shows the original contents well enough.

Also note that this is NOT a caching mechanism: the images are downloaded anyway during each run. We also can't look at the file names, because kleinanzeigen.de renames all images to numbers, and those might even change if the user rearranges the images.

You can override this behavior using the --force option. Another option, --ignoreerrors, can be used to ignore all kinds of image errors.

Documentation

You can read the documentation online or locally once you have installed kleingebaeck with: kleingebaeck --manual.

Kleingebäck?

The name is derived from "kleinanzeigen backup": "klein" (German for small) and "back". In German, "bäck" is pronounced the same as the English "back", so "kleinbäck" was short enough, but it's not a valid German word. "Kleingebäck", however, is: it means "cookies" in English :)

Getting help

Although I'm happy to hear from kleingebaeck users in private email, that's the best way for me to forget to do something.

In order to report a bug, unexpected behavior, feature requests or to submit a patch, please open an issue on github: https://github.com/TLINDEN/kleingebaeck/issues.

Please repeat the failing command with debugging enabled (-d) and include the output in the issue.

Related projects

I could not find any projects specifically designed to back up kleinanzeigen.de ads; however, there's a bot project which is also able to download ads: kleinanzeigen-bot. Be aware, though, that kleinanzeigen.de is actively fighting bots! Look at this issue. The problem with these kinds of bots is that they log into your account using your credentials. If the company is able to detect bot activity, it can easily associate it with your account and lock you out. So be careful.

kleingebäck doesn't need to log in; it just accesses publicly available web pages. Kleinanzeigen.de could hardly do anything against it. First, because it is legal: there's no difference between a browser and a command line client. Both run on the client side, and it is not kleinanzeigen.de's decision which software one uses to access their pages. And second, because you can use it to download any ads, not just yours, so it is not really clear whether the activity is associated in any way with the ad owner. In addition, kleingebäck is just a backup tool. It is not intended to be used on a daily basis. You cannot use it to view regular ads or maintain your own ads; for that you need the mobile app or the browser page with a login. So, from my point of view, the risk is minimal.

There is another tool available named kleinanzeigen-enhanced. It is a complete ad management system targeting primarily commercial users. You have to pay a monthly fee; perhaps there's also a free version available, but I haven't checked. The tool is implemented as a Chrome browser extension, which explains why it was possible to implement it without an API. By the looks of it, it seems to be a nice solution for power users. And it includes backups.

Copyright and License

Licensed under the GNU GENERAL PUBLIC LICENSE version 3.

Author

T.v.Dein

kleingebaeck's People

Contributors

tlinden


kleingebaeck's Issues

rand.Seed() deprecated

linter:

Error: SA1019: rand.Seed has been deprecated since Go 1.20 and an alternative has been available since Go 1.0: As of Go 1.20 there is no reason to call Seed with a random value. Programs that call Seed with a known value to get a specific sequence of results should use New(NewSource(seed)) to obtain a local random generator. (staticcheck)

Error: issues found
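Following the linter's advice, the fix could look like this in Go: drop the rand.Seed call entirely, and use a local generator only where a reproducible sequence is actually needed. A minimal sketch; newLocalRand is a hypothetical helper and seed 42 is arbitrary.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newLocalRand returns a generator with its own fixed seed. Use this instead
// of rand.Seed when a reproducible sequence is needed.
func newLocalRand(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

func main() {
	// Since Go 1.20 the top-level functions are seeded randomly at program
	// start, so a rand.Seed(time.Now().UnixNano()) call can simply be removed.
	fmt.Println(rand.Intn(100))

	// Reproducible sequence: a local generator, leaving the global one alone.
	r := newLocalRand(42)
	fmt.Println(r.Intn(100))
}
```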

[Closed] [Off-Topic] scraping blocking

Dear Tom,

I have a small off-topic question about how you scrape kleinanzeigen.

I'm using the https://github.com/vinc3PO/ebayKleinanzeigenAlert project to get alerts about new ads.

But for some time now, the Python requests to kleinanzeigen have been blocked. I tried to fix this issue by setting custom headers, but this isn't working.

Here is my example:

    def get_webpage(self) -> str:
        custom_header = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0",
            "Connection": "keep-alive",
            "Upgrade-Insecure-Requests": "1",
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Site": "same-origin",
            "Sec-Fetch-User": "?1",
            "Pragma": "no-cache",
            "Cache-Control": "no-cache"
        }
        
        response = requests.get("https://www.kleinanzeigen.de/s-bestandsliste.html?userId=13786652", headers=custom_header)
        print(response.text)
        print(response)

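And I get the following error (the German response says that the IP range has been temporarily blocked because of repeated unsafe attempts to use the platform from it, possibly by other people, and asks to try again later):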

ebkAlert  | >> Starting Ebay alert
ebkAlert  | Processing link - id: 1 - link: https://www.ebay-kleinanzeigen.de/s-apple/k0 
ebkAlert  | test
ebkAlert  | {
ebkAlert  |  "title": "IP-Bereich vorübergehend gesperrt.",
ebkAlert  |  "message": "In deinem IP-Bereich kam es vor Kurzem mehrfach zu unsicheren Versuchen, unsere Plattform zu verwenden.  Dies kann auch durch andere Personen erfolgt sein. Daher wurde dieser IP-Bereich zur Vorbeugung von Betrug zeitweilig von der Nutzung von Kleinanzeigen ausgeschlossen. Bitte versuche es später erneut.",
ebkAlert  |  "more": "Mehr informationen: https://themen.kleinanzeigen.de/ip-eingeschraenkt/",
ebkAlert  |  "ref": "Ref#: 0.daa0d517.1705918554.53f9de6d  IP#: 217.237.125.225"
ebkAlert  | }
ebkAlert  | <Response [403]>
ebkAlert  | << Ebay alert finished

Now my actual question: have you seen such an error while developing kleingebaeck, and do you perhaps have a solution for it?

And sorry for the spam in your issues, but I only found an expired email address on your website.

Regards

panic: runtime error: invalid memory address or nil pointer dereference on Windows10

Executing kleingebaeck on Windows leads to a panic:

C:\Temp>kleingebaeck-windows-amd64-0.0.6.exe -o c:\Temp\test https://www.kleinanzeigen.de/s-anzeige/sata-festplatte-western-digital-80-gb/2622583419-225-4545
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x0 pc=0xceb279]

goroutine 1 [running]:
main.Main()
        /home/scip/dev/kleingebaeck/main.go:113 +0x5d9
main.main()
        /home/scip/dev/kleingebaeck/main.go:48 +0x13

C:\Temp>kleingebaeck-windows-amd64-0.0.6.exe -o c:\Temp\test https://www.kleinanzeigen.de/s-anzeige/sata-festplatte-western-digital-80-gb/2622583419-225-4545 -d
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x0 pc=0xceb279]

goroutine 1 [running]:
main.Main()
        /home/scip/dev/kleingebaeck/main.go:113 +0x5d9
main.main()
        /home/scip/dev/kleingebaeck/main.go:48 +0x13

Same thing with -u:

C:\Temp>kleingebaeck-windows-amd64-0.0.6.exe -o c:\Temp\test -u ********
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x0 pc=0xceb279]

goroutine 1 [running]:
main.Main()
        /home/scip/dev/kleingebaeck/main.go:113 +0x5d9
main.main()
        /home/scip/dev/kleingebaeck/main.go:48 +0x13

[bug-report] crash with large ad download on FreeBSD

Command:

kleingebaeck -u 127291161 -o bak2 -c kleinanzeigen.conf -d

Config:

adnametemplate = "{{.Slug}}-{{.ID}}"

Output:
2024-02-09T09:55.10 CET DEBUG: got cookies? /home/scip/dev/kleingebaeck/fetch.go: 82
    program_info:
        go_version: go1.21.1
        pid: 41268
    cookies: []
panic: runtime error: slice bounds out of range [:-1]

goroutine 15813 [running]:
github.com/tlinden/yadu.(*Handler).appendAttr(0x86c13b960, 0x870180?, {{0x8c957b, 0x5}, {{}, 0x3, {0x839100, 0x86c87e080}}})
        /home/scip/go/pkg/mod/github.com/tlinden/[email protected]/handler.go:216 +0x617
github.com/tlinden/yadu.(*Handler).Handle.func1({{0x8c957b, 0x5}, {{}, 0x3, {0x839100, 0x86c87e080}}})
        /home/scip/go/pkg/mod/github.com/tlinden/[email protected]/handler.go:81 +0x105
log/slog.Record.Attrs({{0xc169c22b878e58e3, 0xed93a073c, 0xc62740}, {0x8cddf3, 0xb}, 0xfffffffffffffffc, 0x8026e5, {{{0x8c957b, 0x5}, {{...}, ...}}, ...}, ...}, ...)
        /usr/lib/go-1.21/src/log/slog/record.go:84 +0xc4
github.com/tlinden/yadu.(*Handler).Handle(_, {_, _}, {{0xc169c22b878e58e3, 0xed93a073c, 0xc62740}, {0x8cddf3, 0xb}, 0xfffffffffffffffc, 0x8026e5, ...})
        /home/scip/go/pkg/mod/github.com/tlinden/[email protected]/handler.go:77 +0x1e5
log/slog.(*Logger).log(0x86c198550, {0x9845d8, 0xc91b60}, 0xfffffffffffffffc, {0x8cddf3, 0xb}, {0x86c490f08, 0x6, 0x6})
        /usr/lib/go-1.21/src/log/slog/logger.go:225 +0x1e4
log/slog.Debug({0x8cddf3?, 0x877363258?}, {0x86c490f08?, 0xc91b60?, 0x86c1fa000?})
        /usr/lib/go-1.21/src/log/slog/logger.go:250 +0x96
main.ScrapeImages.func1()
        /home/scip/dev/kleingebaeck/scrape.go:203 +0x425
golang.org/x/sync/errgroup.(*Group).Go.func1()
        /home/scip/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1
        /home/scip/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:72 +0x96

[feature-request] Provide ARM binaries

Description

Kleingebaeck is currently only available for amd64 architecture. I would like to run kleingebaeck on an ARM based server (Debian) without compiling the code myself 😀
If you could add binaries for ARM (64bit) architecture, this would be very helpful.

Thanks for kleingebaeck 🥳

-u overwrites directories even if created during current loop

Ad directories may already exist but for different reasons:

  1. the user created them (for whatever reason): overwrite
  2. a previous run of kleingebäck created it: overwrite
  3. a previous iteration of the current process created it: handle it

The reason is simple: the ad in question has the same name as another ad but may have different contents (still, for whatever reason, see #70).

So we need to handle it somehow, and in a way that the next run does the same:

Possibilities are:

  1. warn and overwrite
  2. warn and skip
  3. ask the user for a decision
  4. add some kind of version like dir-v2

Variant 1 would be the easiest, giving the user a chance to rename the ad on the site.

Variant 2 could lead to loss of backup. Hmmm...

Variant 3 could be impossible if running unattended - not good

Variant 4 could lead to mess if ad 1 gets deleted. Then during the next run ad 2 would not be a duplicate and no -v2 would be appended so it would overwrite ad 1, which would then be lost.

So I think I'll go with the warning and maybe add a parameter to give the user options.

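Use on a Windows machine?

Hello,
can I also use this on a Windows machine? I installed make and also Perl with pod2text, but I get the following (the German messages mean "The system cannot find the specified path." and "The command "#CGO_LDFLAGS" is either misspelled or could not be found."):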

Das System kann den angegebenen Pfad nicht finden.
#CGO_LDFLAGS='-static' go build -tags osusergo,netgo -ldflags "-extldflags=-static" -o kleingebaeck
Der Befehl "#CGO_LDFLAGS" ist entweder falsch geschrieben oder
konnte nicht gefunden werden.
make: *** [Makefile:44: buildlocal] Error 1

Cannot parse HCL Config file on Windows

HCL doesn't accept Windows path specs:

C:\Temp>kleingebaeck-windows-amd64-0.0.6.exe -c config.hcl
ERR Failure error="config.hcl:3,13-15: Invalid escape sequence; The symbol \"T\" is not a valid escape sequence selector., and 1 other diagnostic(s)"

C:\Temp>type config.hcl
user = 39679966
verbose = true
outdir = 'C:\Temp\Test'
template = "Title: {{.Title}}\r\nPrice: {{.Price}}\r\nId: {{.Id}}\r\nCategory: {{.Category}}\r\nCondition: {{.Condition}}\r\nCreated: {{.Created}}\r\n\r\n{{.Text}}\r\n"
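In HCL, a backslash inside a double-quoted string starts an escape sequence (such as \n or \\), which is why \T in the path is rejected. Writing the path with doubled backslashes ("C:\\Temp\\Test") or with forward slashes ("C:/Temp/Test"), which Windows generally accepts as well, should avoid the error. If the config file is generated by a program, the escaping can be done as in this hypothetical Go helper; it handles only backslashes and double quotes, not HCL template sequences such as ${.

```go
package main

import (
	"fmt"
	"strings"
)

// hclEscaper doubles backslashes and escapes double quotes so that a Windows
// path can be embedded in an HCL double-quoted string.
var hclEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`)

// hclAssign renders a `key = "value"` line for an HCL config file.
func hclAssign(key, value string) string {
	return fmt.Sprintf(`%s = "%s"`, key, hclEscaper.Replace(value))
}

func main() {
	fmt.Println(hclAssign("outdir", `C:\Temp\Test`)) // outdir = "C:\\Temp\\Test"
}
```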

Improvement ideas

First of all, thanks for your cool project.

I have a few ideas which could improve this project for me (and others?).

Last week I tried to build a Docker container from the main branch, and it kept failing with an error while compiling the package.

Now my question: Is the main branch a stable branch?

Today I've tried it again and now it works.

Dockerfile:

FROM golang:1.21.5
WORKDIR /root/
RUN apt-get update -y && apt-get install -y git make
RUN git clone https://github.com/TLINDEN/kleingebaeck --branch main
#RUN git clone https://github.com/TLINDEN/kleingebaeck --branch v0.1.0 # fix for last week
WORKDIR /root/kleingebaeck
RUN make && make install

docker-compose.yml:

version: "3.9"
services:

  kleinanzeigen-backup:
    container_name: kleinanzeigen-backup
    volumes:
      - ./config:/config
      - /mnt/ubuntu-nfs/backup/kleinanzeigen:/backup
    # No "#" comments inside the command string: YAML folds it into a single
    # line, so a comment would swallow the remaining kleingebaeck calls.
    command: /bin/bash -c '
      kleingebaeck -u x -d && echo "success ad" ;
      kleingebaeck -u x -d && echo "success wd" ;
      kleingebaeck -u x -d && echo "success jb" '

    working_dir: /backup
    image: kleingebaeck:latest
    build: .
  1. In the new version v0.1.1 the ads are saved with the ID.

Unfortunately this isn't working in my environment, because I use "hochschieben" (bumping the ad to the top), which gives the ad a new ID, so I get duplicates.
I have already read your limitations, but could you please add an option to exclude the ID from the ad's backup folder name?

  2. Because I'm using "hochschieben", I sometimes have corrupted pictures which can't be downloaded even though the server returns a 200 success code.
    If the scraper gets such a picture, it fails with the following error:

(to follow)

Unfortunately I couldn't find any option to skip errors; could you please add another flag for this?

Behave more realistically

Add more headers, use cookies, possibly make some aspects configurable (like UserAgent) or even random.

See discussion in: #48

New flag for picture overwrite

Hello Tom,

as mentioned in #30 (comment), it would be really nice if there were an option to skip the picture download for already downloaded items.

But this feature has a very low priority...

Regards

689 ads but only 524 folders in output dir?

Description
First let me say thank you very much (!) for this amazing app!!!

I did a first test and it worked: I have many folders in my output dir. What I don't understand is the difference from the script output:

> .\kleingebaeck-windows-amd64-0.3.3.exe -u 127291161 -o "Backup"
Successfully downloaded 689 ads with 4154 images to Backup

689 is correct - this is also shown on the kleinanzeigen site


But I only have 524 folders in my Backup folder - shouldn't it also be 689 folders?
