paschok / diploma

Bachelor Thesis: Classification of Advertisements by means of Supervised Learning Methods

Python 55.43% C++ 0.25% C 1.13% Objective-C 0.04% XSLT 0.21% Roff 0.02% HTML 0.01% Shell 0.01% GAP 0.02% PowerShell 0.01% Batchfile 0.01% Fortran 0.01% Smarty 0.01% Jupyter Notebook 42.87%
machine-learning nlp nlp-machine-learning amazon spacy-nlp nmf nmf-decomposition scrapy-spider scrapy-crawler diploma-project

Diploma

My bachelor thesis at Hochschule Merseburg, written in Python, applies machine learning to Natural Language Processing (NLP).

The thesis topic is: ***Classification of advertisements by means of supervised learning methods***

Work process:

  • Learn about NLP
  • Scrape data
  • Try NLTK / spaCy on datasets
  • Learn more about hierarchical clustering algorithms / neural networks / other NLP methods such as topic modelling, Word2Vec and so on
  • Code the Diploma
  • Write the diploma itself (the thesis)

My bachelor thesis has two major branches:

  1. Data
    • Scraping data from the web using Scrapy, with either a Google user agent or proxies. I used to scrape Amazon through proxies, but because they lagged and kept switching off, I decided to use a user agent and time.sleep() instead
  2. ML
    • Code implementation
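
The user agent plus time.sleep() approach from the Data branch can be sketched roughly as follows. This is a minimal illustration, not the spider code from this repository: it uses the standard library's urllib instead of Scrapy, and the user-agent string, the delay range and the names build_request, fetch and crawl are all assumptions made for the example.

```python
import random
import time
import urllib.request

# Hypothetical user-agent string; the exact value used in the thesis is not stated.
USER_AGENT = "Mozilla/5.0 (compatible; example-bot/0.1)"


def build_request(url, user_agent=USER_AGENT):
    """Return a urllib Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10.0):
    """Download one page and return its raw bytes."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


def crawl(urls, fetcher=fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random interval between requests.

    The delay bounds are assumed values; fetcher and sleep are injectable
    so the loop can be exercised without touching the network.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        pages[url] = fetcher(url)
    return pages
```

Calling crawl(list_of_urls) with the defaults performs real requests. Passing a stub fetcher and a stub sleep makes it possible to check the throttling logic offline.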

Commits

Commit messages follow the format branch: subproject: message, where branch is one of the two branches above. Commits that only touch README.md are the exception.

Example:

Data: amazon: added new spider

README.md: update
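
The commit convention above could be checked with a small script such as the one below. This is only a sketch under assumptions: the repository does not ship such a check, the name is_valid_commit_message is hypothetical, and the pattern assumes a single space after each colon, as in the two examples.

```python
import re

# The two top-level branches named in this README.
BRANCHES = ("Data", "ML")

# Either "README.md: message" or "<branch>: <subproject>: <message>".
COMMIT_PATTERN = re.compile(
    r"(?:README\.md: \S.*|(?:%s): [^:\s][^:]*: \S.*)" % "|".join(BRANCHES)
)


def is_valid_commit_message(message):
    """Return True if the message follows the convention described above."""
    return COMMIT_PATTERN.fullmatch(message) is not None
```

With this pattern, both example messages above pass, while a message without a branch prefix or without a subproject is rejected.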

Data comes from these websites:

  • obszone
    • had problems downloading American products for sale, so I had to use a little trick with the URL
  • geebo
  • adlandpro
  • pennysaverusa
  • hoobly
  • oodle
  • gumtree
  • letgo
  • salespider
  • ebay
  • amazon

Amazon data issues:

When you enter a department on Amazon, you can either scrape up to 400 pages of general products from that department, or go into Feature Categories and scrape more specific products.
For instance: 400 pages of the automotive department, OR car care, car electronics and so on.
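
The two options can be pictured as two ways of building the list of listing pages to visit. The sketch below is illustrative only: the example.com URLs, the page query parameter, the pages_per_category default and the function names are assumptions, not Amazon's real URL scheme or the code used in the thesis. Only the 400-page limit comes from the description above.

```python
from urllib.parse import urlencode

MAX_DEPARTMENT_PAGES = 400  # page limit per department, as described above


def page_urls(listing_url, pages, page_param="page"):
    """Return URLs for pages 1..pages of one listing (hypothetical URL scheme)."""
    separator = "&" if "?" in listing_url else "?"
    return [
        f"{listing_url}{separator}{urlencode({page_param: number})}"
        for number in range(1, pages + 1)
    ]


def crawl_plan(department_url, category_urls=(), pages_per_category=1):
    """Plan either a whole-department crawl or a per-category crawl.

    With no categories, the plan is the department's 400 pages.
    With categories, only those listings are visited.
    """
    if category_urls:
        plan = []
        for url in category_urls:
            plan.extend(page_urls(url, pages_per_category))
        return plan
    return page_urls(department_url, MAX_DEPARTMENT_PAGES)
```

The department plan yields broad coverage of general products, while the category plan trades volume for more precisely labelled items.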
