Giter Club home page Giter Club logo

crawler's Introduction

#D11Hack Twitter URL

Problem Statement

We have a website called string-factory.com. The website has a page that looks like this —

A typical page contains two types of content —

  1. Links to other similar pages, it is marked with a red colored anchor tags.
  2. A set of strings in white.

Our goal is to find the lexicographically smallest string on the website.

To achieve this we should first implement a crawler inside crawler.js. The crawler opens the website's home page ie — http://localhost:8080, extracts the strings that are displayed in white color and moves on to the next set of pages using the links that are displayed in red. This is done till the time all pages have not been parsed. Once all the the pages have been parsed and the strings in white have been extracted from all of them, we need to find the lexicographically smallest string amongst them. That's all!

For Example: Lets say there are just three pages on the website viz. a0, a1 and a2.

  • From a0 you can go to both a1 and a2.

  • From a1 you can only goto a2

  • From a2 you can go back to a0 & a1

In the above case I first load page a0 I extract all the strings in white from that page,

{
  strings: [
+ 'rga',
+ 'abd',
+ 'ebd',
+ 'bde'
  ]
}

Next I should go to a1 and then extract all the strings from there —

{
  strings: [
  'rga',
  'abd',
  'ebd',
  'bde',
+ 'wwd',
+ 'asw',
+ 'mjl'
  ]
}

Finally I should go to a2, extract all the strings and add it to my list and stop.

{
  strings: [
  'rga',
  'abd',
  'ebd',
  'bde',
  'wwd',
  'asw',
  'mjl',
+ 'bcc',
+ 'foo',
+ 'aax'
  ]
}

Here we can do a simple comparison between all the strings and find that aax is the lexicographically smallest string. So the answer here would be aax.

Getting Started

The project is built using express and pug. To run it locally follow these steps.

  1. Install node >= 8
  2. Clone the project git clone [email protected]:dream11/crawler.git
  3. Install npm dependencies npm i
  4. Start the server npm start
  5. Visit http://localhost:8080 in the browser.

The project consists of a mock webserver that serves string-factory.com. The server is throttles responses and sends a 429 response in case of overload, this is to mimic real world latency.

crawler.js is where you should start writing the logic. You can also test your code using npm test. The test cases runs and asserts if your logic works. It will also log the run time on the console for benchmarking.

The team would be glad to assist you in setting up the project on your computer. Feel free to stop by and ask questions.

Judging Criteria

Once your tests pass you should submit a pull request with your solution. Each pull request will be executed on travis to compute the run time of your logic. If a pull request is found that uses any form of plagiarism or uses hard coded results, you will be disqualified. In the end, if your solution has the least run time, you win!

Solution

Like seriously??

FAQ

  • What does lexicographically smallest string mean? The lexicographically smallest string would be the first string in an english dictionary where all the words are sorted in alphabetical order. Eg: a < b or aa < ab.

  • Can we libraries? Sure go ahead!

Please don't hesitate to get in touch with the team or shoot us a query directly on @d11engg.

  • Can we use libraries?

crawler's People

Contributors

tusharmath avatar znck avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.