
flatmap's Introduction

flatmap

This is my self-educational project, not to be confused with flatMap. I took the idea from the BerlinHousing repository. Once stabilized, my project is going to be somewhat similar; for now it is still in progress. The ultimate goal is to use an interactive map to show properties (mainly apartments) together with useful information about them (address, area, price, etc.) and the nearby facilities.

Why Java?

First, I'm learning it and trying out different things from the language ecosystem. Second, I generally like it :)

What's used

Currently, the following great stuff helps me to advance with my project:

  1. Jsoup as the base for the parser implementations
  2. PostgreSQL and its JDBC driver to save results to the database
  3. HikariCP as the database connection pool implementation
  4. Dotenv to be able to change application parameters between launches
  5. JCommander to parse command line arguments
  6. Greenrobot EventBus to dispatch and handle events
  7. Cucumber + JUnit to write and execute tests
  8. WireMock to be able to run integration tests without connecting to the real websites
  9. SLF4J backed by Logback for logging

flatmap's People

Contributors

franzose

flatmap's Issues

Reorganize code

  1. Create a new package named com.janiwanow.flatmap.internal and move there the following packages:

    1. com.janiwanow.flatmap.cli (along with renaming cli to console)
    2. com.janiwanow.flatmap.db
    3. com.janiwanow.flatmap.event
    4. com.janiwanow.flatmap.http
    5. com.janiwanow.flatmap.parser (except cli and impl sub-packages)
    6. com.janiwanow.flatmap.util
  2. Create a new package named com.janiwanow.flatmap.console and move there the following commands:

    1. com.janiwanow.flatmap.db.cli.PurgeDatabaseCommand
    2. com.janiwanow.flatmap.db.cli.SetupDatabaseCommand
    3. com.janiwanow.flatmap.db.cli.TruncateDatabaseCommand
    4. com.janiwanow.flatmap.offer.cli.CheckRelevanceCommand
    5. com.janiwanow.flatmap.parser.cli.ParseWebsitesCommand
  3. Create a new package named com.janiwanow.flatmap.realty. Each sub-package must represent a parsed website (e.g. com.janiwanow.flatmap.realty.n1 stands for https://n1.ru, com.janiwanow.flatmap.realty.sakhcom stands for https://sakh.com etc). Move there content of the following packages:

    1. com.janiwanow.flatmap.data
    2. com.janiwanow.flatmap.parser.impl
    3. com.janiwanow.flatmap.offer

Check for obsolete property offers

In the future, the data shown on the interactive map must be kept fresh. Currently there is no mechanism to distinguish active offers from obsolete ones: empty pages and pages returning a 404 error are simply ignored. An obsolete page may also still be served and contain text such as "this offer is obsolete".

A separate command is needed that checks the properties kept in the database for freshness and marks them obsolete when their offers are no longer active.
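
The three signals mentioned above (a 404 response, an empty page, a marker phrase) could be combined into one predicate. This is only a sketch: the class name, the method name and the marker list are assumptions, not part of the project.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper; the name and signature are not taken from the project.
final class OfferFreshnessSketch {
    private OfferFreshnessSketch() {}

    /**
     * Returns true when the fetched offer page looks obsolete:
     * a 404 status, an empty body, or a body containing one of the marker phrases.
     */
    static boolean looksObsolete(int statusCode, String body, List<String> obsoleteMarkers) {
        if (statusCode == 404) {
            return true;
        }
        if (body == null || body.isBlank()) {
            return true;
        }
        // Compare case-insensitively so "Obsolete" and "obsolete" both match.
        String haystack = body.toLowerCase(Locale.ROOT);
        return obsoleteMarkers.stream()
            .map(marker -> marker.toLowerCase(Locale.ROOT))
            .anyMatch(haystack::contains);
    }
}
```

The marker phrases would have to be collected per website, since each site words its "offer removed" notice differently.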

Incorrect resource file paths

After building an executable jar and trying to run the application I got the following exception:

[main] INFO com.janiwanow.flatmap.util.ResourceFile - Try reading a resource file at path "check_required_tables.sql"...
Exception in thread "main" java.lang.ExceptionInInitializerError
        at com.janiwanow.flatmap.EntryPoint.main(EntryPoint.java:27)
Caused by: java.lang.IllegalStateException: Could not read SQL query from file.
        at com.janiwanow.flatmap.db.cli.SetupDatabaseCommand.<clinit>(SetupDatabaseCommand.java:26)
        ... 1 more
Caused by: java.io.FileNotFoundException: Resource file at path "check_required_tables.sql" was not found.
        at com.janiwanow.flatmap.util.ResourceFile.readToString(ResourceFile.java:34)
        at com.janiwanow.flatmap.db.cli.SetupDatabaseCommand.<clinit>(SetupDatabaseCommand.java:24)
        ... 1 more

Possible solution
Change the ResourceFile helper so that it uses the provided class loader instead of its own.
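
A minimal sketch of that change, assuming the helper only needs to read a text resource. The class name and the two-argument signature are hypothetical; the real ResourceFile may look different.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical variant of the ResourceFile helper: the caller supplies the class loader.
final class ResourceFileSketch {
    private ResourceFileSketch() {}

    static String readToString(ClassLoader loader, String path) throws IOException {
        // ClassLoader.getResourceAsStream takes a path without a leading slash
        // and returns null when the resource cannot be found.
        try (InputStream in = loader.getResourceAsStream(path)) {
            if (in == null) {
                throw new FileNotFoundException(
                    "Resource file at path \"" + path + "\" was not found.");
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

A command could then call something like `ResourceFileSketch.readToString(SetupDatabaseCommand.class.getClassLoader(), "check_required_tables.sql")`, so the lookup goes through the loader that actually contains the file.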

Write README file

Currently there is no README file in the project root directory. One should be written, containing basic project information, installation and setup steps, and so on.

AddressExtractor could not parse the address from the page

The AddressExtractor could not parse the address from the page. It seems that this commit introduced the bug: aca7e23

java.lang.NullPointerException: null
	at com.janiwanow.flatmap.parser.impl.sakhcom.AddressExtractor.extract(AddressExtractor.java:24)
	at com.janiwanow.flatmap.parser.PropertyDetailsExtractor.extract(PropertyDetailsExtractor.java:74)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1605)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
	at com.janiwanow.flatmap.parser.PropertyDetailsExtractor.extract(PropertyDetailsExtractor.java:104)
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1776)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1763)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1016)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1665)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1598)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)

RoomsExtractor was unable to parse rooms from page

Context
Noticed the following exception:

INFO com.janiwanow.flatmap.parser.PropertyDetailsExtractor - Could not extract property details from https://novosibirsk.n1.ru/view/33070687/
java.lang.NumberFormatException: For input string: ""
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68)
	at java.base/java.lang.Integer.parseInt(Integer.java:668)
	at java.base/java.lang.Integer.parseInt(Integer.java:776)
	at com.janiwanow.flatmap.util.Numbers.parseInt(Numbers.java:18)
	at com.janiwanow.flatmap.parser.impl.n1.RoomsExtractor.extract(RoomsExtractor.java:21)
	at com.janiwanow.flatmap.parser.impl.n1.SpaceExtractor.extract(SpaceExtractor.java:43)
	at com.janiwanow.flatmap.parser.impl.n1.SpaceExtractor.extract(SpaceExtractor.java:22)
	at com.janiwanow.flatmap.parser.PropertyDetailsExtractor.extract(PropertyDetailsExtractor.java:72)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1605)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
	at com.janiwanow.flatmap.parser.PropertyDetailsExtractor.extract(PropertyDetailsExtractor.java:110)
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1776)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1763)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1016)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1665)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1598)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)

Solution

  1. Check all property types for the location of the element containing the number of rooms
  2. Add checks to com.janiwanow.flatmap.parser.impl.n1.RoomsExtractor.
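
One possible shape for such a check is to validate the extracted text before it reaches Integer.parseInt. The helper below is an illustration only; its name and the four-digit limit are assumptions.

```java
import java.util.OptionalInt;

// Hypothetical helper; not part of the project.
final class RoomsTextSketch {
    private RoomsTextSketch() {}

    /** Parses the rooms figure, returning an empty result instead of throwing on blank or non-numeric text. */
    static OptionalInt parseRooms(String text) {
        String trimmed = text == null ? "" : text.trim();
        // Accept only a short run of digits; this rejects the empty string
        // that caused the NumberFormatException above.
        if (!trimmed.matches("\\d{1,4}")) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(trimmed));
    }
}
```

The caller can then decide what an empty result means for a given property type, for example skipping the offer or using a default.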

Make SpaceExtractor more reliable

The .card-living-content-params-list__name selector must be used instead of .offer-card-factoids .text to find the desired values.

Check constraints are not handled by the com.janiwanow.flatmap.db.cli.PropertyDetailsListener

Context
The property database table has the following check constraints:

total_area NUMERIC (4, 2) NOT NULL CHECK (total_area > 0.0),
living_space NUMERIC (4, 2) NOT NULL CHECK (living_space > 0.0),
kitchen_area NUMERIC (4, 2) NOT NULL CHECK (kitchen_area > 0.0),
rooms SMALLINT NOT NULL CHECK (rooms > 0),
price_amount NUMERIC (20, 2) NOT NULL CHECK (price_amount > 0.0),

PropertyDetailsListener does not handle cases where some of the above parameters are less than or equal to zero. This leads to exceptions like this one:

Exception in thread "main" org.postgresql.util.PSQLException: ERROR: new row for relation "property" violates check constraint "property_total_area_check"
  Detail: Failing row contains (3c05a871-9b8f-4dad-ae0a-86cba1db383a, https://examplsdasdfe.com, adlkajlkjs, 0.00, 1.00, 1.00, 1, 1000000.00, RUB, null, null).
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2510)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2245)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:311)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:447)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:368)
	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:159)
	at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:125)
	at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
	at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java)
	at com.janiwanow.flatmap.db.cli.PropertyDetailsListener.save(PropertyDetailsListener.java:65)
	at com.janiwanow.flatmap.db.cli.PropertyDetailsListener.main(PropertyDetailsListener.java:32)

Solution
I suppose we could simply skip such instances of PropertyDetails since, in general, zeros mean there was some issue while parsing the property details, i.e. the parser could not extract the desired figures.
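
As a rough sketch, the listener could test the figures against the same rules as the check constraints before saving. The class and method names are hypothetical, and the parameters mirror the constrained columns rather than the real PropertyDetails API.

```java
import java.math.BigDecimal;

// Hypothetical pre-save check mirroring the CHECK constraints of the property table.
final class PositiveFiguresSketch {
    private PositiveFiguresSketch() {}

    static boolean satisfiesCheckConstraints(
            BigDecimal totalArea,
            BigDecimal livingSpace,
            BigDecimal kitchenArea,
            int rooms,
            BigDecimal priceAmount) {
        return isPositive(totalArea)
            && isPositive(livingSpace)
            && isPositive(kitchenArea)
            && rooms > 0
            && isPositive(priceAmount);
    }

    // signum() is used instead of equals() because 0.00 and 0 differ in scale.
    private static boolean isPositive(BigDecimal value) {
        return value != null && value.signum() > 0;
    }
}
```

With the values from the failing row (0.00, 1.00, 1.00, 1, 1000000.00) the method returns false, so the row would be skipped instead of reaching the database.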

Make parsers be aware of the HTTP connection options

Context
By the current design, any instance of WebsiteParser is given a pre-arranged HttpConnection instance. However, a parser may need to customise connection options.

Take SakhcomParser as an example. It cannot use the given HTTP connection as is because it must set a cookie to fetch URLs from a particular city. For that purpose it uses a separate HttpConnectionBuilder instance which is passed via the constructor. If you look at EntryPoint, you'll see that it's not the same instance which is passed to ParseWebsitesCommand.

This may introduce bugs if the end user decides to run the parse command with custom HTTP connection options. The first HTTP connection, i.e. the one passed to the console command, will receive those options, but the second, i.e. the one passed to SakhcomParser, will not.

Solution
I see two possible solutions for this issue:

  1. Expose HTTP connection options by adding a new com.janiwanow.flatmap.http.HttpConnectionOptions class and the com.janiwanow.flatmap.http.HttpConnection::getOptions() method.
  2. Instead, add com.janiwanow.flatmap.http.HttpConnection::newBuilder() which is supposed to return a new com.janiwanow.flatmap.http.HttpConnectionBuilder on each call and pass the current connection options to that builder.

The latter approach seems simpler to me because:

  1. We don't need to write a lot of stuff
  2. In this case, we can remove HttpConnectionBuilder from the constructor of SakhcomParser (and we won't need to add it to other parsers in the future)

One thing I'm a bit concerned about is that HttpConnection becomes aware that it can be built somehow. On the other hand, it's not a big deal since it still does not know how exactly it can be built.
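
To make the second option more concrete, here is a small sketch of a connection that can hand out a pre-configured builder. Everything in it is illustrative: the class names, the option fields (user agent, timeout, cookies) and the builder methods are assumptions, not the project's actual HttpConnection API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for HttpConnection; only the newBuilder() idea matters here.
final class HttpConnectionSketch {
    private final String userAgent;
    private final int timeoutMillis;
    private final Map<String, String> cookies;

    private HttpConnectionSketch(String userAgent, int timeoutMillis, Map<String, String> cookies) {
        this.userAgent = userAgent;
        this.timeoutMillis = timeoutMillis;
        this.cookies = Map.copyOf(cookies); // immutable snapshot
    }

    /** Returns a fresh builder on each call, pre-filled with this connection's options. */
    Builder newBuilder() {
        Builder builder = new Builder().userAgent(userAgent).timeoutMillis(timeoutMillis);
        cookies.forEach(builder::cookie);
        return builder;
    }

    String userAgent() { return userAgent; }
    int timeoutMillis() { return timeoutMillis; }
    Map<String, String> cookies() { return cookies; }

    static final class Builder {
        private String userAgent = "";
        private int timeoutMillis = 0;
        private final Map<String, String> cookies = new HashMap<>();

        Builder userAgent(String value) { this.userAgent = value; return this; }
        Builder timeoutMillis(int value) { this.timeoutMillis = value; return this; }
        Builder cookie(String name, String value) { cookies.put(name, value); return this; }

        HttpConnectionSketch build() {
            return new HttpConnectionSketch(userAgent, timeoutMillis, cookies);
        }
    }
}
```

A parser that needs an extra cookie would call `connection.newBuilder().cookie(...).build()` and get a connection that still carries the options the user passed on the command line, while the original connection stays unchanged.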

org.postgresql.util.PSQLException: ERROR: numeric field overflow

It seems that one of the columns (property.total_area, property.living_space, property.kitchen_area) received a value exceeding the limit. Precision must be changed for those columns.

  Detail: A field with precision 4, scale 2 must round to an absolute value less than 10^2.
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2510)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2245)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:311)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:447)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:368)
	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:159)
	at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:125)
	at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
	at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java)
	at com.janiwanow.flatmap.db.cli.PropertyDetailsListener.save(PropertyDetailsListener.java:65)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
	at org.greenrobot.eventbus.EventBus.invokeSubscriber(EventBus.java:510)
	at org.greenrobot.eventbus.EventBus.postToSubscription(EventBus.java:433)
	at org.greenrobot.eventbus.EventBus.postSingleEventForEventType(EventBus.java:414)
	at org.greenrobot.eventbus.EventBus.postSingleEvent(EventBus.java:387)
	at org.greenrobot.eventbus.EventBus.post(EventBus.java:268)
	at com.janiwanow.flatmap.event.GreenRobotEventDispatcher.dispatch(GreenRobotEventDispatcher.java:23)
	at com.janiwanow.flatmap.parser.cli.ParseWebsitesCommand.dispatchEvent(ParseWebsitesCommand.java:101)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
	at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
	at com.janiwanow.flatmap.parser.cli.ParseWebsitesCommand.execute(ParseWebsitesCommand.java:70)
	at com.janiwanow.flatmap.cli.Application.run(Application.java:47)
	at com.janiwanow.flatmap.EntryPoint.main(EntryPoint.java:33)
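
The rule quoted in the error detail can be reproduced on the Java side, which may help when choosing a new precision. This is a sketch under two assumptions: the helper name is made up, and half-up rounding is used to approximate how the value is rounded to the column scale.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper that mirrors the "must round to an absolute value
// less than 10^(precision - scale)" rule from the error detail.
final class NumericFitSketch {
    private NumericFitSketch() {}

    /** Assumes precision >= scale >= 0. */
    static boolean fits(BigDecimal value, int precision, int scale) {
        // Round to the column scale first, as the error message describes.
        BigDecimal rounded = value.setScale(scale, RoundingMode.HALF_UP);
        BigDecimal limit = BigDecimal.TEN.pow(precision - scale);
        return rounded.abs().compareTo(limit) < 0;
    }
}
```

For NUMERIC (4, 2) the limit is 100, so an area of 99.99 fits but 100 does not; any apartment of 100 m² or more overflows the current column definition.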

Unable to run tests

Context
Test suite fails to run due to an NPE:

java.lang.ExceptionInInitializerError
	at com.janiwanow.flatmap.CucumberEventListener.<clinit>(CucumberEventListener.java:12)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at io.cucumber.core.plugin.PluginFactory.newInstance(PluginFactory.java:81)
	at io.cucumber.core.plugin.PluginFactory.instantiate(PluginFactory.java:71)
	at io.cucumber.core.plugin.PluginFactory.create(PluginFactory.java:55)
	at io.cucumber.core.plugin.Plugins.createPlugins(Plugins.java:48)
	at io.cucumber.core.plugin.Plugins.<init>(Plugins.java:25)
	at io.cucumber.junit.Cucumber.<init>(Cucumber.java:159)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at org.junit.internal.builders.AnnotatedBuilder.buildRunner(AnnotatedBuilder.java:104)
	at org.junit.vintage.engine.discovery.DefensiveAllDefaultPossibilitiesBuilder$DefensiveAnnotatedBuilder.buildRunner(DefensiveAllDefaultPossibilitiesBuilder.java:114)
	at org.junit.internal.builders.AnnotatedBuilder.runnerForClass(AnnotatedBuilder.java:86)
	at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:70)
	at org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:37)
	at org.junit.vintage.engine.discovery.DefensiveAllDefaultPossibilitiesBuilder.runnerForClass(DefensiveAllDefaultPossibilitiesBuilder.java:57)
	at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:70)
	at org.junit.vintage.engine.discovery.ClassSelectorResolver.resolveTestClass(ClassSelectorResolver.java:66)
	at org.junit.vintage.engine.discovery.ClassSelectorResolver.resolve(ClassSelectorResolver.java:47)
	at org.junit.platform.engine.support.discovery.EngineDiscoveryRequestResolution.lambda$resolve$2(EngineDiscoveryRequestResolution.java:134)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1631)
	at java.base/java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:127)
	at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:502)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:488)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:150)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:543)
	at org.junit.platform.engine.support.discovery.EngineDiscoveryRequestResolution.resolve(EngineDiscoveryRequestResolution.java:185)
	at org.junit.platform.engine.support.discovery.EngineDiscoveryRequestResolution.resolve(EngineDiscoveryRequestResolution.java:125)
	at org.junit.platform.engine.support.discovery.EngineDiscoveryRequestResolution.resolveCompletely(EngineDiscoveryRequestResolution.java:91)
	at org.junit.platform.engine.support.discovery.EngineDiscoveryRequestResolution.run(EngineDiscoveryRequestResolution.java:82)
	at org.junit.platform.engine.support.discovery.EngineDiscoveryRequestResolver.resolve(EngineDiscoveryRequestResolver.java:113)
	at org.junit.vintage.engine.discovery.VintageDiscoverer.discover(VintageDiscoverer.java:44)
	at org.junit.vintage.engine.VintageTestEngine.discover(VintageTestEngine.java:63)
	at org.junit.platform.launcher.core.DefaultLauncher.discoverEngineRoot(DefaultLauncher.java:168)
	at org.junit.platform.launcher.core.DefaultLauncher.discoverRoot(DefaultLauncher.java:155)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:128)
	at org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor.processAllTestClasses(JUnitPlatformTestClassProcessor.java:102)
	at org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor.access$000(JUnitPlatformTestClassProcessor.java:82)
	at org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor.stop(JUnitPlatformTestClassProcessor.java:78)
	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.stop(SuiteTestClassProcessor.java:61)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
	at com.sun.proxy.$Proxy2.stop(Unknown Source)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.stop(TestWorker.java:132)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
	at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.NullPointerException
	at java.base/java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
	at java.base/java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
	at java.base/java.util.Properties.put(Properties.java:1337)
	at java.base/java.util.Properties.setProperty(Properties.java:225)
	at com.janiwanow.flatmap.db.TestConnectionFactory.<clinit>(TestConnectionFactory.java:21)
	... 73 more

com.janiwanow.flatmap.CucumberTestRunner > initializationError FAILED
    java.lang.ExceptionInInitializerError
        Caused by: java.lang.NullPointerException

This happens if the TEST_DB_* variables are missing.

Solution
Fail fast by using the existing Env utility.
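
The API of the existing Env utility is not shown here, so the following only illustrates the fail-fast idea with a hypothetical helper. The variable name TEST_DB_URL is also just an example of a TEST_DB_* variable.

```java
import java.util.Map;

// Hypothetical helper; the project's Env utility may expose a different API.
final class RequiredEnvSketch {
    private RequiredEnvSketch() {}

    /** Returns the value of a required variable or fails with a readable message. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException(
                "Required environment variable \"" + name + "\" is not set.");
        }
        return value;
    }
}
```

Calling `RequiredEnvSketch.require(System.getenv(), "TEST_DB_URL")` in the test connection factory would replace the bare NullPointerException with a message that names the missing variable.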

Randomise user agent

Currently, a default user agent provided by Jsoup is used. I suppose it's better to define a set of user agents and use them randomly on each HTTP request.
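
A minimal sketch of such a pool, assuming the user agent strings are supplied from configuration. The class name is hypothetical.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical pool of user agent strings; one is picked at random per request.
final class UserAgentPoolSketch {
    private final List<String> userAgents;

    UserAgentPoolSketch(List<String> userAgents) {
        if (userAgents.isEmpty()) {
            throw new IllegalArgumentException("At least one user agent is required.");
        }
        this.userAgents = List.copyOf(userAgents);
    }

    /** Returns a randomly chosen user agent from the pool. */
    String next() {
        int index = ThreadLocalRandom.current().nextInt(userAgents.size());
        return userAgents.get(index);
    }
}
```

With Jsoup, the chosen value can be applied through `Jsoup.connect(url).userAgent(pool.next())` before the request is executed.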

Delay between HTTP requests

To avoid putting sudden excessive load on the websites we parse, a delay between requests must be implemented. The real load may not be that large, but we should be polite in advance.

Delay must be configurable and support randomness.
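
One way to meet both requirements is a delay picked uniformly between a configurable minimum and maximum. The class below is a sketch with hypothetical names; the bounds would come from the application parameters.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical delay policy: a random pause between a configured minimum and maximum.
final class RequestDelaySketch {
    private final Duration min;
    private final Duration max;

    RequestDelaySketch(Duration min, Duration max) {
        if (min.isNegative() || max.compareTo(min) < 0) {
            throw new IllegalArgumentException("Expected 0 <= min <= max.");
        }
        this.min = min;
        this.max = max;
    }

    /** Picks the next delay, inclusive of both bounds. */
    Duration next() {
        long low = min.toMillis();
        long high = max.toMillis();
        // nextLong(origin, bound) excludes the bound, hence the + 1.
        return Duration.ofMillis(ThreadLocalRandom.current().nextLong(low, high + 1));
    }

    /** Blocks the current thread for a randomly chosen delay. */
    void pause() throws InterruptedException {
        Thread.sleep(next().toMillis());
    }
}
```

Setting min equal to max gives a fixed delay, so the same class covers the non-random case.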

Parsing always starts from the first page of the offer list

Context
Currently parsing always starts from the first page of the offer list. It would be nice to have an option to begin from an arbitrary page.

Solution

  1. Add a "--start-from" parameter to the "parse" command and pass it to the parsers
  2. Modify parsers so that they respect this parameter and start parsing from the given page

ERROR: duplicate key value violates unique constraint "property_offer_url_key"

Issue
Sometimes the following exception may occur:

Exception in thread "main" org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "property_offer_url_key"
  Detail: Key (offer_url)=(https://example.com) already exists.
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2510)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2245)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:311)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:447)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:368)
	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:159)
	at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:125)
	at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
	at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java)
	at com.janiwanow.flatmap.db.cli.PropertyDetailsListener.save(PropertyDetailsListener.java:65)
	at com.janiwanow.flatmap.db.cli.PropertyDetailsListener.main(PropertyDetailsListener.java:32)

Context
This exception may occur if a property was parsed in the past and we now try to add its details to the property database table again. Currently, there are no checks for duplicate keys.

Solution
Introduce a check and update the property details if the property already exists in the database.

  1. https://www.postgresql.org/docs/11/sql-insert.html (ON CONFLICT section)
  2. https://stackoverflow.com/questions/1109061/insert-on-duplicate-update-in-postgresql
  3. https://medium.com/@jonashavers/how-to-execute-an-upsert-with-postgresql-e1b965ebb6ea
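
A sketch of what the upsert could look like through JDBC, using the ON CONFLICT clause from the first link. The column list is deliberately shortened to offer_url, total_area and price_amount; a real statement would have to list every NOT NULL column of the property table. The class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical upsert helper; the column list is shortened for illustration.
final class PropertyUpsertSketch {
    private PropertyUpsertSketch() {}

    // EXCLUDED refers to the row that was proposed for insertion.
    static final String UPSERT_SQL =
        "INSERT INTO property (offer_url, total_area, price_amount) "
            + "VALUES (?, ?, ?) "
            + "ON CONFLICT (offer_url) DO UPDATE SET "
            + "total_area = EXCLUDED.total_area, "
            + "price_amount = EXCLUDED.price_amount";

    /** Inserts the offer or updates the existing row with the same offer_url. */
    static int upsert(Connection connection, String offerUrl,
                      BigDecimal totalArea, BigDecimal priceAmount) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(UPSERT_SQL)) {
            statement.setString(1, offerUrl);
            statement.setBigDecimal(2, totalArea);
            statement.setBigDecimal(3, priceAmount);
            return statement.executeUpdate();
        }
    }
}
```

ON CONFLICT (offer_url) relies on the unique constraint that already exists on that column, so no separate SELECT is needed before the insert.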

RoomsExtractor could not find the desired figure on the page

It seems that the RoomsExtractor could not find the desired figure on the page.

Exception:

java.lang.NumberFormatException: For input string: ""
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68)
	at java.base/java.lang.Integer.parseInt(Integer.java:668)
	at java.base/java.lang.Integer.parseInt(Integer.java:776)
	at com.janiwanow.flatmap.util.Numbers.parseInt(Numbers.java:18)
	at com.janiwanow.flatmap.parser.impl.sakhcom.RoomsExtractor.extract(RoomsExtractor.java:21)
	at com.janiwanow.flatmap.parser.impl.sakhcom.SpaceExtractor.extract(SpaceExtractor.java:57)
	at com.janiwanow.flatmap.parser.impl.sakhcom.SpaceExtractor.extract(SpaceExtractor.java:24)
	at com.janiwanow.flatmap.parser.PropertyDetailsExtractor.extract(PropertyDetailsExtractor.java:75)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1605)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
	at com.janiwanow.flatmap.parser.PropertyDetailsExtractor.extract(PropertyDetailsExtractor.java:104)
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1776)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1763)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1016)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1665)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1598)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)

Solution

  1. Check offer pages of all types of properties parsed by the SakhcomParser and query for the appropriate selectors.
  2. Ensure that RoomsExtractor would return 1 if the figure was not found on the page.

Handle area (total, living, kitchen) correctly

Context
There may be several variations of area from page to page:

  1. No area information
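  2. Total area only: Площадь: 177 м² ("Area: 177 m²")
  3. Incomplete (total, living): Площадь: 377 м² (жилая: 377 м²) ("Area: 377 m² (living: 377 m²)")
  4. Full information (total, living, kitchen): Площадь: 100 м² (жилая: 75 м², кухня: 18 м²) ("Area: 100 m² (living: 75 m², kitchen: 18 m²)")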

Currently, if an area “component” isn't found, it will be set to zero. But zero is an invalid value from the database table point of view since there are some check constraints set up.

Solution

  1. Skip the offer completely if it has no area information
  2. If living space (“жилая”) is absent, set it equal to the total area
  3. Allow zero values for the kitchen area by altering check constraint
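
The three rules above can be sketched as a small parser over the area text. The Cyrillic labels are the ones from the page examples in this issue; the class name, the regular expressions and the use of BigDecimal are assumptions.

```java
import java.math.BigDecimal;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value object plus parser for the area line of an offer page.
final class AreaSketch {
    // Labels as they appear on the page: "Площадь" (area), "жилая" (living), "кухня" (kitchen).
    private static final Pattern TOTAL = Pattern.compile("Площадь:\\s*(\\d+(?:[.,]\\d+)?)");
    private static final Pattern LIVING = Pattern.compile("жилая:\\s*(\\d+(?:[.,]\\d+)?)");
    private static final Pattern KITCHEN = Pattern.compile("кухня:\\s*(\\d+(?:[.,]\\d+)?)");

    final BigDecimal total;
    final BigDecimal living;
    final BigDecimal kitchen;

    private AreaSketch(BigDecimal total, BigDecimal living, BigDecimal kitchen) {
        this.total = total;
        this.living = living;
        this.kitchen = kitchen;
    }

    static Optional<AreaSketch> parse(String text) {
        Optional<BigDecimal> total = find(TOTAL, text);
        if (!total.isPresent()) {
            return Optional.empty(); // rule 1: no area information, skip the offer
        }
        BigDecimal living = find(LIVING, text).orElse(total.get());   // rule 2
        BigDecimal kitchen = find(KITCHEN, text).orElse(BigDecimal.ZERO); // rule 3
        return Optional.of(new AreaSketch(total.get(), living, kitchen));
    }

    private static Optional<BigDecimal> find(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        if (!matcher.find()) {
            return Optional.empty();
        }
        // Accept both "75.5" and "75,5" as decimal notation.
        return Optional.of(new BigDecimal(matcher.group(1).replace(',', '.')));
    }
}
```

A zero kitchen area only passes the database once the kitchen_area check constraint is relaxed as described in rule 3.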

Parsers autodiscovery

Context
Currently, each parser must be registered manually in EntryPoint. I don't know whether I'm going to add more parsers or not. But in any case, it would be handy to avoid manual registration each time a new parser is rolled out.

Solution
Implement autodiscovery, most likely by using the Reflection API. Maybe use https://github.com/ronmamo/reflections or even an IoC container such as Guice or Spring.
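
For comparison, the JDK also ships java.util.ServiceLoader, which is a different technique from the options listed above but needs no extra dependency. The sketch below uses a hypothetical parser interface, not the project's WebsiteParser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical parser contract used only for this illustration.
interface WebsiteParserSketch {
    String website();
}

final class ParserDiscoverySketch {
    private ParserDiscoverySketch() {}

    /**
     * Collects every implementation that is listed in a provider configuration
     * file under META-INF/services/ named after the interface.
     */
    static List<WebsiteParserSketch> discover() {
        List<WebsiteParserSketch> parsers = new ArrayList<>();
        for (WebsiteParserSketch parser : ServiceLoader.load(WebsiteParserSketch.class)) {
            parsers.add(parser);
        }
        return parsers;
    }
}
```

ServiceLoader instantiates providers through a public no-argument constructor, so parsers that need constructor arguments would require a factory interface or one of the IoC containers mentioned above.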
