nasa-jpl-memex / geoparser
Extract and Visualize location from any file
License: Apache License 2.0
Use Solr for storing and retrieving data.
One collection called "Uploaded_Files" will store all geoparsed uploaded files.
Schema could be something close to:
file_name = <file_name>
extracted_text =
location_names = [list of location names]
lat/lon = [list of location dictionaries (key: location name, value: lat/lon)]
** This schema may change as we learn more about Solr spatial queries.
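A minimal sketch of what one document for this schema could look like before it is sent to Solr. The function name `build_uploaded_file_doc`, the field name `lat_lon` (used here in place of `lat/lon`), the "lat,lon" string format, and the sample text and coordinates are all assumptions for illustration, not part of the agreed schema.

```python
import json


def build_uploaded_file_doc(file_name, extracted_text, points):
    """Build one document for the "Uploaded_Files" collection.

    points is a list of (location_name, lat, lon) tuples.
    The field names below follow the proposed schema; "lat_lon" is an
    assumed spelling for the "lat/lon" field.
    """
    return {
        "file_name": file_name,
        "extracted_text": extracted_text,
        "location_names": [name for name, _, _ in points],
        # One single-key dictionary per location: name -> "lat,lon" (assumed format)
        "lat_lon": [{name: f"{lat},{lon}"} for name, lat, lon in points],
    }


# Illustrative values only
doc = build_uploaded_file_doc(
    "5c0024-25.pdf",
    "Flights from Paris to Cairo.",
    [("Paris", 48.8566, 2.3522), ("Cairo", 30.0444, 31.2357)],
)
print(json.dumps(doc, indent=2))
```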
We have decided to replace the current CherryPy server with Girder (Girder itself uses CherryPy as its server).
As we are replacing the current CherryPy server with Girder, we need to define a REST URL to POST uploaded files to the Girder filesystem.
@smadha I have updated the "return_points" URL to work with both uploaded files and crawled data.
Now you can call return_points to get points from both.
However, you need to update the code you have now that calls return_points. Here is the update:
For uploaded files: http://localhost:8000/return_points/<file_name>/uploaded_files
Example: http://localhost:8000/return_points/5c0024-25.pdf/uploaded_files
For crawled data: http://localhost:8000/return_points/<solr_url>/<core_name>
Example: http://localhost:8000/return_points/http://crawl.dyndns.org/solr/domain
Please update both.
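A small sketch of how a client might build the two URL forms above. The helper names are hypothetical, and the Solr URL is embedded unescaped simply because that is how the example above shows it.

```python
BASE_URL = "http://localhost:8000"  # host and port taken from the examples above


def return_points_url(first, second, base_url=BASE_URL):
    """Join the two path segments that return_points expects."""
    return f"{base_url}/return_points/{first}/{second}"


def uploaded_file_points_url(file_name):
    # Uploaded files: <file_name>/uploaded_files
    return return_points_url(file_name, "uploaded_files")


def crawled_points_url(solr_url, core_name):
    # Crawled data: <solr_url>/<core_name>
    return return_points_url(solr_url, core_name)


print(uploaded_file_points_url("5c0024-25.pdf"))
print(crawled_points_url("http://crawl.dyndns.org/solr", "domain"))
```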
Each file in the menu shows its results underneath; if the result is long, it goes off screen.
Each file/result should be collapsible and should also scroll up and down.
The APIs below need to be mocked
[API signature], [Method],
[Sample Response]
https://drive.google.com/open?id=1ASR0j0lzT8GqifZ0ep6WMBV9SaAOENPHUIqUrrR7dbo
Set up CherryPy to serve static files and the template HTML file.
An input box on top left for file upload.
Simplify the file text before indexing it, as we already do for indexed data.
Crawled data is usually indexed to either Solr or Elasticsearch.
GeoParser should be able to take the URL of either of these two indexing engines plus the domain name, scan the whole index and geoparse it.
The results (location names and points) will be stored internally in Solr alongside the path to the crawled data.
The delete button next to each uploaded file should remove the file from both the server and Solr.
Some text returned from Tika contains characters that cannot be indexed in Solr.
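One possible workaround, sketched under an assumption: the issue does not say which characters fail, so this guesses that they are control characters that XML 1.0 does not allow. The function name `strip_unindexable` is hypothetical.

```python
import re

# Assumption: the offending characters are control characters that are not
# valid in XML 1.0 (tab, newline and carriage return are kept).
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def strip_unindexable(text):
    """Remove characters assumed to break indexing from extracted text."""
    return _INVALID_XML_CHARS.sub("", text)


cleaned = strip_unindexable("Pasadena\x00, CA\x0c\n")
print(repr(cleaned))
```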
Need to show more data in the popups so that each individual bubble can be analyzed.
-Add a black bar with heading on top of page
-Use Bootstrap icons and create a navigation switch
-Remove temp images used
Currently index.html is loaded once and stored as a template object, so any change to index.html requires a server restart.
The progress bar should show the status of the file being uploaded as well as the file being geoparsed, with status text underneath it.
We have decided to use CherryPy for the server instead of Flask.
CherryPy is more stable, and since we are going to join with other Memex geo projects, we should use the same technologies they use.
The server should start the geoparsing process as soon as files are uploaded and send the status back to the front-end.
For example, 5989-9131EN.pdf returns many locations two or three times:
Quick
Quick
Quick
Data
Data
U.S.
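A minimal sketch of one way to drop the repeated names while keeping their first-seen order. The function name `unique_locations` is hypothetical, and this only handles exact duplicates, not different spellings of the same place.

```python
def unique_locations(names):
    """Return location names with exact duplicates removed, order preserved."""
    # dict keys keep insertion order, so the first occurrence of each name wins
    return list(dict.fromkeys(names))


print(unique_locations(["Quick", "Quick", "Quick", "Data", "Data", "U.S."]))
```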
Start the Solr, l-g-g and Tika GeoTopic servers through manage.py
@MBoustani @chrismattmann any thoughts on this?
Uploaded files appear as soon as they are uploaded, but if the page is refreshed they are no longer shown.
Solution:
Have the server look in the uploaded folder and send the list of files to the front-end.
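The server-side lookup could be as small as the sketch below. The function name `list_uploaded_files` and the folder path are assumptions; how the list is sent to the front-end (for example as JSON) is left out.

```python
import os


def list_uploaded_files(folder):
    """Return the sorted names of regular files directly inside folder."""
    if not os.path.isdir(folder):
        return []  # nothing uploaded yet
    return sorted(
        name
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )


# "uploaded_files" is an assumed folder name for illustration
print(list_uploaded_files("uploaded_files"))
```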
After the front-end sends files to the server, Girder should create the collection and folder if they do not already exist.
I am using this page:
http://tools.cherrypy.org/wiki/DirectToDiskFileUpload
to set up file upload capability.
Adding exposed = True as per https://github.com/Kitware/minerva/blob/master/server/loader.py
Need to fix the issue with not being able to upload files using Django
We have already made a collection and a folder called "uploaded_files" within the collection.
When a user uploads a file, the client can create an item within "uploaded_files" using the Girder REST API.
@smadha please review the Girder REST API docs to see how you can create an item within a folder.
The GeoParser plugin for Girder can have multiple jobs running through Girder.
Each job can be called using a REST URL and will return its results as JSON.
When the GeoParser app loads, log in to Girder so that the app can use it.
Username: girder
Password: girder
Base 64 encode: Z2lyZGVyOmdpcmRlcg==
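For reference, the encoded value above is the Base64 form of "girder:girder", as used in an HTTP Basic Authorization header. A short sketch with a hypothetical helper name:

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header for the given credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("girder", "girder"))
```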
@lawongsta Any ideas?
We need to hide the last side menu when the points are on the map, so that the map can be viewed in full screen.
Currently the menu covers 1/4 of the screen.
GeoParser at this stage uses two servers as its backend.
1- CherryPy as web server
2- Girder for file system and running jobs
Besides increasing the chance of application failure, running two servers at the same time causes a cross-domain issue when CherryPy on one port calls Girder on a different port. The two can therefore be merged into one, with Girder taking care of everything.
As we are indexing millions of records, I can see a lot of issues with Solr.
We initially did a lot of hand-holding, editing data types and fixing encodings, but indexing now constantly fails with OutOfMemoryError. I have tried increasing memory up to 1.5 GB, but that only buys us some extra time.
At the moment I am trying to create a new schema which addresses the limitations below
I am planning the following -
For now I am doing this only for indexed data, on a new branch, since uploaded files work with no issues.
Fetching all the coordinates from Solr and transforming them to CSV takes a lot of time.
This can be done asynchronously to avoid blocking the main thread.
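A rough sketch of moving the CSV conversion off the main thread with a worker thread. The function names, the column names and the sample point are assumptions, and the Solr fetch itself is not shown.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def points_to_csv(points):
    """Convert (name, lat, lon) tuples to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "lat", "lon"])  # assumed column names
    writer.writerows(points)
    return buf.getvalue()


# One background worker; the caller gets a Future back immediately
executor = ThreadPoolExecutor(max_workers=1)


def points_to_csv_async(points):
    """Schedule the conversion on the worker thread and return a Future."""
    return executor.submit(points_to_csv, points)


future = points_to_csv_async([("Paris", 48.85, 2.35)])  # illustrative point
print(future.result(timeout=5))
```

The main thread can keep serving requests and call `future.result()` only when the CSV is actually needed.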
After the user types the domain name and index URL, put the result under the save button so that the user can come back and toggle its points on the map on or off.
Please change the map engine from GeoJS to OpenLayers.
In CherryPy there is a configuration for static files such as CSS, JS and images.
After each file is uploaded, its name will appear under the "upload file" section.
@smadha Can you please put a remove icon by each file or check with @lawongsta about how to remove a file and maybe send a request to server to remove a file?
@smadha should we use REST URL to send the remove command to server for each file?
@smadha Please update the menu as below:
Combine the plugin JS into one single file and the plugin CSS into one single file.
Minify both app and plugin JS and CSS.