Comments (13)

ctram commented on June 5, 2024

@ychoy @JMStudiosJoe

The scraper is working and can write to a file within the project. Its flow is as follows:

  1. Make a POST to http://sanjoseca.gov/Facilities/Facility/Search. The complete query looks something like http://sanjoseca.gov/Facilities/Facility/Search?featureIDs=&categoryIDs=15&occupants=null&keywords=&pageSize=100&pageNumber=1&sortBy=3&currentLatitude=null&currentLongitude=null&isReservableOnly=false. This returns HTML listing the artworks.
  2. Follow the link to each individual artwork.
  3. Assign the scraped data to fields such as "artist" and "description".
  4. Iterate through the data to clean it up and format it.
  5. Save to file.
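
As a rough sketch of step 1, the search URL could be assembled per page like this. The parameter names and values are copied from the example query above; `buildSearchUrl` is a hypothetical helper, and whether the site expects these values in the query string or in the POST body is an assumption left open here.

```javascript
// Sketch only: build the search URL for one page of results.
// Parameter names and values come from the example query in this thread;
// buildSearchUrl is a hypothetical helper, not part of the project.
const SEARCH_ENDPOINT = 'http://sanjoseca.gov/Facilities/Facility/Search';

function buildSearchUrl(pageNumber, pageSize = 100) {
  const params = new URLSearchParams({
    featureIDs: '',
    categoryIDs: '15',
    occupants: 'null',
    keywords: '',
    pageSize: String(pageSize),
    pageNumber: String(pageNumber),
    sortBy: '3',
    currentLatitude: 'null',
    currentLongitude: 'null',
    isReservableOnly: 'false',
  });
  return `${SEARCH_ENDPOINT}?${params.toString()}`;
}

console.log(buildSearchUrl(1));
```

Bumping `pageNumber` until a page comes back with no artworks would be one way to walk the whole listing.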

The cleanup is not so straightforward because the HTML is inconsistent from one artwork page to the next. Two examples:

<div class="editorContent">
  <font class="Subhead1">1737 Trees</font>

  <font class="Subhead2">
     Artist: Angela Buenning Filo<br>
     <font class="Normal">2006</font><br>
  </font>
</div>

<div class="editorContent">
    <div class="Normal" style="text-align: left;">
        <font class="Subhead1">
            8 Minutes<em><br></em>
            <font class="Subhead2">
                Artists: Merge Conceptual Design (Franka Diehnelt and Claudia Reisenberger)
            </font><br>
        </font>
        2013
    </div>
</div>

I'll think some more on how to grab this data without too much hassle.
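
One possible angle, sketched below, is to ignore the tag nesting entirely and work on the visible text lines. `parseArtwork` is a hypothetical helper; it assumes the first non-empty line is the title, the artist line starts with "Artist:" or "Artists:", and the year sits on its own line. Those assumptions hold for the two samples above but may not hold for every page, and HTML entities are not decoded.

```javascript
// Sketch only: parseArtwork is a hypothetical helper, not the project's code.
// It ignores tag nesting and works on the visible text lines instead.
function parseArtwork(html) {
  const lines = html
    .replace(/<br\s*\/?>/gi, '\n') // treat <br> as a line break
    .replace(/<[^>]+>/g, '')       // drop every other tag
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  // Assumption: the first non-empty line is the title.
  const [title = '', ...rest] = lines;
  const artistLine = rest.find((line) => /^Artists?:/i.test(line));
  // Assumption: the year sits on its own line as four digits.
  const year = rest.find((line) => /^\d{4}$/.test(line));

  return {
    title,
    artist: artistLine ? artistLine.replace(/^Artists?:\s*/i, '') : '',
    year: year || '',
  };
}

// Condensed versions of the two page layouts shown above.
const flat = `<font class="Subhead1">1737 Trees</font>
<font class="Subhead2">Artist: Angela Buenning Filo<br>
<font class="Normal">2006</font><br></font>`;

const nested = `<div class="Normal"><font class="Subhead1">
  8 Minutes<em><br></em>
  <font class="Subhead2">Artists: Merge Conceptual Design (Franka Diehnelt and Claudia Reisenberger) </font><br>
</font>
2013</div>`;

console.log(parseArtwork(flat));
console.log(parseArtwork(nested));
```

Pages where any of the three fields comes back empty could be set aside for manual review.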

By the way, how were the geolocations in art.js computed?

For the data being scraped, is the idea to use the postal address to determine the lat and long coordinates?

from heartofthevalley.

ychoy commented on June 5, 2024

Got 60 records from the spreadsheet, researched and found additional information on the SJC website, and added it to the spreadsheet.

ychoy commented on June 5, 2024

@amygcho will work on scraping and parsing data about City-sponsored public art from http://sanjoseca.gov/facilities and adding the data to art.js.

ctram commented on June 5, 2024

@JMStudiosJoe @amygcho I added starter code on the branch:

79bd87b

JMStudiosJoe commented on June 5, 2024

@ctram Awesome job. I would assume we take the postal address and convert it to lon/lat. From what I could tell, every public art link title started with "Public Art:", and it looks to be the same with "Artist"? Please let me know if you need more help on this and I'll do what I can.

ctram commented on June 5, 2024

@JMStudiosJoe I am able to save the address, but I'm having trouble neatly getting the details (artist, title, etc.) under their proper labels because of the inconsistent HTML structure. Please take a look if you have time. I'll scrape all 200 or so pages later; if the number of exceptions is reasonable, it might be worth just cleaning up the oddballs manually.

I'm currently writing the data as JSON, so a future task is to inject that data into the map.

Where did the current data come from, and how did it get into JS object format in the art.js file?

JMStudiosJoe commented on June 5, 2024

@ctram The current data came from a spreadsheet (outdated) and from @ychoy manually entering data. I have not gone into the art.js file; I've mainly been going after the web scraper.

ctram commented on June 5, 2024

@JMStudiosJoe @ychoy To check, am I OK to use the MapBox API key to generate the geolocation based on postal address?

ctram commented on June 5, 2024

@JMStudiosJoe @ychoy I believe we can make X amount of API requests per month before they start charging someone's card? : ]

JMStudiosJoe commented on June 5, 2024

@ctram Yes, the MapBox API should be good to use, and this won't be making that many requests per month.

ychoy commented on June 5, 2024

@ctram, thanks for working on the scraper! Once you start putting the scraped data into art.js, there may be duplicate information; I think we got about 60 records from the City's website into art.js. It's okay to overwrite what I have and just take the information you get from the City's website.
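
A minimal sketch of that overwrite rule, assuming records are matched on a normalized title (the matching key is an assumption; `normalizeTitle` and `mergeFeatures` are hypothetical helpers):

```javascript
// Sketch only: merge scraped records over existing ones, matching on title.
// Assumption: two records describe the same piece when their titles match
// after trimming, lowercasing, and collapsing whitespace.
function normalizeTitle(title) {
  return title.trim().toLowerCase().replace(/\s+/g, ' ');
}

function mergeFeatures(existing, scraped) {
  const byTitle = new Map();
  for (const feature of existing) {
    byTitle.set(normalizeTitle(feature.properties.title), feature);
  }
  // Scraped records win: setting an existing key replaces the old value.
  for (const feature of scraped) {
    byTitle.set(normalizeTitle(feature.properties.title), feature);
  }
  return [...byTitle.values()];
}
```

Pieces whose titles differ between the two sources would slip through as duplicates, so a manual pass may still be needed.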

For geocoding lat and long: we've been trying to use OpenStreetMap for everything in this project. Maybe consider using Nominatim-Browser (https://www.npmjs.com/package/nominatim-browser)? It won't be entirely accurate, because sometimes the public art/mural is not located at the lat and long of its postal address. But until all of this information is entered into OSM and can be queried, this will work for now.
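
As a rough sketch of what the geocoding step involves, the snippet below builds a request URL for the public Nominatim search endpoint and converts a response into the `[lon, lat]` order that a GeoJSON Point expects. It deliberately does not use the nominatim-browser package's API, the helper names are hypothetical, and the sample address and response values are made up for illustration.

```javascript
// Sketch only: talks about the public Nominatim search endpoint directly.
// buildGeocodeUrl and toCoordinates are hypothetical helpers.
const NOMINATIM_SEARCH = 'https://nominatim.openstreetmap.org/search';

function buildGeocodeUrl(address) {
  const params = new URLSearchParams({ q: address, format: 'json', limit: '1' });
  return `${NOMINATIM_SEARCH}?${params.toString()}`;
}

// Nominatim returns "lat" and "lon" as strings; GeoJSON wants [lon, lat] numbers.
function toCoordinates(results) {
  if (!Array.isArray(results) || results.length === 0) return null;
  const { lat, lon } = results[0];
  return [Number(lon), Number(lat)];
}

// Illustrative values only, not a real lookup.
const sampleResponse = [{ lat: '37.3382', lon: '-121.8863' }];
console.log(buildGeocodeUrl('200 E Santa Clara St, San Jose, CA'));
console.log(toCoordinates(sampleResponse));
```

Addresses that return no result come back as `null`, which makes them easy to list for manual placement.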

This is the general format of each JS object in art.js. We have a separate issue about art.js needing to be cleaned up (because I injected a lot of HTML tags, since some pieces have multiple artists and thus multiple websites, etc.). So I propose that we add additional attributes to look out for. sourceOfInformation would be the City of San Jose Public Art Program, and sourceURL is the specific webpage with the details about the public art/mural piece. If information about the artist's website exists, include it.

    "geometry": {
        "type": "Point",
        "coordinates": [
        ]
    },
    "properties": {
        "title": "",
        "artist1": "",
        "artist2": "",
        "artist3": "",
        "artist1website": "",
        "artist2website": "",
        "artist3website": "",
        "description": "",
        "sourceOfInformation": "",
        "sourceURL": "",
        "address": "",
        "city": "",
        "country": "",
        "postalCode": "",
        "state": ""
    }
}
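
A minimal sketch of turning one scraped record into that shape might look like the following. `toFeature` and the layout of `record` are hypothetical, and the top-level `"type": "Feature"` is an assumption, since the opening lines of the object are not shown above.

```javascript
// Sketch only: toFeature and the shape of `record` are hypothetical.
const SOURCE_NAME = 'City of San Jose Public Art Program';

function toFeature(record, coordinates) {
  const artists = record.artists || []; // assumed: [{ name, website }, ...]
  const properties = { title: record.title || '' };

  // Fill artist1..artist3, then artist1website..artist3website, blank when absent.
  for (let i = 0; i < 3; i += 1) {
    properties[`artist${i + 1}`] = artists[i] ? artists[i].name || '' : '';
  }
  for (let i = 0; i < 3; i += 1) {
    properties[`artist${i + 1}website`] = artists[i] ? artists[i].website || '' : '';
  }

  Object.assign(properties, {
    description: record.description || '',
    sourceOfInformation: SOURCE_NAME,
    sourceURL: record.sourceURL || '',
    address: record.address || '',
    city: record.city || '',
    country: record.country || '',
    postalCode: record.postalCode || '',
    state: record.state || '',
  });

  return {
    type: 'Feature', // assumption: not shown in the fragment above
    geometry: { type: 'Point', coordinates },
    properties,
  };
}
```

Keeping every key present, even when blank, means the map code never has to check whether a property exists.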

I realized I hadn't updated the API key. I have a key from CFA, which should allow for more API requests each month. I'll update it today.

Let me know if you have any more questions.

ctram commented on June 5, 2024

@ychoy Thanks! To be clear, the art.js data came from the city website, but did not come from http://sanjoseca.gov/Facilities, is that correct?

Yes, I will be working to consolidate all the data into a single JSON file.

I have Nominatim up and running, thanks for the suggestion!

@ychoy @JMStudiosJoe do you know how to get JSON data to the client without requiring a call to a server? I am saving the scraped data as JSON, but I'm not familiar with how to ship JSON data along with the index.html download. For example, would you include a <script> tag whose source is the JSON?

JMStudiosJoe commented on June 5, 2024

@ctram Likely we will have a frontend such as React or Angular that will serve the file as needed; at least, that would be a part of the plan.
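
Until such a frontend exists, one server-free option (a sketch, not a decision made in this thread) would be to have the scraper write the data out as a small JavaScript file that assigns it to a global, so index.html can load it with an ordinary script tag. `toScriptSource`, the global name `artData`, and the file name in the usage note are all hypothetical.

```javascript
// Sketch only: wrap the scraped data in a script that index.html can load
// with a plain <script src="..."> tag, so no fetch to a server is needed.
// toScriptSource and the default global name are hypothetical.
function toScriptSource(features, globalName = 'artData') {
  return `window.${globalName} = ${JSON.stringify(features, null, 2)};\n`;
}

const sample = [{ properties: { title: '8 Minutes' } }];
console.log(toScriptSource(sample));
```

The scraper would save this string as, say, art-data.js, and the page would include `<script src="art-data.js"></script>` before the map code, which then reads `window.artData`.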
