
webscraping's Introduction

Webscraping

Yukun (Tifa) Tan's SPIN project.

Updated by ARFC.

Scrapes reactor coordinates from Wikipedia.

License

This is under a CC-BY license.

How-to-use

Run: python scraping_wikidata.py

Output File: coordinates.sqlite
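To check what the script wrote, a minimal sketch like the one below lists the tables in an open SQLite connection. It assumes only that coordinates.sqlite is an ordinary SQLite file. The helper name `list_tables` is hypothetical and not part of the repository.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection."""
    # sqlite_master is SQLite's built-in catalog of schema objects.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

For example, `list_tables(sqlite3.connect("coordinates.sqlite"))` should return the table names the script created. Note that `sqlite3.connect` silently creates an empty file if the path does not exist, so run it from the directory that holds the output.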

webscraping's People

Contributors

katyhuff, ytan15, nsryan2

Stargazers

PEP 8 Speaks

Watchers

James Cloos, Madicken Munk, Aditya Bhosale, Mark Kamuda, Xin, Huan Yan, Snehal Chandan, Jin Whan Bae, Andrei Rykhlevskii

webscraping's Issues

Cannot put the new pandas dataframe into sqlite table

Hi Dr. @katyhuff! As mentioned earlier today, it seems like the code

CREATE TABLE testTable(
    'index', 'Name' TEXT, 'Coord' TEXT, 'Long' REAL, 'Lat' REAL
)

(lines 84-86 in https://github.com/ytan15/webscraping/blob/master/scraping_wikidata.py)
is not working. The error message says "probably unsupported type".

Earlier it was

CREATE TABLE testTable(
    'index', 'Name' OBJECT, 'Coord' OBJECT
)

and it was working fine.
Can you please take a look at it? Thanks!
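One possible cause, offered as an assumption rather than a diagnosis: if the new dataframe's Coord column holds Python tuples or lists (for example (lat, long) pairs), Python's sqlite3 module cannot bind them. It binds plain values such as None, int, float, str, and bytes, and reports anything else as an unsupported type. The sketch below keeps the column layout of testTable from the issue but makes "index" an INTEGER PRIMARY KEY. The rows are placeholders, and the helpers `split_coords` and `write_rows` are hypothetical. Each coordinate pair is split into separate REAL values before inserting.

```python
import sqlite3

# Placeholder rows: (name, (lat, long)). Real data would come from the scraper.
ROWS = [
    ("Reactor A", (40.0, -88.0)),
    ("Reactor B", (35.5, 139.5)),
]


def split_coords(rows):
    """Turn (name, (lat, long)) pairs into flat tuples sqlite3 can bind."""
    flat = []
    for name, (lat, long_) in rows:
        coord_text = f"{lat}, {long_}"  # human-readable copy for the Coord column
        flat.append((name, coord_text, float(long_), float(lat)))
    return flat


def write_rows(conn, rows):
    """Create testTable if needed and insert the flattened rows."""
    # "index" is double-quoted because INDEX is an SQL keyword; making it an
    # INTEGER PRIMARY KEY (auto-numbered) is an assumption, not from the issue.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS testTable ('
        '"index" INTEGER PRIMARY KEY, "Name" TEXT, "Coord" TEXT, '
        '"Long" REAL, "Lat" REAL)'
    )
    conn.executemany(
        'INSERT INTO testTable ("Name", "Coord", "Long", "Lat") VALUES (?, ?, ?, ?)',
        split_coords(rows),
    )
    conn.commit()
```

With pandas, the same idea applies: split the pair column into separate numeric columns on the dataframe before calling `to_sql`.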

License update

I'm submitting a ...

  • feature request

Expected Behavior

There should be a LICENSE file that covers the entire repository.

Actual Behavior

There is a subsection in the README.md file that claims the repository is under CC-BY, and I'm not sure if that's adequate.

How can this issue be closed?

First, we need to decide which license should cover this repository.
Second, that license should be added to this repository through a pull request.

Webscrape results missing data

The results of Webscrape are missing some reactors that are currently shut down.
This is especially the case for reactors outside the United States.

Scrape from the PRIS database

Next step, if you're getting frustrated with Wikipedia: check out the PRIS database. If necessary, @jbae11 can help with understanding it and perhaps would be willing to show you what he's done so far.

Some, but not all, of the information we want should be in that database.

Scrape from Wikidata and Wikipedia

Hi @ytan15! Sorry it took a while to describe this goal. Let's see what you can get into your sqlite3 file out of Wikipedia alone. Here are some tutorials on scraping Wikipedia and Wikidata:

You ought to be able to find names and locations of reactors. You may also be able to find some of the other columns as well. Let us know what you can find!
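For the names-and-locations part, one option is the public Wikidata Query Service endpoint (https://query.wikidata.org/sparql), which returns JSON when called with format=json. This sketch assumes a query of your own that binds ?reactorsLabel and ?coord, where ?coord is a P625 (coordinate location) value. Wikidata serializes coordinates as WKT literals of the form Point(longitude latitude), so `parse_point` swaps them into (lat, long). The helper names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def parse_point(wkt):
    """Return (lat, long) from a WKT literal such as 'Point(-88.2 40.1)'.

    Wikidata writes coordinates longitude first, so the order is swapped here.
    """
    if not (wkt.startswith("Point(") and wkt.endswith(")")):
        raise ValueError(f"not a WKT point: {wkt!r}")
    long_text, lat_text = wkt[len("Point("):-1].split()
    return float(lat_text), float(long_text)


def run_query(query):
    """Send a SPARQL query to Wikidata and return the JSON result bindings."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Placeholder User-Agent; Wikimedia asks clients to identify themselves.
    request = urllib.request.Request(url, headers={"User-Agent": "reactor-scraper-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]


def to_rows(bindings):
    """Flatten bindings carrying ?reactorsLabel and ?coord into (name, lat, long)."""
    rows = []
    for binding in bindings:
        lat, long_ = parse_point(binding["coord"]["value"])
        rows.append((binding["reactorsLabel"]["value"], lat, long_))
    return rows
```

A call would look like `to_rows(run_query(query))`. Only `run_query` needs network access, so the parsing can be checked offline against saved results.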

First Task: Create Sqlite file

This should be a mostly empty database.

  • Create a python script
  • In the python script, import and use the sqlite3 package to create a "reactors.sqlite" file.
  • This file should have one table, called reactors, and should have the following columns:

ID, Name, Lat, Long, Institution, Country, Type, Fuel, Enrichment, Electrical Capacity, Thermal Capacity, Thermal Efficiency, Capacity Factor
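The first task can be sketched with the standard sqlite3 module. The column names are the ones listed in the task. The SQL types (INTEGER, TEXT, REAL) and making ID the primary key are assumptions. Names containing spaces are double-quoted so that SQLite accepts them as identifiers. `create_reactors_db` is a hypothetical helper name.

```python
import sqlite3

# Column names from the task; the SQL types are assumed, not specified there.
COLUMNS = [
    ("ID", "INTEGER PRIMARY KEY"),
    ("Name", "TEXT"),
    ("Lat", "REAL"),
    ("Long", "REAL"),
    ("Institution", "TEXT"),
    ("Country", "TEXT"),
    ("Type", "TEXT"),
    ("Fuel", "TEXT"),
    ("Enrichment", "REAL"),
    ("Electrical Capacity", "REAL"),
    ("Thermal Capacity", "REAL"),
    ("Thermal Efficiency", "REAL"),
    ("Capacity Factor", "REAL"),
]


def create_reactors_db(path="reactors.sqlite"):
    """Open (or create) the database and make sure the reactors table exists."""
    conn = sqlite3.connect(path)
    # Double quotes make names with spaces valid SQL identifiers.
    column_sql = ", ".join(f'"{name}" {sql_type}' for name, sql_type in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS reactors ({column_sql})")
    conn.commit()
    return conn
```

Calling `create_reactors_db()` once creates reactors.sqlite in the working directory with an empty reactors table. Because of IF NOT EXISTS, calling it again leaves existing data alone.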

Need some nuclear knowledge boost & clarification of our goals

Hi Dr. @katyhuff! I've been looking into scraping from Wikidata, and I think I've got the gist of it. So I started expanding scraping_wikidata.py to find more information. However, I've run into a few points of confusion:

  1. From my understanding, a nuclear reactor is the energy source of a nuclear power plant, and a nuclear power plant is the key facility of a nuclear power station; is that right? Are we looking for nuclear power plants, nuclear power stations, or nuclear reactors?
  2. I was trying to find the country of a nuclear reactor by adding the line

         ?reactors wdt:P17 ?country .

     to the query. However, not all nuclear reactors have a country attribute, because in Wikidata "country" (P17) means "sovereign state of this item". For example, Bhabha Atomic Research Centre (https://www.wikidata.org/wiki/Q854682) doesn't have a "country" attribute, although its description tells us it is based in India. So if I add this line to the query, it filters out that entry (Bhabha Atomic Research Centre). Is this something we should be concerned about?

  3. Although I haven't started on Wikipedia yet, I'm a bit concerned: could querying both Wikidata and Wikipedia produce overlapping results, since Wikidata stores Wikipedia's data?

  4. If Wikidata contains Wikipedia's data, how come there is a "Category: Nuclear power reactor types" in Wikipedia but not in Wikidata? Am I misunderstanding something?

Would you mind discussing these issues with me? We could talk here, or I could make an appointment with you if you think it would be easier to talk in person.
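On the country question: in SPARQL, wrapping a triple pattern in OPTIONAL keeps results that lack that property instead of dropping them. An item without a P17 statement then comes back with ?country unbound rather than being filtered out for that reason alone. The sketch below builds such a query string. The class QID passed in is a placeholder, so look up the Wikidata item for the class you want. `build_query` is a hypothetical helper. The ?reactors variable and P17 come from the question above. P31 (instance of), P279 (subclass of), and P625 (coordinate location) are standard Wikidata properties. The Wikidata Query Service predefines the wd:, wdt:, wikibase:, and bd: prefixes, so no PREFIX lines are needed.

```python
def build_query(class_qid, language="en"):
    """Build a SPARQL query for items of a class, with country optional.

    class_qid is a placeholder: pass the Wikidata QID of the class to list.
    """
    return f"""
SELECT ?reactors ?reactorsLabel ?coord ?country ?countryLabel WHERE {{
  ?reactors wdt:P31/wdt:P279* wd:{class_qid} .
  ?reactors wdt:P625 ?coord .
  # OPTIONAL keeps items that have no P17 (country) statement.
  OPTIONAL {{ ?reactors wdt:P17 ?country . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}
}}
""".strip()
```

This query still requires a coordinate (P625). Items missing one would need that pattern wrapped in OPTIONAL as well, at the cost of handling empty coordinates downstream.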
