Comments (32)

aufdenkampe commented on September 21, 2024

We discussed prioritizing this in our 2017-08-10 EnviroDIY Data Portal call.

We specifically discussed using it to fill gaps in data. Notes from call:

  • Upload csv to fill gaps in existing data in the database
  • This would likely be implemented as an offline process that gets queued for execution and the user would be notified when it is finished

from odm2datasharingportal.

aufdenkampe commented on September 21, 2024

This will also be an important feature for the next set of features for submitting data "manually" for discrete measurements (i.e. stream kit nitrate, pH, etc.).

horsburgh commented on September 21, 2024

Need a design for how columns in a CSV file uploaded are matched with sensors/variables registered for that site. Also - this will be a resource-intensive task. So, we will want to take the actual execution of the parsing and loading of data offline so that the main web server does not get bogged down with these types of requests.

aufdenkampe commented on September 21, 2024

OK. Let's discuss.

fryarludwig commented on September 21, 2024

Did we ever agree on a CSV format for us to use?

aufdenkampe commented on September 21, 2024

No, we haven't. I'll put that on my TO DO list.

horsburgh commented on September 21, 2024

@aufdenkampe - if we are going to consider this, we need a design for how a user would format a file for upload to the site.

I'm thinking something like this would be really simple and easy to create/parse:

RegistrationToken:7a0519ff-8259-4090-892b-74d4981aefc6
SamplingFeatureUUID:bc9fd032-1f34-45eb-a001-492af2998e49
ResultUUID:cfce1736-8616-4975-b784-c2b52e5ca003
TimeStamp,DataValue
2016-12-08T14:00:00-07:00,18.23
2016-12-08T14:05:00-07:00,18.45
2016-12-08T14:10:00-07:00,18.23
2016-12-08T14:15:00-07:00,18.45
2016-12-08T14:20:00-07:00,18.20
2016-12-08T14:25:00-07:00,18.13
2016-12-08T14:30:00-07:00,18.05

But, the original note above says that people might want to upload the log file from their SD card. What do those files look like?
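A minimal sketch of how the simple format proposed above could be parsed, assuming the `Key:Value` metadata lines always come before the `TimeStamp,DataValue` column header. The function name `parse_simple_upload` is hypothetical and not part of the portal code.

```python
import csv
import io
from datetime import datetime


def parse_simple_upload(text):
    """Parse 'Key:Value' metadata lines followed by a TimeStamp,DataValue table."""
    lines = text.strip().splitlines()
    metadata = {}
    for index, line in enumerate(lines):
        if line.startswith("TimeStamp,"):
            body_start = index
            break
        # Split on the first colon only, so the value may itself contain colons.
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    else:
        raise ValueError("no 'TimeStamp,DataValue' column header found")

    reader = csv.DictReader(io.StringIO("\n".join(lines[body_start:])))
    rows = [
        (datetime.fromisoformat(row["TimeStamp"]), float(row["DataValue"]))
        for row in reader
    ]
    return metadata, rows


sample = """RegistrationToken:7a0519ff-8259-4090-892b-74d4981aefc6
SamplingFeatureUUID:bc9fd032-1f34-45eb-a001-492af2998e49
ResultUUID:cfce1736-8616-4975-b784-c2b52e5ca003
TimeStamp,DataValue
2016-12-08T14:00:00-07:00,18.23
2016-12-08T14:05:00-07:00,18.45
"""
metadata, rows = parse_simple_upload(sample)
```

Because the timestamps carry their own UTC offset, each parsed `datetime` is timezone-aware and no separate offset field is needed in this format.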

horsburgh commented on September 21, 2024

@aufdenkampe and @SRGDamia1 - I think this is the last issue in the priority issues release that is not complete. We were waiting for feedback from you on the format of the CSV files that users could upload to fill gaps. Easiest for us is going to be to use the format that I suggest above, but this isn't the format that is being used to log data on the SD card on the loggers. Can we get a sample output file from one of the loggers?

aufdenkampe commented on September 21, 2024

On today's call we decided to go with the CSV format written to the SD card by the EnviroDIY/ModularSensors code.

Here is a short example for Filename: OMS01_Mayfly_170304_2018-04-17.csv

"Sampling Feature: bfd91225-eda1-4ed5-8568-e19cde6e8614","584ae2a3-c003-49fd-8a1c-d0e8ad97dc07","7d8850ba-9438-46df-a318-f14cd6a1dd5d","3fdab72e-74c5-432e-95ba-daf9bf7ff8ba"
"Data Logger: OMS01_Mayfly_170304","EnviroDIY Mayfly","EnviroDIY Mayfly","MaximDS3231"
"Data Logger: OMS01_Mayfly_170304","Free SRAM","batteryVoltage","temperatureRTC"
"Data Logger: OMS01_Mayfly_170304","Bit","Volt","degreeCelsius"
"Date and Time in UTC-6","FreeRam","Battery","BoardTemp"
2018-04-17 20:30:00,11673,4.806,23.00
2018-04-17 20:32:00,11673,4.806,22.75

I'll provide an example file for an existing station that we can use for development and testing.

A few points from our discussion:

  • The upload modal window in the UI will include a Help link to an info page we create at https://www.envirodiy.org, where we'll describe expected format and syntax.
  • The 5 header lines get written to the file every time the logger is restarted, which is very common. Therefore, the parser will need to ignore these extra header lines.
  • The files will get processed on a different server, and the user will get an email notification when complete.
  • The email notification will include some info on how many values were copied for each ResultUUID and if the SamplingFeatureUUID was found. Also reporting the time range and the UTC offset might be useful.
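The header handling described in these points could be sketched as follows. This is an illustration only, not the portal's parser: it assumes that any row whose first cell is not a `YYYY-MM-DD HH:MM:SS` timestamp is a header line, and that the UTC offset is written as `UTC-6` or `UTC+N`. The function name `parse_logger_csv` and the third data row (after the simulated restart) are made up for the example.

```python
import csv
import io
import re
from datetime import datetime, timedelta, timezone


def parse_logger_csv(text):
    """Return (sampling_feature_uuid, {result_uuid: [(timestamp, value), ...]})."""
    sampling_feature = None
    result_uuids = []
    utc_offset = None
    values = {}
    for row in csv.reader(io.StringIO(text, newline="")):
        if not row:
            continue
        first = row[0].strip()
        try:
            stamp = datetime.strptime(first, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            # Header line. Only the first copy is used; later copies are ignored.
            if first.startswith("Sampling Feature:") and sampling_feature is None:
                sampling_feature = first.split(":", 1)[1].strip()
                result_uuids = [cell.strip() for cell in row[1:]]
                values = {uuid: [] for uuid in result_uuids}
            match = re.match(r"Date and Time in UTC([+-]\d+)", first)
            if match and utc_offset is None:
                utc_offset = timezone(timedelta(hours=int(match.group(1))))
            continue
        if sampling_feature is None or utc_offset is None:
            raise ValueError("data row found before a complete header")
        stamp = stamp.replace(tzinfo=utc_offset)
        for uuid, cell in zip(result_uuids, row[1:]):
            values[uuid].append((stamp, float(cell)))
    return sampling_feature, values


HEADER = '''"Sampling Feature: bfd91225-eda1-4ed5-8568-e19cde6e8614","584ae2a3-c003-49fd-8a1c-d0e8ad97dc07","7d8850ba-9438-46df-a318-f14cd6a1dd5d","3fdab72e-74c5-432e-95ba-daf9bf7ff8ba"
"Data Logger: OMS01_Mayfly_170304","EnviroDIY Mayfly","EnviroDIY Mayfly","MaximDS3231"
"Data Logger: OMS01_Mayfly_170304","Free SRAM","batteryVoltage","temperatureRTC"
"Data Logger: OMS01_Mayfly_170304","Bit","Volt","degreeCelsius"
"Date and Time in UTC-6","FreeRam","Battery","BoardTemp"
'''
text = (
    HEADER
    + "2018-04-17 20:30:00,11673,4.806,23.00\n"
    + "2018-04-17 20:32:00,11673,4.806,22.75\n"
    + HEADER  # simulated logger restart: the header is written again
    + "2018-04-17 20:34:00,11673,4.806,22.75\n"  # hypothetical row
)
feature, values = parse_logger_csv(text)
```

Each column ends up under its own ResultUUID, which is what would let the email report count copied values per result.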

horsburgh commented on September 21, 2024

Notes on uploading CSV file:

  1. We will go with the option to upload the datalogger CSV file
  2. The datalogger CSV files may have multiple copies of the header in them because the header gets written to the file every time the logger powers up - these will all be the same, and so any copies of the header after the first one can be ignored
  3. The header contains the SamplingFeatureUUID and the ResultUUIDs, so we can match directly to the site and result in the database. Each column can be processed separately.
  4. If the SamplingFeatureUUID is not found in the database, the whole thing will bail out
  5. If a ResultUUID is not found in the database, the program will just skip it
    • The user will receive some report (via email) with messaging like:
    • Sampling feature not found. No data added to the database. (in the case that the file is for a sampling feature not in the database).
    • XXX DataValues added to the database for ResultUUID XXXXX (repeat this message for each Result in the file)
    • Data load for file XXX complete.
  6. In the location we provide the link to upload the file, we should provide a link to a page on the envirodiy.org website that describes the datalogger file format (TODO: Anthony to provide link to USU)
  7. TODO: Anthony will provide a CSV datalogger file example from one of the loggers for USU to work from.
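The bail-out and skip rules in notes 4 and 5 can be sketched as below. This is a simplified illustration: plain sets and a dict stand in for the database, and the names `load_parsed_file`, `known_features`, `known_results`, and `store`, as well as the `sf-1` / `r-1` identifiers, are hypothetical.

```python
def load_parsed_file(filename, sampling_feature, values,
                     known_features, known_results, store):
    """Apply the load rules and return the report lines for the email."""
    report = []
    if sampling_feature not in known_features:
        # Rule 4: unknown sampling feature, so nothing is loaded at all.
        report.append("Sampling feature not found. No data added to the database.")
        return report
    for result_uuid, rows in values.items():
        if result_uuid not in known_results:
            # Rule 5: unknown ResultUUID, so only this column is skipped.
            continue
        store.setdefault(result_uuid, []).extend(rows)
        report.append(
            f"{len(rows)} DataValues added to the database for ResultUUID {result_uuid}"
        )
    report.append(f"Data load for file {filename} complete.")
    return report


store = {}
parsed = {
    "r-1": [("2018-04-17 20:30:00", 4.806)],
    "r-unknown": [("2018-04-17 20:30:00", 23.0)],
}
report = load_parsed_file("OMS01.csv", "sf-1", parsed,
                          known_features={"sf-1"}, known_results={"r-1"},
                          store=store)
missing = load_parsed_file("OMS01.csv", "sf-missing", parsed,
                           known_features={"sf-1"}, known_results={"r-1"},
                           store={})
```

Here `report` holds one line for `r-1` plus the completion line, while `missing` holds only the "Sampling feature not found" message.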

aufdenkampe commented on September 21, 2024

All, this is "hot off the press". I just got these datalogger CSV files from @fisherba using the ModularSensors code, deployed on May 13th. I haven't really looked at them yet, but it appears that many files are created, perhaps with a new one started after a certain number of lines are written. It looks like we'll need to create functionality to load up a series of files in batch (i.e. select one or more files at a time).

These deployments were very buggy, so hopefully not representative, but I think we want to design for a messy collection of files such as these.

See: https://drive.google.com/drive/folders/1tQAgD-fjeoR97H44fbOucZc6wbklRdQz?usp=sharing

Just look at these files:

  • from CMP01_Mayfly_170292_2018-05-13.csv
  • to CMP01_Mayfly_170292_2018-07-16.csv

horsburgh commented on September 21, 2024

@aufdenkampe - unfortunately, we aren't going to have time to work on a more complex solution here. It's going to have to be up to the user to manage their files. They can submit each file individually or they can combine files on their end and submit a combined file. Either way, the functionality we are working on will not change.

It seems like some thought may be needed in how those files are constructed. There are huge blocks of those files that just consist of repeated header information.

aufdenkampe commented on September 21, 2024

Here are links for four stations from @fisherba, including the glitchy station mentioned in #12 (comment). The other four stations didn't have glitches that caused rebooting, so the data is in a single file per station.

Files are organized by station in the IESF_StationData_Fisher Google Drive folder, corresponding to these stations:

@horsburgh, it is true that the files for the glitchy CMP01 station are a bit of a mess (because it was clear the station was rebooting frequently), but I do think they offer a useful use case of what some users might need. I'm about to load another set of files, which might be more typical, where there were 4 files for a station. Again, each time the station restarts (which is often during testing mode) a csv file is created with a date stamp if one for that day doesn't already exist and a new header gets written to the csv for that day. This is a fail-safe approach that only gets messy when the station is rebooting often.

In my recent experience, it is relatively trivial scripting with the Pandas library to open a batch of similar files and combine them into a single data frame. I was thus thinking that batch loading would not be a very heavy lift for your team.
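As a rough sketch of the kind of batch combining meant here, the example below merges several logger files from one station into a single set of rows. It uses only the Python standard library rather than Pandas, to stay dependency-free. It assumes all files share the same header, and it drops repeated header lines and duplicate timestamps. The function names, file names, and the `sf-1` / `r-1` identifiers are placeholders.

```python
import csv
import tempfile
from datetime import datetime
from pathlib import Path


def is_data_row(row):
    """A data row starts with a 'YYYY-MM-DD HH:MM:SS' timestamp."""
    try:
        datetime.strptime(row[0].strip(), "%Y-%m-%d %H:%M:%S")
        return True
    except (ValueError, IndexError):
        return False


def combine_logger_files(paths):
    """Return (unique header rows, data rows sorted by timestamp)."""
    header = []
    data = {}  # timestamp string -> row; a repeated timestamp overwrites itself
    for path in sorted(paths):
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.reader(handle):
                if is_data_row(row):
                    data[row[0].strip()] = row
                elif row and row not in header:
                    header.append(row)
    # This timestamp format sorts chronologically when sorted as text.
    return header, [data[key] for key in sorted(data)]


HEADER = '"Sampling Feature: sf-1","r-1"\n"Date and Time in UTC-6","Battery"\n'
with tempfile.TemporaryDirectory() as folder:
    first = Path(folder, "CMP01_2018-05-13.csv")
    second = Path(folder, "CMP01_2018-05-14.csv")
    first.write_text(
        HEADER + "2018-05-13 23:58:00,4.80\n" + HEADER + "2018-05-14 00:00:00,4.79\n",
        encoding="utf-8",
    )
    second.write_text(
        HEADER + "2018-05-14 00:00:00,4.79\n2018-05-14 00:02:00,4.78\n",
        encoding="utf-8",
    )
    header, rows = combine_logger_files([first, second])
```

A user could run something like this locally and submit the combined result as a single file, which fits the single-file upload discussed in this thread.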

horsburgh commented on September 21, 2024

@aufdenkampe - batch file handling with upload, offline processing, etc. is more than we can accomplish with the time we've got left. I think we can do the single file upload/processing.

aufdenkampe commented on September 21, 2024

In our EnviroDIY_WSU_DataFiles Google Drive folder is another set of data files for the following stations that we recently deployed:

These are a great use case, because none of these stations have live data (yet), because of poor cell phone service. Therefore, the team is going out and manually pulling SD cards, which always starts a new file. This team would very much like to upload their data to the website, even if there is a delay of weeks, because TSA is a great data viewer, plus the benefit of WOF web services. Having the capability for them to upload a few files at a time would be very useful. Note that each file has the SamplingFeature UUID, so the upload window doesn't need to be just for a single site.

aufdenkampe commented on September 21, 2024

@horsburgh, it makes sense that the UI and backend features for handling multiple files at a time might be a bit of extra work. I certainly don't have much experience with those functions.

That said, can we ask @Maurier and @jcaraballo17 to explore what it would take to serialize the single-file offline processing into a multi-file processing? I've been impressed at how those guys have found easy solutions for some features that at first sounded complicated. If they decide once they dig into it that adding multi-file capabilities would be too much, then I understand that we would need to drop that feature.

aufdenkampe commented on September 21, 2024

Now that this feature is on https://envirodiysandbox.usu.edu, @fisherba and I will work on testing this feature.

aufdenkampe commented on September 21, 2024

I just did some testing for the KINNI_Logger1 site (https://envirodiysandbox.usu.edu/sites/KINNI_Logger1/), and a few things didn't work.

First, the file upload function didn't work in the Safari web browser. Clicking on the blue paperclip button opened a select file box, but once I selected a file, that box closed and nothing else happened. The app still showed "No file chosen."

Second, using Chrome, I was able to upload a file, but I didn't see the data in the data portal or TSA.

  • I did get an email with this message in the body: "Your data upload for site KINNI_Logger1 is complete."
    • A few more message details would be helpful, as we discussed, such as "XX lines of data ingested into database", or preferably list each variable separately, giving the number of lines ingested for each.
  • However, I don't see the data online

aufdenkampe commented on September 21, 2024

Also, I get a "Download failed - Server problem" error from Chrome when I try to download data using the new "Download Sensor Data" button from https://envirodiysandbox.usu.edu/sites/KINNI_Logger1 or https://envirodiysandbox.usu.edu/sites/KINNI_Logger2

Maybe that's the problem with upload for these sites.

I was able to download data from https://envirodiysandbox.usu.edu/sites/160065_Limno_Crossroads/ without a problem.

horsburgh commented on September 21, 2024

@aufdenkampe - I just tested in Safari on my Mac on the production server. I am not having the issue with the file upload function. When I click on the blue paperclip icon, I get the standard file browser dialog, and I am able to select files for upload.

Also - I think the data upload worked as designed. When you view data on the website, only the most recent 72 hours of data are shown in the sparkline plots and tabular view. ALL of the data are included in the download files. So, if you had two lines of data that were more recent than the data you uploaded in the file and there was more than 72 hours difference between the most recent two data values and the data you loaded, you won't see the data you loaded in the sparkline plots or the tabular view. Given the dates you mention (two data points from October are viewable, but data from June/July are not), this seems like what has happened. So - no bug here.
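The display behavior described above can be illustrated with a small sketch. It assumes the 72-hour window is measured back from the most recent value for the site, which is how the explanation reads; the function name `visible_rows` and the sample values are invented for the example.

```python
from datetime import datetime, timedelta


def visible_rows(rows, window_hours=72):
    """Keep only rows within `window_hours` of the most recent timestamp."""
    if not rows:
        return []
    latest = max(stamp for stamp, _ in rows)
    cutoff = latest - timedelta(hours=window_hours)
    return [(stamp, value) for stamp, value in rows if stamp >= cutoff]


rows = [
    (datetime(2017, 6, 7, 12, 0), 18.2),   # uploaded gap-fill data
    (datetime(2017, 7, 1, 12, 0), 19.0),   # uploaded gap-fill data
    (datetime(2017, 10, 14, 9, 0), 11.4),  # recent value
    (datetime(2017, 10, 14, 9, 5), 11.5),  # recent value
]
shown = visible_rows(rows)
```

With these values, `shown` contains only the two October rows. The June and July rows are still stored and would appear in a full download, but they fall outside the display window.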

I also tested the file download on the production server. It is working as expected. Can you try it again from production? There was a brief delay before the file was downloaded, but it came down OK.

https://data.wikiwatershed.org/sites/KINNI_Logger1/

As far as I can see, all of the issues you are reporting here have been resolved on production.

aufdenkampe commented on September 21, 2024

@horsburgh, thanks for testing this further. I'll give it a try on production.

The issue I had with the data not showing up on sandbox wasn't about the sparkline plots, but rather that the data were not visible either in the tables or in TSA, both of which show all the data and therefore should have also shown the new data.

I'll let you know if it all works on production.

aufdenkampe commented on September 21, 2024

I just looked at https://data.wikiwatershed.org/sites/KINNI_Logger1/ on production, and found:

  • Download csv files works well for individual variables and the entire station! Also, all the data is there, from June 7 to Oct. 14, 2017 (plus a few random dummy values from before and after). Awesome!
  • Upload button on Safari opens a dialog box to select a file. Awesome!
  • Clicking on the table icon opens a box with the last 3 days of data. I now see that this is the behavior throughout the site. It wasn't originally that way, and I hadn't noticed the change when it happened, but it is a good idea to limit to 3 days. The tables were way too long before. So that's good!
  • TSA shows all the data when I select custom dates. I had tried to expand the date range previously, but it hadn't worked. I now see that more than a month of data slows TSA down considerably, and that perhaps I had experienced some performance issues on Sandbox. So that's good too!

So https://data.wikiwatershed.org/sites/KINNI_Logger1/ looks good on production.
I didn't try to upload data, since it was all there already.

I'll now try to upload a file for a different site.

aufdenkampe commented on September 21, 2024

I just successfully tested CSV data upload to http://data.envirodiy.org/sites/WW-N2%20(WSU-1)/.
It looks great!

The only hitch I had was that the one CSV file that was saved in an Excel-generated CSV UTF-8 format didn't upload (I got an error about the SamplingFeatureUUID being wrong). When I re-saved to regular CSV, it started uploading but still hasn't completed after a long time. The file is 2018.06.27-N2-WSU1.xlsx and its derivatives in our EnviroDIY_WSU_DataFiles Google Drive folder. Any ideas on that?

aufdenkampe commented on September 21, 2024

BTW, for some reason, TSA doesn't show any Datasets/variables for this site.
http://data.envirodiy.org/tsa/?sitecode=WW-N2%20(WSU-1)&view=datasets&plot=false
Is that a bug?

horsburgh commented on September 21, 2024

@aufdenkampe - If the correctly formatted files work, that's great. I don't have time to look at the unsuccessful file right now, but don't have ideas off the top of my head.

horsburgh commented on September 21, 2024

@aufdenkampe - what is the actual site code for the site that is having TSA issues?

aufdenkampe commented on September 21, 2024

@horsburgh, I agree that we can close this issue because the files from the logger are all loading fine. Thanks for taking a moment to think of some ideas. It looks to me that passing through Excel stripped the quotes around strings and also reformatted the DateTime string. I'll try to add that to the documentation. Let's close this issue.

Regarding the site with TSA issues, it is:
http://data.envirodiy.org/sites/WW-N2%20(WSU-1)/
The SiteName is "Elba". If you select it on http://data.envirodiy.org/tsa/ from the Site facet, it doesn't seem to have any variables. However, if you type "Elba" into the search from the Datasets tab, you can find all the variables.

aufdenkampe commented on September 21, 2024

Update: I got the Excel-generated CSV file to load by doing the following:

  • Reformatting DateTime in Excel using a custom yyyy-mm-dd hh:mm:ss format
  • Opening in a text editor
  • Adding quotes around all strings in all header rows, via find , and replace with ",".
  • Saving in Unicode (UTF-8) format rather than Unicode (UTF-8, with BOM), which is the default CSV format from Excel. Note that this step can also be done in Excel, by saving as the plain Comma Separated Values (.csv) format that can be found in the Speciality Formats list.
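The manual steps above could also be scripted. The following is a rough sketch, not a tested tool for real Excel output: it strips a leading BOM, re-quotes every header cell, and rewrites timestamps as `yyyy-mm-dd hh:mm:ss`. The timestamp formats listed in `EXCEL_FORMATS` are guesses at what Excel might produce and would need adjusting to the actual file. All names and the `sf-1` / `r-1` identifiers are hypothetical.

```python
import csv
import io
from datetime import datetime

# Assumed examples of how Excel might rewrite the timestamp column.
EXCEL_FORMATS = ("%m/%d/%Y %H:%M", "%m/%d/%y %H:%M", "%Y-%m-%d %H:%M:%S")


def normalize_timestamp(cell):
    """Return the timestamp as 'YYYY-MM-DD HH:MM:SS', or None for header cells."""
    for fmt in EXCEL_FORMATS:
        try:
            return datetime.strptime(cell.strip(), fmt).strftime("%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
    return None


def repair_excel_csv(raw_bytes):
    text = raw_bytes.decode("utf-8-sig")  # drops a leading BOM if present
    out = io.StringIO()
    quoted = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    plain = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for row in csv.reader(io.StringIO(text, newline="")):
        if not row:
            continue
        stamp = normalize_timestamp(row[0])
        if stamp is None:
            quoted.writerow(row)               # header row: quote every string
        else:
            plain.writerow([stamp] + row[1:])  # data row: fix the timestamp only
    return out.getvalue().encode("utf-8")      # plain UTF-8, no BOM


raw = (
    b"\xef\xbb\xbfSampling Feature: sf-1,r-1\r\n"
    b"Date and Time in UTC-6,Battery\r\n"
    b"4/17/2018 20:30,4.806\r\n"
)
fixed = repair_excel_csv(raw).decode("utf-8")
```

Note that seconds dropped by Excel cannot be recovered; the sketch fills them in as `:00`.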

horsburgh commented on September 21, 2024

@aufdenkampe - Juan made a fix to the TSA to escape more special characters in the site codes. In this case, the parentheses were the culprit and the reason why the list of variables was not showing up correctly in TSA. Please keep in mind with sitecodes that they are used in URLs. So, I recommend not using any special characters, including spaces and parentheses. Keep site codes to alpha-numeric, dashes, and underscores. Juan's fix should have resolved this case, but if people continue to use special characters, we may run into this again.
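The site code recommendation can be expressed as a simple check. This sketch is not the fix Juan made; it only illustrates the "alpha-numeric, dashes, and underscores" rule and what percent-encoding does to a code that breaks it. The function names are hypothetical.

```python
import re
from urllib.parse import quote

# Letters, digits, dashes, and underscores only.
SITE_CODE_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_url_safe_site_code(code):
    """True if the code can be used in a URL without any escaping."""
    return SITE_CODE_PATTERN.fullmatch(code) is not None


def site_url(base, code):
    """Build a site URL, percent-encoding anything outside the unreserved set."""
    return f"{base.rstrip('/')}/sites/{quote(code, safe='')}/"


ok = is_url_safe_site_code("KINNI_Logger1")
bad = is_url_safe_site_code("WW-N2 (WSU-1)")
url = site_url("http://data.envirodiy.org", "WW-N2 (WSU-1)")
```

`quote` encodes the space as `%20` and the parentheses as `%28` and `%29`, so a code such as `WW-N2 (WSU-1)` can be made to work, but a code that passes `is_url_safe_site_code` avoids the problem entirely.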

You may have to clear your cache to get the link to work correctly.

I'm closing this issue since you have verified that the file upload worked correctly for a correctly formatted file. FYI - you may want to pass along to whoever is doing workshops, etc. to tell people that if they open their data file in Excel and allow Excel to save their data file, it will be changed when they open it again - likely in a way that breaks things. Students in my Hydroinformatics class have this happen ALL the time.

aufdenkampe commented on September 21, 2024

Awesome! Thanks!

Thanks for fixing the site name thing. I suspected that it might be a special character thing.

I just noticed one other related TSA bug.
The link to TSA at the very top of the black bar on the home screen points to http://data.envirodiy.org/data.wikiwatershed.org/tsa, which won't resolve.

dbressler75 commented on September 21, 2024

Anthony, could you share an example of properly formatted .csv file that can be uploaded to data.envirodiy.org?

aufdenkampe commented on September 21, 2024

@dbressler75 , I pointed to lots of examples in these specific comments:

This should also answer your questions in #294.
