camtrap-dp's Introduction

Camtrap DP

Camera Trap Data Package (or Camtrap DP for short) is a community-developed data exchange format for camera trap data.

Usage

See the documentation website.

Contribute

Questions? Suggestions? Contribute to the development of Camtrap DP by watching the repository and participating in issue discussions.

camtrap-dp's People

Contributors

biogeek, danstowell, kbubnicki, niconoe, peterdesmet, stijnvanhoey, timrobertson100

camtrap-dp's Issues

Revert event_ back to sequence_ and update definitions

In GitLab by @peterdesmet on Sep 8, 2020, 19:22

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

There was some confusion between species occurrence events and events as defined in this standard, which are based on sequences of associated multimedia files. It was suggested to rename all event_ based terms to sequence_ terms:

package: event_interval → sequence_interval

See also the old discussion on using sequence_cutoff.

Current definition:

Maximum interval between sub-sequent observations which are considered as one independent observation i.e. an event. Specified in seconds.

New definition:

Maximum interval between consecutive images to be considered as one sequence. Specified in seconds.

media: event_id → sequence_id

Current definition:

A unique identifier of an event within a project. Events are what project researchers or software have defined as independent occasions, containing one or more multimedia files (e.g. a single video or a sequence of consecutive images).

New definition:

A unique identifier of a sequence within a project. Sequences are defined by the sequence_interval in the package metadata and contain one or more multimedia files (e.g. a single video or a sequence of consecutive images). They are the source for sequence-based observations.

observation: event_id → sequence_id

Current definition:

The unique identifier of the event (collection of multimedia files) that is the source of this observation. Foreign key to media:event_id.

New definition:

The unique identifier of the sequence (collection of multimedia files) that is the source of this observation. Foreign key to media:sequence_id.

Note: also update foreign key

New term for observations: has_human

In GitLab by @peterdesmet on Aug 12, 2020, 05:14

Media with (recognizable) humans are subject to privacy concerns and potentially cannot be shared publicly. It is therefore useful to indicate whether an observation (based on a sequence or a single photo) contains a human. The field observation_type allows indicating "Human", but that value is not mutually exclusive with the values Animal and Vehicle, and definitely not with the suggested extended vocabulary for observation_type (see #12), e.g. a human often occurs during setup/pickup.

I think it would be more useful to consider has_human as a separate boolean term, with TRUE/FALSE to indicate if an observation (and thus related media) contains a human and is therefore subject to privacy issues (i.e. the media files are potentially not shared in the data package).

New term for media: order_in_sequence

In GitLab by @peterdesmet on Aug 12, 2020, 03:34

It should be possible to order media files in a sequence as they were taken, i.e. chronologically. Often this cannot be done based on the date_recorded (see #9) alone, as some media files have the exact same timestamp, especially if it does not include milliseconds. It would therefore be useful to have a term order_in_sequence with an integer index indicating the intended order of the media files within a sequence:

sequence_id  date_recorded  order_in_sequence  file_name
7bff18d3-a088-4695-850f-f20913730ee6  2018-01-20T09:15:40Z  0  20180227091227-IMG_0001.JPG
7bff18d3-a088-4695-850f-f20913730ee6  2018-01-20T09:15:43Z  1  20180227091227-IMG_0002.JPG
7bff18d3-a088-4695-850f-f20913730ee6  2018-01-20T09:15:43Z  2  20180227091227-IMG_0003.JPG
7bff18d3-a088-4695-850f-f20913730ee6  2018-01-20T09:15:44Z  3  20180227091227-IMG_0004.JPG
7bff18d3-a088-4695-850f-f20913730ee6  2018-01-20T09:15:45Z  4  20180227091227-IMG_0005.JPG
7bff18d3-a088-4695-850f-f20913730ee6  2018-01-20T09:15:45Z  5  20180227091227-IMG_0006.JPG
7bff18d3-a088-4695-850f-f20913730ee6  2018-01-20T09:15:46Z  6  20180227091227-IMG_0007.JPG
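
As a rough illustration of why such an index helps, the sketch below sorts a few of the rows above on (sequence_id, order_in_sequence). The in-memory list of dicts is only an assumption for the example; it is not a format prescribed by the standard.

```python
from operator import itemgetter

SEQUENCE_ID = "7bff18d3-a088-4695-850f-f20913730ee6"

# Three media rows from the table above, deliberately shuffled.
# Two of them share the same date_recorded, so the timestamp alone
# cannot restore the capture order.
media = [
    {"sequence_id": SEQUENCE_ID, "date_recorded": "2018-01-20T09:15:43Z",
     "order_in_sequence": 2, "file_name": "20180227091227-IMG_0003.JPG"},
    {"sequence_id": SEQUENCE_ID, "date_recorded": "2018-01-20T09:15:40Z",
     "order_in_sequence": 0, "file_name": "20180227091227-IMG_0001.JPG"},
    {"sequence_id": SEQUENCE_ID, "date_recorded": "2018-01-20T09:15:43Z",
     "order_in_sequence": 1, "file_name": "20180227091227-IMG_0002.JPG"},
]

# Sort within each sequence by the explicit integer index.
ordered = sorted(media, key=itemgetter("sequence_id", "order_in_sequence"))
file_names = [row["file_name"] for row in ordered]
print(file_names)
```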

Use dc:temporal rather than temporal_coverage

In GitLab by @peterdesmet on Aug 20, 2020, 08:45

As mentioned in the data package specs, Dublin Core already has terms to define temporal coverage, see:

"temporal": {
  "name": "19th Century",
  "start": "1800-01-01",
  "end": "1899-12-31"
}

I suggest reusing these rather than defining our own temporal_coverage.

Use of file_name and file_location

In GitLab by @peterdesmet on Aug 18, 2020, 02:12

  • file_name is the name of the media file, e.g. IMG0001.jpg
  • file_location is the full URL or path of the media file, including the name of the file, e.g. gs://wildlife_insights/Project/Images/CT-011/IMG0001.jpg

Note: the example https://trapper.org/storage/resource/media/259024/file/ for file_location is a bit confusing here, since it does not end with jpg.

Given that both fields link to a file, wouldn't it be more convenient to have:

  • file_path: Local path to the file in the data package
  • file_url: Remote URL to the file served online

A data package can include one, the other or both.

format = default or any for datetimes?

In GitLab by @peterdesmet on Aug 31, 2020, 5:03

Deployment start/end and media timestamp currently mention in the definition that the format should be:

as an ISO 8601 formatted string in UTC (e.g. YYYY-MM-DDThh:mm:ssZ).

but the format is any:

Any parsable representation of the type (https://specs.frictionlessdata.io/table-schema/#date).

Should we restrict this to format: default, which is exactly YYYY-MM-DDThh:mm:ssZ?

I think that format is also valid for milliseconds (could apply to timestamp)?
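
To make the difference concrete, here is a minimal sketch of a strict check, assuming the target pattern is exactly the YYYY-MM-DDThh:mm:ssZ form quoted above. The helper name is_default_format is hypothetical, and this is plain Python rather than a Table Schema validator.

```python
from datetime import datetime

# Assumed strict pattern, taken from the definition quoted above.
ISO_UTC_PATTERN = "%Y-%m-%dT%H:%M:%SZ"


def is_default_format(value):
    """Return True if value parses with the assumed strict pattern.

    Note: strptime tolerates some non-zero-padded fields, so this is an
    approximation of a strict check, not a full ISO 8601 validator.
    """
    try:
        datetime.strptime(value, ISO_UTC_PATTERN)
        return True
    except ValueError:
        return False


print(is_default_format("2018-01-20T09:15:40Z"))      # seconds precision
print(is_default_format("2018-01-20 09:15:40"))       # no T separator, no Z
print(is_default_format("2018-01-20T09:15:40.123Z"))  # milliseconds
```

Under this pattern a value with milliseconds is rejected, so supporting milliseconds would need either a different pattern or the looser format: any.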

Separate observations in observations.csv and media.csv

In GitLab by @peterdesmet on Aug 7, 2020, 11:12

Currently observations.csv contains observations and files. This has 2 drawbacks:

  1. If multiple observations were made for a single (e.g. movie) file, the file information needs to be repeated for each observation.
  2. Sequence-based observations (e.g. 1 male, 2 female, 5 juvenile wild boars = 3 obs based on a burst of 20 subsequent images) need to be repeated for every image (resulting in 60 observations), while not every image might even have the observed animal. The alternative is concatenating the 20 image files per observation row (e.g. delimited list in a single field), but that is messy too.

Proposal

I think observations in observations.csv should be as atomic as possible, i.e. the actual observed groups of same sex, life stage and potentially behaviour. These observations are linked to the evidence they are based on: a sequence (to keep with the term generally used for this), which can either be a single image file (for very detailed observations), a single movie file, or a burst of subsequent images that are considered as a whole. These can be organized in media.csv where each row is a media file.

Below an example:

  1. Sequence A contains 4 images and is identified as a whole: 3 observations (1 male, 2 female, 5 juvenile wild boars)
  2. Sequence B contains 4 images and is identified as a whole: it is considered blank
  3. image_21, image_22, image_23 are identified separately (not sequence based), so they get assigned their own sequence_id (C, D, E)
  4. Sequence F contains 1 movie file and is identified as a whole: containing 1 fox

observations.csv

observation_id deployment_id sequence_id sequence_start sequence_end species_latin lifestage sex count obs_type
1 X A 2020-03-15 00:34:00 2020-03-15 00:34:15 Sus scrofa adult male 1
2 X A 2020-03-15 00:34:00 2020-03-15 00:34:15 Sus scrofa  adult  female 2
3 X A 2020-03-15 00:34:00 2020-03-15 00:34:15 Sus scrofa juvenile unknown 5
4 X B 2020-03-15 01:20:00 2020-03-15 01:20:15 blank
5 X C 2020-03-15 06:15:00 2020-03-15 06:15:00   Capreolus capreolus adult  female  1
6 X D 2020-03-15 06:15:05 2020-03-15 06:15:05     blank
7 X E 2020-03-15 06:15:10 2020-03-15 06:15:10  Capreolus capreolus adult  female  1 blank
8 X F 2020-03-15 07:33:12 2020-03-15 07:33:12  Vulpes vulpes adult unknown 1

media.csv

media_id deployment_id sequence_id time_stamp file_name file_mimetype
1 X A 2020-03-15 00:34:00 image_1 jpeg
2 X A 2020-03-15 00:34:05 image_2 jpeg
3 X A 2020-03-15 00:34:10 image_3 jpeg
4 X A 2020-03-15 00:34:15 image_4 jpeg
5 X B 2020-03-15 01:20:00 image_11 jpeg
6 X B 2020-03-15 01:20:05 image_12 jpeg
7 X B 2020-03-15 01:20:10 image_13 jpeg
8 X B 2020-03-15 01:20:15 image_14 jpeg
9 X C 2020-03-15 06:15:00 image_21 jpeg
10 X D 2020-03-15 06:15:05 image_22 jpeg
11 X E 2020-03-15 06:15:10 image_23 jpeg
12 X F 2020-03-15 07:33:12 movie_1 mp4

Organizing the data this way has the following advantages:

  1. media.csv only contains the information per file once. It has the same number of rows as there are files.
  2. observations.csv contains the (atomic) observations as made by the identifier, either based on sequences or individual files.
  3. By including the sequence_start timestamp in observations.csv, that file contains all the necessary information (together with deployments.csv) to do most research analyses. media.csv can be considered metadata/evidence the observations are based on.
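
A minimal sketch of how the two files could be linked through sequence_id is shown below. It assumes comma-separated files and uses only a subset of the example columns; both are simplifications for illustration.

```python
import csv
import io
from collections import defaultdict

# Reduced versions of the example tables (comma-separated by assumption).
observations_csv = """observation_id,sequence_id,species_latin,count
1,A,Sus scrofa,1
2,A,Sus scrofa,2
3,A,Sus scrofa,5
8,F,Vulpes vulpes,1
"""

media_csv = """media_id,sequence_id,file_name
1,A,image_1
2,A,image_2
3,A,image_3
4,A,image_4
12,F,movie_1
"""

# Index media files by the sequence they belong to; each file appears once.
files_by_sequence = defaultdict(list)
for row in csv.DictReader(io.StringIO(media_csv)):
    files_by_sequence[row["sequence_id"]].append(row["file_name"])

# Look up the evidence for each observation without repeating file rows.
evidence = {
    row["observation_id"]: files_by_sequence[row["sequence_id"]]
    for row in csv.DictReader(io.StringIO(observations_csv))
}
print(evidence["1"])
print(evidence["8"])
```

The three wild boar observations all point to the same four files, while media.csv lists each file only once.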

How to describe target species?

In GitLab by @peterdesmet on Sep 8, 2020, 14:04

This issue pivoted to a discussion on how to indicate target species

The current definition of taxonomic coverage is:

Taxonomic coverage for this data package. It is based on a set of unique values from scientific_name field in observations.csv table.

Could this be broadened to allow the inclusion of species that were available for determination (e.g. in a project defined species list) or were considered target species, but did not result in observations? Or should we consider another way to allow the inclusion of target species?

Add sequence_based property to package metadata

In GitLab by @peterdesmet on Sep 8, 2020, 19:28

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

To better understand the usability of datasets (e.g. for AI learning) it would be good to know if the dataset uses sequence or media based classifications. The proposal is to add a sequence_based term to project information. If both are possible for a single project, then we cannot use a boolean, but should maybe have:

classification_source: media, sequence, both

New term for deployments: camera_detection_distance

In GitLab by @peterdesmet on Aug 10, 2020, 16:30

In Agouti, deployments have a field detectionDistance. It is not often populated, but maybe it could be added to the deployments schema as camera_detection_distance, indicating up to which max distance the camera will trigger.

New term for observations: is_domesticated

In GitLab by @peterdesmet on Aug 12, 2020, 05:33

Since we are using scientific names at species level from Catalogue of Life in Agouti, both dog and wolf get scientific name Canis lupus. We therefore have an extra field is_domesticated to indicate the difference (which is important for analysis). The same applies to Ovis aries. Using subspecies names could solve the problem, but the term is_domesticated is more broadly applicable to indicate if the photographed animal was wild or domesticated.

Could this term (TRUE/FALSE) be added to observations?

Deployment tags

In GitLab by @peterdesmet on Aug 17, 2020, 03:31

Deployments can be organized in:

  • Sessions, which I interpret as temporal groupings:

    Camera trap deployments can be grouped into sessions (sampling 'events'). Common sessions could be seasons (wet and dry), months, years or other types of logical groupings when field sampling occurs.

  • Arrays, which I interpret as spatial groupings:

    A name for a logical grouping of deployments into (spatial) arrays. This could be for thematic or logistical reasons.

  • Feature types, which I interpret as location property associated with the deployment:

    Type of feature (if any) that camera deployment is associated with. If other, more info can be provided in the comments field.

  • Habitat, which I interpret as a location property associated with the deployment:

    Short characterization of the habitat.

However, in Agouti deployments can be organized with tags which do not necessarily assume a temporal, spatial, feature type or habitat grouping. The context is up to the project manager. For example

Centre, Edge, Forest Island, Gallery Forest
2010, 2011, 2012, 2013, 2014, 2018, ecoduct, faunatunnel,
C13 (plot 267 carcass) part 1, C13 (plot 267 carcass) part 2, C14 (plot 102 control), C15 (plot 8 control) part 1, ...

Some of these could be mapped to the terms above, but to allow automated export, can we add a context-less grouping term tags to deployments?

Add cam_direction to deployment

In GitLab by @peterdesmet on Sep 8, 2020, 18:41

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

In addition to cam_height and cam_angle, it would be useful to know the camera trap orientation/direction (compass direction).

The EUROMAMMAL table suggests north; west; south; east, but I would suggest a (decimal) degrees from north (e.g. south = 180). The same is used for heading in Movebank.

My proposal:

{
  "name": "orientation",
  "type": "number", # Or integer
  "format": "default",
  "description": "The direction of the camera in decimal degrees clockwise from north (0 = north, 90 = east, 180 = south, 270 = west).", # Or degrees
  "example": "311.2", # Or integer
  "constraints": {
    "required": false,
    "minimum": 0,
    "maximum": 360
  }
}
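
For data that already uses the EUROMAMMAL cardinal values, a conversion to the proposed degrees could look roughly like the sketch below. The function name to_orientation and the mapping are assumptions for illustration; the 0 to 360 bounds mirror the constraints in the proposal.

```python
# Cardinal names as suggested in the EUROMAMMAL table, mapped to
# degrees clockwise from north as in the proposed description.
CARDINAL_TO_DEGREES = {"north": 0.0, "east": 90.0, "south": 180.0, "west": 270.0}


def to_orientation(value):
    """Return degrees clockwise from north for a cardinal name or a number."""
    if isinstance(value, str) and value.strip().lower() in CARDINAL_TO_DEGREES:
        return CARDINAL_TO_DEGREES[value.strip().lower()]
    degrees = float(value)
    # Same bounds as the proposed minimum/maximum constraints.
    if not 0 <= degrees <= 360:
        raise ValueError(f"orientation out of range: {degrees}")
    return degrees


print(to_orientation("south"))
print(to_orientation("311.2"))
```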

Use organization title, project title and _platform_title, rather than name

In GitLab by @peterdesmet on Aug 20, 2020, 04:44

In the generic data package specs the term title is used for name. E.g. the package, licenses, and even contributors have the term title.

For consistency, I would therefore update the camtrap-schema terms name to title as well. This applies to:

Expand vocabulary for observation_type

In GitLab by @peterdesmet on Aug 12, 2020, 05:08

The current suggested vocabulary for observation_type has:

Human
Animal
Vehicle

CTMS has the field photoType with:

Start
End
Set Up
Blank
Animal
Staff
Unknown
Unidentifiable
Timelapse

In Agouti we have information regarding:

(Blank -> mapped to is_empty)
Setup/Pickup
Unknown
Timelapse

Can the vocabulary for observation_type be expanded to include those values?

Use lowercase values for controlled lists

In GitLab by @peterdesmet on Aug 29, 2020, 2:14

Can we use lowercase values for all camtrap controlled lists/enum, such as in:

I don't really have strong arguments except:

  • It is what I mostly encounter in other biodiversity information vocabularies, such as GBIF vocabs often used by Darwin Core and Movebank vocabularies (search for "controlled list", e.g. for attachment type)
  • It's a bit easier to type
  • We use lowercase values for term names as well
  • It looks more like a controlled value (to me)

Add "Other" to bait_use

In GitLab by @peterdesmet on Aug 20, 2020, 08:32

bait_use currently allows 5 values that map well with CTMS, but Other is missing

camtrap ctms  remarks
None  None
Scent  Scent
Food  Meat Food better as it is more generic
Visual  Visual
Acoustic  Acoustic
- Other

Add cam_firmware to deployment

In GitLab by @peterdesmet on Sep 8, 2020, 18:44

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

For some projects it might be useful to know the firmware of the camera that was used. This needs further community feedback before we can decide if and how this should be part of the standard.

Add type_of_flash to deployment

In GitLab by @peterdesmet on Sep 8, 2020, 18:48

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

Cameras use different types of flashes to take pictures in the dark, which might have an effect on the animal (i.e. it won't return to that place). Values could be:

white flash
infrared flash
black flash

This needs further community feedback before we can decide if and how this should be part of the standard.

Clarify definition of event_interval, use event_ throughout

In GitLab by @peterdesmet on Aug 7, 2020, 10:05

The package property event_interval is defined as:

Maximum interval between sub-sequent observations which are considered as one independent observation i.e. an event. Specified in seconds.

  • I find it difficult to wrap my head around this definition. Is it the maximum time between two images that are considered to be part of the same sequence? So if set to 30 seconds, then images A (20:30:00), B (20:30:15), C (20:32:00) would be considered as 2 sequences: A+B and C? If so, then that is what we call in Agouti sequence_cutoff.
  • Can we change the name to something like sequence_cutoff? I find that clearer to understand.
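
If that reading is right, the grouping could be sketched as follows. This is only one interpretation of the definition, and the date is a placeholder, since the example above gives times only.

```python
from datetime import datetime, timedelta


def group_into_sequences(timestamps, max_interval_seconds):
    """Start a new sequence whenever the gap to the previous image
    exceeds max_interval_seconds (the assumed meaning of the term)."""
    sequences = []
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > timedelta(seconds=max_interval_seconds):
            sequences.append([])
        sequences[-1].append(ts)
        previous = ts
    return sequences


# Images A, B and C from the example; the date itself is hypothetical.
a = datetime(2020, 8, 7, 20, 30, 0)
b = datetime(2020, 8, 7, 20, 30, 15)
c = datetime(2020, 8, 7, 20, 32, 0)

sequences = group_into_sequences([a, b, c], max_interval_seconds=30)
print(len(sequences))
```

With a 30 second interval this yields two sequences, A+B and C, matching the reading described above.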

Use dc:spatial rather than spatial_coverage

In GitLab by @peterdesmet on Aug 21, 2020, 11:31

Dublin Core already has terms to define spatial coverage, see https://www.dublincore.org/specifications/dublin-core/usageguide/qualifiers/#spatial. It allows:

I guess we can also still use http://json.schemastore.org/geojson.json

"spatial": {
  "name": "...",
}

I suggest reusing these rather than defining our own spatial_coverage.

Update cam_ to camera_

In GitLab by @peterdesmet on Sep 8, 2020, 18:53

I noticed myself searching (Cmd+F) for camera_id, while the term name is cam_id. Since we don't abbreviate other concepts (deployment, event, sequence, observation, location), I think it makes sense to use the full camera_ in camera-related terms.

New term for observations: timestamp (previously event_start and event_end)

In GitLab by @peterdesmet on Aug 12, 2020, 03:16

See #5:

  • event_start: start of the sequence the observation is based on (i.e. timestamp of first media file)
  • event_end: end of the sequence the observation is based on (i.e. timestamp of last media file)

The names event_start/event_end are in line with terms start/end in deployment.

Note 1: one can argue that these terms can be derived from media.csv, but it is very convenient to have them here, as one can start a scientific analysis using deployments (location information) and observations (time + species information) and consider the (very often much larger) media file as metadata.

Note 2: one can argue that event_end is not entirely necessary. It might even be incorrect in case of video files since the length of the video is not taken into account (see observation 8 Vulpes vulpes in #5). Despite that drawback, I would argue it is useful to have an event_end, as it allows checking for very long events.

Add distance to observations

In GitLab by @peterdesmet on Sep 8, 2020, 18:55

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

Some studies know or have calculated the distance of the observed animal from the camera. Since this can differ from species to species, it is not a property of the deployment, but of the observation. It might be useful to add this term to observations, expressed in meters or centimeters.

Develop milestones to have a roadmap for this initiative

In GitLab by @kbubnicki on Aug 27, 2020, 4:10

I believe we need well-defined milestones for this initiative, especially since I expect that more and more people will join us soon. So the steps could be e.g.:

  1. Develop JSON schemas in a relatively small group of people.
  2. Consult developed schemas with a broader audience of camera trapping researchers and practitioners (online questionnaires?)
  3. Develop R/Python tools to facilitate work with camtrap data packages (e.g. camtrapR)
  4. Write a paper.
  5. Integrate the solution with the existing camtrap data management software (Agouti, TRAPPER). Actually these developments are partly ongoing and can be done in parallel to the other points.

This should be described in more detail, and preferably some deadlines for each step should also be specified.

Add identification_granularity (event_based, media_based) at project level

In GitLab by @peterdesmet on Aug 31, 2020, 3:09

See https://gitlab.com/oscf/camtrap-package-schemas/-/issues/5#note_403971988:

... But maybe we can add this term i.e. event_based (classification) at the project-level metadata? I am thinking about camera trapping datasets catalogues/repositories? Use case: lets say I am looking only for datasets with single-file classifications to train my specific AI model?

And https://gitlab.com/oscf/camtrap-package-schemas/-/issues/5#note_404174593:

Yes, event_based_identification/event_based_classification (depending on #3) at project level seems relevant to me. As both are possible in a single project, we probably want more than a boolean and could e.g. have identification_granularity: media, event, [event, media].

Create Frictionless pattern to support conditional constraints

In GitLab by @peterdesmet on Sep 25, 2020, 14:25

Conditional constraints are not supported by Table Schema (see frictionlessdata/specs#169).

So to support:

scientific_name = required if observation_type = "animal"

We would have to extend the schema as suggested in https://stackoverflow.com/a/38781027/2463806, i.e. (if I interpret this correctly) adding the following at the end of observations-table-schema.json:

"anyOf": [
  {
    "properties": {
       "observation_type": { "const": "animal" }
    },
    "required": ["scientific_name"]
  }
]

Or (using the JSON Schema (draft-07)):

"if": {
  "properties": {
    "observation_type": { "const": "animal" }
  },
  "required": ["observation_type"]
},
"then": { "required": ["scientific_name"] }

But in my understanding, it wouldn't be picked up by any of the Table Schema validators, so it's probably not worth adding; we can just leave scientific_name as not required.
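
Outside of the schema, the same rule could still be checked by a small script after loading the table. The sketch below is a plain Python check and not part of any Table Schema validator; the function name and the sample rows are made up for illustration.

```python
def missing_scientific_names(rows):
    """Return observation_ids that break the rule:
    scientific_name is required if observation_type is "animal"."""
    return [
        row["observation_id"]
        for row in rows
        if row.get("observation_type") == "animal"
        and not row.get("scientific_name")
    ]


# Hypothetical rows, as they might come out of csv.DictReader.
rows = [
    {"observation_id": "1", "observation_type": "animal", "scientific_name": "Vulpes vulpes"},
    {"observation_id": "2", "observation_type": "animal", "scientific_name": ""},
    {"observation_id": "3", "observation_type": "human", "scientific_name": ""},
]

violations = missing_scientific_names(rows)
print(violations)
```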

Consider table "individual"

In GitLab by @peterdesmet on Sep 8, 2020, 20:14

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

individual_id in observations allows indicating repeat viewings of individuals, but apart from the sex and the life_stage (at the time), no further information (like other tag numbers) can be provided. Information on recognizable/marked individuals that were not observed by the camera traps cannot be shared either.

To share this information, we could consider adding an extra file individuals.csv, where further information regarding recognizable/marked individuals can be shared. Fields could be:

  • individual_id
  • scientific_name
  • life_stage: debatable since it can change over time
  • sex: not likely to change for camera trap observed species
  • other_identifiers: e.g. ring_number:HR456; movebank_id:894
  • comments

But I think this needs further community feedback before we can decide if and how this should be part of the standard.
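
For the other_identifiers field, the example value suggests "key:value" pairs separated by semicolons. Assuming that convention (it is not defined anywhere yet), a consumer could parse it roughly like this; parse_other_identifiers is a hypothetical helper.

```python
def parse_other_identifiers(value):
    """Split a 'key:value; key:value' string into a dict.

    The delimiter convention is an assumption based on the example value.
    """
    identifiers = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, identifier = part.partition(":")
        identifiers[key.strip()] = identifier.strip()
    return identifiers


parsed = parse_other_identifiers("ring_number:HR456; movebank_id:894")
print(parsed)
```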

Update term name "date_recorded" for media files?

The original issue

Id: 9
Title: Update term name "date_recorded" for media files?

could not be created.
This is a dummy issue, replacing the original one. It contains everything but the original issue description. In case the gitlab repository is still existing, visit the following link to show the original issue:

TODO

Add identifier and identification date to observations + rename uncertainty to confidence

In GitLab by @peterdesmet on Aug 3, 2020, 06:07

The observation schema currently does not have a field to indicate the name of the person(s) or algorithm that made the species identification. There is a field in the example data package called classified_by, but that one indicates the method of classification (e.g. by Human).

I think both pieces of information (who + method) are relevant, as well as the date of identification. I also think uncertainty should be renamed to confidence, as that seems to be what the score implies (1 = very confident, rather than very uncertain).

observation_id identified_by identification_date identification_method identification_confidence
- - -  classified_by uncertainty
1 name_of_algorithm_v2 2018-09-11T14:33:19 Machine 0.45
1 Jim Casaer 2018-09-12T12:00:34 Human 1

I think we should also settle on a verb: species identification (used as such in Darwin Core), determination, or classification, and use that consistently in definitions.

New term: taxon_id (observations) + taxon_id_reference (datapackage.json)

In GitLab by @peterdesmet on Aug 12, 2020, 05:23

To allow better linking of data, it might be useful to share a taxon_id in addition to scientific_name, e.g. https://www.gbif.org/species/5219243 or 5219243 for Vulpes vulpes. In Agouti we store Catalogue of Life identifiers.

Note: this is a minor request. Sharing the scientific name is probably sufficient.

Add further camera settings to deployment

In GitLab by @peterdesmet on Sep 8, 2020, 18:51

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

Cameras can have additional settings that are not captured in the current deployment terms, such as:

  • detection_zone: Detection focus of camera trap (centered or full) full; centered
  • ir_mode: IR mode of camera trap far; near

This needs further community feedback before we can decide if and how this should be part of the standard.

Read camtrap packages with camtrapR

In GitLab by @peterdesmet on Aug 25, 2020, 9:25

This is more of an implementation issue, but I'm logging it here so we don't forget.

Users will likely want to use the camtrapR package to analyse their data. We should provide functionality/tutorials to read a camtrap package and map it to the camtrapR data model. Data package reading functionality can probably be included by using datapackage.r.

I'm not a user of camtrapR myself, so I'm not very familiar with it. Jakub, any thoughts on how to provide this functionality?

How to express different events/sequences in a long video?

In GitLab by @peterdesmet on Sep 2, 2020, 20:20

Ferdinando Urbano presented an interesting use case: how to express a long video being chunked into different events/sequences?

I think this can be done by repeating the media_id but associating different event_ids:

media_id event_id timestamp file
1 A 2020-09-02T10:00:00 video_1.mp4
1 B 2020-09-02T10:05:00 video_1.mp4
1 C 2020-09-02T10:10:00 video_1.mp4

I.e. the opposite of where event_id generally groups different media_ids. To know what sections of the video are chunked, the timestamps should be different, but I'm not entirely sure if that fits the definition of:

Date and time when the multimedia file was recorded

Other suggestions?

Second level observation_type

In GitLab by @peterdesmet on Sep 2, 2020, 3:43

As suggested in this comment, observation_type will only contain high-level AI understandable categories. But:

As I understand that it can be important to distinguish between project staff and other human observations we can consider adding new term e.g. is_staff or maybe even better a boolean field setup or sth similar to mark all records made during setup/picking-up a camera or just changing batteries/sd cards etc. With this solution you can still get a proper 1st step classification by AI model for this field.

We should define such a field, its name and the controlled values for it.

CTMS has the field photoType with:

Start
End
Set Up
~Blank~
~Animal~
Staff
Unknown
Unidentifiable
Timelapse

In Agouti we have information regarding:

~Blank~
Setup/Pickup
Unknown
Timelapse

Fields with ~ can be expressed in observation_type

New term for media: exif_data

In GitLab by @peterdesmet on Aug 12, 2020, 04:06

Agouti stores the exif data per media file:

"{"EXIF": {"ISO": 200, "Make": "RECONYX", "Flash": "Auto, Fired", "Model": "HC600 HYPERFIRE", "ColorSpace": "sRGB", "CreateDate": "2017:07:16 21:27:42", "ModifyDate": "2017:07:16 21:27:42", "ExifVersion": "0220", "XResolution": 72, "YResolution": 72, "ExposureMode": "Auto", "ExposureTime": "1/30", "WhiteBalance": "Manual", "ExifImageWidth": 1920, "ResolutionUnit": "inches", "ExifImageHeight": 1080, "FlashpixVersion": "0100", "DateTimeOriginal": "2017:07:16 21:27:42", "SceneCaptureType": "Standard", "YCbCrPositioning": "Co-sited", "ComponentsConfiguration": "Y, Cb, Cr, -"}, "File": {"FileName": "IMG_1398.JPG", "FileSize": "258 kB", "FileType": "JPEG", "MIMEType": "image/jpeg", "Directory": "/opt/app/uploads/deployment-images/20161222132759-untitled/014e28a6-408e-4309-8bfc-5006ea8f0803", "ImageWidth": 1920, "ImageHeight": 1080, "BitsPerSample": 8, "ExifByteOrder": "Little-endian (Intel, II)", "FileAccessDate": "2017:07:16 21:27:42+00:00", "FileModifyDate": "2017:07:16 21:27:42+00:00", "ColorComponents": 3, "EncodingProcess": "Baseline DCT, Huffman coding", "FilePermissions": "rwxrwxrwx", "YCbCrSubSampling": "YCbCr4:2:2 (2 1)", "FileTypeExtension": "jpg", "FileInodeChangeDate": "2017:08:09 12:11:59+00:00"}, "ExifTool": {"ExifToolVersion": 10.16}, "Composite": {"ImageSize": "1920x1080", "Megapixels": 2.1, "ShutterSpeed": "1/30"}, "MakerNotes": {"Contrast": 160, "Sequence": "6 of 10", "MoonPhase": "Last Quarter", "Sharpness": 32, "UserLabel": "NPHK33", "Brightness": 0, "Saturation": 0, "EventNumber": 138, "TriggerMode": "Motion Detection", "FirmwareDate": "2016:12:29", "SerialNumber": "H600HJ12269102", "BatteryVoltage": "8.92 V", "FirmwareVersion": "4.2.0", "DateTimeOriginal": "2017:07:16 21:27:42", "MakerNoteVersion": "0xf101", "MotionSensitivity": 100, "AmbientTemperature": "22 C", "InfraredIlluminator": "On", "AmbientTemperatureFahrenheit": "72 F"}, "SourceFile": 
"/opt/app/uploads/deployment-images/20161222132759-untitled/014e28a6-408e-4309-8bfc-5006ea8f0803/IMG_1398.JPG"}"

Even though exif data is available within the actual media files, it might be useful to share this as exif_data in media.csv as well, as it contains a wealth of information, e.g. temperature.

There are some drawbacks too:

  • exif data might contain sensitive information that would not be shared if the images are not public
  • exif data substantially increases the file size of media.csv (e.g. 324MB unzipped/43MB zipped vs 3.09GB unzipped/98.4MB zipped for 1,223,400 media files)

But as a non-required term, it might be useful to have the option to share this information?

Merge locations table into deployments

In GitLab by @peterdesmet on Aug 3, 2020, 17:40

In the example data package, the latitude and longitude in deployments.csv are the same as in locations.csv. I assume they are just copied from the related location and do not function as a more precise camera position within a location. It is a bit confusing to have both.

I also notice that there are 80 locations without associated deployments: what is the use of these?


I suggest simplifying the schema by merging the deployments and locations tables into one table called deployments.csv.

deployment_id location_id latitude  longitude ...  habitat comments
S10-A-351-767  A-351-767  52.67643 23.80923  ...  Mixed temperate low-land forest snowing, clearing in forest
S12-A-351-767  A-351-767  52.67643 23.80923  ...  Mixed temperate low-land forest raining, clearing in forest

Suggestions for fields from locations.csv in the merged deployments table:

  1. location_id: if deployments and locations are merged, then this is no longer a required field, as it is no longer a foreign key. Could remain as an identifier for a location (i.e. a lat/long/habitat group).
  2. deployment_code: this field has an oddly strict definition ("Code which in combination with location_id gives a unique identifier of deployment within a project."). I would remove or loosen the definition.
  3. location_name: field that could be added: a more human readable label for the location identified by location_id
  4. country: not sure this field is necessary, given that it can easily be derived from latitude/longitude. Would be useful at a project level though (in spatial_coverage), ideally as ISO_3166 country codes.
  5. timezone: not sure this field is necessary, given that it can easily be derived from latitude/longitude. Users could also wrongly assume it is the timezone used in start and end (which are UTC).
  6. habitat: keep, would be repeated for every deployment
  7. comments: both locations and deployments have comments, so either 1) we have 2 fields, 2) comments for deployments (snowing, raining) and locations (clearing in forest) are merged when exporting or 3) we do not support comments for locations.
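
As a sketch of the merge, and of option 2 for comments in particular, an export step could copy the location fields onto each deployment and join the two comment fields. The dict-based structures and the helper name merge_location are assumptions for illustration only.

```python
# Hypothetical in-memory versions of the two tables.
locations = {
    "A-351-767": {
        "habitat": "Mixed temperate low-land forest",
        "comments": "clearing in forest",
    },
}
deployments = [
    {"deployment_id": "S10-A-351-767", "location_id": "A-351-767", "comments": "snowing"},
    {"deployment_id": "S12-A-351-767", "location_id": "A-351-767", "comments": "raining"},
]


def merge_location(deployment, locations):
    """Copy location fields onto a deployment row and join both comments."""
    location = locations[deployment["location_id"]]
    merged = {**location, **deployment}
    comments = [c for c in (deployment.get("comments"), location.get("comments")) if c]
    merged["comments"] = ", ".join(comments)
    return merged


merged_rows = [merge_location(d, locations) for d in deployments]
print(merged_rows[0]["comments"])
```

The habitat value is repeated on every deployment of the same location, which is the trade-off noted in item 6.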
