tdwg / camtrap-dp

Camera Trap Data Package (Camtrap DP)

Home Page: https://camtrap-dp.tdwg.org

License: MIT License

Python 39.27% Ruby 0.35% HTML 48.47% TeX 11.91%

oscibio camera-trap-data frictionlessdata

camtrap-dp's Introduction

Camtrap DP

Camera Trap Data Package (or Camtrap DP for short) is a community developed data exchange format for camera trap data.

Usage

See the documentation website.

Contribute

Questions? Suggestions? Contribute to the development of Camtrap DP by watching the repository and participating in issue discussions.

camtrap-dp's Issues

Revert event_ back to sequence_ and update definitions

In GitLab by @peterdesmet on Sep 8, 2020, 19:22

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

There was some confusion between species occurrence events and events as defined in this standard, which are based on sequences of associated multimedia files. It was suggested to rename all event_ based terms to sequence_ terms:

package `event_interval` → `sequence_interval`

See also the old discussion on using sequence_cutoff.

Current definition:

Maximum interval between sub-sequent observations which are considered as one independent observation i.e. an event. Specified in seconds.

New definition:

Maximum interval between consecutive images to be considered as one sequence. Specified in seconds.

media `event_id` → `sequence_id`

Current definition:

A unique identifier of an event within a project. Events are what project researchers or software have defined as independent occasions, containing one or more multimedia files (e.g. a single video or a sequence of consecutive images).

New definition:

A unique identifier of a sequence within a project. Sequences are defined by the sequence_interval in the package metadata and contain one or more multimedia files (e.g. a single video or a sequence of consecutive images). They are the source for sequence-based observations.

observation `event_id` → `sequence_id`

Current definition:

The unique identifier of the event (collection of multimedia files) that is the source of this observation. Foreign key to media:event_id.

New definition:

The unique identifier of the sequence (collection of multimedia files) that is the source of this observation. Foreign key to media:sequence_id.

Note: also update foreign key

New term for observations: has_human

In GitLab by @peterdesmet on Aug 12, 2020, 05:14

Media with (recognizable) humans are subject to privacy concerns and can potentially not be shared publicly. It is therefore useful to indicate if an observation (based on sequence or single photo) contains a human. The field observation_type allows one to indicate "Human", but that value is not mutually exclusive with the values Animal and Vehicle and definitely not with the suggested extended vocabulary for observation_type (see #12), e.g. a human often occurs during setup/pickup.

I think it would be more useful to consider has_human as a separate boolean term, with TRUE/FALSE to indicate if an observation (and thus related media) contains a human and is therefore subject to privacy issues (i.e. the media files are potentially not shared in the data package).

New term for media: order_in_sequence

In GitLab by @peterdesmet on Aug 12, 2020, 03:34

It should be possible to order media files in a sequence as they were taken, i.e. chronologically. Often this cannot be done based on the date_recorded (see #9) alone, as some media files have the exact same timestamp, especially if it does not include milliseconds. It would therefore be useful to have a term order_in_sequence with an integer index indicating the intended order of the media files within a sequence:

sequence_id	date_recorded	order_in_sequence	file_name
7bff18d3-a088-4695-850f-f20913730ee6	2018-01-20T09:15:40Z	0	20180227091227-IMG_0001.JPG
7bff18d3-a088-4695-850f-f20913730ee6	2018-01-20T09:15:43Z	1	20180227091227-IMG_0002.JPG
7bff18d3-a088-4695-850f-f20913730ee6	2018-01-20T09:15:43Z	2	20180227091227-IMG_0003.JPG
7bff18d3-a088-4695-850f-f20913730ee6	2018-01-20T09:15:44Z	3	20180227091227-IMG_0004.JPG
7bff18d3-a088-4695-850f-f20913730ee6	2018-01-20T09:15:45Z	4	20180227091227-IMG_0005.JPG
7bff18d3-a088-4695-850f-f20913730ee6	2018-01-20T09:15:45Z	5	20180227091227-IMG_0006.JPG
7bff18d3-a088-4695-850f-f20913730ee6	2018-01-20T09:15:46Z	6	20180227091227-IMG_0007.JPG
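A minimal sketch of how such an index could be derived when it is not already available. It assumes (this is not part of the proposal) that file names reflect capture order and can break ties between identical timestamps, and it uses shortened, hypothetical identifiers and file names:

```python
from itertools import groupby

# (sequence_id, date_recorded, file_name); values are hypothetical and shortened
media = [
    ("seq-1", "2018-01-20T09:15:43Z", "IMG_0003.JPG"),
    ("seq-1", "2018-01-20T09:15:40Z", "IMG_0001.JPG"),
    ("seq-1", "2018-01-20T09:15:43Z", "IMG_0002.JPG"),
]


def assign_order(rows):
    """Return rows with an order_in_sequence index appended, restarting per sequence."""
    result = []
    # Tuples sort by sequence_id, then date_recorded, then file_name (the tie-breaker).
    ordered = sorted(rows)
    for _, group in groupby(ordered, key=lambda row: row[0]):
        for index, row in enumerate(group):
            result.append(row + (index,))
    return result


ordered_media = assign_order(media)
for row in ordered_media:
    print(row)
```

Sorting ISO 8601 strings as text only works here because all timestamps share the same format and time zone.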

Use dc:temporal rather than temporal_coverage

In GitLab by @peterdesmet on Aug 20, 2020, 08:45

As mentioned in the data package specs, Dublin Core already has terms to define temporal coverage, see:

"temporal": {
  "name": "19th Century",
  "start": "1800-01-01",
  "end": "1899-12-31"
}

I suggest reusing these rather than defining our own temporal_coverage.

Use of file_name and file_location

In GitLab by @peterdesmet on Aug 18, 2020, 02:12

file_name is the name of the media file, e.g. IMG0001.jpg
file_location is the full URL or path of the media file, including the name of the file, e.g. gs://wildlife_insights/Project/Images/CT-011/IMG0001.jpg

Note: the example https://trapper.org/storage/resource/media/259024/file/ for file_location is a bit confusing here, since it does not end with jpg.

Given that both fields link to a file, wouldn't it be more convenient to have:

file_path: Local path to the file in the data package
file_url: Remote URL to the file served online

A data package can include one, the other or both.

format = default or any for datetimes?

In GitLab by @peterdesmet on Aug 31, 2020, 5:03

Deployment start/end and media timestamp currently mention in the definition that the format should be:

as an ISO 8601 formatted string in UTC (e.g. YYYY-MM-DDThh:mm:ssZ).

but the format is any:

Any parsable representation of the type (https://specs.frictionlessdata.io/table-schema/#date).

Should we restrict this to format: default, which is exactly YYYY-MM-DDThh:mm:ssZ?

I think that format is also valid for milliseconds (could apply to timestamp)?
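To make the difference concrete, here is a small sketch of a strict check against the pattern quoted in the definition. The `strptime` pattern below is my own transcription of YYYY-MM-DDThh:mm:ssZ, not the validation logic of any Frictionless library, and `strptime` is itself slightly lenient (it tolerates some non-zero-padded values), so this is an illustration rather than a full validator:

```python
from datetime import datetime, timezone

# Assumed transcription of YYYY-MM-DDThh:mm:ssZ into a strptime pattern
DEFAULT_PATTERN = "%Y-%m-%dT%H:%M:%SZ"


def parse_utc(value):
    """Parse a string in the assumed default format into a UTC-aware datetime."""
    return datetime.strptime(value, DEFAULT_PATTERN).replace(tzinfo=timezone.utc)


def is_default_format(value):
    """Return True if the string matches the assumed default pattern."""
    try:
        parse_utc(value)
    except ValueError:
        return False
    return True


print(is_default_format("2020-03-15T00:34:00Z"))      # True
print(is_default_format("2020-03-15T00:34:00.250Z"))  # False: fractional seconds
print(is_default_format("2020-03-15 00:34:00"))       # False: no "T" and no "Z"
```

Under this particular pattern, timestamps with milliseconds are rejected, so supporting them would need either a different pattern or format: any.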

Separate observations in observations.csv and media.csv

In GitLab by @peterdesmet on Aug 7, 2020, 11:12

Currently observations.csv contains observations and files. This has 2 drawbacks:

If multiple observations were made for a single (e.g. movie) file, the file information needs to be repeated for each observation.
Sequence-based observations (e.g. 1 male, 2 female, 5 juvenile wild boars = 3 obs based on a burst of 20 subsequent images) need to be repeated for every image (resulting in 60 observations), while not every image might even have the observed animal. The alternative is concatenating the 20 image files per observation row (e.g. delimited list in a single field), but that is messy too.

Proposal

I think observations in observations.csv should be as atomic as possible, i.e. the actual observed groups of same sex, life stage and potentially behaviour. These observations are linked to the evidence they are based on: a sequence (to keep with the term generally used for this), which can either be a single image file (for very detailed observations), a single movie file, or a burst of subsequent images that are considered as a whole. These can be organized in media.csv where each row is a media file.

Below an example:

Sequence A contains 4 images and is identified as a whole: 3 observations (1 male, 2 female, 5 juvenile wild boars)
Sequence B contains 4 images and is identified as a whole: it is considered blank
image_21, image_22, image_23 are identified separately (not sequence based), so they get assigned their own sequence_id (C, D, E)
Sequence F contains 1 movie file and is identified as a whole: containing 1 fox

observations.csv

observation_id	deployment_id	sequence_id	sequence_start	sequence_end	species_latin	lifestage	sex	count	obs_type
1	X	A	2020-03-15 00:34:00	2020-03-15 00:34:15	Sus scrofa	adult	male	1
2	X	A	2020-03-15 00:34:00	2020-03-15 00:34:15	Sus scrofa	adult	female	2
3	X	A	2020-03-15 00:34:00	2020-03-15 00:34:15	Sus scrofa	juvenile	unknown	5
4	X	B	2020-03-15 01:20:00	2020-03-15 01:20:15					blank
5	X	C	2020-03-15 06:15:00	2020-03-15 06:15:00	Capreolus capreolus	adult	female	1
6	X	D	2020-03-15 06:15:05	2020-03-15 06:15:05					blank
7	X	E	2020-03-15 06:15:10	2020-03-15 06:15:10	Capreolus capreolus	adult	female	1	blank
8	X	F	2020-03-15 07:33:12	2020-03-15 07:33:12	Vulpes vulpes	adult	unknown	1

media.csv

media_id	deployment_id	sequence_id	time_stamp	file_name	file_mimetype
1	X	A	2020-03-15 00:34:00	image_1	jpeg
2	X	A	2020-03-15 00:34:05	image_2	jpeg
3	X	A	2020-03-15 00:34:10	image_3	jpeg
4	X	A	2020-03-15 00:34:15	image_4	jpeg
5	X	B	2020-03-15 01:20:00	image_11	jpeg
6	X	B	2020-03-15 01:20:05	image_12	jpeg
7	X	B	2020-03-15 01:20:10	image_13	jpeg
8	X	B	2020-03-15 01:20:15	image_14	jpeg
9	X	C	2020-03-15 06:15:00	image_21	jpeg
10	X	D	2020-03-15 06:15:05	image_22	jpeg
11	X	E	2020-03-15 06:15:10	image_23	jpeg
12	X	F	2020-03-15 07:33:12	movie_1	mp4

Organizing the data this way has the following advantages:

media.csv only contains the information per file once. It has the same number of rows as there are files.
observations.csv contains the (atomic) observations as made by the identifier, either based on sequences or individual files.
By including the sequence_start timestamp in observations.csv, that file contains all the necessary information (together with deployments.csv) to do most research analyses. media.csv can be considered metadata/evidence the observations are based on.
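As a rough illustration of how the two files relate, the sketch below looks up the media files behind an observation through sequence_id. It uses a subset of the example rows as in-memory dictionaries; reading the actual CSV files is left out, and the helper name `evidence_for` is hypothetical:

```python
from collections import defaultdict

# Subset of the example rows above, reduced to the columns needed here
observations = [
    {"observation_id": 1, "sequence_id": "A", "species_latin": "Sus scrofa", "count": 1},
    {"observation_id": 2, "sequence_id": "A", "species_latin": "Sus scrofa", "count": 2},
    {"observation_id": 8, "sequence_id": "F", "species_latin": "Vulpes vulpes", "count": 1},
]
media = [
    {"media_id": 1, "sequence_id": "A", "file_name": "image_1"},
    {"media_id": 2, "sequence_id": "A", "file_name": "image_2"},
    {"media_id": 12, "sequence_id": "F", "file_name": "movie_1"},
]

# Index media files by the sequence they belong to
files_by_sequence = defaultdict(list)
for row in media:
    files_by_sequence[row["sequence_id"]].append(row["file_name"])


def evidence_for(observation):
    """Return the file names of the sequence an observation is based on."""
    return files_by_sequence.get(observation["sequence_id"], [])


for obs in observations:
    print(obs["observation_id"], obs["species_latin"], evidence_for(obs))
```

Each file is stored once in media, while several observations can point to the same sequence.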

How to describe target species?

In GitLab by @peterdesmet on Sep 8, 2020, 14:04

This issue pivoted to a discussion on how to indicate target species

The current definition of taxonomic coverage is:

Taxonomic coverage for this data package. It is based on a set of unique values from scientific_name field in observations.csv table.

Could this be broadened to allow the inclusion of species that were available for determination (e.g. in a project defined species list) or were considered target species, but did not result in observations? Or should we consider another way to allow the inclusion of target species?

Add sequence_based property to package metadata

In GitLab by @peterdesmet on Sep 8, 2020, 19:28

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

To better understand the usability of datasets (e.g. for AI learning) it would be good to know if the dataset uses sequence or media based classifications. The proposal is to add a sequence_based term to project information. If both are possible for a single project, then we cannot use a boolean, but should maybe have:

classification_source: media, sequence, both

New term for deployments: camera_detection_distance

In GitLab by @peterdesmet on Aug 10, 2020, 16:30

In Agouti, deployments have a field detectionDistance. It is not often populated, but maybe it could be added to the deployments schema as camera_detection_distance, indicating up to which max distance the camera will trigger.

New term for observations: is_domesticated

In GitLab by @peterdesmet on Aug 12, 2020, 05:33

Since we are using scientific names at species level from Catalogue of Life in Agouti, both dog and wolf get scientific name Canis lupus. We therefore have an extra field is_domesticated to indicate the difference (which is important for analysis). The same applies to Ovis aries. Using subspecies names could solve the problem, but the term is_domesticated is more broadly applicable to indicate if the photographed animal was wild or domesticated.

Could this term (TRUE/FALSE) be added to observations?

Deployment tags

In GitLab by @peterdesmet on Aug 17, 2020, 03:31

Deployments can be organized in:

Sessions, which I interpret as temporal groupings:

Camera trap deployments can be grouped into sessions (sampling 'events'). Common sessions could be seasons (wet and dry), months, years or other types of logical groupings when field sampling occurs.
Arrays, which I interpret as spatial groupings:

A name for a logical grouping of deployments into (spatial) arrays. This could be for thematic or logistical reasons.
Feature types, which I interpret as location property associated with the deployment:

Type of feature (if any) that camera deployment is associated with. If other, more info can be provided in the comments field.
Habitat, which I interpret as a location property associated with the deployment:

Short characterization of the habitat.

However, in Agouti deployments can be organized with tags which do not necessarily assume a temporal, spatial, feature type or habitat grouping. The context is up to the project manager. For example:

Centre, Edge, Forest Island, Gallery Forest
2010, 2011, 2012, 2013, 2014, 2018, ecoduct, faunatunnel,
C13 (plot 267 carcass) part 1, C13 (plot 267 carcass) part 2, C14 (plot 102 control), C15 (plot 8 control) part 1, ...

Some of these could be mapped to the terms above, but to allow automated export, can we add a context-less grouping term tags to deployments?

Add cam_direction to deployment

In GitLab by @peterdesmet on Sep 8, 2020, 18:41

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

In addition to cam_height and cam_angle, it would be useful to know the camera trap orientation/direction (compass direction).

The EUROMAMMAL table suggests north; west; south; east, but I would suggest a (decimal) degrees from north (e.g. south = 180). The same is used for heading in Movebank.

My proposal:

{
  "name": "orientation",
  "type": "number", # Or integer
  "format": "default",
  "description": "The direction of the camera in decimal degrees clockwise from north (0 = north, 90 = east, 180 = south, 270 = west).", # Or degrees
  "example": "311.2", # Or integer
  "constraints": {
    "required": false,
    "minimum": 0,
    "maximum": 360
   }
}
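A possible way to convert the EUROMAMMAL compass words to the proposed degrees and to check the 0 to 360 constraint is sketched below. The function name `to_orientation` and the mapping table are illustrative assumptions, not part of either standard:

```python
# Assumed mapping of the four EUROMAMMAL values to degrees clockwise from north
COMPASS_TO_DEGREES = {"north": 0.0, "east": 90.0, "south": 180.0, "west": 270.0}


def to_orientation(value):
    """Convert a compass word or a number to degrees clockwise from north.

    Raises ValueError if a numeric value falls outside 0..360 (both inclusive,
    following the minimum/maximum constraints in the proposal).
    """
    if isinstance(value, str) and value.strip().lower() in COMPASS_TO_DEGREES:
        return COMPASS_TO_DEGREES[value.strip().lower()]
    degrees = float(value)
    if not 0 <= degrees <= 360:
        raise ValueError(f"orientation out of range: {degrees}")
    return degrees


print(to_orientation("South"))  # 180.0
print(to_orientation("311.2"))  # 311.2
```

Note that with both 0 and 360 allowed, north has two valid encodings; whether the maximum should be exclusive is a design choice the proposal leaves open.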

Differences between observations.csv and schema for observations.csv

In the example data package there are some differences in field names between the example and schema. One of these should be updated.

in example	in schema
`classified_by`	not defined?
not included	`observation_type` (Human, Vehicle, Animal)
`species_count`	`count`
`species_count_new`	`count_new`
`beh_vigilance`	not defined

deployment_id should be defined as foreign key in resource schema

In GitLab by @peterdesmet on Aug 25, 2020, 2:56

Table Schema allows one to identify foreignKeys: https://specs.frictionlessdata.io/table-schema/#foreign-keys. In the proposed schema, deployment_id is a foreign key in both media.csv and observations.csv. It should be defined as such.
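As a sketch, the foreign key could be declared as below, written here as a Python dictionary mirroring the Table Schema foreignKeys structure. The resource name "deployments" is an assumption about how the deployments resource would be named in the package, and `missing_references` is a hypothetical helper, not a Frictionless function:

```python
# Table Schema fragment for media.csv; the resource name "deployments" is assumed
media_schema = {
    "fields": [{"name": "media_id"}, {"name": "deployment_id"}],
    "foreignKeys": [
        {
            "fields": "deployment_id",
            "reference": {"resource": "deployments", "fields": "deployment_id"},
        }
    ],
}


def missing_references(rows, parent_rows, foreign_key):
    """Return the child rows whose foreign key value has no match in the parent rows."""
    child_field = foreign_key["fields"]
    parent_field = foreign_key["reference"]["fields"]
    known = {row[parent_field] for row in parent_rows}
    return [row for row in rows if row[child_field] not in known]


deployments = [{"deployment_id": "X"}]
media = [
    {"media_id": 1, "deployment_id": "X"},
    {"media_id": 2, "deployment_id": "Y"},  # no such deployment
]
orphans = missing_references(media, deployments, media_schema["foreignKeys"][0])
print(orphans)  # [{'media_id': 2, 'deployment_id': 'Y'}]
```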

Use organization title, project title and _platform_title, rather than name

In GitLab by @peterdesmet on Aug 20, 2020, 04:44

In the generic data package specs the term title is used for name. E.g. the package, licenses, and even contributors have the term title.

For consistency, I would therefore update the camtrap-schema terms name to title as well. This applies to:

organization.title: https://gitlab.com/oscf/camtrap-package-schemas/-/blob/master/camtrap-package-profile.json#L13-17
project.title: https://gitlab.com/oscf/camtrap-package-schemas/-/blob/master/camtrap-package-profile.json#L114-118
_platform_title: https://gitlab.com/oscf/camtrap-package-schemas/-/blob/master/camtrap-package-profile.json#L114-118

How to map 'Road underpass/overpass/bridge'

In GitLab by @peterdesmet on Aug 21, 2020, 3:00

CTMS has the feature Road underpass/overpass/bridge, which in camtrap is split in feature_type Road underpass, Road overpass, Road bridge.

How can one now map the generic term? E.g. Road tunnel/bridge
Is it necessary to have the specific terms?

Expand vocabulary for observation_type

In GitLab by @peterdesmet on Aug 12, 2020, 05:08

The current suggested vocabulary for observation_type has:

Human
Animal
Vehicle

CTMS has the field photoType with:

Start
End
Set Up
Blank
Animal
Staff
Unknown
Unidentifiable
Timelapse

In Agouti we have information regarding:

(Blank -> mapped to is_empty)
Setup/Pickup
Unknown
Timelapse

Can the vocabulary for observation_type be expanded to include those values?

Use lowercase values for controlled lists

In GitLab by @peterdesmet on Aug 29, 2020, 2:14

Can we use lowercase values for all camtrap controlled lists/enum, such as in:

sampling_design: simple random, systematic random, ...
bait_use: none, scent, food, ...
sex: female, male, undefined
...

I don't really have strong arguments except:

It is what I mostly encounter in other biodiversity information vocabularies, such as GBIF vocabs often used by Darwin Core and Movebank vocabularies (search for "controlled list", e.g. for attachment type)
It's a bit easier to type
We use lowercase values for term names as well
It looks more like a controlled value (to me)

Add "Other" to bait_use

In GitLab by @peterdesmet on Aug 20, 2020, 08:32

bait_use currently allows 5 values that map well with CTMS, but Other is missing.

camtrap	ctms	remarks
None	None
Scent	Scent
Food	Meat	Food better as it is more generic
Visual	Visual
Acoustic	Acoustic
-	Other

Indicate primary key

In GitLab by @peterdesmet on Aug 31, 2020, 4:33

I think it might be good to indicate the primary key in the schemas: https://specs.frictionlessdata.io/table-schema/#primary-key

Add cam_firmware to deployment

In GitLab by @peterdesmet on Sep 8, 2020, 18:44

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

For some projects it might be useful to know the firmware of the camera that was used. This needs further community feedback before we can decide if and how this should be part of the standard.

Update definition of "count"

In GitLab by @peterdesmet on Sep 4, 2020, 15:10

Current definition mentions removed field sequence_based:

Number of individuals identified (optionally of given age, sex and behaviour; see below). Interpretation depends on the value of sequence_based field.

Add type_of_flash to deployment

In GitLab by @peterdesmet on Sep 8, 2020, 18:48

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

Cameras use different types of flashes to take pictures in the dark, which might have an effect on the animal (i.e. it won't return to that place). Values could be:

white flash
infrared flash
black flash

This needs further community feedback before we can decide if and how this should be part of the standard.

Clarify definition of event_interval, use event_ throughout

In GitLab by @peterdesmet on Aug 7, 2020, 10:05

The package property event_interval is defined as:

Maximum interval between sub-sequent observations which are considered as one independent observation i.e. an event. Specified in seconds.

I find it difficult to wrap my head around this definition. Is it the maximum time between two images that are considered to be part of the same sequence? So if set to 30 seconds, then images A (20:30:00), B (20:30:15), C (20:32:00) would be considered as 2 sequences: A+B and C? If so, then that is what we call in Agouti sequence_cutoff.
Can we change the name to something like sequence_cutoff? I find that clearer to understand.
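The interpretation asked about above can be written out as a short sketch. It assumes that an interval exactly equal to the cutoff still belongs to the same sequence, and the date is arbitrary since the example only gives times:

```python
from datetime import datetime, timedelta


def group_into_sequences(timestamps, max_interval_seconds):
    """Group timestamps so that consecutive ones at most max_interval_seconds apart share a sequence."""
    sequences = []
    for ts in sorted(timestamps):
        # Compare against the last timestamp of the current sequence
        if sequences and ts - sequences[-1][-1] <= timedelta(seconds=max_interval_seconds):
            sequences[-1].append(ts)
        else:
            sequences.append([ts])
    return sequences


# Images A, B, C from the example; the date itself is arbitrary
images = {
    "A": datetime(2020, 8, 7, 20, 30, 0),
    "B": datetime(2020, 8, 7, 20, 30, 15),
    "C": datetime(2020, 8, 7, 20, 32, 0),
}
by_time = {ts: name for name, ts in images.items()}
sequences = group_into_sequences(images.values(), 30)
labels = [[by_time[ts] for ts in seq] for seq in sequences]
print(labels)  # [['A', 'B'], ['C']]
```

With a cutoff of 30 seconds, A and B (15 seconds apart) fall in one sequence and C (105 seconds after B) starts a new one.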

Use dc:spatial rather than spatial_coverage

In GitLab by @peterdesmet on Aug 21, 2020, 11:31

Dublin Core already has terms to define spatial coverage, see https://www.dublincore.org/specifications/dublin-core/usageguide/qualifiers/#spatial. It allows:

DCMI box encoding scheme: https://www.dublincore.org/specifications/dublin-core/dcmi-box/
DCMI point encoding scheme: https://www.dublincore.org/specifications/dublin-core/dcmi-point/

I guess we can also still use http://json.schemastore.org/geojson.json

"spatial": {
  "name": "...",
}

I suggest reusing these rather than defining our own spatial_coverage.

Make species info conditionally required on observation_type=Animal

In GitLab by @peterdesmet on Aug 21, 2020, 2:31

taxonomic_coverage has 3 properties: species_latin, species_common and count (see also #14 to update these names). The first 2 are required, but I think it cannot always be guaranteed that a species_common/vernacular_name is available. I would make that term optional.

Issue scope is now this: https://gitlab.com/oscf/camtrap-package-schemas/-/issues/25#note_408998361

Update cam_ to camera_

In GitLab by @peterdesmet on Sep 8, 2020, 18:53

I noticed myself searching (Cmd+F) for camera_id, while the term name is cam_id. Since we don't abbreviate other concepts (deployment, event, sequence, observation, location) I think it makes sense to use the full camera_ in camera related terms.

New term for observations: timestamp (previously event_start and event_end)

In GitLab by @peterdesmet on Aug 12, 2020, 03:16

See #5:

event_start: start of the sequence the observation is based on (i.e. timestamp of first media file)
event_end: end of the sequence the observation is based on (i.e. timestamp of last media file)

The names event_start/event_end are in line with terms start/end in deployment.

Note 1: one can argue that these terms can be derived from the media.csv, but it is very convenient to have them here, as one can start a scientific analysis using deployments (location information) and observations (time + species information) and consider the (very often much larger) media file as metadata.

Note 2: one can argue that event_end is not entirely necessary. It might even be incorrect in case of video files since the length of the video is not taken into account (see observation 8 Vulpes vulpes in #5). Despite that drawback, I would argue it is useful to have an event_end, as it allows checking for very long events.

Add distance to observations

In GitLab by @peterdesmet on Sep 8, 2020, 18:55

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

Some studies know or have calculated the distance of the observed animal from the camera. Since this can differ from species to species, it is not a property of the deployment, but of the observation. It might be useful to add this term to observations, expressed in meters or centimeters.

Rename mimetype to mediatype

In GitLab by @peterdesmet on Aug 21, 2020, 3:54

file_mimetype could be updated to file_mediatype. MIME type is an old term: https://en.wikipedia.org/wiki/Media_type

Note: data packages also use the term mediatype: https://specs.frictionlessdata.io/data-resource/#optional-properties

What delimiter to use for delimited fields (comments, tags)

In GitLab by @peterdesmet on Aug 31, 2020, 2:29

I noticed in the now deprecated location$comments:

"pattern": "[^,;]"

What does that indicate?

Develop milestones to have a roadmap for this initiative

In GitLab by @kbubnicki on Aug 27, 2020, 4:10

I believe we need well-defined milestones for this initiative, especially since I expect that more and more people will join us soon. So the steps could be e.g.

Develop JSON schemas in a relatively small group of people.
Consult developed schemas with a broader audience of camera trapping researchers and practitioners (online questionnaires?)
Develop R/Python tools to facilitate work with camtrap data packages (e.g. camtrapR)
Write a paper.
Integrate the solution with the existing camtrap data management software (Agouti, TRAPPER). Actually these developments are partly ongoing and can be done in parallel to the other points.

This should be described in more detail and preferably some deadlines for each step should also be specified.

Add identification_granularity (event_based, media_based) at project level

In GitLab by @peterdesmet on Aug 31, 2020, 3:09

See https://gitlab.com/oscf/camtrap-package-schemas/-/issues/5#note_403971988:

... But maybe we can add this term i.e. event_based (classification) at the project-level metadata? I am thinking about camera trapping datasets catalogues/repositories? Use case: lets say I am looking only for datasets with single-file classifications to train my specific AI model?

And https://gitlab.com/oscf/camtrap-package-schemas/-/issues/5#note_404174593:

Yes, event_based_identification/event_based_classification (depending on #3) at project level seems relevant to me. As both are possible in a single project, we probably want more than a boolean and could e.g. have identification_granularity: media, event, [event, media].

Create Frictionless pattern to support conditional constraints

In GitLab by @peterdesmet on Sep 25, 2020, 14:25

Conditional constraints are not supported by Table Schema (see frictionlessdata/specs#169).

So to support:

scientific_name = required if observation_type = "animal"

We would have to extend the schema as suggested in https://stackoverflow.com/a/38781027/2463806, i.e. (if I interpret this correctly) adding the following at the end of observations-table-schema.json:

"anyOf": [
  {
    "properties": {
       "observation_type": { "const": "animal" }
    },
    "required": ["scientific_name"]
  }
]

Or (using the JSON Schema (draft-07)):

"if": {
  "properties": {
    "observation_type": { "const": "animal" }
  },
  "required": ["observation_type"]
},
"then": { "required": ["scientific_name"] }

But in my understanding, it wouldn't be picked up by any of the Table Schema validators, so it's probably not worth adding and just leave scientific_name not required.
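If the rule is left out of the schema, it could still be checked by a separate step after Table Schema validation. The following is a minimal sketch of such a check in plain Python; the function name `conditional_errors` and the error wording are hypothetical and not part of any Frictionless validator:

```python
def conditional_errors(rows):
    """Report rows where observation_type is 'animal' but scientific_name is empty or missing."""
    errors = []
    for number, row in enumerate(rows, start=1):
        if row.get("observation_type") == "animal" and not row.get("scientific_name"):
            errors.append(
                f"row {number}: scientific_name is required when observation_type is 'animal'"
            )
    return errors


rows = [
    {"observation_type": "animal", "scientific_name": "Vulpes vulpes"},
    {"observation_type": "animal", "scientific_name": ""},
    {"observation_type": "human", "scientific_name": ""},
]
for message in conditional_errors(rows):
    print(message)
```

Only the second row is reported: the third has no scientific_name either, but the condition does not apply to it.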

Consider table "individual"

In GitLab by @peterdesmet on Sep 8, 2020, 20:14

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

individual_id in observations allows one to indicate repeat viewings of individuals, but apart from the sex and the life_stage (at the time) no further information (like other tag numbers) can be provided. Information on recognizable/marked individuals that were not observed by the camera traps cannot be shared either.

To share this information, we could consider adding an extra file individuals.csv, where further information regarding recognizable/marked individuals can be shared. Fields could be:

individual_id
scientific_name
life_stage: debatable since it can change over time
sex: not likely to change for camera trap observed species
other_identifiers: e.g. ring_number:HR456; movebank_id:894
comments

But I think this needs further community feedback before we can decide if and how this should be part of the standard.

Update term name "date_recorded" for media files?

The original issue

Id: 9
Title: Update term name "date_recorded" for media files?

could not be created.
This is a dummy issue, replacing the original one. It contains everything but the original issue description. In case the gitlab repository is still existing, visit the following link to show the original issue:

TODO

Should platform metadata go in sources?

In GitLab by @peterdesmet on Aug 21, 2020, 3:37

How should one use internal properties such as _id or _platform_name, etc.? Just include them as such (column name or property in package.json)?

Add identifier and identification date to observations + rename uncertainty to confidence

In GitLab by @peterdesmet on Aug 3, 2020, 06:07

The observation schema currently does not have a field to indicate the name of the person(s) or algorithm that have made the species identification. There is a field in the example data package called classified_by, but that one indicates the method of classification (e.g. by Human)

I think both pieces of information (who + method) are relevant, as well as the date of identification. I also think uncertainty should be renamed to confidence as that seems to be what the score implies (1 = very confident, rather than very uncertain).

observation_id	identified_by	identification_date	identification_method	identification_confidence
-	-	-	classified_by	uncertainty
1	name_of_algorithm_v2	2018-09-11T14:33:19	Machine	0.45
1	Jim Casaer	2018-09-12T12:00:34	Human	1

I think we should also settle on a verb: species identification (used as such in Darwin Core), determination, or classification, and use that consistently in definitions.

Rename species_latin / species_common to scientific_name, vernacular_name

In GitLab by @peterdesmet on Aug 12, 2020, 05:19

Since observations might be at genus or subspecies level, I think it would be better to rename species_latin and species_common to scientific_name and vernacular_name. Those terms are more widely used too, e.g. in the Darwin Core Standard (scientificName and vernacularName).

New term for deployments: setup_by / pickup_by

In Agouti we can indicate the name of the person who deployed and picked up the camera. Maybe this can be added to the deployment schema as setup_by and pickup_by?

New term: taxon_id (observations) + taxon_id_reference (datapackage.json)

In GitLab by @peterdesmet on Aug 12, 2020, 05:23

To allow better linking of data, it might be useful to share a taxon_id in addition to scientific_name, e.g. https://www.gbif.org/species/5219243 or 5219243 for Vulpes vulpes. In Agouti we store Catalogue of Life identifiers.

Note: this is a minor request. Sharing the scientific name is probably sufficient.

Make project acronym optional

In GitLab by @peterdesmet on Aug 20, 2020, 04:48

Would it be possible to make project.acronym optional, rather than required? Not all projects will have an acronym, and we therefore don't have it in Agouti.

Why allow "Both" for animal_types and sensor_method

In GitLab by @peterdesmet on Aug 20, 2020, 05:19

animal_types is an array, so it can contain:

Marked
Unmarked
Marked, Unmarked

Why is the property Both necessary if one can set it to Marked, Unmarked?

Add further camera settings to deployment

In GitLab by @peterdesmet on Sep 8, 2020, 18:51

Note: this was suggested in the 2020-09-08 EUROMAMMAL technical call

Cameras can have additional settings that are not captured in the current deployment terms, such as:

detection_zone: Detection focus of camera trap (centered or full) full; centered
ir_mode: IR mode of camera trap far; near

This needs further community feedback before we can decide if and how this should be part of the standard.

Read camtrap packages with camtrapR

In GitLab by @peterdesmet on Aug 25, 2020, 9:25

This is more of an implementation issue, but I'm logging it here so we don't forget.

Users will likely want to use the camtrapR package to analyse their data. We should provide functionality/tutorials to read a camtrap package and map it to the camtrapR data model. Data package reading functionality can probably be included by using datapackage.r.

I'm not a user of camtrapR myself, so I'm not very familiar with it. Jakub, any thoughts on how to provide this functionality?

How to express different events/sequences in a long video?

In GitLab by @peterdesmet on Sep 2, 2020, 20:20

Ferdinando Urbano presented an interesting use case: how to express a long video being chunked into different events/sequences?

I think this can be done by repeating the media_id but associating different event_ids:

media_id	event_id	timestamp	file
1	A	2020-09-02T10:00:00	video_1.mp4
1	B	2020-09-02T10:05:00	video_1.mp4
1	C	2020-09-02T10:10:00	video_1.mp4

I.e. the opposite of where event_id generally groups different media_ids. To know what sections of the video are chunked, the timestamps should be different, but I'm not entirely sure if that fits the definition of:

Date and time when the multimedia file was recorded

Other suggestions?

Second level observation_type

In GitLab by @peterdesmet on Sep 2, 2020, 3:43

As suggested in this comment, observation_type will only contain high-level AI understandable categories. But:

As I understand that it can be important to distinguish between project staff and other human observations we can consider adding new term e.g. is_staff or maybe even better a boolean field setup or sth similar to mark all records made during setup/picking-up a camera or just changing batteries/sd cards etc. With this solution you can still get a proper 1st step classification by AI model for this field.

We should define such a field, its name and the controlled values for it.

CTMS has the field photoType with:

Start
End
Set Up
~Blank~
~Animal~
Staff
Unknown
Unidentifiable
Timelapse

In Agouti we have information regarding:

~Blank~
Setup/Pickup
Unknown
Timelapse

Fields with ~ can be expressed in observation_type

New term for media: exif_data

In GitLab by @peterdesmet on Aug 12, 2020, 04:06

Agouti stores the exif data per media file:

"{"EXIF": {"ISO": 200, "Make": "RECONYX", "Flash": "Auto, Fired", "Model": "HC600 HYPERFIRE", "ColorSpace": "sRGB", "CreateDate": "2017:07:16 21:27:42", "ModifyDate": "2017:07:16 21:27:42", "ExifVersion": "0220", "XResolution": 72, "YResolution": 72, "ExposureMode": "Auto", "ExposureTime": "1/30", "WhiteBalance": "Manual", "ExifImageWidth": 1920, "ResolutionUnit": "inches", "ExifImageHeight": 1080, "FlashpixVersion": "0100", "DateTimeOriginal": "2017:07:16 21:27:42", "SceneCaptureType": "Standard", "YCbCrPositioning": "Co-sited", "ComponentsConfiguration": "Y, Cb, Cr, -"}, "File": {"FileName": "IMG_1398.JPG", "FileSize": "258 kB", "FileType": "JPEG", "MIMEType": "image/jpeg", "Directory": "/opt/app/uploads/deployment-images/20161222132759-untitled/014e28a6-408e-4309-8bfc-5006ea8f0803", "ImageWidth": 1920, "ImageHeight": 1080, "BitsPerSample": 8, "ExifByteOrder": "Little-endian (Intel, II)", "FileAccessDate": "2017:07:16 21:27:42+00:00", "FileModifyDate": "2017:07:16 21:27:42+00:00", "ColorComponents": 3, "EncodingProcess": "Baseline DCT, Huffman coding", "FilePermissions": "rwxrwxrwx", "YCbCrSubSampling": "YCbCr4:2:2 (2 1)", "FileTypeExtension": "jpg", "FileInodeChangeDate": "2017:08:09 12:11:59+00:00"}, "ExifTool": {"ExifToolVersion": 10.16}, "Composite": {"ImageSize": "1920x1080", "Megapixels": 2.1, "ShutterSpeed": "1/30"}, "MakerNotes": {"Contrast": 160, "Sequence": "6 of 10", "MoonPhase": "Last Quarter", "Sharpness": 32, "UserLabel": "NPHK33", "Brightness": 0, "Saturation": 0, "EventNumber": 138, "TriggerMode": "Motion Detection", "FirmwareDate": "2016:12:29", "SerialNumber": "H600HJ12269102", "BatteryVoltage": "8.92 V", "FirmwareVersion": "4.2.0", "DateTimeOriginal": "2017:07:16 21:27:42", "MakerNoteVersion": "0xf101", "MotionSensitivity": 100, "AmbientTemperature": "22 C", "InfraredIlluminator": "On", "AmbientTemperatureFahrenheit": "72 F"}, "SourceFile": "/opt/app/uploads/deployment-images/20161222132759-untitled/014e28a6-408e-4309-8bfc-5006ea8f0803/IMG_1398.JPG"}"

Even though exif data is available within the actual media files, it might be useful to share this as exif_data in media.csv as well, as it contains a wealth of information, e.g. temperature.

There are some drawbacks too:

exif data might contain sensitive information that would not be shared if the images are not public
exif data substantially increases the file size of media.csv (e.g. 324MB unzipped/43MB zipped vs 3.09GB unzipped/98.4MB zipped for 1,223,400 media files)

But as a non-required term, it might be useful to have the option to share this information?
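
As an illustration of what an exif_data field would make possible, the following Python sketch pulls the ambient temperature out of such a JSON string. It assumes the field holds valid JSON with the key layout shown in the Agouti record above (`MakerNotes` → `AmbientTemperature`, formatted like "22 C"). The sample string is a trimmed copy of that record, and the `ambient_temperature_celsius` helper is hypothetical.

```python
import json

# Trimmed copy of the Agouti record in this issue; only a few keys are kept.
EXIF_DATA = (
    '{"EXIF": {"Make": "RECONYX", "Model": "HC600 HYPERFIRE", '
    '"DateTimeOriginal": "2017:07:16 21:27:42"}, '
    '"MakerNotes": {"AmbientTemperature": "22 C", "MoonPhase": "Last Quarter"}}'
)


def ambient_temperature_celsius(exif_json):
    """Return the ambient temperature in degrees Celsius, or None if unavailable."""
    exif = json.loads(exif_json)
    value = exif.get("MakerNotes", {}).get("AmbientTemperature")
    if value is None:
        return None
    parts = value.split()
    # Expect "<number> C"; anything else is treated as missing.
    if len(parts) != 2 or parts[1] != "C":
        return None
    return float(parts[0])


temperature = ambient_temperature_celsius(EXIF_DATA)
```

Key names under `MakerNotes` vary by camera make, so a real reader would need per-manufacturer handling.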

Merge locations table into deployments

In GitLab by @peterdesmet on Aug 3, 2020, 17:40

In the example data package, the latitude and longitude in deployments.csv are the same as in locations.csv. I assume they are just copied from the related location and do not function as a more precise camera position within a location. It is a bit confusing to have both.

I also notice that there are 80 locations without associated deployments: what is the use of these?

I suggest simplifying the schema by merging the deployments and locations tables into one table called deployments.csv.

deployment_id	location_id	latitude	longitude	...	habitat	comments
S10-A-351-767	A-351-767	23.80923	52.67643	...	Mixed temperate low-land forest	snowing, clearing in forest
S12-A-351-767	A-351-767	23.80923	52.67643	...	Mixed temperate low-land forest	raining, clearing in forest

Suggestions for fields from locations.csv in the merged deployments table:

location_id: if deployments and locations are merged, then this is no longer a required field, as it is no longer a foreign key. Could remain as an identifier for a location (i.e. a lat/long/habitat group).
deployment_code: this field has an oddly strict definition ("Code which in combination with location_id gives a unique identifier of deployment within a project."). I would remove or loosen the definition.
location_name: field that could be added: a more human readable label for the location identified by location_id
country: not sure this field is necessary, given that it can easily be derived from latitude/longitude. Would be useful at a project level though (in spatial_coverage), ideally as ISO 3166 country codes.
timezone: not sure this field is necessary, given that it can easily be derived from latitude/longitude. Users could also wrongly assume it is the timezone used in start and end (which are UTC).
habitat: keep, would be repeated for every deployment
comments: both locations and deployments have comments, so either 1) we have 2 fields, 2) comments for deployments (snowing, raining) and locations (clearing in forest) are merged when exporting or 3) we do not support comments for locations.
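
The proposed merge can be sketched in Python as a join on location_id. This is only an illustration using the example values from this issue: the `merge_locations` helper is hypothetical, and it implements option 2 for comments (deployment and location comments concatenated on export).

```python
locations = [
    {
        "location_id": "A-351-767",
        "latitude": "23.80923",
        "longitude": "52.67643",
        "habitat": "Mixed temperate low-land forest",
        "comments": "clearing in forest",
    },
]
deployments = [
    {"deployment_id": "S10-A-351-767", "location_id": "A-351-767", "comments": "snowing"},
    {"deployment_id": "S12-A-351-767", "location_id": "A-351-767", "comments": "raining"},
]


def merge_locations(deployments, locations):
    """Copy location fields onto each deployment row and merge both comments."""
    by_id = {loc["location_id"]: loc for loc in locations}
    merged = []
    for dep in deployments:
        loc = by_id[dep["location_id"]]
        row = {**loc, **dep}  # deployment values win on shared keys
        comments = [c for c in (dep.get("comments"), loc.get("comments")) if c]
        row["comments"] = ", ".join(comments)
        merged.append(row)
    return merged


merged = merge_locations(deployments, locations)
```

Because the join starts from deployments, locations without any deployment (such as the 80 mentioned above) simply drop out of the merged table.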
