Can someone help explain the difference between the SampleProduced object and the Samp

Sampling Activity and Sample Produced vs Sample (diggsml/schema-dev)

Comments (9)

dponti commented on May 28, 2024

@sdeaton75, @cking24343
The SampleProduced objects identify which samples are produced by the activity. They contain only two properties: a reference to the physical sample produced (the Sample feature) and the location from which the physical sample was extracted from the investigation target (e.g. Natural Ground). In the following example snippet, a sample is collected by the activity with ID sa_p_Boliva_4_YP1_B1_150. The activity occurred between depths of 150 and 152 ft, and the sample extracted came from a depth of 150-151.5 ft. In this case there was 75% recovery (1.5 ft recovered from the 2 ft interval), and the lost core is assumed to come from the bottom of the interval:

<samplingActivity>
	<SamplingActivity gml:id="sa_p_Boliva_4_YP1_B1_150">
		<investigationTarget>Natural Ground</investigationTarget>
		<projectRef xlink:href="#p_Boliva_4"/>
		<samplingFeatureRef xlink:href="#p_Boliva_4_YP1_B1"/>
		<samplingLocation>
			<LinearExtent gml:id="DGSC5A-4FA-3FE-43BF-6422E" srsDimension="1"
				srsName="#p_Boliva_4_YP1_B1-lsr">
				<gml:posList>150 152</gml:posList>
			</LinearExtent>
		</samplingLocation>
		<activityType>collect</activityType>
		<lengthLongestPortion uom="ft">0.75</lengthLongestPortion>
		<rockQualityDesignationLength uom="ft">1.0</rockQualityDesignationLength>
		<sampleProduced>
			<SampleProduced gml:id="sp_p_Boliva_4_YP1_B1_150">
				<sampleRef xlink:href="#s_p_Boliva_4_YP1_B1_150"/>
				<location>
					<LinearExtent gml:id="s_p_Boliva_4_YP1_B1_150-sloc" srsDimension="1"
						srsName="#p_Boliva_4_YP1_B1-lsr">
						<gml:posList>150 151.5</gml:posList>
					</LinearExtent>
				</location>
			</SampleProduced>
		</sampleProduced>
		<samplingEquipment>
			<CoringDevice gml:id="DGS8825-110C-1559-A721-5844E">
				<barrelOuterDiameter uom="mm">94</barrelOuterDiameter>
				<linerPresent>true</linerPresent>
				<linerType>porcelain</linerType>
			</CoringDevice>
		</samplingEquipment>
		<samplingMethod>
			<Specification gml:id="DGS440C-14AA-B2-1ED9-56434">
				<gml:name>Core</gml:name>
			</Specification>
		</samplingMethod>
	</SamplingActivity>
</samplingActivity>

The Sample feature records information about the physical sample itself once it's collected, including custody information, dimensions, labels and containers (descriptions of a sample's lithology are carried within the observation property at the top level of a DIGGS file). The Sample feature does not contain the location from which the sample was collected (that's part of the activity). Separating the sampling location from the Sample provides a way to anonymize the Sample for QC purposes, where you would not want a lab to know how the sample was produced or where it came from (the DIGGS file sent to the lab would contain only the Sample features, not the linking SamplingActivity features).

The Sample feature associated with the SampleProduced object above would be encoded like this:

<sample>
	<Sample gml:id="s_p_Boliva_4_YP1_B1_150">
		<gml:name>s_p_Boliva_4_YP1_B1_150</gml:name>
		<projectRef xlink:href="#p_Boliva_4"/>
		<samplingActivityRef xlink:href="#sa_p_Boliva_4_YP1_B1_150"/>
		<sampleProducedRef xlink:href="#sp_p_Boliva_4_YP1_B1_150"/>
		<sampleLocationRef xlink:href="#s_p_Boliva_4_YP1_B1_150-sloc"/>
		<classification>rock</classification>
		<initialContainer>
			<Container gml:id="DGS26E-14C7-2D28-8E00-4F5C4">
				<containerLabel>B1-150</containerLabel>
				<containerType>Bag</containerType>
			</Container>
		</initialContainer>
		<sampleDimensions>
			<SampleDimensions gml:id="DGS26E-14C7-2D28-8E00-4F5C5">
				<diameter uom="in">1.5</diameter>
				<length uom="ft">1.5</length>
			</SampleDimensions>
		</sampleDimensions>
		<chainOfCustodyEvent>
			<ChainOfCustodyEvent gml:id="s_p_Boliva_4_YP1_B1_150-coc1">
				<gml:description>Shipped to ABC laboratory via FedEx</gml:description>
				<dateInitiated>2017-04-09</dateInitiated>
				<shippingContainer>
					<Container gml:id="DGS26E-14C7-2D28-8E00-4F5C6">
						<containerType>FedEx Box</containerType>
						<containerWeight uom="lbf">3.5</containerWeight>
					</Container>
				</shippingContainer>
			</ChainOfCustodyEvent>
		</chainOfCustodyEvent>
	</Sample>
</sample>

from schema-dev.

sdeaton75 commented on May 28, 2024

Thanks for the response Dan. To clarify a few points for us.

This structure appears to be designed principally to support environment-related sampling that requires blind samples (rinsate, blanks, etc.) and samples that don't have a location or a depth. So, for these scenarios you would have Sample objects for "rinsate" samples, which have no location, no depth, no sampling activity and no SampleProduced object. There would only be a Sample object. Is that correct? Please provide an example of how this should work.
It also supports duplicate sampling for environmental purposes. For a duplicate sample, there would be one SamplingActivity, and from that sampling activity there would be two SampleProduced objects. Would each SampleProduced object have its own Sample object?
For each of the Sample objects, is all of the chainOfCustodyEvent data included? That seems very redundant. Wouldn't it make sense to have it referenced from outside the sample as well? If you ship 500 samples to a lab, repeating it for each one duplicates a lot of identical data.

dponti commented on May 28, 2024

@sdeaton75,

I wouldn't say that the structure is designed "principally" to support environment-related sampling; rather, it is designed to accommodate sampling for environmental purposes and other activities where sampling involves subsampling, aggregation, or other scenarios that differ from a simple 1:1 association with a sample (and its position) collected from a borehole. There is always a requirement for a Sample feature to relate to a SamplingActivity feature, regardless of whether the sample is a blank or a sample collected from a borehole (note that the samplingActivityRef property in the Sample feature is mandatory). Specific classes of sampling activities are specified by the mandatory activityType property (the element name used in the examples in this thread), which must contain a value from a controlled list. The permitted terms are:

collect - activity created a sample or samples by collection from a sampling feature;
aggregate - activity created a sample or samples by aggregating existing samples; associated Sample(s) should contain more than one sourceSampleRef;
subsample - activity created a sample or samples by subsampling an existing sample; only one sourceSampleRef should be specified for the related Sample(s);
test - activity produced a test, standard or blank sample that does not relate to any field sample or sampling feature; activity will point to a project and no sourceSampleRef should be specified for the associated Sample(s).
none - the sample activity failed to produce any physical sample;

In the case of the blank or rinsate that you mentioned, the SamplingActivity would contain an activityType with a value of "test". There would be no samplingFeatureRef instantiated in the SamplingActivity feature, and the SampleProduced object(s) would contain no location information. Here is example code for this:

<samplingActivity>
    <SamplingActivity gml:id="sa_p_Boliva_4_YP1_B1_150">
	<investigationTarget>Natural Ground</investigationTarget>
	<projectRef xlink:href="#p_Boliva_4"/>
	<activityType>test</activityType>
	<sampleProduced>
		<SampleProduced gml:id="sp_p_Boliva_4_YP1_B1_150">
			<sampleRef xlink:href="#s_p_Boliva_4_YP1_B1_150"/>
		</SampleProduced>
	</sampleProduced>
    </SamplingActivity>
</samplingActivity>
<sample>
   <Sample gml:id="s_p_Boliva_4_YP1_B1_150">
	<gml:name>s_p_Boliva_4_YP1_B1_150</gml:name>
	<projectRef xlink:href="#p_Boliva_4"/>
	<samplingActivityRef xlink:href="#sa_p_Boliva_4_YP1_B1_150"/>
	<sampleProducedRef xlink:href="#sp_p_Boliva_4_YP1_B1_150"/>
	<classification>fluid</classification>
	<initialContainer>
		<Container gml:id="DGS26E-14C7-2D28-8E00-4F5C4">
			<containerLabel>B1-150</containerLabel>
			<containerType>Jar</containerType>
		</Container>
	</initialContainer>
   </Sample>
</sample>

Not including location information for a SampleProduced object would also occur with other types of samples where a location would make no sense, such as aggregated samples from multiple sites collected for assessing properties of material used for fill or roadbeds, etc.
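As an illustration of the aggregated case, a sketch along the same lines as the blank example might look like the following. All gml:id values and the source sample ids are hypothetical, and the placement of the sourceSampleRef elements inside Sample is an assumption (the list of activity types only says that aggregated Samples should carry more than one). Namespace declarations are omitted as in the other snippets, and this has not been validated against the schema.

```xml
<samplingActivity>
	<!-- Hypothetical ids; an "aggregate" activity with no sampling feature and no location -->
	<SamplingActivity gml:id="sa_example_fill_agg1">
		<investigationTarget>Natural Ground</investigationTarget>
		<projectRef xlink:href="#p_example"/>
		<activityType>aggregate</activityType>
		<sampleProduced>
			<SampleProduced gml:id="sp_example_fill_agg1">
				<sampleRef xlink:href="#s_example_fill_agg1"/>
			</SampleProduced>
		</sampleProduced>
	</SamplingActivity>
</samplingActivity>
<sample>
	<Sample gml:id="s_example_fill_agg1">
		<gml:name>s_example_fill_agg1</gml:name>
		<projectRef xlink:href="#p_example"/>
		<samplingActivityRef xlink:href="#sa_example_fill_agg1"/>
		<sampleProducedRef xlink:href="#sp_example_fill_agg1"/>
		<!-- Assumed placement: one sourceSampleRef per existing sample that was combined -->
		<sourceSampleRef xlink:href="#s_example_siteA_1"/>
		<sourceSampleRef xlink:href="#s_example_siteB_1"/>
	</Sample>
</sample>
```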

On your second question: yes, each SampleProduced object would have its own Sample object. However, having multiple samples produced from one activity is not just for sample duplicates. Take a soil core from an SPT test, for example, taken from 10-11.5 ft depth in a borehole. Upon extruding/exposing the sample, it is noted that there's a lithology change at 11 ft. At that point, the sample is divided at the contact and two samples are bagged. In this case the activity (the SPT test that produces the soil core) produces two samples that have unique location info (one from 10-11 ft and the other from 11-11.5 ft). Both samples reference the same "collect" type activity.
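A sketch of how that SPT case might be encoded, following the pattern of the first example in this thread. The gml:id values and the borehole and project references are hypothetical, the depths are assumed to be in feet per the referenced linear reference system, and namespace declarations are omitted as in the other snippets. Treat it as illustrative rather than validated against the schema.

```xml
<samplingActivity>
	<!-- Hypothetical ids; one "collect" activity spanning 10 to 11.5 ft -->
	<SamplingActivity gml:id="sa_example_B1_10">
		<investigationTarget>Natural Ground</investigationTarget>
		<projectRef xlink:href="#p_example"/>
		<samplingFeatureRef xlink:href="#p_example_B1"/>
		<samplingLocation>
			<LinearExtent gml:id="sa_example_B1_10-loc" srsDimension="1"
				srsName="#p_example_B1-lsr">
				<gml:posList>10 11.5</gml:posList>
			</LinearExtent>
		</samplingLocation>
		<activityType>collect</activityType>
		<!-- First sample: above the lithology change at 11 ft -->
		<sampleProduced>
			<SampleProduced gml:id="sp_example_B1_10a">
				<sampleRef xlink:href="#s_example_B1_10a"/>
				<location>
					<LinearExtent gml:id="s_example_B1_10a-sloc" srsDimension="1"
						srsName="#p_example_B1-lsr">
						<gml:posList>10 11</gml:posList>
					</LinearExtent>
				</location>
			</SampleProduced>
		</sampleProduced>
		<!-- Second sample: below the lithology change -->
		<sampleProduced>
			<SampleProduced gml:id="sp_example_B1_10b">
				<sampleRef xlink:href="#s_example_B1_10b"/>
				<location>
					<LinearExtent gml:id="s_example_B1_10b-sloc" srsDimension="1"
						srsName="#p_example_B1-lsr">
						<gml:posList>11 11.5</gml:posList>
					</LinearExtent>
				</location>
			</SampleProduced>
		</sampleProduced>
	</SamplingActivity>
</samplingActivity>
```

Each of the two SampleProduced objects would then be paired with its own Sample feature whose samplingActivityRef points at the same activity.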

This raises the question of how the scenario above differs from subsampling. The distinction is based on when the activity occurs. In the above case, the two samples are produced at the rig as a result of the coring activity. If instead the sample was collected and bagged at the rig (one sample), and subsequently split into two in the office before being sent to the lab for testing, then a second SamplingActivity would occur, with the original sample as the source sample, the two "splits" as the samples produced (with associated Sample features), and the activity type listed as "subsample".
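For comparison, the office-split case might be sketched as below. The ids are hypothetical, the omission of samplingFeatureRef and location for a subsample activity is an assumption, and so is the position of sourceSampleRef within Sample; the thread only states that a subsampled Sample should carry exactly one sourceSampleRef.

```xml
<samplingActivity>
	<!-- Hypothetical second activity: the bagged sample is split in the office -->
	<SamplingActivity gml:id="sa_example_B1_10_split">
		<investigationTarget>Natural Ground</investigationTarget>
		<projectRef xlink:href="#p_example"/>
		<activityType>subsample</activityType>
		<sampleProduced>
			<SampleProduced gml:id="sp_example_B1_10_split1">
				<sampleRef xlink:href="#s_example_B1_10_split1"/>
			</SampleProduced>
		</sampleProduced>
		<sampleProduced>
			<SampleProduced gml:id="sp_example_B1_10_split2">
				<sampleRef xlink:href="#s_example_B1_10_split2"/>
			</SampleProduced>
		</sampleProduced>
	</SamplingActivity>
</samplingActivity>
<sample>
	<Sample gml:id="s_example_B1_10_split1">
		<gml:name>s_example_B1_10_split1</gml:name>
		<projectRef xlink:href="#p_example"/>
		<samplingActivityRef xlink:href="#sa_example_B1_10_split"/>
		<sampleProducedRef xlink:href="#sp_example_B1_10_split1"/>
		<!-- Assumed placement: the single sample collected at the rig is the source -->
		<sourceSampleRef xlink:href="#s_example_B1_10"/>
	</Sample>
</sample>
<sample>
	<Sample gml:id="s_example_B1_10_split2">
		<gml:name>s_example_B1_10_split2</gml:name>
		<projectRef xlink:href="#p_example"/>
		<samplingActivityRef xlink:href="#sa_example_B1_10_split"/>
		<sampleProducedRef xlink:href="#sp_example_B1_10_split2"/>
		<sourceSampleRef xlink:href="#s_example_B1_10"/>
	</Sample>
</sample>
```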

Good point on the chain-of-custody redundancy. Your concern about including a multitude of identical ChainOfCustodyEvent objects within an XML instance can be handled in the same way that v2.5 now handles equipment, businessAssociates, etc.: by adding GML's AssociationAttributeGroup and OwnershipAttributeGroup attribute groups to the ChainOfCustodyEventPropertyType. This allows the chainOfCustodyEvent property either to include a ChainOfCustodyEvent object or to reference a ChainOfCustodyEvent object defined elsewhere in the XML instance.

So for example, for one sample, the ChainOfCustodyEvent object is fully instantiated:

<sample>
   <Sample gml:id="s_p_Boliva_4_YP1_B1_150">
	<gml:name>s_p_Boliva_4_YP1_B1_150</gml:name>
	<projectRef xlink:href="#p_Boliva_4"/>
	<samplingActivityRef xlink:href="#sa_p_Boliva_4_YP1_B1_150"/>
	<sampleProducedRef xlink:href="#sp_p_Boliva_4_YP1_B1_150"/>
	<classification>fluid</classification>
             ...
	<chainOfCustodyEvent>
		<ChainOfCustodyEvent gml:id="s_p_Boliva_4_YP1_B1_150-coc1">
			<gml:description>Shipped to ABC laboratory via FedEx</gml:description>
			<dateInitiated>2017-04-09</dateInitiated>
			<shippingContainer>
				<Container gml:id="DGS26E-14C7-2D28-8E00-4F5C6">
					<containerType>FedEx Box</containerType>
					<containerWeight uom="lbf">3.5</containerWeight>
				</Container>
			</shippingContainer>
		</ChainOfCustodyEvent>
	</chainOfCustodyEvent>
   </Sample>
</sample>

And for subsequent samples shipped the same way at the same time as the above sample, instead of producing a separate ChainOfCustodyEvent object for each sample, you could use the xlink:href attribute to point to the gml:id of the fully instantiated ChainOfCustodyEvent object in the original sample:

<sample>
   <Sample gml:id="s_p_Boliva_4_YP2_B1_150">
	<gml:name>s_p_Boliva_4_YP2_B1_150</gml:name>
	<projectRef xlink:href="#p_Boliva_4"/>
	<samplingActivityRef xlink:href="#sa_p_Boliva_4_YP2_B1_150"/>
	<sampleProducedRef xlink:href="#sp_p_Boliva_4_YP2_B1_150"/>
	<classification>fluid</classification>
             ...
	<chainOfCustodyEvent xlink:href="#s_p_Boliva_4_YP1_B1_150-coc1"/>
   </Sample>
</sample>

I've made this change to the working 2.5a on GitHub and pushed those changes to the server so that this addition is now available. This change has no effect on existing 2.5a instance documents.

Rogerchandler commented on May 28, 2024

Dan - can the chainOfCustodyEvent tag contain more than one chain-of-custody record? If so, this could work well within the sample, as long as a receiving program can easily determine which samples were included in which COC.

I am nervous about including the sample depth information in the sampling activity element rather than the sample element itself. If we are trying to put historical data into DIGGS, then all we know is the sample top and base. To add this to DIGGS we would have to add a samplingActivity tag that may contain no more information than the top and base of the sample. I can see why this works for all types of samples in the active collection stage, but I am worried that we are making it too complicated for the 80% rule.

Can you put my worries to rest?

dponti commented on May 28, 2024

@Rogerchandler - in keeping with GML's object property rule, you can only have one ChainOfCustodyEvent object within a chainOfCustodyEvent property tag, but the schema allows more than one chainOfCustodyEvent property tag within a Sample feature, e.g.:

<chainOfCustodyEvent>
    <ChainOfCustodyEvent gml:id="abc">
      ...
    </ChainOfCustodyEvent>
</chainOfCustodyEvent>
<chainOfCustodyEvent>
    <ChainOfCustodyEvent gml:id="def">
      ...
    </ChainOfCustodyEvent>
</chainOfCustodyEvent>

... and so on. So you can have multiple ChainOfCustodyEvent objects for a Sample.

Regarding your concern about the additional complexity of having SamplingActivity features carry depth information for legacy samples: I understand it, but I think there's a middle ground that can make the XML encoding a bit less bulky (and perhaps put your worries to rest) for legacy data where only limited information about sampling activities has been preserved.

First off, the DIGGS model distinguishes information about the collection or creation of a sample or samples (or the attempt to collect or create them) from information about the physical sample itself. In this model, the depth at which a sample is collected is a property of the activity that produced the sample, not of the sample itself, so it makes sense that the sample collection location is part of the activity and not the sample. One benefit is that the location info can be hidden from another user (such as a lab) for QA/QC purposes, something that is important for environmental work. So, fundamental to DIGGS is the notion that every physical sample MUST BE associated with one sampling activity (i.e. the activity that produced the sample). Sampling activities, however, may produce zero or many samples, so there is not necessarily a one-to-one relationship between a sampling activity and a sample. We can take advantage of this to simplify encoding sample information from legacy data where there is limited information.

It would help to recognize what is required for a SamplingActivity and for a Sample. This is the minimum required information needed to instantiate a SamplingActivity feature:

gml:id
investigationTarget (controlled list; use "natural ground" for ground investigations)
project reference
activityType (controlled list, one of aggregate, collect, none, subsample, test) - see my last post to Scott on definition of those terms
if the activity occurs at a specific locality (eg. depth interval within a borehole), there would be a required reference to the sampling feature, and a required activityLocation record that would hold the depth interval for the activity
and ONLY if the activity produces one or more samples, there would be a required SampleProduced object within a sampleProduced property tag containing the sample location for each sample the activity produces.

This is the minimum required information needed to instantiate a Sample feature:

gml:id
project reference
sampling activity reference
sample produced reference
sample name

So, let's take the case of legacy data where we know where samples were collected in a borehole and we at least know their names (e.g. field labels), but know nothing specific about how the samples were obtained. In this case, best practice would be to instantiate ONE SamplingActivity feature in the XML document for all of the samples in the borehole. The SamplingActivity feature would contain properties associating the activity with the borehole and project ids, would have a value of "collect" for the activityType, would have a depth interval for the activityLocation that spans the full depth range of the samples, and would contain one sampleProduced property (with the depth location incorporated in the encapsulated SampleProduced object) for each of the samples. The XML instance would also contain one Sample feature for each of the samples, holding the sample's name and the reference to the SampleProduced object in the SamplingActivity feature that corresponds to that sample's depth. All Samples in the borehole would reference the same SamplingActivity feature.

So, for this you would have only one SamplingActivity feature for all the samples in the borehole which hold the depths, instead of one SamplingActivity feature for each sample, so this reduces the complexity (or at least bulk) of the xml. If you knew (or could infer) that some samples were obtained from SPT tests and others from Pitcher samples, then you could create a single Sampling Activity record for each type of sample rather than one activity for each sample. The degree to which you would group or separate samples within an activity would depend on the amount of info you have about the samples and the activities that produced them.
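The single-activity approach for legacy data could be sketched roughly as follows, for a borehole with two known sample depths. All ids, names and depths are hypothetical, the activity interval uses the samplingLocation element seen in the first example of this thread, and sampleRef is included only because the earlier examples carry it. Namespace declarations are omitted, and the fragment has not been validated against the schema.

```xml
<samplingActivity>
	<!-- One hypothetical activity covering every legacy sample in borehole B1 -->
	<SamplingActivity gml:id="sa_legacy_B1">
		<investigationTarget>Natural Ground</investigationTarget>
		<projectRef xlink:href="#p_legacy"/>
		<samplingFeatureRef xlink:href="#p_legacy_B1"/>
		<samplingLocation>
			<!-- Spans the full depth range of the samples below -->
			<LinearExtent gml:id="sa_legacy_B1-loc" srsDimension="1" srsName="#p_legacy_B1-lsr">
				<gml:posList>5 11.5</gml:posList>
			</LinearExtent>
		</samplingLocation>
		<activityType>collect</activityType>
		<sampleProduced>
			<SampleProduced gml:id="sp_legacy_B1_S1">
				<sampleRef xlink:href="#s_legacy_B1_S1"/>
				<location>
					<LinearExtent gml:id="s_legacy_B1_S1-sloc" srsDimension="1" srsName="#p_legacy_B1-lsr">
						<gml:posList>5 6.5</gml:posList>
					</LinearExtent>
				</location>
			</SampleProduced>
		</sampleProduced>
		<sampleProduced>
			<SampleProduced gml:id="sp_legacy_B1_S2">
				<sampleRef xlink:href="#s_legacy_B1_S2"/>
				<location>
					<LinearExtent gml:id="s_legacy_B1_S2-sloc" srsDimension="1" srsName="#p_legacy_B1-lsr">
						<gml:posList>10 11.5</gml:posList>
					</LinearExtent>
				</location>
			</SampleProduced>
		</sampleProduced>
	</SamplingActivity>
</samplingActivity>
<!-- Minimal Sample features: id, name, project, activity and SampleProduced references -->
<sample>
	<Sample gml:id="s_legacy_B1_S1">
		<gml:name>S-1</gml:name>
		<projectRef xlink:href="#p_legacy"/>
		<samplingActivityRef xlink:href="#sa_legacy_B1"/>
		<sampleProducedRef xlink:href="#sp_legacy_B1_S1"/>
	</Sample>
</sample>
<sample>
	<Sample gml:id="s_legacy_B1_S2">
		<gml:name>S-2</gml:name>
		<projectRef xlink:href="#p_legacy"/>
		<samplingActivityRef xlink:href="#sa_legacy_B1"/>
		<sampleProducedRef xlink:href="#sp_legacy_B1_S2"/>
	</Sample>
</sample>
```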

Let's take another case where we have a borehole log that lists water content, bulk density, and particle size data presumably taken from samples collected at various depths in a borehole. We don't know anything about the sampling method, nor anything about the samples specifically (e.g. no name or description) other than the depth. While we know that samples must have been collected in order to obtain the results, we don't want to "add" dummy data into the system to create sample records here. The important info (in fact the only real info) is the water content, density and particle size results associated with the various depths in the borehole, which are, in fact, test results and therefore would be encoded in a Test object instead. You wouldn't instantiate any Sample or SamplingActivity records in this case.

So there are ways to consolidate sample depth information into SamplingActivity records where the only relevant info we have about how samples were collected is the sample depth. This can simplify the XML for that case without requiring "exceptions" to the data model for the benefit of legacy data at the expense of failing to fully anonymize (if that's a word) environmental samples.

Are you worrying any less? :-)

sdeaton75 commented on May 28, 2024

So I think we have one final question on the Sample related structure. What is the reason for having the SampleProduced object contain a SampleRef object and the Sample object contain the SampleProducedRef object? It seems that if one referenced the other, that would be sufficient. However, having each object reference the other object is referencing from parent to child and child to parent.

dponti commented on May 28, 2024

Scott - It's not necessary to have both references. I included both to make it easier to extract Sample information from a SamplingActivity feature, but this can also be done by querying Sample features that reference the same SamplingActivity feature through the samplingActivityRef property (required for Samples). If we consider the SamplingActivity -> Sample relationship to be parent-child, then to be consistent with how we've handled these relationships elsewhere in DIGGS, only the "child" feature should carry the parent's reference. The relationship between SampleProduced -> Sample is not really parent-child because it is strictly one-to-one, but because SampleProduced is contained in the parent SamplingActivity record, we should drop the reference in the SampleProduced object and maintain the one in Sample. This way, all of the information about a Sample, including its referenced SamplingActivity and its location, can be derived from the Sample feature; i.e. Sample contains a reference to the associated SamplingActivity feature and another reference to the associated SampleProduced object within the SamplingActivity record. Does this make sense? If so, I'll make the change and upload the revision to GitHub. Dan

-- Daniel J Ponti Earthquake Science Center U.S. Geological Survey 345 Middlefield Road MS 977 Menlo Park, CA 94025-3591 ph: (650) 329-5679 fax: (650)329-5163

sdeaton75 commented on May 28, 2024

Yes, I think that makes sense.

dponti commented on May 28, 2024

Change made. I also removed the optional sampleLocationRef from the Sample object (it was redundant, since the Sample location can be obtained from the SampleProduced object without a separate location reference) and added an enumeration value of "unknown" to the activityType property of SamplingActivity, for use with legacy data where nothing is known about the SamplingActivity other than that a sample was produced.
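With those references trimmed, the first example in this thread would presumably reduce to something like the sketch below. It reuses the ids from that example, abbreviates the unchanged properties with comments, and is an inference from the changes described here rather than a copy of the revised schema's own sample files.

```xml
<samplingActivity>
	<SamplingActivity gml:id="sa_p_Boliva_4_YP1_B1_150">
		<!-- other SamplingActivity properties as in the first example -->
		<sampleProduced>
			<!-- sampleRef dropped: only the location remains in SampleProduced -->
			<SampleProduced gml:id="sp_p_Boliva_4_YP1_B1_150">
				<location>
					<LinearExtent gml:id="s_p_Boliva_4_YP1_B1_150-sloc" srsDimension="1"
						srsName="#p_Boliva_4_YP1_B1-lsr">
						<gml:posList>150 151.5</gml:posList>
					</LinearExtent>
				</location>
			</SampleProduced>
		</sampleProduced>
	</SamplingActivity>
</samplingActivity>
<sample>
	<Sample gml:id="s_p_Boliva_4_YP1_B1_150">
		<gml:name>s_p_Boliva_4_YP1_B1_150</gml:name>
		<projectRef xlink:href="#p_Boliva_4"/>
		<samplingActivityRef xlink:href="#sa_p_Boliva_4_YP1_B1_150"/>
		<!-- The Sample keeps the only link to its SampleProduced object -->
		<sampleProducedRef xlink:href="#sp_p_Boliva_4_YP1_B1_150"/>
		<!-- sampleLocationRef dropped: location is reached through sampleProducedRef -->
		<classification>rock</classification>
		<!-- containers, dimensions and custody events as in the first example -->
	</Sample>
</sample>
```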
