
Comments (8)

jonquandt commented on July 30, 2024

@nirmalapudota -- thanks for this feedback. This is probably something we'll try to handle with our Search Service (#1), but at a minimum I can think about a way to make the distinction between lastModified and publish date more explicit.

from api.

nirmalapudota commented on July 30, 2024

Thank you for the response. Getting API results based on "published_date" would be very helpful when extracting data daily and wanting only final published packages.

Could you please confirm the following?
I was looking at the output of the FR 'packages' API:
e.g., the API call "https://api.govinfo.gov/packages/FR-2012-04-27/summary?api_key=DEMO_KEY"

  1. The output of the above API call has a 'dateIssued' key/attribute. Is it the same as 'published_date'?
  2. The above output says the 'dateIssued' of "FR-2012-04-27" is "2012-04-27", but the last modified date is "2018-12-14T19:05:13Z". Does this mean the publication was last modified in December 2018? If so, is there a way to tell what was modified from either the JSON or the XML links output?

Thank you so much.
Nirmala

I was looking at extracting Federal Register packages from the collections/packages/granules JSON output.


jonquandt commented on July 30, 2024

@nirmalapudota

  1. Yes, dateIssued is equivalent to the publish date -- there are some instances in other collections where other values might take precedence, particularly for granules, but for the purposes of FR packages, they are the same.
  2. The lastModified date indicates the last change to the package -- in this case, it was reprocessed, likely to include additional data within MODS. The PREMIS preservation metadata will indicate what events have occurred to the package throughout its lifecycle, though it doesn't necessarily tell you specifically what may have changed. My recommendation would be to treat a package as a whole unit -- if the lastModified date changes for the package, you will likely need to re-extract the content and metadata to ensure that it is fully up to date.
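As a rough illustration of the "treat a package as a whole unit" advice, here is a minimal Python sketch. It assumes you keep your own record of the lastModified value seen at the previous extraction; the function names and the stored value below are made up for illustration and are not part of the API.

```python
from datetime import datetime


def parse_last_modified(value: str) -> datetime:
    """Parse a timestamp such as '2018-12-14T19:05:13Z' into an aware datetime."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def needs_reextract(stored: dict, package_id: str, last_modified: str) -> bool:
    """Return True when the package is new or its lastModified has advanced."""
    previous = stored.get(package_id)
    if previous is None:
        return True
    return parse_last_modified(last_modified) > parse_last_modified(previous)


# Hypothetical local record: package id -> lastModified seen at last extraction.
seen = {"FR-2012-04-27": "2012-04-27T08:00:00Z"}
print(needs_reextract(seen, "FR-2012-04-27", "2018-12-14T19:05:13Z"))
```

If the function returns True, the whole package (content and metadata) would be fetched again rather than trying to work out which part changed.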

https://api.govinfo.gov/packages/FR-2012-04-27/premis
Here's the event that caused the overall lastModified date to change -- see the eventDetail.

<event>
	<eventIdentifier>
		<eventIdentifierType>FDsys:event</eventIdentifierType>
		<eventIdentifierValue>8e09a323-f1e4-464e-9b81-47f901832e94</eventIdentifierValue>
	</eventIdentifier>
	<eventType>Reprocessed for Access</eventType>
	<eventDateTime>2018-12-14T14:04:44-05:00</eventDateTime>
	<eventDetail>
		11002ee180000964 has reprocessed ACP P0b002ee1825e9e09 for access, which includes deleting and regenerating the granule folder and derived renditions. The content has been reparsed and there may be updates to the descriptive metadata in AIP and ACP.
	</eventDetail>
	<eventOutcomeInformation>
		<eventOutcome>Success</eventOutcome>
	</eventOutcomeInformation>
	<linkingAgentIdentifier>
		<linkingAgentIdentifierType>FDsys:agent</linkingAgentIdentifierType>
		<linkingAgentIdentifierValue>11002ee180000964</linkingAgentIdentifierValue>
		<linkingAgentRole>implementer</linkingAgentRole>
	</linkingAgentIdentifier>
	<linkingObjectIdentifier>
		<linkingObjectIdentifierType>FDsys</linkingObjectIdentifierType>
		<linkingObjectIdentifierValue>P0b002ee1825e9e09</linkingObjectIdentifierValue>
		<linkingObjectRole>source</linkingObjectRole>
	</linkingObjectIdentifier>
</event>
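For anyone who wants to pull these events out programmatically, the following is a small Python sketch using only the standard library. It is an assumption-laden example: the wrapping <premis> root element and the abbreviated sample are made up for illustration, and the real document may use XML namespaces, which is why the code compares local tag names only.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Abbreviated copy of the event above, wrapped in a made-up root element.
SAMPLE = """<premis>
  <event>
    <eventType>Reprocessed for Access</eventType>
    <eventDateTime>2018-12-14T14:04:44-05:00</eventDateTime>
    <eventDetail>
      11002ee180000964 has reprocessed ACP P0b002ee1825e9e09 for access,
      which includes deleting and regenerating the granule folder and
      derived renditions.
    </eventDetail>
  </event>
</premis>"""

WANTED = ("eventType", "eventDateTime", "eventDetail")


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def list_events(premis_xml: str) -> List[Dict[str, str]]:
    """Collect type, timestamp and detail for every <event> element."""
    root = ET.fromstring(premis_xml)
    events = []
    for elem in root.iter():
        if local_name(elem.tag) != "event":
            continue
        record = {}
        for child in elem.iter():
            name = local_name(child.tag)
            if name in WANTED:
                # Collapse the whitespace left over from pretty-printing.
                record[name] = " ".join((child.text or "").split())
        events.append(record)
    return events


for event in list_events(SAMPLE):
    print(event["eventDateTime"], event["eventType"])
```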

As an aside, since the package id is predictable, you could construct package service requests for any package via: https://api.govinfo.gov/packages/FR-`YYYY`-`MM`-`DD`/

and use the relevant endpoint for your request, such as:
/summary - JSON metadata summary
/pdf - PDF content
/xml - XML content
/mods - descriptive metadata
/premis - preservation metadata
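The URL pattern above can be sketched in Python as follows. The helper names are hypothetical, the endpoint list is limited to the five named in this comment, and DEMO_KEY is the demo key used elsewhere in this thread.

```python
from datetime import date

BASE = "https://api.govinfo.gov/packages"
# Only the endpoints mentioned in this thread; others may exist.
ENDPOINTS = ("summary", "pdf", "xml", "mods", "premis")


def fr_package_id(issue_date: date) -> str:
    """Build a Federal Register package id such as 'FR-2012-04-27'."""
    return f"FR-{issue_date:%Y-%m-%d}"


def fr_package_url(issue_date: date, endpoint: str, api_key: str = "DEMO_KEY") -> str:
    """Build a package service URL for one FR issue and one endpoint."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    return f"{BASE}/{fr_package_id(issue_date)}/{endpoint}?api_key={api_key}"


print(fr_package_url(date(2012, 4, 27), "summary"))
# prints https://api.govinfo.gov/packages/FR-2012-04-27/summary?api_key=DEMO_KEY
```

Note that this only builds the URL; whether a package exists for a given date still has to be checked by making the request.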

Handling this use case will be one of the first tests for the search service as we work on development.


nirmalapudota commented on July 30, 2024

Thank you. This is very helpful. I will be waiting to see these new features in the API.

Thank you.
Nirmala


aelfric commented on July 30, 2024

Is there any kind of efficient workaround for this with the current API? We're trying to look at the CFR collection, which has over 5000 packages. It seems the last modified dates are all within the last few months, even for versions of the packages that are several years old.

In our use case, we would want to grab all the CFR volume entries for a given year. The only two options I can think of seem very wasteful of network resources: either (1) query the whole list, look up the summary for each package, read the published date, and then filter the whole list accordingly, which would require a large number of round trips, or (2) enumerate all possible URLs and check whether we picked up all the volumes of each title.


jonquandt commented on July 30, 2024

@aelfric -- we recently republished a large amount of the content on the system to update some data within our search indices.

Currently there's not a way within the API to flag the date values to go by publish date instead of lastModified. That's something we're looking at.

My suggestion for the moment would be to look at the CFR sitemaps. These are broken down by year. You could pull the package id out of each sitemap loc value by stripping off the "https://www.govinfo.gov/app/details/" prefix.

Here's an example for 2019:
https://www.govinfo.gov/sitemap/CFR_2019_sitemap.xml

Once you have that list of package ids, you can grab the zips or whatever content version you want by inserting each package id into the API packages service.
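A minimal Python sketch of the sitemap approach is below. It assumes the sitemap follows the standard sitemaps.org layout with loc elements; the sample document and the package id in it are illustrative only, not copied from the real CFR sitemap.

```python
import xml.etree.ElementTree as ET
from typing import List

DETAILS_PREFIX = "https://www.govinfo.gov/app/details/"

# Illustrative sample in the standard sitemap layout (made-up entry).
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.govinfo.gov/app/details/CFR-2019-title1-vol1</loc>
  </url>
</urlset>"""


def package_ids_from_sitemap(sitemap_xml: str) -> List[str]:
    """Return the package ids found in the sitemap's loc values."""
    root = ET.fromstring(sitemap_xml)
    ids = []
    for elem in root.iter():
        # Compare local names so the sitemap namespace does not matter.
        if elem.tag.rsplit("}", 1)[-1] != "loc" or not elem.text:
            continue
        loc = elem.text.strip()
        if loc.startswith(DETAILS_PREFIX):
            ids.append(loc[len(DETAILS_PREFIX):])
    return ids


for package_id in package_ids_from_sitemap(SAMPLE):
    # Each id can then be inserted into the packages service URL.
    print(f"https://api.govinfo.gov/packages/{package_id}/summary?api_key=DEMO_KEY")
```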

Understandably, this isn't perfect, but might be slightly faster than doing either 1 or 2 above.

Let me know if there's anything I can clarify.


jonquandt commented on July 30, 2024

Hello, we are currently previewing a new published endpoint that will allow retrieval by publication date rather than lastModified time. This is still in development, but we'd like input on the functionality that's available so far.

https://api.govinfo.gov/published/2019-01-01/2019-12-31?offset=0&pageSize=100&collection=CFR&api_key=DEMO_KEY

Some additional features:

@nirmalapudota @aelfric

@cnizzardini -- this may help with #57


jonquandt commented on July 30, 2024

Format:

https://api.govinfo.gov/published/{dateIssuedStartDate}/{dateIssuedEndDate}?offset={startingRecord}&pageSize={number of records in call}&collection={comma-separated list of values}&api_key={your api.data.gov api key}

Examples:

BILLS issued between January and July 2019:
https://api.govinfo.gov/published/2019-01-01/2019-07-31?offset=0&pageSize=100&collection=BILLS&api_key=DEMO_KEY

Federal Register and CFR packages in 2019:
https://api.govinfo.gov/published/2019-01-01/2019-12-31?offset=0&pageSize=100&collection=CFR,FR&modifiedSince=2020-01-01T00:00:00&api_key=DEMO_KEY

Required parameters

Optional parameters:

  • dateIssuedEndDate: the latest package you are requesting by dateIssued, in YYYY-MM-DD format
  • docClass: filters the results by overarching collection-specific categories. The values vary from collection to collection. For example, docClass in BILLS corresponds to Bill Type -- e.g. s, hr, hres, sconres. CREC (the Congressional Record) has docClass by CREC section: HOUSE, SENATE, DIGEST, and EXTENSIONS
  • congress: congress number (e.g. “116”)
  • modifiedSince: equivalent to the startDate parameter in the collections service, which is based on lastModified. It allows you to request only packages that have been modified since a given date/time, which is useful for tracking updates. Requires ISO 8601 format -- e.g. 2020-02-28T00:00:00Z
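To make the format concrete, here is a hedged Python sketch that assembles a published-endpoint URL from the parameters described above. The function name and its keyword arguments are hypothetical, and the behaviour when dateIssuedEndDate is omitted (simply leaving that path segment out) is an assumption based on it being listed as optional.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

BASE = "https://api.govinfo.gov/published"


def published_url(
    start_date: str,
    end_date: Optional[str] = None,
    *,
    collections: Iterable[str],
    offset: int = 0,
    page_size: int = 100,
    doc_class: Optional[str] = None,
    congress: Optional[str] = None,
    modified_since: Optional[str] = None,
    api_key: str = "DEMO_KEY",
) -> str:
    """Assemble a URL for the published endpoint (dates as YYYY-MM-DD)."""
    path = f"{BASE}/{start_date}"
    if end_date:
        path += f"/{end_date}"
    params = {
        "offset": offset,
        "pageSize": page_size,
        "collection": ",".join(collections),
    }
    optional = {
        "docClass": doc_class,
        "congress": congress,
        "modifiedSince": modified_since,
    }
    # Only send the optional parameters that were actually provided.
    params.update({k: v for k, v in optional.items() if v is not None})
    params["api_key"] = api_key
    # Keep commas and colons readable, as in the examples in this thread.
    return f"{path}?{urlencode(params, safe=',:')}"


print(published_url("2019-01-01", "2019-07-31", collections=["BILLS"]))
# prints https://api.govinfo.gov/published/2019-01-01/2019-07-31?offset=0&pageSize=100&collection=BILLS&api_key=DEMO_KEY
```

Paging through a large result set would then be a matter of repeating the call with a larger offset; how the response signals the last page is not covered in this thread, so that part is left out.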

