Comments (8)
@nirmalapudota -- thanks for this feedback. This is probably something we'll try to handle with our Search Service (#1), but at a minimum I can think about a way to make it more explicit about the lastModified vs. publish date.
from api.
Thank you for the response. Getting API results based on "published_date" would be very helpful when extracting data daily and when only final published packages are wanted.
Could you please confirm the following?
I was looking at the FR 'packages' API output,
e.g., the API call "https://api.govinfo.gov/packages/FR-2012-04-27/summary?api_key=DEMO_KEY"
- The output of the above API call has a 'dateIssued' key/attribute. Is it the same as 'published_date'?
- The output says the 'dateIssued' of "FR-2012-04-27" is "2012-04-27", but the last modified date is "2018-12-14T19:05:13Z". Does this mean the publication was last modified in December 2018? If so, is there a way to tell what was modified from either the JSON or the XML link outputs?
Thank you so much.
Nirmala
I was looking at extracting Federal Register packages from the collections/packages/granules JSON output.
- Yes, dateIssued is equivalent to the publish date -- there are some instances in other collections where other values might take precedence, particularly for granules, but for the purposes of FR packages, they are the same.
- The lastModified date indicates the last change to the package -- in this case, it was reprocessed -- likely to include additional data within MODS. The PREMIS preservation metadata will indicate what events have occurred to the package throughout its lifecycle, though it doesn't necessarily tell you specifically what may have changed. My recommendation would be to treat a package as a whole unit -- if the lastModified date changes for the package, you will likely need to re-extract the content and metadata to ensure that it is fully up to date.
https://api.govinfo.gov/packages/FR-2012-04-27/premis
Here's the event that caused the overall lastModified date to change; see the eventDetail.
<event>
<eventIdentifier>
<eventIdentifierType>FDsys:event</eventIdentifierType>
<eventIdentifierValue>8e09a323-f1e4-464e-9b81-47f901832e94</eventIdentifierValue>
</eventIdentifier>
<eventType>Reprocessed for Access</eventType>
<eventDateTime>2018-12-14T14:04:44-05:00</eventDateTime>
<eventDetail>
11002ee180000964 has reprocessed ACP P0b002ee1825e9e09 for access, which includes deleting and regenerating the granule folder and derived renditions. The content has been reparsed and there may be updates to the descriptive metadata in AIP and ACP.
</eventDetail>
<eventOutcomeInformation>
<eventOutcome>Success</eventOutcome>
</eventOutcomeInformation>
<linkingAgentIdentifier>
<linkingAgentIdentifierType>FDsys:agent</linkingAgentIdentifierType>
<linkingAgentIdentifierValue>11002ee180000964</linkingAgentIdentifierValue>
<linkingAgentRole>implementer</linkingAgentRole>
</linkingAgentIdentifier>
<linkingObjectIdentifier>
<linkingObjectIdentifierType>FDsys</linkingObjectIdentifierType>
<linkingObjectIdentifierValue>P0b002ee1825e9e09</linkingObjectIdentifierValue>
<linkingObjectRole>source</linkingObjectRole>
</linkingObjectIdentifier>
</event>
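To check which events touched a package, the PREMIS response can be scanned for its event elements. The following is a minimal sketch, not an official client: `list_events` and `local_name` are hypothetical helper names, and the code assumes the XML has already been downloaded into a string. It ignores XML namespaces because the snippet above does not show which namespace the full document declares.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def list_events(premis_xml: str) -> list:
    """Return type, timestamp and detail text for every <event> element."""
    root = ET.fromstring(premis_xml)
    events = []
    for node in root.iter():
        if local_name(node.tag) != "event":
            continue
        # Map each direct child's tag name to its stripped text content.
        fields = {local_name(child.tag): (child.text or "").strip() for child in node}
        events.append({
            "type": fields.get("eventType", ""),
            "when": fields.get("eventDateTime", ""),
            "detail": fields.get("eventDetail", ""),
        })
    return events
```

Filtering the result for entries whose `type` is "Reprocessed for Access" would surface events like the one shown above.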
As an aside, since the packageid is predictable, you could construct package service requests for any package via: https://api.govinfo.gov/packages/FR-`YYYY`-`MM`-`DD`/
and use the relevant endpoint for your request, such as:
/summary - json metadata summary
/pdf - pdf content
/xml - xml content
/mods - descriptive metadata
/premis - preservation metadata
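Because the package id follows the FR-YYYY-MM-DD pattern, the request URL can be assembled from a date. This is a small sketch under that assumption; the function name `fr_package_url` is made up for illustration, and the endpoint list is limited to the five mentioned above.

```python
from datetime import date

BASE = "https://api.govinfo.gov/packages"
ENDPOINTS = ("summary", "pdf", "xml", "mods", "premis")


def fr_package_url(issue_date: date, endpoint: str, api_key: str = "DEMO_KEY") -> str:
    """Build a package service URL for the Federal Register issue of a given day."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    # isoformat() yields YYYY-MM-DD, which matches the FR package id pattern.
    package_id = f"FR-{issue_date.isoformat()}"
    return f"{BASE}/{package_id}/{endpoint}?api_key={api_key}"
```

For example, `fr_package_url(date(2012, 4, 27), "summary")` reproduces the summary URL quoted earlier in this thread. Not every calendar day has an issue, so a request for a non-publication day should be expected to fail.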
Handling this use case will be one of the first tests for the search service as we work on development.
Thank you. This is very helpful. I will be waiting to see these new features in the API.
Thank you.
Nirmala
Is there any kind of efficient workaround for this with the current API? We're trying to look at the CFR collection which has over 5000 packages. It seems the last modified dates are all within the last few months even for versions of the packages that are several years old.
In our use case, we would want to grab all the CFR volume entries for a given year. The only two options I can think of seem very wasteful of network resources: either (1) query the whole list, look up the summary and published date for each package, and then filter the list accordingly, which would require a large number of round trips, or (2) enumerate all possible URLs and check whether we picked up all the volumes of each title.
@aelfric -- we recently republished a large amount of the content on the system to update some data within our search indices.
Currently there's not a way within the API to flag the date values to go by publish date instead of lastModified. That's something we're looking at.
My suggestion for the moment would be to look at the CFR sitemaps. These are broken down by year. You could pull the package id out of the sitemap loc value by stripping the "https://www.govinfo.gov/app/details/" out.
Here's an example for 2019:
https://www.govinfo.gov/sitemap/CFR_2019_sitemap.xml
Once you have that list of package ids, you could grab the zips or whatever content version you want by inserting the package id into the API packages service.
Understandably, this isn't perfect, but might be slightly faster than doing either 1 or 2 above.
Let me know if there's anything I can clarify.
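The sitemap approach described above can be sketched as follows. This is an illustration only: `package_ids_from_sitemap` is a hypothetical helper, it expects the sitemap XML to have been downloaded already, and it assumes every loc value of interest starts with the details prefix quoted above.

```python
import xml.etree.ElementTree as ET

DETAILS_PREFIX = "https://www.govinfo.gov/app/details/"


def package_ids_from_sitemap(sitemap_xml: str) -> list:
    """Extract package ids from the <loc> values of a yearly sitemap."""
    root = ET.fromstring(sitemap_xml)
    ids = []
    for node in root.iter():
        # Compare on the local tag name so the sitemap namespace does not matter.
        if node.tag.rsplit("}", 1)[-1] != "loc":
            continue
        url = (node.text or "").strip()
        if url.startswith(DETAILS_PREFIX):
            ids.append(url[len(DETAILS_PREFIX):].strip("/"))
    return ids
```

Each returned id can then be substituted into a packages service URL such as https://api.govinfo.gov/packages/{packageId}/summary.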
Hello, we are currently previewing a new published endpoint that will allow retrieval by publication date rather than lastModified time. This is still in development, but we'd like input on the functionality that's available so far.
Some additional features:
- filter by docClass: &docClass=hres to get House resolutions in the BILLS collection - this is less useful for CFR packages, admittedly, since they all share the same docClass
- filter by congress: &congress=116
- retrieve multiple collections with one call - provide a comma-separated list of values - e.g. https://api.govinfo.gov/published/2019-01-01/2019-12-31?offset=0&pageSize=100&collection=CFR,FR&api_key=DEMO_KEY
- track updates using &modifiedSince=2020-01-01T00:00:00Z
@cnizzardini -- this may help with #57
Format:
https://api.govinfo.gov/published/dateIssuedStartDate/dateIssuedEndDate?offset=startingRecord&pageSize=number of records in call&collection=comma-separated list of values&api_key=your api.data.gov api key
Examples:
BILLS issued between January and July 2019:
https://api.govinfo.gov/published/2019-01-01/2019-07-31?offset=0&pageSize=100&collection=BILLS&api_key=DEMO_KEY
Federal Register and CFR packages issued in 2019 and modified since January 1, 2020:
https://api.govinfo.gov/published/2019-01-01/2019-12-31?offset=0&pageSize=100&collection=CFR,FR&modifiedSince=2020-01-01T00:00:00Z&api_key=DEMO_KEY
Required parameters:
dateIssuedStartDate: the earliest package you are requesting by dateIssued – YYYY-MM-DD
offset: starting record – usually 0. If pageSize=10, you could advance to the next page of results by applying offset=10.
pageSize: number of records to return per request (e.g. 10)
collection: comma-separated list of collections that you are requesting, e.g. https://api.govinfo.gov/published/2019-01-01/2019-12-31?offset=0&pageSize=100&collection=BILLS,BILLSTATUS&api_key=DEMO_KEY - see /collections for a list of collections by code and human-readable name.
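Paging with offset and pageSize, as described above, amounts to stepping the offset forward by the page size. Here is a minimal sketch; `page_offsets` is a hypothetical helper, and it assumes the caller already knows the total number of matching records, which this thread does not say how to obtain.

```python
def page_offsets(total_records: int, page_size: int) -> list:
    """Return the offset value to send with each successive request."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Offsets start at 0 and advance by one page until all records are covered.
    return list(range(0, total_records, page_size))
```

With 25 records and pageSize=10, the requests would use offsets 0, 10 and 20.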
Optional parameters:
dateIssuedEndDate: the latest package you are requesting by dateIssued – YYYY-MM-DD
docClass: filter the results by overarching collection-specific categories. The values vary from collection to collection. For example, docClass in BILLS corresponds with Bill Type -- e.g. s, hr, hres, sconres. CREC (the Congressional Record) has docClass by CREC section: HOUSE, SENATE, DIGEST, and EXTENSIONS
congress: congress number (e.g. “116”)
modifiedSince: equivalent to the startDate parameter in the collections service, which is based on lastModified. It allows you to request only packages that have been modified since a given date/time, which is useful for tracking updates. Requires ISO 8601 format -- e.g. 2020-02-28T00:00:00Z
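Putting the required and optional parameters together, a request URL for the preview endpoint could be assembled as in the sketch below. The function name and its keyword arguments are invented for illustration. Only the query parameter names come from the description above, and since the endpoint is still in development, its behavior may change.

```python
from urllib.parse import urlencode

BASE = "https://api.govinfo.gov/published"


def published_url(start_date, collections, end_date=None, offset=0, page_size=100,
                  doc_class=None, congress=None, modified_since=None,
                  api_key="DEMO_KEY"):
    """Build a /published URL; dates are YYYY-MM-DD strings."""
    path = f"{BASE}/{start_date}"
    if end_date:
        path += f"/{end_date}"
    params = {
        "offset": offset,
        "pageSize": page_size,
        "collection": ",".join(collections),
    }
    # Optional filters are added only when the caller supplies them.
    if doc_class:
        params["docClass"] = doc_class
    if congress:
        params["congress"] = congress
    if modified_since:
        params["modifiedSince"] = modified_since
    params["api_key"] = api_key
    # Keep commas and colons literal so the URL matches the examples above.
    return f"{path}?{urlencode(params, safe=',:')}"
```

Calling `published_url("2019-01-01", ["CFR", "FR"], end_date="2019-12-31")` yields the multi-collection example URL shown earlier in this comment thread.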