GTFS Schedule Schema: add filesize and calendar range metadata#525
Open
mil wants to merge 2 commits into MobilityData:main from
New properties for schedule schema:
- extracted_filesize: The filesize in bytes of the extracted GTFS archive
- extracted_calendar_start: Earliest date referenced in calendar/calendar_dates
- extracted_calendar_end: Latest date referenced in calendar/calendar_dates

Also adds related helper functions:
- extract_gtfs_calendar_range: Extracts the calendar range from a GTFS archive
- get_filesize: Gets the filesize in bytes given a filepath
- is_gtfs_yyyymmdd_format: Determines if a date is in GTFS YYYYMMDD format
Also adds tests for the new extract_gtfs_calendar_range helper function.
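For readers who want a feel for what the three helpers do, here is a minimal sketch. It is not the code from this PR: only the function names come from the commit message, while the signatures, the `(start, end)` return shape, and the choice to return raw YYYYMMDD strings are assumptions made for illustration.

```python
import csv
import io
import os
import re
import zipfile
from datetime import datetime


def is_gtfs_yyyymmdd_format(value):
    """Return True if value is a valid calendar date written as YYYYMMDD."""
    if not isinstance(value, str) or not re.fullmatch(r"[0-9]{8}", value):
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        # Eight digits, but not a real date (e.g. 20230229).
        return False
    return True


def get_filesize(path):
    """Return the size of the file at path, in bytes."""
    return os.path.getsize(path)


def extract_gtfs_calendar_range(path):
    """Return (earliest, latest) YYYYMMDD date found in calendar.txt and
    calendar_dates.txt, or (None, None) if no valid date is present.

    Assumption: the return shape and string format are illustrative only.
    """
    # Date columns per file, as defined by the GTFS Schedule reference.
    date_columns = {
        "calendar.txt": ("start_date", "end_date"),
        "calendar_dates.txt": ("date",),
    }
    dates = []
    with zipfile.ZipFile(path) as archive:
        members = set(archive.namelist())
        for filename, columns in date_columns.items():
            if filename not in members:
                continue  # Both files are optional individually.
            with archive.open(filename) as raw:
                # utf-8-sig tolerates a leading byte order mark.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                for row in csv.DictReader(text):
                    for column in columns:
                        value = (row.get(column) or "").strip()
                        if is_gtfs_yyyymmdd_format(value):
                            dates.append(value)
    if not dates:
        return None, None
    # YYYYMMDD strings sort chronologically when compared as text.
    return min(dates), max(dates)
```

Invalid or missing dates are skipped rather than raising, on the assumption that metadata extraction should not fail on a single malformed row.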
This PR adds 3 new system-generated metadata properties to the bounding_box specification for newly added GTFS schedule sources:
- extracted_filesize: indicates the GTFS archive filesize in bytes
- extracted_calendar_start: indicates the GTFS archive calendar/calendar_dates min date
- extracted_calendar_end: indicates the GTFS archive calendar/calendar_dates max date

These properties function similarly to the existing extracted_on metadata property in the schema.

These 3 new metadata properties (extracted_filesize, extracted_calendar_start, and extracted_calendar_end) would be very helpful to end-consumers in certain applications. Take filesize: consuming a 10MB archive is very different from consuming a 500MB archive in both download and processing time, so it would be very helpful for end-consumers to have this figure flagged ahead of time. (One application that would benefit from this change is my Android app, Transito, which consumes the GTFS feeds listed in MDB; I would greatly appreciate being able to pass metadata such as the filesize along to my end-users.) Additionally, the calendar start/end range would help in understanding historically what the original calendar range was when the source was added or updated.

While this PR only adds the 3 new properties, follow-up PR(s) could address updating existing sources, and all new sources would have this metadata by default once this is applied.

In addition to the updated tests, if you just want to quickly see what the new format will look like for a sample, you can use for example: