wklumpen / gtfs-lite
Lightweight GTFS Analysis
License: MIT License
Allow the user to load only certain portions of the feed for ease of analysis.
We will have to validate that required files are loaded, and issue a warning if they are not.
There's an errant print statement in the load_zip() function that needs removing.
The function trips_at_stops(stop_ids, date, start_time=datetime.time(0, 0), end_time=datetime.time(23, 59)) doesn't actually do anything with the time slices: providing different start_time and end_time values has no effect on the result.
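If the time filtering were implemented, it might look something like this minimal sketch (the helper name and the string-comparison approach are assumptions, not the library's actual code):

```python
import datetime

import pandas as pd

def filter_by_time_window(stop_times, start_time, end_time):
    # Hypothetical helper: keep only rows whose departure falls within
    # [start_time, end_time]. Assumes departure_time is a zero-padded
    # HH:MM:SS string as in the GTFS spec, so lexicographic comparison
    # matches chronological order (for times below 24:00:00).
    start = start_time.strftime("%H:%M:%S")
    end = end_time.strftime("%H:%M:%S")
    in_window = stop_times["departure_time"].between(start, end)
    return stop_times[in_window]

stop_times = pd.DataFrame({
    "trip_id": ["a", "b", "c"],
    "departure_time": ["06:15:00", "12:30:00", "23:45:00"],
})
morning = filter_by_time_window(
    stop_times, datetime.time(6, 0), datetime.time(9, 0)
)
```

With the default full-day window every row should survive, which would make a natural regression test for this bug.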
Right now there are a few older functions which rely on geometric/spatial analysis. In order to keep GTFS-lite as lightweight as possible, these should be removed, unless there is demand for them later via a feature or pull request.
The idea here is that if you want to make a specific route_id (or set of route_ids) disappear, you can do so programmatically.
In order to do some specific date-and-time analysis such as creating a frequency grid (See #21), we need to effectively have our stop times be date-aware instead of just straight up-and-down times.
To solve this problem we need to differentiate between a "date-aware" GTFS object (which would effectively only include a subset of the entire schedule feed, filtered via the calendar and calendar_dates dataframes) and a "date-naive" or basic GTFS object.
I'm imagining that we would have something like a set_date function which allows the user to pass the analysis date in question and which is persisted in some way throughout the analysis. We could go about this in a couple of different ways:
Option 1: We apply the filtering every time we do something. This means effectively creating datetime columns for the arrival_time and departure_time columns in the stop_times frame and updating those columns whenever the date changes. We can then go about the analyses assuming those columns are up to date.
Option 2: We could have a person create a "copy" of the GTFS object which holds a subset of the data. This would be useful if we are planning to have functions that work in both a global context and a subset context (i.e. total trips or total service hours).
My opinion is that we should go the route of option 1, creating extra columns attached to the stop_times frame that update whenever a new date is set.
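A minimal sketch of option 1 (the function name and the "_dt" column suffix are assumptions, not the library's actual API):

```python
import datetime

import pandas as pd

def set_date(stop_times, date):
    # Attach full datetime columns to stop_times for the given analysis
    # date. GTFS times can exceed 24:00:00, so parse them as timedeltas
    # and add them to midnight of the analysis date; times past midnight
    # then correctly roll over to the next calendar day.
    midnight = pd.Timestamp(date)
    for col in ("arrival_time", "departure_time"):
        stop_times[col + "_dt"] = midnight + pd.to_timedelta(stop_times[col])
    return stop_times

stop_times = pd.DataFrame({
    "trip_id": ["a", "a"],
    "arrival_time": ["23:55:00", "24:10:00"],
    "departure_time": ["23:55:00", "24:10:00"],
})
set_date(stop_times, datetime.date(2023, 5, 15))
```

Calling set_date again with a new date would simply overwrite the derived columns, which is the "update whenever the date changes" behaviour described above.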
We need feeds that are useful for writing tests. We will want to generate a few GTFS feeds from scratch, or adapt simple agency feeds to suit our needs. Specifically, we need to test the basics, but also frequencies.txt.
The load_zip method throws a UnicodeDecodeError when loading certain GTFS feed packages, like this one:
f-9q9-bart.zip
The suggested fix is to allow the user to specify an encoding on load, otherwise defaulting to UTF-8. We may also want to consider adding an option for the user to specify a behaviour on decoding errors.
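A rough sketch of what that might look like (the function and parameter names here are assumptions, not gtfs-lite's actual API):

```python
import io
import zipfile

import pandas as pd

def load_feed_tables(filepath, encoding="utf-8", encoding_errors="strict"):
    # Decode each file with a caller-chosen encoding, defaulting to
    # UTF-8 with strict error handling, then hand the text to pandas.
    tables = {}
    with zipfile.ZipFile(filepath) as zip_file:
        for name in zip_file.namelist():
            if not name.endswith(".txt"):
                continue
            text = zip_file.read(name).decode(encoding, errors=encoding_errors)
            tables[name] = pd.read_csv(io.StringIO(text), skipinitialspace=True)
    return tables

# A toy feed with a non-UTF-8 (Latin-1) file:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("agency.txt", "agency_name\nCafé Métro\n".encode("latin-1"))
agencies = load_feed_tables(buf, encoding="latin-1")
```

With the defaults, the same archive raises UnicodeDecodeError, matching the reported behaviour; passing encoding_errors="replace" would be the lenient alternative.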
Often providers will throw all kinds of weird columns into their feeds. While it might be useful to have these in some instances, it can also lead to various loading warnings (mixed types) and slow the process down.
The proposed feature is to provide an option on load_zip() called enforce_spec (defaulting to False) which only loads columns specified in the official GTFS specification, using the usecols option from Pandas.
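The usecols mechanism can be shown in a few lines (the column set here is a small illustrative subset of the real trips.txt spec, not the full list):

```python
import io

import pandas as pd

# A subset of the spec-defined columns for trips.txt, for illustration.
TRIPS_SPEC_COLUMNS = {
    "route_id", "service_id", "trip_id", "trip_headsign",
    "direction_id", "block_id", "shape_id",
}

# A feed with an agency-specific extra column:
csv = "route_id,trip_id,service_id,agency_custom_field\nR1,T1,S1,whatever\n"

# The callable form of usecols drops any column not in the spec set,
# so the extra column is never parsed at all.
trips = pd.read_csv(io.StringIO(csv), usecols=lambda c: c in TRIPS_SPEC_COLUMNS)
```

Because unwanted columns are skipped at parse time rather than dropped afterwards, this also avoids the mixed-type warnings mentioned above.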
Hi! so I'm trying to read in a GTFS.zip as follows:
gtfs = GTFS.load_zip("SRTA GTFS-2020-06-29.zip")
but I'm getting the following error
File "/home/ja/miniconda3/envs/TC/lib/python3.8/site-packages/gtfslite/gtfs.py", line 101, in load_zip
trips = pd.read_csv(
File "/home/ja/miniconda3/envs/TC/lib/python3.8/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/ja/miniconda3/envs/TC/lib/python3.8/site-packages/pandas/io/parsers.py", line 454, in _read
data = parser.read(nrows)
File "/home/ja/miniconda3/envs/TC/lib/python3.8/site-packages/pandas/io/parsers.py", line 1133, in read
ret = self._engine.read(nrows)
File "/home/ja/miniconda3/envs/TC/lib/python3.8/site-packages/pandas/io/parsers.py", line 2037, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 952, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 1084, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas/_libs/parsers.pyx", line 1115, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas/_libs/parsers.pyx", line 1208, in pandas._libs.parsers.TextReader._convert_with_dtype
ValueError: Integer column has NA values in column 5
I think this pertains to the dtype specification 'direction_id': 'Int64'
Maybe the solution is to use float instead of int? https://stackoverflow.com/questions/21287624/convert-pandas-column-containing-nans-to-dtype-int
cheers,
When reading in a GTFS feed that uses the optional frequencies.txt file, there seems to be a date-parsing error:
>>> import gtfslite.gtfs
>>> test = gtfslite.gtfs.GTFS.load_zip('data/20230504_070233_Euskadi_Bizkaibus.zip')
C:\Users\carlh\miniconda3\lib\site-packages\gtfslite\gtfs.py:348: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
frequencies = pd.read_csv(
C:\Users\carlh\miniconda3\lib\site-packages\gtfslite\gtfs.py:348: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
frequencies = pd.read_csv(
The data still reads in more or less fine, although it plugs in today's date for the start_time field:
>>> test.frequencies
trip_id start_time end_time headway_secs exact_times
0 trp_A0651_737_OP43FIN 2023-05-15 08:35:00 22:35:00 3600 1
1 trp_A0651_737_OP43LIN 2023-05-15 07:35:00 22:05:00 1800 1
2 trp_A0651_737_OP43SIN 2023-05-15 07:35:00 22:35:00 3600 1
3 trp_A0651_738_OP43FIN 2023-05-15 07:35:00 21:35:00 3600 1
4 trp_A0651_738_OP43SIN 2023-05-15 06:35:00 21:35:00 3600 1
.. ... ... ... ... ...
532 trp_A3932_1053_OP41VIN 2023-05-15 05:30:00 22:30:00 3600 1
533 trp_A3932_1054_OP41FIN 2023-05-15 07:00:00 23:00:00 3600 1
534 trp_A3932_1054_OP41LJIN 2023-05-15 07:00:00 23:00:00 3600 1
535 trp_A3932_1054_OP41SIN 2023-05-15 07:00:00 23:00:00 3600 1
536 trp_A3932_1054_OP41VIN 2023-05-15 07:00:00 23:00:00 3600 1
[537 rows x 5 columns]
Presumably because of the error, end_time remains a string object:
>>> test.frequencies.dtypes
trip_id object
start_time datetime64[ns]
end_time object
headway_secs int32
exact_times int32
dtype: object
But things still seem to work, so it doesn't necessarily impact analysis:
>>> test.frequencies.loc[test.frequencies.end_time > '23:59:00']
trip_id start_time end_time headway_secs exact_times
119 trp_A3136_668_OP40LJIN2 2023-05-15 21:30:00 24:30:00 1800 1
120 trp_A3136_668_OP40SIN 2023-05-15 21:30:00 24:30:00 1800 1
121 trp_A3136_668_OP40VIN2 2023-05-15 21:30:00 24:30:00 1800 1
324 trp_A3516_872_OP42SPPV 2023-05-15 09:30:00 25:30:00 3600 1
I believe this may relate to times in GTFS feeds (HH:MM:SS) potentially going over 24 hours, with the result that times of 24:00:00 and later are not valid datetime objects.
Currently, times are explicitly parsed as dates when frequencies.txt is read in:
Lines 343 to 354 in 0bbdd2e
However, I think if they were treated as timedeltas this error would be avoided:
frequencies = self.read_clean_feed(
zip_file.open(filepaths["frequencies.txt"]),
dtype={
"trip_id": str,
"start_time": str,
"end_time": str,
"headway_secs": int,
"exact_times": int,
},
skipinitialspace=True,
)
frequencies["start_time"] = pd.to_timedelta(frequencies["start_time"])
frequencies["end_time"] = pd.to_timedelta(frequencies["end_time"])
Using this code means the error isn't raised, both start_time and end_time are parsed consistently, and things still work, without the awkward guesswork of plugging in today's date (which most often wouldn't be the right day, technically); relative time seems more appropriate:
>>> import gtfslite.gtfs
>>> test = gtfslite.gtfs.GTFS.load_zip('data/20230504_070233_Euskadi_Bizkaibus.zip')
>>> test.frequencies
trip_id start_time end_time headway_secs exact_times
0 trp_A0651_737_OP43FIN 0 days 08:35:00 0 days 22:35:00 3600 1
1 trp_A0651_737_OP43LIN 0 days 07:35:00 0 days 22:05:00 1800 1
2 trp_A0651_737_OP43SIN 0 days 07:35:00 0 days 22:35:00 3600 1
3 trp_A0651_738_OP43FIN 0 days 07:35:00 0 days 21:35:00 3600 1
4 trp_A0651_738_OP43SIN 0 days 06:35:00 0 days 21:35:00 3600 1
.. ... ... ... ... ...
532 trp_A3932_1053_OP41VIN 0 days 05:30:00 0 days 22:30:00 3600 1
533 trp_A3932_1054_OP41FIN 0 days 07:00:00 0 days 23:00:00 3600 1
534 trp_A3932_1054_OP41LJIN 0 days 07:00:00 0 days 23:00:00 3600 1
535 trp_A3932_1054_OP41SIN 0 days 07:00:00 0 days 23:00:00 3600 1
536 trp_A3932_1054_OP41VIN 0 days 07:00:00 0 days 23:00:00 3600 1
[537 rows x 5 columns]
>>> test.frequencies.loc[test.frequencies.end_time > '23:59:00']
trip_id start_time end_time headway_secs exact_times
119 trp_A3136_668_OP40LJIN2 0 days 21:30:00 1 days 00:30:00 1800 1
120 trp_A3136_668_OP40SIN 0 days 21:30:00 1 days 00:30:00 1800 1
121 trp_A3136_668_OP40VIN2 0 days 21:30:00 1 days 00:30:00 1800 1
324 trp_A3516_872_OP42SPPV 0 days 09:30:00 1 days 01:30:00 3600 1
>>> test.frequencies.dtypes
trip_id object
start_time timedelta64[ns]
end_time timedelta64[ns]
headway_secs int32
exact_times int32
dtype: object
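The premise that timedeltas handle over-24-hour GTFS times gracefully can be checked directly:

```python
import pandas as pd

# GTFS times past midnight (24:00:00 and later) are valid durations,
# so pd.to_timedelta parses them and orders them correctly, where
# pd.to_datetime would fail or fall back to guessing.
late = pd.to_timedelta(["23:59:00", "24:30:00", "25:30:00"])
```

The comparison in the session above (end_time > '23:59:00') keeps working because pandas coerces the string to a timedelta before comparing.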
As per https://github.com/global-healthy-liveable-cities/global-indicators/issues/338, when loading a feed with frequencies.txt, if the optional field exact_times has not been completed, this results in a ValueError exception, like:
home/ghsci/process/data/transit_feeds/test_gtfs/20230329_130123_Metro_Sevilla
Traceback (most recent call last):
File "/home/ghsci/process/subprocesses/_10_gtfs_analysis.py", line 291, in <module>
main()
File "/home/ghsci/process/subprocesses/_10_gtfs_analysis.py", line 287, in main
gtfs_analysis(codename)
File "/home/ghsci/process/subprocesses/_10_gtfs_analysis.py", line 78, in gtfs_analysis
loaded_feeds = gtfslite.GTFS.load_zip(f'{gtfsfeed_path}.zip')
File "/env/lib/python3.10/site-packages/gtfslite/gtfs.py", line 348, in load_zip
frequencies = pd.read_csv(
File "/env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
return _read(filepath_or_buffer, kwds)
File "/env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 583, in _read
return parser.read(nrows)
File "/env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1704, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/env/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas/_libs/parsers.pyx", line 812, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas/_libs/parsers.pyx", line 889, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 1034, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas/_libs/parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas/_libs/parsers.pyx", line 1192, in pandas._libs.parsers.TextReader._convert_with_dtype
ValueError: Integer column has NA values in column 4
The reason is that, in gtfs.py, this field is specified to be interpreted as int:
Line 371 in e94cfa5
but according to the spec, exact_times is optional, so it should be "Int64", like:
"exact_times": "Int64",
I see that this dtype "Int64" has been used elsewhere in gtfs.py when reading in other files, so I have tested out this change on my fork of gtfslite and confirmed it resolves the issue when parsing this file.
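For reference, the difference between the two dtypes can be demonstrated standalone (this is a plain pandas illustration, not the gtfs.py code itself):

```python
import io

import pandas as pd

# frequencies.txt with a missing optional exact_times value:
csv = "trip_id,exact_times\nt1,1\nt2,\n"

# Nullable "Int64" accepts the missing value as pd.NA...
freq = pd.read_csv(io.StringIO(csv), dtype={"exact_times": "Int64"})

# ...while plain int raises "ValueError: Integer column has NA values".
try:
    pd.read_csv(io.StringIO(csv), dtype={"exact_times": int})
    raised = False
except ValueError:
    raised = True
```

Using "Int64" also avoids the float workaround mentioned in the earlier direction_id report, keeping the column genuinely integer-typed.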
In case it's useful, I'll lodge a pull request with this change to address this issue in a tic!
Seems like there is something stopping the official Swedish GTFS files from working with 0.2.0.
It works fine with 0.1.8 (and pandas 1.5.3), but after upgrading to gtfs-lite 0.2.0 no trips are found anymore.
E.g. the functions date_trips() and trips_at_stops() return empty datasets with 0.2.0, while with 0.1.8 they return correct information.
Example gtfs file: http://olal.se/gtfs/otraf.zip
Hello,
I ran into a problem when trying to load a GTFS dataset using the GTFS.load_zip() function. The GTFS dataset I'm using is from http://gtfs.ovapi.nl. Could you please check how this UnicodeDecodeError happens? Many thanks!
Motivated by the need to do some sanity checks on GTFS feeds in this r5py issue, add a feature which constructs a matrix of average headways in user-specified chunks of time throughout a given service day.
This is not an issue with gtfs-lite per se, but rather with a particular GTFS file I tried to load, resulting in the following error:
/home/ghsci/work/process/data/transit_feeds/bilbao_gtfs/20230509_010334_RENFE_AVLD
Traceback (most recent call last):
File "/env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3652, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'end_date'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ghsci/work/process/subprocesses/_10_gtfs_analysis.py", line 266, in <module>
main()
File "/home/ghsci/work/process/subprocesses/_10_gtfs_analysis.py", line 81, in main
loaded_feeds = gtfslite.GTFS.load_zip(f'{gtfsfeed_path}.zip')
File "/env/lib/python3.10/site-packages/gtfslite/gtfs.py", line 280, in load_zip
calendar["end_date"] = pd.to_datetime(calendar["end_date"]).dt.date
File "/env/lib/python3.10/site-packages/pandas/core/frame.py", line 3760, in __getitem__
indexer = self.columns.get_loc(key)
File "/env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3654, in get_loc
raise KeyError(key) from err
KeyError: 'end_date'
The apparent cause of this is that this calendar.txt file contains spaces before the newline character in the header as well as the data rows (visible in a text editor showing all characters).
I don't believe the data rows are the issue, as read_csv uses skipinitialspace=True (and I confirmed this resolves the spaces-after-dates issue).
Line 274 in 0bbdd2e
However, the last column in this file ends up with spaces included in its name, such that it can't be matched as simply 'end_date' when the file is read into pandas.
One possibility, if you did want to handle these kinds of inconsistencies, would be to call str.strip() on the columns after loading each dataframe, as per https://stackoverflow.com/a/36082588/4636357, e.g.
calendar.columns = calendar.columns.str.strip()
I confirmed that the above code resolves the issue in this case:
without this addition:
>>> import gtfslite.gtfs
>>> test = gtfslite.gtfs.GTFS.load_zip('data/20230509_010334_RENFE_AVLD.zip')
Traceback (most recent call last):
File "C:\Users\carlh\miniconda3\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'end_date'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\gtfs-lite\gtfslite\gtfs.py", line 277, in load_zip
calendar["start_date"] = pd.to_datetime(calendar["start_date"]).dt.date
File "C:\Users\carlh\miniconda3\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\carlh\miniconda3\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 'end_date'
with the addition:
>>> import gtfslite.gtfs
>>> test = gtfslite.gtfs.GTFS.load_zip('data/20230509_010334_RENFE_AVLD.zip')
>>> test
<gtfslite.gtfs.GTFS object at 0x0000024EFD75FE50>
>>> test.calendar
service_id monday tuesday wednesday ... saturday sunday start_date end_date
0 2023-05-082023-06-09001651 True True True ... True True 2023-05-08 1970-01-01
1 2023-05-082023-06-09001653 True True True ... True True 2023-05-08 1970-01-01
2 2023-05-082023-06-30001901 True True True ... True True 2023-05-08 1970-01-01
3 2023-05-082023-06-30001902 True True True ... True True 2023-05-08 1970-01-01
4 2023-05-082023-06-30001931 True True True ... True True 2023-05-08 1970-01-01
... ... ... ... ... ... ... ... ... ...
3425 2023-05-082023-05-28389071 True True True ... True True 2023-05-08 1970-01-01
3426 2023-05-082023-05-28389081 True True True ... True True 2023-05-08 1970-01-01
3427 2023-05-082023-05-28389091 True True True ... True True 2023-05-08 1970-01-01
3428 2023-05-082023-12-09941841 True True True ... True True 2023-05-08 1970-01-01
3429 2023-05-082023-12-09942751 True True True ... True True 2023-05-08 1970-01-01
[3430 rows x 10 columns]
If you were to implement this, to help robustness in loading GTFS feeds with slight validity issues, it would probably be a good idea to do this for all loaded frames.
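A generic version of that idea might look like the following sketch (read_clean is a hypothetical wrapper, not an existing gtfs-lite function):

```python
import io

import pandas as pd

def read_clean(buffer, **kwargs):
    # Read a GTFS text file and strip stray whitespace from the column
    # names, so headers like "end_date " match the spec name "end_date".
    df = pd.read_csv(buffer, skipinitialspace=True, **kwargs)
    df.columns = df.columns.str.strip()
    return df

# A header with a trailing space, as in the problem feed:
csv = "service_id,start_date,end_date \nS1,20230508,20231209\n"
calendar = read_clean(io.StringIO(csv))
```

Routing every table read through one such wrapper would apply the fix to all loaded frames at once, as suggested above.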
I'm not sure if it's of interest for you to implement this feature, as it's a problem with some GTFS files, not the software. However, I suspect other GTFS readers must do something similar, as a colleague was able to read this GTFS feed using urbanaccess, as per this thread https://github.com/global-healthy-liveable-cities/global-indicators/issues/275 (focused on a different issue, which I'm scoping whether usage of GTFS-Lite can resolve).
In case it helps, I'll look into drafting a pull request implementing this change.
We compute transit service intensity as the number of unique trips that stop at stops within a specified area (usually a buffer around a block group).
This is done by finding all stops within a certain zone, and then finding all trips, and then counting the number of unique trips visiting that zone within a 24-hour period (or within the GTFS service schedule).
For GTFS-lite, we simply need to verify the function works as intended.
Right now, unique trips (and service hours, but that's another issue) don't account for the "frequencies.txt" trip definitions.
To do this, we will have to handle trip_ids that appear in frequencies.txt separately, as follows:
1. Use stop_times to get all trips with stops in the zone.
2. Check whether those trips appear in the frequencies dataset; if so, infer the total number of trips.
I encountered a few GTFS feeds with a __MACOSX folder inside the feed. For example:
zipfile.ZipFile(orgzipfile, 'r').namelist()
['GTFS Import/',
'__MACOSX/._GTFS Import',
'GTFS Import/agency.txt',
'__MACOSX/GTFS Import/._agency.txt',
'GTFS Import/calendar_dates.txt',
'__MACOSX/GTFS Import/._calendar_dates.txt',
'GTFS Import/stop_times.txt',
'__MACOSX/GTFS Import/._stop_times.txt',
'GTFS Import/shapes.txt',
'__MACOSX/GTFS Import/._shapes.txt',
'GTFS Import/trips.txt',
'__MACOSX/GTFS Import/._trips.txt',
'GTFS Import/stops.txt',
'__MACOSX/GTFS Import/._stops.txt',
'GTFS Import/calendar.txt',
'__MACOSX/GTFS Import/._calendar.txt',
'GTFS Import/routes.txt',
'__MACOSX/GTFS Import/._routes.txt']
gtfs-lite fails to read such feed indicating a Unicode error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 37: invalid continuation byte
A possible solution could be a slight change when reading the nested files in
Line 188 in 52516ff
by adding an exclusion condition:
if req in file and not str(file).startswith('__MACOSX/'):
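As a self-contained illustration of that exclusion (gtfs_members is a hypothetical helper, not the library's code):

```python
import io
import zipfile

def gtfs_members(zip_file):
    # Ignore macOS resource-fork entries (__MACOSX/...) when scanning
    # the archive for GTFS text files.
    return [
        name for name in zip_file.namelist()
        if name.endswith(".txt") and not name.startswith("__MACOSX/")
    ]

# Reproduce the structure from the namelist above in miniature:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("GTFS Import/agency.txt", "agency_name\nTest\n")
    z.writestr("__MACOSX/GTFS Import/._agency.txt", "junk")
members = gtfs_members(zipfile.ZipFile(buf))
```

The resource-fork files are what trigger the UnicodeDecodeError, since their contents are binary AppleDouble data rather than UTF-8 CSV text.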
There are feeds where times in between major stops are not specified (as they are not required to be specified). In its current form, unique_trip_count_at_stops() fails to count these, and they pose a larger problem: when filtering by a time span, where do you count them?
We would have to add a function to "interpolate_all_trips" in some manner.
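One possible shape for that interpolation (a sketch only; it assumes linear interpolation in row order, not distance-weighted, and the column handling is illustrative):

```python
import pandas as pd

# A trip with an unspecified intermediate arrival time:
stop_times = pd.DataFrame({
    "trip_id": ["t1"] * 3,
    "stop_sequence": [1, 2, 3],
    "arrival_time": ["08:00:00", None, "08:20:00"],
})

# Convert to seconds so the gap can be filled numerically, then
# interpolate within each trip and convert back to timedeltas.
secs = pd.to_timedelta(stop_times["arrival_time"]).dt.total_seconds()
filled = secs.groupby(stop_times["trip_id"]).transform(
    lambda s: s.interpolate()
)
stop_times["arrival_time"] = pd.to_timedelta(filled, unit="s")
```

A real implementation would probably weight by shape_dist_traveled where available; this row-order version is just the simplest defensible default.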
This will allow older (or stale) GTFS feeds that still operate in the same way to be used for present-day or later analyses.
The route_frequency_matrix function returns an error when analyzing a GTFS file. The error seems to be in the concatenation process:
1132 # Assemble final matrix
-> 1133 mx = pd.concat(slices, axis="index")
1134 mx = mx.fillna(0)
1135 return mx.reset_index(drop=True)
ValueError: No objects to concatenate
If a GTFS feed has an empty text file for transfers.txt, then the load_zip() method results in an error, as in the below example:
/home/ghsci/process/data/transit_feeds/Marc issues/Malaga/20230519_130136_Metro_Malaga
Traceback (most recent call last):
File "/home/ghsci/process/subprocesses/_10_gtfs_analysis.py", line 311, in <module>
main()
File "/home/ghsci/process/subprocesses/_10_gtfs_analysis.py", line 307, in main
gtfs_analysis(codename)
File "/home/ghsci/process/subprocesses/_10_gtfs_analysis.py", line 92, in gtfs_analysis
loaded_feeds = gtfslite.GTFS.load_zip(f'{gtfsfeed_path}.zip')
File "/env/lib/python3.10/site-packages/gtfslite/gtfs.py", line 364, in load_zip
transfers = pd.read_csv(
File "/env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
return _read(filepath_or_buffer, kwds)
File "/env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 577, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
self._engine = self._make_engine(f, self.engine)
File "/env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1679, in _make_engine
return mapping[engine](f, **self.options)
File "/env/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 555, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Transfers.txt is optional, so it's possible that, rather than the file being absent, some agencies may leave it empty.
A possible solution is presented here: https://stackoverflow.com/a/42143354
Following that approach, perhaps a try/except clause in load_clean_feeds() could catch the EmptyDataError and return None in that case, perhaps gated on an argument optional=True (defaulting to False). That would seem consistent with the current load-if-present-else-None approach for optional GTFS feed files.
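A sketch of that idea (the optional argument is the proposal from this issue, not an existing parameter, and the function body is illustrative):

```python
import io

import pandas as pd

def load_clean_feed(buffer, optional=False, **kwargs):
    # Treat an empty optional file the same as an absent one.
    try:
        return pd.read_csv(buffer, skipinitialspace=True, **kwargs)
    except pd.errors.EmptyDataError:
        if optional:
            return None
        raise

# An empty transfers.txt no longer crashes the load:
transfers = load_clean_feed(io.StringIO(""), optional=True)
```

For required files (optional=False), the EmptyDataError still propagates, which keeps genuinely broken feeds loud.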
Update the namespace so that we can just have
from gtfslite import GTFS
Instead of
from gtfslite.gtfs import GTFS
When dates are checked for validity, both calendar and calendar_dates are checked separately.
Lines 539 to 544 in 52516ff
These need to be combined to take the minimum of both minimums and the maximum of both maximums.
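The combination itself is straightforward (the frames here are toy examples; column names follow the GTFS spec):

```python
import datetime

import pandas as pd

calendar = pd.DataFrame({
    "start_date": [datetime.date(2023, 1, 1)],
    "end_date": [datetime.date(2023, 6, 30)],
})
calendar_dates = pd.DataFrame({
    "date": [datetime.date(2022, 12, 25), datetime.date(2023, 7, 4)],
})

# Earliest of both minimums, latest of both maximums:
first = min(calendar["start_date"].min(), calendar_dates["date"].min())
last = max(calendar["end_date"].max(), calendar_dates["date"].max())
```

A feed whose only service on a date comes from a calendar_dates exception would then validate correctly, which is the failure mode the current separate checks miss.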
As a complement to implementing #21, add a start and end time option for the calculation of the route summary.
Following the advice of r5py/r5py#222
When attempting to load a zipped GTFS feed containing the file attributions.txt, a KeyError was raised, suggesting that this file wasn't found in the archive.
I'll paste this below -- I'm trialling using the GTFS-Lite library in an existing workflow, so there's a bit of extra stuff here:
/home/ghsci/work/process/data/transit_feeds/bilbao_gtfs/20230505_130305_Euskadi_Euskotren
Traceback (most recent call last):
File "/home/ghsci/work/process/subprocesses/_10_gtfs_analysis.py", line 266, in <module>
main()
File "/home/ghsci/work/process/subprocesses/_10_gtfs_analysis.py", line 81, in main
loaded_feeds = gtfslite.GTFS.load_zip(f'{gtfsfeed_path}.zip')
File "/env/lib/python3.10/site-packages/gtfslite/gtfs.py", line 460, in load_zip
zip_file.open("attributions.txt"),
File "/env/lib/python3.10/zipfile.py", line 1514, in open
zinfo = self.getinfo(name)
File "/env/lib/python3.10/zipfile.py", line 1441, in getinfo
raise KeyError(
KeyError: "There is no item named 'attributions.txt' in the archive"
I had a quick look at the gtfs-lite/gtfs.py file and I suspect what is happening is that the check for attributions.txt passes (the file was correctly identified as being present), but the call made to load the file from the zipped directory does not use the filepaths dictionary:
Line 455 in 0bbdd2e
elsewhere, the files are loaded using the filepaths dictionary, for example:
Line 434 in 0bbdd2e
I'll see if I can have a go at making this change and if it makes a difference, but this looks to be the source of the error. I believe I didn't notice this until now as the other feeds I am using happen to not have this file.
Thanks for your work on this package, it looks useful! Let me know if you need any more information.
Looks like a date is being compared with a method. Need to go back and verify this function is working as intended.
File ".../gtfs.py", line 569, in valid_date
if first_date > date_to_check or last_date < date_to_check:
TypeError: '>' not supported between instances of 'datetime.date' and 'builtin_function_or_method'
Need to follow in the footsteps of r5py and company and write a contribution guide.
Once loaded it's possible to manipulate and adjust the feeds in any number of ways. Those feeds, once changed, should be writeable back out to a standard GTFS package.