Fixed the date formatting#47
Conversation
|
@justinmayer @kernc I just fixed the linter errors. What a found a bit strange in that project: there is a So I just had to manually run ruff to check on the issues locally which I found a bit strange. So how are you supposed to work with pre-commit in that project? |
|
There are certain tools, like Pre-commit, that I install globally (e.g., via |
| tz = "-00:00" | ||
| return date.strftime("%Y-%m-%dT%H:%M:%S") + tz | ||
| def format_date(date_time): | ||
| """Format the datetime in the expected format.""" | ||
| if date_time.tzinfo is None: | ||
| date_time = date_time.replace(tzinfo=timezone.utc) |
There was a problem hiding this comment.
Does replacing tzinfo do any time conversion, or is naive datetime always UTC and not localtime?
There was a problem hiding this comment.
Naive datetime has no time zone. The old implementation did set an invalid UTC timezone -00:00 when no time zone was present. I just fixed that invalid UTC timezone.
Please check docs on .replace(): https://docs.python.org/3/library/datetime.html#datetime.datetime.replace
When you change a time zone with replace, it just changes timezone nothing else.
There was a problem hiding this comment.
Understood. However this then still somewhat buggily assumes all timestamps are UTC. How about:
local_tz = datetime.now().astimezone().tzinfo
date_time = date_time.replace(tzinfo=local_tz)?
There was a problem hiding this comment.
Well, this question should be answered elsewhere in the code. This is just a formatter function.
This function just takes a datetime as function argument. So where does this datetime come from? Why does it not have a time zone set?
I am just addressing the formatting issue here. If you think the datetime value is wrong, you probably want to open a new issue.
There was a problem hiding this comment.
This is a formatter function that adds mandatory tzinfo the first time it is required. 🤷 I'm not sure what the datetime policy is Pelican-wide. I guess I can agree.
There was a problem hiding this comment.
I think both of you, @max-pfeiffer & @kernc, are correct: this function should only do formatting, and we probably should not blindly stamp UTC into the returned date_time. Also, the latter seems to contradict the former: we aren't only doing formatting if we are adding information that wasn't there before.
Perhaps one solution could be to add and use a small function to get the time zone? Something like:
def get_time_zone():
"""Get the local time zone, defaulting to UTC if unavailable."""
time_zone = datetime.now().astimezone().tzinfo
if time_zone is None:
time_zone = timezone.utc
return time_zoneAnd use it in the format_date function via:
date_time = date_time.replace(tzinfo=get_time_zone())What do you think?
justinmayer
left a comment
There was a problem hiding this comment.
Thanks to both Max and @kernc for your patience. I made a few comments. Let me know what you think about them?
| tz = "-00:00" | ||
| return date.strftime("%Y-%m-%dT%H:%M:%S") + tz | ||
| def format_date(date_time): | ||
| """Format the datetime in the expected format.""" | ||
| if date_time.tzinfo is None: | ||
| date_time = date_time.replace(tzinfo=timezone.utc) |
There was a problem hiding this comment.
I think both of you, @max-pfeiffer & @kernc, are correct: this function should only do formatting, and we probably should not blindly stamp UTC into the returned date_time. Also, the latter seems to contradict the former: we aren't only doing formatting if we are adding information that wasn't there before.
Perhaps one solution could be to add and use a small function to get the time zone? Something like:
def get_time_zone():
"""Get the local time zone, defaulting to UTC if unavailable."""
time_zone = datetime.now().astimezone().tzinfo
if time_zone is None:
time_zone = timezone.utc
return time_zoneAnd use it in the format_date function via:
date_time = date_time.replace(tzinfo=get_time_zone())What do you think?
|
|
||
| xml_root = ET.fromstring(contents) | ||
| localhost_element = xml_root[0] | ||
| lastmod = localhost_element[1].text |
There was a problem hiding this comment.
These [0] & [1] indices could be fragile: if the sitemap structure changes, the test might break silently or with a confusing error. Perhaps an alternative approach might be to use something like xml_root.find('.//{appropriate-namespace-here}lastmod') in order to make the intent explicit and the test more resilient?
| xml_root = ET.fromstring(contents) | ||
| localhost_element = xml_root[0] | ||
| lastmod = localhost_element[1].text | ||
| assert lastmod.endswith("+00:00") |
There was a problem hiding this comment.
Rather than only checking the timezone suffix, perhaps we should validate that the overall string conforms to the expected format? For example, a regular expression such as \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2} that matches the entire expected string might catch format regressions beyond just the timezone portion. What do you think?
There was a problem hiding this comment.
I was addressing that particular formatting issue. My fix fixes this issue. I you want to do a bigger refactoring go ahead.
I have currently no more time I can spend on this.
There was a problem hiding this comment.
I understand. Your thoughts on my comments, if you have any, would still be welcome.
The current implementation was producing an invalid timezone.
fixes #45