Skip to content

Fixed the date formatting#47

Open
max-pfeiffer wants to merge 3 commits into
pelican-plugins:mainfrom
max-pfeiffer:bugfix/dateformat
Open

Fixed the date formatting#47
max-pfeiffer wants to merge 3 commits into
pelican-plugins:mainfrom
max-pfeiffer:bugfix/dateformat

Conversation

@max-pfeiffer

Copy link
Copy Markdown

The current implementation was producing an invalid timezone.

fixes #45

@justinmayer justinmayer requested a review from kernc November 23, 2025 15:29
@max-pfeiffer

Copy link
Copy Markdown
Author

@justinmayer @kernc I just fixed the linter errors.

What a found a bit strange in that project: there is a .pre-commit-config.yaml but pre-commit is not included in the projects dependencies. So I could not install the pre-commit handler with pre-commit install.

So I just had to manually run ruff to check on the issues locally which I found a bit strange.

So how are you supposed to work with pre-commit in that project?

@justinmayer

Copy link
Copy Markdown
Contributor

There are certain tools, like Pre-commit, that I install globally (e.g., via pipx install pre-commit), because I don’t need it installed dozens of times across every single virtual environment. For those that prefer to install these tools inside the respective virtual environments, there is an Invoke task provided for this purpose. Activate a virtual environment and then run: invoke tools

Comment on lines -45 to +42
tz = "-00:00"
return date.strftime("%Y-%m-%dT%H:%M:%S") + tz
def format_date(date_time):
"""Format the datetime in the expected format."""
if date_time.tzinfo is None:
date_time = date_time.replace(tzinfo=timezone.utc)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does replacing tzinfo do any time conversion, or is naive datetime always UTC and not localtime?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naive datetime has no time zone. The old implementation did set an invalid UTC timezone -00:00 when no time zone was present. I just fixed that invalid UTC timezone.

Please check docs on .replace(): https://docs.python.org/3/library/datetime.html#datetime.datetime.replace

When you change a time zone with replace, it just changes timezone nothing else.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. However this then still somewhat buggily assumes all timestamps are UTC. How about:

        local_tz = datetime.now().astimezone().tzinfo
        date_time = date_time.replace(tzinfo=local_tz)

?

@max-pfeiffer max-pfeiffer Nov 25, 2025

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this question should be answered elsewhere in the code. This is just a formatter function.

This function just takes a datetime as function argument. So where does this datetime come from? Why does it not have a time zone set?

I am just addressing the formatting issue here. If you think the datetime value is wrong, you probably want to open a new issue.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a formatter function that adds mandatory tzinfo the first time it is required. 🤷 I'm not sure what the datetime policy is Pelican-wide. I guess I can agree.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kernc Can we merge it?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinmayer Any change to get this merged?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both of you, @max-pfeiffer & @kernc, are correct: this function should only do formatting, and we probably should not blindly stamp UTC into the returned date_time. Also, the latter seems to contradict the former: we aren't only doing formatting if we are adding information that wasn't there before.

Perhaps one solution could be to add and use a small function to get the time zone? Something like:

def get_time_zone():
    """Get the local time zone, defaulting to UTC if unavailable."""
    time_zone = datetime.now().astimezone().tzinfo
    if time_zone is None:
        time_zone = timezone.utc

return time_zone

And use it in the format_date function via:

date_time = date_time.replace(tzinfo=get_time_zone())

What do you think?

@justinmayer justinmayer left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to both Max and @kernc for your patience. I made a few comments. Let me know what you think about them?

Comment on lines -45 to +42
tz = "-00:00"
return date.strftime("%Y-%m-%dT%H:%M:%S") + tz
def format_date(date_time):
"""Format the datetime in the expected format."""
if date_time.tzinfo is None:
date_time = date_time.replace(tzinfo=timezone.utc)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both of you, @max-pfeiffer & @kernc, are correct: this function should only do formatting, and we probably should not blindly stamp UTC into the returned date_time. Also, the latter seems to contradict the former: we aren't only doing formatting if we are adding information that wasn't there before.

Perhaps one solution could be to add and use a small function to get the time zone? Something like:

def get_time_zone():
    """Get the local time zone, defaulting to UTC if unavailable."""
    time_zone = datetime.now().astimezone().tzinfo
    if time_zone is None:
        time_zone = timezone.utc

return time_zone

And use it in the format_date function via:

date_time = date_time.replace(tzinfo=get_time_zone())

What do you think?


xml_root = ET.fromstring(contents)
localhost_element = xml_root[0]
lastmod = localhost_element[1].text

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These [0] & [1] indices could be fragile: if the sitemap structure changes, the test might break silently or with a confusing error. Perhaps an alternative approach might be to use something like xml_root.find('.//{appropriate-namespace-here}lastmod') in order to make the intent explicit and the test more resilient?

xml_root = ET.fromstring(contents)
localhost_element = xml_root[0]
lastmod = localhost_element[1].text
assert lastmod.endswith("+00:00")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than only checking the timezone suffix, perhaps we should validate that the overall string conforms to the expected format? For example, a regular expression such as \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2} that matches the entire expected string might catch format regressions beyond just the timezone portion. What do you think?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was addressing that particular formatting issue. My fix fixes this issue. I you want to do a bigger refactoring go ahead.

I have currently no more time I can spend on this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. Your thoughts on my comments, if you have any, would still be welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sitemap plugin produces sometimes invalid timestamps for <lastmod> XML tag

3 participants