Generate Sitemap Automatically with Git Hooks Using Python

Published: May 15, 2024


Background

If you build websites, you know how cumbersome it is to regenerate the sitemap every time you update the site or add a new page.

This is done so that the sitemap can be uploaded to Google Search Console, which lets your new or updated pages be indexed or refreshed correctly.

The process goes like this:

  1. Update site
  2. Update sitemap.xml
  3. Push changes & Deploy site
  4. Upload the new sitemap to Google Search Console

In this blog post, I'd like to propose a way to automate step 2 (sitemap generation) using Git hooks and Python, so that whenever we commit changes with Git, the sitemap is refreshed automatically.

We will use a Git pre-commit hook to achieve this. Once it is set up, there is no need to update sitemap.xml by hand, because it is regenerated on every commit.

For your information, I'm using SvelteKit, so if you use the same framework, the code shared below can be used as is. With a different framework it will probably work with little or no change, but I can't guarantee it.

Method

1. Find the .git folder

I assume you already have Git installed on your machine, but if you don't, don't worry. Use the instructions here to install Git.

Once you have it installed, go to the folder where you are developing your website. If you haven't initialized a Git repo already, start one using the git init command in your terminal. Make sure that you are in this folder in the terminal when initializing.

Once Git is initialized, you will have a folder named .git. If you can't see it, don't worry: files and folders that start with a period are usually hidden in the file explorer. On a MacBook you can use Command + Shift + . to see the folder. The image below shows the contents of this folder:

Git Folder Contents
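If you prefer to check from code rather than the file explorer, here is a minimal sketch that looks for the .git folder starting from a given folder and walking upwards. The function name find_git_dir is my own, and the sketch assumes .git is a regular folder (in worktrees and submodules it can be a file, which this does not handle).

```python
from pathlib import Path
from typing import Optional


def find_git_dir(start: Path) -> Optional[Path]:
    """Return the nearest .git folder at or above `start`, or None."""
    start = start.resolve()
    # Check the starting folder first, then each parent up to the filesystem root
    for folder in [start, *start.parents]:
        candidate = folder / ".git"
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    git_dir = find_git_dir(Path.cwd())
    print(git_dir if git_dir else "No .git folder found; run `git init` first.")
```

Running it from anywhere inside your project should print the path of the .git folder, even when that folder is hidden in the file explorer.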

2. Find the hooks folder in .git folder

Once you can access the .git folder, you will see a subfolder named hooks. This is where Git hooks are saved.

Git hooks are scripts that run automatically every time a particular event occurs in a Git repository. They let you customize Git's internal behavior and trigger customizable actions at key points in the development life cycle.

Initially, there are no active hooks; the folder contains only sample hooks:

Git Hooks Subfolder

Each file in this subfolder corresponds to a particular Git event. For example, the actions defined in pre-commit run right before a commit is made. This is exactly when we want to create our sitemap.
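To see which hooks are active and which are still samples, a small sketch like the one below can help. The function name list_hooks is hypothetical, and it only relies on the naming rule described above: files ending in .sample are ignored by Git, everything else is treated as a hook.

```python
from pathlib import Path
from typing import List, Tuple


def list_hooks(hooks_dir: Path) -> Tuple[List[str], List[str]]:
    """Split the files in a hooks folder into (active, samples)."""
    active, samples = [], []
    for entry in sorted(hooks_dir.iterdir()):
        if not entry.is_file():
            continue
        # Git ignores files with the .sample suffix
        if entry.name.endswith(".sample"):
            samples.append(entry.name)
        else:
            active.append(entry.name)
    return active, samples


if __name__ == "__main__":
    hooks = Path(".git/hooks")
    if hooks.is_dir():
        active, samples = list_hooks(hooks)
        print("Active hooks:", active)
        print("Sample hooks:", samples)
```

Note that a file listed as active here still has to be executable before Git will run it.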

3. Code in pre-commit file

First, remove the .sample suffix from the .git/hooks/pre-commit.sample file and replace its contents with the code below:

#!/usr/local/bin/python3

# The line above tells Git to run this hook with Python
# Your Python executable might be in another location such as /usr/bin/python
# Please adjust accordingly

# Import necessary libraries
import os
from bs4 import BeautifulSoup
from datetime import datetime

# Build the website first to identify links
# A non-zero exit code aborts the commit if the build fails
if os.system("npm run build") != 0:
    raise SystemExit("Build failed, sitemap was not generated")

# Define variables
# Sitemap URLs must be absolute, so the root includes the protocol
root = "https://www.danyelkoca.com"
today = datetime.today().strftime('%Y-%m-%d')

# Create a list that will keep all links in the website
links = []

# Get all links in your pages using BeautifulSoup
# Exclude http:// and https:// links, as they point to other sites
for path, subdirs, files in os.walk("path/to/your/build/folder"):
    for name in files:
        if name.endswith(".html"):
            with open(os.path.join(path, name), encoding="utf-8") as fp:
                soup = BeautifulSoup(fp, 'html.parser')
                # link.get() returns None for <a> tags that have no href
                hrefs = [link.get("href") for link in soup.find_all("a")]
                links += [href for href in hrefs if href and not href.startswith("http")]

# Remove links that are repeated
links = list(set(links))

# Add one more item for the root (www.danyelkoca.com)
links += [""]

# Sort based on length (shorter links are usually higher in the hierarchy)
# E.g. /en
# /en/works
# /en/works/machine
links.sort(key=len)

# Initialize xml with the links
xml = ""
for link in links:
    xml += f"   <url><loc>{root}{link}</loc><lastmod>{today}</lastmod></url>\n"

# Add parent elements
xml = f"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
{xml}</urlset>"""

# Save to your static folder
with open("path/to/your/static/sitemap.xml", "w", encoding="utf-8") as file:
    file.write(xml)
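The hook above builds the XML by joining strings, which works as long as no link contains characters such as & that need escaping in XML. As a hedged alternative (not what I use in the hook itself), the sketch below builds the same structure with Python's built-in xml.etree.ElementTree, which escapes such characters automatically. The function name build_sitemap and the example.com URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(root: str, links: list, lastmod: str) -> str:
    """Return sitemap XML with one <url> entry per link."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for link in links:
        url = ET.SubElement(urlset, "url")
        # Text content is escaped on serialization (e.g. & becomes &amp;)
        ET.SubElement(url, "loc").text = f"{root}{link}"
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical root and links, only to show the call
    print(build_sitemap("https://example.com", ["", "/en", "/en/works"], "2024-05-15"))
```

The output is a single line rather than indented XML, which is fine for search engines; the structure is the same as in the string-based version.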

Note: if you get a Permission Denied error when committing, it means Git doesn't have permission to run the hook. Run the commands below to grant it:

chmod +x .git/hooks/pre-commit
chmod +x /usr/local/bin/python3

You are basically giving execute permission to your Git hook file and to your Python executable. If your Python executable is somewhere else, change the path accordingly.
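For reference, here is roughly what chmod +x does, sketched in Python. The function name make_executable is my own, and this version always adds the execute bit for user, group and others, whereas the real chmod +x also takes your umask into account.

```python
import stat
from pathlib import Path


def make_executable(path: Path) -> None:
    """Add the execute bit for user, group and others, keeping other bits as is."""
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    hook = Path(".git/hooks/pre-commit")
    if hook.is_file():
        make_executable(hook)
        print(f"{hook} is now executable")
```

The existing read and write permissions are left untouched; only the execute bits are switched on.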

Result

Once you stage the changes with git add . and commit your changes with git commit -m "message", the code that is defined above in .git/hooks/pre-commit will run and as a result, will save sitemap.xml in your static folder.

The code works and, as expected, creates a file like the one below (shown prettified):

Resulting XML file

Looks neat, right?

Once sitemap.xml is generated, don't forget to stage and commit again: the hook writes the file but does not stage it, so the new sitemap is not part of the commit that triggered it.
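As an alternative to committing a second time, the hook could stage the file itself as its last step. This is a sketch under assumptions, not something the hook above does: the helper name stage_file is hypothetical, the run parameter exists only so the function can be tried without touching a real repository, and it is worth checking on your own Git version that a file staged from inside pre-commit ends up in the commit (especially if you use git commit -a or commit specific paths).

```python
import subprocess
from typing import Callable, List


def stage_file(path: str, run: Callable[..., object] = subprocess.run) -> List[str]:
    """Stage `path` with `git add` and return the command that was run."""
    # "--" makes sure the path is never mistaken for an option
    command = ["git", "add", "--", path]
    # check=True raises an error if git exits with a non-zero status
    run(command, check=True)
    return command


# At the end of the hook, after sitemap.xml has been written:
# stage_file("path/to/your/static/sitemap.xml")
```

If git add fails, the raised error makes the hook exit with a non-zero status, which stops the commit instead of silently leaving the sitemap out.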

Conclusion

In this blog post, we have devised a way to automate sitemap generation with a Git pre-commit hook written in Python.

Using the code shared above, I hope you can save yourself some work and have a beer instead :)

Happy hacking!
