Published: May 15, 2024
If you are a web developer building websites, you know how cumbersome it is to generate a new sitemap every time you update the site or add a new page.
This is done so that you can upload the sitemap to Google Search Console and have your new or updated pages indexed or refreshed correctly.
The process goes like this:
In this blog post, I'd like to propose a way to automate step 2 (sitemap generation) using Git hooks and Python, so that whenever we commit changes with Git, the sitemap is refreshed automatically.
We will use a Git pre-commit hook to achieve this. Once it is set up, we no longer need to update sitemap.xml manually, because it is regenerated on every commit.
For your information, I'm using SvelteKit, so if you are using the same framework, the code below can be used as is. With a different framework it will probably work too, but I can't guarantee it.
I assume you already have Git installed on your machine, but if you don't, don't worry. Use the instructions here to install Git.
Once you have it installed, go to the folder where you are developing your website. If you haven't initialized a Git repo already, create one by running the git init command in your terminal. Make sure you are inside this folder in the terminal when initializing.
Once Git is initialized, you will have a folder named .git. If you can't see it, don't worry: files and folders that start with a period are usually hidden in the file explorer. On a MacBook you can press Command + Shift + . to reveal it. The image below shows the contents of this folder:
Inside the .git folder, you will see a subfolder named hooks. This is where Git hooks are saved.
Git hooks are scripts that run automatically every time a particular event occurs in a Git repository. They let you customize Git's internal behavior and trigger customizable actions at key points in the development life cycle.
Initially, there are no hooks, and there are only sample hooks found within this folder:
Each file in this subfolder corresponds to a particular event in Git. For example, the actions defined in pre-commit run right before a commit is made. This is exactly when we want to create our sitemap.
First, remove the .sample suffix from the .git/hooks/pre-commit.sample file and replace its contents with the code below:
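To make the mechanism concrete, here is a minimal sketch of how a Python hook can report success or failure to Git. Git aborts the commit when pre-commit exits with a non-zero status. The names run_step and main, and the idea of passing a list of commands, are my own assumptions for illustration and are not part of the sitemap hook itself.

```python
import subprocess
import sys


def run_step(command):
    """Run one command and return its exit status (0 means success)."""
    return subprocess.run(command).returncode


def main(steps):
    """Run each command in order and stop at the first one that fails."""
    for command in steps:
        status = run_step(command)
        if status != 0:
            # A non-zero status is what the hook should hand back to Git.
            print(f"pre-commit: {' '.join(command)} failed", file=sys.stderr)
            return status
    return 0
```

In a real hook file, the last line would be something like sys.exit(main([["npm", "run", "build"]])), so that a failed build stops the commit instead of silently producing a stale sitemap.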
#!/usr/local/bin/python3
# Above line tells git that we'd like to use Python
# Your python executable might be in another location such as /usr/bin/python
# Please adjust accordingly
# Import necessary libraries
import os
from bs4 import BeautifulSoup
from datetime import datetime
# Build the website first to identify links
os.system("npm run build")
# Define variables
# Sitemap <loc> entries must be full URLs, so root includes the protocol
root = "https://www.danyelkoca.com"
today = datetime.today().strftime('%Y-%m-%d')
# Create a list that will keep all links in the website
links = []
# Get all links in your pages using BeautifulSoup
# Exclude absolute http:// and https:// links, as they point to other sites
# href=True skips <a> tags without an href, which would raise a KeyError
for path, subdirs, files in os.walk("path/to/your/build/folder"):
    for name in files:
        if name.endswith(".html"):
            with open(os.path.join(path, name)) as fp:
                soup = BeautifulSoup(fp, 'html.parser')
                page_links = [link["href"] for link in soup.find_all("a", href=True) if link["href"] and not link["href"].startswith(("http://", "https://"))]
                links += page_links
# Remove links that are repeated
links = list(set(links))
# Add one more item for the root (www.danyelkoca.com)
links += [""]
# Sort based on length (shorter links are usually higher in the hierarchy)
# E.g. /en
#      /en/works
#      /en/works/machine
links.sort(key=len)
# Initialize xml with the links
xml = ""
for link in links:
xml += f" <url><loc>{root}{link}</loc><lastmod>{today}</lastmod></url>\n"
# Add parent elements
xml = f"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
{xml}</urlset>"""
# Save to your static folder
with open("path/to/your/static/sitemap.xml", "w") as file:
    file.write(xml)
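The hook above assembles the XML with f-strings, which works as long as no link contains characters such as & that need escaping in XML. As an alternative sketch (not the method used above), the same document can be built with Python's built-in xml.etree.ElementTree, which escapes text for you. The function name build_sitemap and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(root, links, lastmod):
    """Return the sitemap as a string; special characters get escaped."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for link in links:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{root}{link}"
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical usage with a placeholder domain
sitemap = build_sitemap("https://example.com", ["", "/en"], "2024-05-15")
```

The result is a single line of XML without indentation, so it is less pleasant to read than the f-string version, but it stays valid even when a link contains a query string.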
Note: if you get a Permission Denied error when committing, that means Git doesn't have the right permissions to run the hook. Run the commands below to grant them:
chmod +x .git/hooks/pre-commit
chmod +x /usr/local/bin/python3
You are basically giving execute permission to your Git hook file and to your Python executable. If your Python executable is somewhere else, change the path accordingly.
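If you prefer to set the permission from Python, for example in a project setup script, the following sketch does something similar to chmod +x on macOS or Linux. The function name make_executable is my own; os.chmod and the stat constants are from the standard library.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group, and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Calling make_executable(".git/hooks/pre-commit") from the root of the repo would then mark the hook file as executable.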
Once you stage the changes with git add . and commit them with git commit -m "message", the code defined above in .git/hooks/pre-commit will run and save sitemap.xml in your static folder.
The code works and creates a file like the one below (prettified), as expected:
Looks neat, right?
Once sitemap.xml is generated, don't forget to stage and commit again: the file is written after you staged your changes, so it is not included in the commit that triggered the hook.
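A possible way to avoid that second commit is to let the hook stage the sitemap itself as its last step. This is a sketch under assumptions: the helper name stage_sitemap is hypothetical, the path is the same placeholder used in the hook, and I am relying on Git picking up index changes made during pre-commit for a plain git commit, which is worth verifying on your own setup.

```python
import subprocess

# Same placeholder path as in the hook; adjust to your project
SITEMAP_PATH = "path/to/your/static/sitemap.xml"


def stage_sitemap(path=SITEMAP_PATH, run=subprocess.run):
    """Stage the generated sitemap so the pending commit can include it."""
    # check=True raises CalledProcessError if `git add` fails; an uncaught
    # exception makes the hook exit with an error, which stops the commit.
    return run(["git", "add", path], check=True)
```

Calling stage_sitemap() right after file.write(xml) would be the natural place for it. The run parameter only exists so the function can be tried out without touching a real repository.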
In this blog post, we have devised a way to automate sitemap generation with a Git pre-commit hook written in Python.
Using the code shared above, I hope you can save yourself some work and have a beer instead :)
Happy hacking!