How to get the latest packages from a YUM repository
While working on an overengineered automation, I came across the problem of
downloading the latest packages from a YUM repository. Of course, users can
achieve that by defining the repository in /etc/yum.repos.d and then running
yum download the-desired-package. Sure, that is a way to go, but I decided to
go the harder way and play with the repository metadata.
Every YUM repository has a repodata directory that contains multiple files. The two I was interested in were repomd.xml and <hash>-primary.xml.gz. The first one, repomd.xml, is basically an index of the repodata directory. It contains the checksum, location, timestamp, and size of every file in the repodata directory. The primary file is a list of the packages contained in the repository. Every package in the list has multiple attributes, including its name and version. Knowing that, we can programmatically get the latest package from the repository.
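To make the role of repomd.xml concrete, here is a minimal shell sketch that pulls the location of the primary file out of a trimmed, made-up repomd.xml. It is line-oriented awk rather than a real XML parser, so it assumes each element sits on its own line; the function name primary_href and the sample values are hypothetical.

```shell
#!/bin/sh
# Print the href of the "primary" entry from a repomd.xml read on stdin.
# Line-oriented on purpose: it assumes <data> and <location> each sit on
# their own line, so it is not a general XML parser.
primary_href() {
  awk '
    /<data type="primary">/ { in_primary = 1 }
    in_primary && /<location / {
      sub(/.*href="/, "")   # drop everything up to the attribute value
      sub(/".*/, "")        # drop the closing quote and the rest
      print
      exit
    }
    /<\/data>/ { in_primary = 0 }
  '
}

# Trimmed, made-up sample; real files list more <data> entries and fields.
primary_href <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="filelists">
    <location href="repodata/0000-filelists.xml.gz"/>
  </data>
  <data type="primary">
    <checksum type="sha256">1111</checksum>
    <location href="repodata/1111-primary.xml.gz"/>
    <timestamp>1600000000</timestamp>
    <size>1234</size>
  </data>
</repomd>
EOF
# prints repodata/1111-primary.xml.gz
```

The printed value is relative to the repository root, so it has to be appended to the baseurl before the primary file can be downloaded.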
The only thing we need to know to get a package this way is the baseurl of the repository. Below is an example taken from Gremlin's documentation. It is a small Bash script that opens repomd.xml and searches for the name of the primary file. Then it opens the primary file and looks for the exact version of the package.
#!/bin/bash
## Download a Gremlin RPM without using yum
##
## requires: curl, xpath, gzip
##
VERSION=$(curl https://rpm.gremlin.com/noarch/latest | cut -d- -f1)
## Or optionally set a specific version
# VERSION=2.16.3

## The package XPath is double-quoted so that the shell expands ${VERSION};
## inside single quotes it would be passed to xpath literally.
curl -fsSL https://rpm.gremlin.com/noarch/$(\
  curl -fsSL https://rpm.gremlin.com/noarch/$(\
    curl -fsSL https://rpm.gremlin.com/noarch/repodata/repomd.xml \
    | xpath -e 'string(/repomd/data[@type="primary"]/location/@href)' 2>/dev/null) \
  | gunzip \
  | xpath -e "string(/metadata/package[version/@ver='${VERSION}' and name='gremlin'][last()]/location/@href)" 2>/dev/null) \
  -o gremlin-${VERSION}.x86_64.rpm

curl -fsSL https://rpm.gremlin.com/noarch/$(\
  curl -fsSL https://rpm.gremlin.com/noarch/$(\
    curl -fsSL https://rpm.gremlin.com/noarch/repodata/repomd.xml \
    | xpath -e 'string(/repomd/data[@type="primary"]/location/@href)' 2>/dev/null) \
  | gunzip \
  | xpath -e "string(/metadata/package[version/@ver='${VERSION}' and name='gremlind'][last()]/location/@href)" 2>/dev/null) \
  -o gremlind-${VERSION}.x86_64.rpm
However, the aim of my work was to get the latest package programmatically.
Luckily, XPath has a last() function that returns the position of the last item
in the list, so the [last()] predicate selects the last matching package. Here
is how I used it in Ansible.
- name: Get the package location from the primary file
  xml:
    path: "{{ get_primary.dest.split('.')[:-1] | join('.') }}"
    namespaces:
      rpm: http://linux.duke.edu/metadata/repo
      common: http://linux.duke.edu/metadata/common
    content: attribute
    xpath: /common:metadata/common:package[common:arch="noarch"
      and common:name="{{ item }}"][last()]/common:location
  register: primary_parse
  delegate_to: 127.0.0.1
  run_once: true
  loop: "{{ rh_internal_packages }}"
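One caveat with [last()]: it selects the last entry in document order, which is the newest build only if the repository metadata happens to list builds in ascending order. Where that is not guaranteed, the candidates can be sorted by version instead. Below is a minimal shell sketch of that idea, assuming GNU sort (for its version ordering) and input lines of the form "version location". The package names and paths are made up for illustration, and sort's version ordering only approximates RPM's own version comparison.

```shell
#!/bin/sh
# Read "version location" pairs on stdin and print the location that
# belongs to the highest version. Relies on GNU sort's version ordering.
latest_location() {
  sort -k1,1V | tail -n 1 | cut -d' ' -f2-
}

# Hypothetical entries, deliberately out of order.
printf '%s\n' \
  '1.9.0 Packages/example-1.9.0-1.noarch.rpm' \
  '1.16.3 Packages/example-1.16.3-1.noarch.rpm' \
  '1.10.1 Packages/example-1.10.1-1.noarch.rpm' \
  | latest_location
# prints Packages/example-1.16.3-1.noarch.rpm
```

A plain lexical sort would put 1.9.0 last here, which is why the version-aware ordering matters.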
In the end, I get the location of each package in the repository. If the
baseurl and the location of the package are joined together, you get the URL
from which you can download the desired package.
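As a small illustration of that last step, here is a shell sketch that joins the two parts while avoiding a doubled or missing slash. The function name package_url, the example.com baseurl, and the package path are hypothetical.

```shell
#!/bin/sh
# Join a repository baseurl and a package location into a download URL.
package_url() {
  base=${1%/}       # strip one trailing slash from the baseurl, if any
  location=${2#/}   # strip one leading slash from the location, if any
  printf '%s/%s\n' "$base" "$location"
}

url=$(package_url "https://repo.example.com/noarch/" "Packages/example-1.16.3-1.noarch.rpm")
echo "$url"
# prints https://repo.example.com/noarch/Packages/example-1.16.3-1.noarch.rpm

# The package could then be fetched with, for example:
# curl -fsSL -O "$url"
```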
Other sources that helped me to accomplish my work:
https://lago.readthedocs.io/en/0.15/_modules/ovirtlago/repoverify.html