Documentation
UniOn (UO)
UniOn is an ontology representing universities and their educational offering: it describes the available degree programs, their access methods and requirements, the provided educational activities, achievable titles and qualifications, administrative locations, teaching places and more.
- Data repository: GitHub
- Authors: Alice Bordignon, Francesca Borriello, Lorenza Pierucci
- Created: September, 2022
- Specification: Click here
If you have any questions that are beyond the scope of this help file, please feel free to email alice.bordignon@studio.unibo.it, francesca.borriello@studio.unibo.it, lorenza.pierucci@studio.unibo.it.
1.2 Goal
UniOn allows users to:
- Understand and traverse the academic domain, by getting an overview of universities’ educational programs;
- Compare different universities’ offerings;
- Extract useful information.
Potential users thus include prospective students looking for the most suitable university course, as well as universities or other related institutions willing to describe and publish information about their educational offer.
1.3 Competency questions
In order to better understand what UniOn should be able to represent and what a user should be able to extract from it, we designed some natural language competency questions.
- Which degree programs are delivered by the University of Bologna in the Humanities field?
- Which University offers a degree in Philosophy?
- What is the access modality for the master’s degree in Semiotics?
- What is the study plan of the Law degree program at the University La Sapienza?
- How many CFUs in M-STO/01 does the University of Venice History Degree Program provide?
- Which department manages the Digital Humanities Master Degree at the University of Pisa?
- How many laboratories does the University of Bologna Chemistry degree program entail?
- Which laboratories does the University of Bologna Chemistry degree program entail?
- Which degree programs belonging to the class L-13 – Biology does the University of Naples offer?
- Where are the Conservation and Restoration of Cultural Heritage degree program courses held?
- Which international master’s degree programs are offered by the University of Milan?
The CQs will later be of use in the assessment phase of the ontology development process through SPARQL queries adapted to our available data.
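As an illustration of what such an adaptation might look like, the sketch below translates the second competency question into a SPARQL query and runs it in Python with rdflib, once the populated ontology has been exported as an OWL file (see Section 4). The prefix URI, the file name and the terms un:DegreeProgram and un:isOfferedBy are illustrative assumptions, not the actual UniOn vocabulary.

```python
# Hypothetical sketch: CQ "Which University offers a degree in Philosophy?"
# expressed as SPARQL. Class/property names and the file name are assumptions.
from rdflib import Graph

g = Graph()
g.parse("union_inferred.owl", format="xml")  # exported, populated ontology (assumed name)

cq2 = """
PREFIX un:   <http://www.example.org/union#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?university WHERE {
    ?program a un:DegreeProgram ;
             rdfs:label ?label ;
             un:isOfferedBy ?university .
    FILTER(CONTAINS(LCASE(STR(?label)), "philosophy"))
}
"""

for row in g.query(cq2):
    print(row.university)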
2. Building the Ontology
2.1 Sources
In order to build our vocabulary, we made use of the following sources:
- University websites, specifically the webpages providing information about the educational offer;
- Open datasets (mainly CSV format) containing related information;

3. Data collection
As mentioned, in the process of modelling our ontology we worked on a few open CSV files. Once we completed the first draft of our model, we moved on to creating our own dataset, merging and integrating all the information (until then scattered among several sources) into a new, all-inclusive structure.
The goal was to then use the obtained dataset to automatically populate our ontology: this would allow us to put it to the test by trying to answer our competency questions.
3.1 The Opendata Beta project by University of Bologna
Our starting point was a set of three datasets (CSV format) made publicly available by the University of Bologna as part of its Opendata Beta project.
Since complete data was only available for the academic year 2018/2019, we restricted our analysis to this period. Also, we decided to focus on one specific academic field, the Humanities.
The selected columns of the three files respectively contain:
- A catalogue of all degree programs offered by the University of Bologna; for each of them, the following information is specified: webpage URL, administrative location and teaching place(s), reference academic field, degree type (bachelor’s/master’s ...), duration, whether it is international or not, released qualification, teaching language and type of access.
- A catalogue of all educational activities delivered by the University of Bologna, including their teaching codes. For each educational activity, a ‘degree code’ specifies the degree program it belongs to. Teaching codes are numerical codes uniquely identifying each educational activity.
- The SSD and CFUs of each educational activity, linked to it through its teaching code.
  - SSD stands for ‘scientific-disciplinary sectors’: they indicate the disciplinary areas each educational activity belongs to.
  - CFUs (university formative credits) are a numerical measure of the workload for each exam or activity.
3.2 Data integration
Since some data was missing from our starting datasets, we needed to semi-automatically integrate the following information:
- The degree class: an alphanumeric string which defines the faculty the degree program belongs to;
- The department: the administrative structure managing each degree program;
- The different types of educational activity: exams, seminars, laboratories, internships and final tests (one possible keyword-based approach is sketched below).
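A minimal sketch of such a keyword-based classification is shown here; the column names, file name and keywords are assumptions, and in practice ambiguous rows were checked by hand.

```python
# Hedged sketch of a semi-automatic assignment of activity types from activity
# names; keywords, column and file names are illustrative assumptions.
import pandas as pd

def classify_activity(name: str) -> str:
    lowered = name.lower()
    if "laborator" in lowered:                      # matches "laboratorio"/"laboratory"
        return "laboratory"
    if "seminar" in lowered:
        return "seminar"
    if "tirocinio" in lowered or "internship" in lowered:
        return "internship"
    if "prova finale" in lowered or "final" in lowered:
        return "final test"
    return "exam"                                   # default: ordinary exam

activities = pd.read_csv("educational_activities.csv")        # hypothetical file name
activities["activity_type"] = activities["activity_name"].fillna("").map(classify_activity)
```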
3.3 The final dataset
Once all information had been collected, we merged and integrated it into our final dataset. To do so, we relied on Pandas, a data analysis and manipulation tool which allowed us to select, align and merge information from the three CSV files.
You can check out our code in our Jupyter notebook here.
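For readers who do not open the notebook, a simplified sketch of the merging step is given below; the file and column names (degree_code, teaching_code, etc.) are assumptions standing in for the actual Opendata Beta headers.

```python
# Simplified sketch of the pandas-based integration; file and column names are
# placeholders for the actual Opendata Beta ones used in the notebook.
import pandas as pd

degrees    = pd.read_csv("degree_programmes.csv")        # catalogue of degree programs
activities = pd.read_csv("educational_activities.csv")   # activities with teaching/degree codes
credits    = pd.read_csv("ssd_cfu.csv")                  # SSD and CFUs per teaching code

# Keep only the 2018/2019 Humanities programs (column names are assumptions)
degrees = degrees[(degrees["academic_year"] == "2018/2019") &
                  (degrees["field"] == "Humanities")]

# Attach SSD/CFU information to each activity, then attach activities to programs
activities = activities.merge(credits, on="teaching_code", how="left")
final = degrees.merge(activities, on="degree_code", how="left")

final.to_csv("union_final_dataset.csv", index=False)
```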
As a result, we obtained a data frame including:
- All degree programs offered by the University of Bologna in the Humanities area for the academic year 2018/2019, with the related information (type, access mode, location, duration, qualification, etc.);
- For each degree program, all the related educational activities with their identification codes and CFUs.
4. Populating the ontology (A-box)
We populated our ontology using OwlReady2, a package for ontology-oriented programming in Python. It can load OWL 2.0 ontologies as Python objects, modify them, save them, and perform reasoning via HermiT.
To populate our classes, we created the individuals with their labels and established the relationships among them.
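A minimal sketch of this step is shown below, assuming a class University / DegreeProgram and an object property isOfferedBy in the T-box, plus the dataset produced in Section 3; the real names and the full logic are in the Jupyter notebook.

```python
# Hedged sketch of the Owlready2 population step; class, property, column and
# file names are assumptions, and one row per degree program is assumed.
import os
import pandas as pd
from owlready2 import get_ontology

onto = get_ontology("file://" + os.path.abspath("union.owl")).load()  # the (empty) T-box
data = pd.read_csv("union_final_dataset.csv")

with onto:
    unibo = onto.University("university_of_bologna")
    unibo.label = ["University of Bologna"]

    for _, row in data.iterrows():
        program = onto.DegreeProgram(str(row["degree_code"]))  # individual named by its code
        program.label = [row["degree_name"]]
        program.isOfferedBy = [unibo]                          # object property (assumed name)

onto.save(file="union_populated.owl", format="rdfxml")
```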
Throughout the various steps of this phase, we performed an iterative testing process in order to detect errors and consequently adjust our script to build an efficient population environment.
The resulting population script is potentially reusable by anyone with CSV datasets structured like ours, carrying information about other universities, academic years, academic fields, etc.
Once all the individuals and all the relationships among them were correctly created, we ran the Protégé logic reasoner on the populated ontology to check its consistency and integrity. Since it correctly inferred what had not been declared manually, we concluded that our ontology was logically structured and working. Finally, we exported the inferred axioms, saving the result as an OWL file: this is the final file on which the SPARQL queries will be run. See our Jupyter notebook for a full explanation of the ontology population steps.
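The consistency check described above was performed in Protégé; as noted in the Owlready2 description, the same reasoning can also be triggered from Python through HermiT. A hedged sketch of that alternative route follows (file names are assumptions, and a local Java installation is required):

```python
# Alternative, in-Python consistency check via Owlready2's HermiT integration;
# raises OwlReadyInconsistentOntologyError if the ontology is inconsistent.
import os
from owlready2 import get_ontology, sync_reasoner, default_world

onto = get_ontology("file://" + os.path.abspath("union_populated.owl")).load()

with onto:
    sync_reasoner()  # runs HermiT (needs Java) and adds the inferred axioms to the world

default_world.save(file="union_inferred.owl", format="rdfxml")
```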
As we have already outlined, UniOn has been modelled pursuing a balance between the completeness of the domain description and the actual availability of data. This means that, in order not to be flattened onto the (still incomplete) data, it includes more classes and properties than those described by our dataset. Therefore, not all of its classes are populated by individuals.
5. SPARQL API
To test our ontology through SPARQL queries we need an application programming interface (API), that is, an interface through which two pieces of software can communicate.
5.1 The virtual environment
We decided that the best starting point was to create a virtual environment using Python. First, we created a new local folder, then we created and activated the virtual environment in it from the command prompt using the venv module. The venv module provides support for creating lightweight “virtual environments” with their own site directories, optionally isolated from system site directories.
5.2 FLASK: creating the API
Once our virtual environment was activated, we installed all the packages and frameworks needed (Flask and Owlready2). Flask is a widely used micro web framework for creating APIs in Python. Then, we added a Python script to the folder which, once run, opens a local port to send and receive requests, loads our ontology, and returns query results in JSON format as a legible list of strings.
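A minimal sketch of such a script is shown below; the file name, the /free_query route and the response format are assumptions meant to mirror the description above, not the exact code in the repository. The commented shell lines recall the virtual-environment setup from Section 5.1.

```python
# Hedged sketch of the Flask API described above; names and routes are assumptions.
#
# Assumed one-time setup in a terminal (Section 5.1):
#   python -m venv venv
#   source venv/bin/activate      # venv\Scripts\activate on Windows
#   pip install flask owlready2

import os
from flask import Flask, jsonify, request
from owlready2 import get_ontology, default_world

app = Flask(__name__)

# Load the populated ontology once, at start-up (file name is hypothetical)
get_ontology("file://" + os.path.abspath("union_inferred.owl")).load()

@app.route("/free_query", methods=["POST"])
def free_query():
    # The request body is expected to contain a raw SPARQL query
    query = request.get_data(as_text=True)
    # Owlready2's built-in SPARQL engine yields rows of Python objects;
    # flatten them into readable strings for the JSON response
    results = [[str(item) for item in row] for row in default_world.sparql(query)]
    return jsonify(results)

if __name__ == "__main__":
    app.run(debug=True)  # opens a local port (5000 by default)
```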
5.3 Postman: using the API
To send requests to the local port and verify the functioning of the API we used Postman, an API platform for building and using APIs. The script that uses Flask and creates the local port is available on our GitHub repository.
We verified that the API was working through a simple GET request, and then switched to POST requests to start querying through SPARQL within the Postman interface, appending /free_query to our localhost URL.
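The same POST request can be reproduced outside Postman, for instance from Python with the requests library; the query below reuses the hypothetical prefix and class name introduced earlier.

```python
# Hedged example of querying the running local API without Postman;
# the prefix URI and un:DegreeProgram are placeholders, not actual UniOn terms.
import requests

query = """
PREFIX un: <http://www.example.org/union#>
SELECT ?program WHERE { ?program a un:DegreeProgram . }
"""

response = requests.post("http://127.0.0.1:5000/free_query", data=query)
print(response.json())
```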
6. Conclusions and future developments
As detailed in the previous sections, we tried our best to reach a balance between completeness, feasibility and applicability of the ontology by creating a simple and consistent logical model able to represent Italian universities’ educational offerings. We are aware that our model could be further expanded and developed.
A possible point for future development could be to broaden the ontology by adding other logical blocks regarding various aspects of the universities’ systems (e.g. administration, people and staff involved in the institution, scientific publications, etc.).
To reach a higher level of completeness, the contribution of other universities through the publication of Open Data projects such as the one developed by the University of Bologna would be welcome: in this way, it would be possible to fully exploit the potential of our ontology by making interesting comparisons between different universities and degree programs.
Another point for future development could be to model other vocabularies able to describe foreign university systems, which could then be aligned with or compared to our model.
Finally, to enable people to perform queries and test our ontology, our SPARQL endpoint needs to be published.
7. Specification
To publish UniOn’s documentation we relied on WIzard for DOCumenting Ontologies (WIDOCO), an open-source tool which automatically creates documentation with human-readable descriptions of the ontology terms and a visualization through WebVOWL.
Thank you for your attention.
This project and its documentation were created for the Knowledge Representation and Extraction course of the Digital Humanities and Digital Knowledge Degree Programme at the University of Bologna.