About this document
- Version: 1.0
- Date: September 30, 2024
- Author: PIDA Administration Team
- Last Updated: October 16, 2024
# PIDA Scope & Purpose
PIDA (Persistent IDentifiers for semantic Artifacts) is a free web service that allows users to create, manage, and maintain persistent URLs (PURLs). These PURLs are intended to be used as long-term identifiers, specifically internationalized resource identifiers (IRIs) within semantic artifacts such as terminologies and ontologies. PIDA is guaranteed to be operational and maintained for 10+ years. PIDA is a service dedicated to information engineering and academic research and development.
The purpose of this document is to define terminology and PIDA core features as well as define user responsibility and admin rights to guide correct use and thereby guarantee long-term service integrity. This policy applies to PIDA, including all of its components:
- Redirection service: purls.helmholtz-metadaten.de
- PIDA community repository: https://github.com/Materials-Data-Science-and-Informatics/PIDA with all registered PURLs and integrated “.htaccess” files
- PIDA development repository: https://codebase.helmholtz.cloud/hmc/hmc-public/pida
# Relevant Definitions
- Administrator(s): The person or team responsible for development, management and operation of the PIDA service.
- Registrar: An individual or organization who creates or alters entries in the PIDA database to create or manage PURLs within the system.
- User: An individual, organization or algorithm that sends a redirection request to PIDA (e.g. via HTTP request) to locate a resource on the web.
- PURL: A persistent uniform resource locator that serves as a persistent identifier for a web resource. A PIDA PURL is issued within the PIDA domain (purls.helmholtz-metadaten.de) and will resolve to a user-specified web address (“target URL”).
- Target URL: A URL that is assigned to the PIDA PURL, which the service resolves when a request is made to that PURL. The target URL can be changed so that a redirection remains successful even after a resource’s initial location has changed.
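The relationship between a PURL and its target URL can be illustrated with a small in-memory sketch. This is not how PIDA is implemented (PIDA uses “.htaccess” redirection rules); the class `PurlRegistry`, its methods, the `https://` prefix and the `example.org` addresses are hypothetical and serve only to show that the target can change while the PURL stays the same.

```python
from dataclasses import dataclass, field

PIDA_DOMAIN = "purls.helmholtz-metadaten.de"


@dataclass
class PurlRegistry:
    """Toy in-memory mapping from PURL paths to target URLs (illustrative only)."""

    targets: dict[str, str] = field(default_factory=dict)

    def register(self, path: str, target_url: str) -> str:
        """Store a target URL and return the full PURL for the given path."""
        self.targets[path] = target_url
        return f"https://{PIDA_DOMAIN}/{path}"

    def update_target(self, path: str, new_target_url: str) -> None:
        """Point an existing PURL at a new location; the PURL itself is unchanged."""
        if path not in self.targets:
            raise KeyError(f"unknown PURL path: {path}")
        self.targets[path] = new_target_url

    def resolve(self, path: str) -> str:
        """Return the target URL a request to this PURL would be redirected to."""
        return self.targets[path]


registry = PurlRegistry()
purl = registry.register("hdo", "https://example.org/old/hdo")  # hypothetical target
registry.update_target("hdo", "https://example.org/new/hdo")    # resource has moved
print(purl, "->", registry.resolve("hdo"))
```

The PURL returned by `register` never changes; only the entry it resolves to does.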
# PIDA Core Features
PIDA provides users with the possibility to create PURLs. These can be used as IRIs (internationalized resource identifiers) to dereference their semantic resources available on the web.
- PURL Creation & Management: Users can create a globally unique PURL that resolves to a specified target URL. The target URL may be changed by its registrar or administrator without changing the PURL itself. A PURL is created through the integration of an “.htaccess” file into the PIDA community repository. This may be done through a GitHub pull request, by creating an issue in the PIDA community repository or by directly contacting PIDA administration (see description here).
- Secured communication: HTTPS communication protocol is used to protect the integrity and confidentiality of data between clients and the service.
- Content negotiation: PIDA supports content negotiation for redirection requests. Content negotiation is a mechanism that redirects a PIDA PURL to different representations of a resource (e.g., HTML or RDF), so that users can specify the required representation within the request. Registrars can enable content negotiation for their PURLs by adding the required information and redirection rules to their “.htaccess” file.
- Service availability: PIDA is a reliable and persistent service maintained by the Helmholtz Metadata Collaboration Platform (HMC) and deployed on sophisticated cloud infrastructure at the Jülich Supercomputing Center (JSC) at Forschungszentrum Jülich. The deployment architecture includes fail-safes and backups to guarantee 24/7 availability of the PIDA service. Furthermore, we commit to maintaining PIDA for over 10 years in accordance with the availability criteria outlined in the FAIR principles and the requirements of most funding agencies.
- PURLs health check: The PIDA system software performs regular health checks of the URLs registered within the “.htaccess” file metadata as `resource location`. If no `resource location` is specified, PIDA will by default check `purls.helmholtz-metadaten.de/namespace`. If the specified URL returns a 404 or 403 response (i.e. the target resource was not found or access is forbidden), the particular URL is identified as “broken”.
- Automatic notification: Upon detection of any “broken” URL, automatic notification of the registrar is triggered. Notification emails are sent out automatically (1) upon first detection of a “broken” `resource location` and (2) regularly on the 1st working day of the month for any namespace that contains broken URLs under `resource location`. Notifications are sent to the email address of the contact person specified within each “.htaccess” file.
- Removal of namespaces with continuously broken redirection targets: If a URL specified in the “.htaccess” metadata under `resource location` is identified as broken, we assume that the redirection rules defined for this particular namespace are incorrect. We ask notified registrars to update the affected “.htaccess” files swiftly. PIDA administration reserves the right to deactivate PURLs and/or namespaces that contain broken URLs under `resource location` should the problem persist for 12 months or more, to guarantee the integrity of our service and free up the respective namespace.
- Usage statistics: The activity of the PIDA redirection service is logged in a separate database. This includes the time and frequency of user requests for redirection. Furthermore, we selectively extract data from the openly available data in the PIDA community repository for statistical purposes. The stored and processed data do not include personal data (e.g. user name, browser type, or IP addresses). Email addresses from the openly visible “.htaccess” files are processed internally for email notifications (see “Automatic notification”).
- PIDA statistics dashboard: PIDA usage and health check results are published on the PIDA statistics dashboard which is regularly updated. No personalized information is published here.
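The content negotiation feature listed above can be illustrated with a minimal sketch of the selection logic. In PIDA the actual rules live in the registrar's “.htaccess” file; the Python function `pick_target`, its parameters and the `example.org` addresses below are hypothetical. The sketch assumes the client states its preferences in an HTTP `Accept` header and ignores wildcards such as `*/*`.

```python
def pick_target(accept_header: str, targets: dict[str, str], default: str) -> str:
    """Return the target URL for the best-matching media type in an Accept header."""
    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0  # treat malformed q values as "not acceptable"
        if media_type in targets and quality > 0:
            # Sort key: highest quality first, then order of appearance.
            candidates.append((-quality, position, media_type))
    if not candidates:
        return default
    return targets[min(candidates)[2]]


# Hypothetical representations of one artifact.
targets = {
    "text/html": "https://example.org/hdo/index.html",
    "text/turtle": "https://example.org/hdo/hdo.ttl",
}
print(pick_target("text/turtle, text/html;q=0.5", targets, targets["text/html"]))
```

A request that names no supported media type falls back to the default target, so a PURL still resolves for clients that do not negotiate.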
# Usage Policy
Registrars interact with the PIDA system via the GitHub community repository and by defining redirection rules within .htaccess files. The following metadata is mandatory for each .htaccess file:
| key | explanation | possible values |
|---|---|---|
| `# Contact person:` | name of registrar | string |
| `# Contact person email:` | email of registrar | email address |
| `# Artifact:` | abbreviated name of artifact, e.g. “HDO” | comma separated list of strings |
| `# Artifact name:` | full name of artifact, e.g. “Helmholtz Digitization Ontology” | comma separated list of strings |
| `# Resource location:` | URL(s) to be checked during automated health checks | comma separated list of URLs |
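A registrar may want to check a file for the mandatory metadata before submitting it. The following is a minimal sketch, assuming each entry appears on its own comment line in the form `# Key: value`. The functions `parse_metadata` and `missing_keys` and the sample contact are hypothetical and are not part of the PIDA software.

```python
MANDATORY_KEYS = (
    "Contact person",
    "Contact person email",
    "Artifact",
    "Artifact name",
    "Resource location",
)


def parse_metadata(htaccess_text: str) -> dict[str, str]:
    """Collect '# Key: value' comment lines for the mandatory keys."""
    metadata = {}
    for line in htaccess_text.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue  # not a comment line, e.g. a redirection rule
        # Split at the first colon only, so URLs in the value stay intact.
        key, sep, value = line.lstrip("#").strip().partition(":")
        if sep and key.strip() in MANDATORY_KEYS:
            metadata[key.strip()] = value.strip()
    return metadata


def missing_keys(metadata: dict[str, str]) -> list[str]:
    """Return mandatory keys that are absent or empty, in table order."""
    return [key for key in MANDATORY_KEYS if not metadata.get(key)]


sample = """\
# Contact person: Jane Doe
# Contact person email: jane.doe@example.org
# Artifact: HDO
# Artifact name: Helmholtz Digitization Ontology
"""
print(missing_keys(parse_metadata(sample)))
```

An empty result from `missing_keys` means all five keys are present with a non-empty value; the sketch does not validate the email address or the URLs themselves.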
PIDA administration is committed to maintaining the continuous availability of the service and the security and privacy of user data. We do not store any personal data outside the PIDA public repository. To guarantee the effective functioning of our system, we reserve the right to:
- deny integration of .htaccess files that do not contain the mandatory metadata.
- deny integration of .htaccess files that redirect to resources deemed inadequate or outside of the above-defined scope and purpose.
- periodically review and remove PURLs that have remained broken for 12 months or more. Removal of the ".htaccess" file will lead to the permanent deletion of these PURLs and their associated data from our systems, including subnamespaces, contact information, and usage history. Users who have had their data removed can re-register namespaces at any time. However, the availability of previously registered data cannot be guaranteed.
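The health-check and removal rules in this policy can be summarized in a short sketch. It assumes that only the 403 and 404 responses named above count as “broken”, that the default check uses `https://`, and that the 12-month period is counted in whole calendar months. All function names are hypothetical and do not reflect the actual PIDA system software.

```python
from datetime import date

BROKEN_STATUS_CODES = {403, 404}  # the two responses this policy names as "broken"


def health_check_urls(namespace: str, resource_locations: list[str]) -> list[str]:
    """Return the URLs to check, falling back to the namespace's own PURL."""
    if resource_locations:
        return resource_locations
    return [f"https://purls.helmholtz-metadaten.de/{namespace}"]


def is_broken(status_code: int) -> bool:
    """Classify an HTTP status code according to the health-check rule."""
    return status_code in BROKEN_STATUS_CODES


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the current month is not yet complete
    return months


def eligible_for_removal(broken_since: date, today: date) -> bool:
    """True once a URL has been continuously broken for 12 months or more."""
    return months_between(broken_since, today) >= 12


print(health_check_urls("hdo", []))
print(eligible_for_removal(date(2024, 1, 15), date(2025, 1, 15)))
```

Eligibility here only marks a namespace for review; under the policy, removal remains a right that PIDA administration may exercise, not an automatic step.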