Initial commit and release of doi2dataset

Alexander Minges 2025-03-21 14:53:23 +01:00
commit 9be53fd2fc
Signed by: Athemis
SSH key fingerprint: SHA256:TUXshgulbwL+FRYvBNo54pCsI0auROsSEgSvueKbkZ4
23 changed files with 2482 additions and 0 deletions

docs/source/conf.py

@ -0,0 +1,31 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

import os
import sys

# Add the repository root (two levels above docs/source) to the import path
# so that autodoc can import the doi2dataset module.
sys.path.insert(0, os.path.abspath('../..'))

project = 'doi2dataset'
copyright = '2025, Alexander Minges'
author = 'Alexander Minges'
release = '1.0'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = ["sphinx.ext.autodoc", "sphinx.ext.napoleon"]

templates_path = ['_templates']
exclude_patterns = []

# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = "sphinx_rtd_theme"
html_static_path = ['_static']

@ -0,0 +1,7 @@
doi2dataset module
==================

.. automodule:: doi2dataset
   :members:
   :show-inheritance:
   :undoc-members:

docs/source/faq.rst

@ -0,0 +1,14 @@
Frequently Asked Questions (FAQ)
================================

Q: What is **doi2dataset**?

A: **doi2dataset** is a tool that processes DOIs and generates metadata for Dataverse datasets by fetching data from external APIs such as OpenAlex and CrossRef.

Q: How do I install **doi2dataset**?

A: You can clone the repository from GitHub or, if a package is available, install it via pip. See the Installation section for details.

Q: Can I upload metadata directly to a Dataverse server?

A: Yes. Pass the ``-u`` flag on the command line to upload the generated metadata. Make sure that the Dataverse connection details in ``config.yaml`` are correct first.
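
As an illustration of what an upload involves, the sketch below builds a request to the Dataverse Native API endpoint that creates a dataset in a collection (``POST /api/dataverses/{alias}/datasets``). The request is authenticated with the ``X-Dataverse-key`` header, and the sketch does not send it. This is an assumption-based sketch that uses only the Python standard library; it is not the tool's own upload code. The helper name ``build_create_dataset_request`` is hypothetical. Its arguments correspond to the ``url``, ``api_token`` and ``dataverse`` keys of the example ``config.yaml`` shown in the Usage section.

.. code-block:: python

   import json
   import urllib.request


   def build_create_dataset_request(
       base_url: str, api_token: str, collection: str, dataset_json: dict
   ) -> urllib.request.Request:
       """Build (but do not send) a Dataverse 'create dataset' request.

       Hypothetical helper: doi2dataset's actual upload code may differ.
       """
       # Native API endpoint for adding a dataset to a collection
       url = f"{base_url.rstrip('/')}/api/dataverses/{collection}/datasets"
       body = json.dumps(dataset_json).encode("utf-8")
       return urllib.request.Request(
           url,
           data=body,
           headers={
               # Dataverse reads the API token from this header
               "X-Dataverse-key": api_token,
               "Content-Type": "application/json",
           },
           method="POST",
       )


   req = build_create_dataset_request(
       "https://your.dataverse.server/",
       "your_api_token",
       "your_dataverse_name",
       {"datasetVersion": {"metadataBlocks": {}}},
   )
   print(req.get_method(), req.full_url)
   # POST https://your.dataverse.server/api/dataverses/your_dataverse_name/datasets

Passing the request to ``urllib.request.urlopen(req)`` would send it. Building the request separately keeps the URL and header logic testable without a live server.
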

Q: Where can I find the API documentation?

A: The API reference is generated automatically from the source code docstrings and appears in the API Reference section of this documentation.

docs/source/index.rst

@ -0,0 +1,34 @@
.. doi2dataset documentation master file, created by
   sphinx-quickstart on Fri Mar 21 13:03:59 2025.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

doi2dataset documentation
=========================

Overview
--------

**doi2dataset** is a Python tool designed to process DOIs and generate metadata for Dataverse datasets.
It retrieves data from external APIs such as OpenAlex and CrossRef and converts it into a format that meets Dataverse requirements.

Key Features:

- **Validation** and normalization of DOIs
- Retrieval and processing of **metadata** (e.g., abstract, license, author information)
- Automatic mapping and generation of metadata fields (e.g., title, description, keywords)
- Support for controlled vocabularies and complex (compound) metadata fields
- Optional **uploading** of metadata to a Dataverse server
- **Progress tracking** and error handling using the Rich library
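
Controlled vocabularies and compound fields are concepts from Dataverse's native JSON metadata format. In that format, every field has a ``typeName``, a ``typeClass`` (``primitive``, ``controlledVocabulary`` or ``compound``) and a ``multiple`` flag. The sketch below builds a controlled-vocabulary ``subject`` field and a compound ``author`` field from plain Python values. The helper names are hypothetical, and this is not doi2dataset's own mapping code. It only illustrates the target structure.

.. code-block:: python

   import json


   def primitive(type_name: str, value: str) -> dict:
       """A single-valued free-text field, e.g. an author's name."""
       return {"typeName": type_name, "multiple": False,
               "typeClass": "primitive", "value": value}


   def controlled_vocabulary(type_name: str, values: list) -> dict:
       """A multi-valued field whose values must come from a fixed vocabulary."""
       return {"typeName": type_name, "multiple": True,
               "typeClass": "controlledVocabulary", "value": values}


   def compound(type_name: str, entries: list) -> dict:
       """A multi-valued field whose entries are groups of sub-fields."""
       return {"typeName": type_name, "multiple": True,
               "typeClass": "compound", "value": entries}


   subject = controlled_vocabulary("subject", ["Medicine, Health and Life Sciences"])
   author = compound("author", [{
       "authorName": primitive("authorName", "Doe, John"),
       "authorAffiliation": primitive("authorAffiliation", "Example University"),
   }])
   print(json.dumps([subject, author], indent=2))
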

.. toctree::
   :maxdepth: 2
   :caption: Contents:
   :titlesonly:

   introduction
   installation
   usage
   modules
   faq

@ -0,0 +1,28 @@
Installation
============

You can install **doi2dataset** in either of the following ways.

Using Git
---------

Clone the repository from GitHub by running the following commands in your terminal:

.. code-block:: bash

   git clone https://github.com/your_username/doi2dataset.git
   cd doi2dataset

Using pip (if available)
-------------------------

You can also install **doi2dataset** via pip:

.. code-block:: bash

   pip install doi2dataset

Configuration
-------------

After installation, configure the tool before running it.
The ``config.yaml`` file in the project root holds the required settings, such as the Dataverse connection details and PI information.
For more detailed instructions, please refer to the README file provided with the project.
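
One way to catch configuration mistakes early is to load ``config.yaml`` and check that the expected sections are present. The sketch below assumes the key layout of the example ``config.yaml`` in the Usage section and uses PyYAML's ``yaml.safe_load``. The list of required keys and the function names are illustrative assumptions, not the tool's own validation logic.

.. code-block:: python

   from pathlib import Path

   # Assumed required keys, based on the example config.yaml in the Usage section
   REQUIRED_SECTIONS = ("dataverse", "phase", "pis", "default_grants")
   REQUIRED_DATAVERSE_KEYS = ("url", "api_token", "dataverse")


   def missing_config_keys(cfg: dict) -> list:
       """Return dotted names of expected keys that are absent from cfg."""
       missing = [section for section in REQUIRED_SECTIONS if section not in cfg]
       if "dataverse" in cfg:
           dataverse = cfg["dataverse"] or {}
           missing += [f"dataverse.{key}" for key in REQUIRED_DATAVERSE_KEYS
                       if key not in dataverse]
       return missing


   def load_config(path: str = "config.yaml") -> dict:
       """Parse the YAML file; requires PyYAML (pip install pyyaml)."""
       import yaml  # imported lazily so missing_config_keys works without PyYAML

       return yaml.safe_load(Path(path).read_text(encoding="utf-8")) or {}


   example = {"dataverse": {"url": "https://your.dataverse.server"}, "pis": []}
   print(missing_config_keys(example))
   # ['phase', 'default_grants', 'dataverse.api_token', 'dataverse.dataverse']

With a real file, ``missing_config_keys(load_config("config.yaml"))`` returns an empty list when every assumed key is present.
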

@ -0,0 +1,8 @@
Introduction
============

Welcome to the **doi2dataset** documentation. This guide describes the tool, its purpose, and how it helps you generate metadata for Dataverse datasets.

The **doi2dataset** tool is aimed at researchers, data stewards, and developers who need to convert DOI-based metadata into a format compatible with Dataverse. It automates the retrieval of metadata from external sources (such as OpenAlex and CrossRef) and performs the necessary data transformations.

The following sections cover installation, usage examples, and a detailed API reference.
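
The sketch below gives a rough idea of the retrieval step. It builds lookup URLs for one DOI against the public OpenAlex and CrossRef REST APIs (``https://api.openalex.org/works/...`` and ``https://api.crossref.org/works/...``) and shows how the JSON response could be fetched with the standard library. This documentation does not state which endpoints or parameters doi2dataset actually uses, so treat the function names and URL choices as assumptions.

.. code-block:: python

   import json
   import urllib.parse
   import urllib.request


   def openalex_url(doi: str) -> str:
       # OpenAlex accepts a DOI URL as the identifier of a work
       return "https://api.openalex.org/works/" + urllib.parse.quote(
           f"https://doi.org/{doi}", safe=":/")


   def crossref_url(doi: str) -> str:
       # CrossRef looks works up by the bare DOI
       return "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")


   def fetch_json(url: str, timeout: float = 30.0) -> dict:
       """Fetch and decode a JSON response (performs a network request)."""
       with urllib.request.urlopen(url, timeout=timeout) as response:
           return json.load(response)


   print(openalex_url("10.1234/doi1"))  # https://api.openalex.org/works/https://doi.org/10.1234/doi1
   print(crossref_url("10.1234/doi1"))  # https://api.crossref.org/works/10.1234/doi1
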

docs/source/modules.rst

@ -0,0 +1,9 @@
API Reference
=============

This section contains the API reference generated from the source code docstrings.

.. automodule:: doi2dataset
   :members:
   :undoc-members:
   :show-inheritance:

docs/source/setup.rst

@ -0,0 +1,7 @@
setup module
============

.. automodule:: setup
   :members:
   :show-inheritance:
   :undoc-members:

docs/source/usage.rst

@ -0,0 +1,77 @@
Usage
=====

**doi2dataset** is run from the command line. The examples below show typical invocations.

Basic Example
-------------

To process one or more DOIs, pass them as arguments:

.. code-block:: bash

   python doi2dataset.py 10.1234/doi1 10.5678/doi2
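
Before looking up a DOI, the tool validates and normalizes it. This documentation does not describe exactly how. The sketch below shows one common approach, offered as an assumption: strip a ``doi:`` or ``https://doi.org/`` prefix, lowercase the result (DOIs are case-insensitive), and check it against a regular expression for the ``10.<registrant>/<suffix>`` pattern. The function name, prefix list and pattern are hypothetical.

.. code-block:: python

   import re
   from typing import Optional

   # Assumed pattern: "10." + 4 to 9 digit registrant code + "/" + non-empty suffix
   DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
   # Prefixes that are often pasted together with a DOI
   PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/",
               "http://dx.doi.org/", "doi:")


   def normalize_doi(raw: str) -> Optional[str]:
       """Return the DOI lowercased and without prefix, or None if it looks invalid."""
       doi = raw.strip()
       for prefix in PREFIXES:
           if doi.lower().startswith(prefix):
               doi = doi[len(prefix):]
               break
       doi = doi.strip().lower()
       return doi if DOI_PATTERN.match(doi) else None


   print(normalize_doi("https://doi.org/10.1234/DOI1"))  # 10.1234/doi1
   print(normalize_doi("not-a-doi"))                     # None
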

Command Line Options
--------------------

The tool offers the following command line options:

- ``-f, --file``: Specify a file containing DOIs (one per line).
- ``-o, --output-dir``: Directory where metadata files will be saved.
- ``-d, --depositor``: Name of the depositor.
- ``-s, --subject``: Default subject for the metadata.
- ``-m, --contact-mail``: Contact email address.
- ``-u, --upload``: Flag to upload metadata to a Dataverse server.
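
The sketch below mirrors the documented options with the standard ``argparse`` module, which can help when wrapping the tool in your own scripts. It is reconstructed from the option list above and is not the tool's actual parser. The positional ``dois`` argument and the attribute names (which argparse derives from the long option names) are assumptions.

.. code-block:: python

   import argparse


   def build_parser() -> argparse.ArgumentParser:
       """Hypothetical parser mirroring the documented doi2dataset options."""
       parser = argparse.ArgumentParser(prog="doi2dataset.py")
       parser.add_argument("dois", nargs="*", help="DOIs to process")
       parser.add_argument("-f", "--file", help="file containing DOIs, one per line")
       parser.add_argument("-o", "--output-dir",
                           help="directory where metadata files are saved")
       parser.add_argument("-d", "--depositor", help="name of the depositor")
       parser.add_argument("-s", "--subject", help="default subject for the metadata")
       parser.add_argument("-m", "--contact-mail", help="contact email address")
       parser.add_argument("-u", "--upload", action="store_true",
                           help="upload the metadata to a Dataverse server")
       return parser


   args = build_parser().parse_args(["-f", "dois.txt", "-o", "output/", "-u"])
   print(args.file, args.output_dir, args.upload, args.dois)
   # dois.txt output/ True []
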

Configuration via config.yaml
-------------------------------

Some settings are read from the **config.yaml** file rather than from the command line. This file includes:

- Dataverse connection details (URL, API token, authentication credentials).
- Mapping of project phases to start and end years.
- PI (principal investigator) information.
- Default grant configurations.

Make sure that your **config.yaml** is properly configured before running the tool. For example, your **config.yaml** might include:

.. code-block:: yaml

   dataverse:
     url: "https://your.dataverse.server"
     api_token: "your_api_token"
     auth_user: "your_username"
     auth_password: "your_password"
     dataverse: "your_dataverse_name"
   phase:
     Phase1:
       start: 2010
       end: 2015
     Phase2:
       start: 2016
       end: 2020
   pis:
     - given_name: "John"
       family_name: "Doe"
       email: "john.doe@example.com"
       orcid: "0000-0001-2345-6789"
       affiliation: "Example University"
       project:
         - "Project A"
         - "Project B"
   default_grants:
     - funder: "Funder Name"
       id: "GrantID12345"
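
The ``phase`` section in the example above maps each project phase to a range of years. One plausible use, sketched here as an assumption rather than documented behavior, is to assign a publication to the phase whose inclusive ``start`` to ``end`` range contains its publication year. The function name ``phase_for_year`` is hypothetical. The dictionary is the ``phase`` section as PyYAML would parse it.

.. code-block:: python

   from typing import Optional

   # The "phase" section of the example config.yaml, parsed into Python
   PHASES = {
       "Phase1": {"start": 2010, "end": 2015},
       "Phase2": {"start": 2016, "end": 2020},
   }


   def phase_for_year(year: int, phases: dict = PHASES) -> Optional[str]:
       """Return the first phase whose inclusive start..end range contains year."""
       for name, span in phases.items():
           if span["start"] <= year <= span["end"]:
               return name
       return None  # the year falls outside every configured phase


   print(phase_for_year(2012))  # Phase1
   print(phase_for_year(2016))  # Phase2
   print(phase_for_year(2021))  # None
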

Usage Example with Configuration
----------------------------------

If you have configured your **config.yaml** and want to process DOIs from a file and upload the resulting metadata, run:

.. code-block:: bash

   python doi2dataset.py -f dois.txt -o output/ -d "John Doe" -s "Medicine, Health and Life Sciences" -m "john.doe@example.com" -u

This command combines the options given on the command line with the settings from **config.yaml**.
For more details on usage and configuration, please refer to the rest of the documentation.
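
The ``-f`` option expects a plain-text file with one DOI per line, such as ``dois.txt`` in the command above. The sketch below reads such a file. It assumes that blank lines and surrounding whitespace should be ignored, although this documentation does not say how the tool handles them. ``read_doi_file`` is a hypothetical name.

.. code-block:: python

   import tempfile
   from pathlib import Path


   def read_doi_file(path: str) -> list:
       """Return the non-empty, whitespace-stripped lines of a DOI list file."""
       lines = Path(path).read_text(encoding="utf-8").splitlines()
       return [line.strip() for line in lines if line.strip()]


   # Write a small sample file to a temporary directory and read it back
   with tempfile.TemporaryDirectory() as tmp:
       sample = Path(tmp) / "dois.txt"
       sample.write_text("10.1234/doi1\n\n  10.5678/doi2  \n", encoding="utf-8")
       dois = read_doi_file(str(sample))

   print(dois)  # ['10.1234/doi1', '10.5678/doi2']
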