docs: update for modular package structure

- Update README.md with new usage methods and package structure
- Revise installation instructions for pip install and development setup
- Update usage documentation to reflect CLI module and Python imports
- Add package architecture overview to contributing guide
- Update API reference documentation for all new modules

Documentation changes:
- README.md: Updated installation, usage examples, development setup
- docs/source/installation.rst: Added verification and dev setup
- docs/source/usage.rst: Updated for new CLI and import methods
- docs/source/introduction.rst: Updated package overview
- docs/source/contributing.rst: Added package architecture overview
- docs/source/modules.rst: Complete API reference for all modules

All documentation now reflects the modular structure with proper
usage instructions for console commands, Python modules, and imports.
Author: Alexander Minges, 2025-07-22 11:54:07 +02:00
commit c60817702b (parent b1dd2917b2)
6 changed files with 395 additions and 32 deletions

README.md

@@ -28,13 +28,23 @@
- Python 3.12 or higher
Clone the repository from GitHub:
Clone the repository:
```bash
git clone https://git.athemis.de/Athemis/doi2dataset
git clone https://git.uni-due.de/cbm343e/doi2dataset
cd doi2dataset
```
### Quick Start
```bash
# Install the package
pip install -e .
# Run with a DOI
doi2dataset 10.1038/nature12373
```
## Configuration
Before running the tool, configure the necessary settings in the `config.yaml` file located in the project root. This file contains configuration details such as:
@@ -109,14 +119,43 @@ This approach allows you to:
## Usage
Run doi2dataset from the command line by providing one or more DOIs:
doi2dataset can be used in several ways after installation:
### Method 1: Console Command
```bash
python doi2dataset.py [options] DOI1 DOI2 ...
# After installation with pip install -e .
doi2dataset [options] DOI1 DOI2 ...
```
### Method 2: Python Module
```bash
# Use CLI module directly
python -m doi2dataset.cli [options] DOI1 DOI2 ...
# Or use main module
python -m doi2dataset.main [options] DOI1 DOI2 ...
```
### Method 3: Python Import
```python
from doi2dataset import MetadataProcessor
from pathlib import Path
processor = MetadataProcessor(
    doi="10.1038/nature12373",
    output_path=Path("metadata.json"),
    depositor="Your Name"
)
metadata = processor.process()
```
### Command Line Options
All methods support the same command-line options:
- `-f, --file`
Specify a file containing DOIs (one per line).
@@ -138,6 +177,25 @@ python doi2dataset.py [options] DOI1 DOI2 ...
- `-r, --use-ror`
Use Research Organization Registry (ROR) identifiers for institutions when available.
### Examples
```bash
# Process a single DOI
doi2dataset 10.1038/nature12373
# Process multiple DOIs
doi2dataset 10.1038/nature12373 10.1126/science.1234567
# Process DOIs from a file with custom output directory
doi2dataset -f dois.txt -o ./output -d "Your Name"
# Upload to Dataverse with contact email
doi2dataset -u -m your.email@university.edu 10.1038/nature12373
# Use ROR identifiers for institutions
doi2dataset -r 10.1038/nature12373
```
## Documentation
Documentation is generated using Sphinx and is available online at:
@@ -403,38 +461,66 @@ Contributions are welcome! Please fork the repository and submit a pull request
pip install -r requirements-dev.txt
```
2. Set up commit message template (recommended):
2. Install the package in development mode:
```bash
pip install -e .
```
3. Set up commit message template:
```bash
git config commit.template .gitmessage
```
3. Install pre-commit hooks (recommended):
4. Install pre-commit hooks:
```bash
pre-commit install --hook-type pre-commit --hook-type commit-msg
```
4. Run tests to ensure everything works:
5. Run tests:
```bash
pytest
```
5. Optionally run pre-commit on all files to check formatting:
6. Run code quality checks:
```bash
pre-commit run --all-files
```
### Package Structure
The project follows a modular architecture:
```
doi2dataset/
├── cli.py                    # Command-line interface
├── main.py                   # Main entry point
├── core/                     # Core components
│   ├── config.py             # Configuration management
│   ├── models.py             # Data models (Person, Institution, etc.)
│   └── metadata_fields.py    # Dataverse metadata field types
├── api/                      # External API integration
│   ├── client.py             # HTTP client for API requests
│   └── processors.py         # License and abstract processors
├── processing/               # Business logic
│   ├── citation.py           # Citation building
│   ├── metadata.py           # Metadata processing pipeline
│   └── utils.py              # Processing utilities
└── utils/                    # General utilities
    └── validation.py         # Validation functions
```
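
As a quick orientation, here is how the layout maps to Python imports. The `MetadataProcessor` import is shown in the usage examples above; the remaining names are assumptions based on the inline comments in the tree and may differ in detail:

```python
# Sketch only: names other than MetadataProcessor are assumed from the
# tree's inline comments above.
from doi2dataset import MetadataProcessor                # top-level re-export
from doi2dataset.core.models import Person, Institution  # data models (assumed)
from doi2dataset.core.config import Config               # configuration management (assumed)
```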
### Code Quality
- Follow the existing code style and formatting (enforced by pre-commit hooks)
- Follow the existing code style and formatting
- Write tests for new functionality
- Ensure all tests pass before submitting
- Use meaningful commit messages following the conventional commits format
- Pre-commit hooks will automatically validate commit messages and code formatting
- Run `python scripts/lint-commit.py` to manually validate commit messages
- Run `python scripts/lint-commit.py` to validate commit messages
## License

docs/source/contributing.rst

@@ -1,7 +1,67 @@
Contributing
============
We welcome contributions to **doi2dataset**! This guide provides information for developers who want to contribute to the project or build the documentation locally.
This guide provides information for developers who want to contribute to the project, understand the package architecture, or build the documentation locally.
Package Architecture
--------------------
**doi2dataset** has a modular architecture:
**Core Components (`core/`)**
- `config.py`: Configuration management with environment variable support
- `models.py`: Data models for Person, Institution, License, Abstract
- `metadata_fields.py`: Dataverse metadata field type definitions
**API Integration (`api/`)**
- `client.py`: HTTP client for external API requests
- `processors.py`: Processors for licenses and abstracts
**Processing Logic (`processing/`)**
- `citation.py`: Citation building from API data
- `metadata.py`: Metadata processing pipeline
- `utils.py`: Processing utilities (name processing, PI finding, subject mapping)
**Utilities (`utils/`)**
- `validation.py`: Validation functions for DOIs, emails, etc.
**User Interface**
- `cli.py`: Command-line interface implementation
- `main.py`: Entry point for the package
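
To see how these layers compose, here is a minimal sketch. The top-level ``MetadataProcessor`` import is documented in the README; the validation helper name below is an assumption based on the module descriptions above:

.. code-block:: python

   from pathlib import Path

   from doi2dataset import MetadataProcessor
   from doi2dataset.utils import validation  # DOI/email checks (names assumed)

   doi = "10.1038/nature12373"
   # Validate first (assumed helper name), then run the processing pipeline
   if validation.validate_doi(doi):
       MetadataProcessor(doi=doi, output_path=Path("metadata.json")).process()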
Development Setup
-----------------
1. Clone the repository and install:

   .. code-block:: bash

      git clone https://git.uni-due.de/cbm343e/doi2dataset.git
      cd doi2dataset
      pip install -e .
      pip install -r requirements-dev.txt

2. Set up development tools:

   .. code-block:: bash

      # Set up commit message template
      git config commit.template .gitmessage

      # Install pre-commit hooks
      pre-commit install --hook-type pre-commit --hook-type commit-msg

3. Run tests:

   .. code-block:: bash

      pytest

4. Run code quality checks:

   .. code-block:: bash

      pre-commit run --all-files
Building Documentation
----------------------

docs/source/installation.rst

@@ -5,13 +5,16 @@ There are several ways to install **doi2dataset**:
Using Git
---------
Clone the repository from GitHub by running the following commands in your terminal:
Clone the repository by running the following commands in your terminal:
.. code-block:: bash

   git clone https://git.uni-due.de/cbm343e/doi2dataset.git
   cd doi2dataset

   # Install in development mode
   pip install -e .
Using pip (if available)
-------------------------
You can also install **doi2dataset** via pip:
@@ -20,9 +23,37 @@ You can also install **doi2dataset** via pip:
pip install doi2dataset
Development Installation
------------------------
Install in editable mode for development:
.. code-block:: bash

   git clone https://git.uni-due.de/cbm343e/doi2dataset.git
   cd doi2dataset
   pip install -e .

   # Install development dependencies
   pip install -r requirements-dev.txt

   # Set up pre-commit hooks
   pre-commit install --hook-type pre-commit --hook-type commit-msg
Verification
------------
Check the installation:
.. code-block:: bash

   # Check console command
   doi2dataset --help

   # Or use module
   python -m doi2dataset.cli --help
Configuration
-------------
After installation, ensure that the tool is configured correctly.
Check the `config.yaml` file in the project root for necessary settings such as Dataverse connection details and PI information.
After installation, configure the tool by editing the `config.yaml` file in the project root.
Set Dataverse connection details and PI information as needed.
For more detailed instructions, please refer to the README file provided with the project.
See the README file for detailed configuration instructions.
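
As a quick sanity check that your configuration file parses, you can load it with PyYAML. This is only a sketch; the expected top-level keys depend on the ``config.yaml`` shipped with the repository:

.. code-block:: python

   # Requires PyYAML; prints the top-level keys (e.g. Dataverse and PI
   # settings) so you can compare them against the documented options.
   import yaml

   with open("config.yaml") as fh:
       cfg = yaml.safe_load(fh)

   print(sorted(cfg))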

docs/source/introduction.rst

@@ -1,8 +1,19 @@
Introduction
============
Welcome to the **doi2dataset** documentation. This guide provides an in-depth look at the tool, its purpose, and how it can help you generate standard citation metadata for Dataverse datasets.
**doi2dataset** is a Python tool that processes DOIs and generates metadata for Dataverse datasets.
The **doi2dataset** tool is aimed at researchers, data stewards, and developers who need to convert DOI-based metadata into a format compatible with Dataverse. It automates the retrieval of metadata from external sources (like OpenAlex and CrossRef) and generates standard Dataverse citation metadata blocks including title, authors, abstract, keywords, and funding information.
It retrieves metadata from external sources (OpenAlex and CrossRef) and generates Dataverse citation metadata blocks including title, authors, abstract, keywords, and funding information.
In the following sections, you'll learn about the installation process, usage examples, and a detailed API reference.
The package is organized into modules:
- `core/`: Configuration, data models, and metadata field definitions
- `api/`: HTTP client and API processors for external services
- `processing/`: Citation building and metadata processing logic
- `utils/`: Validation and utility functions
- `cli.py`: Command-line interface
- `main.py`: Entry point
The tool can be used as a command-line application or imported as a Python package.
The documentation covers installation, usage, and API reference.

docs/source/modules.rst

@@ -3,7 +3,113 @@ API Reference
This section contains the API reference generated from the source code docstrings.
Main Package
------------
.. automodule:: doi2dataset
   :members:
   :undoc-members:
   :show-inheritance:

Core Components
---------------

Configuration Management
~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: doi2dataset.core.config
   :members:
   :undoc-members:
   :show-inheritance:

Data Models
~~~~~~~~~~~

.. automodule:: doi2dataset.core.models
   :members:
   :undoc-members:
   :show-inheritance:

Metadata Fields
~~~~~~~~~~~~~~~

.. automodule:: doi2dataset.core.metadata_fields
   :members:
   :undoc-members:
   :show-inheritance:

API Integration
---------------

HTTP Client
~~~~~~~~~~~

.. automodule:: doi2dataset.api.client
   :members:
   :undoc-members:
   :show-inheritance:

API Processors
~~~~~~~~~~~~~~

.. automodule:: doi2dataset.api.processors
   :members:
   :undoc-members:
   :show-inheritance:

Processing Components
---------------------

Citation Building
~~~~~~~~~~~~~~~~~

.. automodule:: doi2dataset.processing.citation
   :members:
   :undoc-members:
   :show-inheritance:

Metadata Processing
~~~~~~~~~~~~~~~~~~~

.. automodule:: doi2dataset.processing.metadata
   :members:
   :undoc-members:
   :show-inheritance:

Processing Utilities
~~~~~~~~~~~~~~~~~~~~

.. automodule:: doi2dataset.processing.utils
   :members:
   :undoc-members:
   :show-inheritance:

Utilities
---------

Validation Functions
~~~~~~~~~~~~~~~~~~~~

.. automodule:: doi2dataset.utils.validation
   :members:
   :undoc-members:
   :show-inheritance:

Command Line Interface
----------------------

CLI Module
~~~~~~~~~~

.. automodule:: doi2dataset.cli
   :members:
   :undoc-members:
   :show-inheritance:

Main Entry Point
~~~~~~~~~~~~~~~~

.. automodule:: doi2dataset.main
   :members:
   :undoc-members:
   :show-inheritance:

docs/source/usage.rst

@@ -1,7 +1,7 @@
Usage
=====
Running **doi2dataset** is done from the command line. Below is an example of how to use the tool.
**doi2dataset** can be run from the command line or imported as a Python package.
Demo
----
@@ -11,13 +11,39 @@ Here's a demonstration of **doi2dataset** in action:
   :alt: doi2dataset demonstration
   :align: center
Basic Example
Usage Methods
-------------
To process one or more DOIs, run:
**doi2dataset** can be used in several ways:
**Method 1: Console Command**
.. code-block:: bash

   python doi2dataset.py 10.1234/doi1 10.5678/doi2
   doi2dataset 10.1234/doi1 10.5678/doi2

**Method 2: Python Module**

.. code-block:: bash

   # Use CLI module directly
   python -m doi2dataset.cli 10.1234/doi1 10.5678/doi2

   # Or use main module
   python -m doi2dataset.main 10.1234/doi1 10.5678/doi2

**Method 3: Python Import**

.. code-block:: python

   from doi2dataset import MetadataProcessor
   from pathlib import Path

   processor = MetadataProcessor(
       doi="10.1234/doi1",
       output_path=Path("metadata.json"),
       depositor="Your Name"
   )
   metadata = processor.process()
Command Line Options
--------------------
@@ -87,25 +113,68 @@ Example usage:
   export DATAVERSE_AUTH_PASSWORD="your-secure-password"

   # Run doi2dataset - it will use environment variables for credentials
   python doi2dataset.py 10.1234/example.doi
   doi2dataset 10.1234/example.doi

   # Or set them inline for a single run
   DATAVERSE_API_TOKEN="token" python doi2dataset.py 10.1234/example.doi
   DATAVERSE_API_TOKEN="token" doi2dataset 10.1234/example.doi
This approach allows you to:
- Keep sensitive credentials out of version control
- Use different configurations for different environments (dev, staging, production)
- Deploy the tool with secure environment-based configuration
- Use different configurations per environment
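
Conceptually, credential lookup can be thought of as "environment first, then ``config.yaml``". The snippet below is an illustrative sketch of that precedence, not the actual implementation in ``doi2dataset.core.config``:

.. code-block:: python

   import os

   def resolve_api_token(cfg: dict) -> str | None:
       # Environment variable takes precedence over the config file value
       # (assumed precedence; check core/config.py for the real behavior).
       return os.environ.get("DATAVERSE_API_TOKEN") or cfg.get("dataverse", {}).get("api_token")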
Usage Example with Configuration
----------------------------------
If you have configured your **config.yaml** and want to process DOIs from a file while uploading the metadata, you could run:
Usage Examples
---------------
Here are some practical examples of using **doi2dataset**:
**Process a single DOI:**

.. code-block:: bash

   python doi2dataset.py -f dois.txt -o output/ -d "Doe, John" -s "Medicine, Health and Life Sciences" -m "john.doe@example.com" -u -r
   doi2dataset 10.1038/nature12373

This command will use the options provided on the command line as well as the settings from **config.yaml**.

**Process multiple DOIs:**

For more details on usage and configuration, please refer to the rest of the documentation.

.. code-block:: bash

   doi2dataset 10.1038/nature12373 10.1126/science.1234567

**Process DOIs from a file with custom settings:**

.. code-block:: bash

   doi2dataset -f dois.txt -o output/ -d "Doe, John" -s "Medicine, Health and Life Sciences" -m "john.doe@example.com" -u -r

**Upload to Dataverse with ROR identifiers:**

.. code-block:: bash

   doi2dataset -u -r -m your.email@university.edu 10.1038/nature12373
Commands use options from the command line and settings from **config.yaml**.
Package Structure
-----------------
The **doi2dataset** package modules:
.. code-block:: text

   doi2dataset/
   ├── cli.py                    # Command-line interface
   ├── main.py                   # Main entry point
   ├── core/                     # Core components
   │   ├── config.py             # Configuration management
   │   ├── models.py             # Data models (Person, Institution, etc.)
   │   └── metadata_fields.py    # Dataverse metadata field types
   ├── api/                      # External API integration
   │   ├── client.py             # HTTP client for API requests
   │   └── processors.py         # License and abstract processors
   ├── processing/               # Business logic
   │   ├── citation.py           # Citation building
   │   ├── metadata.py           # Metadata processing pipeline
   │   └── utils.py              # Processing utilities
   └── utils/                    # General utilities
       └── validation.py         # Validation functions
See other documentation sections for more details.