feat!: generalize script by removing organizational metadata
All checks were successful
Test pipeline / test (push) Successful in 14s
Remove Phase class, organizational metadata blocks, and unused project fields. Update configuration to use 'default_grants' and simplify PI usage to fallback corresponding author determination only.

BREAKING CHANGES:
- Remove 'phase' and 'project' fields from configuration
- Use 'default_grants' instead of 'default_grant'
- Generate only standard Dataverse citation metadata
parent 01bc537bd8
commit 67b46d5140
11 changed files with 207 additions and 269 deletions
80
README.md
@@ -8,7 +8,14 @@

- **DOI Validation and Normalization:** Validates DOIs and converts them into a standardized format.
- **Metadata Retrieval:** Fetches metadata such as title, abstract, license, and author information from external sources.
- **Metadata Mapping:** Automatically maps and generates metadata fields (e.g., title, description, keywords) including support for controlled vocabularies and compound fields.
- **Standard Dataverse Metadata:** Generates standard Dataverse citation metadata including:
  - Title, publication date, and alternative URL
  - Author information with affiliations and ORCID identifiers
  - Dataset contact information (corresponding authors)
  - Abstract and description
  - Keywords and subject classification
  - Grant/funding information
  - License information when available
- **Optional Upload:** Allows uploading of metadata directly to a Dataverse.org server.
- **Progress Tracking:** Uses the Rich library for user-friendly progress tracking and error handling.
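
As an illustration of the DOI validation and normalization feature listed above, here is a minimal sketch. The function name `normalize_doi`, the list of accepted prefixes, and the choice of lowercase as the standardized form are assumptions made for this example and are not taken from the tool's source. The pattern is the one Crossref recommends for matching modern DOIs.

```python
import re

# Crossref-recommended pattern for modern DOIs, matched case-insensitively.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Resolver and scheme prefixes that this sketch strips (an assumed list).
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Return the bare, lowercase DOI, or raise ValueError if it is invalid."""
    candidate = raw.strip()
    lowered = candidate.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            # Slice the original string so only the prefix is dropped.
            candidate = candidate[len(prefix):]
            break
    if not _DOI_PATTERN.match(candidate):
        raise ValueError(f"Not a valid DOI: {raw!r}")
    # DOIs are case-insensitive, so lowercase is used as the canonical form here.
    return candidate.lower()
```

With this sketch, `normalize_doi("https://doi.org/10.1038/SREP45389")` and `normalize_doi("doi:10.1038/srep45389")` yield the same string, while an input such as `"not-a-doi"` raises `ValueError`.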

@@ -23,14 +30,41 @@ cd doi2dataset

## Configuration

Configuration

Before running the tool, configure the necessary settings in the `config.yaml` file located in the project root. This file contains configuration details such as:

- Connection details (URL, API token, authentication credentials)
- Mapping of project phases
- Principal Investigator (PI) information
- Default grant configurations
- **Connection details**: URL, API token, and authentication credentials for the Dataverse server
- **Principal Investigator (PI) information**: Optional; used as a fallback to determine corresponding authors when the publication does not specify them
- **Default grant configurations**: Funding information to be included in the metadata (supports multiple grants)

### Configuration File Structure

The configuration file should follow this structure:

```yaml
# Dataverse server connection details
dataverse:
  url: "https://your-dataverse-instance.org"
  api_token: "your-api-token"

# Default grant information (supports multiple grants)
default_grants:
  - funder: "Your Funding Agency"
    id: "GRANT123456"
  - funder: "Another Funding Agency"
    id: "GRANT789012"

# Principal investigators for fallback corresponding author determination (optional)
pis:
  - family_name: "Doe"
    given_name: "John"
    orcid: "0000-0000-0000-0000"
    email: "john.doe@university.edu"
    affiliation: "Department of Science, University"
```

See `config_example.yaml` for a complete example configuration.

**Note**: The PI section is optional. If no corresponding authors are found in the publication metadata and no PIs are configured, the tool will still generate metadata but may issue a warning about missing corresponding author information.
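
The fallback described in the note can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's implementation: the `Author` dataclass, the `corresponding_authors` function, and the rule of matching authors to configured PIs by ORCID are hypothetical names and behavior chosen for the example. The PI entries use the same keys as the `pis` section of the configuration.

```python
import warnings
from dataclasses import dataclass


@dataclass
class Author:
    """Hypothetical author record built from publication metadata."""
    given_name: str
    family_name: str
    orcid: str = ""
    is_corresponding: bool = False


# Mirrors the `pis` section of the configuration file as parsed data.
EXAMPLE_PIS = [
    {
        "family_name": "Doe",
        "given_name": "John",
        "orcid": "0000-0000-0000-0000",
        "email": "john.doe@university.edu",
        "affiliation": "Department of Science, University",
    }
]


def corresponding_authors(authors, pis):
    """Prefer explicitly marked authors; otherwise fall back to configured PIs."""
    explicit = [a for a in authors if a.is_corresponding]
    if explicit:
        return explicit
    # Assumed fallback rule: an author counts as a PI if the ORCIDs match.
    pi_orcids = {pi["orcid"] for pi in pis if pi.get("orcid")}
    fallback = [a for a in authors if a.orcid and a.orcid in pi_orcids]
    if not fallback:
        warnings.warn("No corresponding author found and no configured PI matched.")
    return fallback
```

In this sketch an empty `pis` list is valid input: the function returns an empty list and emits a warning, which matches the behavior the note describes.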

## Usage

@@ -102,11 +136,13 @@ pytest --cov=. --cov-report=html
This creates an `htmlcov` directory. Open `htmlcov/index.html` in a browser to view the detailed coverage report.

A `.coveragerc` configuration file is provided that:

- Excludes test files, documentation, and boilerplate code from coverage analysis
- Configures reporting to ignore common non-testable lines (like defensive imports)
- Sets the output directory for HTML reports

Recent improvements have increased coverage from 48% to 61% by adding focused tests for:

- Citation building functionality
- License processing and validation
- Metadata field extraction

@@ -114,6 +150,7 @@ Recent improvements have increased coverage from 48% to 61% by adding focused te
- Publication data parsing and validation

Areas that could benefit from additional testing:

- More edge cases in the MetadataProcessor class workflow
- Additional CitationBuilder scenarios with diverse inputs
- Complex network interactions and error handling

@@ -122,7 +159,7 @@ Areas that could benefit from additional testing:

The test suite is organized into six main files:

1. **test_doi2dataset.py**: Basic tests for core functions like phase checking, name splitting and DOI validation.
1. **test_doi2dataset.py**: Basic tests for core functions like name splitting, DOI validation, and filename sanitization.
2. **test_fetch_doi_mock.py**: Tests API interactions using a mock OpenAlex response stored in `srep45389.json`.
3. **test_citation_builder.py**: Tests for building citation metadata from API responses.
4. **test_metadata_processor.py**: Tests for the metadata processing workflow.

@@ -136,7 +173,6 @@ The test suite covers the following categories of functionality:
#### Core Functionality Tests

- **DOI Validation and Processing**: Parameterized tests for DOI normalization, validation, and filename sanitization with various inputs.
- **Phase Management**: Tests for checking publication year against defined project phases, including boundary cases.
- **Name Processing**: Extensive tests for parsing and splitting author names in different formats (with/without commas, middle initials, etc.).
- **Email Validation**: Tests for proper validation of email addresses with various domain configurations.
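
To make the name-processing cases above concrete, here is a small sketch of the kinds of inputs such tests might cover. `split_name` is a hypothetical stand-in written for this example, not the project's function, and its rule for names without a comma (last token is the family name) is an assumption.

```python
from __future__ import annotations


def split_name(full_name: str) -> tuple[str, str]:
    """Return (given_name, family_name) for a single author name."""
    cleaned = " ".join(full_name.split())  # collapse repeated whitespace
    if "," in cleaned:
        # "Family, Given" form: text before the first comma is the family name.
        family, given = (part.strip() for part in cleaned.split(",", 1))
        return given, family
    # "Given [Middle] Family" form: treat the last token as the family name.
    parts = cleaned.rsplit(" ", 1)
    if len(parts) == 1:
        return "", parts[0]
    return parts[0], parts[1]


# Table of (input, expected) pairs, in the spirit of a parameterized test.
CASES = [
    ("Doe, John A.", ("John A.", "Doe")),
    ("John A. Doe", ("John A.", "Doe")),
    ("  Jane   Roe ", ("Jane", "Roe")),
    ("Plato", ("", "Plato")),
]

for raw, expected in CASES:
    assert split_name(raw) == expected, raw
```

A table of cases like `CASES` maps directly onto `pytest.mark.parametrize`, which keeps each input format visible as its own test case.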

@@ -151,12 +187,36 @@ The test suite covers the following categories of functionality:

- **Citation Building**: Tests for properly building citation metadata from API responses.
- **License Processing**: Tests for correctly identifying and formatting license information from various license IDs.
- **Principal Investigator Matching**: Tests for finding project PIs based on ORCID identifiers.
- **Principal Investigator Matching**: Tests for finding project PIs based on ORCID identifiers (used for fallback corresponding author determination).
- **Configuration Loading**: Tests for properly loading and validating configuration from files.
- **Metadata Workflow**: Tests for the complete metadata processing workflow.

These tests ensure that all components work correctly in isolation and together as a system, with special attention to edge cases and error handling.

## Changelog

### Version 0.2.0 - Generalization Update

This version generalizes the tool so that it suits a broader range of use cases:

**Breaking Changes:**

- Removed organization-specific metadata blocks (project phases, organizational fields)
- Removed `Phase` class and phase-related configuration
- Simplified configuration structure

**What's New:**

- Streamlined metadata generation focusing on standard Dataverse citation metadata
- Reduced configuration requirements for easier adoption
- Maintained PI information support for corresponding author fallback functionality

**Migration Guide:**

- Remove the `phase` section from your configuration file
- The tool will now generate only standard citation metadata blocks
- PI information is still supported and used for fallback corresponding author determination
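
The migration can be sketched as a transformation of the parsed configuration. The commit message for this change also lists the removal of the `project` field and the rename of `default_grant` to `default_grants`, so the sketch covers those as well. `migrate_config` is a hypothetical helper, and two details are assumptions: that `phase`, `project`, and `default_grant` are top-level keys, and that an old `default_grant` may be either a single mapping or a list.

```python
def migrate_config(old: dict) -> dict:
    """Return a copy of an old-style config adapted to the new layout."""
    # Drop the removed fields; keep every other key unchanged.
    removed = ("phase", "project", "default_grant")
    new = {key: value for key, value in old.items() if key not in removed}
    # Carry an old single-grant setting over unless default_grants already exists.
    if "default_grant" in old and "default_grants" not in new:
        grant = old["default_grant"]
        new["default_grants"] = grant if isinstance(grant, list) else [grant]
    return new


old_config = {
    "dataverse": {"url": "https://your-dataverse-instance.org", "api_token": "your-api-token"},
    "phase": {"example": "value"},  # placeholder content, shape unknown
    "project": "example",           # placeholder content, shape unknown
    "default_grant": {"funder": "Your Funding Agency", "id": "GRANT123456"},
}
new_config = migrate_config(old_config)
```

The function returns a new dictionary and leaves the input untouched, so the old configuration can be kept as a backup while the migrated one is written out and reviewed.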

## Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your improvements.