Update testing documentation and improve test structure

Alexander Minges 2025-05-20 15:17:18 +02:00
parent 1c84cae93b
commit eb270cba9b
Signed by: Athemis
SSH key fingerprint: SHA256:TUXshgulbwL+FRYvBNo54pCsI0auROsSEgSvueKbkZ4
9 changed files with 617 additions and 20 deletions


@@ -69,30 +69,34 @@ Documentation is generated using Sphinx. See the `docs/` directory for detailed
## Testing
### Running Tests
Tests are implemented with pytest. The test suite provides comprehensive coverage of core functionalities. To run the tests, execute:
```bash
pytest
```
Or using the Python module syntax:
```bash
python -m pytest
```
### Code Coverage
The project includes code coverage analysis using pytest-cov. Current coverage is approximately 61% of the codebase, with key utilities and test infrastructure at 99-100% coverage.
To run tests with code coverage analysis:
```bash
pytest --cov=.
```
Generate a detailed HTML coverage report:
```bash
pytest --cov=. --cov-report=html
```
This creates a `htmlcov` directory. Open `htmlcov/index.html` in a browser to view the detailed coverage report.
@@ -102,38 +106,56 @@ A `.coveragerc` configuration file is provided that:
- Configures reporting to ignore common non-testable lines (like defensive imports)
- Sets the output directory for HTML reports
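For orientation, a minimal `.coveragerc` along these lines might look like the following sketch (illustrative values only, not necessarily the project's actual configuration):
```ini
[run]
# Measure everything except tests and documentation
source = .
omit =
    tests/*
    docs/*

[report]
# Ignore common non-testable lines such as defensive imports
exclude_lines =
    pragma: no cover
    except ImportError

[html]
# Output directory for the HTML report
directory = htmlcov
```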
Recent improvements have increased coverage from 48% to 61% by adding focused tests for:
- Citation building functionality
- License processing and validation
- Metadata field extraction
- OpenAlex integration
- Publication data parsing and validation
Areas that could benefit from additional testing:
- More edge cases in the MetadataProcessor class workflow
- Additional CitationBuilder scenarios with diverse inputs
- Complex network interactions and error handling
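As a sketch of the error-handling style of test from the last point, a failed OpenAlex request can be simulated without any network access. The `fetch_openalex_data` helper below is a hypothetical stand-in, not the project's actual API:
```python
from unittest.mock import patch

import requests


def fetch_openalex_data(doi: str) -> dict | None:
    """Hypothetical stand-in for an OpenAlex fetch helper."""
    try:
        response = requests.get(f"https://api.openalex.org/works/doi:{doi}", timeout=10)
        response.raise_for_status()
        return response.json()
    except requests.RequestException:
        return None


def test_fetch_returns_none_on_network_error():
    # Simulate a connection failure instead of hitting the real API
    with patch("requests.get", side_effect=requests.ConnectionError("offline")):
        assert fetch_openalex_data("10.1038/srep45389") is None
```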
### Test Structure
The test suite is organized into six main files:
1. **test_doi2dataset.py**: Basic tests for core functions like phase checking, name splitting and DOI validation.
2. **test_fetch_doi_mock.py**: Tests API interactions using a mock OpenAlex response stored in `srep45389.json`.
3. **test_citation_builder.py**: Tests for building citation metadata from API responses.
4. **test_metadata_processor.py**: Tests for the metadata processing workflow.
5. **test_license_processor.py**: Tests for license processing and validation.
6. **test_publication_utils.py**: Tests for publication year extraction and date handling.
### Test Categories
The test suite covers the following categories of functionality:
#### Core Functionality Tests
- **DOI Validation and Processing**: Parameterized tests for DOI normalization, validation, and filename sanitization with various inputs (see the sketch after this list).
- **Phase Management**: Tests for checking publication year against defined project phases, including boundary cases.
- **Name Processing**: Extensive tests for parsing and splitting author names in different formats (with/without commas, middle initials, etc.).
- **Email Validation**: Tests for proper validation of email addresses with various domain configurations.
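For illustration, a parameterized DOI test in this spirit could look like the sketch below; the `normalize_doi` helper is invented for the example and is not necessarily the function used in `doi2dataset`:
```python
import re

import pytest


def normalize_doi(doi: str) -> str | None:
    """Hypothetical normalizer: strip common prefixes and validate DOI syntax."""
    cleaned = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    return cleaned.lower() if re.fullmatch(r"10\.\d{4,9}/\S+", cleaned) else None


@pytest.mark.parametrize(
    ("raw", "expected"),
    [
        ("10.1038/srep45389", "10.1038/srep45389"),
        ("https://doi.org/10.1038/srep45389", "10.1038/srep45389"),
        ("doi:10.1038/srep45389", "10.1038/srep45389"),
        ("not-a-doi", None),
    ],
)
def test_normalize_doi(raw, expected):
    assert normalize_doi(raw) == expected
```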
#### API Integration Tests
- **Mock API Responses**: Tests that use a saved OpenAlex API response (`srep45389.json`) to simulate API interactions without making actual network requests (see the sketch after this list).
- **Data Fetching**: Tests for retrieving and parsing data from the OpenAlex API.
- **Abstract Extraction**: Tests for extracting and cleaning abstracts from OpenAlex's inverted index format, including handling of empty or malformed abstracts.
- **Subject Mapping**: Tests for mapping OpenAlex topics to controlled vocabulary subject terms.
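A minimal sketch of how such a mock-based test can be wired up with pytest is shown below; the fixture path and patch target are assumptions for illustration rather than the exact setup used in `test_fetch_doi_mock.py`:
```python
import json
from pathlib import Path
from unittest.mock import MagicMock, patch

import pytest
import requests

# Assumed location of the saved OpenAlex response next to the tests
FIXTURE = Path(__file__).parent / "srep45389.json"


@pytest.fixture
def openalex_response():
    """Load the saved OpenAlex API response for DOI 10.1038/srep45389."""
    with FIXTURE.open(encoding="utf-8") as fh:
        return json.load(fh)


def test_api_call_uses_canned_response(openalex_response):
    # Replace the real HTTP call with the saved payload so no network access happens
    fake = MagicMock(status_code=200)
    fake.json.return_value = openalex_response
    with patch("requests.get", return_value=fake) as mocked_get:
        data = requests.get("https://api.openalex.org/works/doi:10.1038/srep45389").json()
    mocked_get.assert_called_once()
    assert data == openalex_response
```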
#### Metadata Processing Tests
- **Citation Building**: Tests for properly building citation metadata from API responses.
- **License Processing**: Tests for correctly identifying and formatting license information from various license IDs (see the sketch after this list).
- **Principal Investigator Matching**: Tests for finding project PIs based on ORCID identifiers.
- **Configuration Loading**: Tests for properly loading and validating configuration from files.
- **Metadata Workflow**: Tests for the complete metadata processing workflow.
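As a rough illustration of the license-mapping tests mentioned above, the sketch below uses an invented `map_license` helper and mapping table; the real `LicenseProcessor` may resolve licenses differently:
```python
import pytest

# Invented mapping for illustration; the real LicenseProcessor may differ
KNOWN_LICENSES = {
    "cc-by": ("CC BY 4.0", "https://creativecommons.org/licenses/by/4.0/"),
    "cc-by-sa": ("CC BY-SA 4.0", "https://creativecommons.org/licenses/by-sa/4.0/"),
    "cc0": ("CC0 1.0", "https://creativecommons.org/publicdomain/zero/1.0/"),
}


def map_license(license_id: str | None) -> tuple[str, str] | None:
    """Hypothetical helper: resolve an OpenAlex license ID to a (name, URI) pair."""
    if not license_id:
        return None
    return KNOWN_LICENSES.get(license_id.lower())


@pytest.mark.parametrize(
    ("license_id", "expected_name"),
    [
        ("cc-by", "CC BY 4.0"),
        ("CC0", "CC0 1.0"),
        ("unknown-license", None),
        (None, None),
    ],
)
def test_map_license(license_id, expected_name):
    result = map_license(license_id)
    assert (result[0] if result else None) == expected_name
```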
These tests ensure that all components work correctly in isolation and together as a system, with special attention to edge cases and error handling.
## Contributing