Update testing documentation and improve test structure

Alexander Minges 2025-05-20 15:17:18 +02:00
parent 1c84cae93b
commit eb270cba9b
Signed by: Athemis
SSH key fingerprint: SHA256:TUXshgulbwL+FRYvBNo54pCsI0auROsSEgSvueKbkZ4
9 changed files with 617 additions and 20 deletions


@@ -69,30 +69,34 @@ Documentation is generated using Sphinx. See the `docs/` directory for detailed
## Testing
### Running Tests
Tests are implemented with pytest. The test suite provides comprehensive coverage of core functionalities. To run the tests, execute:
```bash
pytest
```
Or using the Python module syntax:
```bash
python -m pytest
```
### Code Coverage
The project includes code coverage analysis using pytest-cov. Current coverage is approximately 61% of the codebase, with key utilities and test infrastructure at 99-100% coverage.
To run tests with code coverage analysis:
```bash
pytest --cov=.
```
Generate a detailed HTML coverage report:
```bash
pytest --cov=. --cov-report=html
```
This creates a `htmlcov` directory. Open `htmlcov/index.html` in a browser to view the detailed coverage report.
@@ -102,38 +106,56 @@ A `.coveragerc` configuration file is provided that:
- Configures reporting to ignore common non-testable lines (like defensive imports)
- Sets the output directory for HTML reports
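For orientation, a minimal `.coveragerc` along these lines might look like the following sketch (illustrative values only, not necessarily the project's actual configuration):
```ini
[run]
# Measure everything except tests and documentation
source = .
omit =
    tests/*
    docs/*

[report]
# Ignore common non-testable lines such as defensive imports
exclude_lines =
    pragma: no cover
    except ImportError

[html]
# Output directory for the HTML report
directory = htmlcov
```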
Recent improvements have increased coverage from 48% to 61% by adding focused tests for:
- Citation building functionality
- License processing and validation
- Metadata field extraction
- OpenAlex integration
- Publication data parsing and validation
Areas that could benefit from additional testing:
- More edge cases in the MetadataProcessor class workflow
- Additional CitationBuilder scenarios with diverse inputs
- Complex network interactions and error handling
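As a sketch of the error-handling style of test from the last point, a failed OpenAlex request can be simulated without any network access. The `fetch_openalex_data` helper below is a hypothetical stand-in, not the project's actual API:
```python
from unittest.mock import patch

import requests


def fetch_openalex_data(doi: str) -> dict | None:
    """Hypothetical stand-in for an OpenAlex fetch helper."""
    try:
        response = requests.get(f"https://api.openalex.org/works/doi:{doi}", timeout=10)
        response.raise_for_status()
        return response.json()
    except requests.RequestException:
        return None


def test_fetch_returns_none_on_network_error():
    # Simulate a connection failure instead of hitting the real API
    with patch("requests.get", side_effect=requests.ConnectionError("offline")):
        assert fetch_openalex_data("10.1038/srep45389") is None
```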
### Test Structure
The test suite is organized into six main files:
1. **test_doi2dataset.py**: Basic tests for core functions like phase checking, name splitting and DOI validation.
2. **test_fetch_doi_mock.py**: Tests API interactions using a mock OpenAlex response stored in `srep45389.json`.
3. **test_citation_builder.py**: Tests for building citation metadata from API responses.
4. **test_metadata_processor.py**: Tests for the metadata processing workflow.
5. **test_license_processor.py**: Tests for license processing and validation.
6. **test_publication_utils.py**: Tests for publication year extraction and date handling.
### Test Categories
The test suite covers the following categories of functionality:
#### Core Functionality Tests
- **DOI Validation and Processing**: Parameterized tests for DOI normalization, validation, and filename sanitization with various inputs (see the sketch after this list).
- **Phase Management**: Tests for checking publication year against defined project phases, including boundary cases.
- **Name Processing**: Extensive tests for parsing and splitting author names in different formats (with/without commas, middle initials, etc.).
- **Email Validation**: Tests for proper validation of email addresses with various domain configurations.
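For illustration, a parameterized DOI test in this spirit could look like the sketch below; the `normalize_doi` helper is invented for the example and is not necessarily the function used in `doi2dataset`:
```python
import re

import pytest


def normalize_doi(doi: str) -> str | None:
    """Hypothetical normalizer: strip common prefixes and validate DOI syntax."""
    cleaned = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    return cleaned.lower() if re.fullmatch(r"10\.\d{4,9}/\S+", cleaned) else None


@pytest.mark.parametrize(
    ("raw", "expected"),
    [
        ("10.1038/srep45389", "10.1038/srep45389"),
        ("https://doi.org/10.1038/srep45389", "10.1038/srep45389"),
        ("doi:10.1038/srep45389", "10.1038/srep45389"),
        ("not-a-doi", None),
    ],
)
def test_normalize_doi(raw, expected):
    assert normalize_doi(raw) == expected
```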
#### API Integration Tests
- **Mock API Responses**: Tests that use a saved OpenAlex API response (`srep45389.json`) to simulate API interactions without making actual network requests (see the sketch after this list).
- **Data Fetching**: Tests for retrieving and parsing data from the OpenAlex API.
- **Abstract Extraction**: Tests for extracting and cleaning abstracts from OpenAlex's inverted index format, including handling of empty or malformed abstracts.
- **Subject Mapping**: Tests for mapping OpenAlex topics to controlled vocabulary subject terms.
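A minimal sketch of how such a mock-based test can be wired up with pytest is shown below; the fixture path and patch target are assumptions for illustration rather than the exact setup used in `test_fetch_doi_mock.py`:
```python
import json
from pathlib import Path
from unittest.mock import MagicMock, patch

import pytest
import requests

# Assumed location of the saved OpenAlex response next to the tests
FIXTURE = Path(__file__).parent / "srep45389.json"


@pytest.fixture
def openalex_response():
    """Load the saved OpenAlex API response for DOI 10.1038/srep45389."""
    with FIXTURE.open(encoding="utf-8") as fh:
        return json.load(fh)


def test_api_call_uses_canned_response(openalex_response):
    # Replace the real HTTP call with the saved payload so no network access happens
    fake = MagicMock(status_code=200)
    fake.json.return_value = openalex_response
    with patch("requests.get", return_value=fake) as mocked_get:
        data = requests.get("https://api.openalex.org/works/doi:10.1038/srep45389").json()
    mocked_get.assert_called_once()
    assert data == openalex_response
```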
#### Metadata Processing Tests
- **Citation Building**: Tests for properly building citation metadata from API responses.
- **License Processing**: Tests for correctly identifying and formatting license information from various license IDs (see the sketch after this list).
- **Principal Investigator Matching**: Tests for finding project PIs based on ORCID identifiers.
- **Configuration Loading**: Tests for properly loading and validating configuration from files.
- **Metadata Workflow**: Tests for the complete metadata processing workflow.
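As a rough illustration of the license-mapping tests mentioned above, the sketch below uses an invented `map_license` helper and mapping table; the real `LicenseProcessor` may resolve licenses differently:
```python
import pytest

# Invented mapping for illustration; the real LicenseProcessor may differ
KNOWN_LICENSES = {
    "cc-by": ("CC BY 4.0", "https://creativecommons.org/licenses/by/4.0/"),
    "cc-by-sa": ("CC BY-SA 4.0", "https://creativecommons.org/licenses/by-sa/4.0/"),
    "cc0": ("CC0 1.0", "https://creativecommons.org/publicdomain/zero/1.0/"),
}


def map_license(license_id: str | None) -> tuple[str, str] | None:
    """Hypothetical helper: resolve an OpenAlex license ID to a (name, URI) pair."""
    if not license_id:
        return None
    return KNOWN_LICENSES.get(license_id.lower())


@pytest.mark.parametrize(
    ("license_id", "expected_name"),
    [
        ("cc-by", "CC BY 4.0"),
        ("CC0", "CC0 1.0"),
        ("unknown-license", None),
        (None, None),
    ],
)
def test_map_license(license_id, expected_name):
    result = map_license(license_id)
    assert (result[0] if result else None) == expected_name
```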
These tests ensure that all components work correctly in isolation and together as a system, with special attention to edge cases and error handling.
## Contributing