A Python package to crawl OSC (Open Science Catalog) STAC catalogs and ingest them into an eoapi STAC API instance.
pip install .
osc-to-eoapi crawl [OPTIONS]
Options:
- --github-url TEXT: URL to the root OSC STAC catalog.json on GitHub. (Default: ESA OSC main branch)
- --eoapi-url TEXT: URL to the eoapi STAC API instance. (Default: http://localhost:8080)
- --update: If a collection/item already exists (409), attempt to update it with a PUT request.
- --overwrite: Force overwrite by deleting existing collections/items before ingestion.
- --reset-db: Clear all collections from the target STAC API before starting the crawl.
- --test-endpoint: Perform a health check on the STAC API before starting.
- --crawl-external: Enable recursive crawling of external STAC links found in the catalog. Includes cycle detection to prevent infinite loops.
- --kb-cache TEXT: Path to a local JSON file that caches the taxonomies (variables, projects, etc.) to significantly speed up subsequent runs. To disable caching, set it to an empty string. (Default: kb_cache.json)
- --debug: Enable verbose debug logging to trace the recursive traversal of external catalogs and item discovery. Useful for identifying bottlenecks or infinite loops in remote datasets.
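The interaction between --update and --overwrite described above can be sketched roughly as follows. This is an illustrative sketch, not the package's actual code: the function names `http_send` and `ingest_collection` and the injectable `send` parameter are hypothetical, and the routes assume the STAC Transaction extension layout (POST /collections, PUT and DELETE on /collections/{id}).

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

# Hypothetical transport signature: (method, url, body) -> HTTP status code.
Sender = Callable[[str, str, Optional[dict]], int]


def http_send(method: str, url: str, body: Optional[dict] = None) -> int:
    """Send a JSON request with the standard library and return the status code."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the caller only needs the status code.
        return err.code


def ingest_collection(
    api_url: str,
    collection: dict,
    update: bool = False,
    overwrite: bool = False,
    send: Sender = http_send,
) -> int:
    """Create a collection, optionally updating or overwriting an existing one."""
    base = f"{api_url.rstrip('/')}/collections"
    existing = f"{base}/{collection['id']}"

    if overwrite:
        # Delete first; a 404 here just means there was nothing to remove.
        send("DELETE", existing, None)

    status = send("POST", base, collection)
    if status == 409 and update:
        # The collection already exists: replace it in place with a PUT.
        status = send("PUT", existing, collection)
    return status
```

Passing the transport as a parameter keeps the decision logic testable without a running API; in real use the default `http_send` would talk to the instance given by --eoapi-url.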
To enable filtering by the custom OSC properties (e.g., osc:project, kb:variable:title), load the queryables schema:
osc-to-eoapi load-queryables
You can also provide a custom schema:
osc-to-eoapi load-queryables --schema ./my-schema.json
- Ensure you have build and twine installed:
pip install build twine
- Build the package:
python -m build
- Upload to PyPI:
python -m twine upload dist/*
- Create and activate a Python virtual environment:
python -m venv venv
source venv/bin/activate  # On Linux/macOS
# OR: venv\Scripts\activate  # On Windows
- Install in editable mode with dependencies:
pip install -e .
A docker-compose.yml is provided to easily spin up a local PgSTAC database and eoapi STAC API instance for testing.
- Start the local infrastructure:
docker compose up -d
- Export local database credentials (required for pypgstac to load queryables). A .env file is provided, which you can source directly:
set -a; source .env; set +a
- Load the custom queryables into the local database:
osc-to-eoapi load-queryables
- Run the crawler against the local API (running on http://localhost:8080):
osc-to-eoapi crawl --test-endpoint --reset-db --eoapi-url http://localhost:8080
(Add --crawl-external if you want to test recursive external link crawling.)
- Tear down the local infrastructure and wipe test data when finished:
docker compose down -v
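When testing --crawl-external, it can help to have a mental model of how cycle detection keeps a recursive crawl from looping. The sketch below shows one common approach, a visited set over normalized URLs; it is an assumption about the general technique, not a description of how osc-to-eoapi implements it. The names `crawl` and `fetch_links` are hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urldefrag, urljoin


def crawl(root_url: str, fetch_links: Callable[[str], Iterable[str]]) -> List[str]:
    """Visit every catalog reachable from root_url exactly once.

    fetch_links is a hypothetical callback that returns the hrefs
    (absolute or relative) found in the document at a given URL.
    """
    visited = set()
    order = []
    queue = deque([root_url])

    while queue:
        # Drop any #fragment so the same document is not visited twice.
        url, _fragment = urldefrag(queue.popleft())
        if url in visited:
            # Already seen: this is where a cycle gets cut.
            continue
        visited.add(url)
        order.append(url)

        for href in fetch_links(url):
            # Resolve relative links against the current document.
            queue.append(urljoin(url, href))

    return order
```

With a link graph in which two catalogs point back at the root, each URL is still returned only once, because a URL is marked as visited before its links are followed.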