API Reference
Modules
MELD: A multilingual and multi-domain dataset for named entity recognition.
Supports fully reproducible downloading, processing, and format normalization of NER datasets.
local_dataset_names(data_directory)
Yields names of datasets located in the specified directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data_directory` | `AnyPath` | Path to the directory containing dataset folders. | required |
Yields: Name of each dataset.
Source code in meld/formats.py
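Because the function yields names rather than returning a list, callers typically loop over it or wrap it in `list(...)`. The sketch below illustrates that pattern with a hypothetical stand-in, `sketch_local_dataset_names`, which assumes that every sub-folder of `data_directory` counts as a dataset. It is not MELD's implementation, and the real function may apply different rules.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def sketch_local_dataset_names(data_directory: Path) -> Iterator[str]:
    """Hypothetical stand-in: yield the name of each sub-folder, in sorted order.

    Assumption: any directory directly under data_directory is a dataset.
    """
    for entry in sorted(Path(data_directory).iterdir()):
        if entry.is_dir():
            yield entry.name


# Build a throwaway directory with two dataset folders and one stray file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "dataset_a").mkdir()
    (root / "dataset_b").mkdir()
    (root / "notes.txt").write_text("not a dataset")

    # Materialize the generator while the directory still exists.
    names = list(sketch_local_dataset_names(root))

print(names)
```

The generator is consumed inside the `with` block because it reads the directory lazily; iterating after the temporary directory is removed would fail.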
local_datasets(data_directory)
Yields all locally downloaded datasets within a given directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data_directory` | `AnyPath` | Path to the directory containing dataset metadata files. | required |
Yields: A Dataset instance for each locally downloaded dataset.
Source code in meld/formats.py
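As a rough illustration of yielding dataset objects from metadata files, the sketch below uses hypothetical stand-ins throughout: the `DatasetSketch` dataclass, the `metadata.json` file name, and the `language` field are all assumptions made for this example and are not taken from MELD. The real `Dataset` class and metadata layout may differ.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class DatasetSketch:
    """Hypothetical stand-in for a dataset object (not MELD's Dataset)."""
    name: str
    language: str


def sketch_local_datasets(data_directory: Path) -> Iterator[DatasetSketch]:
    """Yield one DatasetSketch per sub-folder that holds a metadata file.

    Assumption: metadata lives in <data_directory>/<name>/metadata.json.
    """
    for metadata_path in sorted(Path(data_directory).glob("*/metadata.json")):
        metadata = json.loads(metadata_path.read_text())
        yield DatasetSketch(
            name=metadata_path.parent.name,
            language=metadata["language"],
        )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # One folder with metadata, one without (the latter is skipped).
    (root / "dataset_a").mkdir()
    (root / "dataset_a" / "metadata.json").write_text(json.dumps({"language": "en"}))
    (root / "incomplete").mkdir()

    datasets = list(sketch_local_datasets(root))

for dataset in datasets:
    print(dataset.name, dataset.language)
```

In this sketch a folder without a metadata file is skipped silently; whether the real function skips, warns, or raises in that case is not stated in this reference.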
This documentation is organized into the following modules:
- Data - Data handling utilities
- Data Pipes - Data pipeline operations
- Tokenization - Tokenization functionality
- Formats - File format support
- CONLL - CoNLL format handling
- Readers - Dataset readers
- Data Stats - Data statistics
- Manifest - Manifest management
- Download - Download utilities
- Dataset Conversion - Conversion between MELD dataset formats
- Registry - Base class implementing the registry pattern, used for example to register readers and CoNLL dialects