API Reference¶
DeepSearch Toolkit
artifacts
¶
artifact_manager
¶
ARTF_META_FILENAME = os.getenv('DEEPSEARCH_ARTIFACT_META_FILENAME', default='meta.info')
module-attribute
¶
ARTF_META_URL_FIELD = os.getenv('DEEPSEARCH_ARTIFACT_URL_FIELD', default='static_url')
module-attribute
¶
DFLT_ARTFCT_CACHE_DIR = os.getenv('DEEPSEARCH_ARTIFACT_CACHE', default=Path(platformdirs.user_cache_dir('deepsearch', 'ibm')) / 'artifact_cache')
module-attribute
¶
DFLT_ARTFCT_INDEX_DIR = os.getenv('DEEPSEARCH_ARTIFACT_INDEX', default=os.getcwd())
module-attribute
¶
ArtifactManager
¶
HitStrategy
¶
__init__(index=None, cache=None)
¶
download_artifact_to_cache(artifact_name, unpack_archives=True, hit_strategy=HitStrategy.OVERWRITE, with_progress_bar=False)
¶
get_artifact_path_in_cache(artifact_name)
¶
get_artifacts_in_cache()
¶
get_artifacts_in_index()
¶
get_cache_path()
¶
get_index_path()
¶
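For illustration, a minimal usage sketch (assuming the default index and cache locations above; the artifact name is a placeholder):

```python
from deepsearch.artifacts.artifact_manager import ArtifactManager

am = ArtifactManager()  # falls back to DFLT_ARTFCT_INDEX_DIR and DFLT_ARTFCT_CACHE_DIR

# Inspect what the index offers and what is already cached.
print(am.get_artifacts_in_index())
print(am.get_artifacts_in_cache())

# "my-artifact" is a placeholder name taken from the index listing.
am.download_artifact_to_cache("my-artifact", with_progress_bar=True)
print(am.get_artifact_path_in_cache("my-artifact"))
```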
chemistry
¶
models
¶
ChemistryCompound
¶
Bases: ChemistryModel
display_name
instance-attribute
¶
User-friendly representation of the compound.
inchi
instance-attribute
¶
InChI representation of compound structure.
inchikey
instance-attribute
¶
Hashed form of InChI.
smiles
instance-attribute
¶
SMILES representation of compound structure.
sum_formula
instance-attribute
¶
Sum formula of the compound, for example 'C6 O2 H5'.
ChemistryDocument
¶
Bases: ChemistryModel
queries
¶
ChemistryCompound
¶
Bases: ChemistryModel
display_name
instance-attribute
¶
User-friendly representation of the compound.
inchi
instance-attribute
¶
InChI representation of compound structure.
inchikey
instance-attribute
¶
Hashed form of InChI.
smiles
instance-attribute
¶
SMILES representation of compound structure.
sum_formula
instance-attribute
¶
Sum formula of the compound, for example 'C6 O2 H5'.
ChemistryDocument
¶
Bases: ChemistryModel
ChemistryQuery
¶
Bases: BaseModel, ABC
CompoundsByIds
¶
Bases: CompoundsQuery
Query compounds that have any of the given identifiers.
CompoundsBySimilarity
¶
Bases: CompoundsQuery
Query compounds that are similar to the given SMILES code.
CompoundsBySmarts
¶
Bases: CompoundsQuery
Query compounds that (exactly) match the given SMARTS code.
structure
instance-attribute
¶
CompoundsBySmiles
¶
Bases: CompoundsQuery
Query compounds that (exactly) match the given SMILES code.
structure
instance-attribute
¶
CompoundsBySubstructure
¶
Bases: CompoundsQuery
Query compounds that contain a substructure with the given SMILES code.
structure
instance-attribute
¶
CompoundsIn
¶
Bases: CompoundsQuery
Query compounds that occur in the given documents.
documents
instance-attribute
¶
CompoundsQuery
¶
Bases: ChemistryQuery
DocumentsByIds
¶
Bases: DocumentsQuery
Query documents that have any of the given identifiers.
DocumentsHaving
¶
Bases: DocumentsQuery
Query documents that contain compounds matching the given query.
compounds
instance-attribute
¶
DocumentsQuery
¶
Bases: ChemistryQuery
Query
¶
query_chemistry(api, query, offset=0, limit=10)
¶
Perform a chemistry query on the knowledge base.
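A sketch of a typical call, assuming the imports mirror this page's module layout and an authenticated client (the SMILES string is illustrative):

```python
from deepsearch.chemistry.queries import CompoundsBySmiles, query_chemistry
from deepsearch.cps import CpsApi

api = CpsApi.from_env()

# Exact-match lookup of benzene by SMILES; offset/limit page through results.
results = query_chemistry(api, CompoundsBySmiles(structure="c1ccccc1"), limit=5)
print(results)
```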
api
¶
CpsApi
¶
data_catalogs
instance-attribute
¶
data_indices
instance-attribute
¶
documents
instance-attribute
¶
elastic
instance-attribute
¶
knowledge_graphs
instance-attribute
¶
projects
instance-attribute
¶
queries
instance-attribute
¶
tasks
instance-attribute
¶
uploader
instance-attribute
¶
__init__(client)
¶
from_env(profile_name=None)
classmethod
¶
Create an API object resolving the required settings from the environment if possible, otherwise from a stored profile.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
profile_name | Optional[str] | profile to use if resolution from environment not possible. Defaults to None (active profile). | None |
Returns:
Name | Type | Description |
---|---|---|
CpsApi | CpsApi | the created API object |
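For example (profile name illustrative):

```python
from deepsearch.cps import CpsApi

# Settings come from the environment when available,
# otherwise from the active stored profile.
api = CpsApi.from_env()

# Or pin a specific stored profile:
api = CpsApi.from_env(profile_name="my-profile")
```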
from_settings(settings)
classmethod
¶
Create an API object from the provided settings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
settings | ProfileSettings | the settings to use. | required |
Returns:
Name | Type | Description |
---|---|---|
CpsApi | CpsApi | the created API object |
refresh_token(admin=False)
¶
Refresh access token
Parameters:
Name | Type | Description | Default |
---|---|---|---|
admin | bool | controls whether an admin token should be requested. Defaults to False. | False |
Raises:
Type | Description |
---|---|
RuntimeError | raised in case API Key or User is invalid |
CpsApiClient
¶
molecules
¶
CHEMVECDB_COLLECTIONS = {MolQueryType.SIMILARITY: 'patcid_tanimoto', MolQueryType.SUBSTRUCTURE: 'patcid_substructure'}
module-attribute
¶
MolIdType
¶
MolQueryLang
¶
MolQueryType
¶
MoleculeQuery(query, query_type, query_lang=MolQueryLang.SMILES, num_items=10)
¶
Use the knowledge database in Deep Search for querying molecules by substructure or similarity. The result is contained in the molecules output of the response.
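A hedged sketch, assuming the query object is executed via api.queries.run and, as stated above, the result is exposed under the molecules output (the SMILES string is caffeine, for illustration):

```python
from deepsearch.chemistry.queries.molecules import MolQueryType, MoleculeQuery
from deepsearch.cps import CpsApi

api = CpsApi.from_env()

query = MoleculeQuery(
    query="CN1C=NC2=C1C(=O)N(C(=O)N2C)C",  # caffeine
    query_type=MolQueryType.SIMILARITY,
    num_items=10,
)
result = api.queries.run(query)
print(result.outputs["molecules"])
```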
MoleculesInPatentsQuery(patents, num_items=10, partial_lookup=False)
¶
List all molecules contained in a list of patents. The result is contained in the molecules output of the response.
PatentsWithMoleculesQuery(molecules, num_items=10)
¶
List all patents containing any of the input molecules. The result is contained in the patents output of the response.
queries
¶
ChemistryQuery
¶
Bases: BaseModel, ABC
CompoundsByIds
¶
Bases: CompoundsQuery
Query compounds that have any of the given identifiers.
CompoundsBySimilarity
¶
Bases: CompoundsQuery
Query compounds that are similar to the given SMILES code.
CompoundsBySmarts
¶
Bases: CompoundsQuery
Query compounds that (exactly) match the given SMARTS code.
structure
instance-attribute
¶
CompoundsBySmiles
¶
Bases: CompoundsQuery
Query compounds that (exactly) match the given SMILES code.
structure
instance-attribute
¶
CompoundsBySubstructure
¶
Bases: CompoundsQuery
Query compounds that contain a substructure with the given SMILES code.
structure
instance-attribute
¶
CompoundsIn
¶
Bases: CompoundsQuery
Query compounds that occur in the given documents.
documents
instance-attribute
¶
CompoundsQuery
¶
Bases: ChemistryQuery
DocumentsByIds
¶
Bases: DocumentsQuery
Query documents that have any of the given identifiers.
DocumentsHaving
¶
Bases: DocumentsQuery
Query documents that contain compounds matching the given query.
compounds
instance-attribute
¶
DocumentsQuery
¶
Bases: ChemistryQuery
query_chemistry(api, query, offset=0, limit=10)
¶
Perform a chemistry query on the knowledge base.
cps
¶
__all__ = ['CpsApi', 'CpsApiClient']
module-attribute
¶
CpsApi
¶
data_catalogs
instance-attribute
¶
data_indices
instance-attribute
¶
documents
instance-attribute
¶
elastic
instance-attribute
¶
knowledge_graphs
instance-attribute
¶
projects
instance-attribute
¶
queries
instance-attribute
¶
tasks
instance-attribute
¶
uploader
instance-attribute
¶
__init__(client)
¶
from_env(profile_name=None)
classmethod
¶
Create an API object resolving the required settings from the environment if possible, otherwise from a stored profile.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
profile_name | Optional[str] | profile to use if resolution from environment not possible. Defaults to None (active profile). | None |
Returns:
Name | Type | Description |
---|---|---|
CpsApi | CpsApi | the created API object |
from_settings(settings)
classmethod
¶
Create an API object from the provided settings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
settings | ProfileSettings | the settings to use. | required |
Returns:
Name | Type | Description |
---|---|---|
CpsApi | CpsApi | the created API object |
refresh_token(admin=False)
¶
Refresh access token
Parameters:
Name | Type | Description | Default |
---|---|---|---|
admin | bool | controls whether an admin token should be requested. Defaults to False. | False |
Raises:
Type | Description |
---|---|
RuntimeError | raised in case API Key or User is invalid |
CpsApiClient
¶
data_indices
¶
utils
¶
logger = logging.getLogger(__name__)
module-attribute
¶
process_external_cos(api, coords, s3_coordinates, progress_bar=False)
¶
Individual files are processed before upload.
process_local_file(api, coords, local_file, progress_bar=False, conv_settings=None, target_settings=None)
¶
Individual files are uploaded for conversion and storage in data index.
process_url_input(api, coords, urls, url_chunk_size, progress_bar=False)
¶
Individual urls are uploaded for conversion and storage in data index.
upload_files(api, coords, url=None, local_file=None, s3_coordinates=None, conv_settings=None, target_settings=None, url_chunk_size=1)
¶
Orchestrate document conversion and upload to an index in a project.
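A minimal sketch, assuming coordinates of an existing project data index (PROJ_KEY and INDEX_KEY are placeholders; ElasticProjectDataCollectionSource is the coordinates type used elsewhere in the toolkit):

```python
from deepsearch.cps import CpsApi
from deepsearch.cps.client.components.elastic import ElasticProjectDataCollectionSource
from deepsearch.cps.data_indices import utils as di_utils

api = CpsApi.from_env()
coords = ElasticProjectDataCollectionSource(proj_key="PROJ_KEY", index_key="INDEX_KEY")

# Convert and upload one local file; url= or s3_coordinates= work analogously.
di_utils.upload_files(api=api, coords=coords, local_file="report.pdf")
```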
kg
¶
workflow
¶
wf_functions
¶
run(wf, config)
¶
Run the workflow against the given KG.
:param wf: Workflow object
:type wf: Workflow
:param config: Knowledge Graph API Configuration
:type config: Configuration
:returns: workflow results
validate(wf, config)
¶
Validate the workflow DAG.
:param wf: Workflow object
:type wf: Workflow
:param config: Knowledge Graph API Configuration
:type config: Configuration
workflow
¶
Workflow
¶
__add__(workflow)
¶
__and__(workflow)
¶
__init__(starting_node=None)
¶
__mul__(workflow)
¶
__or__(workflow)
¶
as_output(limit=None)
¶
Set node type as output.
:param limit: Response limit
:type limit: int
combine(*workflows)
¶
Combine results.
:param *workflows: Nodes to combine
:type *workflows: List['Workflow']
edge_traversal(edges=[], include=[])
¶
Traverse edges.
:param edges: The edges to traverse
:type edges: List[str]
:param include: Include nodes in operation
:type include: List['Workflow']
filter(filter_type='cut-off', field_operation='==', field_value='', include=[])
¶
Filter values.
:param filter_type: Filter type. Possible values: "cut-off", "field-value"
:type filter_type: str
:param field_operation: The field operation to use if filter type is "field-value". Possible values: "<", "==", ">"
:type field_operation: str
:param field_value: The field value to filter by
:type field_value: str
:param include: Include nodes in operation
:type include: List['Workflow']
filter_categories(*categories, include=[])
¶
Filter node type by category.
:param categories: the categories to filter
:type categories: List[str]
:param include: Include nodes in operation
:type include: List['Workflow']
get_operations()
¶
Return workflow operations.
intersect(*workflows)
¶
Intersect results.
:param *workflows: Nodes to intersect
:type *workflows: List['Workflow']
matrix_function(matrix_function='abs', include=[])
¶
Run result through a matrix function.
:param matrix_function: Matrix function to use. Possible values: "e^A", "cosh", "sinh"
:type matrix_function: str
:param include: Include nodes in operation
:type include: List['Workflow']
multiply(*workflows)
¶
Multiply results.
:param *workflows: Nodes to multiply
:type *workflows: List['Workflow']
negate(*workflows)
¶
Negate results.
:param *workflows: Nodes to negate
:type *workflows: List['Workflow']
normalize(normalize_type='RENORMALIZE_L2', include=[])
¶
Normalize result.
:param normalize_type: Normalize type to use. Possible values: "RENORMALIZE_L1", "RENORMALIZE_L2", "RENORMALIZE_LINF"
:type normalize_type: str
:param include: Include nodes in operation
:type include: List['Workflow']
pearson_traversal(edges=[], include=[])
¶
Traverse edges using Pearson traversal.
:param edges: The edges to traverse
:type edges: List[str]
:param include: Include nodes in operation
:type include: List['Workflow']
scalar_function(scalar_function='abs', include=[])
¶
Run result through a scalar function.
:param scalar_function: Scalar function to use. Possible values: "uniform", "abs", "inv", "sigmoid", "softmax"
:type scalar_function: str
:param include: Include nodes in operation
:type include: List['Workflow']
search_nodes_by_approximation(*args, tolerance=0.8, include=[])
¶
Search nodes where the arguments are approximate matches.
:param *args: the search arguments
:type *args: List[str]
:param tolerance: the tolerance
:type tolerance: float
:param include: Include nodes in operation
:type include: List['Workflow']
search_nodes_by_db_id_pair(*args, include=[])
¶
Search nodes that contain the db/id pair.
:param *args: the db/id pairs in format {"_db": "db value", "_id": "id value"}
:type *args: List[str]
:param include: Include nodes in operation
:type include: List['Workflow']
search_nodes_by_index(indices=[], weights=[], include=[])
¶
Search nodes by index.
:param indices: the indices to search
:type indices: List[str]
:param weights: the weights to search
:type weights: List[float]
:param include: Include nodes in operation
:type include: List['Workflow']
search_nodes_by_regex(*args, include=[])
¶
Search nodes by regex matching the args.
:param *args: the search arguments
:type *args: List[str]
:param include: Include nodes in operation
:type include: List['Workflow']
search_nodes_containing(*args, include=[])
¶
Search nodes that contain the args.
:param *args: the search arguments
:type *args: List[str]
:param include: Include nodes in operation
:type include: List['Workflow']
search_nodes_equal(*args, include=[])
¶
Search nodes that equal the args.
:param *args: the search arguments
:type *args: List[str]
:param include: Include nodes in operation
:type include: List['Workflow']
search_nodes_in_category(*categories, include=[])
¶
Search nodes in categories.
:param categories: the categories to search
:type categories: List[str]
:param include: Include nodes in operation
:type include: List['Workflow']
set_to_field_value(field_name='', include=[])
¶
Set node to field value.
:param field_name: The field name
:type field_name: str
:param include: Include nodes in operation
:type include: List['Workflow']
split(times=1)
¶
Add children to node.
:param times: Number of children to add
:type times: int
:returns: node children
sum(*workflows)
¶
Sum results.
:param *workflows: Nodes to sum
:type *workflows: List['Workflow']
to_json(indent=2)
¶
Return workflow as a JSON string.
:param indent: result indentation
:type indent: int
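A sketch of composing and serializing a small workflow, assuming the module path follows this page's layout (node terms and edge labels are illustrative):

```python
from deepsearch.cps.kg.workflow.workflow import Workflow

wf = Workflow()
wf.search_nodes_containing("water")           # seed nodes matching a term
wf.edge_traversal(edges=["material-to-doc"])  # follow an illustrative edge type
wf.as_output(limit=10)                        # mark the node as workflow output

print(wf.to_json())
```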
queries
¶
ConstrainedWeight = Annotated[float, Field(strict=True, ge=0.0, le=1.0, multiple_of=0.1)]
module-attribute
¶
DataQuery(search_query, *, source=None, aggregations=None, highlight=None, sort=None, limit=20, search_after=None, coordinates)
¶
Fts(search_query, collection_name, kg)
¶
RAGQuery(question, *, project, data_source, retr_k=10, rerank=False, text_weight=0.1, model_id=None, prompt_template=None, gen_params=None, gen_ctx_extr_method='window', gen_ctx_window_size=5000, gen_ctx_window_lead_weight=0.5, return_prompt=False, chunk_refs=None, gen_timeout=None)
¶
Create a RAG query
Parameters:
Name | Type | Description | Default |
---|---|---|---|
question | str | the natural-language query | required |
project | Union[str, Project] | project to use | required |
data_source | DataSource | the data source to query | required |
retr_k | int | num of items to retrieve; defaults to 10 | 10 |
rerank | bool | whether to rerank retrieval results; defaults to False | False |
text_weight | ConstrainedWeight | lexical weight for hybrid search; allowed values: {0.0, 0.1, 0.2, ..., 1.0}; defaults to 0.1 | 0.1 |
model_id | str | the LLM to use for generation; defaults to None, i.e. determined by system | None |
prompt_template | str | the prompt template to use; defaults to None, i.e. determined by system | None |
gen_params | dict | the generation params to send to the Gen AI platforms; defaults to None, i.e. determined by system | None |
gen_ctx_extr_method | Literal['window', 'page'] | method for gen context extraction from document; defaults to "window" | 'window' |
gen_ctx_window_size | int | (relevant only if gen_ctx_extr_method=="window") max chars to use for extracted gen context (actual extraction quantized on doc item level); defaults to 5000 | 5000 |
gen_ctx_window_lead_weight | float | (relevant only if gen_ctx_extr_method=="window") weight of leading text for distributing remaining window size after extracting the […] | 0.5 |
return_prompt | bool | whether to return the instantiated prompt; defaults to False | False |
chunk_refs | Optional[List[ChunkRef]] | list of explicit chunk references to use instead of performing retrieval; defaults to None (i.e. retrieval-mode) | None |
gen_timeout | float | timeout for LLM generation; defaults to None, i.e. determined by system | None |
SemanticQuery(question, *, project, data_source, retr_k=10, rerank=False, text_weight=0.1)
¶
Create a semantic retrieval query
Parameters:
Name | Type | Description | Default |
---|---|---|---|
question | str | the natural-language query | required |
project | Union[str, Project] | project to use | required |
data_source | DataSource | the data source to query | required |
retr_k | int | num of items to retrieve; defaults to 10 | 10 |
rerank | bool | whether to rerank retrieval results; defaults to False | False |
text_weight | ConstrainedWeight | lexical weight for hybrid search; allowed values: {0.0, 0.1, 0.2, ..., 1.0}; defaults to 0.1 | 0.1 |
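Retrieval without generation follows the same pattern (placeholders as in the RAG sketch above):

```python
from deepsearch.cps import CpsApi
from deepsearch.cps.queries import SemanticQuery

api = CpsApi.from_env()
data_source = ...  # DataSource placeholder, as in the RAG example

query = SemanticQuery(
    question="What is the glass transition temperature of PET?",
    project="PROJ_KEY",
    data_source=data_source,
    retr_k=5,
)
result = api.queries.run(query)
print(result.outputs)
```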
Wf(wf_query, kg)
¶
documents
¶
core
¶
common_routines
¶
ERROR_MSG = f'{dashes}Suggestion:(1) Check your input.(2) Contact Deep Search developers if problem persists.{dashes}'
module-attribute
¶
WELCOME = f'{dashes}Welcome to the Deep Search Toolkit{dashes}'
module-attribute
¶
dashes = f"{'-' * 86}"
module-attribute
¶
progressbar = ProgressBarParameters()
module-attribute
¶
progressbar_length = 30
module-attribute
¶
convert
¶
TASK_STOP_STATUS = ['SUCCESS', 'FAILURE']
module-attribute
¶
logger = logging.getLogger(__name__)
module-attribute
¶
check_cps_single_task_status(sw_api, cps_proj_key, task_id, wait=2)
¶
Check cps status of individual tasks.
check_cps_status_running_tasks(api, cps_proj_key, task_ids, progress_bar=False)
¶
Check status of multiple running cps tasks and optionally display progress with progress bar.
download_converted_documents(result_dir, download_urls, progress_bar=False)
¶
get_wait_task_result(sw_api, cps_proj_key, task_id, wait=2)
¶
Wait for the task to complete and return its result.
make_payload(url, conversion_settings)
¶
Create payload for requesting conversion
send_file_for_conversion(api, cps_proj_key, source_path, conversion_settings, progress_bar=False)
¶
Send file for conversion.
send_url_for_conversion(api, cps_proj_key, url, conversion_settings, progress_bar=False)
¶
Send online document for conversion.
submit_conversion_payload(api, cps_proj_key, url, conversion_settings)
¶
Convert an online pdf using DeepSearch Technology.
create_report
¶
export
¶
JsonToHTML
¶
__init__()
¶
clean(data, escape=True)
¶
enum_has_ids(enums)
¶
execute(data)
¶
get_body_new(data)
¶
get_page(item)
¶
get_refs(ref)
¶
get_style(item)
¶
get_tablecell_span(cell, ix)
¶
get_title(data)
¶
make_bbox(page, bbox_rect)
¶
make_bbox_dict(page, bbox_rect)
¶
split_item_in_boxes(item)
¶
template()
¶
write_enum(item)
¶
write_table(item)
¶
write_table_simple(item)
¶
export_to_html(document)
¶
export_to_markdown(document)
¶
input_process
¶
process_local_input(api, cps_proj_key, source_path, conversion_settings, progress_bar=False, export_md=False)
¶
Classify the user provided local input and take appropriate action.
process_url_input(api, cps_proj_key, url, conversion_settings, progress_bar=False, export_md=False)
¶
Classify user provided url(s) and take appropriate action.
lookup
¶
main
¶
convert_documents(proj_key, api, url=None, source_path=None, conversion_settings=None, progress_bar=False, export_md=False)
¶
Document conversion via Deep Search Technology. Function to orchestrate document conversion.
Inputs¶
proj_key : string [REQUIRED] Your DeepSearch CPS Project Key. Contact DeepSearch Developers to request one.
url : string [OPTIONAL] For converting documents from the web, please provide a single url.
source_path : path [OPTIONAL] For converting local files, please provide absolute path to file or to directory containing multiple files.
progress_bar : Boolean (default is False in code, True in CLI) Show progress bar for processing, submitting, converting input and downloading converted document.
NOTE: Either url or source_path should be supplied.
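A sketch of converting a single online document (the project key is a placeholder; the URL is illustrative):

```python
from deepsearch.cps import CpsApi
from deepsearch.documents.core.main import convert_documents

api = CpsApi.from_env()

result = convert_documents(
    proj_key="PROJ_KEY",
    api=api,
    url="https://arxiv.org/pdf/2206.00785.pdf",
    progress_bar=True,
)
result.download_all(result_dir="./converted")
```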
models
¶
ExportTarget = Union[ZipTarget, MongoS3Target, ElasticS3Target, COSTarget]
module-attribute
¶
COSTarget
¶
ConversionSettings
¶
DocumentExistsInTargetAction
¶
Bases: str, Enum
What to do if the document already exists on the target.
- replace will replace the document, destroying any external modifications.
- skip will not touch the document on the target, leaving it as-is. Using skip will result in a performance increase; however, if the document is modified externally, CCS will not update it back to the original state.
ElasticIndexCoordinates
¶
ElasticS3Target
¶
Bases: BaseModel
add_annotations = False
class-attribute
instance-attribute
¶
add_cells = False
class-attribute
instance-attribute
¶
add_raw_pages = False
class-attribute
instance-attribute
¶
coordinates
instance-attribute
¶
escape_ref_fields = Field(default=True, description='If true, `$ref` fields are renamed to `__ref`. This allows the data to then be written into a MongoDB collection.')
class-attribute
instance-attribute
¶
if_document_exists = DocumentExistsInTargetAction.REPLACE
class-attribute
instance-attribute
¶
type = 'elastic_s3'
class-attribute
instance-attribute
¶
ElasticS3TargetCoordinates
¶
MongoCollectionCoordinates
¶
MongoS3Target
¶
MongoS3TargetCoordinates
¶
OCROptions
¶
S3Coordinates
¶
Bases: BaseModel
access_key
instance-attribute
¶
bucket
instance-attribute
¶
external_endpoint = None
class-attribute
instance-attribute
¶
host
instance-attribute
¶
key_infix_format = Field('', description=...)
class-attribute
instance-attribute
¶
Control the infix of the object keys that are saved on the document's `_s3_data`, after `key_prefix`, and before `PDFDocuments/{document_hash}.pdf` or `PDFPages/{page_hash}.pdf`.
By default, the infix is empty. For using the name of the index in the coordinates, you can use `key_infix_format = "{index_name}"`.
For example, if:
```
key_prefix = "my_prefix/"
key_infix_format = "{index_name}"
index_name = "my_elastic_index"

document_hash = "123"
```
Then, the document above would be uploaded to: `my_prefix/my_elastic_index/PDFDocuments/123.pdf`.
If one were to set `key_infix_format = ""`, it would be uploaded to `my_prefix/PDFDocuments/123.pdf`.
If one were to set `key_infix_format = "foo"`, it would be uploaded to `my_prefix/foo/PDFDocuments/123.pdf`.
Finally, one can combine `{index_name}` with constants and even path separators. So, `{index_name}/test` would produce `my_prefix/my_elastic_index/test/PDFDocuments/123.pdf`.
key_prefix = ''
class-attribute
instance-attribute
¶
location
instance-attribute
¶
port
instance-attribute
¶
secret_key
instance-attribute
¶
ssl
instance-attribute
¶
verify_ssl
instance-attribute
¶
TableStructureOptions
¶
TargetSettings
¶
ZipPackageContentType
¶
ZipTarget
¶
Bases: BaseModel
Specify how the documents should be exported to a Zip file. If the coordinates are not specified, the project's coordinates will be used.
render
¶
results
¶
DocumentConversionResult
¶
An instance of DocumentConversionResult is generated when document conversion is requested.
export_md = export_md
instance-attribute
¶
proj_key = proj_key
instance-attribute
¶
result = result
instance-attribute
¶
task_id = task_id
instance-attribute
¶
__init__(proj_key, task_id, result, api, source_path=None, source_url=None, batched_files=None, export_md=False)
¶
download_all(result_dir, progress_bar=False)
¶
Download all converted documents.
Input¶
result_dir : path local directory where converted documents will be saved
progress_bar : boolean, optional (default = False) shows a progress bar if True
generate_report(result_dir, progress_bar=False)
¶
Saves a CSV report file with detailed information about the document conversion job. Returns a dictionary containing counts of converted files/urls.
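Continuing the conversion sketch above, downloading results and producing the report might look like this (paths are illustrative):

```python
# `result` is the DocumentConversionResult returned by convert_documents(...).
result.download_all(result_dir="./results", progress_bar=True)

# Writes a CSV report into result_dir and returns per-status counts.
info = result.generate_report(result_dir="./results")
print(info)
```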
utils
¶
ALLOWED_FILE_EXTENSIONS = ['.pdf', '.jpg', '.jpeg', '.tiff', '.tif', '.png', '.gif']
module-attribute
¶
batch_single_files(source_path, root_dir, progress_bar=False)
¶
Batch individual input files into zip files.
Output:
bfiles : List[List[str]]
The outer list corresponds to batches; each inner list contains the individual files of one batch.
cleanup(root_dir)
¶
Clean temporarily created zip batches.
create_root_dir()
¶
Create a root directory labelled with a timestamp.
download_url(url, save_path, chunk_size=128)
¶
Download contents from a url.
read_lines(file_path)
¶
Returns list of lines from input file.
write_taskids(result_dir, list_to_write)
¶
Write task IDs to a file in result_dir.
model
¶
base
¶
controller
¶
types
¶
Annotations
¶
Bases: StrictModel
deepsearch_res_ibm_com_x_attempt_number = Field(..., alias='deepsearch.res.ibm.com/x-attempt-number')
class-attribute
instance-attribute
¶
deepsearch_res_ibm_com_x_deadline = Field(..., alias='deepsearch.res.ibm.com/x-deadline')
class-attribute
instance-attribute
¶
deepsearch_res_ibm_com_x_max_attempts = Field(..., alias='deepsearch.res.ibm.com/x-max-attempts')
class-attribute
instance-attribute
¶
deepsearch_res_ibm_com_x_transaction_id = Field(..., alias='deepsearch.res.ibm.com/x-transaction-id')
class-attribute
instance-attribute
¶
BaseAppPredInput
¶
Bases: StrictModel
BaseModelConfig
¶
Bases: BaseModelMetadata
kind
instance-attribute
¶
BaseModelMetadata
¶
CtrlInfoOutputDefs
¶
Kind
¶
Metadata
¶
Bases: StrictModel
annotations
instance-attribute
¶
ModelInfoOutputDefsSpec
¶
StrictModel
¶
Bases: BaseModel
examples
¶
dummy_qa_generator
¶
model
¶
DummyQAGenerator
¶
Bases: BaseQAGenerator
A dummy QA generator which answers a question with the question itself.
simple_geo_nlp_annotator
¶
entities
¶
model
¶
logger = logging.getLogger('cps-nlp')
module-attribute
¶
SimpleGeoNLPAnnotator
¶
Bases: BaseNLPModel
entity_names = list(self._ent_annots.keys())
instance-attribute
¶
property_names = []
instance-attribute
¶
relationship_names = list(self._rel_annots.keys())
instance-attribute
¶
__init__()
¶
annotate_batched_entities(object_type, items, entity_names)
¶
annotate_batched_properties(object_type, items, entities, property_names)
¶
annotate_batched_relationships(object_type, items, entities, relationship_names)
¶
get_nlp_config()
¶
relationships
¶
cities_to_countries_annotator
¶
CitiesToCountriesAnnotator
¶
Bases: MultiEntitiesRelationshipAnnotator
__init__()
¶
cities_to_provincies_annotator
¶
CitiesToProvinciesAnnotator
¶
Bases: MultiEntitiesRelationshipAnnotator
__init__()
¶
common
¶
provincies_to_countries_annotator
¶
ProvinciesToCountriesAnnotator
¶
Bases: MultiEntitiesRelationshipAnnotator
__init__()
¶
kinds
¶
nlp
¶
controller
¶
model
¶
BaseNLPModel
¶
Bases: BaseDSModel
annotate_batched_entities(object_type, items, entity_names)
abstractmethod
¶
annotate_batched_properties(object_type, items, entities, property_names)
abstractmethod
¶
annotate_batched_relationships(object_type, items, entities, relationship_names)
abstractmethod
¶
get_config()
¶
get_nlp_config()
abstractmethod
¶
types
¶
AnnotateEntitiesOutput = List[Dict[str, List[AnnotateEntitiesEntry]]]
module-attribute
¶
AnnotatePropertiesOutput = List[Dict]
module-attribute
¶
AnnotateRelationshipsOutput = List[Dict[str, AnnotateRelationshipsEntry]]
module-attribute
¶
NLPCtrlPredOutput = Union[NLPEntsCtrlPredOuput, NLPRelsCtrlPredOutput, NLPPropsCtrlPredOutput]
module-attribute
¶
NLPReqSpec = Union[NLPEntitiesReqSpec, NLPRelationshipsReqSpec, NLPPropertiesReqSpec]
module-attribute
¶
AnnotateEntitiesEntry
¶
Bases: StrictModel
AnnotateRelationshipsEntry
¶
Bases: StrictModel
AnnotationLabels
¶
Bases: StrictModel
EntityLabel
¶
Bases: StrictModel
FindEntitiesText
¶
Bases: StrictModel
FindPropertiesText
¶
Bases: StrictModel
FindRelationshipsText
¶
Bases: StrictModel
NLPAppPredInput
¶
Bases: BaseAppPredInput
NLPConfig
¶
Bases: BaseModelConfig
NLPEntitiesReqSpec
¶
Bases: StrictModel
findEntities
instance-attribute
¶
NLPEntsCtrlPredOuput
¶
Bases: StrictModel
entities
instance-attribute
¶
NLPInfoOutput
¶
Bases: CtrlInfoOutput
definitions
instance-attribute
¶
NLPInfoOutputDefinitions
¶
Bases: CtrlInfoOutputDefs
NLPInfoOutputDefinitionsSpec
¶
Bases: ModelInfoOutputDefsSpec
metadata
instance-attribute
¶
NLPModelMetadata
¶
Bases: BaseModelMetadata
supported_object_types
instance-attribute
¶
NLPPropertiesReqSpec
¶
Bases: StrictModel
findProperties
instance-attribute
¶
NLPPropsCtrlPredOutput
¶
Bases: StrictModel
properties
instance-attribute
¶
NLPRelationshipsReqSpec
¶
Bases: StrictModel
findRelationships
instance-attribute
¶
NLPRelsCtrlPredOutput
¶
Bases: StrictModel
relationships
instance-attribute
¶
PropertyLabel
¶
Bases: StrictModel
RelationshipColumn
¶
Bases: StrictModel
RelationshipLabel
¶
Bases: StrictModel
qagen
¶
controller
¶
model
¶
BaseQAGenerator
¶
Bases: BaseDSModel
types
¶
GenerateAnswersOutput = List[GenerateAnswersOutEntry]
module-attribute
¶
ContextEntry
¶
Bases: StrictModel
GenerateAnswers
¶
Bases: StrictModel
GenerateAnswersOutEntry
¶
Bases: StrictModel
QAGenAppPredInput
¶
Bases: BaseAppPredInput
QAGenConfig
¶
Bases: BaseModelConfig
kind
instance-attribute
¶
QAGenCtrlPredOutput
¶
Bases: StrictModel
answers
instance-attribute
¶
QAGenInfoOutput
¶
Bases: StrictModel
definitions
instance-attribute
¶
QAGenInfoOutputDefinitions
¶
Bases: CtrlInfoOutputDefs
kind
instance-attribute
¶
QAGenReqSpec
¶
Bases: StrictModel
generateAnswers
instance-attribute
¶
server
¶
config
¶
inference_types
¶
AppModelInfoOutput = Union[NLPInfoOutput, QAGenInfoOutput]
module-attribute
¶
AppPredInput = Union[NLPAppPredInput, QAGenAppPredInput]
module-attribute
¶
CtrlPredInput = Union[NLPReqSpec, QAGenReqSpec]
module-attribute
¶
CtrlPredOutput = Union[NLPCtrlPredOutput, QAGenCtrlPredOutput]
module-attribute
¶
model_app
¶
logger = logging.getLogger('cps-fastapi')
module-attribute
¶
ModelApp
¶
app = FastAPI()
instance-attribute
¶
__init__(settings)
¶
register_model(model, name=None, controller=None)
¶
Registers a model with the app.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | BaseDSModel | the model to register. | required |
name | Optional[str] | an optional name under which to register the model; if not set, the model's default name is used. | None |
controller | Optional[BaseController] | an optional custom controller to use; if not set, the default controller for the kind is used. | None |
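A hedged serving sketch, assuming the server config module exposes a Settings class, and using the bundled DummyQAGenerator example model:

```python
import uvicorn

from deepsearch.model.examples.dummy_qa_generator.model import DummyQAGenerator
from deepsearch.model.server.config import Settings
from deepsearch.model.server.model_app import ModelApp

model_app = ModelApp(settings=Settings())
model_app.register_model(DummyQAGenerator())  # registered under its default name

# The underlying FastAPI instance is exposed as `.app` (see above).
uvicorn.run(model_app.app, host="0.0.0.0", port=8000)
```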