API Reference

DeepSearch Toolkit

artifacts

artifact_manager

ARTF_META_FILENAME = os.getenv('DEEPSEARCH_ARTIFACT_META_FILENAME', default='meta.info') module-attribute

ARTF_META_URL_FIELD = os.getenv('DEEPSEARCH_ARTIFACT_URL_FIELD', default='static_url') module-attribute

DFLT_ARTFCT_CACHE_DIR = os.getenv('DEEPSEARCH_ARTIFACT_CACHE', default=Path(platformdirs.user_cache_dir('deepsearch', 'ibm')) / 'artifact_cache') module-attribute

DFLT_ARTFCT_INDEX_DIR = os.getenv('DEEPSEARCH_ARTIFACT_INDEX', default=os.getcwd()) module-attribute

ArtifactManager

HitStrategy

Bases: str, Enum

OVERWRITE = 'overwrite' class-attribute instance-attribute
PASS = 'pass' class-attribute instance-attribute
RAISE = 'raise' class-attribute instance-attribute
__init__(index=None, cache=None)
download_artifact_to_cache(artifact_name, unpack_archives=True, hit_strategy=HitStrategy.OVERWRITE, with_progress_bar=False)
get_artifact_path_in_cache(artifact_name)
get_artifacts_in_cache()
get_artifacts_in_index()
get_cache_path()
get_index_path()
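The cache and index locations above are resolved from environment variables at module import time. A minimal sketch of redirecting them before using the manager; the `deepsearch.artifacts.artifact_manager` import path is an assumption based on the layout above:

```python
import os
from pathlib import Path

# The module attributes above read these variables at import time,
# so they must be set before the module is imported.
os.environ["DEEPSEARCH_ARTIFACT_CACHE"] = str(Path.home() / ".my_artifact_cache")
os.environ["DEEPSEARCH_ARTIFACT_INDEX"] = os.getcwd()

# Assumed import path; download every indexed artifact into the cache,
# keeping any copy that is already there (HitStrategy.PASS).
# from deepsearch.artifacts.artifact_manager import ArtifactManager, HitStrategy
# am = ArtifactManager()
# for name in am.get_artifacts_in_index():
#     am.download_artifact_to_cache(name, hit_strategy=HitStrategy.PASS)
#     print(am.get_artifact_path_in_cache(name))
```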

chemistry

models

ChemistryCompound

Bases: ChemistryModel

display_name instance-attribute

User-friendly representation of the compound.

inchi instance-attribute

InChI representation of compound structure.

inchikey instance-attribute

Hashed form of InChI.

smiles instance-attribute

SMILES representation of compound structure.

sum_formula instance-attribute

Sum formula of the compound, for example 'C6 O2 H5'.

ChemistryDocument

Bases: ChemistryModel

application_id instance-attribute

Identifier under which a patent application has been filed.

publication_id instance-attribute

Identifier under which a patent has been published.

title instance-attribute

(Readable) title of the document.

ChemistryModel

Bases: BaseModel

id instance-attribute

Transient identifier for short term use.

persistent_id instance-attribute

Identifier for long term (storage) use.

queries

ChemistryCompound

Bases: ChemistryModel

display_name instance-attribute

User-friendly representation of the compound.

inchi instance-attribute

InChI representation of compound structure.

inchikey instance-attribute

Hashed form of InChI.

smiles instance-attribute

SMILES representation of compound structure.

sum_formula instance-attribute

Sum formula of the compound, for example 'C6 O2 H5'.

ChemistryDocument

Bases: ChemistryModel

application_id instance-attribute

Identifier under which a patent application has been filed.

publication_id instance-attribute

Identifier under which a patent has been published.

title instance-attribute

(Readable) title of the document.

ChemistryQuery

Bases: BaseModel, ABC

CompoundsByIds

Bases: CompoundsQuery

Query compounds that have any of the given identifiers.

inchikeys = [] class-attribute instance-attribute
persistent_ids = [] class-attribute instance-attribute

CompoundsBySimilarity

Bases: CompoundsQuery

Query compounds that are similar to the given SMILES code.

structure instance-attribute
threshold = 0.9 class-attribute instance-attribute

CompoundsBySmarts

Bases: CompoundsQuery

Query compounds that (exactly) match the given SMARTS code.

structure instance-attribute

CompoundsBySmiles

Bases: CompoundsQuery

Query compounds that (exactly) match the given SMILES code.

structure instance-attribute

CompoundsBySubstructure

Bases: CompoundsQuery

Query compounds that contain a substructure with the given SMILES code.

structure instance-attribute

CompoundsIn

Bases: CompoundsQuery

Query compounds that occur in the given documents.

documents instance-attribute

CompoundsQuery

DocumentsByIds

Bases: DocumentsQuery

Query documents that have any of the given identifiers.

application_ids = [] class-attribute instance-attribute
persistent_ids = [] class-attribute instance-attribute
publication_ids = [] class-attribute instance-attribute

DocumentsHaving

Bases: DocumentsQuery

Query documents that contain compounds matching the given query.

compounds instance-attribute

DocumentsQuery

KnowledgeDbResource

to_resource()

Query

paginated_task = None instance-attribute
tasks = [] instance-attribute
variables = {} instance-attribute
__init__()
add(kind_or_task, *, task_id=None, parameters=None, inputs=None, coordinates=None)
add(kind_or_task: TTask) -> TTask
add(
    kind_or_task: str,
    *,
    task_id: Optional[str] = None,
    parameters: Optional[Dict[str, Any]] = None,
    inputs: Optional[TaskInputs] = None,
    coordinates: Optional[TaskCoordinates] = None
) -> Task
parse(value) classmethod
to_flow()

query_chemistry(api, query, offset=0, limit=10)

query_chemistry(
    api: api.CpsApi,
    query: CompoundsQuery,
    offset: int = 0,
    limit: int = 10,
) -> list[ChemistryCompound]
query_chemistry(
    api: api.CpsApi,
    query: DocumentsQuery,
    offset: int = 0,
    limit: int = 10,
) -> list[ChemistryDocument]

Perform a chemistry query on the knowledge base.
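For instance, an exact-SMILES compound lookup and a follow-up document query might look as follows. This is a sketch only: the import paths are assumptions based on the module layout above, and a configured API profile is required.

```python
# Sketch: chemistry knowledge-base queries (assumed import paths).
from deepsearch.cps.client.api import CpsApi
from deepsearch.chemistry.queries import (
    CompoundsBySmiles,
    DocumentsHaving,
    query_chemistry,
)

api = CpsApi.from_env()

# Compounds exactly matching the SMILES for aspirin.
compounds = query_chemistry(
    api, CompoundsBySmiles(structure="CC(=O)OC1=CC=CC=C1C(=O)O"), limit=10
)

# Patent documents containing any compound matching that SMILES.
documents = query_chemistry(
    api,
    DocumentsHaving(compounds=CompoundsBySmiles(structure="CC(=O)OC1=CC=CC=C1C(=O)O")),
    limit=10,
)

for doc in documents:
    print(doc.publication_id, doc.title)
```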

api

CpsApi
data_catalogs instance-attribute
data_indices instance-attribute
documents instance-attribute
elastic instance-attribute
knowledge_graphs instance-attribute
projects instance-attribute
queries instance-attribute
tasks instance-attribute
uploader instance-attribute
__init__(client)
from_env(profile_name=None) classmethod

Create an API object resolving the required settings from the environment if possible, otherwise from a stored profile.

Parameters:

Name Type Description Default
profile_name Optional[str]

profile to use if resolution from environment not possible. Defaults to None (active profile).

None

Returns:

Name Type Description
CpsApi CpsApi

the created API object
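For example (the `deepsearch.cps.client.api` import path is an assumption; the exact environment variables consulted are defined by ProfileSettings):

```python
from deepsearch.cps.client.api import CpsApi  # assumed import path

# Resolve settings from the environment if possible,
# otherwise from the active stored profile:
api = CpsApi.from_env()

# Or pin a specific stored profile:
api = CpsApi.from_env(profile_name="my-profile")
```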

from_settings(settings) classmethod

Create an API object from the provided settings.

Parameters:

Name Type Description Default
settings ProfileSettings

the settings to use.

required

Returns:

Name Type Description
CpsApi CpsApi

the created API object

refresh_token(admin=False)

Refresh access token

Parameters:

Name Type Description Default
admin bool

controls whether an admin token should be requested. Defaults to False.

False

Raises:

Type Description
RuntimeError

raised if the API key or user credentials are invalid

CpsApiClient
bearer_token_auth = DeepSearchBearerTokenAuth(bearer_token=self._authenticate_with_api_key(self.config.host, self.config.auth.username, self.config.auth.api_key)) instance-attribute
config = config instance-attribute
session = requests.Session() instance-attribute
__init__(config)

molecules

CHEMVECDB_COLLECTIONS = {MolQueryType.SIMILARITY: 'patcid_tanimoto', MolQueryType.SUBSTRUCTURE: 'patcid_substructure'} module-attribute
MolId

Bases: BaseModel

type instance-attribute
value instance-attribute
MolIdType

Bases: str, Enum

INCHI = 'inchi' class-attribute instance-attribute
INCHIKEY = 'inchikey' class-attribute instance-attribute
SMARTS = 'smarts' class-attribute instance-attribute
SMILES = 'smiles' class-attribute instance-attribute
MolQueryLang

Bases: str, Enum

SMARTS = 'smarts' class-attribute instance-attribute
SMILES = 'smiles' class-attribute instance-attribute
MolQueryType

Bases: str, Enum

SIMILARITY = 'similarity' class-attribute instance-attribute
SUBSTRUCTURE = 'substructure' class-attribute instance-attribute
MoleculeQuery(query, query_type, query_lang=MolQueryLang.SMILES, num_items=10)

Use the knowledge database in Deep Search for querying molecules by substructure or similarity. The result is contained in the molecules output of the response.

MoleculesInPatentsQuery(patents, num_items=10, partial_lookup=False)

List all molecules contained in a list of patents. The result is contained in the molecules output of the response.

PatentsWithMoleculesQuery(molecules, num_items=10)

List all patents containing any of the input molecules. The result is contained in the patents output of the response.
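As a sketch, a similarity search could be assembled and run like this. The import paths and the `api.queries.run` entry point are assumptions; per the description above, the hits arrive in the `molecules` output of the response.

```python
from deepsearch.cps.client.api import CpsApi
from deepsearch.chemistry.queries.molecules import (  # assumed import path
    MoleculeQuery,
    MolQueryLang,
    MolQueryType,
)

api = CpsApi.from_env()

query = MoleculeQuery(
    query="CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin, as SMILES
    query_type=MolQueryType.SIMILARITY,
    query_lang=MolQueryLang.SMILES,
    num_items=10,
)

result = api.queries.run(query)          # assumed execution entry point
molecules = result.outputs["molecules"]  # per the description above
```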

queries

ChemistryQuery

Bases: BaseModel, ABC

CompoundsByIds

Bases: CompoundsQuery

Query compounds that have any of the given identifiers.

inchikeys = [] class-attribute instance-attribute
persistent_ids = [] class-attribute instance-attribute
CompoundsBySimilarity

Bases: CompoundsQuery

Query compounds that are similar to the given SMILES code.

structure instance-attribute
threshold = 0.9 class-attribute instance-attribute
CompoundsBySmarts

Bases: CompoundsQuery

Query compounds that (exactly) match the given SMARTS code.

structure instance-attribute
CompoundsBySmiles

Bases: CompoundsQuery

Query compounds that (exactly) match the given SMILES code.

structure instance-attribute
CompoundsBySubstructure

Bases: CompoundsQuery

Query compounds that contain a substructure with the given SMILES code.

structure instance-attribute
CompoundsIn

Bases: CompoundsQuery

Query compounds that occur in the given documents.

documents instance-attribute
CompoundsQuery
DocumentsByIds

Bases: DocumentsQuery

Query documents that have any of the given identifiers.

application_ids = [] class-attribute instance-attribute
persistent_ids = [] class-attribute instance-attribute
publication_ids = [] class-attribute instance-attribute
DocumentsHaving

Bases: DocumentsQuery

Query documents that contain compounds matching the given query.

compounds instance-attribute
DocumentsQuery
query_chemistry(api, query, offset=0, limit=10)
query_chemistry(
    api: api.CpsApi,
    query: CompoundsQuery,
    offset: int = 0,
    limit: int = 10,
) -> list[ChemistryCompound]
query_chemistry(
    api: api.CpsApi,
    query: DocumentsQuery,
    offset: int = 0,
    limit: int = 10,
) -> list[ChemistryDocument]

Perform a chemistry query on the knowledge base.

resources

ChemVecDbResource

to_resource()

KnowledgeDbResource

to_resource()

core

DeepSearchBearerTokenAuth

Bases: BaseModel

bearer_token instance-attribute

DeepSearchConfig

Bases: BaseModel

auth instance-attribute

host instance-attribute

verify_ssl = True class-attribute instance-attribute

DeepSearchKeyAuth

Bases: BaseModel

api_key instance-attribute

username instance-attribute

util

cps

__all__ = ['CpsApi', 'CpsApiClient'] module-attribute

CpsApi

data_catalogs instance-attribute

data_indices instance-attribute

documents instance-attribute

elastic instance-attribute

knowledge_graphs instance-attribute

projects instance-attribute

queries instance-attribute

tasks instance-attribute

uploader instance-attribute

__init__(client)

from_env(profile_name=None) classmethod

Create an API object resolving the required settings from the environment if possible, otherwise from a stored profile.

Parameters:

Name Type Description Default
profile_name Optional[str]

profile to use if resolution from environment not possible. Defaults to None (active profile).

None

Returns:

Name Type Description
CpsApi CpsApi

the created API object

from_settings(settings) classmethod

Create an API object from the provided settings.

Parameters:

Name Type Description Default
settings ProfileSettings

the settings to use.

required

Returns:

Name Type Description
CpsApi CpsApi

the created API object

refresh_token(admin=False)

Refresh access token

Parameters:

Name Type Description Default
admin bool

controls whether an admin token should be requested. Defaults to False.

False

Raises:

Type Description
RuntimeError

raised if the API key or user credentials are invalid

CpsApiClient

bearer_token_auth = DeepSearchBearerTokenAuth(bearer_token=self._authenticate_with_api_key(self.config.host, self.config.auth.username, self.config.auth.api_key)) instance-attribute

config = config instance-attribute

session = requests.Session() instance-attribute

__init__(config)

data_indices

utils

logger = logging.getLogger(__name__) module-attribute
process_external_cos(api, coords, s3_coordinates, progress_bar=False)

Individual files are processed before upload.

process_local_file(api, coords, local_file, progress_bar=False, conv_settings=None, target_settings=None)

Individual files are uploaded for conversion and storage in data index.

process_url_input(api, coords, urls, url_chunk_size, progress_bar=False)

Individual urls are uploaded for conversion and storage in data index.

upload_files(api, coords, url=None, local_file=None, s3_coordinates=None, conv_settings=None, target_settings=None, url_chunk_size=1)

Orchestrate document conversion and upload to an index in a project.
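A minimal sketch of `upload_files`; the coordinates type `ElasticProjectDataCollectionSource` and both import paths are assumptions, and `<proj-key>`/`<index-key>` are placeholders:

```python
from deepsearch.cps.client.api import CpsApi
from deepsearch.cps.client.components.elastic import (  # assumed import path
    ElasticProjectDataCollectionSource,
)
from deepsearch.cps.data_indices.utils import upload_files  # assumed import path

api = CpsApi.from_env()
coords = ElasticProjectDataCollectionSource(
    proj_key="<proj-key>", index_key="<index-key>"
)

# Convert a local PDF and store the result in the target data index.
upload_files(api=api, coords=coords, local_file="report.pdf")
```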

kg

workflow

MultiLinkedList
MultiLinkedList
head = node instance-attribute
tail = node instance-attribute
__eq__(other)
__init__(node=None)
__ne__(other)
append(value)
append_child(child=None)
flatten_list()
print_list()
Node
child = None instance-attribute
data = data instance-attribute
id = id or str(uuid4()) instance-attribute
next = None instance-attribute
prev = None instance-attribute
__init__(data, id=None)
wf_functions
run(wf, config)

Run the workflow against the given KG. :param wf: Workflow object :type wf: Workflow :param config: Knowledge Graph API configuration :type config: Configuration :returns: workflow results

validate(wf, config)

Validate the workflow DAG. :param wf: Workflow object :type wf: Workflow :param config: Knowledge Graph API configuration :type config: Configuration

workflow
Workflow
__add__(workflow)
__and__(workflow)
__init__(starting_node=None)
__mul__(workflow)
__or__(workflow)
as_output(limit=None)

Set node type as output :param limit: Response limit :type limit: int

combine(*workflows)

Combine result :param *workflows: Nodes to combine :type *workflows: List['Workflow']

edge_traversal(edges=[], include=[])

Traverse edges :param edges: The edges to traverse :type edges: List[str] :param include: Include nodes in operation :type include: List['Workflow']

filter(filter_type='cut-off', field_operation='==', field_value='', include=[])

Filter values :param filter_type: Filter type. Possible values "cut-off", "field-value" :type filter_type: str :param field_operation: The field operation to use if filter type is "field-value". Possible values "<", "==", ">" :type field_operation: str :param field_value: The field value to filter by :type field_value: str :param include: Include nodes in operation :type include: List['Workflow']

filter_categories(*categories, include=[])

Filter node type by category :param categories: the categories to filter :type categories: List[str] :param include: Include nodes in operation :type include: List['Workflow']

get_operations()

Return workflow operations

intersect(*workflows)

Intersect result :param *workflows: Nodes to intersect :type *workflows: List['Workflow']

matrix_function(matrix_function='abs', include=[])

Run result through matrix function :param matrix_function: Scalar function to use. Possible values "e^A", "cosh", "sinh" :type matrix_function: str :param include: Include nodes in operation :type include: List['Workflow']

multiply(*workflows)

Multiply result :param *workflows: Nodes to multiply :type *workflows: List['Workflow']

negate(*workflows)

Negate result :param *workflows: Nodes to negate :type *workflows: List['Workflow']

normalize(normalize_type='RENORMALIZE_L2', include=[])

Normalize result :param normalize_type: Normalize type to use. Possible values "RENORMALIZE_L1", "RENORMALIZE_L2", "RENORMALIZE_LINF" :type normalize_type: str :param include: Include nodes in operation :type include: List['Workflow']

pearson_traversal(edges=[], include=[])

Traverse edges using pearson traversal :param edges: The edges to traverse :type edges: List[str] :param include: Include nodes in operation :type include: List['Workflow']

scalar_function(scalar_function='abs', include=[])

Run result through scalar function :param scalar_function: Scalar function to use. Possible values "uniform", "abs", "inv", "sigmoid", "softmax" :type scalar_function: str :param include: Include nodes in operation :type include: List['Workflow']

search_nodes_by_approximation(*args, tolerance=0.8, include=[])

Search nodes where the arguments are approximate :param *args: the search arguments :type *args: List[str] :param tolerance: the tolerance :type tolerance: float :param include: Include nodes in operation :type include: List['Workflow']

search_nodes_by_db_id_pair(*args, include=[])

Search nodes that contain the db/id pair :param *args: the db/id pairs in format {"_db": "db value", "_id": "id value"} :type *args: List[str] :param include: Include nodes in operation :type include: List['Workflow']

search_nodes_by_index(indices=[], weights=[], include=[])

Search nodes by index :param indices: the indices to search :type indices: List[str] :param weights: the weights to apply :type weights: List[float] :param include: Include nodes in operation :type include: List['Workflow']

search_nodes_by_regex(*args, include=[])

Search nodes by regex that match args :param *args: the search arguments :type *args: List[str] :param include: Include nodes in operation :type include: List['Workflow']

search_nodes_containing(*args, include=[])

Search nodes that contain the args :param *args: the search arguments :type *args: List[str] :param include: Include nodes in operation :type include: List['Workflow']

search_nodes_equal(*args, include=[])

Search nodes that equal the args :param *args: the search arguments :type *args: List[str] :param include: Include nodes in operation :type include: List['Workflow']

search_nodes_in_category(*categories, include=[])

Search nodes in categories :param categories: the categories to search :type categories: List[str] :param include: Include nodes in operation :type include: List['Workflow']

set_to_field_value(field_name='', include=[])

Set node to field value :param field_name: The field name :type field_name: str :param include: Include nodes in operation :type include: List['Workflow']

split(times=1)

Add children to node :param times: Number of children to add :type times: int :returns: node children

sum(*workflows)

Sum result :param *workflows: Nodes to sum :type *workflows: List['Workflow']

to_json(indent=2)

Return workflow as json string :param indent: result indentation :type indent: int
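Putting a few of these operations together might look like the sketch below. Both the import path and the mutate-in-place chaining style shown here are assumptions not confirmed by this reference:

```python
from deepsearch.cps.kg.workflow.workflow import Workflow  # assumed import path

wf = Workflow()
wf.search_nodes_equal("aspirin")               # seed: nodes equal to the term
wf.edge_traversal(edges=["mentions"])          # follow 'mentions' edges
wf.normalize(normalize_type="RENORMALIZE_L2")  # L2-renormalize scores
wf.as_output(limit=25)                         # mark node as output, cap results

print(wf.to_json(indent=2))                    # serialized workflow DAG
```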

queries

ConstrainedWeight = Annotated[float, Field(strict=True, ge=0.0, le=1.0, multiple_of=0.1)] module-attribute

DataQuery(search_query, *, source=None, aggregations=None, highlight=None, sort=None, limit=20, search_after=None, coordinates)

Fts(search_query, collection_name, kg)

RAGQuery(question, *, project, data_source, retr_k=10, rerank=False, text_weight=0.1, model_id=None, prompt_template=None, gen_params=None, gen_ctx_extr_method='window', gen_ctx_window_size=5000, gen_ctx_window_lead_weight=0.5, return_prompt=False, chunk_refs=None, gen_timeout=None)

Create a RAG query

Parameters:

Name Type Description Default
question str

the natural-language query

required
project Union[str, Project]

project to use

required
data_source DataSource

the data source to query

required
retr_k int

num of items to retrieve; defaults to 10

10
rerank bool

whether to rerank retrieval results; defaults to False

False
text_weight ConstrainedWeight

lexical weight for hybrid search; allowed values: {0.0, 0.1, 0.2, ..., 1.0}; defaults to 0.1

0.1
model_id str

the LLM to use for generation; defaults to None, i.e. determined by system

None
prompt_template str

the prompt template to use; defaults to None, i.e. determined by system

None
gen_params dict

the generation params to send to the Gen AI platforms; defaults to None, i.e. determined by system

None
gen_ctx_extr_method Literal['window', 'page']

method for gen context extraction from document; defaults to "window"

'window'
gen_ctx_window_size int

(relevant only if gen_ctx_extr_method=="window") max chars to use for extracted gen context (actual extraction quantized on doc item level); defaults to 5000

5000
gen_ctx_window_lead_weight float

(relevant only if gen_ctx_extr_method=="window") weight of leading text for distributing remaining window size after extracting the main_path; defaults to 0.5 (centered around main_path)

0.5
return_prompt bool

whether to return the instantiated prompt; defaults to False

False
chunk_refs Optional[List[ChunkRef]]

list of explicit chunk references to use instead of performing retrieval; defaults to None (i.e. retrieval-mode)

None
gen_timeout float

timeout for LLM generation; defaults to None, i.e. determined by system

None
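A sketch of issuing a RAG query. The import paths, the `api.queries.run` entry point, and the `RAGResult.from_api_output` usage are assumptions; construction of `data_source` is omitted and left as a placeholder:

```python
from deepsearch.cps.client.api import CpsApi
from deepsearch.cps.queries import RAGQuery           # assumed import path
from deepsearch.cps.queries.results import RAGResult  # assumed import path

api = CpsApi.from_env()

data_source = ...  # a DataSource pointing at a semantically indexed collection

query = RAGQuery(
    "Which alloys are discussed in these patents?",
    project="<proj-key>",
    data_source=data_source,
    retr_k=10,
    rerank=True,
    text_weight=0.1,
)

raw = api.queries.run(query)  # assumed execution entry point
result = RAGResult.from_api_output(raw)
for item in result.answers:
    print(item.answer)
```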

SemanticQuery(question, *, project, data_source, retr_k=10, rerank=False, text_weight=0.1)

Create a semantic retrieval query

Parameters:

Name Type Description Default
question str

the natural-language query

required
document_hash str

hash of target document

required
project Union[str, Project]

project to use

required
data_source DataSource

the data source to query

required
retr_k int

num of items to retrieve; defaults to 10

10
rerank bool

whether to rerank retrieval results; defaults to False

False
text_weight ConstrainedWeight

lexical weight for hybrid search; allowed values: {0.0, 0.1, 0.2, ..., 1.0}; defaults to 0.1

0.1

Wf(wf_query, kg)

results

ChunkRef

Bases: BaseModel

doc_hash instance-attribute
main_path instance-attribute
path_group instance-attribute
GenerationError

Bases: SemanticError

__init__(msg='', *args, **kwargs)
NoSearchResultsError

Bases: SemanticError

__init__(msg='Search returned no results', *args, **kwargs)
RAGAnswerItem

Bases: BaseModel

answer instance-attribute
grounding instance-attribute
prompt = None class-attribute instance-attribute
RAGGroundingInfo

Bases: BaseModel

gen_ctx_paths instance-attribute
retr_items = None class-attribute instance-attribute
RAGResult

Bases: BaseModel

answers instance-attribute
search_result_items = None class-attribute instance-attribute
from_api_output(data, raise_on_error=True) classmethod
SearchResult

Bases: BaseModel

search_result_items instance-attribute
from_api_output(data, raise_on_error=True) classmethod
SearchResultItem

Bases: ChunkRef

chunk instance-attribute
source_is_text instance-attribute
SemanticError

Bases: Exception

documents

core

common_routines

ERROR_MSG = f'{dashes}Suggestion:(1) Check your input.(2) Contact Deep Search developers if problem persists.{dashes}' module-attribute
WELCOME = f'{dashes}{''}Welcome to the Deep Search Toolkit{dashes}' module-attribute
dashes = f'{'-' * 86}' module-attribute
progressbar = ProgressBarParameters() module-attribute
progressbar_length = 30 module-attribute
ProgressBarParameters dataclass
bar_format = '{l_bar}{bar:%d}{r_bar}{bar:-10b}' % progressbar_length class-attribute instance-attribute
colour = '#0f62fe' class-attribute instance-attribute
padding = 22 class-attribute instance-attribute
__init__()

convert

TASK_STOP_STATUS = ['SUCCESS', 'FAILURE'] module-attribute
logger = logging.getLogger(__name__) module-attribute
check_cps_single_task_status(sw_api, cps_proj_key, task_id, wait=2)

Check cps status of individual tasks.

check_cps_status_running_tasks(api, cps_proj_key, task_ids, progress_bar=False)

Check status of multiple running cps tasks and optionally display progress with progress bar.

download_converted_documents(result_dir, download_urls, progress_bar=False)

Download converted documents.

Input

result_dir : path
directory for saving the converted JSON documents

get_wait_task_result(sw_api, cps_proj_key, task_id, wait=2)

Wait for the task to finish and return its result.

make_payload(url, conversion_settings)

Create payload for requesting conversion

send_file_for_conversion(api, cps_proj_key, source_path, conversion_settings, progress_bar=False)

Send file for conversion.

send_url_for_conversion(api, cps_proj_key, url, conversion_settings, progress_bar=False)

Send online document for conversion.

submit_conversion_payload(api, cps_proj_key, url, conversion_settings)

Convert an online pdf using DeepSearch Technology.

create_report

logger = logging.getLogger(__name__) module-attribute
generate_report_csv(task_result, task_id, result_dir, progress_bar=False)

Generate a report for a document conversion task ID and save it as a CSV file.

export

JsonToHTML
__init__()
clean(data, escape=True)
enum_has_ids(enums)
execute(data)
get_body_new(data)
get_page(item)
get_refs(ref)
get_style(item)
get_tablecell_span(cell, ix)
get_title(data)
make_bbox(page, bbox_rect)
make_bbox_dict(page, bbox_rect)
split_item_in_boxes(item)
template()
write_enum(item)
write_table(item)
write_table_simple(item)
export_to_html(document)
export_to_markdown(document)

input_process

process_local_input(api, cps_proj_key, source_path, conversion_settings, progress_bar=False, export_md=False)

Classify the user-provided local input and take the appropriate action.

process_url_input(api, cps_proj_key, url, conversion_settings, progress_bar=False, export_md=False)

Classify the user-provided URL(s) and take the appropriate action.

lookup

EntitiesLookup
document = document instance-attribute
__init__(document)
get(*, entity_type, entity)

Lookup where a given entity is mentioned in a document.

main

convert_documents(proj_key, api, url=None, source_path=None, conversion_settings=None, progress_bar=False, export_md=False)

Orchestrate document conversion via Deep Search Technology.

Inputs

proj_key : string [REQUIRED] Your DeepSearch CPS Project Key. Contact DeepSearch Developers to request one.

url : string [OPTIONAL] For converting documents from the web, please provide a single url.

source_path : path [OPTIONAL] For converting local files, please provide the absolute path to a file or to a directory containing multiple files.

progress_bar : Boolean (default is False in code, True in CLI) Show progress bar for processing, submitting, converting input and downloading converted document.

NOTE: Either url or source_path should be supplied.
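Per the inputs above, a conversion run might look as follows; the top-level `deepsearch` import shortcut is an assumption, and `<proj-key>` is a placeholder:

```python
import deepsearch as ds  # assumed top-level import shortcut

api = ds.CpsApi.from_env()

documents = ds.convert_documents(
    api=api,
    proj_key="<proj-key>",            # your Deep Search CPS project key
    source_path="/path/to/file.pdf",  # or url=..., but not both
    progress_bar=True,
)

documents.download_all(result_dir="./converted")
info = documents.generate_report(result_dir="./converted")
```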

models

ExportTarget = Union[ZipTarget, MongoS3Target, ElasticS3Target, COSTarget] module-attribute
COSTarget

Bases: BaseModel

add_annotations = False class-attribute instance-attribute
add_raw_pages = False class-attribute instance-attribute
coordinates instance-attribute
type = 'cos' class-attribute instance-attribute
ConversionSettings

Bases: BaseModel

ocr = OCROptions() class-attribute instance-attribute
table_structure = TableStructureOptions() class-attribute instance-attribute
DocumentExistsInTargetAction

Bases: str, Enum

What to do if the document already exists on the target. - replace will replace the document, destroying any external modifications. - skip will not touch the document on the target, leaving it as-is. Using skip yields a performance increase; however, if the document is modified externally, CCS will not update it back to the original state.

REPLACE = 'replace' class-attribute instance-attribute
SKIP = 'skip' class-attribute instance-attribute
ElasticIndexCoordinates

Bases: BaseModel

ca_certificate_base64 = None class-attribute instance-attribute
dangerously_disable_ssl_validation = False class-attribute instance-attribute
hosts instance-attribute
index instance-attribute
ElasticS3Target

Bases: BaseModel

add_annotations = False class-attribute instance-attribute
add_cells = False class-attribute instance-attribute
add_raw_pages = False class-attribute instance-attribute
coordinates instance-attribute
escape_ref_fields = Field(default=True, description='If true, `$ref` fields are renamed to `__ref`. This allows the data to then be written into a MongoDB collection.') class-attribute instance-attribute
if_document_exists = DocumentExistsInTargetAction.REPLACE class-attribute instance-attribute
type = 'elastic_s3' class-attribute instance-attribute
ElasticS3TargetCoordinates

Bases: BaseModel

elastic instance-attribute
s3 = None class-attribute instance-attribute
MongoCollectionCoordinates

Bases: BaseModel

collection instance-attribute
database instance-attribute
uri instance-attribute
MongoS3Target

Bases: BaseModel

coordinates instance-attribute
if_document_exists = DocumentExistsInTargetAction.REPLACE class-attribute instance-attribute
type = 'mongo_s3' class-attribute instance-attribute
MongoS3TargetCoordinates

Bases: BaseModel

Coordinates to a Mongo collection, and optionally, an S3 bucket

mongo instance-attribute
s3 = None class-attribute instance-attribute
OCROptions

Bases: BaseModel

do_ocr = True class-attribute instance-attribute
kind = 'easyocr' class-attribute instance-attribute
S3Coordinates

Bases: BaseModel

access_key instance-attribute
bucket instance-attribute
external_endpoint = None class-attribute instance-attribute
host instance-attribute
key_infix_format = Field('', description=dedent('\n Control the infix of the object keys that are saved on the document\'s `_s3_data`, after `key_prefix`,\n and before `PDFDocuments/{document_hash}.pdf` or `PDFPages/{page_hash}.pdf`.\n\n By default, the infix is empty.\n For using the name of the index in the coordinates, you can use `key_infix_format = "{index_name}"`.\n\n For example, if:\n\n ```\n key_prefix = "my_prefix/"\n key_infix_format = "{index_name}"\n index_name = "my_elastic_index"\n\n document_hash = "123"\n ```\n\n Then, the document above would be uploaded to: `my_prefix/my_elastic_index/PDFDocuments/123.pdf`.\n\n If one were to set `key_infix_format = ""`, it would be uploaded to `my_prefix/PDFDocuments/123.pdf`.\n\n If one were to set `key_infix_format = "foo"`, it would be uploaded to `my_prefix/foo/PDFDocuments/123.pdf`\n\n Finally, one can combine `{index_name}` with constants and even path separators.\n\n So, `{index_name}/test` would produce `my_prefix/my_elastic_index/test/PDFDocuments/123.pdf`\n ')) class-attribute instance-attribute
key_prefix = '' class-attribute instance-attribute
location instance-attribute
port instance-attribute
secret_key instance-attribute
ssl instance-attribute
verify_ssl instance-attribute
TableStructureOptions

Bases: BaseModel

do_table_structure = True class-attribute instance-attribute
table_structure_mode = 'fast' class-attribute instance-attribute
TargetSettings

Bases: BaseModel

add_annotations = None class-attribute instance-attribute
add_raw_pages = None class-attribute instance-attribute
check_raw_or_ann()
ZipPackageContentType

Bases: str, Enum

Specify the content type for the documents in the Zip file.

HTML = 'html' class-attribute instance-attribute
JSON = 'json' class-attribute instance-attribute
ZipTarget

Bases: BaseModel

Specify how the documents should be exported to a Zip file. If coordinates are not specified, the project's coordinates will be used.

add_cells = False class-attribute instance-attribute
content_type = ZipPackageContentType.JSON class-attribute instance-attribute
type = 'zip' class-attribute instance-attribute
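For illustration, conversion settings and an export target might be assembled like this (a configuration sketch; the import path is an assumption based on the layout above):

```python
from deepsearch.documents.core.models import (  # assumed import path
    ConversionSettings,
    OCROptions,
    TableStructureOptions,
    ZipPackageContentType,
    ZipTarget,
)

# Disable OCR, keep fast table-structure extraction.
conv_settings = ConversionSettings(
    ocr=OCROptions(do_ocr=False),
    table_structure=TableStructureOptions(
        do_table_structure=True, table_structure_mode="fast"
    ),
)

# Export converted documents as HTML inside a Zip file;
# coordinates default to the project's own.
target = ZipTarget(content_type=ZipPackageContentType.HTML)
```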

render

get_figure_svg(document, figure)

Generates an SVG which crops the figure from the image of the document page.

get_page_svg_with_item(document, item)

Generates an SVG which overlays the bounding box of the item on the image of the page.

results

DocumentConversionResult

An instance of DocumentConversionResult is generated when document conversion is requested.

export_md = export_md instance-attribute
proj_key = proj_key instance-attribute
result = result instance-attribute
task_id = task_id instance-attribute
__init__(proj_key, task_id, result, api, source_path=None, source_url=None, batched_files=None, export_md=False)
download_all(result_dir, progress_bar=False)

Download all converted documents.

Input

result_dir : path
local directory where the converted documents will be saved

progress_bar : boolean, optional (default = False)
shows a progress bar if True

generate_report(result_dir, progress_bar=False)

Saves a CSV report file with detailed information about the document conversion job. Returns a dictionary containing counts of converted files/URLs.

utils

ALLOWED_FILE_EXTENSIONS = ['.pdf', '.jpg', '.jpeg', '.tiff', '.tif', '.png', '.gif'] module-attribute
batch_single_files(source_path, root_dir, progress_bar=False)

Batch individual input files into zip files.

Output

bfiles : List[List[str]]
the outer list corresponds to each batch; the inner list corresponds to the individual files in a batch

cleanup(root_dir)

Clean up temporarily created zip batches.

create_root_dir()

Creates a root directory labelled with a timestamp.

download_url(url, save_path, chunk_size=128)

Download contents from a URL.

read_lines(file_path)

Returns a list of lines from the input file.

write_taskids(result_dir, list_to_write)

Writes the lines in list_to_write to a file in result_dir.

model

base

controller

BaseController

Bases: ABC

dispatch_predict(spec) abstractmethod
get_info() abstractmethod
get_kind() abstractmethod
get_model_exec_time()
get_model_kind()
get_model_name()

model

BaseDSModel

Bases: ABC

get_config() abstractmethod

types

Annotations

Bases: StrictModel

deepsearch_res_ibm_com_x_attempt_number = Field(..., alias='deepsearch.res.ibm.com/x-attempt-number') class-attribute instance-attribute
deepsearch_res_ibm_com_x_deadline = Field(..., alias='deepsearch.res.ibm.com/x-deadline') class-attribute instance-attribute
deepsearch_res_ibm_com_x_max_attempts = Field(..., alias='deepsearch.res.ibm.com/x-max-attempts') class-attribute instance-attribute
deepsearch_res_ibm_com_x_transaction_id = Field(..., alias='deepsearch.res.ibm.com/x-transaction-id') class-attribute instance-attribute
BaseAppPredInput

Bases: StrictModel

apiVersion instance-attribute
kind instance-attribute
metadata instance-attribute
spec instance-attribute
BaseModelConfig

Bases: BaseModelMetadata

kind instance-attribute
BaseModelMetadata

Bases: StrictModel

author = None class-attribute instance-attribute
description = None class-attribute instance-attribute
expected_compute_time = None class-attribute instance-attribute
name instance-attribute
url = None class-attribute instance-attribute
version instance-attribute
CtrlInfoOutput

Bases: BaseModel

definitions instance-attribute
CtrlInfoOutputDefs

Bases: BaseModel

apiVersion instance-attribute
kind instance-attribute
spec instance-attribute
Kind

Bases: str, Enum

NLPModel = 'NLPModel' class-attribute instance-attribute
QAGenModel = 'QAGenModel' class-attribute instance-attribute
Metadata

Bases: StrictModel

annotations instance-attribute
ModelInfoOutputDefsSpec

Bases: BaseModel

definition instance-attribute
metadata instance-attribute
StrictModel

Bases: BaseModel

examples

dummy_nlp_annotator

main
run()
model
DummyNLPAnnotator

Bases: BaseNLPModel

__init__()
annotate_batched_entities(object_type, items, entity_names)
annotate_batched_properties(object_type, items, entities, property_names)
annotate_batched_relationships(object_type, items, entities, relationship_names)
get_nlp_config()

dummy_qa_generator

main
run()
model
DummyQAGenerator

Bases: BaseQAGenerator

A dummy QA generator which answers a question with the question itself.

generate_answers(texts, extras)

Just answers with the question itself.

Args:

texts: a list of (context, question) pairs.
extras: any extras to pass.

get_qagen_config()
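The dummy behaviour described above can be sketched standalone: generate_answers echoes each question back as its answer. This mirrors the docstring, not the exact BaseQAGenerator interface; the base-class details and output schema here are simplified assumptions.

```python
# Standalone sketch of the dummy QA generator: each question is
# answered with itself. Output dict shape is an assumption.
class EchoQAGenerator:
    def generate_answers(self, texts, extras=None):
        # texts: a list of (context, question) pairs
        return [{"answer": question, "metadata": {}} for _, question in texts]

gen = EchoQAGenerator()
answers = gen.generate_answers([("some context", "What is DeepSearch?")])
# answers[0]["answer"] -> "What is DeepSearch?"
```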

simple_geo_nlp_annotator

entities
cities_annotator
CitiesAnnotator

Bases: DictionaryTextEntityAnnotator

__init__()
description()
key()
common
base_text_entity_annotator
BaseTextEntityAnnotator
annotate_entities_text(text) abstractmethod
description() abstractmethod
initialize()
key() abstractmethod
dictionary_text_entity_annotator
logger = logging.getLogger('cps-nlp') module-attribute
Config dataclass
dictionary_filename instance-attribute
__init__(dictionary_filename)
DictionaryTextEntityAnnotator

Bases: BaseTextEntityAnnotator

config = config instance-attribute
__init__(config)
annotate_entities_text(text)
initialize()
utils
resources_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '../../resources')) module-attribute
countries_annotator
CountriesAnnotator

Bases: DictionaryTextEntityAnnotator

__init__()
description()
key()
provincies_annotator
ProvinciesAnnotator

Bases: DictionaryTextEntityAnnotator

__init__()
description()
key()
main
run()
model
logger = logging.getLogger('cps-nlp') module-attribute
SimpleGeoNLPAnnotator

Bases: BaseNLPModel

entity_names = list(self._ent_annots.keys()) instance-attribute
property_names = [] instance-attribute
relationship_names = list(self._rel_annots.keys()) instance-attribute
__init__()
annotate_batched_entities(object_type, items, entity_names)
annotate_batched_properties(object_type, items, entities, property_names)
annotate_batched_relationships(object_type, items, entities, relationship_names)
get_nlp_config()
relationships
cities_to_countries_annotator
CitiesToCountriesAnnotator
cities_to_provincies_annotator
CitiesToProvinciesAnnotator
common
base_text_relationship_annotator
BaseTextRelationshipAnnotator
annotate_relationships_text(text, entity_map, relationship_name) abstractmethod
columns() abstractmethod
description() abstractmethod
key() abstractmethod
multi_entities_relationship_annotator
logger = logging.getLogger('cps-nlp') module-attribute
Config dataclass
entities instance-attribute
__init__(entities)
MultiEntitiesRelationshipAnnotator

Bases: BaseTextRelationshipAnnotator

Creates a relationship if all entity types are present in the given text input.

__init__(config)
annotate_relationships_text(text, entity_map, relationship_name)
columns()
description()
key()
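The rule described above can be sketched as: emit a relationship only when every required entity type has at least one match in the text. The entity_map shape (type name mapped to a list of matches) is an assumption for illustration, not the annotator's actual internal structure.

```python
# Hedged sketch of the "all entity types present" rule. The
# entity_map shape is an assumption for illustration.
def relationship_if_all_present(entity_map, required_types):
    if all(entity_map.get(t) for t in required_types):
        # Pair the first match of each required type into one relationship.
        return [tuple(entity_map[t][0] for t in required_types)]
    return []

ents = {"cities": ["Zurich"], "countries": ["Switzerland"]}
rels = relationship_if_all_present(ents, ["cities", "countries"])
# rels -> [("Zurich", "Switzerland")]
```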
provincies_to_countries_annotator
ProvinciesToCountriesAnnotator

kinds

nlp

controller
NLPController

Bases: BaseController

__init__(model)
dispatch_predict(spec)
get_info()
get_kind()
model
BaseNLPModel

Bases: BaseDSModel

annotate_batched_entities(object_type, items, entity_names) abstractmethod
annotate_batched_properties(object_type, items, entities, property_names) abstractmethod
annotate_batched_relationships(object_type, items, entities, relationship_names) abstractmethod
get_config()
get_nlp_config() abstractmethod
types
AnnotateEntitiesOutput = List[Dict[str, List[AnnotateEntitiesEntry]]] module-attribute
AnnotatePropertiesOutput = List[Dict] module-attribute
AnnotateRelationshipsOutput = List[Dict[str, AnnotateRelationshipsEntry]] module-attribute
NLPCtrlPredOutput = Union[NLPEntsCtrlPredOuput, NLPRelsCtrlPredOutput, NLPPropsCtrlPredOutput] module-attribute
NLPReqSpec = Union[NLPEntitiesReqSpec, NLPRelationshipsReqSpec, NLPPropertiesReqSpec] module-attribute
AnnotateEntitiesEntry

Bases: StrictModel

match instance-attribute
original instance-attribute
range instance-attribute
type instance-attribute
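An illustrative entry matching the AnnotateEntitiesEntry fields listed above (match, original, range, type), wrapped in the AnnotateEntitiesOutput grouping. The concrete values and the range semantics (character offsets) are assumptions for the example.

```python
# Illustrative entity entry using the fields listed in this reference.
# Values and range semantics are assumptions.
entry = {
    "type": "cities",
    "match": "Zurich",
    "original": "Zurich",
    "range": [10, 16],  # assumed: character offsets in the annotated text
}
# AnnotateEntitiesOutput groups such entries per item, keyed by entity name:
output = [{"cities": [entry]}]
```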
AnnotateRelationshipsEntry

Bases: StrictModel

data instance-attribute
header instance-attribute
AnnotationLabels

Bases: StrictModel

entities instance-attribute
properties instance-attribute
relationships instance-attribute
EntityLabel

Bases: StrictModel

description instance-attribute
key instance-attribute
FindEntitiesText

Bases: StrictModel

entityNames = None class-attribute instance-attribute
objectType instance-attribute
texts instance-attribute
FindPropertiesText

Bases: StrictModel

entities = None class-attribute instance-attribute
objectType instance-attribute
propertyNames = None class-attribute instance-attribute
texts instance-attribute
FindRelationshipsText

Bases: StrictModel

entities instance-attribute
objectType instance-attribute
relationshipNames = None class-attribute instance-attribute
texts instance-attribute
NLPAppPredInput

Bases: BaseAppPredInput

kind instance-attribute
spec instance-attribute
NLPConfig

Bases: BaseModelConfig

kind instance-attribute
labels instance-attribute
supported_types instance-attribute
NLPEntitiesReqSpec

Bases: StrictModel

findEntities instance-attribute
NLPEntsCtrlPredOuput

Bases: StrictModel

entities instance-attribute
NLPInfoOutput

Bases: CtrlInfoOutput

definitions instance-attribute
NLPInfoOutputDefinitions

Bases: CtrlInfoOutputDefs

kind instance-attribute
spec instance-attribute
NLPInfoOutputDefinitionsSpec

Bases: ModelInfoOutputDefsSpec

metadata instance-attribute
NLPModelMetadata

Bases: BaseModelMetadata

supported_object_types instance-attribute
NLPPropertiesReqSpec

Bases: StrictModel

findProperties instance-attribute
NLPPropsCtrlPredOutput

Bases: StrictModel

properties instance-attribute
NLPRelationshipsReqSpec

Bases: StrictModel

findRelationships instance-attribute
NLPRelsCtrlPredOutput

Bases: StrictModel

relationships instance-attribute
NLPType

Bases: str, Enum

text = 'text' class-attribute instance-attribute
PropertyLabel

Bases: StrictModel

description instance-attribute
key instance-attribute
RelationshipColumn

Bases: StrictModel

entities instance-attribute
key instance-attribute
RelationshipLabel

Bases: StrictModel

columns instance-attribute
description instance-attribute
key instance-attribute

qagen

controller
QAGenController

Bases: BaseController

__init__(model)
dispatch_predict(spec)
get_info()
get_kind()
model
BaseQAGenerator

Bases: BaseDSModel

generate_answers(texts, extras) abstractmethod
get_config()
get_qagen_config() abstractmethod
types
GenerateAnswersOutput = List[GenerateAnswersOutEntry] module-attribute
ContextEntry

Bases: StrictModel

representation_type instance-attribute
text instance-attribute
type instance-attribute
GenerateAnswers

Bases: StrictModel

contexts instance-attribute
extras = None class-attribute instance-attribute
questions instance-attribute
check_lengths_match(values)
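The validation implied by check_lengths_match can be sketched as a plain function: the contexts and questions lists must line up one-to-one. This is a stand-in for the actual pydantic validator, whose exact signature is not shown in this reference.

```python
# Sketch of the length check implied above, in plain-function form
# rather than as the actual pydantic validator.
def check_lengths_match(values):
    if len(values["contexts"]) != len(values["questions"]):
        raise ValueError("contexts and questions must have the same length")
    return values

ok = check_lengths_match({"contexts": [["c1"]], "questions": ["q1"]})
```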
GenerateAnswersOutEntry

Bases: StrictModel

answer instance-attribute
metadata instance-attribute
QAGenAppPredInput

Bases: BaseAppPredInput

kind instance-attribute
spec instance-attribute
QAGenConfig

Bases: BaseModelConfig

kind instance-attribute
QAGenCtrlPredOutput

Bases: StrictModel

answers instance-attribute
QAGenInfoOutput

Bases: StrictModel

definitions instance-attribute
QAGenInfoOutputDefinitions

Bases: CtrlInfoOutputDefs

kind instance-attribute
QAGenReqSpec

Bases: StrictModel

generateAnswers instance-attribute

server

config

Settings

Bases: BaseSettings

api_key instance-attribute
model_config = SettingsConfigDict(env_prefix='DS_MODEL_') class-attribute instance-attribute
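With env_prefix='DS_MODEL_', the api_key field is read from the DS_MODEL_API_KEY environment variable. A sketch of that behaviour with a plain os.environ lookup, rather than pydantic BaseSettings; the value shown is hypothetical.

```python
# Sketch of the DS_MODEL_ env prefix: api_key comes from
# DS_MODEL_API_KEY. Plain lookup, not pydantic BaseSettings.
import os

os.environ["DS_MODEL_API_KEY"] = "example-key"  # hypothetical value

def load_settings(prefix="DS_MODEL_"):
    return {"api_key": os.environ[prefix + "API_KEY"]}

settings = load_settings()
# settings["api_key"] -> "example-key"
```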

controller_factory

ControllerFactory
create_controller(model)

inference_types

AppModelInfoOutput = Union[NLPInfoOutput, QAGenInfoOutput] module-attribute
AppPredInput = Union[NLPAppPredInput, QAGenAppPredInput] module-attribute
CtrlPredInput = Union[NLPReqSpec, QAGenReqSpec] module-attribute
CtrlPredOutput = Union[NLPCtrlPredOutput, QAGenCtrlPredOutput] module-attribute

model_app

logger = logging.getLogger('cps-fastapi') module-attribute
ModelApp
app = FastAPI() instance-attribute
__init__(settings)
register_model(model, name=None, controller=None)

Registers a model with the app.

Parameters:

model (BaseDSModel): the model to register. Required.
name (Optional[str]): an optional name under which to register the model; if not set, the model's default name is used. Default: None.
controller (Optional[BaseController]): an optional custom controller to use; if not set, the default controller for the kind is used. Default: None.
run(host='127.0.0.1', port=8000, **kwargs)
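The name-defaulting behaviour of register_model can be sketched with a dict-backed registry standing in for the FastAPI app: if no name is given, the model's own default name is used. The get_name() accessor here is an assumption for illustration; it is not part of the documented BaseDSModel interface.

```python
# Minimal sketch of register_model's name defaulting. A dict stands in
# for the FastAPI app; get_name() is an assumed accessor.
class TinyModelApp:
    def __init__(self):
        self._models = {}

    def register_model(self, model, name=None):
        # Fall back to the model's default name when none is given.
        self._models[name or model.get_name()] = model

class NamedModel:
    def get_name(self):
        return "SimpleGeoNLPAnnotator"

app = TinyModelApp()
app.register_model(NamedModel())              # registered under its default name
app.register_model(NamedModel(), name="geo")  # registered under an explicit name
```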

plugins

query