API Reference

DeepSearch Toolkit

artifacts

artifact_manager

ARTF_META_FILENAME = os.getenv('DEEPSEARCH_ARTIFACT_META_FILENAME', default='meta.info') module-attribute

ARTF_META_URL_FIELD = os.getenv('DEEPSEARCH_ARTIFACT_URL_FIELD', default='static_url') module-attribute

DFLT_ARTFCT_CACHE_DIR = os.getenv('DEEPSEARCH_ARTIFACT_CACHE', default=Path(platformdirs.user_cache_dir('deepsearch', 'ibm')) / 'artifact_cache') module-attribute

DFLT_ARTFCT_INDEX_DIR = os.getenv('DEEPSEARCH_ARTIFACT_INDEX', default=os.getcwd()) module-attribute

ArtifactManager

HitStrategy

Bases: str, Enum

OVERWRITE = 'overwrite' class-attribute instance-attribute
PASS = 'pass' class-attribute instance-attribute
RAISE = 'raise' class-attribute instance-attribute
__init__(index=None, cache=None)
download_artifact_to_cache(artifact_name, unpack_archives=True, hit_strategy=HitStrategy.OVERWRITE, with_progress_bar=False)
get_artifact_path_in_cache(artifact_name)
get_artifacts_in_cache()
get_artifacts_in_index()
get_cache_path()
get_index_path()
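The cache and index locations above are resolved from environment variables at module import time. A minimal sketch of redirecting them before using the manager; the `deepsearch.artifacts.artifact_manager` import path is an assumption based on the layout above:

```python
import os
from pathlib import Path

# The module attributes above read these variables at import time,
# so they must be set before the module is imported.
os.environ["DEEPSEARCH_ARTIFACT_CACHE"] = str(Path.home() / ".my_artifact_cache")
os.environ["DEEPSEARCH_ARTIFACT_INDEX"] = os.getcwd()

# Assumed import path; download every indexed artifact into the cache,
# keeping any copy that is already there (HitStrategy.PASS).
# from deepsearch.artifacts.artifact_manager import ArtifactManager, HitStrategy
# am = ArtifactManager()
# for name in am.get_artifacts_in_index():
#     am.download_artifact_to_cache(name, hit_strategy=HitStrategy.PASS)
#     print(am.get_artifact_path_in_cache(name))
```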

chemistry

models

ChemistryCompound

Bases: ChemistryModel

display_name instance-attribute

User-friendly representation of the compound.

inchi instance-attribute

InChI representation of compound structure.

inchikey instance-attribute

Hashed form of InChI.

smiles instance-attribute

SMILES representation of compound structure.

sum_formula instance-attribute

Sum formula of the compound, for example 'C6 O2 H5'.

ChemistryDocument

Bases: ChemistryModel

application_id instance-attribute

Identifier under which a patent application has been filed.

publication_id instance-attribute

Identifier under which a patent has been published.

title instance-attribute

(Readable) title of the document.

ChemistryModel

Bases: BaseModel

id instance-attribute

Transient identifier for short term use.

persistent_id instance-attribute

Identifier for long term (storage) use.

queries

ChemistryCompound

Bases: ChemistryModel

display_name instance-attribute

User-friendly representation of the compound.

inchi instance-attribute

InChI representation of compound structure.

inchikey instance-attribute

Hashed form of InChI.

smiles instance-attribute

SMILES representation of compound structure.

sum_formula instance-attribute

Sum formula of the compound, for example 'C6 O2 H5'.

ChemistryDocument

Bases: ChemistryModel

application_id instance-attribute

Identifier under which a patent application has been filed.

publication_id instance-attribute

Identifier under which a patent has been published.

title instance-attribute

(Readable) title of the document.

ChemistryQuery

Bases: BaseModel, ABC

CompoundsByIds

Bases: CompoundsQuery

Query compounds that have any of the given identifiers.

inchikeys = [] class-attribute instance-attribute
persistent_ids = [] class-attribute instance-attribute

CompoundsBySimilarity

Bases: CompoundsQuery

Query compounds that are similar to the given SMILES code.

structure instance-attribute
threshold = 0.9 class-attribute instance-attribute

CompoundsBySmarts

Bases: CompoundsQuery

Query compounds that (exactly) match the given SMARTS code.

structure instance-attribute

CompoundsBySmiles

Bases: CompoundsQuery

Query compounds that (exactly) match the given SMILES code.

structure instance-attribute

CompoundsBySubstructure

Bases: CompoundsQuery

Query compounds that contain a substructure with the given SMILES code.

structure instance-attribute

CompoundsIn

Bases: CompoundsQuery

Query compounds that occur in the given documents.

documents instance-attribute

CompoundsQuery

DocumentsByIds

Bases: DocumentsQuery

Query documents that have any of the given identifiers.

application_ids = [] class-attribute instance-attribute
persistent_ids = [] class-attribute instance-attribute
publication_ids = [] class-attribute instance-attribute

DocumentsHaving

Bases: DocumentsQuery

Query documents that contain compounds matching the given query.

compounds instance-attribute

DocumentsQuery

KnowledgeDbResource

to_resource()

Query

paginated_task = None instance-attribute
tasks = [] instance-attribute
variables = {} instance-attribute
__init__()
add(kind_or_task, *, task_id=None, parameters=None, inputs=None, coordinates=None)
add(kind_or_task: TTask) -> TTask
add(
    kind_or_task: str,
    *,
    task_id: Optional[str] = None,
    parameters: Optional[Dict[str, Any]] = None,
    inputs: Optional[TaskInputs] = None,
    coordinates: Optional[TaskCoordinates] = None
) -> Task
parse(value) classmethod
to_flow()

query_chemistry(api, query, offset=0, limit=10)

query_chemistry(
    api: api.CpsApi,
    query: CompoundsQuery,
    offset: int = 0,
    limit: int = 10,
) -> list[ChemistryCompound]
query_chemistry(
    api: api.CpsApi,
    query: DocumentsQuery,
    offset: int = 0,
    limit: int = 10,
) -> list[ChemistryDocument]

Perform a chemistry query on the knowledge base.
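For instance, an exact-SMILES compound lookup and a follow-up document query might look as follows. This is a sketch only: the import paths are assumptions based on the module layout above, and a configured API profile is required.

```python
# Sketch: chemistry knowledge-base queries (assumed import paths).
from deepsearch.cps.client.api import CpsApi
from deepsearch.chemistry.queries import (
    CompoundsBySmiles,
    DocumentsHaving,
    query_chemistry,
)

api = CpsApi.from_env()

# Compounds exactly matching the SMILES for aspirin.
compounds = query_chemistry(
    api, CompoundsBySmiles(structure="CC(=O)OC1=CC=CC=C1C(=O)O"), limit=10
)

# Patent documents containing any compound matching that SMILES.
documents = query_chemistry(
    api,
    DocumentsHaving(compounds=CompoundsBySmiles(structure="CC(=O)OC1=CC=CC=C1C(=O)O")),
    limit=10,
)

for doc in documents:
    print(doc.publication_id, doc.title)
```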

api

CpsApi
data_catalogs instance-attribute
data_indices instance-attribute
documents instance-attribute
elastic instance-attribute
knowledge_graphs instance-attribute
projects instance-attribute
queries instance-attribute
tasks instance-attribute
uploader instance-attribute
__init__(client)
from_env(profile_name=None) classmethod

Create an API object resolving the required settings from the environment if possible, otherwise from a stored profile.

Parameters:

Name Type Description Default
profile_name Optional[str]

profile to use if resolution from environment not possible. Defaults to None (active profile).

None

Returns:

Name Type Description
CpsApi CpsApi

the created API object
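For example (the `deepsearch.cps.client.api` import path is an assumption; the exact environment variables consulted are defined by ProfileSettings):

```python
from deepsearch.cps.client.api import CpsApi  # assumed import path

# Resolve settings from the environment if possible,
# otherwise from the active stored profile:
api = CpsApi.from_env()

# Or pin a specific stored profile:
api = CpsApi.from_env(profile_name="my-profile")
```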

from_settings(settings) classmethod

Create an API object from the provided settings.

Parameters:

Name Type Description Default
settings ProfileSettings

the settings to use.

required

Returns:

Name Type Description
CpsApi CpsApi

the created API object

refresh_token(admin=False)

Refresh access token

Parameters:

Name Type Description Default
admin bool

controls whether an admin token should be requested. Defaults to False.

False

Raises:

Type Description
RuntimeError

raised if the API key or user credentials are invalid

CpsApiClient
bearer_token_auth = DeepSearchBearerTokenAuth(bearer_token=self._authenticate_with_api_key(self.config.host, self.config.auth.username, self.config.auth.api_key)) instance-attribute
config = config instance-attribute
session = requests.Session() instance-attribute
__init__(config)

molecules

CHEMVECDB_COLLECTIONS = {MolQueryType.SIMILARITY: 'patcid_tanimoto', MolQueryType.SUBSTRUCTURE: 'patcid_substructure'} module-attribute
MolId

Bases: BaseModel

type instance-attribute
value instance-attribute
MolIdType

Bases: str, Enum

INCHI = 'inchi' class-attribute instance-attribute
INCHIKEY = 'inchikey' class-attribute instance-attribute
SMARTS = 'smarts' class-attribute instance-attribute
SMILES = 'smiles' class-attribute instance-attribute
MolQueryLang

Bases: str, Enum

SMARTS = 'smarts' class-attribute instance-attribute
SMILES = 'smiles' class-attribute instance-attribute
MolQueryType

Bases: str, Enum

SIMILARITY = 'similarity' class-attribute instance-attribute
SUBSTRUCTURE = 'substructure' class-attribute instance-attribute
MoleculeQuery(query, query_type, query_lang=MolQueryLang.SMILES, num_items=10)

Use the knowledge database in Deep Search for querying molecules by substructure or similarity. The result is contained in the molecules output of the response.

MoleculesInPatentsQuery(patents, num_items=10, partial_lookup=False)

List all molecules contained in a list of patents. The result is contained in the molecules output of the response.

PatentsWithMoleculesQuery(molecules, num_items=10)

List all patents containing any of the input molecules. The result is contained in the patents output of the response.
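As a sketch, a similarity search could be assembled and run like this. The import paths and the `api.queries.run` entry point are assumptions; per the description above, the hits arrive in the `molecules` output of the response.

```python
from deepsearch.cps.client.api import CpsApi
from deepsearch.chemistry.queries.molecules import (  # assumed import path
    MoleculeQuery,
    MolQueryLang,
    MolQueryType,
)

api = CpsApi.from_env()

query = MoleculeQuery(
    query="CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin, as SMILES
    query_type=MolQueryType.SIMILARITY,
    query_lang=MolQueryLang.SMILES,
    num_items=10,
)

result = api.queries.run(query)          # assumed execution entry point
molecules = result.outputs["molecules"]  # per the description above
```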

queries

ChemistryQuery

Bases: BaseModel, ABC

CompoundsByIds

Bases: CompoundsQuery

Query compounds that have any of the given identifiers.

inchikeys = [] class-attribute instance-attribute
persistent_ids = [] class-attribute instance-attribute
CompoundsBySimilarity

Bases: CompoundsQuery

Query compounds that are similar to the given SMILES code.

structure instance-attribute
threshold = 0.9 class-attribute instance-attribute
CompoundsBySmarts

Bases: CompoundsQuery

Query compounds that (exactly) match the given SMARTS code.

structure instance-attribute
CompoundsBySmiles

Bases: CompoundsQuery

Query compounds that (exactly) match the given SMILES code.

structure instance-attribute
CompoundsBySubstructure

Bases: CompoundsQuery

Query compounds that contain a substructure with the given SMILES code.

structure instance-attribute
CompoundsIn

Bases: CompoundsQuery

Query compounds that occur in the given documents.

documents instance-attribute
CompoundsQuery
DocumentsByIds

Bases: DocumentsQuery

Query documents that have any of the given identifiers.

application_ids = [] class-attribute instance-attribute
persistent_ids = [] class-attribute instance-attribute
publication_ids = [] class-attribute instance-attribute
DocumentsHaving

Bases: DocumentsQuery

Query documents that contain compounds matching the given query.

compounds instance-attribute
DocumentsQuery
query_chemistry(api, query, offset=0, limit=10)
query_chemistry(
    api: api.CpsApi,
    query: CompoundsQuery,
    offset: int = 0,
    limit: int = 10,
) -> list[ChemistryCompound]
query_chemistry(
    api: api.CpsApi,
    query: DocumentsQuery,
    offset: int = 0,
    limit: int = 10,
) -> list[ChemistryDocument]

Perform a chemistry query on the knowledge base.

resources

ChemVecDbResource

to_resource()

KnowledgeDbResource

to_resource()

core

DeepSearchBearerTokenAuth

Bases: BaseModel

bearer_token instance-attribute

DeepSearchConfig

Bases: BaseModel

auth instance-attribute

host instance-attribute

verify_ssl = True class-attribute instance-attribute

DeepSearchKeyAuth

Bases: BaseModel

api_key instance-attribute

username instance-attribute

util

cps

__all__ = ['CpsApi', 'CpsApiClient'] module-attribute

CpsApi

data_catalogs instance-attribute

data_indices instance-attribute

documents instance-attribute

elastic instance-attribute

knowledge_graphs instance-attribute

projects instance-attribute

queries instance-attribute

tasks instance-attribute

uploader instance-attribute

__init__(client)

from_env(profile_name=None) classmethod

Create an API object resolving the required settings from the environment if possible, otherwise from a stored profile.

Parameters:

Name Type Description Default
profile_name Optional[str]

profile to use if resolution from environment not possible. Defaults to None (active profile).

None

Returns:

Name Type Description
CpsApi CpsApi

the created API object

from_settings(settings) classmethod

Create an API object from the provided settings.

Parameters:

Name Type Description Default
settings ProfileSettings

the settings to use.

required

Returns:

Name Type Description
CpsApi CpsApi

the created API object

refresh_token(admin=False)

Refresh access token

Parameters:

Name Type Description Default
admin bool

controls whether an admin token should be requested. Defaults to False.

False

Raises:

Type Description
RuntimeError

raised if the API key or user credentials are invalid

CpsApiClient

bearer_token_auth = DeepSearchBearerTokenAuth(bearer_token=self._authenticate_with_api_key(self.config.host, self.config.auth.username, self.config.auth.api_key)) instance-attribute

config = config instance-attribute

session = requests.Session() instance-attribute

__init__(config)

data_indices

utils

logger = logging.getLogger(__name__) module-attribute
process_external_cos(api, coords, s3_coordinates, progress_bar=False)

Individual files are processed before upload.

process_local_file(api, coords, local_file, progress_bar=False, conv_settings=None, target_settings=None)

Individual files are uploaded for conversion and storage in data index.

process_url_input(api, coords, urls, url_chunk_size, progress_bar=False)

Individual urls are uploaded for conversion and storage in data index.

upload_files(api, coords, url=None, local_file=None, s3_coordinates=None, conv_settings=None, target_settings=None, url_chunk_size=1)

Orchestrate document conversion and upload to an index in a project.
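A minimal sketch of `upload_files`; the coordinates type `ElasticProjectDataCollectionSource` and both import paths are assumptions, and `<proj-key>`/`<index-key>` are placeholders:

```python
from deepsearch.cps.client.api import CpsApi
from deepsearch.cps.client.components.elastic import (  # assumed import path
    ElasticProjectDataCollectionSource,
)
from deepsearch.cps.data_indices.utils import upload_files  # assumed import path

api = CpsApi.from_env()
coords = ElasticProjectDataCollectionSource(
    proj_key="<proj-key>", index_key="<index-key>"
)

# Convert a local PDF and store the result in the target data index.
upload_files(api=api, coords=coords, local_file="report.pdf")
```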

kg

workflow

MultiLinkedList
MultiLinkedList
head = node instance-attribute
tail = node instance-attribute
__eq__(other)
__init__(node=None)
__ne__(other)
append(value)
append_child(child=None)
flatten_list()
print_list()
Node
child = None instance-attribute
data = data instance-attribute
id = id or str(uuid4()) instance-attribute
next = None instance-attribute
prev = None instance-attribute
__init__(data, id=None)
wf_functions
run(wf, config)

Run the workflow against the given KG. :param wf: Workflow object :type wf: Workflow :param config: Knowledge Graph API configuration :type config: Configuration :returns: workflow results

validate(wf, config)

Validate the workflow DAG. :param wf: Workflow object :type wf: Workflow :param config: Knowledge Graph API configuration :type config: Configuration

workflow
Workflow
__add__(workflow)
__and__(workflow)
__init__(starting_node=None)
__mul__(workflow)
__or__(workflow)
as_output(limit=None)

Set node type as output :param limit: Response limit :type limit: int

combine(*workflows)

Combine result :param *workflows: Nodes to combine :type *workflows: List['Workflow']

edge_traversal(edges=[], include=[])

Traverse edges :param edges: The edges to traverse :type edges: List[str] :param include: Include nodes in operation :type include: List['Workflow']

filter(filter_type='cut-off', field_operation='==', field_value='', include=[])

Filter values :param filter_type: Filter type. Possible values "cut-off", "field-value" :type filter_type: str :param field_operation: The field operation to use if filter type is "field-value". Possible values "<", "==", ">" :type field_operation: str :param field_value: The field value to filter by :type field_value: str :param include: Include nodes in operation :type include: List['Workflow']

filter_categories(*categories, include=[])

Filter node type by category :param categories: the categories to filter :type categories: List[str] :param include: Include nodes in operation :type include: List['Workflow']

get_operations()

Return workflow operations

intersect(*workflows)

Intersect result :param *workflows: Nodes to intersect :type *workflows: List['Workflow']

matrix_function(matrix_function='abs', include=[])

Run result through matrix function :param matrix_function: Scalar function to use. Possible values "e^A", "cosh", "sinh" :type matrix_function: str :param include: Include nodes in operation :type include: List['Workflow']

multiply(*workflows)

Multiply result :param *workflows: Nodes to multiply :type *workflows: List['Workflow']

negate(*workflows)

Negate result :param *workflows: Nodes to negate :type *workflows: List['Workflow']

normalize(normalize_type='RENORMALIZE_L2', include=[])

Normalize result :param normalize_type: Normalize type to use. Possible values "RENORMALIZE_L1", "RENORMALIZE_L2", "RENORMALIZE_LINF" :type normalize_type: str :param include: Include nodes in operation :type include: List['Workflow']

pearson_traversal(edges=[], include=[])

Traverse edges using pearson traversal :param edges: The edges to traverse :type edges: List[str] :param include: Include nodes in operation :type include: List['Workflow']

scalar_function(scalar_function='abs', include=[])

Run result through scalar function :param scalar_function: Scalar function to use. Possible values "uniform", "abs", "inv", "sigmoid", "softmax" :type scalar_function: str :param include: Include nodes in operation :type include: List['Workflow']

search_nodes_by_approximation(*args, tolerance=0.8, include=[])

Search nodes where the arguments are approximate :param *args: the search arguments :type *args: List[str] :param tolerance: the tolerance :type tolerance: float :param include: Include nodes in operation :type include: List['Workflow']

search_nodes_by_db_id_pair(*args, include=[])

Search nodes that contain the db/id pair :param *args: the db/id pairs in format {"_db": "db value", "_id": "id value"} :type *args: List[str] :param include: Include nodes in operation :type include: List['Workflow']

search_nodes_by_index(indices=[], weights=[], include=[])

Search nodes by index :param indices: the indices to search :type indices: List[str] :param weights: the weights to apply :type weights: List[float] :param include: Include nodes in operation :type include: List['Workflow']

search_nodes_by_regex(*args, include=[])

Search nodes by regex that match args :param *args: the search arguments :type *args: List[str] :param include: Include nodes in operation :type include: List['Workflow']

search_nodes_containing(*args, include=[])

Search nodes that contain the args :param *args: the search arguments :type *args: List[str] :param include: Include nodes in operation :type include: List['Workflow']

search_nodes_equal(*args, include=[])

Search nodes that equal the args :param *args: the search arguments :type *args: List[str] :param include: Include nodes in operation :type include: List['Workflow']

search_nodes_in_category(*categories, include=[])

Search nodes in categories :param categories: the categories to search :type categories: List[str] :param include: Include nodes in operation :type include: List['Workflow']

set_to_field_value(field_name='', include=[])

Set node to field value :param field_name: The field name :type field_name: str :param include: Include nodes in operation :type include: List['Workflow']

split(times=1)

Add children to node :param times: Number of children to add :type times: int :returns: node children

sum(*workflows)

Sum result :param *workflows: Nodes to sum :type *workflows: List['Workflow']

to_json(indent=2)

Return workflow as json string :param indent: result indentation :type indent: int
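Putting a few of these operations together might look like the sketch below. Both the import path and the mutate-in-place chaining style shown here are assumptions not confirmed by this reference:

```python
from deepsearch.cps.kg.workflow.workflow import Workflow  # assumed import path

wf = Workflow()
wf.search_nodes_equal("aspirin")               # seed: nodes equal to the term
wf.edge_traversal(edges=["mentions"])          # follow 'mentions' edges
wf.normalize(normalize_type="RENORMALIZE_L2")  # L2-renormalize scores
wf.as_output(limit=25)                         # mark node as output, cap results

print(wf.to_json(indent=2))                    # serialized workflow DAG
```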

queries

ConstrainedWeight = Annotated[float, Field(strict=True, ge=0.0, le=1.0, multiple_of=0.1)] module-attribute

DataQuery(search_query, *, source=None, aggregations=None, highlight=None, sort=None, limit=20, search_after=None, coordinates)

Fts(search_query, collection_name, kg)

RAGQuery(question, *, project, data_source, retr_k=10, rerank=False, text_weight=0.1, model_id=None, prompt_template=None, gen_params=None, gen_ctx_extr_method='window', gen_ctx_window_size=5000, gen_ctx_window_lead_weight=0.5, return_prompt=False, chunk_refs=None, gen_timeout=None)

Create a RAG query

Parameters:

Name Type Description Default
question str

the natural-language query

required
project Union[str, Project]

project to use

required
data_source DataSource

the data source to query

required
retr_k int

num of items to retrieve; defaults to 10

10
rerank bool

whether to rerank retrieval results; defaults to False

False
text_weight ConstrainedWeight

lexical weight for hybrid search; allowed values: {0.0, 0.1, 0.2, ..., 1.0}; defaults to 0.1

0.1
model_id str

the LLM to use for generation; defaults to None, i.e. determined by system

None
prompt_template str

the prompt template to use; defaults to None, i.e. determined by system

None
gen_params dict

the generation params to send to the Gen AI platforms; defaults to None, i.e. determined by system

None
gen_ctx_extr_method Literal['window', 'page']

method for gen context extraction from document; defaults to "window"

'window'
gen_ctx_window_size int

(relevant only if gen_ctx_extr_method=="window") max chars to use for extracted gen context (actual extraction quantized on doc item level); defaults to 5000

5000
gen_ctx_window_lead_weight float

(relevant only if gen_ctx_extr_method=="window") weight of leading text for distributing remaining window size after extracting the main_path; defaults to 0.5 (centered around main_path)

0.5
return_prompt bool

whether to return the instantiated prompt; defaults to False

False
chunk_refs Optional[List[ChunkRef]]

list of explicit chunk references to use instead of performing retrieval; defaults to None (i.e. retrieval-mode)

None
gen_timeout float

timeout for LLM generation; defaults to None, i.e. determined by system

None
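A sketch of issuing a RAG query. The import paths, the `api.queries.run` entry point, and the `RAGResult.from_api_output` usage are assumptions; construction of `data_source` is omitted and left as a placeholder:

```python
from deepsearch.cps.client.api import CpsApi
from deepsearch.cps.queries import RAGQuery           # assumed import path
from deepsearch.cps.queries.results import RAGResult  # assumed import path

api = CpsApi.from_env()

data_source = ...  # a DataSource pointing at a semantically indexed collection

query = RAGQuery(
    "Which alloys are discussed in these patents?",
    project="<proj-key>",
    data_source=data_source,
    retr_k=10,
    rerank=True,
    text_weight=0.1,
)

raw = api.queries.run(query)  # assumed execution entry point
result = RAGResult.from_api_output(raw)
for item in result.answers:
    print(item.answer)
```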

SemanticQuery(question, *, project, data_source, retr_k=10, rerank=False, text_weight=0.1)

Create a semantic retrieval query

Parameters:

Name Type Description Default
question str

the natural-language query

required
document_hash str

hash of target document

required
project Union[str, Project]

project to use

required
data_source DataSource

the data source to query

required
retr_k int

num of items to retrieve; defaults to 10

10
rerank bool

whether to rerank retrieval results; defaults to False

False
text_weight ConstrainedWeight

lexical weight for hybrid search; allowed values: {0.0, 0.1, 0.2, ..., 1.0}; defaults to 0.1

0.1

Wf(wf_query, kg)

results

ChunkRef

Bases: BaseModel

doc_hash instance-attribute
main_path instance-attribute
path_group instance-attribute
GenerationError

Bases: SemanticError

__init__(msg='', *args, **kwargs)
NoSearchResultsError

Bases: SemanticError

__init__(msg='Search returned no results', *args, **kwargs)
RAGAnswerItem

Bases: BaseModel

answer instance-attribute
grounding instance-attribute
prompt = None class-attribute instance-attribute
RAGGroundingInfo

Bases: BaseModel

gen_ctx_paths instance-attribute
retr_items = None class-attribute instance-attribute
RAGResult

Bases: BaseModel

answers instance-attribute
search_result_items = None class-attribute instance-attribute
from_api_output(data, raise_on_error=True) classmethod
SearchResult

Bases: BaseModel

search_result_items instance-attribute
from_api_output(data, raise_on_error=True) classmethod
SearchResultItem

Bases: ChunkRef

chunk instance-attribute
source_is_text instance-attribute
SemanticError

Bases: Exception

documents

core

common_routines

ERROR_MSG = f'{dashes}Suggestion:(1) Check your input.(2) Contact Deep Search developers if problem persists.{dashes}' module-attribute
WELCOME = f'{dashes}{''}Welcome to the Deep Search Toolkit{dashes}' module-attribute
dashes = f'{'-' * 86}' module-attribute
progressbar = ProgressBarParameters() module-attribute
progressbar_length = 30 module-attribute
ProgressBarParameters dataclass
bar_format = '{l_bar}{bar:%d}{r_bar}{bar:-10b}' % progressbar_length class-attribute instance-attribute
colour = '#0f62fe' class-attribute instance-attribute
padding = 22 class-attribute instance-attribute
__init__()

convert

TASK_STOP_STATUS = ['SUCCESS', 'FAILURE'] module-attribute
logger = logging.getLogger(__name__) module-attribute
check_cps_single_task_status(sw_api, cps_proj_key, task_id, wait=2)

Check cps status of individual tasks.

check_cps_status_running_tasks(api, cps_proj_key, task_ids, progress_bar=False)

Check status of multiple running cps tasks and optionally display progress with progress bar.

download_converted_documents(result_dir, download_urls, progress_bar=False)

Download converted documents.

Input

result_dir : path
directory for saving the converted JSON documents

get_wait_task_result(sw_api, cps_proj_key, task_id, wait=2)

Wait for the task to finish and return its result.

make_payload(url, conversion_settings)

Create payload for requesting conversion

send_file_for_conversion(api, cps_proj_key, source_path, conversion_settings, progress_bar=False)

Send file for conversion.

send_url_for_conversion(api, cps_proj_key, url, conversion_settings, progress_bar=False)

Send online document for conversion.

submit_conversion_payload(api, cps_proj_key, url, conversion_settings)

Convert an online pdf using DeepSearch Technology.

create_report

logger = logging.getLogger(__name__) module-attribute
generate_report_csv(task_result, task_id, result_dir, progress_bar=False)

Generate a report for a document conversion task ID and save it as a CSV file.

export

JsonToHTML
__init__()
clean(data, escape=True)
enum_has_ids(enums)
execute(data)
get_body_new(data)
get_page(item)
get_refs(ref)
get_style(item)
get_tablecell_span(cell, ix)
get_title(data)
make_bbox(page, bbox_rect)
make_bbox_dict(page, bbox_rect)
split_item_in_boxes(item)
template()
write_enum(item)
write_table(item)
write_table_simple(item)
export_to_html(document)
export_to_markdown(document)

input_process

process_local_input(api, cps_proj_key, source_path, conversion_settings, progress_bar=False, export_md=False)

Classify the user-provided local input and take the appropriate action.

process_url_input(api, cps_proj_key, url, conversion_settings, progress_bar=False, export_md=False)

Classify the user-provided URL(s) and take the appropriate action.

lookup

EntitiesLookup
document = document instance-attribute
__init__(document)
get(*, entity_type, entity)

Lookup where a given entity is mentioned in a document.

main

convert_documents(proj_key, api, url=None, source_path=None, conversion_settings=None, progress_bar=False, export_md=False)

Orchestrate document conversion via Deep Search Technology.

Inputs

proj_key : string [REQUIRED] Your DeepSearch CPS Project Key. Contact DeepSearch Developers to request one.

url : string [OPTIONAL] For converting documents from the web, please provide a single url.

source_path : path [OPTIONAL] For converting local files, please provide the absolute path to a file or to a directory containing multiple files.

progress_bar : Boolean (default is False in code, True in CLI) Show progress bar for processing, submitting, converting input and downloading converted document.

NOTE: Either url or source_path should be supplied.
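Per the inputs above, a conversion run might look as follows; the top-level `deepsearch` import shortcut is an assumption, and `<proj-key>` is a placeholder:

```python
import deepsearch as ds  # assumed top-level import shortcut

api = ds.CpsApi.from_env()

documents = ds.convert_documents(
    api=api,
    proj_key="<proj-key>",            # your Deep Search CPS project key
    source_path="/path/to/file.pdf",  # or url=..., but not both
    progress_bar=True,
)

documents.download_all(result_dir="./converted")
info = documents.generate_report(result_dir="./converted")
```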

models

ExportTarget = Union[ZipTarget, MongoS3Target, ElasticS3Target, COSTarget] module-attribute
COSTarget

Bases: BaseModel

add_annotations = False class-attribute instance-attribute
add_raw_pages = False class-attribute instance-attribute
coordinates instance-attribute
type = 'cos' class-attribute instance-attribute
ConversionSettings

Bases: BaseModel

ocr = OCROptions() class-attribute instance-attribute
table_structure = TableStructureOptions() class-attribute instance-attribute
DocumentExistsInTargetAction

Bases: str, Enum

What to do if the document already exists on the target. - replace will replace the document, destroying any external modifications. - skip will not touch the document on the target, leaving it as-is. Using skip yields a performance increase; however, if the document is modified externally, CCS will not update it back to the original state.

REPLACE = 'replace' class-attribute instance-attribute
SKIP = 'skip' class-attribute instance-attribute
ElasticIndexCoordinates

Bases: BaseModel

ca_certificate_base64 = None class-attribute instance-attribute
dangerously_disable_ssl_validation = False class-attribute instance-attribute
hosts instance-attribute
index instance-attribute
ElasticS3Target

Bases: BaseModel

add_annotations = False class-attribute instance-attribute
add_cells = False class-attribute instance-attribute
add_raw_pages = False class-attribute instance-attribute
coordinates instance-attribute
escape_ref_fields = Field(default=True, description='If true, `$ref` fields are renamed to `__ref`. This allows the data to then be written into a MongoDB collection.') class-attribute instance-attribute
if_document_exists = DocumentExistsInTargetAction.REPLACE class-attribute instance-attribute
type = 'elastic_s3' class-attribute instance-attribute
ElasticS3TargetCoordinates

Bases: BaseModel

elastic instance-attribute
s3 = None class-attribute instance-attribute
MongoCollectionCoordinates

Bases: BaseModel

collection instance-attribute
database instance-attribute
uri instance-attribute
MongoS3Target

Bases: BaseModel

coordinates instance-attribute
if_document_exists = DocumentExistsInTargetAction.REPLACE class-attribute instance-attribute
type = 'mongo_s3' class-attribute instance-attribute
MongoS3TargetCoordinates

Bases: BaseModel

Coordinates to a Mongo collection, and optionally, an S3 bucket

mongo instance-attribute
s3 = None class-attribute instance-attribute
OCROptions

Bases: BaseModel

do_ocr = True class-attribute instance-attribute
kind = 'easyocr' class-attribute instance-attribute
S3Coordinates

Bases: BaseModel

access_key instance-attribute
bucket instance-attribute
external_endpoint = None class-attribute instance-attribute
host instance-attribute
key_infix_format = Field('', description=dedent('\n Control the infix of the object keys that are saved on the document\'s `_s3_data`, after `key_prefix`,\n and before `PDFDocuments/{document_hash}.pdf` or `PDFPages/{page_hash}.pdf`.\n\n By default, the infix is empty.\n For using the name of the index in the coordinates, you can use `key_infix_format = "{index_name}"`.\n\n For example, if:\n\n ```\n key_prefix = "my_prefix/"\n key_infix_format = "{index_name}"\n index_name = "my_elastic_index"\n\n document_hash = "123"\n ```\n\n Then, the document above would be uploaded to: `my_prefix/my_elastic_index/PDFDocuments/123.pdf`.\n\n If one were to set `key_infix_format = ""`, it would be uploaded to `my_prefix/PDFDocuments/123.pdf`.\n\n If one were to set `key_infix_format = "foo"`, it would be uploaded to `my_prefix/foo/PDFDocuments/123.pdf`\n\n Finally, one can combine `{index_name}` with constants and even path separators.\n\n So, `{index_name}/test` would produce `my_prefix/my_elastic_index/test/PDFDocuments/123.pdf`\n ')) class-attribute instance-attribute
key_prefix = '' class-attribute instance-attribute
location instance-attribute
port instance-attribute
secret_key instance-attribute
ssl instance-attribute
verify_ssl instance-attribute
TableStructureOptions

Bases: BaseModel

do_table_structure = True class-attribute instance-attribute
table_structure_mode = 'fast' class-attribute instance-attribute
TargetSettings

Bases: BaseModel

add_annotations = None class-attribute instance-attribute
add_raw_pages = None class-attribute instance-attribute
check_raw_or_ann()
ZipPackageContentType

Bases: str, Enum

Specify the content type for the documents in the Zip file.

HTML = 'html' class-attribute instance-attribute
JSON = 'json' class-attribute instance-attribute
ZipTarget

Bases: BaseModel

Specify how the documents should be exported to a Zip file. If coordinates are not specified, the project's coordinates will be used.

add_cells = False class-attribute instance-attribute
content_type = ZipPackageContentType.JSON class-attribute instance-attribute
type = 'zip' class-attribute instance-attribute
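For illustration, conversion settings and an export target might be assembled like this (a configuration sketch; the import path is an assumption based on the layout above):

```python
from deepsearch.documents.core.models import (  # assumed import path
    ConversionSettings,
    OCROptions,
    TableStructureOptions,
    ZipPackageContentType,
    ZipTarget,
)

# Disable OCR, keep fast table-structure extraction.
conv_settings = ConversionSettings(
    ocr=OCROptions(do_ocr=False),
    table_structure=TableStructureOptions(
        do_table_structure=True, table_structure_mode="fast"
    ),
)

# Export converted documents as HTML inside a Zip file;
# coordinates default to the project's own.
target = ZipTarget(content_type=ZipPackageContentType.HTML)
```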

render

get_figure_svg(document, figure)

Generates an SVG which crops the figure from the image of the document page.

get_page_svg_with_item(document, item)

Generates an SVG which overlays the bounding box of the item on the image of the page.

results

DocumentConversionResult

An instance of DocumentConversionResult is generated when document conversion is requested.

export_md = export_md instance-attribute
proj_key = proj_key instance-attribute
result = result instance-attribute
task_id = task_id instance-attribute
__init__(proj_key, task_id, result, api, source_path=None, source_url=None, batched_files=None, export_md=False)
download_all(result_dir, progress_bar=False)

Download all converted documents.

Input

result_dir : path
local directory where the converted documents will be saved

progress_bar : boolean, optional (default = False)
shows a progress bar if True

generate_report(result_dir, progress_bar=False)

Saves a CSV report file with detailed information about the document conversion job. Returns a dictionary containing counts of converted files/URLs.

utils

ALLOWED_FILE_EXTENSIONS = ['.pdf', '.jpg', '.jpeg', '.tiff', '.tif', '.png', '.gif'] module-attribute
batch_single_files(source_path, root_dir, progress_bar=False)

Batch individual input files into zip files.

Output

bfiles : List[List[str]]
the outer list corresponds to each batch; the inner list corresponds to the individual files in a batch

cleanup(root_dir)

Clean up temporarily created zip batches.

create_root_dir()

Creates a root directory labelled with a timestamp.

download_url(url, save_path, chunk_size=128)

Download contents from a URL.

read_lines(file_path)

Returns a list of lines from the input file.

write_taskids(result_dir, list_to_write)

Writes the lines in list_to_write to a file in result_dir.

model

base

controller

BaseController

Bases: ABC

dispatch_predict(spec) abstractmethod
get_info() abstractmethod
get_kind() abstractmethod
get_model_exec_time()
get_model_kind()
get_model_name()

model

BaseDSModel

Bases: ABC

get_config() abstractmethod

types

Annotations

Bases: StrictModel

deepsearch_res_ibm_com_x_attempt_number = Field(..., alias='deepsearch.res.ibm.com/x-attempt-number') class-attribute instance-attribute
deepsearch_res_ibm_com_x_deadline = Field(..., alias='deepsearch.res.ibm.com/x-deadline') class-attribute instance-attribute
deepsearch_res_ibm_com_x_max_attempts = Field(..., alias='deepsearch.res.ibm.com/x-max-attempts') class-attribute instance-attribute
deepsearch_res_ibm_com_x_transaction_id = Field(..., alias='deepsearch.res.ibm.com/x-transaction-id') class-attribute instance-attribute
BaseAppPredInput

Bases: StrictModel

apiVersion instance-attribute
kind instance-attribute
metadata instance-attribute
spec instance-attribute
BaseModelConfig

Bases: BaseModelMetadata

kind instance-attribute
BaseModelMetadata

Bases: StrictModel

author = None class-attribute instance-attribute
description = None class-attribute instance-attribute
expected_compute_time = None class-attribute instance-attribute
name instance-attribute
url = None class-attribute instance-attribute
version instance-attribute
CtrlInfoOutput

Bases: BaseModel

definitions instance-attribute
CtrlInfoOutputDefs

Bases: BaseModel

apiVersion instance-attribute
kind instance-attribute
spec instance-attribute
Kind

Bases: str, Enum

NLPModel = 'NLPModel' class-attribute instance-attribute
QAGenModel = 'QAGenModel' class-attribute instance-attribute
Metadata

Bases: StrictModel

annotations instance-attribute
ModelInfoOutputDefsSpec

Bases: BaseModel

definition instance-attribute
metadata instance-attribute
StrictModel

Bases: BaseModel

examples

dummy_nlp_annotator

main
run()
model
DummyNLPAnnotator

Bases: BaseNLPModel

__init__()
annotate_batched_entities(object_type, items, entity_names)
annotate_batched_properties(object_type, items, entities, property_names)
annotate_batched_relationships(object_type, items, entities, relationship_names)
get_nlp_config()

dummy_qa_generator

main
run()
model
DummyQAGenerator

Bases: BaseQAGenerator

A dummy QA generator which answers a question with the question itself.

generate_answers(texts, extras)

Just answers with the question itself.

Args:

texts: a list of (context, question) pairs.
extras: any extras to pass.

get_qagen_config()
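The dummy behaviour described above can be sketched standalone: generate_answers echoes each question back as its answer. This mirrors the docstring, not the exact BaseQAGenerator interface; the base-class details and output schema here are simplified assumptions.

```python
# Standalone sketch of the dummy QA generator: each question is
# answered with itself. Output dict shape is an assumption.
class EchoQAGenerator:
    def generate_answers(self, texts, extras=None):
        # texts: a list of (context, question) pairs
        return [{"answer": question, "metadata": {}} for _, question in texts]

gen = EchoQAGenerator()
answers = gen.generate_answers([("some context", "What is DeepSearch?")])
# answers[0]["answer"] -> "What is DeepSearch?"
```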

simple_geo_nlp_annotator

entities
cities_annotator
CitiesAnnotator

Bases: DictionaryTextEntityAnnotator

__init__()
description()
key()
common
base_text_entity_annotator
BaseTextEntityAnnotator
annotate_entities_text(text) abstractmethod
description() abstractmethod
initialize()
key() abstractmethod
dictionary_text_entity_annotator
logger = logging.getLogger('cps-nlp') module-attribute
Config dataclass
dictionary_filename instance-attribute
__init__(dictionary_filename)
DictionaryTextEntityAnnotator

Bases: BaseTextEntityAnnotator

config = config instance-attribute
__init__(config)
annotate_entities_text(text)
initialize()
utils
resources_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '../../resources')) module-attribute
countries_annotator
CountriesAnnotator

Bases: DictionaryTextEntityAnnotator

__init__()
description()
key()
provincies_annotator
ProvinciesAnnotator

Bases: DictionaryTextEntityAnnotator

__init__()
description()
key()
main
run()
model
logger = logging.getLogger('cps-nlp') module-attribute
SimpleGeoNLPAnnotator

Bases: BaseNLPModel

entity_names = list(self._ent_annots.keys()) instance-attribute
property_names = [] instance-attribute
relationship_names = list(self._rel_annots.keys()) instance-attribute
__init__()
annotate_batched_entities(object_type, items, entity_names)
annotate_batched_properties(object_type, items, entities, property_names)
annotate_batched_relationships(object_type, items, entities, relationship_names)
get_nlp_config()
relationships
cities_to_countries_annotator
CitiesToCountriesAnnotator
cities_to_provincies_annotator
CitiesToProvinciesAnnotator
common
base_text_relationship_annotator
BaseTextRelationshipAnnotator
annotate_relationships_text(text, entity_map, relationship_name) abstractmethod
columns() abstractmethod
description() abstractmethod
key() abstractmethod
multi_entities_relationship_annotator
logger = logging.getLogger('cps-nlp') module-attribute
Config dataclass
entities instance-attribute
__init__(entities)
MultiEntitiesRelationshipAnnotator

Bases: BaseTextRelationshipAnnotator

Creates a relationship if all entity types are present in the given text input.

__init__(config)
annotate_relationships_text(text, entity_map, relationship_name)
columns()
description()
key()
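The rule described above can be sketched as: emit a relationship only when every required entity type has at least one match in the text. The entity_map shape (type name mapped to a list of matches) is an assumption for illustration, not the annotator's actual internal structure.

```python
# Hedged sketch of the "all entity types present" rule. The
# entity_map shape is an assumption for illustration.
def relationship_if_all_present(entity_map, required_types):
    if all(entity_map.get(t) for t in required_types):
        # Pair the first match of each required type into one relationship.
        return [tuple(entity_map[t][0] for t in required_types)]
    return []

ents = {"cities": ["Zurich"], "countries": ["Switzerland"]}
rels = relationship_if_all_present(ents, ["cities", "countries"])
# rels -> [("Zurich", "Switzerland")]
```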
provincies_to_countries_annotator
ProvinciesToCountriesAnnotator

kinds

nlp

controller
NLPController

Bases: BaseController

__init__(model)
dispatch_predict(spec)
get_info()
get_kind()
model
BaseNLPModel

Bases: BaseDSModel

annotate_batched_entities(object_type, items, entity_names) abstractmethod
annotate_batched_properties(object_type, items, entities, property_names) abstractmethod
annotate_batched_relationships(object_type, items, entities, relationship_names) abstractmethod
get_config()
get_nlp_config() abstractmethod
types
AnnotateEntitiesOutput = List[Dict[str, List[AnnotateEntitiesEntry]]] module-attribute
AnnotatePropertiesOutput = List[Dict] module-attribute
AnnotateRelationshipsOutput = List[Dict[str, AnnotateRelationshipsEntry]] module-attribute
NLPCtrlPredOutput = Union[NLPEntsCtrlPredOuput, NLPRelsCtrlPredOutput, NLPPropsCtrlPredOutput] module-attribute
NLPReqSpec = Union[NLPEntitiesReqSpec, NLPRelationshipsReqSpec, NLPPropertiesReqSpec] module-attribute
AnnotateEntitiesEntry

Bases: StrictModel

match instance-attribute
original instance-attribute
range instance-attribute
type instance-attribute
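An illustrative entry matching the AnnotateEntitiesEntry fields listed above (match, original, range, type), wrapped in the AnnotateEntitiesOutput grouping. The concrete values and the range semantics (character offsets) are assumptions for the example.

```python
# Illustrative entity entry using the fields listed in this reference.
# Values and range semantics are assumptions.
entry = {
    "type": "cities",
    "match": "Zurich",
    "original": "Zurich",
    "range": [10, 16],  # assumed: character offsets in the annotated text
}
# AnnotateEntitiesOutput groups such entries per item, keyed by entity name:
output = [{"cities": [entry]}]
```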
AnnotateRelationshipsEntry

Bases: StrictModel

data instance-attribute
header instance-attribute
AnnotationLabels

Bases: StrictModel

entities instance-attribute
properties instance-attribute
relationships instance-attribute
EntityLabel

Bases: StrictModel

description instance-attribute
key instance-attribute
FindEntitiesText

Bases: StrictModel

entityNames = None class-attribute instance-attribute
objectType instance-attribute
texts instance-attribute
FindPropertiesText

Bases: StrictModel

entities = None class-attribute instance-attribute
objectType instance-attribute
propertyNames = None class-attribute instance-attribute
texts instance-attribute
FindRelationshipsText

Bases: StrictModel

entities instance-attribute
objectType instance-attribute
relationshipNames = None class-attribute instance-attribute
texts instance-attribute
NLPAppPredInput

Bases: BaseAppPredInput

kind instance-attribute
spec instance-attribute
NLPConfig

Bases: BaseModelConfig

kind instance-attribute
labels instance-attribute
supported_types instance-attribute
NLPEntitiesReqSpec

Bases: StrictModel

findEntities instance-attribute
NLPEntsCtrlPredOuput

Bases: StrictModel

entities instance-attribute
NLPInfoOutput

Bases: CtrlInfoOutput

definitions instance-attribute
NLPInfoOutputDefinitions

Bases: CtrlInfoOutputDefs

kind instance-attribute
spec instance-attribute
NLPInfoOutputDefinitionsSpec

Bases: ModelInfoOutputDefsSpec

metadata instance-attribute
NLPModelMetadata

Bases: BaseModelMetadata

supported_object_types instance-attribute
NLPPropertiesReqSpec

Bases: StrictModel

findProperties instance-attribute
NLPPropsCtrlPredOutput

Bases: StrictModel

properties instance-attribute
NLPRelationshipsReqSpec

Bases: StrictModel

findRelationships instance-attribute
NLPRelsCtrlPredOutput

Bases: StrictModel

relationships instance-attribute
NLPType

Bases: str, Enum

text = 'text' class-attribute instance-attribute
PropertyLabel

Bases: StrictModel

description instance-attribute
key instance-attribute
RelationshipColumn

Bases: StrictModel

entities instance-attribute
key instance-attribute
RelationshipLabel

Bases: StrictModel

columns instance-attribute
description instance-attribute
key instance-attribute

qagen

controller
QAGenController

Bases: BaseController

__init__(model)
dispatch_predict(spec)
get_info()
get_kind()
model
BaseQAGenerator

Bases: BaseDSModel

generate_answers(texts, extras) abstractmethod
get_config()
get_qagen_config() abstractmethod
types
GenerateAnswersOutput = List[GenerateAnswersOutEntry] module-attribute
ContextEntry

Bases: StrictModel

representation_type instance-attribute
text instance-attribute
type instance-attribute
GenerateAnswers

Bases: StrictModel

contexts instance-attribute
extras = None class-attribute instance-attribute
questions instance-attribute
check_lengths_match(values)
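The validation implied by check_lengths_match can be sketched as a plain function: the contexts and questions lists must line up one-to-one. This is a stand-in for the actual pydantic validator, whose exact signature is not shown in this reference.

```python
# Sketch of the length check implied above, in plain-function form
# rather than as the actual pydantic validator.
def check_lengths_match(values):
    if len(values["contexts"]) != len(values["questions"]):
        raise ValueError("contexts and questions must have the same length")
    return values

ok = check_lengths_match({"contexts": [["c1"]], "questions": ["q1"]})
```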
GenerateAnswersOutEntry

Bases: StrictModel

answer instance-attribute
metadata instance-attribute
QAGenAppPredInput

Bases: BaseAppPredInput

kind instance-attribute
spec instance-attribute
QAGenConfig

Bases: BaseModelConfig

kind instance-attribute
QAGenCtrlPredOutput

Bases: StrictModel

answers instance-attribute
QAGenInfoOutput

Bases: StrictModel

definitions instance-attribute
QAGenInfoOutputDefinitions

Bases: CtrlInfoOutputDefs

kind instance-attribute
QAGenReqSpec

Bases: StrictModel

generateAnswers instance-attribute

server

config

Settings

Bases: BaseSettings

api_key instance-attribute
model_config = SettingsConfigDict(env_prefix='DS_MODEL_') class-attribute instance-attribute
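With env_prefix='DS_MODEL_', the api_key field is read from the DS_MODEL_API_KEY environment variable. A sketch of that behaviour with a plain os.environ lookup, rather than pydantic BaseSettings; the value shown is hypothetical.

```python
# Sketch of the DS_MODEL_ env prefix: api_key comes from
# DS_MODEL_API_KEY. Plain lookup, not pydantic BaseSettings.
import os

os.environ["DS_MODEL_API_KEY"] = "example-key"  # hypothetical value

def load_settings(prefix="DS_MODEL_"):
    return {"api_key": os.environ[prefix + "API_KEY"]}

settings = load_settings()
# settings["api_key"] -> "example-key"
```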

controller_factory

ControllerFactory
create_controller(model)

inference_types

AppModelInfoOutput = Union[NLPInfoOutput, QAGenInfoOutput] module-attribute
AppPredInput = Union[NLPAppPredInput, QAGenAppPredInput] module-attribute
CtrlPredInput = Union[NLPReqSpec, QAGenReqSpec] module-attribute
CtrlPredOutput = Union[NLPCtrlPredOutput, QAGenCtrlPredOutput] module-attribute

model_app

logger = logging.getLogger('cps-fastapi') module-attribute
ModelApp
app = FastAPI() instance-attribute
__init__(settings)
register_model(model, name=None, controller=None)

Registers a model with the app.

Parameters:

model (BaseDSModel): the model to register. Required.
name (Optional[str]): an optional name under which to register the model; if not set, the model's default name is used. Default: None.
controller (Optional[BaseController]): an optional custom controller to use; if not set, the default controller for the kind is used. Default: None.
run(host='127.0.0.1', port=8000, **kwargs)
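The name-defaulting behaviour of register_model can be sketched with a dict-backed registry standing in for the FastAPI app: if no name is given, the model's own default name is used. The get_name() accessor here is an assumption for illustration; it is not part of the documented BaseDSModel interface.

```python
# Minimal sketch of register_model's name defaulting. A dict stands in
# for the FastAPI app; get_name() is an assumed accessor.
class TinyModelApp:
    def __init__(self):
        self._models = {}

    def register_model(self, model, name=None):
        # Fall back to the model's default name when none is given.
        self._models[name or model.get_name()] = model

class NamedModel:
    def get_name(self):
        return "SimpleGeoNLPAnnotator"

app = TinyModelApp()
app.register_model(NamedModel())              # registered under its default name
app.register_model(NamedModel(), name="geo")  # registered under an explicit name
```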

plugins

query