Semantic Deduplication

Detects semantic duplicates in a set of entities by comparing the similarity of their embedding vectors. Depending on `mode`, the block reports, flags, or merges the duplicates it finds.
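To make the comparison step concrete, here is a minimal sketch of threshold-based duplicate detection over embedding vectors. It assumes cosine similarity as the metric and a "greater than or equal to" comparison against the threshold; the block's actual metric and internals are not documented here, and the function names and toy vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def find_duplicate_pairs(embeddings, threshold=0.85):
    """Return every pair of entity IDs whose similarity reaches the threshold.

    `embeddings` maps an entity ID to its vector. The output shape mirrors
    the `duplicates` array shown in this page's sample output.
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append({"pair": [id_a, id_b], "similarity": round(score, 4)})
    return pairs


# Toy two-dimensional vectors; real embeddings have hundreds of dimensions.
embeddings = {
    "lead-1": [1.0, 0.0],
    "lead-2": [0.9, 0.1],
    "lead-3": [0.0, 1.0],
}
result = {"duplicates": find_duplicate_pairs(embeddings), "merged": 0}
```

With these toy vectors only lead-1 and lead-2 are close enough to be reported. Raising the threshold toward 1.0 drops that pair as well, which is what "stricter" means for `similarityThreshold`.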

Parameters

Parameter	Type	Required	Variable	Description
`input`	dynamic value	No	Yes	Entities to analyze for duplicate detection. Array of IDs or objects.
`entityTypeFilter`	array	No	No	Entity types to consider for deduplication.
`similarityThreshold`	number	No	No	Similarity threshold (0.0 to 1.0). Higher = stricter duplicate detection. (Default: `0.85`, min 0, max 1)
`mode`	choice (`report_only`, `flag_entities`, `merge_with_review`, `auto_merge`)	No	No	Action on duplicates: report only, flag, merge with review, or auto merge. (Default: `"report_only"`)
`mergeStrategy`	choice (`keep_first`, `keep_newest`, `merge_properties`, `manual_review`)	No	No	Merge strategy: keep first, keep newest, merge properties, or manual review. (Default: `"keep_first"`)
`comparisonFields`	array	No	No	Specific fields to compare for similarity between entities.
`batchSize`	number	No	No	Number of entities per batch for embedding comparisons. (Default: `50`, min 1, max 1000)
`enableLLMNameMatching`	boolean	No	No	Use an LLM to refine the detection of similar names (e.g. abbreviations, variants). (Default: `false`)
`model.provider`	text	No	No	Language model provider for result refinement.
`model.model`	text	No	No	Model identifier to use (e.g. gpt-4o).
`outputVariable`	text	No	No	Name of the output variable that receives the detected duplicate groups.

Parameters marked Variable = Yes accept the {{blockName.field}} syntax.
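The `mergeStrategy` options are only named in the table above, so the sketch below shows one plausible reading of each, not the block's actual behavior. It assumes that entities are plain dictionaries, that "newest" is decided by a hypothetical `updatedAt` field holding ISO dates, and that `merge_properties` fills empty fields from later entities.

```python
def merge_group(entities, strategy="keep_first"):
    """Collapse a group of duplicate entities into one record.

    Illustrative only: the semantics of each strategy are assumptions.
    """
    if not entities:
        raise ValueError("cannot merge an empty group")
    if strategy == "keep_first":
        # Keep the first entity in input order and discard the rest.
        return dict(entities[0])
    if strategy == "keep_newest":
        # ISO 8601 date strings sort chronologically as plain strings.
        return dict(max(entities, key=lambda e: e["updatedAt"]))
    if strategy == "merge_properties":
        # Start from the first entity and fill gaps from the others.
        merged = {}
        for entity in entities:
            for key, value in entity.items():
                if merged.get(key) in (None, ""):
                    merged[key] = value
        return merged
    if strategy == "manual_review":
        # Nothing is merged automatically; a person decides later.
        return None
    raise ValueError(f"unknown merge strategy: {strategy}")


older = {"id": "lead-1", "email": None, "updatedAt": "2024-01-01"}
newer = {"id": "lead-2", "email": "x@example.com", "updatedAt": "2024-03-01"}

first = merge_group([older, newer], "keep_first")
newest = merge_group([older, newer], "keep_newest")
combined = merge_group([older, newer], "merge_properties")
```

Here `first` keeps lead-1 unchanged, `newest` keeps lead-2, and `combined` keeps the ID of lead-1 while taking the email from lead-2.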

Output variable: deduplicationResult

{
  "duplicates": [],
  "merged": 0
}

Example: detect duplicates among leads.

Input:

{"entityType": "Lead"}

Output:

{"duplicates": [{"pair": ["lead-1", "lead-2"], "similarity": 0.95}], "merged": 0}

Tip

{{deduplicationResult.duplicates}} lists the duplicate pairs with their similarity scores. Adjust `similarityThreshold` (default `0.85`) to control sensitivity: higher values report fewer, closer matches.
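Because the output lists pairs, a downstream step may want to collapse them into groups (if A matches B and B matches C, all three belong together). The following is a minimal sketch using a union-find structure. It assumes the pair shape shown in the sample output; the function name and the extra lead IDs are hypothetical.

```python
def group_duplicates(pairs):
    """Collapse duplicate pairs into groups of connected entity IDs."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk to the root,
        # shortening the path along the way.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for entry in pairs:
        id_a, id_b = entry["pair"]
        root_a, root_b = find(id_a), find(id_b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    groups = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)
    # Sort for a stable, readable result.
    return sorted(sorted(members) for members in groups.values())


duplicates = [
    {"pair": ["lead-1", "lead-2"], "similarity": 0.95},
    {"pair": ["lead-2", "lead-3"], "similarity": 0.90},
    {"pair": ["lead-7", "lead-9"], "similarity": 0.88},
]
groups = group_duplicates(duplicates)
# groups == [["lead-1", "lead-2", "lead-3"], ["lead-7", "lead-9"]]
```

Note that transitive grouping can chain together entities that are not directly similar to each other, so a high threshold is advisable before feeding such groups into an automatic merge.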