Semantic Deduplication
Detects semantic duplicates in a set of entities by comparing the similarity of their vector embeddings. Depending on the selected mode, the duplicates found are reported, flagged, or merged.
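To make the comparison step concrete, here is a minimal sketch of threshold-based duplicate detection. It assumes cosine similarity as the metric and treats the threshold as inclusive; the block's actual metric and boundary behavior are not specified here. The function names and the `embeddings` input shape are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicate_pairs(embeddings, threshold=0.85):
    """Return every pair whose similarity is at or above the threshold.

    `embeddings` maps an entity ID to its vector (hypothetical input shape).
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:  # inclusive comparison is an assumption
            pairs.append({"pair": [id_a, id_b], "similarity": round(score, 4)})
    return pairs


# Toy 3-dimensional vectors; real embeddings have far more dimensions.
embeddings = {
    "lead-1": [0.9, 0.1, 0.0],
    "lead-2": [0.88, 0.12, 0.01],
    "lead-3": [0.0, 0.2, 0.95],
}
print(find_duplicate_pairs(embeddings))
```

With these toy vectors only `lead-1` and `lead-2` clear the default threshold. Comparing every pair grows quadratically with the number of entities, which is one reason a batch size setting is useful.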
Parameters
| Parameter | Type | Required | Variable | Description |
|---|---|---|---|---|
| input | dynamic value | No | Yes | Entities to analyze for duplicate detection. Array of IDs or objects. |
| entityTypeFilter | array | No | No | Entity types to consider for deduplication. |
| similarityThreshold | number | No | No | Similarity threshold (0.0 to 1.0). Higher = stricter duplicate detection. (Default: 0.85, min 0, max 1) |
| mode | choice (report_only, flag_entities, merge_with_review, auto_merge) | No | No | Action on duplicates: report only, flag, merge with review, or auto merge. (Default: "report_only") |
| mergeStrategy | choice (keep_first, keep_newest, merge_properties, manual_review) | No | No | Merge strategy: keep first, keep newest, merge properties, or manual review. (Default: "keep_first") |
| comparisonFields | array | No | No | Specific fields to compare for similarity between entities. |
| batchSize | number | No | No | Number of entities per batch for embedding comparisons. (Default: 50, min 1, max 1000) |
| enableLLMNameMatching | boolean | No | No | Enable LLM to refine similar name detection (e.g. abbreviations, variants). (Default: false) |
| model.provider | text | No | No | Language model provider for result refinement. |
| model.model | text | No | No | Model identifier to use (e.g. gpt-4o). |
| outputVariable | text | No | No | Output variable name containing the detected duplicate groups. |
Parameters marked Variable = Yes accept the {{blockName.field}} syntax.
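The four mergeStrategy values can be read as different ways of collapsing one group of duplicates into a single record. The sketch below is one plausible interpretation, not the block's documented behavior: it assumes input order defines "first", that a hypothetical `updatedAt` field (ISO 8601 strings in one consistent format) decides "newest", and that `merge_properties` fills empty fields from later entities. The `merge_group` function and the `needsReview` key are illustrative names.

```python
def merge_group(entities, strategy="keep_first"):
    """Collapse a list of duplicate entities (dicts) into one record.

    Assumptions: input order defines "first"; `updatedAt` is a hypothetical
    ISO 8601 string field used to decide "newest".
    """
    if not entities:
        raise ValueError("entities must not be empty")
    if strategy == "keep_first":
        return dict(entities[0])
    if strategy == "keep_newest":
        # Same-format ISO 8601 strings sort chronologically as plain text.
        return dict(max(entities, key=lambda e: e["updatedAt"]))
    if strategy == "merge_properties":
        merged = {}
        for entity in entities:
            for key, value in entity.items():
                # First non-empty value wins; later entities only fill gaps.
                if merged.get(key) in (None, "") and value not in (None, ""):
                    merged[key] = value
        return merged
    if strategy == "manual_review":
        # Nothing is merged; the group is handed back for a human decision.
        return {"needsReview": True, "candidates": [dict(e) for e in entities]}
    raise ValueError(f"unknown strategy: {strategy}")


leads = [
    {"id": "lead-1", "name": "Acme Corp", "email": "",
     "updatedAt": "2024-01-10T09:00:00Z"},
    {"id": "lead-2", "name": "ACME Corporation", "email": "sales@acme.example",
     "updatedAt": "2024-03-02T14:30:00Z"},
]
for strategy in ("keep_first", "keep_newest", "merge_properties"):
    print(strategy, merge_group(leads, strategy))
```

In this reading, `merge_properties` keeps the identity of the first entity while recovering the email address that only the second one carries.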
Output
Output variable: deduplicationResult
{
"duplicates": [],
"merged": 0
}
Example
Detect duplicates among leads.
Input:
{"entityType": "Lead"}
Output:
{"duplicates": [{"pair": ["lead-1", "lead-2"], "similarity": 0.95}], "merged": 0}
Tip
{{deduplicationResult.duplicates}} lists the duplicate pairs with their similarity scores. Adjust similarityThreshold (default 0.85) to control sensitivity: a higher value reports only closer matches.
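When the same entity appears in several pairs, a downstream step may want groups instead of pairs. The following is a minimal sketch of that post-processing using a union-find structure. It assumes the `pair` / `similarity` shape shown in the example output; `group_duplicates` and the extra lead IDs and scores are hypothetical.

```python
def group_duplicates(pairs):
    """Turn pairwise matches into groups of transitively linked entity IDs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for item in pairs:
        a, b = item["pair"]
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for entity_id in list(parent):
        groups.setdefault(find(entity_id), []).append(entity_id)
    return sorted(sorted(members) for members in groups.values())


duplicates = [
    {"pair": ["lead-1", "lead-2"], "similarity": 0.95},
    {"pair": ["lead-2", "lead-7"], "similarity": 0.91},
    {"pair": ["lead-4", "lead-5"], "similarity": 0.88},
]
print(group_duplicates(duplicates))
# → [['lead-1', 'lead-2', 'lead-7'], ['lead-4', 'lead-5']]
```

Grouping is transitive here: `lead-1` and `lead-7` end up together through `lead-2` even if their direct similarity is below the threshold, so groups formed this way are worth reviewing before any merge.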