Semantic Deduplication

Detects semantic duplicates in a set of entities by comparing their vector similarity. Depending on the selected mode, the block reports, flags, or merges the duplicates it finds.
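This page does not document how similarity is computed internally. As a rough illustration only, the sketch below assumes each entity already has an embedding vector and uses cosine similarity, keeping pairs that score at or above the threshold. The function names, the sample vectors, and the choice of `>=` rather than `>` are all assumptions made for the example.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicate_pairs(embeddings, threshold=0.85):
    """Return pairs scoring >= threshold, shaped like the documented output."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:  # assumption: the threshold is inclusive
            pairs.append({"pair": [id_a, id_b], "similarity": round(score, 4)})
    return pairs


# Hypothetical embeddings, invented for illustration.
sample = {
    "lead-1": [0.9, 0.1, 0.0],
    "lead-2": [0.88, 0.12, 0.01],
    "lead-3": [0.0, 0.2, 0.95],
}
result = {"duplicates": find_duplicate_pairs(sample), "merged": 0}
```

With these sample vectors only lead-1 and lead-2 point in nearly the same direction, so they are the only pair reported. Comparing every pair is quadratic in the number of entities, which is presumably why the block exposes a batchSize parameter.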

Parameters

| Parameter | Type | Required | Variable | Description |
| --- | --- | --- | --- | --- |
| input | dynamic value | No | Yes | Entities to analyze for duplicate detection. Array of IDs or objects. |
| entityTypeFilter | array | No | No | Entity types to consider for deduplication. |
| similarityThreshold | number | No | No | Similarity threshold (0.0 to 1.0). Higher = stricter duplicate detection. (Default: 0.85, min 0, max 1) |
| mode | choice (report_only, flag_entities, merge_with_review, auto_merge) | No | No | Action on duplicates: report only, flag, merge with review, or auto merge. (Default: "report_only") |
| mergeStrategy | choice (keep_first, keep_newest, merge_properties, manual_review) | No | No | Merge strategy: keep first, keep newest, merge properties, or manual review. (Default: "keep_first") |
| comparisonFields | array | No | No | Specific fields to compare for similarity between entities. |
| batchSize | number | No | No | Number of entities per batch for embedding comparisons. (Default: 50, min 1, max 1000) |
| enableLLMNameMatching | boolean | No | No | Enable LLM to refine similar name detection (e.g. abbreviations, variants). (Default: false) |
| model.provider | text | No | No | Language model provider for result refinement. |
| model.model | text | No | No | Model identifier to use (e.g. gpt-4o). |
| outputVariable | text | No | No | Output variable name containing the detected duplicate groups. |

Parameters marked Variable = Yes accept the {{blockName.field}} syntax.
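The defaults and ranges listed above can be checked before a configuration is submitted. The following is a minimal sketch of such a check, not part of the product: `build_config` is a hypothetical helper, and `fetchLeads.items` is an invented upstream block and field used only to show the variable syntax.

```python
ALLOWED_MODES = {"report_only", "flag_entities", "merge_with_review", "auto_merge"}
ALLOWED_STRATEGIES = {"keep_first", "keep_newest", "merge_properties", "manual_review"}

# Defaults copied from the parameter table.
DEFAULTS = {
    "similarityThreshold": 0.85,
    "mode": "report_only",
    "mergeStrategy": "keep_first",
    "batchSize": 50,
    "enableLLMNameMatching": False,
}


def build_config(**overrides):
    """Apply overrides to the documented defaults and check the documented ranges."""
    config = {**DEFAULTS, **overrides}
    if not 0.0 <= config["similarityThreshold"] <= 1.0:
        raise ValueError("similarityThreshold must be between 0 and 1")
    if not 1 <= config["batchSize"] <= 1000:
        raise ValueError("batchSize must be between 1 and 1000")
    if config["mode"] not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {config['mode']}")
    if config["mergeStrategy"] not in ALLOWED_STRATEGIES:
        raise ValueError(f"unknown mergeStrategy: {config['mergeStrategy']}")
    return config


config = build_config(
    input="{{fetchLeads.items}}",  # hypothetical upstream block and field
    similarityThreshold=0.9,
    mode="flag_entities",
)
```

Starting in report_only or flag_entities and moving to a merge mode only after reviewing the reported pairs keeps a too-low threshold from merging distinct entities.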

Output

Output variable: deduplicationResult

{
"duplicates": [],
"merged": 0
}

Example

Detect duplicates among leads.

Input:

{"entityType": "Lead"}

Output:

{"duplicates": [{"pair": ["lead-1", "lead-2"], "similarity": 0.95}], "merged": 0}

Tip

{{deduplicationResult.duplicates}} lists duplicate pairs with their similarity scores. Adjust similarityThreshold (default 0.85) to control sensitivity: higher values report fewer, closer matches.
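When the same entity appears in several pairs, it can be convenient to collapse the pairs into groups downstream. The sketch below is one possible way to do that with a union-find structure, assuming the pair shape shown in the example output. `group_duplicates` is a hypothetical helper, and the sample data extends the documented example with invented entries.

```python
def group_duplicates(pairs, min_similarity=0.0):
    """Group pairwise matches into clusters of transitively linked entity IDs."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for entry in pairs:
        if entry["similarity"] < min_similarity:
            continue  # skip pairs below the caller's stricter cut-off
        left, right = entry["pair"]
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in clusters.values())


# Invented sample in the documented output shape.
deduplication_result = {
    "duplicates": [
        {"pair": ["lead-1", "lead-2"], "similarity": 0.95},
        {"pair": ["lead-2", "lead-3"], "similarity": 0.91},
        {"pair": ["lead-7", "lead-8"], "similarity": 0.86},
    ],
    "merged": 0,
}
groups = group_duplicates(deduplication_result["duplicates"])
strict_groups = group_duplicates(deduplication_result["duplicates"], min_similarity=0.9)
```

Grouping is transitive here: lead-1 and lead-3 end up in the same group because both match lead-2, even though they were never compared directly. Whether that is desirable depends on the data, so reviewing groups before merging is a reasonable precaution.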