csvlink hangs after a few seconds with 0.0% CPU
- Python version: 3.7.3
- Environment: CentOS
CSV Files to Match
$ wc -l train-*
494 train-left.csv
481 train-right.csv
Config file
Attempting to match on 9 fields.
{
  "field_names": [
    "state",
    "email",
    "address_2",
    "address_1",
    "county",
    "postal_code",
    "city",
    "name"
  ],
  "field_definition": [
    {
      "field": "state",
      "type": "String",
      "Has Missing": true
    },
    {
      "field": "email",
      "type": "String",
      "Has Missing": true
    },
    {
      "field": "address_2",
      "type": "String",
      "Has Missing": true
    },
    {
      "field": "address_1",
      "type": "String",
      "Has Missing": true
    },
    {
      "field": "county",
      "type": "String",
      "Has Missing": true
    },
    {
      "field": "postal_code",
      "type": "String",
      "Has Missing": true
    },
    {
      "field": "city",
      "type": "String",
      "Has Missing": true
    },
    {
      "field": "name",
      "type": "String",
      "Has Missing": true
    }
  ],
  "output_file": "deduped.csv",
  "skip_training": false,
  "training_file": false,
  "sample_size": 150000,
  "recall_weight": 2
}
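For anyone trying to reproduce this, a config like the one above can be sanity-checked before running csvlink. The following is a minimal sketch, not part of csvlink: the helper names `check_config` and `read_header` are hypothetical, and it assumes both CSV files have a header row whose column names are meant to match `field_names`. It only checks consistency between the config and the files and may well be unrelated to the hang.

```python
import csv
import json


def check_config(config, headers_by_file):
    """Return a list of human-readable problems; empty if none were found.

    Hypothetical helper, not a csvlink API. `config` is the parsed JSON
    config and `headers_by_file` maps a file path to its header columns.
    """
    problems = []
    names = config.get("field_names", [])
    defined = [d.get("field") for d in config.get("field_definition", [])]

    # Every name in field_names should have a field_definition entry, and vice versa.
    for name in names:
        if name not in defined:
            problems.append(f"{name!r} is in field_names but has no field_definition entry")
    for name in defined:
        if name not in names:
            problems.append(f"{name!r} has a field_definition entry but is not in field_names")

    # Assumption: every configured field should appear in each CSV header row.
    for path, headers in headers_by_file.items():
        for name in names:
            if name not in headers:
                problems.append(f"{name!r} is missing from the header of {path}")
    return problems


def read_header(path):
    """Read only the first row of a CSV file (empty list if the file is empty)."""
    with open(path, newline="") as f:
        return next(csv.reader(f), [])


def check_files(config_path, csv_paths):
    """Load the config and the CSV headers from disk, then run check_config."""
    with open(config_path) as f:
        config = json.load(f)
    headers = {path: read_header(path) for path in csv_paths}
    return check_config(config, headers)
```

With the files from this report, it could be called as `check_files("config.json", ["train-left.csv", "train-right.csv"])`, printing each returned string as a warning.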
Command
Running csvlink with the following:
csvlink train-left.csv train-right.csv --config_file=config.json --inner_join
After an initial large CPU hit, the script settles into a nearly idle state:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14191 somebody+ 20 0 558660 143092 10144 S 0.0 0.9 0:52.45 csvlink
Am I doing something wrong?