Importing csv creates several duplicates

I don’t know Portuguese, so apologies if this is in the wrong category.

I am working on an archive with multiple collections. All our items are available in a spreadsheet. I tried to import 3 records first and it worked correctly.

Next, I tried to import about 250 items into the same collection with the same CSV column structure (about 10 columns) and mapping. The CSV took a long time to process, 4 to 5 hours, and in the end there were 770 items in the collection. Some of the items have 100 or more duplicates (I can see that from the automatically generated slugs). I am wondering why this happened.

One thing to note: we have a server-side cron running, and WP-Cron is disabled.

Any help would be appreciated.

Hey @guneet, welcome to our community!

That is weird and certainly sounds like a bug. Four to five hours is not a normal delay for 250 items, and the duplicates are obviously wrong. Can you provide us with a copy of your CSV file so we can do some testing? If you don’t feel comfortable sharing it here, you can send me a private message in the Discourse chat.


Hi, I recently had a similar issue and realized it happened because I had the same data in the core_title column. I deleted all the data, re-imported without repeated information, and it worked.
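Before re-importing, it may be worth checking the spreadsheet for repeated core_title values. The following is a minimal sketch in plain PHP, assuming a comma-separated file with a header row that contains a column named core_title; the function name find_duplicate_titles and the file name collection.csv are hypothetical.

```php
<?php
/**
 * Returns every core_title value that appears more than once, mapped to the
 * CSV row numbers where it occurs (the header is row 1).
 * Assumes a comma-separated file whose header row contains the column name.
 */
function find_duplicate_titles(string $path, string $column = 'core_title'): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $header = fgetcsv($handle, escape: '');
    $index  = $header === false ? false : array_search($column, $header, true);
    if ($index === false) {
        fclose($handle);
        throw new RuntimeException("Column $column not found in header");
    }

    $seen = [];
    $row  = 1; // the header is row 1
    while (($fields = fgetcsv($handle, escape: '')) !== false) {
        $row++;
        $title = trim($fields[$index] ?? '');
        if ($title !== '') {
            $seen[$title][] = $row; // remember every row that uses this title
        }
    }
    fclose($handle);

    // Keep only titles that occur on more than one row.
    return array_filter($seen, fn(array $rows) => count($rows) > 1);
}

print_r(find_duplicate_titles('collection.csv'));
```

An empty result means no title is repeated; otherwise each key is a repeated title and its value lists the rows to fix.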


Thank you both for the replies. The issue seems to be related to process/cron management, because the number of duplicates kept increasing after I posted here. I checked the Processes page in Tainacan: the CSV import process was at 0% but kept importing the same CSV entries over and over again. I manually cancelled the process and bulk-deleted the posts using WP-CLI.

Hi @guneet,

Are you using shared hosting or a server with processing limitations?

This behavior occurs when the import process is abruptly interrupted and is unable to save its execution status in the database.

I see. We are on a VPS with stable versions of the LEMP stack, though, and the available resources are sufficient. Is there anything specific about the VPS configuration or the WordPress environment that I can share with you?
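One way to answer that is to collect the PHP and WordPress settings that affect long-running imports. The following is a minimal sketch, assuming it is run inside WordPress, for example with WP-CLI's `wp eval-file`; outside WordPress the WordPress-specific constants simply show as "not defined". The function name describe_import_environment is hypothetical. Note that the CLI may load a different php.ini than PHP-FPM, and that PHP-FPM's request_terminate_timeout and Nginx's fastcgi_read_timeout cannot be read from PHP at all, so those are worth checking directly in the server configuration files.

```php
<?php
/**
 * Collects PHP and WordPress settings that affect long-running background imports.
 * WordPress-only constants are reported as "not defined" when they are absent.
 */
function describe_import_environment(): array
{
    // Render a constant's value, or a placeholder when it is not defined.
    $constant = fn(string $name) => defined($name) ? var_export(constant($name), true) : 'not defined';

    return [
        'php_version'               => PHP_VERSION,
        'sapi'                      => PHP_SAPI,
        'max_execution_time'        => ini_get('max_execution_time'),
        'memory_limit'              => ini_get('memory_limit'),
        'WP_MEMORY_LIMIT'           => $constant('WP_MEMORY_LIMIT'),
        'WP_MAX_MEMORY_LIMIT'       => $constant('WP_MAX_MEMORY_LIMIT'),
        'DISABLE_WP_CRON'           => $constant('DISABLE_WP_CRON'),
        'TAINACAN_DEBUG_BG_PROCESS' => $constant('TAINACAN_DEBUG_BG_PROCESS'),
    ];
}

// Print one "name value" pair per line.
foreach (describe_import_environment() as $key => $value) {
    echo str_pad($key, 28), $value, PHP_EOL;
}
```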

You are right, though. I was able to import 15 records from a CSV successfully, and the process showed 100% complete within 1 minute. Then I tried to import 30 records: the process got stuck after importing 25 and started duplicating them, so I had to cancel it manually. Can I share the log file with you?

The first thing to do is to identify why the item creation process is running so slowly.
Are you importing images or just metadata? If you are importing images, I recommend testing the import with metadata only first. This will help you verify whether the import speed improves without the additional overhead of image uploads.

By default, Tainacan uses a 20-second time limit per execution and up to 90% of the available memory. When one of these limits is reached, the process is interrupted and its status is saved in the database so that it can resume later.
However, some server configurations may kill the process before Tainacan is able to save this status, causing slowness or unexpected interruptions.
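Why a lost status turns into duplicates can be shown with a small simulation. This is not Tainacan's code; it is a hypothetical model of a resumable importer in which a fixed number of rows per request stands in for the 20-second limit. Each request resumes from the saved position and saves the new position when it stops, but a request that is killed before saving leaves the old position in place, so the next request creates the same rows again. The function name simulate_import is hypothetical.

```php
<?php
/**
 * Toy model of a resumable background importer (not Tainacan's implementation).
 * Each request imports up to $rowsPerRequest rows starting at the saved cursor,
 * then persists the cursor. Requests listed in $killedRequests die before
 * persisting, so the following request repeats the same rows.
 * Returns row number => how many items were created for that row.
 */
function simulate_import(int $totalRows, int $rowsPerRequest, array $killedRequests, int $maxRequests = 100): array
{
    $savedCursor = 0;  // position persisted "in the database"
    $created     = []; // row number => number of items created from it

    for ($request = 1; $request <= $maxRequests && $savedCursor < $totalRows; $request++) {
        $cursor = $savedCursor; // resume from the last saved position
        $limit  = min($totalRows, $cursor + $rowsPerRequest);

        while ($cursor < $limit) {
            $row = $cursor + 1;
            $created[$row] = ($created[$row] ?? 0) + 1; // an item is created
            $cursor++;
        }

        if (in_array($request, $killedRequests, true)) {
            continue; // process killed: the new cursor is never saved
        }
        $savedCursor = $cursor; // normal case: status saved, next request resumes here
    }

    return $created;
}
```

For example, `simulate_import(10, 5, [2])` creates rows 6 to 10 twice (15 items for 10 rows), and `simulate_import(30, 25, range(1, 4), 4)` keeps recreating rows 1 to 25 without ever reaching row 26, which resembles the behaviour reported earlier in the thread.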

If you want to change these limits, you can use the following filters:

Change the maximum execution time

add_filter( 'tnc-bg_importer_default_time_limit', function( $time_limit ) {
    return 10; // new time, in seconds
});

Change the memory usage limit

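add_filter( 'tnc-bg_importer_memory_exceeded', function( $memory_result ) {
    // $this is not available inside a plain closure, so read the PHP limit directly.
    $max_memory = wp_convert_hr_to_bytes( ini_get( 'memory_limit' ) );
    if ( $max_memory <= 0 ) {
        return $memory_result; // unlimited (-1): keep Tainacan's own decision
    }
    $memory_limit   = $max_memory * 0.5; // 50% of max memory
    $current_memory = memory_get_usage( true );

    return $current_memory >= $memory_limit;
});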

Additional tip

Enable the importer debug mode to retrieve more detailed information about the process. Just add the following constant to your wp-config.php:

define( 'TAINACAN_DEBUG_BG_PROCESS', true );

This will be very helpful for identifying bottlenecks or errors during the import process. Once the debug mode is enabled and the logs are generated, please share them here so we can analyze the issue.


Thank you for the detailed response!

We have the PHP execution time set to 300s, the memory limit set to 256MB, and the max memory limit set to 512MB. I will enable the debug constant to get more information.

About importing images: no, the images and documents are already uploaded to the server, and in the import file I only add url:/wp-content/uploads/folder-name/file-name.jpg to the column named ‘special_document’. The import itself is actually not slow; 25 items were added within a few seconds.
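Since the documents are referenced by path rather than uploaded during the import, a quick pre-import check that every referenced file exists can rule out one source of errors. The following is a minimal sketch, assuming a comma-separated CSV with a header row, values in the url:/path form quoted above (a full URL after url: also works), and a WordPress install at the web root, so that the URL path maps directly onto the directory passed as $siteRoot. The function name find_missing_documents is hypothetical.

```php
<?php
/**
 * Lists special_document values whose file cannot be found on disk,
 * keyed by CSV row number (the header is row 1).
 * Assumes the URL path maps directly onto $siteRoot (WordPress at the web root).
 */
function find_missing_documents(string $csvPath, string $siteRoot, string $column = 'special_document'): array
{
    $handle = fopen($csvPath, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvPath");
    }
    $header = fgetcsv($handle, escape: '');
    $index  = $header === false ? false : array_search($column, $header, true);
    if ($index === false) {
        fclose($handle);
        throw new RuntimeException("Column $column not found in header");
    }

    $missing = [];
    $row     = 1; // the header is row 1
    while (($fields = fgetcsv($handle, escape: '')) !== false) {
        $row++;
        $value = trim($fields[$index] ?? '');
        if ($value === '') {
            continue; // no document for this item
        }
        // Strip the "url:" prefix, then keep only the path part of the URL.
        $target = str_starts_with($value, 'url:') ? substr($value, 4) : $value;
        $path   = parse_url($target, PHP_URL_PATH);
        $file   = rtrim($siteRoot, '/') . '/' . ltrim(rawurldecode((string) $path), '/');
        if (!is_string($path) || !is_file($file)) {
            $missing[$row] = $value;
        }
    }
    fclose($handle);
    return $missing;
}
```

Run it with the site's document root (for example the directory that contains wp-content) as $siteRoot; an empty array means every referenced file was found.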

I checked the log of the process that I manually cancelled: after inserting 23 items, the log says “New Request” and then goes back to “processing item on line 2”. I am attaching a screenshot of the log for you to inspect. For confidentiality, I have changed the URL of the imported thumbnail file shown in it for now.
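When the debug log is long, scanning it for this pattern (a “New Request” entry followed by processing at an earlier or equal CSV line) shows how often the importer lost its position. The following is a minimal sketch, assuming one log entry per line and progress entries containing the text “processing item on line N” as quoted above; the exact wording in Tainacan's log may differ, so the regular expression may need adjusting. The function name find_restarts and the file name importer.log are hypothetical.

```php
<?php
/**
 * Finds points where the importer went back to an earlier CSV line after a
 * "New Request" entry. Assumes one entry per log line and progress messages
 * containing "processing item on line N" (adjust the pattern if the log differs).
 */
function find_restarts(array $logLines): array
{
    $restarts   = [];
    $lastLine   = 0;     // last CSV line the importer reported processing
    $newRequest = false; // whether a "New Request" entry was seen since that item

    foreach ($logLines as $number => $entry) {
        if (stripos($entry, 'New Request') !== false) {
            $newRequest = true;
            continue;
        }
        if (preg_match('/processing item on line (\d+)/i', $entry, $m)) {
            $csvLine = (int) $m[1];
            // A new request that does not continue past the last line is a restart.
            if ($newRequest && $csvLine <= $lastLine) {
                $restarts[] = ['log_line' => $number + 1, 'from' => $lastLine, 'to' => $csvLine];
            }
            $lastLine   = $csvLine;
            $newRequest = false;
        }
    }
    return $restarts;
}

// Usage (hypothetical file name):
// print_r(find_restarts(file('importer.log', FILE_IGNORE_NEW_LINES)));
```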

What do you think?