ctrl_product must be optimized for large imports

UPDATE:

A more effective solution is to create a separate ctrl_product class just for imports => a new ctrl_product_import.

Please ignore this topic.

----------------------------------------------------------------------------------

Hi Tim, I created an import script for more than 30,000 products via
$product = new ctrl_product();
...
$product->save();
and this process generates a very large number of queries to the database.
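
To make the setup concrete, here is a stripped-down sketch of my import loop (the CSV layout and the 'code' key are simplified placeholders, not necessarily the real field names):

// Illustrative only: a simple loop that creates products via ctrl_product.
// The CSV path, column order and data keys other than 'categories' are assumptions.
require_once 'includes/app_header.inc.php';  // LiteCart bootstrap (path may differ)

if (($handle = fopen('products.csv', 'r')) !== false) {
  while (($row = fgetcsv($handle, 0, ';')) !== false) {
    list($sku, $name, $category_id) = $row;

    $product = new ctrl_product();                         // new, empty product
    $product->data['code'] = $sku;                         // assumed key
    $product->data['categories'] = array((int)$category_id);
    // ... set name, prices and the remaining fields ...
    $product->save();                                      // every save() issues many queries
  }
  fclose($handle);
}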

For example - assigning a product to a category:

original from v2.1.6 ctrl_product.inc.php line 236:
      database::query(
        "delete from ". DB_TABLE_PRODUCTS_TO_CATEGORIES ."
        where product_id = '". (int)$this->data['id'] ."';"
      );
      foreach ($this->data['categories'] as $category_id){
        database::query(
          "insert into ". DB_TABLE_PRODUCTS_TO_CATEGORIES ."
          (product_id, category_id)
          values ('". (int)$this->data['id'] ."', '". (int)$category_id ."');"
        );
      }

my optimized solution:
      $old_categories = array();
      $categories_query = database::query(
        "select category_id from ". DB_TABLE_PRODUCTS_TO_CATEGORIES ."
        where product_id = '". (int)$this->data['id'] ."';"
      );
      while ($category_id = database::fetch($categories_query, 'category_id')){
        $old_categories[] = (int)$category_id;
        if(!in_array((int)$category_id, $this->data['categories'])) {
          database::query(
            "delete from " . DB_TABLE_PRODUCTS_TO_CATEGORIES . "
            where product_id = '". (int)$this->data['id'] ."' and category_id = '". (int)$category_id ."';"
          );
        }
      }
      foreach ($this->data['categories'] as $category_id){
        if(!in_array((int)$category_id, $old_categories)){
          database::query(
            "insert into ". DB_TABLE_PRODUCTS_TO_CATEGORIES ."
            (product_id, category_id)
            values (". (int)$this->data['id'] .", ". (int)$category_id .");"
          );
        }
      }
It significantly reduces the number of queries to the database - writes are issued only when something has actually changed.

Here is pull request for this: https://github.com/litecart/litecart/pull/151/files

The same is needed for campaigns, options, stock options, images, ...
:) I like that you are experimenting. 30,000 products is a lot of data, and I would recommend segmented imports or raising the script timeout limit. I see you found a solution. Don't hesitate to publish it as an add-on - that way someone else can continue your work.
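
For the segmented approach, a rough sketch of what I mean (the helper functions and offset handling are hypothetical, not LiteCart code):

// Sketch only: raise the PHP limits and process the import in segments,
// e.g. 500 products per run, continuing where the previous run stopped.
set_time_limit(0);                 // or a generous per-run limit
ini_set('memory_limit', '512M');

$batch_size = 500;
$offset = isset($_GET['offset']) ? (int)$_GET['offset'] : 0;

$rows = load_import_rows($offset, $batch_size);  // hypothetical helper reading the source file
foreach ($rows as $row) {
  import_product($row);                          // hypothetical helper wrapping new ctrl_product() ... save()
}

if (count($rows) == $batch_size) {
  header('Location: ?offset='. ($offset + $batch_size));  // continue with the next segment
  exit;
}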

I will deny your pull request and might implement a small optimization for this part. Basically we don't need to delete all the category mappings and insert them all again.

database::query(
  "delete from ". DB_TABLE_PRODUCTS_TO_CATEGORIES ."
  where product_id = ". (int)$this->data['id'] ."
  and category_id not in ('". implode("', '", database::input($this->data['categories'])) ."');"
);

database::query(
  "insert ignore into ". DB_TABLE_PRODUCTS_TO_CATEGORIES ."
  (product_id, category_id) values
  (". (int)$this->data['id'] .", '". implode("'), (". (int)$this->data['id'] .", '", database::input($this->data['categories'])) ."');"
);

I did not test the code.
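For illustration, with product_id 42 and categories 1, 2 and 3, the two statements above should render roughly as follows (table constant left as a placeholder):

delete from DB_TABLE_PRODUCTS_TO_CATEGORIES
where product_id = 42
and category_id not in ('1', '2', '3');

insert ignore into DB_TABLE_PRODUCTS_TO_CATEGORIES
(product_id, category_id) values
(42, '1'), (42, '2'), (42, '3');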
Your code is also not good enough for my case.
It creates 2 queries per product on every import (= 60k queries for 30k products).
And that is only for assigning products to categories = database killer.

The best solution for me is:
1) load the current state (select * from DB_TABLE_PRODUCTS_TO_CATEGORIES where product_id = X)
2) save only the changes - no change = no further queries to the database
For large imports this is the best approach. See my optimized solution in my first post.

I ported and adapted the batch import from my own e-commerce system - it took a lot of development and debugging time...
I'm sorry, but your proposed example for an optimization here produces how many queries again?
I'm sorry, I don't understand your question. My English isn't very good...
You said my example produces too many MySQL queries. How many MySQL queries does your example produce?
Okay, I'll try to explain:

number of database queries                                      | my example | your example
----------------------------------------------------------------|------------|-------------
first import - creating assignments                             |     2      |      2
second and subsequent imports - no change (most common case)    |     1      |      2
second and subsequent imports - deleting an assignment          |     2      |      2

The thing is, most imports (the second and subsequent ones) do not change the product-to-category assignments at all.
In that case my example needs only 1 query, and it is just a SELECT statement.

Creating assignments happens only in the first import.
Deleting assignments occurs very rarely in subsequent imports.

Did I explain it better?
In your code example, if a product is tied to 4 categories, then that piece of code would produce 8 queries.

This is the best I can do for you:

    // Categories
      database::query(
        "delete from " . DB_TABLE_PRODUCTS_TO_CATEGORIES . "
        where product_id = ". (int)$this->data['id'] ."
        and category_id not in ('". implode("', '", database::input($this->data['categories'])) ."');"
      );

      foreach ($this->data['categories'] as $category_id) {
        if (in_array($category_id, $this->previous['categories'])) continue;
        database::query(
          "insert into ". DB_TABLE_PRODUCTS_TO_CATEGORIES ."
          (product_id, category_id)
          values (". (int)$this->data['id'] .", ". (int)$category_id .");"
        );
      }
You're right, my solution can be optimized further:

      $old_categories = $del_categories = array();
      $categories_query = database::query(
        "select category_id from ". DB_TABLE_PRODUCTS_TO_CATEGORIES ."
        where product_id = '". (int)$this->data['id'] ."';"
      );
      while ($category_id = database::fetch($categories_query, 'category_id')){
        $old_categories[] = (int)$category_id;
        if(!in_array((int)$category_id, $this->data['categories'])) {
          $del_categories[]= (int)$category_id;
        }
      }

      if(!empty($del_categories)){
        database::query(
          "delete from " . DB_TABLE_PRODUCTS_TO_CATEGORIES . "
            where product_id = '". (int)$this->data['id'] ."' and category_id in (". implode(',', $del_categories) .");"
        );
      }

      foreach ($this->data['categories'] as $category_id){
        if(!in_array((int)$category_id, $old_categories)){
          database::query(
            "insert into ". DB_TABLE_PRODUCTS_TO_CATEGORIES ."
            (product_id, category_id)
            values (". (int)$this->data['id'] .", ". (int)$category_id .");"
          );
        }
      }
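
As a side note, the same "save only the changes" diff could also be expressed with array_diff (just an equivalent sketch, not the code I am actually running):

      // Sketch: compute the change sets with array_diff instead of in_array loops.
      $new_categories = array_map('intval', $this->data['categories']);

      $old_categories = array();
      $categories_query = database::query(
        "select category_id from ". DB_TABLE_PRODUCTS_TO_CATEGORIES ."
        where product_id = ". (int)$this->data['id'] .";"
      );
      while ($category_id = database::fetch($categories_query, 'category_id')) {
        $old_categories[] = (int)$category_id;
      }

      $to_delete = array_diff($old_categories, $new_categories);  // assigned before, no longer wanted
      $to_insert = array_diff($new_categories, $old_categories);  // wanted now, not assigned yet

      if (!empty($to_delete)) {
        database::query(
          "delete from ". DB_TABLE_PRODUCTS_TO_CATEGORIES ."
          where product_id = ". (int)$this->data['id'] ."
          and category_id in (". implode(', ', $to_delete) .");"
        );
      }

      foreach ($to_insert as $category_id) {
        database::query(
          "insert into ". DB_TABLE_PRODUCTS_TO_CATEGORIES ."
          (product_id, category_id)
          values (". (int)$this->data['id'] .", ". (int)$category_id .");"
        );
      }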
-------------------------------------

Anyway, I am testing both solutions in a real import.
I'm interested in which is faster (= less database load).
I will publish the results here.

Thank you for your cooperation.
MySQL is very fast, but the sheer number of queries is what makes the process lengthy. A grouped insert would perform better than separate queries.
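
For example, a grouped insert for the category mappings could look roughly like this (a sketch only, untested):

// Sketch: one insert statement for all category mappings instead of one insert per category.
$values = array();
foreach ($this->data['categories'] as $category_id) {
  $values[] = "(". (int)$this->data['id'] .", ". (int)$category_id .")";
}

if (!empty($values)) {
  database::query(
    "insert ignore into ". DB_TABLE_PRODUCTS_TO_CATEGORIES ."
    (product_id, category_id) values ". implode(', ', $values) .";"
  );
}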

But optimizing this part does not impact the customers viewing the site. I think you should just use a longer script timeout or create a background process. How often is this job of 30,000 products executed? Every 5 minutes or once a day?