combinations without duplicates using php
New here? Learn about Bountify and follow @bountify to get notified of new bounties! x

The problem is explained here

https://stackoverflow.com/questions/47912489/combinations-without-duplicates-using-php

Please reply here no in stackoverflow , thanks.

can you provide us with a good dataset example, and an output result example? (use pastebin or edit your bounty)
Chlegou 6 months ago
ok please wait
graz68 6 months ago
this is the archive
graz68 6 months ago
https://pastebin.com/tut6kFXf
I think no more than first 200 rows are needed
graz68 6 months ago
If you want test the script you may use ($combx = 2; $combx < 6; $combx++) instead of ($combx = 2; $combx < 19; $combx++)
graz68 6 months ago
yes i'm trying to do that, but $numeri in line 34 is undefined! what should it be?
Chlegou 6 months ago
yes i'm trying to do that, but $numeri in line 34 is undefined! what should it be?
Chlegou 6 months ago
my fault , I added $numeri , please check stack
graz68 6 months ago
i'm trying to understand your inquires, let's say, you have this combination of 18 numbers, generated: "01.02.03.04.05.06.07.08.09.10.11.12.13.14.15.16.17.18", then you will check the archive file, and you will try to read check if there are at least 3 numbers in same row, if yes you save it. otherwise, you don't . if i'm understanding it correctly, then i will suppose, that you are already generating the numbers, so i as solution to your bounty, i will make the filtering function
Chlegou 6 months ago
no, it's the countrary. let's say the function generates this 01.02.03.04.05.06.07.08.09.10.11.12.13.14.15.16.17.18 . The script should check the archive from top to down if in the 5 numbers there are at least 3 from 01.02.03.04.05.06.07.08.09.10.11.12.13.14.15.16.17.18 . If the min 3 numbers are found in the first 30 rows (in one of these first 30 rows) of archive the combination must NOT be saved . If the min 3 numbers are not found in the first 30 rows (in one of these first 30 rows) of archive the combination must be saved in array and returned.
graz68 6 months ago
ok, i got it well now, my solution will be filtering the solutions you have as a result for now is that ok? or should i modify it?
Chlegou 6 months ago
yes, thank you, it seems you understood what I need.
graz68 6 months ago
just a quick question, our interest, is the first 30 rows only!! no need to check the rest of $archivio dataset right?
Chlegou 6 months ago
yes, it's correct. It will be faster too.
graz68 6 months ago
I need to go away now, I will check tomorrow morning 08:00 CET , thank you I appreciated your work.
graz68 6 months ago
i can be done in 10min if you could wait, i'm almost done
Chlegou 6 months ago

Crowdsource coding tasks.

2 Solutions

Winning solution

This is a rather interesting problem so I decided to have a go at it despite the small pay.

The big problem that makes you run out of memory is that you're generating all the combinations explicitly (and rather inefficiently at that) into an array. Filtering that array after generating it will not solve that problem.

Instead, you should use a generator, which will allow you to generate each combination one after the other, and use them one by one.

Here, I ported the example implementation of the itertools.combinations generator from the Python docs:

<?php
  function combinations($elems, $k) {
    $n = sizeof($elems);
    if ($k > $n) {
      return;
    }

    $map_fn = function($i) use ($elems) {
      return $elems[$i];
    };

    $elems = array_values($elems);
    $indices = range(0, $k-1);
    yield array_map($map_fn, $indices);
    while (true) {
      $broke = false;
      foreach (range($k-1, 0, -1) as $i) {
        if ($indices[$i] != $i + $n - $k) {
          $broke = true;
          break;
        }
      }
      if (!$broke) {
        return;
      }
      $indices[$i] += 1;
      if ($k-1 >= $i+1) {
        foreach (range($i+1, $k-1) as $j) {
          $indices[$j] = $indices[$j-1] + 1;
        }
      }
      yield array_map($map_fn, $indices);
    }
  }

  $numeri = "01.02.03.04.05.06.07.08.09.10.11.12.13.14.15.16.17.18.19.20.21.22.23.24.25.26.27.28.29.30";
  $numeri_ar = array_map('intval', explode(".", $numeri));
  $numeri_ar = array_unique($numeri_ar);

  $archivio = array();
  foreach (file("archive.txt") as $line) {
    $assoc = array();
    foreach (explode(".", trim($line)) as $n) {
      $assoc[intval($n)] = 1;
    }
    array_push($archivio, $assoc);
    if (sizeof($archivio) >= 200) {
      break;
    }
  }

  foreach (combinations($numeri_ar, 5) as $comb) {
    $okay = true;
    foreach ($archivio as $arch_comb) {
      $found = 0;
      foreach ($comb as $c) {
        if (array_key_exists($c, $arch_comb)) {
          if (++$found >= 3) {
            $okay = false;
            break;
          }
        }
      }
      if (!$okay) {
        break;
      }
    }
    if ($okay) {
      printf("%s\n", join(".", array_map('strval', $comb)));
    }
  }
?>

The output is printed line-by-line to stdout using the n.n.n.n.n format, which can be dumped into a file using php script.php > output.txt. Otherwise, you can also change line 69 to append $comb to an array instead.

200 lines are taken from archive.txt. $numeri is set to [01-30] as per the Stack Overflow question. For testing, k is set to 5, which doesn't take too long to complete.

At 30 choose 5, the number of combinations is 142'506, but after filtering, it drops to 123'447 valid combinations. That's only a ~15% reduction, but with a bigger k, I suppose it could get reduced more. However, I'm not sure if it would really drop to only a few thousand combinations considering 30 choose 18 is 86'493'225.

It would be more efficient to embed the filtering logic into the generator, but I've already stretched that $10 far too much.

Thank you CyteBode , please let me check
graz68 6 months ago
thank you , please let me check.
graz68 6 months ago
the code seems to be excellent. Using 18 , and entering in $numbers 30 numbers which are delayed (not a simple series 1-30), the filtered numbers are very few and I am receiving very interesting results. Thank you.
graz68 6 months ago
one question, should I enter number < 10 in this format 1,2...9 or this format 01,02,03..09, considering that in the archive they are entered in 01-09 format. Does it matter ?
graz68 6 months ago
My code converts the numbers to ints so the zero padding does not matter.
CyteBode 6 months ago

Hi there,

ok here is my solution for your bounty, as we discussed:

    function subcombi($arr, $arr_size, $count)
    {
       $combi_arr = array();
       if ($count > 1) {
          for ($i = $count - 1; $i < $arr_size; $i=$i+1) {
             $highest_index_elem_arr = array($i => $arr[$i]);
             foreach (subcombi($arr, $i, $count - 1) as $subcombi_arr) 
             {
                 $combi_arr[] = $subcombi_arr + $highest_index_elem_arr;
             }
          }
       } else {
          for ($i = $count - 1; $i < $arr_size; $i=$i+1) {
             $combi_arr[] = array($i => $arr[$i]);
          }
       }

       return $combi_arr;
    }

    function combinations($arr, $count)
    {
       if ( !(0 <= $count && $count <= count($arr))) {
          return false;
       }


       return $count ? subcombi($arr, count($arr), $count) : array();
    } 

$numeri="55.77.03.04.30.06.47.71";

$numeri_ar=explode(".",$numeri);  
$numeri_ar=array_unique($numeri_ar);


for ($combx = 2; $combx < 6; $combx++)
   {
$combi_ar = combinations($numeri_ar, $combx);
}
//full cominations array is ready
//print_r($combi_ar);


//get archivio content
$archivio = file_get_contents('archivio.txt');
$archivio_ar = explode("\n",$archivio);

//save only the first 30 rows
$archivio_ar = array_slice($archivio_ar, 0, 30, true);

//echo $archivio;
print_r($archivio_ar);

$wantedCombinations = [];

//now we will test for each combination
foreach($combi_ar as $combination){
    //we make a test for each line 
    foreach($archivio_ar as $dataset_row){
        $dataset_row_ar = explode(".",$dataset_row);
        $intersect_ar = array_intersect($combination, $dataset_row_ar);
        //we count intersection, if less than 3, we save it
        if( count($intersect_ar) < 3 ){
            $wantedCombinations[] = $combination; break;
        }
    }
}


//echo $wantedCombinations;
print_r($wantedCombinations);

i have completed your code, and commented it, the idea is simple, read the archive content, and check intersections, between the two combinations. in the intersection count is less than 3, so we save it as you require. otherwise, we skip it. you can change the condition for further inquiries ;)

hope this fits your case ;)

Your code generates far too many combinations considering 8 choose 5 (the number of combinations) is only 56.
CyteBode 6 months ago
@CyteBode, i focused on testing combinations, since he already had the generator function, my code, will only test validity of combinations.
Chlegou 6 months ago
The original code generated the right number of combinations. It's your code that outputs multiple copies of a same combination.
CyteBode 6 months ago
ah yes yes, i break was missing, didn't noticed it, i edited my code :) thanks!
Chlegou 6 months ago
It's still incorrect. The output array has length 56, same as the number of combinations, so no filtering was done. Take the last combination 04.30.06.47.71. Line 12 of archive.txt is 71.29.47.30.63 which has 3 numbers in common (71, 47, 30).
CyteBode 6 months ago
Thank you please let me check .
graz68 6 months ago
Sorry Chlegou, CyteBode code seems to me better and execution is very fast.
graz68 6 months ago
@graz68, it's ok, no worries :) ,o need for paypal, you can tip here if you want.
Chlegou 6 months ago
View Timeline