Optimise PHP Script and MYSQL INSERT statement
New here? Learn about Bountify and follow @bountify to get notified of new bounties! x

We are pulling public Met Office Forecast data from the DataPoint API into a database on CRON each hour. The XML file contains weather forecasts for 6000 locations in the UK. Here is a sample of the data from the XML file we are parsing:

<DV dataDate="2015-12-04T14:00:00Z" type="Forecast">
    <Location i="14" lat="54.9375" lon="-2.8092" name="CARLISLE AIRPORT" country="ENGLAND" continent="EUROPE" elevation="50.0">     
        <Period type="Day" value="2015-12-04Z">
            <Rep D="SSW" F="6" G="56" H="76" Pp="90" S="36" T="11" V="MO" W="15" U="0">1080</Rep>
        </Period>
    </Location>
    <Location i="22" lat="53.5797" lon="-0.3472" name="HUMBERSIDE AIRPORT" country="ENGLAND" continent="EUROPE" elevation="24.0">
        <Period type="Day" value="2015-12-04Z">
            <Rep D="SSW" F="7" G="38" H="80" Pp="4" S="22" T="11" V="VG" W="7" U="0">1080</Rep>
        </Period>
    </Location>
    ...+6000 other locations
 </DV>

Currently we are pushing 6000 individual INSERT statements which is very inefficient way of doing this (and is taking several minutes to process). We want to optimise the following PHP function and improve the performance of the INSERT statement into one INSERT rather than 6000. Or you please suggest ways of further improvement.

Here is the primary PHP function doing the work on the XML file:

function get_met_office_forecast_data() {
// Todays date.
$date = date ( 'Y-m-d' ); 
// Tomorrows date.
$tomorrow = date ( 'Y-m-d', strtotime ( $date . '+1 day' ) );
$hour = date ( 'H' );
switch ($hour) {
case 00 :
case 01 :
case 02 :
  $datetime = $date . 'T03Z';
  break;
case 03 :
case 04 :
case 05 :
  $datetime = $date . 'T06Z';
  break;
case 06 :
case 07 :
case 08 :
  $datetime = $date . 'T09Z';
  break;
case 09 :
case 10 :
case 11 :
  $datetime = $date . 'T12Z';
  break;
case 12 :
case 13 :
case 14 :
  $datetime = $date . 'T15Z';
  break;
case 15 :
case 16 :
case 17 :
  $datetime = $date . 'T18Z';
  break;
case 18 :
case 19 :
case 20 :
  $datetime = $date . 'T21Z';
  break;
case 21 :
case 22 :
case 23 :
  $datetime = $tomorrow . 'T00Z';
  break;
}
  // Get the data from Met Office at corresponding time.
 $response = file_get_contents ( 'http://datapoint.metoffice.gov.uk/public/data/val/wxfcs/all/xml/all?res=3hourly&time=' . $datetime . '&key=MYAPI' );

$file_name = 'metofficeforecast.xml';
$directory = 'public://metoffice/';
file_prepare_directory ( $directory, FILE_CREATE_DIRECTORY );
file_unmanaged_save_data ( $response, $directory . $file_name, FILE_EXISTS_REPLACE );
$localfile = 'sites/default/files/metoffice/metofficeforecast.xml';
$data = simplexml_load_file ( $localfile );
$stations = count ( $data->DV->Location );

foreach ( $data->DV->Location as $station ) {

foreach ( $station->attributes () as $key => $value ) {
  switch ($key) {
    case 'name' :
      $name = ucwords ( $value );
      // drupal_set_message('The station name is ' . $name);
      break;
    case 'i' :
      $id = $value;
      // drupal_set_message('The station ID ' . $id);
      // We've got the data lets build the node:
      break;
    case 'lat' :
      $lat = $value;
      break;
    case 'lon' :
      $lon = $value;
      break;
    case 'elevation' :
      $elevation = $value;
      break;
  }
}

foreach ( $station->Period as $period ) {
  foreach ( $period->attributes () as $key => $value ) {
    if ($key == 'value') {
      $date = rtrim ( $value, 'Z' );
      // Now check that the date is todays date.
      $today = date ( 'Y-m-d' );
      if ($date == $today) {
        // How many reports are in this section so we can grab the last one.
        $count = count ( $period->Rep );
        // drupal_set_message($count . ' reports available');
        // Set up a counter see how many reports we have looped through.
        $reps = 0;
        foreach ( $period->Rep as $report ) {
          $reps ++;
          if ($reps == $count) {
            $time = $report / 60;
            $datetime = $today . ' ' . $time . ':00';
            // Show the date and time of this report
            // drupal_set_message('The datetime of this report is ' . $datetime);

            // drupal_set_message('The count is at ' . $count . ' and the report number is at ' . $reps);
            foreach ( $report->attributes () as $key => $value ) {
              switch ($key) {
                case 'D' :
                  $wind_compass = $value;
                  // drupal_set_message('The wind compass is ' . $wind_compass);
                  break;
                case 'G' :
                  $wind_gust = $value;
                  // drupal_set_message('The wind gust is ' . $wind_gust);
                  break;
                case 'P' :
                  $air_pressure = $value;
                  // drupal_set_message('The air pressure is ' . $air_pressure);
                  break;
                case 'H' :
                  $humidity = $value;
                  // drupal_set_message('The humidity is ' . $humidity);
                  break;
                case 'Dp' :
                  $dew_point = $value;
                  // drupal_set_message('The dew point is ' . $dew_point);
                  break;
                case 'S' :
                  $wind_speed = $value;
                  // drupal_set_message('The wind speed is ' . $wind_speed);
                  break;
                case 'T' :
                  $air_temperature = $value;
                  // drupal_set_message('The air temperature is ' . $air_temperature);
                  break;
                case 'V' :
                  $visibility = $value;
                  // drupal_set_message('The visibility is ' . $visibility);
                  break;
                case 'W' :
                  $weather_type = $value;
                  // drupal_set_message('The weather type is ' . $weather_type);
                  $weather = convert_weather_type ( $weather_type );
                  $weather_tid = $weather;
                  break;
                case 'Pt' :
                  $pressure = $value;
                  // drupal_set_message('The pressure tendency is ' . $pressure);
                  break;
              }
            }
          }
        }
      }
    }
  }
}
// Package the data into an array to send to our analysis table.
$stations = array ();
$stations = array (
    'station_id' => $id,
    'station_name' => $name,
    'station_region' => 'Not Set',
    'station_unitary_authority' => 'Not Set',
    'station_elevation' => $elevation,
    'station_latitude' => $lat,
    'station_longitude' => $lon,
    'station_wind_gust' => $wind_gust,
    'station_temperature' => $air_temperature,
    'station_visibility' => $visibility,
    'station_wind_direction' => $wind_compass,
    'station_wind_speed' => $wind_speed,
    'station_weather_type' => $weather_tid,
    'station_pressure' => $air_pressure,
    'station_pressure_tendancy' => $pressure_tendancy,
    'station_dew_point' => $dew_point,
    'station_screen_humidity' => $humidity,
    'station_observation_type' => 2,
    'station_last_update' => $datetime 
);

// Set the type of operation to an Observation Update.
$type = 2;
// Add data to log table.
insert_weather_data ( $stations, $type );
}
}

Here is the function to get the data into the DB which is being called 6000 times:

function insert_weather_data($stations, $type) {
// Clean up old forecast data and delete posts with match post id.
$station_id = $locations ['station_id'];
db_delete ( 'met_office_stations' )->condition ( 'station_id', $station_id )->condition ( 'station_observation_type', $type )->execute ();

// Insert the new data live.
 $insert_query = db_insert ( 'met_office_stations' )->fields ( array (
 'station_id',
 'station_name',
 'station_region',
 'station_unitary_authority',
 'station_elevation',
 'station_latitude',
 'station_longitude',
 'station_wind_gust',
 'station_temperature',
 'station_visibility',
 'station_wind_direction',
 'station_wind_speed',
 'station_weather_type',
 'station_pressure',
 'station_pressure_tendancy',
 'station_dew_point',
 'station_screen_humidity',
 'station_observation_type',
 'station_last_update' 
 ) );
$insert_query->values ( $stations );
$insert_query->execute ();
}

A point to note is that not all of the values are necessarily available for all locations so we need to return NULL for those values. You may suggest differently but ideally we need to package all of this data into the $stations array and pass that once to the insert statement.

Crowdsource coding tasks.

3 Solutions


The most reliable solution sounds like "there is a module for that": I guess, you'd better use a combination of Drupal modules (your site is Drupal-based, right?): Feeds (https://www.drupal.org/project/feeds), Feeds extensible parsers (https://www.drupal.org/project/feeds_ex) & Data (https://www.drupal.org/project/data). As you could see at Feeds project page, it offers "Batched import for large files" - I guess, that's what you need. The profit of such effort is that you re-use something that is already invented & debugged, avoiding the need of supporting your own custom solution.

If you really want to stay with your current implementation, you could use multi-insert syntax (MySQL only) or transactions. This link has very much related info: http://drupal.stackexchange.com/a/53319. Both of the ways increase performance as they reduce the number of commits (from about 6000 to just 1) - the most time-consuming part of all the database interactions.

Can you provide a worked version please
VDDev 3 years ago

It's looks like your are pinging to DB server on every iteration.
Just create a TEMP_INSERT_FILE.sql or TEMP_INSERT_FILE.csv where you keep appending the file with data. ( STORE THE DATA FROM ITERATION ONTO THE FILE ).
Move the below code out of the loop:
$type = 2;
insert_weather_data ( $stations, $type );
After the Loop upload the TEMP_INSERT_FILE.sql or TEMP_INSERT_FILE.csv to the db.
insert_weather_data($TEMP_INSERT_FILE.sql, $type){
//function to upload the file to db
}

Summary:
1. Create a temp file to store all the 6000 records in sql or csv format
2. Move or Remove the code : insert_weather_data ( $stations, $type ); from the get_met_office_forecast_data() function.
3. Modify insert_weather_data() function to upload tempfile to DB

Can you provide a worked version please
VDDev 3 years ago

My PHP is rusty but for the most part the problem is not just the frequent communication. To speed up RDBMS inserts one should disable integrity constraints and reduce log writes created per transaction. The second optimization as you mentioned is to reduce the amount of communication between the client and the RDBMS by using the multiple-row INSERT. I have no information about your schema so I have no idea about what indexing your tables have set up but the disabling the following constraints is common practice. Turning off autocommit stops MySql from flushing the log to disk for every inserted row.

Before performing inserts:
SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;

Inserts: Iterate through the $stations array to create the following sort of insert statement INSERT INTO met_office_stations VALUES (5,62, 21), (25,58, 65), (3,2,5), ...;
with each group of values surrounded by brackets representing one row.

After:
SET foreign_key_checks=1;
SET unique_checks=1;
COMMIT;

Can you provide a worked version please
VDDev 3 years ago
View Timeline