ebay api - findItemsByCategory call - scanning every newly listed item

When using the eBay API's findItemsByCategory call sorted by StartTimeNewest, the output will sometimes show an item that was listed more recently than another item, and only on the next call will the older item(s) appear in the correct order below the newer one.

Example.
API Call #1 Output

Item 5: start time - 2012-11-07T22:37:11.000Z ** Saves this start time in db to break here on the next call

** Item 4 is missing

** Item 3 is missing

Item 2: start time - 2012-11-07T22:36:41.000Z ** item is scanned

Item 1: start time - 2012-11-07T22:36:13.000Z ** item is scanned

API Call #2 Output (the app makes this call every second)

Item 6: start time - 2012-11-07T22:37:14.000Z * Saves this start time in db to break here on the next call

Item 5: start time - 2012-11-07T22:37:11.000Z ** Loop Breaks here

Item 4: start time - 2012-11-07T22:37:07.000Z ** Item 4 shows up and is missed

Item 3: start time - 2012-11-07T22:37:03.000Z ** Item 3 shows up and is missed

Item 2:start time - 2012-11-07T22:36:41.000Z ** scanned in api call 1

Item 1:start time - 2012-11-07T22:36:13.000Z ** scanned in api call 1

The output from the API call can be quite random. Sometimes it omits one or two items, other times five or six, and sometimes it lists them all correctly.

The app needs to scan every newly listed item without missing any, and be as fast as possible. What is the most efficient way to set up the loop with the fewest possible database hits? Currently I am storing the start time of the first item scanned on every API call so that the next call stops there... but I just found the problem with that approach and I am not sure about the best way to fix it.
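One common way around an out-of-order feed like this is an overlap window: instead of breaking exactly at the last saved start time, re-scan everything newer than that time minus some slack, and skip item IDs already seen. A minimal sketch of the idea (the item-hash shape, `OVERLAP_SECONDS` value, and function name are my assumptions, not eBay's API):

```ruby
require "set"
require "time"

# Assumed slack for eBay's out-of-order indexing; tune to taste.
OVERLAP_SECONDS = 30

# items are assumed sorted newest-first, each {:id => ..., :start_time => "..."}.
# seen_ids is a Set of already-processed item IDs, mutated across calls.
def select_new_items(items, last_start_time, seen_ids)
  cutoff = last_start_time - OVERLAP_SECONDS
  fresh = []
  items.each do |item|
    start_time = Time.parse(item[:start_time])
    break if start_time < cutoff            # everything older was already covered
    next  if seen_ids.include?(item[:id])   # scanned on an earlier call
    seen_ids << item[:id]
    fresh << item
  end
  fresh
end
```

With the example data above, a second call with the overlap window would pick up items 4 and 3 even though they arrived late, because the loop no longer breaks at item 5's start time.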

My stack is ruby/sinatra/mongodb using mongomapper & beanstalkd/stalker to process background jobs like this.

Here is the URL format for the HTTP GET API call:

http://svcs.ebay.com/services/search/FindingService/v1?
OPERATION-NAME=findItemsByCategory&
SERVICE-VERSION=1.12.0&
SECURITY-APPNAME=#{EBAY_DEVELOPER_API_KEY}&
GLOBAL-ID=EBAY-US&
RESPONSE-DATA-FORMAT=XML&
RESTPAYLOAD&
affiliate.networkId=&
categoryId=#{ebay_category_id}&   # 9355 to test 
paginationInput.entriesPerPage=100&
sortOrder=StartTimeNewest
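For reference, the same request URL can be assembled in Ruby with `URI.encode_www_form`, which keeps the parameters readable; the app key and category values here are placeholders:

```ruby
require "uri"

EBAY_DEVELOPER_API_KEY = "YOUR-APP-ID"  # placeholder, not a real key
ebay_category_id       = 9355           # test category from above

params = {
  "OPERATION-NAME"                 => "findItemsByCategory",
  "SERVICE-VERSION"                => "1.12.0",
  "SECURITY-APPNAME"               => EBAY_DEVELOPER_API_KEY,
  "GLOBAL-ID"                      => "EBAY-US",
  "RESPONSE-DATA-FORMAT"           => "XML",
  "categoryId"                     => ebay_category_id,
  "paginationInput.entriesPerPage" => 100,
  "sortOrder"                      => "StartTimeNewest",
}

# encode_www_form handles the escaping and joins with "&"
url = "http://svcs.ebay.com/services/search/FindingService/v1?" +
      URI.encode_www_form(params)
```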

Here is a simple version of the current loop to parse the items.

require "open-uri"
require "time"
require "nokogiri"

previous_api_call = EbayAPILog.first(:order => :created_at.desc)
current_api_call  = EbayAPILog.new
doc = Nokogiri::XML(open(url))

catch(:done) do
  doc.css("item").each_with_index do |item, i|
    item_number = item.css("itemId").text.to_i
    start_time  = Time.parse(item.css("listingInfo startTime").text)
    if i == 0
      current_api_call.start_time = start_time
      current_api_call.save
    end
    if previous_api_call.start_time >= start_time
      throw :done
    end
  end
end

It seems like I am going to have to save every item number and check during the loop to see whether it has already been scanned. I just need some help figuring out the fastest, most efficient option.
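To keep database hits constant per call (rather than per item), one option is to load the item-ID arrays from the last couple of log records in a single query, merge them into a Set, and do every membership check in memory. A sketch, with plain hashes standing in for the records a MongoMapper query would return:

```ruby
require "set"

# Merge the item-ID arrays from recent log records into one Set, so the
# per-item "already scanned?" check never touches the database.
def seen_ids_from_logs(recent_logs)
  recent_logs.each_with_object(Set.new) do |log, ids|
    ids.merge(log[:ebay_items])
  end
end

# recent_logs would come from one query, e.g. the last two EbayAPILog rows
recent_logs = [
  { ebay_items: [101, 102, 103] },
  { ebay_items: [103, 104] },
]
seen = seen_ids_from_logs(recent_logs)
```

Inside the parse loop, `seen.include?(item_number)` then replaces both the array `include?` and the extra `EbayAPILog.first` lookup.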

******* UPDATE *******
Here is the best option I have come up with so far. I added an ebay_items array to the EbayAPILog model and am saving every newly scanned item number so I can skip them on the next call. Any help would be great.

require "open-uri"
require "time"
require "nokogiri"

previous_api_call = EbayAPILog.first(:order => :created_at.desc)
current_api_call  = EbayAPILog.new

doc = Nokogiri::XML(open(url))
catch(:done) do
  doc.css("item").each_with_index do |item, i|
    item_number = item.css("itemId").text.to_i
    start_time  = Time.parse(item.css("listingInfo startTime").text)
    if i == 0
      current_api_call.start_time = start_time
      # Save the start_time as soon as possible so items will not be scanned twice
      current_api_call.save
    end

    if previous_api_call.start_time >= start_time

      # Hack to stop once the start time is more than 20 seconds older than
      # the previous call's newest item -- in practice every item has shown
      # up by that point
      if previous_api_call.start_time > start_time + 20
        throw :done
      end

      # Skip the item if it was in the last call
      next if previous_api_call.ebay_items.include?(item_number)

      # Skip the item if it is anywhere else in the call log
      next if EbayAPILog.first(:ebay_items => item_number)
    end

    # Add the item number to the EbayAPILog since it was scanned
    current_api_call.ebay_items << item_number
  end
end

# Some calls will not scan any new items because none will have been listed.
# In that case, carry over the previous call's items to speed up the next
# loop and avoid extra database hits (the previous_api_call record has to be
# fetched on every API call anyway).
if current_api_call.ebay_items.empty?
  current_api_call.ebay_items = previous_api_call.ebay_items
end
current_api_call.save
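The 20-second cutoff from the hack above can also be pulled out into a small predicate, which makes the slack easy to tune and test on its own (the function name and default are my choices, not part of the original code):

```ruby
require "time"

HARD_STOP_SECONDS = 20  # matches the hack above

# True when the item's start time is so far behind the previous call's
# newest start time that eBay has (in practice) finished back-filling
# late-indexed items and the loop can safely stop.
def hard_stop?(previous_start_time, item_start_time, slack = HARD_STOP_SECONDS)
  previous_start_time - item_start_time > slack
end
```

In the loop this replaces the inline comparison: `throw :done if hard_stop?(previous_api_call.start_time, start_time)`.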

Just as a note, this is going to be processing hundreds of items per minute, sometimes over a thousand at peak times, so every millisecond and any reduction in server strain will help.
blakecash 7 years ago
So, just to clarify, you want a loop that will process the XML with as few requests as possible?
alex 7 years ago
Yes, that is correct. I am about to update the description with the best option I have come up with, but I'm open to something better. My coding skills are fairly beginner-level, so I thought someone else might have a better approach.
blakecash 7 years ago
awarded to MSF
