Hello,
I am trying to fetch people's page views asynchronously, using a queue approach I found on Stack Overflow.
import asyncio
from aiohttp import ClientSession, TCPConnector


async def get(session, url):
    headers = {
        'Authorization': 'Bearer KEY',
    }
    async with session.get(url, headers=headers) as response:
        json = await response.json()
        return json, response


async def process(session, url, q):
    try:
        try:
            views, response = await get(session, url)
            scode = response.status
            if scode == 404:
                return
        except Exception as e:
            print(e)
            return
        try:
            await q.put(str(response.links["next"]["url"]))
        except:
            pass
    except Exception as e:
        print(e)


async def fetch_worker(session, q):
    while True:
        url = await q.get()
        try:
            await process(session, url, q)
        except Exception as e:
            print(e)
        finally:
            q.task_done()
async def d():
    url_queue = asyncio.Queue()
    tasks = []
    connector = TCPConnector(limit=500)
    async with ClientSession(connector=connector) as session:
        url = '<some base url>'

        # start 500 workers that pull URLs off the queue as they arrive
        for i in range(500):
            tasks.append(asyncio.create_task(fetch_worker(session, url_queue)))

        # seed the queue with the initial URLs (stdrows comes from earlier in my script)
        for row in stdrows:
            await url_queue.put(url.format(row[1]))

        await asyncio.gather(*tasks)
        await url_queue.join()


asyncio.run(d())

This appears not to be going at 500 tasks/sec. Is it even possible to get to this rate without knowing all the URLs ahead of time? I am hoping to fetch the next URL from whatever initial URL (or from its paginated URL) while I work with `views`. The bucket instantly drops to 699.8... and stays around there for the rest of the run. This matches up with what I see when I print the URL in `process`: it prints the first 24 or so quickly, then it slows down. There are definitely more than 500 generated URLs. If I use 2000 connections/tasks, the run still takes the same amount of time.
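In case it is relevant, this is roughly how I have been checking where the time goes: timing each request and printing whatever is left in the bucket. It is only a sketch of an instrumented `get`; the `X-Rate-Limit-Remaining` header name is my assumption about how the API reports its bucket, so it may be called something else.

import time


# Instrumented version of get(): logs how long each request takes and the
# remaining-quota bucket. "X-Rate-Limit-Remaining" is an assumed header name;
# substitute whatever header the API actually sends.
async def get(session, url):
    headers = {'Authorization': 'Bearer KEY'}
    start = time.monotonic()
    async with session.get(url, headers=headers) as response:
        data = await response.json()
        elapsed = time.monotonic() - start
        bucket = response.headers.get("X-Rate-Limit-Remaining")
        print(f"{url} took {elapsed:.2f}s, bucket={bucket}")
        return data, response

My thinking is that if individual requests stay fast while the bucket hovers near where it started, the slowdown is on my side (the tasks are not really running concurrently) rather than server-side throttling.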