Getting chunked responses in a resolver with the new Forge runtime

Hi! I’m calling the OpenAI API in a resolver. The API supports streaming, which is achieved with chunked encoding and server-sent events. The problem I run into is that I don’t get the response body as soon as it’s available, but only once the server closes the connection. It seems something is buffering the response data. Has anyone experienced this issue?
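For reference, here is a minimal sketch of what incremental reads normally look like with the standard fetch and ReadableStream APIs. The URL, headers, and request body below are placeholders, and the Forge runtime’s fetch may behave differently, which is exactly the buffering in question.

      // Minimal sketch: reading a chunked/SSE response incrementally with the
      // standard fetch API. URL, headers, and body are placeholders.
      const response = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: "Bearer XXXXXXX",
        },
        body: JSON.stringify({
          model: "gpt-4",
          messages: [{ role: "user", content: "Say hello" }],
          stream: true,
        }),
      });

      // In a non-buffering runtime each read() resolves as soon as a chunk
      // arrives; with the buffering described above, the reads only complete
      // after the server closes the connection.
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        process.stdout.write(decoder.decode(value, { stream: true }));
      }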

Same here. When I run the code below, it takes a long time before streaming starts.

      import OpenAI from "openai";

      const openai = new OpenAI({
        apiKey: "XXXXXXX"
      });
      const stream = await openai.chat.completions.create({
        model: "gpt-4",
        messages: [{ role: "user", content: "Write a poem in 250 words" }],
        stream: true,
      });
      console.log("Start streaming: ");
      for await (const part of stream) {
        // Each chunk carries an incremental delta of the assistant's reply
        process.stdout.write(part.choices[0]?.delta?.content || "");
      }

Same issue here. The stream only becomes available in the same timeframe as (or even later than) a normal completion without streaming. With a stream, the chunks are printed separately but with only about a millisecond between them, so it is a completed answer being replayed, not the regular stream from OpenAI. 🙁
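One way to confirm this, reusing the `stream` object from the snippet above (a hypothetical diagnostic loop, not part of any API), is to log the gap between chunk arrivals: a long initial wait followed by near-zero gaps means the response was buffered and replayed rather than streamed.

      // Log the delay between chunk arrivals. Genuine streaming shows gaps of
      // tens to hundreds of milliseconds; a buffered response shows one long
      // initial wait and then near-zero gaps between chunks.
      let last = Date.now();
      for await (const part of stream) {
        const now = Date.now();
        process.stdout.write(`[+${now - last}ms] `);
        process.stdout.write(part.choices[0]?.delta?.content || "");
        last = now;
      }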