Skip to content

[Java][FlightRPC] Java arrow flight server stuck in reader.getDescriptor() #431

@billyean

Description

@billyean

Describe the usage question you have. Please include as many useful details as possible.

Platform: Ubuntu 20.04.6 LTS
Arrow version: 15.
Client: C++
Server: Java
I have a Java arrow flight server that uses doExchange to read the context, my server code is as follows, I have identified in multiple thread environment, in the tests many calls would stuck at the line FlightDescriptor descriptor = reader.getDescriptor(); in the second time(Which means the first call in the same thread usually is not blocked), and which makes the following requests also stucks.

        log.info("Start to call doExchange.........");
        // Trying to
        try (BufferAllocator allocator = allocatorPool.submit(
            () -> this.allocator.newChildAllocator("exchange", 0, Long.MAX_VALUE)).get()) {
            FlightDescriptor descriptor = reader.getDescriptor();
            List<String> path = descriptor.getPath();
            String type = path.get(0);
            String funcSignature = path.get(1);
           ...

The client code is as follows

std::vector<std::shared_ptr<RecordBatch>> UDFClient::Call(
    const std::vector<std::string>& paths,
    std::shared_ptr<RecordBatch>& batch) const {
  // Create a FlightDescriptor using a path
  FlightDescriptor descriptor = FlightDescriptor::Path(paths);
  auto exchange_result = Client_->DoExchange(descriptor);
  if (!exchange_result.ok()) {
    throw std::runtime_error("Do exchange descriptor: "
            + exchange_result.status().ToString());
  }
  auto exchange = std::move(exchange_result.ValueUnsafe());
  ...

I understand reader.getDescriptor(); is a blocking call that will use future to wait the descriptor sent. Since doExchange in the server has been called, I assume the client auto exchange_result = Client_->DoExchange(descriptor); has been received. I can't figure what could make reader.getDescriptor(); stuck, I am reporting this to the community that want to know is this a bug or there is something I did wrong. Thank you

This happened in the multiple threads environment. We also check the lock with jstack that shows SettableFuture<FlightDescriptor> descriptor is the one not set that caused the lock.

"pool-1-thread-16" #31 prio=5 os_prio=0 tid=0x00007f6d34009000 nid=0x3a19 waiting on condition [0x00007f6d253ea000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000076d596180> (a com.google.common.util.concurrent.SettableFuture)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:563)
        at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:110)
        at org.apache.arrow.flight.FlightStream.getDescriptor(FlightStream.java:158)
        at com.bytedance.dp.udf.UDFProducer.doExchange(UDFProducer.java:266)
        at org.apache.arrow.flight.FlightService.lambda$doExchangeCustom$2(FlightService.java:382)
        at org.apache.arrow.flight.FlightService$$Lambda$65/298187580.run(Unknown Source)
        at io.grpc.Context$1.run(Context.java:566)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Component(s)

C++, FlightRPC, Java

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions