Issue Description: In my tests, MPSCircuits take much longer to get

Improve the slow compilation of MPSCircuit (tensorcircuit issue, open)

royess commented on July 23, 2024

Improve the slow compilation of MPSCircuit

from tensorcircuit.

Comments (16)

Marsmmz commented on July 23, 2024

Hi, I'm also testing JIT compilation with MPSCircuit with @royess. I use a circuit of the same depth but with geometric locality, and the compile time is shorter (since we are not able to finish jit compiling for the non-local circuit, I can't say how much shorter). However, I don't understand why locality causes such a difference. What logic does TensorCircuit follow when doing contraction?

refraction-ray commented on July 23, 2024

Locality is very important for MPSCircuit. If a non-local two-qubit gate is applied, it is first transformed into a series of local swap gates plus a local two-qubit gate, and all these gates are applied to the MPS sequentially. The reason is that only local two-qubit tensors can be safely applied and truncated to merge into the MPS for TEBD-like algorithms.
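
The swap-based routing described above can be sketched in plain Python. This is only an illustration, not TensorCircuit's implementation: the function name `decompose_nonlocal_gate` is hypothetical, and it assumes the lower qubit is swapped next to the higher one, the gate is applied, and the swaps are then undone.

```python
def decompose_nonlocal_gate(i, j):
    """Return adjacent-only operations that emulate a gate on qubits (i, j).

    Assumption: one qubit is swapped next to the other, the gate is applied,
    and the swaps are undone. The library's actual routing may differ, and
    the orientation of the gate (which qubit is first) is ignored here.
    """
    if i == j:
        raise ValueError("a two-qubit gate needs two distinct qubits")
    lo, hi = min(i, j), max(i, j)
    # Move the lower qubit rightwards until it sits at position hi - 1.
    forward = [("swap", k, k + 1) for k in range(lo, hi - 1)]
    gate = [("gate", hi - 1, hi)]
    # Undo the swaps in reverse order to restore the original qubit layout.
    backward = list(reversed(forward))
    return forward + gate + backward


ops = decompose_nonlocal_gate(0, 5)
for op in ops:
    print(op)
```

Under these assumptions a single gate on qubits 0 and 5 becomes 9 adjacent operations (4 swaps, the gate, 4 swaps), each of which would need its own decomposition and truncation step on the MPS.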

Marsmmz commented on July 23, 2024

@refraction-ray Thank you for the discussion. It seems that it is no longer a technical problem. The task we face has a fundamental hardness, which is unsolvable with current classical simulation techniques and reasonable computation resources.

royess commented on July 23, 2024

@refraction-ray I would appreciate your comments or suggestions.

refraction-ray commented on July 23, 2024

Indeed, jitting a large MPS circuit takes longer than jitting a plain Circuit, and the jitting time varies significantly across backends (jax vs. tf) and hardware (cpu vs. gpu). What is the depth of your test circuit? Typically, an MPSCircuit of 8 qubits wouldn't require such a long jitting time. Besides, you may also try the unjit version to see whether the running time is acceptable?
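
To see how much of the wall time is compilation rather than execution, one option is to time the first and second call of a jitted function separately. The sketch below uses plain JAX only; `compile_and_run_times` and `toy_circuit` are hypothetical names, and the toy function merely stands in for a circuit with many sequential operations. It assumes the function returns a single JAX array.

```python
import time

import jax
import jax.numpy as jnp


def compile_and_run_times(fn, *args):
    """Time the first call (tracing + compiling + running) and a second call.

    Assumption: fn returns a single JAX array, so block_until_ready is available.
    """
    jitted = jax.jit(fn)
    t0 = time.perf_counter()
    jitted(*args).block_until_ready()  # includes tracing and XLA compilation
    t1 = time.perf_counter()
    out = jitted(*args)
    out.block_until_ready()  # reuses the cached executable
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, out


def toy_circuit(x):
    # A long chain of sequential operations, unrolled by the Python loop.
    for k in range(200):
        x = jnp.sin(x) + 0.01 * k
    return x.sum()


first_call, steady_call, value = compile_and_run_times(toy_circuit, jnp.ones(8))
print(f"first call: {first_call:.3f}s, second call: {steady_call:.6f}s")
```

The difference between the two timings is a rough proxy for the jitting cost on a given backend and device.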

royess commented on July 23, 2024

Thanks for the quick reply! Currently, I am using jax+cpu.

> What is the depth of your test circuit

In total, 188 one- or two-qubit gates. The depth should be around 24, then.

By the way, I also tested a circuit of 320 gates in the same environment, using the snippet you provided in #204 (comment) but changing the circuit to contain 8 qubits and increasing the depth. That takes a jit time of about 120s.

Differences:

- My circuit (the slow one) does not obey 1D locality, i.e., its gates are very non-local.
- My circuit is built through function calls that append components.

> Besides, you may also try the unjit version to see whether the running time is acceptable?

I tried. But the training seems much slower than with the plain circuit simulator. It's not quite acceptable for my needs.

Marsmmz commented on July 23, 2024

However, the compiling time is still prolonged when scaling up the qubit number. It takes about 15 minutes for four qubits and more than 2 hours for 16 qubits (the compilation process is still unfinished). This runs counter to the purpose of using MPSCircuit, which is to optimize the computation resource (both time and memory) when scaling up the qubit number. Is there any way to make this better?

refraction-ray commented on July 23, 2024

For the unjit version, if AD is not required in your workflow, the numpy backend may be the fastest.

refraction-ray commented on July 23, 2024

Another possible workaround applies when your circuit has some time periodicity: a scan wrapper can then greatly reduce the jitting time. See an example for Circuit: https://github.com/tencent-quantum-lab/tensorcircuit/blob/master/examples/hea_scan_jit_acc.py. I believe the example can be transferred to MPSCircuit, with the stacked MPS tensors as input and output.
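
The idea behind the scan wrapper can be illustrated with plain JAX, independent of TensorCircuit. In this sketch a toy "layer" (a 2x2 rotation, an assumption chosen only for brevity) is applied 50 times, once with a Python loop and once with `jax.lax.scan`. A Python-side counter records how often the layer body is traced; all names here are hypothetical.

```python
import jax
import jax.numpy as jnp

# Python side effects run only while JAX traces a function, so this counter
# shows how many copies of the layer end up in the traced program.
TRACE_COUNT = {"unrolled": 0, "scanned": 0}


def layer(state, theta, tag):
    TRACE_COUNT[tag] += 1
    rot = jnp.array([[jnp.cos(theta), -jnp.sin(theta)],
                     [jnp.sin(theta), jnp.cos(theta)]])
    return rot @ state


@jax.jit
def run_unrolled(state, thetas):
    # The Python loop is unrolled: every layer is traced and compiled separately.
    for k in range(thetas.shape[0]):
        state = layer(state, thetas[k], "unrolled")
    return state


@jax.jit
def run_scanned(state, thetas):
    # lax.scan traces the body for a generic step and loops inside the program.
    def body(carry, theta):
        return layer(carry, theta, "scanned"), None

    final, _ = jax.lax.scan(body, state, thetas)
    return final


thetas = jnp.linspace(0.0, 1.0, 50)
state0 = jnp.array([1.0, 0.0])
out_unrolled = run_unrolled(state0, thetas)
out_scanned = run_scanned(state0, thetas)
print(TRACE_COUNT)
```

The unrolled version traces the layer once per repetition, while the scanned version traces it far fewer times, which is why scan helps only when every step has the same structure.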

royess commented on July 23, 2024

Thanks for your advice! But I think we need AD and do not have time periodicity in our circuit.

royess commented on July 23, 2024

Do you think it is doable to speed up the compilation for MPSCircuit? We will be happy to help if you have ideas on how to work on that. (We need this feature in our research, and there seems to be no other workaround.)

Naively, I would suppose an MPSCircuit is not much more complicated than a normal one in its structure. Shouldn't it come with a reasonable jit time?

refraction-ray commented on July 23, 2024

> Do you think it is doable to speed up the compilation for MPSCircuit?

First, from a physics perspective, a TEBD-like algorithm applied to a non-periodic and very structured circuit often leads to a very large approximation error unless there is some theoretical guarantee, e.g., one can show that the intermediate state in the circuit is always area-law entangled.

From an engineering perspective, accelerating jit time is much harder than accelerating running time, as the former is nearly fixed by the ML framework, over which we have very little control.

One possible way is to support an MPS in which a group of qubits, instead of a single qubit, forms one tensor. With grouping, far fewer QR or SVD decompositions are required and the approximation error is more controllable. E.g., with d qubits per tensor leg, each MPS tensor has dimension $(\chi, 2^d, \chi)$. Then only two-qubit gates across different qubit groups require truncation; gates within one group are merged directly into the MPS tensor by matrix multiplication.
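
A minimal NumPy sketch of the grouping idea follows. It is not an existing TensorCircuit feature; all function names are hypothetical, and for brevity the in-group gate is assumed to act on two neighbouring qubits of the same group.

```python
import numpy as np


def needs_truncation(i, j, d):
    """A gate needs an SVD/QR truncation only if it crosses a group boundary."""
    return i // d != j // d


def embed_two_qubit_gate(gate, a, d):
    """Embed a 4x4 gate on in-group qubits (a, a + 1) into a 2^d x 2^d matrix."""
    left = np.eye(2 ** a)
    right = np.eye(2 ** (d - a - 2))
    return np.kron(np.kron(left, gate), right)


def apply_in_group(site, gate_full):
    """Contract the gate with the physical leg; the bond legs are untouched."""
    return np.einsum("pq,lqr->lpr", gate_full, site)


chi, d = 4, 3
rng = np.random.default_rng(0)
site = rng.normal(size=(chi, 2 ** d, chi))  # one grouped tensor: (chi, 2^d, chi)
cnot = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
new_site = apply_in_group(site, embed_two_qubit_gate(cnot, 0, d))
print(new_site.shape, needs_truncation(0, 1, d), needs_truncation(2, 3, d))
```

The in-group gate leaves the bond dimension unchanged, so no decomposition or truncation is involved; the price is a physical leg of size 2^d, which grows exponentially with the group size.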

refraction-ray commented on July 23, 2024

And what are the target circuit metrics (qubit number, error, circuit depth, gate number, etc.) in your case? Also, have you tried the tf backend? The jitting time is much shorter.

Marsmmz commented on July 23, 2024

> And what are the target circuit metrics (qubit number, error, circuit depth, gate number, etc.) in your case?

The target circuit contains about 80 qubits, and about 60 layers of all-to-all connected non-local 2-qubit gates, making it roughly 3000 non-local 2-qubit gates. Do you think it is possible to jit such circuits (with or without MPS)?

> Also, have you tried the tf backend? The jitting time is much shorter.

Yes, we have tried the tf backend. Unfortunately, the tf backend does not have a hittable version of QR decomposition, so jitting MPS is impossible.

refraction-ray commented on July 23, 2024

> The target circuit contains about 80 qubits, and about 60 layers of all-to-all connected non-local 2-qubit gates, making it roughly 3000 non-local 2-qubit gates.

To me, the scale of the simulation is very challenging. Translating to local 2-qubit gates, I guess roughly 30000 two-qubit gates are required for an 80-qubit system. Even if MPSCircuit can simulate this, the accuracy would be bad in general. Actually, the simulation scale is even beyond the quantum supremacy experiments; I don't see this as an easy task to run by calling an API with one GPU.
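
The order of magnitude of such a count can be checked with a back-of-the-envelope script. The assumptions are illustrative and not taken from the library: gate pairs are drawn uniformly from all qubit pairs, and each non-local gate costs swaps to bring the qubits together, the gate itself, and the same swaps to restore the layout. A smarter routing would give a lower number.

```python
from itertools import combinations


def local_gate_cost(i, j):
    """Adjacent gates needed for one gate on (i, j): swaps there, gate, swaps back."""
    distance = abs(i - j)
    return 2 * (distance - 1) + 1


def mean_cost_per_gate(n_qubits):
    """Average cost over all qubit pairs, i.e. uniformly random all-to-all gates."""
    pairs = list(combinations(range(n_qubits), 2))
    return sum(local_gate_cost(i, j) for i, j in pairs) / len(pairs)


n_qubits, n_nonlocal_gates = 80, 3000
estimate = n_nonlocal_gates * mean_cost_per_gate(n_qubits)
print(f"about {estimate:.0f} adjacent two-qubit gates")
```

Under these naive assumptions the average is 53 adjacent gates per non-local gate, or 159000 in total, so the rough guess above is, if anything, on the optimistic side.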

> Unfortunately, the tf backend does not have a hittable version of QR decomposition

What do you mean by this? Can you run the circuit with the jitted tf backend? I don't think vmap is relevant for your use case, since one circuit is challenging enough to simulate after all. There is no need to "stack" multiple circuits together to simulate.

Marsmmz commented on July 23, 2024

> To me, the scale of the simulation is very challenging. Translating to local 2-qubit gates, I guess roughly 30000 two-qubit gates are required for an 80-qubit system. Even if MPSCircuit can simulate this, the accuracy would be bad in general. Actually, the simulation scale is even beyond the quantum supremacy experiments; I don't see this as an easy task to run by calling an API with one GPU.

I see, thanks a lot.

> There is no need to "stack" multiple circuits together to simulate

Yes, I understand this. However, for our task, we need to run many batches. As you have said, if simulating one circuit is challenging enough, then there's no reason to consider stacking them together.
