Abstract
The modeling and control of genetic regulatory networks holds tremendous potential for gaining a deep understanding of biological processes and for developing effective therapeutic interventions in diseases such as cancer. A dynamic programming approach has been proposed for determining an optimal intervention policy to shift the steady-state distribution of the network. The dynamic programming solution is, however, computationally prohibitive for large gene regulatory networks, as its complexity increases exponentially with the number of genes. Since the number of genes considered is directly related to the accuracy of the model, it is imperative to design optimal intervention policies that can be reasonably implemented for large gene regulatory networks. To this end, we design a neural dynamic programming controller that optimizes the same dynamic programming performance measure while requiring only polynomial time complexity. The proposed neural dynamic programming structure comprises two networks: an action network and a critic network. The critic network is trained to optimize a total reward-to-go objective, namely to balance the Bellman equation. The action network, constrained by the critic network, generates the optimal control strategy. Both the control strategy and the critic output are updated according to an error function that changes from one step to the next. The general theory of non-homogeneous Markov chains is used to derive the optimal strategies of the non-uniform policy method.
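
For reference, a minimal statement of the Bellman equation that the critic is trained to balance, assuming the standard discounted-cost formulation for a Markov decision process; the symbols here (one-stage cost $g$, discount factor $\gamma$, transition probabilities $P$) are standard notation assumed for illustration rather than taken from this abstract:

$$
J^*(s) \;=\; \min_{a}\Big[\, g(s,a) \;+\; \gamma \sum_{s'} P(s' \mid s, a)\, J^*(s') \Big],
$$

where $J^*(s)$ is the optimal cost-to-go that the critic network approximates; under this reading, the per-step error driving the critic update is the residual by which its current estimate fails to satisfy this equation.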