A Lightweight and Efficient DRL Implementation Using PyTorch


In this post, I explain another GitHub repository, ElegantRL, a lightweight, efficient, and stable DRL implementation for researchers and practitioners.

Currently, ElegantRL provides the following model-free deep reinforcement learning (DRL) algorithms:

  • DDPG, TD3, SAC, A2C, PPO, PPO(GAE) for continuous actions
  • DQN, DoubleDQN, D3QN for discrete actions

For background on these DRL algorithms, see the educational website OpenAI Spinning Up.

File Structure:


An agent in agent.py uses networks in net.py and is trained in run.py by interacting with an environment in env.py.

----- kernel file -----

  • elegantrl/net.py # Neural networks.
    • Q-Net,
    • Actor Network,
    • Critic Network,
  • elegantrl/agent.py # RL algorithms.
    • AgentBase
  • elegantrl/run.py # Runs demos 1~4.
    • Parameter initialization,
    • Training loop,
    • Evaluator.

----- utils file -----

  • elegantrl/envs/ # gym env or custom env, including FinanceStockEnv.
    • gym_utils.py: A PreprocessEnv class for gym-environment modification.
    • Stock_Trading_Env: A self-created stock trading environment as an example for user customization.
  • eRL_demo_BipedalWalker.ipynb # BipedalWalker-v2 in a Jupyter notebook
  • eRL_demos.ipynb # Demos 1~4 in a Jupyter notebook. Shows how to use the tutorial version and the advanced version.
  • eRL_demo_SingleFilePPO.py # Trains PPO from a single file; simpler than the tutorial version
  • eRL_demo_StockTrading.py # Stock Trading Application in jupyter notebooks
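
To make "custom env" concrete, here is a minimal sketch of an environment in the classic gym style (reset returns a state, step returns a 4-tuple). ToyLineEnv, its dynamics, and its reward are hypothetical and invented for illustration; this is not ElegantRL's Stock_Trading_Env or PreprocessEnv, and newer gym releases use a slightly different step signature.

```python
import random


class ToyLineEnv:
    """Hypothetical 1-D environment using the classic gym reset/step convention."""

    def __init__(self, max_step=50, seed=0):
        self.max_step = max_step          # episode length limit
        self.rng = random.Random(seed)    # private RNG so runs are reproducible
        self.pos = 0.0
        self.step_count = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.pos = self.rng.uniform(-1.0, 1.0)
        self.step_count = 0
        return [self.pos]

    def step(self, action):
        """Apply a continuous action in [-1, 1]; return (state, reward, done, info)."""
        action = max(-1.0, min(1.0, float(action)))   # clip to the action range
        self.pos += 0.1 * action
        self.step_count += 1
        reward = -abs(self.pos)                        # closer to the origin is better
        done = abs(self.pos) < 0.05 or self.step_count >= self.max_step
        return [self.pos], reward, done, {}


# Usage: a naive controller that always moves toward the origin.
env = ToyLineEnv()
state = env.reset()
done = False
while not done:
    action = -1.0 if state[0] > 0 else 1.0
    state, reward, done, _ = env.step(action)
```

Any class exposing these two methods can, in principle, play the role of env.py in the file structure above.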

As a high-level overview, the relations among the files are as follows. An environment is initialized in env.py and an agent in agent.py. The agent is constructed with Actor and Critic networks from net.py. In each training step in run.py, the agent interacts with the environment, generating transitions that are stored in a Replay Buffer. Then, the agent fetches transitions from the Replay Buffer to train its networks. After each update, an evaluator evaluates the agent's performance and saves the agent if the performance is good.
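
The Replay Buffer step can be sketched roughly as follows. This is a minimal illustration in plain Python, assuming uniform random sampling and a fixed capacity; SimpleReplayBuffer and its methods are hypothetical names, not ElegantRL's ReplayBuffer, which may store and sample transitions differently.

```python
import random
from collections import deque


class SimpleReplayBuffer:
    """Hypothetical fixed-size buffer; the oldest transitions are dropped first."""

    def __init__(self, max_len=10000, seed=0):
        self.memory = deque(maxlen=max_len)   # deque discards old items when full
        self.rng = random.Random(seed)

    def append(self, state, action, reward, next_state, done):
        """Store one transition generated by the agent."""
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Return a uniformly random batch, transposed into per-field tuples.

        Requires batch_size <= len(self).
        """
        batch = self.rng.sample(list(self.memory), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.memory)
```

Sampling at random, rather than replaying transitions in the order they were collected, breaks up the correlation between consecutive steps, which is the usual motivation for a replay buffer in off-policy algorithms such as DQN, DDPG, TD3, and SAC.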


In run.py, the following are initialized first:

  • args : the hyper-parameters.
  • env = PreprocessEnv() : creates an environment (in the OpenAI gym format).
  • agent = agent.XXX() : creates an agent for a DRL algorithm.
  • evaluator = Evaluator() : evaluates and stores the trained model.
  • buffer = ReplayBuffer() : stores the transitions.

Then, the training process is controlled by a while-loop:

  • agent.explore_env(…): the agent explores the environment within target steps, generates transitions, and stores them into the ReplayBuffer.
  • agent.update_net(…): the agent uses a batch from the ReplayBuffer to update the network parameters.
  • evaluator.evaluate_save(…): evaluates the agent’s performance and keeps the trained model with the highest score.

The while-loop terminates when a stopping condition is met, e.g., reaching a target score, reaching the maximum number of steps, or a manual break.
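
The control flow described above can be sketched as below. Only the method names explore_env, update_net, and evaluate_save come from the description; their signatures, the Stub classes, the break_step and target_step parameters, and the toy "update" rule are assumptions made so the example runs on its own. It is not a real DRL algorithm and not ElegantRL's run.py.

```python
import random


class StubEnv:
    """Hypothetical one-step environment: reward is highest when the action is 0.5."""

    def reset(self):
        return 0.0

    def step(self, action):
        reward = 1.0 - abs(action - 0.5)
        return 0.0, reward, True, {}       # (next_state, reward, done, info)


class StubAgent:
    """Hypothetical agent; a single number stands in for the network parameters."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.mean = 0.0                    # stand-in for a policy network

    def select_action(self, state, explore=True):
        noise = self.rng.gauss(0.0, 0.2) if explore else 0.0
        return self.mean + noise

    def explore_env(self, env, buffer, target_step):
        """Collect target_step transitions and store them in the buffer."""
        state = env.reset()
        for _ in range(target_step):
            action = self.select_action(state)
            next_state, reward, done, _ = env.step(action)
            buffer.append((state, action, reward, next_state, done))
            state = env.reset() if done else next_state
        return target_step

    def update_net(self, buffer, batch_size):
        """Move toward the best action in a random batch (a toy rule, not a DRL update)."""
        batch = self.rng.sample(buffer, min(batch_size, len(buffer)))
        best_action = max(batch, key=lambda t: t[2])[1]
        self.mean += 0.5 * (best_action - self.mean)


class StubEvaluator:
    """Tracks the highest evaluation score and a copy of the best parameters."""

    def __init__(self, target_score):
        self.target_score = target_score
        self.best_score = float("-inf")
        self.best_params = None

    def evaluate_save(self, agent, env):
        state = env.reset()
        _, score, _, _ = env.step(agent.select_action(state, explore=False))
        if score > self.best_score:        # keep only the best model so far
            self.best_score = score
            self.best_params = agent.mean
        return self.best_score >= self.target_score


def train(target_score=0.95, break_step=20000, target_step=64, batch_size=32):
    env, agent = StubEnv(), StubAgent()
    evaluator = StubEvaluator(target_score)
    buffer = []                            # plain list standing in for ReplayBuffer
    total_step, reached_goal = 0, False
    # Stop on the target score or on the step budget, whichever comes first.
    while not reached_goal and total_step < break_step:
        total_step += agent.explore_env(env, buffer, target_step)
        agent.update_net(buffer, batch_size)
        reached_goal = evaluator.evaluate_save(agent, env)
    return evaluator.best_score, total_step
```

Calling train() returns the best score seen and the number of environment steps used. A manual break, the third stopping condition mentioned above, would typically be an extra check inside the loop, for example on a flag file or a keyboard interrupt.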


Lingaraj Senapati

Hey there! I am Lingaraj Senapati, the founder of lingarajtechhub.com. I work as a freelance web developer and designer, corporate trainer, digital marketer, and YouTuber.
