Trading Environment¶
A trading environment is a reinforcement learning environment that follows OpenAI’s gym.Env
specification. This allows us to leverage many of the existing reinforcement learning models in our trading agent, if we’d like.
Trading environments are fully configurable gym environments with highly composable Exchange
, FeaturePipeline
, ActionScheme
, and RewardScheme
components.
- The
Exchange
provides observations to the environment and executes the agent’s trades. - The
FeaturePipeline
optionally transforms the exchange output into a more meaningful set of features before it is passed to the agent. - The
ActionScheme
converts the agent’s actions into executable trades. - The
RewardScheme
calculates the reward for each time step based on the agent’s performance.
That’s all there is to it, now it’s just a matter of composing each of these components into a complete environment.
When the reset method of a TradingEnvironment
is called, all of the child components will also be reset. The internal state of each exchange, feature pipeline, transformer, action scheme, and reward scheme will be set back to their default values, ready for the next episode.
Let’s begin with an example environment. As mentioned before, initializing a TradingEnvironment
requires an exchange, an action scheme, and a reward scheme, the feature pipeline is optional.
from tensortrade.environments import TradingEnvironment
environment = TradingEnvironment(exchange=exchange,
action_scheme=action_scheme,
reward_scheme=reward_scheme,
feature_pipeline=feature_pipeline)