tensortrade.environments.trading_environment module

class tensortrade.environments.trading_environment.TradingEnvironment(portfolio, action_scheme, reward_scheme, feed=None, window_size=1, use_internal=True, **kwargs)[source]

Bases: gym.core.Env, tensortrade.base.core.TimeIndexed

A trading environment made for use with Gym-compatible reinforcement learning algorithms.

__init__(portfolio, action_scheme, reward_scheme, feed=None, window_size=1, use_internal=True, **kwargs)[source]
Parameters:
  • portfolio (Union[Portfolio, str]) – The portfolio of wallets from which orders are submitted and executed.
  • action_scheme (Union[ActionScheme, str]) – The component for transforming an action into an Order at each timestep.
  • reward_scheme (Union[RewardScheme, str]) – The component for determining the reward at each timestep.
  • feed (optional) – The pipeline of features to pass the observations through.
  • window_size (int, optional) – The number of timesteps included in each observation. Defaults to 1.
  • use_internal (bool, optional) – Whether to include the environment's internal data streams (such as exchange and portfolio data) in the observations. Defaults to True.
  • kwargs (optional) – Additional arguments for tuning the environment, logging, etc.
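
As a rough sketch of constructing an environment (the module paths, the SimulatedExchange keyword arguments, and the registered 'managed-risk' / 'risk-adjusted' scheme identifiers are assumptions drawn from TensorTrade 0.2-era examples and may differ between versions):

    import pandas as pd

    from tensortrade.data import DataFeed, Stream
    from tensortrade.environments import TradingEnvironment
    from tensortrade.exchanges.simulated import SimulatedExchange
    from tensortrade.instruments import USD, BTC
    from tensortrade.wallets import Portfolio, Wallet

    # Hypothetical OHLCV price history; any DataFrame of features works.
    data = pd.read_csv("ohlcv.csv")

    exchange = SimulatedExchange(data_frame=data, base_instrument=USD)

    portfolio = Portfolio(USD, [
        Wallet(exchange, 10000 * USD),
        Wallet(exchange, 0 * BTC),
    ])

    # Expose each DataFrame column to the environment as a named stream.
    feed = DataFeed([Stream(name, list(data[name])) for name in data.columns])

    environment = TradingEnvironment(
        portfolio=portfolio,
        action_scheme='managed-risk',   # registered ActionScheme identifier
        reward_scheme='risk-adjusted',  # registered RewardScheme identifier
        feed=feed,
        window_size=20,
    )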
action_scheme

The component for transforming an action into an Order at each time step.

Return type: ActionScheme
agent_id = None
broker

The broker used to execute orders within the environment.

Return type: Broker
close()[source]

Utility method to clean up the environment before closing.

compile()[source]

Sets the observation space and action space of the environment, creates the internal data feed, and initializes the remaining components.

episode_id = None
episode_trades

A dictionary of trades made this episode, organized by order id.

Return type: Dict[str, Trade]
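
For example, the fills from a finished episode can be inspected per order (a small sketch; environment is an instance of this class):

    # Each key is the id of the order that produced the trade(s).
    for order_id, trade in environment.episode_trades.items():
        print(order_id, trade)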
portfolio

The portfolio of instruments currently held on this exchange.

Return type: Portfolio
render(mode='none')[source]

Renders the environment via matplotlib.

reset()[source]

Resets the state of the environment and returns an initial observation.

Return type: np.array
Returns: The episode's initial observation.
reward_scheme

The component for determining the reward at each time step.

Return type: RewardScheme
step(action)[source]

Run one timestep within the environment based on the specified action.

Parameters: action (int) – The trade action provided by the agent for this timestep.
Return type: Tuple[np.array, float, bool, dict]
Returns:
  • observation (np.array) – Provided by the environment's exchange, often OHLCV or tick trade history data points.
  • reward (float) – An amount corresponding to the benefit earned by the action taken this timestep.
  • done (bool) – If True, the environment is complete and should be restarted.
  • info (dict) – Any auxiliary, diagnostic, or debugging information to output.
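
Together with reset(), render(), and close(), step() supports the standard Gym interaction loop. A minimal sketch, using a random action in place of a trained agent (environment is a compiled TradingEnvironment as constructed above):

    observation = environment.reset()
    done = False
    total_reward = 0.0

    while not done:
        # Sample a random action; an RL agent's policy would choose here.
        action = environment.action_space.sample()
        observation, reward, done, info = environment.step(action)
        total_reward += reward

    environment.render()
    environment.close()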