tensortrade.environments.trading_environment module
class tensortrade.environments.trading_environment.TradingEnvironment(portfolio, action_scheme, reward_scheme, feed=None, window_size=1, use_internal=True, **kwargs)[source]

Bases: gym.core.Env, tensortrade.base.core.TimeIndexed

A trading environment made for use with Gym-compatible reinforcement learning algorithms.
__init__(portfolio, action_scheme, reward_scheme, feed=None, window_size=1, use_internal=True, **kwargs)[source]

Parameters:
- portfolio (Union[Portfolio, str]) – The portfolio of wallets from which orders are submitted and executed.
- action_scheme (Union[ActionScheme, str]) – The component for transforming an action into an Order at each timestep.
- reward_scheme (Union[RewardScheme, str]) – The component for determining the reward at each timestep.
- feed (optional) – The pipeline of features to pass the observations through.
- kwargs (optional) – Additional arguments for tuning the environment, logging, etc.
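As a sketch of how these constructor arguments fit together, an environment might be assembled as below. The string scheme identifiers (`"managed-risk"`, `"simple"`) and the exact import path are assumptions based on the 0.x TensorTrade API and may differ in your installed version; the function is only an illustration of the `Union[..., str]` parameter types, which accept either component instances or registered names.

```python
def build_environment(portfolio, feed=None):
    """Sketch: assemble a TradingEnvironment from its components.

    The import path and the string scheme identifiers below are
    assumptions based on the 0.x TensorTrade API; check your version.
    """
    from tensortrade.environments import TradingEnvironment

    return TradingEnvironment(
        portfolio=portfolio,           # Portfolio of wallets to trade from
        action_scheme="managed-risk",  # hypothetical registered scheme name
        reward_scheme="simple",        # hypothetical registered scheme name
        feed=feed,                     # optional feature pipeline
        window_size=10,                # each observation spans 10 timesteps
    )
```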
action_scheme

The component for transforming an action into an Order at each time step.

Return type: ActionScheme
agent_id = None
compile()[source]

Sets the observation space and the action space of the environment, creates the internal feed, and initializes the remaining components.
episode_id = None
episode_trades

A dictionary of trades made this episode, organized by order id.

Return type: Dict[str, ForwardRef]
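The return type above is a mapping from order id to the trades filled for that order. A minimal illustration of consuming such a dictionary, with plain dicts standing in for the library's trade objects (the `"price"`/`"size"` field names are illustrative, not TensorTrade's):

```python
# Plain-dict stand-ins for trade objects, keyed by order id, mirroring
# the Dict[str, ...] shape of episode_trades (field names illustrative).
episode_trades = {
    "order-1": [{"price": 100.0, "size": 0.5}, {"price": 101.0, "size": 0.5}],
    "order-2": [{"price": 99.5, "size": 1.0}],
}

# Aggregate the total traded size per order id.
totals = {
    order_id: sum(trade["size"] for trade in trades)
    for order_id, trades in episode_trades.items()
}
```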
reset()[source]

Resets the state of the environment and returns an initial observation.

Return type: array
Returns: The episode's initial observation.
reward_scheme

The component for determining the reward at each time step.

Return type: RewardScheme
step(action)[source]

Run one timestep within the environment based on the specified action.

Parameters: action (int) – The trade action provided by the agent for this timestep.

Return type: Tuple[array, float, bool, dict]

Returns:
- observation (pandas.DataFrame) – Provided by the environment's exchange, often OHLCV or tick trade history data points.
- reward (float) – An amount corresponding to the benefit earned by the action taken this timestep.
- done (bool) – If True, the environment is complete and should be restarted.
- info (dict) – Any auxiliary, diagnostic, or debugging information to output.
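Together, reset and step follow the standard Gym interaction contract: reset once per episode, then step until done is True. A self-contained sketch of that loop, using a toy stand-in class rather than a real TradingEnvironment (the ToyEnv class and its fixed reward are invented for illustration):

```python
class ToyEnv:
    """Toy stand-in with the same reset()/step(action) contract."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]  # initial observation

    def step(self, action):
        self.t += 1
        observation = [float(self.t)]
        reward = 1.0 if action == 1 else 0.0   # toy reward rule
        done = self.t >= self.episode_length   # episode ends after N steps
        info = {"step": self.t}
        return observation, reward, done, info


env = ToyEnv()
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    action = 1  # a real agent would sample from the action scheme's space
    obs, reward, done, info = env.step(action)
    total_reward += reward
```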