tensortrade.environments.trading_environment module
class tensortrade.environments.trading_environment.TradingEnvironment(portfolio, action_scheme, reward_scheme, feed=None, window_size=1, use_internal=True, **kwargs)[source]

Bases: gym.core.Env, tensortrade.base.core.TimeIndexed

A trading environment made for use with Gym-compatible reinforcement learning algorithms.
__init__(portfolio, action_scheme, reward_scheme, feed=None, window_size=1, use_internal=True, **kwargs)[source]

Parameters:
- portfolio (Union[Portfolio, str]) – The portfolio of wallets from which orders are submitted and executed.
- action_scheme (Union[ActionScheme, str]) – The component for transforming an action into an Order at each timestep.
- reward_scheme (Union[RewardScheme, str]) – The component for determining the reward at each timestep.
- feed (optional) – The pipeline of features to pass the observations through.
- kwargs (optional) – Additional arguments for tuning the environment, logging, etc.
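As a sketch of how these constructor arguments fit together, an environment might be assembled as below. The string scheme identifiers (`"managed-risk"`, `"simple"`) and the exact import path are assumptions based on the 0.x TensorTrade API and may differ in your installed version; the function is only an illustration of the `Union[..., str]` parameter types, which accept either component instances or registered names.

```python
def build_environment(portfolio, feed=None):
    """Sketch: assemble a TradingEnvironment from its components.

    The import path and the string scheme identifiers below are
    assumptions based on the 0.x TensorTrade API; check your version.
    """
    from tensortrade.environments import TradingEnvironment

    return TradingEnvironment(
        portfolio=portfolio,           # Portfolio of wallets to trade from
        action_scheme="managed-risk",  # hypothetical registered scheme name
        reward_scheme="simple",        # hypothetical registered scheme name
        feed=feed,                     # optional feature pipeline
        window_size=10,                # each observation spans 10 timesteps
    )
```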
action_scheme

The component for transforming an action into an Order at each time step.

Return type: ActionScheme
agent_id = None
compile()[source]

Sets the observation space and the action space of the environment, creates the internal feed, and initializes the remaining components.
episode_id = None
episode_trades

A dictionary of trades made this episode, organized by order id.

Return type: Dict[str, ForwardRef]
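The return type above is a mapping from order id to the trades filled for that order. A minimal illustration of consuming such a dictionary, with plain dicts standing in for the library's trade objects (the `"price"`/`"size"` field names are illustrative, not TensorTrade's):

```python
# Plain-dict stand-ins for trade objects, keyed by order id, mirroring
# the Dict[str, ...] shape of episode_trades (field names illustrative).
episode_trades = {
    "order-1": [{"price": 100.0, "size": 0.5}, {"price": 101.0, "size": 0.5}],
    "order-2": [{"price": 99.5, "size": 1.0}],
}

# Aggregate the total traded size per order id.
totals = {
    order_id: sum(trade["size"] for trade in trades)
    for order_id, trades in episode_trades.items()
}
```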
reset()[source]

Resets the state of the environment and returns an initial observation.

Return type: array
Returns: The episode's initial observation.
reward_scheme

The component for determining the reward at each time step.

Return type: RewardScheme
step(action)[source]

Run one timestep within the environment based on the specified action.

Parameters: action (int) – The trade action provided by the agent for this timestep.

Return type: Tuple[array, float, bool, dict]

Returns:
- observation (pandas.DataFrame) – Provided by the environment's exchange, often OHLCV or tick trade history data points.
- reward (float) – An amount corresponding to the benefit earned by the action taken this timestep.
- done (bool) – If True, the environment is complete and should be restarted.
- info (dict) – Any auxiliary, diagnostic, or debugging information to output.
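Together, reset and step follow the standard Gym interaction contract: reset once per episode, then step until done is True. A self-contained sketch of that loop, using a toy stand-in class rather than a real TradingEnvironment (the ToyEnv class and its fixed reward are invented for illustration):

```python
class ToyEnv:
    """Toy stand-in with the same reset()/step(action) contract."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]  # initial observation

    def step(self, action):
        self.t += 1
        observation = [float(self.t)]
        reward = 1.0 if action == 1 else 0.0   # toy reward rule
        done = self.t >= self.episode_length   # episode ends after N steps
        info = {"step": self.t}
        return observation, reward, done, info


env = ToyEnv()
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    action = 1  # a real agent would sample from the action scheme's space
    obs, reward, done, info = env.step(action)
    total_reward += reward
```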