This repository was archived by the owner on Oct 31, 2023. It is now read-only.

Flipped tensor dimensions in reply when running train_minirts.sh #140

@SimpleConjugate

Description


After fixing errors in my local copy of ELF related to async and device_id (renaming async -> non_blocking and device_id -> device), I still get an error when running the train_minirts.sh script.
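For readers hitting the same first hurdle, here is a minimal sketch of the rename described above. The exact call sites in ELF are not shown in this issue, so the tensor and its shape are hypothetical (the 22 planes and 20 by 20 map are borrowed from the log further down purely for illustration), and `Tensor.to` is used instead of `Tensor.cuda` so that the snippet also runs on a machine without a GPU.

```python
import torch

# Pick a GPU when one is visible, otherwise fall back to CPU so the sketch still runs.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical batch: 128 states, 22 feature planes, 20 x 20 map.
batch = torch.zeros(128, 22, 20, 20)

# Old spelling, as described in the issue. On Python 3.7+ this is a
# SyntaxError because `async` became a reserved keyword:
#     batch.cuda(device_id=0, async=True)

# Current spelling: `device` replaces `device_id`, `non_blocking` replaces `async`.
moved = batch.to(device, non_blocking=True)

print(moved.shape, moved.device)
```

On a CUDA machine the equivalent `batch.cuda(device=0, non_blocking=True)` should behave the same way.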

I obtained the following error simply by following the install instructions: https://github.com/facebookresearch/ELF/#install-scripts

The Main Error

sh ./train_minirts.sh --gpu 0
Warning: argument ValueMatcher/grad_clip_norm cannot be added. Skipped.
PID: 28041
========== Args ============
Loader: handicap_level=0,players="type=AI_NN,fs=50,args=backup/AI_SIMPLE|start/500|decay/0.99;type=AI_SIMPLE,fs=20",max_tick=30000,shuffle_player=False,num_frames_in_state=1,max_unit_cmd=1,seed=0,actor_only=False,model_no_spatial=False,save_replay_prefix=None,output_file=None,cmd_dumper_prefix=None,gpu=0,use_unit_action=False,disable_time_decay=False,use_prev_units=False,attach_complete_info=False,feature_type="ORIGINAL"
ContextArgs: num_games=1024,batchsize=128,game_multi=None,T=20,eval=False,wait_per_group=False,num_collectors=0,verbose_comm=False,verbose_collector=False,mcts_threads=0,mcts_rollout_per_thread=1,mcts_verbose=False,mcts_save_tree_filename="",mcts_verbose_time=False,mcts_use_prior=False,mcts_pseudo_games=0,mcts_pick_method="most_visited"
MoreLabels: additional_labels="id,last_terminal"
ActorCritic: 
PolicyGradient: entropy_ratio=0.01,grad_clip_norm=None,min_prob=1e-06,ratio_clamp=10,policy_action_nodes="pi,a"
DiscountedReward: discount=0.99
ValueMatcher: grad_clip_norm=None,value_node="V"
Sampler: sample_policy="epsilon-greedy",greedy=False,epsilon=0.0,sample_nodes="pi,a"
ModelLoader: load=None,onload=None,omit_keys=None,arch="ccpccp;-,64,64,64,-"
ModelInterface: opt_method="adam",lr=0.001,adam_eps=0.001
Trainer: freq_update=1
Evaluator: keys_in_reply="V"
Stats: trainer_stats="winrate"
ModelSaver: record_dir="./record",save_prefix="save",save_dir="./",latest_symlink="latest"
SingleProcessRun: num_minibatch=5000,num_episode=10000,tqdm=True
========== End of Args ============
Options:
Map: 20 by 20
Handicap: 0
Max tick: 30000
Max #Unit Cmd: 1
Seed: 0
Shuffled: False
[name=][fs=50][type=AI_NN][FoW=True][#frames_in_state=1][args=backup/AI_SIMPLE|start/500|decay/0.99]
[name=][fs=20][type=AI_SIMPLE][FoW=True][#frames_in_state=1]
Output_prompt_filename: ""
Cmd_dumper_prefix: ""
Save_replay_prefix: ""
ContextOptions:
#Game: 1024
#Max_thread: 0
#Collectors: 0
T: 20
Wait per group: False
Maximal #moves (0 = no constraint): 0
#Threads: 0
#Rollout per thread: 1
Verbose: False, Verbose_time: False
Use prior: False
Persistent tree: False
#Pseudo game: 0
Pick method: most_visited

Use time decay: True
Save prev seen units: False
Attach complete info: False

ORIGINAL
Version:  1f790173095cd910976d9f651b80beb872ec5d12_GIT_UNSTAGED
Num Actions:  9
Num unittype:  6
num planes:  22
#recv_thread = 4
Group 0: 
  Collector[0] Batchsize: 128 Info: [gid=0][T=1][name=""]
  Collector[1] Batchsize: 128 Info: [gid=1][T=1][name=""]
  Collector[2] Batchsize: 128 Info: [gid=2][T=1][name=""]
  Collector[3] Batchsize: 128 Info: [gid=3][T=1][name=""]
Group 1: 
  Collector[4] Batchsize: 128 Info: [gid=4][T=20][name=""]
  Collector[5] Batchsize: 128 Info: [gid=5][T=20][name=""]
  Collector[6] Batchsize: 128 Info: [gid=6][T=20][name=""]
  Collector[7] Batchsize: 128 Info: [gid=7][T=20][name=""]

  0%|                    | 0/5000 [00:00<?, ?it/s]./rts/game_MC/model.py:63: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  policy = self.softmax(self.linear_policy(h))
  0%|                    | 0/5000 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 34, in <module>
    runner.run()
  File "/workspace/gamebreaker/build/ELF/rlpytorch/runner/single_process.py", line 56, in run
    self.GC.Run()
  File "/workspace/gamebreaker/build/ELF/elf/utils_elf.py", line 378, in Run
    res = self._call(self.infos)
  File "/workspace/gamebreaker/build/ELF/elf/utils_elf.py", line 364, in _call
    sel_reply.copy_from(reply, batch_key=batch_key)
  File "/workspace/gamebreaker/build/ELF/elf/utils_elf.py", line 155, in copy_from
    bk[:] = v
RuntimeError: The expanded size of the tensor (1) must match the existing size (128) at non-singleton dimension 0.  Target sizes: [1, 128].  Tensor sizes: [128, 1]
Prepare to stop ...
^C
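The failing statement, bk[:] = v, can be reproduced outside ELF. The sketch below is an assumption-laden illustration, not the maintainers' fix: it assumes the target slice is laid out as [T, batch] = [1, 128] and the value coming from the model is [batch, 1] = [128, 1], as the error message suggests. The names `target` and `value` are hypothetical stand-ins for `bk` and `v`.

```python
import torch

# Stand-in for `bk`: the reply buffer slice, shape [1, 128].
target = torch.zeros(1, 128)

# Stand-in for `v`: a per-sample output with a trailing singleton dim, shape [128, 1].
value = torch.arange(128.0).unsqueeze(1)

failed = False
try:
    # [128, 1] cannot be broadcast into [1, 128], so this raises RuntimeError.
    target[:] = value
except RuntimeError as err:
    failed = True
    print("copy failed:", err)

# Possible workaround: reshape the source to the target's shape before copying.
# This preserves element order only because one of the two dims has size 1.
target[:] = value.reshape(target.shape)

print(target.shape)
```

Applied to copy_from in elf/utils_elf.py, this would amount to something like bk[:] = v.reshape(bk.shape), but whether that matches the intended layout of the reply in every case is not established by this issue, so treat it as a local patch to verify rather than a confirmed fix.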

Configuration

Here is the description of my conda environment:

conda list
# packages in environment at $HOME/miniconda3/envs/elf:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_pytorch_select           0.2                       gpu_0  
blas                      1.0                         mkl  
ca-certificates           2020.1.1                      0  
certifi                   2020.4.5.1               py38_0  
cffi                      1.14.0           py38he30daa8_1  
cudatoolkit               10.1.243             h6bb024c_0  
cudnn                     7.6.5                cuda10.1_0  
intel-openmp              2020.1                      217  
ld_impl_linux-64          2.33.1               h53a641e_7  
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.3                  he6710b0_1  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
mkl                       2020.1                      217  
mkl-service               2.3.0            py38he904b0f_0  
mkl_fft                   1.0.15           py38ha843d7b_0  
mkl_random                1.1.0            py38h962f231_0  
msgpack                   1.0.0                    pypi_0    pypi
msgpack-numpy             0.4.5                    pypi_0    pypi
ncurses                   6.2                  he6710b0_1  
ninja                     1.9.0            py38hfd86e86_0  
numpy                     1.18.1           py38h4f9e942_0  
numpy-base                1.18.1           py38hde5b4d6_1  
openssl                   1.1.1g               h7b6447c_0  
pip                       20.1                     pypi_0    pypi
pycparser                 2.20                       py_0  
python                    3.8.2               hcff3b4d_14  
pytorch                   1.4.0           cuda101py38h02f0884_0  
readline                  8.0                  h7b6447c_0  
setuptools                46.2.0                   py38_0  
six                       1.14.0                   py38_0  
sqlite                    3.31.1               h62c20be_1  
tk                        8.6.8                hbc83047_0  
tqdm                      4.46.0                     py_0  
wheel                     0.34.2                   py38_0  
xz                        5.2.5                h7b6447c_0  
zlib                      1.2.11               h7b6447c_3

My CUDA information is as follows:

| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
