What `R` you? (R matrixes and R arrays in python)
Recap
Previously in this series, we discovered the equivalent Python data structures for several R data structures.
In this post, we will look at translating R arrays (and matrices) into Python.
1D R array
A 1D R array prints like a vector.
library(tidyverse)
library(reticulate)
py_run_string("import numpy as np")
py_run_string("import pandas as pd")
(OneD<-array(1:6))
## [1] 1 2 3 4 5 6
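But it is not truly a vector, because it carries a dim attribute and is.vector() returns FALSE for objects that have attributes other than names.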
OneD %>% is.vector()
## [1] FALSE
More specifically, it is an atomic type.
OneD %>% is.atomic()
## [1] TRUE
An atomic type is sometimes called an atomic vector, which adds to the confusion. ?is.atomic explains that “It is common to call the atomic types ‘atomic vectors’, but note that is.vector imposes further restrictions: an object can be atomic but not a vector (in that sense)”. Thus, OneD can be an atomic type but not a vector structure.
1D R array is a Python …
No tricks here. An R array is translated into a Python array. Thus, a 1D R array is translated into a 1D Python array. The Python array type is called ndarray and is provided by the Python package numpy.
r.OneD
## array([1, 2, 3, 4, 5, 6])
type(r.OneD)
## <class 'numpy.ndarray'>
r.OneD.ndim
## 1
All Python code in this post is run within {python} code chunks so that the printed display of a Python array (i.e. array([ ])) is shown explicitly.
1D Python array is an R …
One-dimensional Python arrays are commonly used in data science with Python.
p_one= np.arange(6)
p_one
## array([0, 1, 2, 3, 4, 5])
The 1D Python array is translated into a 1D R array.
py$p_one %>% class()
## [1] "array"
The translated array is an atomic type.
py$p_one %>% is.atomic()
## [1] TRUE
And the translated array is not a vector, which is expected of a 1D R array.
py$p_one %>% is.vector()
## [1] FALSE
2D R array
A 2D R array is also known as a matrix.
(TwoD<-array(1:6, dim=c(2,3)))
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
TwoD %>% class()
## [1] "matrix"
2D R array is a Python …
A 2D Python array. Python does not have a special name for its 2D arrays.
r.TwoD
## array([[1, 3, 5],
##        [2, 4, 6]])
type(r.TwoD)
## <class 'numpy.ndarray'>
r.TwoD.ndim
## 2
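Since numpy uses the same ndarray class for every number of dimensions, the 1D and 2D cases are told apart by their ndim and shape attributes rather than by a class name. A minimal sketch, run in plain Python without reticulate; the names one_d and two_d are hypothetical stand-ins for r.OneD and r.TwoD.

```python
import numpy as np

# Hypothetical stand-ins for the translated r.OneD and r.TwoD
one_d = np.array([1, 2, 3, 4, 5, 6])
two_d = np.array([[1, 3, 5], [2, 4, 6]])

# Both objects share one class, unlike R's separate "matrix" label
print(type(one_d) is type(two_d))  # True

# ndim gives the number of dimensions, shape the length along each one
print(one_d.ndim, one_d.shape)     # 1 (6,)
print(two_d.ndim, two_d.shape)     # 2 (2, 3)
```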
2D Python array
Besides 1D Python arrays, 2D Python arrays are also common in data science with Python.
p_two=np.random.randint(6, size=(2,3))
p_two
## array([[3, 1, 3],
##        [3, 5, 5]])
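Because np.random.randint draws new values on every run, your output will likely differ from the one printed here. The following is a sketch of one way to make such a draw repeatable, assuming a numpy version that provides the Generator interface (np.random.default_rng). The seed 42 and the name p_two_seeded are arbitrary choices for illustration and are not part of the original example.

```python
import numpy as np

# A seeded generator returns the same values on every run (seed is arbitrary)
rng = np.random.default_rng(42)

# Integers from 0 up to (but not including) 6, arranged as 2 rows by 3 columns
p_two_seeded = rng.integers(6, size=(2, 3))

print(p_two_seeded.shape)  # (2, 3)
print(p_two_seeded)
```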
A 2D Python array is translated into a 2D R array/matrix.
py$p_two %>% class()
## [1] "matrix"
Reshaping 1D Python array into 2D array
Sometimes a Python function requires a 2-dimensional array and your input variable is a 1-dimensional array. Thus, you will need to reshape your 1-dimensional array into a 2-dimensional array with numpy’s reshape function. Let us convert our 1-dimensional array into a 2-dimensional array which has 2 rows and 3 columns.
np.reshape(p_one, (2,3))
## array([[0, 1, 2],
##        [3, 4, 5]])
Let’s convert it into a 2D array which has 6 rows and 1 column.
np.reshape(p_one, (6,1))
## array([[0],
##        [1],
##        [2],
##        [3],
##        [4],
##        [5]])
The number of rows above is the same as the length of the 1D array. Thus, if you replace the 6 with the length of the 1D array, you will achieve the same result.
np.reshape(p_one, (len(p_one),1))
## array([[0],
##        [1],
##        [2],
##        [3],
##        [4],
##        [5]])
Alternatively, you can also replace it with -1 if the input is a 1D array. -1 means that the dimension is unspecified and that it will be “inferred from the length of the array”.
np.reshape(p_one, (-1,1))
## array([[0],
##        [1],
##        [2],
##        [3],
##        [4],
##        [5]])
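The three spellings of the row count can be checked against each other directly. The sketch below rebuilds p_one so that it runs on its own; the names explicit, by_length and inferred are hypothetical. It also shows that -1 can stand in for the column count, and that reshape is available as a method on the array itself.

```python
import numpy as np

p_one = np.arange(6)

# Three ways to ask for 6 rows and 1 column
explicit = np.reshape(p_one, (6, 1))
by_length = np.reshape(p_one, (len(p_one), 1))
inferred = np.reshape(p_one, (-1, 1))

print(np.array_equal(explicit, by_length))  # True
print(np.array_equal(explicit, inferred))   # True

# -1 can replace the column count instead: 2 rows, columns inferred
print(np.reshape(p_one, (2, -1)).shape)     # (2, 3)

# The same operation written as an ndarray method
print(p_one.reshape(-1, 1).shape)           # (6, 1)
```

Only one dimension may be left as -1, since numpy needs the other dimensions to work out the missing one.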
Difference between R and Python arrays
One of the differences is the order in which values fill the array, which shows when the array is printed.
R arrays are column-major. They are filled column-wise. In other words, the leftmost column is filled from top to bottom before moving to the neighbouring column on the right, which is then filled in the same top-down fashion.
TwoD
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
The integrity of this column-major display is maintained when it is translated into Python.
r.TwoD
## array([[1, 3, 5],
##        [2, 4, 6]])
You would have noticed that Python prints its array without the row (e.g. [1,]) and column (e.g. [,1]) indices.
While Python is able to use column-major ordered arrays, it defaults to row-major ordering when arrays are created in Python. In other words, values are filled from the first row in a left-to-right fashion before moving to the next row.
np.reshape(p_one, (2,3))
## array([[0, 1, 2],
##        [3, 4, 5]])
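To see both fill orders side by side in Python, np.reshape accepts an order argument: "C" (the default) fills row by row, while "F" fills column by column as R does. A minimal sketch; the names values, row_major and col_major are hypothetical, and the values 1 to 6 are chosen to match TwoD.

```python
import numpy as np

values = np.arange(1, 7)  # 1, 2, 3, 4, 5, 6

# Default order="C": fill the first row left to right, then the next row
row_major = np.reshape(values, (2, 3))
print(row_major.tolist())  # [[1, 2, 3], [4, 5, 6]]

# order="F": fill the first column top to bottom, then the next column (as in R)
col_major = np.reshape(values, (2, 3), order="F")
print(col_major.tolist())  # [[1, 3, 5], [2, 4, 6]]
```

The column-major result has the same layout as the R array TwoD printed earlier.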
You may refer to the reticulate package page for more detailed explanations and the implications of such differences.
Python series
Besides lists, 1D arrays and 2D arrays, there are other Python data structures which are commonly used in data science with Python. They are series and data frames, which are provided by the pandas library. We will look at series in this post, and data frames will be covered in a separate post. A series is a 1D array with axis labels.
PD=pd.Series(['banana',2])
PD
## 0    banana
## 1         2
## dtype: object
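The 0 and 1 on the left are the axis labels (the index), which pandas assigns by default when none are given. A small sketch of how custom labels could be supplied instead; the labels "fruit" and "count" and the variable names are made up for illustration.

```python
import pandas as pd

# Default labels: pandas numbers the entries 0, 1, ...
default_index = pd.Series(["banana", 2])
print(list(default_index.index))  # [0, 1]

# Hypothetical custom labels passed through the index argument
labelled = pd.Series(["banana", 2], index=["fruit", "count"])
print(list(labelled.index))       # ['fruit', 'count']

# Values can then be looked up by label
print(labelled["fruit"])          # banana
```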
As a series is a 1D array, when translated to R it is classified as an R array.
py$PD %>% class()
## [1] "array"
However, the translated series appears as an R named list. The index of the series appears as the names in the R list.
py$PD
## $`0`
## [1] "banana"
##
## $`1`
## [1] 2
What do you know? A translated series is both an R array and an R list.
py$PD %>% is.array()
## [1] TRUE
py$PD %>% is.list()
## [1] TRUE