Table 3 Summary of datasets

From: A unified approach to inferring chemical compounds with the desired aqueous solubility

| \(\mathcal{C}\) | \(\lvert\mathcal{C}\rvert\) | \(\underline{y},~\overline{y}\) | \(\lvert D\rvert\) |
|---|---|---|---|
| Protac | 21 | \(-6.64,\,-3.18\) | 83 |
| Wassvik | 26 | \(-8.49,\,-2.48\) | 94 |
| Alex Manfred | 72 | \(-0.833,\,0.65\) | 163 |
| Goodman | 87 | \(-6.74,\,-1.06\) | 130 |
| D5 | 91 | \(-5.88,\,0.58\) | 118 |
| Duffy | 98 | \(-10.32,\,-2.48\) | 139 |
| Boobier | 99 | \(-8.8,\,1.7\) | 133 |
| Dearden | 118 | \(-6.24,\,-0.57\) | 142 |
| Ran | 129 | \(-10.8,\,2.06\) | 157 |
| Llinas | 132 | \(-8.75,\,-1.18\) | 167 |
| Bergstrom | 163 | \(-7.59,\,0.55\) | 154 |
| Grigorev | 362 | \(-7.85,\,0.38\) | 173 |
| Jain | 456 | \(-12.95,\,1.58\) | 223 |
| Lovric | 805 | \(-8.75,\,1.149\) | 323 |
| Huuskonen | 827 | \(-11.62,\,1.58\) | 310 |
| David | 826 | \(-10.41,\,1.58\) | 263 |
| Water set wide | 845 | \(-12.79,\,1.58\) | 320 |
| Daniel | 915 | \(-10.43,\,6.4\) | 372 |
| Esol | 1054 | \(-11.6,\,1.58\) | 338 |
| Aqua | 1238 | \(-11.62,\,1.58\) | 364 |
| Tang | 1221 | \(-11.62,\,1.58\) | 364 |
| Wang | 1414 | \(-9.33,\,1.58\) | 405 |
| Phys | 1812 | \(-12.06,\,1.58\) | 469 |
| Training set | 5315 | \(-13.17,\,2.89\) | 675 |
| Ochem | 6006 | \(-12.1,\,1.58\) | 668 |
| Cui | 6678 | \(-18.21,\,1.7\) | 766 |
| Aqsol | 8230 | \(-13.17,\,2.13\) | 965 |
| Charles N. Lowe | *9150 | \(-13.17,\,2.41\) | 835 |
| Ademola | 10343 | \(-13.17,\,2.14\) | 949 |

\(\mathcal{C}\): the dataset; \(|\mathcal{C}|\): the size of \(\mathcal{C}\) after preprocessing; *: preprocessing was performed on 10000 randomly selected chemical compounds; \(\underline{y},~\overline{y}\): the lower and upper bounds of aqueous solubility (AS) in each dataset; \(|D|\): the total number of descriptors.